global: partially migrates from REANA RTFD

* Closes reanahub/reana#355.
diegodelemos · Jul 28, 2020 · 7ca0842
1 parent b17749c
commit 7ca0842
Showing 6 changed files with 224 additions and 0 deletions.
diff --git a/docs/advanced-usage/containers/docker/index.md b/docs/advanced-usage/containers/docker/index.md
@@ -0,0 +1,86 @@
+# Docker
+
+## Using an existing environment
+
+Sometimes you can use an existing container environment prepared by others, for example [`python:3.8`](https://hub.docker.com/_/python) for Python programs or [`gitlab-registry.cern.ch/cms-cloud/cmssw-docker/cc7-cms`](https://gitlab.cern.ch/cms-cloud/cmssw-docker/container_registry) for the CMS Offline Software framework. In this case you simply specify the container name and the version number in your workflow specification and you are good to go. This is usually the case when your code does not have to be compiled, for example Python scripts or ROOT macros.
+
+Note also that REANA offers a set of containers that can serve as examples of how to containerise popular analysis environments such as:
+
+- ROOT (see [reana-env-root6](https://github.com/reanahub/reana-env-root6))
+- Jupyter (see [reana-env-jupyter](https://github.com/reanahub/reana-env-jupyter))
+- AliPhysics (see [reana-env-aliphysics](https://github.com/reanahub/reana-env-aliphysics))
+
+## Building your own environment
+
+Other times you may need to build your own container, for example to add a certain library on top of Python 2.7. This is the most typical use case that we’ll address below.
+
+This is usually the case when your code needs to be compiled, for example a C++ analysis.
+
+If you need to create your own environment, this can be achieved by means of providing a particular `Dockerfile`:
+
+```Dockerfile
+# Start from the Python 2.7 base image:
+FROM python:2.7
+
+# Install HFtools:
+RUN apt-get -y update && \
+    apt-get -y install \
+       python-pip \
+       zip && \
+    apt-get autoremove -y && \
+    apt-get clean -y
+RUN pip install hftools
+
+# Copy our code into the image:
+ADD code /code
+WORKDIR /code
+```
+
+You can build this customised analysis environment image and give it some name, for example `johndoe/myenv`:
+
+```console
+$ docker build -f environment/myenv/Dockerfile -t johndoe/myenv .
+```
+
+and push the created image to the DockerHub image registry:
+
+```console
+$ docker push johndoe/myenv
+```
+
+## Supporting arbitrary user IDs
+
+In the Docker container ecosystem, processes in containers run under the root user identity by default. However, this may not be secure. To improve the security of your environment, you can set up your own user under whose identity the processes will run.
+
+In order for processes to run under any user identity and still be able to write to shared workspaces, we use a GID=0 technique [as used by OpenShift](https://docs.openshift.com/container-platform/3.11/creating_images/guidelines.html#openshift-specific-guidelines):
+
+- UID: you can use any user ID you want;
+- GID: you should add your user to the group with GID=0 (the root group).
+
+This ensures write access to the workspace directories managed by the REANA platform.
+
+For example, you can create the user `johndoe` with `UID=501` and add the user to `GID=0` by adding the following commands at the end of the previous `Dockerfile`:
+
+```Dockerfile
+# Setup user and permissions
+RUN adduser johndoe -u 501 --disabled-password --gecos ""
+RUN usermod -a -G 0 johndoe
+USER johndoe
+```
+
+## Testing the environment
+
+We now have a containerised image representing our computational environment that we can use to run our analysis in another replicated environment.
+
+We should test the containerised environment to ensure it works properly, for example whether all the necessary libraries are present:
+
+```console
+$ docker run -i -t --rm johndoe/myenv /bin/bash
+container> python -V
+Python 2.7.15
+container> python mycode.py < mydata.csv > /tmp/mydata.tmp
+```
+
+## Multiple environments
+
+Note that different steps of the analysis can run in different environments: for example, the data filtering step on a large cloud with data selection libraries installed, and the data plotting step in a local environment containing only your preferred graphing system. You can prepare several different environments for your analysis if needed.
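
The GID=0 technique from the "Supporting arbitrary user IDs" section of `docs/advanced-usage/containers/docker/index.md` could be checked from inside a container with a small script. The following is only a minimal sketch and not part of this commit: the helper names `process_in_group` and `group_can_write` and the workspace path are hypothetical, and it assumes a Unix-like system where `os.getegid` and `os.getgroups` are available.

```python
import os
import stat


def process_in_group(gid=0):
    """Return True if the current process belongs to group `gid`."""
    return os.getegid() == gid or gid in os.getgroups()


def group_can_write(path, gid=0):
    """Return True if `path` is owned by group `gid` and is group-writable."""
    info = os.stat(path)
    return info.st_gid == gid and bool(info.st_mode & stat.S_IWGRP)


if __name__ == "__main__":
    workspace = "."  # hypothetical: replace with the workspace directory to check
    print("process in GID 0:", process_in_group(0))
    print("workspace group-writable for GID 0:", group_can_write(workspace, 0))
```

If both checks report `True`, a process running under an arbitrary UID should be able to write to that directory through its group membership, which is the behaviour the section relies on.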
diff --git a/docs/advanced-usage/containers/index.md b/docs/advanced-usage/containers/index.md
@@ -0,0 +1,5 @@
+# Container technologies
+
+The environment is encapsulated by means of “containers” such as:
+
+- [Docker](docker)
diff --git a/docs/development/debugging/index.md b/docs/development/debugging/index.md
@@ -1 +1,37 @@
 # Debugging
+
+Deploy REANA in debug mode:
+
+**1.** Create the cluster with the REANA source code mounted:
+
+```console
+$ reana-dev cluster-create --mode=debug
+```
+
+**2.** Build the component you want to debug with debugging dependencies:
+
+```console
+$ reana-dev docker-build -b DEBUG=1 -c reana-server
+```
+
+**3.** Deploy:
+
+```console
+$ reana-dev cluster-deploy --mode=debug
+```
+
+**4.** Add a breakpoint (using [wdb](https://github.com/Kozea/wdb)) in the [list workflows endpoint](https://github.com/reanahub/reana-server/blob/94421a4cf4effb8370ec7eaabfa03a72d2edb53f/reana_server/rest/workflows.py#L197) and call `reana-client list`:
+
+```diff
+    """
++   import wdb; wdb.set_trace()
+    try:
+```
+
+**5.** Open the debugging session:
+
+```console
+$ firefox http://localhost:31984
+```
+
+![Debugging](../../images/debugging.png)
diff --git a/docs/images/debugging.png b/docs/images/debugging.png
diff --git a/docs/running-workflows/supported-systems/index.md b/docs/running-workflows/supported-systems/index.md
@@ -0,0 +1,10 @@
+# Supported systems
+
+| Engine            | Parametrised? | Parallel execution? | Partial execution? |
+|-------------------|---------------|---------------------|--------------------|
+| [CWL](cwl)        | yes           | yes                 | no                 |
+| [Serial](serial)  | yes           | no                  | yes                |
+| [Yadage](yadage)  | yes           | yes                 | no                 |
+
+(1) The vanilla workflow system may support the feature, but not when run
+    via the REANA environment.
diff --git a/docs/running-workflows/what-is-workflow/index.md b/docs/running-workflows/what-is-workflow/index.md
@@ -1 +1,88 @@
 # What is a workflow?
+
+Workflows describe which computational steps were taken to run an analysis.
+
+## Simple workflows
+
+Let us assume that our analysis is run in two stages, firstly a data filtering stage and secondly a data plotting stage. A hypothetical example:
+
+```console
+$ python ./code/mycode.py \
+    < ./data/mydata.csv > ./workspace/mydata.tmp
+$ python ./code/mycode.py --plot myparameter=myvalue \
+    < ./workspace/mydata.tmp > ./results/myplot.png
+```
+
+Note how we call a given sequence of commands to produce our desired output plots. To capture this sequence of commands in a “runnable” or “actionable” manner, we can write a short shell script `run.sh` and make it parametrisable:
+
+```console
+$ ./run.sh --myparameter myvalue
+```
+
+In this case you will want to use the [Serial](running-workflows/supported-systems/serial) workflow engine of REANA. The engine lets you express the workflow as a sequence of commands:
+
+```console
+    START
+     |
+     |
+     V
++--------+
+| filter |  <-- mydata.csv
++--------+
+     |
+     | mydata.tmp
+     |
+     V
++--------+
+|  plot  |  <-- myparameter=myvalue
++--------+
+     |
+     | plot.png
+     V
+    STOP
+```
+
+Note that you can run different commands in different computing environments, but they must be run in a linear sequential manner.
+
+The sequential workflow pattern will usually cover only simple computational workflow needs.
+
+## Complex workflows
+
+For advanced workflow needs we may want to run certain commands in parallel in a sort of map-reduce fashion. There are [many workflow systems](https://github.com/common-workflow-language/common-workflow-language/wiki/Existing-Workflow-systems) that are dedicated to expressing complex computational schemata in a structured manner. REANA supports several, such as [CWL](running-workflows/supported-systems/cwl) and [Yadage](running-workflows/supported-systems/yadage).
+
+These workflow systems let you express the computational steps as a [Directed Acyclic Graph (DAG)](https://en.wikipedia.org/wiki/Directed_acyclic_graph), permitting advanced computational scenarios.
+
+```console
+              START
+               |
+               |
+        +------+----------+
+       /       |           \
+      /        V            \
++--------+  +--------+  +--------+
+| filter |  | filter |  | filter |   <-- mydata
++--------+  +--------+  +--------+
+        \       |       /
+         \      |      /
+          \     |     /
+           \    |    /
+            \   |   /
+             \  |  /
+              \ | /
+            +-------+
+            | merge |
+            +-------+
+                |
+                | mydata.tmp
+                |
+                V
+            +--------+
+            |  plot  |  <-- myparameter=myvalue
+            +--------+
+                |
+                | plot.png
+                V
+               STOP
+```
+
+You can take inspiration from the existing [examples](https://github.com/reanahub/?q=reana-demo).
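
The DAG idea from the "Complex workflows" section of `docs/running-workflows/what-is-workflow/index.md` can be illustrated independently of any workflow engine. The sketch below is not how CWL, Yadage or REANA schedule jobs; it only shows, under assumed step names (`filter_1`, `filter_2`, `filter_3`, `merge`, `plot`) that mirror the diagram, how a dependency graph determines which steps may run in parallel. It uses `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later).

```python
from graphlib import TopologicalSorter

# Hypothetical step names mirroring the diagram: each key maps a step to the
# set of steps it depends on. Three filter steps feed one merge step, which
# in turn feeds the plot step.
DEPENDENCIES = {
    "filter_1": set(),
    "filter_2": set(),
    "filter_3": set(),
    "merge": {"filter_1", "filter_2", "filter_3"},
    "plot": {"merge"},
}


def execution_batches(dependencies):
    """Group steps into batches; steps in the same batch are independent."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if the graph is not acyclic
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # steps whose dependencies are done
        batches.append(ready)
        sorter.done(*ready)
    return batches


if __name__ == "__main__":
    for number, batch in enumerate(execution_batches(DEPENDENCIES), start=1):
        print(f"batch {number}: {', '.join(batch)}")
```

A serial workflow is the special case in which every batch contains exactly one step.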