global: partially migrates from REANA RTFD
Diego Rodriguez committed Jul 28, 2020
1 parent b17749c commit 7ca0842
Showing 6 changed files with 224 additions and 0 deletions.
86 changes: 86 additions & 0 deletions docs/advanced-usage/containers/docker/index.md
# Docker

## Using an existing environment

Sometimes you can use an already-existing container environment prepared by others, for example [`python:3.8`](https://hub.docker.com/_/python) for Python programs or [`gitlab-registry.cern.ch/cms-cloud/cmssw-docker/cc7-cms`](https://gitlab.cern.ch/cms-cloud/cmssw-docker/container_registry) for the CMS Offline Software framework. In this case you simply specify the container name and version number in your workflow specification and you are good to go. This is usually the case when your code does not have to be compiled, for example Python scripts or ROOT macros.
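The version is given as the image *tag*, written after a colon (as in `python:3.8`); when no tag is given, Docker uses `latest`. Purely to illustrate this naming convention, here is a hypothetical helper that splits an image reference into name and tag. It is a simplified sketch: it keeps registry hosts with a port (such as `localhost:5000/myenv`) in the name, but does not handle digest references such as `name@sha256:...`.

```python
def split_image_reference(reference):
    """Split a Docker image reference into (name, tag).

    The tag is whatever follows the last ':' that comes *after* the last '/',
    so a registry port such as 'localhost:5000/...' stays part of the name.
    Docker assumes the tag 'latest' when none is given.
    Digest references ('name@sha256:...') are not handled by this sketch.
    """
    last_slash = reference.rfind("/")
    last_colon = reference.rfind(":")
    if last_colon > last_slash:
        return reference[:last_colon], reference[last_colon + 1:]
    return reference, "latest"
```

Specifying the tag explicitly, rather than relying on `latest`, makes it clearer which environment a workflow expects.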

Note also that REANA offers a set of containers that can serve as examples of how to containerise popular analysis environments, such as:

- ROOT (see [reana-env-root6](https://github.com/reanahub/reana-env-root6))
- Jupyter (see [reana-env-jupyter](https://github.com/reanahub/reana-env-jupyter))
- AliPhysics (see [reana-env-aliphysics](https://github.com/reanahub/reana-env-aliphysics))

## Building your own environment

Other times you may need to build your own container, for example to add a certain library on top of Python 2.7. This is the most typical use case that we’ll address below.

This is usually the case when your code needs to be compiled, for example a C++ analysis.

If you need to create your own environment, you can do so by providing a `Dockerfile`, for example:

```Dockerfile
# Start from the Python 2.7 base image:
FROM python:2.7

# Install system packages needed by the analysis
# (the python:2.7 image already ships with pip):
RUN apt-get -y update && \
    apt-get -y install zip && \
    apt-get autoremove -y && \
    apt-get clean -y && \
    rm -rf /var/lib/apt/lists/*

# Install HFtools:
RUN pip install hftools

# Copy our code into the image:
COPY code /code
WORKDIR /code
```

You can build this customised analysis environment image and give it a name, for example `johndoe/myenv`:

```console
$ docker build -f environment/myenv/Dockerfile -t johndoe/myenv .
```

and push the created image to the Docker Hub image registry:

```console
$ docker push johndoe/myenv
```

## Supporting arbitrary user IDs

By default, processes in Docker containers run as the root user, which may not be secure. To improve the security of your environment, you can set up your own user under whose identity the processes will run.

In order for processes to run under any user identity and still be able to write to shared workspaces, we use a GID=0 technique [as used by OpenShift](https://docs.openshift.com/container-platform/3.11/creating_images/guidelines.html#openshift-specific-guidelines):

- UID: you can use any user ID you want;
- GID: you should add your user to the group with GID=0 (the root group).

This ensures write access to the workspace directories managed by the REANA platform.

For example, you can create the user `johndoe` with `UID=501` and add the user to `GID=0` by adding the following commands at the end of the previous `Dockerfile`:

```Dockerfile
# Set up the user and permissions:
RUN adduser --uid 501 --disabled-password --gecos "" johndoe && \
    usermod -a -G 0 johndoe
USER johndoe
```
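To confirm that the rebuilt image behaves as intended, you could run a short check inside the container. The sketch below is hypothetical (for example saved as `check_user.py` in the `code` directory); it uses only the standard library and should run on both Python 2.7 and Python 3. It reports whether the process belongs to the group with GID 0, and `can_write` lets you test a given directory; which directory is worth testing depends on where the workspace is mounted at run time.

```python
import os
import tempfile


def in_root_group():
    """True if GID 0 is this process's primary or a supplementary group."""
    return os.getgid() == 0 or 0 in os.getgroups()


def can_write(directory):
    """Return True if a temporary file can be created (and removed) in `directory`."""
    try:
        # The file is deleted automatically when the block exits.
        with tempfile.NamedTemporaryFile(dir=directory):
            pass
        return True
    except (IOError, OSError):
        return False


if __name__ == "__main__":
    # Print the identity the container process actually runs under.
    print("uid=%d gid=%d groups=%s" % (os.getuid(), os.getgid(), os.getgroups()))
    print("member of GID 0: %s" % in_root_group())
```

For example: `docker run --rm johndoe/myenv python check_user.py`.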

## Testing the environment

We now have a containerised image representing our computational environment that we can use to run our analysis in another replicated environment.

We should test the containerised environment to ensure it works properly, for example to check whether all the necessary libraries are present:

```console
$ docker run -i -t --rm johndoe/myenv /bin/bash
container> python -V
Python 2.7.15
container> python mycode.py < mydata.csv > /tmp/mydata.tmp
```
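To make the library check repeatable, you could keep a small script next to your code and run it in the container. The sketch below is hypothetical: the script name `check_env.py` and the module list are placeholders to replace with what your analysis actually imports. It uses only the standard library and works on both Python 2.7 and Python 3.

```python
import importlib
import sys

# Hypothetical list of modules the analysis needs; adapt it to your own code.
REQUIRED_MODULES = ["csv", "json"]


def missing_modules(names):
    """Return the names from `names` that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        # A non-zero exit status makes the failure visible to scripts and CI.
        sys.stderr.write("Missing modules: %s\n" % ", ".join(missing))
        sys.exit(1)
    print("All required modules are available.")
```

If the script sits in the `code` directory copied into the image, `docker run --rm johndoe/myenv python check_env.py` exits with a non-zero status when a module is missing.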

## Multiple environments

Note that different steps of the analysis can run in different environments: for example, the data filtering step on a large cloud cluster with data selection libraries installed, and the data plotting step in a local environment containing only your preferred plotting system. You can prepare several different environments for your analysis if needed.
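REANA starts each workflow step in the container image you specify for it, so you do not have to do this yourself. Purely to illustrate the idea of one environment per step, the hypothetical sketch below builds a plain `docker run` command line for each step; the image `johndoe/plotenv`, the `/workdir` mount point and the step commands are assumptions for illustration, not REANA internals.

```python
import os

# Hypothetical steps, each naming the container image it should run in.
STEPS = [
    {"name": "filter", "image": "johndoe/myenv",
     "command": "python mycode.py < mydata.csv > mydata.tmp"},
    {"name": "plot", "image": "johndoe/plotenv",
     "command": "python mycode.py --plot myparameter=myvalue < mydata.tmp > myplot.png"},
]


def docker_run_argv(image, command, workdir):
    """Build a `docker run` command line that runs `command` in `image`,
    with the host directory `workdir` mounted as the working directory."""
    return [
        "docker", "run", "--rm",
        "-v", os.path.abspath(workdir) + ":/workdir",  # share files between steps
        "-w", "/workdir",
        image, "/bin/sh", "-c", command,  # the shell handles `<` and `>`
    ]
```

Running the steps in order, e.g. `subprocess.check_call(docker_run_argv(step["image"], step["command"], "."))` for each entry of `STEPS`, lets both steps share files through the mounted directory while each uses its own image.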
5 changes: 5 additions & 0 deletions docs/advanced-usage/containers/index.md
# Container technologies

The environment is encapsulated by means of “containers” such as:

- [Docker](docker)
36 changes: 36 additions & 0 deletions docs/development/debugging/index.md
# Debugging

Deploy REANA in debug mode:

**1.** Create the cluster with the REANA source code mounted:

```console
$ reana-dev cluster-create --mode=debug
```

**2.** Build the component you want to debug with debugging dependencies:

```console
$ reana-dev docker-build -b DEBUG=1 -c reana-server
```

**3.** Deploy:

```console
$ reana-dev cluster-deploy --mode=debug
```

**4.** Add a breakpoint (using [wdb](https://github.com/Kozea/wdb)) in the [list workflows endpoint](https://github.com/reanahub/reana-server/blob/94421a4cf4effb8370ec7eaabfa03a72d2edb53f/reana_server/rest/workflows.py#L197) and call `reana-client list`:

```diff
     """
+    import wdb; wdb.set_trace()
     try:
```

**5.** Open the debugging session:

```console
$ firefox http://localhost:31984
```

![Debugging](../../images/debugging.png)
Binary file added docs/images/debugging.png
10 changes: 10 additions & 0 deletions docs/running-workflows/supported-systems/index.md
# Supported systems

| Engine | Parametrised? | Parallel execution? | Partial execution? |
|-------------------|---------------|---------------------|--------------------|
| [CWL](cwl) | yes | yes | no |
| [Serial](serial) | yes | no | yes |
| [Yadage](yadage) | yes | yes | no |

(1) The vanilla workflow system may support the feature, but not when it is run via the REANA environment.
87 changes: 87 additions & 0 deletions docs/running-workflows/what-is-workflow/index.md
# What is a workflow?

Workflows describe which computational steps were taken to run an analysis.

## Simple workflows

Let us assume that our analysis is run in two stages, firstly a data filtering stage and secondly a data plotting stage. A hypothetical example:

```console
$ python ./code/mycode.py \
< ./data/mydata.csv > ./workspace/mydata.tmp
$ python ./code/mycode.py --plot myparameter=myvalue \
< ./workspace/mydata.tmp > ./results/myplot.png
```

Note how we call a given sequence of commands to produce our desired output plots. To capture this sequence of commands in a “runnable” or “actionable” manner, we can write a short shell script `run.sh` and make it parametrisable:

```console
$ ./run.sh --myparameter myvalue
```
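The source does not show `run.sh` itself. As a sketch of the same idea in Python, the language of the example analysis, a hypothetical `run.py` could chain the two commands, feeding each step's input file to standard input and writing its standard output to a file, like the shell redirections above. The paths mirror the hypothetical example, and the `workspace` and `results` directories are assumed to exist.

```python
import argparse
import subprocess
import sys


def build_steps(myparameter):
    """Return the workflow as a list of (argv, stdin path, stdout path) steps."""
    return [
        # Step 1: filter the raw data.
        ([sys.executable, "./code/mycode.py"],
         "./data/mydata.csv", "./workspace/mydata.tmp"),
        # Step 2: plot the filtered data with the user-supplied parameter.
        ([sys.executable, "./code/mycode.py", "--plot", "myparameter=" + myparameter],
         "./workspace/mydata.tmp", "./results/myplot.png"),
    ]


def run_steps(steps):
    """Run the steps strictly one after another; stop at the first failure."""
    for argv, stdin_path, stdout_path in steps:
        # Equivalent of the shell redirections `< stdin_path > stdout_path`.
        with open(stdin_path, "rb") as stdin, open(stdout_path, "wb") as stdout:
            subprocess.check_call(argv, stdin=stdin, stdout=stdout)


def main(argv=None):
    parser = argparse.ArgumentParser(description="Run the two-stage analysis.")
    parser.add_argument("--myparameter", default="myvalue")
    args = parser.parse_args(argv)
    run_steps(build_steps(args.myparameter))
```

Saved as `run.py` with `if __name__ == "__main__": main()` appended, it would be called as `python run.py --myparameter myvalue`. Because `check_call` raises an exception on a non-zero exit status, the plot step never runs on the output of a failed filter step.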

In this case you will want to use the [Serial](running-workflows/supported-systems/serial) workflow engine of REANA. The engine lets you express the workflow as a sequence of commands:

```console
      START
        |
        |
        V
    +--------+
    | filter |  <-- mydata.csv
    +--------+
        |
        | mydata.tmp
        |
        V
    +--------+
    |  plot  |  <-- myparameter=myvalue
    +--------+
        |
        | plot.png
        V
      STOP
```

Note that you can run different commands in different computing environments, but they must be run in a linear sequential manner.

The sequential workflow pattern will usually cover only simple computational workflow needs.

## Complex workflows

For advanced workflow needs we may want to run certain commands in parallel in a sort of map-reduce fashion. There are [many workflow systems](https://github.com/common-workflow-language/common-workflow-language/wiki/Existing-Workflow-systems) that are dedicated to expressing complex computational schemata in a structured manner. REANA supports several, such as [CWL](running-workflows/supported-systems/cwl) and [Yadage](running-workflows/supported-systems/yadage).

These workflow systems let you express the computational steps in the form of a [Directed Acyclic Graph (DAG)](https://en.wikipedia.org/wiki/Directed_acyclic_graph), permitting advanced computational scenarios.

```console
                  START
                    |
                    |
             +------+------+
            /       |       \
           /        V        \
    +--------+  +--------+  +--------+
    | filter |  | filter |  | filter |  <-- mydata
    +--------+  +--------+  +--------+
           \        |        /
            \       |       /
             \      |      /
              \     |     /
               \    |    /
                \   |   /
                 \  |  /
                +-------+
                | merge |
                +-------+
                    |
                    | mydata.tmp
                    |
                    V
                +--------+
                |  plot  |  <-- myparameter=myvalue
                +--------+
                    |
                    | plot.png
                    V
                  STOP
```
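Conceptually, a DAG engine starts each step as soon as all the steps it depends on have finished, which is what lets the three filter steps in the diagram run in parallel. The following sketch shows that scheduling idea using the standard library's `graphlib.TopologicalSorter` (Python 3.9+) and a thread pool. It is an illustration only, not how CWL or Yadage are implemented in REANA, and the step names are hypothetical.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter

# Hypothetical DAG mirroring the diagram: each step maps to the steps it depends on.
DAG = {
    "filter_1": set(),
    "filter_2": set(),
    "filter_3": set(),
    "merge": {"filter_1", "filter_2", "filter_3"},
    "plot": {"merge"},
}


def run_dag(dag, run_step, max_workers=3):
    """Call run_step(name) for every step, starting each step as soon as all
    of its dependencies have finished. Returns the order of completion."""
    sorter = TopologicalSorter(dag)
    sorter.prepare()
    finished = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        running = {}
        while sorter.is_active():
            # Submit every step whose dependencies are all satisfied.
            for name in sorter.get_ready():
                running[pool.submit(run_step, name)] = name
            # Block until at least one running step finishes.
            done, _ = wait(running, return_when=FIRST_COMPLETED)
            for future in done:
                name = running.pop(future)
                future.result()  # re-raise any exception raised by the step
                sorter.done(name)
                finished.append(name)
    return finished
```

For example, `run_dag(DAG, lambda name: print("running", name))` runs the three filter steps concurrently, then `merge`, then `plot`.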

You can take inspiration from the existing [examples](https://github.com/reanahub/?q=reana-demo).
