global: partially migrates from REANA RTFD
Diego Rodriguez authored and tiborsimko committed Aug 7, 2020
1 parent 63672f8 commit 2fb3366
Showing 7 changed files with 223 additions and 1 deletion.
86 changes: 86 additions & 0 deletions docs/advanced-usage/containers/docker/index.md
@@ -0,0 +1,86 @@
# Docker

## Using an existing environment

To run your analysis, you can use a pre-existing container environment created by a third party, for example [`python:3.8`](https://hub.docker.com/_/python) for Python programs or [`gitlab-registry.cern.ch/cms-cloud/cmssw-docker/cc7-cms`](https://gitlab.cern.ch/cms-cloud/cmssw-docker/container_registry) for the CMS Offline Software framework. In this case you simply specify the container name and the version number in your workflow specification and you are good to go. This is usually the case when your code does not have to be compiled, for example with Python scripts or ROOT macros.
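
The "container name and version number" pair corresponds to the `name:tag` form of an image reference. The following is a minimal Python sketch of how such a reference splits into its two parts. The helper `split_image` is hypothetical, does not handle digest references (`@sha256:...`), and assumes Docker's convention that a missing tag means `latest`.

```python
def split_image(reference):
    """Split an image reference into (name, tag).

    Hypothetical helper for illustration; digest references are not handled.
    """
    # A colon separates the tag only if it comes after the last slash;
    # otherwise it may be a registry port (e.g. localhost:5000/img).
    last_slash = reference.rfind("/")
    colon = reference.rfind(":")
    if colon > last_slash:
        return reference[:colon], reference[colon + 1:]
    # No explicit tag: assume the default tag.
    return reference, "latest"


print(split_image("python:3.8"))
print(split_image("gitlab-registry.cern.ch/cms-cloud/cmssw-docker/cc7-cms"))
```

Pinning an explicit tag rather than relying on the default makes it clearer which environment version an analysis was run with.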

Note also that REANA offers a set of containers that can serve as examples of how to containerise popular analysis environments, such as:

- ROOT (see [reana-env-root6](https://github.com/reanahub/reana-env-root6))
- Jupyter (see [reana-env-jupyter](https://github.com/reanahub/reana-env-jupyter))
- AliPhysics (see [reana-env-aliphysics](https://github.com/reanahub/reana-env-aliphysics))

## Building your own environment

At other times you may need to build your own container, for example to add a certain library on top of Python 2.7. This is the most typical use case, and the one we address below. It also applies when your code needs to be compiled, for example a C++ analysis.

To create your own environment, provide a `Dockerfile`:

```Dockerfile
# Start from the Python 2.7 base image:
FROM python:2.7

# Install HFtools:
RUN apt-get -y update && \
    apt-get -y install \
      python-pip \
      zip && \
    apt-get autoremove -y && \
    apt-get clean -y
RUN pip install hftools

# Copy our code into the image:
COPY code /code
WORKDIR /code
```

You can build this customised analysis environment image and give it some name, for example `johndoe/myenv`:

```console
$ docker build -f environment/myenv/Dockerfile -t johndoe/myenv .
```

and push the created image to the Docker Hub image registry:

```console
$ docker push johndoe/myenv
```

## Supporting arbitrary user IDs

By default, processes in Docker containers run under the root user identity. However, this may not be secure. To improve the security of your environment, you can set up your own user under whose identity the processes will run.

In order for processes to run under any user identity and still be able to write to shared workspaces, we use a GID=0 technique [as used by OpenShift](https://docs.openshift.com/container-platform/3.11/creating_images/guidelines.html#openshift-specific-guidelines):

- UID: you can use any user ID you want;
- GID: you should add your user to the group with GID=0 (the root group).

This ensures write access to the workspace directories managed by the REANA platform.

For example, you can create the user `johndoe` with `UID=501` and add the user to `GID=0` by adding the following commands at the end of the previous `Dockerfile`:

```Dockerfile
# Setup user and permissions
RUN adduser johndoe -u 501 --disabled-password --gecos ""
RUN usermod -a -G 0 johndoe
USER johndoe
```
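
The reasoning behind the GID=0 technique can be sketched with plain POSIX permission bits: if a workspace directory is group-owned by GID 0 and group-writable, any UID that belongs to group 0 can write to it. The function `can_write` and the ownership and mode values below are illustrative assumptions, not REANA code, and the check ignores ACLs, capabilities and the special treatment of UID 0.

```python
import stat


def can_write(mode, owner_uid, owner_gid, uid, gids):
    """Return True if a process with `uid` and group set `gids` may write.

    Simplified POSIX check: the first matching class (owner, group, other)
    decides. ACLs, capabilities and the UID 0 override are ignored.
    """
    if uid == owner_uid:
        return bool(mode & stat.S_IWUSR)
    if owner_gid in gids:
        return bool(mode & stat.S_IWGRP)
    return bool(mode & stat.S_IWOTH)


# Assumed workspace: owned by UID 1000, group 0, mode rwxrwxr-x (0o775).
workspace = {"mode": 0o775, "owner_uid": 1000, "owner_gid": 0}

# johndoe (UID 501) with supplementary group 0 can write ...
print(can_write(uid=501, gids={501, 0}, **workspace))
# ... while the same UID without group 0 cannot.
print(can_write(uid=501, gids={501}, **workspace))
```

This is why the UID can be arbitrary: write access is granted through the group class, not the owner class.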

## Testing the environment

We now have a containerised image representing our computational environment that we can use to run our analysis in another replicated environment.

We should test the containerised environment to ensure it works properly, for example whether all the necessary libraries are present:

```console
$ docker run -i -t --rm johndoe/myenv /bin/bash
container> python -V
Python 2.7.15
container> python mycode.py < mydata.csv > /tmp/mydata.tmp
```
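
The library check can also be scripted so that it is repeatable. Below is a minimal sketch using only the standard `importlib` module, written so that it should run under both Python 2.7 and Python 3. The function name `missing_modules` and the module list are assumptions for illustration; in the environment built above you would list the modules your analysis imports, for instance `hftools`, assuming the import name matches the package name.

```python
import importlib


def missing_modules(names):
    """Return the names in `names` that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


# Replace with the modules your analysis needs.
required = ["json", "csv"]
print(missing_modules(required))
```

An empty list means every required module could be imported inside the container.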

## Multiple environments

Note that different steps of your analysis can run in different environments. For instance, the data filtering step could run on a big cloud in an environment with data selection libraries installed, while the data plotting step could run locally in an environment containing only your preferred graphing system. You can prepare several different environments for your analysis if needed.
3 changes: 3 additions & 0 deletions docs/advanced-usage/containers/index.md
@@ -0,0 +1,3 @@
# Containers

- [Docker](docker)
36 changes: 36 additions & 0 deletions docs/development/debugging/index.md
@@ -1 +1,37 @@
# Debugging

Deploy REANA in debug mode:

**1.** Create the cluster with the REANA source code mounted:

```console
$ reana-dev cluster-create --mode=debug
```

**2.** Build the component you want to debug with debugging dependencies:

```console
$ reana-dev docker-build -b DEBUG=1 -c reana-server
```

**3.** Deploy:

```console
$ reana-dev cluster-deploy --mode=debug
```

**4.** Add a breakpoint (using [wdb](https://github.com/Kozea/wdb)) in the [list workflows endpoint](https://github.com/reanahub/reana-server/blob/94421a4cf4effb8370ec7eaabfa03a72d2edb53f/reana_server/rest/workflows.py#L197) and call `reana-client list`:

```diff
"""
+ import wdb; wdb.set_trace()
try:
```

**5.** Open the debugging session:

```console
$ firefox http://localhost:31984
```

![Debugging](../../images/debugging.png)
Binary file added docs/images/debugging.png
2 changes: 1 addition & 1 deletion docs/index.md
@@ -25,6 +25,6 @@
**Get in touch**

- Discuss [on Forum](https://forum.reana.io/)
- Chat [on Mattermost](https://mattermost.web.cern.ch/it-dep/channels/reana) or [on Gitter](https://gitter.im/reanahub/reana)
- Follow us [on Twitter](https://twitter.com/reanahub)
- Collaborate [on GitHub](https://github.com/reanahub)
10 changes: 10 additions & 0 deletions docs/running-workflows/supported-systems/index.md
@@ -0,0 +1,10 @@
# Supported systems

| Engine | Parametrised? | Parallel execution? | Partial execution? |
| ---------------- | ------------- | ------------------- | ------------------ |
| [CWL](cwl) | yes | yes | no (1) |
| [Serial](serial) | yes | no | yes |
| [Yadage](yadage) | yes | yes | no (1) |

(1) The vanilla workflow system may support the feature, but not when run
via the REANA environment.
87 changes: 87 additions & 0 deletions docs/running-workflows/what-is-workflow/index.md
@@ -1 +1,88 @@
# What is a workflow?

Workflows describe which computational steps were taken to run an analysis.

## Simple workflows

Let us assume that our analysis is run in two stages, firstly a data filtering stage and secondly a data plotting stage. A hypothetical example:

```console
$ python ./code/mycode.py \
< ./data/mydata.csv > ./workspace/mydata.tmp
$ python ./code/mycode.py --plot myparameter=myvalue \
< ./workspace/mydata.tmp > ./results/myplot.png
```

Note how we call a given sequence of commands to produce our desired output plots. In order to capture this sequence of commands in a “runnable” or “actionable” manner, we can write a short shell script `run.sh` and make it parametrisable:

```console
$ ./run.sh --myparameter myvalue
```
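
The same two-stage sequence can be captured in a small driver. The sketch below uses Python rather than shell, but serves the same purpose as `run.sh`. The function names `build_stages` and `run` are hypothetical; the commands and file paths are the ones from the example above.

```python
import subprocess


def build_stages(myparameter):
    """Return the two stages as (argv, input path, output path) tuples."""
    return [
        (["python", "./code/mycode.py"],
         "./data/mydata.csv", "./workspace/mydata.tmp"),
        (["python", "./code/mycode.py", "--plot",
          "myparameter={}".format(myparameter)],
         "./workspace/mydata.tmp", "./results/myplot.png"),
    ]


def run(myparameter):
    """Run the stages in order, stopping at the first failure."""
    for argv, infile, outfile in build_stages(myparameter):
        # Mirror the shell redirections `< infile > outfile`.
        with open(infile, "rb") as src, open(outfile, "wb") as dst:
            subprocess.check_call(argv, stdin=src, stdout=dst)
```

Calling `run("myvalue")` would execute the filtering stage and then the plotting stage, with the output of the first stage used as the input of the second.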

In this case you will want to use the [Serial](../supported-systems/serial) workflow engine of REANA. The engine lets you express the workflow as a sequence of commands:

```console
      START
        |
        |
        V
    +--------+
    | filter |  <-- mydata.csv
    +--------+
        |
        | mydata.tmp
        |
        V
    +--------+
    |  plot  |  <-- myparameter=myvalue
    +--------+
        |
        | plot.png
        V
       STOP
```

Note that you can run different commands in different computing environments, but they must be run in a linear sequential manner.

The sequential workflow pattern will usually cover only simple computational workflow needs.
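
To illustrate "different environments, linear order", the structure of such a workflow can be sketched as a plain Python dictionary. The key names (`type`, `specification`, `steps`, `environment`, `commands`) are assumed here to mirror the layout of a REANA serial workflow specification; consult the Serial engine documentation for the authoritative schema. The image name `johndoe/myplotenv` is hypothetical.

```python
import json

# Sketch only: key names are assumed to follow the REANA serial
# specification layout; the second image name is hypothetical.
workflow = {
    "type": "serial",
    "specification": {
        "steps": [
            {
                "environment": "johndoe/myenv",
                "commands": [
                    "python ./code/mycode.py"
                    " < ./data/mydata.csv > ./workspace/mydata.tmp"
                ],
            },
            {
                "environment": "johndoe/myplotenv",
                "commands": [
                    "python ./code/mycode.py --plot myparameter=myvalue"
                    " < ./workspace/mydata.tmp > ./results/myplot.png"
                ],
            },
        ]
    },
}


def environments(spec):
    """List the environment of each step, in execution order."""
    return [step["environment"] for step in spec["specification"]["steps"]]


print(json.dumps(environments(workflow)))
```

The steps form a list, so their order is fixed, while each step is free to name its own container image.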

## Complex workflows

For advanced workflow needs we may want to run certain commands in parallel in a sort of map-reduce fashion. There are [many workflow systems](https://github.com/common-workflow-language/common-workflow-language/wiki/Existing-Workflow-systems) that are dedicated to expressing complex computational schemata in a structured manner. REANA supports several, such as [CWL](../supported-systems/cwl) and [Yadage](../supported-systems/yadage).

These workflow systems let you express the computational steps in the form of a [Directed Acyclic Graph (DAG)](https://en.wikipedia.org/wiki/Directed_acyclic_graph), permitting advanced computational scenarios.

```console
                  START
                    |
                    |
          +---------+---------+
         /          |          \
        /           V           \
    +--------+  +--------+  +--------+
    | filter |  | filter |  | filter |  <-- mydata
    +--------+  +--------+  +--------+
            \       |       /
             \      |      /
              \     |     /
               \    |    /
                \   |   /
                 \  |  /
                  \ | /
                +-------+
                | merge |
                +-------+
                    |
                    | mydata.tmp
                    |
                    V
                +--------+
                |  plot  |  <-- myparameter=myvalue
                +--------+
                    |
                    | plot.png
                    V
                   STOP
```
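
The diagram above can be expressed as a dependency graph and scheduled in "waves" of steps that become ready at the same time. Below is a minimal sketch using `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later). The step names `filter_1` to `filter_3` and the function `execution_waves` are made up for illustration, and this is not how CWL or Yadage are implemented.

```python
from graphlib import TopologicalSorter

# Each step maps to the set of steps it depends on.
dag = {
    "filter_1": set(),
    "filter_2": set(),
    "filter_3": set(),
    "merge": {"filter_1", "filter_2", "filter_3"},
    "plot": {"merge"},
}


def execution_waves(graph):
    """Group steps into waves; steps within a wave could run in parallel."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


for wave in execution_waves(dag):
    print(wave)
```

The three filter steps land in the first wave because they have no dependencies, which is the map part of the map-reduce pattern; `merge` and `plot` each wait for the previous wave to finish.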

You can take inspiration from the existing [examples](https://github.com/reanahub/?q=reana-demo).
