Minor user's manual update
michael-kotliar committed Jul 12, 2020
1 parent d21269c commit 4a9e73b
Showing 6 changed files with 34 additions and 44 deletions.
Binary file added docs/images/trigger_1.jpg
Binary file added docs/images/trigger_2.jpg
2 changes: 1 addition & 1 deletion docs/index.rst
@@ -40,4 +40,4 @@ Michael Kotliar, Andrey V Kartashov, Artem Barski, CWL-Airflow: a lightweight pi
How it works <readme/how_it_works.md>
How to install <readme/how_to_install.md>
How to use <readme/how_to_use.md>
- What if something doesn't work <readme/what_if_something_doesnt_work.md>
+ What if it doesn't work <readme/what_if_it_doesnt_work.md>
2 changes: 1 addition & 1 deletion docs/readme/how_it_works.md
@@ -4,7 +4,7 @@ The CWL-Airflow package extends Airflow’s functionality with the ability to pa

Previously, in order to execute a CWLDAG, a file describing the workflow-specific input parameters in JSON or YAML format had to be placed in a special jobs folder. In the current version we have removed the need for the jobs folder, as new CWLDAGs can be easily triggered with the required input parameters through the REST API, the Airflow Webserver, or the command-line interface. If someone needs to monitor a special folder for newly added job files, this can be easily implemented as a separate standard Airflow DAG.
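
For example, assuming the stock Airflow 1.10 CLI and an already loaded DAG named `my_dag_name`, a run could be triggered from the command line roughly like this (the `threads` input is a made-up placeholder):

```bash
# hypothetical illustration: workflow inputs are passed under the "job" key
# of the run configuration
airflow trigger_dag --conf '{"job": {"threads": 2}}' my_dag_name
```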

- ![CWL-Airflow diagram](../images/scheme.jpg)
+ ![](../images/scheme.jpg)

To add a new workflow, one should simply write a small python script (see example below) and place it into the DAGs folder. Only two parameters are required to initialize a new CWLDAG: path to the **workflow** file and **dag_id**.
```python
#!/usr/bin/env python3
from cwl_airflow.extensions.cwldag import CWLDAG
dag = CWLDAG(
    workflow="/absolute/path/to/workflow.cwl",  # placeholder: path to your workflow file
    dag_id="my_dag_name"                        # placeholder: unique DAG name
)
```
72 changes: 31 additions & 41 deletions docs/readme/how_to_use.md
@@ -4,7 +4,7 @@

Before using **CWL-Airflow**, it should be configured with `cwl-airflow init`

```
$ cwl-airflow init --help
usage: cwl-airflow init [-h] [--home HOME] [--config CONFIG]
@@ -16,31 +16,42 @@ optional arguments:
--config CONFIG Set path to Airflow configuration file. Default: first try
AIRFLOW_CONFIG then '[airflow home]/airflow.cfg'
```
- **Init command will run the following steps:**
- - Call `airflow initdb` for the specified `--home` and `--config` parameters
- - Update `airflow.cfg` to hide paused DAGs, skip loading example DAGs, and **not** pause newly added DAGs

+ **The init command will run the following steps** for the specified `--home` and `--config` parameters (see the example after this list):
+ - Call `airflow initdb`
+ - Update `airflow.cfg` to hide paused DAGs, skip loading example DAGs, and **not** pause newly created DAGs
+ - Add a new connection `process_report` for reporting a DAG's execution progress and results to `http://localhost:3070` (the URL is currently hardcoded)
+ - Put **clean_dag_run.py** into the DAGs folder (later its functions will be moved to the API)
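
For instance, a minimal invocation with explicit locations might look like this (both paths are placeholders):

```bash
# hypothetical example; point --home and --config at your own locations
cwl-airflow init --home /home/user/airflow --config /home/user/airflow/airflow.cfg
```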

- **Optionally**, you can add the following configuration parameters into your **airflow.cfg** file
+ **Optionally**, you can update your **airflow.cfg** with a `[cwl]` section, setting the following configuration parameters:

```ini
[cwl]

# Temp folder to keep intermediate workflow execution data.
# Default: AIRFLOW_HOME/cwl_tmp_folder
tmp_folder =

# Output folder to save workflow execution results.
# Default: AIRFLOW_HOME/cwl_outputs_folder
outputs_folder =

# Folder to keep pickled workflows for fast workflow loading.
# Default: AIRFLOW_HOME/cwl_pickle_folder
pickle_folder =

# Boolean parameter to force using docker for workflow step execution.
# Default: True
use_container =

# Boolean parameter to disable passing the current user id to "docker run --user".
# Default: False
no_match_user =
```

- ## Adding a new pipeline
+ ## Adding a pipeline

- The easiest way to add a new pipeline to CWL-Airflow is to put a simple python script into your DAGs folder.
+ The easiest way to add a new pipeline to CWL-Airflow is to put the following Python script into your DAGs folder.
```python
#!/usr/bin/env python3
from cwl_airflow.extensions.cwldag import CWLDAG
dag = CWLDAG(
    workflow="/absolute/path/to/workflow.cwl",  # placeholder: path to your workflow file
    dag_id="my_dag_name"                        # placeholder: unique DAG name
)
```
- CWLDAG can be customized by providing the `default_args` parameter to the constructor.
+ As the `CWLDAG` class is inherited from Airflow's `DAG`, additional arguments, such as `default_args`, can be provided. The latter can include a `cwl` section similar to the one from the **airflow.cfg** file, but with lower priority.
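
A minimal sketch of such a customization, assuming the keys of the `cwl` section mirror the `[cwl]` options shown above (all values are placeholders):

```python
#!/usr/bin/env python3
from cwl_airflow.extensions.cwldag import CWLDAG

dag = CWLDAG(
    workflow="/absolute/path/to/workflow.cwl",   # placeholder path
    dag_id="my_dag_name",
    default_args={
        "cwl": {                                 # assumed to mirror the [cwl] config section
            "tmp_folder": "/tmp/cwl_tmp_folder",
            "outputs_folder": "/tmp/cwl_outputs_folder",
            "use_container": True
        }
    }
)
```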

To submit a new CWL descriptor and Job file to *cwl-airflow*, run the following command:
```bash
cwl-airflow submit WORKFLOW JOB
```
**After adding a new DAG**, Airflow Scheduler will load it (by default this happens **every 5 minutes**) and the DAG can be run.
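
The 5-minute default corresponds to Airflow's DAGs-folder scan interval; as an illustration, it can be lowered in **airflow.cfg** (60 seconds here is an arbitrary choice):

```ini
[scheduler]
# how often (in seconds) the DAGs folder is scanned for new files
dag_dir_list_interval = 60
```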

## Executing a pipeline

The most convenient way to **manually trigger** DAG execution is to use the Airflow Webserver. Input parameters can be set in the **job** field of the run configuration, as shown in the pictures below.

![](../images/trigger_1.jpg)
![](../images/trigger_2.jpg)
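
Alternatively, assuming Airflow 1.10's experimental REST API is enabled (the endpoint below is stock Airflow, not CWL-Airflow-specific, and the inputs are placeholders), the same trigger could be performed over HTTP:

```bash
# hypothetical illustration: trigger a DAG run via Airflow's experimental
# REST API; the "job" object carries the workflow inputs
curl -X POST "http://localhost:8080/api/experimental/dags/my_dag_name/dag_runs" \
     -H "Content-Type: application/json" \
     -d '{"conf": {"job": {"threads": 2}}}'
```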

Optional parameters of the *submit* command:

| Flag | Description | Default |
|------|-------------|---------|
| -o | path to the folder where all the output files should be moved after successful workflow execution, str | current directory |
| -t | path to the temporary folder for storing intermediate results, str | */tmp/cwlairflow* |
| -u | ID for DAG's unique identifier generation, str | random uuid |
| -r | run submitted workflow with Airflow Scheduler, bool | False |

Arguments `-o`, `-t` and `-u` don't overwrite the values set in the Job file fields
*output_folder*, *tmp_folder* and *uid* correspondingly. The meaning of these fields is explained in the
[How It Works](./how_it_works.md) section.
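
Putting the flags together, a hedged example of a full submit call (the paths and the uid are placeholders):

```bash
# hypothetical example: submit a workflow and run it right away (-r)
cwl-airflow submit -o /tmp/outputs -t /tmp/temp -u my_unique_run_id -r workflow.cwl job.yaml
```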

The *submit* command will resolve all relative paths from the Job file, adding the mandatory fields *workflow*, *output_folder*
and *uid* (if not provided), and will copy the Job file to the Jobs folder. The CWL descriptor file and all input files
referenced in the Job file should not be moved or deleted while the workflow is running. The *submit* command will **not** execute
the submitted workflow unless the *-r* argument is provided. Otherwise, make sure that *Airflow Scheduler* (and optionally
*Airflow Webserver*) is running. Note that the *-r* argument was added only to comply with the interface through which the CWL
community runs its conformance tests, so it is preferable to execute the submitted workflow with
*Airflow Scheduler*, especially if you are planning to use `LocalExecutor` instead of the default `SequentialExecutor`.

Depending on your Airflow configuration, it may take some time for Airflow Scheduler
and Webserver to pick up new DAGs. Consider using `cwl-airflow init -r 5 -w 4` to make Airflow Webserver react faster to all
newly created DAGs.

To start Airflow Scheduler (**don't** run it if *cwl-airflow submit* is used with the *-r* argument):
```bash
airflow scheduler
```
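
And, optionally, the Airflow Webserver (a stock Airflow command, not CWL-Airflow-specific):

```bash
airflow webserver
```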
2 changes: 1 addition & 1 deletion docs/readme/{what_if_something_doesnt_work.md → what_if_it_doesnt_work.md}
@@ -1,4 +1,4 @@
- # What if something doesn't work
+ # What if it doesn't work


## CWL-airflow is not found
