# Title

* **Difficulty level**: easy
* **Time needed to learn**: 10 minutes or less
* **Key points**:
  * a
  

### Execution of external tasks

If a workflow is large, you can define parts of it as **tasks** and execute them on remote servers, clusters, or task queues. Tasks are

* **self-contained**, in that they contain all the information required to be executed anywhere.
* **generated and executed dynamically**, in that a task is generated only when all its dependencies have been met.
* **independent of workflows and other tasks**. Tasks are defined by the jobs they perform and can be shared by different workflows if they perform exactly the same function.
* **executable on remote hosts or task queues**. Tasks can be executed directly on a local or remote host, sent to task queues such as RQ, or submitted to batch systems such as PBS/Torque, Slurm, and IBM LSF.
* **independent of file systems**. SoS automatically synchronizes input files, output files, and other specified files between local and remote systems, so you can easily switch from one remote host to another.
* **able to use remote targets**. If input, depends, or output files are large, you also have the option to process remote targets directly without synchronizing them to the local host.

Conceptually speaking, a **substep** consists of everything after the `input` statement. It can be repeated with subsets of input files or parameters defined by input options `group_by` or `for_each`. For example, if `bam_files` is a list of bam files,

```
[10]
input: bam_files, group_by=1
output: f"{_input}.bai"

run: expand=True
    samtools index {_input}
```

executes a shell script to process each BAM file. SoS itself performs this sequentially, one input file at a time.

You can easily specify part or all of a step process as **tasks**, by prepending the statements with a `task` keyword:


```
[10]
input: bam_files, group_by=1
output: f"{_input}.bai"

task:
run: expand=True
    samtools index {_input}
```

This statement declares the rest of the step process as a `task`. For each input file, a task is created with an ID determined from the task content and context (input and output files, variables, etc.). By default, the task is executed by a local `process` task queue, where tasks are started as background processes.
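To make the idea of an ID "determined from task content and context" more concrete, here is a minimal Python sketch. It is not SoS's actual algorithm: the function name `task_id`, the choice of SHA-256, and the exact fields that are hashed are all assumptions made for illustration.

```python
import hashlib
import json


def task_id(script, inputs, outputs, variables):
    """Return a deterministic ID for a task.

    Hypothetical scheme for illustration only, not the one SoS uses:
    the script text, file lists, and variables are serialized in a
    stable order and hashed.
    """
    signature = json.dumps(
        {
            "script": script,
            "inputs": sorted(inputs),
            "outputs": sorted(outputs),
            "variables": variables,
        },
        sort_keys=True,  # stable key order, so equal content gives an equal ID
    )
    return hashlib.sha256(signature.encode("utf-8")).hexdigest()


# Two tasks from the same step, one per input file
id_a = task_id("samtools index a.bam", ["a.bam"], ["a.bam.bai"], {})
id_b = task_id("samtools index b.bam", ["b.bam"], ["b.bam.bai"], {})
```

Under this scheme, identical content and context always produce the same ID, and different input files produce different IDs. This is the property that would allow a task to be recognized across workflows.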

The benefit of executing tasks externally is that they can be executed concurrently, on the local machine or a remote server, or submitted to a task queue. In the previous example, multiple tasks could be executed in parallel (but on the same machine) unless you specify otherwise as follows:

```
[10]
input: bam_files, group_by=1
output: f"{_input}.bai"

task: concurrent=False
run: expand=True
    samtools index {_input}
```
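The difference between concurrent and sequential execution of per-file tasks can be sketched in plain Python. This is only an analogy under stated assumptions: `index_bam` and `run_tasks` are hypothetical names, `index_bam` stands in for the `samtools index` call without running it, and a thread pool stands in for the background processes that SoS starts.

```python
from concurrent.futures import ThreadPoolExecutor


def index_bam(bam_file):
    # Stand-in for `samtools index`; only returns the expected index name.
    return f"{bam_file}.bai"


def run_tasks(bam_files, concurrent=True):
    """Run one task per input file, concurrently or one after another."""
    if not concurrent:
        # Analogous to `task: concurrent=False`: one task at a time.
        return [index_bam(f) for f in bam_files]
    # Analogous to the default: several tasks in flight at once.
    with ThreadPoolExecutor(max_workers=4) as pool:
        # pool.map returns results in input order.
        return list(pool.map(index_bam, bam_files))


outputs = run_tasks(["a.bam", "b.bam"])
```

Both modes produce the same outputs. Only the scheduling differs.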

You can also use the command

```
sos run myscript -q cluster
```
or use the option `queue`
```
[10]
input: bam_files, group_by=1
output: f"{_input}.bai"

task: queue='cluster'
run: expand=True
    samtools index {_input}
```

to submit the commands to a cluster system to be executed on different computing nodes.

The following figure illustrates the task model of SoS

![job queue](../media/job_queue.svg )

Basically,
1. Tasks are part of step processes.
2. Tasks are managed by task engines, and multiple task engines can be used for a single workflow.
3. Task engines generate task files, submit tasks, monitor task status, and return results to the SoS workflow.
4. Remote task engines synchronize input files, translate and copy tasks to the server, and start the tasks on the remote server.
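
The responsibilities listed above (submit tasks, monitor status, return results) can be illustrated with a toy engine in Python. This is a sketch under stated assumptions, not SoS's implementation: the class `ToyTaskEngine`, its method names, and the status strings are all hypothetical. It runs tasks in local threads and does no file synchronization.

```python
from concurrent.futures import ThreadPoolExecutor


class ToyTaskEngine:
    """Hypothetical task engine: submits tasks, reports status, returns results."""

    def __init__(self, max_workers=2):
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._futures = {}

    def submit(self, task_id, func, *args):
        # Hand the task over to the engine and remember it by its ID.
        self._futures[task_id] = self._pool.submit(func, *args)

    def status(self, task_id):
        future = self._futures[task_id]
        if not future.done():
            return "pending"
        return "failed" if future.exception() else "completed"

    def result(self, task_id):
        # Block until the task finishes, then return its result to the caller.
        return self._futures[task_id].result()

    def shutdown(self):
        self._pool.shutdown(wait=True)


engine = ToyTaskEngine()
engine.submit("t1", lambda name: f"{name}.bai", "a.bam")
result = engine.result("t1")
status = engine.status("t1")
engine.shutdown()
```

A workflow could hold several such engines at once, which corresponds to point 2 in the list.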

<div class="bs-callout bs-callout-info" role="alert">
  <h4>The "None" queue</h4>
    <p>If you use <code>sos run -q None</code> from the command line or <code>task: queue=None</code> in the script, the tasks will not be sent to any task engine and will be executed as regular step statements.</p>
</div>

## Further reading

* 