# How to group input and output targets by their labels

* **Difficulty level**: easy
* **Time needed to learn**: 10 minutes or less
* **Key points**:
  * `_input` or `_output` can be grouped by the **labels** of input and output targets
  * Use keyword arguments to specify labels of input or output arguments
  * `_input[name]` and `_output[name]` return the subset of `_input` or `_output` with label `name`
  * Outputs returned from `output_from` and `named_output` can carry their own labels

## Named inputs and outputs

Let us first create a few temporary files to use as inputs in the examples:

```
!touch a.txt b.txt ref.txt
```

 <div class="bs-callout bs-callout-primary" role="alert">
    <h4>Named inputs and outputs</h4>
    <p>Keyword arguments in input and output statements assign labels to the input or output files and allow access to subsets of inputs or outputs by these labels.</p>
 </div>

In SoS, we usually specify one or more files as the input of a step and refer to them as the variable `_input`:

```python
input: 'a.txt', 'b.txt'
print(_input)
```

```
a.txt b.txt
```


Using keyword arguments, you can assign labels to these files and access them separately:

```python
input: A='a.txt', B='b.txt'
print(f'input of the substep is {_input}')
print(f'input of the substep with label A is {_input["A"]}')
print(f'input of the substep with label B is {_input["B"]}')
```

```
input of the substep is a.txt b.txt
input of the substep with label A is a.txt
input of the substep with label B is b.txt
```


Note that although `_input['A']` and `_input['B']` are used to refer to subsets of `_input`, the variable `_input` can still be used and refers to all input files.

Named outputs work in a similar fashion. In the following example, the data files are labelled `data` and the reference file is labelled `reference`. In the output statement, the `data` part of the input (`_input["data"]`) is used to generate results labelled `result`.

In the following `print` statement, `_input["reference"]`, `_output['result']`, etc. are used to obtain subsets of `_input` and `_output`. These subsets are usually called **named inputs** and **named outputs**.

```python
input: data = ['a.txt', 'b.txt'], reference='ref.txt'
output: result=[x.with_suffix('.res') for x in _input["data"]]
_output.touch()

print(f'''\
Input of step is {_input} with labels {step_input.labels}

Input data is {_input["data"]}
Reference is {_input["reference"]}

Output is {_output}
Result of output is {_output['result']}
''')
```

```
Input of step is a.txt b.txt ref.txt with labels ['data', 'data', 'reference']

Input data is a.txt b.txt
Reference is ref.txt

Output is a.res b.res
Result of output is a.res b.res
```

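The same idea can be sketched for named outputs in plain Python, modelling labelled targets as a list of `(label, path)` pairs. This is not SoS code: the helper `subset` and the variable names are hypothetical, and using `pathlib.Path.with_suffix` assumes, as the output statement above suggests, that targets behave like paths.

```python
from pathlib import Path

# Hypothetical model: labelled targets as (label, Path) pairs
labeled_input = [
    ("data", Path("a.txt")),
    ("data", Path("b.txt")),
    ("reference", Path("ref.txt")),
]


def subset(targets, name):
    """Return the targets carrying label `name` (hypothetical helper)."""
    return [t for label, t in targets if label == name]


# Mirrors: output: result=[x.with_suffix('.res') for x in _input["data"]]
labeled_output = [("result", x.with_suffix(".res")) for x in subset(labeled_input, "data")]

print("labels:", [label for label, _ in labeled_input])
print("data:", " ".join(str(p) for p in subset(labeled_input, "data")))
print("reference:", " ".join(str(p) for p in subset(labeled_input, "reference")))
print("result:", " ".join(str(p) for p in subset(labeled_output, "result")))
```

Only the `data` subset feeds the output expression, so `ref.txt` never produces a `.res` file.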


## Slices of `sos_targets` with groups *

If a step has multiple substeps, the variables `step_input` and `step_output` consist of multiple groups. For example, in the following workflow, the `_output` of each substep of step `[10]` has named outputs `A` and `B`. The output of the entire step consists of 4 groups, which are retrieved by the function `output_from(-1)` (`-1` refers to the previous step). The expression

```python
input: output_from(-1)['A']
```
selects all targets with label `A` while preserving the groups, so each substep of step `20` receives only the target labelled `A`.

```python
%run -v0
[10]
input: for_each=dict(i=range(4))
output: A=f'a_{i}.txt', B=f'b_{i}.txt'
_output.touch()

print(f'Output step is {_output} with labels {_output.labels}')

[20]
input: output_from(-1)['A']
print(f'input of substep is {_input}')
```


```
input of substep is a_0.txt
input of substep is a_1.txt
input of substep is a_2.txt
input of substep is a_3.txt
```

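The effect of `output_from(-1)['A']` on groups can be modelled as filtering each group by label while keeping one group per substep. This is a plain-Python sketch under that assumption, not the SoS implementation; `step10_output` and `select` are hypothetical names.

```python
# Hypothetical model of the output of step [10]: one group per substep,
# each group a list of (label, file) pairs
step10_output = [[("A", f"a_{i}.txt"), ("B", f"b_{i}.txt")] for i in range(4)]


def select(groups, name):
    """Keep only targets labelled `name`, preserving the group structure."""
    return [[(label, f) for label, f in group if label == name] for group in groups]


# Each remaining group plays the role of _input for one substep of step [20]
for group in select(step10_output, "A"):
    print("input of substep is", " ".join(f for _, f in group))
```

Because the groups survive the selection, step `[20]` still runs four substeps, one per group, just as in the output above.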

## Inheritance of target labels *

 <div class="bs-callout bs-callout-info" role="alert">
    <h4>Inherit and override target labels</h4>
    <p>Target labels are created and passed in SoS as follows:</p>
    <ul>
        <li>Unnamed targets (targets specified with positional arguments) are labeled with step names</li>
        <li>Labels are stored in the variables <code>_input</code>, <code>_output</code>, <code>step_input</code> and <code>step_output</code>, and are passed by default to the next step, or through the functions <code>named_output</code> and <code>output_from</code></li>
        <li>Keyword arguments override default or inherited labels</li>
    </ul>
 </div>

The creation and inheritance of target labels follow a few rules. Firstly, unnamed targets are labeled with step names. This is usually not useful in the step itself, but can be useful when the outputs are passed to another step.

For example, in the following workflow, step `default` gets the outputs from steps `A` and `B` using the function `output_from(['A', 'B'])`. Because the default labels for the outputs of steps `A` and `B` are `A` and `B` respectively, you can differentiate the inputs using `_input['A']` and `_input['B']`.

```python
%run -v0
[A]
output: 'a.txt'
_output.touch()

[B]
output: 'b.txt'
_output.touch()

[default]
input: output_from(['A', 'B'])
print(f'Input from step A is {_input["A"]}')
print(f'Input from step B is {_input["B"]}')
```

```
Input from step A is a.txt
Input from step B is b.txt
```


However, if you use keyword arguments in the input statement, the default or inherited labels will be overridden:

```python
%run -v0
[A]
output: 'a.txt'
_output.touch()

[B]
output: 'b.txt'
_output.touch()

[default]
input: a_out=output_from('A'), b_out=output_from('B')
print(f'Input from step A is {_input["a_out"]}')
print(f'Input from step B is {_input["b_out"]}')
```

```
Input from step A is a.txt
Input from step B is b.txt
```

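The two labelling rules can be summarized in a short plain-Python model: unnamed (positional) targets take the step name as their label, and a keyword argument in a later `input` statement replaces the inherited labels. This is an illustration, not SoS code; `make_output` and `relabel` are hypothetical helpers.

```python
def make_output(step, *files, **named):
    """Hypothetical: positional targets are labelled with the step name,
    keyword targets with the keyword."""
    out = [(step, f) for f in files]
    out.extend((label, f) for label, f in named.items())
    return out


def relabel(targets, label):
    """Hypothetical: override inherited labels, as a keyword in `input:` does."""
    return [(label, f) for _, f in targets]


out_A = make_output("A", "a.txt")
out_B = make_output("B", "b.txt")

# Like input: output_from(['A', 'B'])  (inherited labels are kept)
inherited = out_A + out_B
# Like input: a_out=output_from('A'), b_out=output_from('B')  (labels overridden)
overridden = relabel(out_A, "a_out") + relabel(out_B, "b_out")

print([label for label, _ in inherited])
print([label for label, _ in overridden])
```

In both cases the files are the same. Only the labels used to address them change.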

## Further reading
* [How to use named output in data-flow style workflows](doc/user_guide/named_output.html)
* [How to execute workflow to generate specific output](doc/user_guide/target_oriented.html)