# How to repeat steps for different parameters

* **Difficulty level**: easy
* **Time needed to learn**: 10 minutes or less
* **Key points**:
  * `for_each` runs the substep with different parameters
  

### Option `for_each` <a id="Option_for_each"></a>

Option `for_each` allows you to repeat the step process for each value of a variable. For example,

In [1]:
!touch file1 file2
%run

method = ['m1', 'm2']
input: 'file1', 'file2', for_each='method'
print(f"{_index}: {_input} {_method}")


0: file1 file2 m1
1: file1 file2 m2


This repeats the step once for each item of the variable `method`.

SoS automatically creates a loop variable `_method` for the variable `method`, which takes the value of one item of `method` at each iteration.
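
As a rough plain-Python analogy (not SoS code, and not how SoS is implemented), the loop described above behaves like an `enumerate` over `method`. The name `step_input` is a hypothetical stand-in for the step's `_input`.

```python
# Plain-Python analogy of `for_each='method'`; this is not SoS syntax.
method = ['m1', 'm2']
step_input = ['file1', 'file2']  # hypothetical stand-in for _input

lines = []
for _index, _method in enumerate(method):
    # _method holds one item of `method`; the input is the same every time
    lines.append(f"{_index}: {' '.join(step_input)} {_method}")

print("\n".join(lines))
# 0: file1 file2 m1
# 1: file1 file2 m2
```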

Nested loops are also allowed. For example,

In [2]:
!touch file1 file2
%run
[0]
method = ['m1', 'm2']
pars = [1, 2]
input: 'file1', 'file2', for_each=['method', 'pars']
print(f"{_index}: _input={_input} _method={_method}, _pars={_pars}")


0: _input=file1 file2 _method=m1, _pars=1
1: _input=file1 file2 _method=m2, _pars=1
3: _input=file1 file2 _method=m2, _pars=2
2: _input=file1 file2 _method=m1, _pars=2
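
The pairing of index and values in the output above can be reproduced in plain Python with `itertools.product`. This is only an analogy for reading the output, not SoS's implementation. `pars` is passed first because `product` changes its last argument fastest, and in the output `_method` is the variable that changes fastest.

```python
from itertools import product

method = ['m1', 'm2']
pars = [1, 2]

# product() varies its last argument fastest, so list `pars` first
# to make `method` the fastest-changing variable.
combos = [(_method, _pars) for _pars, _method in product(pars, method)]

for _index, (_method, _pars) in enumerate(combos):
    print(f"{_index}: _method={_method}, _pars={_pars}")
# 0: _method=m1, _pars=1
# 1: _method=m2, _pars=1
# 2: _method=m1, _pars=2
# 3: _method=m2, _pars=2
```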


If you would like to loop the process over several parameters, you can put them at the same level using the form `'var1,var2'`. For example,

In [4]:
!touch file1 file2
%run

[0]
method = ['m1', 'm2']
pars = [1, 2]
input: 'file1', 'file2', for_each=['method,pars']
print(f"{_index}: _input={_input} _method={_method}, _pars={_pars}")


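
A minimal plain-Python sketch of same-level looping follows. It assumes that variables at the same level advance together, one item from each per iteration, as `zip` does. This is an assumption worth checking against your SoS version, and the sketch is an analogy rather than SoS code.

```python
method = ['m1', 'm2']
pars = [1, 2]

# Assumption: same-level variables are paired item by item, like zip()
pairs = list(zip(method, pars))

for _index, (_method, _pars) in enumerate(pairs):
    print(f"{_index}: _method={_method}, _pars={_pars}")
# 0: _method=m1, _pars=1
# 1: _method=m2, _pars=2
```

Under this assumption the variables need the same length, and the step runs two times here instead of the four times of the nested form.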


The variable passed to option `for_each` can be a sequence (`list`, `tuple`, `set`, etc.) or a Pandas `Series`, `Index`, or `DataFrame`. In the last case, each `_loop` variable represents a row of the dataframe, and you can access single values using the format `_loop["header"]`. For example,

In [5]:
%preview data
%run
[0]
import pandas as pd
data = pd.DataFrame([(1, 2, 'Hello'), (2, 4, 'World')], columns=['A', 'B', 'C'])
input: for_each='data'
output: f"{_data['A']}_{_data['B']}_{_data['C']}.txt"
sh: expand=True
    touch {_output}

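
To see which file names the `output` statement above would produce, the row-by-row access can be sketched with pandas alone. This is an analogy that uses `DataFrame.iterrows`, not a description of how SoS iterates internally. It requires `pandas` to be installed.

```python
import pandas as pd

data = pd.DataFrame([(1, 2, 'Hello'), (2, 4, 'World')], columns=['A', 'B', 'C'])

outputs = []
for _, _data in data.iterrows():
    # _data is one row; values are looked up by column header
    outputs.append(f"{_data['A']}_{_data['B']}_{_data['C']}.txt")

print(outputs)
# ['1_2_Hello.txt', '2_4_World.txt']
```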


If you would like to define your own loop variable, or if the default loop variable does not work (e.g. looping through `obj.sequence`, where `_obj.sequence` is not a valid variable name), you can use a dictionary syntax in the format of `{'varname': sequence}`. Multi-variable and nested loops can be specified in the format of `{'var1': seq1, 'var2': seq2}` (same level) and `[{'var1': seq1}, {'var2': seq2}]` (nested).
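
The naming problem mentioned above can be checked in plain Python with `str.isidentifier()`. In this small sketch, `obj` and the loop variable name `seq` are hypothetical and chosen only for illustration.

```python
from types import SimpleNamespace

# Hypothetical object whose attribute we want to loop over
obj = SimpleNamespace(sequence=['a', 'b'])

# A default loop variable name is the source name with a leading underscore
print("_method".isidentifier())        # True
print("_obj.sequence".isidentifier())  # False, because of the dot

# Naming the loop variable explicitly avoids the problem
for_each = {'seq': obj.sequence}

lines = []
for name, values in for_each.items():
    for _index, value in enumerate(values):
        lines.append(f"{_index}: {name}={value}")

print("\n".join(lines))
# 0: seq=a
# 1: seq=b
```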

For example, the first example for this parameter can be written as

In [6]:
!touch file1 file2

input: 'file1', 'file2', for_each=dict(method=['m1', 'm2'])
print(f"{_index}: {_input} {method}")

0: file1 file2 m1
1: file1 file2 m2


and a later example can be written as

In [7]:
!touch file1 file2
%run
[0]
input: 'file1', 'file2', 
   for_each=dict(method=['m1','m2'], pars=[1, 2])
print(f"{_index}: _input={_input} method={method}, pars={pars}")



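A dictionary key can also name several variables, as in `'_n,_p'`, in which case each item of the sequence is unpacked into those variables. This helps when you need custom combinations of values. For example, in the script below we only care about cases where `n` is greater than `p`: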

In [8]:
!touch a.txt
%run
[1]
parameter: n = [100, 300]
parameter: p = [50, 100, 200]
parameter: outfile = ['1.txt', '2.txt', '3.txt', '4.txt', '5.txt', '6.txt']
# keep only the (n, p) pairs with n > p; each pair is unpacked into _n and _p
input: 'a.txt', for_each={'_n,_p': [(_n, _p) for _n in n for _p in p if _n > _p]}
print(f"{_index} {outfile[_index]} {_n} {_p}")

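
The list comprehension passed to `for_each` in the script above is ordinary Python, so the pairs it selects can be inspected outside SoS. The sketch below evaluates the same expression and walks over the result with `enumerate`, which stands in for the `_index` of each substep.

```python
n = [100, 300]
p = [50, 100, 200]
outfile = ['1.txt', '2.txt', '3.txt', '4.txt', '5.txt', '6.txt']

# Same filter as in the script: keep only pairs with n greater than p
pairs = [(_n, _p) for _n in n for _p in p if _n > _p]

for _index, (_n, _p) in enumerate(pairs):
    print(f"{_index} {outfile[_index]} {_n} {_p}")
# 0 1.txt 100 50
# 1 2.txt 300 50
# 2 3.txt 300 100
# 3 4.txt 300 200
```

Only four of the six possible pairs pass the filter, so only the first four entries of `outfile` are used.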

