# TransformerScheduler Class

## How to use it 

**Functions defined to be applied on rows of Pandas dataframes** can be wrapped in a TransformerScheduler object in order to be applied on a full Pandas dataframe in a specific order.

TransformerScheduler objects:
- are compatible with the scikit-learn API (they have fit and transform methods);
- can be integrated into an execution Pipeline;
- allow the wrapped functions to be applied with multiprocessing.

A TransformerScheduler object is initialized with a functions_scheduler argument. The functions_scheduler argument is a **list of tuples** containing information about the desired pre-processing functions. Each tuple describes an individual function and should contain the following elements:
1. A function
2. A tuple with the function's arguments (if no arguments are required, use None or an empty tuple)
3. A list with the name(s) of the column(s) returned by the function (if no column names need to be specified, use None or an empty list)
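
To make the role of the three tuple elements concrete, here is a minimal sketch in plain pandas. It is not Melusine's implementation: `apply_scheduled_function` and `add_offsets` are hypothetical names, and the behaviour when no column names are given is an assumption of this sketch.

```python
import pandas as pd


def apply_scheduled_function(df, func, args=None, cols=None):
    """Hypothetical helper (not part of Melusine): apply one
    (function, arguments, column names) tuple to every row of df."""
    args = tuple(args) if args else ()  # None or () means "no extra arguments"
    out = df.copy()
    if not cols:
        # Assumption of this sketch: without column names, the function
        # is expected to return the whole (modified) row.
        return out.apply(func, args=args, axis=1)
    if len(cols) == 1:
        # A single returned value fills a single new column.
        out[cols[0]] = df.apply(func, args=args, axis=1)
        return out
    # Several returned values are expanded into one column each.
    expanded = df.apply(func, args=args, axis=1, result_type="expand")
    for position, name in enumerate(cols):
        out[name] = expanded.iloc[:, position]
    return out


def add_offsets(row, x, y):
    # Row-level function returning two values.
    return row["col"] + x, row["col"] + y


df = pd.DataFrame({"col": [1, 2, 3]})
df = apply_scheduled_function(df, add_offsets, (3, 4), ["col_x", "col_y"])
```

Here the function, its arguments `(3, 4)` and the column names `['col_x', 'col_y']` play the roles of elements 1, 2 and 3 of a tuple.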

#### Defining a TransformerScheduler object

```python
from melusine.utils.transformer_scheduler import TransformerScheduler

# my_function_* and argument* are placeholders for your own functions and values
melusine_transformer = TransformerScheduler(
    functions_scheduler=[
        (my_function_1, (argument1, argument2), ['return_col_A']),
        (my_function_2, None, ['return_col_B', 'return_col_C']),
        (my_function_3, (argument1, ), None),
    ],
    mode='apply_by_multiprocessing',
    n_jobs=4,
)
```

#### Parameters

The other parameters of the TransformerScheduler class are:
- **mode** (optional): How each function is applied along the row axis (axis=1). Possible values are 'apply' (default) and 'apply_by_multiprocessing', which uses multiprocessing to parallelize the computation.
- **n_jobs** (optional): Number of cores used for computation. Possible values are integers ranging from 1 (default) to the number of cores available for computation.
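
Since n_jobs should not exceed the number of available cores, one way to pick a safe value is to cap the requested count with the standard library. This is only a sketch: `requested_jobs` and `available_cores` are illustrative names, not Melusine parameters.

```python
import os

requested_jobs = 4
# os.cpu_count() may return None when the count cannot be determined.
available_cores = os.cpu_count() or 1
# Keep the value between 1 and the number of available cores.
n_jobs = max(1, min(requested_jobs, available_cores))
```

The resulting `n_jobs` can then be passed to the TransformerScheduler constructor together with mode='apply_by_multiprocessing'.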

#### Applying the TransformerScheduler object

```python
df = melusine_transformer.fit_transform(df)
```

#### Chaining transformers in a scikit-learn pipeline

Once all the desired functions and transformers have been defined, the transformers can be chained in a scikit-learn Pipeline. The code below defines a pipeline and applies it to a dataframe:

```python
from sklearn.pipeline import Pipeline

pipeline = Pipeline([
    ('TransformerName1', TransformerObject1),
    ('TransformerName2', TransformerObject2),
    ('TransformerName3', TransformerObject3),
])

df = pipeline.fit_transform(df)
```
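
For a version that runs without Melusine, the sketch below chains two dataframe-level steps with scikit-learn's `FunctionTransformer`, which stands in here for TransformerScheduler objects. The step names and the functions `add_col_plus_one` and `add_col_doubled` are hypothetical.

```python
import pandas as pd
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer


def add_col_plus_one(df):
    # First step: derive a new column from the input column.
    out = df.copy()
    out["col_1"] = out["col"] + 1
    return out


def add_col_doubled(df):
    # Second step: relies on the column created by the first step.
    out = df.copy()
    out["col_2"] = out["col_1"] * 2
    return out


pipeline = Pipeline([
    ("PlusOne", FunctionTransformer(add_col_plus_one)),
    ("Doubled", FunctionTransformer(add_col_doubled)),
])

df = pd.DataFrame({"col": [1, 2, 3]})
df = pipeline.fit_transform(df)
```

The steps run in the order they are listed, so the second step can use the column added by the first.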

## Example 

```python
import pandas as pd

df_example = pd.DataFrame({'col': [1, 2, 3, 4, 5]})
df_example
```

|   | col |
|---|-----|
| 0 | 1 |
| 1 | 2 |
| 2 | 3 |
| 3 | 4 |
| 4 | 5 |


Let's define three **functions that are applied on rows of dataframes**:

```python
def fonction_example_1(row):
    return row['col'] + 1


def fonction_example_x(row, x):
    return row['col'] + x


def fonction_example_x_y(row, x, y):
    return row['col'] + x, row['col'] + y
```

```python
from melusine.utils.transformer_scheduler import TransformerScheduler

transformer_example = TransformerScheduler(
    functions_scheduler=[
        (fonction_example_1, None, ['col_1']),
        (fonction_example_x, (2,), ['col_x']),
        (fonction_example_x_y, (3, 4), ['col_x2', 'col_y'])
    ],
    mode='apply_by_multiprocessing',
    n_jobs=4
)
```

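```python
df_example = transformer_example.fit_transform(df_example)
df_example
```

|   | col | col_1 | col_x | col_x2 | col_y |
|---|-----|-------|-------|--------|-------|
| 0 | 1 | 2 | 3 | 4 | 5 |
| 1 | 2 | 3 | 4 | 5 | 6 |
| 2 | 3 | 4 | 5 | 6 | 7 |
| 3 | 4 | 5 | 6 | 7 | 8 |
| 4 | 5 | 6 | 7 | 8 | 9 |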
