### Composite Transformers

+ Training data usually contains a mix of feature types, such as numerical and categorical features.
+ Different feature types need to be processed with different transformers.
+ So, we need a way to combine different feature transformers seamlessly.

`sklearn.compose` provides classes for applying transformations to subsets of features and combining the results:
+ ColumnTransformer
+ TransformedTargetRegressor
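
`TransformedTargetRegressor` is only named in the list above, so here is a minimal sketch of what it does: it transforms the target `y` before fitting a regressor and applies the inverse transform to the predictions. The data and the choice of `np.log` / `np.exp` are assumptions made for illustration, not part of the original notes.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.linear_model import LinearRegression

# Hypothetical data: the target grows exponentially with the single feature
X = np.arange(1, 11, dtype=float).reshape(-1, 1)
y = np.exp(0.5 * X.ravel())

# The regressor is fitted on log(y); predictions are mapped back with exp
model = TransformedTargetRegressor(
    regressor=LinearRegression(),
    func=np.log,
    inverse_func=np.exp,
)
model.fit(X, y)

# Predictions come back on the original (untransformed) scale of y
predictions = model.predict(X)
```

Because `log(y)` is exactly linear in the feature here, the predictions should match `y` up to floating-point error.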

#### ColumnTransformer
+ We can use the `ColumnTransformer` to apply different transformations to different columns.
+ The `ColumnTransformer` applies a set of transformers to columns of an array or `pandas.DataFrame` and concatenates the transformed outputs from the different transformers into a single matrix.
+ It's useful for transforming heterogeneous data by applying different transformers to separate subsets of features.
+ `ColumnTransformer` serves a different purpose from `Pipeline`: it selectively applies different transformations to different columns, whereas a `Pipeline` applies transformations sequentially, step by step, to the same data.


Each tuple in the ColumnTransformer is of the following format: `(name, transformer, columns)`, where `columns` selects the columns (by index or, for a DataFrame, by name) that the transformer is applied to.

In [1]:
import numpy as np
import pandas as pd

# Example usage of the ColumnTransformer
# The first column represents the age, and the second column the gender.
# Note: mixing numbers and strings in one NumPy array stores every value as a string.
X = np.array([[20.0, 11.2, 15.6, 13.0, 18.6, 16.4],
              ['male', 'female', 'female', 'male', 'male', 'female']]).T
X

array([['20.0', 'male'],
       ['11.2', 'female'],
       ['15.6', 'female'],
       ['13.0', 'male'],
       ['18.6', 'male'],
       ['16.4', 'female']], dtype='<U32')

In [5]:
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import MaxAbsScaler
from sklearn.preprocessing import OneHotEncoder
from sklearn.impute import SimpleImputer

# suppose we want to apply the MaxAbsScaler on the first column for age
# and we want to one-hot encode the second column for gender

column_trans = ColumnTransformer([('ageScaler', MaxAbsScaler(), [0]), 
                                ('genderEncoder', OneHotEncoder(dtype='int'), [1])])
column_trans.fit_transform(X)

array([[1.  , 0.  , 1.  ],
       [0.56, 1.  , 0.  ],
       [0.78, 1.  , 0.  ],
       [0.65, 0.  , 1.  ],
       [0.93, 0.  , 1.  ],
       [0.82, 1.  , 0.  ]])
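
The contrast with Pipelines noted earlier can be sketched as follows: a `Pipeline` chains steps for one group of columns, and the `ColumnTransformer` then places that pipeline next to other transformers. The `people` DataFrame and the step names are hypothetical, chosen only for this illustration.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MaxAbsScaler, OneHotEncoder

# Hypothetical data: same idea as above, but with one missing age
people = pd.DataFrame({
    "age": [20.0, np.nan, 10.0, 15.0],
    "gender": ["male", "female", "female", "male"],
})

# Sequential steps for the age column: impute first, then scale
age_pipeline = Pipeline([
    ("imputer", SimpleImputer(strategy="mean")),
    ("scaler", MaxAbsScaler()),
])

# Column-wise composition: the pipeline handles age, the encoder handles gender
preprocess = ColumnTransformer([
    ("age", age_pipeline, ["age"]),
    ("gender", OneHotEncoder(), ["gender"]),
])

result = preprocess.fit_transform(people)
# The missing age is filled with the mean (15.0) and then divided by the
# maximum absolute value (20.0), so the age column becomes 1.0, 0.75, 0.5, 0.75.
```

The Pipeline controls the order of operations within a column group, while the ColumnTransformer controls which columns each group of operations sees.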

#### Drop and Passthrough in ColumnTransformer

With the `"drop"` and `"passthrough"` keywords, we can specify whether to drop a column entirely during transformation or to leave it untransformed and pass it through as is.

In [17]:
testdf = pd.DataFrame({
    "A":[1, 2, np.nan],
    "B":[10, 20, 30],
    "C":[100, 200, 300],
    "D":[1000, 2000, 3000],
    "E":[10000, 20000, 30000]
})

In [18]:
testdf

| | A | B | C | D | E |
|---|---|---|---|---|---|
| 0 | 1.0 | 10 | 100 | 1000 | 10000 |
| 1 | 2.0 | 20 | 200 | 2000 | 20000 |
| 2 | NaN | 30 | 300 | 3000 | 30000 |


+ Using `"passthrough"`, we can specify columns that should not be touched by the transformation; they appear unchanged in the final output.
+ Using `"drop"`, we can specify columns that should be excluded from the final output.

In [12]:
ct = ColumnTransformer([
    ("imputer", SimpleImputer(strategy="mean"), ["A"]), # impute this column and fill missing value
    ("do nothing", "passthrough", ["B", "C"]),          # just pass through these columns without doing anything
], remainder="drop")                                    # drop the remaining columns, don't include them 

In [13]:
ct.fit_transform(testdf)

array([[  1. ,  10. , 100. ],
       [  2. ,  20. , 200. ],
       [  1.5,  30. , 300. ]])

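Another way of achieving the same effect is to drop columns explicitly and pass through the remainder. Columns of a DataFrame can also be selected by integer position: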

In [14]:
ct = ColumnTransformer([
    ("imputer", SimpleImputer(strategy="mean"), [0]), # impute this column and fill missing values
    ("dont include", "drop", [3, 4])                  # drop the columns at index 3 and 4; don't include them
], remainder="passthrough")                           # pass through the remaining columns

In [15]:
ct.fit_transform(testdf)

array([[  1. ,  10. , 100. ],
       [  2. ,  20. , 200. ],
       [  1.5,  30. , 300. ]])