## ColumnTransformer
Applies transformers to columns of an array or pandas DataFrame.<br>
This estimator allows different columns of the input to be transformed separately; the features generated by each transformer are concatenated to form a single feature space.<br>
This is useful for combining several feature extraction mechanisms or transformations into a single transformer.<br>

**Use ColumnTransformer to apply different preprocessing to different columns:**
<ul>
<li>select from DataFrame columns by name
<li>passthrough or drop unspecified columns
</ul>

In [1]:
import pandas as pd
df = pd.read_csv('http://bit.ly/kaggletrain', nrows=6)

In [2]:
cols = ['Fare', 'Embarked', 'Sex', 'Age']
X = df[cols]
X

| | Fare | Embarked | Sex | Age |
|---|---|---|---|---|
| 0 | 7.25 | S | male | 22.0 |
| 1 | 71.2833 | C | female | 38.0 |
| 2 | 7.925 | S | female | 26.0 |
| 3 | 53.1 | S | female | 35.0 |
| 4 | 8.05 | S | male | 35.0 |
| 5 | 8.4583 | Q | male | NaN |


In [3]:
from sklearn.preprocessing import OneHotEncoder
from sklearn.impute import SimpleImputer
from sklearn.compose import ColumnTransformer
from sklearn.compose import make_column_transformer
# SimpleImputer fills in missing values.
# Here the 'Age' feature contains a missing value, so we use SimpleImputer, which by default
# replaces each missing value with the mean of its column.

In [4]:
ohe = OneHotEncoder()
imp = SimpleImputer()

In [5]:
# make_column_transformer applies each transformer to its own subset of columns
# and concatenates the results side by side. The transformers are not chained one
# after another (chaining steps is what a Pipeline does).
# Syntax: make_column_transformer((transformer, [columns]), (transformer, [columns]))
# i.e., one (transformer, columns) tuple per transformer; names are generated automatically.
ct = make_column_transformer(
    (ohe, ['Embarked', 'Sex']),  # apply OneHotEncoder to Embarked and Sex
    (imp, ['Age']),              # apply SimpleImputer to Age
    remainder='passthrough')     # include remaining column (Fare) in the output

In [6]:
# column order: Embarked (3 columns), Sex (2 columns), Age (1 column), Fare (1 column)
ct.fit_transform(X)

array([[ 0.    ,  0.    ,  1.    ,  0.    ,  1.    , 22.    ,  7.25  ],
       [ 1.    ,  0.    ,  0.    ,  1.    ,  0.    , 38.    , 71.2833],
       [ 0.    ,  0.    ,  1.    ,  1.    ,  0.    , 26.    ,  7.925 ],
       [ 0.    ,  0.    ,  1.    ,  1.    ,  0.    , 35.    , 53.1   ],
       [ 0.    ,  0.    ,  1.    ,  0.    ,  1.    , 35.    ,  8.05  ],
       [ 0.    ,  1.    ,  0.    ,  0.    ,  1.    , 31.2   ,  8.4583]])

In [7]:
# ColumnTransformer applies each transformer to its own subset of columns
# and concatenates the results side by side. The transformers are not chained one
# after another (chaining steps is what a Pipeline does).
# Syntax: ColumnTransformer([(name, transformer, [columns]), (name, transformer, [columns])])
# i.e., a list of (name, transformer, columns) tuples; the name is required and can be any unique string.

ct = ColumnTransformer([
    ('ohe', ohe, ['Embarked', 'Sex']),  # apply OneHotEncoder to Embarked and Sex
    ('impute', imp, ['Age'])],          # apply SimpleImputer to Age
    remainder='passthrough')            # include remaining column (Fare) in the output
# remainder='passthrough' keeps the remaining columns as they are.
# remainder='drop' (the default) drops all the other columns.

In [8]:
# column order: Embarked (3 columns), Sex (2 columns), Age (1 column), Fare (1 column)
ct.fit_transform(X)

array([[ 0.    ,  0.    ,  1.    ,  0.    ,  1.    , 22.    ,  7.25  ],
       [ 1.    ,  0.    ,  0.    ,  1.    ,  0.    , 38.    , 71.2833],
       [ 0.    ,  0.    ,  1.    ,  1.    ,  0.    , 26.    ,  7.925 ],
       [ 0.    ,  0.    ,  1.    ,  1.    ,  0.    , 35.    , 53.1   ],
       [ 0.    ,  0.    ,  1.    ,  0.    ,  1.    , 35.    ,  8.05  ],
       [ 0.    ,  1.    ,  0.    ,  0.    ,  1.    , 31.2   ,  8.4583]])
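To make the effect of `remainder` concrete, here is a small sketch that fits the same transformers twice and compares the output shapes. The data is the same six rows, typed in by hand; the variable names `kept` and `dropped` are only illustrative.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder

# Same six rows as the DataFrame X shown earlier, entered by hand.
X = pd.DataFrame({
    'Fare': [7.25, 71.2833, 7.925, 53.1, 8.05, 8.4583],
    'Embarked': ['S', 'C', 'S', 'S', 'S', 'Q'],
    'Sex': ['male', 'female', 'female', 'female', 'male', 'male'],
    'Age': [22.0, 38.0, 26.0, 35.0, 35.0, np.nan],
})

transformers = [
    ('ohe', OneHotEncoder(), ['Embarked', 'Sex']),
    ('impute', SimpleImputer(), ['Age']),
]

# Fare is not listed in any transformer, so `remainder` decides its fate.
kept = ColumnTransformer(transformers, remainder='passthrough').fit_transform(X)
dropped = ColumnTransformer(transformers, remainder='drop').fit_transform(X)

print(kept.shape)     # (6, 7): Fare is appended as the last column
print(dropped.shape)  # (6, 6): Fare is left out
```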

## Note:
#### make_column_transformer and ColumnTransformer do the same thing; only the syntax differs slightly.
make_column_transformer((transformer, [columns]), (transformer, [columns])) <br>
i.e., make_column_transformer takes (transformer, columns) tuples as separate arguments

ColumnTransformer([(name, transformer, [columns]), (name, transformer, [columns])])<br>
i.e., ColumnTransformer takes a list of (name, transformer, columns) tuples <br>
**name: str -- The name lets you set the transformer and its parameters directly with set_params and search over them in a grid search.**


#### Differences:
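In make_column_transformer, no names are passed (they are generated automatically from the transformer class names), and the (transformer, columns) tuples are passed as separate arguments.<br>
In ColumnTransformer, names must be passed, and the transformers are passed as a list of (name, transformer, columns) tuples.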
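A quick way to check the two points above is to build the same transformer both ways and compare. This is only a sketch on the six hand-typed rows; `ct_short` and `ct_long` are illustrative names.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer, make_column_transformer
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder

# Same six rows as the DataFrame X shown earlier, entered by hand.
X = pd.DataFrame({
    'Fare': [7.25, 71.2833, 7.925, 53.1, 8.05, 8.4583],
    'Embarked': ['S', 'C', 'S', 'S', 'S', 'Q'],
    'Sex': ['male', 'female', 'female', 'female', 'male', 'male'],
    'Age': [22.0, 38.0, 26.0, 35.0, 35.0, np.nan],
})

# Short form: no names, tuples passed as separate arguments.
ct_short = make_column_transformer(
    (OneHotEncoder(), ['Embarked', 'Sex']),
    (SimpleImputer(), ['Age']),
    remainder='passthrough')

# Long form: explicit names, tuples passed as a list.
ct_long = ColumnTransformer([
    ('ohe', OneHotEncoder(), ['Embarked', 'Sex']),
    ('impute', SimpleImputer(), ['Age'])],
    remainder='passthrough')

# The short form derives each name from the lowercased class name.
auto_names = [name for name, _, _ in ct_short.transformers]
print(auto_names)

# Both forms produce the same transformed array.
same_output = np.allclose(ct_short.fit_transform(X), ct_long.fit_transform(X))
print(same_output)
```

The only practical difference is the prefix used in parameter names: `onehotencoder__...` for the short form versus `ohe__...` for the long form.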

## Example 2:

In [9]:
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import Normalizer
X = np.array([[0., 1., 2., 2.],
              [1., 1., 0., 1.]])

ct = ColumnTransformer(
    [("norm1", Normalizer(norm='l1'), [0, 1]),
     ("norm2", Normalizer(norm='l1'), slice(2, 4))])
# Normalizer scales each row of X to unit norm.
# The first two columns and the last two columns are normalized independently.

ct.fit_transform(X)

array([[0. , 1. , 0.5, 0.5],
       [0.5, 0.5, 0. , 1. ]])
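The result above can be reproduced by hand, which shows what "independently" means here: each block of columns is normalized row by row on its own, and the blocks are then placed side by side. The sketch below uses the function `sklearn.preprocessing.normalize`, the stateless counterpart of `Normalizer`; `via_ct` and `by_hand` are illustrative names.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import Normalizer, normalize

X = np.array([[0., 1., 2., 2.],
              [1., 1., 0., 1.]])

ct = ColumnTransformer(
    [("norm1", Normalizer(norm='l1'), [0, 1]),
     ("norm2", Normalizer(norm='l1'), slice(2, 4))])
via_ct = ct.fit_transform(X)

# Normalize each block of columns on its own, then glue the blocks together.
by_hand = np.hstack([
    normalize(X[:, [0, 1]], norm='l1'),  # first two columns
    normalize(X[:, 2:4], norm='l1'),     # last two columns
])

print(np.allclose(via_ct, by_hand))  # True
```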

## Setting parameters of the various estimators using their names and the parameter name separated by '__'

In [9]:
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import Normalizer
X = np.array([[0., 1., 2., 2.],
              [1., 1., 0., 1.]])
ct = ColumnTransformer(
    [("norm1", Normalizer(), [0, 1])])

In [10]:
ct.get_params() # Normalizer is set to default parameters norm='l2'

{'n_jobs': None,
 'remainder': 'drop',
 'sparse_threshold': 0.3,
 'transformer_weights': None,
 'transformers': [('norm1', Normalizer(), [0, 1])],
 'verbose': False,
 'norm1': Normalizer(),
 'norm1__copy': True,
 'norm1__norm': 'l2'}

In [11]:
ct.set_params(norm1__norm='l1')

ColumnTransformer(transformers=[('norm1', Normalizer(norm='l1'), [0, 1])])

In [12]:
ct.get_params() # New parameters are set as norm='l1'

{'n_jobs': None,
 'remainder': 'drop',
 'sparse_threshold': 0.3,
 'transformer_weights': None,
 'transformers': [('norm1', Normalizer(norm='l1'), [0, 1])],
 'verbose': False,
 'norm1': Normalizer(norm='l1'),
 'norm1__copy': True,
 'norm1__norm': 'l1'}
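The same `name__parameter` convention is what makes the transformer searchable in a grid search. The sketch below is an illustration under stated assumptions, not part of the original example: the random toy data, the step names `prep` and `clf`, and the choice of `LogisticRegression` are all hypothetical.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import Normalizer

# Hypothetical toy data: 60 rows, 4 features, binary target.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

pipe = Pipeline([
    # "norm1" is the transformer name inside the ColumnTransformer step "prep".
    ("prep", ColumnTransformer([("norm1", Normalizer(), [0, 1])],
                               remainder="passthrough")),
    ("clf", LogisticRegression()),
])

# Names are joined with '__': pipeline step, transformer name, parameter.
param_grid = {"prep__norm1__norm": ["l1", "l2", "max"]}

search = GridSearchCV(pipe, param_grid, cv=3)
search.fit(X, y)
print(search.best_params_)
```

Which norm wins depends on the data; the point is only that the grid key reaches the `norm` parameter of `Normalizer` through the names given to each step.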