### Column Transformer
Different columns in a dataset often need different preprocessing: Ordinal Encoding for ordinal categorical features, One Hot Encoding for nominal features, and imputers for missing values. Applying these transforms one by one and joining the results at the end quickly becomes tedious.
A Column Transformer solves this by applying each transform to its own set of columns in a single step.

In [1]:
import numpy as np
import pandas as pd

In [2]:
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder
from sklearn.preprocessing import OrdinalEncoder

In [4]:
df = pd.read_csv('Datasets/covid_toy.csv')

In [5]:
df.head()

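| | age | gender | fever | cough | city | has_covid |
|---|---|---|---|---|---|---|
| 0 | 60 | Male | 103.0 | Mild | Kolkata | No |
| 1 | 27 | Male | 100.0 | Mild | Delhi | Yes |
| 2 | 42 | Male | 101.0 | Mild | Delhi | No |
| 3 | 31 | Female | 98.0 | Mild | Kolkata | No |
| 4 | 65 | Female | 101.0 | Mild | Mumbai | No |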


In [6]:
df.isnull().sum()  # find missing values

age           0
gender        0
fever        10
cough         0
city          0
has_covid     0
dtype: int64

The fever column has 10 missing values, so it needs an imputer.

from sklearn.model_selection import train_test_split

# Separate the target (has_covid) from the features and hold out 20% of the rows for testing
X_train, X_test, y_train, y_test = train_test_split(
    df.drop(columns=['has_covid']),
    df['has_covid'],
    test_size=0.2,
)

In [13]:
X_train

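| | age | gender | fever | cough | city |
|---|---|---|---|---|---|
| 9 | 64 | Female | 101.0 | Mild | Delhi |
| 78 | 11 | Male | 100.0 | Mild | Bangalore |
| 53 | 83 | Male | 98.0 | Mild | Delhi |
| 23 | 80 | Female | 98.0 | Mild | Delhi |
| 48 | 66 | Male | 99.0 | Strong | Bangalore |
| ... | ... | ... | ... | ... | ... |
| 21 | 73 | Male | 98.0 | Mild | Bangalore |
| 39 | 50 | Female | 103.0 | Mild | Kolkata |
| 28 | 16 | Male | 104.0 | Mild | Kolkata |
| 84 | 69 | Female | 98.0 | Strong | Mumbai |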


### Using Column Transformer

In [14]:
from sklearn.compose import ColumnTransformer

In [20]:
# transformers: a list of (name, transformer, columns) tuples, one per group of columns
# remainder: 'passthrough' keeps the remaining columns unchanged; 'drop' discards them
transformer = ColumnTransformer(transformers=[
    ('tnf1', SimpleImputer(), ['fever']),
    ('tnf2', OrdinalEncoder(categories=[['Mild', 'Strong']]), ['cough']),
    ('tnf3', OneHotEncoder(drop='first', sparse_output=False), ['gender', 'city'])
], remainder='passthrough')

Each transformer is passed as a tuple in the format:
('name', transformer, [column names])
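To see what the remainder setting does to columns that no tuple mentions, here is a minimal sketch on a hypothetical three-row DataFrame. The step names `impute_fever` and `encode_cough` are made up for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical toy data; 'age' is not listed in any transformer tuple
toy = pd.DataFrame({
    'age': [60, 27, 42],
    'fever': [103.0, np.nan, 101.0],
    'cough': ['Mild', 'Strong', 'Mild'],
})

# Each tuple follows the (name, transformer, columns) format
steps = [
    ('impute_fever', SimpleImputer(), ['fever']),
    ('encode_cough', OrdinalEncoder(categories=[['Mild', 'Strong']]), ['cough']),
]

# 'passthrough' appends the unlisted columns after the transformed ones
kept = ColumnTransformer(transformers=steps, remainder='passthrough').fit_transform(toy)

# 'drop' discards the unlisted columns
dropped = ColumnTransformer(transformers=steps, remainder='drop').fit_transform(toy)

print(kept.shape)     # (3, 3): fever, cough, age
print(dropped.shape)  # (3, 2): fever, cough
```

Note that passed-through columns come last in the output, regardless of where they sat in the input DataFrame.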

In [22]:
X_train_new = transformer.fit_transform(X_train)

In [23]:
X_train_new.shape

(80, 7)

In [25]:
X_test_new = transformer.transform(X_test)

In [26]:
X_test_new.shape

(20, 7)
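The test set goes through transform rather than fit_transform so that it reuses what was learned from the training set. A minimal sketch of that behaviour, on two hypothetical single-column frames named `train` and `test`:

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer

# Hypothetical data: one missing fever value in each split
train = pd.DataFrame({'fever': [100.0, 102.0, np.nan]})
test = pd.DataFrame({'fever': [np.nan, 98.0]})

ct = ColumnTransformer(transformers=[('tnf1', SimpleImputer(), ['fever'])])

# fit learns the imputation value (the mean) from the training rows only
ct.fit(train)
learned_mean = ct.named_transformers_['tnf1'].statistics_[0]

# transform fills the test gap with the training mean, not the test mean
test_out = ct.transform(test)

print(learned_mean)    # 101.0
print(test_out[0, 0])  # 101.0
```

Calling fit_transform on the test set instead would recompute the statistics from the test rows, which leaks test information into preprocessing.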