# <font color='orange'>Introduction </font>

 The ColumnTransformer is a useful tool in scikit-learn that allows you to apply different preprocessing steps to different columns of a dataset. This is particularly helpful when dealing with heterogeneous data, where different columns may require different types of preprocessing.

Without it, applying a transformer such as One-Hot Encoder (OHE) to only some of the columns means separating those columns, encoding them, and then stacking the result horizontally with the remaining columns. ColumnTransformer removes this manual split-and-recombine step.
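To make the manual route concrete, here is a minimal sketch of the split, encode, and stack steps. The small DataFrame `df` is a made-up example for illustration, not the dataset used later in this notebook.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy frame, made up for illustration (not the covid.csv data).
df = pd.DataFrame({
    "Age": [39, 65, 50, 33],
    "Gender": ["Female", "Male", "Female", "Male"],
    "City": ["Nashik", "Kolhapur", "Mumbai", "Mumbai"],
})

# Step 1: separate the categorical columns from the rest.
cat_cols = ["Gender", "City"]
num_part = df.drop(columns=cat_cols).to_numpy()

# Step 2: one-hot encode only the categorical part.
# .toarray() converts the encoder's default sparse result to a dense array.
ohe = OneHotEncoder(drop="first")
cat_part = ohe.fit_transform(df[cat_cols]).toarray()

# Step 3: stack the encoded and untouched pieces back together horizontally.
manual = np.hstack([cat_part, num_part])
print(manual.shape)  # (4, 4): 1 Gender column + 2 City columns + Age
```

Every extra transformer adds another split and another stack, which is the bookkeeping ColumnTransformer takes over.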


# <font color='orange'>Practical </font>

In [17]:
#importing libraries
import pandas as pd 
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder
from sklearn.preprocessing import OrdinalEncoder

In [18]:
#importing data 
data = pd.read_csv('covid.csv')

In [19]:
data.head() 

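| | Age | Gender | Fever | Cough | City | Has_Covid |
|---|---|---|---|---|---|---|
| 0 | 39 | Female | 75.0 | NO | Nashik | YES |
| 1 | 65 | Female | 110.0 | NO | Kolhapur | NO |
| 2 | 50 | Female | 72.0 | MILD | Mumbai | YES |
| 3 | 33 | Female | 86.0 | STRONG | Mumbai | YES |
| 4 | 66 | Female | 85.0 | MILD | Mumbai | YES |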


In [20]:
data.isnull().sum() #checking data for missing values

Age           0
Gender        0
Fever        11
Cough         0
City          0
Has_Covid     0
dtype: int64

The data has six columns. Age needs no preprocessing. Gender and City should be one-hot encoded. Fever has missing values and should be filled in with SimpleImputer. Cough is ordered (NO < MILD < STRONG), so it should go through OrdinalEncoder. Has_Covid, the target, should be label encoded.
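LabelEncoder is designed for a one-dimensional target rather than for feature columns, so it is usually applied to Has_Covid on its own. A minimal sketch, assuming a small hand-written Series `y` that stands in for the real column:

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical target values standing in for the Has_Covid column.
y = pd.Series(["YES", "NO", "YES", "YES"], name="Has_Covid")

le = LabelEncoder()
y_encoded = le.fit_transform(y)

# Classes are sorted alphabetically, so NO maps to 0 and YES maps to 1.
print(le.classes_)
print(y_encoded)
```

`le.inverse_transform` maps the integers back to the original labels when predictions need to be reported as YES/NO.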

## <font color='orange'>Using ColumnTransformer </font>

In [21]:
# Create a ColumnTransformer object.
# transformers: list of (name, transformer, columns) tuples to apply.
# remainder: what to do with the columns not listed in any tuple:
#   'drop' (the default) discards them, 'passthrough' keeps them unchanged.
trans = ColumnTransformer(transformers=[
    ('tnf1', SimpleImputer(), ['Fever']),
    # sparse_output replaced the older sparse argument in scikit-learn 1.2
    ('tnf2', OneHotEncoder(sparse_output=False, drop='first'), ['Gender', 'City']),
    ('tnf3', OrdinalEncoder(categories=[['NO', 'MILD', 'STRONG']]), ['Cough'])
], remainder='passthrough')
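The effect of `remainder` is easiest to see by fitting the same transformer with both settings. The sketch below uses a made-up two-column frame and a hypothetical helper `build`; neither is part of the original notebook.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer

# Hypothetical toy frame, made up for illustration (not the covid.csv data).
df = pd.DataFrame({
    "Age": [39, 65, 50],
    "Fever": [75.0, np.nan, 72.0],
})

def build(remainder):
    """Illustrative helper: impute Fever, handle other columns per `remainder`."""
    return ColumnTransformer(
        transformers=[("tnf1", SimpleImputer(), ["Fever"])],
        remainder=remainder,
    )

dropped = build("drop").fit_transform(df)
kept = build("passthrough").fit_transform(df)

print(dropped.shape)  # (3, 1): only the imputed Fever column survives
print(kept.shape)     # (3, 2): imputed Fever first, then Age unchanged
```

Note the column order in `kept`: transformed columns come first, in the order the transformers are listed, and the passed-through columns are appended at the end.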

In [23]:
trans.fit_transform(data)



array([[75.0, 0.0, 0.0, 1.0, 0.0, 0.0, 39, 'YES'],
       [110.0, 0.0, 0.0, 0.0, 0.0, 0.0, 65, 'NO'],
       [72.0, 0.0, 1.0, 0.0, 0.0, 1.0, 50, 'YES'],
       [86.0, 0.0, 1.0, 0.0, 0.0, 2.0, 33, 'YES'],
       [85.0, 0.0, 1.0, 0.0, 0.0, 1.0, 66, 'YES'],
       [90.0, 1.0, 1.0, 0.0, 0.0, 1.0, 23, 'NO'],
       [86.0, 0.0, 0.0, 1.0, 0.0, 2.0, 37, 'NO'],
       [82.0, 0.0, 0.0, 1.0, 0.0, 1.0, 42, 'NO'],
       [91.84269662921348, 1.0, 1.0, 0.0, 0.0, 0.0, 44, 'YES'],
       [96.0, 0.0, 0.0, 0.0, 1.0, 2.0, 51, 'YES'],
       [74.0, 0.0, 0.0, 0.0, 1.0, 2.0, 29, 'YES'],
       [103.0, 0.0, 1.0, 0.0, 0.0, 1.0, 57, 'NO'],
       [114.0, 1.0, 1.0, 0.0, 0.0, 2.0, 25, 'YES'],
       [114.0, 0.0, 0.0, 0.0, 1.0, 2.0, 29, 'NO'],
       [91.0, 0.0, 0.0, 1.0, 0.0, 1.0, 60, 'NO'],
       [100.0, 1.0, 0.0, 0.0, 0.0, 0.0, 43, 'YES'],
       [91.84269662921348, 1.0, 0.0, 0.0, 0.0, 0.0, 62, 'YES'],
       [110.0, 1.0, 0.0, 0.0, 1.0, 2.0, 45, 'YES'],
       [91.0, 0.0, 0.0, 0.0, 0.0, 2.0, 30, 'YES'],
      

In [24]:
trans.fit_transform(data).shape



(100, 8)