
# 🔷 What is a ColumnTransformer?
ColumnTransformer is a tool in scikit-learn (sklearn) that lets you apply different transformations to different columns at the same time.

✅ You can use it to:

- Scale numeric columns
- Encode categorical columns
- Leave some columns unchanged

All in one step.



# 🔧 Why Do We Use It?
In real datasets:

- Some columns are numeric (and need scaling, e.g. with StandardScaler)
- Some are text/categorical (and need encoding, e.g. with OneHotEncoder or OrdinalEncoder; LabelEncoder is meant for the target column, not for features)
- Some you may want to drop or pass through unchanged

Doing these steps separately is error-prone and hard to maintain.
ColumnTransformer organizes them and applies them together in a single, clean step.



In [1]:
import numpy as np
import pandas as pd

from sklearn.impute import SimpleImputer  # fills in missing values
from sklearn.preprocessing import OneHotEncoder  # converts categorical data to numeric, creating one new column per category

from sklearn.preprocessing import OrdinalEncoder  # converts categorical data to integers in an order we specify

In [2]:
df=pd.read_csv("covid_toy.csv")

In [3]:
df

Unnamed: 0,age,gender,fever,cough,city,has_covid
0,60,Male,103.0,Mild,Kolkata,No
1,27,Male,100.0,Mild,Delhi,Yes
2,42,Male,101.0,Mild,Delhi,No
3,31,Female,98.0,Mild,Kolkata,No
4,65,Female,101.0,Mild,Mumbai,No
...,...,...,...,...,...,...
95,12,Female,104.0,Mild,Bangalore,No
96,51,Female,101.0,Strong,Kolkata,Yes
97,20,Female,101.0,Mild,Bangalore,No
98,5,Female,98.0,Strong,Mumbai,No


In [4]:
df.isnull().sum()

age           0
gender        0
fever        10
cough         0
city          0
has_covid     0
dtype: int64

In [5]:
from sklearn.model_selection import train_test_split

x_train, x_test, y_train, y_test = train_test_split(
    df.drop(columns=['has_covid']), df['has_covid'], test_size=0.2)

In [6]:
x_train

Unnamed: 0,age,gender,fever,cough,city
45,72,Male,99.0,Mild,Bangalore
9,64,Female,101.0,Mild,Delhi
13,64,Male,102.0,Mild,Bangalore
47,18,Female,104.0,Mild,Bangalore
96,51,Female,101.0,Strong,Kolkata
...,...,...,...,...,...
50,19,Male,101.0,Mild,Delhi
31,83,Male,103.0,Mild,Kolkata
57,49,Female,99.0,Strong,Bangalore
77,8,Female,101.0,Mild,Kolkata


In [7]:
# Manual approach: transform each column separately, then combine the outputs

In [8]:
# Step 1

In [9]:
# Use SimpleImputer to fill the missing values in the fever column with the mean
si = SimpleImputer(strategy="mean")
x_train_fever = si.fit_transform(x_train[['fever']])
# For the test data, reuse the mean learned from the training data (transform, not fit_transform)
x_test_fever = si.transform(x_test[['fever']])
x_train_fever.shape

(80, 1)
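
Why the imputer should be fitted on the training data only can be seen in a small sketch. The arrays below are made-up values, not rows from `covid_toy.csv`; the point is only that `transform` fills test gaps with the mean learned from the training split.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Hypothetical fever readings, for illustration only
train = np.array([[100.0], [102.0], [np.nan]])
test = np.array([[np.nan], [98.0]])

imputer = SimpleImputer(strategy="mean")
train_filled = imputer.fit_transform(train)  # learns the training mean (101.0) and fills with it
test_filled = imputer.transform(test)        # reuses the training mean, learns nothing new

print(imputer.statistics_)   # [101.]
print(test_filled.ravel())   # [101.  98.]
```

Calling `fit_transform` on the test split instead would recompute the mean from test data, so the two splits would be preprocessed differently.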

In [10]:
# Step 2

In [11]:
# Ordinal-encode cough with the order Mild < Strong
oe = OrdinalEncoder(categories=[['Mild', 'Strong']])
x_train_cough = oe.fit_transform(x_train[['cough']])
# also the test data
x_test_cough = oe.transform(x_test[['cough']])
x_train_cough.shape

(80, 1)
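
The `categories` argument is what fixes the integer assigned to each level. A minimal sketch with a made-up `cough` column (hypothetical values, not the notebook's data):

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical values, for illustration only
cough = pd.DataFrame({"cough": ["Strong", "Mild", "Mild"]})

# The list order defines the codes: Mild -> 0, Strong -> 1
enc = OrdinalEncoder(categories=[["Mild", "Strong"]])
codes = enc.fit_transform(cough)

print(codes.ravel())  # [1. 0. 0.]
```

Without `categories`, the encoder would sort the levels alphabetically, which happens to give the same result here but would not for levels such as Low, Medium, High.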

In [12]:
# One-hot encode gender and city
ohe = OneHotEncoder(drop='first', sparse_output=False)
x_train_gender_city = ohe.fit_transform(x_train[['gender', 'city']])
# For the test data, reuse the categories learned from the training data (transform, not fit_transform)
x_test_gender_city = ohe.transform(x_test[['gender', 'city']])
x_train_gender_city.shape

(80, 4)

In [13]:
# Extract age, which needs no transformation

x_train_age = x_train[['age']].values

# also the test data
x_test_age = x_test[['age']].values


In [14]:
x_train_age.shape

(80, 1)

In [15]:
x_train_transformed = np.concatenate((x_train_age,x_train_fever,
                                    x_train_gender_city,
                                    x_train_cough), axis =1)


In [16]:
# x_test_transformed = np.concatenate((x_test_age, x_test_fever, x_test_gender_city, x_test_cough), axis=1)

In [17]:
x_train_transformed.shape

(80, 7)

# Using ColumnTransformer

In [18]:
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OrdinalEncoder, OneHotEncoder

transformer = ColumnTransformer(transformers=[
    ('tnf1', SimpleImputer(strategy='mean'), ['fever']),  # fill missing fever values with the mean
    ('tnf2', OrdinalEncoder(categories=[['Mild', 'Strong']]), ['cough']),  # ordinal-encode cough (Mild < Strong)
    ('tnf3', OneHotEncoder(sparse_output=False, drop='first'), ['gender', 'city'])  # one-hot encode gender and city
], remainder='passthrough')  # pass the remaining columns (age) through unchanged


In [19]:
# Let's see what is happening here: fit_transform learns the mean and the categories from x_train,
# and transform reuses them on x_test

In [20]:
transformer.fit_transform(x_train).shape

(80, 7)

In [21]:
transformer.transform(x_test).shape

(20, 7)