# ColumnTransformer in Scikit-learn
This notebook demonstrates the use of **`ColumnTransformer`** from `sklearn.compose` to apply different preprocessing steps to different columns in a dataset.

### Key Concepts:
- Encoding categorical features
- Scaling numerical features
- Combining transformers using `ColumnTransformer`
- Integrating into a machine learning pipeline

In [2]:
import numpy as np
import pandas as pd

In [6]:
import sklearn

In [8]:
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder
from sklearn.preprocessing import OrdinalEncoder

In [20]:
df = pd.read_csv('./Dataset/random_purchase.csv')

In [22]:
df.sample(5)

| | Age | City | Gender | Review | Purchase |
|---|---|---|---|---|---|
| 199 | 15 | Bangalore | Male | Better | No |
| 8 | 15 | Kolkata | Female | Good | No |
| 78 | 25 | Kolkata | Male | Better | No |
| 183 | 9 | Mumbai | Male | Good | No |
| 170 | 21 | Kolkata | Female | Better | No |


In [48]:
df['City'].value_counts(), df['Review'].value_counts()

(City
 Kolkata      66
 Bangalore    47
 Mumbai       47
 Delhi        40
 Name: count, dtype: int64,
 Review
 Good      81
 Better    79
 Best      40
 Name: count, dtype: int64)

In [46]:
df.isnull().sum()

Age         0
City        0
Gender      0
Review      0
Purchase    0
dtype: int64
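
The check above finds no missing values, so the `SimpleImputer` imported earlier is never needed for this dataset. As a minimal sketch of how it could be used if gaps did exist, the example below imputes a hypothetical frame `toy` (invented for illustration, not taken from `random_purchase.csv`), filling `Age` with the column mean and `City` with the most frequent value.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical toy frame with gaps; the real dataset has no missing values.
toy = pd.DataFrame({
    "Age": [11.0, np.nan, 19.0, 15.0],
    "City": pd.Series(["Mumbai", "Kolkata", np.nan, "Kolkata"], dtype=object),
})

# Numeric column: replace NaN with the mean of the observed values.
age_imputer = SimpleImputer(strategy="mean")
age_filled = age_imputer.fit_transform(toy[["Age"]])

# Categorical column: replace NaN with the most frequent category.
city_imputer = SimpleImputer(strategy="most_frequent")
city_filled = city_imputer.fit_transform(toy[["City"]])

print(age_filled.ravel())
print(city_filled.ravel())
```

In practice the imputers would be fitted on the training split only and then applied to the test split with `transform`.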

In [30]:
from sklearn.model_selection import train_test_split

In [34]:
X_train, X_test, y_train, y_test = train_test_split(
    df.drop('Purchase', axis=1),
    df['Purchase'],
    test_size=0.2,
    random_state=0,
)

In [38]:
X_train.shape, X_test.shape, y_train.shape, y_test.shape

((160, 4), (40, 4), (160,), (40,))
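
At this point `y_train` and `y_test` still hold the strings `Yes` and `No`. Many scikit-learn classifiers accept string labels directly, but if integer labels are wanted, one option is `LabelEncoder`, which is designed for target values. The sketch below uses hypothetical stand-in series (`y_toy_train`, `y_toy_test`) rather than the notebook's actual split.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical stand-ins for y_train and y_test.
y_toy_train = pd.Series(["No", "Yes", "Yes", "No"])
y_toy_test = pd.Series(["Yes", "No"])

le = LabelEncoder()

# Learn the label-to-integer mapping from the training labels only.
y_train_enc = le.fit_transform(y_toy_train)

# Reuse the same mapping for the test labels.
y_test_enc = le.transform(y_toy_test)

print(le.classes_)   # classes are stored in sorted order
print(y_train_enc, y_test_enc)
```

`le.inverse_transform` maps integer predictions back to the original labels.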

In [68]:
df.head()

| | Age | City | Gender | Review | Purchase |
|---|---|---|---|---|---|
| 0 | 11 | Bangalore | Female | Best | No |
| 1 | 24 | Kolkata | Female | Better | No |
| 2 | 19 | Kolkata | Female | Best | Yes |
| 3 | 15 | Mumbai | Male | Better | Yes |
| 4 | 12 | Mumbai | Female | Good | Yes |


In [52]:
from sklearn.compose import ColumnTransformer

In [70]:
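transformer = ColumnTransformer(transformers=[
    # Nominal features: one-hot encode, dropping the first category of each column
    ('tnf1', OneHotEncoder(sparse_output=False, drop='first'), ['City', 'Gender']),
    # Ordinal feature: encode with the explicit order Good < Better < Best
    ('tnf2', OrdinalEncoder(categories=[['Good', 'Better', 'Best']]), ['Review']),
    # 'Purchase' is the target and is not a column of X_train, so it is not encoded here
    # ('tnf3', OrdinalEncoder(categories=[['No', 'Yes']]), ['Purchase'])
], remainder='passthrough')  # columns not listed above (Age) pass through unchanged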

In [72]:
transformer.fit_transform(X_train)

array([[ 0.,  0.,  1.,  1.,  0., 17.],
       [ 0.,  1.,  0.,  0.,  2.,  8.],
       [ 0.,  1.,  0.,  0.,  1., 19.],
       [ 0.,  0.,  1.,  0.,  0.,  7.],
       [ 0.,  0.,  0.,  0.,  1., 24.],
       [ 0.,  0.,  0.,  1.,  1., 16.],
       [ 0.,  1.,  0.,  0.,  0., 15.],
       [ 0.,  0.,  0.,  0.,  0., 17.],
       [ 0.,  0.,  0.,  1.,  0.,  5.],
       [ 0.,  0.,  1.,  1.,  0., 17.],
       [ 0.,  1.,  0.,  0.,  1., 18.],
       [ 1.,  0.,  0.,  0.,  2., 19.],
       [ 1.,  0.,  0.,  0.,  1.,  5.],
       [ 0.,  1.,  0.,  1.,  2., 14.],
       [ 0.,  0.,  1.,  0.,  2., 15.],
       [ 0.,  0.,  1.,  0.,  1., 14.],
       [ 0.,  1.,  0.,  0.,  1., 24.],
       [ 0.,  1.,  0.,  1.,  1., 13.],
       [ 1.,  0.,  0.,  0.,  0., 18.],
       [ 0.,  0.,  1.,  0.,  2., 12.],
       [ 0.,  0.,  0.,  0.,  0.,  8.],
       [ 0.,  0.,  0.,  0.,  1.,  9.],
       [ 0.,  1.,  0.,  0.,  0.,  6.],
       [ 0.,  0.,  0.,  1.,  2., 25.],
       [ 0.,  1.,  0.,  0.,  1., 24.],
       [ 1.,  0.,  0.,  1

In [74]:
transformer.fit_transform(X_train).shape

(160, 6)

In [76]:
transformer.transform(X_test)

array([[ 0.,  0.,  0.,  0.,  0.,  6.],
       [ 0.,  1.,  0.,  0.,  1., 21.],
       [ 0.,  1.,  0.,  0.,  2.,  9.],
       [ 0.,  0.,  1.,  0.,  1., 11.],
       [ 0.,  0.,  0.,  1.,  0.,  9.],
       [ 1.,  0.,  0.,  0.,  0.,  6.],
       [ 0.,  1.,  0.,  1.,  0., 25.],
       [ 0.,  1.,  0.,  0.,  1., 16.],
       [ 0.,  1.,  0.,  0.,  0., 12.],
       [ 0.,  1.,  0.,  0.,  1.,  5.],
       [ 0.,  0.,  1.,  0.,  2.,  8.],
       [ 1.,  0.,  0.,  1.,  0., 14.],
       [ 0.,  0.,  0.,  0.,  1., 21.],
       [ 1.,  0.,  0.,  1.,  1., 17.],
       [ 0.,  1.,  0.,  0.,  2., 13.],
       [ 1.,  0.,  0.,  1.,  0., 23.],
       [ 0.,  1.,  0.,  0.,  0., 23.],
       [ 0.,  1.,  0.,  1.,  1., 24.],
       [ 0.,  0.,  0.,  0.,  0., 11.],
       [ 0.,  1.,  0.,  0.,  2., 20.],
       [ 0.,  0.,  1.,  1.,  0.,  9.],
       [ 0.,  0.,  0.,  0.,  1., 10.],
       [ 1.,  0.,  0.,  1.,  2., 24.],
       [ 0.,  0.,  0.,  0.,  2., 12.],
       [ 0.,  0.,  0.,  1.,  2., 14.],
       [ 1.,  0.,  0.,  1

In [78]:
transformer.transform(X_test).shape

(40, 6)

## 📌 Conclusion
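`ColumnTransformer` applies a different transformer to each group of columns and concatenates the results into a single feature matrix, which makes it a convenient way to preprocess datasets that mix nominal, ordinal, and numeric features. Because the preprocessing lives in one object that is fitted on the training data and reused on the test data, the code stays modular and readable and is easier to move into production.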