# **Column Transformer**

Column Transformer (`ColumnTransformer`, in scikit-learn's `sklearn.compose` module) applies different transformers to different columns of a dataset and concatenates the results into a single output.

It is useful for tasks such as data cleaning, feature engineering, and data transformation, where numeric and categorical columns need different preprocessing.

It is particularly convenient when a dataset has many columns, because each transformer is applied only to the columns you select, while the remaining columns can be passed through unchanged or dropped.


In [34]:
import pandas as pd

In [35]:
d = {
    'sales': [100000, 222000, 1000000, 522000, 111111, 222222, 1111111, 20000, 75000, 90000, 1000000, 10000],
    'city': ['Tampa', 'Tampa', 'Orlando', 'Jacksonville', 'Miami', 'Jacksonville', 'Miami', 'Miami', 'Orlando', 'Orlando', 'Orlando', 'Orlando'],
    'size': ['Small', 'Medium', 'Large', 'Large', 'Small', 'Medium', 'Large', 'Small', 'Medium', 'Medium', 'Medium', 'Small'],
}

In [36]:
df = pd.DataFrame(data=d)

In [37]:
df

,sales,city,size
0,100000,Tampa,Small
1,222000,Tampa,Medium
2,1000000,Orlando,Large
3,522000,Jacksonville,Large
4,111111,Miami,Small
5,222222,Jacksonville,Medium
6,1111111,Miami,Large
7,20000,Miami,Small
8,75000,Orlando,Medium
9,90000,Orlando,Medium


In [38]:
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

In [39]:
ohe = OneHotEncoder(sparse_output=False)

In [40]:
ode = OrdinalEncoder()

In [41]:
from sklearn.compose import make_column_transformer

In [42]:
# Create a column transformer that applies different transformations to different columns:
# - the 'city' column is one-hot encoded
# - the 'size' column is ordinal encoded (categories are sorted alphabetically by default,
#   so Large=0, Medium=1, Small=2)
# - remainder='passthrough' leaves all other columns unchanged
column_transformer = make_column_transformer(
    (ohe, ['city']),
    (ode, ['size']),
    remainder='passthrough'
)


In [43]:
column_transformer.set_output(transform='pandas')

In [44]:
df_pandas = column_transformer.fit_transform(df)

In [45]:
df_pandas

,onehotencoder__city_Jacksonville,onehotencoder__city_Miami,onehotencoder__city_Orlando,onehotencoder__city_Tampa,ordinalencoder__size,remainder__sales
0,0.0,0.0,0.0,1.0,2.0,100000
1,0.0,0.0,0.0,1.0,1.0,222000
2,0.0,0.0,1.0,0.0,0.0,1000000
3,1.0,0.0,0.0,0.0,0.0,522000
4,0.0,1.0,0.0,0.0,2.0,111111
5,1.0,0.0,0.0,0.0,1.0,222222
6,0.0,1.0,0.0,0.0,0.0,1111111
7,0.0,1.0,0.0,0.0,2.0,20000
8,0.0,0.0,1.0,0.0,1.0,75000
9,0.0,0.0,1.0,0.0,1.0,90000
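In the output above, `ordinalencoder__size` is 0.0 for Large, 1.0 for Medium and 2.0 for Small, because `OrdinalEncoder` sorts categories alphabetically by default. If the codes should follow the natural order Small < Medium < Large, one option is to pass the order through the `categories` parameter. This is a sketch on a hypothetical `sizes` frame, not a change to the notebook's transformer.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical frame containing only the 'size' column
sizes = pd.DataFrame({'size': ['Small', 'Medium', 'Large', 'Medium']})

# Default behaviour: alphabetical order, so Large=0, Medium=1, Small=2
default_codes = OrdinalEncoder().fit_transform(sizes)

# Explicit order: one list of categories per encoded column
ordered_encoder = OrdinalEncoder(categories=[['Small', 'Medium', 'Large']])
ordered_codes = ordered_encoder.fit_transform(sizes)

print(default_codes.ravel())  # [2. 1. 0. 1.]
print(ordered_codes.ravel())  # [0. 1. 2. 1.]
```

An encoder configured this way could be passed to `make_column_transformer` in place of `ode`.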


In [46]:
# Same transformations as before, but columns are selected by integer position instead of by name:
# - column 1 ('city') is one-hot encoded
# - column 2 ('size') is ordinal encoded
# - remainder='drop' drops all other columns (here, 'sales')
column_transformer_2 = make_column_transformer(
    (ohe, [1]),
    (ode, [2]),
    remainder='drop'
)

In [47]:
column_transformer_2.set_output(transform='pandas')

In [48]:
df_pandas_2 = column_transformer_2.fit_transform(df)

In [49]:
df_pandas_2

,onehotencoder__city_Jacksonville,onehotencoder__city_Miami,onehotencoder__city_Orlando,onehotencoder__city_Tampa,ordinalencoder__size
0,0.0,0.0,0.0,1.0,2.0
1,0.0,0.0,0.0,1.0,1.0
2,0.0,0.0,1.0,0.0,0.0
3,1.0,0.0,0.0,0.0,0.0
4,0.0,1.0,0.0,0.0,2.0
5,1.0,0.0,0.0,0.0,1.0
6,0.0,1.0,0.0,0.0,0.0
7,0.0,1.0,0.0,0.0,2.0
8,0.0,0.0,1.0,0.0,1.0
9,0.0,0.0,1.0,0.0,1.0


In [50]:
# Create a column transformer that encodes 'city' and keeps 'size' as it is:
# - the 'city' column is one-hot encoded
# - ('passthrough', ['size']) leaves the 'size' column unchanged
# - remainder='drop' drops all other columns (here, 'sales')
column_transformer_3 = make_column_transformer(
    (ohe, ['city']),
    ('passthrough', ['size']),
    remainder='drop'
)

In [51]:
column_transformer_3.set_output(transform='pandas')

In [53]:
df_pandas_3 = column_transformer_3.fit_transform(df)

In [54]:
df_pandas_3

,onehotencoder__city_Jacksonville,onehotencoder__city_Miami,onehotencoder__city_Orlando,onehotencoder__city_Tampa,passthrough__size
0,0.0,0.0,0.0,1.0,Small
1,0.0,0.0,0.0,1.0,Medium
2,0.0,0.0,1.0,0.0,Large
3,1.0,0.0,0.0,0.0,Large
4,0.0,1.0,0.0,0.0,Small
5,1.0,0.0,0.0,0.0,Medium
6,0.0,1.0,0.0,0.0,Large
7,0.0,1.0,0.0,0.0,Small
8,0.0,0.0,1.0,0.0,Medium
9,0.0,0.0,1.0,0.0,Medium
