# Encoding / Dummy Variable Creation
- Dummy variable creation is a categorical encoding method that transforms a categorical variable into a set of binary (0/1) variables.
- Encoding is applied only to nominal and ordinal variables, because continuous variables are already numeric.
- The target variable does not need to be converted into dummy variables, because the algorithm can take care of it itself (scikit-learn classifiers, for example, accept string class labels directly).
- Example:
  - Nominal: Gender, Marital status, Country, etc. Use **One Hot Encoding**
  - Ordinal: Grade, Feedback, Shirt size, etc. Use **Ordinal Encoding**


## One Hot Encoding

In [1]:
import pandas as pd

In [69]:
df = pd.read_csv(r"C:\Users\91830\Downloads\Data Science  Course\Machine Learning\Tabular Data\encode (1).csv")

In [70]:
df

Unnamed: 0,ID,Name,Gender,Age,Country,Purchased,Customer Satisfaction
0,1,Alice,Female,23,USA,Yes,Very Satisfied
1,2,Bob,Male,35,Canada,No,Neutral
2,3,Charlie,Male,45,Australia,Yes,Satisfied
3,4,Diana,Female,28,UK,No,Dissatisfied
4,5,Ethan,Male,30,USA,Yes,Very Satisfied
5,6,Fiona,Female,22,Canada,No,Very Dissatisfied
6,7,George,Male,29,Australia,Yes,Satisfied
7,8,Hannah,Female,33,UK,No,Neutral
8,9,Ian,Male,40,USA,Yes,Very Satisfied
9,10,Jessica,Female,27,Canada,No,Dissatisfied


In [4]:
df.drop(["ID",'Name'],axis=1,inplace=True)

In [6]:
from sklearn.preprocessing import OneHotEncoder

In [36]:
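ohe = OneHotEncoder(drop='first',            # drop one dummy column per feature to avoid the multicollinearity problem in ML
                    dtype=int,
                    sparse_output=False,     # return a dense NumPy array instead of a sparse matrix
                    handle_unknown='ignore') # categories not seen during fit do not raise an error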
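
To see why `drop='first'` helps with multicollinearity, the sketch below compares the full encoding with the reduced one. The array `X` is a hypothetical toy column, not data from this notebook.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy column with three categories.
X = np.array([["USA"], ["Canada"], ["UK"], ["USA"]], dtype=object)

# Full encoding: one column per category.
full = OneHotEncoder(dtype=int).fit_transform(X).toarray()

# Reduced encoding: the first category (alphabetically "Canada") is dropped.
reduced = OneHotEncoder(drop="first", dtype=int).fit_transform(X).toarray()

# Every row of the full encoding sums to 1, so any one column can be
# computed from the others. That redundancy is the multicollinearity.
print(full.sum(axis=1))
print(full.shape, reduced.shape)
```

In the reduced encoding the dropped category is still recoverable: it is the row where all remaining columns are 0.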

In [40]:
ohe.fit_transform(df[["Gender"]])

array([[0],
       [1],
       [1],
       [0],
       [1],
       [0],
       [1],
       [0],
       [1],
       [0]])

In [41]:
ohe.categories_

[array(['Female', 'Male'], dtype=object)]
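
The `handle_unknown='ignore'` option matters once the fitted encoder is applied to new data. The sketch below is a minimal illustration under assumed data: `train`, `new`, and the category `"Other"` are hypothetical, and `drop` is left out to keep the behaviour easy to read.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Hypothetical training and new data.
train = np.array([["Female"], ["Male"]], dtype=object)
new = np.array([["Male"], ["Other"]], dtype=object)  # "Other" was never seen during fit

enc = OneHotEncoder(dtype=int, handle_unknown="ignore").fit(train)

# The unseen category is encoded as a row of zeros instead of raising an error.
encoded_new = enc.transform(new).toarray()
print(encoded_new)
```

With the default `handle_unknown='error'`, the same `transform` call would raise a `ValueError` for the unseen category.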

In [43]:
import warnings
warnings.filterwarnings('ignore')

In [17]:
# sparse_output=False already returns a dense array, so .toarray() must not be called here
pd.DataFrame(ohe.fit_transform(df[["Gender"]]),columns=ohe.get_feature_names_out(),dtype=int)

Unnamed: 0,Gender_Male
0,0
1,1
2,1
3,0
4,1
5,0
6,1
7,0
8,1
9,0


In [44]:
df = pd.read_csv(r"C:\Users\91830\Downloads\Data Science  Course\Machine Learning\Tabular Data\Car Brand.csv")

In [45]:
df

Unnamed: 0,Make,Model,Year,Engine Fuel Type,Engine HP,Engine Cylinders,Transmission Type,Driven_Wheels,Number of Doors,Market Category,Vehicle Size,Vehicle Style,highway MPG,city mpg,Popularity,MSRP
0,BMW,1 Series M,2011,premium unleaded (required),335.0,6.0,MANUAL,rear wheel drive,2.0,"Factory Tuner,Luxury,High-Performance",Compact,Coupe,26,19,3916,46135
1,BMW,1 Series,2011,premium unleaded (required),300.0,6.0,MANUAL,rear wheel drive,2.0,"Luxury,Performance",Compact,Convertible,28,19,3916,40650
2,BMW,1 Series,2011,premium unleaded (required),300.0,6.0,MANUAL,rear wheel drive,2.0,"Luxury,High-Performance",Compact,Coupe,28,20,3916,36350
3,BMW,1 Series,2011,premium unleaded (required),230.0,6.0,MANUAL,rear wheel drive,2.0,"Luxury,Performance",Compact,Coupe,28,18,3916,29450
4,BMW,1 Series,2011,premium unleaded (required),230.0,6.0,MANUAL,rear wheel drive,2.0,Luxury,Compact,Convertible,28,18,3916,34500
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
11909,Acura,ZDX,2012,premium unleaded (required),300.0,6.0,AUTOMATIC,all wheel drive,4.0,"Crossover,Hatchback,Luxury",Midsize,4dr Hatchback,23,16,204,46120
11910,Acura,ZDX,2012,premium unleaded (required),300.0,6.0,AUTOMATIC,all wheel drive,4.0,"Crossover,Hatchback,Luxury",Midsize,4dr Hatchback,23,16,204,56670
11911,Acura,ZDX,2012,premium unleaded (required),300.0,6.0,AUTOMATIC,all wheel drive,4.0,"Crossover,Hatchback,Luxury",Midsize,4dr Hatchback,23,16,204,50620
11912,Acura,ZDX,2013,premium unleaded (recommended),300.0,6.0,AUTOMATIC,all wheel drive,4.0,"Crossover,Hatchback,Luxury",Midsize,4dr Hatchback,23,16,204,50920


In [65]:
ohe = OneHotEncoder(dtype=int,max_categories=10) # at most 10 output columns per feature; the rarest categories share one "infrequent" column

In [67]:
ohe = OneHotEncoder(dtype=int,min_frequency=1000) # categories with fewer than 1000 rows are grouped into one "infrequent" column

In [68]:
pd.DataFrame(ohe.fit_transform(df[['Make']]).toarray(),columns=ohe.get_feature_names_out())

Unnamed: 0,Make_Chevrolet,Make_infrequent_sklearn
0,0,1
1,0,1
2,0,1
3,0,1
4,0,1
...,...,...
11909,0,1
11910,0,1
11911,0,1
11912,0,1


## Ordinal Encoding

In [71]:
df = pd.read_csv(r"C:\Users\91830\Downloads\Data Science  Course\Machine Learning\Tabular Data\encode (1).csv")

In [76]:
df['Customer Satisfaction'] # ordinal column: its categories must be given to the encoder in ascending order

0       Very Satisfied
1              Neutral
2            Satisfied
3         Dissatisfied
4       Very Satisfied
5    Very Dissatisfied
6            Satisfied
7              Neutral
8       Very Satisfied
9         Dissatisfied
Name: Customer Satisfaction, dtype: object

In [73]:
from sklearn.preprocessing import OrdinalEncoder

In [82]:
oe = OrdinalEncoder(categories=[['Very Dissatisfied','Dissatisfied','Neutral',"Satisfied",'Very Satisfied']])

In [83]:
oe.fit_transform(df[['Customer Satisfaction']])

array([[4.],
       [2.],
       [3.],
       [1.],
       [4.],
       [0.],
       [3.],
       [2.],
       [4.],
       [1.]])
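
Passing `categories` explicitly matters because, without it, `OrdinalEncoder` sorts string categories alphabetically. The sketch below compares both behaviours on the five satisfaction levels. The names `levels`, `default_codes`, and `ordered_codes` are introduced here for illustration only.

```python
import numpy as np
from sklearn.preprocessing import OrdinalEncoder

# The five satisfaction levels, listed from lowest to highest.
levels = ["Very Dissatisfied", "Dissatisfied", "Neutral", "Satisfied", "Very Satisfied"]
X = np.array(levels, dtype=object).reshape(-1, 1)

# Default: categories are sorted alphabetically, which breaks the real order.
default_codes = OrdinalEncoder().fit_transform(X).ravel()

# Explicit order: codes increase with satisfaction.
ordered_codes = OrdinalEncoder(categories=[levels]).fit_transform(X).ravel()

print(default_codes)  # alphabetical codes
print(ordered_codes)  # codes following the intended order
```

With the default, "Very Dissatisfied" receives a higher code than "Satisfied", which is why the explicit list is used above.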