#### ***One-Hot Encoding is a technique for converting a categorical variable into binary (0/1) columns. Instead of assigning each category an integer such as 0, 1, 2 (Label Encoding), a one-hot encoder creates a separate column for each category.***

##### **`Dummy Variable: `** ***A dummy variable is a binary (0 or 1) variable that indicates whether a row belongs to a particular category.***
##### **`Dummy Variable Trap: `** ***One-Hot Encoding creates N new columns for N categories, but only N-1 of them are needed. Keeping all N columns adds redundant information to the data; this is called the Dummy Variable Trap.***
- ***Because the last column can always be inferred from the others.***   
- ***Example: If Toyota=0 and Hyundai=0, then it must be Tesla=1.***     
##### **`Multicollinearity Problem: `** ***Multicollinearity means high correlation among features.***
- ***In One-Hot Encoding, if we keep all dummy variables, each column is an exact linear combination of the others (perfect multicollinearity).***
- ***Example: Toyota + Hyundai + Tesla = 1 (always true), so Tesla = 1 - Toyota - Hyundai.***
- ***This makes some ML models (like Linear Regression with an intercept) unstable: the coefficients are no longer uniquely determined, so the model cannot attribute the effect to any single feature.***
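
The redundancy described above can be checked numerically. The sketch below is only an illustration: the five-row `brands` series is a toy stand-in that reuses the brand names from this notebook, and the intercept column is an assumption about how a linear regression would build its design matrix.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical stand-in for the "brand" column used in this notebook)
brands = pd.Series(["Tesla", "Toyota", "Tesla", "Hyundai", "Toyota"], name="brand")

# Keep all N dummy columns
full = pd.get_dummies(brands).astype(int)

# Every row contains exactly one 1, so the dummy columns always sum to 1
row_sums = full.sum(axis=1)

# Design matrix with an intercept column, as a linear regression would use
X_full = np.column_stack([np.ones(len(full)), full.to_numpy()])
rank_full = np.linalg.matrix_rank(X_full)        # lower than the column count

# Dropping one dummy removes the redundancy
reduced = pd.get_dummies(brands, drop_first=True).astype(int)
X_reduced = np.column_stack([np.ones(len(reduced)), reduced.to_numpy()])
rank_reduced = np.linalg.matrix_rank(X_reduced)  # equal to the column count

print("columns / rank with all dummies:", X_full.shape[1], rank_full)
print("columns / rank with drop_first: ", X_reduced.shape[1], rank_reduced)
```

When the rank is lower than the number of columns, the least-squares problem has infinitely many solutions, which is the instability mentioned in the last bullet.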

In [33]:
import pandas as pd

In [39]:
df = pd.read_csv("car_dataset.csv")
df.head()

Unnamed: 0,brand,km_driven,selling_price
0,Tesla,138767,1796176
1,Toyota,129375,2186013
2,Tesla,184262,4934096
3,Hyundai,141330,2956138
4,Toyota,44504,3660011


In [35]:
one_hot_encoder = pd.get_dummies(df["brand"]).astype(int)
one_hot_encoder.head()

Unnamed: 0,Hyundai,Tesla,Toyota
0,0,1,0
1,0,0,1
2,0,1,0
3,1,0,0
4,0,0,1


In [36]:
# drop_first=True drops the first category (Hyundai), leaving N-1 dummy columns
one_hot_encoder = pd.get_dummies(df["brand"], drop_first=True).astype(int)
one_hot_encoder.head()

Unnamed: 0,Tesla,Toyota
0,1,0
1,0,1
2,1,0
3,0,0
4,0,1
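
That the dropped column carries no extra information can be verified by rebuilding it from the remaining ones. This is a minimal sketch on a toy `brands` series (hypothetical data reusing the brand names from this notebook), not part of the original workflow.

```python
import pandas as pd

# Toy data (hypothetical stand-in for df["brand"])
brands = pd.Series(["Tesla", "Toyota", "Tesla", "Hyundai", "Toyota"], name="brand")

# N-1 dummy columns: the first category (Hyundai) is dropped
reduced = pd.get_dummies(brands, drop_first=True).astype(int)

# The dropped category is 1 exactly when every kept dummy is 0
hyundai = 1 - reduced.sum(axis=1)

# Compare with the Hyundai column from the full encoding
full = pd.get_dummies(brands).astype(int)
recovered_ok = bool((hyundai == full["Hyundai"]).all())

print(hyundai.tolist(), recovered_ok)
```

A row with Tesla=0 and Toyota=0 is therefore read as the baseline category, Hyundai.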


In [37]:
df = df.drop('brand', axis=1)
df.head()

Unnamed: 0,km_driven,selling_price
0,138767,1796176
1,129375,2186013
2,184262,4934096
3,141330,2956138
4,44504,3660011


In [38]:
# Put the dummy columns side by side with the remaining numeric columns
df = pd.concat([one_hot_encoder, df], axis=1)
df.head()

Unnamed: 0,Tesla,Toyota,km_driven,selling_price
0,1,0,138767,1796176
1,0,1,129375,2186013
2,1,0,184262,4934096
3,0,0,141330,2956138
4,0,1,44504,3660011
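
The encode, drop and concat steps above can also be done in a single call by passing the whole DataFrame to `pd.get_dummies` with the `columns` argument. The sketch below rebuilds the five rows shown in `df.head()` by hand instead of reading `car_dataset.csv`, so it is only an illustration of the call, not a replacement for the cells above.

```python
import pandas as pd

# The five rows shown by df.head() earlier, typed in by hand for illustration
df = pd.DataFrame({
    "brand": ["Tesla", "Toyota", "Tesla", "Hyundai", "Toyota"],
    "km_driven": [138767, 129375, 184262, 141330, 44504],
    "selling_price": [1796176, 2186013, 4934096, 2956138, 3660011],
})

# Encode "brand" in place: the original column is removed and the
# N-1 dummy columns are appended with a "brand_" prefix
encoded = pd.get_dummies(df, columns=["brand"], drop_first=True, dtype=int)

print(encoded.columns.tolist())
print(encoded.head())
```

Note two differences from the manual approach: the new columns are named `brand_Tesla` and `brand_Toyota`, and they are placed after the untouched columns rather than before them.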
