## Nominal / One-Hot Encoding (OHE)

One-hot encoding (OHE) is a common way to encode nominal (unordered) categorical data as numbers, a form that most machine learning algorithms require. Each category is represented as a binary vector with one position per unique category: the position that belongs to the category is 1 and every other position is 0. The order of the positions is arbitrary, but it must stay fixed once chosen.

Example with three categories:

1. Red = [1, 0, 0]
2. Green = [0, 1, 0]
3. Blue = [0, 0, 1]
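The mapping in the example can be sketched in a few lines of plain Python. This is only an illustration of the idea, not how scikit-learn implements it; the helper name `one_hot` and the hand-picked category order are assumptions made for this sketch.

```python
def one_hot(values, categories):
    """Return one binary vector per value, with a single 1 at the category's position."""
    # Hypothetical helper: map each category to its fixed position in the vector
    position = {category: i for i, category in enumerate(categories)}
    vectors = []
    for value in values:
        vector = [0] * len(categories)
        vector[position[value]] = 1
        vectors.append(vector)
    return vectors

# Category order chosen by hand to match the example above
categories = ['Red', 'Green', 'Blue']
vectors = one_hot(['Red', 'Green', 'Blue'], categories)
print(vectors)
```

A value that is not in `categories` raises a `KeyError` here, which is one reason a fitted encoder object is usually preferred in practice.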

In [3]:
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Creating a simple dataframe
df = pd.DataFrame({
    'color' : ['red', 'blue', 'green', 'green', 'red', 'blue']
})

df.head()

| | color |
|---|---|
| 0 | red |
| 1 | blue |
| 2 | green |
| 3 | green |
| 4 | red |


In [4]:
# Creating an instance of OHE

encoder = OneHotEncoder()

In [10]:
# Fit the encoder and transform in one step; the result is a sparse matrix,
# so .toarray() converts it to a dense NumPy array

encoded = encoder.fit_transform(df[['color']]).toarray()
encoded

array([[0., 0., 1.],
       [1., 0., 0.],
       [0., 1., 0.],
       [0., 1., 0.],
       [0., 0., 1.],
       [1., 0., 0.]])
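The first row of the array is `[0., 0., 1.]` for `red`, because the encoder orders its columns by sorted category rather than by order of appearance. A minimal sketch of how to check this, assuming the same toy data as above (the names `colors` and `enc` are introduced only for this example):

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Same toy data as in the notebook, under a different name for this sketch
colors = pd.DataFrame({'color': ['red', 'blue', 'green', 'green', 'red', 'blue']})

enc = OneHotEncoder()
enc.fit(colors[['color']])

# categories_ holds one array per input column; its entries are in output-column order
column_order = list(enc.categories_[0])
print(column_order)
```

With string categories the order is alphabetical, so column 0 is `blue`, column 1 is `green`, and column 2 is `red`.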

In [12]:
encoded_df = pd.DataFrame(encoded, columns = encoder.get_feature_names_out())
encoded_df

| | color_blue | color_green | color_red |
|---|---|---|---|
| 0 | 0.0 | 0.0 | 1.0 |
| 1 | 1.0 | 0.0 | 0.0 |
| 2 | 0.0 | 1.0 | 0.0 |
| 3 | 0.0 | 1.0 | 0.0 |
| 4 | 0.0 | 0.0 | 1.0 |
| 5 | 1.0 | 0.0 | 0.0 |


In [13]:
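# Encode a new observation; passing a DataFrame with the same column name
# used in fit avoids scikit-learn's feature-name warning
encoder.transform(pd.DataFrame({'color': ['blue']})).toarray()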



array([[1., 0., 0.]])
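The new value above was one the encoder had already seen. A category that was absent during `fit` behaves differently: with the default settings `transform` raises a `ValueError`. The sketch below shows one possible way to handle that, using the `handle_unknown='ignore'` option; the value `'purple'` and the variable names are made up for illustration.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

colors = pd.DataFrame({'color': ['red', 'blue', 'green', 'green', 'red', 'blue']})
unseen = pd.DataFrame({'color': ['purple']})  # hypothetical category not present in fit

# Default behaviour: an unknown category is an error
strict = OneHotEncoder()
strict.fit(colors[['color']])
try:
    strict.transform(unseen)
    raised = False
except ValueError:
    raised = True

# With handle_unknown='ignore', an unknown category becomes an all-zero row
lenient = OneHotEncoder(handle_unknown='ignore')
lenient.fit(colors[['color']])
row = lenient.transform(unseen).toarray()
print(raised, row)
```

Whether an all-zero row is acceptable depends on the downstream model, so this is a choice to make deliberately rather than a default to rely on.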

In [14]:
pd.concat([df, encoded_df], axis = 1)

| | color | color_blue | color_green | color_red |
|---|---|---|---|---|
| 0 | red | 0.0 | 0.0 | 1.0 |
| 1 | blue | 1.0 | 0.0 | 0.0 |
| 2 | green | 0.0 | 1.0 | 0.0 |
| 3 | green | 0.0 | 1.0 | 0.0 |
| 4 | red | 0.0 | 0.0 | 1.0 |
| 5 | blue | 1.0 | 0.0 | 0.0 |
