## Data Encoding

1. Nominal/OHE Encoding
2. Label and Ordinal Encoding
3. Target Guided Ordinal Encoding 

### Nominal/OHE Encoding
One-hot encoding (OHE), also known as nominal encoding, represents categorical data as numerical data, which most machine learning algorithms require. Each category is represented as a binary vector with one position per unique category: the position for that category is set to 1 and all others are 0. For example, a categorical variable "color" with three possible values (red, green, blue) can be one-hot encoded as follows:

1. Red: [1, 0, 0]
2. Green: [0, 1, 0]
3. Blue: [0, 0, 1]
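
The mapping above can be reproduced by hand. The following is a minimal sketch in plain Python, assuming the category order red, green, blue from the list; `one_hot` is a hypothetical helper written for illustration, not a library function.

```python
# Fixed category order, matching the list above (an assumption for illustration).
categories = ['red', 'green', 'blue']

def one_hot(value, categories):
    """Return a binary list with a 1 at the position of `value` and 0 elsewhere."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1 if value == c else 0 for c in categories]

# Build the vector for every category.
vectors = {c: one_hot(c, categories) for c in categories}
print(vectors)
```

scikit-learn's `OneHotEncoder`, used in the cells that follow, orders its columns by sorted category (blue, green, red), so the column positions differ from this hand-written order even though the idea is the same.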

In [1]:
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

In [2]:
## Create a simple dataframe 
df = pd.DataFrame({
    'color': ['red', 'blue', 'green', 'green', 'red', 'blue']
})

In [None]:
df.head()

In [8]:
## create an instance of OneHotEncoder
encoder=OneHotEncoder()

In [7]:
## fit and transform; the result is sparse by default, so convert it to a dense array
encoded=encoder.fit_transform(df[['color']]).toarray()

In [5]:
## wrap the encoded array in a dataframe, one column per category
encoder_df=pd.DataFrame(encoded,columns=encoder.get_feature_names_out())

In [None]:
encoder_df

In [None]:
## for new data, pass a dataframe with the same column name used during fit
encoder.transform(pd.DataFrame({'color': ['blue']})).toarray()
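
New data may contain a category the encoder never saw during fitting, in which case `transform` raises an error by default. A minimal sketch of one way to handle this, assuming that an all-zero row is an acceptable encoding for unknown values; the `train` and `new` dataframes and the value 'purple' are made up for illustration.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

train = pd.DataFrame({'color': ['red', 'blue', 'green']})
new = pd.DataFrame({'color': ['blue', 'purple']})  # 'purple' was not seen during fit

# handle_unknown='ignore' encodes unseen categories as a row of zeros
# instead of raising an error.
tolerant_encoder = OneHotEncoder(handle_unknown='ignore')
tolerant_encoder.fit(train[['color']])

rows = tolerant_encoder.transform(new[['color']]).toarray()
print(rows)
```

The columns are ordered blue, green, red, so 'blue' becomes `[1, 0, 0]` and the unseen 'purple' becomes `[0, 0, 0]`.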

In [None]:
pd.concat([df,encoder_df],axis=1)

In [None]:
# Assignment
import seaborn as sns
sns.load_dataset('tips')

### Label Encoding 
Label encoding and ordinal encoding are two techniques used to encode categorical data as numerical data.

Label encoding assigns a unique integer to each category in the variable. The integers are usually assigned in alphabetical order or by category frequency; scikit-learn's `LabelEncoder`, used below, sorts the categories and numbers them from 0. For example, a categorical variable "color" with three possible values (red, green, blue) is label encoded as follows:

1. Blue: 0
2. Green: 1
3. Red: 2

In [None]:
df.head()

In [14]:
from sklearn.preprocessing import LabelEncoder
lbl_encoder=LabelEncoder()

In [None]:
## LabelEncoder expects 1-D input, so pass the column as a Series
lbl_encoder.fit_transform(df['color'])

In [None]:
lbl_encoder.transform(['red'])

In [None]:
lbl_encoder.transform(['blue'])

In [None]:
lbl_encoder.transform(['green'])
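
To see which integer each category received, the fitted encoder can be inspected. A minimal sketch, assuming the same color values as above, rebuilt here so the block is self-contained; `demo_encoder`, `colors`, and `mapping` are names chosen for illustration.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

colors = pd.Series(['red', 'blue', 'green', 'green', 'red', 'blue'], name='color')

demo_encoder = LabelEncoder()
codes = demo_encoder.fit_transform(colors)  # 1-D input, as LabelEncoder expects

# classes_ holds the categories in sorted order; each index is that category's label.
mapping = {category: code for code, category in enumerate(demo_encoder.classes_)}
print(mapping)

# inverse_transform maps the integer labels back to the original categories.
decoded = demo_encoder.inverse_transform(codes)
print(list(decoded))
```

Because the integers follow sorted order rather than any real ranking, the encoding implies blue < green < red, which is arbitrary for a nominal variable like color.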

### Ordinal Encoding
Ordinal encoding is used for categorical data that has an intrinsic order or ranking. Each category is assigned a numerical value based on its position in that order. For example, a categorical variable "education level" with four possible values (high school, college, graduate, post-graduate) can be ordinal encoded as follows:

1. High school: 1
2. College: 2
3. Graduate: 3
4. Post-graduate: 4
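
The ranking above can be sketched with scikit-learn's `OrdinalEncoder` by passing the order explicitly through its `categories` parameter. The `education_df` dataframe and its rows are made up for illustration.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Explicit rank order, lowest to highest (taken from the list above).
education_order = ['high school', 'college', 'graduate', 'post-graduate']

# Hypothetical sample data for illustration.
education_df = pd.DataFrame({
    'education': ['college', 'high school', 'post-graduate', 'graduate']
})

education_encoder = OrdinalEncoder(categories=[education_order])
ranks = education_encoder.fit_transform(education_df[['education']])
print(ranks.ravel())
```

Note that scikit-learn numbers the categories from 0, so high school maps to 0.0 and post-graduate to 3.0, one less than the ranks in the list. Without the `categories` argument the encoder would sort the values alphabetically and lose the intended order.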

In [19]:
## Ordinal Encoding
from sklearn.preprocessing import OrdinalEncoder

In [20]:
# create a sample dataframe with an ordinal variable
df = pd.DataFrame({
    'size': ['small', 'medium', 'large', 'medium', 'small', 'large']
})

In [None]:
df

In [22]:
## create an instance of OrdinalEncoder with the explicit category order, then fit_transform
encoder=OrdinalEncoder(categories=[['small','medium','large']])

In [None]:
encoder.fit_transform(df[['size']])

In [None]:
## pass a dataframe with the same column name used during fit
encoder.transform(pd.DataFrame({'size': ['small']}))