# Feature Encoding

## 1. One-Hot Encoding

One-hot encoding is a technique used to represent categorical variables as binary vectors. Each category is represented as a binary vector with a length equal to the number of unique categories in the original variable. The vector has a 1 in the position corresponding to the category and 0s elsewhere.
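
To make the definition concrete, here is a minimal sketch that builds the binary vectors by hand with NumPy. The variable names (`colors`, `categories`, `index`, `one_hot`) and the choice to sort the categories alphabetically are assumptions made for illustration. In practice a library function would normally do this work.

```python
import numpy as np

# Hypothetical sample values, used only for illustration
colors = ['Red', 'Green', 'Blue', 'Red']

# Sorted unique categories; each one gets a fixed position in the vector
categories = sorted(set(colors))          # ['Blue', 'Green', 'Red']
index = {cat: i for i, cat in enumerate(categories)}

# One row per sample, one column per category, all zeros to start
one_hot = np.zeros((len(colors), len(categories)), dtype=int)

# Put a 1 in the column that corresponds to each sample's category
for row, color in enumerate(colors):
    one_hot[row, index[color]] = 1

print(categories)
print(one_hot)
```

Each row contains exactly one 1, and the row length equals the number of unique categories.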

In [2]:
import pandas as pd
# Sample data
data = {'Color': ['Red', 'Green', 'Blue', 'Red']}
df = pd.DataFrame(data)
print(df)


   Color
0    Red
1  Green
2   Blue
3    Red


In [3]:
# One-Hot Encoding
# 'get_dummies' is a function specifically designed to convert categorical variable(s) into dummy/indicator variables.
encoded_data = pd.get_dummies(df, columns=['Color'])
print(encoded_data)

   Color_Blue  Color_Green  Color_Red
0       False        False       True
1       False         True      False
2        True        False      False
3       False        False       True
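
The output above uses boolean columns. As a hedged sketch, `get_dummies` also accepts `dtype` and `drop_first` parameters: `dtype=int` yields 0/1 integers, and `drop_first=True` drops the first category's column, which is a common way to avoid redundant columns in linear models. The names `encoded_int` and `encoded_reduced` are illustrative only.

```python
import pandas as pd

# Same sample data as in the cell above
df = pd.DataFrame({'Color': ['Red', 'Green', 'Blue', 'Red']})

# Integer 0/1 indicators instead of booleans
encoded_int = pd.get_dummies(df, columns=['Color'], dtype=int)
print(encoded_int)

# Drop the first category (Blue): a row of all zeros now means 'Blue'
encoded_reduced = pd.get_dummies(df, columns=['Color'], drop_first=True, dtype=int)
print(encoded_reduced)
```

Whether to drop a column depends on the model. Tree-based models usually do not need it, while unregularized linear models often do.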


## 2. Label Encoding

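In machine learning, scikit-learn's LabelEncoder is commonly used to convert categorical labels into numerical labels. It assigns a unique integer to each category, from 0 to n_classes - 1 in sorted order of the category values, so that algorithms requiring numeric input can process the data. The integers carry no meaningful order, and the scikit-learn documentation describes LabelEncoder as intended for target labels (y) rather than input features.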


In [9]:
# Import the LabelEncoder class from the sklearn.preprocessing module
from sklearn.preprocessing import LabelEncoder

# Sample data
data = {'Animal': ['Dog', 'Cat', 'Bird', 'Dog', 'Bird']}
df = pd.DataFrame(data)
print(df)

# Label Encoding
# Using the fit_transform method of LabelEncoder to encode the 'Animal' column into numerical labels
# and assigning the encoded labels to a new column 'Animal_encoded' in the DataFrame df
label_encoder = LabelEncoder()
df['Animal_encoded'] = label_encoder.fit_transform(df['Animal'])
print(df)


  Animal
0    Dog
1    Cat
2   Bird
3    Dog
4   Bird
  Animal  Animal_encoded
0    Dog               2
1    Cat               1
2   Bird               0
3    Dog               2
4   Bird               0
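
The integer codes in the output follow the sorted order of the category names (Bird, Cat, Dog). A small sketch of how to inspect that mapping and reverse it, using the fitted encoder's `classes_` attribute and `inverse_transform` method. The names `animals`, `codes`, and `decoded` are illustrative only.

```python
from sklearn.preprocessing import LabelEncoder

# Same sample values as in the cell above
animals = ['Dog', 'Cat', 'Bird', 'Dog', 'Bird']

label_encoder = LabelEncoder()
codes = label_encoder.fit_transform(animals)

# classes_ lists the categories in sorted order; the position is the integer code
print(label_encoder.classes_)
print(codes)

# inverse_transform maps the integer codes back to the original labels
decoded = label_encoder.inverse_transform(codes)
print(decoded)
```

Keeping the fitted encoder around lets predictions expressed as integers be translated back into readable labels.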


## 3. Ordinal Encoding

Ordinal encoding is a method used to convert categorical variables into numerical values while preserving the order or hierarchy among the categories. It assigns integers to the categories based on their order, for example Small < Medium < Large.

In [10]:
from sklearn.preprocessing import OrdinalEncoder
# Sample data
data = {'Size': ['Small', 'Medium', 'Large', 'Medium']}
df = pd.DataFrame(data)
print(df)

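# Ordinal Encoding
# 'categories' fixes the order explicitly: Small -> 0, Medium -> 1, Large -> 2
# fit_transform expects 2D input (hence df[['Size']]) and returns floats by default
ordinal_encoder = OrdinalEncoder(categories=[['Small', 'Medium', 'Large']])
df['Size_encoded'] = ordinal_encoder.fit_transform(df[['Size']])
print(df)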


     Size
0   Small
1  Medium
2   Large
3  Medium
     Size  Size_encoded
0   Small           0.0
1  Medium           1.0
2   Large           2.0
3  Medium           1.0
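
The explicit `categories` argument matters because, when it is omitted, OrdinalEncoder determines the categories automatically and sorts them, which for strings means alphabetical order. The sketch below contrasts the two behaviours. The names `default_encoder`, `explicit_encoder`, `default_codes`, and `explicit_codes` are illustrative only.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Same sample data as in the cell above
df = pd.DataFrame({'Size': ['Small', 'Medium', 'Large', 'Medium']})

# Default: categories are sorted alphabetically (Large, Medium, Small),
# so the codes do not reflect the real size order
default_encoder = OrdinalEncoder()
default_codes = default_encoder.fit_transform(df[['Size']]).ravel()
print(default_encoder.categories_[0], default_codes)

# Explicit order: Small < Medium < Large
explicit_encoder = OrdinalEncoder(categories=[['Small', 'Medium', 'Large']])
explicit_codes = explicit_encoder.fit_transform(df[['Size']]).ravel()
print(explicit_encoder.categories_[0], explicit_codes)
```

With the default ordering, 'Small' receives the largest code, so passing the intended order explicitly is the safer choice whenever the ranking matters.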
