# Contents
1. [What is Encoding?](#What-is-Encoding?)
2. [Coding Practice for Encoding](#Coding-Practice-for-Encoding)

# What is Encoding?

Most machine learning models accept only numeric input. For this reason, categorical features must be converted into numeric values before they can be used for training.

#### Label Encoding
- It maps each category to a unique integer.
- It is a simple and fast method.
- However, the integers imply an order between categories (for example, 0 < 1 < 2) that may not actually exist, and a model can learn from that false order.

#### Ordinal Encoding
- It is used when the categories have a natural order.
- Categories are encoded as integers that follow this order.
- For example, the categories "low", "medium", and "high" can be encoded as 1, 2, and 3, respectively.

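#### One-Hot Encoding
- It converts a categorical feature into one binary column per category.
- Each column indicates whether an observation belongs to that category (1) or not (0).
- It does not assume any order between categories, which makes it suitable for nominal features. Because it adds one column per category, it works best when the number of categories is small to moderate; with many categories, the number of columns grows quickly.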




## Examples

**Label Encoding**

Label encoding assigns each unique category in a feature an integer value. This method is simple and efficient, but it can introduce a misleading ordinal relationship.

**Example:**

Let's say we have a categorical feature representing fruit types:

In [74]:
['apple', 'banana', 'cherry', 'apple', 'banana', 'cherry']


In [None]:
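# after label encoding: each unique category is mapped to an integer
fruits = ['apple', 'banana', 'cherry', 'apple', 'banana', 'cherry']
fruit_codes = {'apple': 0, 'banana': 1, 'cherry': 2}
[fruit_codes[f] for f in fruits]  # -> [0, 1, 2, 0, 1, 2]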
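The same mapping can be produced with scikit-learn's `LabelEncoder` (assuming scikit-learn is installed). This is a minimal sketch: it shows that the integers are assigned in sorted (alphabetical) order of the category names, and that `inverse_transform` decodes them again. Note that scikit-learn documents `LabelEncoder` as intended for target labels (`y`); for input features, `OrdinalEncoder` is the feature-side counterpart.

```python
from sklearn.preprocessing import LabelEncoder

fruits = ['apple', 'banana', 'cherry', 'apple', 'banana', 'cherry']

encoder = LabelEncoder()
# Integers are assigned in sorted order of the unique categories
codes = encoder.fit_transform(fruits)

print(encoder.classes_)                   # ['apple' 'banana' 'cherry']
print(codes)                              # [0 1 2 0 1 2]
print(encoder.inverse_transform([2, 0]))  # ['cherry' 'apple']
```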

**Ordinal Encoding**

Ordinal encoding is used when the categories have a clear order or ranking. Each category is assigned a numerical value based on this order.

**Example:**

Consider a feature representing education levels:

In [None]:
['High School', 'Bachelor', 'Master', 'PhD']

In [None]:
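# after ordinal encoding: integers follow the order of the education levels
education_order = {'High School': 0, 'Bachelor': 1, 'Master': 2, 'PhD': 3}
[education_order[level] for level in ['High School', 'Bachelor', 'Master', 'PhD']]  # -> [0, 1, 2, 3]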
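With scikit-learn's `OrdinalEncoder`, the order should be passed explicitly through the `categories` parameter. By default the categories are sorted alphabetically, which would place 'Bachelor' before 'High School' and break the intended ranking. A minimal sketch, assuming scikit-learn and NumPy are installed:

```python
import numpy as np
from sklearn.preprocessing import OrdinalEncoder

# OrdinalEncoder expects 2-D input: one row per sample, one column per feature
education = np.array([['High School'], ['Bachelor'], ['Master'], ['PhD']])

# Explicit order: one list of categories per feature column
encoder = OrdinalEncoder(categories=[['High School', 'Bachelor', 'Master', 'PhD']])
encoded = encoder.fit_transform(education)
print(encoded.ravel())  # [0. 1. 2. 3.]

# Default behaviour sorts categories alphabetically, which scrambles the ranking
default_encoded = OrdinalEncoder().fit_transform(education)
print(default_encoded.ravel())  # [1. 0. 2. 3.]
```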


**One-Hot Encoding**

One-hot encoding creates a new binary column for each category in the feature. Each observation has a 1 in the column for its category and 0 in all other columns.

**Example:**

Using the same fruit example:

In [None]:
['apple', 'banana', 'cherry', 'apple', 'banana', 'cherry']

In [None]:
# after one-hot encoding: one binary column per category
#
# apple  banana  cherry
#   1       0       0
#   0       1       0
#   0       0       1
#   1       0       0
#   0       1       0
#   0       0       1
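A minimal pandas sketch that produces the table above with `pd.get_dummies`. Passing `dtype=int` makes the columns hold 0/1 instead of `True`/`False`; the column names come from the sorted category values.

```python
import pandas as pd

fruits = pd.Series(['apple', 'banana', 'cherry', 'apple', 'banana', 'cherry'])

# One binary column per category; dtype=int gives 0/1 rather than booleans
one_hot = pd.get_dummies(fruits, dtype=int)
print(one_hot)
```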

# Coding Practice for Encoding

In [76]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


In [77]:
data = pd.read_csv("/Users/data/Desktop/encoded_dataset1.csv")
data

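|     | Fruit  | Color  | Size    | Shape     |
|-----|--------|--------|---------|-----------|
| 0   | Cherry | Green  | Large   | Round     |
| 1   | Date   | Yellow | X-Large | Round     |
| 2   | Apple  | Yellow | Large   | Elongated |
| 3   | Cherry | Blue   | Small   | Oval      |
| 4   | Cherry | Yellow | X-Large | Oval      |
| ... | ...    | ...    | ...     | ...       |
| 95  | Banana | Yellow | X-Large | Square    |
| 96  | Banana | Yellow | Medium  | Square    |
| 97  | Date   | Blue   | Large   | Oval      |
| 98  | Banana | Red    | Small   | Square    |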


In [78]:
#For fruit feature: One hot encoding
#For color feature: Label encoding
#For size feature: Ordinal encoding
#For shape feature: One hot encoding

In [79]:
#one hot encoding

fruit = pd.get_dummies(data['Fruit'], prefix='Fruit')
data = pd.concat([data, fruit], axis=1)


data = data.drop('Fruit', axis=1)


print(data.head())

    Color     Size      Shape  Fruit_Apple  Fruit_Banana  Fruit_Cherry  \
0   Green    Large      Round        False         False          True   
1  Yellow  X-Large      Round        False         False         False   
2  Yellow    Large  Elongated         True         False         False   
3    Blue    Small       Oval        False         False          True   
4  Yellow  X-Large       Oval        False         False          True   

   Fruit_Date  
0       False  
1        True  
2       False  
3       False  
4       False  
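The dummy columns print as `True`/`False` because pandas 2.0 and later return boolean dummies by default. As an alternative to the `concat` + `drop` steps, `get_dummies` can also encode a column inside the DataFrame via `columns=`, which drops the original column in the same call. A sketch on a hypothetical four-row stand-in (not the notebook's CSV):

```python
import pandas as pd

# Hypothetical stand-in for the first rows of the notebook's DataFrame
sample = pd.DataFrame({
    'Fruit': ['Cherry', 'Date', 'Apple', 'Cherry'],
    'Color': ['Green', 'Yellow', 'Yellow', 'Blue'],
})

# columns= encodes 'Fruit', drops it, and appends Fruit_<category> columns;
# dtype=int yields 0/1 instead of True/False
encoded = pd.get_dummies(sample, columns=['Fruit'], dtype=int)
print(encoded)
```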


In [80]:
# label encoding
from sklearn.preprocessing import LabelEncoder

label_encoder = LabelEncoder()


data['Color_Label'] = label_encoder.fit_transform(data['Color'])


print(data.head())

    Color     Size      Shape  Fruit_Apple  Fruit_Banana  Fruit_Cherry  \
0   Green    Large      Round        False         False          True   
1  Yellow  X-Large      Round        False         False         False   
2  Yellow    Large  Elongated         True         False         False   
3    Blue    Small       Oval        False         False          True   
4  Yellow  X-Large       Oval        False         False          True   

   Fruit_Date  Color_Label  
0       False            1  
1        True            3  
2       False            3  
3       False            0  
4       False            3  


In [81]:
print(label_encoder.classes_)
print(label_encoder.transform(label_encoder.classes_))


['Blue' 'Green' 'Red' 'Yellow']
[0 1 2 3]
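The fitted encoder can also turn codes back into color names with `inverse_transform`, and `classes_` can be turned into a plain lookup table. A minimal sketch on a hypothetical stand-in list of colors (the notebook itself fits on `data['Color']`):

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical sample of Color values
colors = ['Green', 'Yellow', 'Yellow', 'Blue', 'Red']

label_encoder = LabelEncoder()
codes = label_encoder.fit_transform(colors)

# Category -> code lookup built from the sorted classes_
mapping = {str(c): i for i, c in enumerate(label_encoder.classes_)}
print(mapping)  # {'Blue': 0, 'Green': 1, 'Red': 2, 'Yellow': 3}

# Decode integer codes back to the original color names
print(label_encoder.inverse_transform([0, 3]))  # ['Blue' 'Yellow']
```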


In [82]:
# Ordinal Encoding

# creating size mapping

size_mapping = {'Small': 1, 'Medium': 2, 'Large': 3, 'X-Large': 4}

In [83]:
data['Size_Ordinal'] = data['Size'].map(size_mapping)
print(data.head())
#drop size column

data = data.drop(['Size'],axis=1)
data.head()

    Color     Size      Shape  Fruit_Apple  Fruit_Banana  Fruit_Cherry  \
0   Green    Large      Round        False         False          True   
1  Yellow  X-Large      Round        False         False         False   
2  Yellow    Large  Elongated         True         False         False   
3    Blue    Small       Oval        False         False          True   
4  Yellow  X-Large       Oval        False         False          True   

   Fruit_Date  Color_Label  Size_Ordinal  
0       False            1             3  
1        True            3             4  
2       False            3             3  
3       False            0             1  
4       False            3             4  


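|   | Color  | Shape     | Fruit_Apple | Fruit_Banana | Fruit_Cherry | Fruit_Date | Color_Label | Size_Ordinal |
|---|--------|-----------|-------------|--------------|--------------|------------|-------------|--------------|
| 0 | Green  | Round     | False       | False        | True         | False      | 1           | 3            |
| 1 | Yellow | Round     | False       | False        | False        | True       | 3           | 4            |
| 2 | Yellow | Elongated | True        | False        | False        | False      | 3           | 3            |
| 3 | Blue   | Oval      | False       | False        | True         | False      | 0           | 1            |
| 4 | Yellow | Oval      | False       | False        | True         | False      | 3           | 4            |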
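`Series.map` with a dictionary returns `NaN` for any value that is not a key, so a typo or an unexpected category silently becomes a missing value (and the column becomes float). A quick check, sketched on a hypothetical sample that contains a lowercase 'small':

```python
import pandas as pd

size_mapping = {'Small': 1, 'Medium': 2, 'Large': 3, 'X-Large': 4}

# Hypothetical sample with one value that is not in the mapping
sizes = pd.Series(['Large', 'X-Large', 'small', 'Medium'])
size_ordinal = sizes.map(size_mapping)

# Values that did not match any key in size_mapping
unmapped = sizes[size_ordinal.isna()].unique()
print(unmapped)  # ['small']
```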


In [84]:
#one hot encoding for shape

shape = pd.get_dummies(data['Shape'], prefix='Shape')
data = pd.concat([data, shape], axis=1)


data = data.drop('Shape', axis=1)


print(data.head())

    Color  Fruit_Apple  Fruit_Banana  Fruit_Cherry  Fruit_Date  Color_Label  \
0   Green        False         False          True       False            1   
1  Yellow        False         False         False        True            3   
2  Yellow         True         False         False       False            3   
3    Blue        False         False          True       False            0   
4  Yellow        False         False          True       False            3   

   Size_Ordinal  Shape_Elongated  Shape_Oval  Shape_Round  Shape_Square  
0             3            False       False         True         False  
1             4            False       False         True         False  
2             3             True       False        False         False  
3             1            False        True        False         False  
4             4            False        True        False         False  


In [85]:
data.dtypes

Color              object
Fruit_Apple          bool
Fruit_Banana         bool
Fruit_Cherry         bool
Fruit_Date           bool
Color_Label         int64
Size_Ordinal        int64
Shape_Elongated      bool
Shape_Oval           bool
Shape_Round          bool
Shape_Square         bool
dtype: object

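All features have now been encoded. Note that the original `Color` column (dtype `object`) is still in the DataFrame next to `Color_Label`, so it should be dropped before the data is passed to a model.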
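The dtypes listing above still shows `Color` as `object`. A minimal sketch of the final cleanup, on a hypothetical two-row stand-in shaped like the final DataFrame: drop the text column and confirm that no `object` columns remain.

```python
import pandas as pd

# Hypothetical stand-in with the same kinds of columns as the final DataFrame
data = pd.DataFrame({
    'Color': ['Green', 'Yellow'],
    'Fruit_Cherry': [True, False],
    'Color_Label': [1, 3],
    'Size_Ordinal': [3, 4],
})

# Drop the original text column; Color_Label already holds its encoding
data = data.drop(columns=['Color'])

# Columns still stored with object dtype would still need encoding
remaining = data.select_dtypes(include='object').columns.tolist()
print(remaining)  # []
```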