# **Label Encoding**
**Datasets often contain columns with string values, and most ML models can't take string data as input. That's why we need to convert string data into numeric data first.
Categorical variables come in two types:** 
## Nominal variables
These variables have no inherent order, e.g. Male | Female or Car | Bus | Scooter. There is no ranking among their categories. We can't simply give Male the label 1 and Female the label 0, because the model may then read the labels as an order, which can hurt accuracy.

![](https://d1m75rqqgidzqn.cloudfront.net/wp-data/2020/08/11155757/image-37.png)
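
To make the problem concrete, here is a minimal sketch with a toy vehicle column. The integer labels in `codes` are arbitrary and hypothetical. The point is only that once the categories are numbers, arithmetic on them produces relations that mean nothing.

```python
import pandas as pd

# Toy nominal column; the integer labels below are arbitrary (hypothetical)
vehicles = pd.Series(["Car", "Bus", "Scooter", "Bus"])
codes = {"Car": 0, "Bus": 1, "Scooter": 2}
encoded = vehicles.map(codes)

# A model that treats these as numbers sees Scooter (2) > Bus (1) > Car (0),
# and the average of Car and Scooter lands exactly on Bus.
mean_of_car_and_scooter = (codes["Car"] + codes["Scooter"]) / 2

print(encoded.tolist())
print(mean_of_car_and_scooter == codes["Bus"])
```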

## Ordinal variables
Ordinal variables, on the other hand, follow an order, e.g. Good | Better | Best or Low | Medium | High. They can be assigned labels that preserve this order, such as:
Good = 0
Better = 1
Best = 2

![](https://lh3.googleusercontent.com/kfmOfJOQERCTyAvaDRgMfA4GYUhcP9VQnO5q2MeCIqBANJhoiMHHf_XdDk-fMtIC9iqqFEuNLeKESykvCsDxhkUmmBHmNLvEkZaO4tAMKKx7A37zK96pGpusdk95lOOchxmYkVa99FiOwCdB7w)
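
One possible way to attach such ordered labels is an ordered pandas `Categorical`. This is a sketch on a toy `ratings` column (a hypothetical name), with the order spelled out explicitly rather than inferred.

```python
import pandas as pd

# Toy ordinal column (hypothetical data)
ratings = pd.Series(["Good", "Best", "Better", "Good"])

# State the order explicitly, lowest to highest
order = ["Good", "Better", "Best"]
cat = pd.Categorical(ratings, categories=order, ordered=True)

# .codes gives the position of each value in `order`
codes = pd.Series(cat.codes, index=ratings.index)
print(codes.tolist())
```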

In [None]:
import pandas as pd
df = pd.read_csv('../input/diamonds/diamonds.csv')
df.head()

In [None]:
df['cut'].unique()

## Ordinal Encoding

In [None]:
# Labels follow the cut quality order, from lowest (Fair) to highest (Ideal)
cut_labels = {'Fair':0, 
              'Good':1, 
              'Very Good':2,
              'Premium':3,
              'Ideal':4}

In [None]:
# Replace each cut name with its integer label from cut_labels
df['cut'] = df['cut'].map(cut_labels)
df
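
One thing to watch with `.map` and a dictionary: any category missing from the dictionary silently becomes `NaN`. Below is a small sketch of a guard against that. `map_ordinal` is a hypothetical helper, the data is a toy series, and the ranking assumes the quality order Fair < Good < Very Good < Premium < Ideal.

```python
import pandas as pd

def map_ordinal(series, mapping):
    """Map categories to integers, failing loudly on unmapped values."""
    unmapped = set(series.dropna().unique()) - set(mapping)
    if unmapped:
        raise ValueError(f"No label defined for: {sorted(unmapped)}")
    return series.map(mapping)

# Toy data; assumed quality order from lowest to highest
toy_cut = pd.Series(["Ideal", "Fair", "Very Good", "Good", "Premium"])
quality_rank = {"Fair": 0, "Good": 1, "Very Good": 2, "Premium": 3, "Ideal": 4}

encoded_cut = map_ordinal(toy_cut, quality_rank)
print(encoded_cut.tolist())
```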

## Label Encoding

In [None]:
df['color'].unique()

In [None]:
df['clarity'].unique()

In [None]:
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()

In [None]:
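# fit_transform refits the encoder on every call, so after these two lines
# `le` only remembers the classes of 'clarity'
df['color'] = le.fit_transform(df['color'])
df['clarity'] = le.fit_transform(df['clarity'])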

df
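
To see which integer ends up on which category, it helps to inspect the fitted encoder. A minimal sketch on a toy list of color grades: `LabelEncoder` sorts the classes and numbers them in that sorted order, and `inverse_transform` recovers the original strings.

```python
from sklearn.preprocessing import LabelEncoder

# Toy color grades (hypothetical sample, not read from the CSV)
colors = ["E", "I", "J", "E", "D"]

color_le = LabelEncoder()
codes = color_le.fit_transform(colors)

# classes_ holds the sorted categories; a label is an index into it
print(list(color_le.classes_))
print(codes.tolist())

# Round trip back to the original strings
restored = list(color_le.inverse_transform(codes))
print(restored)
```

Keeping one encoder per column, as `color_le` does here, means each mapping stays available for decoding later.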

# **One Hot🔥 Encoding**
**One-hot encoding makes the representation of categorical data more expressive. Many machine learning algorithms cannot work with categorical data directly, so the categories must be converted into numbers. This applies to both input and output variables that are categorical.**

**For each unique category, a separate column is generated. Each row gets a 1 in the column of the category it had in the original column and a 0 in all the others.**

![](http://miro.medium.com/max/1400/1*ggtP4a5YaRx6l09KQaYOnw.png)
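
Here is the idea on a toy column before applying it to the dataset. This is a sketch with made-up values. `dtype=int` is passed so the result shows 1 and 0 instead of booleans.

```python
import pandas as pd

# Toy column with three distinct categories (hypothetical data)
toy = pd.DataFrame({"color": ["E", "I", "E", "J"]})

# One new column per category, filled with 1 where the row had that category
dummies = pd.get_dummies(toy["color"], dtype=int)
print(dummies)

# Every row has exactly one 1
row_sums = dummies.sum(axis=1)
```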

In [None]:
df = pd.read_csv('../input/diamonds/diamonds.csv')
df.head()

In [None]:
dummies = pd.get_dummies(df['color'])
dummies

In [None]:
df = pd.concat([df, dummies], axis = 1)
df

In [None]:
df.drop(['color'], axis = 1, inplace = True)

In [None]:
df
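
The `get_dummies`, `concat` and `drop` steps above can also be done in a single call by passing the whole DataFrame together with `columns=`. The sketch below uses a toy frame with made-up values. The `prefix` argument is optional and only makes the new column names easier to read.

```python
import pandas as pd

# Toy frame standing in for the diamonds data (hypothetical values)
toy = pd.DataFrame({"carat": [0.23, 0.21, 0.29],
                    "color": ["E", "I", "E"]})

# Encodes 'color', appends the new columns and drops the original in one step
encoded = pd.get_dummies(toy, columns=["color"], prefix="color", dtype=int)
print(encoded)
```

Passing `drop_first=True` as well would drop one of the generated columns, which some linear models prefer because the remaining columns already determine it.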