In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

In [None]:
df = sns.load_dataset(name="tips")

In [None]:
df.head()

### One-hot encoding

<font size=4>
One-hot encoding is a technique that converts a categorical variable into a set of numerical columns that machine learning models can work with.

<font size=4>
Most machine learning models expect numerical input. One-hot encoding meets this requirement by creating one new binary column for each unique category of the original categorical variable.
<br>These binary columns are called dummy variables.
<br>In each row, exactly one dummy variable has the value 1, marking the category that the row belongs to, and all the others are 0.
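
As a minimal sketch of the idea, the cell below one-hot encodes a small hand-made column. The `toy` DataFrame and its `colour` column are made up for illustration and are not part of the tips dataset.

```python
import pandas as pd

# Hypothetical toy data, not taken from the tips dataset
toy = pd.DataFrame({"colour": ["red", "green", "blue", "green"]})

# One binary column per unique category; dtype=int gives 0/1 values
toy_dummies = pd.get_dummies(toy, columns=["colour"], dtype=int)
print(toy_dummies)

# Across the dummy columns, every row contains exactly one 1
row_sums = toy_dummies.sum(axis=1)
print(row_sums.tolist())
```

The new columns are named `<original column>_<category>`, which is the same naming that appears later when the tips columns are encoded.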

<font size = 4>
    
**When to use One-hot encoding?**<br>
One-hot encoding is suited to nominal categorical variables (categorical variables with no inherent order or rank among the categories).

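**Disadvantages of One-hot encoding**<br>
High dimensionality: a categorical variable with many categories produces a large number of columns, which can hurt a machine learning model's performance (curse of dimensionality).<br>
Dummy variable trap: the n dummy variables created from one categorical variable always sum to 1 in every row, so they are perfectly collinear. This is a problem for models such as linear regression with an intercept.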

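**Note**<br>
After creating dummy variables for the n unique categories of a categorical variable, drop one of them, leaving (n-1) dummy variables. No information is lost: the dropped category corresponds to the rows in which all the remaining dummy variables are 0.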
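
A short sketch of this note, using the `drop_first=True` option of `pd.get_dummies`, which drops the dummy column of the first category. The `toy` data and the `shirt` column are made up for illustration.

```python
import pandas as pd

# Hypothetical toy data with n = 3 unique categories
toy = pd.DataFrame({"shirt": ["small", "large", "medium", "small"]})

# All n dummy columns
full = pd.get_dummies(toy, columns=["shirt"], dtype=int)

# n - 1 dummy columns: the first category ("large") is dropped
reduced = pd.get_dummies(toy, columns=["shirt"], dtype=int, drop_first=True)

print(list(full.columns))
print(list(reduced.columns))

# Rows where every remaining dummy is 0 belong to the dropped category
dropped_rows = reduced.index[reduced.sum(axis=1) == 0].tolist()
print(dropped_rows)
```

Dropping a column by name with `DataFrame.drop` gives the same kind of result and lets you choose which category acts as the baseline.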

In [None]:
df.dtypes

We have 4 categorical variables: 'sex', 'smoker', 'day' and 'time'.
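
Instead of reading the categorical columns off the `dtypes` output, they can be listed programmatically with `select_dtypes`. The sketch below uses a made-up frame whose column names mimic the tips dataset; the values are invented.

```python
import pandas as pd

# Hypothetical frame: one numeric column and two 'category' columns
toy = pd.DataFrame({
    "total_bill": [10.5, 20.0, 15.25],
    "sex": pd.Categorical(["Male", "Female", "Male"]),
    "day": pd.Categorical(["Sun", "Sat", "Sun"]),
})

# Keep only the columns whose dtype is 'category'
categorical_cols = toy.select_dtypes(include="category").columns.tolist()
print(categorical_cols)

# Number of unique categories per column, a quick check before one-hot encoding
n_categories = {col: toy[col].nunique() for col in categorical_cols}
print(n_categories)
```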

In [None]:
df.sex.unique() # sex is nominal

In [None]:
df.smoker.unique() # smoker is nominal

In [None]:
df.day.unique() # day is ordinal

In [None]:
df.time.unique() # time is treated as ordinal here (Lunch comes before Dinner)

We can apply one-hot encoding to 'sex' and 'smoker', as they are nominal categorical variables.

In [None]:
df_dummy = pd.get_dummies(df, columns=['sex', 'smoker']) # pass dtype=int to get 0/1 integer columns
df_dummy.head()

In [None]:
df_dummy.dtypes 

To avoid the dummy variable trap, let us drop one of the dummy columns for each categorical variable.

In [None]:
df_dummy.drop(columns=["sex_Female","smoker_No"],inplace=True)
df_dummy.head()

In [None]:
df = df_dummy
df.head()

### Label Encoding

- Label encoding assigns an integer to each unique category of the categorical variable.
- Unlike one-hot encoding, it does not introduce new columns.
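
A minimal pandas-only sketch of assigning numbers to categories. It contrasts alphabetical numbering, which is what an encoder that sorts its classes (such as scikit-learn's `LabelEncoder`) produces, with an explicitly chosen order. The `days` series and the `day_order` list are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical values resembling the 'day' column
days = pd.Series(["Thur", "Fri", "Sat", "Sun", "Fri"])

# Alphabetical numbering: Fri=0, Sat=1, Sun=2, Thur=3
alphabetical_codes = days.astype("category").cat.codes.tolist()

# Explicit order (assumed here): Thur=0, Fri=1, Sat=2, Sun=3
day_order = ["Thur", "Fri", "Sat", "Sun"]
ordered_codes = pd.Categorical(days, categories=day_order, ordered=True).codes.tolist()

print(alphabetical_codes)
print(ordered_codes)
```

For an ordinal variable, the alphabetical numbers do not follow the order of the categories, so an explicit order may be the safer choice when the model is expected to use that order.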

In [None]:
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()

In [None]:
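# LabelEncoder numbers the classes in sorted order, so for 'day': Fri=0, Sat=1, Sun=2, Thur=3
df['day'] = le.fit_transform(df['day'])
# Refitting the same encoder overwrites le.classes_, which now describes 'time' only
df['time'] = le.fit_transform(df['time'])
df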

In [None]:
df.day.unique()

In [None]:
df.time.unique()

In [None]:
df.dtypes

### Task

<font size=3> Perform categorical encoding on the Titanic dataset from the seaborn library.

In [None]:
sns.load_dataset("titanic")