# What is categorical data?
Categorical data represents categories or labels rather than measured quantities, for example a passenger's sex or the town they embarked from. It is often used to group items into discrete classes.
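As a small illustration, the sketch below builds a column of town names with pandas and lists its discrete classes. The sample values are hypothetical and chosen only for the example.

```python
import pandas as pd

# Hypothetical sample values, chosen only for illustration.
towns = pd.Series(
    ["Southampton", "Cherbourg", "Southampton", "Queenstown"],
    dtype="category",
)

# The distinct labels the column can take.
categories = list(towns.cat.categories)
print(categories)

# How many rows fall into each class.
counts = towns.value_counts().to_dict()
print(counts)
```

The values are labels, not quantities: they can be counted and grouped, but arithmetic on them has no meaning.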

# Why care about encoding it?
Because categorical data is non-numeric, it must be converted into a numerical format before some machine learning algorithms can process it and make predictions. Numerical data is also often processed more efficiently by machine learning algorithms than categorical data.

# Technique for encoding categorical data
ONE HOT ENCODING:

One-Hot Encoding is a popular technique for handling categorical data, especially when the categories don't have an inherent order. In this method, we map each category to a vector of 0s and 1s, where a 1 marks the presence of that category and a 0 marks its absence.
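To make the mapping concrete, here is a minimal sketch of one-hot encoding in plain Python. The helper `one_hot` and the sample values are hypothetical and written only for illustration; this is not how any particular library implements it.

```python
def one_hot(values):
    """Map each value to a 0/1 vector with a single 1 at its category's position."""
    # Sort the distinct categories so the column order is deterministic.
    categories = sorted(set(values))
    index = {category: i for i, category in enumerate(categories)}

    vectors = []
    for value in values:
        vector = [0] * len(categories)  # start with "absent" everywhere
        vector[index[value]] = 1        # mark the one category that is present
        vectors.append(vector)
    return categories, vectors


# Hypothetical sample values, chosen only for illustration.
categories, vectors = one_hot(["male", "female", "female"])
print(categories)  # ['female', 'male']
print(vectors)     # [[0, 1], [1, 0], [1, 0]]
```

Each vector has exactly one 1, so no category is treated as larger or smaller than another.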

# Implementation in Python

In [1]:
# Import the necessary libraries
import pandas as pd
import seaborn as sns

In [2]:
# Load the dataset (we use a pre-existing dataset from the seaborn library, the 'titanic' dataset)
titanic = sns.load_dataset('titanic')

In [3]:
# Select a few columns for demonstration purposes
titanic = titanic[['sex', 'embark_town', 'alone','survived']]

In [4]:
# Perform one-hot encoding
# (dtype=int gives 0/1 columns; pandas 2.0+ would otherwise return True/False)
titanic_encoded = pd.get_dummies(titanic, columns=['sex', 'embark_town', 'alone'], dtype=int)

In [5]:
print(titanic_encoded.head())

   sex_female  sex_male  embark_town_Cherbourg  embark_town_Queenstown  \
0           0         1                      0                       0   
1           1         0                      1                       0   
2           1         0                      0                       0   
3           1         0                      0                       0   
4           0         1                      0                       0   

   embark_town_Southampton  alone_False  alone_True  
0                        1            1           0  
1                        0            1           0  
2                        1            0           1  
3                        1            1           0  
4                        1            0           1  


As you can see, `get_dummies` creates a new column for each unique value in the 'sex', 'embark_town', and 'alone' columns. Each row has a 1 in the column for its category and a 0 in the other columns from the same original feature.

# NOTE:
One-hot encoding can significantly increase the dimensionality of the dataset if the categorical variable has many unique values, because every unique value becomes its own column. This can increase memory and computational requirements, and can potentially degrade the performance of the model.
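The growth in dimensionality is easy to see on a toy example. The sketch below uses a hypothetical `city` column with 500 distinct labels (made-up data, not part of the titanic dataset) and compares the shape before and after encoding.

```python
import pandas as pd

# Hypothetical data: 1,000 rows whose single column takes 500 distinct labels.
df = pd.DataFrame({"city": [f"city_{i % 500}" for i in range(1000)]})

# One-hot encode the high-cardinality column.
encoded = pd.get_dummies(df, columns=["city"])

print(df.shape)       # (1000, 1)
print(encoded.shape)  # (1000, 500)
```

A single input column becomes 500 columns, most of which are 0 in any given row.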