### One-Hot Encoding Explained
One-hot encoding converts categorical variables into a numerical format that can be used by machine learning algorithms.
For each unique category in the original feature, a new binary column is created: if a row belongs to that category, the corresponding column holds a 1; otherwise, it holds a 0. Each row therefore has exactly one 1 across the new columns.
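
To make the rule above concrete, here is a minimal sketch in plain Python that builds the binary columns by hand. The list `colors` and the names `categories` and `one_hot` are illustrative assumptions, not part of any library; the sorted column order is also just a choice made for this sketch.

```python
# Hypothetical toy data, chosen only for illustration.
colors = ['red', 'blue', 'green', 'green', 'red', 'blue']

# One new column per unique category (sorted here so the order is predictable).
categories = sorted(set(colors))

# For each row, write 1 in the column that matches its category and 0 elsewhere.
one_hot = [
    [1 if value == category else 0 for category in categories]
    for value in colors
]

for value, row in zip(colors, one_hot):
    print(value, row)
```

Every row contains a single 1, which is where the name "one-hot" comes from.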

In [1]:
# One Hot Encoding

In [1]:
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

df = pd.DataFrame(
    {
        'color' : ['red', 'blue', 'green', 'green', 'red', 'blue']
    }
)
df

Unnamed: 0,color
0,red
1,blue
2,green
3,green
4,red
5,blue


In [7]:
# create an instance of OneHotEncoder
encoder = OneHotEncoder()

encoded = encoder.fit_transform(df[['color']])

### Step-by-Step Explanation
Importing OneHotEncoder: The OneHotEncoder class is imported from the sklearn.preprocessing module.
<br> Creating an Instance: encoder = OneHotEncoder() creates an instance of the OneHotEncoder class. The encoder object is then used to fit and transform the categorical data into a one-hot encoded format.
<br> Fitting and Transforming: encoder.fit_transform(df[['color']]) applies the fit_transform method to the color column of the DataFrame df. The double brackets select a one-column DataFrame, because the encoder expects two-dimensional input. This step does two things:
<br> Fitting: The encoder identifies the unique categories (or levels) present in the color column. Here df['color'] contains "red", "blue", and "green", so the encoder learns these three categories.
<br> Transforming: The encoder converts the categorical data into a one-hot encoded format, with one binary column (0s and 1s) per category. By default the result is a SciPy sparse matrix, which is why the following cells call .toarray() before displaying it.
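
The two phases described above can also be run separately, which makes it easier to see what each one does. The following is a sketch that rebuilds the same toy DataFrame so it can run on its own; the variable names `learned` and `dense` are assumptions introduced for illustration.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Same toy data as in this notebook, rebuilt so the snippet is self-contained.
df = pd.DataFrame({'color': ['red', 'blue', 'green', 'green', 'red', 'blue']})

encoder = OneHotEncoder()

# Fitting: the encoder learns the unique categories of each input column.
encoder.fit(df[['color']])
learned = [str(c) for c in encoder.categories_[0]]
print(learned)

# Transforming: each row becomes a one-hot row; the default result is sparse.
encoded = encoder.transform(df[['color']])
print(encoded.shape)

# Convert to a dense NumPy array to inspect the 0s and 1s.
dense = encoded.toarray()
print(dense)
```

Calling fit and then transform on the same data should give the same result as a single fit_transform call; the combined method is simply a convenience.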

In [5]:
# convert the sparse result to a dense NumPy array so the values can be inspected
encoder.fit_transform(df[['color']]).toarray()

array([[0., 0., 1.],
       [1., 0., 0.],
       [0., 1., 0.],
       [0., 1., 0.],
       [0., 0., 1.],
       [1., 0., 0.]])

In [13]:
# wrap the dense array in a DataFrame, naming each column after its category
encoded_df = pd.DataFrame(encoded.toarray(), columns=encoder.get_feature_names_out())
encoded_df

Unnamed: 0,color_blue,color_green,color_red
0,0.0,0.0,1.0
1,1.0,0.0,0.0
2,0.0,1.0,0.0
3,0.0,1.0,0.0
4,0.0,0.0,1.0
5,1.0,0.0,0.0


In [15]:
# place the one-hot columns next to the original column (axis=1 joins column-wise)
pd.concat([df, encoded_df], axis=1)

Unnamed: 0,color,color_blue,color_green,color_red
0,red,0.0,0.0,1.0
1,blue,1.0,0.0,0.0
2,green,0.0,1.0,0.0
3,green,0.0,1.0,0.0
4,red,0.0,0.0,1.0
5,blue,1.0,0.0,0.0
