## One Hot Encoding Implementation

One-hot encoding converts a categorical variable into a set of binary columns.
It creates one new column per category, where 1 means the row belongs to that category and 0 means it does not.
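
To make the definition concrete, here is a minimal sketch of the idea in plain Python, before any library is involved. The helper name `one_hot` is hypothetical and used only for illustration; it assumes the column order is the sorted list of unique categories.

```python
# Minimal sketch of one-hot encoding without any library.
# `one_hot` is a hypothetical helper, not part of pandas or scikit-learn.
def one_hot(values):
    # The sorted unique categories define the column order.
    categories = sorted(set(values))
    # Each row gets a 1 in the column of its own category and 0 elsewhere.
    rows = [[1 if value == category else 0 for category in categories]
            for value in values]
    return categories, rows


categories, rows = one_hot(['Good', 'Nice', 'Good', 'Great', 'Nice'])
print(categories)  # ['Good', 'Great', 'Nice']
for row in rows:
    print(row)
```

Every row contains exactly one 1, which is where the name "one-hot" comes from.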

### Using Pandas

In [1]:
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

In [2]:
data = {
    'Employee id': [10, 20, 15, 25, 30],
    'Gender': ['M', 'F', 'F', 'M', 'F'],
    'Remarks': ['Good', 'Nice', 'Good', 'Great', 'Nice']
}

In [3]:
df = pd.DataFrame(data)
print(f"Original Employee Data:\n{df}\n")

Original Employee Data:
   Employee id Gender Remarks
0           10      M    Good
1           20      F    Nice
2           15      F    Good
3           25      M   Great
4           30      F    Nice



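We pass the categorical columns to the 'columns' parameter and set drop_first=True to avoid the dummy variable trap: the first category of each column is dropped, because it is already implied when all the remaining dummy columns of that variable are 0.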

In [4]:
df_pandas_encoded = pd.get_dummies(df, columns=['Gender', 'Remarks'], drop_first=True)

In [5]:
print(f"One-Hot Encoded Data using Pandas:\n{df_pandas_encoded}\n")

One-Hot Encoded Data using Pandas:
   Employee id  Gender_M  Remarks_Great  Remarks_Nice
0           10      True          False         False
1           20     False          False          True
2           15     False          False         False
3           25      True           True         False
4           30     False          False          True
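
The dummy variable trap can be checked directly. The sketch below rebuilds the two categorical columns from the example, encodes them without drop_first, and shows that the dummy columns of each variable always sum to 1, so one of them is redundant. It also passes dtype=int, on the assumption that 0/1 integers are preferred over the True/False values printed above.

```python
import pandas as pd

df = pd.DataFrame({
    'Gender': ['M', 'F', 'F', 'M', 'F'],
    'Remarks': ['Good', 'Nice', 'Good', 'Great', 'Nice'],
})

# Without drop_first, every category gets its own column.
# dtype=int yields 0/1 instead of boolean values.
full = pd.get_dummies(df, columns=['Gender', 'Remarks'], dtype=int)

# The dummy columns of one variable sum to 1 in every row, so any one
# of them can be reconstructed from the others.
gender_sum = full[['Gender_F', 'Gender_M']].sum(axis=1)
remarks_sum = full.filter(like='Remarks_').sum(axis=1)
print(gender_sum.tolist())   # [1, 1, 1, 1, 1]
print(remarks_sum.tolist())  # [1, 1, 1, 1, 1]

# drop_first=True removes that redundancy by dropping the first category.
reduced = pd.get_dummies(df, columns=['Gender', 'Remarks'],
                         drop_first=True, dtype=int)
print(reduced.columns.tolist())  # ['Gender_M', 'Remarks_Great', 'Remarks_Nice']
```

This redundancy mostly matters for models such as unregularized linear regression, where perfectly collinear columns make the coefficients ambiguous.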



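### Using Scikit-Learn

Scikit-learn's OneHotEncoder produces the same kind of encoding as a NumPy array. By default it keeps a column for every category.

In [8]: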
categorical_columns = ['Gender', 'Remarks']

In [18]:
#  Create the OneHotEncoder instance

encoder = OneHotEncoder(sparse_output=False)

In [17]:
# Fit the encoder on the categorical columns and transform them

one_hot_encoded = encoder.fit_transform(df[categorical_columns])

In [15]:
# Create a DataFrame from the encoded NumPy array, reusing df's index so the rows align when concatenated

one_hot_df = pd.DataFrame(one_hot_encoded, columns=encoder.get_feature_names_out(categorical_columns), index=df.index)

In [13]:
# Concatenate the one-hot encoded columns back with the original non-categorical columns

df_sklearn_encoded = pd.concat([df.drop(categorical_columns, axis=1), one_hot_df], axis=1)

In [14]:
print("One-Hot Encoded Data using Scikit-Learn:\n", df_sklearn_encoded)

One-Hot Encoded Data using Scikit-Learn:
    Employee id  Gender_F  Gender_M  Remarks_Good  Remarks_Great  Remarks_Nice
0           10       0.0       1.0           1.0            0.0           0.0
1           20       1.0       0.0           0.0            0.0           1.0
2           15       1.0       0.0           1.0            0.0           0.0
3           25       0.0       1.0           0.0            1.0           0.0
4           30       1.0       0.0           0.0            0.0           1.0
