
In [1]:
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Sample Data
data = {
    'Employee id': [10, 20, 15, 25, 30],
    'Gender': ['M', 'F', 'F', 'M', 'F'],
    'Remarks': ['Good', 'Nice', 'Good', 'Great', 'Nice']
}
df = pd.DataFrame(data)
print(f"Original Employee Data:\n{df}\n")

# Save original dataset
df.to_csv('employee_data.csv', index=False)

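# ---------- Using Pandas get_dummies ----------
# drop_first=True drops the first (alphabetically sorted) category of each column,
# so 'Gender_F' and 'Remarks_Good' become the implicit baseline
df_pandas_encoded = pd.get_dummies(df, columns=['Gender', 'Remarks'], drop_first=True)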
print(f"One-Hot Encoded Data using Pandas:\n{df_pandas_encoded}\n")

# ---------- Using Scikit-learn OneHotEncoder ----------
# Define categorical columns
categorical_columns = ['Gender', 'Remarks']

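# Initialize OneHotEncoder: drop the first category of each column and return a
# dense NumPy array (the sparse_output parameter requires scikit-learn >= 1.2)
encoder = OneHotEncoder(drop='first', sparse_output=False)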

# Fit and transform categorical columns
one_hot_encoded = encoder.fit_transform(df[categorical_columns])

# Convert encoded output to DataFrame, reusing df's index so the concat below aligns row by row
one_hot_df = pd.DataFrame(
    one_hot_encoded,
    columns=encoder.get_feature_names_out(categorical_columns),
    index=df.index,
)

# Concatenate with original DataFrame (excluding original categorical columns)
df_sklearn_encoded = pd.concat([df.drop(columns=categorical_columns), one_hot_df], axis=1)
print(f"One-Hot Encoded Data using Scikit-Learn:\n{df_sklearn_encoded}")


Original Employee Data:
   Employee id Gender Remarks
0           10      M    Good
1           20      F    Nice
2           15      F    Good
3           25      M   Great
4           30      F    Nice

One-Hot Encoded Data using Pandas:
   Employee id  Gender_M  Remarks_Great  Remarks_Nice
0           10      True          False         False
1           20     False          False          True
2           15     False          False         False
3           25      True           True         False
4           30     False          False          True

One-Hot Encoded Data using Scikit-Learn:
   Employee id  Gender_M  Remarks_Great  Remarks_Nice
0           10       1.0            0.0           0.0
1           20       0.0            0.0           1.0
2           15       0.0            0.0           0.0
3           25       1.0            1.0           0.0
4           30       0.0            0.0           1.0
