
*ORDINAL ENCODING*

In [None]:
import pandas as pd

In [None]:
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
df = pd.read_csv(url, header=None)


In [None]:
df.columns = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'species']


In [None]:
print("Unique species values:")
print(df['species'].unique())

Unique species values:
['Iris-setosa' 'Iris-versicolor' 'Iris-virginica']


In [None]:
df['species'] = df['species'].str.strip()

In [None]:
print("Normalized species values:")
print(df['species'].unique())

Normalized species values:
['Iris-setosa' 'Iris-versicolor' 'Iris-virginica']


In [None]:
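# Iris species are nominal (they have no natural order), so the integers
# below are arbitrary labels rather than a meaningful ranking.
species_mapping = {
    'Iris-setosa': 0,
    'Iris-versicolor': 1,
    'Iris-virginica': 2
}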

In [None]:
df['species_encoded'] = df['species'].map(species_mapping)
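
The dictionary lookup above can also be expressed with an ordered pandas categorical, which keeps the category order alongside the data. The following is a minimal sketch under stated assumptions: the four-row `species` Series is a hypothetical miniature of the real column, and `ordered_species` simply mirrors the order chosen in `species_mapping`. Because Iris species have no natural order, the resulting codes are labels only.

```python
import pandas as pd

# Hypothetical miniature of the species column (not the full Iris data).
species = pd.Series(['Iris-setosa', 'Iris-virginica', 'Iris-versicolor', 'Iris-setosa'])

# Assumed category order: it mirrors the species_mapping dictionary.
ordered_species = ['Iris-setosa', 'Iris-versicolor', 'Iris-virginica']
species_cat = pd.Categorical(species, categories=ordered_species, ordered=True)

# .codes holds each value's position in ordered_species;
# a value missing from that list would be coded as -1.
codes = pd.Series(species_cat.codes, index=species.index)
print(codes.tolist())  # [0, 2, 1, 0]
```

One difference worth noting: `map` with a dictionary yields `NaN` for unmapped values, whereas categorical codes yield `-1`, so either result can be checked to catch unexpected labels.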

In [None]:
print("DataFrame after Ordinal Encoding:")
print(df)


DataFrame after Ordinal Encoding:
     sepal_length  sepal_width  petal_length  petal_width         species  \
0             5.1          3.5           1.4          0.2     Iris-setosa   
1             4.9          3.0           1.4          0.2     Iris-setosa   
2             4.7          3.2           1.3          0.2     Iris-setosa   
3             4.6          3.1           1.5          0.2     Iris-setosa   
4             5.0          3.6           1.4          0.2     Iris-setosa   
..            ...          ...           ...          ...             ...   
145           6.7          3.0           5.2          2.3  Iris-virginica   
146           6.3          2.5           5.0          1.9  Iris-virginica   
147           6.5          3.0           5.2          2.0  Iris-virginica   
148           6.2          3.4           5.4          2.3  Iris-virginica   
149           5.9          3.0           5.1          1.8  Iris-virginica   

     species_encoded  
0                 

*ONE-HOT ENCODING*

In [None]:

print("Original DataFrame:")
print(df.head())

# One indicator column per distinct species, initialised to 0.
unique_species = df['species'].unique()
onehot_encoded = pd.DataFrame(0, index=df.index, columns=unique_species)

# Set the column that matches each row's species to 1.
for index, species in df['species'].items():
    onehot_encoded.at[index, species] = 1

# Append the indicator columns to the original features.
df_onehot = pd.concat([df, onehot_encoded], axis=1)

print("DataFrame after One-Hot Encoding:")
print(df_onehot.head())


Original DataFrame:
   sepal_length  sepal_width  petal_length  petal_width      species
0           5.1          3.5           1.4          0.2  Iris-setosa
1           4.9          3.0           1.4          0.2  Iris-setosa
2           4.7          3.2           1.3          0.2  Iris-setosa
3           4.6          3.1           1.5          0.2  Iris-setosa
4           5.0          3.6           1.4          0.2  Iris-setosa

DataFrame after One-Hot Encoding:
   sepal_length  sepal_width  petal_length  petal_width      species  \
0           5.1          3.5           1.4          0.2  Iris-setosa   
1           4.9          3.0           1.4          0.2  Iris-setosa   
2           4.7          3.2           1.3          0.2  Iris-setosa   
3           4.6          3.1           1.5          0.2  Iris-setosa   
4           5.0          3.6           1.4          0.2  Iris-setosa   

   Iris-setosa  Iris-versicolor  Iris-virginica  
0            1                0               0 
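
The manual loop above builds the indicator columns one row at a time. As a point of comparison, pandas offers `pd.get_dummies`, which produces the same kind of columns in a single call. This is a sketch, not the notebook's method: the `toy` DataFrame is a hypothetical four-row stand-in for `df`, and `dtype=int` is chosen here so the output shows 0/1 rather than booleans.

```python
import pandas as pd

# Hypothetical stand-in for the species column of df.
toy = pd.DataFrame({
    'species': ['Iris-setosa', 'Iris-versicolor', 'Iris-virginica', 'Iris-setosa']
})

# One 0/1 indicator column per distinct species value.
onehot = pd.get_dummies(toy['species'], dtype=int)

# Append the indicator columns next to the original column.
toy_onehot = pd.concat([toy, onehot], axis=1)
print(toy_onehot)

# Each row has exactly one indicator set to 1.
print((onehot.sum(axis=1) == 1).all())  # True
```

`get_dummies` orders the new columns by sorted category name, whereas the loop version follows the order in which species first appear; for this dataset the two orders coincide.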