14.	Perform the following operations in Python on the given dataset [Iris.csv]:
a.	Check for and handle any duplicated rows or missing values (insert some intentionally for practice).
b.	Combine with an external species characteristics table (e.g., color, blooming time).
c.	Normalize petal/sepal measurements.
d.	Add a size_ratio = petal_length / sepal_length column.

In [1]:
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# a. Load dataset and handle duplicates/missing values

In [3]:
df = pd.read_csv("Iris.csv")

In [4]:
# Intentionally add some duplicates and missing values (for practice)
df = pd.concat([df, df.iloc[0:2]], ignore_index=True)  # Add duplicates
df.loc[5, 'PetalLengthCm'] = None  # Insert a missing value
df.loc[10, 'SepalWidthCm'] = None
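
Task (a) asks to check for duplicates and missing values before handling them. A minimal sketch of that check follows, using a small hypothetical sample rather than Iris.csv itself; the name `sample` and its values are assumptions made for illustration only.

```python
import numpy as np
import pandas as pd

# Hypothetical four-row sample that reuses the Iris.csv column names
sample = pd.DataFrame({
    "Id": [1, 2, 3, 4],
    "SepalLengthCm": [5.1, 4.9, 4.7, 4.6],
    "PetalLengthCm": [1.4, 1.4, np.nan, 1.5],
    "Species": ["Iris-setosa"] * 4,
})
# Append a copy of the first row to simulate a duplicated record
sample = pd.concat([sample, sample.iloc[[0]]], ignore_index=True)

n_duplicates = int(sample.duplicated().sum())  # rows identical to an earlier row
missing_per_column = sample.isna().sum()       # NaN count for each column

print("Duplicated rows:", n_duplicates)
print(missing_per_column)
```

Running the same two calls on `df` at this point in the notebook would report the rows and cells that were deliberately corrupted above.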

In [5]:
# Drop duplicates
df = df.drop_duplicates()

In [6]:
# Fill missing values with the mean of each numeric column
df = df.fillna(df.mean(numeric_only=True))

# b. Combine with external species characteristics table

In [8]:
# Example species table with placeholder values (replace with real species data as needed)
species_data = {
    'Species': ['Iris-setosa', 'Iris-versicolor', 'Iris-virginica'],
    'Color': ['White', 'Purple', 'Pink'],
    'BloomingSeason': ['Spring', 'Summer', 'Autumn']
}

In [9]:
species_df = pd.DataFrame(species_data)

In [10]:
# Merge with Iris dataset on 'Species'
df = pd.merge(df, species_df, on='Species', how='left')
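
A left merge silently leaves NaN in the new columns for any species missing from the lookup table. The sketch below shows one way to guard against that, using the `validate` and `indicator` parameters of `pd.merge`; the tables `flowers` and `lookup` and the label 'Iris-unknown' are hypothetical.

```python
import pandas as pd

# Hypothetical miniature tables; 'Iris-unknown' is deliberately absent from the lookup
flowers = pd.DataFrame({
    "Id": [1, 2, 3],
    "Species": ["Iris-setosa", "Iris-virginica", "Iris-unknown"],
})
lookup = pd.DataFrame({
    "Species": ["Iris-setosa", "Iris-virginica"],
    "Color": ["White", "Pink"],
})

merged = pd.merge(
    flowers, lookup, on="Species", how="left",
    validate="many_to_one",  # raise MergeError if a species repeats in the lookup
    indicator=True,          # add a _merge column: 'both' or 'left_only'
)

# Rows whose species had no match in the lookup table
unmatched = merged[merged["_merge"] == "left_only"]
print(unmatched[["Id", "Species"]])
```

With `validate="many_to_one"`, a duplicated species in the lookup table raises an error instead of multiplying rows in the result.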

# c. Normalize petal/sepal measurements

In [12]:
scaler = MinMaxScaler()
features_to_normalize = ['SepalLengthCm', 'SepalWidthCm', 'PetalLengthCm', 'PetalWidthCm']
df[features_to_normalize] = scaler.fit_transform(df[features_to_normalize])
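
With its default feature range of (0, 1), MinMaxScaler rescales each column as (x - min) / (max - min). The sketch below applies that formula directly in pandas to a small hypothetical sample, which may help when checking the scaled values by hand.

```python
import pandas as pd

# Hypothetical sample values in centimetres
sample = pd.DataFrame({
    "SepalLengthCm": [4.3, 5.1, 7.9],
    "PetalLengthCm": [1.0, 1.4, 6.9],
})

# (x - min) / (max - min), applied column by column
scaled = (sample - sample.min()) / (sample.max() - sample.min())
print(scaled)
```

Each column's smallest value maps to 0 and its largest to 1, so the scaled columns are unitless and no longer comparable as physical lengths.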

# d. Add a size_ratio = petal_length / sepal_length column

In [14]:
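# Note: both columns were min-max scaled in step c, so this is a ratio of scaled values;
# any row whose scaled sepal length is 0 yields inf or NaN here.
df['Size_Ratio'] = df['PetalLengthCm'] / df['SepalLengthCm']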
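
If the ratio is meant to describe the physical measurements, one alternative is to compute it before scaling. The sketch below contrasts the two orders of operation on hypothetical values; `raw`, `raw_ratio`, and `scaled_ratio` are names introduced here for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical raw measurements in centimetres
raw = pd.DataFrame({
    "SepalLengthCm": [4.3, 5.1, 7.9],
    "PetalLengthCm": [1.1, 1.0, 6.4],
})

# Ratio of the raw measurements: finite because every sepal length is positive
raw_ratio = raw["PetalLengthCm"] / raw["SepalLengthCm"]

# Min-max scaling maps the smallest sepal length to exactly 0,
# so the ratio of scaled values divides by zero for that row
scaled = (raw - raw.min()) / (raw.max() - raw.min())
scaled_ratio = scaled["PetalLengthCm"] / scaled["SepalLengthCm"]

print(raw_ratio.round(3).tolist())
print(np.isfinite(scaled_ratio).tolist())
```

In the notebook, this would mean creating the ratio column before the scaling cell in step c, or keeping an unscaled copy of the two length columns.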

In [15]:
# Final view
print(df.head())

   Id  SepalLengthCm  SepalWidthCm  PetalLengthCm  PetalWidthCm      Species  \
0   1       0.222222      0.625000       0.067797      0.041667  Iris-setosa   
1   2       0.166667      0.416667       0.067797      0.041667  Iris-setosa   
2   3       0.111111      0.500000       0.050847      0.041667  Iris-setosa   
3   4       0.083333      0.458333       0.084746      0.041667  Iris-setosa   
4   5       0.194444      0.666667       0.067797      0.041667  Iris-setosa   

   Color BloomingSeason  Size_Ratio  
0  White         Spring    0.305085  
1  White         Spring    0.406780  
2  White         Spring    0.457627  
3  White         Spring    1.016949  
4  White         Spring    0.348668  
