# Assignment‑2: Basic Data Pre‑Processing (UCI Dataset)

**Objective:** Perform core data‑preprocessing operations on a real dataset using Python.

Dataset used: **Iris — UCI Machine Learning Repository**

---

## 🔹 Step 1: Import Libraries

In [None]:
import pandas as pd
import numpy as np
from sklearn.preprocessing import LabelEncoder, StandardScaler
print('Libraries imported.')

## 🔹 Step 2: Load Dataset from UCI

In [None]:
# iris.data has no header row, so the column names are supplied explicitly
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
columns = ['sepal_length','sepal_width','petal_length','petal_width','class']
df = pd.read_csv(url, names=columns)
df.head()
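
If the UCI URL is unreachable, one possible fallback is the copy of Iris bundled with scikit-learn. The sketch below is an assumption, not part of the assignment: `load_iris` and the name `df_local` do not appear in the original notebook. According to the scikit-learn documentation, its copy corrects two data points relative to the UCI file, so later results (for example the duplicate count) may differ slightly.

```python
import pandas as pd
from sklearn.datasets import load_iris

# Assumed offline fallback: scikit-learn's bundled Iris data
iris = load_iris(as_frame=True)

# Rename the feature columns to match the names used in this notebook
df_local = iris.data.copy()
df_local.columns = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width']

# Rebuild string labels in the UCI style, e.g. 'Iris-setosa'
df_local['class'] = ['Iris-' + iris.target_names[i] for i in iris.target]

print(df_local.shape)
print(df_local['class'].unique())
```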

## 🔹 Step 3: Dataset Overview

In [None]:
print('Shape:', df.shape)
print('\nColumn Names:', df.columns.tolist())
df.head()

## 🔹 Step 4: Data Types & Summary

In [None]:
df.info()

In [None]:
df.describe()

## 🔹 Step 5: Check Missing Values

In [None]:
df.isnull().sum()
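
`isnull()` only counts values that pandas already treats as missing. Some data files mark gaps with a placeholder such as `?`, which would be read as ordinary text. The sketch below uses a made-up three-row extract (hypothetical values, not real Iris rows) to show how the `na_values` argument of `read_csv` changes the count.

```python
import io
import pandas as pd

# Hypothetical extract with a '?' placeholder in the second row
raw = (
    "5.1,3.5,1.4,0.2,Iris-setosa\n"
    "4.9,?,1.4,0.2,Iris-setosa\n"
    "6.3,3.3,6.0,2.5,Iris-virginica\n"
)
columns = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'class']

# Without na_values the '?' is kept as a string and is not counted as missing
naive = pd.read_csv(io.StringIO(raw), names=columns)
# With na_values='?' the placeholder becomes NaN and the column stays numeric
aware = pd.read_csv(io.StringIO(raw), names=columns, na_values='?')

print(naive['sepal_width'].isnull().sum())  # 0
print(aware['sepal_width'].isnull().sum())  # 1
```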

## 🔹 Step 6: Handle Missing Values (if found)

In [None]:
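# Example strategy: fill numeric columns with the column mean.
# Assign the result back to df[col]: calling fillna(inplace=True) on df[col]
# is chained assignment, which raises a FutureWarning in recent pandas 2.x
# releases and does not update df under copy-on-write.
for col in df.select_dtypes(include='number').columns:
    df[col] = df[col].fillna(df[col].mean())

# Fill categorical columns with the most frequent value (mode)
for col in df.select_dtypes(include=['object']).columns:
    df[col] = df[col].fillna(df[col].mode()[0])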

df.isnull().sum()
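
The UCI Iris file contains no missing values, so the cell above leaves the data unchanged. To see the mean and mode strategy take effect, here is a minimal sketch on a toy frame; `toy` and its values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Toy frame (hypothetical values) with one gap in each column
toy = pd.DataFrame({
    'petal_length': [1.0, np.nan, 3.0],
    'class': ['a', None, 'a'],
})

# Numeric columns: replace NaN with the column mean
for col in toy.select_dtypes(include='number').columns:
    toy[col] = toy[col].fillna(toy[col].mean())

# Non-numeric columns: replace missing entries with the most frequent value
for col in toy.select_dtypes(exclude='number').columns:
    toy[col] = toy[col].fillna(toy[col].mode()[0])

print(toy)
```

The numeric gap is filled with 2.0 (the mean of 1.0 and 3.0) and the missing label with `'a'`.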

## 🔹 Step 7: Remove Duplicate Records

In [None]:
before = df.shape[0]
df.drop_duplicates(inplace=True)
after = df.shape[0]
print('Duplicates removed:', before - after)
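
Before dropping rows, it can help to look at which records are considered duplicates. The sketch below, on a made-up four-row frame, uses `duplicated(keep=False)` to list every copy and then resets the index after dropping. The `reset_index` call is an optional addition, not part of the original step.

```python
import pandas as pd

# Toy frame (hypothetical values): rows 0 and 2 are identical
toy = pd.DataFrame({
    'sepal_length': [5.1, 4.9, 5.1, 4.9],
    'class': ['s', 's', 's', 'v'],
})

# keep=False flags every member of a duplicate group, not only the later copies
dupes = toy[toy.duplicated(keep=False)]
print(dupes.index.tolist())  # [0, 2]

# Drop the later copies and renumber the rows so the index has no gaps
deduped = toy.drop_duplicates().reset_index(drop=True)
print(len(deduped))  # 3
```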

## 🔹 Step 8: Encode Categorical Column

In [None]:
# Replace the species names with integer codes (assigned in sorted label order)
encoder = LabelEncoder()
df['class'] = encoder.fit_transform(df['class'])
df.head()
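
Once the labels are encoded, the original names survive only inside the fitted encoder. The following sketch shows one way to recover the mapping from `classes_` and to decode with `inverse_transform`. The short `labels` series is a made-up sample, and `mapping` is a name introduced here for illustration.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Made-up sample of class labels in the UCI naming style
labels = pd.Series(['Iris-virginica', 'Iris-setosa', 'Iris-versicolor', 'Iris-setosa'])

encoder = LabelEncoder()
codes = encoder.fit_transform(labels)

# classes_ holds the sorted labels; a label's position is its integer code
mapping = {str(name): code for code, name in enumerate(encoder.classes_)}
print(mapping)

# inverse_transform turns the codes back into the original strings
decoded = encoder.inverse_transform(codes)
print(list(decoded))
```

Recording this mapping next to the saved file keeps the integer codes interpretable later.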

## 🔹 Step 9: Feature Scaling

In [None]:
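# Standardize the four feature columns to zero mean and unit variance;
# the encoded class column is not scaled.
scaler = StandardScaler()
cols = ['sepal_length','sepal_width','petal_length','petal_width']
df[cols] = scaler.fit_transform(df[cols])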
df.head()
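
`StandardScaler` subtracts each column's mean and divides by its population standard deviation (`ddof=0`). The sketch below checks this by hand on a toy frame with made-up values. It also shows why pandas' default `std()`, which uses `ddof=1`, reports a value slightly above 1 for scaled columns.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Toy frame (hypothetical values)
toy = pd.DataFrame({'x': [1.0, 2.0, 3.0, 4.0], 'y': [10.0, 20.0, 30.0, 40.0]})

scaled = pd.DataFrame(StandardScaler().fit_transform(toy), columns=toy.columns)

# Manual z-score using the population standard deviation
manual = (toy - toy.mean()) / toy.std(ddof=0)
print(np.allclose(scaled.to_numpy(), manual.to_numpy()))  # True

print(scaled.std(ddof=0))  # population std of each scaled column
print(scaled.std())        # sample std (ddof=1), slightly larger
```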

## 🔹 Step 10: Save Final Dataset

In [None]:
df.to_csv('assignment2_cleaned_iris.csv', index=False)
print('Saved: assignment2_cleaned_iris.csv')
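
A quick way to confirm that a save worked is to read the file back and compare it with the frame in memory. This is a minimal sketch under stated assumptions: it writes a made-up two-row frame to a temporary directory, and the file name `roundtrip_check.csv` is hypothetical.

```python
import os
import tempfile
import pandas as pd

# Toy frame (hypothetical values) standing in for the cleaned dataset
toy = pd.DataFrame({'sepal_length': [0.5, -1.25], 'class': [0, 2]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'roundtrip_check.csv')
    toy.to_csv(path, index=False)
    reloaded = pd.read_csv(path)

# Raises AssertionError if values, dtypes, or column names differ
pd.testing.assert_frame_equal(reloaded, toy)
print('Round trip OK:', reloaded.shape)
```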

---
### ✔ End of Assignment‑2
Briefly explain the output of each cell when you submit.