# 🧹 Data Cleaning and Preprocessing – Task 1
This notebook demonstrates the data cleaning steps applied to the Amazon dataset for Task 1 of the Data Analyst Internship.

## 📥 Step 1: Load the Dataset

In [None]:
import pandas as pd

# Load the dataset saved by the earlier preprocessing pass and preview the first rows
df = pd.read_csv("amazon_cleaned_task1.csv")
df.head()

## 🔍 Step 2: Inspect Data Info & Missing Values

In [None]:
# Check dataset information
df.info()

# Check for missing values
df.isnull().sum()
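
Raw counts from `df.isnull().sum()` are easier to judge next to the share of rows they represent. The following is a minimal sketch, not part of the original notebook: `missing_summary` is a hypothetical helper, and the toy `sample` frame (with made-up column names) only stands in for the Amazon data.

```python
import numpy as np
import pandas as pd


def missing_summary(frame: pd.DataFrame) -> pd.DataFrame:
    """Return missing counts and percentages per column, worst first."""
    counts = frame.isnull().sum()
    # max(..., 1) avoids dividing by zero on an empty frame
    percent = (counts / max(len(frame), 1) * 100).round(2)
    summary = pd.DataFrame({"missing": counts, "percent": percent})
    # Keep only columns that actually have gaps
    return summary[summary["missing"] > 0].sort_values("missing", ascending=False)


# Hypothetical toy data; the real column names may differ
sample = pd.DataFrame({
    "product_name": ["A", "B", None, "D"],
    "rating": [4.1, np.nan, np.nan, 3.9],
    "price": [10.0, 20.0, 30.0, 40.0],
})
report = missing_summary(sample)
print(report)
```

Calling `missing_summary(df)` on the loaded frame would give the same two-column report for the real data.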

## 🧼 Step 3: Handle Duplicates and Clean Columns

In [None]:
# Drop duplicates if any
df = df.drop_duplicates()

# Standardize column names: trim whitespace, lowercase, replace spaces with underscores
# (a no-op if preprocessing has already done this)
df.columns = df.columns.str.strip().str.lower().str.replace(" ", "_")
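
To make this step auditable, it can help to count duplicates before dropping them. The sketch below shows the idea on a hypothetical three-row frame; the column names and values are illustrative assumptions, not taken from the dataset.

```python
import pandas as pd

# Hypothetical raw frame with one repeated row and untidy headers
raw = pd.DataFrame(
    [["P1", "₹399"], ["P1", "₹399"], ["P2", "₹199"]],
    columns=[" Product ID", "Discounted Price "],
)

# Count fully identical rows before removing them
n_dupes = int(raw.duplicated().sum())

# Drop the repeats and renumber the index so it has no gaps
deduped = raw.drop_duplicates().reset_index(drop=True)

# Trim, lowercase, and collapse any run of whitespace into one underscore
deduped.columns = (
    deduped.columns.str.strip().str.lower().str.replace(r"\s+", "_", regex=True)
)

print(f"Removed {n_dupes} duplicate row(s); columns: {list(deduped.columns)}")
```

`drop_duplicates()` compares all columns by default; passing `subset=[...]` would restrict the comparison to chosen key columns, if the task calls for that.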

## 🔢 Step 4: Convert Data Types

In [None]:
# Types should already be converted upstream; confirm each column's dtype here
df.dtypes
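
If `df.dtypes` still reports `object` for a numeric column, a conversion along the following lines may help. This is a sketch under assumptions: `to_number` is a hypothetical helper, and the column names and string formats (currency symbol, thousands separators, percent signs) are guesses about how the raw data might look.

```python
import pandas as pd


def to_number(series: pd.Series) -> pd.Series:
    """Keep only digits and the decimal point, then convert to numeric.

    Unparseable values become NaN instead of raising. A minus sign would
    also be stripped, so this suits non-negative fields only.
    """
    cleaned = series.astype(str).str.replace(r"[^0-9.]", "", regex=True)
    return pd.to_numeric(cleaned, errors="coerce")


# Hypothetical sample; real column names and formats may differ
sample = pd.DataFrame({
    "discounted_price": ["₹1,099", "₹399", None],
    "discount_percentage": ["64%", "0%", "15%"],
    "rating": ["4.2", "|", "3.9"],
})

# Apply the converter column by column
converted = sample.apply(to_number)
print(converted.dtypes)
print(converted)
```

Using `errors="coerce"` keeps bad entries visible as missing values, so they show up in the missing-value checks instead of stopping the notebook.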

## 📊 Step 5: Visualize Missing Values

In [None]:
import seaborn as sns
import matplotlib.pyplot as plt

plt.figure(figsize=(10, 5))
# Hide row labels; with many rows they only clutter the y-axis
sns.heatmap(df.isnull(), cbar=False, cmap='viridis', yticklabels=False)
plt.title('Missing Value Heatmap')
plt.show()
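
A heatmap shows where the gaps sit, but a bar chart of the missing fraction per column can be easier to read when only a few columns are affected. The sketch below uses a hypothetical toy frame in place of `df`.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical toy data standing in for the Amazon dataset
sample = pd.DataFrame({
    "product_name": ["A", "B", None, "D"],
    "rating": [4.1, np.nan, np.nan, 3.9],
    "price": [10.0, 20.0, 30.0, 40.0],
})

# Fraction of missing values per column, largest first
missing_fraction = sample.isnull().mean().sort_values(ascending=False)

fig, ax = plt.subplots(figsize=(6, 3))
ax.bar(missing_fraction.index, missing_fraction.values)
ax.set_ylabel("Fraction missing")
ax.set_ylim(0, 1)
ax.set_title("Missing Values by Column")
fig.tight_layout()
```

In a notebook the figure renders when the cell finishes; in a plain script, call `plt.show()` afterwards.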

## ✅ Summary
- Removed duplicates
- Standardized column names
- Verified data types for the price, discount, and rating columns (cleaned during earlier preprocessing)
- Left missing values in place for later treatment or imputation
- Dataset ready for analysis!