# 🧹 Online Sales Data Cleaning Project
This notebook demonstrates **data cleaning** on an Online Sales dataset using Python (`pandas` and `numpy`).

### 📌 Objectives
1. Load raw dataset
2. Inspect the data
3. Handle duplicates and missing values
4. Fix data types
5. Standardize categorical columns
6. Feature engineering (create new columns)
7. Save cleaned dataset

In [None]:
# Import libraries
import pandas as pd
import numpy as np

In [None]:
# Load dataset
file_path = "Online Sales Data.csv"
df = pd.read_csv(file_path)

print("Initial shape:", df.shape)
df.head()

### 🔍 Step 1: Data Inspection
Check data types, null values, and overall structure.

In [None]:
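# Column dtypes, non-null counts, and memory usage
df.info()

# Summary statistics for both numeric and non-numeric columns
df.describe(include="all")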
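
The inspection step can be extended to count missing values and duplicate rows directly. The sketch below does this on a small made-up frame (`sample` and its values are illustrative assumptions, not rows from the real dataset).

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the real dataset (values are made up).
sample = pd.DataFrame({
    "Transaction ID": [10001, 10002, 10002, 10003],
    "Units Sold": [2, 1, 1, np.nan],
    "Region": ["Europe", "Asia", "Asia", None],
})

# Number of missing values in each column
null_counts = sample.isna().sum()

# Number of rows that exactly repeat an earlier row
duplicate_rows = int(sample.duplicated().sum())

print(null_counts)
print("Duplicate rows:", duplicate_rows)
```

Running the same two checks on `df` before Step 2 shows how much cleaning is actually needed.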

### 🧹 Step 2: Handle Missing Values and Duplicates

In [None]:
# Drop duplicates
df = df.drop_duplicates()

# Drop rows with missing values (none exist here, but included for robustness)
df = df.dropna()

print("Shape after cleaning:", df.shape)
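
A blanket `dropna()` removes a row if any column is missing. As a possible alternative, the sketch below restricts the check to columns that must be present and reports how many rows each step removes. The `required_cols` list and the toy data are hypothetical choices for illustration.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the real dataset (values are made up).
sample = pd.DataFrame({
    "Transaction ID": [10001, 10002, 10002, 10003, 10004],
    "Units Sold": [2, 1, 1, np.nan, 3],
    "Region": ["Europe", "Asia", "Asia", "Asia", None],
})

rows_before = len(sample)

# Remove exact duplicate rows and record how many were dropped
deduped = sample.drop_duplicates()
duplicates_removed = rows_before - len(deduped)

# Hypothetical choice: only these columns must be non-null for a row to be kept
required_cols = ["Transaction ID", "Units Sold"]
cleaned = deduped.dropna(subset=required_cols)
missing_removed = len(deduped) - len(cleaned)

print("Duplicates removed:", duplicates_removed)
print("Rows removed for missing required values:", missing_removed)
```

Here the row with a missing `Region` is kept because `Region` is not in `required_cols`.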

### 📅 Step 3: Fix Data Types

In [None]:
# Convert Date column to datetime
df['Date'] = pd.to_datetime(df['Date'])

# Ensure Transaction ID is treated as string
df['Transaction ID'] = df['Transaction ID'].astype(str)
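
By default, `pd.to_datetime` raises an error when it meets a value it cannot parse. If the raw file might contain malformed dates, one option is `errors="coerce"`, which turns them into `NaT` so they can be inspected. The sketch below uses made-up values, including a deliberately invalid date.

```python
import pandas as pd

# Toy frame with one deliberately invalid date (values are made up).
sample = pd.DataFrame({
    "Date": ["2024-01-01", "2024-01-02", "not a date"],
    "Transaction ID": [10001, 10002, 10003],
})

# Unparseable entries become NaT instead of raising an error
parsed = pd.to_datetime(sample["Date"], errors="coerce")

# Keep the original strings that failed to parse for review
bad_dates = sample.loc[parsed.isna(), "Date"]

sample["Date"] = parsed
sample["Transaction ID"] = sample["Transaction ID"].astype(str)

print(sample.dtypes)
print("Unparseable dates:", bad_dates.tolist())
```

Note that `astype(str)` also converts missing IDs into the literal string `"nan"`, so it is safest to apply it after missing values have been handled.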

### ✨ Step 4: Standardize Categorical Columns

In [None]:
# Trim surrounding whitespace and apply title case so category labels are consistent
df['Payment Method'] = df['Payment Method'].str.strip().str.title()
df['Region'] = df['Region'].str.strip().str.title()
df['Product Category'] = df['Product Category'].str.strip().str.title()
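
The same cleanup can be written once and applied to several columns. The helper below, `standardize_text`, is a hypothetical name introduced for this sketch. It also collapses repeated internal spaces, which the cell above does not do. The sample values are made up.

```python
import pandas as pd

# Toy frame with inconsistent casing and spacing (values are made up).
sample = pd.DataFrame({
    "Payment Method": [" credit card", "PAYPAL ", "Debit  Card"],
    "Region": ["north america", " EUROPE", "asia "],
})

def standardize_text(series: pd.Series) -> pd.Series:
    """Trim whitespace, collapse internal runs of spaces, and apply title case."""
    return (
        series.str.strip()
        .str.replace(r"\s+", " ", regex=True)
        .str.title()
    )

for col in ["Payment Method", "Region"]:
    sample[col] = standardize_text(sample[col])

print(sample)
```

Title case is a blunt tool: a value such as `PAYPAL` becomes `Paypal`, so brand names with special capitalization may need an explicit mapping afterwards.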

### ➕ Step 5: Feature Engineering
Create a new column, **Revenue per Unit**, by dividing `Total Revenue` by `Units Sold`.

In [None]:
df['Revenue per Unit'] = df['Total Revenue'] / df['Units Sold']
df.head()
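
If any row has `Units Sold` equal to 0, the division above produces `inf` (or `NaN` when revenue is also 0) rather than an error. A minimal sketch of one way to guard against this, assuming such rows should simply get a missing value, is shown below on made-up numbers.

```python
import pandas as pd

# Toy frame including a row with zero units (values are made up).
sample = pd.DataFrame({
    "Total Revenue": [100.0, 59.97, 0.0],
    "Units Sold": [4, 3, 0],
})

# Replace zero unit counts with NaN so the division yields NaN instead of inf
units = sample["Units Sold"].where(sample["Units Sold"] != 0)
sample["Revenue per Unit"] = sample["Total Revenue"] / units

print(sample)
```

Whether zero-unit rows should be kept, dropped, or flagged depends on what they represent in the source data.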

### 💾 Step 6: Save Cleaned Dataset

In [None]:
df.to_csv("Cleaned_Online_Sales_Data.csv", index=False)
print("Cleaned dataset saved successfully!")
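
CSV files do not store data types, so the `Date` and `Transaction ID` conversions from Step 3 are lost on reload unless they are requested again. The sketch below writes a made-up frame to a temporary file and reads it back with `parse_dates` and `dtype` to check the round trip. The file name `cleaned_sample.csv` is an illustrative assumption.

```python
import os
import tempfile

import pandas as pd

# Toy cleaned frame (values are made up).
sample = pd.DataFrame({
    "Transaction ID": ["10001", "10002"],
    "Date": pd.to_datetime(["2024-01-01", "2024-01-02"]),
    "Total Revenue": [100.0, 59.97],
})

with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = os.path.join(tmp_dir, "cleaned_sample.csv")
    sample.to_csv(out_path, index=False)

    # Restore the intended types when reading the file back
    reloaded = pd.read_csv(
        out_path,
        parse_dates=["Date"],
        dtype={"Transaction ID": str},
    )

same_shape = reloaded.shape == sample.shape
print("Shape preserved:", same_shape)
print(reloaded.dtypes)
```

The same `parse_dates` and `dtype` arguments can be used when loading `Cleaned_Online_Sales_Data.csv` in a later analysis notebook.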

✅ Cleaning completed. The dataset is now ready for **analysis and visualization**.