# Retail Sales Data Cleaning & EDA
This notebook performs data cleaning and exploratory data analysis on a real retail sales dataset (sample extracted from a public Kaggle dataset).

In [None]:
# Import libraries
import pandas as pd
import matplotlib.pyplot as plt

# Load dataset (sample retail dataset)
url = "https://raw.githubusercontent.com/plotly/datasets/master/retail_sales.csv"
df = pd.read_csv(url)

df.head()

## 1. Understanding the Data

In [None]:
df.info()

## 2. Checking for Missing Values

In [None]:
df.isnull().sum()
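
The count above is easier to judge next to the share of rows it represents. A minimal sketch, assuming a small hypothetical frame (`toy`) with the same two column names as the retail data; the values are made up for illustration only:

```python
import pandas as pd

# Hypothetical toy frame standing in for the retail data (values are illustrative)
toy = pd.DataFrame({
    "Month": ["2020-01", "2020-02", "2020-03", "2020-04"],
    "Retail Sales": [100.0, None, 120.0, 130.0],
})

# Number of missing entries per column
missing_count = toy.isnull().sum()

# Fraction of missing entries per column (mean of a boolean mask)
missing_share = toy.isnull().mean()

# Combine both views into one small summary table
summary = pd.DataFrame({"missing": missing_count, "share": missing_share})
print(summary)
```

The same two calls can be applied to `df` directly; the toy frame is only there so the example runs without a download.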

## 3. Basic Statistics

In [None]:
df.describe()

## 4. Data Cleaning
If missing values existed, we would fill or drop them. This dataset is clean, so the cleaning cell below works on a copy of the data and changes no values:

In [None]:
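# Work on a copy so the original DataFrame stays untouched
df_cleaned = df.copy()
# Drop rows with missing values (a no-op here, since none exist)
df_cleaned = df_cleaned.dropna()
df_cleaned.head()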
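
Since the real data has nothing to clean, the fill-or-drop choice can be sketched on a small hypothetical frame instead. The frame `toy`, its values, and the choice of median and linear interpolation are assumptions made for illustration, not steps taken from the notebook:

```python
import pandas as pd

# Hypothetical toy frame with one missing sales value (values are illustrative)
toy = pd.DataFrame({
    "Month": pd.to_datetime(["2020-01-01", "2020-02-01", "2020-03-01", "2020-04-01"]),
    "Retail Sales": [100.0, None, 120.0, 130.0],
})

# Option 1: drop rows where the sales value is missing
dropped = toy.dropna(subset=["Retail Sales"])

# Option 2: fill the gap with the column median
filled = toy.copy()
filled["Retail Sales"] = filled["Retail Sales"].fillna(filled["Retail Sales"].median())

# Option 3: for time-ordered data, interpolate linearly between neighbours
interpolated = toy.sort_values("Month").copy()
interpolated["Retail Sales"] = interpolated["Retail Sales"].interpolate()

print(dropped, filled, interpolated, sep="\n\n")
```

Dropping is the simplest choice but leaves a gap in the time axis, which is why filling or interpolating is often considered for monthly series.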

## 5. Exploratory Data Analysis (EDA)
### Retail Sales Over Time

In [None]:
plt.figure(figsize=(10,4))
# Parse the dates so the x-axis is a true time axis, not string labels
plt.plot(pd.to_datetime(df['Month']), df['Retail Sales'])
plt.title("Retail Sales Over Time")
plt.xlabel("Month")
plt.ylabel("Retail Sales")
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()

## 6. Monthly Trends

In [None]:
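# Convert the Month column to datetime and extract the calendar month (1-12)
df['Month'] = pd.to_datetime(df['Month'])
df['Month_Num'] = df['Month'].dt.month

# Average sales for each calendar month across all years
monthly_avg = df.groupby('Month_Num')['Retail Sales'].mean()

plt.figure(figsize=(8,4))
plt.plot(monthly_avg.index, monthly_avg.values, marker='o')
plt.title("Average Retail Sales by Month")
plt.xlabel("Month Number")
plt.ylabel("Avg Retail Sales")
plt.xticks(range(1, 13))
plt.tight_layout()
plt.show()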
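
To make the grouping step concrete, here is a minimal sketch on a hypothetical two-year frame. The frame `toy`, its values, and the names `toy_monthly_avg` and `peak_month` are assumptions for illustration and are not part of the dataset:

```python
import pandas as pd

# Hypothetical toy data: two Januaries and two Decembers (values are illustrative)
toy = pd.DataFrame({
    "Month": pd.to_datetime(["2019-01-01", "2019-12-01", "2020-01-01", "2020-12-01"]),
    "Retail Sales": [100.0, 200.0, 110.0, 220.0],
})

# Extract the calendar month so that the same month from different years groups together
toy["Month_Num"] = toy["Month"].dt.month

# Average sales per calendar month across years
toy_monthly_avg = toy.groupby("Month_Num")["Retail Sales"].mean()

# Calendar month with the highest average
peak_month = toy_monthly_avg.idxmax()

print(toy_monthly_avg)
print("Peak month:", peak_month)
```

Grouping on the month number pools every January together, every February together, and so on, which is what lets a seasonal pattern show up independently of the year-to-year trend.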

## 7. Key Findings
- Retail sales fluctuate seasonally, peaking during certain months.
- The dataset has no missing values and is well structured.
- The time-series plot and the monthly averages make trends across months easy to see.

This notebook demonstrates basic data loading, cleaning, analysis, and visualization.