# Retail Sales Analysis
This notebook covers data cleaning, exploratory data analysis (EDA), visualizations and business insights for a retail sales dataset.
Skills: pandas, NumPy, Matplotlib, EDA, customer segmentation, business insights.

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
df = pd.read_csv('../data/retail_sales.csv', parse_dates=['Date'])
df.head()

## Data Cleaning
- Check missing values
- Remove duplicates
- Fill missing Product, Category and Region values with 'Unknown'
- Convert data types


In [None]:
df.info()
df.isnull().sum()

In [None]:
# Simple cleaning: work on a copy so the raw data stays untouched
df_clean = df.copy()
df_clean = df_clean.drop_duplicates()
# Assign the result instead of calling fillna(inplace=True) on a single column;
# under pandas Copy-on-Write that chained call does not update df_clean
df_clean = df_clean.fillna({'Product': 'Unknown', 'Category': 'Unknown', 'Region': 'Unknown'})
df_clean['Total Revenue'] = df_clean['Total Revenue'].astype(float)
df_clean.head()
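The cleaning steps above can also be packaged as reusable functions. This is a minimal sketch, assuming the column names used in this notebook; `missing_report`, `clean_sales` and the sample rows are hypothetical. It swaps `astype(float)` for `pd.to_numeric(..., errors='coerce')`, which turns unparseable revenue strings into NaN instead of raising, so failed conversions can be inspected rather than stopping the notebook.

```python
import pandas as pd


def missing_report(frame: pd.DataFrame) -> pd.DataFrame:
    """Count and share of missing values in each column."""
    counts = frame.isna().sum()
    return pd.DataFrame({'missing': counts, 'share': (counts / len(frame)).round(3)})


def clean_sales(raw: pd.DataFrame) -> pd.DataFrame:
    """Drop duplicate rows, fill missing labels and make revenue numeric."""
    out = raw.drop_duplicates()
    # Assign the result of fillna rather than calling it inplace on one column
    out = out.fillna({'Product': 'Unknown', 'Category': 'Unknown', 'Region': 'Unknown'})
    # errors='coerce' turns unparseable strings into NaN instead of raising
    out['Total Revenue'] = pd.to_numeric(out['Total Revenue'], errors='coerce')
    return out


# Hypothetical rows: one exact duplicate, two missing labels, one bad revenue value
sample = pd.DataFrame({
    'Order ID': [1, 1, 2, 3],
    'Product': ['Pen', 'Pen', None, 'Mug'],
    'Category': ['Office', 'Office', 'Office', None],
    'Region': ['North', 'North', 'South', 'South'],
    'Total Revenue': ['10.5', '10.5', '7', 'n/a'],
})
print(missing_report(sample))
cleaned = clean_sales(sample)
print(cleaned)
```

In the notebook, `clean_sales(df)` would replace the cleaning cell; rows where `Total Revenue` ends up NaN are the ones worth checking by hand.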

## Exploratory Data Analysis
- Monthly revenue trend
- Top products
- Revenue by region


In [None]:
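# Sum revenue per calendar month ('MS' = month start, accepted by old and new pandas)
monthly = df_clean.set_index('Date').resample('MS')['Total Revenue'].sum()
monthly.plot(figsize=(10, 5), title='Monthly Revenue', ylabel='Total Revenue')
plt.show()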
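A minimal sketch of the monthly aggregation as a function, run on a hypothetical three-order sample. It uses the `'MS'` (month start) alias, which sidesteps the rename of `'M'` to `'ME'` in recent pandas releases, and it shows that a month with no orders (February here) is kept with a total of 0 rather than dropped. A dip to zero on the trend line therefore means no recorded sales that month, not a plotting gap. The function name and sample values are illustrative.

```python
import pandas as pd


def monthly_revenue(frame: pd.DataFrame) -> pd.Series:
    """Total revenue per calendar month, labelled by the first day of the month."""
    # resample() emits every month in the range; empty months sum to 0
    return frame.set_index('Date').resample('MS')['Total Revenue'].sum()


# Hypothetical orders: two in January, none in February, one in March
sample = pd.DataFrame({
    'Date': pd.to_datetime(['2023-01-05', '2023-01-20', '2023-03-02']),
    'Total Revenue': [100.0, 50.0, 80.0],
})
monthly = monthly_revenue(sample)
print(monthly)
```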


In [None]:
top_products = df_clean.groupby('Product')['Total Revenue'].sum().nlargest(10)
# barh draws the first row at the bottom; reverse so the top seller is on top
top_products.iloc[::-1].plot(kind='barh', figsize=(8, 6), title='Top 10 Products by Revenue')
plt.xlabel('Total Revenue');
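The EDA list also names revenue by region, which the cells above do not compute. A minimal sketch, assuming the `Region` and `Total Revenue` columns used elsewhere in this notebook; `revenue_by_region` and the sample rows are hypothetical. It reports each region's total alongside its share of overall revenue, which makes "top regions" easier to compare than raw totals alone.

```python
import pandas as pd


def revenue_by_region(frame: pd.DataFrame) -> pd.DataFrame:
    """Total revenue and share of overall revenue per region, largest first."""
    totals = frame.groupby('Region')['Total Revenue'].sum().sort_values(ascending=False)
    # share is aligned on the region index, so the sort order is preserved
    return totals.to_frame('revenue').assign(share=totals / totals.sum())


# Hypothetical rows for illustration
sample = pd.DataFrame({
    'Region': ['North', 'South', 'North', 'East'],
    'Total Revenue': [120.0, 60.0, 80.0, 40.0],
})
result = revenue_by_region(sample)
print(result)
```

On the real data, `revenue_by_region(df_clean)['revenue'].plot(kind='bar', title='Revenue by Region')` would chart the totals.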


## Customer Segmentation (simple RFM-like)
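This section summarizes each customer by most recent purchase date, number of orders and total revenue (a simplified recency, frequency, monetary view) and lists the ten customers with the highest revenue.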

In [None]:
# Per customer: last purchase date, distinct orders (not line items), total revenue
cust = df_clean.groupby('Customer ID').agg(
    last_purchase=('Date', 'max'), orders=('Order ID', 'nunique'), revenue=('Total Revenue', 'sum'))
cust.sort_values('revenue', ascending=False).head(10)
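The customer table ranks on revenue alone. A fuller RFM view scores recency, frequency and monetary value separately and combines them. A minimal sketch, assuming a chosen snapshot date to measure recency from, three quantile bins per dimension (scores 1 to 3, 3 best) and an unweighted sum of the three scores; these choices, `rfm_table` and the sample rows are illustrative rather than taken from this notebook. Ranking with `method='first'` before `pd.qcut` breaks ties so the bin edges stay distinct; the approach needs at least three customers.

```python
import pandas as pd


def rfm_table(frame: pd.DataFrame, snapshot: pd.Timestamp) -> pd.DataFrame:
    """Recency, frequency and monetary value per customer with 1-3 scores."""
    rfm = frame.groupby('Customer ID').agg(
        last_purchase=('Date', 'max'),
        frequency=('Order ID', 'nunique'),
        monetary=('Total Revenue', 'sum'),
    )
    rfm['recency_days'] = (snapshot - rfm['last_purchase']).dt.days

    def score(values: pd.Series, labels: list) -> pd.Series:
        # Tie-broken ranks split evenly into three quantile bins
        return pd.qcut(values.rank(method='first'), 3, labels=labels).astype(int)

    # Fewer days since the last purchase is better, so recency labels are reversed
    rfm['R'] = score(rfm['recency_days'], [3, 2, 1])
    rfm['F'] = score(rfm['frequency'], [1, 2, 3])
    rfm['M'] = score(rfm['monetary'], [1, 2, 3])
    rfm['RFM'] = rfm['R'] + rfm['F'] + rfm['M']
    return rfm.sort_values('RFM', ascending=False)


# Hypothetical orders for three customers
sample = pd.DataFrame({
    'Customer ID': ['A', 'A', 'B', 'C', 'C', 'C'],
    'Order ID': [1, 2, 3, 4, 5, 6],
    'Date': pd.to_datetime(['2023-05-01', '2023-06-20', '2023-01-10',
                            '2023-06-25', '2023-06-28', '2023-06-30']),
    'Total Revenue': [50.0, 70.0, 30.0, 200.0, 150.0, 100.0],
})
rfm = rfm_table(sample, snapshot=pd.Timestamp('2023-07-01'))
print(rfm[['recency_days', 'frequency', 'monetary', 'R', 'F', 'M', 'RFM']])
```

Customers with the highest combined score are candidates for the loyalty offers suggested below; weighting the three scores differently is a business choice, not something the data decides.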

## Insights & Recommendations
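- Focus marketing on the regions and products with the highest total revenue
- Offer loyalty discounts to the customers with the highest total revenue
- Investigate low-performing categories (lowest total revenue) to find the cause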
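The last recommendation can start from a ranked list of the weakest categories. A minimal sketch, assuming "low-performing" means lowest total revenue (margin or growth would need columns this notebook does not use); `weakest_categories` and the sample rows are hypothetical.

```python
import pandas as pd


def weakest_categories(frame: pd.DataFrame, n: int = 3) -> pd.Series:
    """The n categories with the lowest total revenue, smallest first."""
    return frame.groupby('Category')['Total Revenue'].sum().nsmallest(n)


# Hypothetical rows for illustration
sample = pd.DataFrame({
    'Category': ['Toys', 'Books', 'Toys', 'Garden', 'Books', 'Tools'],
    'Total Revenue': [300.0, 40.0, 200.0, 25.0, 35.0, 90.0],
})
result = weakest_categories(sample, n=2)
print(result)
```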