# Exploratory Data Analysis (EDA) on Retail Sales Data

This project is part of the Oasis Infobyte Data Analytics internship.

### Objective
Perform exploratory data analysis (EDA) on a retail sales dataset to uncover patterns, trends, and insights that can help the retail business make informed decisions.

### Steps Covered
1. Data Loading & Cleaning
2. Descriptive Statistics
3. Time Series Analysis
4. Customer & Product Analysis
5. Visualization
6. Recommendations


In [None]:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline


In [None]:

# Load dataset
df = pd.read_csv("retail_sales_dataset.csv")
df.head()


In [None]:

# Strip leading/trailing whitespace from text columns
for col in df.select_dtypes(include=['object']).columns:
    df[col] = df[col].str.strip()

# Convert the date column; values that cannot be parsed become NaT
date_col = "Date"
df[date_col] = pd.to_datetime(df[date_col], errors='coerce')
df['Year'] = df[date_col].dt.year
df['Month'] = df[date_col].dt.month
df['YearMonth'] = df[date_col].dt.to_period('M')

# Create Sales column if not present
if 'Sales' not in df.columns:
    df['Sales'] = df['Quantity'] * df['Price per Unit']

df.info()


In [None]:

# Summary statistics
df.describe(include='all')


In [None]:

plt.figure(figsize=(8,5))
plt.hist(df['Sales'].dropna(), bins=40)
plt.title("Sales Distribution")
plt.xlabel("Sales")
plt.ylabel("Frequency")
plt.show()


In [None]:

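# Drop rows whose date failed to parse (NaT), then total sales per calendar month.
# 'MS' labels each bin by month start; the older 'M' alias is deprecated in pandas 2.2+.
monthly_sales = df.dropna(subset=[date_col]).set_index(date_col)['Sales'].resample('MS').sum()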
plt.figure(figsize=(12,6))
plt.plot(monthly_sales.index, monthly_sales.values, marker='o')
plt.title("Monthly Sales Over Time")
plt.xlabel("Date")
plt.ylabel("Total Sales")
plt.grid(True)
plt.show()


In [None]:

# Total sales per product category (the data is grouped by category, not by individual product)
top_categories = df.groupby('Product Category')['Sales'].sum().sort_values(ascending=False).head(10)
plt.figure(figsize=(10,6))
sns.barplot(x=top_categories.index, y=top_categories.values)
plt.title("Top Product Categories by Sales")
plt.xticks(rotation=45)
plt.xlabel("Product Category")
plt.ylabel("Sales")
plt.show()
top_categories


In [None]:

plt.figure(figsize=(8,6))
sns.heatmap(df.corr(numeric_only=True), annot=True, cmap="coolwarm")
plt.title("Correlation Heatmap")
plt.show()



### Recommendations
1. **Stock & Promotions**: Prioritize stock and promotions for the top-selling categories (Electronics, Clothing, Beauty).
2. **Seasonality Planning**: Plan inventory and marketing around the months with the highest sales.
3. **Customer Targeting**: Identify high-value customers and target them with loyalty programs.
4. **Data Quality**: Keep column formats consistent and handle missing values explicitly.
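
The seasonality and customer-targeting recommendations above could be backed by a small amount of extra analysis. The following is a minimal sketch, not part of the original notebook: the helper names `month_seasonality` and `top_customers` are hypothetical, and the `Customer ID` column name is an assumption about the dataset (only `Date` and `Sales` appear in the cells above). The `demo` frame is made-up illustration data, not a result from the real dataset.

```python
import pandas as pd


def month_seasonality(df, date_col="Date", sales_col="Sales"):
    """Rank calendar months by average monthly sales relative to the overall mean.

    A value above 1.0 means the month tends to sell more than a typical month.
    """
    dated = df.dropna(subset=[date_col, sales_col])
    year = dated[date_col].dt.year.rename("Year")
    month = dated[date_col].dt.month.rename("Month")
    # Total sales for each (year, month) pair
    monthly = dated.groupby([year, month])[sales_col].sum()
    # Average those totals across years for each calendar month
    by_month = monthly.groupby(level="Month").mean()
    return (by_month / by_month.mean()).sort_values(ascending=False)


def top_customers(df, customer_col="Customer ID", sales_col="Sales", share=0.2):
    """Return the top `share` fraction of customers by total spend.

    `customer_col` is an assumed column name; adjust it to match the dataset.
    """
    spend = df.groupby(customer_col)[sales_col].sum().sort_values(ascending=False)
    n = max(1, int(round(len(spend) * share)))
    return spend.head(n)


# Made-up illustration data with the assumed column names
demo = pd.DataFrame({
    "Date": pd.to_datetime(["2023-01-05", "2023-01-20", "2023-02-10", "2023-03-15"]),
    "Customer ID": ["C1", "C2", "C1", "C3"],
    "Sales": [100.0, 300.0, 200.0, 600.0],
})

print(month_seasonality(demo))
print(top_customers(demo))
```

On the real data, the same calls would be `month_seasonality(df)` and `top_customers(df)`, provided the customer column exists under the assumed name. The 20% cutoff in `share` is an arbitrary starting point rather than a recommendation from the analysis.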
