# EDA Example: Retail Sales Analysis

In this notebook, we simulate a simple retail sales dataset and perform exploratory data analysis (EDA). 

We'll cover:

- Data simulation and loading
- Data inspection using Pandas
- Visualization with Matplotlib and Seaborn
- Identifying trends and potential outliers

Let's get started!

In [None]:
# Import libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Configure visualization style
sns.set_theme(style='whitegrid')  # set_theme is the preferred name for sns.set in seaborn >= 0.11
%matplotlib inline

## Simulating a Retail Sales Dataset

We'll simulate a dataset with the following columns:

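- **Date:** The date of the sale (drawn from 100 consecutive days starting 2023-01-01)
- **Store_ID:** An integer store identifier (1 to 5)
- **Product_Category:** The product category sold (Electronics, Clothing, Grocery, or Home)
- **Units_Sold:** Number of units sold (1 to 19)
- **Revenue:** Units sold multiplied by a fixed average price for the category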

This dataset will allow us to explore daily sales trends, category performance, and store comparisons.

In [None]:
# Set random seed for reproducibility
np.random.seed(42)

# Generate date range
date_range = pd.date_range(start='2023-01-01', periods=100, freq='D')

# Simulate data
data = {
    'Date': np.random.choice(date_range, size=500),
    'Store_ID': np.random.randint(1, 6, size=500),
    'Product_Category': np.random.choice(['Electronics', 'Clothing', 'Grocery', 'Home'], size=500),
    'Units_Sold': np.random.randint(1, 20, size=500)
}

# Revenue = Units_Sold * a fixed average price per category (no price variation is simulated)
avg_price = {'Electronics': 300, 'Clothing': 50, 'Grocery': 10, 'Home': 80}
data['Revenue'] = [units * avg_price[cat] for units, cat in zip(data['Units_Sold'], data['Product_Category'])]

# Create DataFrame
df = pd.DataFrame(data)

# Show first few rows
df.head()
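The list comprehension above computes revenue row by row. A minimal sketch of a vectorized alternative, assuming the same `avg_price` dictionary, uses `Series.map` to look up each row's price and multiplies element-wise. The toy frame below is illustrative only; in the notebook the equivalent line would be `df['Revenue'] = df['Product_Category'].map(avg_price) * df['Units_Sold']`.

```python
import pandas as pd

# Same per-category price assumption as the simulation cell
avg_price = {'Electronics': 300, 'Clothing': 50, 'Grocery': 10, 'Home': 80}

# Small toy frame with the same column names as the simulated data (illustrative values)
toy = pd.DataFrame({
    'Product_Category': ['Electronics', 'Grocery', 'Home', 'Clothing'],
    'Units_Sold': [2, 10, 3, 4],
})

# Vectorized: map each category to its price, then multiply by units element-wise
toy['Revenue'] = toy['Product_Category'].map(avg_price) * toy['Units_Sold']
print(toy)
```

One design consequence: a category missing from `avg_price` yields `NaN` revenue rather than a `KeyError`, so a quick `isna()` check afterwards is worthwhile.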

## Inspecting the Data

Let's inspect the dataset's shape, column types, and summary statistics. The non-null counts reported by `df.info()` also show whether any column has missing values.

In [None]:
# Basic info
print('DataFrame Shape:', df.shape)
print('\nDataFrame Info:')
df.info()

# Descriptive statistics
df.describe()
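`info()` and `describe()` give an overview, but a few explicit checks make anomalies easier to spot. The sketch below, run on a hypothetical toy frame with one deliberately missing value and one duplicated row, counts missing values per column, counts fully duplicated rows, and lists calendar days with no rows. The last check is relevant here because `Date` is drawn at random, so some of the 100 simulated days may receive no sales at all.

```python
import pandas as pd

# Toy frame (illustrative only): row 2 duplicates row 1, and row 3 lacks a category
toy = pd.DataFrame({
    'Date': pd.to_datetime(['2023-01-01', '2023-01-02', '2023-01-02', '2023-01-05']),
    'Store_ID': [1, 2, 2, 3],
    'Product_Category': ['Home', 'Grocery', 'Grocery', None],
    'Units_Sold': [3, 5, 5, 2],
})

# Missing values per column
missing = toy.isna().sum()

# Rows identical in every column to an earlier row
n_duplicates = int(toy.duplicated().sum())

# Days inside the observed date range that have no rows
all_days = pd.date_range(toy['Date'].min(), toy['Date'].max(), freq='D')
days_without_sales = all_days.difference(pd.DatetimeIndex(toy['Date']))

print(missing)
print('Duplicated rows:', n_duplicates)
print('Days without sales:', list(days_without_sales.date))
```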

## Visualizing the Data

### 1. Sales Trend Over Time

We'll sum revenue for each date and plot the daily totals to look for trends over time.

In [None]:
# Aggregate revenue by date
daily_revenue = df.groupby('Date')['Revenue'].sum().reset_index()

# Plot daily revenue trend
plt.figure(figsize=(12, 6))
plt.plot(daily_revenue['Date'], daily_revenue['Revenue'], marker='o', linestyle='-')
plt.title('Daily Revenue Trend')
plt.xlabel('Date')
plt.ylabel('Total Revenue')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()
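Daily totals from random data are noisy, so a rolling mean can make any trend easier to see. The sketch below is an assumption-laden illustration on toy values: it first reindexes to a complete daily calendar with `asfreq('D', fill_value=0)`, so a day without sales counts as zero revenue instead of being silently skipped by the line plot, and then takes a trailing 3-day mean. On the notebook's `daily_revenue`, a 7-day window (`window=7`) would be a natural choice.

```python
import pandas as pd

# Toy daily totals (illustrative values only); 2023-01-03 has no sales at all
daily_revenue_toy = pd.DataFrame({
    'Date': pd.to_datetime(['2023-01-01', '2023-01-02', '2023-01-04', '2023-01-05']),
    'Revenue': [100, 200, 300, 400],
})

# Move Date to the index and reindex to every calendar day;
# days with no rows are filled with 0 revenue
revenue = daily_revenue_toy.set_index('Date')['Revenue'].asfreq('D', fill_value=0)

# Trailing 3-day mean; min_periods=1 gives values for the first days instead of NaN
smoothed = revenue.rolling(window=3, min_periods=1).mean()
print(smoothed)
```

Filling gaps with zero fits this simulation, where a missing date genuinely means no sales; for real data with reporting gaps, leaving them as `NaN` may be more honest.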

### 2. Product Category Performance

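Let's compare the total revenue generated by each product category. Because revenue is simulated as units sold times a fixed price per category, differences here will mostly reflect the prices in `avg_price` rather than differences in demand.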

In [None]:
# Aggregate revenue by product category
category_revenue = df.groupby('Product_Category')['Revenue'].sum().reset_index()

# Bar plot of revenue by product category
plt.figure(figsize=(8, 6))
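# hue + legend=False avoids the "palette without hue" deprecation warning (needs seaborn >= 0.13)
sns.barplot(x='Product_Category', y='Revenue', hue='Product_Category', data=category_revenue, palette='viridis', legend=False)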
plt.title('Revenue by Product Category')
plt.xlabel('Product Category')
plt.ylabel('Total Revenue')
plt.show()
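Totals alone can hide how revenue is composed. A minimal sketch, on a hypothetical toy frame with the notebook's column names, adds each category's share of total revenue and its revenue per unit using pandas named aggregation. With this simulation, revenue per unit should simply recover the prices in `avg_price`, which is a quick way to confirm that the category ranking is driven by the price assumption.

```python
import pandas as pd

# Same per-category price assumption as the simulation cell
avg_price = {'Electronics': 300, 'Clothing': 50, 'Grocery': 10, 'Home': 80}

# Toy transactions (illustrative values only)
toy = pd.DataFrame({
    'Product_Category': ['Electronics', 'Grocery', 'Home', 'Clothing', 'Grocery'],
    'Units_Sold': [2, 10, 5, 6, 10],
})
toy['Revenue'] = toy['Product_Category'].map(avg_price) * toy['Units_Sold']

# Named aggregation: one row per category with summed units and revenue
summary = toy.groupby('Product_Category').agg(
    Units_Sold=('Units_Sold', 'sum'),
    Revenue=('Revenue', 'sum'),
)

# Share of total revenue and average revenue per unit sold
summary['Revenue_Share'] = summary['Revenue'] / summary['Revenue'].sum()
summary['Revenue_per_Unit'] = summary['Revenue'] / summary['Units_Sold']

# Largest categories first
summary = summary.sort_values('Revenue', ascending=False)
print(summary)
```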

### 3. Store Performance

We can also compare how different stores are performing in terms of total units sold and revenue.

In [None]:
# Aggregate units sold and revenue by store
store_stats = df.groupby('Store_ID').agg({'Units_Sold': 'sum', 'Revenue': 'sum'}).reset_index()
print(store_stats)

# Visualize store revenue
plt.figure(figsize=(8, 6))
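# hue + legend=False avoids the "palette without hue" deprecation warning (needs seaborn >= 0.13)
sns.barplot(x='Store_ID', y='Revenue', hue='Store_ID', data=store_stats, palette='coolwarm', legend=False)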
plt.title('Revenue by Store')
plt.xlabel('Store ID')
plt.ylabel('Total Revenue')
plt.show()
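The introduction lists potential outliers among the goals. One common convention, assumed here rather than prescribed by this notebook, is Tukey's rule: flag values below Q1 - 1.5 * IQR or above Q3 + 1.5 * IQR, where IQR = Q3 - Q1. The sketch applies it to a toy `Revenue` column with one obviously extreme value.

```python
import pandas as pd

# Toy revenue values (illustrative only); 2000 is the planted extreme value
toy = pd.DataFrame({'Revenue': [100, 120, 110, 130, 105, 115, 2000]})

# Quartiles with pandas' default linear interpolation
q1 = toy['Revenue'].quantile(0.25)
q3 = toy['Revenue'].quantile(0.75)
iqr = q3 - q1

# Tukey fences: 1.5 * IQR beyond the quartiles
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Flag rows outside the fences (between() is inclusive at both ends by default)
toy['Is_Outlier'] = ~toy['Revenue'].between(lower, upper)
print(toy[toy['Is_Outlier']])
```

In the simulated data, revenue depends strongly on category, so a single global fence would likely flag mainly large Electronics sales. Computing the quartiles per category, for example with `df.groupby('Product_Category')['Revenue'].transform(...)`, may give more meaningful flags.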

## Conclusion

In this notebook, we simulated a retail sales dataset and performed some key exploratory data analysis steps:

- **Data Inspection:** We looked at the data structure and summary statistics.
- **Visualization:** We visualized trends over time, product category performance, and store comparisons.

These techniques form the foundation of EDA, helping us gain insights and prepare our data for further analysis or modeling.

Feel free to extend this analysis with more detailed questions or additional visualizations!