# Walmart Sales – EDA & Visualizing Sales Trends

This notebook explores the **Walmart weekly sales dataset** and builds visualizations
to understand sales trends and external drivers like holidays, temperature,
fuel price, CPI and unemployment.

Dataset columns:

- `Store`: Store ID (1–45)
- `Date`: Week end date
- `Weekly_Sales`: Weekly net sales for the store
- `Holiday_Flag`: 1 if the week includes a major holiday, else 0
- `Temperature`: Average temperature in the region (°F)
- `Fuel_Price`: Fuel price in the region
- `CPI`: Consumer Price Index
- `Unemployment`: Unemployment rate

We will:
1. Clean & prepare the data
2. Analyse overall and store‑level sales trends
3. Compare holiday vs non‑holiday performance
4. Study relationships between sales and macro variables
5. Build a few dashboard‑style summary charts


In [1]:
import pandas as pd
import numpy as np

import matplotlib.pyplot as plt
import seaborn as sns

import plotly.express as px

# Default figure size for matplotlib/seaborn charts
plt.rcParams['figure.figsize'] = (10, 5)

# Load cleaned data (generated in the repo)
df = pd.read_csv('Walmart_cleaned.csv', parse_dates=['Date'])

df.head()

## 1. Data preparation & basic checks

In [None]:
df.info()

df.describe().T

### Missing values check

In [None]:
df.isna().sum()
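
Beyond missing values, it can help to confirm that each `(Store, Date)` pair appears only once and that no sales figures are negative. The following is a minimal sketch, not part of the original notebook: `basic_checks` is a hypothetical helper, and `demo` is a small synthetic frame with made-up values standing in for `df`.

```python
import pandas as pd


def basic_checks(frame: pd.DataFrame) -> dict:
    """Return a few sanity-check counts for a Store/Date/Weekly_Sales frame."""
    return {
        # Total number of missing cells across all columns
        "missing_cells": int(frame.isna().sum().sum()),
        # Rows that repeat an earlier (Store, Date) combination
        "duplicate_store_weeks": int(frame.duplicated(subset=["Store", "Date"]).sum()),
        # Rows with negative sales, which would need investigation
        "negative_sales": int((frame["Weekly_Sales"] < 0).sum()),
    }


# Synthetic stand-in for df (values are made up for illustration)
demo = pd.DataFrame({
    "Store": [1, 1, 1, 2],
    "Date": pd.to_datetime(["2010-02-05", "2010-02-12", "2010-02-12", "2010-02-05"]),
    "Weekly_Sales": [100.0, None, 120.0, -5.0],
})

checks = basic_checks(demo)
print(checks)
```

On the real data, `basic_checks(df)` should ideally return zeros for all three counts; any non-zero value points at rows worth inspecting before plotting.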

### Time features & sorting (already present in the cleaned data; re-created here in case they are missing)

In [None]:
df['Year'] = df['Date'].dt.year
df['Month'] = df['Date'].dt.month
df['Week'] = df['Date'].dt.isocalendar().week.astype(int)
df['Is_Holiday'] = df['Holiday_Flag'].astype(bool)

df.sort_values(['Store', 'Date'], inplace=True)
df.head()

## 2. Overall sales trend over time

In [None]:
weekly_trend = df.groupby('Date', as_index=False)['Weekly_Sales'].sum()

fig = px.line(weekly_trend, x='Date', y='Weekly_Sales',
              title='Total Weekly Sales Over Time')
fig.show()
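
Weekly totals tend to be noisy, so a rolling average can make the underlying trend easier to read. The sketch below assumes a 4-week window, which is an arbitrary choice, and uses a synthetic `demo_trend` frame with made-up values in place of `weekly_trend`.

```python
import pandas as pd

# Synthetic stand-in for weekly_trend (values are made up for illustration)
dates = pd.date_range("2010-02-05", periods=8, freq="W-FRI")
demo_trend = pd.DataFrame({
    "Date": dates,
    "Weekly_Sales": [10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0],
})

# 4-week rolling mean; the first 3 rows are NaN because the window is incomplete
demo_trend["Sales_4wk_avg"] = demo_trend["Weekly_Sales"].rolling(window=4).mean()
print(demo_trend)
```

The same column could be added to `weekly_trend` and plotted alongside the raw series, for example by passing `y=['Weekly_Sales', 'Sales_4wk_avg']` to `px.line`.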

## 3. Store performance – which stores sell more?

In [None]:
store_sales = df.groupby('Store', as_index=False)['Weekly_Sales'].mean()

plt.figure()
sns.barplot(data=store_sales, x='Store', y='Weekly_Sales')
plt.xticks(rotation=90)
plt.title('Average Weekly Sales by Store')
plt.tight_layout()
plt.show()

## 4. Holiday vs non-holiday weeks

In [None]:
holiday_sales = df.groupby('Is_Holiday', as_index=False)['Weekly_Sales'].mean()
holiday_sales['Is_Holiday'] = holiday_sales['Is_Holiday'].map({True: 'Holiday week', False: 'Non-holiday week'})

plt.figure()
sns.barplot(data=holiday_sales, x='Is_Holiday', y='Weekly_Sales')
plt.title('Average Weekly Sales: Holiday vs Non-Holiday Weeks')
plt.show()

# Distribution of weekly sales for holiday vs non-holiday weeks
plt.figure()
sns.boxplot(data=df, x='Is_Holiday', y='Weekly_Sales')
plt.title('Sales Distribution by Holiday Flag')
plt.show()
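
The bar chart compares means, but a single number for the holiday uplift is often easier to quote in a report. Here is a minimal sketch using a synthetic `demo` frame (made-up values, standing in for `df`). It reports a descriptive percentage difference only and does not test statistical significance.

```python
import pandas as pd

# Synthetic stand-in for df (values are made up for illustration)
demo = pd.DataFrame({
    "Is_Holiday": [False, False, False, True, True],
    "Weekly_Sales": [100.0, 110.0, 90.0, 120.0, 130.0],
})

# Mean weekly sales for each group, selected with boolean masks
holiday_mean = demo.loc[demo["Is_Holiday"], "Weekly_Sales"].mean()
regular_mean = demo.loc[~demo["Is_Holiday"], "Weekly_Sales"].mean()

# Percentage difference of the holiday mean relative to the non-holiday mean
uplift_pct = (holiday_mean / regular_mean - 1) * 100
print(f"Holiday uplift in mean weekly sales: {uplift_pct:.1f}%")
```

Because holiday weeks are a small share of all weeks, the uplift can be sensitive to a few extreme weeks; the boxplot above is a useful companion to this number.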

## 5. Seasonality – sales by month and year

In [None]:
month_year_sales = df.groupby(['Year', 'Month'], as_index=False)['Weekly_Sales'].sum()
month_year_sales['YearMonth'] = pd.to_datetime(month_year_sales['Year'].astype(str) + '-' + month_year_sales['Month'].astype(str) + '-01')

fig = px.line(month_year_sales, x='YearMonth', y='Weekly_Sales',
              color='Year',
              title='Monthly Total Sales Trend')
fig.show()

plt.figure()
month_avg = df.groupby('Month', as_index=False)['Weekly_Sales'].mean()
sns.barplot(data=month_avg, x='Month', y='Weekly_Sales')
plt.title('Average Weekly Sales by Month (Seasonality)')
plt.show()

## 6. Correlation between numeric variables

In [None]:
num_cols = ['Weekly_Sales', 'Temperature', 'Fuel_Price', 'CPI', 'Unemployment']
corr = df[num_cols].corr()

plt.figure()
sns.heatmap(corr, annot=True, fmt='.2f')
plt.title('Correlation Heatmap – Sales vs Drivers')
plt.show()
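
A pooled correlation mixes two things: differences between stores (large vs small) and week-to-week changes within a store. One way to separate them, sketched here as an optional extra rather than part of the original analysis, is to subtract each store's own mean before correlating. The `demo` frame is synthetic, with values chosen so the two views disagree.

```python
import pandas as pd

# Synthetic stand-in for df (values are made up for illustration)
demo = pd.DataFrame({
    "Store": [1, 1, 1, 2, 2, 2],
    "Weekly_Sales": [100.0, 110.0, 120.0, 10.0, 11.0, 12.0],
    "Unemployment": [5.0, 6.0, 7.0, 8.0, 9.0, 10.0],
})
cols = ["Weekly_Sales", "Unemployment"]

# Pooled correlation across all rows, as in the heatmap
pooled = demo[cols].corr().loc["Weekly_Sales", "Unemployment"]

# Subtract each store's own mean so only within-store variation remains
centered = demo[cols] - demo.groupby("Store")[cols].transform("mean")
within = centered.corr().loc["Weekly_Sales", "Unemployment"]

print(f"Pooled correlation: {pooled:.2f}")
print(f"Within-store correlation: {within:.2f}")
```

In this toy example the pooled correlation is negative while the within-store correlation is positive, which shows why the heatmap alone should be read with caution.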

## 7. Relationship between sales and external drivers

In [None]:
# Note: trendline='ols' requires the statsmodels package to be installed
fig1 = px.scatter(df, x='Temperature', y='Weekly_Sales',
                 trendline='ols',
                 title='Weekly Sales vs Temperature')
fig1.show()

fig2 = px.scatter(df, x='Fuel_Price', y='Weekly_Sales',
                 trendline='ols',
                 title='Weekly Sales vs Fuel Price')
fig2.show()

fig3 = px.scatter(df, x='CPI', y='Weekly_Sales',
                 trendline='ols',
                 title='Weekly Sales vs CPI')
fig3.show()

fig4 = px.scatter(df, x='Unemployment', y='Weekly_Sales',
                 trendline='ols',
                 title='Weekly Sales vs Unemployment')
fig4.show()

## 8. Simple dashboard-style summary view

In [None]:
# KPI-style aggregates
total_sales = df['Weekly_Sales'].sum()
avg_weekly_sales = df['Weekly_Sales'].mean()
store_totals = df.groupby('Store')['Weekly_Sales'].sum()  # compute once, reuse below
best_store = store_totals.idxmax()
best_store_sales = store_totals.max()

print(f"Total sales in dataset: ${total_sales:,.0f}")
print(f"Average weekly sales per record: ${avg_weekly_sales:,.0f}")
print(f"Best-performing store: Store {best_store} (total sales = ${best_store_sales:,.0f})")

# Interactive dashboard-like plot: trend + store filter
fig = px.line(df, x='Date', y='Weekly_Sales', color='Store',
              title='Weekly Sales Over Time by Store')
fig.update_layout(legend_title_text='Store ID')
fig.show()
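
A single best store is a thin KPI; a short ranking with each store's share of total sales can say more on a dashboard. The sketch below uses a synthetic `demo` frame (made-up values) and a hypothetical cut-off of the top 2 stores.

```python
import pandas as pd

# Synthetic stand-in for df (values are made up for illustration)
demo = pd.DataFrame({
    "Store": [1, 1, 2, 2, 3, 3],
    "Weekly_Sales": [100.0, 100.0, 300.0, 200.0, 150.0, 150.0],
})

# Total sales per store, largest first
ranking = (
    demo.groupby("Store")["Weekly_Sales"].sum()
    .sort_values(ascending=False)
    .to_frame("Total_Sales")
)

# Each store's share of overall sales, in percent
ranking["Share_pct"] = ranking["Total_Sales"] / ranking["Total_Sales"].sum() * 100

top_n = ranking.head(2)  # hypothetical cut-off; adjust as needed
print(top_n)
```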

## 9. Key Insights (to summarise in README / report)

Use the charts above to answer questions like:

- Are sales generally increasing, decreasing, or flat over time?
- Which stores consistently outperform others?
- Do holiday weeks have significantly higher sales?
- Which months show peak demand (seasonality)?
- Are sales strongly correlated with fuel price, CPI or unemployment?
- What main messages would you show on a business dashboard?

You can copy the bullet points you observe here into your GitHub README or insight report.
