# Sales Trend Analysis

This analysis focuses on understanding how sales have evolved over time to identify patterns, trends, and seasonality in customer purchases. These insights support better business decisions in marketing, inventory, and forecasting.

## What We Will Explore:

### 1. Time-Based Sales Aggregation

- **Monthly Sales:** total revenue per month.
- **Daily Sales:** total revenue per day.

### 2. Smoothing & Trend Detection

- **3-Month Moving Average:** averages sales over a rolling three-month window to smooth short-term fluctuations and highlight the overall sales trend.
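A minimal sketch of the moving average in pandas, using made-up monthly totals (the `monthly` frame and its values are illustrative, not taken from the dataset). `Series.rolling(window=3).mean()` computes a trailing window, so each value averages the current month and the two before it; the first two months have too few observations and come out as NaN.

```python
import pandas as pd

# Hypothetical monthly totals (illustrative values, not from the dataset)
monthly = pd.DataFrame({
    "YearMonth": ["2011-01", "2011-02", "2011-03", "2011-04", "2011-05"],
    "Sales": [100.0, 200.0, 300.0, 400.0, 500.0],
})

# Trailing 3-month window: each value averages the current month and the two before it.
# The first two months have fewer than 3 observations, so they are NaN by default.
monthly["MA_3"] = monthly["Sales"].rolling(window=3).mean()
print(monthly)
```

Passing `min_periods=1` would fill the first months with partial averages instead, and `center=True` would centre the window rather than trail it. The window counts rows, not calendar months, so the input should be sorted chronologically with no missing months.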

### 3. Visualizations

To examine sales trends over time, we will use:

- **Matplotlib:** a line chart of monthly sales to show the overall trend.
- **Matplotlib:** a line chart with a 3-month moving average overlay to visualize the smoothed trend.
- **Plotly:** an interactive line chart of daily sales to detect short-term fluctuations and anomalies.
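Both Matplotlib charts can be sketched in one figure: the raw monthly line plus the moving-average overlay. This assumes a frame shaped like `monthly_sales` (columns `YearMonth` and `Sales`); the values below are placeholders, and the Agg backend is selected only so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, only so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Placeholder monthly totals with the same columns as monthly_sales (values are hypothetical)
monthly = pd.DataFrame({
    "YearMonth": ["2011-01", "2011-02", "2011-03", "2011-04", "2011-05", "2011-06"],
    "Sales": [120.0, 90.0, 150.0, 130.0, 170.0, 160.0],
})
monthly["MA_3"] = monthly["Sales"].rolling(window=3).mean()

fig, ax = plt.subplots(figsize=(10, 4))
# Raw monthly totals
ax.plot(monthly["YearMonth"], monthly["Sales"], marker="o", label="Monthly sales")
# Smoothed trend; the NaN values for the first two months are simply not drawn
ax.plot(monthly["YearMonth"], monthly["MA_3"], linestyle="--", label="3-month moving average")
ax.set_xlabel("Month")
ax.set_ylabel("Sales")
ax.set_title("Monthly sales with 3-month moving average")
ax.legend()
fig.autofmt_xdate()  # rotate the month labels so they do not overlap
```

In the notebook, `plt.show()` would render the figure inline. For the interactive daily chart, `px.line(daily_sales, x='Date', y='TotalSales')` is the Plotly Express counterpart.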

In [1]:
# numpy: numerical computations and array operations
import numpy as np

# pandas: data manipulation and handling tabular data
import pandas as pd

# matplotlib: static line charts of monthly sales
import matplotlib.pyplot as plt

# seaborn: statistical visualizations built on Matplotlib
import seaborn as sns

# plotly.express: interactive, dynamic plots
import plotly.express as px


In [6]:
# Load the cleaned dataset
# parse_dates converts the InvoiceDate strings into datetime64 values while reading the CSV
df = pd.read_csv('../data/cleaned/cleaned_online_retail.csv', parse_dates=['InvoiceDate'])

# Preview the first five rows
df.head()

       InvoiceNo StockCode                          ProductName  Quantity  \
0         536365    85123A   white hanging heart t-light holder         6   
1         536365     71053                  white metal lantern         6   
2         536365    84406B       cream cupid hearts coat hanger         8   
3         536365    84029G  knitted union flag hot water bottle         6   
4         536365    84029E       red woolly hottie white heart.         6   

               InvoiceDate  UnitPrice  Custom

In [8]:
# Convert 'InvoiceNo' from object (string) to a numeric type
# errors='coerce' turns values that cannot be parsed into NaN instead of raising an error
# .astype('Int64') gives a nullable integer dtype, so missing values are kept as <NA>
df['InvoiceNo'] = pd.to_numeric(df['InvoiceNo'], errors='coerce').astype('Int64')
# Validate the conversion
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 524878 entries, 0 to 524877
Data columns (total 14 columns):
 #   Column       Non-Null Count   Dtype         
---  ------       --------------   -----         
 0   InvoiceNo    524877 non-null  Int64         
 1   StockCode    524878 non-null  object        
 2   ProductName  524878 non-null  object        
 3   Quantity     524878 non-null  int64         
 4   InvoiceDate  524878 non-null  datetime64[ns]
 5   UnitPrice    524878 non-null  float64       
 6   CustomerID   524878 non-null  int64         
 7   Country      524878 non-null  object        
 8   TotalPrice   524878 non-null  float64       
 9   Year         524878 non-null  int64         
 10  Month        524878 non-null  int64         
 11  Day          524878 non-null  int64         
 12  Hour         524878 non-null  int64         
 13  Weekday      524878 non-null  object        
dtypes: Int64(1), datetime64[ns](1), float64(2), int64(6), object(4)
memory usage: 56.6+ MB
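`df.info()` reports 524,877 non-null `InvoiceNo` values out of 524,878 rows, so exactly one invoice number failed to parse and became `<NA>`. A minimal sketch of how to see which raw values were coerced, assuming the original string column is still available (for example, kept as a copy before the conversion cell runs); the values in `raw`, including `"C536379"`, are hypothetical.

```python
import pandas as pd

# Hypothetical raw invoice numbers; "C536379" stands in for a value that is not purely numeric
raw = pd.Series(["536365", "536366", "C536379", "536367"], name="InvoiceNo")

# Same conversion as in the notebook: unparseable values become NaN, then <NA> in Int64
converted = pd.to_numeric(raw, errors="coerce").astype("Int64")

# Rows whose original value could not be parsed are exactly the <NA> rows after coercion
failed = raw[converted.isna()]
print(failed)
```

Inspecting these rows before dropping or keeping them helps decide whether the coerced value is noise or a meaningful record type.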

In [9]:
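# Sales value of each line item (row): Quantity * UnitPrice
# Note: df.info() above also lists a TotalPrice column, which may already hold the same value
df['Sales'] = df['Quantity'] * df['UnitPrice']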


### Creating a Combined YearMonth Feature

Since the dataset contains separate **Year** and **Month** columns as integers, we convert them to strings before combining. 

- Converting to strings lets us concatenate Year and Month into a single `YearMonth` string such as `"2023-03"`.
- The **Month** is zero-padded (e.g., `"03"` instead of `"3"`) so that string sorting matches chronological order; without padding, `"2011-10"` would sort before `"2011-3"`.
- Without the string conversion, adding the two integer columns would perform numeric addition (e.g., 2011 + 3 = 2014) rather than produce a combined label.

This new `YearMonth` feature simplifies grouping sales data by month for trend analysis.


In [12]:
# Year and Month are stored as integers, so convert them to strings before combining.
# Temporary variables keep the original Year and Month columns numeric.
year_str = df['Year'].astype(str)
month_str = df['Month'].astype(str).str.zfill(2)  # pad single-digit months with a leading zero (e.g., '03')

# Combine Year and Month into a YearMonth string like '2011-03'
df['YearMonth'] = year_str + '-' + month_str
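The two pitfalls described above, numeric addition and unpadded months, can be checked on a small hypothetical frame (`toy`). The last lines show a different technique, `Series.dt.to_period('M')`, which derives the same labels directly from a datetime column; it is included for comparison only.

```python
import pandas as pd

# Toy frame with integer Year/Month columns, mirroring the notebook's schema (values are illustrative)
toy = pd.DataFrame({"Year": [2010, 2011, 2011], "Month": [12, 3, 10]})

# Numeric addition produces a number, not a label
print((toy["Year"] + toy["Month"]).tolist())  # [2022, 2014, 2021]

# Without zero-padding, string order breaks chronology: October sorts before March
print(sorted(["2011-3", "2011-10"]))  # ['2011-10', '2011-3']

# Zero-padded 'YYYY-MM' labels sort in chronological order
toy["YearMonth"] = toy["Year"].astype(str) + "-" + toy["Month"].astype(str).str.zfill(2)
print(toy["YearMonth"].tolist())  # ['2010-12', '2011-03', '2011-10']

# Alternative technique (not used in the notebook): derive a monthly Period from a datetime
dates = pd.Series(pd.to_datetime(["2010-12-01", "2011-03-15", "2011-10-31"]))
print(dates.dt.to_period("M").astype(str).tolist())  # ['2010-12', '2011-03', '2011-10']
```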


### Aggregating Sales by Month

Using the combined `YearMonth` feature, we group the dataset to calculate total sales per month. This aggregation helps analyze monthly sales trends, identify seasonality, and monitor growth or decline over time.

Grouping by `YearMonth` and summing the `Sales` column provides a clear view of monthly revenue performance.


In [13]:
# Aggregate total sales by YearMonth
monthly_sales = df.groupby('YearMonth')['Sales'].sum().reset_index()

# Optional: Sort by YearMonth to ensure chronological order
monthly_sales = monthly_sales.sort_values('YearMonth')

# Display the aggregated monthly sales
print(monthly_sales.head())


  YearMonth       Sales
0   2010-12  821452.730
1   2011-01  689811.610
2   2011-02  522545.560
3   2011-03  716215.260
4   2011-04  536968.491
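An alternative to string labels, swapped in here for comparison rather than taken from the notebook, is to resample on `InvoiceDate` directly. The result has a datetime index, so chronological order does not depend on string sorting, and a month with no invoices appears with a total of 0 instead of being omitted. That matters for the 3-month moving average, because `rolling(3)` counts rows, not calendar months. The `items` frame below is hypothetical.

```python
import pandas as pd

# Hypothetical line items: January and March have sales, February has none
items = pd.DataFrame({
    "InvoiceDate": pd.to_datetime(["2011-01-05", "2011-01-20", "2011-03-02", "2011-03-28"]),
    "Sales": [10.0, 15.0, 20.0, 5.0],
})

# Resample on the datetime index; "MS" labels each monthly bin by its first day,
# and an empty month sums to 0.0 rather than being dropped
monthly = items.set_index("InvoiceDate")["Sales"].resample("MS").sum()
print(monthly)
```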


### Aggregating Sales by Day

To analyze daily sales trends, we group the data by the date part of the `InvoiceDate` column. This helps detect short-term fluctuations, spikes, or dips in sales performance.

Grouping by date and summing the `Sales` column provides detailed insights into daily revenue patterns.


In [14]:
# Ensure InvoiceDate is in datetime format
df['InvoiceDate'] = pd.to_datetime(df['InvoiceDate'])

# Aggregate total sales by day
daily_sales = df.groupby(df['InvoiceDate'].dt.date)['Sales'].sum().reset_index()

# Rename columns for clarity
daily_sales.columns = ['Date', 'TotalSales']

# Display the aggregated daily sales
print(daily_sales.head())


         Date  TotalSales
0  2010-12-01    58776.79
1  2010-12-02    47629.42
2  2010-12-03    46898.63
3  2010-12-05    31364.63
4  2010-12-06    54624.15
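The daily table above skips 2010-12-04: grouping only returns dates that have invoices, so days without sales are absent rather than shown as zero. Before plotting or smoothing daily data, it can help to make such gaps explicit. A sketch with hypothetical values (`daily` is illustrative); in the notebook, the `Date` column produced by `.dt.date` holds Python `date` objects, so it would first need `pd.to_datetime(daily_sales['Date'])`.

```python
import pandas as pd

# Hypothetical daily totals with one calendar day absent (2010-12-04)
daily = pd.DataFrame({
    "Date": pd.to_datetime(["2010-12-01", "2010-12-02", "2010-12-03", "2010-12-05"]),
    "TotalSales": [100.0, 80.0, 90.0, 50.0],
})

# Build the complete calendar between the first and last day
full_range = pd.date_range(daily["Date"].min(), daily["Date"].max(), freq="D")

# Reindex onto that calendar; days with no rows get 0.0 instead of disappearing
daily_full = (
    daily.set_index("Date")["TotalSales"]
    .reindex(full_range, fill_value=0.0)
    .rename_axis("Date")
    .reset_index()
)
print(daily_full)
```

Whether a missing day should count as zero sales or be left out (for example, if the business does not trade that day) is an analysis choice; `fill_value` controls it.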
