# **Retail Sales ETL and Analysis Notebook**

## Objectives

The objective of this notebook is to extract, clean, transform, and analyse retail sales data in order to identify sales trends, evaluate the impact of promotional markdowns, and compare holiday versus non-holiday sales performance.
The notebook supports data-driven insights to assist retail and marketing stakeholders in strategic decision-making.

## Inputs

To run this notebook, the following inputs are required:

- Raw retail sales CSV files (sales, stores, features) stored in the dataset/raw-data/ directory

- Data fields including:
    - Weekly sales figures
    - Store identifiers and attributes (store type, size, region)
    - Promotional markdown values (MarkDown1–MarkDown5)
    - Holiday indicators

- Python libraries:
    - pandas
    - numpy
    - matplotlib
    - seaborn
    - plotly

## Outputs

By the end of this notebook, the following outputs are generated:

- A cleaned and transformed retail sales dataset saved to dataset/clean-data/

- Engineered features such as total promotional markdowns and holiday indicators

- Descriptive statistics summarising sales performance

- Visualisations illustrating:
    - Sales trends over time
    - Store and regional comparisons
    - Impact of promotional markdowns
    - Holiday vs non-holiday sales performance

- Business-focused insights and conclusions documented within the notebook

## Additional Comments

All analysis steps are documented using Markdown cells to ensure transparency and reproducibility.

Limitations of the data and analysis are acknowledged, and potential future improvements are discussed.


---

# Change working directory

* We assume the notebooks are stored in a subfolder, so when you run a notebook in the editor you need to change the working directory.

We need to change the working directory from the current folder to its parent folder.
* We access the current directory with os.getcwd()

In [1]:
import os
current_dir = os.getcwd()
current_dir

'/Users/isaacola/Documents/vscode-project/retail-sales/jupyter_notebooks'

We want to make the parent of the current directory the new current directory
* os.path.dirname() gets the parent directory
* os.chdir() sets the new current directory

In [2]:
os.chdir(os.path.dirname(current_dir))
print("You set a new current directory")

You set a new current directory


Confirm the new current directory

In [3]:
current_dir = os.getcwd()
current_dir

'/Users/isaacola/Documents/vscode-project/retail-sales'
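
Note that re-running the `os.chdir` cell above moves up one more level each time. A minimal sketch of a guard against that, using `pathlib`, is shown below. It is not the notebook's own method: the helper name `move_to_project_root` is hypothetical, and the folder name `jupyter_notebooks` is an assumption taken from the path printed earlier.

```python
import os
from pathlib import Path


def move_to_project_root(notebooks_dirname: str = "jupyter_notebooks") -> Path:
    """Move to the parent folder only when currently inside the notebooks folder.

    Hypothetical helper: the default folder name is an assumption based on
    the path printed by the earlier cell.
    """
    cwd = Path.cwd()
    if cwd.name == notebooks_dirname:
        # Only step up when we are still in the notebooks subfolder
        os.chdir(cwd.parent)
    return Path.cwd()
```

Calling `move_to_project_root()` a second time leaves the working directory unchanged, so the cell is safe to re-run.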

# Section 1

## ETL Process

In this section, we perform Extract, Transform, Load (ETL) operations on the retail sales data:

- **Extract**: Load raw CSV files (sales, stores, features).
- **Transform**: Merge datasets, handle missing values, convert data types, engineer features.
- **Load**: Save the cleaned dataset to the clean-data directory.

In [4]:
# Import necessary libraries
import pandas as pd
import numpy as np
import os

# Extract: Load raw data
sales_df = pd.read_csv('dataset/raw-data/sales-data-set.csv')
stores_df = pd.read_csv('dataset/raw-data/stores-data-set.csv')
features_df = pd.read_csv('dataset/raw-data/Features-data-set.csv')

# Transform: Merge datasets
# Merge sales with stores
merged_df = pd.merge(sales_df, stores_df, on='Store', how='left')

# Merge with features on Store and Date
merged_df = pd.merge(merged_df, features_df, on=['Store', 'Date'], how='left')

# Convert Date to datetime
merged_df['Date'] = pd.to_datetime(merged_df['Date'], format='%d/%m/%Y')

# Handle missing values: Replace 'NA' with NaN and fill MarkDowns with 0
merged_df.replace('NA', np.nan, inplace=True)
markdown_cols = ['MarkDown1', 'MarkDown2', 'MarkDown3', 'MarkDown4', 'MarkDown5']
merged_df[markdown_cols] = merged_df[markdown_cols].fillna(0)

# Feature engineering: Total MarkDown
merged_df['Total_MarkDown'] = merged_df[markdown_cols].sum(axis=1)

# Ensure clean-data directory exists
os.makedirs('dataset/clean-data', exist_ok=True)

# Load: Save cleaned data
merged_df.to_csv('dataset/clean-data/cleaned_sales_data.csv', index=False)

print("ETL completed. Cleaned data saved to dataset/clean-data/cleaned_sales_data.csv")

ETL completed. Cleaned data saved to dataset/clean-data/cleaned_sales_data.csv
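
One way to sanity-check the transform logic is to run the same steps on a tiny hand-made dataset. The sketch below is illustrative only. The values are made up, the column names `Weekly_Sales`, `Type`, and `Size` are assumed, and the presence of an `IsHoliday` column in both the sales and features files is an assumption about the raw data that the cell above does not confirm. If that assumption holds, merging on `['Store', 'Date']` alone makes pandas suffix the overlapping column as `IsHoliday_x` and `IsHoliday_y`; adding `IsHoliday` to the merge keys is one way to avoid that.

```python
import pandas as pd

# Toy data (made-up values) mirroring the assumed structure of the raw files
sales = pd.DataFrame({
    "Store": [1, 1, 2],
    "Date": ["05/02/2010", "12/02/2010", "05/02/2010"],
    "Weekly_Sales": [100.0, 150.0, 80.0],
    "IsHoliday": [False, True, False],
})
stores = pd.DataFrame({"Store": [1, 2], "Type": ["A", "B"], "Size": [1000, 500]})
features = pd.DataFrame({
    "Store": [1, 1, 2],
    "Date": ["05/02/2010", "12/02/2010", "05/02/2010"],
    "MarkDown1": [None, 10.0, None],
    "MarkDown2": [5.0, None, None],
    "IsHoliday": [False, True, False],
})

# validate= raises MergeError if the right-hand keys are not unique,
# which guards against accidentally duplicating sales rows
merged = sales.merge(stores, on="Store", how="left", validate="many_to_one")
merged = merged.merge(
    features, on=["Store", "Date", "IsHoliday"], how="left", validate="many_to_one"
)

# Same transform steps as the ETL cell, limited to the two toy markdown columns
merged["Date"] = pd.to_datetime(merged["Date"], format="%d/%m/%Y")
markdown_cols = ["MarkDown1", "MarkDown2"]
merged[markdown_cols] = merged[markdown_cols].fillna(0)
merged["Total_MarkDown"] = merged[markdown_cols].sum(axis=1)

# Left merges should preserve the number of sales rows
assert len(merged) == len(sales)
print(merged[["Store", "Date", "Total_MarkDown"]])
```

The same row-count assertion and `validate="many_to_one"` argument could be applied to the real merge if the store and feature keys are expected to be unique.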


---

# Section 2

Section 2 content

---

NOTE

* You may add as many sections as you want, as long as they support your project workflow.
* All notebook cells should run top-down. Do not create a workflow in which, at a given point, you need to go back to a previous cell to execute a task, such as refreshing a variable's content.

---

# Push files to Repo

* In cases where you don't need to push files to Repo, you may replace this section with "Conclusions and Next Steps" and state your conclusions and next steps.

In [None]:
import os
try:
  # create your folder here
  # os.makedirs(name='')
  pass  # placeholder so the try block is valid until a folder name is supplied
except Exception as e:
  print(e)
