# Retail Sales Data Analysis

In this project, we explore and analyze a historical retail sales dataset that records store-level performance, promotional activity, and seasonal effects over a defined period. The dataset covers multiple stores and their departments, with records of weekly sales, store characteristics, and contextual factors such as regional conditions, holidays, and markdown events.

By examining these data, we aim to gain insights into the relationships between promotional markdowns, holiday-driven consumer behavior, and overall sales outcomes across different stores and departments. This exploratory and analytical work will serve as a foundation for understanding the dataset’s intricacies and will guide subsequent modeling and decision-making processes.

The dataset, published by Manjeet Singh, is available [here](https://www.kaggle.com/datasets/manjeetsingh/retaildataset 'Original Dataset from Manjeet Singh') on Kaggle.

In [1]:
# Import the libraries used throughout the analysis

import numpy as np
import pandas as pd
from pathlib import Path
import seaborn as sns
import matplotlib.pyplot as plt

# We set the default theme for plots to Seaborn

sns.set_theme()

In [None]:
# We create path objects using pathlib

features_data_file_path = Path('data/features_data_set.csv')
sales_data_file_path = Path('data/sales_data_set.csv')
stores_data_file_path = Path('data/stores_data_set.csv')

# We use those file paths and pandas' read_csv to create DataFrames

features_df = pd.read_csv(features_data_file_path)
sales_df = pd.read_csv(sales_data_file_path)
stores_df = pd.read_csv(stores_data_file_path)

# Let's go ahead and view some information about our DataFrames

# info() prints its summary directly and returns None, so no print() is needed
features_df.info()
sales_df.info()
stores_df.info()

In [None]:
# Let's convert some of the data types before we work with the datasets

features_df['Date'] = pd.to_datetime(features_df['Date'], format="%d/%m/%Y")
sales_df['Date'] = pd.to_datetime(sales_df['Date'], format="%d/%m/%Y")
stores_df["Type"] = stores_df["Type"].astype('string')

# We print the data types of the converted columns to verify the changes

print(features_df['Date'].dtype)
print(sales_df['Date'].dtype)
print(stores_df['Type'].dtype)
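The explicit `format` argument matters because the dates are written day first. As a minimal sketch, using two made-up date strings rather than values read from the dataset, the same text parses to different dates under a day-first and a month-first format:

```python
import pandas as pd

# Hypothetical sample strings in day/month/year order (illustrative only)
raw_dates = pd.Series(["05/02/2010", "12/02/2010"])

# Day-first parsing, matching the format used in the cell above
day_first = pd.to_datetime(raw_dates, format="%d/%m/%Y")

# Month-first parsing of the same strings gives different dates
month_first = pd.to_datetime(raw_dates, format="%m/%d/%Y")

print(day_first.dt.strftime("%Y-%m-%d").tolist())
print(month_first.dt.strftime("%Y-%m-%d").tolist())
```

Both calls succeed without raising an error, which is why an ambiguous date column is worth checking by eye after conversion.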

In [None]:
# Let's go ahead and merge the tables so that we can have all of the information in one DataFrame

all_tables_merged_df = sales_df.merge(
    features_df,
    on=['Store', 'Date', 'IsHoliday'],
    how='left'
    ).merge(
        stores_df,
        on='Store',
        how='left'
        )
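A left merge can silently duplicate rows if the right-hand table has repeated keys, or leave gaps where no match exists. The sketch below shows one way to guard against both, using the `validate` and `indicator` parameters of `DataFrame.merge`. The three small tables are made-up stand-ins, and the `Dept`, `Temperature`, and `Size` columns and all values are assumptions for illustration only:

```python
import pandas as pd

# Toy stand-ins for the three tables (made-up values, same key columns)
sales = pd.DataFrame({
    "Store": [1, 1, 2],
    "Dept": [1, 2, 1],
    "Date": pd.to_datetime(["2010-02-05"] * 3),
    "Weekly_Sales": [100.0, 50.0, 75.0],
    "IsHoliday": [False, False, False],
})
features = pd.DataFrame({
    "Store": [1, 2],
    "Date": pd.to_datetime(["2010-02-05"] * 2),
    "Temperature": [42.3, 38.5],
    "IsHoliday": [False, False],
})
stores = pd.DataFrame({"Store": [1, 2], "Type": ["A", "B"], "Size": [1000, 2000]})

merged = sales.merge(
    features,
    on=["Store", "Date", "IsHoliday"],
    how="left",
    validate="many_to_one",  # raises MergeError if features has duplicate keys
    indicator=True,          # adds a _merge column: 'both' or 'left_only'
).merge(stores, on="Store", how="left", validate="many_to_one")

# A left join with unique right-hand keys keeps one output row per sales row
assert len(merged) == len(sales)

# Count sales rows that found no matching features row
unmatched = (merged["_merge"] == "left_only").sum()
print(f"rows: {len(merged)}, unmatched: {unmatched}")
```

The same two arguments could be added to the merge above; a nonzero unmatched count would point to store and date combinations missing from the features table.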

In [None]:
# Let's make sure the merge worked by previewing our DataFrame

all_tables_merged_df.head()

In [5]:
# We are going to take a look at the average weekly sales by store
# First, we will use groupby to group the data according to the Store number

grouped_by_store_df = all_tables_merged_df.groupby('Store').Weekly_Sales.mean().reset_index()

# The mean that is returned by the function is more precise than we really need it to be
# We round the numbers to two decimal places so that they look like regular currency values

grouped_by_store_df['Weekly_Sales'] = grouped_by_store_df['Weekly_Sales'].round(2)

# Finally, we will merge the stores_df table with the new table so that we can compare the different
# store types, sizes, and average weekly sales

avg_weekly_sales_by_store = pd.merge(
    grouped_by_store_df,
    stores_df,
    on='Store',
    how='left'
)
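Note that the per-store mean is taken over every department and week recorded for that store. A small sketch on made-up numbers (not taken from the dataset) shows the same group, average, and round steps in a single chained expression:

```python
import pandas as pd

# Toy sales records (illustrative values only)
toy = pd.DataFrame({
    "Store": [1, 1, 1, 2, 2],
    "Weekly_Sales": [100.0, 200.0, 300.0, 10.0, 20.0],
})

# Group by store, average the weekly sales, and round to two decimals
avg_by_store = (
    toy.groupby("Store", as_index=False)["Weekly_Sales"]
    .mean()
    .round({"Weekly_Sales": 2})
)

print(avg_by_store)
```

Passing `as_index=False` keeps `Store` as a regular column, which is one alternative to calling `reset_index()` afterwards.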

In [None]:
# Let's preview our DataFrame to make sure everything has gone properly so far

avg_weekly_sales_by_store.head()

In [None]:
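# We plot the average weekly sales for each store as a bar chart, colored by store type
# A wide aspect ratio keeps the individual store bars readable

sns.catplot(data=avg_weekly_sales_by_store, x="Store", y="Weekly_Sales", hue="Type", kind='bar', aspect=5)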