### TASK #1: Understand The Problem Statement/Goal

- This dataset contains weekly sales for 99 departments across 45 different stores.
- Our aim is to forecast weekly sales for a particular department.
- The objective of this case study is to forecast weekly retail store sales based on historical data.
- The data includes holidays and the promotional markdowns offered by various stores and departments throughout the year.
- Markdowns are crucial for promoting sales, especially before key events such as the Super Bowl, Christmas and Thanksgiving.
- Developing an accurate model will enable the business to make informed decisions and recommendations to improve its processes in the future.
- The data consists of three sheets: 
    - Stores
    - Features
    - Sales
- Data Source: https://www.kaggle.com/manjeetsingh/retaildataset

### TASK #2: Import Datasets And Libraries

In [2]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import zipfile

In [3]:
# Import the csv files using pandas
features = pd.read_csv('source_data/Features_data_set.csv')
sales = pd.read_csv('source_data/sales_data_set.csv')
stores = pd.read_csv('source_data/stores_data_set.csv')

In [4]:
# Let's explore the 3 dataframes
    # 'stores' dataframe contains information related to the 45 stores such as type and size of store
print(stores.shape)
stores

(45, 3)


Unnamed: 0,Store,Type,Size
0,1,A,151315
1,2,A,202307
2,3,B,37392
3,4,A,205863
4,5,B,34875
5,6,A,202505
6,7,B,70713
7,8,A,155078
8,9,B,125833
9,10,B,126512
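
As a quick sanity check on the store metadata, the stores can be grouped by `Type` to compare how many there are of each type and how large they are. The sketch below is only illustrative: `sample_stores` is a hypothetical name, and it holds just the ten rows displayed above rather than all 45 stores.

```python
import pandas as pd

# Only the ten rows shown in the output above (assumption: not the full 45 stores)
sample_stores = pd.DataFrame({
    "Store": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "Type": ["A", "A", "B", "A", "B", "A", "B", "A", "B", "B"],
    "Size": [151315, 202307, 37392, 205863, 34875,
             202505, 70713, 155078, 125833, 126512],
})

# Number of stores and mean size per store type
type_summary = sample_stores.groupby("Type")["Size"].agg(["count", "mean"])
print(type_summary)
```

Running the same `groupby` on the full `stores` dataframe would give the summary for all 45 stores.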


In [5]:
# Let's explore the 'features' dataframe
    # Features dataframe contains additional data related to the store, department, and regional activity for the given dates.
    # Store: store number
    # Date: week
    # Temperature: average temperature in the region
    # Fuel_Price: cost of fuel in the region
    # MarkDown1-5: anonymized data related to promotional markdowns.
    # CPI: consumer price index.
    # Unemployment: unemployment rate.
    # IsHoliday: whether the week is a special holiday week or not.

print(features.shape)
features

(8190, 12)


Unnamed: 0,Store,Date,Temperature,Fuel_Price,MarkDown1,MarkDown2,MarkDown3,MarkDown4,MarkDown5,CPI,Unemployment,IsHoliday
0,1,05/02/2010,42.31,2.572,,,,,,211.096358,8.106,False
1,1,12/02/2010,38.51,2.548,,,,,,211.242170,8.106,True
2,1,19/02/2010,39.93,2.514,,,,,,211.289143,8.106,False
3,1,26/02/2010,46.63,2.561,,,,,,211.319643,8.106,False
4,1,05/03/2010,46.50,2.625,,,,,,211.350143,8.106,False
...,...,...,...,...,...,...,...,...,...,...,...,...
8185,45,28/06/2013,76.05,3.639,4842.29,975.03,3.00,2449.97,3169.69,,,False
8186,45,05/07/2013,77.50,3.614,9090.48,2268.58,582.74,5797.47,1514.93,,,False
8187,45,12/07/2013,79.37,3.614,3789.94,1827.31,85.72,744.84,2150.36,,,False
8188,45,19/07/2013,82.84,3.737,2961.49,1047.07,204.19,363.00,1059.46,,,False
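
The output above shows empty `MarkDown` cells in the early weeks and empty `CPI` and `Unemployment` cells in the latest weeks, so it may help to count missing values per column before modeling. This is a minimal sketch on a few rows copied from the output; `sample_features` is a hypothetical name, and the counts apply only to this sample, not to the full dataframe.

```python
import numpy as np
import pandas as pd

# A few rows copied from the 'features' output above (assumption: sample only)
sample_features = pd.DataFrame({
    "Store": [1, 1, 45, 45],
    "Date": ["05/02/2010", "12/02/2010", "28/06/2013", "05/07/2013"],
    "MarkDown1": [np.nan, np.nan, 4842.29, 9090.48],
    "CPI": [211.096358, 211.242170, np.nan, np.nan],
    "Unemployment": [8.106, 8.106, np.nan, np.nan],
})

# Count missing values in each column
missing_counts = sample_features.isna().sum()
print(missing_counts)
```

Applying `features.isna().sum()` to the full dataframe would report the actual number of missing entries per column.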


In [6]:
# Let's explore the "sales" dataframe
    # "Sales" dataframe contains historical sales data, which covers 2010-02-05 to 2012-11-01
    # Store: store number
    # Dept: department number
    # Date: the week
    # Weekly_Sales: sales for the given department in the given store
    # IsHoliday: whether the week is a special holiday week

print(sales.shape)
sales

(421570, 5)


Unnamed: 0,Store,Dept,Date,Weekly_Sales,IsHoliday
0,1,1,05/02/2010,24924.50,False
1,1,1,12/02/2010,46039.49,True
2,1,1,19/02/2010,41595.55,False
3,1,1,26/02/2010,19403.54,False
4,1,1,05/03/2010,21827.90,False
...,...,...,...,...,...
421565,45,98,28/09/2012,508.37,False
421566,45,98,05/10/2012,628.10,False
421567,45,98,12/10/2012,1061.02,False
421568,45,98,19/10/2012,760.01,False
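
The `Date` column is stored as day-first text (for example, `19/02/2010`), so it likely needs to be parsed before any time-series work. The sketch below parses the dates and pulls out the weekly series for a single store and department, in line with the goal of forecasting one department's sales. `sample_sales`, `mask`, and `dept_series` are hypothetical names, and the rows are a small sample copied from the output above.

```python
import pandas as pd

# A few rows copied from the 'sales' output above (assumption: sample only)
sample_sales = pd.DataFrame({
    "Store": [1, 1, 1, 45],
    "Dept": [1, 1, 1, 98],
    "Date": ["05/02/2010", "12/02/2010", "19/02/2010", "28/09/2012"],
    "Weekly_Sales": [24924.50, 46039.49, 41595.55, 508.37],
    "IsHoliday": [False, True, False, False],
})

# Parse day-first date strings (DD/MM/YYYY) into datetimes
sample_sales["Date"] = pd.to_datetime(sample_sales["Date"], format="%d/%m/%Y")

# Select one store/department pair and index its weekly sales by date
mask = (sample_sales["Store"] == 1) & (sample_sales["Dept"] == 1)
dept_series = sample_sales.loc[mask].set_index("Date")["Weekly_Sales"].sort_index()
print(dept_series)
```

Passing an explicit `format` avoids ambiguous dates such as `05/02/2010` being read month-first.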
