# Project Introduction

This project builds a model that predicts sales and suggests products for my cafe. It draws on several datasets:

- **Item Sales and Daily Sales**: Data from my cafe, available in CSV format.
- **Weather Data for Montreal**: Collected from APIs.
- **Macroeconomic Indicators**: GDP, CPI, unemployment, and bond yields, collected from APIs.
- **Local Holidays for Quebec**: Collected from APIs.
- **Pedestrianization Data**: Entered manually after all other datasets are concatenated.

The goal is to use these datasets to build a predictive model. At this stage, the initial data collection is complete.

## Data Collection

### Initial Fetching and Preprocessing

#### Imports

The helper module `scripts.data_fetching` is imported under the alias `df`, so `df` in the cells below refers to that module, not to a DataFrame. `importlib.reload` picks up edits to the module without restarting the kernel.

In [76]:
import importlib
import pandas as pd


from scripts import data_fetching as df

importlib.reload(df)

<module 'scripts.data_fetching' from '/Users/vasilisvc6/Documents/Le grand cormoran project/scripts/data_fetching.py'>

#### Local holidays in QC

In [65]:
holidays = df.local_holidays_fetch()

In [66]:
holidays

| Date | Name |
| --- | --- |
| 2023-10-04 | Feast of St Francis of Assisi |
| 2023-10-06 | Hoshana Rabbah |
| 2023-10-07 | Shemini Atzeret |
| 2023-10-08 | Simchat Torah |
| 2023-10-09 | Thanksgiving Day |
| ... | ... |
| 2024-08-15 | Assumption of Mary |
| 2024-09-02 | Labour Day |
| 2024-09-16 | Milad un Nabi (Mawlid) |
| 2024-09-22 | September Equinox |
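
One way the holiday table might feed the model is as a binary daily feature. The sketch below is an assumption, not the project's actual method: it uses a small hand-typed stand-in for `holidays` (three rows copied from the output above) and a hypothetical helper, `add_holiday_flag`.

```python
import pandas as pd

# Stand-in for the `holidays` frame shown above (rows copied from the output).
holidays = pd.DataFrame(
    {"Name": ["Thanksgiving Day", "Labour Day", "September Equinox"]},
    index=pd.to_datetime(["2023-10-09", "2024-09-02", "2024-09-22"]),
)
holidays.index.name = "Date"


def add_holiday_flag(daily: pd.DataFrame, holidays: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of `daily` with an `is_holiday` column (hypothetical helper)."""
    out = daily.copy()
    # Normalize both indexes to midnight so timestamps compare by calendar day.
    out["is_holiday"] = out.index.normalize().isin(holidays.index.normalize())
    return out


# Toy daily frame covering one week in October 2023.
daily = pd.DataFrame(index=pd.date_range("2023-10-06", "2023-10-12", freq="D"))
flagged = add_holiday_flag(daily, holidays)
print(flagged[flagged["is_holiday"]])
```

The fetched list mixes public holidays with observances such as September Equinox, so filtering by name before flagging may be worthwhile.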


#### Daily sales

In [77]:
sales = df.merge_all_sales('data/Sales')

In [78]:
sales

| Date | Gross Sales | Net Sales |
| --- | --- | --- |
| 2023-10-01 | 863.50 | 852.98 |
| 2023-10-02 | 591.00 | 585.47 |
| 2023-10-03 | 506.45 | 504.45 |
| 2023-10-04 | 404.65 | 402.85 |
| 2023-10-05 | 414.70 | 413.62 |
| ... | ... | ... |
| 2024-10-27 | 1698.70 | 1687.55 |
| 2024-10-28 | 1156.35 | 1155.11 |
| 2024-10-29 | 951.05 | 948.76 |
| 2024-10-30 | 1074.75 | 1072.76 |
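
Before modelling, it may be worth checking whether the daily index has gaps, since closed days or missing exports would otherwise go unnoticed. This is a minimal sketch on a toy stand-in for `sales`; the gap on 2023-10-03 is introduced deliberately for illustration and does not exist in the real data shown above.

```python
import pandas as pd

# Toy stand-in for `sales`; values copied from the output above, with
# 2023-10-03 deliberately left out to show what a gap looks like.
sales = pd.DataFrame(
    {"Gross Sales": [863.50, 591.00, 404.65], "Net Sales": [852.98, 585.47, 402.85]},
    index=pd.to_datetime(["2023-10-01", "2023-10-02", "2023-10-04"]),
)

# Every calendar day between the first and last recorded date.
expected = pd.date_range(sales.index.min(), sales.index.max(), freq="D")

# Days that have no row at all.
missing_days = expected.difference(sales.index)

print(missing_days)
```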


#### Monthly item sales

In [79]:
item_sales = df.merge_all_sales('data/Item Sales')

Parsed dates: 2023-11-01 00:00:00
Parsed dates: 2024-02-01 00:00:00
Parsed dates: 2023-12-01 00:00:00
Parsed dates: 2024-10-01 00:00:00
Parsed dates: 2024-07-01 00:00:00
Parsed dates: 2024-01-01 00:00:00
Parsed dates: 2024-05-01 00:00:00
Parsed dates: 2024-09-01 00:00:00
Parsed dates: 2024-03-01 00:00:00
Parsed dates: 2023-10-01 00:00:00
Parsed dates: 2024-08-01 00:00:00
Parsed dates: 2024-04-01 00:00:00
Parsed dates: 2024-06-01 00:00:00
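
The `Parsed dates` lines above are not in chronological order, which suggests the files are read in directory-listing order. The implementation of `merge_all_sales` is not shown here, so the following is only a sketch of how such a merge could be made order-independent. The function name `merge_csv_folder`, the `date` column, and the file contents are all hypothetical.

```python
import tempfile
from pathlib import Path

import pandas as pd


def merge_csv_folder(folder: str) -> pd.DataFrame:
    """Concatenate every CSV in `folder` into one date-indexed frame.

    Hypothetical stand-in for `merge_all_sales`; assumes each file has a
    `date` column, which may not match the real exports.
    """
    frames = [
        pd.read_csv(path, parse_dates=["date"])
        for path in sorted(Path(folder).glob("*.csv"))  # sorted: stable read order
    ]
    merged = pd.concat(frames, ignore_index=True)
    # Sort by date so the result does not depend on file names or listing order.
    return merged.set_index("date").sort_index()


# Usage with two throwaway files (toy values) written to a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "b.csv").write_text("date,Sold\n2023-10-01,41\n")
    Path(tmp, "a.csv").write_text("date,Sold\n2023-11-01,13\n")
    merged = merge_csv_folder(tmp)

print(merged)
```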


In [80]:
item_sales

| date | Category Name | Name | Gross Sales | Net Sales | Sold |
| --- | --- | --- | --- | --- | --- |
| 2023-10-01 | Desserts | Choux A La Creme | 139.40 | 139.40 | 41 |
| 2023-10-01 | Desserts | Tarte au chocolat | 82.50 | 82.50 | 11 |
| 2023-10-01 | Desserts | Chia Pudding | 63.00 | 63.00 | 7 |
| 2023-10-01 | Desserts | Croissant Aux Amandes | 51.35 | 51.35 | 13 |
| 2023-10-01 | Desserts | Tarte De Yuzu | 34.00 | 34.00 | 4 |
| ... | ... | ... | ... | ... | ... |
| 2024-10-01 | Desserts | Brioche Citron Noir | 106.25 | 105.61 | 25 |
| 2024-10-01 | Desserts | Biscuit Au Caramel | 128.80 | 128.80 | 28 |
| 2024-10-01 | Desserts | Gâteau au citron | 141.90 | 141.57 | 43 |
| 2024-10-01 | Sans Café - Without Coffee | Earl Grey | 237.80 | 236.37 | 58 |
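
Since the project also aims to suggest products, a natural first look at this table is the best sellers per month. The sketch below assumes the layout shown above (a monthly `date` index and a `Sold` column) and uses a few rows copied from the output as a stand-in; `top_items` is a hypothetical helper, not part of `scripts.data_fetching`.

```python
import pandas as pd

# Toy stand-in for `item_sales`, using rows copied from the output above.
item_sales = pd.DataFrame(
    {
        "Category Name": ["Desserts"] * 5,
        "Name": [
            "Choux A La Creme",
            "Tarte au chocolat",
            "Chia Pudding",
            "Brioche Citron Noir",
            "Gâteau au citron",
        ],
        "Sold": [41, 11, 7, 25, 43],
    },
    index=pd.to_datetime(["2023-10-01"] * 3 + ["2024-10-01"] * 2),
)
item_sales.index.name = "date"


def top_items(items: pd.DataFrame, n: int = 2) -> pd.DataFrame:
    """Return the `n` best-selling items per month by units sold (hypothetical helper)."""
    ranked = items.sort_values("Sold", ascending=False)
    # groupby(level=0) groups on the monthly date index; head(n) keeps the top rows.
    return ranked.groupby(level=0).head(n).sort_index(kind="mergesort")


result = top_items(item_sales, n=2)
print(result)
```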


#### Macroeconomic indicators

In [71]:
gdp, cpi, unemployment, bond_yields = df.macroeconomic_fetch_fred()

In [72]:
gdp

2023-10-01    589018.5000
2024-01-01    591591.8125
2024-04-01    594729.3125
dtype: float64

In [73]:
cpi

2023-10-01    3.120936
2023-11-01    3.116883
2023-12-01    3.396473
2024-01-01    2.858999
2024-02-01    2.783171
2024-03-01    2.897618
2024-04-01    2.685422
2024-05-01    2.866242
2024-06-01    2.671756
2024-07-01    2.530044
2024-08-01    1.953371
2024-09-01    1.640379
dtype: float64

In [74]:
unemployment

2023-10-01    5.7
2023-11-01    5.8
2023-12-01    5.8
2024-01-01    5.7
2024-02-01    5.8
2024-03-01    6.1
2024-04-01    6.1
2024-05-01    6.2
2024-06-01    6.4
2024-07-01    6.4
2024-08-01    6.6
2024-09-01    6.5
2024-10-01    6.5
dtype: float64

In [75]:
bond_yields

2023-10-01    4.062000
2023-11-01    3.710952
2023-12-01    3.234211
2024-01-01    3.346364
2024-02-01    3.504000
2024-03-01    3.444000
2024-04-01    3.695909
2024-05-01    3.641818
2024-06-01    3.391500
2024-07-01    3.407727
2024-08-01    3.071500
2024-09-01    2.944444
2024-10-01    3.186364
dtype: float64
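
The four series arrive at different frequencies: `gdp` is quarterly and the others are monthly, while sales are daily. One common way to line them up is to forward-fill each series onto a daily index. This is a sketch under that assumption, using values copied from the output above; `to_daily` is a hypothetical helper.

```python
import pandas as pd

# Stand-ins for two of the series above (values copied from the output).
gdp = pd.Series(
    [589018.5, 591591.8125, 594729.3125],
    index=pd.to_datetime(["2023-10-01", "2024-01-01", "2024-04-01"]),
)
unemployment = pd.Series(
    [5.7, 5.8, 5.8, 5.7],
    index=pd.to_datetime(["2023-10-01", "2023-11-01", "2023-12-01", "2024-01-01"]),
)

# Daily calendar to align against (a short window for illustration).
days = pd.date_range("2023-10-01", "2024-01-05", freq="D")


def to_daily(series: pd.Series, days: pd.DatetimeIndex) -> pd.Series:
    """Forward-fill a lower-frequency series onto a daily index (hypothetical helper)."""
    return series.reindex(days, method="ffill")


macro_daily = pd.DataFrame(
    {"gdp": to_daily(gdp, days), "unemployment": to_daily(unemployment, days)}
)
print(macro_daily.loc["2023-12-30":"2024-01-02"])
```

Two caveats apply. Forward-filling carries the last observation past the end of a series, so with `gdp` ending at 2024-04-01 and `cpi` at 2024-09-01, later sales days would reuse stale values. Official figures are also typically published after the period they describe, so stamping them on the period's first day can give a model information it would not have had at the time.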