<p style='text-align:center;
          font-family: sans-serif;
          font-weight:bold;
          color:black;
          font-size:40px;
          margin: 30px;'> Walmart Sales 
    <font color='#128EB3' size='7'>Forecasting</font></p>
<p style="text-align:center;
          font-family: sans-serif;
          font-weight:bold;
          color:#616161;
          font-size:20px;
          margin: 30px;">PyCaret library 📈</p>

<h3 style="text-align: left; background-color: #128EB3; font-family:Trebuchet MS; color: white; padding: 14px; line-height: 1; border-radius:10px"> Notebook goals🔎</h3>

The objective of this first version of the project is to:

* Predict store-level sales with time series models, using some of the forecasting libraries available in Python.

**Note: I have split the project into several notebooks, one for each model used for sales forecasting. The links are below:**


* [Exploratory Data Analysis](https://www.kaggle.com/code/armandodelahoya/walmart-sales-forecasting-eda/notebook)
* [Prophet model](https://www.kaggle.com/code/armandodelahoya/walmart-sales-forecasting-prophet-model/notebook)
* [Neural prophet model](https://www.kaggle.com/code/armandodelahoya/walmart-sales-forecasting-neural-model/notebook)
* [PyCaret library](https://www.kaggle.com/code/armandodelahoya/walmart-sales-forecasting-pycaret-library/notebook)
* [Power BI Dashboard](https://app.powerbi.com/view?r=eyJrIjoiNWFjYWU0ZTktNjFiYS00Zjg3LTg0M2MtOTMzM2M0MjQ5OThmIiwidCI6IjAyNDlhNTcxLWI5YTItNGNhMi1iOTNiLTIwYzc3MDg4ZjA4YiJ9)
* [Github repository](https://github.com/ArmandoLazalde/Walmart-Sales-Forecasting)

<a id="top"></a>
<div class="list-group" id="list-tab" role="tablist">
    <h3 style="text-align: left; background-color: #128EB3; font-family:Trebuchet MS; color: white; padding: 14px; line-height: 1; border-radius:10px"><b>Table of Contents</b></h3>
    
* [1. Import necessary libraries](#1)
* [2. Best store analysis](#2)
* [3. Analysis for all stores](#3)

<a id="1"></a>
<h3 style="text-align: left;
           background-color: #128EB3;
           font-family:Trebuchet MS; 
           color: white; 
           padding: 14px; 
           line-height: 1; 
           border-radius:10px"> 1. Import necessary libraries
</h3>

In [None]:
%pip install pycaret-ts-alpha

In [None]:
import numpy as np
import pandas as pd
from zipfile import ZipFile

# Metrics
from sklearn.metrics import mean_absolute_error, mean_absolute_percentage_error

# PyCaret: print the installed versions, installing the library first if it is missing
def what_is_installed():
    from pycaret import show_versions
    show_versions()

try:
    what_is_installed()
except ModuleNotFoundError:
    !pip install pycaret
    what_is_installed()

from pycaret.time_series import TSForecastingExperiment

In [None]:
# Extract the zipped competition files into the working directory
input_dir = '/kaggle/input/walmart-recruiting-store-sales-forecasting'
for name in ['features', 'sampleSubmission', 'test', 'train']:
    with ZipFile(f'{input_dir}/{name}.csv.zip') as f:
        f.extractall(path='walmart-recruiting-store-sales-forecasting')

In [None]:
train_df = pd.read_csv('/kaggle/working/walmart-recruiting-store-sales-forecasting/train.csv')
features_df = pd.read_csv('/kaggle/working/walmart-recruiting-store-sales-forecasting/features.csv')
stores_df = pd.read_csv('/kaggle/input/walmart-recruiting-store-sales-forecasting/stores.csv')
test_df = pd.read_csv('/kaggle/working/walmart-recruiting-store-sales-forecasting/test.csv')

In [None]:
data = train_df.merge(features_df, on=['Store', 'Date', 'IsHoliday'], how = 'inner')
final_df = data.merge(stores_df, on=['Store'], how='inner')
final_df.head(5)
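
Both merges above use `how='inner'`, which silently drops any row whose keys are missing on one side. The sketch below shows one way to check that nothing is lost, using the `validate` argument of `DataFrame.merge` and a row count. The toy frames and their values are made up for illustration and are not taken from the competition data.

```python
import pandas as pd

# Toy stand-ins for train_df, features_df and stores_df (illustrative values only).
toy_train = pd.DataFrame({
    "Store": [1, 1, 2, 2],
    "Dept": [1, 1, 1, 1],
    "Date": ["2010-02-05", "2010-02-12", "2010-02-05", "2010-02-12"],
    "Weekly_Sales": [100.0, 110.0, 200.0, 190.0],
    "IsHoliday": [False, True, False, True],
})
toy_features = pd.DataFrame({
    "Store": [1, 1, 2, 2],
    "Date": ["2010-02-05", "2010-02-12", "2010-02-05", "2010-02-12"],
    "Temperature": [42.3, 38.5, 40.1, 39.0],
    "IsHoliday": [False, True, False, True],
})
toy_stores = pd.DataFrame({"Store": [1, 2], "Type": ["A", "B"], "Size": [1500, 2000]})

# validate="many_to_one" raises a MergeError if the right-hand keys are not unique.
toy_data = toy_train.merge(
    toy_features, on=["Store", "Date", "IsHoliday"], how="inner", validate="many_to_one"
)
toy_final = toy_data.merge(toy_stores, on="Store", how="inner", validate="many_to_one")

# An inner join that loses no rows keeps the left-hand row count.
rows_lost = len(toy_train) - len(toy_final)
print(f"rows lost in the merges: {rows_lost}")
```

If `rows_lost` is not zero on the real data, switching to `how='left'` with `indicator=True` is one way to see which keys failed to match.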

In [None]:
# The competition CSVs store dates as YYYY-MM-DD, so parse them with an explicit format
final_df["Date"] = pd.to_datetime(final_df["Date"], format="%Y-%m-%d")

<a id="2"></a>
<h3 style="text-align: left;
           background-color: #128EB3;
           font-family:Trebuchet MS; 
           color: white; 
           padding: 14px; 
           line-height: 1; 
           border-radius:10px"> 2. Best store analysis
</h3>

In [None]:
# Store 20: sum the sales of all departments into one weekly series
store20 = final_df[final_df['Store'] == 20]
store20 = store20[['Date','Weekly_Sales']]
store20 = store20.groupby('Date').sum()
store20
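
The cell above collapses the department-level rows into one total per date. Here is a small sketch of the same aggregation on made-up numbers, followed by a check that the resulting index is evenly spaced. It uses `pd.infer_freq`, which is worth running before handing a series to a forecasting library that expects a regular frequency. The toy values are assumptions for illustration only.

```python
import pandas as pd

# Two departments reporting on three consecutive Fridays (made-up sales figures).
toy = pd.DataFrame({
    "Date": pd.to_datetime([
        "2010-02-05", "2010-02-05",
        "2010-02-12", "2010-02-12",
        "2010-02-19", "2010-02-19",
    ]),
    "Dept": [1, 2, 1, 2, 1, 2],
    "Weekly_Sales": [10.0, 5.0, 12.0, 6.0, 11.0, 4.0],
})

# Same pattern as the store-level cell: keep date and sales, then sum per date.
toy_store = toy[["Date", "Weekly_Sales"]].groupby("Date").sum()

# infer_freq returns a pandas offset alias, or None if the spacing is irregular.
inferred = pd.infer_freq(toy_store.index)
print(toy_store)
print("inferred frequency:", inferred)
```

A result of `None` on the real series would point to missing or duplicated weeks that need handling before modeling.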

In [None]:
# Forecast the next 23 weeks and evaluate the models with 3-fold cross-validation.
fh = 23
fold = 3

# Global Plot Settings
fig_kwargs={'renderer': 'notebook'}

In [None]:
eda = TSForecastingExperiment()
eda.setup(store20, fh=fh, fold=fold, fig_kwargs=fig_kwargs)

In [None]:
eda.plot_model()

In [None]:
# Classical decomposition
# seasonal_period is measured in observations, so 12 means 12 weeks here
eda.plot_model(plot="decomp", data_kwargs={'seasonal_period': 12})

In [None]:
# Show the train-test split on the dataset
# Split internally: the last fh points form the test set, the rest form the train set
eda.plot_model(plot="train_test_split", fig_kwargs={"height": 400, "width": 900})

# Show the Cross Validation splits inside the train set
# The blue dots represent the training data for each fold.
# The orange dots represent the validation data for each fold
eda.plot_model(plot="cv", fig_kwargs={"height": 400, "width": 900})
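
To make the two plots above easier to read, here is a rough sketch of the index arithmetic behind them. It assumes an expanding-window strategy, which I understand to be PyCaret's default, and it is not PyCaret's actual implementation. The helper `holdout_and_folds` and the series length of 143 are hypothetical and used only for illustration.

```python
def holdout_and_folds(n_obs, fh, fold):
    """Return (train_idx, test_idx, folds) as lists of integer positions.

    The last `fh` observations are held out as the test set. Inside the
    remaining train set, each fold validates on a block of `fh` points and
    trains on everything before that block (expanding window).
    """
    train_end = n_obs - fh
    train_idx = list(range(train_end))
    test_idx = list(range(train_end, n_obs))

    folds = []
    for k in range(fold):
        # Validation blocks are consecutive and end where the train set ends.
        val_start = train_end - (fold - k) * fh
        fold_train = list(range(val_start))
        fold_val = list(range(val_start, val_start + fh))
        folds.append((fold_train, fold_val))
    return train_idx, test_idx, folds


# Assumed series length of 143 weekly points, with the notebook's fh and fold.
train_idx, test_idx, folds = holdout_and_folds(n_obs=143, fh=23, fold=3)

for k, (fold_train, fold_val) in enumerate(folds):
    print(f"fold {k}: train 0..{fold_train[-1]}, validate {fold_val[0]}..{fold_val[-1]}")
print(f"hold-out test: {test_idx[0]}..{test_idx[-1]}")
```

Each fold trains on a longer history than the previous one, and no validation block overlaps the hold-out test set.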

In [None]:
exp = TSForecastingExperiment()
exp.setup(store20, fh=fh, fold=fold, fig_kwargs=fig_kwargs)

In [None]:
#### Initial Training ----
model = exp.create_model("prophet", freq='W-Fri')
exp.plot_model(model)

In [None]:
exp.predict_model(model)

In [None]:
#Final model
final_model = exp.finalize_model(model)
exp.plot_model(final_model)
exp.predict_model(final_model)

In [None]:
_ = exp.save_model(final_model, "my_final_model")

In [None]:
exp_load = TSForecastingExperiment()
loaded_model = exp_load.load_model("my_final_model")

In [None]:
# Should match predictions from before the save and load
exp_load.predict_model(loaded_model)

In [None]:
# Example: forecast 36 weeks ahead instead of the fh used in setup
exp.plot_model(estimator=final_model, data_kwargs={'fh': 36})

In [None]:
# exp.models() lists the available estimators
# Cross-validate every available model and return the best one
exp.compare_models()

<a id="3"></a>
<h3 style="text-align: left; background-color: #128EB3; font-family:Trebuchet MS; color: white; padding: 14px; line-height: 1; border-radius:6px"> 3. Analysis for all stores</h3>

In [None]:
# Forecast horizon and number of CV folds, shared by every store
fh = 23
fold = 3

results = {}

for store in final_df['Store'].unique():

    # Aggregate all departments into one weekly sales series for this store
    store_x = final_df[final_df['Store'] == store]
    store_x = store_x[['Date', 'Weekly_Sales']].groupby('Date').sum()

    exp = TSForecastingExperiment()
    exp.setup(store_x, fh=fh, fold=fold)
    model = exp.create_model("prophet", freq='W-Fri')

    # Predict the hold-out set: the last fh weeks, which setup() kept out of training
    ypred = exp.predict_model(model)

    y_true = store_x['Weekly_Sales'].tail(fh).values
    y_pred = ypred['y_pred'].values

    # sklearn returns MAPE as a fraction, so scale it to a percentage
    mape = round(mean_absolute_percentage_error(y_true, y_pred) * 100, 2)
    mae = round(mean_absolute_error(y_true, y_pred), 4)

    results[store] = [mape, mae]
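
For reference, the two metrics stored per store can be written out by hand. The sketch below uses plain NumPy and made-up numbers. The helper names `mae` and `mape_percent` are my own and are not part of scikit-learn.

```python
import numpy as np


def mae(y_true, y_pred):
    """Mean absolute error, in the same units as the data."""
    return float(np.mean(np.abs(y_true - y_pred)))


def mape_percent(y_true, y_pred):
    """Mean absolute percentage error, expressed as a percentage.

    Assumes y_true contains no zeros, otherwise the ratio is undefined.
    """
    return float(np.mean(np.abs((y_true - y_pred) / y_true)) * 100)


# Made-up weekly sales and forecasts, for illustration only.
y_true = np.array([100.0, 200.0, 400.0])
y_pred = np.array([110.0, 180.0, 400.0])

toy_mae = mae(y_true, y_pred)             # (10 + 20 + 0) / 3
toy_mape = mape_percent(y_true, y_pred)   # (10% + 10% + 0%) / 3
print(f"MAE:  {toy_mae:.2f}")
print(f"MAPE: {toy_mape:.2f}%")
```

MAE is in sales units, so it is not comparable across stores of different size. MAPE is relative, which makes it the more useful of the two for ranking stores.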

In [None]:
# One row per store, with the store number as a regular column
results = pd.DataFrame.from_dict(results, orient='index', columns=['MAPE', 'MAE'])
results = results.rename_axis('Store').reset_index()
results

In [None]:
results.to_csv("pycaret_results.csv", index=False)