# Forecasting Retail Sales Using Time Series Models



DESCRIPTION:
The objective of this project is to develop a time series forecasting model that accurately predicts unit sales for the items sold in Favorita stores. Favorita is a leading grocery retailer based in Ecuador, and the company's sales data will be used to build the model. The data set contains sales data for thousands of items sold at different stores, which makes this a challenging and complex problem.

To build an accurate forecasting model, various statistical and machine learning techniques will be used. The data will be preprocessed to ensure that it is clean, consistent, and in the right format for analysis. Exploratory data analysis (EDA) techniques will be employed to gain a deeper understanding of the data and identify any patterns or trends in the sales data.

Time series forecasting models such as ARIMA, linear regression, and a few others will be used to predict store sales. These models will be trained on historical sales data, and their accuracy will be evaluated with performance metrics such as root mean squared logarithmic error (RMSLE), root mean squared error (RMSE), and mean squared error (MSE).
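To make the three evaluation metrics concrete, here is a minimal sketch in plain Python. The function names `mse`, `rmse`, and `rmsle` and the sample values are illustrative assumptions, not part of the project code; RMSLE is computed here as the RMSE of `log(1 + y)`, which assumes non-negative sales values.

```python
import math

def mse(y_true, y_pred):
    """Mean squared error: the average of the squared differences."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error: the square root of the MSE."""
    return math.sqrt(mse(y_true, y_pred))

def rmsle(y_true, y_pred):
    """Root mean squared logarithmic error (inputs must be non-negative)."""
    log_true = [math.log1p(t) for t in y_true]
    log_pred = [math.log1p(p) for p in y_pred]
    return rmse(log_true, log_pred)

# Hypothetical values, for illustration only
actual = [3.0, 0.0, 5.0]
predicted = [2.0, 1.0, 5.0]
print(mse(actual, predicted), rmse(actual, predicted), rmsle(actual, predicted))
```

Because RMSLE compares logarithms, it penalizes relative rather than absolute errors, which can suit sales series where item volumes differ by orders of magnitude.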

The results obtained from the forecasting models will be analyzed, and the insights gained will be used to make informed business decisions. These insights could include which products are in high demand at different times of the year, which stores perform better than others, and which sales trends could inform marketing and inventory decisions.

Overall, the project aims to build a reliable time series forecasting model that can help Favorita optimize its sales strategies and improve its bottom line.


### Data importation

In [1]:
# pip install pyodbc
# pip install sqlalchemy
# pip install lightgbm
# pip install catboost
# pip install python-dotenv



In [22]:
# Libraries Importation
import pyodbc
import sqlalchemy as sa
import pandas as pd
from dotenv import dotenv_values
import warnings
warnings.filterwarnings("ignore")


In [23]:
# Load environment variables from the .env file into a dictionary
environment_variables = dotenv_values('.env')


# Get the values for the credentials you set in the '.env' file
database = environment_variables.get("database")
server = environment_variables.get("server")
username = environment_variables.get("username")
password = environment_variables.get("password")

In [11]:
# Stop early if any connection parameter is missing from the .env file,
# so that later cells do not fail on an undefined `conn`
if server is None or database is None or username is None or password is None:
    raise ValueError("One or more connection parameters are missing.")

# Create a connection to the server
connection_string = (
    "DRIVER={ODBC Driver 18 for SQL Server};"
    f"SERVER={server};DATABASE={database};UID={username};PWD={password}"
)
conn = pyodbc.connect(connection_string)
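The check-then-connect step can also be wrapped in a small helper that reports exactly which parameters are missing. The sketch below is one possible approach, not the notebook's own code: `REQUIRED_KEYS`, `build_connection_string`, and the placeholder values are hypothetical names introduced for illustration, and the driver name is taken from the cell above.

```python
REQUIRED_KEYS = ("server", "database", "username", "password")

def build_connection_string(settings, driver="ODBC Driver 18 for SQL Server"):
    """Build an ODBC connection string, or raise if a parameter is missing."""
    # Treat both absent keys and empty values as missing
    missing = [key for key in REQUIRED_KEYS if not settings.get(key)]
    if missing:
        raise ValueError("Missing connection parameters: " + ", ".join(missing))
    return (
        f"DRIVER={{{driver}}};"
        f"SERVER={settings['server']};"
        f"DATABASE={settings['database']};"
        f"UID={settings['username']};"
        f"PWD={settings['password']}"
    )

# Placeholder values, for illustration only
example_settings = {
    "server": "example-server",
    "database": "example-db",
    "username": "example-user",
    "password": "example-password",
}
print(build_connection_string(example_settings))
```

With the dictionary returned by `dotenv_values('.env')`, the result could be passed to `pyodbc.connect(...)` in place of the concatenated string.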


In [12]:
# Define the tables to import
table1 = 'dbo.oil'
table2 = 'dbo.holidays_events'
table3 = 'dbo.stores'

In [13]:
# Build a query for each table and load the result into a DataFrame
query1 = f"SELECT * FROM {table1}"
df_oil = pd.read_sql(query1, conn)

query2 = f"SELECT * FROM {table2}"
df_holidays_events = pd.read_sql(query2, conn)

query3 = f"SELECT * FROM {table3}"
df_stores = pd.read_sql(query3, conn)

In [34]:
# Save the imported tables as local CSV files
file_path1 = "C:/Users/lenovo/Regression_Time_Series/Assets/oil.csv"
file_path2 = "C:/Users/lenovo/Regression_Time_Series/Assets/holidays_events.csv"
file_path3 = "C:/Users/lenovo/Regression_Time_Series/Assets/stores.csv"
df_oil.to_csv(file_path1, index=False)
df_holidays_events.to_csv(file_path2, index=False)
df_stores.to_csv(file_path3, index=False)
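The absolute paths above only work on one machine. As a hedged alternative, the output paths could be derived from a single assets directory with `pathlib`; the helper name `build_export_paths` is hypothetical, and the temporary directory is used here only so the sketch runs without touching the project folder.

```python
import tempfile
from pathlib import Path

def build_export_paths(assets_dir, table_names):
    """Return a {table name: CSV path} mapping, creating the directory if needed."""
    assets = Path(assets_dir)
    assets.mkdir(parents=True, exist_ok=True)
    return {name: assets / f"{name}.csv" for name in table_names}

# Demonstrated in a temporary directory; a real run would pass the Assets folder
with tempfile.TemporaryDirectory() as tmp:
    paths = build_export_paths(Path(tmp) / "Assets", ["oil", "holidays_events", "stores"])
    print(paths["oil"].name)  # oil.csv
```

Each DataFrame could then be written with a call such as `df_oil.to_csv(paths["oil"], index=False)`.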

In [None]:
# Close the connection
conn.close()
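If a query fails before this cell runs, the connection stays open. One way to guarantee cleanup is `contextlib.closing`, sketched below with an in-memory `sqlite3` database standing in for the SQL Server connection so the example is runnable anywhere. The helper `fetch_all` is a hypothetical name, and the assumption is only that the connection object exposes `cursor()` and `close()`, as DB-API connections do.

```python
import sqlite3
from contextlib import closing

def fetch_all(connect, query):
    """Run a query and return all rows, closing the connection afterwards."""
    # closing() calls conn.close() on exit, even if the query raises
    with closing(connect()) as conn:
        cursor = conn.cursor()
        cursor.execute(query)
        return cursor.fetchall()

# sqlite3 stands in for the real database here
rows = fetch_all(lambda: sqlite3.connect(":memory:"), "SELECT 1 AS value")
print(rows)  # [(1,)]
```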

In [16]:
# Import the local CSV data
df_sample_submission = pd.read_csv('../Regression_Time_Series/Assets/sample_submission.csv')
df_test = pd.read_csv('../Regression_Time_Series/Assets/test.csv')
df_train = pd.read_csv('../Regression_Time_Series/Assets/train.csv')
df_transactions = pd.read_csv('../Regression_Time_Series/Assets/transactions.csv')

In [32]:
df_oil

Unnamed: 0,date,dcoilwtico
0,2013-01-01,
1,2013-01-02,93.139999
2,2013-01-03,92.970001
3,2013-01-04,93.120003
4,2013-01-07,93.199997
...,...,...
1213,2017-08-25,47.650002
1214,2017-08-28,46.400002
1215,2017-08-29,46.459999
1216,2017-08-30,45.959999


In [19]:
df_holidays_events

Unnamed: 0,date,type,locale,locale_name,description,transferred
0,2012-03-02,Holiday,Local,Manta,Fundacion de Manta,False
1,2012-04-01,Holiday,Regional,Cotopaxi,Provincializacion de Cotopaxi,False
2,2012-04-12,Holiday,Local,Cuenca,Fundacion de Cuenca,False
3,2012-04-14,Holiday,Local,Libertad,Cantonizacion de Libertad,False
4,2012-04-21,Holiday,Local,Riobamba,Cantonizacion de Riobamba,False
...,...,...,...,...,...,...
345,2017-12-22,Additional,National,Ecuador,Navidad-3,False
346,2017-12-23,Additional,National,Ecuador,Navidad-2,False
347,2017-12-24,Additional,National,Ecuador,Navidad-1,False
348,2017-12-25,Holiday,National,Ecuador,Navidad,False


In [20]:
df_stores

Unnamed: 0,store_nbr,city,state,type,cluster
0,1,Quito,Pichincha,D,13
1,2,Quito,Pichincha,D,13
2,3,Quito,Pichincha,D,8
3,4,Quito,Pichincha,D,9
4,5,Santo Domingo,Santo Domingo de los Tsachilas,D,4
5,6,Quito,Pichincha,D,13
6,7,Quito,Pichincha,D,8
7,8,Quito,Pichincha,D,8
8,9,Quito,Pichincha,B,6
9,10,Quito,Pichincha,C,15


In [21]:
df_sample_submission

Unnamed: 0,id,sales
0,3000888,0.0
1,3000889,0.0
2,3000890,0.0
3,3000891,0.0
4,3000892,0.0
...,...,...
28507,3029395,0.0
28508,3029396,0.0
28509,3029397,0.0
28510,3029398,0.0
