<img src='../img/logo.png' alt='DS Market logo' height='150px'>

# Sales forecast

## Table of Contents

* [A. Introduction](#introduction)
* [B. Importing Libraries](#libraries)
* [C. Importing Data](#data)

## A. Introduction <a class="anchor" id="introduction"></a>

DSMarket has always relied on rudimentary approaches to forecast product sales. The current process aggregates sales per department, store, and city, forecasts each aggregate independently, and adds up the resulting predictions.

The goal of this notebook is to forecast sales over a 28-day horizon (4 weeks).
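
To make the "forecast each aggregate independently, then add up" idea concrete, here is a minimal sketch on synthetic data. The store names (`store_a`, `store_b`), the dates, and the seasonal-naive rule (repeat the last observed week) are assumptions chosen for illustration only; they are not DSMarket's actual data or forecasting method.

```python
import numpy as np
import pandas as pd

HORIZON = 28  # forecast horizon in days (4 weeks)

# Synthetic daily sales for two hypothetical stores (illustrative values only).
dates = pd.date_range("2016-01-01", periods=84, freq="D")
history = pd.DataFrame(
    {
        "store_a": np.arange(84) % 7 + 10,  # repeating weekly pattern
        "store_b": np.arange(84) % 7 + 20,  # repeating weekly pattern
    },
    index=dates,
)


def seasonal_naive(series: pd.Series, horizon: int, season: int = 7) -> pd.Series:
    """Repeat the last `season` observations until `horizon` days are covered."""
    last_season = series.iloc[-season:].to_numpy()
    repeats = int(np.ceil(horizon / season))
    values = np.tile(last_season, repeats)[:horizon]
    future_index = pd.date_range(
        series.index[-1] + pd.Timedelta(days=1), periods=horizon, freq="D"
    )
    return pd.Series(values, index=future_index, name=series.name)


# One independent forecast per store, then summed into a single total.
per_store = pd.concat(
    [seasonal_naive(history[col], HORIZON) for col in history.columns], axis=1
)
total_forecast = per_store.sum(axis=1)
```

Any per-series model could replace `seasonal_naive` here; the point is only that each series is forecast on its own and the totals are obtained by summation.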

## B. Importing Libraries <a class="anchor" id="libraries"></a>

In [82]:
# system and path management
import sys
sys.path.append('../scripts') # including helper functions inside the scripts folder

# removing system warnings
import warnings
warnings.filterwarnings('ignore')

# data manipulation
import pandas as pd
import numpy as np

# plotting
import matplotlib.pyplot as plt

# plotting options
%matplotlib inline
plt.style.use('ggplot')
plt.rcParams["figure.figsize"] = (10, 7)

# pandas display options: show up to 500 rows/columns, floats with 2 decimals
pd.set_option('display.max_rows', 500)
pd.set_option('display.max_columns', 500)
pd.options.display.float_format = '{:,.2f}'.format

# helper functions
import outlier_management as outliers
import file_management

## C. Importing Data <a class="anchor" id="data"></a>

In [83]:
# downloading the processed data files from gdrive
directory = '../data/processed/'
urls = [
    {'filename': 'sales_processed.csv', 'url': 'https://drive.google.com/file/d/1JdeAgraKcaFQJrjG2HPVb5D0VD0iTlNB/view?usp=sharing'},
    {'filename': 'prices_processed.csv', 'url': 'https://drive.google.com/file/d/1pSEJAQfAU-owDjKmxcPrxf3CpGFivwa6/view?usp=sharing'},
    {'filename': 'calendar_processed.csv', 'url': 'https://drive.google.com/file/d/1Lnji96iBkTpFiWo-QXeW3TvESiNYWCML/view?usp=sharing'}
]
        
file_management.download_files_from_url(urls, directory)

sales = pd.read_csv(directory + 'sales_processed.csv', index_col=0)
prices = pd.read_csv(directory + 'prices_processed.csv', index_col=0)
calendar = pd.read_csv(directory + 'calendar_processed.csv', index_col=0)

sales_processed.csv file already exists in ../data/processed/
prices_processed.csv file already exists in ../data/processed/
calendar_processed.csv file already exists in ../data/processed/
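
The helper `file_management.download_files_from_url` lives in the scripts folder and is not shown in this notebook. Judging only from the output above, it appears to skip files that are already on disk. The sketch below illustrates that behaviour under stated assumptions: the function names `drive_share_to_direct_url` and `files_to_download` are hypothetical, and the `uc?export=download&id=...` form is a commonly used direct-download pattern for Google Drive that may not match what the real helper does (large files can also require an extra confirmation step).

```python
import re
from pathlib import Path


def drive_share_to_direct_url(share_url: str) -> str:
    """Turn a Google Drive '/file/d/<id>/view' link into a direct-download URL."""
    match = re.search(r"/file/d/([^/]+)", share_url)
    if match is None:
        raise ValueError(f"Not a Google Drive share link: {share_url}")
    return f"https://drive.google.com/uc?export=download&id={match.group(1)}"


def files_to_download(urls: list[dict], directory: str) -> list[dict]:
    """Return only the entries whose file is not already present in `directory`."""
    pending = []
    for entry in urls:
        target = Path(directory) / entry["filename"]
        if target.exists():
            # Mirrors the message format seen in the cell output (assumed).
            print(f"{entry['filename']} file already exists in {directory}")
        else:
            pending.append(
                {**entry, "direct_url": drive_share_to_direct_url(entry["url"])}
            )
    return pending
```

Separating the "what still needs downloading" decision from the network call keeps the skip logic easy to check without hitting Google Drive.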


In [84]:
# downloading the feature files from gdrive
directory = '../data/features/'
urls = [
    {'filename': 'sales_by_date.csv', 'url': 'https://drive.google.com/file/d/1JMy2pJUp7DscjnY3_vhCNM7NZk9Th4i9/view?usp=sharing'},
    {'filename': 'sales_by_date_store.csv', 'url': 'https://drive.google.com/file/d/17Na9Eyj_NUGt9Uial1Oepwn8neUTXMmp/view?usp=sharing'},
    {'filename': 'sales_by_date_city.csv', 'url': 'https://drive.google.com/file/d/1Psykw5DZ7JfQkHYcajVW2ZlcaYoj2rmd/view?usp=sharing'},
]
        
file_management.download_files_from_url(urls, directory)

sales_by_date = pd.read_csv(directory + 'sales_by_date.csv', index_col=0)
sales_by_date_store = pd.read_csv(directory + 'sales_by_date_store.csv', index_col=0)
sales_by_date_city = pd.read_csv(directory + 'sales_by_date_city.csv', index_col=0)

master.csv file already exists in ../data/features/
