<img alt="Colaboratory logo" width="15%" src="https://raw.githubusercontent.com/carlosfab/escola-data-science/master/img/novo_logo_bg_claro.png">

#### **Data Science na Prática 3.0**


---

# Wine demand prediction

Wine is, without a doubt, one of the most popular alcoholic beverages in the world. Although scientists are still unsure about the health benefits of wine, the idea that having "one glass of wine a day" can help prevent heart attacks is fairly widespread<sup><a href="https://www.mayoclinic.org/diseases-conditions/heart-disease/in-depth/red-wine/art-20048281">1</a></sup>. The wine industry was also affected by the COVID-19 pandemic: wine sales (especially online) spiked over 500% at the peak of the lockdown period in April 2020<sup><a href="https://www.forbes.com/sites/joemicallef/2021/06/22/how-post-pandemic-wine-consumption-trends-are-shaping--demand/">2</a></sup>. Barring atypical events like this one, specialized wine stores may be able to predict their sales demand, provided they have enough historical data on their sales.

<p align=center>
<img src="img/wine.jpg" width="40%"><br>
<i><sup>Image credits: wavebreakmedia-micro @ <a href="https://www.freepik.com/free-photo/glass-red-wine-bottle-bar-counter_8405671.htm">freepik</a>.</sup></i>
</p>

Data recorded this way is called a Time Series: a sequence of values measured at successive, evenly spaced points in time<sup><a href="https://en.wikipedia.org/wiki/Time_series">3</a></sup>. Making predictions on a time series, known as **Time Series Forecasting**, means *extrapolating* future values from past observations of the quantity of interest, and it requires a specific methodology.
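
To make "evenly spaced" concrete, here is a minimal sketch assuming only pandas: a daily index in which every gap between consecutive timestamps is exactly one day, so `pd.infer_freq` can recognize a regular frequency. The values are made up for illustration; removing a single day breaks the regularity and the frequency can no longer be inferred.

```python
import pandas as pd

# Made-up daily values indexed by seven consecutive calendar days (illustration only)
dates = pd.date_range(start="2018-01-01", periods=7, freq="D")
series = pd.Series([5, 7, 6, 8, 9, 7, 6], index=dates, name="sales")

# Every gap between consecutive timestamps is the same (one day)...
gaps = series.index.to_series().diff().dropna()
print(gaps.nunique())  # 1

# ...so pandas can infer a regular daily frequency for the index
print(pd.infer_freq(series.index))  # D

# Dropping one day leaves a two-day gap, and no regular frequency can be inferred
irregular = dates.delete(3)
print(pd.infer_freq(irregular))  # None
```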

In this notebook, we will predict wine sales demand using [Facebook's Prophet](https://facebook.github.io/prophet/). Like PyCaret, Prophet is a low-code library. It forecasts time series with an additive model in which non-linear trends are fit alongside yearly, weekly, and daily seasonality, plus holiday effects. It is also robust to missing data and to shifts in the trend, and it typically handles outliers well<sup><a href="https://facebook.github.io/prophet/">4</a></sup>.
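
Prophet's authors describe this additive model as $y(t) = g(t) + s(t) + h(t) + \varepsilon_t$, where $g$ is the trend, $s$ the seasonality, $h$ the holiday effects and $\varepsilon_t$ the error. The sketch below does not use Prophet at all: it builds a synthetic series from hand-picked components (the linear trend, the sine-shaped weekly cycle and the Christmas bump are all assumptions chosen for illustration) to show what "additive" means. Fitting a model like Prophet is the reverse problem, estimating these components from observed data.

```python
import numpy as np
import pandas as pd

# One year of made-up daily observations (illustration only, not the wine data)
dates = pd.date_range("2018-01-01", periods=365, freq="D")
t = np.arange(len(dates))
rng = np.random.default_rng(42)

# g(t): trend, here an assumed slow linear growth
trend = 10 + 0.02 * t
# s(t): seasonality, here an assumed smooth weekly cycle driven by the day of the week
weekly = 2 * np.sin(2 * np.pi * dates.dayofweek.to_numpy() / 7)
# h(t): holiday effect, here an assumed one-off bump on Christmas Day
holidays = np.where((dates.month.to_numpy() == 12) & (dates.day.to_numpy() == 25), 5.0, 0.0)
# epsilon_t: random noise that no component explains
noise = rng.normal(loc=0.0, scale=0.5, size=len(dates))

# Additive model: each observation is simply the sum of its components
y = pd.Series(trend + weekly + holidays + noise, index=dates, name="sales")
print(y.head())
```

Because the components are summed, each one shifts the series by a fixed amount regardless of the others; a multiplicative model would instead scale them together.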


## Demand forecasting and why it is important

**Demand forecasting** is, essentially, predicting the future demand for a company's products or materials, so that the company can meet that demand while maintaining an optimal profit margin. It is built on the study of market dynamics, cause-and-effect relationships, and sales trends. Because change constantly affects businesses, being able to predict demand is important<sup><a href="https://startup.info/why-demand-forecasting-is-important/">5</a></sup>.

**Demand planning**, on the other hand, is the process through which a business prepares itself for the *foreseeable* future based on that forecast. It allows companies to stimulate consumer demand while also ensuring they can effectively meet growing demand<sup><a href="https://startup.info/why-demand-forecasting-is-important/">5</a></sup>.

Forecasting demand from time series data brings several benefits:

* **Price setting:** Predicting product popularity (or a lack thereof) allows for price adjustment (and setting up sales) which can have a great impact on revenue.
* **Budget:** Forecasting demand allows you to estimate the company budget, enabling you to reduce risk and make financial decisions, such as reallocating resources.
* **Improved supply infrastructure:** Knowing the business's demand allows for better control of inventory and of the personnel needed.
* **Reducing uncertainty:** Uncertainty makes decision-making much more difficult; knowing what to expect reduces this impact and allows for better planning.

Now that we have covered this, let's dive in.

## The Data

The dataset used in this project is a synthetic dataset created by Rafael at [Sigmoidal](https://sigmoidal.ai). It is based on a Kaggle dataset that was modified to contain 3 years of daily sales across 3 stores and 219 different products (the original has 5 years of daily sales across 10 stores and 50 products).
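
A quick way to check that a sales table matches this description is to count distinct stores, items and days. The sketch below assumes the `date`, `store`, `item` and `sales` columns shown in the sales preview further down; `summarize_sales` is a hypothetical helper, and the toy frame only stands in for the real file.

```python
import pandas as pd


def summarize_sales(sales: pd.DataFrame) -> dict:
    """Hypothetical helper: report how many stores, items and days a sales table covers."""
    dates = pd.to_datetime(sales["date"])
    return {
        "n_stores": sales["store"].nunique(),
        "n_items": sales["item"].nunique(),
        "first_day": dates.min().date(),
        "last_day": dates.max().date(),
        "n_days": dates.nunique(),
    }


# Toy frame with the same columns as the sales dataset (values are made up)
toy = pd.DataFrame({
    "date": ["2018-01-01", "2018-01-02", "2018-01-01", "2018-01-02"],
    "store": [1, 1, 2, 2],
    "item": [1, 1, 1, 2],
    "sales": [13, 11, 9, 4],
})
print(summarize_sales(toy))
```

If the description above holds, calling `summarize_sales(df_sale)` on the real data should report 3 stores, 219 items and roughly three years of days.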

The products, though, are real and based on an actual e-commerce wine catalog. Names, vintages and prices are 100% real (prices in BRL, also converted to USD). Some products may show sales starting before they could have reached the market (for example, a wine from the 2015 vintage being sold in 2013, which makes no sense); this is simply an artifact of the synthetic dataset. Such artifacts may also show up as other unexpected characteristics, such as the sales trend of a wine that is not very popular.
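
One way to spot the vintage artifact described above is to join sales with products and flag rows dated earlier than the wine's vintage year. This is a minimal sketch assuming the column names shown in the previews below (`item` in sales, `item_id` and `vintage` in products, with `"NV"` marking non-vintage wines); `flag_sales_before_vintage` is a hypothetical helper and the toy frames are made up.

```python
import pandas as pd


def flag_sales_before_vintage(sales: pd.DataFrame, products: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: return sales rows dated earlier than the wine's vintage year."""
    prod = products[["item_id", "vintage"]].copy()
    # Non-vintage wines ("NV") have no harvest year: they become NaN and are never flagged
    prod["vintage_year"] = pd.to_numeric(prod["vintage"], errors="coerce")
    merged = sales.merge(prod, left_on="item", right_on="item_id", how="left")
    sale_year = pd.to_datetime(merged["date"]).dt.year
    # Comparisons against NaN are False, so only wines with a known vintage can be flagged
    return merged[sale_year < merged["vintage_year"]]


# Toy frames mirroring the real column layout (values are made up)
products = pd.DataFrame({"item_id": [1, 3], "vintage": ["NV", "2015"]})
sales = pd.DataFrame({
    "date": ["2013-05-01", "2013-05-01", "2016-01-01"],
    "store": [1, 1, 1],
    "item": [1, 3, 3],
    "sales": [4, 2, 7],
})
flagged = flag_sales_before_vintage(sales, products)
print(flagged[["date", "item", "vintage"]])
```

Running it as `flag_sales_before_vintage(df_sale, df_prod)` on the real data would show how widespread the artifact is before deciding whether to filter those rows out.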

```python
# Importing libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
# pandas-profiling has since been renamed; newer releases use `from ydata_profiling import ProfileReport`
from pandas_profiling import ProfileReport
from pywaffle import Waffle
from prophet import Prophet


# Getting the data
df_prod = pd.read_csv("data/products.csv")
df_sale = pd.read_csv("data/sales-clean.csv")

# Life, the Universe, and Everything
np.random.seed(42)

# Defining plot parameters
# plt.style.use('dark_background')
plt.rcParams['font.family'] = 'sans-serif'
plt.rcParams['font.sans-serif'] = 'Arial'
plt.rcParams['font.stretch'] = 'normal'
plt.rcParams['font.style'] = 'normal'
plt.rcParams['font.variant'] = 'normal'
```

```python
# Previewing the first rows of the products dataset
df_prod.head()
```

| | item_id | name | producer | country | region | vintage | kind | price_brl | price_usd |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Philipponnat Royale Reserve Brut | Philipponat | France | Champagne | NV | sparkling | 339.6 | 58.75 |
| 1 | 2 | Philipponnat Royale Reserve Rosé | Philipponat | France | Champagne | NV | rose sparkling | 489.0 | 84.6 |
| 2 | 3 | Philipponnat Cuvée 1522 Grand Cru Extra Brut | Philipponat | France | Champagne | 2009 | sparkling | 789.0 | 136.51 |
| 3 | 4 | Philipponnat Cuvée 1522 1er Cru Rosé | Philipponat | France | Champagne | 2008 | rose sparkling | 899.4 | 155.61 |
| 4 | 5 | Philipponnat Clos Des Goisses Brut | Philipponat | France | Champagne | 2009 | sparkling | 1548.0 | 267.82 |


```python
# Previewing the first rows of the sales dataset
df_sale.head()
```

| | date | store | item | sales |
|---|---|---|---|---|
| 0 | 2018-01-01 | 1 | 1 | 13 |
| 1 | 2018-01-02 | 1 | 1 | 11 |
| 2 | 2018-01-03 | 1 | 1 | 14 |
| 3 | 2018-01-04 | 1 | 1 | 13 |
| 4 | 2018-01-05 | 1 | 1 | 10 |
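
Prophet expects its input as a DataFrame with a `ds` column (dates) and a `y` column (the numeric value to forecast). A minimal sketch of getting the sales table into that shape, assuming we forecast total daily sales, optionally restricted to one store and/or one item; `to_prophet_frame` is a hypothetical helper and the toy frame is made up.

```python
import pandas as pd


def to_prophet_frame(sales: pd.DataFrame, store=None, item=None) -> pd.DataFrame:
    """Hypothetical helper: sum daily sales into the `ds`/`y` layout Prophet expects."""
    subset = sales
    if store is not None:
        subset = subset[subset["store"] == store]
    if item is not None:
        subset = subset[subset["item"] == item]
    # Parse dates, add up sales per day, and rename columns to Prophet's convention
    daily = (
        subset.assign(ds=pd.to_datetime(subset["date"]))
        .groupby("ds", as_index=False)["sales"]
        .sum()
        .rename(columns={"sales": "y"})
    )
    return daily


# Toy frame with the same columns as the sales dataset (values are made up)
toy = pd.DataFrame({
    "date": ["2018-01-01", "2018-01-02", "2018-01-01"],
    "store": [1, 1, 2],
    "item": [1, 1, 1],
    "sales": [13, 11, 9],
})
prophet_df = to_prophet_frame(toy)
print(prophet_df)
```

With Prophet installed, a frame like this can then be passed to `Prophet().fit(...)`, after which `make_future_dataframe(periods=...)` and `predict(...)` produce the forecast. Whether to model total demand or individual store/item series is a design choice; per-series models capture local patterns but multiply the number of fits.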


## References

1: https://www.mayoclinic.org/diseases-conditions/heart-disease/in-depth/red-wine/art-20048281

2: https://www.forbes.com/sites/joemicallef/2021/06/22/how-post-pandemic-wine-consumption-trends-are-shaping--demand/?sh=6fa845857fe9

3: https://en.wikipedia.org/wiki/Time_series

4: https://facebook.github.io/prophet/

5: https://startup.info/why-demand-forecasting-is-important/