# <span style="color:red">Beverages</span> <span style="color:blue">Business in Australia</span>

<img src="https://raw.githubusercontent.com/aperezace20/Data-Science-Retail-Forecasting/main/Week%209%3A%20Deliverables/flag-australia-truck-inscription-supply-chain-concept-cargo-transportation-logistics-254500107.jpg" alt="Truck with Australian flag: supply chain and logistics concept" width="400"/>





<b> Problem Statement: </b>A large beverage company in Australia sells its products through various supermarkets and runs heavy promotions throughout the year. Demand is also influenced by factors such as holidays and seasonality. The company needs a forecast for each product at item level, produced every week in weekly buckets.

<b> Challenges: </b>The time series data show a range of patterns: some series have trends, some are seasonal, and some show neither. The company had been using forecasting software written in-house, but it often produced forecasts that did not seem sensible. The company wants to explore AI/ML-based forecasting as a replacement for its in-house solution.

<h2 style="background-color:white; color:red; text-align:center;"><b>Analyzing the Beverage Business in Australia</b></h2>



### <font color = #800000> 1.1: </font> <font color = #800000> Importing Python Modules</font>

In [1]:
import os
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.graph_objs as go
from plotly.offline import iplot
import requests
import io 

### <font color = #800000> 1.2: </font> <font color = #800000> Importing the Dataset from GitHub</font>

In [2]:
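# Raw CSV file hosted in the project's GitHub repository
url1 = "https://raw.githubusercontent.com/aperezace20/Data-Science-Retail-Forecasting/main/forecasting.csv"
# Download the file contents as bytes
download = requests.get(url1).content
# Decode the bytes as UTF-8 text and parse the CSV into a DataFrame
forecasting = pd.read_csv(io.StringIO(download.decode('utf-8')))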

### <font color = #800000> 1.3: </font> <font color = #800000> Data Cleaning Best Practices</font>

####  <font color = maroon> 1.3.1: Type of Data</font>

In [14]:
forecasting.dtypes

Product                       object
date                  datetime64[ns]
Sales                          int64
Price Discount (%)            object
In-Store Promo                 int64
Catalogue Promo                int64
Store End Promo                int64
Google_Mobility              float64
Covid_Flag                     int64
V_DAY                          int64
EASTER                         int64
CHRISTMAS                      int64
month                          int64
dtype: object
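
The listing above shows `Price Discount (%)` stored as `object`, because values such as `0%` and `17%` are strings rather than numbers. A minimal sketch of one way to convert them, assuming every value is a number followed by `%`. The sample frame and the new column name `discount_pct` are illustrative and not part of the original notebook.

```python
import pandas as pd

# Small illustrative sample using discount strings in the dataset's format
sample = pd.DataFrame({
    "Product": ["SKU1", "SKU1", "SKU1"],
    "Price Discount (%)": ["0%", "0%", "17%"],
})

# Strip the trailing percent sign, then convert to a numeric dtype.
# errors="coerce" turns any unparseable value into NaN instead of raising.
sample["discount_pct"] = pd.to_numeric(
    sample["Price Discount (%)"].str.rstrip("%"), errors="coerce"
)

print(sample.dtypes)
```

Keeping the original string column alongside the numeric one makes it easy to spot rows where the conversion produced `NaN`.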

####  <font color = maroon> 1.3.2: Head & Tail </font>

In [5]:
forecasting.head()

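| | Product | date | Sales | Price Discount (%) | In-Store Promo | Catalogue Promo | Store End Promo | Google_Mobility | Covid_Flag | V_DAY | EASTER | CHRISTMAS |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | SKU1 | 2/5/2017 | 27750 | 0% | 0 | 0 | 0 | 0.0 | 0 | 0 | 0 | 0 |
| 1 | SKU1 | 2/12/2017 | 29023 | 0% | 1 | 0 | 1 | 0.0 | 0 | 1 | 0 | 0 |
| 2 | SKU1 | 2/19/2017 | 45630 | 17% | 0 | 0 | 0 | 0.0 | 0 | 0 | 0 | 0 |
| 3 | SKU1 | 2/26/2017 | 26789 | 0% | 1 | 0 | 1 | 0.0 | 0 | 0 | 0 | 0 |
| 4 | SKU1 | 3/5/2017 | 41999 | 17% | 0 | 0 | 0 | 0.0 | 0 | 0 | 0 | 0 |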


In [6]:
forecasting.tail()

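| | Product | date | Sales | Price Discount (%) | In-Store Promo | Catalogue Promo | Store End Promo | Google_Mobility | Covid_Flag | V_DAY | EASTER | CHRISTMAS |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1213 | SKU6 | 10/18/2020 | 96619 | 54% | 0 | 1 | 0 | -7.56 | 1 | 0 | 0 | 0 |
| 1214 | SKU6 | 10/25/2020 | 115798 | 52% | 0 | 1 | 0 | -8.39 | 1 | 0 | 0 | 0 |
| 1215 | SKU6 | 11/1/2020 | 152186 | 54% | 1 | 0 | 1 | -7.43 | 1 | 0 | 0 | 0 |
| 1216 | SKU6 | 11/8/2020 | 26445 | 44% | 1 | 0 | 1 | -5.95 | 1 | 0 | 0 | 0 |
| 1217 | SKU6 | 11/15/2020 | 26414 | 44% | 0 | 0 | 0 | -7.2 | 1 | 0 | 0 | 0 |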


#### <font color = maroon> 1.3.3: Shape of Data</font>

In [7]:
forecasting.shape

(1218, 12)

#### <font color = maroon> 1.3.4: Missing Values</font>

In [8]:
forecasting.isnull().sum()

Product               0
date                  0
Sales                 0
Price Discount (%)    0
In-Store Promo        0
Catalogue Promo       0
Store End Promo       0
Google_Mobility       0
Covid_Flag            0
V_DAY                 0
EASTER                0
CHRISTMAS             0
dtype: int64
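
`isnull()` only flags empty cells. For a weekly forecasting problem, a week with no row at all is also a gap, and it would not show up in the counts above. The sketch below checks for such gaps, assuming one row per product per week with exactly 7 days between consecutive dates. The helper name `find_weekly_gaps` and the sample data are hypothetical, and the dates must already be parsed as datetimes.

```python
import pandas as pd

def find_weekly_gaps(df, product_col="Product", date_col="date"):
    """Return rows whose date is not exactly 7 days after the
    previous row of the same product (hypothetical helper)."""
    ordered = df.sort_values([product_col, date_col]).copy()
    # Days elapsed since the previous observation of the same product
    ordered["days_since_prev"] = (
        ordered.groupby(product_col)[date_col].diff().dt.days
    )
    # The first row of each product has no predecessor (NaN), so skip it
    has_prev = ordered["days_since_prev"].notna()
    return ordered[has_prev & (ordered["days_since_prev"] != 7)]

# Illustrative sample: SKU1 is missing the week of 2017-02-19
sample = pd.DataFrame({
    "Product": ["SKU1", "SKU1", "SKU1", "SKU2", "SKU2"],
    "date": pd.to_datetime(
        ["2017-02-05", "2017-02-12", "2017-02-26", "2017-02-05", "2017-02-12"]
    ),
})

gaps = find_weekly_gaps(sample)
print(gaps)
```

An empty result would suggest that every product has an unbroken weekly series, which is what most time series models expect.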

#### <font color = maroon> 1.3.5: Data Size</font>

In [10]:
forecasting.size

14616
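
`size` counts individual cells, so it should equal the number of rows times the number of columns. A quick sanity check on an illustrative frame, followed by the same arithmetic for the reported shape of (1218, 12):

```python
import pandas as pd

# Illustrative frame with 3 rows and 2 columns
sample = pd.DataFrame({
    "Product": ["SKU1", "SKU2", "SKU3"],
    "Sales": [27750, 29023, 45630],
})

rows, cols = sample.shape
# size is the total number of cells: rows * columns
assert sample.size == rows * cols

# Same arithmetic for the shape reported for the full dataset
print(1218 * 12)  # 14616
```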

#### <font color = maroon> 1.3.6: Split by Month & Adding Column Month</font>

In [11]:
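# Parse the date strings into datetime64 values
forecasting['date'] = pd.to_datetime(forecasting['date'])

def get_month(dt):
    """Return the calendar month (1-12) of a timestamp."""
    return dt.month

# Add a 'month' column derived from each row's date
forecasting['month'] = forecasting['date'].apply(get_month)

print(forecasting)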


     Product       date   Sales Price Discount (%)  In-Store Promo  \
0       SKU1 2017-02-05   27750                 0%               0   
1       SKU1 2017-02-12   29023                 0%               1   
2       SKU1 2017-02-19   45630                17%               0   
3       SKU1 2017-02-26   26789                 0%               1   
4       SKU1 2017-03-05   41999                17%               0   
...      ...        ...     ...                ...             ...   
1213    SKU6 2020-10-18   96619                54%               0   
1214    SKU6 2020-10-25  115798                52%               0   
1215    SKU6 2020-11-01  152186                54%               1   
1216    SKU6 2020-11-08   26445                44%               1   
1217    SKU6 2020-11-15   26414                44%               0   

      Catalogue Promo  Store End Promo  Google_Mobility  Covid_Flag  V_DAY  \
0                   0                0             0.00           0      0   
1  
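
The raw dates, such as `2/5/2017`, are written as month/day/year. A possible variation on the cell above passes the format explicitly, so the parser never has to guess, and uses the vectorized `.dt.month` accessor instead of `apply`. This is a sketch on illustrative sample data, not the notebook's own code. Depending on the pandas version, `.dt.month` may return a narrower integer dtype than `apply` does.

```python
import pandas as pd

# Illustrative sample using the same date format as the raw CSV
sample = pd.DataFrame({"date": ["2/5/2017", "2/12/2017", "3/5/2017"]})

# Explicit month/day/year format: "2/5/2017" is read as 5 February 2017
sample["date"] = pd.to_datetime(sample["date"], format="%m/%d/%Y")

# Vectorized month extraction, no Python-level function call per row
sample["month"] = sample["date"].dt.month

print(sample)
```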

#### <font color = maroon> 1.3.7: Checking the head</font>

In [12]:
forecasting.head()

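| | Product | date | Sales | Price Discount (%) | In-Store Promo | Catalogue Promo | Store End Promo | Google_Mobility | Covid_Flag | V_DAY | EASTER | CHRISTMAS | month |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | SKU1 | 2017-02-05 | 27750 | 0% | 0 | 0 | 0 | 0.0 | 0 | 0 | 0 | 0 | 2 |
| 1 | SKU1 | 2017-02-12 | 29023 | 0% | 1 | 0 | 1 | 0.0 | 0 | 1 | 0 | 0 | 2 |
| 2 | SKU1 | 2017-02-19 | 45630 | 17% | 0 | 0 | 0 | 0.0 | 0 | 0 | 0 | 0 | 2 |
| 3 | SKU1 | 2017-02-26 | 26789 | 0% | 1 | 0 | 1 | 0.0 | 0 | 0 | 0 | 0 | 2 |
| 4 | SKU1 | 2017-03-05 | 41999 | 17% | 0 | 0 | 0 | 0.0 | 0 | 0 | 0 | 0 | 3 |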


#### <font color = maroon> 1.3.8: Groupby Months & Sales</font>

In [15]:
forecasting.groupby('month')['Sales'].sum()

month
1     1593297
2     2519111
3     2896089
4     2703389
5     2935247
6     3741038
7     3155868
8     3368431
9     3904996
10    3948299
11    3293141
12    2840013
Name: Sales, dtype: int64
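
Monthly totals depend on how many weekly rows fall in each month. The rows shown by `head()` and `tail()` run from February 2017 to November 2020, which suggests that January and December are covered by fewer years than the other months, so their totals may look low for that reason alone. A hedged sketch of a fairer comparison reports the row count and the average sales per row next to the total. The sample data and the output column names (`total`, `n_rows`, `avg_per_row`) are illustrative.

```python
import pandas as pd

# Illustrative sample: month 1 has fewer observations than month 2
sample = pd.DataFrame({
    "month": [1, 1, 2, 2, 2, 2],
    "Sales": [100, 300, 100, 100, 100, 100],
})

# Named aggregation: total sales, number of rows, and mean sales per row
monthly = sample.groupby("month")["Sales"].agg(
    total="sum", n_rows="count", avg_per_row="mean"
)

print(monthly)
```

In this sample both months have the same total, yet month 1 has twice the average per row, which the sum alone would hide.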

#### <font color = maroon> 1.3.9: Bar Graph Months & Sales</font>