# Time Series Model for Monthly Shampoo Sales Using Python and ETS
### David Lowe
### April 29, 2020

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery. https://machinelearningmastery.com/

SUMMARY: The purpose of this project is to construct a time series prediction model and document the end-to-end steps using a template. The Monthly Shampoo Sales dataset is a time series situation where we are trying to forecast future outcomes based on past data points.

Additional Notes: This is a replication, with some small modifications, of Dr. Jason Brownlee's blog post, How to Grid Search Triple Exponential Smoothing for Time Series Forecasting in Python (https://machinelearningmastery.com/how-to-grid-search-triple-exponential-smoothing-for-time-series-forecasting-in-python/). I plan to leverage Dr. Brownlee's exponential smoothing or ETS (Error, Trend and Seasonality) tutorial examples and build an ETS-based notebook template for future uses.

INTRODUCTION: The problem is to forecast monthly shampoo sales. The dataset describes a time series of monthly shampoo sales over three years, for a total of 36 observations. We will use the first 24 observations to train the model and the remaining 12 observations to test it.
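
A minimal sketch of the 24/12 split and the walk-forward evaluation used later in the notebook, assuming a naive persistence forecaster (repeat the last observed value) as a stand-in for the ETS model. The helper name `walk_forward_rmse` and the toy series are hypothetical and are not part of the notebook.

```python
import math

def walk_forward_rmse(series, n_test):
    """Walk-forward RMSE of a naive persistence forecast (hypothetical helper)."""
    train, test = series[:-n_test], series[-n_test:]
    history = list(train)
    squared_errors = []
    for actual in test:
        forecast = history[-1]      # stand-in model: repeat the last value
        squared_errors.append((actual - forecast) ** 2)
        history.append(actual)      # reveal the true value before the next step
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Toy series of 36 points (NOT the shampoo data): 0, 1, 2, ..., 35
toy_series = list(range(36))
train, test = toy_series[:-12], toy_series[-12:]
print(len(train), len(test))              # 24 12
print(walk_forward_rmse(toy_series, 12))  # 1.0
```

Each forecast only sees observations that precede the point being predicted, so the history grows by one value per step, which mirrors how the test set is consumed in Task 4.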

ANALYSIS: The ETS configuration with a multiplicative trend, no trend damping, no seasonal component, no Box-Cox transform, and no bias removal achieved the lowest walk-forward RMSE, at 83.72.
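
To make "multiplicative trend with no damping" concrete, here is a minimal sketch of the Holt multiplicative-trend recursions in plain Python. The function name, the fixed smoothing weights `alpha` and `beta`, and the simple initialization are assumptions made for illustration; statsmodels estimates the weights and initial states itself, so this is not the library's implementation.

```python
def holt_multiplicative_forecast(series, alpha=0.5, beta=0.5, steps=1):
    """Forecast `steps` ahead with Holt's multiplicative-trend recursions.

    alpha and beta are fixed, illustrative smoothing weights (assumptions).
    """
    level = series[0]
    trend = series[1] / series[0]   # initial growth ratio
    for y in series[1:]:
        previous_level = level
        # the level blends the new observation with the projected previous level
        level = alpha * y + (1 - alpha) * previous_level * trend
        # the trend is a smoothed ratio of successive levels (not a difference)
        trend = beta * (level / previous_level) + (1 - beta) * trend
    # without damping, the growth ratio compounds over the forecast horizon
    return level * trend ** steps

growing = [100.0, 110.0, 121.0, 133.1]   # toy series growing 10% per step
print(round(holt_multiplicative_forecast(growing), 2))  # 146.41
```

With an additive trend the update would use differences of levels and the forecast would add `steps * trend` instead of multiplying by `trend ** steps`.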

CONCLUSION: For this dataset, the chosen ETS configuration produced the lowest RMSE (83.72) among the configurations searched and should be considered for further modeling.

Dataset Used: Sales of shampoo over a three year period

Dataset ML Model: Time series forecast with numerical attributes

Dataset Reference: Rob Hyndman and Yangzhuoran Yang (2018). tsdl: Time Series Data Library. v0.1.0. https://pkg.yangzhuoranyang./tsdl/.

A time series predictive modeling project can generally be broken down into five major tasks:

1. Prepare Environment
2. Inspect and Explore Data
3. Clean and Pre-Process Data
4. Fit and Evaluate Models
5. Finalize Model

## Task 1. Prepare Environment

### 1.a) Load Libraries

In [1]:
# Create the random seed number for reproducible results
seedNum = 888

In [2]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import os
import math
import smtplib
import pmdarima as pm
import pandas_datareader.data as web
from datetime import datetime
from email.message import EmailMessage
from sklearn.metrics import mean_squared_error
from statsmodels.tsa.holtwinters import ExponentialSmoothing
from statsmodels.graphics.tsaplots import plot_acf
from statsmodels.graphics.tsaplots import plot_pacf
from multiprocessing import cpu_count
from joblib import Parallel
from joblib import delayed
from warnings import catch_warnings
from warnings import filterwarnings



### 1.b) Set up the controlling parameters and functions

In [3]:
# Begin the timer for the script processing
startTimeScript = datetime.now()

# Set up the verbose flag to print detailed messages for debugging (setting True will activate!)
verbose = False

# Set up the flag that controls progress emails (setting to True will send status emails!)
notifyStatus = False

In [4]:
# Set up the email notification function
import sys  # needed for sys.exit() below

def email_notify(msg_text):
    # Read the mail settings from environment variables
    sender = os.environ.get('MAIL_SENDER')
    receiver = os.environ.get('MAIL_RECEIVER')
    gateway = os.environ.get('SMTP_GATEWAY')
    smtpuser = os.environ.get('SMTP_USERNAME')
    password = os.environ.get('SMTP_PASSWORD')
    # Abort if any required setting is missing
    if any(v is None for v in (sender, receiver, gateway, smtpuser, password)):
        sys.exit("Incomplete email setup info. Script Processing Aborted!!!")
    msg = EmailMessage()
    msg.set_content(msg_text)
    msg['Subject'] = 'Notification from Python ETS Time Series Script'
    msg['From'] = sender
    msg['To'] = receiver
    server = smtplib.SMTP(gateway, 587)
    server.starttls()
    server.login(smtpuser, password)
    server.send_message(msg)
    server.quit()

In [5]:
if (notifyStatus): email_notify("Task 1. Prepare Environment has begun! "+datetime.now().strftime('%a %B %d, %Y %I:%M:%S %p'))

### 1.c) Acquire and Load the Data

In [6]:
# load the dataset
time_series = pd.read_csv('https://dainesanalytics.com/datasets/time-series-data-library/tsdl469.csv', index_col='idx', parse_dates=True)
data = time_series.values

In [7]:
# Set the number of observations held out for testing (the last 12 months)
n_test = 12

In [8]:
time_series.info()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 36 entries, 2001-01-01 to 2001-03-12
Data columns (total 1 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   values  36 non-null     float64
dtypes: float64(1)
memory usage: 576.0 bytes


In [9]:
time_series.head()

            values
idx
2001-01-01   266.0
2001-01-02   145.9
2001-01-03   183.1
2001-01-04   119.3
2001-01-05   180.3


In [10]:
time_series.tail()

            values
idx
2001-03-08   407.6
2001-03-09   682.0
2001-03-10   475.3
2001-03-11   581.3
2001-03-12   646.9


In [11]:
# This section will be further developed in future iterations

In [12]:
if (notifyStatus): email_notify("Task 1. Prepare Environment completed! "+datetime.now().strftime('%a %B %d, %Y %I:%M:%S %p'))

## Task 2. Inspect and Explore Data

In [13]:
if (notifyStatus): email_notify("Task 2. Inspect and Explore Data has begun! "+datetime.now().strftime('%a %B %d, %Y %I:%M:%S %p'))

In [14]:
# This section will be further developed in future iterations

In [15]:
if (notifyStatus): email_notify("Task 2. Inspect and Explore Data completed! "+datetime.now().strftime('%a %B %d, %Y %I:%M:%S %p'))

## Task 3. Clean and Pre-Process Data

In [16]:
if (notifyStatus): email_notify("Task 3. Clean and Pre-Process Data has begun! "+datetime.now().strftime('%a %B %d, %Y %I:%M:%S %p'))

In [17]:
# This section will be further developed in future iterations

In [18]:
if (notifyStatus): email_notify("Task 3. Clean and Pre-Process Data completed! "+datetime.now().strftime('%a %B %d, %Y %I:%M:%S %p'))

## Task 4. Fit and Evaluate Models

In [19]:
if (notifyStatus): email_notify("Task 4. Fit and Evaluate Models has begun! "+datetime.now().strftime('%a %B %d, %Y %I:%M:%S %p'))

In [20]:
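# one-step Holt-Winters Exponential Smoothing forecast
def exp_smoothing_forecast(history, config):
	# config = [trend, damped, seasonal, seasonal_periods, use_boxcox, remove_bias]
	t,d,s,p,b,r = config
	# define model
	# NOTE: this matches the statsmodels API used when the notebook was run; later
	# releases (0.12+) rename `damped` to `damped_trend` and expect `use_boxcox`
	# in the constructor, so verify against the installed version
	history = np.array(history)
	model = ExponentialSmoothing(history, trend=t, damped=d, seasonal=s, seasonal_periods=p)
	# fit model
	model_fit = model.fit(optimized=True, use_boxcox=b, remove_bias=r)
	# make one step forecast
	yhat = model_fit.predict(len(history), len(history))
	return yhat[0]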

# root mean squared error or rmse
def measure_rmse(actual, predicted):
	return math.sqrt(mean_squared_error(actual, predicted))

# split a univariate dataset into train/test sets
def train_test_split(data, n_test):
	return data[:-n_test], data[-n_test:]

# walk-forward validation for univariate data
def walk_forward_validation(data, n_test, cfg):
	predictions = list()
	# split dataset
	train, test = train_test_split(data, n_test)
	# seed history with training dataset
	history = [x for x in train]
	# step over each time-step in the test set
	for i in range(len(test)):
		# fit model and make forecast for history
		yhat = exp_smoothing_forecast(history, cfg)
		# store forecast in list of predictions
		predictions.append(yhat)
		# add actual observation to history for the next loop
		history.append(test[i])
	# estimate prediction error
	error = measure_rmse(test, predictions)
	return error

# score a model, return None on failure
def score_model(data, n_test, cfg, debug=False):
	result = None
	# convert config to a key
	key = str(cfg)
	# show all warnings and fail on exception if debugging
	if debug:
		result = walk_forward_validation(data, n_test, cfg)
	else:
		# one failure during model validation suggests an unstable config
		try:
			# never show warnings when grid searching, too noisy
			with catch_warnings():
				filterwarnings("ignore")
				result = walk_forward_validation(data, n_test, cfg)
		except Exception:
			# treat any failure as an unusable config
			result = None
	# check for an interesting result
	if result is not None:
		print(' > Model[%s] %.3f' % (key, result))
	return (key, result)

# grid search configs
def grid_search(data, cfg_list, n_test, parallel=True):
	scores = None
	if parallel:
		# execute configs in parallel
		executor = Parallel(n_jobs=cpu_count(), backend='multiprocessing')
		tasks = (delayed(score_model)(data, n_test, cfg) for cfg in cfg_list)
		scores = executor(tasks)
	else:
		scores = [score_model(data, n_test, cfg) for cfg in cfg_list]
	# remove empty results
	scores = [r for r in scores if r[1] is not None]
	# sort configs by error, asc
	scores.sort(key=lambda tup: tup[1])
	return scores

# create a set of exponential smoothing configs to try
def exp_smoothing_configs(seasonal=[None]):
	models = list()
	# define config lists
	t_params = ['add', 'mul', None]
	d_params = [True, False]
	s_params = ['add', 'mul', None]
	p_params = seasonal
	b_params = [True, False]
	r_params = [True, False]
	# create config instances
	for t in t_params:
		for d in d_params:
			for s in s_params:
				for p in p_params:
					for b in b_params:
						for r in r_params:
							cfg = [t,d,s,p,b,r]
							models.append(cfg)
	return models
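
The nested loops in `exp_smoothing_configs` enumerate every combination of the six parameter lists. A minimal equivalent sketch using `itertools.product` shows how large the grid is with the default `seasonal=[None]`. This is an alternative formulation for illustration, not the notebook's code, and the variable `configs` is a hypothetical name.

```python
from itertools import product

t_params = ['add', 'mul', None]   # trend type
d_params = [True, False]          # damped trend
s_params = ['add', 'mul', None]   # seasonal type
p_params = [None]                 # seasonal period (the notebook's default)
b_params = [True, False]          # Box-Cox transform
r_params = [True, False]          # bias removal

# product() iterates in the same order as the nested for-loops
configs = [list(cfg) for cfg in product(t_params, d_params, s_params,
                                        p_params, b_params, r_params)]
print(len(configs))   # 72
print(configs[0])     # ['add', True, 'add', None, True, True]
```

The grid holds 3 x 2 x 3 x 1 x 2 x 2 = 72 candidates. Configurations that fail during fitting are dropped by `score_model`, which returns `None` as the result for them, so the number of scored models can be smaller.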

In [21]:
# model configs
cfg_list = exp_smoothing_configs()
# grid search
print('Model fitting has begun!')
scores = grid_search(data[:,0], cfg_list, n_test)
print('Model fitting completed!')

Model fitting has begun!
 > Model[['add', False, None, None, False, True]] 106.431
 > Model[['add', False, None, None, True, True]] 110.230
 > Model[['add', False, None, None, False, False]] 104.874
 > Model[['add', False, None, None, True, False]] 108.665
 > Model[['mul', False, None, None, True, True]] 110.576
 > Model[['mul', False, None, None, True, False]] 108.747
 > Model[['mul', False, None, None, False, True]] 86.324
 > Model[['add', True, None, None, False, True]] 97.919
 > Model[['add', True, None, None, True, True]] 97.094
 > Model[['mul', False, None, None, False, False]] 83.725
 > Model[[None, False, None, None, True, True]] 96.683
 > Model[[None, False, None, None, True, False]] 112.384
 > Model[[None, False, None, None, False, True]] 99.415
 > Model[[None, False, None, None, False, False]] 108.031
 > Model[['mul', True, None, None, True, True]] 1561.970
 > Model[['add', True, None, None, False, False]] 103.069
 > Model[['add', True, None, None, True, False]] 106.559
 > M

In [22]:
if (notifyStatus): email_notify("Task 4. Fit and Evaluate Models completed! "+datetime.now().strftime('%a %B %d, %Y %I:%M:%S %p'))

## Task 5. Finalize Model

In [23]:
if (notifyStatus): email_notify("Task 5. Finalize Model has begun! "+datetime.now().strftime('%a %B %d, %Y %I:%M:%S %p'))

In [24]:
# list top 3 configs
print('Listing the top three models:')
for cfg, error in scores[:3]:
	print(cfg, error)

Listing the top three models:
['mul', False, None, None, False, False] 83.7245585999172
['mul', False, None, None, False, True] 86.32425287302618
[None, False, None, None, True, True] 96.68252034213745
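
Because `score_model` stores each configuration as a string key (`key = str(cfg)`), the winning entry has to be converted back to a list before it can be passed to a function such as `exp_smoothing_forecast`. A minimal sketch, assuming `ast.literal_eval` is used for the conversion; the variable `best_key` is a hypothetical name that holds the top key printed above.

```python
import ast

# Top-ranked key copied from the grid search output above
best_key = "['mul', False, None, None, False, False]"

# literal_eval safely parses Python literals (lists, strings, booleans, None)
t, d, s, p, b, r = ast.literal_eval(best_key)
print(t, d, s, p, b, r)  # mul False None None False False
```

`ast.literal_eval` only accepts literal expressions, which makes it a safer choice than `eval` for turning the stored key back into a configuration.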


In [25]:
if (notifyStatus): email_notify("Task 5. Finalize Model completed! "+datetime.now().strftime('%a %B %d, %Y %I:%M:%S %p'))

In [26]:
print('Total time for the script:', (datetime.now() - startTimeScript))

Total time for the script: 0:00:16.045495
