## Additional Statistical Modeling Techniques for Time Series

In [2]:
import pandas as pd 
import numpy as np 
import matplotlib.pyplot as plt 
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
from statsmodels.tsa.statespace.tools import diff 
from pathlib import Path 
import warnings
warnings.filterwarnings('ignore')
plt.rc("figure", figsize=(16,5))


## Forecasting time series data using auto_arima

* For this part we need to install the pmdarima package
    - this package includes auto_arima, which automates ARIMA hyperparameter tuning

* auto_arima automates the search for the best model orders. By default it uses a stepwise algorithm, which is faster than a full grid search because it fits only a subset of the candidate models

* With stepwise=True (the default), auto_arima performs a stepwise search
* With stepwise=False, it performs a brute-force grid search (full search)
* With stepwise=False and random=True, it performs a random search, fitting only a random sample of the candidate models
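
To get a feel for why the search strategy matters, here is a minimal sketch that counts candidate models. The order caps in `MAX_ORDERS` and the helper `stepwise_neighbors` are hypothetical and chosen only for illustration; the neighbor rule is a simplified stand-in for a stepwise search, not pmdarima's actual implementation.

```python
from itertools import product

# Hypothetical upper bounds for the (p, q, P, Q) orders, for illustration only
MAX_ORDERS = (5, 5, 2, 2)

# A brute-force search fits one model per combination of orders
full_grid = list(product(*(range(cap + 1) for cap in MAX_ORDERS)))


def stepwise_neighbors(current):
    """Return the candidates that differ from `current` by +/-1 in exactly one order."""
    candidates = []
    for i, cap in enumerate(MAX_ORDERS):
        for delta in (-1, 1):
            value = current[i] + delta
            if 0 <= value <= cap:  # stay inside the allowed range
                candidate = list(current)
                candidate[i] = value
                candidates.append(tuple(candidate))
    return candidates


print(f'Full grid candidates: {len(full_grid)}')                              # 324
print(f'Neighbors of (2, 2, 1, 1): {len(stepwise_neighbors((2, 2, 1, 1)))}')  # 8
```

A stepwise search only evaluates a handful of neighbors per step and moves to the best one, so it usually fits far fewer models than the full grid.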

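stepwise is an optimization technique that explores the parameter space more efficiently than a full grid search. Unit root tests are used to choose the differencing order, and the remaining orders are chosen by minimizing an information criterion (for example, AIC), which is computed from models fit by Maximum Likelihood Estimation (MLE)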
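
As a reminder of how AIC relates to maximum likelihood, the standard definition is:

$$
\mathrm{AIC} = 2k - 2\ln(\hat{L})
$$

where $k$ is the number of estimated parameters and $\hat{L}$ is the maximized value of the likelihood. A lower AIC is better: the $-2\ln(\hat{L})$ term rewards goodness of fit, while the $2k$ term penalizes extra parameters.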

Additionally, auto_arima can handle both seasonal and non-seasonal ARIMA models. If a seasonal ARIMA is desired, we need to set seasonal=True together with the seasonal period m (for example, m=12 for monthly data) so that auto_arima also optimizes over the seasonal (P,D,Q) values
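
A seasonal search could be wrapped as in the sketch below. The helper name `fit_seasonal_auto_arima` is hypothetical, and `m=12` is an assumption that fits monthly data with a yearly cycle; adjust it to your series. The import is placed inside the function so the helper can be defined even before pmdarima is installed.

```python
def fit_seasonal_auto_arima(y, m=12):
    """Fit a seasonal ARIMA via pmdarima's stepwise search (illustrative helper)."""
    import pmdarima as pm  # requires: pip install pmdarima

    return pm.auto_arima(
        y,
        seasonal=True,                # also search over the seasonal (P, D, Q) orders
        m=m,                          # seasonal period (assumed: 12 for monthly data)
        stepwise=True,                # stepwise search instead of a full grid
        information_criterion='aic',  # criterion to minimize
        trace=True,                   # print each candidate model as it is fit
        suppress_warnings=True,
    )
```

Calling `fit_seasonal_auto_arima(train)` on a training series would return the fitted model, whose `summary()` method reports the selected orders.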


Here we will use the milk_production data. Recall from earlier that this dataset contains both trend and seasonality, so we will train a SARIMA model.

In [3]:
# import packages and data 
import pmdarima as pm 
milk_file = Path('../TimeSeriesAnalysisWithPythonCookbook/Data/milk_production.csv')
milk = pd.read_csv(milk_file,
                   index_col='month',
                   parse_dates=True)

In [4]:
# Split the dataset into train and test sets
# using scikit-learn (shuffle=False keeps the rows in time order)
from sklearn.model_selection import train_test_split
train, test = train_test_split(milk, test_size=0.10, shuffle=False)

In [7]:
# Same split using pmdarima
train, test = pm.model_selection.train_test_split(milk, test_size=0.10)

print(f'Train: {train.shape}')
print(f'Test: {test.shape}')

Train: (151, 1)
Test: (17, 1)
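
The shapes above can be reproduced by hand. The sketch below is a plain-Python illustration, not the library code: the helper `chronological_split` is hypothetical, and it assumes the test share is rounded up to a whole number of rows, which is consistent with the 151/17 split of the 168 observations.

```python
import math


def chronological_split(values, test_size=0.10):
    """Split an ordered sequence into train and test parts without shuffling."""
    n_test = math.ceil(len(values) * test_size)  # assumed rule: round the test share up
    n_train = len(values) - n_test
    return values[:n_train], values[n_train:]


observations = list(range(168))  # placeholder for the 168 monthly observations
train_part, test_part = chronological_split(observations)
print(len(train_part), len(test_part))  # 151 17
```

Because the split is a simple slice, every test observation comes after every training observation, which is what a forecasting evaluation needs.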
