# Australian wine sales
Monthly sales of Australian wine (in thousands of litres) are available from January 1980 to July 1995. The task is to forecast sales for the next two years.

In [None]:
import numpy as np
import pandas as pd
import scipy.stats as st
import matplotlib.pyplot as plt
import seaborn as sns

from statsmodels.tsa.stattools import kpss
from statsmodels.tsa.seasonal import seasonal_decompose
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

# statsmodels.tsa.arima_model.ARIMA was deprecated and later removed; use the current implementation
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.tsa.statespace.sarimax import SARIMAX

from statsmodels.stats.diagnostic import acorr_ljungbox

from statsmodels.tsa.holtwinters import ExponentialSmoothing

plt.rcParams['figure.figsize'] = (12, 8)

In [None]:
data = pd.read_csv('monthly-australian-wine-sales-th.csv', parse_dates=[0], delimiter=',')
data.head()


In [None]:
data.columns = ['date', 'wine']
data.head()

In [None]:
sns.lineplot(x=data['date'], y=data['wine'])

plt.show()

Let's try dividing each value by the number of days in the month:

In [None]:
# Divide each monthly total by the number of days in that month (leap years included)
values = data['wine'] / data['date'].dt.days_in_month
sns.lineplot(x=data['date'], y=values)

plt.show()
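The same per-day adjustment can be sketched in pure Python. This version uses `calendar.monthrange`, which returns the weekday of the first day and the number of days in the month, so February in leap years (1980, 1984, 1988, 1992) gets 29 days. The function name `per_day` is hypothetical and not part of any library.

```python
import calendar


def per_day(values, start_year, start_month):
    """Divide monthly totals by the number of days in each calendar month.

    Assumes `values` are consecutive months starting at (start_year, start_month).
    """
    out = []
    year, month = start_year, start_month
    for v in values:
        _, n_days = calendar.monthrange(year, month)  # handles leap years
        out.append(v / n_days)
        # advance to the next calendar month
        month += 1
        if month > 12:
            month, year = 1, year + 1
    return out


# Usage: January to March 1980 (a leap year, so February has 29 days)
print(per_day([310, 290, 310], 1980, 1))  # [10.0, 10.0, 10.0]
```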

The series did not become noticeably more regular, so we return to the original data.

Seasonal decomposition of the series (`seasonal_decompose` performs a classical moving-average decomposition, not STL):

In [None]:
data = data.set_index(pd.DatetimeIndex(data['date']))

In [None]:
data.head()

In [None]:
stl_data = seasonal_decompose(data['wine'], period=12)
_ = stl_data.plot()
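Below is a pure-Python sketch of what `seasonal_decompose` computes with its default additive model, assuming an even period such as 12. The function name is hypothetical, and statsmodels returns NaN where this sketch returns `None`. The trend is a centred 2×12 moving average. Each month's seasonal index is the average of the detrended values for that month, shifted so that the twelve indices sum to zero. The residual is whatever remains.

```python
def classical_additive_decompose(y, period=12):
    """Classical additive decomposition y = trend + seasonal + resid (even period only)."""
    n = len(y)
    half = period // 2
    trend = [None] * n
    # centred 2 x period moving average: period + 1 points, end points weighted 1/2
    for t in range(half, n - half):
        window = y[t - half : t + half + 1]
        trend[t] = (0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]) / period

    # average the detrended values at each position within the period
    sums = [0.0] * period
    counts = [0] * period
    for t in range(n):
        if trend[t] is not None:
            sums[t % period] += y[t] - trend[t]
            counts[t % period] += 1
    raw = [s / c for s, c in zip(sums, counts)]
    mean_raw = sum(raw) / period
    seasonal_idx = [r - mean_raw for r in raw]  # centre so the indices sum to zero

    seasonal = [seasonal_idx[t % period] for t in range(n)]
    resid = [None if trend[t] is None else y[t] - trend[t] - seasonal[t] for t in range(n)]
    return trend, seasonal, resid


# Usage on a toy series: linear trend plus a fixed pattern that sums to zero
season = [3, -1, 2, 0, -2, 1, -3, 0, 1, -1, 2, -2]
y = [2 * t + season[t % 12] for t in range(48)]
trend, seasonal, resid = classical_additive_decompose(y)
```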

In [None]:
# Box-Cox transform to stabilise the variance; lam is the maximum-likelihood estimate of lambda
transformed, lam = st.boxcox(data['wine'])
data['bc'] = transformed
print(lam)
stl_data = seasonal_decompose(transformed, period=12)
_ = stl_data.plot()
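For reference, the Box-Cox transform used above is y^(λ) = (y^λ − 1)/λ for λ ≠ 0 and log y for λ = 0. `st.boxcox` returns the transformed data together with the maximum-likelihood estimate of λ. Forecasts from a model fitted on the `bc` column are on the transformed scale, so they must be mapped back with the inverse transform. SciPy provides this as `scipy.special.inv_boxcox`. The pure-Python sketch below covers both directions and uses an illustrative λ rather than the estimated one.

```python
import math


def boxcox(y, lam):
    """Box-Cox transform of a positive value y."""
    if y <= 0:
        raise ValueError("Box-Cox requires positive data")
    return math.log(y) if lam == 0 else (y ** lam - 1) / lam


def inv_boxcox(z, lam):
    """Inverse transform: map a value on the Box-Cox scale back to the original units."""
    return math.exp(z) if lam == 0 else (lam * z + 1) ** (1 / lam)


lam = 0.25  # illustrative value only; st.boxcox estimates lambda from the data
z = boxcox(25000.0, lam)
print(inv_boxcox(z, lam))  # back to (approximately) 25000.0
```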

# ARIMA
Stationarity check with the KPSS test (null hypothesis: the series is stationary):

In [None]:
kpss(data['wine'], nlags='legacy')

Apply seasonal differencing (lag 12):

In [None]:
diff_ts = data['wine'].diff(12)[12:]
plt.plot(diff_ts)
kpss(diff_ts, nlags='legacy')

The series is still non-stationary. Apply one more (first-order) difference:

In [None]:
diff_diff_ts = diff_ts.diff(1)[1:]
plt.plot(diff_diff_ts)
kpss(diff_diff_ts, nlags='legacy')

For the resulting series, the stationarity hypothesis is not rejected.
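The two differences serve different purposes. `diff(12)` subtracts the value from the same month a year earlier. This cancels a fixed seasonal pattern and turns a linear trend into a constant. The extra `diff(1)` then removes that constant, or in real data a slowly changing level. The toy illustration below is noise-free and uses a made-up series. `SARIMAX` applies the same (1 - B)(1 - B^12) differencing internally when `order=(p, 1, q)` and `seasonal_order=(P, 1, Q, 12)`.

```python
def diff(x, lag=1):
    """Lag-`lag` difference x[t] - x[t - lag]; the first `lag` points are dropped."""
    return [x[t] - x[t - lag] for t in range(lag, len(x))]


season = [5, -3, 8, 0, -6, 2, 4, -1, -7, 3, 1, -6]       # a fixed 12-month pattern
y = [100 + 2 * t + season[t % 12] for t in range(60)]   # linear trend + seasonality

d12 = diff(y, 12)     # seasonal difference: the pattern cancels, leaving 2 * 12 = 24
d12_1 = diff(d12, 1)  # a further first difference removes the constant
print(set(d12), set(d12_1))  # {24} {0}
```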

Now look at the ACF and PACF of the differenced series:

In [None]:
plot_acf(diff_diff_ts, lags=15)

plt.show()

In [None]:
plot_pacf(diff_diff_ts, lags=15)

plt.show()
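A lag counts as significant when its sample autocorrelation lies outside the shaded band. The pure-Python sketch below computes the sample ACF and the simple ±1.96/√n white-noise band; the function names are hypothetical. By default, `plot_acf` draws Bartlett-formula bands, which widen with the lag, so its shading can differ from this constant band. `plot_pacf` uses the constant one.

```python
import math


def sample_acf(x, nlags):
    """Sample autocorrelations r_1, ..., r_nlags of the sequence x."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    denom = sum(d * d for d in dev)  # n times the lag-0 autocovariance
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
            for k in range(1, nlags + 1)]


def white_noise_band(n, z=1.96):
    """Approximate 95% limits +/- z / sqrt(n) for r_k under a white-noise null."""
    return z / math.sqrt(n)


# Usage: an alternating series has strong autocorrelation at lags 1 and 2
x = [1.0, -1.0] * 50
r = sample_acf(x, 2)
band = white_noise_band(len(x))
print([abs(rk) > band for rk in r])  # [True, True]
```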

On the ACF, lags 1 and 12 are significant; on the PACF, lags 1 to 3. We search for the AICc-optimal model in the neighbourhood of ARIMA(3,1,1):

In [None]:
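# Search p in {2, 3, 4} and q in {0, 1, 2} around ARIMA(3,1,1); report AICc for each fit
for i in range(-1, 2):
    for k in range(-1, 2):
        order = (3 + i, 1, 1 + k)
        try:
            arima = ARIMA(data['wine'].values, order=order).fit()
            print(order, 'AICc =', round(arima.aicc, 2))
        except Exception as e:  # skip orders that fail to estimate, but report why
            print(order, 'failed:', e)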
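The `summary()` tables report AIC, BIC and HQIC but not AICc. statsmodels results also expose AICc as the `.aicc` attribute. The small-sample correction is AICc = AIC + 2k(k + 1)/(n − k − 1), with k estimated parameters and n observations, so it penalises extra parameters slightly more than AIC does. A one-function sketch:

```python
def aicc(aic, k, n):
    """AIC with the small-sample correction AIC + 2k(k+1) / (n - k - 1).

    k: number of estimated parameters, n: number of observations used.
    """
    if n - k - 1 <= 0:
        raise ValueError("AICc needs n > k + 1")
    return aic + 2 * k * (k + 1) / (n - k - 1)


# Usage: 4 parameters, 187 monthly observations
print(aicc(100.0, 4, 187))  # about 100.22
```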

In [None]:
arima = ARIMA(data['wine'].values, order=(2,1,1)).fit()
print(arima.summary())

In [None]:
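# Skip the first residual (d = 1), which is inflated by the initialisation of the integrated part
plt.plot(arima.resid[1:])
plot_pacf(arima.resid[1:], lags=15)
plot_acf(arima.resid[1:], lags=15)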

plt.show()

In [None]:
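# Same neighbourhood, now with a seasonal (1,1,1) component of period 12; report AICc
for i in range(-1, 2):
    for j in range(-1, 2):
        order = (3 + i, 1, 1 + j)
        sorder = (1, 1, 1, 12)
        try:
            arima = SARIMAX(data['wine'].values, order=order, seasonal_order=sorder).fit(disp=False)
            print(order, sorder, 'AICc =', round(arima.aicc, 2))
        except Exception as e:  # skip orders that fail to estimate, but report why
            print(order, sorder, 'failed:', e)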

In [None]:
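arima = SARIMAX(data['wine'].values, order=(2,1,1), seasonal_order=(1, 1, 1, 12)).fit(disp=False)

# Skip the first d + D*s = 1 + 12 = 13 residuals, which are inflated by the initialisation
plt.plot(arima.resid[13:])
plot_pacf(arima.resid[13:], lags=15)
plot_acf(arima.resid[13:], lags=15)

plt.show()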

In [None]:
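# Ljung-Box p-values for lags 1..16, excluding the first d + D*s = 13 residuals
lb = acorr_ljungbox(arima.resid[13:], lags=16, return_df=True)
plt.scatter(lb.index, lb['lb_pvalue'])
plt.axhline(0.05, color='r', ls='--')  # 5% significance level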

plt.show()
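Each point above is the p-value of the Ljung-Box test up to lag h. The statistic is Q = n(n+2) Σ_{k=1..h} r_k²/(n − k), where r_k are the residual autocorrelations. Under the null of no autocorrelation, Q is approximately χ² with h degrees of freedom. When the test is applied to model residuals, the degrees of freedom are reduced by the number of fitted ARMA parameters, which `acorr_ljungbox` exposes as `model_df`. The pure-Python sketch below computes the statistic only (the function name is hypothetical). The p-value would then be `scipy.stats.chi2.sf(q, df)`.

```python
def ljung_box_q(x, h):
    """Ljung-Box statistic Q = n(n+2) * sum_{k=1..h} r_k^2 / (n - k)."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    denom = sum(d * d for d in dev)
    q = 0.0
    for k in range(1, h + 1):
        r_k = sum(dev[t] * dev[t - k] for t in range(k, n)) / denom  # lag-k autocorrelation
        q += r_k * r_k / (n - k)
    return n * (n + 2) * q


# Usage: a strongly autocorrelated toy sequence gives a large Q
print(ljung_box_q([1.0, -1.0] * 5, 1))  # about 10.8
```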

Q-Q plot, Shapiro-Wilk test and histogram of the residuals:

In [None]:
resid = arima.resid[13:]  # exclude the first d + D*s = 13 residuals
_ = st.probplot(resid, plot=plt)
print(st.shapiro(resid))
plt.show()
plt.hist(resid)

plt.show()

In [None]:
# Hold out the last 12 months; predict in-sample and then 12 steps ahead
arima = SARIMAX(data['wine'].values[:-12], order=(3,1,1), seasonal_order=(1,1,1,12)).fit(disp=False)
predicted = arima.predict(0, data.shape[0] - 1)

In [None]:
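n_train = len(data['wine']) - 12
plt.plot(data['wine'].values[:-12], alpha=0.5, ls='--')
# plot the held-out months at their own positions, not at 0..11
plt.plot(range(n_train, len(data['wine'])), data['wine'].values[-12:], alpha=0.5, ls='--')
plt.plot(predicted)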

plt.show()

In [None]:
arima = SARIMAX(data['wine'].values[:-12], order=(3,1,1), seasonal_order=(1,1,1,12)).fit()
predicted = arima.predict(data.shape[0]-12, data.shape[0]-1, dynamic=True)

In [None]:
plt.plot(data['wine'].values[:-12], alpha=0.5, ls='--')
plt.plot(range(len(data['wine'].values[:-12]), len(data['wine'])), data['wine'].values[-12:], alpha=0.5, ls='--')
plt.plot(range(len(data['wine'].values[:-12]), len(data['wine'])), predicted)

plt.show()
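The holdout cells compare forecasts only by eye. The sketch below defines two common accuracy measures, assuming equal-length sequences of actual and predicted values; the function names are hypothetical. After the dynamic-forecast cell, `predicted` holds exactly the 12 out-of-sample values, so a call such as `mae(data['wine'].values[-12:], predicted)` would score it.

```python
def mae(actual, predicted):
    """Mean absolute error."""
    if len(actual) != len(predicted):
        raise ValueError("sequences must have equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be nonzero)."""
    if len(actual) != len(predicted):
        raise ValueError("sequences must have equal length")
    return 100 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


# Usage on toy numbers
print(mae([100, 200], [110, 190]), mape([100, 200], [110, 190]))
```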

# ETS forecast

In [None]:
# Simple exponential smoothing, fitted on all but the last 12 months
ets = ExponentialSmoothing(data['wine'].values[:-12]).fit(optimized=True)

predicted = ets.predict(0, data.shape[0] - 1)

plt.plot(data['wine'].values[:-12], alpha=0.5, ls='--')
plt.plot(range(len(data['wine'].values[:-12]), len(data['wine'])), data['wine'].values[-12:], alpha=0.5, ls='--')
plt.plot(predicted)

plt.show()

In [None]:
# Holt's linear trend, fitted on all but the last 12 months
ets = ExponentialSmoothing(data['wine'].values[:-12], trend='add').fit()

predicted = ets.predict(0, data.shape[0] - 1)

plt.plot(data['wine'].values[:-12], alpha=0.5, ls='--')
plt.plot(range(len(data['wine'].values[:-12]), len(data['wine'])), data['wine'].values[-12:], alpha=0.5, ls='--')
plt.plot(predicted)

plt.show()

In [None]:
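# Holt-Winters: additive trend, multiplicative seasonality, fitted on all but the last 12 months
ets = ExponentialSmoothing(data['wine'].values[:-12], seasonal='mul', trend='add', seasonal_periods=12).fit()

predicted = ets.predict(0, data.shape[0] - 1)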

plt.plot(data['wine'].values[:-12], alpha=0.5, ls='--')
plt.plot(range(len(data['wine'].values[:-12]), len(data['wine'])), data['wine'].values[-12:], alpha=0.5, ls='--')
plt.plot(predicted)

plt.show()
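`ExponentialSmoothing(..., trend='add', seasonal='mul', seasonal_periods=12)` is the Holt-Winters method with additive trend and multiplicative seasonality. The sketch below implements one common formulation of its recursions, with fixed smoothing parameters α, β, γ and given initial states. statsmodels instead estimates these parameters and states by optimisation. The function and argument names are hypothetical.

```python
def holt_winters_mul(y, m, alpha, beta, gamma, level0, trend0, season0, h):
    """Additive-trend, multiplicative-season Holt-Winters filter plus h-step forecasts.

    level0, trend0: states before the first observation.
    season0: m initial seasonal factors; season0[i] applies to times t with t % m == i.
    """
    level, trend = level0, trend0
    season = list(season0)
    for t, obs in enumerate(y):
        i = t % m
        prev_level, prev_trend = level, trend
        level = alpha * obs / season[i] + (1 - alpha) * (prev_level + prev_trend)
        trend = beta * (level - prev_level) + (1 - beta) * prev_trend
        season[i] = gamma * obs / (prev_level + prev_trend) + (1 - gamma) * season[i]
    n = len(y)
    # k steps ahead: extrapolated level times the latest factor for that position
    return [(level + k * trend) * season[(n - 1 + k) % m] for k in range(1, h + 1)]


# Usage on a noise-free toy series: linear trend times a fixed 12-month pattern
a, b = 100.0, 2.0
pattern = [1.2, 0.8, 1.0, 1.1, 0.9, 1.0, 1.3, 0.7, 1.0, 1.05, 0.95, 1.0]
y = [(a + b * t) * pattern[t % 12] for t in range(36)]
forecast = holt_winters_mul(y, 12, 0.3, 0.1, 0.2, a - b, b, pattern, h=24)
```

With exact initial states and no noise, the forecasts reproduce the continuation of the series for any α, β, γ. On real data the parameters and initial states matter, which is why `fit()` estimates them.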

In [None]:
_ = st.probplot(ets.resid, plot=plt)
print(st.shapiro(ets.resid))
plt.show()

plt.hist(ets.resid)
plt.show()

In [None]:
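# Same model with a Box-Cox transform; use_boxcox belongs to the constructor in recent statsmodels
ets = ExponentialSmoothing(data['wine'].values[:-12], seasonal='mul', trend='add', seasonal_periods=12,
                           use_boxcox=True).fit()

predicted = ets.predict(0, data.shape[0] - 1)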

plt.plot(data['wine'].values[:-12], alpha=0.5, ls='--')
plt.plot(range(len(data['wine'].values[:-12]), len(data['wine'])), data['wine'].values[-12:], alpha=0.5, ls='--')
plt.plot(predicted)

plt.show()

In [None]:
_ = st.probplot(ets.resid, plot=plt)
print(st.shapiro(ets.resid))
plt.show()

plt.hist(ets.resid)
plt.show()

In [None]:
# Holt-Winters on the Box-Cox-transformed series, fitted on all but the last 12 months
ets = ExponentialSmoothing(data['bc'].values[:-12], seasonal='mul', trend='add', seasonal_periods=12).fit()

predicted = ets.predict(0, data.shape[0] - 1)

plt.plot(data['bc'].values[:-12], alpha=0.5, ls='--')
plt.plot(range(len(data['bc'].values[:-12]), len(data['bc'])), data['bc'].values[-12:], alpha=0.5, ls='--')
plt.plot(predicted)

plt.show()

In [None]:
_ = st.probplot(ets.resid, plot=plt)
print(st.shapiro(ets.resid))
plt.show()

plt.hist(ets.resid)
plt.show()