# <span style='color:Blue'> TIME SERIES </span>

This `Notebook` is based on the analysis of a dataset called “Atmospheric CO2 from Continuous Air Samples at Mauna Loa Observatory, Hawaii, U.S.A.,” which collected CO2 samples from March 1958 to December 2001.

### Import Libraries

In [None]:
import pandas as pd
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import TimeSeriesSplit
from sklearn.metrics import mean_squared_error, mean_absolute_error
from math import sqrt

import statsmodels.api as sm

%matplotlib inline
sns.set_style('whitegrid')
# Render inline figures as SVG (IPython.display.set_matplotlib_formats is deprecated)
%config InlineBackend.figure_format = 'svg'

### Load Data

In [None]:
data = sm.datasets.co2.load_pandas()
X = data.data

date_min = X.index.min()
date_max = X.index.max()
print('Minimum date from data set: {}'.format(date_min))
print('Maximum date from data set: {}'.format(date_max))

The samples are recorded weekly, which is finer than we need, so we resample the dataset and use monthly averages.

In [None]:
# The 'MS' string groups the data in buckets by start of the month
X = X['co2'].resample('MS').mean()

# bfill (backward fill) replaces each missing value with the next valid observation
X = X.bfill()

X = pd.DataFrame(X)
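
To make the resampling and fill steps concrete, here is a minimal sketch on a small made-up weekly series (the dates and values are assumptions for illustration, not the Mauna Loa data). It shows that `resample('MS').mean()` yields one row per month and that `bfill()` fills an empty month with the next available monthly mean.

```python
import numpy as np
import pandas as pd

# Hypothetical weekly series (one reading per Saturday) with a whole month missing.
idx = pd.date_range("2000-01-01", "2000-04-30", freq="W-SAT")
weekly = pd.Series(np.arange(len(idx), dtype=float), index=idx)
weekly[weekly.index.month == 2] = np.nan  # pretend February has no readings

# 'MS' buckets the data by month start; the mean of an all-NaN bucket is NaN.
monthly = weekly.resample("MS").mean()

# Backward fill: the empty February bucket takes the March mean.
filled = monthly.bfill()

print(monthly)
print(filled)
```

Backward filling borrows information from the future, which is harmless for exploration but worth keeping in mind when the filled values end up inside a forecasting test set.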

Visualization plays an important role in time series analysis and forecasting. Plots of the raw sample data can provide valuable diagnostics to identify temporal structures like trends, cycles, and seasonality that can influence the choice of model.

In [None]:
sns.lineplot(x=X.index, y="co2", data=X)
plt.xticks(rotation=15)
plt.title("Monthly CO2 Concentration")
plt.xlabel("Time")
plt.show()

If we want to predict the CO2 concentration for the next few months, we look at past values and try to extract the pattern. Here, the same pattern repeats within each year, indicating a seasonal effect, on top of a steady upward trend. Such observations help us predict future values.

In [None]:
from pandas.plotting import lag_plot

lag_plot(X['co2'])

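In the lag plot above, the points fall tightly along a positive diagonal, which means each observation is strongly positively correlated with the one before it. A lag-1 plot by itself does not reveal seasonality; that shows up as correlation at the seasonal lag (12 months here).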
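
The visual impression from a lag plot can be put into numbers with `Series.autocorr`. The sketch below uses a synthetic monthly series (a linear trend plus a 12-month sine cycle, an assumption chosen for illustration) rather than the CO2 data, and checks the lag-1 correlation as well as the correlation at seasonal lags after differencing out the trend.

```python
import numpy as np
import pandas as pd

# Synthetic monthly series (illustrative assumption): linear trend + 12-month cycle.
t = np.arange(240)
idx = pd.date_range("1980-01-01", periods=240, freq="MS")
series = pd.Series(0.1 * t + 3 * np.sin(2 * np.pi * t / 12), index=idx)

# Correlation between each value and the previous one (what a lag-1 plot shows).
lag1 = series.autocorr(lag=1)

# First differencing removes the linear trend, leaving the seasonal cycle.
detrended = series.diff().dropna()
lag6 = detrended.autocorr(lag=6)    # half a cycle apart
lag12 = detrended.autocorr(lag=12)  # a full cycle apart

print("lag 1 (raw):", round(lag1, 3))
print("lag 6 (differenced):", round(lag6, 3))
print("lag 12 (differenced):", round(lag12, 3))
```

On this toy series the lag-1 correlation is high mostly because of the trend, while the seasonal cycle appears as a strong positive correlation at lag 12 and a strong negative one at lag 6.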

In [None]:
# Let's plot ACF & PACF graphs to visualize AR & MA components
import statsmodels.tsa.api as smt


fig, axes = plt.subplots(1, 2)
fig.set_figwidth(7.5)
fig.set_figheight(3)
# alpha=0.05 draws 95% confidence bands (alpha=0.5 would draw only 50% bands)
smt.graphics.plot_acf(X['co2'], lags=30, ax=axes[0], alpha=0.05)
smt.graphics.plot_pacf(X['co2'], lags=30, ax=axes[1], alpha=0.05)
plt.tight_layout()

#### Moving Average Smoothing


Smoothing is a technique applied to time series to remove the fine-grained variation between time steps. The aim is to remove noise and better expose the signal of the underlying causal processes. Moving averages are a simple and common type of smoothing used in time series analysis and forecasting. Calculating a moving average creates a new series in which each value is the average of a fixed-size window of raw observations from the original series.
A moving average can also help us identify trends. Because we are taking an average, it tends to smooth out noise and, when the window spans a full seasonal cycle, seasonality as well.

In [None]:
# Plot the 12-month centered rolling mean alongside the raw series.
# Select the 'co2' column explicitly so the cell can be re-run safely.
X["Moving_Average"] = X["co2"].rolling(window=12, center=True).mean()
# Optional: rolling standard deviation
# rolstd = X["co2"].rolling(window=12).std()

sns.lineplot(x=X.index, y="co2", data=X)
sns.lineplot(x=X.index, y="Moving_Average", data=X)
plt.xticks(rotation=15)
plt.show()
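
As a small worked example of what `rolling(...).mean()` computes, the sketch below applies a 3-point window to five made-up values (chosen only for illustration) and compares a trailing window with a centered one.

```python
import pandas as pd

values = pd.Series([1.0, 2.0, 6.0, 4.0, 2.0])

# Trailing window: each result averages the current value and the two before it.
trailing = values.rolling(window=3).mean()

# Centered window: each result averages the previous, current, and next value.
centered = values.rolling(window=3, center=True).mean()

print(pd.DataFrame({"value": values, "trailing": trailing, "centered": centered}))
```

Both versions contain the same three averages, (1+2+6)/3, (2+6+4)/3 and (6+4+2)/3, but the centered one aligns each average with the middle of its window and leaves `NaN` at both ends, which is why the smoothed curve in the plot above is shorter than the raw series.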

#### Seasonal Patterns in Time Series

One way to study the seasonal component of a time series is to remove the trend first, so that the seasonality is easier to see. To remove the trend, subtract the rolling mean computed above from the original signal. The result depends on the window size, that is, on how many data points you averaged over.

In [None]:
X["Trend_Corrected"] = X["co2"] - X["Moving_Average"]

sns.lineplot(x=X.index, y="Trend_Corrected", data=X)
plt.xticks(rotation=15)
plt.show()
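
The dependence on the window size can be checked on a synthetic series where the true seasonal swing is known. The sketch below is an illustration under stated assumptions (a linear trend plus a sine cycle with a peak-to-trough range of 6, not the CO2 data): it detrends with two different windows and measures how much of the seasonal swing survives.

```python
import numpy as np
import pandas as pd

# Synthetic monthly series (illustrative assumption): trend + 12-month cycle.
t = np.arange(120)
idx = pd.date_range("1990-01-01", periods=120, freq="MS")
series = pd.Series(0.1 * t + 3 * np.sin(2 * np.pi * t / 12), index=idx)

seasonal_range = {}
for window in (12, 5):
    trend = series.rolling(window=window, center=True).mean()
    detrended = (series - trend).dropna()
    # Average the detrended values by calendar month to get a seasonal profile.
    profile = detrended.groupby(detrended.index.month).mean()
    seasonal_range[window] = profile.max() - profile.min()

print(seasonal_range)
```

A window that spans exactly one seasonal cycle averages the cycle out of the trend, so nearly the whole swing remains after subtraction. A shorter window follows the cycle, absorbs part of it into the "trend", and leaves a much smaller seasonal signal behind.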

### Decomposing: Eliminating trend and seasonality

Time series decomposition involves thinking of a series as a combination of trend, seasonality, and noise components. Decomposition provides a useful abstract model for thinking about time series generally and for better understanding problems during time series analysis and forecasting.
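
In symbols, using the conventional notation $y_t$ for the observed value, $T_t$ for the trend, $S_t$ for the seasonal component and $R_t$ for the residual (these symbols are an assumption of this note, not names used in the code), the two standard decomposition models are:

$$y_t = T_t + S_t + R_t \quad \text{(additive)}$$

$$y_t = T_t \times S_t \times R_t \quad \text{(multiplicative)}$$

The additive form suits series whose seasonal swing stays roughly constant in size as the level rises; the multiplicative form suits series whose swing grows with the level.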

In [None]:
decomposition_add = sm.tsa.seasonal_decompose(X["co2"], period=12, model="additive")
# comment: seasonal_decompose expects index to be datetime format
fig = decomposition_add.plot()
plt.show()

In [None]:
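def hist(series):
    # Histogram with a KDE overlay; sns.distplot is deprecated, so use histplot.
    # The residuals have NaN at both ends (no trend estimate there), so drop them.
    fig, ax = plt.subplots()
    sns.histplot(series.dropna(), kde=True, stat='density', ax=ax,
                 color='blue', alpha=0.8, edgecolor='black')
    sns.despine()
    return fig, ax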

hist(decomposition_add.resid)
plt.show()

## Naive Forecast

In [None]:
# Expanding-window splits: each fold trains on the past and tests on the next block
tscv = TimeSeriesSplit(n_splits=3)
X_CO2 = X['co2']

for train_index, test_index in tscv.split(X_CO2):
    train = X_CO2.iloc[train_index]
    test = X_CO2.iloc[test_index]

    print('Observations: %d' % (len(train) + len(test)))
    print('Training Observations: %d' % (len(train)))
    print('Testing Observations: %d' % (len(test)))

    # Naive forecast: repeat the last training value over the whole test window
    y_hat = pd.DataFrame({'co2': test.values, 'naive': train.iloc[-1]}, index=test.index)

    plt.figure()
    sns.lineplot(x=train.index, y=train, label='Train')
    sns.lineplot(x=test.index, y=test, label='Test')
    sns.lineplot(x=y_hat.index, y=y_hat['naive'], label='Naive Forecast')
    plt.legend(loc='best')
    plt.title("Naive Forecast")
    plt.show()

    rms = sqrt(mean_squared_error(test, y_hat['naive']))
    mae = mean_absolute_error(test, y_hat['naive'])
    print('RMSE = ' + str(rms))
    print('MAE = ' + str(mae))
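
To see what the two error metrics measure, here is a minimal hand-computed sketch of a naive forecast on made-up numbers (the arrays are assumptions for illustration only), using plain NumPy instead of the scikit-learn helpers.

```python
import numpy as np

train = np.array([10.0, 12.0, 11.0, 13.0])
test = np.array([14.0, 12.0, 16.0])

# Naive forecast: repeat the last training value for every test step.
naive = np.full_like(test, train[-1])

errors = test - naive                  # per-step forecast errors
rmse = np.sqrt(np.mean(errors ** 2))   # penalizes large errors more heavily
mae = np.mean(np.abs(errors))          # average absolute miss

print("RMSE =", rmse)
print("MAE =", mae)
```

Because the naive forecast is a flat line, its errors grow with the horizon on a trending series, so both metrics mainly reflect how far the series drifts during the test window.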