<a href="https://colab.research.google.com/github/cagBRT/timeSeries/blob/main/4_TimeSeriesAnalysis.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

By the end of this notebook the student will be able to:<br>
1. Explain the components of a time series
2. Decompose a time series into its components
3. Use Python, pandas, and Matplotlib to analyze a time series

# **Time Series Analysis**

This notebook covers some of the more common techniques for analyzing a time series.

Why even analyze a time series?<br>

Because it is the preparatory step before you develop a forecast of the series.
<br><br>
Time series forecasting also has enormous commercial significance, because quantities that matter to a business, such as demand, sales, website visitors, and stock prices, are essentially time series data.
<br><br>
So what does analyzing a time series involve?
<br><br>
Time series analysis involves understanding various aspects about the inherent nature of the series so that you are better informed to create meaningful and accurate forecasts.

https://www.machinelearningplus.com/time-series/time-series-analysis-python/


In [None]:
!git clone -l -s https://github.com/cagBRT/timeSeries.git cloned-repo
%cd cloned-repo


**Import the necessary libraries**

In [None]:
from dateutil.parser import parse
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import pandas as pd
from statsmodels.tsa.seasonal import seasonal_decompose

# Default figure size and resolution for all plots in this notebook
plt.rcParams.update({'figure.figsize': (10, 7), 'figure.dpi': 120})

**Define functions for plotting**

In [None]:
def plot_df(df, x, y, title="", xlabel='Date', ylabel='Value', dpi=100):
    plt.figure(figsize=(16,5), dpi=dpi)
    plt.plot(x, y, color='tab:red')
    plt.gca().set(title=title, xlabel=xlabel, ylabel=ylabel)
    plt.show()

**Import the time series data**<br>
Import the time series with the date as the index

**Note:** in the displayed output, the `value` column header sits on a higher row than `date`, because `date` is the index rather than a regular column.

In [None]:
# Import as Dataframe
#Monthly Anti-Diabetic drug sales from Australia
df_tse = pd.read_csv('timeSeriesExample.csv', parse_dates=['date'], index_col='date')
df_tse.head()
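
Once the dates form a `DatetimeIndex`, rows can be selected with date strings. The sketch below illustrates this with a tiny in-memory CSV; the name `df_demo` and the numbers are made up for illustration and are not taken from `timeSeriesExample.csv`.

```python
import io
import pandas as pd

# Hypothetical miniature CSV; the values are invented for this sketch.
csv_text = """date,value
1991-11-01,3.1
1991-12-01,3.4
1992-01-01,3.9
1992-02-01,2.8
"""

# Same read_csv arguments as above: parse the dates and use them as the index.
df_demo = pd.read_csv(io.StringIO(csv_text), parse_dates=['date'], index_col='date')

# With a DatetimeIndex, a year string selects every row from that year.
rows_1992 = df_demo.loc['1992']
print(type(df_demo.index).__name__)
print(rows_1992)
```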

# **Visualizing time series data**

Plot the time on the x-axis and the sales on the y-axis for the df_tse dataframe

**Reflecting over the x-axis**

Load data from a file called AirPassengers.csv


When all the values are positive, you can also reflect the graph over the x-axis to emphasize growth.

In [None]:
df_ap = pd.read_csv('AirPassengers.csv', parse_dates=['Month'])
x = df_ap['Month'].values
y1 = df_ap['#Passengers'].values

In [None]:
plot_df(df_ap, x,y1,title='Air Passengers per Month')  
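
One possible way to draw the reflected view is Matplotlib's `fill_between`, filling the area between the series and its negative. The sketch below uses a synthetic positive series (an assumption, so that it runs without `AirPassengers.csv`); with the real data you would pass `x` and `y1` from the cell above instead.

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic, strictly positive monthly series (made up for illustration).
months = np.arange(48)
passengers = 100 + 3 * months + 20 * np.sin(2 * np.pi * months / 12)

fig, ax = plt.subplots(figsize=(16, 5))
# Fill between y and -y so the series is mirrored across the x-axis.
ax.fill_between(months, y1=passengers, y2=-passengers, alpha=0.5, color='seagreen')
ax.axhline(0, color='black', linewidth=0.5)  # mark the axis of reflection
ax.set(title='Two-sided view of a positive series',
       xlabel='Month number', ylabel='Value')
plt.show()
```

The mirrored shape widens as the values grow, which makes the growth easier to see than in a single line.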

# **Seasonal Plot of a Time Series**

Plot the data for Monthly Anti-Diabetic drug sales in Australia

In [None]:
plot_df(df_tse, x=df_tse.index, y=df_tse.value, 
        title='Monthly anti-diabetic drug sales in Australia from 1992 to 2008.')    

**Prepare the data for seasonal plotting**<br>
There is a steep fall in drug sales every February, rising again in March, falling again in April and so on. <br>
Clearly, the pattern repeats within a given year, every year.

As the years progress, the drug sales increase overall.

Reset the index so that the date becomes a regular column again, then extract the year and month from it

In [None]:
df_tse.reset_index(inplace=True)
df_tse['year'] = [d.year for d in df_tse.date]
df_tse['month'] = [d.strftime('%b') for d in df_tse.date]
years = df_tse['year'].unique()

Set the colors for the plot

In [None]:
# Prep Colors
np.random.seed(100)
mycolors = np.random.choice(list(mpl.colors.XKCD_COLORS.keys()), 
                            len(years), replace=False)

In [None]:
# Draw Plot
plt.figure(figsize=(16,8), dpi= 80)
for i, y in enumerate(years):
    if i > 0:        
        plt.plot('month', 'value', data=df_tse.loc[df_tse.year==y, :], 
                 color=mycolors[i], label=y)
        plt.text(df_tse.loc[df_tse.year==y, :].shape[0]-.9, 
                 df_tse.loc[df_tse.year==y, 'value'][-1:].values[0], y, 
                 fontsize=12, color=mycolors[i])

plt.gca().set(xlim=(-0.3, 11), ylim=(2, 30), ylabel='$Drug Sales$', 
              xlabel='$Month$')
plt.yticks(fontsize=12, alpha=.7)
plt.title("Seasonal Plot of Drug Sales Time Series", fontsize=20)
plt.show()
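
The same seasonal view can be inspected as a table by pivoting the data so that each row is a month and each column is a year. This is a minimal sketch on synthetic data; `df_demo` and `seasonal_table` are hypothetical names, and the `.dt` accessor is used here as a vectorized alternative to the list comprehensions above.

```python
import numpy as np
import pandas as pd

# Synthetic monthly data (made up): upward trend plus a repeating yearly pattern.
dates = pd.date_range('2000-01-01', periods=36, freq='MS')
t = np.arange(36)
values = 10 + 0.2 * t + 2 * np.sin(2 * np.pi * t / 12)
df_demo = pd.DataFrame({'date': dates, 'value': values})

# Extract year and month number from the date column.
df_demo['year'] = df_demo['date'].dt.year
df_demo['month'] = df_demo['date'].dt.month

# One row per month, one column per year.
seasonal_table = df_demo.pivot(index='month', columns='year', values='value')
print(seasonal_table.round(2))
```

Reading across a row shows how one month changes from year to year (the trend); reading down a column shows the pattern within one year (the seasonality).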

**Boxplot of Month-wise (Seasonal) and Year-wise (trend) Distribution**<br>
The boxplots make the year-wise and month-wise distributions evident. Also, in the month-wise boxplot, December and January clearly have higher drug sales, which can be attributed to the holiday discount season.

Plot the data<br>
Diamonds show outlier points

In [None]:
# Draw Plot
fig, axes = plt.subplots(1, 2, figsize=(20,15), dpi= 80)
sns.boxplot(x='year', y='value', data=df_tse, ax=axes[0])
# Draw the month-wise plot on the second axis explicitly; 1991 and 2008 are excluded
sns.boxplot(x='month', y='value', data=df_tse.loc[~df_tse.year.isin([1991, 2008]), :], ax=axes[1])

# Set Title
axes[0].set_title('Year-wise Box Plot\n(The Trend)', fontsize=18); 
axes[1].set_title('Month-wise Box Plot\n(The Seasonality)', fontsize=18)
plt.show()

# **Patterns in Time Series**

Time series may be split into the following components: <br>
- Base level
- Trend: an increasing or decreasing slope in the series
- Seasonality: a distinct pattern that repeats at regular intervals due to seasonal factors
- Error

A time series does not have to have a trend or seasonality.
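
To make the components concrete, the sketch below builds three synthetic series from a base level, a trend, a seasonal term, and random error. All of the numbers are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 120                                    # ten years of monthly points
t = np.arange(n)

base = 50.0                                # base level
trend = 0.3 * t                            # steady upward slope
seasonal = 5 * np.sin(2 * np.pi * t / 12)  # pattern that repeats every 12 steps
error = rng.normal(0, 1, n)                # random noise

# A series may contain any combination of the components.
trend_only = base + trend + error
seasonal_only = base + seasonal + error
trend_and_seasonal = base + trend + seasonal + error
```

Plotting the three arrays side by side gives the same kind of picture as the three example data sets loaded in the next cell.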





In [None]:
fig, axes = plt.subplots(1,3, figsize=(20,4), dpi=100)
# .plot() returns a Matplotlib Axes, not a DataFrame, so the results are not assigned
pd.read_csv('guinearice.csv', parse_dates=['date'], index_col='date').plot(title='Trend Only', legend=False, ax=axes[0])
pd.read_csv('sunspotarea.csv', parse_dates=['date'], index_col='date').plot(title='Seasonality Only', legend=False, ax=axes[1])
pd.read_csv('AirPassengers.csv', parse_dates=['Month'], index_col='Month').plot(title='Trend and Seasonality', legend=False, ax=axes[2])
plt.show()

**Cyclic Behavior**<br>
Another aspect to consider is cyclic behavior. It occurs when the rise and fall pattern in the series does not follow fixed calendar-based intervals. **Take care not to confuse a ‘cyclic’ effect with a ‘seasonal’ effect.**<br><br>

**An example of a cyclic time series**: stock market fluctuations. The stock market rises and falls in cycles, but the length of each cycle varies and is not tied to the calendar.

**Differentiate between a ‘cyclic’ and a ‘seasonal’ pattern**

If the pattern does not repeat at a fixed calendar-based frequency, it is cyclic. Unlike seasonality, cyclic effects are typically driven by business and other socio-economic factors.

# **Decompose a time series into its components**

You can do a classical decomposition of a time series by considering the series as an additive or multiplicative combination of the base level, trend, seasonal index and the residual.

Depending on the nature of the trend and seasonality, a time series can be modeled as additive or multiplicative: each observation in the series is expressed as either a sum or a product of the components.<br>

**Additive time series**:<br>
>Value = Base Level + Trend + Seasonality + Error<br>

**Multiplicative Time Series**:<br>
>Value = Base Level x Trend x Seasonality x Error<br>
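
The two forms are closely related: taking the logarithm of a multiplicative series turns the product into a sum, so a multiplicative series is additive on the log scale. The sketch below checks this numerically on made-up positive components.

```python
import numpy as np

t = np.arange(1, 25)
base = 10.0
trend = 1 + 0.05 * t                             # multiplicative growth factor
seasonal = 1 + 0.1 * np.sin(2 * np.pi * t / 12)  # seasonal index around 1
error = np.full(t.shape, 1.01)                   # constant stand-in for the error term

# Multiplicative model: every component must be positive.
value = base * trend * seasonal * error

# log(a * b * c * d) = log(a) + log(b) + log(c) + log(d)
log_sum = np.log(base) + np.log(trend) + np.log(seasonal) + np.log(error)
print(np.allclose(np.log(value), log_sum))
```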

Decompose the time series into:<br>
- trend
- seasonal
- residual<br>

The `seasonal_decompose` function breaks the time series into trend, seasonal, and residual components.


Setting `extrapolate_trend='freq'` fills in the values that would otherwise be missing from the trend and residuals at the beginning and end of the series.
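
The missing values arise because the trend is estimated with a centered moving average, which is undefined wherever the window runs past the edge of the data. The sketch below shows the effect with a plain 13-point pandas rolling mean on synthetic data. It approximates the idea only and does not reproduce the exact filter weights used by statsmodels.

```python
import numpy as np
import pandas as pd

# Synthetic monthly series (made up for illustration).
idx = pd.date_range('2000-01-01', periods=48, freq='MS')
t = np.arange(48)
series = pd.Series(10 + 0.5 * t + 3 * np.sin(2 * np.pi * t / 12), index=idx)

# Centered moving average over 13 points: 6 on each side of the current point.
trend_sketch = series.rolling(window=13, center=True).mean()

# The first and last 6 points have no full window, so they are NaN.
print(trend_sketch.isna().sum())
print(trend_sketch.head(8))
```

With `extrapolate_trend='freq'`, `seasonal_decompose` extends the estimated trend into those edge positions instead of leaving them empty.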

In [None]:
df_tse = pd.read_csv('timeSeriesExample.csv', parse_dates=['date'], 
                     index_col='date')
df_tse.head()

In [None]:
# Multiplicative Decomposition 
result_mul = seasonal_decompose(df_tse['value'], model='multiplicative', 
                                extrapolate_trend='freq')

# Additive Decomposition
result_add = seasonal_decompose(df_tse['value'], model='additive', 
                                extrapolate_trend='freq')

A Python function that plots the decomposition of a time series

In [None]:
def plotseasonal(res, axes ):
    res.observed.plot(ax=axes[0], legend=False)
    axes[0].set_ylabel('Observed')
    res.trend.plot(ax=axes[1], legend=False)
    axes[1].set_ylabel('Trend')
    res.seasonal.plot(ax=axes[2], legend=False)
    axes[2].set_ylabel('Seasonal')
    res.resid.plot(ax=axes[3], legend=False)
    axes[3].set_ylabel('Residual')

A residual is what remains of an observation after the estimated trend and seasonal components are removed. <br>
Errors, like other population parameters (e.g. a population mean), are usually theoretical. <br>
Residuals, like other sample statistics (e.g. a sample mean), are measured values from a sample.

If you look closely at the residuals of the additive decomposition, they have some pattern left over. <br>
The residuals of the multiplicative decomposition, however, look quite random, which is good. <br>
So the multiplicative decomposition should be preferred for this particular series.

In [None]:
fig, axes = plt.subplots(ncols=2, nrows=4, sharex=True, figsize=(12,5))
axes[0,0].set_title("Multiplicative")
axes[0,1].set_title("Additive")
plotseasonal(result_mul, axes[:,0])
plotseasonal(result_add, axes[:,1])
plt.show()

**Create a dataframe of the individual components**

The numerical output of the trend, seasonal, and residual components is stored in `result_mul`<br>
<br>
The product of seas, trend, resid equals actual<br>
>seas * trend * resid  = actual

In [None]:
# Extract the Components ----
# Actual Values = Product of (Seasonal * Trend * Resid)
df_reconstructed = pd.concat([result_mul.seasonal, result_mul.trend, result_mul.resid, result_mul.observed], axis=1)
df_reconstructed.columns = ['seas', 'trend', 'resid', 'actual_values']
df_reconstructed.head()

# **Assignment**<br>
Using the MARUTI.csv time series, do the following:
- Break the series down into its different components

Note: you will need to tell `seasonal_decompose` the length of the seasonal cycle through the `period` argument (older statsmodels versions called it `freq`):<br>
- `period=365` for daily data
- `period=12` for monthly data

<br>
The `seasonal_decompose` function is a little confusing, and there is a possibility that it contains a bug. <br><br>
It is still a valuable tool. <br>
For more information, see [Time Series Decomposition](https://towardsdatascience.com/time-series-decomposition-and-statsmodels-parameters-69e54d035453)
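
One likely reason the period has to be supplied by hand is that pandas cannot infer a frequency from dates with irregular gaps, such as trading days with holidays removed. The sketch below illustrates this with made-up dates; whether MARUTI.csv has exactly this structure is an assumption.

```python
import pandas as pd

# Regular business days: pandas can infer a frequency for these.
days = pd.bdate_range('2020-01-01', periods=30)
print(pd.infer_freq(days))

# Remove a few days to mimic market holidays (positions chosen arbitrarily).
irregular = days.delete([5, 11, 12])

# With irregular gaps no frequency can be inferred, so the result is None.
print(pd.infer_freq(irregular))
```

When no frequency can be inferred from the index, `seasonal_decompose` cannot work out the period itself and it must be passed explicitly.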

In [None]:
#@title 
df_m = pd.read_csv('MARUTI.csv', parse_dates=['Date'], 
                     index_col='Date')
#df_m['VWAP'].plot(figsize=(8,4),title=' volume weighted average price')
# period=12 ==> monthly data
# period=365 ==> daily data
# Multiplicative Decomposition
result_mul = seasonal_decompose(df_m['VWAP'], model='multiplicative', period=365,
                                extrapolate_trend='freq')

# Additive Decomposition
result_add = seasonal_decompose(df_m['VWAP'], model='additive',period=365, 
                                extrapolate_trend='freq')
fig, axes = plt.subplots(ncols=2, nrows=4, sharex=True, figsize=(12,5))
axes[0,0].set_title("Multiplicative")
axes[0,1].set_title("Additive")
plotseasonal(result_mul, axes[:,0])
plotseasonal(result_add, axes[:,1])
plt.show()

In the next notebook, you will learn to de-seasonalize and de-trend a time series.