<a href="https://colab.research.google.com/github/cagBRT/timeSeries/blob/main/4_TimeSeriesAnalysis.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Time Series Analysis**

This notebook covers some of the more common techniques for analyzing time series.

Why even analyze a time series?<br>

Because it is the preparatory step before you develop a forecast of the series.
<br><br>
Time series forecasting also has enormous commercial significance, because many quantities that matter to a business, such as demand, sales, website visitors, and stock prices, are essentially time series data.
<br><br>
So what does analyzing a time series involve?
<br><br>
Analyzing a time series means understanding its inherent characteristics, such as trend and seasonality, so that you are better informed to create meaningful and accurate forecasts.

Reference: https://www.machinelearningplus.com/time-series/time-series-analysis-python/


In [None]:
!git clone -l -s https://github.com/cagBRT/timeSeries.git cloned-repo
%cd cloned-repo


**Import the necessary libraries**

In [None]:
from dateutil.parser import parse 
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import pandas as pd
plt.rcParams.update({'figure.figsize': (10, 7), 'figure.dpi': 120})


**Import the time series data**<br>
Import the time series with the date as the index

**Note:** in the displayed output, the ‘value’ header appears one row above ‘date’ because `date` is the index of the series rather than an ordinary column.

In [None]:
# Import as Dataframe
df_tse = pd.read_csv('timeSeriesExample.csv', parse_dates=['date'], index_col='date')
df_tse.head()
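
To make the effect of `parse_dates` and `index_col` concrete, here is a minimal sketch that reads a tiny in-memory CSV instead of `timeSeriesExample.csv`. The three rows and the name `df_demo` are made up for illustration and are not taken from the real file; only the column names `date` and `value` match the cell above.

```python
import io
import pandas as pd

# Hypothetical stand-in for timeSeriesExample.csv (values are illustrative only)
csv_text = "date,value\n1991-07-01,3.5\n1991-08-01,3.2\n1991-09-01,3.3\n"

# parse_dates converts the 'date' strings to timestamps;
# index_col moves that column into the index
df_demo = pd.read_csv(io.StringIO(csv_text), parse_dates=['date'], index_col='date')

print(type(df_demo.index).__name__)   # the index is a DatetimeIndex
print(df_demo.columns.tolist())       # 'date' is no longer a regular column
print(df_demo.index.year.tolist())    # date parts are read from the index
```

Because `date` becomes the index, later cells should read years and months from `df.index` rather than from a `date` column.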

In [None]:
df_MA = pd.read_csv('MarketArrivals.csv')
df_MA = df_MA.loc[df_MA.market=='MUMBAI', :]
df_MA.head()

In [None]:
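# Helper that draws a single line plot of y against x.
# Note: `df` is accepted for convenience, but only `x` and `y` are actually plotted.
def plot_df(df, x, y, title="", xlabel='Date', ylabel='Value', dpi=100):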
    plt.figure(figsize=(16,5), dpi=dpi)
    plt.plot(x, y, color='tab:red')
    plt.gca().set(title=title, xlabel=xlabel, ylabel=ylabel)
    plt.show()


In [None]:
plot_df(df_tse, x=df_tse.index, y=df_tse.value, title='Monthly anti-diabetic drug sales in Australia from 1992 to 2008.')    

In [None]:
df_ap = pd.read_csv('AirPassengers.csv', parse_dates=['Month'])
x = df_ap['Month'].values
y1 = df_ap['#Passengers'].values

In [None]:
fig, ax = plt.subplots(1, 1, figsize=(16,5), dpi= 120)
plt.fill_between(x, y1=y1, y2=-y1, alpha=0.5, linewidth=2, color='seagreen')
plt.ylim(-800, 800)
plt.title('Air Passengers (Two Side View)', fontsize=16)
plt.hlines(y=0, xmin=np.min(df_ap.Month), xmax=np.max(df_ap.Month), linewidth=.5)
plt.show()

In [None]:
# 'date' is the index of df_tse (set via index_col), so read year and month from the index
df_tse['year'] = [d.year for d in df_tse.index]
df_tse['month'] = [d.strftime('%b') for d in df_tse.index]
years = df_tse['year'].unique()

In [None]:
np.random.seed(100)
mycolors = np.random.choice(list(mpl.colors.XKCD_COLORS.keys()), len(years), replace=False)

# Draw Plot
plt.figure(figsize=(16,12), dpi= 80)
for i, y in enumerate(years):
    if i > 0:        
        plt.plot('month', 'value', data=df_tse.loc[df_tse.year==y, :], color=mycolors[i], label=y)
        plt.text(df_tse.loc[df_tse.year==y, :].shape[0]-.9, df_tse.loc[df_tse.year==y, 'value'][-1:].values[0], y, fontsize=12, color=mycolors[i])

# Decoration
# Raw strings with an escaped space keep the gap in 'Drug Sales' when rendered as mathtext
plt.gca().set(xlim=(-0.3, 11), ylim=(2, 30), ylabel=r'$Drug\ Sales$', xlabel='$Month$')
plt.yticks(fontsize=12, alpha=.7)
plt.title("Seasonal Plot of Drug Sales Time Series", fontsize=20)
plt.show()

In [None]:
# Prepare data
# 'date' is the index of df_tse, so derive year and month from the index
df_tse['year'] = [d.year for d in df_tse.index]
df_tse['month'] = [d.strftime('%b') for d in df_tse.index]
years = df_tse['year'].unique()

# Draw Plot
fig, axes = plt.subplots(1, 2, figsize=(20,7), dpi= 80)
sns.boxplot(x='year', y='value', data=df_tse, ax=axes[0])
# Drop the partial years 1991 and 2008 and draw explicitly on the second axes
sns.boxplot(x='month', y='value', data=df_tse.loc[~df_tse.year.isin([1991, 2008]), :], ax=axes[1])

# Set Title
axes[0].set_title('Year-wise Box Plot\n(The Trend)', fontsize=18)
axes[1].set_title('Month-wise Box Plot\n(The Seasonality)', fontsize=18)
plt.show()

In [None]:
fig, axes = plt.subplots(1,3, figsize=(20,4), dpi=100)
pd.read_csv('guinearice.csv', parse_dates=['date'], index_col='date').plot(title='Trend Only', legend=False, ax=axes[0])

pd.read_csv('sunspotarea.csv', parse_dates=['date'], index_col='date').plot(title='Seasonality Only', legend=False, ax=axes[1])

pd.read_csv('AirPassengers.csv', parse_dates=['Month'], index_col='Month').plot(title='Trend and Seasonality', legend=False, ax=axes[2])
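
The three panels above contrast a series with trend only, seasonality only, and both. As a rough illustration of what those labels mean numerically, the sketch below builds synthetic monthly series. The slope, amplitude, date range, and all variable names are assumptions chosen for the example and are unrelated to the CSV files plotted above.

```python
import numpy as np
import pandas as pd

# Hypothetical monthly index; dates and magnitudes are illustrative only
idx = pd.date_range('2000-01-01', periods=48, freq='MS')
t = np.arange(len(idx))

trend = pd.Series(0.5 * t, index=idx)                              # steady upward drift
seasonal = pd.Series(10 * np.sin(2 * np.pi * t / 12), index=idx)   # repeats every 12 months
both = trend + seasonal                                            # trend and seasonality

# Seasonality: averaging by calendar month recovers the repeating pattern
monthly_means = seasonal.groupby(seasonal.index.month).mean()

# Trend: the change over 12 months is constant, and the seasonal part cancels out
yearly_change = both.diff(12).dropna()

print(monthly_means.round(2).tolist())
print(yearly_change.round(2).unique())
```

Differencing at the seasonal lag (12 for monthly data) removes a perfectly periodic component, which is why `yearly_change` reflects only the trend in this idealized setting. Real series such as AirPassengers will not cancel this cleanly.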