## Time Series Analysis and Forecasting Using an ETS Model

- Time series analysis is about understanding historical data in order to extract meaningful statistics, patterns and other characteristics; forecasting uses that understanding to predict the series' future behaviour.

#### The Goal
The goal of this lab is to forecast future values of the "AirPassengers" dataset using the ETS (Error, Trend, Seasonal) exponential smoothing technique.

#### About the "AirPassengers" dataset
Monthly totals of international airline passengers from January 1949 to December 1960 (144 observations).

#### Download and Install Python Libraries

In [None]:
#!pip install pandas
#!pip install numpy
#!pip install scikit-learn
#!pip install scipy
#!pip install seaborn
#!pip install matplotlib
#!pip install statsmodels
#!pip install openpyxl   # pd.read_excel() needs openpyxl to read .xlsx files

#### Import Python Libraries

In [None]:
import pandas as pd
import seaborn as sns

import matplotlib.pyplot as plt
%matplotlib inline

from statsmodels.tsa.holtwinters import ExponentialSmoothing

# register pandas' date converters with matplotlib so that date indexes plot correctly
# (some older pandas versions emit warnings if this step is skipped)
from pandas.plotting import register_matplotlib_converters
register_matplotlib_converters()


# Switching off unnecessary warning messages 
import warnings
warnings.filterwarnings('ignore')

#### Process map
The lab follows the 9-step process below.

    1.	Import Data
    2.	Data Quality Checks
    3.	Data Cleansing
    4.	Data Pre-processing
    5.	Visualisations
    6.	Model: Build
    7.	Model: Evaluation
    8.	Model: Predictions
    9.	Model: Save Predictions

#### 1. Import Data

In [None]:
# Read the data from an Excel file into a dataframe called "df"
df = pd.read_excel("AirPassengers.xlsx")
df

#### 2. Data Quality Checks

    2.1 Check data
    2.2 Check shape of data
    2.3 Check for duplicates
    2.4 Check for missing values

In [None]:
# 2.1
# Viewing top 5 records

df.head()

In [None]:
# 2.2
# Check the shape of the dataframe: (number of rows, number of columns)

df.shape

In [None]:
# 2.3
# Use the duplicated() function to count the duplicate records in the dataset

df.duplicated().sum()

In [None]:
# 2.4
# df.info() prints a summary of the dataframe: the index dtype, the columns and their dtypes, the non-null counts and memory usage
# A non-null count lower than the number of rows means that column has missing values
# if any are found, they can be filled in during step 3.2

df.info()
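`df.info()` reveals gaps only indirectly, as a non-null count below the row count. A more direct check counts the missing values per column and lists the affected rows. The sketch below runs on a small made-up frame; the `Month` column name is an assumption (the real file's date column may be named differently), while `Passengers` matches the lab.

```python
import numpy as np
import pandas as pd

# Hypothetical data: "Month" is an assumed column name, "Passengers" matches the lab
df = pd.DataFrame({
    "Month": pd.date_range("1949-01-01", periods=6, freq="MS"),
    "Passengers": [112, 118, np.nan, 129, np.nan, 135],
})

# Number of missing values in each column
missing_per_column = df.isna().sum()
print(missing_per_column)

# The rows that contain a gap, showing which months are affected
print(df[df["Passengers"].isna()])
```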

#### 3. Data Cleansing

    3.1 Remove duplicates
    3.2 Fill missing values

In [None]:
# 3.1
# Remove all duplicate rows from the dataset with the drop_duplicates() function (uncomment to run)

# df = df.drop_duplicates()
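For a time series, the usual question is whether any *date* appears twice, not whether any value repeats: two different months can legitimately have the same passenger count. Dropping rows also shortens the data, so the 144-period index built in step 4 would no longer line up. The sketch below illustrates the difference on a small made-up frame with an assumed `Month` column.

```python
import pandas as pd

# Hypothetical data: February is recorded twice; March and April share a value
df = pd.DataFrame({
    "Month": pd.to_datetime(["1949-01-01", "1949-02-01", "1949-02-01",
                             "1949-03-01", "1949-04-01"]),
    "Passengers": [112, 118, 118, 132, 132],
})

dup_rows = df.duplicated().sum()                      # fully identical rows
dup_months = df.duplicated(subset=["Month"]).sum()    # the same date appearing twice
repeated_values = df["Passengers"].duplicated().sum() # repeated counts (not errors)
print(dup_rows, dup_months, repeated_values)

# Keep the first record for each month
df = df.drop_duplicates(subset=["Month"], keep="first")
print(len(df))
```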

In [None]:
# 3.2
# Fill missing values (NaN/null) with the median value of the column

# Fill missing values in a specific column (uncomment to run)
# df.Passengers = df.Passengers.fillna(df.Passengers.median())
# df
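A median fill ignores both trend and seasonality, so a filled value can land far from its neighbours. A common alternative for time series, swapped in here for illustration rather than taken from the lab, is to interpolate between the surrounding observations. The sketch below compares the approaches on a made-up six-month Series; `method="time"` weights by the actual gap between dates and requires a DatetimeIndex.

```python
import numpy as np
import pandas as pd

# Made-up monthly series with one gap (March 1949)
idx = pd.date_range("1949-01-01", periods=6, freq="MS")
s = pd.Series([112, 118, np.nan, 129, 121, 135], index=idx)

median_filled = s.fillna(s.median())          # same value regardless of position
position_filled = s.interpolate()             # linear, treats points as equally spaced
time_filled = s.interpolate(method="time")    # linear, weighted by days between dates

print(pd.DataFrame({"median": median_filled,
                    "linear": position_filled,
                    "time": time_filled}))
```

February 1949 has 28 days and March has 31, so the time-weighted value sits slightly closer to February's value than the position-based one does.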

#### 4. Data Pre-processing

In [None]:
# "df" is currently a DataFrame; for time-series analysis the data needs to be reformatted as a date-indexed series

type(df)

#### Creation of the time-series object
#### Step 1:
- It is good practice to create a date index that describes the time structure of the series.
- The index records the frequency of the time series (here, monthly observations dated to the start of each month).
- Think of it as a better-formatted version of the "Datestamps".
- The index is used again at the model-building stage.


[Learn more about the time-series frequency aliases available in pandas](https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#offset-aliases)

In [None]:
# 'MS' = month start; 144 periods = 12 years x 12 months (1949 to 1960)
dti = pd.date_range(start='1949-01-01', freq='MS', periods=144)
dti
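Because the number of periods is hard-coded, it is worth confirming that the index really spans January 1949 to December 1960 and that every timestamp falls on the first of the month. A minimal check is sketched below; in the notebook, comparing `len(dti)` with `len(df)` (or passing `periods=len(df)`) would also catch a mismatch with the data.

```python
import pandas as pd

dti = pd.date_range(start="1949-01-01", freq="MS", periods=144)

# First and last timestamps, plus the frequency alias pandas stored
first, last = dti[0], dti[-1]
print(first.date(), last.date(), dti.freqstr)

# 144 month-start periods should cover exactly 12 calendar years
years_covered = dti.year.nunique()
print(years_covered)
```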

#### Step 2:

In [None]:
# Creating a pandas time-series object using "dti" index 

df2 = pd.Series(data=list(df.Passengers), index=dti)
print(df2)

In [None]:
# The new "df2" is no longer a DataFrame; it is a pandas Series

type(df2)

#### 5. Visualisations

In [None]:
# Create a line plot with the matplotlib package

plt.plot(df2)
plt.title("Air passenger analysis")
plt.xlabel("Year")
plt.ylabel("Number of Passengers")
plt.show()

In [None]:
# Create a line plot with the seaborn package

sns.lineplot(data=df2)
plt.title("Air passenger analysis")
plt.xlabel("Year")
plt.ylabel("Passengers")
plt.show()

#### 6. Model: Build

In [None]:
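# Create the ETS (Holt-Winters) model with the ExponentialSmoothing() class:
#   trend='add'         -> additive (linear) trend
#   seasonal='mul'      -> multiplicative seasonality: seasonal swings grow as passenger numbers grow
#   seasonal_periods=12 -> monthly data with a yearly cycle
# freq and dates are optional here, since df2 already carries a monthly DatetimeIndex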

model = ExponentialSmoothing(df2, seasonal_periods=12, trend='add', seasonal='mul', freq="MS", dates=dti)
fit1 = model.fit()
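For reference, a model with additive trend and multiplicative seasonality corresponds to the Holt-Winters recursions below, written in their common textbook form; statsmodels' internal parameterisation may differ in detail. Here $m = 12$ is `seasonal_periods`, $\ell_t$, $b_t$ and $s_t$ are the level, trend and seasonal components, and $\alpha$, $\beta$, $\gamma$ are the smoothing parameters estimated by `model.fit()`:

$$
\begin{aligned}
\hat{y}_{t+h \mid t} &= (\ell_t + h\,b_t)\, s_{t+h-m(k+1)}, \qquad k = \left\lfloor \frac{h-1}{m} \right\rfloor \\
\ell_t &= \alpha \frac{y_t}{s_{t-m}} + (1-\alpha)(\ell_{t-1} + b_{t-1}) \\
b_t &= \beta\,(\ell_t - \ell_{t-1}) + (1-\beta)\, b_{t-1} \\
s_t &= \gamma \frac{y_t}{\ell_{t-1} + b_{t-1}} + (1-\gamma)\, s_{t-m}
\end{aligned}
$$

Because the seasonal index multiplies the trend, the forecast's seasonal swings widen as the level rises.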

#### 7. Model: Evaluation

In [None]:
fit1.summary()

#### 8. Model: Predictions

In [None]:
# Forecast the next 12 months (January to December 1961)

passengers_forecast = fit1.forecast(steps=12)
passengers_forecast

In [None]:
### Visualising the forecasted values in a line chart

# create a matplotlib figure with an overall title
fig = plt.figure()
fig.suptitle('Air Passengers')

# plot the actual and forecast values as two lines on the same chart
actual, = plt.plot(df2.index, df2, 'blue', label='Actual')
predicted, = plt.plot(passengers_forecast.index, passengers_forecast, 'red', label='Forecast')

plt.legend(handles=[actual, predicted])
plt.show()

#### 9. Model: Save Predictions

In [None]:
# Save the Predictions to a CSV file

passengers_forecast.to_csv("AirPassengersPredicted.csv", header=False, index=True, encoding="utf-8")
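Because the file is written with `header=False`, it has no column names, so reading it back needs `header=None` and `index_col=0`. A minimal round-trip sketch is below; it uses a made-up 12-month forecast and a temporary directory instead of the real output file.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Made-up forecast standing in for passengers_forecast
idx = pd.date_range("1961-01-01", periods=12, freq="MS")
forecast = pd.Series(np.linspace(450.0, 520.0, 12), index=idx)

path = os.path.join(tempfile.mkdtemp(), "AirPassengersPredicted.csv")
forecast.to_csv(path, header=False, index=True, encoding="utf-8")

# No header row was written: column 0 holds the dates, column 1 the values
loaded = pd.read_csv(path, header=None, index_col=0, parse_dates=True).squeeze("columns")
print(loaded.head())
```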