# What is a Time Series?
* Time series modeling provides multiple algorithms that are optimized for forecasting continuous values, such as product sales, over time. A time series model can predict trends based only on the original dataset that was used to create the model.
* Across industries, organizations commonly use time series data, meaning any information collected at regular intervals of time, in their operations. Examples include daily stock prices, energy consumption rates, social media engagement metrics and retail demand, among others. Analyzing time series data yields insights such as trends, seasonal patterns and forecasts of future events that can help generate profits. For example, by understanding the seasonal trends in demand for retail products, companies can plan promotions to maximize sales throughout the year.

**White noise**
* A time series is white noise if the variables are independent and identically distributed with a mean of zero. This means that all variables have the same variance (sigma^2) and each value has a zero correlation with all other values in the series.
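
The definition above can be checked numerically. The following is a minimal sketch, assuming Gaussian noise with an illustrative standard deviation of 2 and a fixed random seed; the helper name `lag_correlation` is made up for this example.

```python
import numpy as np

rng = np.random.default_rng(seed=0)   # fixed seed so the run is repeatable
sigma = 2.0                           # assumed standard deviation (illustrative)
noise = rng.normal(loc=0.0, scale=sigma, size=10_000)

def lag_correlation(x, lag):
    """Sample correlation between x[t] and x[t - lag]."""
    return np.corrcoef(x[lag:], x[:-lag])[0, 1]

sample_mean = noise.mean()            # should be close to 0
sample_var = noise.var()              # should be close to sigma^2 = 4
lag1 = lag_correlation(noise, 1)      # should be close to 0
lag5 = lag_correlation(noise, 5)      # should be close to 0

print(f"mean={sample_mean:.3f} variance={sample_var:.3f}")
print(f"lag-1 corr={lag1:.3f} lag-5 corr={lag5:.3f}")
```

The sample mean and the lagged correlations come out near zero, and the sample variance comes out near sigma^2, which is what the white noise definition requires.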


**Data patterns**

**Cycle**
* A cyclic pattern occurs when the data rise and fall, but not over a fixed period. The duration of these fluctuations is usually at least 2 years.
![](cycle.png)

**Trend**
* A trend pattern exists when there is a long-term increase or decrease in the series. The trend can be linear or exponential.
![](trend.png)

**Seasonal**
* Seasonality exists when the data are influenced by seasonal factors, such as the day of the week, the month, or the quarter of the year. A seasonal pattern has a fixed, known period.
![](sea.png)

**Random**
* Random (irregular) fluctuations do not follow any trend, cycle or seasonal pattern.
![](ran.png)
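
To see how these patterns combine, here is a small sketch that builds a synthetic monthly series from a trend, a seasonal part and a random part. All numbers are made up for illustration, and the cyclic pattern is left out to keep the example short.

```python
import numpy as np

rng = np.random.default_rng(seed=1)
months = np.arange(120)                            # 10 years of monthly data

trend = 100 + 0.5 * months                         # linear long-term increase
seasonal = 10 * np.sin(2 * np.pi * months / 12)    # repeats every 12 months
random_part = rng.normal(0, 2, size=months.size)   # irregular fluctuations

series = trend + seasonal + random_part            # additive combination

# The seasonal component has a fixed, known period: each year repeats the last.
print(np.allclose(seasonal[:12], seasonal[12:24]))

# Averaging over whole years cancels the seasonal part and exposes the trend.
yearly_means = series.reshape(10, 12).mean(axis=1)
print(np.round(yearly_means, 1))
```

The yearly means rise steadily, which is the trend showing through once the seasonal swings are averaged out.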


**Types**
* Types of time series models include moving average (MA), autoregressive (AR), ARMA and ARIMA. The crucial thing is to choose the forecasting method that fits the characteristics of the time series data.



**MA(Moving Average)**
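* A moving average is the average of a fixed number of consecutive items in the time series. The window moves through the series: each successive average drops the oldest item of the previous group and adds the next one.
* In a moving average (MA) model, $Y_t$ depends only on the current and past random error terms:
  * $Y_t = f(\varepsilon_t, \varepsilon_{t-1}, \varepsilon_{t-2}, \varepsilon_{t-3}, \dots)$
  * $Y_t = \beta + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \theta_2 \varepsilon_{t-2} + \theta_3 \varepsilon_{t-3} + \dots$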
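
The MA equation can be simulated directly. The sketch below assumes an MA(1) process with an illustrative constant β = 5 and coefficient θ1 = 0.6, and compares the sample lag-1 autocorrelation with the known theoretical value θ1 / (1 + θ1²). The helper `lag_correlation` is a made-up name.

```python
import numpy as np

rng = np.random.default_rng(seed=2)
beta, theta1 = 5.0, 0.6        # assumed constant and MA coefficient
n = 20_000

eps = rng.normal(0, 1, size=n + 1)          # random error terms
y = beta + eps[1:] + theta1 * eps[:-1]      # Y_t = beta + eps_t + theta1 * eps_{t-1}

def lag_correlation(x, lag):
    """Sample correlation between x[t] and x[t - lag]."""
    return np.corrcoef(x[lag:], x[:-lag])[0, 1]

rho1_theory = theta1 / (1 + theta1 ** 2)    # lag-1 autocorrelation of an MA(1)
rho1 = lag_correlation(y, 1)
rho2 = lag_correlation(y, 2)                # an MA(1) has no correlation beyond lag 1

print(f"mean={y.mean():.3f}")
print(f"lag-1: sample={rho1:.3f} theory={rho1_theory:.3f}")
print(f"lag-2: sample={rho2:.3f}")
```

Because each value shares an error term only with its immediate neighbour, the correlation is clearly non-zero at lag 1 and close to zero at lag 2.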



**AR(Auto Regressive)**
* Autoregression is a time series model that uses observations from previous time steps as input to a regression equation to predict the value at the next time step. It is a simple idea that can produce accurate forecasts on a range of time series problems.
* $Y_t$ depends only on its own past values $Y_{t-1}, Y_{t-2}, Y_{t-3}, \dots$ (plus a random error term $\varepsilon_t$):
  * $Y_t = f(Y_{t-1}, Y_{t-2}, Y_{t-3}, \dots)$
  * $Y_t = \beta_0 + \beta_1 Y_{t-1} + \beta_2 Y_{t-2} + \beta_3 Y_{t-3} + \dots + \varepsilon_t$
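
Since an AR model is a regression on past values, its coefficients can be recovered with ordinary least squares. The following is a minimal sketch, assuming an AR(1) process with illustrative values β0 = 2 and β1 = 0.7; it is not the estimator that statsmodels uses internally.

```python
import numpy as np

rng = np.random.default_rng(seed=3)
beta0, beta1 = 2.0, 0.7        # assumed intercept and AR coefficient
n = 5_000

# Simulate Y_t = beta0 + beta1 * Y_{t-1} + error
y = np.empty(n)
y[0] = beta0 / (1 - beta1)     # start at the long-run mean of the process
for t in range(1, n):
    y[t] = beta0 + beta1 * y[t - 1] + rng.normal()

# Regress Y_t on a constant and Y_{t-1}
X = np.column_stack([np.ones(n - 1), y[:-1]])
coef, *_ = np.linalg.lstsq(X, y[1:], rcond=None)
beta0_hat, beta1_hat = coef

print(f"beta0_hat={beta0_hat:.3f} beta1_hat={beta1_hat:.3f}")

# One-step-ahead forecast from the last observed value
next_value = beta0_hat + beta1_hat * y[-1]
print(f"forecast for next step: {next_value:.3f}")
```

The estimates land close to the values used in the simulation, and the last two lines show how the fitted equation turns the latest observation into a forecast.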


**ARMA**
* In time series analysis, autoregressive–moving-average (ARMA) models provide a parsimonious description of a (weakly) stationary stochastic process in terms of two polynomials, one for the autoregression (AR) and one for the moving average (MA).
* ARMA combines AR and MA:
	* $Y_t = \beta_0 + \beta_1 Y_{t-1} + \beta_2 Y_{t-2} + \beta_3 Y_{t-3} + \dots + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \theta_2 \varepsilon_{t-2} + \theta_3 \varepsilon_{t-3} + \dots$


**ARIMA**
* The Autoregressive Integrated Moving Average (ARIMA) model is another widely used forecasting technique. It combines an autoregressive (AR) model and a moving average (MA) model with differencing (the "integrated" part), which makes it suitable for univariate non-stationary data. The ARIMA method is based on the concepts of autoregression, autocorrelation, and moving average.
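
As a sketch of what the "integrated" part means, ARIMA(p, d, q) can be read as an ARMA(p, q) model applied to the series after it has been differenced d times. The symbol $W_t$ for the differenced series is introduced here only for illustration; the case shown is d = 1.

```latex
W_t = Y_t - Y_{t-1}

W_t = \beta_0 + \beta_1 W_{t-1} + \dots + \beta_p W_{t-p}
      + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \dots + \theta_q \varepsilon_{t-q}
```

With d = 0 no differencing is applied and the model reduces to ARMA(p, q).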






### Create a time series model to predict the future number of passengers

In [None]:
## Importing libraries
import pandas as pd
import warnings
warnings.filterwarnings('ignore')

In [None]:
## loading the data
data=pd.read_csv('AirPassengers.csv')
# we have data of airline passengers travelled between January 1949 and December 1960

In [None]:
data.head(2)#first two rows

In [None]:
data.info()#info about datatype and null value

In [None]:
# Month is actually given as string here. It must be in date-time format

In [None]:
## parse_dates: parsing the date (Converts the string representation of a date to Date object) 
# index_col: using date column as index

data=pd.read_csv('AirPassengers.csv',parse_dates=[0],index_col='Month')

# Basic checks

In [None]:
data.head()#first five rows

In [None]:
data.tail()#last five rows

In [None]:
data.describe()##used to view some basic statistical details like percentile, mean, std etc. 

# EDA

In [None]:
import matplotlib.pyplot as plt
plt.figure(figsize=(15,10),facecolor='white')#canvas  size
plt.plot(data)#line plot 
plt.tight_layout()
## from plot we can see the series given is not stationary

## Stationarity

* Stationarity means that the statistical properties of a time series (or rather, of the process generating it) do not change over time.
* Stationarity is important because many useful analytical tools, statistical tests and models rely on it.

A (weakly) stationary series has:

* Constant mean
* Constant variance
* Constant covariance between periods of identical distance

* The third condition simply states that the covariance between observations separated by a given interval (say, 10 days/hours/minutes) should be identical to the covariance between any other observations separated by an interval of the same length:

![image-2.png](attachment:image-2.png)
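
A rough way to probe the constant-mean condition is to compare summary statistics over different parts of the series. The sketch below uses made-up synthetic data and a hypothetical helper `half_stats`; it is an informal check, not a formal statistical test.

```python
import numpy as np

rng = np.random.default_rng(seed=4)
n = 2_000
stationary = rng.normal(0, 1, size=n)          # white noise: stationary
trending = stationary + 0.01 * np.arange(n)    # same noise plus a linear trend

def half_stats(x):
    """Return (mean, variance) for the first and second half of x."""
    first, second = np.array_split(x, 2)
    return (first.mean(), first.var()), (second.mean(), second.var())

(s_mean1, s_var1), (s_mean2, s_var2) = half_stats(stationary)
(t_mean1, t_var1), (t_mean2, t_var2) = half_stats(trending)

print(f"stationary: mean {s_mean1:.2f} -> {s_mean2:.2f}, var {s_var1:.2f} -> {s_var2:.2f}")
print(f"trending:   mean {t_mean1:.2f} -> {t_mean2:.2f}")
```

For the stationary series the two halves have nearly the same mean and variance. For the trending series the mean shifts between halves, which violates the constant-mean condition.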



* Why do we need stationarity? The two most important reasons:
  * Stationary processes are easier to analyze.
  * Stationarity is assumed by most of the algorithms.

* How do we check whether a given series is stationary?
  * One way is to check the autocorrelation.
  * Autocorrelation is the similarity between observations as a function of the time lag between them.

* When plotting the value of the autocorrelation function (ACF) for increasing lags (a plot called a correlogram), the values tend to decay to zero quickly for a stationary time series (see figure 1, right), while for non-stationary data the decay happens more slowly.
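
The ACF values behind a correlogram can be computed by hand. This is a minimal sketch using the standard sample autocorrelation formula on synthetic data; the function name `acf` is chosen for this example and is not the statsmodels implementation.

```python
import numpy as np

def acf(x, max_lag):
    """Sample autocorrelation of x for lags 0..max_lag."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    denom = np.sum(centered ** 2)
    return np.array([
        np.sum(centered[lag:] * centered[:x.size - lag]) / denom
        for lag in range(max_lag + 1)
    ])

rng = np.random.default_rng(seed=5)
noise = rng.normal(size=500)                 # stationary series
trending = 0.05 * np.arange(500) + noise     # non-stationary: linear trend plus noise

acf_noise = acf(noise, 10)
acf_trend = acf(trending, 10)

print(np.round(acf_noise, 2))   # drops to about zero right after lag 0
print(np.round(acf_trend, 2))   # stays high across all ten lags
```

The contrast between the two printed rows is the same contrast the correlogram shows visually.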

In [None]:
## Plotting the autocorrelation plot
from statsmodels.graphics.tsaplots import plot_acf

In [None]:
plot_acf(data);
## from the autocorrelation plot it is clear that given series is not stationary

In [None]:
## making it stationary by taking the first difference (lag of 1)
data1=data.diff(periods=1)

# diff() subtracts the previous row's value from each row's value (value[t] - value[t-1]);
# the first row has no previous value, so it becomes NaN.
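
A tiny worked example of first differencing, using made-up numbers and NumPy rather than the notebook's DataFrame. Note that `np.diff` drops the first position instead of returning NaN there, so it corresponds to the pandas result with the first row removed.

```python
import numpy as np

values = np.array([100, 110, 120, 130, 140])   # made-up series with a linear trend

first_diff = np.diff(values)                   # value[t] - value[t-1]
print(first_diff)                              # the trend becomes a constant level

# Differencing is reversible: the first value plus the running sum of the
# differences rebuilds the original series.
restored = values[0] + np.concatenate([[0], np.cumsum(first_diff)])
print(restored)
```

The differenced series has a constant mean, which is why differencing is a common way to remove a trend before modeling.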

In [None]:
# previewing the differenced data (the first row is NaN)
data1.head()  

In [None]:
#Remove the 1st row
data1=data1.iloc[1:]

In [None]:
data1.head()

In [None]:
#A plot of the autocorrelation of a time series by lag 
plot_acf(data1);

In [None]:
## Creating training and test sets
train=data1[:100] #from 0th to 99th record - training data
test=data1[100:] #from 100th record to end - testing data

In [None]:
# We cannot use train_test_split as it will randomly take the records for both the set. 
# But in time series, we take records in the given order only.
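
The ordered split can be wrapped in a small helper. This is only a sketch; the function name `chronological_split` and the stand-in array are made up for illustration.

```python
import numpy as np

def chronological_split(series, n_train):
    """Split without shuffling: the first n_train points train, the rest test."""
    return series[:n_train], series[n_train:]

series = np.arange(143)                  # stand-in for 143 ordered observations
train_part, test_part = chronological_split(series, 100)

print(len(train_part), len(test_part))
# Every training point comes before every test point in time.
print(train_part.max() < test_part.min())
```

Keeping the order means the model is always evaluated on observations that come after the ones it was trained on, which mirrors how forecasts are used in practice.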

In [None]:
train.info()#info about datatype and null value

In [None]:
#pip install statsmodels

In [None]:
## Applying autoregressive model

In [None]:
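import warnings
warnings.filterwarnings('ignore')
# The AR class was deprecated and later removed from statsmodels; AutoReg is its replacement.
from statsmodels.tsa.ar_model import AutoReg
# Assumption: lags=12 (one year of monthly lags); AutoReg needs the lag order given explicitly.
ar_model = AutoReg(train.iloc[:, 0], lags=12) # pass the single data column as a Series
ar_m = ar_model.fit()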

In [None]:
prediction=ar_m.predict(start=100,end=142)

In [None]:
import matplotlib.pyplot as plt

plt.plot(test)
plt.plot(prediction,color='green')#graph of test vs prediction

## ARIMA Model

In [None]:
# No need to do differencing as ARIMA does it

In [None]:
train=data[:100] #from 0th to 99th record - training data
test=data[100:] #from 100th record to end - testing data

In [None]:
## importing the library
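try:
    from statsmodels.tsa.arima.model import ARIMA # statsmodels 0.12 and later
except ImportError:
    from statsmodels.tsa.arima_model import ARIMA # older statsmodels releases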

In [None]:
##Model object creation and fitting the model
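model_arima = ARIMA(train, order=(2,1,2))#order=(p,d,q); these values are only an initial guess

#p - order of the autoregressive part (number of lagged observations)
#d - degree of differencing (how many times the series is differenced; the integrated order)
#q - order of the moving average part (number of lagged error terms)

model_arima_fit = model_arima.fit()#training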

In [None]:
## evaluate the model
print(model_arima_fit.aic) # Akaike Information Criterion (AIC); lower is better when comparing models

In [None]:
# forecast() - out-of-sample forecasts for the next `steps` periods after the training data
# predict() - predictions over a chosen start/end range (in-sample and/or out-of-sample)
forecast_result = model_arima_fit.forecast(steps=9) # forecasting for next 9 months

# Older statsmodels releases (statsmodels.tsa.arima_model) return a tuple containing the forecast values,
# the standard errors of the forecast, and the confidence intervals; there we keep only
# the first element (index 0). Newer releases return the forecast values directly.
forecasting_9 = forecast_result[0] if isinstance(forecast_result, tuple) else forecast_result

In [None]:
forecasting_9

In [None]:
## plotting the forecasted values
plt.plot(forecasting_9,color='green')

In [None]:
## Getting the optimal values of p, d and q
import itertools

p = d = q = range(0,4) #candidate values 0 to 3 for each of p,d,q (range can be from 0 to 5 for large datasets)

pdq = list(itertools.product(p,d,q))#cartesian product of the given iterables
pdq  #list of all possible combinations of p,d,q

In [None]:
import warnings
warnings.filterwarnings('ignore')
## try/except catches exceptions: the code in the "try" block is run and,
# if an error is raised, the "except" block is run instead.
for params in pdq:#iterating params over pdq
    try:
        model_arima = ARIMA(train, order=params)#model for this (p,d,q) combination
        model_arima_fit = model_arima.fit()#training
        print(params, model_arima_fit.aic)#printing parameter and aic values
    except Exception:#skip combinations that fail to fit
        continue
#take the parameters with the lowest aic score
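
The "pick the lowest AIC" step can also be automated. The sketch below swaps in a plain least-squares AR(p) fit on synthetic data instead of the statsmodels ARIMA model, so it runs with NumPy alone. The function `ar_aic`, the simulated coefficients and the candidate range are all assumptions made for illustration. It uses the standard definition AIC = 2k - 2 ln(L) with a Gaussian likelihood.

```python
import numpy as np

rng = np.random.default_rng(seed=6)
n = 2_000
y = np.zeros(n)
for t in range(2, n):                       # simulate an AR(2) process
    y[t] = 0.6 * y[t - 1] - 0.3 * y[t - 2] + rng.normal()

def ar_aic(y, p, max_p):
    """AIC of a least-squares AR(p) fit, using the same sample for every p."""
    target = y[max_p:]
    columns = [np.ones(target.size)]
    columns += [y[max_p - k: y.size - k] for k in range(1, p + 1)]   # lag-k values
    X = np.column_stack(columns)
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ coef
    m = target.size
    sigma2 = np.mean(resid ** 2)                          # ML estimate of error variance
    log_lik = -0.5 * m * (np.log(2 * np.pi * sigma2) + 1) # Gaussian log-likelihood
    k = p + 2                                             # intercept, p lags, variance
    return 2 * k - 2 * log_lik

max_p = 5
aic_by_order = {p: ar_aic(y, p, max_p) for p in range(max_p + 1)}
best_p = min(aic_by_order, key=aic_by_order.get)          # order with the lowest AIC

for p, aic in aic_by_order.items():
    print(p, round(aic, 1))
print("selected order:", best_p)
```

The same pattern applies to the ARIMA grid: store each `(params, aic)` pair instead of only printing it, then take the minimum. AIC can occasionally favour a slightly larger order than the true one, so the selected order is a guide rather than a guarantee.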

In [None]:
## create the final model with lowest aic score parameter
model_arima = ARIMA(train, order=(3,1,3))

model_arima_fit = model_arima.fit()#training

In [None]:
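forecast = model_arima_fit.forecast(steps=9)
forecast = forecast[0] if isinstance(forecast, tuple) else forecast #older statsmodels returns a tuple (forecast, stderr, conf_int)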

In [None]:
forecast

In [None]:
plt.plot(forecast,color='green')#line plot for prediction