#What is a Time Series?
A time series is a series of observations measured over time.

These observations are applicable to different fields such as

Cardiology (Heart Rate Monitor)

Finance (Stock Market Data)

Neurology (EEG Data)

Meteorology (Temperature Measurements)

#Time Series Properties
Univariate Data

Values are indexed by time

Observations captured in constant time intervals
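
The properties above can be checked on a small series. The sketch below is illustrative only: the values and the names `readings` and `gaps` are made up, and it assumes a daily frequency.

```python
import pandas as pd

# Hypothetical univariate series: one value per day, indexed by time.
index = pd.date_range(start="2017-01-01", periods=5, freq="D")
readings = pd.Series([72, 75, 71, 78, 74], index=index)

# Gaps between consecutive timestamps; a regular series has exactly one distinct gap.
gaps = readings.index.to_series().diff().dropna()
is_regular = gaps.nunique() == 1

print(is_regular)    # True
print(gaps.iloc[0])  # 1 days 00:00:00
```

If the timestamps were unevenly spaced, `gaps` would contain more than one distinct value and `is_regular` would be False.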
#Uniqueness of Time Series
In a time series, the same attribute is captured repeatedly over a series of time intervals, so the models used are specific to this kind of data.

Typically, when you collect observations, you capture multiple data attributes at once, and multivariate models are applied in those situations.

In this topic, a single attribute is collected over time, which is what makes time series data unique.

#Course Structure
In the first part of the course, you will understand how to slice and dice time series data.

In the second part, you will understand how to fit models on time series data.

You will also learn how to implement the concepts through Python.


#Date as an Index
Indexing is the process of sequencing a given set of data points for easy searching and retrieval.

Typically, indexing is just numbering or sequencing the data by a numeric value.

For time series data, the date-time attribute is the index.

Here the data points are studied at different time steps.
#Why Use Date as Index?
Using the date as the index has several advantages. It gives you the flexibility to aggregate and disaggregate your data based on any time step.

You will learn them through Python so that you can practically see how it works.

#Date as Index in Python
In Python, when you load time series data, you can specify the date column as the index.
```
import numpy as np
import pandas as pd

# Step 1: create the date values.
date = [pd.Timestamp("2017-01-01"),
        pd.Timestamp("2017-01-02"),
        pd.Timestamp("2017-01-03")]
# Step 2: create a random time series with the dates as its index.
timeSeries = pd.Series(np.random.randn(len(date)), index=date)

timeSeries.index
DatetimeIndex(['2017-01-01', '2017-01-02', '2017-01-03'], dtype='datetime64[ns]', freq=None)
```

#Retrieving the Values
If you want the value at a given timestamp, you can reference the date directly, since the series has a date-time index.
```
timeSeries
2017-01-01   -1.215527
2017-01-02   -0.401494
2017-01-03    1.741686
```
For the above series, to get the value for Jan 1, all you have to write is:
```
timeSeries['2017-01-01']
-1.2155267083127297
```
This makes the work of the analyst easy.

#Retrieving a Range of Dates
Once your time series is indexed by date, you can retrieve a single date as well as a date range.

You pass the start and the end date to retrieve all values in the given range.

```
timeSeries['2017-01-01':'2017-01-03']
2017-01-01   -1.215527
2017-01-02   -0.401494
2017-01-03    1.741686
dtype: float64
```
In the above example, all the values in the specified range are retrieved. Both the start date and the end date are included.
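
A date-time index also accepts partial date strings. The sketch below is a minimal illustration on a made-up daily series (the names `daily`, `january`, and `first_week` are hypothetical); it uses `.loc`, which is the explicit label-based accessor.

```python
import numpy as np
import pandas as pd

# Hypothetical daily series covering January and February 2017.
idx = pd.date_range(start="2017-01-01", end="2017-02-28", freq="D")
daily = pd.Series(np.arange(len(idx)), index=idx)

january = daily.loc["2017-01"]                     # every row in January 2017
first_week = daily.loc["2017-01-01":"2017-01-07"]  # both endpoints are included

print(len(january))     # 31
print(len(first_week))  # 7
```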

#Date Range
In time series, sometimes the date values are not provided explicitly. So how will you generate the date time values and index in that scenario?

The date_range() function in pandas creates a set of sequential date-time values in a given range.

The function has multiple parameters (such as start, end, periods, and freq), so you have the flexibility to generate date values in many ways. You will learn some of them in the following cards.
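
As a minimal sketch of the scenario described above, the example below attaches a generated index to values that arrived without dates. The values, the start date, and the daily frequency are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical readings that arrived without timestamps.
values = [10, 12, 9, 14]

# Assumption: the first reading was taken on 2017-01-01 and readings are daily.
index = pd.date_range(start="2017-01-01", periods=len(values), freq="D")
series = pd.Series(values, index=index)

print(series.index[-1])  # 2017-01-04 00:00:00
```

Passing `periods=len(values)` keeps the index exactly as long as the data, so no end date has to be worked out by hand.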

#Date Range Generate Dates
Say you know the start date and the end date, and you would want to generate a set of dates in that range.

You can do that using Python in the following way
```
pd.date_range(start='2017-01-01',end='2017-01-19',freq='B')
Output 
DatetimeIndex(['2017-01-02', '2017-01-03', '2017-01-04', '2017-01-05',
               '2017-01-06', '2017-01-09', '2017-01-10', '2017-01-11',
               '2017-01-12', '2017-01-13', '2017-01-16', '2017-01-17',
               '2017-01-18', '2017-01-19'],
              dtype='datetime64[ns]', freq='B')
```
Here the start date is Jan 1, 2017 and the end date is Jan 19, 2017. freq='B' signifies business days, so you will not find any date that is a Saturday or Sunday. Jan 1, 2017 itself is a Sunday, which is why the output starts on Jan 2.

#Generating Dates in Time Intervals
In the previous example, you saw how to generate dates in a range for business days.
You can go more granular and generate values based on time steps (hours, minutes, and seconds).
Say you want to generate values starting from Jan 1, 2017, 00:00:00 at every hour, minute, or second. In the following examples, the parameter freq= controls how the different date values are generated.
```
# When freq = 'H' (hourly):
pd.date_range(start="2017-01-01", periods=3, freq='H')
DatetimeIndex(['2017-01-01 00:00:00', '2017-01-01 01:00:00',
               '2017-01-01 02:00:00'],
              dtype='datetime64[ns]', freq='H')

# When freq = 'T' (every minute):
pd.date_range(start="2017-01-01", periods=3, freq='T')
DatetimeIndex(['2017-01-01 00:00:00', '2017-01-01 00:01:00',
               '2017-01-01 00:02:00'],
              dtype='datetime64[ns]', freq='T')

# When freq = 'S' (every second):
pd.date_range(start="2017-01-01", periods=3, freq='S')
DatetimeIndex(['2017-01-01 00:00:00', '2017-01-01 00:00:01',
               '2017-01-01 00:00:02'],
              dtype='datetime64[ns]', freq='S')
```
You can find the list of other values accepted by freq under "Offset aliases" in the pandas time series documentation.

#Varying Frequencies
So far, you have seen how to generate date-time indices at specific frequencies.

Let's say you want to generate date-time values that are 1 day, 1 hour, 1 minute and 10 seconds apart.

How will you do that using Python?

See the code below.
```
import pandas as pd 
pd.date_range(start="2017-01-01", periods=5, freq='1D1h1min10s')
DatetimeIndex(['2017-01-01 00:00:00', '2017-01-02 01:01:10',
               '2017-01-03 02:02:20', '2017-01-04 03:03:30',
               '2017-01-05 04:04:40'],
              dtype='datetime64[ns]', freq='90070S')

```
You can specify whatever frequency you need by customizing the freq= parameter.
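
The same step can also be built from named components instead of a string. This is a small sketch, assuming that passing a `pd.Timedelta` as `freq` suits your case; the variable names are illustrative.

```python
import pandas as pd

# Step of 1 day, 1 hour, 1 minute and 10 seconds, built from named components.
step = pd.Timedelta(days=1, hours=1, minutes=1, seconds=10)
idx = pd.date_range(start="2017-01-01", periods=3, freq=step)

print(step.total_seconds())  # 90070.0
print(idx[1])                # 2017-01-02 01:01:10
```

The 90070 seconds match the frequency that pandas reports for the combined string alias.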

#Generating Custom Date Ranges
Instead of generating every date, you can also specify the day of the week on which each date-time value should fall.

For example, you want to generate a timestamp every Friday for five instances from a given start date.
```
pd.date_range(start="2017-01-01", periods=5, freq='W-FRI')
DatetimeIndex(['2017-01-06', '2017-01-13', '2017-01-20', '2017-01-27',
               '2017-02-03'],
              dtype='datetime64[ns]', freq='W-FRI')
```
In freq='W-FRI', W stands for weekly and FRI anchors each value to a Friday.

#Combining Indices
You have generated separate indices with different dates and want to combine them. How would you do that?
The code below explains the steps.
```
a = pd.date_range(start="2017-01-01", periods=10, freq='BAS-JAN')
b = pd.date_range(start="2017-01-01", periods=10, freq='A-FEB')
a.union(b)
DatetimeIndex(['2017-01-02', '2017-02-28', '2018-01-01', '2018-02-28',
               '2019-01-01', '2019-02-28', '2020-01-01', '2020-02-29',
               '2021-01-01', '2021-02-28', '2022-01-03', '2022-02-28',
               '2023-01-02', '2023-02-28', '2024-01-01', '2024-02-29',
               '2025-01-01', '2025-02-28', '2026-01-01', '2026-02-28'],
              dtype='datetime64[ns]', freq=None)
```
The first index contains the first business day of January for 10 consecutive years, starting in 2017.

The second index contains the last calendar day of February for the same 10 years.

The union() function combines the two indices into a single sorted index.
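
Besides union(), an index supports other set operations. The sketch below uses two made-up, overlapping daily indices to show union(), intersection(), and difference() side by side.

```python
import pandas as pd

# Two hypothetical, overlapping daily indices.
first = pd.date_range(start="2017-01-01", periods=4, freq="D")   # Jan 1 to Jan 4
second = pd.date_range(start="2017-01-03", periods=4, freq="D")  # Jan 3 to Jan 6

combined = first.union(second)         # every date from either index, no duplicates
shared = first.intersection(second)    # dates present in both indices
only_first = first.difference(second)  # dates in first but not in second

print(len(combined), len(shared), len(only_first))  # 6 2 2
```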

#Resampling Time Series
Your time series data is captured at a specific time interval (frequency), for example hourly, daily, or weekly, but you are interested in aggregating this data at a different frequency, such as monthly or yearly. How do you think you can achieve that?

Resampling will help you.

Resampling is the process of converting your time series data from a given frequency to the desired frequency.

Upsampling is converting the data from a low frequency to a high frequency.

Downsampling is converting the data from a high frequency to a low frequency.

#Why Resample?
The collected time series data might not always be at uniform intervals. To study the data, it has to be brought to regular time intervals.

Resampling helps in these situations.
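
As a minimal sketch of that situation, the example below takes made-up readings captured at irregular times and brings them to regular one-hour intervals. The timestamps, values, and the choice of the mean are assumptions for illustration.

```python
import pandas as pd

# Hypothetical readings captured at irregular times.
stamps = pd.to_datetime([
    "2017-01-01 00:05", "2017-01-01 00:40",
    "2017-01-01 02:10", "2017-01-01 02:55",
])
irregular = pd.Series([10, 20, 30, 50], index=stamps)

# Aggregate into regular one-hour bins; a bin with no readings becomes NaN.
hourly = irregular.resample("1h").mean()
print(hourly)
```

The 00:00 bin averages two readings, the 01:00 bin has none and is NaN, and the 02:00 bin averages the last two readings.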

#Downsample Scenario
Let us take an example where customers are visiting a supermarket.

You are interested in studying the customer incidence pattern at different time steps.

You can simulate that scenario in the following way using Python.
```
import numpy as np
import pandas as pd
customerArrival = pd.date_range('2017-09-18 08:00', periods=600, freq='T')
custArrivalTs = pd.Series(np.random.randint(0, 100, len(customerArrival)), index=customerArrival)
custArrivalTs.head(10)
2017-09-18 08:00:00    32
2017-09-18 08:01:00    32
2017-09-18 08:02:00    85
2017-09-18 08:03:00    59
2017-09-18 08:04:00    53
2017-09-18 08:05:00    76
2017-09-18 08:06:00    60
2017-09-18 08:07:00    83
2017-09-18 08:08:00    16
2017-09-18 08:09:00    85
```
The data says that 32 customers arrived at 8:00 and 76 customers arrived at 8:05. These are simulated (random) numbers.

#Downsample Data
In the previous card, you saw how to create a random customer incidence scenario for every minute.

You are not interested in customer incidence every minute but you would want to get the mean customer incidence every 10 mins.

You will resample (downsample) your time series in the following way.
```
custArrivalTs.resample('10min').mean().head()
2017-09-18 08:00:00    58.1
2017-09-18 08:10:00    46.2
2017-09-18 08:20:00    54.5
2017-09-18 08:30:00    48.1
2017-09-18 08:40:00    40.8
```
Calling mean() aggregates each 10-minute interval using the arithmetic mean. On its own, resample() only groups the data, so an aggregation has to be applied to get values.

#Custom Aggregation
If you do not want the aggregation to use the mean, you can apply a different aggregation function.

See the code below to understand that process.
```
custArrivalTs.resample('10min').sum().head()
2017-09-18 08:00:00    581
2017-09-18 08:10:00    462
2017-09-18 08:20:00    545
2017-09-18 08:30:00    481
2017-09-18 08:40:00    408
Freq: 10T, dtype: int64
```

#Other Custom Aggregation Options
You have seen how to apply a different aggregation function to downsample time series data.

In this example, you will see how to get the maximum incidence in a given time interval.

```
custArrivalTs.resample('1h').max().head()
2017-09-18 08:00:00    95
2017-09-18 09:00:00    99
2017-09-18 10:00:00    98
2017-09-18 11:00:00    98
2017-09-18 12:00:00    98
Freq: H, dtype: int64
```
The above output is the maximum incidence at a given hour.
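
Several aggregations can also be computed in one call. The sketch below is illustrative: it builds its own made-up per-minute series with a fixed seed (the name `arrivals` is hypothetical) and passes a list of aggregation names to agg().

```python
import numpy as np
import pandas as pd

# Hypothetical per-minute counts for one hour, using a fixed seed for repeatability.
rng = np.random.default_rng(0)
idx = pd.date_range(start="2017-09-18 08:00", periods=60, freq="min")
arrivals = pd.Series(rng.integers(0, 100, len(idx)), index=idx)

# One row per 10-minute bin, one column per aggregation.
summary = arrivals.resample("10min").agg(["min", "max", "mean", "sum"])
print(summary)
```

The result is a DataFrame, which makes it easy to compare the aggregations for the same interval.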

#Using Lambda Function in Custom Aggregation
When you perform downsampling and want to write your own custom function, you can pass it to apply() in the following manner.

```
import random
# Pick one random observation from each hourly interval.
custArrivalTs.resample('1h').apply(lambda m: random.choice(m.tolist())).head()
2017-09-18 08:00:00    79
2017-09-18 09:00:00    67
2017-09-18 10:00:00    83
2017-09-18 11:00:00    26
2017-09-18 12:00:00    20
Freq: H, dtype: int64
```

#Open High Low Close
Let's say you are analyzing customer incidence data. You would wish to see the opening, closing, high and low incidence values in a given interval of time.

How will you do that?

See the code below.
```
custArrivalTs.resample('1h').ohlc().head()
 	               open 	high    low 	close
2017-09-18 08:00:00 	32 	95 	1 	66
2017-09-18 09:00:00 	75 	99 	0 	20
2017-09-18 10:00:00 	16 	98 	1 	6
2017-09-18 11:00:00 	66 	98 	3 	92
2017-09-18 12:00:00 	50 	98 	2 	35
```
This scenario has a lot of applications in Financial Data Analysis.

#Upsampling
In upsampling, the resampled data has a higher frequency (more data points per unit of time) than the originally captured data.

For example, you are creating ten time stamps with random values every one hour on a given date.
```
sampleRng = pd.date_range('9/18/2017 8:00', periods=10, freq='H')
sampleTs = pd.Series(np.random.randint(0, 100, len(sampleRng)), index=sampleRng)
sampleTs
2017-09-18 08:00:00    62
2017-09-18 09:00:00    22
2017-09-18 10:00:00    22
2017-09-18 11:00:00    55
2017-09-18 12:00:00    98
2017-09-18 13:00:00    95
2017-09-18 14:00:00    34
2017-09-18 15:00:00    47
2017-09-18 16:00:00    61
2017-09-18 17:00:00    70
Freq: H, dtype: int64
```

#Upsampling Example
In the previous card, you have seen how to create a sample time series every 1 hour.

If you want to study your data every 15 mins, you have to perform upsampling.

How to perform upsampling?

Observe the following usage.
```
sampleTs.resample('15min').asfreq().head(10)
2017-09-18 08:00:00    62.0
2017-09-18 08:15:00     NaN
2017-09-18 08:30:00     NaN
2017-09-18 08:45:00     NaN
2017-09-18 09:00:00    22.0
2017-09-18 09:15:00     NaN
2017-09-18 09:30:00     NaN
2017-09-18 09:45:00     NaN
2017-09-18 10:00:00    22.0
2017-09-18 10:15:00     NaN
Freq: 15T, dtype: float64
```
Notice that the timestamps at which no data was captured show up as NaN.

How to resolve this issue?

#Forward Filling
Forward and backward filling can be used to fill missing values.

In forward filling, each missing value is filled with the last observed value before it.
```
sampleTs.resample('15min').ffill().head()
2017-09-18 08:00:00    62
2017-09-18 08:15:00    62
2017-09-18 08:30:00    62
2017-09-18 08:45:00    62
2017-09-18 09:00:00    22
Freq: 15T, dtype: int64
```

#Backward Filling
In backward filling, each missing value is filled with the next observed value after it.
```
sampleTs.resample('15min').bfill().head()
2017-09-18 08:00:00    62
2017-09-18 08:15:00    22
2017-09-18 08:30:00    22
2017-09-18 08:45:00    22
2017-09-18 09:00:00    22
Freq: 15T, dtype: int64
```

#Fill with Limitation
When you fill the missing values, you can also limit the number of fills.
```
sampleTs.resample('15min').ffill(limit=2).head()
2017-09-18 08:00:00    40.0
2017-09-18 08:15:00    40.0
2017-09-18 08:30:00    40.0
2017-09-18 08:45:00     NaN
2017-09-18 09:00:00    87.0
Freq: 15T, dtype: float64
```
Notice that only two missing values are filled after each observation in the above example, so the third one stays NaN. The limit can be any positive number.
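
To compare the filling options side by side, here is a small sketch on a made-up two-point hourly series. The names `hourly`, `upsampled`, and `comparison` are illustrative.

```python
import pandas as pd

# Hypothetical hourly series with two observations.
idx = pd.date_range(start="2017-09-18 08:00", periods=2, freq="h")
hourly = pd.Series([62, 22], index=idx)

upsampled = hourly.resample("15min")

comparison = pd.DataFrame({
    "asfreq": upsampled.asfreq(),               # gaps left as NaN
    "ffill": upsampled.ffill(),                 # carry the last observation forward
    "bfill": upsampled.bfill(),                 # pull the next observation backward
    "ffill_limit_2": upsampled.ffill(limit=2),  # fill at most two slots per gap
})
print(comparison)
```

With three empty slots between the two observations, the limited forward fill leaves the 08:45 slot as NaN.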

#Interpolation
Forward or backward filling is a workaround for filling the missing values, and it might not be accurate.

Interpolation instead estimates the missing values from the pattern of the surrounding data, which usually gives more accurate insights from time series data.

You will now learn how to perform interpolation in Python.

#Interpolation Example
In the below example, you will see how to use interpolation to fix the missing values.
```
interEx = sampleTs.resample('15min').asfreq()
interEx.head(10)
2017-09-18 08:00:00    40.0
2017-09-18 08:15:00     NaN
2017-09-18 08:30:00     NaN
2017-09-18 08:45:00     NaN
2017-09-18 09:00:00    87.0
2017-09-18 09:15:00     NaN
2017-09-18 09:30:00     NaN
2017-09-18 09:45:00     NaN
2017-09-18 10:00:00    51.0
2017-09-18 10:15:00     NaN
Freq: 15T, dtype: float64
interEx.interpolate().head(10)
2017-09-18 08:00:00    40.00
2017-09-18 08:15:00    51.75
2017-09-18 08:30:00    63.50
2017-09-18 08:45:00    75.25
2017-09-18 09:00:00    87.00
2017-09-18 09:15:00    78.00
2017-09-18 09:30:00    69.00
2017-09-18 09:45:00    60.00
2017-09-18 10:00:00    51.00
2017-09-18 10:15:00    57.25
Freq: 15T, dtype: float64
```
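
By default, interpolate() is linear and treats the points as equally spaced. The sketch below, on a made-up series with uneven gaps, contrasts that default with method="time", which weights by the actual time distance.

```python
import numpy as np
import pandas as pd

# Hypothetical series with unevenly spaced timestamps and one missing value.
idx = pd.to_datetime(["2017-01-01", "2017-01-02", "2017-01-05"])
s = pd.Series([0.0, np.nan, 4.0], index=idx)

linear = s.interpolate()                   # treats the points as equally spaced
time_aware = s.interpolate(method="time")  # weights by the actual time gaps

print(linear.iloc[1])      # 2.0
print(time_aware.iloc[1])  # 1.0
```

For evenly spaced data, such as the 15-minute series above, both methods give the same result.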

```
import pandas as pd
pd.date_range(start='2017-01-01', end='2017-01-19', freq='B')
# '1T' and '1min' are the same offset, so the two ranges are equal element-wise.
pd.date_range(start="2017-01-01", periods=5, freq='1D1h1T10s') == pd.date_range(start="2017-01-01", periods=5, freq='1D1h1min10s')
pd.date_range(start="2017-01-01", periods=5, freq='W-FRI')

a = pd.date_range(start="2017-01-01", periods=10, freq='BAS-JAN')
print(a)
b = pd.date_range(start="2017-01-01", periods=10, freq='A-FEB')
print(b)
a.union(b)
```


```
import numpy as np
customerArrival = pd.date_range('2017-09-18 08:00', periods=600, freq='T')
custArrivalTs = pd.Series(np.random.randint(0, 100, len(customerArrival)), index=customerArrival)
custArrivalTs.head(10)
custArrivalTs.resample('M').max()
```

```
# freq is left empty here; supply an offset alias (for example 'D') before running.
sampleRng = pd.date_range('2017-01-01', periods=6, freq='')
sampleTs = pd.Series(np.random.randint(0, 100, len(sampleRng)), index=sampleRng)
print(sampleTs)

st = sampleTs.resample('M').mean().to_frame().head()
print(st)
```

```
# closeTS is assumed to be a time series defined earlier in the exercise.
###Start code here
upsample = closeTS.resample('M').max()
print(upsample)###End code(approx 1 line)
upsample.to_csv("output.txt")
```

#List of Time Zones
There are many time zones in the world.

The most widely used standard is Coordinated Universal Time (UTC).

All other time zones are expressed as offsets from UTC. For example, the US Eastern time zone is 4 hours behind UTC during daylight saving time and 5 hours behind for the rest of the year.
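
The offsets mentioned above can be checked with a short sketch. It assumes pandas with time zone data available, and the two dates are arbitrary picks from summer and winter.

```python
import pandas as pd

# The same wall-clock time in summer (daylight saving) and winter (standard time).
summer = pd.Timestamp("2017-07-01 12:00", tz="US/Eastern")
winter = pd.Timestamp("2017-01-01 12:00", tz="US/Eastern")

print(summer.utcoffset())  # -1 day, 20:00:00  (4 hours behind UTC)
print(winter.utcoffset())  # -1 day, 19:00:00  (5 hours behind UTC)
```

Python prints negative offsets as "-1 day" plus a positive remainder, so "-1 day, 20:00:00" means minus 4 hours.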

#Time Zones in Python
To work with time zones, we can use the pytz package in Python.
```
import pytz 
pytz.common_timezones[-5:]
['US/Eastern', 'US/Hawaii', 'US/Mountain', 'US/Pacific', 'UTC']
```
In the example above, we list the last five entries of the common time zones known to pytz.

#Time Zone Object
```
usEastTz = pytz.timezone('US/Eastern')
usEastTz
<DstTzInfo 'US/Eastern' LMT-1 day, 19:04:00 STD>
```
When you execute the above command, you get a time zone object for US/Eastern, not the current time. This object can be passed wherever a time zone is expected.

In the following cards, you will learn how to use timezone extensively.

#Localization
Localization is the first step towards standardizing the time zone. Any specific timestamp is first localized to a given time zone.
You will now learn how to set your datetime index to a specific time zone.
```
import numpy as np
import pandas as pd
timeZoneRng = pd.date_range('9/18/2017 9:30', periods=6, freq='D', tz='UTC')
timeZoneTs = pd.Series(np.random.randn(len(timeZoneRng)), index=timeZoneRng)
timeZoneTs.index.tz
<UTC>
```
In the example above, the date range is localized to UTC using the tz= parameter.

You can also localize using the tz_localize() function.
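
A minimal sketch of tz_localize() on a series that starts without a time zone is shown below. The series and its names (`naive_ts`, `utc_ts`) are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical time-zone-naive daily series.
naive_idx = pd.date_range(start="2017-09-18 09:30", periods=3, freq="D")
naive_ts = pd.Series(np.arange(3), index=naive_idx)

print(naive_ts.index.tz)  # None

# Attach a time zone without shifting the clock times.
utc_ts = naive_ts.tz_localize("UTC")
print(utc_ts.index.tz)    # UTC
print(utc_ts.index[0])    # 2017-09-18 09:30:00+00:00
```

Localizing only labels the existing clock times with a zone; the hour and minute values stay the same.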

#Conversion
In the previous card, you have seen how to localize your date-time value to a particular time zone.

If you want to convert your date-time value to another time zone you can use the tz_convert function.
```
timeZoneTs
2017-09-18 09:30:00+00:00   -1.825521
2017-09-19 09:30:00+00:00    0.961487
2017-09-20 09:30:00+00:00   -0.978146
2017-09-21 09:30:00+00:00    0.960428
2017-09-22 09:30:00+00:00   -0.077467
2017-09-23 09:30:00+00:00   -0.761420
Freq: D, dtype: float64
timeZoneTs.tz_convert('US/Eastern')
2017-09-18 05:30:00-04:00   -1.825521
2017-09-19 05:30:00-04:00    0.961487
2017-09-20 05:30:00-04:00   -0.978146
2017-09-21 05:30:00-04:00    0.960428
2017-09-22 05:30:00-04:00   -0.077467
2017-09-23 05:30:00-04:00   -0.761420
Freq: D, dtype: float64
```

#Using Timestamp
You can create date values and convert them to different time zones and also perform similar operations with time stamp values.

```
sampleTimeStamp =  pd.Timestamp('2011-09-19 04:00')
timeStamp_utc = sampleTimeStamp.tz_localize('UTC')
timeStamp_utc
Timestamp('2011-09-19 04:00:00+0000', tz='UTC')
timeStamp_utc.tz_convert('US/Eastern')
Timestamp('2011-09-19 00:00:00-0400', tz='US/Eastern')
```
The above example shows how to create a sample timestamp using pd.Timestamp, localize it to UTC with tz_localize, and convert it to US/Eastern with tz_convert.


#Daylight Savings
Some time zones observe daylight saving time (DST), whereas some do not.

Offset objects such as Hour from pandas.tseries.offsets respect DST transitions when they are added to a time-zone-aware timestamp.

The example below explains the steps.
```
from pandas.tseries.offsets import Hour

# 30 minutes before the DST transition (clocks move forward on 2012-03-11)
stamp = pd.Timestamp('2012-03-11 01:30', tz='US/Eastern')
stamp
Timestamp('2012-03-11 01:30:00-0500', tz='US/Eastern')
stamp + Hour()
Timestamp('2012-03-11 03:30:00-0400', tz='US/Eastern')

# 90 minutes before the DST transition (clocks move back on 2012-11-04)
stamp = pd.Timestamp('2012-11-04 00:30', tz='US/Eastern')
stamp
Timestamp('2012-11-04 00:30:00-0400', tz='US/Eastern')
stamp + 2 * Hour()
Timestamp('2012-11-04 01:30:00-0500', tz='US/Eastern')
```

#Combining Different Timezones
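When you work in a multinational company, you may receive data from different time zones, and you have to bring it to one standard before working with it.

In the example below, you will learn how to combine series from multiple time zones. When two series with different time zones are combined, the index of the result is in UTC.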
```
dateRng = pd.date_range('9/19/2017 9:30', periods=10, freq='B')
timeSeries =  pd.Series(np.random.randn(len(dateRng)), index=dateRng)
tz1 = timeSeries[:7].tz_localize('Asia/Singapore')
tz2 = tz1[2:].tz_convert('Asia/Seoul')
combine = tz1 + tz2
combine.index
DatetimeIndex(['2017-09-19 01:30:00+00:00', '2017-09-20 01:30:00+00:00',
               '2017-09-21 01:30:00+00:00', '2017-09-22 01:30:00+00:00',
               '2017-09-25 01:30:00+00:00', '2017-09-26 01:30:00+00:00',
               '2017-09-27 01:30:00+00:00'],
              dtype='datetime64[ns, UTC]', freq='B')

```


#Plotting using Python
Plotting is easy with Python. You just have to pass the time series to the plot function.
```
%matplotlib inline 
import pandas as pd
import numpy as np 
sampleRng = pd.date_range(start='2017', periods=120, freq='MS')
sampleTs = pd.Series(np.random.randint(-10, 10, size=len(sampleRng)), sampleRng).cumsum()
sampleTs.head()

sampleTs.plot(c='r', title='Sample time series')
```
The above code creates a sample time series and passes it to the plot function. Aesthetics such as color, grid lines, and scale can also be passed as arguments.

#Lag Plot
The lag plot is a very important and useful visualization for time series data.

A time series is univariate data.

In the lag plot, you plot the actual data against the same data shifted by a time lag. This helps in determining how well the current value predicts future values.
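
The pairs of points that a lag plot draws can be built by hand with shift(). The sketch below uses a tiny made-up series to show the pairs and the matching lag-1 correlation.

```python
import pandas as pd

# Hypothetical short series with a steady upward trend.
s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])

# Pair each value y(t) with the next value y(t+1); these pairs are the points of a lag plot.
pairs = pd.DataFrame({"y(t)": s, "y(t+1)": s.shift(-1)}).dropna()
print(pairs)

# Correlation between the series and itself shifted by one step.
lag1_corr = s.autocorr(lag=1)
print(lag1_corr)
```

Because this series rises by the same amount at every step, the pairs fall on a straight line and the lag-1 correlation is essentially 1.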

#Lag Plot Using Python
The below code explains how to create a sample lag plot in Python. You can use the time series created in the previous example.
```
from pandas.plotting import lag_plot
lag_plot(sampleTs)
```

#Auto Correlation Plot
In the lag plot, we have just seen how the data is scattered when plotted against a single time lag.

The autocorrelation plot goes one step further.

Autocorrelation refers to correlating the data with itself. Here the data is correlated with lagged copies of itself across a range of lags, not just one.

The plot gives a more complete picture of how the data points are correlated among themselves.
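
The values behind such a plot can be computed lag by lag. The sketch below is illustrative only: it uses a made-up sine wave that repeats every 12 steps, so the pattern in the correlations is easy to predict.

```python
import numpy as np
import pandas as pd

# Hypothetical series: a sine wave that repeats every 12 steps.
t = np.arange(120)
wave = pd.Series(np.sin(2 * np.pi * t / 12))

# Correlation of the series with itself at several lags.
for lag in (1, 6, 12):
    print(lag, round(wave.autocorr(lag=lag), 3))
```

At a lag of 12 the wave lines up with itself (correlation near 1), and at a lag of 6 it is exactly out of phase (correlation near -1).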

#Autocorrelation Plot in Python
```
from pandas.plotting import autocorrelation_plot
autocorrelation_plot(sampleTs)
```
The plot drawn above shows the autocorrelation plot for the time series.

When the autocorrelation decays quickly toward zero, roughly exponentially, the time series is likely stationary.

```
# Resample the time series to two-year intervals (mean of each interval) and plot the values.
sampleTs.resample('2A').mean().plot(c='b', ls='--')
# Resample further, to five-year intervals, and plot the values.
sampleTs.resample('5A').mean().plot(c='g', ls='-.')
```

#Stationarity
Stationarity is a very significant property in time series analysis. In a time series, data is collected at different time intervals, and its statistical behavior may or may not stay the same over time.

Many time series models assume that this behavior stays the same, that is, that the data is stationary. If it does not, the model results will not be interpretable. Hence we have to check this property before applying the model.

#What is Stationarity?
In statistical terms, a time series is stationary when its mean, variance, and temporal correlation remain constant over time.

A simpler definition is that there are no seasonal or trend components in the time series.

#How to Check for Stationarity?
Data Visualization: You can look at your data and see if there are any trends or patterns. This is a very crude approach.

Summary Statistics: You can take some summary statistics at different time intervals and see how the data behaves.

Statistical Tests: You can apply certain specific statistical tests and check if your time series supports stationarity property.
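
The summary statistics approach can be sketched as follows. This is a rough illustration, assuming a made-up random walk and a simple split into two halves; the names `walk`, `first`, and `second` are hypothetical.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

# Hypothetical random walk (cumulative sum of noise), which typically drifts over time.
walk = pd.Series(rng.normal(size=200)).cumsum()

half = len(walk) // 2
first, second = walk.iloc[:half], walk.iloc[half:]

# Large differences between the halves hint at non-stationarity.
print("mean:", first.mean(), second.mean())
print("variance:", first.var(), second.var())
```

This is only a quick check. A statistical test gives a more reliable answer.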

#Augmented Dickey - Fuller test
The ADF test is one of the most widely used ways to determine whether time series data is stationary.

This kind of test is known as a unit root test.

The main objective of this test is to identify how strongly the trend component determines the time series.

#Statistics Behind ADF Test
ADF test makes use of an autoregressive model and optimizes its criteria across multiple lag values.

The Null Hypothesis supposes that the time series is non-stationary.

The alternate hypothesis is that the time series is stationary.

#Interpreting the Results
Null Hypothesis: H0 - If it cannot be rejected, the time series data is treated as non-stationary, and it has a unit root.

Alternate Hypothesis: H1 - The null hypothesis is rejected. The time series data is stationary and does not have a unit root.

Results

p-value > 0.05: Fail to reject H0; the data is treated as non-stationary and has a unit root.

p-value <= 0.05: Reject H0; the data is stationary and does not have a unit root.

#Auto Correlation Function
Another way of assessing stationarity is the autocorrelation function (ACF). Here, you find the correlation between data points that are a given number of time steps apart.

When you visualize this correlation, you can get insights on the stationarity of the time series.

If the ACF plot shows an exponential decay, it suggests that the time series is stationary.

#Stationarity Check
In Python, the statsmodels package has a function named adfuller that can be used for the stationarity check.
```
from statsmodels.tsa.stattools import adfuller
```
Once you pass your time series to this function, you will be able to get the results.

The output has the following entities.
```
ADF Statistic: 
p-value: 
	1%: 
	5%: 
	10%:
```
The more negative the ADF statistic is, the more likely the data is stationary.

The ADF statistic should be compared to the critical values at the 1%, 5%, and 10% levels.

If the ADF statistic is less than the critical value at 5% and the p-value is less than 0.05, then we can reject the null hypothesis that the data is non-stationary at the 5% significance level (95% confidence).
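
The decision rule above can be written as a small helper. Note that `interpret_adf` is a hypothetical function written for this illustration, not part of statsmodels, and the numbers passed to it are made up.

```python
def interpret_adf(statistic, p_value, critical_values):
    """Return True when the ADF result supports stationarity at the 5% level.

    `critical_values` is a dict such as {'1%': -3.5, '5%': -2.9, '10%': -2.6}.
    """
    below_critical = statistic < critical_values["5%"]
    return below_critical and p_value < 0.05


# Illustrative numbers only, not the output of a real test run.
critical = {"1%": -3.5, "5%": -2.9, "10%": -2.6}
print(interpret_adf(-4.2, 0.001, critical))  # True
print(interpret_adf(-1.3, 0.62, critical))   # False
```

In practice, the statistic, p-value, and critical values would come from the result returned by adfuller.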

#Sample Data Creation
Let us create a sample random time series.
```
import random 
import pandas as pd
import numpy as np
from statsmodels.tsa.stattools import adfuller
sampleRng = pd.date_range(start='2017', periods=120, freq='MS')
sampleTs = pd.Series(np.random.randint(-10, 10, size=len(sampleRng)), sampleRng).cumsum()

tsResult = adfuller(sampleTs)
print('ADF Statistic: %f' % tsResult[0])
print('p-value: %f' % tsResult[1])
for key, value in tsResult[4].items():
    print('\t%s: %.3f' % (key, value))
Output

ADF Statistic: -1.328310
p-value: 0.616123
	1%: -3.487
	5%: -2.886
	10%: -2.580
```

#Interpreting ADF Test Results
The ADF statistic is -1.328310. It is negative, but it is greater than all three critical values.

The p-value is 0.616123, which is greater than 0.05, so we fail to reject the null hypothesis: the data is treated as non-stationary.

#Components Explained
Trend: This component shows the overall series behavior - the slow change of values over time.

Season: This shows the changes that happen in cycles that are less than one year.

Cycles: Changes that happen for more than a year.

Random: Anything that is not included in the above three components.

The underlying assumption has to be that the time series data is stationary.
#Steps in Time Series Analysis
A few steps to follow while performing time series analysis:

Check for Stationarity.

Decompose the series into its various components.

Analyse the components.

Fit the time series forecasting model and predict future values.
#Time Series Difference
Apart from decomposing the time series, another method you can use during analysis is differencing.

You take the difference between the values at two time periods. The difference can use a shift of one time period or more.

This process also helps in understanding the data better.
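
In pandas, differencing can be sketched with diff(). The series below is made up for illustration, and periods=2 shows a shift of more than one time period.

```python
import pandas as pd

# Hypothetical series with a steady upward trend.
s = pd.Series([10, 12, 15, 19, 24])

first_diff = s.diff()          # y(t) - y(t-1)
lag2_diff = s.diff(periods=2)  # y(t) - y(t-2)

print(first_diff.tolist())  # [nan, 2.0, 3.0, 4.0, 5.0]
print(lag2_diff.tolist())   # [nan, nan, 5.0, 7.0, 9.0]
```

The leading NaN values appear because the first observations have no earlier value to subtract.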
#Decomposing Using Python
In Python, the statsmodels package provides the seasonal_decompose function for time series decomposition.

You can just pass the time series and call the respective decomposing function to get the results.
#Seasonal Decomposition
```
from statsmodels.tsa.seasonal import seasonal_decompose
# period=12: monthly data with a yearly seasonal cycle
sampleTs_decomp = seasonal_decompose(sampleTs, period=12)
sampleTs_trend = sampleTs_decomp.trend
sampleTs_seasonal = sampleTs_decomp.seasonal
sampleTs_residual = sampleTs_decomp.resid
# Here you can get the trend, seasonal, and residuals separately for the time series.
sampleTs_seasonal.plot()
```
#Modeling Time Series
So far, you have seen how to slice and dice the time series data and how to check for stationarity. The next logical step in time series analysis is forecasting. Forecasting in time series can be done in several ways.

You will be learning about the following methodologies:

Autoregressive

Moving Average

Autoregressive Moving Average

Autoregressive Integrated Moving Average

In Autoregression, you use the current value of the variable to predict its future values.

Here, the current and past time stamp values of the time series are used to predict the future values.
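
To make the idea concrete, here is a minimal sketch of a first-order autoregression estimated with plain least squares. It is a simplification for illustration, not how statsmodels fits its models, and the noise-free series is made up so that the answer is known in advance.

```python
import numpy as np

# Hypothetical noise-free AR(1) series: each value is 0.5 times the previous one.
x = 8.0 * 0.5 ** np.arange(10)

previous, current = x[:-1], x[1:]

# Least-squares slope of current on previous (no intercept) estimates the AR coefficient.
phi = np.dot(previous, current) / np.dot(previous, previous)

# One-step-ahead forecast from the last observed value.
forecast = phi * x[-1]

print(phi)       # 0.5
print(forecast)
```

With real, noisy data the estimate would only approximate the true coefficient.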

#Autoregression Using Python
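```
from statsmodels.tsa.arima.model import ARIMA
# ts is the time series to be modeled (a pandas Series indexed by date).
model = ARIMA(ts, order=(1, 1, 0))
predValues = model.fit()
```
The sample code above shows how autoregression is implemented in Python.

The parameter order=(p, d, q) selects the model: p is the number of autoregressive lags, d is the number of times the series is differenced, and q is the number of moving average terms. Here (1, 1, 0) fits an autoregressive model with one lag on the once-differenced series.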

#What is Moving Average?
Moving average is another way to model and predict time series data.

Here the dependent variable is expressed as a function of past forecast errors, which form the moving average component.

The average component keeps moving along the time series.
#Moving Average Using Python
```
model = ARIMA(ts, order=(0, 1, 1)) 

movingAvgRes = model.fit() 
```
#ARIMA
ARIMA stands for Autoregressive Integrated Moving Average.

It combines the autoregressive and moving average models and applies them to a differenced (integrated) series.

It is another model used for forecasting in time series analysis.

#Steps in Time Series
The first step is to visualize the time series.

The second step is to make the data stationary. This can be accomplished by:

 - Detrending

 - Differencing

 - Removing seasonality

The third step is to get the optimal parameters through Auto Correlation and Partial Auto Correlation.

The fourth step is to build the model (AR, MA, ARMA, ARIMA) using the parameters.

The final step is to make predictions.