
In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# What is Time Series Data?

Data where the progression of time is important to consider
> - Ask: is the temporal information a key focus of the data?  

## Examples

- Stock prices
- Temperature over the year
- Atmospheric changes over the course of decades

## Loading in time series

In [None]:
# Load the data and display the first rows
df_temp = pd.read_csv("min_temp.csv")
display(df_temp.head(20))
# info() prints its summary itself and returns None, so it needs no display()
df_temp.info()

## Make data readable as a datetime

In [None]:
# Creating a proper datetime using the string formatting
df_temp['Date'] = pd.to_datetime(df_temp['Date'], format='%d/%m/%y')

# Make the temporal data as the focus
df_temp = df_temp.set_index('Date')

In [None]:
display(df_temp.head(10))
df_temp.info()

## Slicing time series data

In [None]:
# Partial-string slicing on the DatetimeIndex: every row from 1990 onward
after_1990 = df_temp.loc['1990':]
display(after_1990.head(15))

In [None]:
df_temp

## Follow-up: Why should we make the date as the index?
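
One way to answer this, as a minimal sketch on a small made-up DataFrame (`df_demo` and its values are hypothetical, not the `min_temp.csv` data): with the dates as a `DatetimeIndex`, pandas can slice by partial date strings and group by calendar units directly, with no boolean masks or string parsing.

```python
import pandas as pd

# Hypothetical daily temperatures spanning a year boundary
dates = pd.date_range("1989-12-30", periods=5, freq="D")
df_demo = pd.DataFrame({"Temp": [14.0, 13.5, 15.2, 16.1, 15.8]}, index=dates)

# Partial-string slicing: every row from 1990 onward
from_1990 = df_demo.loc["1990":]

# Select one calendar month by its label
dec_1989 = df_demo.loc["1989-12"]

# Time-aware grouping: mean temperature per calendar year
yearly_mean = df_demo.groupby(df_demo.index.year)["Temp"].mean()

print(from_1990)
print(dec_1989)
print(yearly_mean)
```

The same index is also what makes `resample` in the next section work without extra arguments.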

# Resampling

Converting the time series into a particular frequency

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.resample.html
https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#resampling

## Downsampling

- resample at a lower rate (e.g., days to months)
- may lose information
- more computationally efficient (fewer points to process)

### Example

In [None]:
# Average out so we have monthly means (compared to using days)
monthly = df_temp.resample('MS')
month_mean = monthly.mean()

In [None]:
# Only the last expression in a cell is shown, so display both explicitly
display(month_mean.head(10))
display(df_temp.head(10))

## Upsampling

- resample at a higher rate (e.g., days to hours)
- keeps the original information, but the new points have to be filled in

### Example

In [None]:
# Upsample to every 12 hours; only the known points are filled (NaN otherwise)
bidaily = df_temp.resample('12H').asfreq()
bidaily.head(10)

In [None]:
# Upsample to every 12 hours and forward-fill the unknown points (no NaNs)
bidaily = df_temp.resample('12H').ffill()
bidaily.head(10)

In [None]:
hourly = df_temp.resample('1H').ffill()
hourly.head(30)
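
Forward filling repeats the last known value, which produces a step-like series. As a minimal sketch on a made-up three-day series (`s_demo` is hypothetical), linear interpolation is another way to fill the new points. The frequency is passed as a `pd.Timedelta` here only to sidestep differences in frequency-string aliases between pandas versions.

```python
import pandas as pd

# Hypothetical daily series with three known values
s_demo = pd.Series(
    [10.0, 14.0, 12.0],
    index=pd.date_range("2015-01-01", periods=3, freq="D"),
)

twelve_hours = pd.Timedelta(hours=12)

# Forward fill: each new point copies the last known value
ffilled = s_demo.resample(twelve_hours).ffill()

# Linear interpolation: each new point lies on the line between its neighbours
interpolated = s_demo.resample(twelve_hours).interpolate(method="linear")

print(pd.DataFrame({"ffill": ffilled, "interpolate": interpolated}))
```

Neither method adds real information; both are assumptions about what happened between observations.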


In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Visualizing Time Series

## Showing Changes Over Time

Can identify patterns and trends with visualizations

In [None]:
# New York Stock Exchange average monthly returns [1961-1966] from curriculum
nyse = pd.read_csv("NYSE_monthly.csv")
col_name = 'Month'
nyse[col_name] = pd.to_datetime(nyse[col_name])
nyse.set_index(col_name, inplace=True)

display(nyse.head(10))
nyse.info()

### Line Plot

In [None]:
nyse.plot(figsize = (16,6))
plt.show()

### Dot Plot

In [None]:
nyse.plot(figsize = (16,6), style="*")
plt.show()

### Question time: Dot vs Line Plots

Note the difference between this and the line plot

When would you want a dot vs a line plot?

### Grouping Plots

What if we want to compare year to year (e.g., temperature across many years)?

There are a couple of options to choose from

### Example all separated annual (from curriculum)

In [None]:
# Annual Frequency
year_groups = nyse.groupby(pd.Grouper(freq ='A'))

In [None]:
#Create a new DataFrame and store yearly values in columns 
nyse_annual = pd.DataFrame()

for yr, group in year_groups:
    nyse_annual[yr.year] = group.values.ravel()
    
# Plot the yearly groups as subplots
nyse_annual.plot(figsize = (13,8), subplots=True, legend=True)
plt.show()

### Example all together annual (from curriculum)

In [None]:
# Plot overlapping yearly groups 
nyse_annual.plot(figsize = (15,5), subplots=False, legend=True)
plt.show()

## Showing Distributions

Sometimes the distribution of the values is important.

What are some reasons?

- Checking for normality (for stat testing)
- First check on raw & transformed data

### Histogram

In [None]:
nyse.hist(figsize = (10,6))
plt.show()

In [None]:
# Use fewer bins to make it more obvious whether the data looks normal
nyse.hist(figsize = (10,6), bins = 7)
plt.show()

### Density

In [None]:
nyse.plot(kind='kde', figsize = (15,10))
plt.show()

### Box Plot

- Shows distribution over time
- Can help show outliers
- Seasonal trends

#### Example

In [None]:
# Generate a box and whiskers plot for the nyse_annual DataFrame
nyse_annual.boxplot(figsize = (12,7))
plt.show()

### Heat Maps

Use color to show patterns throughout a time period for data

#### Example of how heat maps are useful

In [None]:
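# Load the daily minimum temperatures (same file and date format as the first notebook)
df_temp = pd.read_csv("min_temp.csv")
df_temp['Date'] = pd.to_datetime(df_temp['Date'], format='%d/%m/%y')
df_temp = df_temp.set_index('Date')

# Create a new DataFrame and store yearly values in columns for temperature
temp_annual = pd.DataFrame()

for yr, group in df_temp.groupby(pd.Grouper(freq ='A')):
    temp_annual[yr.year] = group.values.ravel()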

##### Plotting each line plot in a subplot

Let's use our strategy in plotting multiple line plots to see if we can see a pattern:

In [None]:
# Plot the yearly groups as subplots
temp_annual.plot(figsize = (16,8), subplots=True, legend=True)
plt.show()

You will likely have a hard time seeing exactly what the temperature shift is throughout the year (if it even exists!)

We can try plotting all the lines together to see if a pattern is more obvious in our visual.

##### Plotting all line plots in one plot

In [None]:
# Plot overlapping yearly groups 
temp_annual.plot(figsize = (15,5), subplots=False, legend=True)
plt.show()

That's great: we can see that the temperature decreases in the middle of the year! But now we have sacrificed the ability to observe a pattern for any individual year.

This is where using a heat map can help visualize patterns throughout the year for temperature! And of course, the heat map can be used for more than just temperature related data.

##### And finally, using a heat map to visualize a pattern

In [None]:
# One row per year, one column per day of the year
year_matrix = temp_annual.T
plt.matshow(year_matrix, interpolation=None, aspect='auto', cmap=plt.cm.Spectral_r)
plt.show()

☝🏼 Look at that beautiful visual pattern! Makes me want to weep with joy for all the information density available to us!


In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

In [None]:
# New York Stock Exchange average monthly returns [1961-1966] from curriculum
nyse = pd.read_csv("NYSE_monthly.csv")
col_name= 'Month'
nyse[col_name] = pd.to_datetime(nyse[col_name])
nyse.set_index(col_name, inplace=True)

In [None]:
# Generated data 
years = pd.date_range('2012-01', periods=72, freq="M")
index = pd.DatetimeIndex(years)

np.random.seed(3456)
sales= np.random.randint(-4, high=4, size=72)
bigger = np.array([0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,3,3,3,3,
                   3,3,3,3,3,3,3,3,7,7,7,7,7,7,7,7,7,7,7,
                   11,11,11,11,11,11,11,11,11,11,18,18,18,
                   18,18,18,18,18,18,26,26,26,26,26,36,36,36,36,36])

data = pd.Series(sales+bigger+6, index=index)
ts = data

# Types of Trends

## Stationary

### Definition:
> images from [https://www.analyticsvidhya.com/blog/2015/12/complete-tutorial-time-series-modeling/](https://www.analyticsvidhya.com/blog/2015/12/complete-tutorial-time-series-modeling/)

- The series' **mean** is **not** a function of time
![https://www.analyticsvidhya.com/wp-content/uploads/2015/02/Mean_nonstationary.png](images/Mean_nonstationary.png)

- The series' **variance** is **not** a function of time (homoscedasticity)
![https://www.analyticsvidhya.com/wp-content/uploads/2015/02/Var_nonstationary.png](images/Var_nonstationary.png)

- The series' **covariance** between terms a fixed lag apart is **not** a function of time
![https://www.analyticsvidhya.com/wp-content/uploads/2015/02/Cov_nonstationary.png](images/Cov_nonstationary.png)

### No Trend

In [None]:
data = nyse
data.plot(figsize=(12,6), linewidth=2, fontsize=14)
plt.xlabel(col_name, fontsize=20)
plt.ylabel("Monthly NYSE returns", fontsize=20)
plt.ylim((-0.15,0.15));

## Linear Trend

### Upward

![](https://github.com/learn-co-students/dsc-3-25-05-types-of-trends-online-ds-sp-000/raw/master/index_files/index_15_0.png)

### Downward

![](https://github.com/learn-co-students/dsc-3-25-05-types-of-trends-online-ds-sp-000/raw/master/index_files/index_19_0.png)

## Exponential

![](https://github.com/learn-co-students/dsc-3-25-05-types-of-trends-online-ds-sp-000/raw/master/index_files/index_22_0.png)

## Periodic

![](https://github.com/learn-co-students/dsc-3-25-05-types-of-trends-online-ds-sp-000/raw/master/index_files/index_25_0.png)

![](https://github.com/learn-co-students/dsc-3-25-05-types-of-trends-online-ds-sp-000/raw/master/index_files/index_30_0.png)

# Assessing Trends 

In [None]:
# Plot the generated monthly sales series (ts); data now points at the NYSE returns
fig = plt.figure(figsize=(12,6))
plt.plot(ts)
plt.xlabel("month", fontsize=16)
plt.ylabel("monthly sales", fontsize=16)
plt.show()

## Rolling Statistics

Take a statistic (e.g., the mean or standard deviation) over a window of past data points; if it changes over time, the series is not stationary

### Example

In [None]:
# Rolling statistics over a 6-month window (a window of 1 would give only NaN for the std)
rolmean = ts.rolling(window = 6, center = False).mean()
rolstd = ts.rolling(window = 6, center = False).std()

fig = plt.figure(figsize=(12,7))
orig = plt.plot(ts, color='blue',label='Original')
mean = plt.plot(rolmean, color='red', label='Rolling Mean')
std = plt.plot(rolstd, color='black', label = 'Rolling Std')
plt.legend(loc='best')
plt.title('Rolling Mean & Standard Deviation')
plt.show(block=False)

## Dickey-Fuller Test

Statistical test for stationarity; $H_0$ is that the time series is **not** stationary, so a small p-value is evidence that the series is stationary

Doc Resource: http://www.statsmodels.org/dev/generated/statsmodels.tsa.stattools.adfuller.html

### Code Example

In [None]:
from statsmodels.tsa.stattools import adfuller

dftest = adfuller(ts)

# Extract and display test results in a user friendly manner
dfoutput = pd.Series(dftest[0:4], index=['Test Statistic','p-value','#Lags Used','Number of Observations Used'])
for key,value in dftest[4].items():
    dfoutput['Critical Value (%s)'%key] = value
print(dftest)

In [None]:
print(dfoutput)

# Removing Trends

The goal is to make the time series stationary, since most modeling techniques assume stationarity

The most common reasons a series is non-stationary are **seasonality** and **trends**

## Quick Check: Why do we want to get a stationary series?

- In a way, we want to get rid of the temporal dependence and leave just the "noise"
- That "noise" can then be modeled based on other features!
- Think of "stationary" like "independence"

## Log Transformation

Penalizes higher values more (similar alternatives: square & cube roots)

Useful when the series shows:

- A clear & significant positive trend that may not be linear
- Some heteroscedasticity (variance that changes over time)

In [None]:
fig = plt.figure(figsize=(12,10))


# No transformation
plt.subplot(3, 1, 1)
plt.plot(ts)
plt.xlabel("month", fontsize=16)
plt.ylabel("monthly sales", fontsize=16)

# Log transformation (aims to make the trend more linear and reduce heteroscedasticity)
plt.subplot(3, 1, 2)
plt.plot(pd.Series(np.log(ts), index=index), color="blue")
plt.xlabel("month", fontsize=14)
plt.ylabel("log(monthly sales)", fontsize=14)

# Square root transformation 
plt.subplot(3, 1, 3)
plt.plot(pd.Series(np.sqrt(ts), index=index), color="green")
plt.xlabel("month", fontsize=14)
plt.ylabel("sqrt(monthly sales)", fontsize=14)


plt.show()

The goal is to make the trend more linear; you can tell the series is still not stationary

## Subtract the Rolling Mean

In [None]:
# Start with the square root transform
data_transform = pd.Series(np.sqrt(ts))

rolmean = data_transform.rolling(window = 4).mean()
fig = plt.figure(figsize=(11,7))
orig = plt.plot(data_transform, color='blue',label='Original')
mean = plt.plot(rolmean, color='red', label='Rolling Mean')
plt.legend(loc='best')
plt.title('Square Root Transform & Rolling Mean')
plt.show(block=False)

In [None]:
# Subtract the moving average from the transformed data and check the head for NaNs
data_minus_rolmean = data_transform - rolmean
data_minus_rolmean.head(10)

In [None]:
# Drop the NaN values from timeseries calculated above 
# (the first few values didn't have a rolling mean)
data_minus_rolmean.dropna(inplace=True)

In [None]:
fig = plt.figure(figsize=(11,7))
plt.plot(data_minus_rolmean, label='sqrt(sales) - rolling mean')
plt.legend(loc='best')
plt.title('Square root of sales with the rolling mean subtracted')
plt.show(block=False)

### Weighted Rolling Mean (Weighted Moving Average)

The weights within the window can be adjusted for more complex situations (e.g., stock prices), so that recent points count more than older ones

Popular one: **Exponentially Weighted Moving Average**
> weights are given using an exponential decay factor
> 
> `DataFrame.ewm()` (https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.ewm.html)
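
As a minimal sketch of the idea (the series `s_demo` and the choice `alpha=0.5` are made up for illustration), the exponentially weighted mean can be compared with a plain rolling mean. With `adjust=False`, pandas uses the recursive form $y_t = \alpha x_t + (1 - \alpha) y_{t-1}$.

```python
import pandas as pd

# Hypothetical series with a spike at position 3
s_demo = pd.Series([1.0, 2.0, 3.0, 10.0, 3.0, 2.0])

# Exponentially weighted mean: recent points get larger weights
ewma = s_demo.ewm(alpha=0.5, adjust=False).mean()

# Plain rolling mean: equal weights, NaN until the window is full
simple = s_demo.rolling(window=3).mean()

# Subtracting the weighted mean works like subtracting the rolling mean
detrended = s_demo - ewma

print(pd.DataFrame({"ewma": ewma, "rolling": simple, "detrended": detrended}))
```

Unlike the rolling mean, the exponentially weighted mean has a value from the first point on, so no rows need to be dropped after the subtraction.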

## Differencing

Subtract the previous point's value from each point (lagging); the result is the change from one point to the next

Doc: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.diff.html
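
A small sketch on made-up series (both `linear` and `quadratic` are hypothetical) shows why differencing removes trends: one difference turns a linear trend into a constant, and two differences do the same for a quadratic trend.

```python
import numpy as np
import pandas as pd

t = np.arange(10)

# Linear trend: y = 3 + 2t
linear = pd.Series(3.0 + 2.0 * t)
first_diff = linear.diff(periods=1)        # same as linear - linear.shift(1)

# Quadratic trend: y = t^2 needs two rounds of differencing
quadratic = pd.Series(t ** 2.0)
second_diff = quadratic.diff().diff()

print(first_diff.tolist())    # NaN first, then a constant
print(second_diff.tolist())   # two NaNs first, then a constant
```

Each round of differencing costs one point at the start of the series, which shows up as `NaN`.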

### Example - Sales

In [None]:
data_diff = ts.diff(periods=1)

fig = plt.figure(figsize=(11,7))
plt.plot(data_diff,label='Sales - differenced')
plt.legend()
plt.title('Differenced sales series')
plt.show(block=False)

### Example - Temperature over a decade

In [None]:
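data = pd.read_csv("min_temp.csv")
# Dates are day-first, so give the format explicitly (as when the file was first loaded)
data.Date = pd.to_datetime(data.Date, format='%d/%m/%y')
data.set_index('Date', inplace=True)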

data.plot(figsize=(18,6), linewidth=1, fontsize=14)
plt.xlabel('Date', fontsize=14)
plt.ylabel('Temperature (Degrees Celsius)', fontsize=14);

# Difference each point against the point 365 rows (one year of daily data) earlier
data_diff = data.diff(periods=365)
data_diff.plot(figsize=(18,6), linewidth=1, fontsize=14)
plt.xlabel('Date', fontsize=14)
plt.ylabel('Differenced Temperature (Degrees Celsius)', fontsize=14);

# Time Series Decomposition

Another method, where we split a series into multiple component series

Commonly:
- Seasonal
- Trend
- Noise/Random/Irregular/Remainder

Identify whether the model is **additive** or **multiplicative**

> Usually multiplicative when the magnitude of the fluctuations changes with the level of the series

![](images/add-vs-mutli.png)
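
As a hedged sketch with made-up components (`trend` and `seasonal` below are hypothetical, not taken from any dataset here): in a multiplicative series the seasonal swing grows with the trend, and taking the log turns the product into a sum, which is why a log transform is often applied before an additive decomposition.

```python
import numpy as np

t = np.arange(48)                                  # four years of monthly points
trend = 100 + 2.0 * t                              # hypothetical upward trend
seasonal = 1 + 0.2 * np.sin(2 * np.pi * t / 12)    # hypothetical yearly cycle

# Multiplicative model: the components are multiplied together
multiplicative = trend * seasonal

# The peak-to-trough swing is larger in the last year than in the first
swing_first_year = np.ptp(multiplicative[:12])
swing_last_year = np.ptp(multiplicative[-12:])

# log(trend * seasonal) = log(trend) + log(seasonal): additive on the log scale
log_series = np.log(multiplicative)
log_sum = np.log(trend) + np.log(seasonal)

print(swing_first_year, swing_last_year)
print(np.allclose(log_series, log_sum))
```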

## Example

In [None]:
# Import passengers.csv and set it as a time-series object. Plot the TS
data = pd.read_csv('passengers.csv')
ts = data.set_index('Month')
ts.index = pd.to_datetime(ts.index)

ts.plot(figsize=(10,4), color="blue")

In [None]:
# log transform to get rid of that heteroscedasticity
np.log(ts).plot(figsize=(10,4), color="blue");

In [None]:
# import seasonal_decompose
from statsmodels.tsa.seasonal import seasonal_decompose
decomposition = seasonal_decompose(np.log(ts))

# Gather the trend, seasonality and noise of decomposed object
trend = decomposition.trend
seasonal = decomposition.seasonal
residual = decomposition.resid

In [None]:
# Plot gathered statistics
plt.figure(figsize=(12,8))
plt.subplot(411)
plt.plot(np.log(ts), label='Original', color="blue")
plt.legend(loc='best')
plt.subplot(412)
plt.plot(trend, label='Trend', color="blue")
plt.legend(loc='best')
plt.subplot(413)
plt.plot(seasonal,label='Seasonality', color="blue")
plt.legend(loc='best')
plt.subplot(414)
plt.plot(residual, label='Residuals', color="blue")
plt.legend(loc='best')
plt.tight_layout()

The residuals look stationary now; we can check with a stationarity test such as the Dickey-Fuller test

In [None]:
# Drop NaN values from residuals.
ts_log_decompose = residual
ts_log_decompose = ts_log_decompose.dropna()

In [None]:
# Import adfuller
from statsmodels.tsa.stattools import adfuller

# Squeeze to a 1-D Series: depending on the statsmodels version, the residuals
# may come back as a Series or as a one-column DataFrame
dftest = adfuller(ts_log_decompose.squeeze())

In [None]:
# Print Dickey-Fuller test results
print('Results of Dickey-Fuller Test:')

dfoutput = pd.Series(dftest[0:4], index=['Test Statistic','p-value','#Lags Used','Number of Observations Used'])
for key,value in dftest[4].items():
    dfoutput['Critical Value (%s)'%key] = value
print(dfoutput)


In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

np.random.seed(13)

# White Noise Model

https://machinelearningmastery.com/white-noise-time-series-python/

Stationary series

## Properties

- Fixed/constant mean
- Fixed/constant variance
- No correlation over time (pattern is random)

Note that in a *Gaussian white noise* model the values are normally distributed, usually with a mean of 0 (and a variance of 1 in the standard case)
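
The three properties can be checked empirically. The sketch below is only an illustration on simulated data; the seed, sample size, and the choice to compare two halves of the series are assumptions made for the example.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Simulated white noise with mean 10 and standard deviation 3
noise = pd.Series(rng.normal(loc=10, scale=3, size=5000))

# Constant mean and variance: the two halves should look alike
first_half, second_half = noise.iloc[:2500], noise.iloc[2500:]
mean_gap = abs(first_half.mean() - second_half.mean())
std_gap = abs(first_half.std() - second_half.std())

# No correlation over time: the lag-1 autocorrelation should be near 0
lag1_autocorr = noise.autocorr(lag=1)

print(f"mean gap: {mean_gap:.3f}, std gap: {std_gap:.3f}")
print(f"lag-1 autocorrelation: {lag1_autocorr:.3f}")
```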

## Example

In [None]:
# Create a date series
n_days = 100
date_series = pd.date_range(start='1/1/2015', periods=n_days)

# Create normally distributed temperature values for each day
avg_temp = 10
std_temp = 3

temp_series = np.random.normal(avg_temp, std_temp, n_days)

In [None]:
time_series = pd.Series(data=temp_series, index=date_series)


ax = time_series.plot(figsize=(14,6))
ax.set_ylabel("Temperature (C)")
ax.set_xlabel("Date")
plt.show()

# Random Walk Model

https://machinelearningmastery.com/gentle-introduction-random-walk-times-series-forecasting-python/

## Properties

The previous value influences the current value

- No specified mean
- No specified variance (it grows with time)
- Strong dependence over time

Formula: $Y(t) = Y(t-1) + \epsilon(t)$

$\epsilon(t)$ is a white noise model with mean=0
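
Rearranging the formula gives $Y(t) - Y(t-1) = \epsilon(t)$, so differencing a random walk should return the white noise that generated it. A minimal sketch on simulated data (seed and sizes are arbitrary choices for the example):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# White noise steps and the random walk built from them
eps = rng.normal(loc=0, scale=10, size=1000)
walk = pd.Series(np.cumsum(eps))

# First difference of the walk: Y(t) - Y(t-1)
steps = walk.diff().dropna()

# The walk depends strongly on its own past; the differenced series does not
walk_autocorr = walk.autocorr(lag=1)
steps_autocorr = steps.autocorr(lag=1)

print(np.allclose(steps.to_numpy(), eps[1:]))
print(f"walk lag-1: {walk_autocorr:.3f}, steps lag-1: {steps_autocorr:.3f}")
```

This is the reason differencing is a common first step when a series looks like a random walk.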

## Example

In [None]:
avg = 0
std = 10
n_pts = 1500

# Dates & white noise (epsilon)
date_vals = pd.date_range(start='1/1/2015', periods=n_pts)
epsilon = np.random.normal(avg,std,n_pts)

# Generate data starting at y0 & "walk" based on epsilon (white noise model)
y0 = 0
vals = y0 + np.cumsum(epsilon) 
time_series =  pd.Series(vals, index=date_vals)

# Plot out the model
ax = time_series.plot(figsize=(14,6))
ax.set_ylabel("Value")
ax.set_xlabel("Date")
plt.show()

## Variation: Random Walk w/ a drift

The walk "drifts" by a constant value $c$ at every step

Formula: $Y(t) = c + Y(t-1) + \epsilon(t)$

In [None]:
# Same values from above but have a constant "drift" in the epsilon
c = 0.5
vals_drift = y0 + np.cumsum(c + epsilon) 
time_series_drift =  pd.Series(vals_drift, index=date_vals)

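# Plot both walks on the same axes, labeled so they can be told apart
ax = time_series.plot(figsize=(14,6), label="Random walk")
time_series_drift.plot(ax=ax, label="Random walk with drift")
ax.set_ylabel("Value")
ax.set_xlabel("Date")
ax.legend()
plt.show()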
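
Because the drift is added at every step, it accumulates linearly: after $t$ steps the drifting walk sits exactly $c \cdot t$ above the walk without drift. A short sketch with its own simulated noise (the seed and sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
n_steps = 500
c = 0.5

# The same noise drives both walks
eps = rng.normal(loc=0, scale=10, size=n_steps)
walk = np.cumsum(eps)
walk_drift = np.cumsum(c + eps)

# The gap between the two walks is c, 2c, 3c, ...
gap = walk_drift - walk
expected_gap = c * np.arange(1, n_steps + 1)

print(np.allclose(gap, expected_gap))
print(gap[-1])   # total drift after all steps
```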


In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

np.random.seed(13)

# Autoregressive (AR) Model

## Properties

Formula: $Y(t) = \mu + \phi * Y(t-1)+\epsilon(t)$

> $\phi = 0$: simply the white noise model (mean of $\mu$)
>
> $\phi \lt 0$: oscillates
>
> $\phi \gt 0$: each value is positively correlated with the previous one (**autocorrelated**)
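
For $|\phi| < 1$ the AR(1) process is stationary, with a long-run mean of $\mu / (1 - \phi)$ and a lag-1 autocorrelation of $\phi$. The sketch below checks both on simulated data; `simulate_ar1` is a hypothetical helper written for this example, not a library function.

```python
import numpy as np
import pandas as pd

def simulate_ar1(mu, phi, sigma, n, seed=0):
    """Simulate Y(t) = mu + phi * Y(t-1) + eps(t) with Gaussian noise."""
    rng = np.random.default_rng(seed)
    eps = rng.normal(0, sigma, n)
    values = np.empty(n)
    prev = mu / (1 - phi)          # start at the long-run mean
    for t in range(n):
        prev = mu + phi * prev + eps[t]
        values[t] = prev
    return pd.Series(values)

mu, phi = 7, 0.5
series = simulate_ar1(mu=mu, phi=phi, sigma=4, n=20000)

long_run_mean = mu / (1 - phi)     # 14 for these parameters
print(f"sample mean: {series.mean():.2f} (theory: {long_run_mean:.2f})")
print(f"lag-1 autocorrelation: {series.autocorr(lag=1):.2f} (theory: {phi})")
```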

## Example

In [None]:
avg = 0
std = 4
n_pts = 100


mu = 7
phi = 0.1
# phi = 0.5
# phi = 0.9
# phi = -0.1
# phi = -0.5
# phi = -0.9

# Dates & white noise (epsilon)
date_vals = pd.date_range(start='1/1/2015', periods=n_pts)
epsilon = np.random.normal(avg,std,n_pts)

# Generate the series recursively: each value depends on the previous one
vals = []
y = 0  # starting value
for e in epsilon:
    y = y * phi  + e + mu
    vals.append(y)
    
    
time_series =  pd.Series(vals, index=date_vals)

# Plot out the model
ax = time_series.plot(figsize=(14,6))
ax.set_ylabel("Value")
ax.set_xlabel("Date")
plt.show()

In [None]:
a = [1]*5
a = a * np.linspace(1,2,5)
a

# Moving Average (MA) Model

Formula: $Y(t) = \mu + \theta * \epsilon(t-1)+\epsilon(t)$

> $\theta = 0$: simply the white noise model (mean of $\mu$)
>
> $\theta \lt 0$: oscillates
>
> $\theta \gt 0$: each value is positively correlated with the previous one, through the shared error term (**autocorrelated**)
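
In an MA(1) model, neighbouring values share one error term, so the lag-1 autocorrelation is $\theta / (1 + \theta^2)$, while values two or more steps apart share nothing and are uncorrelated. The sketch below checks this on simulated data; `simulate_ma1` is a hypothetical helper written for this example.

```python
import numpy as np
import pandas as pd

def simulate_ma1(mu, theta, sigma, n, seed=0):
    """Simulate Y(t) = mu + theta * eps(t-1) + eps(t) with Gaussian noise."""
    rng = np.random.default_rng(seed)
    eps = rng.normal(0, sigma, n + 1)      # one extra value for the first lag
    return pd.Series(mu + eps[1:] + theta * eps[:-1])

theta = -0.5
series = simulate_ma1(mu=7, theta=theta, sigma=4, n=20000)

expected_lag1 = theta / (1 + theta ** 2)   # -0.4 for theta = -0.5
print(f"lag-1: {series.autocorr(lag=1):.3f} (theory: {expected_lag1:.3f})")
print(f"lag-2: {series.autocorr(lag=2):.3f} (theory: 0)")
```

This sharp cut-off after lag 1 is what separates an MA(1) series from an AR(1) series, whose autocorrelation fades gradually.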

## Example

In [None]:
avg = 0
std = 4
n_pts = 100


mu = 7
# theta = 0.1
# theta = 0.5
# theta = 0.9
# theta = -0.1
theta = -0.5
# theta = -0.9

# Dates & white noise (epsilon)
date_vals = pd.date_range(start='1/1/2015', periods=n_pts)
epsilon = np.random.normal(avg,std,n_pts+1)

# Build each value from the current and the previous error term
vals = []
for i in range(len(epsilon)-1):
    y = epsilon[i] * theta  + epsilon[i+1] + mu
    vals.append(y)
    
    
time_series =  pd.Series(vals, index=date_vals)

# Plot out the model
ax = time_series.plot(figsize=(14,6))
ax.set_ylabel("Value")
ax.set_xlabel("Date")
plt.show()

# ARMA Model

Combines the two: the current value can depend on both past values (AR) and past errors (MA)

Formula: $Y(t) = \mu + \epsilon(t) + \phi * Y(t-1) +  \theta * \epsilon(t-1)$


## Higher Order

ARMA(2,1) yields

$$Y(t) = \mu + \epsilon(t) + \phi_2 * Y(t-2) + \phi_1 * Y(t-1) +  \theta * \epsilon(t-1)$$
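
The ARMA(2,1) formula can be simulated directly with a loop. This is a minimal sketch: `simulate_arma21` is a hypothetical helper, and the parameter values are arbitrary choices that keep the process stationary. For a stationary ARMA(2,1), the long-run mean is $\mu / (1 - \phi_1 - \phi_2)$.

```python
import numpy as np
import pandas as pd

def simulate_arma21(mu, phi1, phi2, theta, sigma, n, burn_in=200, seed=0):
    """Simulate Y(t) = mu + eps(t) + phi2*Y(t-2) + phi1*Y(t-1) + theta*eps(t-1)."""
    rng = np.random.default_rng(seed)
    total = n + burn_in
    eps = rng.normal(0, sigma, total)
    y = np.zeros(total)
    for t in range(2, total):
        y[t] = mu + eps[t] + phi2 * y[t - 2] + phi1 * y[t - 1] + theta * eps[t - 1]
    # Drop the burn-in so the arbitrary zero start does not affect the result
    return pd.Series(y[burn_in:])

mu, phi1, phi2, theta = 7, 0.5, 0.2, 0.3
series = simulate_arma21(mu, phi1, phi2, theta, sigma=4, n=5000)

long_run_mean = mu / (1 - phi1 - phi2)
print(f"sample mean: {series.mean():.2f} (theory: {long_run_mean:.2f})")
```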