# Time Series Analysis Exercises

## Chapter 2: Taxing Exercise - Compute the ACF
In this exercise, you will compute the array of autocorrelations for the H&R Block quarterly earnings, which are pre-loaded in the DataFrame HRB. Then, plot the autocorrelation function using the plot_acf function. This plot shows what the autocorrelation function looks like for cyclical earnings data. The ACF at lag=0 is always one, since any series is perfectly correlated with itself. In the next exercise, you will learn about the confidence interval for the ACF, but for now, suppress the confidence interval by setting alpha=1.


In [None]:
from statsmodels.tsa.stattools import acf
from statsmodels.graphics.tsaplots import plot_acf
import matplotlib.pyplot as plt

# Compute the acf array of HRB
acf_array = acf(HRB)
print(acf_array)

# Plot the acf function
plot_acf(HRB, alpha=1)
plt.show()
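
To make the numbers in the ACF array less opaque, here is a minimal sketch of how a sample autocorrelation can be computed by hand with NumPy. The helper `sample_acf` and the synthetic `earnings` series are hypothetical stand-ins (the real HRB data is not reproduced here), and the estimator is the common one that divides every lagged sum by the lag-0 sum.

```python
import numpy as np

def sample_acf(x, nlags):
    """Sample autocorrelations for lags 0..nlags (lagged sums over the lag-0 sum)."""
    x = np.asarray(x, dtype=float)
    xc = x - x.mean()
    denom = np.sum(xc ** 2)
    return np.array([np.sum(xc[k:] * xc[:len(xc) - k]) / denom
                     for k in range(nlags + 1)])

# Hypothetical quarterly series: a spike every fourth quarter plus noise
rng = np.random.default_rng(1)
pattern = np.tile([10.0, 1.0, 1.0, 1.0], 25)   # 100 quarters
earnings = pattern + rng.normal(scale=1.0, size=pattern.size)

acf_values = sample_acf(earnings, nlags=8)
print(np.round(acf_values, 2))
```

The first entry is one by construction, and for a series with a four-quarter cycle the values at lags 4 and 8 stand out above their neighbors.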

## Are We Confident This Stock is Mean Reverting?
In the last chapter, you saw that the autocorrelation of MSFT's weekly stock returns was -0.16. That autocorrelation seems large, but is it statistically significant? In other words, can you say that there is less than a 5% chance that we would observe such a large negative autocorrelation if the true autocorrelation were really zero? And are there any autocorrelations at other lags that are significantly different from zero?

Even if the true autocorrelations were zero at all lags, in a finite sample of returns the estimated autocorrelations will not be exactly zero. In fact, the standard deviation of the sample autocorrelation is $1/\sqrt{N}$, where $N$ is the number of observations, so if $N=100$, for example, the standard deviation of the ACF is 0.1, and since 95% of a normal curve is between +1.96 and -1.96 standard deviations from the mean, the 95% confidence interval is $\pm 1.96/\sqrt{N}$. This approximation only holds when the true autocorrelations are all zero.
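
The approximation can be checked by simulation. The sketch below is an illustration only: the sample size, number of trials, and seed are arbitrary choices, and the series are Gaussian white noise so that the true autocorrelation is zero at every lag.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed so the sketch is repeatable
n_obs, n_trials = 100, 2000      # hypothetical sizes, chosen for illustration

# Each row is an independent white noise series
x = rng.normal(size=(n_trials, n_obs))
xc = x - x.mean(axis=1, keepdims=True)

# Lag-one sample autocorrelation of every row
r1 = (xc[:, 1:] * xc[:, :-1]).sum(axis=1) / (xc ** 2).sum(axis=1)

approx_sd = 1 / np.sqrt(n_obs)
band = 1.96 * approx_sd
frac_outside = np.mean(np.abs(r1) > band)

print(f'Simulated standard deviation: {r1.std():.3f}')
print(f'Approximation 1/sqrt(N):      {approx_sd:.3f}')
print(f'Share outside +/- {band:.3f}:   {frac_outside:.3f}')
```

The simulated standard deviation should land close to 0.1, and roughly 5% of the estimates should fall outside the band even though no autocorrelation is present.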

You will compute the actual and approximate confidence interval for the ACF, and compare it to the lag-one autocorrelation of -0.16 from the last chapter. The weekly returns of Microsoft are pre-loaded in a DataFrame called returns.


In [None]:
from statsmodels.graphics.tsaplots import plot_acf
from math import sqrt

# Compute and print the autocorrelation of MSFT weekly returns
autocorrelation = returns['Adj Close'].autocorr()
print(f'The autocorrelation of weekly MSFT returns is {autocorrelation:.2f}')

# Find the number of observations
nobs = len(returns)

# Compute the approximate confidence interval
conf = 1.96/sqrt(nobs)
print(f'The approximate confidence interval is +/- {conf:.2f}')

# Plot the autocorrelation function
plot_acf(returns, alpha=0.05, lags=20)
plt.show()

## Can't Forecast White Noise
A white noise time series is simply a sequence of uncorrelated random variables that are identically distributed. Stock returns are often modeled as white noise. Unfortunately, for white noise, we cannot forecast future observations based on the past, because the autocorrelations at all nonzero lags are zero.

You will generate a white noise series and plot the autocorrelation function to show that it is approximately zero for all nonzero lags. You can use np.random.normal() to generate random returns. For a Gaussian white noise process, the mean and standard deviation describe the entire process.

Plot this white noise series to see what it looks like, and then plot the autocorrelation function.


In [None]:
import numpy as np

# Simulate white noise returns
returns = np.random.normal(loc=0.02, scale=0.05, size=1000)

# Print mean and standard deviation
mean = np.mean(returns)
std = np.std(returns)
print(f'The mean is {mean:.3f} and the standard deviation is {std:.3f}')

# Plot returns series
plt.plot(returns)
plt.show()

# Plot autocorrelation function
plot_acf(returns, lags=20)
plt.show()
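
One way to see why white noise cannot be forecast is to try it. The following sketch, under the assumption that a straight-line fit on the previous value is a fair stand-in for "forecasting from the past", regresses each observation on the one before it. The variable names and the seed are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
noise = rng.normal(loc=0.02, scale=0.05, size=1000)

past, future = noise[:-1], noise[1:]

# Least-squares line: future = slope * past + intercept
slope, intercept = np.polyfit(past, future, deg=1)

# Share of the variance in the next value explained by the previous one
residuals = future - (slope * past + intercept)
r_squared = 1 - residuals.var() / future.var()

print(f'Slope on the previous value: {slope:.3f}')
print(f'R-squared: {r_squared:.4f}')
```

Both the slope and the R-squared should be close to zero, so the best forecast is essentially the sample mean regardless of what came before.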

## Generate a Random Walk
Whereas stock returns are often modeled as white noise, stock prices closely follow a random walk. In other words, today's price is yesterday's price plus some random noise.

You will simulate the price of a stock over time that has a starting price of 100 and every day goes up or down by a random amount. Then, plot the simulated stock price. If you hit the "Run Code" button multiple times, you'll see several realizations.


In [None]:
# Generate random steps
steps = np.random.normal(loc=0, scale=1.0, size=500)
steps[0] = 0

# Simulate stock prices
P = 100 + np.cumsum(steps)

# Plot the simulated stock prices
plt.plot(P)
plt.title('Simulated Random Walk')
plt.show()
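
Two properties of this construction are worth checking numerically: differencing the prices gives back the random steps, and the spread of possible prices widens as time passes. The sketch below simulates many paths at once; the number of paths and the seed are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
n_paths, n_steps = 2000, 500     # hypothetical sizes

steps = rng.normal(loc=0, scale=1.0, size=(n_paths, n_steps))
steps[:, 0] = 0                  # every path starts exactly at 100
prices = 100 + np.cumsum(steps, axis=1)

# First differences of each path give back the random steps
recovered = np.diff(prices, axis=1)
print(np.allclose(recovered, steps[:, 1:]))

# Variance across paths grows roughly in proportion to elapsed time
for t in (100, 200, 400):
    print(f't={t}: variance across paths = {prices[:, t].var():.1f}')
```

With unit-variance steps, the variance across paths after t steps should be close to t, which is why a random walk has no fixed level to revert to.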

## Get the Drift
In the last exercise, you simulated stock prices that follow a random walk. You will extend this in two ways in this exercise.

- You will look at a random walk with a drift. Many time series, like stock prices, are random walks but tend to drift up over time.
- In the last exercise, the noise in the random walk was additive: random, normal changes in price were added to the last price. However, when adding noise, you could theoretically get negative prices. Now you will make the noise multiplicative: you will add one to the random, normal changes to get a total return, and multiply that by the last price.


In [None]:
# Generate random steps with drift
steps = np.random.normal(loc=0.001, scale=0.01, size=500) + 1
steps[0] = 1

# Simulate the stock price
P = 100 * np.cumprod(steps)

# Plot the simulated stock prices
plt.plot(P)
plt.title('Simulated Random Walk with Drift')
plt.show()
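
A multiplicative random walk in prices is an additive random walk in log prices, which is one way to see why the simulated price stays positive as long as every simulated return is above -100%. This is a small sketch of that equivalence; the variable names and seed are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)
rets = rng.normal(loc=0.001, scale=0.01, size=500)
rets[0] = 0

# Multiplicative random walk, as in the exercise
prices = 100 * np.cumprod(1 + rets)

# The same path built additively in logs
log_prices = np.log(100) + np.cumsum(np.log(1 + rets))

print(np.allclose(prices, np.exp(log_prices)))
print(prices.min() > 0)
```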

## Are Stock Prices a Random Walk?
Most stock prices follow a random walk (perhaps with a drift). You will look at a time series of Amazon stock prices, pre-loaded in the DataFrame AMZN, and run the 'Augmented Dickey-Fuller Test' from the statsmodels library to check whether the series is consistent with a random walk.

With the ADF test, the "null hypothesis" (the hypothesis that we either reject or fail to reject) is that the series follows a random walk. Therefore, a low p-value (say less than 5%) means we can reject the null hypothesis that the series is a random walk. A high p-value means we fail to reject it: the data are consistent with a random walk, although the test does not prove it.


In [None]:
from statsmodels.tsa.stattools import adfuller

# Run ADF test
results = adfuller(AMZN['Adj Close'])
print(results)

# Print the p-value
print(f'The p-value of the test on prices is: {results[1]}')

## How About Stock Returns?
In this exercise, you will do the same thing for Amazon returns (percent change in prices) and show that the returns do not follow a random walk.



In [None]:
# Compute returns
AMZN_ret = AMZN.pct_change().dropna()

# Run ADF test
results = adfuller(AMZN_ret['Adj Close'])
print(f'The p-value of the test on returns is: {results[1]}')
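
The contrast between prices and returns can be reproduced on simulated data. The sketch below is not the augmented test that adfuller runs: it is the plain Dickey-Fuller regression with a constant and no lagged difference terms, written in NumPy, and it reports only the t-statistic rather than a p-value. The helper name `dickey_fuller_t` is hypothetical. As a rough guide, the large-sample 5% critical value for this form of the test is about -2.86, and statistics below it lead to rejecting the random walk null.

```python
import numpy as np

def dickey_fuller_t(series):
    """t-statistic on b in the regression  diff(y)_t = a + b * y_{t-1} + e_t."""
    y = np.asarray(series, dtype=float)
    dy = np.diff(y)
    X = np.column_stack([np.ones(len(dy)), y[:-1]])
    coef, *_ = np.linalg.lstsq(X, dy, rcond=None)
    resid = dy - X @ coef
    sigma2 = resid @ resid / (len(dy) - X.shape[1])
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return coef[1] / np.sqrt(cov[1, 1])

rng = np.random.default_rng(5)
steps = rng.normal(size=500)       # simulated "returns" (white noise)
walk = 100 + np.cumsum(steps)      # simulated "prices" (random walk)

t_walk = dickey_fuller_t(walk)
t_steps = dickey_fuller_t(steps)

print(f'Dickey-Fuller t-statistic, random walk: {t_walk:.2f}')
print(f'Dickey-Fuller t-statistic, white noise: {t_steps:.2f}')
```

The white noise series should produce a strongly negative statistic, far below the critical value, while the random walk's statistic is typically much closer to zero.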

## Seasonal Adjustment During Tax Season
Many time series exhibit strong seasonal behavior. The procedure for removing the seasonal component of a time series is called seasonal adjustment. For example, most economic data published by the government is seasonally adjusted.

You saw earlier that by taking first differences of a random walk, you get a stationary white noise process. For seasonal adjustments, instead of taking first differences, you will take differences with a lag corresponding to the periodicity.

Look again at the ACF of H&R Block's quarterly earnings, pre-loaded in the DataFrame HRB: there is a clear seasonal component. The autocorrelation is high for lags 4, 8, 12, 16, … because of the spike in earnings every four quarters during tax season. Apply a seasonal adjustment by taking the difference at lag 4, that is, each quarter minus the same quarter one year earlier (four represents the periodicity of the series). Then compute the autocorrelation of the transformed series.


In [None]:
# Import the plot_acf function from statsmodels
from statsmodels.graphics.tsaplots import plot_acf

# Seasonally adjust quarterly earnings
HRBsa = HRB.diff(4)

# Print the first 10 rows of the seasonally adjusted series
print(HRBsa.head(10))

# Drop the NaN rows created by differencing (the first four quarters)
HRBsa = HRBsa.dropna()

# Plot ACF of seasonally adjusted series
plot_acf(HRBsa)
plt.show()
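
The effect of the lag-4 difference can also be seen on a synthetic series. This sketch assumes a purely hypothetical earnings pattern (a spike every fourth quarter plus noise) and uses NumPy slicing, where `x[4:] - x[:-4]` plays the role of `diff(4).dropna()`. The helper `autocorr_at` is illustrative.

```python
import numpy as np

def autocorr_at(x, lag):
    """Sample autocorrelation of x at a single positive lag."""
    xc = np.asarray(x, dtype=float) - np.mean(x)
    return np.sum(xc[lag:] * xc[:-lag]) / np.sum(xc ** 2)

rng = np.random.default_rng(6)
seasonal = np.tile([10.0, 1.0, 1.0, 1.0], 50)   # 200 quarters
earnings = seasonal + rng.normal(scale=1.0, size=seasonal.size)

# Difference at lag 4; the first four values have no predecessor and are dropped
adjusted = earnings[4:] - earnings[:-4]

before = autocorr_at(earnings, 4)
after = autocorr_at(adjusted, 4)

print(f'Lag-4 autocorrelation before adjustment: {before:.2f}')
print(f'Lag-4 autocorrelation after adjustment:  {after:.2f}')
```

The strong positive autocorrelation at lag 4 disappears after differencing. In this toy series the adjusted value tends to be negative rather than zero, because each noise term enters two seasonal differences with opposite signs.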