# Time Series Analysis in Python

## Video 1: Introduction to the Course

Time series analysis is the study of data recorded in sequence over time. It is especially important in finance, where time series describe the prices of instruments such as stocks, bonds, commodities, and cryptocurrencies like Bitcoin. A time series model can be fitted to such data and then used to forecast future values, and these forecasts feed into financial decisions.

### Video 1 Exercise 1 

In [None]:
# Import pandas and plotting modules
import pandas as pd
import matplotlib.pyplot as plt

# Convert the date index to datetime
diet.index = pd.to_datetime(diet.index)

### Video 1 Exercise 2

In [None]:
# Import pandas
import pandas as pd

# Convert the stock index and bond index into sets
set_stock_dates = set(stocks.index)
set_bond_dates = set(bonds.index)

# Take the difference between the sets and print
print(set_stock_dates - set_bond_dates)

# Merge stocks and bonds DataFrames using join()
stocks_and_bonds = stocks.join(bonds,how='inner')

## Video 2: Correlation of Two Time Series

The correlation coefficient measures how closely two series move together. It always lies between -1 and 1: a value near 1 means the series tend to rise and fall together, a value near -1 means one tends to rise when the other falls, and a value near 0 means there is little linear relationship. The sign therefore gives the direction of the relationship and the magnitude gives its strength.

To better understand the relationship, a scatter plot can be used. Each point represents one pair of values of the two variables. If the points lie close to an upward-sloping straight line, the correlation is strongly positive; if they lie close to a downward-sloping line, it is strongly negative; if they form a shapeless cloud, the correlation is low or close to zero.
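To make the definition concrete, here is a minimal sketch on synthetic data. The series `x`, `y_pos`, `y_neg`, `y_ind` and the helper `pearson` are made up for illustration and are not part of the course data. The sketch computes the correlation coefficient from its formula and compares it with pandas' `Series.corr`.

```python
import numpy as np
import pandas as pd

# Synthetic data: illustrative only, not the course's stock or bond series
rng = np.random.default_rng(0)
x = pd.Series(rng.normal(size=500))
y_pos = 0.8 * x + pd.Series(rng.normal(scale=0.5, size=500))  # moves with x
y_neg = -y_pos                                                # moves against x
y_ind = pd.Series(rng.normal(size=500))                       # unrelated to x

def pearson(a, b):
    """Correlation coefficient: co-movement scaled by the spread of both series."""
    a_dev = a - a.mean()
    b_dev = b - b.mean()
    return (a_dev * b_dev).sum() / np.sqrt((a_dev ** 2).sum() * (b_dev ** 2).sum())

corr_pos = x.corr(y_pos)
corr_neg = x.corr(y_neg)
corr_ind = x.corr(y_ind)

print("By formula: %4.2f, pandas: %4.2f" % (pearson(x, y_pos), corr_pos))
print("Negative: %4.2f, unrelated: %4.2f" % (corr_neg, corr_ind))
```

The formula and `Series.corr` should agree to rounding error, and flipping the sign of one series flips the sign of the correlation without changing its magnitude.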

### Video 2 Exercise 1

In [None]:
# Compute percent change using pct_change()
returns = stocks_and_bonds.pct_change()

# Compute correlation using corr()
correlation = returns['SP500'].corr(returns['US10Y'])
print("Correlation of stocks and interest rates: ", correlation)

# Make scatter plot
plt.scatter(returns['SP500'], returns['US10Y'])
plt.show()

### Video 2 Exercise 2

In [None]:
# Compute correlation of levels
correlation1 = levels['DJI'].corr(levels['UFO'])
print("Correlation of levels: ", correlation1)

# Compute correlation of percent changes
changes = levels.pct_change()
correlation2 = changes['DJI'].corr(changes['UFO'])
print("Correlation of changes: ", correlation2)

## Video 3: Simple Linear Regression

This video covers Ordinary Least Squares (OLS) regression and R-squared (R²).

OLS is a regression method that models the relationship between a dependent variable and one or more independent variables. It produces a regression equation, a line for one independent variable or a plane for several, that best fits the data. "Best" means that the sum of the squared vertical distances (residuals) between the data points and the fitted line is as small as possible.

R-squared, or the coefficient of determination, is a statistical measure of the explanatory power of a regression model. It indicates how much of the variance in the dependent variable is explained by the independent variables and ranges from 0 to 1. An R-squared value close to 0 means the model explains little of the variance of the dependent variable, while a value close to 1 means it explains almost all of it. In a simple regression with one independent variable, R-squared equals the square of the correlation between x and y.
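The link between R-squared and correlation can be checked numerically. The following is a small sketch on synthetic data (the coefficients 1.5 and 2.0 are arbitrary assumptions); it uses `np.polyfit` for the least-squares fit rather than statsmodels so that every step of the R-squared calculation is visible.

```python
import numpy as np

# Synthetic data: illustrative only
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 1.5 + 2.0 * x + rng.normal(size=200)

# Least-squares fit of y = intercept + slope * x
slope, intercept = np.polyfit(x, y, 1)
residuals = y - (intercept + slope * x)

# R-squared = 1 - (unexplained variation / total variation)
ss_res = np.sum(residuals ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot

correlation = np.corrcoef(x, y)[0, 1]
print("R-squared: %5.3f, correlation squared: %5.3f" % (r_squared, correlation ** 2))
```

The two printed numbers should match, which is the relationship the exercise below asks you to look for in the regression summary.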

### Video 3 Exercise 1

In [None]:
# Import the statsmodels module
import statsmodels.api as sm

# Compute correlation of x and y
correlation = x.corr(y)
print("The correlation between x and y is %4.2f" %(correlation))

# Convert the Series x to a DataFrame and name the column x
dfx = x.to_frame(name='x')

# Add a constant to the DataFrame dfx
dfx1 = sm.add_constant(dfx)

# Regress y on dfx1
result = sm.OLS(y, dfx1).fit()

# Print out the results and look at the relationship between R-squared and the correlation above
print(result.summary())

## Video 4: Autocorrelation

This video discusses autocorrelation in time series data. Autocorrelation is the correlation of a time series with a lagged copy of itself and is sometimes called serial correlation. It indicates how strongly the series is related to its own past. With positive autocorrelation, an increase tends to be followed by another increase, and the series is said to be 'trend-following'. With negative autocorrelation, an increase tends to be followed by a decrease, and the series is said to be 'mean-reverting'. In both cases past data helps in predicting future data.

Negative autocorrelation is also of interest to investors. In financial markets, a popular strategy is to buy stocks that have fallen over the past week and sell them when they recover, which is a bet that a recent decline tends to be followed by a rebound. However, many other factors are at play in financial markets, so this strategy does not always work. Investors should manage their risks carefully and conduct a broader analysis when making investment decisions.
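As a sketch of what the lag-1 autocorrelation actually computes, the example below builds a synthetic mean-reverting series (the coefficient -0.3 and the noise scale are arbitrary assumptions) and compares the correlation of the series with its own one-period shift against pandas' `Series.autocorr`.

```python
import numpy as np
import pandas as pd

# Synthetic mean-reverting "returns": each value leans against the previous one
rng = np.random.default_rng(0)
n = 1000
shocks = rng.normal(scale=0.02, size=n)
values = np.zeros(n)
for t in range(1, n):
    values[t] = -0.3 * values[t - 1] + shocks[t]
returns = pd.Series(values)

# Lag-1 autocorrelation = correlation of the series with itself shifted by one period
by_hand = returns.corr(returns.shift(1))
built_in = returns.autocorr()  # lag=1 by default

print("By hand: %4.2f, autocorr(): %4.2f" % (by_hand, built_in))
```

A negative value, as here, is the signature of mean reversion; a positive value would indicate trend-following behavior.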

In [5]:
import pandas as pd

MSFT = pd.read_csv('MSFT.csv')

# Convert the Date column to datetime objects and set it as the index
MSFT['Date'] = pd.to_datetime(MSFT['Date'])
MSFT.set_index('Date', inplace=True)


### Video 4 Exercise 1

In [6]:
# Convert the daily data to weekly data
MSFT = MSFT.resample(rule='W').last()

# Compute the percentage change of prices
returns = MSFT.pct_change()

# Compute and print the autocorrelation of returns
autocorrelation = returns['Adj Close'].autocorr()
print("The autocorrelation of weekly returns is %4.2f" %(autocorrelation))

The autocorrelation of weekly returns is -0.16


### Video 4 Exercise 2

In [None]:
# Compute the daily change in interest rates
daily_diff = daily_rates.diff()

# Compute and print the autocorrelation of daily changes
autocorrelation_daily = daily_diff['US10Y'].autocorr()
print("The autocorrelation of daily interest rate changes is %4.2f" %(autocorrelation_daily))

# Convert the daily data to annual data ('A' is deprecated in pandas 2.2 and later; use 'YE' there)
yearly_rates = daily_rates.resample(rule='A').last()

# Repeat above for annual data
yearly_diff = yearly_rates.diff()
autocorrelation_yearly = yearly_diff['US10Y'].autocorr()
print("The autocorrelation of annual interest rate changes is %4.2f" %(autocorrelation_yearly))

## Video 5: Autocorrelation Function

This video introduces the autocorrelation function (ACF), which gives the autocorrelation at every lag rather than at lag 1 only, and the plot_acf function used to visualize it. The plot shows how far back in time the autocorrelation extends and whether the lags reveal a seasonal pattern.
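The sketch below computes a sample ACF by hand on a synthetic quarterly series, to show where the seasonal spikes in an ACF plot come from. The helper `sample_acf` and the four-quarter pattern are assumptions made for illustration; the estimator is the common one that divides every lag by the lag-0 sum of squares.

```python
import numpy as np

def sample_acf(series, nlags):
    """Sample autocorrelation at lags 0..nlags."""
    x = np.asarray(series, dtype=float)
    dev = x - x.mean()
    denom = np.sum(dev ** 2)
    return np.array([np.sum(dev[k:] * dev[:len(x) - k]) / denom
                     for k in range(nlags + 1)])

# Synthetic quarterly series: the same four-quarter pattern every year, plus noise
rng = np.random.default_rng(0)
pattern = np.array([10.0, 2.0, 3.0, 4.0])
quarterly = np.tile(pattern, 50) + rng.normal(scale=0.5, size=200)

acf_values = sample_acf(quarterly, nlags=8)
band = 1.96 / np.sqrt(len(quarterly))  # approximate 95% band for a white noise series

for lag, value in enumerate(acf_values):
    flag = "significant" if abs(value) > band else ""
    print("lag %d: %5.2f %s" % (lag, value, flag))
```

The autocorrelation is largest at lags 4 and 8, one and two years back, which is the seasonal pattern the plot is meant to reveal.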

In [None]:
import pandas as pd

HRB = pd.read_csv('HRB.csv')

# Convert the Date column to datetime objects and set it as the index
HRB['Date'] = pd.to_datetime(HRB['Date'])
HRB.set_index('Date', inplace=True)

### Video 5 Exercise 1

In [None]:
# Import the acf module and the plot_acf module from statsmodels
from statsmodels.tsa.stattools import acf
from statsmodels.graphics.tsaplots import plot_acf

# Compute the acf array of HRB
acf_array = acf(HRB)
print(acf_array)

# Plot the acf function (alpha=1 gives zero-width confidence bands, so none are visible)
plot_acf(HRB, alpha=1)
plt.show()

### Video 5 Exercise 2

In [None]:
# Import the plot_acf module from statsmodels and sqrt from math
from statsmodels.graphics.tsaplots import plot_acf
from math import sqrt

# Compute and print the autocorrelation of MSFT weekly returns
autocorrelation = returns['Adj Close'].autocorr()
print("The autocorrelation of weekly MSFT returns is %4.2f" %(autocorrelation))

# Drop the NaN that pct_change() leaves in the first row, then count the observations
returns = returns.dropna()
nobs = len(returns)

# Compute the approximate confidence interval
conf = 1.96/sqrt(nobs)
print("The approximate confidence interval is +/- %4.2f" %(conf))

# Plot the autocorrelation function with 95% confidence intervals and 20 lags using plot_acf
plot_acf(returns['Adj Close'], alpha=0.05, lags=20)
plt.show()

## Video 6: White Noise

White noise is a time series with a constant mean, a constant variance, and zero autocorrelation at all lags. If the values are also normally distributed, the series is called 'Gaussian white noise'. Stock returns are often modeled as white noise, which implies that past returns cannot be used to forecast future returns.
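The three defining properties can be checked on simulated data. This is a rough sketch, assuming Gaussian noise with an arbitrary mean and standard deviation; the band of 1.96 divided by the square root of the sample size is the usual approximation for a series with no autocorrelation.

```python
import numpy as np
import pandas as pd

# Simulated Gaussian white noise "returns" (parameters are arbitrary)
rng = np.random.default_rng(0)
noise = pd.Series(rng.normal(loc=0.02, scale=0.05, size=1000))

# Constant mean and variance: the two halves of the sample look alike
first_half, second_half = noise.iloc[:500], noise.iloc[500:]
print("Means: %5.3f vs %5.3f" % (first_half.mean(), second_half.mean()))
print("Std devs: %5.3f vs %5.3f" % (first_half.std(), second_half.std()))

# Zero autocorrelation: most lags should fall inside the approximate 95% band
band = 1.96 / np.sqrt(len(noise))
autocorrs = np.array([noise.autocorr(lag=k) for k in range(1, 41)])
share_inside = np.mean(np.abs(autocorrs) < band)
print("Share of lags 1-40 inside +/- %4.2f: %4.2f" % (band, share_inside))
```

Roughly 95% of the lags are expected to fall inside the band, so one or two values outside it are not evidence against white noise.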

### Video 6 Exercise 1

In [None]:
# Import numpy and the plot_acf function from statsmodels
import numpy as np
from statsmodels.graphics.tsaplots import plot_acf

# Simulate white noise returns
returns = np.random.normal(loc=0.02, scale=0.05, size=1000)

# Print out the mean and standard deviation of returns
mean = np.mean(returns)
std = np.std(returns)
print("The mean is %5.3f and the standard deviation is %5.3f" %(mean,std))

# Plot returns series
plt.plot(returns)
plt.show()

# Plot autocorrelation function of white noise returns
plot_acf(returns, lags=20)
plt.show()

## Video 7: Random Walk

In a random walk, today's price equals yesterday's price plus a random noise term, so the price changes are white noise. In financial data, the 'random walk' hypothesis expresses the idea that price movements are random and unpredictable: past prices cannot be used to predict future price changes. If stock prices follow a random walk, the best forecast of tomorrow's price is simply today's price.
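A short sketch of this idea on simulated data (the starting price of 100 and unit-variance steps are assumptions): the price level is highly autocorrelated, but the changes, which are just the random steps, are not.

```python
import numpy as np
import pandas as pd

# Random walk: each price is the previous price plus a random step
rng = np.random.default_rng(0)
steps = rng.normal(loc=0, scale=1.0, size=1000)
steps[0] = 0
prices = pd.Series(100 + np.cumsum(steps))

# Differencing the prices recovers the random steps
changes = prices.diff().dropna()

print("Autocorrelation of prices:  %5.2f" % prices.autocorr())
print("Autocorrelation of changes: %5.2f" % changes.autocorr())
```

The high autocorrelation of the prices does not make them forecastable: it only reflects that each price contains the previous one.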

### Video 7 Exercise 1

In [None]:
# Generate 500 random steps with mean=0 and standard deviation=1
steps = np.random.normal(loc=0, scale=1.0, size=500)

# Set first element to 0 so that the first price will be the starting stock price
steps[0]=0

# Simulate stock prices, P with a starting price of 100
P = 100 + np.cumsum(steps)

# Plot the simulated stock prices
plt.plot(P)
plt.title("Simulated Random Walk")
plt.show()

### Video 7 Exercise 2

In [None]:
# Generate 500 random steps 
steps = np.random.normal(loc=0.001, scale=0.01, size=500) + 1

# Set first element to 1
steps[0]=1

# Simulate the stock price, P, by taking the cumulative product
P = 100 * np.cumprod(steps)

# Plot the simulated stock prices
plt.plot(P)
plt.title("Simulated Random Walk with Drift")
plt.show()

### Video 7 Exercise 3

In [None]:
# Import the adfuller module from statsmodels
from statsmodels.tsa.stattools import adfuller

# Run the ADF test on the price series and print out the results
results = adfuller(AMZN['Adj Close'])
print(results)

# Just print out the p-value
print('The p-value of the test on prices is: ' + str(results[1]))

### Video 7 Exercise 4

In [None]:
# Import the adfuller module from statsmodels
from statsmodels.tsa.stattools import adfuller

# Create a DataFrame of AMZN returns
AMZN_ret = AMZN.pct_change()

# Eliminate the NaN in the first row of returns
AMZN_ret = AMZN_ret.dropna()

# Run the ADF test on the return series and print out the p-value
results = adfuller(AMZN_ret['Adj Close'])
print('The p-value of the test on returns is: ' + str(results[1]))

## Video 8: Stationarity

In financial data, 'stationarity' refers to a condition where the statistical properties of a time series do not change or remain constant over time. Stationarity is a critical concept in financial analysis and time series data modeling. A time series being stationary implies that its statistical characteristics remain stable over a specific period. This stability affects the reliable usability of historical data for predicting future price movements.

Without stationarity, analyzing and forecasting a financial time series becomes much harder: if the parameters of the process change over time, there are too many of them to estimate from the available data. A random walk is a common example of a non-stationary series, because its variance grows over time. Seasonal series are also non-stationary, because their mean depends on the time of year.

Therefore, when financial analysts and data scientists examine time series data, it is important to check whether the series is stationary. A non-stationary series can often be transformed into a stationary one by differencing (subtracting the previous value, or the value one season earlier to remove seasonality) and by taking logarithms. A stationary time series allows more reliable predictions and more precise results in financial analysis.
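The effect of such a transformation can be seen in a small sketch. It assumes a synthetic price whose logarithm is a random walk with drift (the drift and volatility values are arbitrary) and uses a crude check, comparing the mean of the first and second half of the sample, in place of a formal test.

```python
import numpy as np
import pandas as pd

# Synthetic price with exponential growth: its log is a random walk with drift
rng = np.random.default_rng(0)
log_steps = rng.normal(loc=0.005, scale=0.02, size=1000)
price = pd.Series(100 * np.exp(np.cumsum(log_steps)))

# Transform: take logs, then first differences
log_returns = np.log(price).diff().dropna()

def half_means(series):
    """Mean of the first half and of the second half of a series."""
    middle = len(series) // 2
    return series.iloc[:middle].mean(), series.iloc[middle:].mean()

print("Price means by half:      %8.1f vs %8.1f" % half_means(price))
print("Log-return means by half: %8.4f vs %8.4f" % half_means(log_returns))
```

The mean of the raw price changes drastically between the two halves, while the mean of the transformed series stays roughly the same.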

### Video 8 Exercise 1

In [None]:
# Import the plot_acf module from statsmodels
from statsmodels.graphics.tsaplots import plot_acf

# Seasonally adjust quarterly earnings
HRBsa = HRB.diff(4)

# Print the first 10 rows of the seasonally adjusted series
print(HRBsa.head(10))

# Drop the NaN data in the first four rows
HRBsa = HRBsa.dropna()

# Plot the autocorrelation function of the seasonally adjusted series
plot_acf(HRBsa)
plt.show()

## Video 9: Describe AR Model

The AR (AutoRegressive) model is a model used in statistics to analyze and predict time series data. This model is widely used in the analysis of data over time, especially in fields such as finance, economics, and natural phenomena. Essentially, the AR(p) model is defined as follows:

Y(t) = c + φ₁·Y(t-1) + φ₂·Y(t-2) + ... + φₚ·Y(t-p) + ε(t)

Where:

- Y(t) is the value of the time series at time t.
- Y(t-1), Y(t-2), ..., Y(t-p) are the values at the p previous time points.
- φ₁, φ₂, ..., φₚ are the coefficients of the autoregressive terms, indicating how strongly each past value influences the current value.
- ε(t) is the error term: a white noise shock that the past values cannot predict.
- c is a constant term.

The main points about the AR model are:

What is AutoRegressive (AR)?

- The AR model predicts future values by regressing the series on its own past values, hence the name "autoregressive."
- The AR(p) model uses only the last p values of the series to make its prediction.

Parameters of the AR Model:

- The AR(p) model includes p lag terms, one coefficient for each of the p past values.
- The model also contains the error term ε (epsilon). Its variance is estimated along with the coefficients, and the residuals are used to assess how well the model fits.

Model Selection:

- The order p is chosen by examining the data and comparing how well models of different orders fit. This choice is crucial for the model to make accurate predictions.
- Statistical criteria such as AIC (Akaike Information Criterion) or BIC (Bayesian Information Criterion) are commonly used to select the appropriate p value.

Limitations of the AR Model:

- AR models forecast only from the past values of the series, so their forecasts are uncertain, and sudden changes are especially hard to predict.
- A plain AR model does not account for trends or seasonal components. More general models such as ARIMA (AutoRegressive Integrated Moving Average) can be used for such series.

Importance of Time Series Data:

- AR models are a tool for examining and predicting time series data. Their accuracy can decrease, however, if the data is very noisy or statistically irregular.

In conclusion, the AR model is a fundamental statistical model for the analysis and prediction of time series data. As with any model, it is important to understand the nature of the data and to select appropriate parameters.
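The AR(1) case of the equation can be simulated and estimated in a few lines. This is a minimal sketch with assumed values c = 0.5 and φ = 0.9; it estimates the coefficients by regressing Y(t) on Y(t-1) with least squares, which is a simpler method than the maximum likelihood estimation that statsmodels' ARIMA class performs.

```python
import numpy as np

# Simulate Y(t) = c + phi * Y(t-1) + e(t) directly from the equation
rng = np.random.default_rng(0)
n, c, phi = 2000, 0.5, 0.9
shocks = rng.normal(size=n)
y = np.zeros(n)
y[0] = c / (1 - phi)  # start at the long-run mean
for t in range(1, n):
    y[t] = c + phi * y[t - 1] + shocks[t]

# Estimate phi and c by regressing Y(t) on Y(t-1) with least squares
phi_hat, c_hat = np.polyfit(y[:-1], y[1:], 1)
print("True phi = %.2f, estimated phi = %.2f" % (phi, phi_hat))

# One-step-ahead forecast from the last observation
forecast = c_hat + phi_hat * y[-1]
print("Next-period forecast: %.2f" % forecast)
```

With a long enough sample the estimate should land close to the true value of φ, and the forecast is simply the fitted equation applied to the most recent observation.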

### Video 9 Exercise 1

In [None]:
# import the module for simulating data
from statsmodels.tsa.arima_process import ArmaProcess

# Plot 1: AR parameter = +0.9
plt.subplot(2,1,1)
ar1 = np.array([1, -0.9])
ma1 = np.array([1])
AR_object1 = ArmaProcess(ar1, ma1)
simulated_data_1 = AR_object1.generate_sample(nsample=1000)
plt.plot(simulated_data_1)

# Plot 2: AR parameter = -0.9
plt.subplot(2,1,2)
ar2 = np.array([1, 0.9])
ma2 = np.array([1])
AR_object2 = ArmaProcess(ar2, ma2)
simulated_data_2 = AR_object2.generate_sample(nsample=1000)
plt.plot(simulated_data_2)
plt.show()

### Video 9 Exercise 2

In [None]:
# Import the plot_acf module from statsmodels
from statsmodels.graphics.tsaplots import plot_acf

# Plot 1: AR parameter = +0.9
plot_acf(simulated_data_1, alpha=1, lags=20)
plt.show()

# Plot 2: AR parameter = -0.9
plot_acf(simulated_data_2, alpha=1, lags=20)
plt.show()

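# Plot 3: AR parameter = +0.3
# simulated_data_3 is not created in the cells above, so simulate it the same way as the first two series
simulated_data_3 = ArmaProcess(np.array([1, -0.3]), np.array([1])).generate_sample(nsample=1000)
plot_acf(simulated_data_3, alpha=1, lags=20)
plt.show()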

## Video 10: Estimating and Forecasting an AR Model

### Video 10 Exercise 1

In [None]:
# Import the ARIMA module from statsmodels
from statsmodels.tsa.arima.model import ARIMA

# Fit an AR(1) model to the first simulated data
mod = ARIMA(simulated_data_1, order=(1,0,0))
res = mod.fit()

# Print out summary information on the fit
print(res.summary())

# Print out the estimate for phi
print("When the true phi=0.9, the estimate of phi is:")
print(res.params[1])

### Video 10 Exercise 2

In [None]:
# Import the ARIMA and plot_predict from statsmodels
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.graphics.tsaplots import plot_predict

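# Forecast the first AR(1) model
# (generate_sample returns a NumPy array, so wrap it in a Series to get an index for .loc and for the forecast)
series_1 = pd.Series(simulated_data_1)
mod = ARIMA(series_1, order=(1,0,0))
res = mod.fit()

# Plot the data and the forecast
fig, ax = plt.subplots()
series_1.loc[950:].plot(ax=ax)
plot_predict(res, start=1000, end=1010, ax=ax)
plt.show()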

### Video 10 Exercise 3

In [None]:
# Forecast interest rates using an AR(1) model
mod = ARIMA(interest_rate_data, order=(1,0,0))
res = mod.fit()

# Plot the data and the forecast
fig, ax = plt.subplots()
interest_rate_data.plot(ax=ax)
plot_predict(res, start=0, end='2027', alpha=None, ax=ax)
plt.show()

### Video 10 Exercise 4

In [None]:
# Import the plot_acf module from statsmodels
from statsmodels.graphics.tsaplots import plot_acf

# Plot the interest rate series and the simulated random walk series side-by-side
fig, axes = plt.subplots(2,1)

# Plot the autocorrelation of the interest rate series in the top plot
fig = plot_acf(interest_rate_data, alpha=1, lags=12, ax=axes[0])

# Plot the autocorrelation of the simulated random walk series in the bottom plot
fig = plot_acf(simulated_data, alpha=1, lags=12, ax=axes[1])

# Label axes
axes[0].set_title("Interest Rate Data")
axes[1].set_title("Simulated Random Walk Data")
plt.show()

## Video 11: Choosing the Right Model

When conducting time series analysis, selecting the correct Autoregressive (AR) model is crucial. Autoregressive models are used to predict future values of a time series based on its past values. To choose the right AR model, partial autocorrelation function (PACF) and information criteria are employed.

Partial Autocorrelation Function (PACF):

- The PACF at lag k measures the correlation between the series and its k-th lag after removing the effect of the shorter lags 1 to k-1.
- The PACF helps determine how many past values belong in an AR model.
- The PACF graph shows how the partial autocorrelation coefficients change with the lag.
- For an AR(p) process the PACF drops to approximately zero after lag p, so the last lag with a significant partial autocorrelation suggests the order of the AR model.

Information Criteria:

- Information criteria are statistical measures that aid in comparing different AR models. They balance the model's goodness of fit against its complexity by penalizing each additional parameter.
- Two common information criteria are the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC). A lower AIC or BIC value indicates a better model once the penalty is taken into account.
- These criteria are calculated for AR models of different orders, and the model with the lowest criterion value is selected.

The plot_pacf function in statsmodels plots the PACF, just as plot_acf plots the autocorrelation function (ACF). For choosing the order of an AR model the PACF is the more useful of the two, because the ACF of an AR process decays gradually instead of cutting off.
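Both tools can be sketched by hand with least squares. The example assumes a simulated AR(2) series with φ₁ = 0.6 and φ₂ = 0.3, takes the partial autocorrelation at lag p to be the last coefficient of a regression on p lags, and uses the regression form of the BIC (defined up to an additive constant). statsmodels uses different estimators, so its numbers will differ slightly.

```python
import numpy as np

# Simulate an AR(2) series (coefficients are arbitrary assumptions)
rng = np.random.default_rng(1)
n = 5000
phi1, phi2 = 0.6, 0.3
eps = rng.normal(size=n)
y = np.zeros(n)
for t in range(2, n):
    y[t] = phi1 * y[t - 1] + phi2 * y[t - 2] + eps[t]

max_p = 5
target = y[max_p:]          # same sample for every order, so the BICs are comparable
n_obs = len(target)
pacf, bic = {}, {}
for p in range(1, max_p + 1):
    # Regressors: a constant and the lags y(t-1), ..., y(t-p)
    lags = [y[max_p - k: n - k] for k in range(1, p + 1)]
    X = np.column_stack([np.ones(n_obs)] + lags)
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ coef
    pacf[p] = coef[-1]      # coefficient on the longest lag
    bic[p] = n_obs * np.log(resid @ resid / n_obs) + (p + 1) * np.log(n_obs)

best_p = min(bic, key=bic.get)
for p in range(1, max_p + 1):
    print("p=%d  PACF=%6.3f  BIC=%9.1f" % (p, pacf[p], bic[p]))
print("Order with the lowest BIC:", best_p)
```

The partial autocorrelation is clearly non-zero at lags 1 and 2 and close to zero afterwards, and the BIC falls sharply from p=1 to p=2 and then stops improving.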

### Video 11 Exercise 1

In [None]:
# Import the modules for simulating data and for plotting the PACF
from statsmodels.tsa.arima_process import ArmaProcess
from statsmodels.graphics.tsaplots import plot_pacf

# Simulate AR(1) with phi=+0.6
ma = np.array([1])
ar = np.array([1, -0.6])
AR_object = ArmaProcess(ar, ma)
simulated_data_1 = AR_object.generate_sample(nsample=5000)

# Plot PACF for AR(1)
plot_pacf(simulated_data_1, lags=20)
plt.show()

# Simulate AR(2) with phi1=+0.6, phi2=+0.3
ma = np.array([1])
ar = np.array([1, -0.6, -0.3])
AR_object = ArmaProcess(ar, ma)
simulated_data_2 = AR_object.generate_sample(nsample=5000)

# Plot PACF for AR(2)
plot_pacf(simulated_data_2, lags=20)
plt.show()

### Video 11 Exercise 2

In [None]:
# Import the module for estimating an ARIMA model
from statsmodels.tsa.arima.model import ARIMA

# Fit the data to an AR(p) for p = 0,...,6, and save the BIC
BIC = np.zeros(7)
for p in range(7):
    mod = ARIMA(simulated_data_2, order=(p,0,0))
    res = mod.fit()
    # Save BIC for AR(p)
    BIC[p] = res.bic

# Plot the BIC as a function of p
plt.plot(range(1,7), BIC[1:7], marker='o')
plt.xlabel('Order of AR Model')
plt.ylabel('Bayesian Information Criterion')
plt.show()

## Video 12: Describe MA Model

The MA (Moving Average) model is a statistical model used to analyze and forecast time series data. Despite its name, it does not average past observations. Instead, it expresses today's value as a mean plus today's noise plus a weighted sum of past noise terms. The MA(1) model, for example, is R(t) = μ + ε(t) + θ·ε(t-1). Here are the key features of the MA model:

Dependence on Past Shocks: Each value depends on the current white noise term ε(t) and on the noise terms of the previous periods, weighted by the coefficients θ. The past values of the series itself do not appear in the equation.

Lag: The order of the model is the number of past noise terms it uses. For example, a 3-lag MA model uses the noise terms of the last three periods.

Short Memory: The autocorrelation of an MA(q) series is zero beyond lag q. For an MA(1), the lag-1 autocorrelation is θ/(1+θ²), not θ itself, and all higher lags are zero. A forecast more than q steps ahead is therefore just the mean.

Stationarity: An MA model of finite order is stationary for any values of the coefficients θ.

MA(q) Model: The MA model is often expressed as "MA(q)," where "q" is the order of the model, that is, the number of lagged noise terms. For example, an MA(2) model uses two lagged noise terms.

The MA model can be used in conjunction with the AR (Autoregressive) model to create ARMA (Autoregressive Moving Average) models. These models are used to better understand time series data and make more accurate predictions of future values.
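As an illustration, the standard textbook MA(1) process, R(t) = μ + ε(t) + θ·ε(t-1), can be simulated directly from white noise. The sketch below assumes θ = -0.9 and μ = 0 and compares the sample autocorrelations with the theoretical values, θ/(1+θ²) at lag 1 and zero at lag 2.

```python
import numpy as np
import pandas as pd

# Simulate R(t) = mu + e(t) + theta * e(t-1)
rng = np.random.default_rng(0)
n, mu, theta = 5000, 0.0, -0.9
shocks = rng.normal(size=n + 1)
ma1 = pd.Series(mu + shocks[1:] + theta * shocks[:-1])

theoretical_lag1 = theta / (1 + theta ** 2)
print("Lag 1: sample %5.2f, theory %5.2f" % (ma1.autocorr(lag=1), theoretical_lag1))
print("Lag 2: sample %5.2f, theory  0.00" % ma1.autocorr(lag=2))
```

Note that the lag-1 autocorrelation is about -0.50 even though θ is -0.9, and that the autocorrelation vanishes after one lag.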

### Video 12 Exercise 1

In [None]:
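# Import the simulation class and the plot_acf function from statsmodels
from statsmodels.tsa.arima_process import ArmaProcess
from statsmodels.graphics.tsaplots import plot_acf

# Simulate an MA(1) series with MA parameter = -0.9 (it is not created in the cells above)
MA_object1 = ArmaProcess(np.array([1]), np.array([1, -0.9]))
simulated_data_1 = MA_object1.generate_sample(nsample=1000)

# Plot 1: MA parameter = -0.9
plot_acf(simulated_data_1, lags=20)
plt.show()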

## Video 13: Estimation and Forecasting an MA Model

### Video 13 Exercise 1

In [None]:
# Import the ARIMA module from statsmodels
from statsmodels.tsa.arima.model import ARIMA

# Fit an MA(1) model to the first simulated data
mod = ARIMA(simulated_data_1, order=(0,0,1))
res = mod.fit()

# Print out summary information on the fit
print(res.summary())

# Print out the estimate for the constant and for theta
print("When the true theta=-0.9, the estimate of theta is:")
print(res.params[1])

### Video 13 Exercise 2

In [None]:
# Import the ARIMA and plot_predict from statsmodels
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.graphics.tsaplots import plot_predict

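# Forecast the first MA(1) model
# (generate_sample returns a NumPy array, so wrap it in a Series to get an index for .loc and for the forecast)
series_1 = pd.Series(simulated_data_1)
mod = ARIMA(series_1, order=(0,0,1))
res = mod.fit()

# Plot the data and the forecast
fig, ax = plt.subplots()
series_1.loc[950:].plot(ax=ax)
plot_predict(res, start=1000, end=1010, ax=ax)
plt.show()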

## Video 14: ARMA Models

The ARMA (AutoRegressive Moving Average) model is a statistical model used to analyze time series data and forecast future values. It combines the previous values of the series with previous error terms to produce the forecast.

The ARMA model consists of two fundamental components:

AR (AutoRegressive) Component: This component predicts future values based on previous values of the time series. The AR component uses a weighted sum of past time steps' values. It is denoted as AR(p), where "p" represents the number of past time steps considered.

MA (Moving Average) Component: This component makes predictions using previous error terms. The MA component employs a weighted sum of past error terms. It is denoted as MA(q), where "q" represents the number of past error terms considered.

The ARMA model is expressed as a combination of these two components and is often denoted as ARMA(p, q), where "p" represents the order of the AR component, and "q" represents the order of the MA component.

ARMA models are used to capture the statistical properties of time series data and make forecasts about future values. Data analysis and model fitting tests can be conducted to determine the optimal parameters for the model. ARMA models are widely used in fields such as economics, finance, meteorology, and various other areas for time series analysis.

### Video 14 Exercise 1

In [None]:
# import datetime module
import datetime

# Change the first date to zero
intraday.iloc[0,0] = 0

# Change the column headers to 'DATE' and 'CLOSE'
intraday.columns = ['DATE','CLOSE']

# Examine the data types for each column
print(intraday.dtypes)

# Convert DATE column to numeric
intraday['DATE'] = pd.to_numeric(intraday['DATE'])

# Make the `DATE` column the new index
intraday = intraday.set_index('DATE')

### Video 14 Exercise 2

In [None]:
# Notice that some rows are missing
print("If there were no missing rows, there would be 391 rows of minute data")
print("The actual length of the DataFrame is:", len(intraday))

### Video 14 Exercise 3

In [None]:
# Import plot_acf and ARIMA modules from statsmodels
from statsmodels.graphics.tsaplots import plot_acf
from statsmodels.tsa.arima.model import ARIMA

# Compute returns from prices
returns = intraday.pct_change()
returns = returns.dropna()

# Plot ACF of returns with lags up to 60 minutes
plot_acf(returns, lags=60)
plt.show()

# Fit the data to an MA(1) model
mod = ARIMA(returns, order=(0,0,1))
res = mod.fit()
# Print the MA(1) coefficient (params is a Series, so select it by position with .iloc)
print(res.params.iloc[1])

### Video 14 Exercise 4

In [None]:
# import the modules for simulating data and plotting the ACF
from statsmodels.tsa.arima_process import ArmaProcess
from statsmodels.graphics.tsaplots import plot_acf

# Build a list of MA parameters
ma = [.8**i for i in range(30)]

# Simulate the MA(30) model
ar = np.array([1])
AR_object = ArmaProcess(ar, ma)
simulated_data = AR_object.generate_sample(nsample=5000)

# Plot the ACF
plot_acf(simulated_data, lags=30)
plt.show()
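The simulation above uses geometrically decaying MA weights because an AR(1) process with parameter φ can be rewritten as an MA process with weights 1, φ, φ², and so on. The sketch below checks this numerically, assuming φ = 0.8 and feeding the same shocks into both constructions; truncating the weights after 30 terms leaves only a tiny difference.

```python
import numpy as np

rng = np.random.default_rng(0)
n, phi = 1000, 0.8
shocks = rng.normal(size=n)

# AR(1) built recursively: y(t) = phi * y(t-1) + e(t)
ar1 = np.zeros(n)
ar1[0] = shocks[0]
for t in range(1, n):
    ar1[t] = phi * ar1[t - 1] + shocks[t]

# Long MA built from the same shocks with 30 weights: 1, phi, phi^2, ...
weights = np.array([phi ** i for i in range(30)])
ma_long = np.convolve(shocks, weights)[:n]

max_gap = np.max(np.abs(ar1 - ma_long))
print("Largest difference between the two series: %.4f" % max_gap)
```

This is why the ACF of the long MA series looks like the slowly decaying ACF of an AR(1) series.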

## Video 15: Cointegration Models

Cointegration describes a long-run relationship between time series. Two series are cointegrated when each of them is non-stationary, for example a random walk, but some linear combination of them is stationary. The individual prices are then hard to forecast, while the spread between them tends to revert to its mean and may be forecastable. This makes cointegration analysis an important tool for understanding financial data and making more informed investment decisions.

For example, the exercises below examine heating oil and natural gas prices and then test whether Bitcoin and Ethereum are cointegrated.
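A minimal sketch of cointegration on synthetic data: two series are built from one shared random walk (the factor 2.0 and the noise levels are assumptions), the hedge ratio is estimated by regressing one series on the other, and the lag-1 autocorrelation is used as a rough stand-in for a formal ADF test.

```python
import numpy as np
import pandas as pd

# Two synthetic prices driven by the same random walk (illustrative only)
rng = np.random.default_rng(0)
n = 2000
common = np.cumsum(rng.normal(size=n))
x = pd.Series(common + rng.normal(size=n))
y = pd.Series(2.0 * common + rng.normal(size=n))

# Estimate the hedge ratio b by regressing y on x, then form the spread
b, intercept = np.polyfit(x, y, 1)
spread = y - b * x

print("Estimated b: %4.2f" % b)
print("Lag-1 autocorrelation of x:      %5.2f" % x.autocorr())
print("Lag-1 autocorrelation of spread: %5.2f" % spread.autocorr())
```

Each series on its own wanders like a random walk, but the spread has no such persistence because the shared random walk cancels out. On real data the spread should be tested with `adfuller`, as in the exercises.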

### Video 15 Exercise 1

In [None]:
# Plot the prices separately
plt.subplot(2,1,1)
plt.plot(7.25*HO, label='Heating Oil')
plt.plot(NG, label='Natural Gas')
plt.legend(loc='best', fontsize='small')

# Plot the spread
plt.subplot(2,1,2)
plt.plot(7.25 *HO-NG, label='Spread')
plt.legend(loc='best', fontsize='small')
plt.axhline(y=0, linestyle='--', color='k')
plt.show()

### Video 15 Exercise 2

In [None]:
# Import the adfuller module from statsmodels
from statsmodels.tsa.stattools import adfuller

# Compute the ADF for HO and NG
result_HO = adfuller(HO['Close'])
print("The p-value for the ADF test on HO is ", result_HO[1])
result_NG = adfuller(NG['Close'])
print("The p-value for the ADF test on NG is ", result_NG[1])

# Compute the ADF of the spread
result_spread = adfuller(7.25 * HO['Close'] - NG['Close'])
print("The p-value for the ADF test on the spread is ", result_spread[1])

### Video 15 Exercise 3

In [None]:
# Import the statsmodels module for regression and the adfuller function
import statsmodels.api as sm
from statsmodels.tsa.stattools import adfuller

# Regress BTC on ETH
ETH = sm.add_constant(ETH)
result = sm.OLS(BTC,ETH).fit()

# Compute ADF
b = result.params.iloc[1]
adf_stats = adfuller(BTC['Price'] - b*ETH['Price'])
print("The p-value for the ADF test is ", adf_stats[1])

## Video 16: Case Study

In the final part of the course, a case study analyzes 150 years of temperature data. First, an Augmented Dickey-Fuller (ADF) test is run to check whether the data follows a random walk. Next, first differences are taken to make the data stationary, and the autocorrelation and partial autocorrelation functions of the differenced series are examined to analyze the temporal relationships within the data.

Next, time series models including AutoRegressive (AR), Moving Average (MA), and AutoRegressive Moving Average (ARMA) were applied to the data. Information criteria were used to select the best-fitting model among various alternatives. Finally, using the chosen model, temperature forecasts for the upcoming 30 years were made.

This study represents an important example of how statistical analyses can be employed to understand and predict future trends in temperature data.

### Video 16 Exercise 1

In [None]:
# Import the adfuller function from the statsmodels module
from statsmodels.tsa.stattools import adfuller

# Convert the index to a datetime object
temp_NY.index = pd.to_datetime(temp_NY.index, format='%Y')

# Plot average temperatures
temp_NY.plot()
plt.show()

# Compute and print ADF p-value
result = adfuller(temp_NY['TAVG'])
print("The p-value for the ADF test is ", result[1])

### Video 16 Exercise 2

In [None]:
# Import the modules for plotting the sample ACF and PACF
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

# Take first difference of the temperature Series
chg_temp = temp_NY.diff()
chg_temp = chg_temp.dropna()

# Plot the ACF and PACF on the same page
fig, axes = plt.subplots(2,1)

# Plot the ACF
plot_acf(chg_temp, lags=20, ax=axes[0])

# Plot the PACF
plot_pacf(chg_temp, lags=20, ax=axes[1])
plt.show()

### Video 16 Exercise 3

In [None]:
# Import the module for estimating an ARIMA model
from statsmodels.tsa.arima.model import ARIMA

# Fit the data to an AR(1) model and print AIC:
mod_ar1 = ARIMA(chg_temp, order=(1,0,0))
res_ar1 = mod_ar1.fit()
print("The AIC for an AR(1) is: ", res_ar1.aic)

# Fit the data to an AR(2) model and print AIC:
mod_ar2 = ARIMA(chg_temp, order=(2,0,0))
res_ar2 = mod_ar2.fit()
print("The AIC for an AR(2) is: ", res_ar2.aic)

# Fit the data to an ARMA(1,1) model and print AIC:
mod_arma11 = ARIMA(chg_temp, order=(1,0,1))
res_arma11 = mod_arma11.fit()
print("The AIC for an ARMA(1,1) is: ", res_arma11.aic)

### Video 16 Exercise 4

In [None]:
# Import the ARIMA module from statsmodels
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.graphics.tsaplots import plot_predict

# Forecast temperatures using an ARIMA(1,1,1) model
mod = ARIMA(temp_NY, trend='t', order=(1,1,1))
res = mod.fit()

# Plot the original series and the forecasted series
fig, ax = plt.subplots()
temp_NY.plot(ax=ax)
plot_predict(res, start='1872', end='2046', ax=ax)
plt.show()