### FE670 - Assignment #1

**Author**: Sid Bhatia

**Date**: October 5th, 2023

**Pledge**: I pledge my honor that I have abided by the Stevens Honor System.

**Professor**: Dr. Steve Yang

#### Problem #1

We can simulate independent random walk price
processes, each representing the price of a company’s stock, over
1,000 periods using the recursive formulation:

$$P_i(2) = 1 + 0.007 \times \epsilon_i(2)$$
$$P_i(3) = (1 + 0.007 \times \epsilon_i(3)) \times (1 + 0.007 \times \epsilon_i(2))$$
$$\cdots$$
$$P_i(n) = (1 + 0.007 \times \epsilon_i(n)) \times \cdots \times (1 + 0.007 \times \epsilon_i(3)) \times (1 + 0.007 \times \epsilon_i(2))$$

where $P_i(n)$ represents the $i$-th stock's price at period $n$, $\epsilon_i(n)$ are independent, normally distributed random processes, and we assume $P_i(1) = 1$. The level of volatility, $0.007$, is compatible with realistic market values.
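
The recursion above can be sketched with a cumulative product. This is a minimal illustration, not the code used to produce the assignment's CSV files: the function name `simulate_prices`, the seed, and the choice of standard normal shocks (mean 0, variance 1) are assumptions made here for the example.

```python
import numpy as np


def simulate_prices(n_stocks=2, n_periods=1000, vol=0.007, seed=0):
    """Simulate independent multiplicative random-walk prices with P_i(1) = 1.

    Assumption: the shocks epsilon_i(n) are standard normal.
    """
    rng = np.random.default_rng(seed)
    # One shock per stock for each of periods 2..n_periods
    eps = rng.standard_normal((n_stocks, n_periods - 1))
    growth = 1.0 + vol * eps                # factors (1 + 0.007 * epsilon_i(n))
    paths = np.cumprod(growth, axis=1)      # P_i(2), ..., P_i(n_periods)
    start = np.ones((n_stocks, 1))          # P_i(1) = 1
    return np.hstack([start, paths])


prices = simulate_prices()
print(prices.shape)  # (2, 1000)
```

Each row of `prices` is one stock's path; the ratio of consecutive prices in a row recovers the corresponding factor $1 + 0.007 \times \epsilon_i(n)$.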

Let’s take two random price datasets: stock1 data.csv and
stock2 data.csv. Now we will use ChatGPT3.5 to generate a
Python program to test cointegration of these two random stocks.
Please first create a ChatGPT account using the following website: https://chat.openai.com/chat.

Once you have created the account, ask ChatGPT the
following question: “Please write a Python code to test cointegration of two stocks using both ADF and Johansen test.”
Test the Python code generated by ChatGPT and answer the
following questions:

##### i.

*Does the code generated by ChatGPT work with the data
you generated?*

In [2]:
import pandas as pd
import yfinance as yf
from statsmodels.tsa.stattools import adfuller
from statsmodels.tsa.vector_ar.vecm import coint_johansen

# Define the stock tickers and the date range
ticker1 = "AAPL"
ticker2 = "MSFT"
start_date = "2020-01-01"
end_date = "2021-12-31"

# Fetch stock price data from Yahoo Finance
df1 = yf.download(ticker1, start=start_date, end=end_date)['Adj Close']
df2 = yf.download(ticker2, start=start_date, end=end_date)['Adj Close']

# Ensure the data is sorted by date
df1 = df1.sort_index()
df2 = df2.sort_index()

# Perform the ADF test for each stock
adf_result1 = adfuller(df1)
adf_result2 = adfuller(df2)

print("ADF Test Results for Stock 1:")
print(f"ADF Statistic: {adf_result1[0]}")
print(f"P-Value: {adf_result1[1]}")
print(f"Critical Values: {adf_result1[4]}")

print("\nADF Test Results for Stock 2:")
print(f"ADF Statistic: {adf_result2[0]}")
print(f"P-Value: {adf_result2[1]}")
print(f"Critical Values: {adf_result2[4]}")

# Combine the stock price data into a DataFrame
price_df = pd.concat([df1, df2], axis=1)
price_df.columns = [ticker1, ticker2]

# Perform the Johansen test for cointegration
johansen_result = coint_johansen(price_df, det_order=0, k_ar_diff=1)

print("\nJohansen Cointegration Test Results:")
print(f"Trace Statistic: {johansen_result.lr1}")
print(f"Critical Values (Trace Statistic 90%): {johansen_result.cvt}")
print(f"Eigenvalues: {johansen_result.eig}")

# Check the cointegration hypothesis based on the Trace Statistic
trace_statistic_critical_value = johansen_result.cvt[:, 0]  # 90% confidence
if johansen_result.lr1[0] > trace_statistic_critical_value[0]:
    print("\nReject the null hypothesis: Cointegration is present.")
else:
    print("\nFail to reject the null hypothesis: No cointegration.")


[*********************100%***********************]  1 of 1 completed
[*********************100%***********************]  1 of 1 completed
ADF Test Results for Stock 1:
ADF Statistic: -0.19413459769564456
P-Value: 0.9392171559832246
Critical Values: {'1%': -3.4434437319767452, '5%': -2.8673146875484368, '10%': -2.569845688481135}

ADF Test Results for Stock 2:
ADF Statistic: -0.039214857513834016
P-Value: 0.9551663766326768
Critical Values: {'1%': -3.4436568932270095, '5%': -2.8674084917497074, '10%': -2.5698956806372832}

Johansen Cointegration Test Results:
Trace Statistic: [4.11817378 0.0287301 ]
Critical Values (Trace Statistic 90%): [[13.4294 15.4943 19.9349]
 [ 2.7055  3.8415  6.6349]]
Eigenvalues: [8.11321095e-03 5.72296407e-05]

Fail to reject the null hypothesis: No cointegration.


The code generated by ChatGPT runs without errors and uses appropriate libraries (pandas, yfinance, statsmodels). It loads the data into pandas objects, runs the ADF test for a unit root on each series and the Johansen test for cointegration on the pair, and retrieves the test statistics and critical values needed to evaluate the respective null hypotheses. As generated, however, it downloads AAPL and MSFT prices from Yahoo Finance rather than reading the two simulated price files.

##### ii.

*Please provide ways to improve the Python code generated
by ChatGPT. Please note we have price data not return
data.*

ChatGPT ran the tests on price data, whereas I would test return data for stationarity and cointegration. I would therefore modify the code to test the returns of Apple and Microsoft rather than their prices, since returns are much more likely to be stationary than prices.

In addition, the code that ChatGPT created lacks abstraction. Rather than writing it as a flat script, I would organize it into functions and document them, stating the assumptions made and the null and alternative hypotheses of each test.

##### iii.

*Interpret the ADF test result to see whether the stock1 and
stock 2 returns are cointegrated.*

In [3]:
import pandas as pd
import numpy as np
import yfinance as yf
from statsmodels.tsa.stattools import adfuller
from statsmodels.tsa.vector_ar.vecm import coint_johansen

# Define the stock tickers and the date range
ticker1 = "AAPL"
ticker2 = "MSFT"
start_date = "2020-01-01"
end_date = "2021-12-31"

# Fetch stock price data from Yahoo Finance
df1 = yf.download(ticker1, start=start_date, end=end_date)['Adj Close']
df2 = yf.download(ticker2, start=start_date, end=end_date)['Adj Close']

# Ensure the data is sorted by date
df1 = df1.sort_index()
df2 = df2.sort_index()

# Calculate the returns of the two stocks
returns1 = df1.pct_change().dropna()
returns2 = df2.pct_change().dropna()

# Perform the ADF test for each stock
adf_result1 = adfuller(returns1)
adf_result2 = adfuller(returns2)

print("ADF Test Results for Stock 1:")
print(f"ADF Statistic: {adf_result1[0]}")
print(f"P-Value: {adf_result1[1]}")
print(f"Critical Values: {adf_result1[4]}")

print("\nADF Test Results for Stock 2:")
print(f"ADF Statistic: {adf_result2[0]}")
print(f"P-Value: {adf_result2[1]}")
print(f"Critical Values: {adf_result2[4]}")

# Perform the Johansen test for cointegration
# Combine the returns of both stocks into a DataFrame
returns_df = pd.concat([returns1, returns2], axis=1)

# Perform the Johansen cointegration test
johansen_result = coint_johansen(returns_df, det_order=0, k_ar_diff=1)

print("\nJohansen Cointegration Test Results:")
print(f"Trace Statistic: {johansen_result.lr1}")
print(f"Critical Values (Trace Statistic 90%): {johansen_result.cvt}")
print(f"Eigenvalues: {johansen_result.eig}")

# Check the cointegration hypothesis based on the Trace Statistic
trace_statistic_critical_value = johansen_result.cvt[:, 0]  # 90% confidence
if johansen_result.lr1[0] > trace_statistic_critical_value[0]:
    print("\nReject the null hypothesis: Cointegration is present.")
else:
    print("\nFail to reject the null hypothesis: No cointegration.")

[*********************100%***********************]  1 of 1 completed
[*********************100%***********************]  1 of 1 completed
ADF Test Results for Stock 1:
ADF Statistic: -7.222297651838865
P-Value: 2.0960254760026063e-10
Critical Values: {'1%': -3.4436568932270095, '5%': -2.8674084917497074, '10%': -2.5698956806372832}

ADF Test Results for Stock 2:
ADF Statistic: -6.813505641798653
P-Value: 2.0872061315642546e-09
Critical Values: {'1%': -3.4436568932270095, '5%': -2.8674084917497074, '10%': -2.5698956806372832}

Johansen Cointegration Test Results:
Trace Statistic: [453.91966334 205.24544811]
Critical Values (Trace Statistic 90%): [[13.4294 15.4943 19.9349]
 [ 2.7055  3.8415  6.6349]]
Eigenvalues: [0.39125494 0.33613174]

Reject the null hypothesis: Cointegration is present.


After instructing ChatGPT to use the return data, we reject the null hypothesis that each series contains a unit root (i.e., is non-stationary). Both the stock 1 and stock 2 returns are therefore stationary at the 1% significance level (99% confidence level): their p-values are below 0.01, and their test statistics (-7.22 and -6.81) are below the 1% critical value of about -3.44.
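
The decision rule used here can be written out explicitly. The sketch below copies the statistics and critical values from the printed ADF output for the returns; the helper name `rejected_levels` is an assumption for illustration.

```python
# Values copied from the ADF output for the AAPL and MSFT returns
adf_results = {
    "Stock 1": {"stat": -7.222297651838865, "pvalue": 2.0960254760026063e-10},
    "Stock 2": {"stat": -6.813505641798653, "pvalue": 2.0872061315642546e-09},
}
critical_values = {
    "1%": -3.4436568932270095,
    "5%": -2.8674084917497074,
    "10%": -2.5698956806372832,
}


def rejected_levels(stat, crit):
    """Return the significance levels at which H0 (unit root) is rejected."""
    # The ADF test is left-tailed: reject when the statistic is below the critical value
    return [level for level, cv in crit.items() if stat < cv]


for name, res in adf_results.items():
    print(name, rejected_levels(res["stat"], critical_values))
```

Both series reject the unit-root null at the 1%, 5%, and 10% levels, which agrees with the p-values being far below 0.01.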

##### iv.

*Interpret the Johansen test result to see whether the stock1
and stock 2 returns are cointegrated.*

As seen in the results above, the trace statistic for the null hypothesis of no cointegration (453.92) exceeds the 90% critical value (13.43), as well as the 95% and 99% critical values, so we reject the null hypothesis that there is no cointegration between the two stocks. As such, we conclude in favor of the alternative hypothesis that cointegration exists between Apple and Microsoft.
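
The generated code compares only the first trace statistic with its 90% critical value. A slightly fuller reading, sketched below with the numbers from the printed output, walks through both rows in sequence; the function name `estimated_rank` is an assumption for illustration.

```python
# Trace statistics copied from the printed Johansen output
trace_returns = [453.91966334, 205.24544811]   # returns (part iii.)
trace_prices = [4.11817378, 0.0287301]         # prices (part i.)

# Critical values: rows are H0 rank <= 0 and rank <= 1; columns are 90%, 95%, 99%
crit = [[13.4294, 15.4943, 19.9349],
        [2.7055, 3.8415, 6.6349]]


def estimated_rank(trace_stats, crit, col):
    """Sequential trace procedure: count rejections until the first non-rejection."""
    rank = 0
    for stat, row in zip(trace_stats, crit):
        if stat <= row[col]:
            break
        rank += 1
    return rank


for label, col in [("90%", 0), ("95%", 1), ("99%", 2)]:
    print(label, estimated_rank(trace_returns, crit, col),
          estimated_rank(trace_prices, crit, col))
```

For the returns, both rows reject at every level, giving an estimated rank of 2; for the prices, the rank is 0. A rank equal to the number of series is usually read as the inputs being stationary themselves, which matches the ADF finding in iii. for the return series.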

##### v.

*Discuss whether you can trust the results from the (3) and
(4) questions, and explain the reason.*

I believe we can trust the results from questions *iii.* and *iv.* because the correct tests and libraries are being used. In addition, the logic ChatGPT uses is sound: it sets up the correct hypotheses and applies the correct rule for deciding whether to reject or fail to reject the null hypothesis.

I also checked its code against other code examples from online documentation, and it appears correct.