$$r_t=c+\theta_1\epsilon_{t-1}+\epsilon_t$$
- $r_t$ : The value of $r$ in the current period
- $c$ : A constant
- $\theta_1$ : A numeric coefficient for the value associated with the $1^{st}$ lag
- $\epsilon_t$ : Residuals for the period $t$
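- With certain restrictions, $MA(1)\approx AR(\infty)$: an $MA(1)$ model can be rewritten as an autoregressive model with infinitely many lags
- The $MA$ model relies on past residuals instead of past values of the variable itself
- $\vert \theta_1\vert <1$ : Ensures the model is invertible, which prevents the compounded effects of past errors from exploding in magnitude (for higher-order $MA$ models, the condition applies to the roots of the $MA$ polynomial)
- With $MA$ models, we rely on the $ACF$ rather than the $PACF$ to choose the order. $MA$ models are built on past residuals, not on past values, so determining which lagged values have a significant direct effect on the present-day ones (which is what the $PACF$ measures) is not relevant. For an $MA(q)$ process, the theoretical $ACF$ is zero beyond lag $q$.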
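The two properties above can be checked on simulated data. The sketch below uses numpy only and hypothetical parameters ($c=0$, $\theta_1=0.5$). It simulates an $MA(1)$ process, shows that its sample ACF is close to the theoretical $\theta_1/(1+\theta_1^2)=0.4$ at lag 1 and near zero afterwards, and recovers the residuals from a truncated $AR(\infty)$ expansion $\epsilon_t=\sum_{k\ge 0}(-\theta_1)^k(r_{t-k}-c)$, which converges only because $\vert\theta_1\vert<1$.

```python
import numpy as np

# Hypothetical parameters for illustration only
n, c, theta = 100_000, 0.0, 0.5
rng = np.random.default_rng(42)
eps = rng.standard_normal(n + 1)

# MA(1): r_t = c + theta * eps_{t-1} + eps_t  (r[i] pairs eps[i] with eps[i + 1])
r = c + theta * eps[:-1] + eps[1:]


def sample_acf(x, lag):
    """Sample autocorrelation of x at a single positive lag."""
    x = x - x.mean()
    return float(np.dot(x[lag:], x[:-lag]) / np.dot(x, x))


# Theory: 0.4 at lag 1, zero from lag 2 onwards
acf = {lag: sample_acf(r, lag) for lag in (1, 2, 3)}
print(acf)

# AR(infinity) view: eps_t = sum_k (-theta)^k (r_{t-k} - c), truncated at K terms.
# The weights shrink geometrically because |theta| < 1.
K = 30
weights = (-theta) ** np.arange(K)
eps_hat = np.convolve(r - c, weights, mode="valid")  # one value per t with K past returns
max_err = float(np.max(np.abs(eps_hat - eps[K:])))
print(max_err)  # truncation error is of order theta**K
```

With $\vert\theta_1\vert\ge 1$ the weights $(-\theta_1)^k$ would no longer shrink, and the truncated sum would not recover the residuals.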

## Importing the relevant packages

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import statsmodels.graphics.tsaplots as sgt
import statsmodels.tsa.stattools as sts
from statsmodels.tsa.arima.model import ARIMA
from scipy.stats.distributions import chi2
from math import sqrt
import seaborn as sns
from google.colab import drive  # only available inside Google Colab
import warnings
warnings.filterwarnings("ignore")
sns.set()

In [5]:
drive.mount("/content/drive")

## Importing the Data and Pre-processing

In [None]:
raw_csv_data = pd.read_csv("/content/drive/MyDrive/Formations/Time Series/Index2018.csv", index_col="date", parse_dates=True, dayfirst=True)
df_comp = raw_csv_data.copy()
df_comp = df_comp.asfreq("b")  # set a business-day frequency
df_comp = df_comp.ffill()      # fill missing (holiday) values with the previous value; fillna(method=...) is deprecated

In [None]:
df_comp["market_value"] = df_comp.ftse

In [None]:
df_comp.drop(columns=["spx", "dax", "ftse", "nikkei"], inplace=True)
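size = int(len(df_comp)*0.8)  # 80% training / 20% test, split chronologically
# .copy() avoids SettingWithCopyWarning when adding columns to df later
df, df_test = df_comp.iloc[:size].copy(), df_comp.iloc[size:].copy()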

## The LLR Test

In [3]:
"""mod_1, mod_2 : models we want to compare
DF : degrees of freedom"""
def LLR_test(mod_1, mod_2, DF=1):
  L1 = mod_1.fit().llf 
  L2 = mod_2.fit().llf
  LR = 2*(L2-L1) 
  p = chi2.sf(LR, DF).round(3)
  return p 
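`LLR_test` relies on `scipy.stats.chi2.sf`. For the degrees of freedom used in this notebook (1 and 2), the chi-square survival function has closed forms, $\operatorname{erfc}(\sqrt{x/2})$ for 1 DF and $e^{-x/2}$ for 2 DF, so the same p-value can be sketched with the standard library alone. The helper name `llr_p_value` and the log-likelihood values below are hypothetical; the function takes log-likelihoods directly instead of refitting models.

```python
import math


def llr_p_value(llf_simple, llf_complex, dof=1):
    """p-value of the log-likelihood ratio test for two nested models.

    Uses closed forms of the chi-square survival function, valid for 1 or 2 DF only.
    """
    lr = max(2.0 * (llf_complex - llf_simple), 0.0)  # LR statistic, clipped at 0
    if dof == 1:
        return math.erfc(math.sqrt(lr / 2.0))
    if dof == 2:
        return math.exp(-lr / 2.0)
    raise ValueError("closed form implemented for dof 1 or 2 only")


# Hypothetical log-likelihoods: the richer model gains 2 log-likelihood units
p1 = llr_p_value(-1000.0, -998.0, dof=1)  # LR = 4, one extra parameter
p2 = llr_p_value(-1000.0, -998.0, dof=2)  # LR = 4, two extra parameters
print(round(p1, 3), round(p2, 3))  # 0.046 0.135
```

The same log-likelihood gain is significant at the 5% level with one extra parameter but not with two, which is why `DF` must match the difference in the number of coefficients.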

## Creating Returns

In [4]:
df["returns"] = df.market_value.pct_change(1)*100

## ACF for Returns

In [None]:
sgt.plot_acf(df.returns[1:], zero = False, lags = 40)
plt.title("ACF for Returns", size=24)
plt.ylim(-0.085, 0.06)
plt.show()

## MA(1) for Returns

In [None]:
model_ret_ma_1 = ARIMA(df.returns[1:], order=(0, 0, 1))
results_ret_ma_1 = model_ret_ma_1.fit()
results_ret_ma_1.summary()
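The condition $\vert\theta_1\vert<1$ only covers $MA(1)$. For the higher-order models fitted below, invertibility requires every root of $1+\theta_1 z+\dots+\theta_q z^q$ to lie outside the unit circle. The numpy sketch below checks this; the helper name `ma_is_invertible` and the coefficient values are hypothetical. With statsmodels, fitted coefficients could be read from `results.params`, and the results object also exposes the roots through its `maroots` attribute.

```python
import numpy as np


def ma_is_invertible(thetas):
    """Check invertibility of an MA(q) model with coefficients [theta_1, ..., theta_q].

    The MA polynomial is 1 + theta_1 z + ... + theta_q z^q; the model is
    invertible when all of its roots lie strictly outside the unit circle.
    """
    # np.roots expects coefficients from the highest power down to the constant
    coeffs = list(reversed([float(t) for t in thetas])) + [1.0]
    roots = np.roots(coeffs)
    return bool(np.all(np.abs(roots) > 1.0)), roots


print(ma_is_invertible([0.5])[0])         # True: single root at z = -2
print(ma_is_invertible([1.5])[0])         # False: root at z = -2/3
# Each |theta| < 1, yet one root (z = 2/3) lies inside the unit circle
print(ma_is_invertible([-0.9, -0.9])[0])  # False
```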

## Higher-Lag MA Models for Returns

In [None]:
model_ret_ma_2 = ARIMA(df.returns[1:], order=(0, 0, 2))
results_ret_ma_2 = model_ret_ma_2.fit()
print(results_ret_ma_2.summary())
print("\nLLR test p-value = "+str(LLR_test(model_ret_ma_1, model_ret_ma_2)))

The new coefficient, for the second lag, has a p-value of zero, which makes it significant; the first lag, however, is not. This is consistent with the ACF, which suggests the first-lag coefficient should not be significant, so we can predict that the correct model will have a high p-value for the error term from one period ago.

In [None]:
model_ret_ma_3 = ARIMA(df.returns[1:], order=(0, 0, 3))
results_ret_ma_3 = model_ret_ma_3.fit()
print(results_ret_ma_3.summary())
print("\nLLR test p-value = "+str(LLR_test(model_ret_ma_2, model_ret_ma_3)))

In [None]:
model_ret_ma_4 = ARIMA(df.returns[1:], order=(0, 0, 4))
results_ret_ma_4 = model_ret_ma_4.fit()
print(results_ret_ma_4.summary())
print("\nLLR test p-value = "+str(LLR_test(model_ret_ma_3, model_ret_ma_4)))

In [None]:
model_ret_ma_5 = ARIMA(df.returns[1:], order=(0, 0, 5))
results_ret_ma_5 = model_ret_ma_5.fit()
print(results_ret_ma_5.summary())
print("\nLLR test p-value = "+str(LLR_test(model_ret_ma_4, model_ret_ma_5)))

In [None]:
model_ret_ma_6 = ARIMA(df.returns[1:], order=(0, 0, 6))
results_ret_ma_6 = model_ret_ma_6.fit()
print(results_ret_ma_6.summary())
print("\nLLR test p-value = "+str(LLR_test(model_ret_ma_5, model_ret_ma_6)))

The pattern holds for each additional lag until we go seven periods back, to the $MA(7)$.

The $MA(7)$ model produces a non-significant coefficient and fails the LLR test.

In [None]:
model_ret_ma_7 = ARIMA(df.returns[1:], order=(0, 0, 7))
results_ret_ma_7 = model_ret_ma_7.fit()
print(results_ret_ma_7.summary())
print("\nLLR test p-value = "+str(LLR_test(model_ret_ma_6, model_ret_ma_7)))

We add one more iteration because, in the ACF for returns examined earlier, the coefficient for the $7^{th}$ lag was not significant but the one for the $8^{th}$ lag was. Therefore, to be thorough, we bend the general model-selection rules stated earlier and check how an $MA(8)$ model fits the data set.

In [None]:
model_ret_ma_8 = ARIMA(df.returns[1:], order=(0, 0, 8))
results_ret_ma_8 = model_ret_ma_8.fit()
print(results_ret_ma_8.summary())
print("\nLLR test p-value = "+str(LLR_test(model_ret_ma_7, model_ret_ma_8)))

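At this point, we have (where $>$ means "is preferred to"):
$$MA(8)>MA(7), \qquad MA(6)>MA(7)$$

So we should compare the two preferred models directly:
$$MA(8)\ \text{vs.}\ MA(6)$$

* The two models differ by two coefficients, so the LLR test uses $2$ degrees of freedom. After estimation, we found an LLR test p-value below $0.05$; therefore, the more complex model performs better than the simpler one, even though it contains an additional non-significant coefficient.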

In [None]:
LLR_test(model_ret_ma_6, model_ret_ma_8, DF=2)

## Residuals for Returns

In [None]:
df["res_ret_ma_8"] = results_ret_ma_8.resid[1:]

In [None]:
print(f"mean : {round(df.res_ret_ma_8.mean(), 3)}")
print(f"std : {round(df.res_ret_ma_8.std(), 3)}")

In [None]:
df.res_ret_ma_8[1:].plot(figsize = (20, 5))
plt.title("Residual of Returns", size = 24)
plt.show()

In [None]:
sts.adfuller(df.res_ret_ma_8[2:])

* The first $8$ lags are incorporated in the model, so it's not surprising that their autocorrelation coefficients are essentially $0$.
* The following $9$ lags are also insignificant, which is a testament to how well our model performs.
* Significant coefficients some $18$ lags ago shouldn't play a major role in estimations. Markets adjust to shocks, so the further back in time we go, the less relevant past values and errors become.
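The bullets above judge the residual ACF by eye. A common numeric complement, not used in the original notebook, is the Ljung-Box statistic $Q = n(n+2)\sum_{k=1}^{h}\hat\rho_k^2/(n-k)$, which under the null of no autocorrelation is approximately $\chi^2$ distributed with $h$ degrees of freedom (fewer when applied to residuals of a fitted $ARMA$ model). statsmodels provides it as `acorr_ljungbox` in `statsmodels.stats.diagnostic`; the numpy sketch below, with the hypothetical helper `ljung_box_q` and simulated series, only shows what the statistic computes.

```python
import numpy as np


def ljung_box_q(x, max_lag):
    """Ljung-Box Q statistic over lags 1..max_lag."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = len(x)
    denom = np.dot(x, x)
    q = 0.0
    for k in range(1, max_lag + 1):
        rho_k = np.dot(x[k:], x[:-k]) / denom  # sample autocorrelation at lag k
        q += rho_k ** 2 / (n - k)
    return n * (n + 2) * q


rng = np.random.default_rng(0)
white = rng.standard_normal(2000)     # what well-behaved residuals should look like
e = rng.standard_normal(2001)
ma1 = e[1:] + 0.5 * e[:-1]            # residuals that still contain MA(1) structure

q_white = ljung_box_q(white, 10)
q_ma1 = ljung_box_q(ma1, 10)
print(q_white, q_ma1)  # small vs. very large compared with a chi-square(10)
```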

In [None]:
sgt.plot_acf(df.res_ret_ma_8[2:], zero = False, lags = 40)
plt.title("ACF of Residuals for Returns", size = 24)
plt.ylim(-0.05, 0.05)
plt.show()

## Normalized Returns

In [None]:
bench_ret = df.returns.iloc[1]
df["norm_ret"] = df.returns.div(bench_ret).mul(100)

In [None]:
sgt.plot_acf(df.norm_ret[1:], zero = False, lags = 40)
plt.title("ACF of Normalized Returns", size = 24)
plt.ylim(-0.08, 0.08)
plt.show()

In [None]:
model_norm_ret_ma_8 = ARIMA(df.norm_ret[1:], order = (0, 0, 8))
results_norm_ret_ma_8 = model_norm_ret_ma_8.fit()
results_norm_ret_ma_8.summary()

In [None]:
df["res_norm_ret_ma_8"] = results_ret_ma_8.resid[1:]

In [None]:
df.res_norm_ret_ma_8[1:].plot(figsize=(20, 5))
plt.title("Residuals of Normalized Returns", size=24)
plt.show()

In [None]:
sgt.plot_acf(df.res_norm_ret_ma_8[2:], zero = False, lags = 40)
plt.title("ACF of Residuals for Normalized Returns", size=24)
plt.ylim(-0.05, 0.05)
plt.show()

## MA Models For Prices
* As with autoregressive ($AR$) models, $MA$ models are less reliable when estimating non-stationary data such as prices.
* In the ACF for prices, the coefficients for all $40$ lags seem to be significant. This suggests that any higher-lag model would be preferred to any lower-lag one, which leads us to believe we would need an infinite-order $MA$ model to fit this data. Since such a model cannot be estimated, it seems that no moving average model would be a good estimator of prices. Before completely discrediting this result, let's fit an $MA$ model to prices and examine the results.

In [None]:
sgt.plot_acf(df.market_value, zero = False, lags = 40)
plt.title("ACF for Prices", size = 20)
plt.show()
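An ACF that stays high across all 40 lags is what a random walk produces. The numpy sketch below uses a simulated random-walk "price" (hypothetical data, not the FTSE series) and contrasts its ACF with that of its stationary increments.

```python
import numpy as np


def sample_acf(x, max_lag):
    """Sample autocorrelations of x for lags 1..max_lag."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array([np.dot(x[k:], x[:-k]) / denom for k in range(1, max_lag + 1)])


rng = np.random.default_rng(7)
steps = rng.standard_normal(5000)      # stationary increments (white noise)
prices = 1000 + np.cumsum(steps)       # non-stationary random-walk "price"

acf_prices = sample_acf(prices, 40)
acf_steps = sample_acf(steps, 40)
print(acf_prices[[0, 9, 39]])   # lags 1, 10, 40: still high, so every lag looks "significant"
print(np.abs(acf_steps).max())  # small: no lag of the increments stands out
```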

In [None]:
model_ma_1 = ARIMA(df.market_value, order=(0, 0, 1))
results_ma_1 = model_ma_1.fit()
results_ma_1.summary()

The one-lag moving average coefficient is equal to $0.9573$. This is extremely close to $1$, which means the model tries to keep almost the entire magnitude of the error from the previous period. To see why, write the model for two consecutive periods, with $x_t$ denoting the price:
$$x_{t-1}=c+\theta_1\epsilon_{t-2}+\epsilon_{t-1}$$
$$x_{t}=c+\theta_1\epsilon_{t-1}+\epsilon_{t}$$
Solving the first equation for $\epsilon_{t-1}=x_{t-1}-c-\theta_1\epsilon_{t-2}$ and substituting it into the second:
$$x_{t}=c+\theta_1(x_{t-1}-c-\theta_1\epsilon_{t-2})+\epsilon_{t}$$
If $\theta_1\approx 1$ :
$$x_{t}\approx c+x_{t-1}-c-\epsilon_{t-2}+\epsilon_{t}$$
$$x_{t}\approx x_{t-1}-\epsilon_{t-2}+\epsilon_{t}$$

This is effectively an autoregressive model with a coefficient of $1$ on $x_{t-1}$ that also takes into account the error from two periods ago.

**Conclusion** :

$MA$ models don't perform well on non-stationary data.

**Solution** :

Combine $AR$ and $MA$ models.
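The substitution step can be verified numerically. The numpy sketch below uses simulated errors, a hypothetical constant $c$, and the fitted $\theta_1=0.9573$ quoted above: the substituted equation reproduces $x_t$ exactly for any $\theta_1$, the simplified form $x_t=x_{t-1}-\epsilon_{t-2}+\epsilon_t$ becomes exact when $\theta_1=1$, and at $\theta_1=0.9573$ it is only an approximation.

```python
import numpy as np

rng = np.random.default_rng(3)
c = 5000.0                      # hypothetical constant (price level)
eps = rng.standard_normal(1000)


def ma1(theta):
    """x[i] = c + theta * eps[i] + eps[i + 1], i.e. x_t with epsilon_t = eps[i + 1]."""
    return c + theta * eps[:-1] + eps[1:]


# 1) Substitution: x_t = c + theta*(x_{t-1} - c - theta*eps_{t-2}) + eps_t holds for any theta
theta = 0.9573
x = ma1(theta)
substituted = c + theta * (x[:-1] - c - theta * eps[:-2]) + eps[2:]
err_sub = float(np.max(np.abs(x[1:] - substituted)))

# 2) With theta = 1 the model collapses to x_t = x_{t-1} - eps_{t-2} + eps_t
x1 = ma1(1.0)
err_collapsed = float(np.max(np.abs(x1[1:] - (x1[:-1] - eps[:-2] + eps[2:]))))

# 3) At theta = 0.9573 the collapsed form leaves a gap of (1 - theta) * (eps_{t-2} - eps_{t-1})
approx_gap = float(np.max(np.abs(x[1:] - (x[:-1] - eps[:-2] + eps[2:]))))
print(err_sub, err_collapsed, approx_gap)
```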