
# Time Series for Financial Data - TP n° 2 (GARCH Modeling)
---

Yann Kervella

# Importations

In [3]:
!pip install pyreadr


In [6]:
import pyreadr

# GARCH modelling
Recall that a GARCH($p,q$) process is defined by:
$$
r_t = \eta_t \sigma_t,
$$
$$
\sigma_t^2 = a_0 + \sum_{k=1}^p a_k r_{t-k}^2 + \sum_{k=1}^q b_k \sigma_{t-k}^2,
$$
where $\eta_t \stackrel{\text{i.i.d.}}{\sim} \mathcal{N}(0,1)$, $a_0>0$ and $a_1,\dots,a_p,b_1,\dots,b_q \geq 0$. 
Let $p=q=1$ and suppose that $a_1 + b_1 < 1$. Then the GARCH(1,1) process admits a weakly stationary solution with zero mean and finite variance:
$$
\text{Var}(r_t) = \frac{a_0}{1 - (a_1 + b_1)}.
$$
Suppose in addition that:
$$
b_1^2+2 a_1 b_1 + 3 a_1^2 <1,
$$
then $\mathbb{E}[\sigma_t^4]<\infty$ and we can compute the kurtosis of $r_t$:
$$
\mathcal{K} := \frac{\mathbb{E}[r_t^4]}{(\mathbb{E}[r_t^2])^2} = 3 + \frac{6a_1^2}{1-(b_1^2+2a_1 b_1 + 3a_1^2)}.
$$
**1) Simulate a GARCH(1,1) process of size $N=500$. Plot the time series using the function ts.plot(). What happens if the first inequality ($a_1 + b_1 < 1$) is not satisfied? Compute the sample variance, skewness and kurtosis of the simulated series and compare them with the formulas above.**
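
Since this notebook runs Python rather than R, the simulation can be sketched with NumPy as follows. This is a minimal sketch, not a reference solution: the function names, the burn-in length and the parameter values (borrowed from question 3) are assumptions made for illustration.

```python
import numpy as np

def simulate_garch11(a0, a1, b1, n, burn_in=500, seed=None):
    """Simulate r_t = eta_t * sigma_t with
    sigma_t^2 = a0 + a1 * r_{t-1}^2 + b1 * sigma_{t-1}^2."""
    rng = np.random.default_rng(seed)
    total = n + burn_in
    eta = rng.standard_normal(total)
    r = np.empty(total)
    sigma2 = np.empty(total)
    # Assumption: start at the stationary variance when it exists, else at a0.
    sigma2[0] = a0 / (1.0 - a1 - b1) if a1 + b1 < 1 else a0
    r[0] = eta[0] * np.sqrt(sigma2[0])
    for t in range(1, total):
        sigma2[t] = a0 + a1 * r[t - 1] ** 2 + b1 * sigma2[t - 1]
        r[t] = eta[t] * np.sqrt(sigma2[t])
    # Drop the burn-in so the initial condition has little influence.
    return r[burn_in:], sigma2[burn_in:]

def theoretical_moments(a0, a1, b1):
    """Variance and kurtosis from the formulas above (needs both inequalities)."""
    variance = a0 / (1.0 - (a1 + b1))
    denom = 1.0 - (b1 ** 2 + 2.0 * a1 * b1 + 3.0 * a1 ** 2)
    kurtosis = 3.0 + 6.0 * a1 ** 2 / denom
    return variance, kurtosis

def sample_moments(x):
    """Sample variance, skewness and (non-excess) kurtosis."""
    x = np.asarray(x, dtype=float)
    c = x - x.mean()
    m2 = np.mean(c ** 2)
    return m2, np.mean(c ** 3) / m2 ** 1.5, np.mean(c ** 4) / m2 ** 2

# Illustrative parameter values (taken from question 3).
r, sigma2 = simulate_garch11(0.1, 0.2, 0.3, n=500, seed=0)
print("sample (var, skew, kurt):", sample_moments(r))
print("theory (var, kurt):      ", theoretical_moments(0.1, 0.2, 0.3))
```

With only $N=500$ observations the sample kurtosis in particular can differ noticeably from its theoretical value. Choosing parameters with $a_1 + b_1 \geq 1$ in the same function is one way to see what happens when the first inequality fails.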

Let $r_1, r_2,...,r_N$ be observations of a GARCH(1,1) process. 

**2) Show that the (conditional) negated log-likelihood of the GARCH(1,1) model can be written as:**
$$
- L_n (\theta) = \frac{1}{2} \sum_{k=2}^{N}\left(\log(2\pi\sigma_k^2) + \frac{r_k^2}{\sigma_k^2}  \right).
$$
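
A brief sketch of the argument, assuming the recursion is initialised with a fixed value $\sigma_1^2$ (an assumption, since the statement does not specify the initialisation): conditionally on the past, $r_k = \eta_k \sigma_k$ is Gaussian with mean $0$ and variance $\sigma_k^2$, because $\sigma_k^2 = a_0 + a_1 r_{k-1}^2 + b_1 \sigma_{k-1}^2$ depends only on past values. The conditional density of $(r_2,\dots,r_N)$ given $r_1$ therefore factorises:
$$
p_\theta(r_2,\dots,r_N \mid r_1) = \prod_{k=2}^{N} \frac{1}{\sqrt{2\pi\sigma_k^2}} \exp\left(-\frac{r_k^2}{2\sigma_k^2}\right),
$$
and taking minus the logarithm gives
$$
-\log p_\theta(r_2,\dots,r_N \mid r_1) = \sum_{k=2}^{N}\left(\frac{1}{2}\log(2\pi\sigma_k^2) + \frac{r_k^2}{2\sigma_k^2}\right) = \frac{1}{2} \sum_{k=2}^{N}\left(\log(2\pi\sigma_k^2) + \frac{r_k^2}{\sigma_k^2}\right).
$$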



**3) Write a function *garchlogl()* that takes the GARCH parameters *(a0, a1, b1)* as inputs and returns the conditional negated log-likelihood. Simulate a GARCH(1,1) process with *a0 = 0.1, a1 = 0.2, b1 = 0.3, N = 500* and use the function *optim()* to recover the parameters by minimizing the negated log-likelihood.**
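
In Python, a possible counterpart of R's *optim()* is `scipy.optimize.minimize`. The sketch below is only one way to set this up: passing the data `r` as a second argument, initialising $\sigma_1^2$ with the sample variance, the starting point `x0` and the bounds are all assumptions made for illustration.

```python
import numpy as np
from scipy.optimize import minimize

def simulate_garch11(a0, a1, b1, n, seed=None):
    """Simulate a GARCH(1,1) path started at the stationary variance
    (requires a1 + b1 < 1)."""
    rng = np.random.default_rng(seed)
    eta = rng.standard_normal(n)
    r = np.empty(n)
    sigma2 = a0 / (1.0 - a1 - b1)
    for t in range(n):
        r[t] = eta[t] * np.sqrt(sigma2)
        sigma2 = a0 + a1 * r[t] ** 2 + b1 * sigma2
    return r

def garchlogl(params, r):
    """Conditional negated log-likelihood of a GARCH(1,1), summed over k = 2..N."""
    a0, a1, b1 = params
    r = np.asarray(r, dtype=float)
    n = len(r)
    sigma2 = np.empty(n)
    sigma2[0] = np.var(r)  # assumption: initialise with the sample variance
    for k in range(1, n):
        sigma2[k] = a0 + a1 * r[k - 1] ** 2 + b1 * sigma2[k - 1]
    s = sigma2[1:]
    return 0.5 * np.sum(np.log(2.0 * np.pi * s) + r[1:] ** 2 / s)

r = simulate_garch11(0.1, 0.2, 0.3, n=500, seed=0)

x0 = np.array([0.05, 0.1, 0.1])                  # arbitrary starting point
bounds = [(1e-6, None), (0.0, 1.0), (0.0, 1.0)]  # keeps sigma_k^2 positive
fit = minimize(garchlogl, x0, args=(r,), method="L-BFGS-B", bounds=bounds)
print("estimated (a0, a1, b1):", fit.x)
```

With $N=500$ the estimates can be fairly far from the true values $(0.1, 0.2, 0.3)$; increasing `n` in the simulation is a simple way to check that they get closer.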

# Analysis of the Default data set
We start by loading some packages.
```{r eval=FALSE}
library(forecast) 
library(rugarch)
```


## Description of the data
**4) Import the data set 'gle.Rdata' and look at the data.**
 
```{r eval=FALSE}
load(url('https://m2:map658@perso.telecom-paristech.fr/roueff/edu/tsfd/data/gle.Rdata'))
summary(gle)
```
We denote by $P_t$ the open price time series and $r_t = \log(\frac{P_t}{P_{t-1}})$ the daily log-returns. 

In [8]:
!curl 'https://m2:map658@perso.telecom-paristech.fr/roueff/edu/tsfd/data/gle.Rdata' --output data.RData



In [9]:
data = pyreadr.read_r('data.RData')

In [10]:
print(data.keys())

odict_keys(['gle'])


In [11]:
df = data['gle']
df.head()

| | Date | Open | High | Low | Last | Volume | Turnover |
|---|---|---|---|---|---|---|---|
| 0 | 2018-09-24 | 37.865 | 38.195 | 37.455 | 37.85 | 3340032.0 | 126289300.0 |
| 1 | 2018-09-21 | 38.105 | 38.43 | 37.725 | 37.85 | 8921993.0 | 338644500.0 |
| 2 | 2018-09-20 | 37.245 | 38.155 | 37.245 | 37.865 | 7372771.0 | 278961000.0 |
| 3 | 2018-09-19 | 36.41 | 37.16 | 36.365 | 37.16 | 6089761.0 | 225069900.0 |
| 4 | 2018-09-18 | 36.46 | 36.655 | 36.23 | 36.36 | 3412940.0 | 124203200.0 |
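
The five rows shown above are in reverse chronological order (most recent first), so the prices need to be put in chronological order before computing $r_t = \log(P_t / P_{t-1})$. A minimal sketch, assuming the whole file follows the same ordering; the helper name `log_returns` is hypothetical and the five open prices are copied from the rows above:

```python
import numpy as np

def log_returns(prices_newest_first):
    """Daily log-returns r_t = log(P_t / P_{t-1}) from prices listed newest first."""
    # Reverse so that index 0 is the oldest observation.
    p = np.asarray(prices_newest_first, dtype=float)[::-1]
    return np.diff(np.log(p))

# Open prices from the five rows displayed above (newest first).
open_prices = [37.865, 38.105, 37.245, 36.41, 36.46]
r_head = log_returns(open_prices)
print(r_head)  # four log-returns, oldest first
```

On the full data set, the same function could presumably be applied to `df["Open"].to_numpy()`.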


**5) Using the 'acf()' and 'pacf()' functions, plot the autocorrelation and partial autocorrelation functions of both $r_t$ and $r_t^2$. Comment. Is an ARMA model appropriate here? Explain.**
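
R's 'acf()' has no direct equivalent in NumPy, but the sample autocorrelation is short to write by hand. The sketch below is an illustration only: the function name `sample_acf` is hypothetical, the input `x` is a simulated placeholder standing in for the log-returns, and the partial autocorrelation is not covered.

```python
import numpy as np

def sample_acf(x, max_lag):
    """Sample autocorrelations rho(0), ..., rho(max_lag) of a series."""
    x = np.asarray(x, dtype=float)
    c = x - x.mean()
    denom = np.dot(c, c)
    n = len(c)
    return np.array([np.dot(c[: n - h], c[h:]) / denom for h in range(max_lag + 1)])

# Placeholder series standing in for the log-returns r_t.
rng = np.random.default_rng(0)
x = rng.standard_normal(1000)

acf_r = sample_acf(x, max_lag=20)        # autocorrelation of r_t
acf_r2 = sample_acf(x ** 2, max_lag=20)  # autocorrelation of r_t^2
band = 1.96 / np.sqrt(len(x))            # approximate 95% band under white noise
print(np.abs(acf_r[1:]).max(), np.abs(acf_r2[1:]).max(), band)
```

Comparing the two sets of autocorrelations with the band gives a rough numerical counterpart of the plots requested in the question.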

**6) Using both 'qqplot()' and 'qqline()', discuss the normality and symmetry of the log-returns distribution. Compare with a normal distribution.**

Since $r_t^2 - \sigma_t^2$ is a weak white noise, $r_t^2$ has an ARMA($p\vee q,q$) representation. 
Up to a constant, the AIC criterion is defined as follows:
$$
\text{AIC}(p,q) \sim \log( \hat{\sigma}^2(p,q) ) +2(p+q)/T,
$$
where $T$ is the number of observations and $\hat{\sigma}^2(p,q)$ is the estimated innovation variance of the fitted ARMA$(p,q)$ model. 
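
As an illustration of the ARMA representation mentioned above, the case $p=q=1$ can be worked out explicitly. Writing $\nu_t := r_t^2 - \sigma_t^2$ (the symbol $\nu_t$ is introduced here for convenience) and substituting $\sigma_t^2 = r_t^2 - \nu_t$ into the GARCH(1,1) recursion gives
$$
r_t^2 - \nu_t = a_0 + a_1 r_{t-1}^2 + b_1 (r_{t-1}^2 - \nu_{t-1}),
$$
that is,
$$
r_t^2 = a_0 + (a_1 + b_1)\, r_{t-1}^2 + \nu_t - b_1 \nu_{t-1},
$$
which is an ARMA(1,1) equation for $r_t^2$ with autoregressive coefficient $a_1 + b_1$ and moving-average coefficient $-b_1$.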

**7) Using the function 'Arima()', find the order $(p,q)\in\{1,\dots,5\}^2$ of the GARCH model that minimizes the AIC criterion.**

**8) Estimate the coefficients of the corresponding GARCH model. Hint: look at the functions ugarchspec() and ugarchfit().**

**9) Using the function 'forecast()', assess the quality of the one-step-ahead prediction of the model on the last 200 observations of $r_t^2$ and give the standard deviation of the prediction error.**

**10) Proceed as before, this time using the function 'ugarchforecast()'.**
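
For a GARCH(1,1) with known coefficients, the one-step-ahead predictor of $r_t^2$ given the past is $\sigma_t^2$, so the prediction errors can be sketched without 'rugarch'. The code below is a simplified illustration under stated assumptions: the coefficients are held fixed rather than re-estimated, $\sigma_1^2$ is initialised with the mean of $r_t^2$ over the training part, and the function name and the placeholder inputs are hypothetical.

```python
import numpy as np

def one_step_prediction_errors(r, a0, a1, b1, n_test=200):
    """Errors r_t^2 - sigma_t^2 over the last n_test observations,
    where sigma_t^2 is the GARCH(1,1) one-step-ahead predictor of r_t^2."""
    r2 = np.asarray(r, dtype=float) ** 2
    n = len(r2)
    sigma2 = np.empty(n)
    # Assumption: initialise with the mean of r_t^2 over the training part.
    sigma2[0] = r2[: n - n_test].mean()
    for t in range(1, n):
        sigma2[t] = a0 + a1 * r2[t - 1] + b1 * sigma2[t - 1]
    errors = r2[-n_test:] - sigma2[-n_test:]
    return errors, errors.std(ddof=1)

# Placeholder data and coefficients, for illustration only.
rng = np.random.default_rng(0)
r_demo = rng.standard_normal(1000) * np.sqrt(0.2)
errors, err_sd = one_step_prediction_errors(r_demo, a0=0.1, a1=0.2, b1=0.3)
print("standard deviation of the prediction error:", err_sd)
```

In practice the coefficients would come from the fitted model of question 8, ideally estimated without using the last 200 observations.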

# Exponential GARCH
In order to take into account possible asymmetry effects, we introduce the following exponential GARCH (EGARCH) model:
$$
r_t = \eta_t \sigma_t  
$$

$$
\log(\sigma_t^2 ) = a_0 + \sum_{k=1}^p  (a_k \eta_{t-k} + h_k(\eta_{t-k})) + \sum_{k=1}^q b_k \log(\sigma_{t-k}^2)
$$
where $h_k(\eta) = \gamma_k ( |\eta| - \mathbb{E} | \eta | )$ and $\eta_t \stackrel{i.i.d}{\sim} \mathcal{N}(0,1)$.
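
To get a feel for this model, the case $p=q=1$ can be simulated directly from the two equations above, using $\mathbb{E}|\eta| = \sqrt{2/\pi}$ for a standard normal variable. This is a minimal sketch: the function name, the burn-in length and the parameter values in the example call are assumptions made for illustration.

```python
import numpy as np

def simulate_egarch11(a0, a1, gamma1, b1, n, burn_in=500, seed=None):
    """Simulate r_t = eta_t * sigma_t with
    log(sigma_t^2) = a0 + a1*eta_{t-1} + gamma1*(|eta_{t-1}| - E|eta|)
                     + b1*log(sigma_{t-1}^2)."""
    rng = np.random.default_rng(seed)
    total = n + burn_in
    eta = rng.standard_normal(total)
    mean_abs_eta = np.sqrt(2.0 / np.pi)  # E|eta| for eta ~ N(0, 1)
    log_sigma2 = np.empty(total)
    # Assumption: start at the stationary mean of log(sigma_t^2) when |b1| < 1.
    log_sigma2[0] = a0 / (1.0 - b1) if abs(b1) < 1 else a0
    for t in range(1, total):
        h = gamma1 * (abs(eta[t - 1]) - mean_abs_eta)
        log_sigma2[t] = a0 + a1 * eta[t - 1] + h + b1 * log_sigma2[t - 1]
    sigma2 = np.exp(log_sigma2)
    r = eta * np.sqrt(sigma2)
    return r[burn_in:], sigma2[burn_in:]

# Illustrative parameter values: a negative a1 makes negative shocks raise volatility.
r_eg, sigma2_eg = simulate_egarch11(a0=-0.5, a1=-0.3, gamma1=0.2, b1=0.7,
                                    n=5000, seed=0)
corr = np.corrcoef(r_eg[:-1], np.log(sigma2_eg[1:]))[0, 1]
print("corr(r_{t-1}, log sigma_t^2):", corr)
```

Because the recursion is written on $\log(\sigma_t^2)$, no positivity constraint on the coefficients is needed, unlike in the GARCH case.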


**11) Using both functions 'ugarchspec()' and 'ugarchfit()', estimate an EGARCH model on the data (choose the same order as before). Which parameter of the EGARCH model produces an asymmetric distribution for the returns? Is this parameter significant for the data? Comment.**

**12) Repeat Q10), this time for the EGARCH model. Compare the prediction errors obtained for the GARCH and EGARCH models. Comment.** 

## Tiebreaker open question ##
**13) What are the orders $p$ and $q$ selected by cross-validation for both the previous GARCH and EGARCH models? Comment.**