# Course assessment: project part 1 (M. Garcin)

## Numerical processing of financial data

#### Objective
The goal of this project is to assess the existence of serial information in a time series of prices and in a time series of volatilities.

- Please write commented code. You can also use text cells for your comments and explanations.
- The code must follow good coding principles (clean, readable, etc.).
- Graphs must have a title and a legend, and their axes must be labeled.
- At the end, submit your notebooks __and a pdf file of 5 to 8 pages__ to the Moodle Assignment in the section "Assessment: projects".
This separate pdf file is intended to be a (very) short paper: a standalone document that summarizes the work carried out in the notebooks and the results obtained (whether positive or negative), with figures. It has to be self-contained: a reader must be able to understand your report without looking into your notebooks, which will be checked afterwards.


In [None]:
import numpy as np
import math
import matplotlib.pyplot as plt

# Dataset

You will use the dataset (Excel files) of one-minute observations (OHLC prices) provided in Moodle for TP1 (Session 1). The dataset consists of the 40 constituents of the French stock index CAC 40 (as of 2022/2023). The biggest capitalizations are: LVMH (MC), L'Oréal (OR), TotalEnergies (TTE), Sanofi (SAN), Hermès (RMS), Airbus (AIR), etc. Choose one of these stocks and work with it for the whole project.

The time range is September 2022 to March 2023. Note that some less liquid stocks (such as SW) do not have a price every minute: avoid them.
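One way to screen a stock for missing minutes is to count the observations available on each trading day. The sketch below does this on synthetic data; it assumes the prices are held in a pandas Series indexed by timestamp and that a full session contains 510 one-minute bars (9:00 to 17:29). Both are assumptions to adapt to the actual Excel files, and `minute_coverage` is a hypothetical helper name.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for one day of one-minute close prices (not real data).
index = pd.date_range("2022-09-01 09:00", "2022-09-01 17:29", freq="min")
rng = np.random.default_rng(0)
close = pd.Series(100 + rng.normal(0, 0.05, len(index)).cumsum(), index=index)

# Simulate a less liquid stock by removing every 7th observation.
illiquid = close.drop(close.index[::7])


def minute_coverage(prices, minutes_per_day=510):
    """Fraction of the expected one-minute bars observed on each day.

    minutes_per_day=510 is an assumption about the session length.
    """
    counts = prices.groupby(prices.index.normalize()).size()
    return counts / minutes_per_day


print(minute_coverage(close))
print(minute_coverage(illiquid))
```

A coverage well below 1 on many days suggests picking another stock.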



# Parkinson volatility (Part 1)

We are interested in the two estimators
$$\widehat{D_{1,n}}=\frac{1}{n\sqrt{\tau}}\sum_{i=1}^n d_{(i-1)\tau,i\tau}\sqrt{\frac{\pi}{8}}$$
and
$$\widehat{D_{2,n}}=\left(\frac{1}{n\tau}\sum_{i=1}^n d^2_{(i-1)\tau,i\tau}\frac{1}{4\ln(2)}\right)^{1/2},$$
where $d_{(i-1)\tau,i\tau}$ is the log high-low range (if necessary, see the first part of the course, from page 12).

* **1.a/** Write a function calculating the annualized Parkinson volatility $\widehat{D_{1,n}}$ of your asset using high-low ranges at a 1-minute interval.

* **1.b/** Do the same for the estimator $\widehat{D_{2,n}}$.

* **1.c/** Plot the two time series of Parkinson volatilities.
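As an illustration of the two estimators, here is a minimal sketch on toy bars. It assumes that $\tau$ is the length of one bar expressed in years, with 252 trading days of 510 one-minute bars per year (an assumption to adapt to the dataset), and that the squared ranges in the second estimator are normalized by $n\tau$, which is the standard Parkinson scaling. The function names are hypothetical.

```python
import numpy as np

# Assumption: one year = 252 trading days of 510 one-minute bars each.
MINUTES_PER_YEAR = 252 * 510
TAU = 1 / MINUTES_PER_YEAR  # length of one bar, in years


def log_ranges(high, low):
    """Log high-low range d of each bar."""
    return np.log(np.asarray(high, dtype=float) / np.asarray(low, dtype=float))


def parkinson_d1(high, low, tau=TAU):
    """Estimator based on the mean range, scaled by sqrt(pi / 8)."""
    d = log_ranges(high, low)
    return np.sqrt(np.pi / 8) * d.sum() / (len(d) * np.sqrt(tau))


def parkinson_d2(high, low, tau=TAU):
    """Estimator based on the mean squared range, scaled by 1 / (4 ln 2)."""
    d = log_ranges(high, low)
    return np.sqrt(np.sum(d ** 2) / (4 * np.log(2) * len(d) * tau))


# Toy one-minute bars (illustrative values, not market data).
high = np.array([100.10, 100.25, 100.05, 100.30])
low = np.array([99.95, 100.00, 99.90, 100.10])
print(parkinson_d1(high, low), parkinson_d2(high, low))
```

Applying these functions to the bars of each trading day in turn gives one value per day, that is, a time series of volatilities that can then be plotted.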



# Serial information (Part 2)

We are interested in the serial information of two financial daily time series: price returns and volatilities.
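A daily return series can be obtained from one-minute closes by keeping the last close of each day. The sketch below uses synthetic prices; it assumes a pandas Series of closes indexed by timestamp, log returns, and a binarization rule of 1 for a strictly positive return and 0 otherwise. These choices and the helper names are assumptions, not the course's prescribed method.

```python
import numpy as np
import pandas as pd

# Synthetic one-minute closes over 5 business days, 510 bars per day (assumption).
days = pd.bdate_range("2022-09-01", periods=5)
index = pd.DatetimeIndex(
    [d + pd.Timedelta(hours=9, minutes=m) for d in days for m in range(510)]
)
rng = np.random.default_rng(1)
close = pd.Series(100 * np.exp(rng.normal(0, 1e-4, len(index)).cumsum()), index=index)


def daily_log_returns(close):
    """Log returns between the last closes of two consecutive days."""
    last_close = close.groupby(close.index.normalize()).last()
    return np.log(last_close).diff().dropna()


def binarize(returns):
    """1 for a strictly positive return, 0 otherwise (assumed convention)."""
    return (np.asarray(returns) > 0).astype(int)


returns = daily_log_returns(close)
bits = binarize(returns)
print(returns)
print(bits)
```

The same two steps apply to a daily volatility series: take the log-variations from one day to the next, then binarize them.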

In order to get the serial information (if necessary, read again the first part of the course, from page 22), we need several functions.

First, we need Gray's binary code.
Indeed, for a given length $L<n$, $2^L$ binary sequences (of 0s and 1s) are possible. Gray's binary code makes it possible to order them (Gray's code is the index $i$ in the following vectors $G^L_i$). If $L=3$, the 8 sequences are $(G^3_1,...,G^3_8)=((0,0,0),(0,0,1),(0,1,1),(0,1,0),(1,1,0),(1,0,0),(1,0,1),(1,1,1))$.

We provide below the function returning Gray's binary code.

In [None]:
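def binary_Gray_code(binary_array):
    # binary_array contains a binarized sequence of returns (0s and 1s).
    # The sequence is read as a base-2 number, most significant bit first,
    # which gives a unique integer index between 0 and 2**L - 1.
    res = 0
    for val in binary_array:
        res = 2 * res + val
    return res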
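To see what the function returns, the short example below applies it to the eight sequences of length 3. The function is repeated so that the snippet runs on its own. As written, it returns the base-2 value of the sequence, between 0 and $2^L-1$; this gives each pattern a unique index, which is all that counting the patterns requires.

```python
from itertools import product


def binary_Gray_code(binary_array):
    # Same function as above, repeated to keep the example self-contained.
    res = 0
    for val in binary_array:
        res = 2 * res + val
    return res


# Map each of the 2**3 binary sequences to its integer code.
codes = {seq: binary_Gray_code(seq) for seq in product((0, 1), repeat=3)}
for seq, code in codes.items():
    print(seq, "->", code)

# For instance, (0, 1, 1) -> 3 and (1, 1, 0) -> 6.
```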

* **2.a/** Write a function returning the empirical estimator of the probability $p_i^L$ of sub-series of length $L$:
$$p^L_i=\mathbb P((X_{.},X_{.+1},...,X_{.+L-1})=G^L_i).$$
The arguments of this function are:
  * time_series, an array containing the binarized price returns (a sequence of 0s and 1s),
  * $L$, the length of the sub-series to be considered; these sub-series are all the sub-series of consecutive observations in the array time_series.

* **2.b/** Write a function returning the entropy $H^L$:
$$H^L=-\sum_{i=1}^{2^L} p^L_i\log_2\left(p^L_i\right).$$

* **2.c/** Write a function returning the serial information $I^{L+1}=1+H^L-H^{L+1}$, for $L\in [\![1,L^{\max}]\!]$. The two arguments of this function are time_series and $L^{\max}$.

* **2.d/** Write a function returning the asymptotic confidence bound for a zero serial information, using the gamma distribution presented in the course. Recall that, under the assumption of serial independence, $\widehat{I^{L+1}_n}$ follows a gamma distribution $\Gamma(k,\theta)$ with shape parameter $k=2^{L-1}$ and scale parameter $\theta=\frac{1}{n\ln(2)}$. The arguments of the function are:
  * $L$,
  * $n$,
  * a confidence level $\alpha$, equal to 0.95 by default.

* **2.e/** From your one-minute dataset of close prices of a stock, build a time series of daily price returns (between two consecutive ends of days). Apply the above functions to calculate the serial information of this time series. Plot this information as a function of $L$, with $L^{\text{max}}=6$. Is there any statistically significant serial information?



* **2.f/** Calculate the serial information of the time series of daily log-variations of one of the two Parkinson volatilities. Plot this information as a function of $L$, with $L^{\text{max}}=6$. Is there any statistically significant serial information?
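The chain of functions described in 2.a to 2.d could look like the sketch below, run here on a synthetic i.i.d. binary series rather than on market data. It assumes overlapping sub-series, base-2 indexing of the patterns, the convention $0\log_2 0=0$ for patterns that are never observed, and $n$ equal to the length of the series; the function names are hypothetical and `scipy.stats.gamma` supplies the quantile.

```python
import numpy as np
from scipy.stats import gamma


def pattern_probabilities(time_series, L):
    """Empirical frequency of each of the 2**L binary patterns of length L."""
    x = np.asarray(time_series, dtype=int)
    # All overlapping windows of L consecutive observations.
    windows = np.lib.stride_tricks.sliding_window_view(x, L)
    # Base-2 value of each window, most significant bit first.
    codes = windows @ (2 ** np.arange(L - 1, -1, -1))
    counts = np.bincount(codes, minlength=2 ** L)
    return counts / len(windows)


def entropy(time_series, L):
    """Entropy H^L in bits; patterns never observed contribute 0."""
    p = pattern_probabilities(time_series, L)
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))


def serial_information(time_series, L_max):
    """Array [I^2, ..., I^(L_max+1)], with I^(L+1) = 1 + H^L - H^(L+1)."""
    H = [entropy(time_series, L) for L in range(1, L_max + 2)]  # H^1 .. H^(L_max+1)
    return np.array([1 + H[L - 1] - H[L] for L in range(1, L_max + 1)])


def confidence_bound(L, n, alpha=0.95):
    """alpha-quantile of the gamma law with k = 2**(L-1), theta = 1 / (n ln 2)."""
    return float(gamma.ppf(alpha, a=2 ** (L - 1), scale=1 / (n * np.log(2))))


# Synthetic i.i.d. series of 0s and 1s (no serial dependence by construction).
rng = np.random.default_rng(42)
series = rng.integers(0, 2, size=2000)
L_max = 6
info = serial_information(series, L_max)
bounds = [confidence_bound(L, len(series)) for L in range(1, L_max + 1)]
for L, (i_val, bound) in enumerate(zip(info, bounds), start=1):
    print(f"L={L}: I^(L+1)={i_val:.5f}, bound={bound:.5f}")
```

A value of the serial information above the bound for a given $L$ would indicate a rejection of serial independence at the chosen level. With daily data over seven months, $n$ is small, so the bounds are wide for large $L$.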
