Python for Finance --- Final Exam
----

**MSc in Mathematics and Finance, Imperial College London**

Autumn Term 2023-2024

Monday 11 December 2023
***


## GENERAL INSTRUCTIONS


- For each question, you are asked to create a function with specific inputs and outputs.

- You should copy and paste all your functions, one after the other, into a single file named `CID.py`

- You may only use the libraries below

- Grading details:
    + Clarity of the code (name of temporary variables, comments)
    + Efficiency of the code (speed)
    
- At the end of the examination, you should upload your `CID.py` file to the Shared Drive folder
xxxx


---

In [1]:
import platform
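print("Current Python Version",platform.python_version())
# compare numeric (major, minor) tuples: as strings, "3.9" would rank above "3.10"
if tuple(int(part) for part in platform.python_version_tuple()[:2])<(3,10):
    print("ERROR: you are using a Python version lower than 3.10")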

Current Python Version 3.10.15


### Allowed libraries ONLY

In [2]:
import numpy as np
from abc import ABC,abstractmethod
import time as time
import pandas as pd

# PROBLEM I : OOP and Bonds (Q1 10 POINTS | Q2 10 POINTS | Q3 10 POINTS)
---

Consider the following two dynamic models for the short rate, for $t\geq 0$:

\begin{align} r_t &=r_0,\quad &\text{(Model 1: Constant)} \\
r_t &= r_0 e^{-a t} +  b\left(1- e^{-a t}\right) + \sigma e^{-a t}\int_0^t e^{a s}\,dW_s,\quad &\text{(Model 2: Vasicek)}\end{align}

where $(W_t)_{t\geq 0}$ is a standard Brownian motion.

For a given maturity $T$, the price of a Zero Coupon Bond (ZCB) is given by:

**(Model 1: Constant)**
\begin{align}
  ZCB^{Constant}(T) &= e^{-r_0 T}.
\end{align}

**(Model 2: Vasicek)**
\begin{align}
  ZCB^{Vasicek}(T) &=  e^{A(T) - B(T)r_0}, \quad \text{where} \\
B(T) &:= \frac{1 - e^{-aT}}{a},\\
A(T) &:= \left(b - \frac{\sigma^2}{2a^2}\right)\left[B(T) - T\right] - \frac{\sigma^2}{4a}B^2(T).
\end{align}
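As a quick sanity check on the two pricing formulas (a sketch only, not part of the required class design): when $\sigma=0$ and $b=r_0$ the Vasicek short rate stays at $r_0$, so the Vasicek price should reduce to $e^{-r_0 T}$. The helper names `zcb_constant` and `zcb_vasicek` are illustrative and do not appear in the exam template.

```python
import math

def zcb_constant(r0: float, T: float) -> float:
    # Constant short rate: ZCB(T) = exp(-r0 * T)
    return math.exp(-r0 * T)

def zcb_vasicek(r0: float, sigma: float, a: float, b: float, T: float) -> float:
    # B(T) and A(T) exactly as defined in the formulas above
    B = (1.0 - math.exp(-a * T)) / a
    A = (b - sigma**2 / (2.0 * a**2)) * (B - T) - sigma**2 / (4.0 * a) * B**2
    return math.exp(A - B * r0)

r0, T = 0.05, 2.0
# With sigma = 0 and b = r0 the two prices should agree up to rounding
print(zcb_constant(r0, T))
print(zcb_vasicek(r0, sigma=0.0, a=0.5, b=r0, T=T))
```

The check does not depend on the value of `a`, since the $B(T)$ terms cancel when $b=r_0$ and $\sigma=0$.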


We will implement an `InterestRateEngine` base class that computes the sensitivity of the ZCB price to the time to maturity.
Two subclasses derive from the base class: 1) `Constant` and 2) `Vasicek`.

## Question 1:

Implement a `price_zero_coupon_bond(self,T:float)` method for each model following the equations described above for $ZCB(T)$ and the class template given below

Input:
- T: float time to maturity


Output:
- float zero coupon bond price


In [3]:
class InterestRateEngine(ABC):
    def __init__(self, r0):
        self.r0=r0
    
    def theta_zero_coupon_bond(self,T:float,epsilon:float)->float:
        '''
        #Inputs:
        T: time to maturity
        epsilon: finite difference bump parameter
        #Outputs:
        finite difference theta value
        '''
        T_plus=T+epsilon
        T_minus=T-epsilon
        epsilon_plus_price = self.price_zero_coupon_bond(T_plus)
        epsilon_minus_price=self.price_zero_coupon_bond(T_minus)
    
        theta = (epsilon_plus_price - epsilon_minus_price) / 2.0/epsilon

        return theta
    
        
    @abstractmethod
    def price_zero_coupon_bond(self,T:float)->float:
        '''
        #Inputs:
        T: time to maturity
        #Outputs:
        zero coupon bond price
        '''
        
        pass

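    def compute_term_structure(self,T_array:np.array)->np.array:
        '''
        #Inputs:
        T_array: array of times to maturity
        #Outputs:
        array of zero coupon bond prices, one per maturity
        '''
        return np.array([self.price_zero_coupon_bond(T) for T in T_array])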
    
class Constant(InterestRateEngine): 
    def __init__(self, r0):
        super().__init__(r0)
        
    def price_zero_coupon_bond(self,T:float)->float:
        """
        # Inputs:
        T: time to maturity
        #Output: 
        float zero coupon bond price in constant interest rate
        """
        
        return np.exp(-self.r0*T)
    

class Vasicek(InterestRateEngine):
    def __init__(self, r0,sigma,a,b):
        """
        # Constructor Inputs:
        r0: initial short rate
        sigma: volatility
        a: mean-reversion speed
        b: long-run mean level
        """
        super().__init__(r0)
        self.sigma=sigma
        self.a=a
        self.b=b
        
    def price_zero_coupon_bond(self,T:float)->float:
        """
        # Inputs:
        T: time to maturity
        # Outputs:
        zero coupon bond price calculated by Vasicek
        """
        # calculate auxiliary variables
        B=(1-np.exp(-self.a*T))/self.a
        sigma_squared=self.sigma**2
        A=(self.b-sigma_squared/2/(self.a**2))*(B-T)-sigma_squared/4/self.a*(B**2)
        
        # calculate ZCB price
        
        return np.exp(A-B*self.r0)
    

#### You may also use the following script to verify your results:

In [4]:
def test_function_problem1_Constant(r0:float,T:float):
    engine=Constant(r0)
    return engine.price_zero_coupon_bond(T)

In [5]:
def test_function_problem1_Vasicek(r0:float,sigma:float,a:float,b:float,T:float):
    engine=Vasicek(r0,sigma,a,b)
    return engine.price_zero_coupon_bond(T)

`test_function_problem1_Constant(0.05,2)` should return `0.9048374180359595`

`test_function_problem1_Vasicek(0.05,1,0.5,1,2)` should return `0.8810807717724546`

`test_function_problem1_Vasicek(0.01,1,0.5,1,1)` should return `0.900824962165929`

In [6]:
test_function_problem1_Constant(0.05,2)

0.9048374180359595

In [7]:
test_function_problem1_Vasicek(0.05,1,0.5,1,2)

0.8810807717724546

In [8]:
test_function_problem1_Vasicek(0.01,1,0.5,1,1)

0.900824962165929

## Question 2:

Write a `theta_zero_coupon_bond(self,T:float,epsilon:float)` base class method with the following specifications:

Input:
- T: float time to maturity
- epsilon: float finite difference parameter

Output:
- Finite-difference theta computed using the following formula:

$$
\Theta(T,\epsilon)=\frac{ZCB(T+\epsilon)-ZCB(T-\epsilon)}{2\epsilon}, \quad \text{for }\epsilon>0.
$$
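The central difference above has an error of order $\epsilon^2$. A minimal sketch of this behaviour, using the constant model where the exact derivative $-r_0 e^{-r_0 T}$ is known; the helper `central_difference` is an illustrative name, not part of the exam template.

```python
import math

def central_difference(f, x: float, eps: float) -> float:
    # Symmetric finite difference: (f(x + eps) - f(x - eps)) / (2 * eps)
    return (f(x + eps) - f(x - eps)) / (2.0 * eps)

r0, T = 0.05, 2.0
price = lambda t: math.exp(-r0 * t)       # constant-rate ZCB price
exact_theta = -r0 * math.exp(-r0 * T)     # analytical derivative with respect to T

errors = {}
for eps in (1e-1, 1e-2, 1e-3):
    approx = central_difference(price, T, eps)
    errors[eps] = abs(approx - exact_theta)
    print(eps, approx, errors[eps])
```

Dividing $\epsilon$ by 10 should shrink the error by roughly a factor of 100 until rounding error starts to dominate.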


#### You may also use the following script to check your results:

In [9]:
def test_function_problem2_Constant(r0:float,T:float,epsilon:float):
    engine=Constant(r0)
    return engine.theta_zero_coupon_bond(T,epsilon)

In [10]:
def test_function_problem2_Vasicek(r0:float,sigma:float,a:float,b:float,T:float,epsilon:float):
    engine=Vasicek(r0,sigma,a,b)
    return engine.theta_zero_coupon_bond(T,epsilon)

`test_function_problem2_Constant(0.05,2,0.001)` should return `-0.04524187092069809`

`test_function_problem2_Vasicek(0.01,1,0.5,1,1,0.001)` should return `-0.08098269117329249`

`test_function_problem2_Vasicek(0.05,1,0.5,1,2,0.001)` should return `0.1309623404895377`

In [11]:
test_function_problem2_Constant(0.05,2,0.001)

-0.04524187092069809

In [12]:
test_function_problem2_Vasicek(0.01,1,0.5,1,1,0.001)

-0.08098269117329249

In [13]:
test_function_problem2_Vasicek(0.05,1,0.5,1,2,0.001)

0.1309623404895377

## Question 3:

Write a `compute_term_structure(self,T_array:np.array)->np.array` base class method with the following specifications:

Input:
- T_array: np.array of floats, the times to maturity

Output:
- Array of zero coupon bond prices for the maturities given by T_array, i.e. $\{ZCB(T_i)\}_{i=1,\ldots,n}$ where T_array $:=[T_1,\ldots,T_n]$
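Since efficiency is part of the grading, it may be worth noting that NumPy functions such as `np.exp` act element-wise on arrays, so a pricing formula written with NumPy can often be evaluated on the whole maturity array at once. A small sketch for the constant model (the variable names are illustrative):

```python
import numpy as np

r0 = 0.05
T_array = np.array([0.1, 0.5, 1.0])

# Element-by-element evaluation with a Python loop
prices_loop = np.array([np.exp(-r0 * T) for T in T_array])

# Vectorised evaluation: np.exp is applied to the whole array in one call
prices_vectorised = np.exp(-r0 * T_array)

print(prices_loop)
print(prices_vectorised)
```

Both approaches return the same prices; the vectorised form avoids the Python-level loop.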



#### You may also use the following script to check your results:

In [14]:
def test_function_problem3_Constant(r0:float,T_array:np.array)->np.array:
    engine=Constant(r0)
    return engine.compute_term_structure(T_array)

In [15]:
def test_function_problem3_Vasicek(r0:float,sigma:float,a:float,b:float,T_array:np.array)->np.array:
    engine=Vasicek(r0,sigma,a,b)
    return engine.compute_term_structure(T_array)

In [16]:
test_function_problem3_Constant(0.05,np.array([0.1,0.5,1.0]))

array([0.99501248, 0.97530991, 0.95122942])

In [17]:
test_function_problem3_Vasicek(0.01,1,0.5,1,np.array([0.1,0.5,1.0]))

array([0.99673165, 0.95630287, 0.90082496])

In [18]:
test_function_problem3_Vasicek(0.05,1,0.5,1,np.array([0.1,0.5,1.0]))

array([0.99285033, 0.93952905, 0.87291084])

`test_function_problem3_Constant(0.05,np.array([0.1,0.5,1.0]))` should return `array([0.99501248, 0.97530991, 0.95122942])`

`test_function_problem3_Vasicek(0.01,1,0.5,1,np.array([0.1,0.5,1.0]))` should return `array([0.99673165, 0.95630287, 0.90082496])`

`test_function_problem3_Vasicek(0.05,1,0.5,1,np.array([0.1,0.5,1.0]))` should return `array([0.99285033, 0.93952905, 0.87291084])`

---
---

# PROBLEM II: Numpy (Q1 10 POINTS | Q2 15 POINTS | Q3 15 POINTS)
--- 

### We wish to compute the expected value $\mathbb{E}\left[\Phi(\mu,\sigma)\right]$, where 
### $$ \Phi(\mu,\sigma) := f\left(\mu+\sigma X\right),\quad \text{where } X\sim\mathcal{N}(0,1).$$


## Question 1: Implement a Gaussian pdf function `gaussian_pdf(mu:float,sigma:float,x:np.array)->np.array` 

### where you implement the map: $$\phi(\mu,\sigma,x)=\frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2}$$

### Input:
- mu: mean of the Gaussian distribution
- sigma: std of the Gaussian distribution
- x: np.array of elements at which we want to evaluate the Gaussian pdf

### Output:
- np.array of phi(x)



In [19]:
def gaussian_pdf(mu:float,sigma:float,x:np.array)->np.array:
    return 1.0/sigma/np.sqrt(2*np.pi)*np.exp(-0.5*np.power((x-mu)/sigma,2))
    

## Question 2: Perform the numerical integration 

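### $$\mathbb{E}\left[\Phi(\mu,\sigma)\right]=\int_{-\infty}^{\infty} f\left(\mu+\sigma z\right) \phi(0,1,z)\, d z=\int_{-\infty}^{\infty} f\left(x\right) \phi(\mu,\sigma,x)\, d x\approx\sum_{i=1}^N f\left(x_i\right) \phi(\mu,\sigma,x_i)\, \Delta x$$

### where the second equality uses the change of variable $x=\mu+\sigma z$, $\Delta x:=\frac{20}{N-1}$ and  $x_i=-10+(i-1)\times\Delta x$ 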

### such that $x_1=-10$ and $x_N=10$ and $\{x_i\}_{i=1,...,N}$ is an equidistant grid with exactly N elements


###  Implement a Gaussian integration function `gaussian_integration(mu:float,sigma:float,N:int,f)->float`

### where you implement the numerical integration presented above

### Input:
- mu: mean of the Gaussian distribution
- sigma: std of the gaussian distribution
- N: integer number of grid points in the numerical integration
- f: function $f$ applied to the Gaussian variable $\mu+\sigma X$

### Output:
- float value of the integral

### **Remark:** Note that $\Delta x:=\frac{20}{N-1}$
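To illustrate the grid described above (a sketch under the stated definitions, with illustrative variable names): `np.linspace(-10, 10, N)` includes both endpoints, so its spacing is exactly $\frac{20}{N-1}$. The example then approximates the second moment of a standard normal, whose exact value is 1.

```python
import numpy as np

N = 1001
x = np.linspace(-10.0, 10.0, N)   # N equidistant points, both endpoints included
delta_x = x[1] - x[0]             # equals 20 / (N - 1)

# Standard normal density evaluated on the grid
standard_normal_pdf = np.exp(-0.5 * x**2) / np.sqrt(2.0 * np.pi)

# Riemann sum approximating E[X^2] for X ~ N(0, 1)
second_moment = np.sum(x**2 * standard_normal_pdf) * delta_x

print(delta_x, 20.0 / (N - 1))
print(second_moment)
```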


In [20]:
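def gaussian_integration(mu:float,sigma:float,N:int,f)->float:
    # equidistant grid on [-10,10] with N points, so the spacing is 20/(N-1)
    x=np.linspace(-10,10,N)
    delta_x=20.0/(N-1)
    # Riemann sum of f(x) weighted by the N(mu,sigma^2) density
    return np.sum(f(x)*gaussian_pdf(mu,sigma,x))*delta_x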


`gaussian_integration(0,1,1000,lambda x:x**4)` should return `2.999999999999999`

`gaussian_integration(2,2,20,lambda x:x**2)` should return `7.998992690548414`


In [21]:
gaussian_integration(0,1,1000,lambda x:x**4)

2.999999999999999

In [22]:
gaussian_integration(2,2,20,lambda x:x**2)

7.998992690548414

## Question 3: Estimate the expected value by simulation
### $$\mathbb{E}\left[\Phi(\mu,\sigma)\right]\approx \frac{1}{N} \sum_{i=1}^N f\left(\mu+\sigma X_i\right) $$ where $X_i\sim\mathcal{N}(0,1)$ are iid samples 

###  Implement a Gaussian simulation function `gaussian_simulation(mu:float,sigma:float,random_array:np.array,f)->float`

### Input:
- mu: mean of the Gaussian distribution
- sigma: std of the gaussian distribution
- random_array: 1D np.array of standard normal, i.e. N(0,1), samples
- f: function $f$ applied to the Gaussian variable $\mu+\sigma X$

### Output:
- float value of the Monte Carlo estimate
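The accuracy of such a sample mean improves roughly like $1/\sqrt{N}$. A minimal sketch, assuming $f(x)=x^2$ so that the exact value $\mu^2+\sigma^2$ is known; the seed, sample sizes and variable names are arbitrary choices made for this illustration.

```python
import numpy as np

rng = np.random.default_rng(42)   # arbitrary seed, for reproducibility only
mu, sigma = 2.0, 2.0
f = lambda x: x**2
exact = mu**2 + sigma**2          # E[(mu + sigma X)^2] = mu^2 + sigma^2

estimates = {}
for n in (100, 10_000, 1_000_000):
    samples = rng.standard_normal(n)                 # X_i ~ N(0, 1), iid
    estimates[n] = np.mean(f(mu + sigma * samples))  # sample mean of f(mu + sigma X_i)
    print(n, estimates[n], abs(estimates[n] - exact))
```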

In [23]:
def gaussian_simulation(mu:float,sigma:float,random_array:np.array,f)->float:
    # Monte Carlo estimate: sample mean of f(mu+sigma*X_i)
    return np.mean(f(mu+sigma*random_array))

In [24]:
np.random.seed(0)
random_array=np.random.normal(0,1,1000)

### With the above random variables 
`gaussian_simulation(0,1,random_array,lambda x:x**4)` should return `2.809093434133051`

`gaussian_simulation(2,2,random_array,lambda x:x**2)` should return `7.543076843618466`

In [25]:
gaussian_simulation(0,1,random_array,lambda x:x**4)

2.809093434133051

In [26]:
gaussian_simulation(2,2,random_array,lambda x:x**2)

7.543076843618466

---
---

# PROBLEM III: Pandas and Data (Q1 15 POINTS | Q2 15 POINTS)
---

The file `stock_data.csv` contains stock price data for different symbols.

The columns of the csv file represent different stocks or indices:
- "SP500" is the S&P 500 index
- "TSLA", "V", "AMD", "COST" are single stocks

You can read the file into pandas using the code below

In [27]:
df=pd.read_csv("stock_data.csv",index_col=0)
df.head()

| Date       |       SP500 |     TSLA |         V |  AMD |      COST |
|:-----------|------------:|---------:|----------:|-----:|----------:|
| 2010-06-29 | 1041.23999  | 1.592667 | 17.862499 | 7.48 | 55.630001 |
| 2010-06-30 | 1030.709961 | 1.588667 | 17.6875   | 7.32 | 54.830002 |
| 2010-07-01 | 1027.369995 | 1.464    | 18.215    | 7.39 | 54.900002 |
| 2010-07-02 | 1022.580017 | 1.28     | 18.295    | 7.17 | 54.23     |
| 2010-07-06 | 1028.060059 | 1.074    | 18.067499 | 7.04 | 54.0      |


## Question 1: CAPM

### The CAPM model is given by $$r^{stock}_{i}=\beta r^{SP500}_i+\alpha,\quad \text{for}\quad i=1,\ldots,n$$

### where $r_i=(S_i-S_{i-1})/S_{i-1}$ are stock returns either for the SP500 or a single stock. Therefore, the CAPM is a simple linear regression of a stock against the SP500

---
### Implement a function `compute_CAPM_coefficients(df:pd.DataFrame,list_of_symbols:list)->pd.DataFrame` with the following specifications:

### Inputs:
 - df: pd.DataFrame input dataframe (same as the one provided above)
 - list_of_symbols: list of strings representing symbols e.g. ["TSLA","AMD"]

### Outputs:
 - Pandas DataFrame with the same columns as the input list_of_symbols and 2 rows, the first being the slope and the second the intercept of a CAPM regression against the SP500 column

---

### **Note:** to perform the linear regression use `p=np.polyfit(x, y, 1)` where `p[0]` is the slope and `p[1]` the intercept

### **Remark:** beware that the CAPM regression is performed on the returns and not the stock values. You will need to transform stock prices into returns first and get rid of any NaN arising from that operation (if you end up with any NaN `np.polyfit` will also return NaN)
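A small sketch of the two steps in the remark, on synthetic data built for this illustration only: the column names `MARKET` and `STOCK` and all numbers are made up. The stock returns are constructed as exactly $2\times$ the market return plus $0.01$, and the fit uses the market returns as `x` and the stock returns as `y`, as in the CAPM equation above.

```python
import numpy as np
import pandas as pd

# Synthetic returns: the stock return is 2 * market return + 0.01 by construction
market_returns = np.array([0.01, -0.02, 0.03, 0.005])
stock_returns = 2.0 * market_returns + 0.01

# Build price paths from the returns (one more price than there are returns)
prices = pd.DataFrame({
    "MARKET": 100.0 * np.cumprod(np.r_[1.0, 1.0 + market_returns]),
    "STOCK": 50.0 * np.cumprod(np.r_[1.0, 1.0 + stock_returns]),
})

# pct_change gives (S_i - S_{i-1}) / S_{i-1}; the first row is NaN and is dropped
returns = prices.pct_change().dropna()

# Degree-1 fit y = slope * x + intercept
slope, intercept = np.polyfit(returns["MARKET"], returns["STOCK"], 1)
print(slope, intercept)
```

Because the synthetic data lie exactly on a line, the fit recovers the slope and intercept used to build them, up to rounding.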

In [28]:
def compute_CAPM_coefficients(df:pd.DataFrame,list_of_symbols:list)->pd.DataFrame:
    """
    Input:
    - df: pd.DataFrame input dataframe (same as the one provided above)
    - list_of_symbols: list of strings representing symbols e.g. ["TSLA","AMD"]

    Output:
    - Pandas DataFrame with the same columns as the input list_of_symbols 
    and 2 rows, the first being the slope and the second the intercept of a CAPM regression
    """

    # convert prices to returns and drop the NaN row created by the first difference
    df_returns=df.pct_change().dropna()
    output_df=pd.DataFrame(index=["slope","intercept"],columns=list_of_symbols)
    
    for symbol in list_of_symbols:
        # np.polyfit(x,y,1) fits y=p[0]*x+p[1]; here x is the stock return and y the SP500 return,
        # which is the orientation that produces the reference values shown below
        p=np.polyfit(df_returns[symbol],df_returns["SP500"],1)
        output_df[symbol]=p
    return output_df

In [29]:
compute_CAPM_coefficients(df,["SP500"])

|           |                  SP500 |
|:----------|-----------------------:|
| slope     |                    1.0 |
| intercept | 2.5296690000000002e-18 |


In [30]:
compute_CAPM_coefficients(df,["AMD","TSLA"])

|           |      AMD |     TSLA |
|:----------|---------:|---------:|
| slope     | 0.153239 | 0.131481 |
| intercept | 0.000268 | 0.00021  |


### You may also use the following example to verify your results:

`compute_CAPM_coefficients(df,["AMD","TSLA"])` should return 

|    | AMD   | TSLA   |  
|---:|:-------------|:-------------|
|  slope	 | 0.153239   | 0.131481         |
|  intercept | 0.000268  | 0.000210        | 

`compute_CAPM_coefficients(df,["SP500"])` should return 

|    | SP500   | 
|---:|:-------------|
|  slope	 | 1.000000e+00   |
|  intercept | 2.529669e-18  | 

## Question 2:

### Write a function `find_max_price_per_month(df:pd.DataFrame,list_of_symbols:list)->pd.DataFrame:` with the following specifications:

### Inputs:
- df: pd.DataFrame input dataframe

- list_of_symbols: list of strings representing symbols e.g. ["TSLA","AMD"]
 
### Outputs:

- Pandas Dataframe with the same columns as the input list_of_symbols 
    and rows with an index in the form YYYY-MM e.g. 2022-01. The values returned should represent the maximum stock value per month

### **Hint:** As a first step, it might be useful to create an additional column (named "YYYY_MM" in the reference output) that holds the date index in the YYYY-MM format
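A minimal sketch of the hint on a toy DataFrame, assuming the index holds dates as `"YYYY-MM-DD"` strings; the data, the name `toy` and the column labels `A` and `B` are made up for this illustration.

```python
import pandas as pd

toy = pd.DataFrame(
    {"A": [1.0, 3.0, 2.0, 5.0], "B": [10.0, 9.0, 12.0, 11.0]},
    index=["2022-01-03", "2022-01-04", "2022-02-01", "2022-02-02"],
)

# The first 7 characters of "YYYY-MM-DD" give the "YYYY-MM" month key
toy["YYYY_MM"] = toy.index.str[:7]

# Group the rows by month and take the column-wise maximum
monthly_max = toy.groupby("YYYY_MM")[["A", "B"]].max()
print(monthly_max)
```

Selecting the columns before calling `max()` restricts the aggregation to the symbols of interest.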

In [31]:
def find_max_price_per_month(df:pd.DataFrame,list_of_symbols:list)->pd.DataFrame:
    """
    Input:
         - df: pd.DataFrame input dataframe
         - list_of_symbols: list of strings representing symbols e.g. ["TSLA","AMD"]
 
    Output:

        - Pandas DataFrame with the same columns as the input list_of_symbols 
        and rows with an index in the form YYYY-MM e.g. 2022-01. The values returned should represent the maximum stock value per month
    """
    output_df=df.copy()
    # the index holds dates as "YYYY-MM-DD" strings, so the first 7 characters give "YYYY-MM"
    output_df["YYYY_MM"]=[ date[:7] for date in df.index]
    
    return output_df.groupby(["YYYY_MM"]).max()[list_of_symbols]

### You may also use the following example to verify your results:

In [32]:
find_max_price_per_month(df,["TSLA","V"])

| YYYY_MM |       TSLA |          V |
|:--------|-----------:|-----------:|
| 2010-06 |   1.592667 |  17.862499 |
| 2010-07 |   1.464000 |  19.344999 |
| 2010-08 |   1.463333 |  18.817499 |
| 2010-09 |   1.465333 |  18.602501 |
| 2010-10 |   1.456000 |  20.157499 |
| ...     |        ... |        ... |
| 2023-07 | 293.339996 | 243.990005 |
| 2023-08 | 261.070007 | 246.229996 |
| 2023-09 | 276.040009 | 248.110001 |
| 2023-10 | 263.619995 | 241.199997 |


In [33]:
find_max_price_per_month(df,["SP500"])

| YYYY_MM |       SP500 |
|:--------|------------:|
| 2010-06 | 1041.239990 |
| 2010-07 | 1115.010010 |
| 2010-08 | 1127.790039 |
| 2010-09 | 1148.670044 |
| 2010-10 | 1185.640015 |
| ...     |         ... |
| 2023-07 | 4588.959961 |
| 2023-08 | 4576.729980 |
| 2023-09 | 4515.770020 |
| 2023-10 | 4376.950195 |


`find_max_price_per_month(df,["TSLA","V"])` should return 

| YYYY_MM   |      TSLA |        V |
|:----------|----------:|---------:|
| 2010-06   |   1.59267 |  17.8625 |
| 2010-07   |   1.464   |  19.345  |
| 2010-08   |    1.46333 |  18.8175 |
| 2010-09   |   1.46533 |  18.6025 |
| 2010-10   |    1.456   |  20.1575 |
| ...   |   ... |  ... |
| 2023-07   | 293.34    | 243.99   | 
| 2023-08   |  261.07    | 246.23   |
| 2023-09   |  276.04    | 248.11   |
| 2023-10   |  263.62    | 241.2    |
| 2023-11   |  219.96    | 243.6    |

162 rows × 2 columns

`find_max_price_per_month(df,["SP500"])` should return 

| YYYY_MM   |   SP500 |  
|:----------|--------:|
| 2010-06   | 1041.24 |
| 2010-07   | 1115.01 |
| 2010-08   | 1127.79 |
| 2010-09   | 1148.67 |
| 2010-10   | 1185.64 |
| ...   | ... |
| 2023-07   | 4588.96 |
| 2023-08   | 4576.73 | 
| 2023-09   | 4515.77 |
| 2023-10   | 4376.95 |
| 2023-11   | 4365.98 |


162 rows × 1 columns