## Lecture 2 — Pandas (Series/DataFrames) + Real Data (BCRP + Yahoo) 
### Goal
Build a small **data pipeline** using the same sources as the lecture notebook:

- **BCRP daily FX**: PD04637PD (buy) and PD04638PD (sell)  
- **Yahoo Finance (yfinance)**: SPY, QQQ, TLT, GLD, EEM  
- **BCRP monthly policy rate**: PD12301MD  

**Final outputs**
1. One **clean daily dataset** (both **long** and **wide** formats)  
2. One **monthly dataset** with **policy rate + monthly SPY** (merged)  
3. **Five quick consistency checks** (validations)

### Tasks

1. Define `START` and `END` as in the notebook and explain why that date range is reasonable.  


2. Download BCRP series PD04637PD and PD04638PD (JSON) and build a DataFrame with: `date`, `fx_buy`, `fx_sell`.  


3. Convert `date` to `datetime`, sort by date, and verify the index/column is monotonic.  


4. Download Yahoo prices for SPY, QQQ, TLT, GLD, EEM and build a **long** DataFrame: `date`, `ticker`, `close`.  


5. Create a dictionary `{ticker: last_close}` using `groupby(...).last().to_dict()` and convert it to a Series sorted descending.  


6. *(Series)* Show three indexing methods for that Series: by label, by position, and by slice.  


7. Convert the long price DataFrame into **wide** format (columns = tickers) and verify expected dimensions.  


8. Apply a filter to keep (i) a subset of tickers (e.g., SPY, TLT, GLD) and (ii) a subset of dates (e.g., 2024+).  


9. Compute NA counts per column and discuss whether imputing or trimming the panel makes more sense.  


10. Impute missing `close` values by ticker (forward fill) and explain the risk of doing so.  


11. Create a `ret` (simple return) column by ticker using `pct_change` (after sorting by `ticker` and `date`).  


12. Detect and remove duplicates by key `(date, ticker)` if any exist. Explain how you detected them.  


13. Using `groupby(ticker)`, compute: mean return, volatility, and % of positive-return days.  


14. Group by month (derived from `date`) and compute `mean(close)` for SPY.  


15. Reshape: put returns into **wide** format (`date x ticker`), then convert back to long using `melt`.  


16. Download BCRP monthly policy rate PD12301MD and build a DataFrame: `date`, `policy_rate`.  


17. Convert SPY to monthly frequency (e.g., monthly average of `close`) as in the notebook.  


18. Perform an **inner merge** between monthly `policy_rate` and monthly `SPY_close_avg`. Report the number of rows.  


19. Export the monthly merged dataset to CSV (as in the notebook) and confirm you did **not** export the index.  


20. Write **5 validation checks** (assert-style), such as: “no duplicate dates”, “date is datetime”, “policy_rate is numeric”, etc.
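The reshaping and validation steps above (roughly steps 7, 10 to 13, 15 and 20) can be sketched on a tiny synthetic panel before touching the real downloads. This is only an illustration: the prices below are made up, and the names `prices_long`, `prices_wide`, `stats` and `ret_long` are assumptions, not names taken from the lecture notebook.

```python
import numpy as np
import pandas as pd

# Synthetic long-format prices (made-up numbers, NOT real market data).
dates = pd.date_range("2024-01-01", periods=4, freq="D")
prices_long = pd.DataFrame({
    "date": list(dates) * 2,
    "ticker": ["SPY"] * 4 + ["TLT"] * 4,
    "close": [100.0, 101.0, np.nan, 103.0, 50.0, 49.5, 49.0, 50.5],
})

# Step 12: drop duplicate (date, ticker) keys, if any.
prices_long = prices_long.drop_duplicates(subset=["date", "ticker"])

# Steps 10-11: sort, forward-fill within each ticker, then simple returns.
prices_long = prices_long.sort_values(["ticker", "date"]).reset_index(drop=True)
prices_long["close"] = prices_long.groupby("ticker")["close"].ffill()
prices_long["ret"] = prices_long.groupby("ticker")["close"].pct_change()

# Step 7: long -> wide (rows = dates, columns = tickers).
prices_wide = prices_long.pivot(index="date", columns="ticker", values="close")

# Step 13: per-ticker mean return, volatility and % of positive-return days.
stats = prices_long.groupby("ticker")["ret"].agg(
    mean_ret="mean",
    vol="std",
    pct_pos=lambda r: (r.dropna() > 0).mean() * 100,
)

# Step 15: returns in wide format, then back to long with melt.
ret_wide = prices_long.pivot(index="date", columns="ticker", values="ret")
ret_long = ret_wide.reset_index().melt(
    id_vars="date", var_name="ticker", value_name="ret"
)

# Step 20: assert-style validation checks.
assert pd.api.types.is_datetime64_any_dtype(prices_long["date"])
assert not prices_long.duplicated(subset=["date", "ticker"]).any()
assert prices_wide.shape == (4, 2)
assert prices_wide.index.is_monotonic_increasing
assert prices_long["close"].notna().all()
```

Note that the forward fill copies the previous close into the gap, so the return on that day is exactly zero. That is the risk step 10 asks you to discuss.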


# Exercise 1:

### The file ```AMZN_options.csv``` contains options data for Amazon. For those of you not familiar with option data, an option is a financial derivative with strike $K>0$ that pays at expiration time $T$:

$$C(K,T)=(S_T-K)_+ \text{  if a call option}$$
$$P(K,T)=(K-S_T)_+ \text{  if a put option}$$

It is well known that put-call parity implies the following relation:

$$C(K,T)-P(K,T)=S(T)-K\cdot DF(T)$$
where $S(T)$ is the underlying price and $DF(T)$ is the discount factor.
### a) Compute a new column labelled ```mid_price``` which corresponds to:
$$\text{mid price}=\frac{\text{bid}+\text{ask}}{2}$$
### b) For each available ```expiration_date``` perform a linear regression using the ```mid_price```, only for options whose ```trade_volume>25``` (note that both calls and puts need to satisfy this condition):
$$C(K,T)-P(K,T)=a+b K$$ where $a$ corresponds to $S(T)$ and $b$ corresponds to $-DF(T)$.

### You can use ```numpy.polyfit(x, y, deg=1)``` to fit a linear regression and obtain the coefficients
### c) Plot $S(T)$ and  $DF(T)$ as a function of $T$

### Note: you will need to convert dates into a time to expiry $T$ in years. To do so, you will need to use the ```datetime``` library. Here is an example
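A minimal sketch of the date-to-years conversion, assuming an ACT/365 day count and a hypothetical valuation date (neither is specified in the exercise, so replace them with the quote date and convention of your dataset):

```python
from datetime import date


def year_fraction(start: date, end: date, days_in_year: float = 365.0) -> float:
    """Time between two dates in years (ACT/365 convention, an assumption)."""
    return (end - start).days / days_in_year


# Hypothetical valuation date; use the quote date of your dataset instead.
valuation_date = date(2020, 1, 15)
expiry = date.fromisoformat("2021-01-15")

T = year_fraction(valuation_date, expiry)
```

Because 2020 is a leap year, there are 366 days between these two dates, so `T` comes out slightly above one year.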

# a)
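One possible way to compute the column, shown on a toy frame with made-up quotes that stands in for the contents of `AMZN_options.csv`. The column names `bid` and `ask` follow the formula above.

```python
import pandas as pd

# Toy rows standing in for AMZN_options.csv (values are made up).
option_chain = pd.DataFrame({"bid": [1.0, 2.5, 0.0], "ask": [2.0, 3.5, 0.0]})

# Vectorised mid price: average of bid and ask, row by row.
option_chain["mid_price"] = (option_chain["bid"] + option_chain["ask"]) / 2
```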

# b) first we find the different expirations available in the dataset

## We will now group our dataframe by ```expiration_date```, ```option_type``` and ```strike```, which gives one record per multi-index combination
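A sketch of that grouping on a toy chain. The `"C"`/`"P"` labels for `option_type` are an assumption about how the CSV encodes calls and puts, and all numbers are made up.

```python
import pandas as pd

# Toy chain; the "C"/"P" labels are an assumption about the CSV encoding.
option_chain = pd.DataFrame({
    "expiration_date": ["2021-01-15"] * 4,
    "option_type": ["C", "P", "C", "P"],
    "strike": [3000.0, 3000.0, 3100.0, 3100.0],
    "mid_price": [250.0, 150.0, 200.0, 195.0],
    "trade_volume": [40, 30, 50, 10],
})

# Keep liquid quotes only, then one record per (expiry, type, strike).
liquid = option_chain[option_chain["trade_volume"] > 25]
grouped = liquid.groupby(
    ["expiration_date", "option_type", "strike"]
)["mid_price"].mean()

# Calls and puts side by side; dropna keeps only strikes where BOTH legs
# passed the volume filter.
pairs = grouped.unstack("option_type").dropna()
pairs["c_minus_p"] = pairs["C"] - pairs["P"]
```

Here the 3100 put fails the volume filter, so only the 3000 strike survives as a usable call/put pair.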

# The following function performs the linear regression
## Note that we need at least 2 points to perform the regression; try/except is helpful here
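A minimal sketch of such a function. Since parity reads $C-P=S(T)-K\cdot DF(T)$, the intercept estimates $S(T)$ and the slope estimates $-DF(T)$, hence the sign flip below. The helper name `fit_parity` and the synthetic inputs are assumptions for illustration only.

```python
import numpy as np


def fit_parity(strikes, c_minus_p):
    """Fit C - P = a + b*K and return (S, DF) = (a, -b), or (nan, nan) on failure."""
    try:
        if len(strikes) < 2:
            raise ValueError("need at least two strikes")
        # polyfit returns coefficients with the highest degree first: slope, intercept.
        b, a = np.polyfit(strikes, c_minus_p, deg=1)
        return a, -b
    except (ValueError, TypeError, np.linalg.LinAlgError):
        return np.nan, np.nan


# Synthetic check with made-up values S = 3100 and DF = 0.98.
K = np.array([2900.0, 3000.0, 3100.0, 3200.0])
S_hat, DF_hat = fit_parity(K, 3100.0 - 0.98 * K)
```

The explicit length check matters because `numpy.polyfit` does not necessarily raise on a single point, so relying on try/except alone could let a meaningless fit through.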

### Finally we plot results

### As you can see first hand, data can be a challenging business. We see some unexpected behaviour, right? Why is this happening? Let's check our samples

### Something seems to happen on "2021-01-15", right?

### Here we go: we found that ```mid_price``` has a zero value. How can we fix this? We can simply require that ```mid_price``` is greater than zero!
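One way to express that check is to add it to the liquidity filter, sketched here on made-up rows:

```python
import pandas as pd

# Toy rows (made up); the middle one has a zero mid price despite high volume.
option_chain = pd.DataFrame({
    "strike": [3000.0, 3100.0, 3200.0],
    "mid_price": [250.0, 0.0, 120.0],
    "trade_volume": [40, 60, 35],
})

# Require both enough volume and a strictly positive mid price.
mask = (option_chain["trade_volume"] > 25) & (option_chain["mid_price"] > 0)
clean = option_chain[mask]
```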

### Bottom line: you will face this kind of "issue" when dealing with data. Be prepared to debug and adapt your code to detect these anomalies. You can of course leave the task without the cleaning part, but as a good future quant you should question these anomalies.

# Exercise 2:

### a) Using the forward prices $F(T)$ and Discount Factors $DF(T)$ obtained previously, calculate the implied volatility of each option using the  ```mid_price```. (Note that you will need to use the implied volatility calculation that you did in the Session 2 assignment)

Recall that in the Black-Scholes model, the value of a European Call option on $(S_t)_{t\geq 0}$ is given at inception by,
    $$C^{\mathrm{BS}}(S_0, K, T;\sigma) = S(T)\mathcal{N}(d_{+}) - DF(T)K\mathcal{N}(d_{-})$$
    $$d_{\pm} = \frac{\log\left(\frac{F(T)}{K}\right)}{\sigma\sqrt{T}} \pm\frac{\sigma\sqrt{T}}{2}$$
    
 Where $F(T)=\frac{S(T)}{DF(T)}$
  

### Remark 1: Try to optimize your code to execute efficiently
### Remark 2: Note that some mid prices might lead to arbitrage and the solution for implied volatility might not exist
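As a rough illustration of the inversion, the sketch below prices a call in forward form, $DF(T)\left(F(T)\mathcal{N}(d_+)-K\mathcal{N}(d_-)\right)$, which equals $S(T)\mathcal{N}(d_+)-DF(T)K\mathcal{N}(d_-)$ because $F(T)=S(T)/DF(T)$. It then solves for $\sigma$ by bisection. Bisection, the search bounds and the function names are assumptions chosen for simplicity. This is not the Session 2 implementation, and a vectorised solver would be needed to address Remark 1.

```python
from math import erf, log, sqrt


def norm_cdf(x: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def bs_call(F: float, K: float, T: float, DF: float, sigma: float) -> float:
    """Call price in forward form: DF * (F*N(d+) - K*N(d-))."""
    s = sigma * sqrt(T)
    d_plus = log(F / K) / s + 0.5 * s
    d_minus = d_plus - s
    return DF * (F * norm_cdf(d_plus) - K * norm_cdf(d_minus))


def implied_vol(price, F, K, T, DF, lo=1e-6, hi=5.0, tol=1e-10):
    """Bisection on sigma; nan if the price is not attainable for sigma in [lo, hi]."""
    if not (bs_call(F, K, T, DF, lo) <= price <= bs_call(F, K, T, DF, hi)):
        return float("nan")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        # The call price is increasing in sigma, so move the bracket accordingly.
        if bs_call(F, K, T, DF, mid) < price:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Round trip on made-up inputs: price with sigma = 0.35, then recover it.
F, K, T, DF = 3200.0, 3300.0, 0.5, 0.99
price = bs_call(F, K, T, DF, 0.35)
sigma_hat = implied_vol(price, F, K, T, DF)
```

Returning `nan` for prices outside the attainable range is one way to handle the arbitrage cases mentioned in Remark 2 without stopping the whole computation.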


### a) let us first deal with the fact that we were not able to compute $DF$ and $S$ for all expirations. To fix this we will need to apply some kind of interpolation

## For simplicity we will apply linear interpolation
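A sketch of the interpolation with `numpy.interp`, using made-up fitted values. The array names are assumptions, and `T` is the time to expiry in years.

```python
import numpy as np

# Expiries (in years) where the regression worked, with made-up fitted values.
T_known = np.array([0.1, 0.5, 1.0])
DF_known = np.array([0.999, 0.995, 0.990])
S_known = np.array([3100.0, 3098.0, 3095.0])

# All expiries in the chain, including those where the fit failed.
T_all = np.array([0.1, 0.3, 0.5, 0.75, 1.0, 1.5])

# np.interp needs increasing T_known; outside the known range it holds the
# nearest endpoint value flat instead of extrapolating.
DF_all = np.interp(T_all, T_known, DF_known)
S_all = np.interp(T_all, T_known, S_known)
```

The flat behaviour beyond the last fitted expiry (here `T = 1.5`) is worth keeping in mind, since a constant discount factor at long maturities is only a crude approximation.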

# Let us now fill our original option_chain dataframe with $DF$ and $S$
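One possible way to attach the per-expiry values to every option row is a left merge on `expiration_date`. The `curve` table, its column names `S`, `DF` and `F`, and all numbers below are hypothetical.

```python
import pandas as pd

# Toy chain (made-up rows).
option_chain = pd.DataFrame({
    "expiration_date": ["2021-01-15", "2021-01-15", "2021-06-18"],
    "strike": [3000.0, 3100.0, 3000.0],
})

# Hypothetical per-expiry table holding the fitted/interpolated values.
curve = pd.DataFrame({
    "expiration_date": ["2021-01-15", "2021-06-18"],
    "S": [3100.0, 3098.0],
    "DF": [0.999, 0.995],
})

# Left merge keeps every option row; validate guards against duplicate expiries
# in the curve table, which would silently multiply rows.
option_chain = option_chain.merge(
    curve, on="expiration_date", how="left", validate="many_to_one"
)

# Forward price F(T) = S(T) / DF(T).
option_chain["F"] = option_chain["S"] / option_chain["DF"]
```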

### We will create a smaller dataframe with the columns we need to perform the Black-Scholes to implied volatility transform