<hr style="border-width:2px;border-color:#094780">
<center><h1> Working with numbers and time series in Python </h1></center>
<hr style="border-width:2px;border-color:#094780">

### Introduction
In the world of finance, the ability to efficiently manipulate and analyze vast amounts of data is crucial. Financial data, whether it be stock prices, trading volumes, or economic indicators, often comes in large, complex datasets that need to be processed quickly and accurately. This is where Pandas and NumPy, two powerful Python libraries, come into play.

**NumPy** provides the foundation for **numerical computing** in Python. Its support for large, multi-dimensional arrays and matrices, along with a rich collection of mathematical functions, allows for the efficient computation of complex financial models. Whether you're calculating returns, risks, or optimizing portfolios, NumPy enables these operations to be performed quickly and with minimal code.

**Pandas**, on the other hand, is built on top of NumPy and is specifically designed for **data manipulation and analysis**. It introduces data structures like Series and DataFrames, which are ideal for handling tabular financial data. Pandas makes it easy to clean, filter, aggregate, and analyze time series data, which is a common task in finance. With Pandas, you can effortlessly transform raw data into actionable insights, enabling more informed decision-making.

Together, Pandas and NumPy form the backbone of financial data analysis in Python, empowering finance professionals and engineers to tackle a wide range of challenges, from simple data exploration to building complex financial models.

### Summary

 - <a href="#C1">Start by reading and writing files</a>
 - <a href="#C2">Introduction to NumPy</a>
 - <a href="#C3">Introduction to Pandas</a>

## <a name="C1">Start by reading and writing files</a>
When working with financial data, you may encounter different file formats such as TXT, CSV, and JSON. Python's standard library provides straightforward ways to read from and write to these files.

Here is an example of reading a CSV file:

In [None]:
import csv

file = open('financial_data.csv', 'r')
reader = csv.reader(file)
for row in reader:
    print(row)  # Print each row as a list
file.close()

With this approach, if an error occurs between opening and closing the file, `file.close()` is never reached and the file stays open. Each open file holds an operating-system file handle and a small amount of memory (RAM) for buffering, so files left open gradually accumulate resources. It is therefore safer to close the file even when errors occur in the code. Based on what we covered in the last session, we can use a `try`/`finally` clause to ensure this happens.

In [None]:
import csv

# Reading a CSV file
file = open('financial_data.csv', 'r')
try:
    reader = csv.reader(file)
    for row in reader:
        print(row)  # Print each row as a list
finally:
    file.close()

There is a simpler and more concise way to read a file than a `try`/`finally` clause: the `with` statement, shown below. It closes the file automatically when the block exits, even if an exception is raised, so there is no risk of forgetting to close it.

In [None]:
import csv

# Reading a CSV file
with open('financial_data.csv', 'r') as file:
    reader = csv.reader(file)
    for row in reader:
        print(row)  # Print each row as a list


In [None]:
## to compare opening file with and without 'with' clause

import csv

file = open('financial_data.csv', 'r')
reader = csv.reader(file)
for row in reader:
    print(row)  # Print each row as a list
raise RuntimeError("Simulated error")  # file.close() on the next line is never reached
file.close()

In [None]:
print(file.closed)
file.close()
print(file.closed)

In [None]:
import csv

# Reading a CSV file
with open('financial_data.csv', 'r') as file:
    reader = csv.reader(file)
    for row in reader:
        print(row)  # Print each row as a list
    raise RuntimeError("Simulated error")  # the `with` statement still closes the file

In [None]:
print(file.closed)

Just as we can read files in Python, we can also create and write to new files.

In [None]:
import csv

# Writing to a CSV file
data = [['Date', 'Price', 'Volume'], ['2024-09-01', '100.50', '1500'], ['2024-09-02', '101.00', '1600']]
with open('processed_data.csv', 'w', newline='') as file:
    writer = csv.writer(file)
    writer.writerows(data)  # Write multiple rows at once
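
The start of this section also mentions JSON files. As a minimal sketch, the standard-library `json` module can write and read them with the same `with` pattern. The file name `prices.json` and the record layout are assumptions made for illustration, and the file is written to a temporary directory so nothing existing is overwritten.

```python
import json
import os
import tempfile

# Hypothetical records, chosen only for illustration
records = [
    {"date": "2024-09-01", "price": 100.50, "volume": 1500},
    {"date": "2024-09-02", "price": 101.00, "volume": 1600},
]

# Build a path inside a fresh temporary directory
path = os.path.join(tempfile.mkdtemp(), "prices.json")

# Writing: json.dump serializes the list of dictionaries
with open(path, "w") as file:
    json.dump(records, file, indent=2)

# Reading: json.load parses the file back into Python objects
with open(path, "r") as file:
    loaded = json.load(file)

print(loaded[0]["price"])
print(loaded == records)
```

Unlike `csv.reader`, which returns every field as a string, `json.load` gives back numbers as `int` or `float`.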


## <a name="C2">Introduction to NumPy</a>
### **Importance of NumPy in Scientific Computing**

NumPy (Numerical Python) is one of the core libraries in Python for numerical and scientific computing. Its importance comes from several key features that make it indispensable for handling large-scale scientific and engineering problems. 

#### 1. **Efficient Array Manipulation**
At the heart of NumPy is the **ndarray** object, which allows for fast and efficient handling of multi-dimensional arrays (matrices, vectors). Unlike Python's built-in lists, which can store different data types and are slow for mathematical operations, NumPy arrays are **homogeneous** (all elements are of the same type) and optimized for performance. 


In [None]:
!pip install numpy

In [None]:
import numpy as np
import time

# Create large list and NumPy array
size = 10**6
python_list = list(range(size))
numpy_array = np.arange(size)

# Time the addition for Python list
start_time = time.perf_counter()  # perf_counter is a high-resolution clock intended for timing code
python_list_result = [x + 1 for x in python_list]
python_list_time = time.perf_counter() - start_time

# Time the addition for NumPy array
start_time = time.perf_counter()
numpy_array_result = numpy_array + 1
numpy_array_time = time.perf_counter() - start_time

# Print results
print(f"Python list took: {python_list_time:.5f} seconds")
print(f"NumPy array took: {numpy_array_time:.5f} seconds")


In [None]:
import numpy as np
import sys

# Create a large list and NumPy array
size = 10**6
python_list = list(range(size))
numpy_array = np.arange(size)

# Memory size of Python list: the list's own pointer array plus every int object it references
python_list_memory = sys.getsizeof(python_list) + sum(sys.getsizeof(x) for x in python_list)

# Memory size of NumPy array
numpy_array_memory = numpy_array.nbytes

# Print memory usage
print(f"Python list memory usage: {python_list_memory / 1024 / 1024:.5f} MB")
print(f"NumPy array memory usage: {numpy_array_memory / 1024 / 1024:.5f} MB")
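
The NumPy figure above follows directly from homogeneity: every element has the same fixed-size type, so the array's size in bytes is the number of elements times the bytes per element. A small sketch of how to inspect this (the array names and values are illustrative assumptions):

```python
import numpy as np

prices = np.array([100.5, 101.0, 99.8])          # floats are stored as float64 by default
mixed = np.array([1, 2.5, 3])                    # ints are upcast so all elements share one type
compact = np.array([1, 2, 3], dtype=np.float32)  # an explicit, smaller dtype

for name, arr in [("prices", prices), ("mixed", mixed), ("compact", compact)]:
    # nbytes equals (number of elements) x (bytes per element)
    print(name, arr.dtype, arr.itemsize, arr.nbytes, arr.size * arr.itemsize)
```

Choosing a smaller dtype such as `float32` halves the memory footprint, at the cost of reduced precision.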


#### 2. **Vectorization and Broadcasting**
NumPy's operations are **vectorized**, meaning you can apply mathematical functions to entire arrays without writing loops, resulting in cleaner, faster code. 

Additionally, NumPy supports **broadcasting**, a technique that automatically expands smaller arrays to match the size of larger ones during operations. This leads to more concise code and eliminates the need for manual resizing or complex loops.

In [None]:
# The vectorization's importance
import numpy as np
import time

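# Large dataset of stock returns and weights
# Note: with n = 10**8 the two float64 arrays take about 1.6 GB of RAM and the loop below is slow;
# lower n (for example to 10**6) on a machine with limited memory.
n = 10**8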

returns = np.random.random(n)
weights = np.random.random(n)

# Non-vectorized loop approach
start_time = time.perf_counter()  # perf_counter is a high-resolution clock intended for timing code
portfolio_return = 0
for i in range(n):
    portfolio_return += returns[i] * weights[i]
loop_time = time.perf_counter() - start_time

# Vectorized NumPy approach
start_time = time.perf_counter()
portfolio_return_vectorized = np.dot(returns, weights)
vectorized_time = time.perf_counter() - start_time

print(f"Loop approach took: {loop_time:.5f} seconds")
print(f"Vectorized approach took: {vectorized_time:.5f} seconds")


**Key Differences:**
- **Performance:** NumPy runs this operation in compiled C code, making it much **faster** than an explicit Python loop over the elements.
- **Readability:** The vectorized code is **cleaner** and **easier to understand**. Instead of writing an explicit loop, you simply express the operation as a dot product between two arrays.


In [None]:
# The broadcasting's importance
import numpy as np

# Portfolio stock prices over 3 days (3x4 matrix for 4 stocks)
prices = np.array([[100, 102, 98, 105],
                   [101, 103, 99, 106],
                   [102, 104, 100, 107]])

# Growth factors, one per stock (shape (4,)): 1.01 means +1%, 0.99 means -1%
pct_changes = np.array([1.01, 1.02, 0.99, 1.03])

# Broadcasting applies each stock's factor to that stock's price on every day
new_prices = prices * pct_changes
print(new_prices)


In [None]:
import numpy as np

# A 2x3 matrix
matrix = np.array([[1, 2, 3], [4, 5, 6]])

# A vector of shape (3,)
vector = np.array([10, 20, 30])

# Broadcasting the vector across the matrix rows
result = matrix + vector
print(result)
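
Both examples above broadcast a vector across the rows of a matrix. Broadcasting down the columns requires a column-shaped operand, which is a common stumbling block. The sketch below uses hypothetical return figures and contrasts the two directions.

```python
import numpy as np

# Hypothetical daily returns: 3 days (rows) x 4 stocks (columns)
returns = np.array([[0.01, 0.02, -0.01, 0.03],
                    [0.00, 0.01, 0.02, -0.02],
                    [0.02, -0.01, 0.01, 0.00]])

# Shape (4,): one mean per stock, broadcast across the rows
col_means = returns.mean(axis=0)
demeaned_by_stock = returns - col_means

# Shape (3, 1): one mean per day. keepdims=True keeps the column axis,
# so the result broadcasts across the columns. A plain shape (3,) vector
# would not be compatible with the (3, 4) matrix.
row_means = returns.mean(axis=1, keepdims=True)
demeaned_by_day = returns - row_means

print(col_means.shape, row_means.shape)
print(np.allclose(demeaned_by_stock.mean(axis=0), 0))
print(np.allclose(demeaned_by_day.mean(axis=1), 0))
```

NumPy compares shapes from the trailing dimension backwards: two dimensions are compatible when they are equal or when one of them is 1.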


#### 3. **Linear Algebra and Matrix Operations**
Scientific computing relies heavily on **linear algebra**, and NumPy provides robust support for matrix operations, including:
- **Matrix multiplication** and **dot products**.
- **Matrix inversion** and **determinants**.
- **Eigenvalues and eigenvectors**.
- **Solving systems of linear equations**.



In [None]:
# Matrix inversion
import numpy as np

# Define a 2x2 matrix
A = np.array([[4, 7],
              [2, 6]])

# Compute the inverse of the matrix
A_inv = np.linalg.inv(A)

print("Matrix A:\n", A)
print("Inverse of A:\n", A_inv)

# Verify by multiplying the matrix with its inverse (should give identity matrix)
I = A @ A_inv
print("A @ A_inv (Identity Matrix):\n", np.round(I, decimals=2))

In [None]:
# Determinant of a Matrix
import numpy as np

# Define a 3x3 matrix
B = np.array([[6, 1, 1],
              [4, -2, 5],
              [2, 8, 7]])

# Compute the determinant
det_B = np.linalg.det(B)

print("Matrix B:\n", B)
print("Determinant of B:", det_B)


In [None]:
# Eigenvalues and Eigenvectors
import numpy as np

# Define a 2x2 matrix
C = np.array([[4, -2],
              [1, 1]])

# Compute eigenvalues and eigenvectors
eigenvalues, eigenvectors = np.linalg.eig(C)

print("Matrix C:\n", C)
print("Eigenvalues:\n", eigenvalues)
print("Eigenvectors:\n", eigenvectors)


In [None]:
# Solving a System of Linear Equations: A . x = b
import numpy as np

# Coefficient matrix A
A = np.array([[3, 1], [1, 2]])

# Constant vector b
b = np.array([9, 8])

# Solve the system A @ x = b
x = np.linalg.solve(A, b)

print("Solution to the system (x):", x)
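
Matrix multiplication and dot products are listed at the top of this section. A short finance-flavoured sketch is the portfolio variance $w^T \Sigma w$. The weights and covariance matrix below are hypothetical values chosen for illustration.

```python
import numpy as np

# Hypothetical weights and covariance matrix for a 3-asset portfolio
w = np.array([0.5, 0.3, 0.2])
cov = np.array([[0.040, 0.006, 0.004],
                [0.006, 0.090, 0.010],
                [0.004, 0.010, 0.160]])

# Portfolio variance w^T . cov . w using the @ operator
port_var = w @ cov @ w
port_vol = np.sqrt(port_var)

# The same quantity written with np.dot
port_var_dot = np.dot(w, np.dot(cov, w))

# For contrast: * multiplies element by element, @ is the matrix product
elementwise = cov * cov
matrix_product = cov @ cov

print("Portfolio variance:", port_var)
print("Portfolio volatility:", port_vol)
print("Same result with np.dot:", np.isclose(port_var, port_var_dot))
```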



#### 4. **Wide Range of Mathematical Functions**
NumPy provides an extensive set of mathematical functions, including:
- **Trigonometric**, **logarithmic**, and **exponential** functions.
- **Statistical** functions like mean, median, standard deviation, and variance.
- **Fourier transforms** for signal processing.

In [None]:
import numpy as np

# Define angles in radians
angles = np.array([0, np.pi/2, np.pi])

# Compute sine, cosine, and tangent
sin_values = np.sin(angles)
cos_values = np.cos(angles)
tan_values = np.tan(angles)

print("Sine:", sin_values)
print("Cosine:", cos_values)
print("Tangent:", tan_values)


x = np.array([1, 2, 3])

# Exponential function
exp_values = np.exp(x)
print("Exponential:", exp_values)  

# Natural logarithm (log base e)
log_values = np.log(x)
print("Natural Log:", log_values)  


data = np.array([10, 20, 30, 40, 50])

# Mean
mean_value = np.mean(data)
print("Mean:", mean_value)

# Median
median_value = np.median(data)
print("Median:", median_value)

# Standard deviation (population formula, ddof=0, by default; pass ddof=1 for the sample estimate)
std_deviation = np.std(data)
print("Standard Deviation:", std_deviation)
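
The list above also mentions Fourier transforms, which the cell does not demonstrate. A minimal sketch with `np.fft`, assuming a synthetic 5 Hz sine wave sampled at 100 Hz (both values are arbitrary choices for illustration):

```python
import numpy as np

# Hypothetical signal: a 5 Hz sine wave sampled at 100 Hz for one second
sample_rate = 100
t = np.arange(sample_rate) / sample_rate
signal = np.sin(2 * np.pi * 5 * t)

# rfft computes the transform of a real-valued signal;
# rfftfreq returns the frequency (in Hz) of each output bin
spectrum = np.fft.rfft(signal)
freqs = np.fft.rfftfreq(signal.size, d=1 / sample_rate)

# The bin with the largest magnitude is the dominant frequency
dominant = freqs[np.argmax(np.abs(spectrum))]
print("Dominant frequency (Hz):", dominant)
```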


**Exercise 1**
- Convert the following list to a numpy array:
```python
l = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

- Use the resulting array to extract a NumPy array containing its even elements.


**Exercise 2**
- Given the numpy array from exercise 1, reshape the array so that it becomes a 2 by 5 matrix.


**Exercise 3**
- Simulate 100 draws from a uniform(0, 1) distribution.
- Simulate 100 draws from a normal(0, 1) distribution.


**Exercise 4**
Write a one-liner to compute the transpose of the following matrix:
```python
M = np.array([[1,2,3], [4,5,6], [7,8,9]])
```


**Exercise 5**

- Given the following matrix M:
```python
M = np.array([[1,2,3], [2,1,4], [3,4,1]])
```

- Find the decomposition M = PDP^-1, where D is a diagonal matrix holding the eigenvalues of M and P is an invertible (change-of-basis) matrix whose columns are the eigenvectors of M.

- Compute the product PDP^-1. Do you get the matrix M back?

- Display the dimensions of the matrix M.


**Exercise 6**
- Read the data from `linreg_data.csv` using `np.genfromtxt("linreg_data.csv", delimiter=",", dtype=np.float64)` and find the coefficients beta = [beta0, beta1] that minimize ||y - A beta||, where ||.|| is the l2 norm. Here, y is the vector of the y-coordinates yi, A is an N by 2 matrix with ones in the first column and the xi in the second column, and i goes from 1 to N.

**Exercise 8**

**Implementing a Simple Monte Carlo Simulation for Option Pricing**

The goal of this task is to implement a **Monte Carlo simulation** to estimate the price of a **European call option** using **NumPy**. A European call option gives the holder the right (but not the obligation) to buy an asset at a specified strike price $K$ on a specified expiration date $T$.

In the Monte Carlo method, we simulate multiple possible future paths for the asset's price based on a stochastic model, typically the **Geometric Brownian Motion (GBM)** model, and compute the average payoff of the option over those simulated paths. The average payoff is then discounted to present value to estimate the option price.

**Steps to Implement:**
1. **Inputs**:
   - Initial stock price $S_0$.
   - Strike price $K$.
   - Time to maturity $T$ (in years).
   - Risk-free interest rate $r$.
   - Volatility $\sigma$ of the underlying asset.
   - Number of simulations $N$.

2. **Model**: The asset price follows **Geometric Brownian Motion (GBM)**, described by the following stochastic differential equation:

   $$ dS = r S \, dt + \sigma S \, dW $$

   Where:
   - $S$ is the asset price.
   - $r$ is the risk-free rate.
   - $\sigma$ is the volatility of the asset.
   - $dW$ represents the Wiener process (random component).

3. **Monte Carlo Simulation**:
   - Simulate $N$ different paths for the asset price at maturity $T$.
   - Compute the payoff for each path: $\max(S_T - K, 0)$ for a call option.
   - Average the payoffs and discount them to the present value using the risk-free rate.

4. **Output**: The estimated price of the call option.

**Formula:**

The asset price at maturity $T$ can be simulated using:

$$
S_T = S_0 \times \exp \left( (r - 0.5 \sigma^2) T + \sigma \sqrt{T} Z \right)
$$

Where:
- $Z$ is a random variable drawn from the standard normal distribution, $Z \sim N(0, 1)$.

The payoff of the call option for each path is:

$$
\text{Payoff} = \max(S_T - K, 0)
$$

The option price is the present value of the average payoff:

$$
\text{Option Price} = e^{-rT} \times \frac{1}{N} \sum_{i=1}^{N} \max(S_T^{(i)} - K, 0)
$$
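
The three formulas above translate almost line by line into vectorized NumPy. The following is one possible sketch to compare against your own attempt, not a reference solution. The function name `mc_european_call`, the seed, and the input values in the usage lines are assumptions chosen for illustration.

```python
import numpy as np

def mc_european_call(S0, K, T, r, sigma, N, seed=0):
    """Estimate a European call price by simulating N terminal prices under GBM."""
    rng = np.random.default_rng(seed)  # seeded generator for reproducibility

    # Z ~ N(0, 1): one draw per simulated path
    Z = rng.standard_normal(N)

    # S_T = S0 * exp((r - 0.5 * sigma^2) * T + sigma * sqrt(T) * Z)
    S_T = S0 * np.exp((r - 0.5 * sigma**2) * T + sigma * np.sqrt(T) * Z)

    # Payoff of the call on each path: max(S_T - K, 0)
    payoffs = np.maximum(S_T - K, 0.0)

    # Discount the average payoff back to today
    return np.exp(-r * T) * payoffs.mean()

# Hypothetical inputs: at-the-money call, one year to maturity
price = mc_european_call(S0=100, K=100, T=1.0, r=0.05, sigma=0.2, N=200_000)
print(f"Estimated call price: {price:.4f}")
```

Because only the terminal price $S_T$ matters for a European option, a single normal draw per path is enough and no time stepping is needed. The estimate fluctuates with the seed, and its error shrinks roughly like $1/\sqrt{N}$.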


## <a name="C3">Introduction to Pandas</a>
Pandas is one of the most powerful and widely used Python libraries for data analysis and manipulation, making it an essential tool in the world of finance. Its ability to handle large datasets efficiently and its rich set of functions for data manipulation make it a key library for financial professionals who deal with time series, historical data, and complex financial computations.

In Pandas, we have two fundamental data structures: **Series** and **DataFrame**.


A **Pandas Series** is a **one-dimensional** array-like object that can hold various data types, including integers, floats, strings, and more. Each element in a Series is associated with an index, similar to a dictionary in Python, where each value is mapped to a specific key.

A **Pandas DataFrame** is a **two-dimensional** table-like structure where each column can be a different data type (similar to a spreadsheet or SQL table). It consists of rows and columns, with both row and column labels.


| Feature           | **Series**                                      | **DataFrame**                                   |
|-------------------|-------------------------------------------------|-------------------------------------------------|
| **Dimensions**     | 1-dimensional (like a column)                   | 2-dimensional (like a table)                    |
| **Indexing**       | Single index (for rows)                         | Dual index (row and column labels)              |
| **Slicing**        | Label-based (`loc`), position-based (`iloc`)     | Label-based (`loc`), position-based (`iloc`)    |
| **Structure**      | A labeled array or vector                       | A table of rows and columns                     |
| **Operations**     | Element-wise operations supported               | Column-wise and row-wise operations supported   |
| **Missing Data**   | Can handle missing data (`NaN`)                 | Can handle missing data (`NaN`)                 |
| **Use Case**       | A single column of data (e.g., stock prices)     | A dataset with multiple variables (e.g., a financial report with multiple columns) |

Here is the list of the main Pandas methods and functions:

| **Category**                     | **Function/Method**         | **Description**                                    |
|-----------------------------------|----------------------------|----------------------------------------------------|
| **Data Creation and Loading**     | `pd.read_csv()`            | Load data from a CSV file.                         |
|                                   | `pd.read_excel()`          | Load data from an Excel file.                      |
|                                   | `pd.DataFrame()`           | Create a DataFrame from lists, dictionaries, etc.  |
|                                   | `pd.Series()`              | Create a Pandas Series object.                     |
| **Basic Data Exploration**        | `head()`                   | View the first rows of the DataFrame.              |
|                                   | `tail()`                   | View the last rows of the DataFrame.               |
|                                   | `info()`                   | Get a concise summary of the DataFrame.            |
|                                   | `describe()`               | Generate descriptive statistics.                   |
|                                   | `shape`                    | Check dimensions (rows, columns) of the DataFrame. |
|                                   | `columns`                  | View the column names.                             |
|                                   | `index`                    | View the row indexes.                              |
| **Indexing and Selection**        | `loc[]`                    | Access rows/columns by labels or boolean conditions.|
|                                   | `iloc[]`                   | Access rows/columns by integer-location indexing.  |
|                                   | `at[]` / `iat[]`           | Access a single value by label/position.           |
| **Data Cleaning and Handling**    | `isnull()`                 | Identify missing values.                           |
|                                   | `notnull()`                | Identify non-missing values.                       |
|                                   | `dropna()`                 | Remove missing values.                             |
|                                   | `fillna()`                 | Fill missing values.                               |
|                                   | `replace()`                | Replace specific values in the DataFrame.          |
|                                   | `astype()`                 | Change the data type of a column.                  |
| **Data Manipulation and Transformation** | `apply()`         | Apply a function along an axis of the DataFrame.   |
|                                   | `map()`                    | Apply a function element-wise on a Series.         |
|                                   | `applymap()`               | Apply a function element-wise on the DataFrame (deprecated in favor of `DataFrame.map()` since pandas 2.1). |
|                                   | `assign()`                 | Add new columns or modify existing ones.           |
|                                   | `rename()`                 | Rename columns or row indexes.                     |
|                                   | `drop()`                   | Remove columns or rows.                            |
| **Sorting and Ordering**          | `sort_values()`            | Sort by values along either axis.                  |
|                                   | `sort_index()`             | Sort by the DataFrame index.                       |
| **Aggregation and Grouping**      | `groupby()`                | Group data for aggregation.                        |
|                                   | `agg()`                    | Aggregate data using different functions.          |
|                                   | `pivot_table()`            | Create a pivot table from the data.                |
| **Merging and Combining**         | `concat()`                 | Concatenate multiple DataFrames.                   |
|                                   | `merge()`                  | Merge two DataFrames based on a common key.        |
|                                   | `join()`                   | Join two DataFrames based on their indexes.        |
| **Time Series Handling**          | `pd.to_datetime()`         | Convert a column to datetime format.               |
|                                   | `resample()`               | Resample time series data.                         |
|                                   | `shift()`                  | Shift index by a desired number of periods.        |
| **Exporting Data**                | `to_csv()`                 | Export the DataFrame to a CSV file.                |
|                                   | `to_excel()`               | Export the DataFrame to an Excel file.             |
|                                   | `to_json()`                | Export the DataFrame to JSON format.               |




In [None]:
!pip install pandas

In [None]:
import pandas as pd

# Creating a Pandas Series
s = pd.Series([100, 200, 300, 400], index=['A', 'B', 'C', 'D'])

# Accessing elements in a Series
print(s)
print(s.loc["A"])   # label-based access
print(s.iloc[0])    # position-based access


In [None]:
import pandas as pd

# Creating a Pandas DataFrame
data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'Salary': [50000, 60000, 70000]
}

df = pd.DataFrame(data)

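# Accessing elements in a DataFrame
print(df)
print("--------------------")
print(df.loc[1])   # row whose index label is 1
print("--------------------")
print(df.iloc[1])  # row at integer position 1 (the same row here, since the index is 0, 1, 2)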
print("--------------------")
print(df.iloc[1]["Name"])
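
The time-series helpers listed in the table above (`pd.to_datetime()`, `shift()`, `resample()`) can be tried on a small hand-made frame before moving to real data. The dates and prices below are hypothetical values chosen for illustration.

```python
import pandas as pd

# Hypothetical daily closes spanning two months
prices = pd.DataFrame({
    "date": ["2024-01-30", "2024-01-31", "2024-02-01", "2024-02-02"],
    "close": [100.0, 102.0, 101.0, 103.0],
})

# Convert the text column to datetime and use it as the index
prices["date"] = pd.to_datetime(prices["date"])
prices = prices.set_index("date")

# shift(1) moves each value down one row, giving the previous observation's close
prices["prev_close"] = prices["close"].shift(1)
prices["daily_return"] = prices["close"] / prices["prev_close"] - 1

# resample("MS") groups rows into month-start buckets; here we average the closes
monthly_mean = prices["close"].resample("MS").mean()

print(prices)
print(monthly_mean)
```

The first row has no previous close, so its `prev_close` and `daily_return` are `NaN`.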


### Data exploration
When working with data, we start by exploring it, checking the amount of data (number of rows), null values, range of dates, column types, etc. These are just examples of checks to perform before starting to manipulate the data. `Pandas` provides some tools to help with this. Let's discover them.

In [None]:
import pandas as pd

# Reading csv file
df_stocks = pd.read_csv("all_stocks_5yr.csv")

# Displaying samples of data
display(df_stocks.head()) # displaying the first 5 rows
display(df_stocks.tail(3)) # Displaying the last 3 rows
print(df_stocks.dtypes)

# Counting the number of rows
print("##########################################################")
print(f"The number of (rows, columns) in our dataframe: {df_stocks.shape}")
print(f"The number of non null elements in every column: {df_stocks.count()}")

# Displaying statistical information about our data
print("##########################################################")
display(df_stocks.describe())
print(f"The first date in the data: {df_stocks['date'].min()}")
print(f"The last date in the data: {df_stocks['date'].max()}")
print(f"The different stocks' names: {df_stocks['Name'].unique()}")
print(f"The number of different stocks: {df_stocks['Name'].nunique()}")

# Explore null values
print("##########################################################")
for col in df_stocks.columns:
    # isnull() yields booleans; True counts as 1 and False as 0, so sum() gives the number of nulls
    print(f"The number of null values in the column {col}: {df_stocks[col].isnull().sum()}")
display(df_stocks[df_stocks.isnull().any(axis=1)])  # rows with at least one null value


### Data cleansing
After reading the data, we may apply some operations to clean it. The statements below are some examples:

- The date column should be of the `datetime` type.
- The analysis will be conducted from the beginning of 2014 until the end of 2017.
- If the high price is missing, it should be filled with the maximum of the open and close prices.
- If the low price is missing, it should be filled with the minimum of the open and close prices.
- If the open price is missing, it should be filled with the close price of the previous day.


In [None]:
import numpy as np

# The date column should be of the `datetime` type.
df_stocks['date'] = pd.to_datetime(df_stocks['date'])

# The analysis will be conducted from the beginning of 2014 until the end of 2017.
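# .copy() makes the filtered frame independent of the original, so the assignments below do not raise SettingWithCopyWarning.
df_stocks = df_stocks[df_stocks["date"].dt.year.between(2014, 2017)].copy()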


# If the open price is missing, it should be filled with the close price of the previous day.
df_stocks['close_lagged'] = (df_stocks.sort_values(by=['Name', 'date'], ascending=True)).groupby(['Name'])['close'].shift(1)
df_stocks.loc[df_stocks['open'].isna(), 'open'] = df_stocks['close_lagged']
del df_stocks['close_lagged']

# If the high price is missing, it should be filled with the maximum of the open and close prices.
# np.fmax ignores NaN when only one of the two values is missing, whereas np.maximum would propagate the NaN.
df_stocks.loc[df_stocks['high'].isna(), 'high'] = np.fmax(df_stocks['close'], df_stocks["open"])

# If the low price is missing, it should be filled with the minimum of the open and close prices.
df_stocks.loc[df_stocks['low'].isna(), 'low'] = np.fmin(df_stocks['close'], df_stocks["open"]) 


display(df_stocks[df_stocks.isnull().any(axis=1)])
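
Since the cell above depends on `all_stocks_5yr.csv`, it can help to check the three fill rules on a tiny frame where the expected values are easy to work out by hand. This sketch uses hypothetical tickers and prices, and it uses `fillna` rather than the `.loc` assignment of the cell above, which is a different but equivalent way to apply the same rules.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing open, high and low
toy = pd.DataFrame({
    "Name":  ["AAA", "AAA", "AAA", "BBB", "BBB"],
    "date":  pd.to_datetime(["2014-01-02", "2014-01-03", "2014-01-06",
                             "2014-01-02", "2014-01-03"]),
    "open":  [10.0, np.nan, 12.0, 50.0, np.nan],
    "high":  [11.0, np.nan, 13.0, 52.0, 53.0],
    "low":   [9.0, 10.5, np.nan, 49.0, 50.0],
    "close": [10.5, 11.5, 12.5, 51.0, 52.0],
})

toy = toy.sort_values(["Name", "date"]).reset_index(drop=True)

# Previous day's close, computed within each stock so tickers never mix
prev_close = toy.groupby("Name")["close"].shift(1)
toy["open"] = toy["open"].fillna(prev_close)

# Missing high -> max(open, close); missing low -> min(open, close)
toy["high"] = toy["high"].fillna(np.fmax(toy["open"], toy["close"]))
toy["low"] = toy["low"].fillna(np.fmin(toy["open"], toy["close"]))

print(toy)
```

The open is filled first because the high and low rules both depend on it. A missing open on a stock's first day would stay `NaN`, since there is no previous close to use.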


### Data analysis
There are many other operations we can apply to a DataFrame for BI or machine learning purposes, such as generating new columns based on existing ones, grouping data to create an aggregated view, or joining it with other data sources to add more precision.

In [None]:
## display the mean of close price by YYYY-mm for the stock AAL

# create a new column containing the year and the month YYYY-mm
df_stocks['year_month'] = df_stocks['date'].dt.strftime('%Y-%m')
# filter on the AAL stock, group by the new column, and compute the average of the close column
mean_values = df_stocks[df_stocks["Name"] == "AAL"].groupby('year_month')['close'].mean().reset_index()

display(mean_values)


In [None]:
## display the mean of close price by YYYY-mm and the sum of the volume for each stock

# create a new column containing the year and the month YYYY-mm
df_stocks["year_month"] = df_stocks["date"].dt.strftime("%Y-%m")
# group by the new column and the stock name, then compute the average close and the total volume
mean_sum_values = df_stocks.groupby(["year_month", "Name"]).agg({'close': 'mean', 'volume': 'sum'}).reset_index()

display(mean_sum_values)


In data analysis, it's common to work with multiple datasets that need to be combined to extract meaningful insights. Pandas provides powerful methods such as `merge` and `join` to perform these joins efficiently.

Like SQL, Pandas supports different types of joins: `inner`, `outer`, `left`, and `right`. These allow you to control how data from multiple DataFrames is combined, based on matching keys.

`merge` is a versatile method that joins DataFrames on one or more columns, offering flexibility for various data structures. In contrast, `join` is ideal for simpler cases where you want to combine data based on the DataFrames' indexes.

In [None]:
## joining the stocks dataframe to another one to have more information about the stocks

df_stocks_info = pd.read_csv("stock_info.csv")
df_stocks_enriched = df_stocks.merge(df_stocks_info, how="left", left_on="Name", right_on="abbreviation")
display(df_stocks_enriched)
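
To see how the join types differ without needing `stock_info.csv`, here is a small sketch on two hand-made frames. The tickers, prices, and sector labels are hypothetical values for illustration. Only the column names `Name` and `abbreviation` come from the cell above.

```python
import pandas as pd

prices = pd.DataFrame({"Name": ["AAL", "AAPL", "XYZ"],
                       "close": [50.0, 170.0, 10.0]})
info = pd.DataFrame({"abbreviation": ["AAL", "AAPL", "MSFT"],
                     "sector": ["Industrials", "Technology", "Technology"]})

# left: keep every row of prices; unmatched rows get NaN in the info columns
left = prices.merge(info, how="left", left_on="Name", right_on="abbreviation")
# inner: keep only the keys present in both frames
inner = prices.merge(info, how="inner", left_on="Name", right_on="abbreviation")
# outer: keep the keys from either frame
outer = prices.merge(info, how="outer", left_on="Name", right_on="abbreviation")

# join works on indexes, so the key columns are moved to the index first
joined = prices.set_index("Name").join(info.set_index("abbreviation"), how="left")

print(len(left), len(inner), len(outer))
print(joined)
```

Comparing the row counts before and after a merge is a quick way to catch keys that failed to match or that matched more than once.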


In [None]:
## Computing the moving average to have a smooth curve of close prices
df_stocks_enriched['5_day_moving_avg'] = (df_stocks_enriched.sort_values(by=['Name', 'date'], ascending=True)).groupby('Name')['close'].transform(lambda x: x.rolling(window=5).mean())
df_stocks_enriched


**Exercise 1**

**Implementing Eigen Portfolios (PCA)**

The goal here is to construct **Eigen Portfolios**.

- Load the all_stocks_5yr.csv data into a Pandas dataframe.

- Clean the DataFrame of any null or NaN values you encounter using linear interpolation. Be careful not to interpolate across different stocks (hint: you may want to use `groupby`).

- Reproduce the result from the example in the previous cell and construct the covariance matrix of all stocks available in the data using the smoothed 5-day moving average prices.

- Use what you have done in Exercise 5 from the NumPy section to find the matrices P and D associated with the covariance matrix.
Each column vector of your matrix P is an eigen portfolio associated with the corresponding eigenvalue (variance) of your diagonal matrix D.
Once the eigenvalues are sorted in decreasing order (`np.linalg.eig` does not guarantee any ordering), your first vector (i.e. P[:,0]) is your first eigen portfolio and explains the most variance, your second vector (i.e. P[:,1]) is your second eigen portfolio and explains the second most variance, and so on.

- Compute the realized return and variance of each eigen portfolio. What do you see?