### FE630 - Midterm Project

**Author**: Sid Bhatia

**Date**: March 25th, 2023

**Pledge**: I pledge my honor that I have abided by the Stevens Honor System.

**Professor**: Papa Momar Ndiaye

##### Question 1. (15 pts)

The supplied $\texttt{data.zip}$ file contains 30 space-delimited text files with price and volume data for 30 companies. Each row of each file contains the date, opening price, closing price, high price, low price, volume, and adjusted closing price (last column). You will need this data for Question 1.

Write a program $\texttt{processdata}$ to:

1. Read all daily price files;
2. Create a price matrix $\texttt{P}$ by aligning the data’s dates and placing the adjusted closing prices side-by-side in columns;
3. From the $\texttt{P}$ matrix, create a matrix of simple (not logarithmic) daily returns $\texttt{R}$;
4. Compute the vector of average daily returns $\texttt{mu}$ for the companies using the $\texttt{mean}$ function (do not use loops);
5. Compute the covariance matrix $\texttt{Q}$ from the return matrix using the $\texttt{cov}$ function; and
6. Save the return vector $\texttt{mu}$ and the covariance matrix $\texttt{Q}$ in the native format for your programming language in a file called $\texttt{inputs.ext}$, where $\texttt{ext}$ is the appropriate extension for a binary file in your language.
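As a rough illustration of steps 2 through 6, the sketch below works on a tiny made-up price table instead of the real files. The tickers `AAA` and `BBB`, their prices, the inner join on dates, and the single output file `inputs.pkl` written to the system temp directory are all assumptions made for this example. They are not part of the assignment data or of the submitted `processdata` function.

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np
import pandas as pd

# Hypothetical adjusted closing prices for two made-up tickers.
# BBB is missing one trading day so that date alignment matters.
a = pd.Series([100.0, 102.0, 101.0, 103.0],
              index=[20130102, 20130103, 20130104, 20130107], name="AAA")
b = pd.Series([50.0, 51.0, 52.0],
              index=[20130102, 20130104, 20130107], name="BBB")

# Step 2: place the series side by side, aligned on date.
# Assumption: keep only dates present for every company (inner join).
P = pd.concat([a, b], axis=1, join="inner").sort_index()

# Step 3: simple returns R_t = P_t / P_{t-1} - 1.
# The first row has no previous price, so it is dropped.
R = (P / P.shift(1) - 1).iloc[1:]

# Step 4: average daily return per company, vectorized (no explicit loop).
mu = R.mean(axis=0).to_numpy()

# Step 5: sample covariance matrix of the returns (columns are variables).
Q = np.cov(R.to_numpy(), rowvar=False)

# Step 6: save both objects in a binary file (pickle is one option in Python).
out_path = Path(tempfile.gettempdir()) / "inputs.pkl"
with open(out_path, "wb") as fh:
    pickle.dump({"mu": mu, "Q": Q}, fh)
```

Reading the file back with `pickle.load` returns a dictionary whose `mu` entry has one value per company and whose `Q` entry is a square, symmetric matrix of the same dimension.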

import pandas as pd
import numpy as np
import os
from typing import List

def processdata(data_dir: str = 'data') -> None:
    """
    Processes stock price data to compute and save the average daily return vector and covariance matrix.
    
    This function reads stock price data from text files, each containing data for a company, then:
    1. Creates a price matrix with adjusted close prices,
    2. Calculates the daily return matrix,
    3. Computes the vector of average daily returns for each company,
    4. Computes the covariance matrix of the return matrix,
    5. Saves the average daily returns vector and the covariance matrix to binary files.
    
    Parameters:
    - data_dir (str): The directory containing the stock price files. Default is 'data'.
    
    Returns:
    - None. The function saves two files: 'inputs_mu.pkl' and 'inputs_Q.pkl' with the results.
    """
    # List to store the adjusted close price data for each company.
    price_data: List[pd.Series] = []

    # Loop through each file in the specified directory.
    for file in os.listdir(data_dir):
        if file.endswith('.txt'):
            filepath = os.path.join(data_dir, file)
            # Read data, assuming space-separated values without an explicit header.
            df = pd.read_csv(filepath, sep=' ', header=None,
                             names=['Date', 'Open', 'Close', 'High', 'Low', 'Volume', 'Adj Close'])
            # Set date as the index for easy alignment later.
            df.set_index('Date', inplace=True)
            # Append the adjusted close price series to the list.
            price_data.append(df['Adj Close'])

    # Concatenate all the adjusted close prices side-by-side, aligning by date.
    P = pd.concat(price_data, axis=1)
    P.sort_index(inplace=True)  # Ensure the dates are in order.


[Date
20130102    57.020516
20130103    57.263157
20130104    57.855200
20130107    58.097841
20130108    58.427832
              ...    
20150921    76.739998
20150922    75.709999
20150923    75.629997
20150924    74.690002
20150925    75.099998
Name: Adj Close, Length: 689, dtype: float64, Date
20130102     72.458451
20130103     72.834518
20130104     73.041355
20130107     71.574694
20130108     69.694366
               ...    
20150921    136.020004
20150922    133.990005
20150923    131.669998
20150924    129.750000
20150925    131.009995
Name: Adj Close, Length: 689, dtype: float64, Date
20130102    11.796938
20130103    11.728294
20130104    11.875388
20130107    11.855776
20130108    11.747906
              ...    
20150921    15.700000
20150922    15.570000
20150923    15.720000
20150924    15.550000
20150925    15.890000
Name: Adj Close, Length: 689, dtype: float64, Date
20130102    86.985691
20130103    87.822988
20130104    88.306756
20130107    88.576552
20130108    87.4