# Portfolio Optimization Using the Mean-Variance Method

---

## Introduction

In this portfolio optimization lecture, we will build a diversified portfolio spanning several industries using the Mean-Variance method. We will use `PyPortfolioOpt` to solve the optimization problem.

Our goal is to find a portfolio $w=(w_1,\cdots,w_n)$ that minimizes risk for a given level of expected return over a particular set of assets.

Note that we only consider **long positions**, so every weight is non-negative.

---

## Algorithm overview

<img src="/Users/baobach/CQF_June_Cohort/pictures/portfolio_opt_algo.png" height="400">


---

### Math

The vector of asset expected returns $\mu$ is defined as:

$$
\mu = \begin{pmatrix} \mu_{1} \\ \vdots \\ \mu_{i} \\ \vdots \\ \mu_{n} \end{pmatrix}
$$

The vector of weights is:
$$
w = 
\begin{pmatrix} w_{1} \\ \vdots \\ w_{i} \\ \vdots \\ w_{n} \end{pmatrix}
$$

> Such that $\sum\limits_{i=1}^{n}w_i = 1$

The **Expected Portfolio Return** is then the dot product of the expected returns and their weights:

$$
\mu_\pi = w^\intercal \cdot \mu
$$

The **Expected Portfolio Variance** is the quadratic form of the weights and the covariance matrix:

$$
\sigma_\pi^2 = w^\intercal \cdot \Sigma \cdot w
$$

The covariance matrix $\Sigma$ is given by:

$$
\Sigma =
\begin{pmatrix}
\sigma^{2}_{1} & \rho_{12} \sigma_{1} \sigma_{2} & \cdots & \rho_{1n} \sigma_{1} \sigma_{n} \\
\rho_{21} \sigma_{2} \sigma_{1} & \sigma^{2}_{2} & \cdots & \rho_{2n} \sigma_{2} \sigma_{n} \\
\vdots & \vdots & \ddots & \vdots \\
\rho_{n1} \sigma_{n} \sigma_{1} & \rho_{n2} \sigma_{n} \sigma_{2} & \cdots & \sigma^{2}_{n}
\end{pmatrix}
$$
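As a quick numerical check of the formulas above, here is a minimal NumPy sketch. The three-asset inputs (`mu`, `sigma`, `rho`, `w`) are hypothetical values chosen for illustration, not data from this lecture.

```python
import numpy as np

# Hypothetical inputs for three assets (illustrative only)
mu = np.array([0.08, 0.12, 0.10])        # expected annual returns
sigma = np.array([0.15, 0.25, 0.20])     # annual volatilities
rho = np.array([[1.0, 0.3, 0.2],
                [0.3, 1.0, 0.5],
                [0.2, 0.5, 1.0]])        # correlation matrix

# Covariance matrix: Sigma_ij = rho_ij * sigma_i * sigma_j
Sigma = rho * np.outer(sigma, sigma)

w = np.array([0.5, 0.2, 0.3])            # long-only weights that sum to 1

port_return = w @ mu                     # w^T mu
port_variance = w @ Sigma @ w            # w^T Sigma w
port_volatility = np.sqrt(port_variance)

print(f"return={port_return:.4f}, variance={port_variance:.6f}, "
      f"volatility={port_volatility:.4f}")
```

Building $\Sigma$ from volatilities and correlations makes the structure of the matrix explicit: the diagonal holds the variances and the off-diagonal entries hold the pairwise covariances.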

We now look for the portfolio with the minimum risk (minimum volatility) $\sigma_\pi$ for a given target return $\mu^*$:

$$
\begin{align*}
&\underset{w}{\text{min}} &w^\intercal \cdot \Sigma \cdot w \\
&\text{subject to} &w^\intercal \cdot \mu &\geq \mu^* \\
& &w^\intercal \cdot \mathbf{1} &= 1 \\
& &w_i &\geq 0
\end{align*}
$$

If we vary the target return, we get a different set of weights (i.e., a different portfolio). The set of all these optimal portfolios is referred to as the efficient frontier.
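To make the optimization problem concrete, the sketch below solves it with SciPy's general-purpose SLSQP solver. This is an illustrative stand-in, not the method used by `PyPortfolioOpt`; the inputs `mu` and `Sigma` and the helper name `min_variance_weights` are assumptions made for this example.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical expected returns and covariance matrix for three assets
mu = np.array([0.08, 0.12, 0.10])
Sigma = np.array([[0.0225, 0.01125, 0.006],
                  [0.01125, 0.0625, 0.025],
                  [0.006, 0.025, 0.04]])

def min_variance_weights(mu, Sigma, target_return):
    """Long-only minimum-variance weights with expected return >= target_return."""
    n = len(mu)
    constraints = [
        {"type": "eq", "fun": lambda w: np.sum(w) - 1.0},           # w^T 1 = 1
        {"type": "ineq", "fun": lambda w: w @ mu - target_return},  # w^T mu >= mu*
    ]
    bounds = [(0.0, 1.0)] * n                                       # w_i >= 0
    w0 = np.full(n, 1.0 / n)                                        # start from equal weights
    result = minimize(lambda w: w @ Sigma @ w, w0, method="SLSQP",
                      bounds=bounds, constraints=constraints)
    if not result.success:
        raise ValueError(result.message)
    return result.x

# Sweep the target return to trace a few points on the efficient frontier
for target in (0.09, 0.10, 0.11):
    w = min_variance_weights(mu, Sigma, target)
    vol = float(np.sqrt(w @ Sigma @ w))
    print(f"target={target:.2f}  weights={np.round(w, 3)}  volatility={vol:.4f}")
```

Each target return yields one optimal portfolio. Raising the target shrinks the feasible set, so the minimum achievable volatility can only stay the same or increase.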

---

### Programming

Instead of simulating a large number of portfolios and searching for the best $w$ vector, we are going to use the efficient frontier.

We prepare historical asset price data and construct the portfolio in the following steps:

1. Select stocks from a diversified set of industries such as Healthcare, Tech, Retail and Finance:

    * Healthcare: Moderna (MRNA), Pfizer (PFE), Johnson & Johnson (JNJ)
    * Tech: Google (GOOGL), Meta, formerly Facebook (META), Apple (AAPL)
    * Retail: Costco (COST), Walmart (WMT), Kroger Co (KR)
    * Finance: JPMorgan Chase & Co (JPM), Bank of America (BAC), HSBC Holdings (HSBC)

2. Use the `yfinance` package to pull historical trading data and calculate the returns

3. Use `PyPortfolioOpt` to calculate the covariance matrix

4. Apply the efficient frontier method to find the $w$ vector with the maximum Sharpe ratio
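The arithmetic behind steps 2 and 3 can be sketched with plain pandas. The prices below are synthetic and the tickers `AAA` and `BBB` are hypothetical stand-ins for downloaded data. The factor of 252 trading days per year is an assumption for annualizing. `PyPortfolioOpt` ships its own estimators, so treat this only as an illustration of the calculation.

```python
import numpy as np
import pandas as pd

# Synthetic daily prices for two hypothetical tickers
rng = np.random.default_rng(0)
dates = pd.bdate_range("2020-07-27", periods=250, name="Date")
daily = rng.normal(loc=0.0005, scale=0.01, size=(250, 2))
prices = pd.DataFrame(100 * np.cumprod(1 + daily, axis=0),
                      index=dates, columns=["AAA", "BBB"])

# Step 2: daily simple returns computed from prices
returns = prices.pct_change().dropna()

# Step 3: annualized sample covariance matrix (assumes 252 trading days)
cov_annual = returns.cov() * 252

# Annualized arithmetic mean return, one simple estimate of the vector mu
mu_annual = returns.mean() * 252

print(cov_annual)
print(mu_annual)
```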

#### Preparing historical data

Import the libraries we need:

In [1]:
# Import pandas & yfinance
import pandas as pd
import yfinance as yf
# Import numpy
import numpy as np
# Import date_time
import datetime

Create a list of the stocks we are interested in, as mentioned above, and define the time window:

In [2]:
# Stock list
stocks = ["MRNA", "PFE", "JNJ", "GOOGL", "META", "AAPL", "COST", "WMT", "KR", "JPM", "BAC", "HSBC"]
# Time window covering 3 years of data
start = datetime.datetime(2020,7,25)
end = datetime.datetime(2023,7,25)

Let's use `yfinance` to retrieve data for just `MRNA`:

In [3]:
temp = yf.download("MRNA", start = start, end = end, progress=False)
temp.head()


It works, so let’s wrap this logic in a function that we can easily reuse, since we will be pulling several stocks:

In [None]:
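def get_stock(ticker):
    # Download daily data for one ticker over the global start/end window
    data = yf.download(ticker, start=start, end=end, progress=False)
    # Copy the closing price into a column named after the ticker
    data[ticker] = data["Close"]
    # Keep only that column so frames for different tickers can be merged
    data = data[[ticker]]
    print(data.head())
    return data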

Now, let’s pull the data for Pfizer (PFE) and Johnson & Johnson (JNJ):

In [None]:
pfizer = get_stock("PFE")
jnj = get_stock("JNJ")

                  PFE
Date                 
2020-07-27  35.616699
2020-07-28  37.020874
2020-07-29  37.248577
2020-07-30  36.755219
2020-07-31  36.508537
                   JNJ
Date                  
2020-07-27  147.179993
2020-07-28  146.830002
2020-07-29  146.539993
2020-07-30  146.839996
2020-07-31  145.759995



Let’s define another function that takes a list of stocks and generates a single data frame with one price column per stock:

In [None]:
from functools import reduce

def combine_stocks(tickers):
    data_frames = [get_stock(i) for i in tickers]
    # Similar to a SQL outer join: fold the list of frames into one,
    # joining on `Date` so that dates missing for a ticker become NaN.
    df_merged = reduce(lambda left, right: pd.merge(left, right, on=['Date'], how='outer'), data_frames)
    print(df_merged.head())
    return df_merged
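To see what the `reduce` and `pd.merge` combination does without hitting the network, here is a small sketch on two hand-made frames. The tickers `AAA` and `BBB` and their prices are hypothetical.

```python
from functools import reduce

import pandas as pd

# Two small hypothetical price frames with partially overlapping dates
idx_a = pd.Index(pd.to_datetime(["2020-07-27", "2020-07-28", "2020-07-29"]), name="Date")
idx_b = pd.Index(pd.to_datetime(["2020-07-28", "2020-07-29", "2020-07-30"]), name="Date")
a = pd.DataFrame({"AAA": [10.0, 10.5, 10.2]}, index=idx_a)
b = pd.DataFrame({"BBB": [20.0, 19.8, 20.4]}, index=idx_b)

# Fold the list pairwise: an outer join keeps every date seen in either frame
merged = reduce(
    lambda left, right: pd.merge(left, right, on="Date", how="outer"),
    [a, b],
)

print(merged)
```

The outer join keeps all four distinct dates and fills the gaps with `NaN`, which is why missing values may need handling before returns are computed.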


In [None]:
# Create portfolio of stocks
portfolio = combine_stocks(stocks)


We now have a single dataframe of closing prices for our stocks. Let’s write this dataframe to a CSV file so we can easily read the data back in without repeatedly pulling it with `yfinance`.

In [None]:
# Write data to file for future use
portfolio.to_csv('data/india_stocks.csv')
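Reading the file back is worth a quick sketch, because the date index is stored as plain text in the CSV. The example below uses a small hypothetical price table and a temporary file (`prices.csv` is an illustrative name) rather than the lecture data.

```python
import os
import tempfile

import pandas as pd

# Small hypothetical price table indexed by date
prices = pd.DataFrame(
    {"AAA": [10.0, 10.5], "BBB": [20.0, 19.8]},
    index=pd.Index(pd.to_datetime(["2020-07-27", "2020-07-28"]), name="Date"),
)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "prices.csv")   # hypothetical file name
    prices.to_csv(path)
    # index_col and parse_dates restore the DatetimeIndex on reload
    reloaded = pd.read_csv(path, index_col="Date", parse_dates=True)

print(reloaded)
```

Without `index_col` and `parse_dates`, the dates would come back as an ordinary string column instead of the index.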