Python offers several straightforward techniques for putting together an optimized portfolio of investments. Here’s a guide to getting started with them.

In investing, portfolio optimization is the task of selecting assets such that the return on investment is maximized while the risk is minimized. For example, an investor may be interested in selecting five stocks from a list of 20 to ensure they make the most money possible. 

Portfolio optimization methods, applied to private equity, can also help manage and diversify investments in private companies. More recently, with the rise in cryptocurrency, portfolio optimization techniques have been applied to investments in Bitcoin and Ethereum, among others.

In each of these cases, the task of optimizing assets involves balancing the trade-off between risk and return, where the return on a stock is the profit realized over a period of time and risk is the standard deviation of the asset’s returns. Many of the available methods of portfolio optimization are essentially extensions of diversification methods for assets in investing. The idea here is that holding a portfolio of different types of assets is less risky than holding assets that are all similar.
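
To make these two quantities concrete, here is a minimal sketch that computes the return and the risk for one asset. The prices are hypothetical and chosen only for illustration, and risk is taken to be the sample standard deviation of the period-over-period returns.

```python
import statistics

# Hypothetical daily closing prices for one asset (illustrative values only).
prices = [100.0, 102.0, 101.0, 105.0, 104.0]

# Simple period-over-period returns: (p_t - p_{t-1}) / p_{t-1}.
returns = [(curr - prev) / prev for prev, curr in zip(prices, prices[1:])]

# Total return over the whole window.
total_return = prices[-1] / prices[0] - 1

# Risk measured as the sample standard deviation of the returns.
risk = statistics.stdev(returns)

print(f"Total return: {total_return:.2%}")
print(f"Volatility of daily returns: {risk:.4f}")
```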

Finding the right methods for portfolio optimization is an important part of the work done by investment banks and asset management firms. One of the early methods is called **mean variance optimization**, which was developed by Harry Markowitz and, consequently, is also called the Markowitz Method or the HM method. The method works by assuming investors are risk-averse. Specifically, it selects a set of assets that are least correlated (i.e., different from each other) and that generate the highest returns. This approach means that, given a set of portfolios with the same returns, we will select the portfolio with assets that have the least statistical relationship to one another.

For example, instead of selecting a portfolio of tech company stocks, we should pick a portfolio with stocks across disparate industries. In practice, **the mean variance optimization algorithm** may select a portfolio containing assets in tech, retail, healthcare and real estate instead of a single industry like tech. Although this is a fundamental approach in modern portfolio theory, it has many limitations such as assuming that historical returns completely reflect future returns.
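
The benefit of mixing weakly related assets can be seen with the standard two-asset variance formula. The sketch below uses hypothetical volatilities and correlations; it is only meant to show that, for the same individual risks, lower correlation gives a less volatile portfolio.

```python
import math

def portfolio_volatility(w1, sigma1, sigma2, rho):
    """Volatility of a two-asset portfolio with weights w1 and 1 - w1."""
    w2 = 1.0 - w1
    variance = (
        (w1 * sigma1) ** 2
        + (w2 * sigma2) ** 2
        + 2 * w1 * w2 * rho * sigma1 * sigma2
    )
    return math.sqrt(variance)

# Hypothetical inputs: two assets, each with 20% volatility, held 50/50.
for rho in (1.0, 0.5, 0.0):
    vol = portfolio_volatility(0.5, 0.20, 0.20, rho)
    print(f"correlation {rho:+.1f} -> portfolio volatility {vol:.1%}")
```

With perfectly correlated assets the portfolio is exactly as risky as each holding, while lower correlations reduce the combined volatility.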

Additional methods like **hierarchical risk parity (HRP)** and **mean conditional value at risk (mCVAR)** address some of the limitations of the mean variance optimization method. Specifically, HRP does not require inverting the covariance matrix, which measures how the returns of each pair of stocks move together. The mean variance optimization method, by contrast, requires the inverse of the covariance matrix, which is not always computationally feasible.
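
One situation in which the inverse does not exist is when two assets move in lockstep. The following sketch, built on hypothetical return series, constructs a 2x2 covariance matrix for two perfectly correlated assets and shows that its determinant is zero, so it cannot be inverted. This is just one illustration of the problem, not the only way it can arise.

```python
def covariance(xs, ys):
    """Sample covariance of two equally long return series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

# Hypothetical returns: asset_b is exactly twice asset_a (perfect correlation).
asset_a = [0.01, -0.02, 0.03, 0.00]
asset_b = [2 * r for r in asset_a]

# Entries of the 2x2 covariance matrix [[var_a, cov_ab], [cov_ab, var_b]].
var_a = covariance(asset_a, asset_a)
var_b = covariance(asset_b, asset_b)
cov_ab = covariance(asset_a, asset_b)

# A zero determinant means the matrix is singular and has no inverse.
determinant = var_a * var_b - cov_ab ** 2
print(f"determinant = {determinant:.2e}")
```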

Further, the **mCVAR** method does not assume, as mean variance optimization does, that returns are normally distributed. Because of this, **mCVAR** is not as sensitive to extreme values as **mean variance optimization** is. This means that if a stock has an anomalous increase in price, mCVAR will be more robust than mean variance optimization and will be better suited for asset allocation. Conversely, mean variance optimization may naively suggest we invest a disproportionate share of our resources in an asset that has an anomalous increase in price.

The Python package `PyPortfolioOpt` provides a wide variety of features that make implementing all these methods straightforward. Here, we will look at how to apply these methods to construct a portfolio of stocks across industries.

### Portfolio Optimization Methods in Python

* Mean Variance Optimization
* Hierarchical Risk Parity (HRP)
* Mean Conditional Value at Risk (mCVAR)

### Accessing Stock Price Data 

We will pull stock price data using the **Pandas-Datareader** library. We can easily install the library using `pip` in a terminal command line:

In [1]:
# !pip install --upgrade pandas

In [2]:
# !pip install --upgrade pandas-datareader 

In [3]:
import pandas_datareader.data as web
import datetime

We should pull stocks from a few different industries, so we’ll gather price data in 
* healthcare, 
* tech, 
* retail and 
* finance. 

We will pull three stocks for each industry. Let’s start by pulling a few stocks in healthcare. We will pull two years of stock price data for 
* Moderna, 
* Pfizer and 
* Johnson & Johnson.

First, let’s import Pandas and relax the display limits on rows and columns:

In [4]:
import pandas as pd 
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

Next, let’s use the `datetime` module we imported earlier to define start and end dates:

In [5]:
start = datetime.datetime(2019,9,15)
end = datetime.datetime(2021,9,15)

Now we have everything we need to pull stock prices. Let’s get data for Moderna (MRNA):

In [6]:
data = web.DataReader(name="MRNA", data_source="yahoo", start=start, end=end)

In [7]:
data.head()

Date,High,Low,Open,Close,Volume,Adj Close
2019-09-16,17.309999,16.690001,16.73,17.049999,3053900,17.049999
2019-09-17,17.950001,17.190001,17.459999,17.66,5239600,17.66
2019-09-18,18.0,17.35,17.77,17.780001,5834500,17.780001
2019-09-19,18.42,17.617001,17.709999,17.9,5866800,17.9
2019-09-20,18.389999,17.75,18.139999,18.07,24387400,18.07


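To avoid repeating this for every company, let’s define a function that takes a ticker and returns a single-column data frame of its closing prices:

In [8]:
def get_stock(ticker):
    # Download daily prices for one ticker between the start and end dates
    data = web.DataReader(ticker, "yahoo", start, end)
    # Keep only the closing price, in a column named after the ticker
    data[ticker] = data["Close"]
    data = data[[ticker]]
    print(data.head())
    return data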

In [9]:
pfizer = get_stock("PFE")
jnj = get_stock("JNJ")

                  PFE
Date                 
2019-09-16  34.943073
2019-09-17  34.629982
2019-09-18  34.516129
2019-09-19  34.639469
2019-09-20  34.810246
                   JNJ
Date                  
2019-09-16  129.539993
2019-09-17  129.669998
2019-09-18  130.410004
2019-09-19  130.110001
2019-09-20  131.649994


Let’s define another function that takes a list of stocks and generates a single data frame containing the prices of every stock:

In [10]:
from functools import reduce

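def combine_stocks(tickers):
    # One single-column data frame of closing prices per ticker
    data_frames = [get_stock(ticker) for ticker in tickers]

    # Outer-join all frames on the Date index so no trading day is dropped
    df_merged = reduce(
        lambda left, right: pd.merge(left, right, on=['Date'], how='outer'),
        data_frames,
    )
    print(df_merged.head())
    return df_merged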

In [11]:
stocks = ["MRNA", "PFE", "JNJ"]
combine_stocks(stocks)

                 MRNA
Date                 
2019-09-16  17.049999
2019-09-17  17.660000
2019-09-18  17.780001
2019-09-19  17.900000
2019-09-20  18.070000
                  PFE
Date                 
2019-09-16  34.943073
2019-09-17  34.629982
2019-09-18  34.516129
2019-09-19  34.639469
2019-09-20  34.810246
                   JNJ
Date                  
2019-09-16  129.539993
2019-09-17  129.669998
2019-09-18  130.410004
2019-09-19  130.110001
2019-09-20  131.649994
                 MRNA        PFE         JNJ
Date                                        
2019-09-16  17.049999  34.943073  129.539993
2019-09-17  17.660000  34.629982  129.669998
2019-09-18  17.780001  34.516129  130.410004
2019-09-19  17.900000  34.639469  130.110001
2019-09-20  18.070000  34.810246  131.649994


Date,MRNA,PFE,JNJ
2019-09-16,17.049999,34.943073,129.539993
2019-09-17,17.66,34.629982,129.669998
2019-09-18,17.780001,34.516129,130.410004
2019-09-19,17.9,34.639469,130.110001
2019-09-20,18.07,34.810246,131.649994
2019-09-23,17.83,34.383301,131.740005
2019-09-24,17.82,34.165085,131.550003
2019-09-25,17.18,34.060722,130.990005
2019-09-26,16.34,33.946869,128.850006
2019-09-27,15.9,34.364326,128.600006
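
The `reduce` plus outer-merge pattern inside `combine_stocks` can be pictured without pandas. The sketch below is a simplified stand-in that uses plain dictionaries keyed by date; the helper names `as_table` and `outer_merge` and the price values are hypothetical and exist only for this illustration.

```python
from functools import reduce

# Hypothetical per-ticker closing prices keyed by date (illustrative values only).
mrna = {"2019-09-16": 17.05, "2019-09-17": 17.66}
pfe = {"2019-09-17": 34.63, "2019-09-18": 34.52}

def as_table(ticker, prices):
    """Turn {date: price} into {date: {ticker: price}}."""
    return {date: {ticker: price} for date, price in prices.items()}

def outer_merge(left, right):
    """Keep every date found in either table and combine the columns."""
    merged = {}
    for date in sorted(set(left) | set(right)):
        merged[date] = {**left.get(date, {}), **right.get(date, {})}
    return merged

tables = [as_table("MRNA", mrna), as_table("PFE", pfe)]
combined = reduce(outer_merge, tables)

for date, row in combined.items():
    # A ticker with no price on a date shows up as None, much like NaN in pandas.
    print(date, row.get("MRNA"), row.get("PFE"))
```

Because the join is an outer join, a date that appears for only one ticker is kept rather than dropped, and the missing price is left empty.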


Now, let’s pull stocks for the remaining industries:

* `Healthcare`: Moderna (MRNA), Pfizer (PFE), Johnson & Johnson (JNJ)

* `Tech`: Google (GOOGL), Facebook (FB), Apple (AAPL)

* `Retail`: Costco (COST), Walmart (WMT),  Kroger Co (KR)

* `Finance`: JPMorgan Chase & Co (JPM), Bank of America (BAC), HSBC Holding (HSBC)

In [12]:
stocks = ["MRNA", "PFE", "JNJ", "GOOGL", 
          "FB", "AAPL", "COST", "WMT", "KR", "JPM", 
          "BAC", "HSBC"]
portfolio = combine_stocks(stocks)

                 MRNA
Date                 
2019-09-16  17.049999
2019-09-17  17.660000
2019-09-18  17.780001
2019-09-19  17.900000
2019-09-20  18.070000
                  PFE
Date                 
2019-09-16  34.943073
2019-09-17  34.629982
2019-09-18  34.516129
2019-09-19  34.639469
2019-09-20  34.810246
                   JNJ
Date                  
2019-09-16  129.539993
2019-09-17  129.669998
2019-09-18  130.410004
2019-09-19  130.110001
2019-09-20  131.649994
                  GOOGL
Date                   
2019-09-16  1231.630005
2019-09-17  1229.880005
2019-09-18  1232.650024
2019-09-19  1238.750000
2019-09-20  1229.839966
                    FB
Date                  
2019-09-16  186.220001
2019-09-17  188.080002
2019-09-18  188.139999
2019-09-19  190.139999
2019-09-20  189.929993
                 AAPL
Date                 
2019-09-16  54.974998
2019-09-17  55.174999
2019-09-18  55.692501
2019-09-19  55.240002
2019-09-20  54.432499
                  COST
Date                  
20

We now have a single dataframe of prices for our stocks. Let’s write this dataframe to a `csv` file so we can easily read in the data without repeatedly having to pull it using the Pandas-Datareader.

In [13]:
portfolio.to_csv("portfolio.csv")

In [14]:
# let’s read in our csv, restoring the Date index:

portfolio = pd.read_csv("portfolio.csv", index_col="Date", parse_dates=True)

### Mean Variance Optimization

Now we are ready to implement the mean variance optimization method to construct our portfolio. Let’s start by installing the `PyPortfolioOpt` library:

In [17]:
!pip install PyPortfolioOpt

Collecting PyPortfolioOpt
  Downloading PyPortfolioOpt-1.4.2-py3-none-any.whl (60 kB)
Collecting cvxpy<2.0.0,>=1.1.10
  Downloading cvxpy-1.1.15-cp37-cp37m-manylinux_2_24_x86_64.whl (2.7 MB)
Installing collected packages: cvxpy, PyPortfolioOpt
  Attempting uninstall: cvxpy
    Found existing installation: cvxpy 1.0.31
    Uninstalling cvxpy-1.0.31:
      Successfully uninstalled cvxpy-1.0.31
Successfully installed PyPortfolioOpt-1.4.2 cvxpy-1.1.15


Now, let’s calculate the expected returns and the covariance matrix and store them in the variables `mu` and `S`, respectively:

In [18]:
from pypfopt.expected_returns import mean_historical_return
from pypfopt.risk_models import CovarianceShrinkage


mu = mean_historical_return(portfolio)
S = CovarianceShrinkage(portfolio).ledoit_wolf()
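
To give a feel for what an annualized historical return represents, here is a rough sketch of a compounded (geometric) annualized return computed from prices. It assumes 252 trading days per year and hypothetical prices; the function `annualized_return` is our own illustration and is not taken from PyPortfolioOpt, whose exact calculation may differ in its details.

```python
import math

def annualized_return(prices, periods_per_year=252):
    """Compounded (geometric) annualized return from a list of prices."""
    returns = [curr / prev - 1 for prev, curr in zip(prices, prices[1:])]
    # Total growth factor over the whole window.
    growth = math.prod(1 + r for r in returns)
    # Rescale the growth to one year's worth of periods.
    return growth ** (periods_per_year / len(returns)) - 1

# Hypothetical example: a price that rises 0.1% on each of 252 trading days.
prices = [100.0 * 1.001 ** day for day in range(253)]
print(f"Annualized return: {annualized_return(prices):.2%}")
```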

Next, let’s import the `EfficientFrontier` class and calculate the weights. Here, we will maximize the Sharpe ratio, which is the ratio of a portfolio’s return in excess of the risk-free rate to its risk (volatility). The lower the risk and the higher the returns, the higher the Sharpe ratio. The algorithm looks for the weights that give the maximum Sharpe ratio, that is, the best return per unit of risk. Ultimately, the higher the Sharpe ratio, the better the risk-adjusted performance of the portfolio.

In [19]:
from pypfopt.efficient_frontier import EfficientFrontier

ef = EfficientFrontier(mu, S)
weights = ef.max_sharpe()

cleaned_weights = ef.clean_weights()
dict(cleaned_weights)

{'AAPL': 0.16967,
 'BAC': 0.0,
 'COST': 0.0,
 'FB': 0.0,
 'GOOGL': 0.29459,
 'HSBC': 0.0,
 'JNJ': 0.0,
 'JPM': 0.0,
 'KR': 0.04698,
 'MRNA': 0.48875,
 'PFE': 0.0,
 'WMT': 0.0}

In [20]:
ef.portfolio_performance(verbose=True)

Expected annual return: 225.7%
Annual volatility: 44.6%
Sharpe Ratio: 5.02


(2.256796070833841, 0.44569641863511517, 5.018653902770244)
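
As a quick check on these figures, the Sharpe ratio can be recomputed by hand from the expected return and volatility. The snippet below assumes a risk-free rate of 2 percent per year; that rate is not stated in the output, so treat it as an assumption, although it does reproduce the reported value.

```python
# Figures reported by ef.portfolio_performance above.
expected_return = 2.256796070833841
volatility = 0.44569641863511517

# Assumption: a 2% annual risk-free rate.
risk_free_rate = 0.02

# Sharpe ratio: excess return per unit of volatility.
sharpe_ratio = (expected_return - risk_free_rate) / volatility
print(f"Sharpe ratio: {sharpe_ratio:.2f}")  # prints "Sharpe ratio: 5.02"
```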

Finally, let’s convert the weights into actual allocation values (i.e., how many shares of each stock to buy). For our allocation, let’s consider an investment amount of `$100,000`:

In [21]:
from pypfopt.discrete_allocation import DiscreteAllocation, get_latest_prices

latest_prices = get_latest_prices(portfolio)

da = DiscreteAllocation(weights, latest_prices, total_portfolio_value=100000)

allocation, leftover = da.greedy_portfolio()
print("Discrete allocation:", allocation)
print("Funds remaining: ${:.2f}".format(leftover))

Discrete allocation: {'MRNA': 112, 'GOOGL': 10, 'AAPL': 113, 'KR': 114}
Funds remaining: $928.79


Our algorithm says we should invest in
* 112 shares of MRNA, 
* 10 shares of GOOGL, 
* 113 shares of AAPL and 
* 114 shares of KR. 
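
The basic idea behind turning weights into share counts can be sketched in a few lines. The function below simply rounds each target dollar amount down to a whole number of shares; it is a deliberately simplified illustration with hypothetical tickers and prices, not the algorithm behind `greedy_portfolio`, which is more involved.

```python
import math

def naive_allocation(weights, prices, total_value):
    """Round each target dollar amount down to a whole number of shares."""
    allocation = {}
    spent = 0.0
    for ticker, weight in weights.items():
        shares = math.floor(weight * total_value / prices[ticker])
        if shares > 0:
            allocation[ticker] = shares
            spent += shares * prices[ticker]
    # Whatever could not be spent on whole shares is left over as cash.
    return allocation, total_value - spent

# Hypothetical weights and prices (illustrative values only).
weights = {"AAA": 0.6, "BBB": 0.4}
prices = {"AAA": 70.0, "BBB": 30.0}

allocation, leftover = naive_allocation(weights, prices, total_value=1000)
print(allocation, f"leftover: ${leftover:.2f}")
```

Because shares cannot be bought in fractions here, some cash is always left unspent, which is why the allocation above also reports the funds remaining.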

We see that our portfolio performs with an expected annual return of 225 percent. This performance is due to the rapid growth of Moderna during the pandemic. Further, the Sharpe ratio value of 5.02 indicates that the portfolio optimization algorithm performs well with our current data. Of course, this return is inflated and is not likely to hold up in the future. 

**Mean variance optimization** doesn’t perform very well since it makes many simplifying assumptions, such as returns being normally distributed, and it needs an invertible covariance matrix. Fortunately, methods like **HRP and mCVAR** address these limitations.

### Hierarchical Risk Parity (HRP)

The HRP method works by finding subclusters of similar assets based on returns and constructing a hierarchy from these clusters to generate weights for each asset. 
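
Two building blocks commonly used in descriptions of HRP are a correlation-based distance for clustering and inverse-variance weighting for splitting capital. The sketch below shows both on hypothetical numbers. It is not the full algorithm, and we are not claiming that `HRPOpt` follows these exact steps internally.

```python
import math

def correlation_distance(rho):
    """Map a correlation in [-1, 1] to a distance in [0, 1]."""
    return math.sqrt(0.5 * (1.0 - rho))

def inverse_variance_weights(variances):
    """Weight each asset in proportion to 1 / variance."""
    inverse = [1.0 / v for v in variances]
    total = sum(inverse)
    return [x / total for x in inverse]

# Highly correlated assets sit close together; uncorrelated ones sit farther apart.
print(correlation_distance(0.9), correlation_distance(0.0))

# Hypothetical variances for a calm asset and a volatile one.
print(inverse_variance_weights([0.01, 0.04]))
```

The calmer asset receives the larger weight, which is how this style of allocation avoids concentrating in a single volatile stock.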

Let’s start by importing the `HRPOpt` class from `pypfopt`:

In [22]:
from pypfopt import HRPOpt

We then need to calculate the returns:

In [23]:
returns = portfolio.pct_change().dropna()

Then run the optimization algorithm to get the weights:

In [24]:
hrp = HRPOpt(returns)
hrp_weights = hrp.optimize()

We can now print the performance of the portfolio and the weights:

In [25]:
hrp.portfolio_performance(verbose=True)
print(dict(hrp_weights))

Expected annual return: 24.5%
Annual volatility: 20.0%
Sharpe Ratio: 1.12
{'AAPL': 0.06750181837588995, 'BAC': 0.032702474761202666, 'COST': 0.1079310520011364, 'FB': 0.03796857436543153, 'GOOGL': 0.05412809467361964, 'HSBC': 0.12835763886902124, 'JNJ': 0.1698990935504296, 'JPM': 0.05526437117766945, 'KR': 0.1512794310088098, 'MRNA': 0.020278330646296628, 'PFE': 0.07963655595514835, 'WMT': 0.09505256461534473}


We see that we have an expected annual return of 24.5 percent, which is significantly less than the inflated 225 percent we achieved with mean variance optimization. We also see a diminished Sharpe ratio of 1.12.  This result is much more reasonable and more likely to hold up in the future since HRP is not as sensitive to outliers as mean variance optimization is. 

Finally, let’s calculate the discrete allocation using our weights:

In [26]:
da_hrp = DiscreteAllocation(hrp_weights, latest_prices, total_portfolio_value=100000)

allocation, leftover = da_hrp.greedy_portfolio()
print("Discrete allocation (HRP):", allocation)
print("Funds remaining (HRP): ${:.2f}".format(leftover))

Discrete allocation (HRP): {'JNJ': 102, 'KR': 368, 'HSBC': 499, 'COST': 23, 'WMT': 65, 'PFE': 178, 'AAPL': 45, 'JPM': 35, 'GOOGL': 2, 'FB': 10, 'BAC': 81, 'MRNA': 5}
Funds remaining (HRP): $9.54


We see that our algorithm suggests we invest heavily in
* Kroger (KR), 
* HSBC, 
* Johnson & Johnson (JNJ) and 
* Pfizer (PFE).

Unlike the previous model, it allocates very little to Moderna (MRNA).

Further, while the performance decreased, we can be more confident that this model will perform just as well when we refresh our data. This is because HRP is more robust to the anomalous increase in Moderna stock prices. 

### Mean Conditional Value at Risk (mCVAR)

The mCVAR is another popular alternative to mean variance optimization. It works by measuring the worst-case scenarios for each asset in the portfolio, which is represented here by losing the most money. The worst-case loss for each asset is then used to calculate weights to be used for allocation for each asset. 
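
The quantity at the heart of this method, conditional value at risk, can be estimated from historical returns as the average loss across the worst outcomes. Here is a minimal sketch on hypothetical returns. The function `historical_cvar` is our own simplified illustration and is not part of PyPortfolioOpt, which solves an optimization problem rather than just averaging a tail.

```python
def historical_cvar(returns, beta=0.95):
    """Average loss over the worst (1 - beta) share of observed returns."""
    losses = sorted((-r for r in returns), reverse=True)  # largest loss first
    tail_size = max(1, round(len(losses) * (1 - beta)))
    tail = losses[:tail_size]
    return sum(tail) / len(tail)

# Hypothetical daily returns with one sharp drop (illustrative values only).
returns = [0.01, -0.02, 0.005, -0.10, 0.02, 0.0, -0.01, 0.015, -0.03, 0.01]

# With beta=0.80 and ten observations, the tail is the two worst days.
print(f"80% CVaR: {historical_cvar(returns, beta=0.80):.3f}")
```

Because only the losing tail enters the calculation, an unusually large gain in one asset does not change this measure of risk.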

Let’s import the `EfficientCVaR` class:

In [27]:
from pypfopt.efficient_frontier import EfficientCVaR

Calculate the weights and get the performance:

In [28]:
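# Note: PyPortfolioOpt documents the second argument of EfficientCVaR as a
# DataFrame of historical returns (e.g., portfolio.pct_change().dropna());
# the output shown below came from passing the covariance of prices instead.
S = portfolio.cov()
ef_cvar = EfficientCVaR(mu, S)
cvar_weights = ef_cvar.min_cvar()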

cleaned_weights = ef_cvar.clean_weights()
print(dict(cleaned_weights))

{'MRNA': 0.0, 'PFE': 0.0, 'JNJ': 0.0, 'GOOGL': 0.00646, 'FB': 0.0, 'AAPL': 0.0, 'COST': 0.0, 'WMT': 0.0, 'KR': 0.0, 'JPM': 0.99354, 'BAC': 0.0, 'HSBC': 0.0}


Next, get the discrete allocation:

In [29]:
da_cvar = DiscreteAllocation(cvar_weights, latest_prices, total_portfolio_value=100000)

allocation, leftover = da_cvar.greedy_portfolio()
print("Discrete allocation (CVAR):", allocation)
print("Funds remaining (CVAR): ${:.2f}".format(leftover))

Discrete allocation (CVAR): {'JPM': 628, 'MRNA': 1, 'JNJ': 1}
Funds remaining (CVAR): $75.64




We see that this algorithm suggests we invest heavily in JPMorgan Chase (JPM) and also buy a single share each of Moderna (MRNA) and Johnson & Johnson (JNJ). We also see that the expected return is **15.5 percent**. As with HRP, this result is much more reasonable than the inflated **225 percent** return given by mean variance optimization, since mCVAR is not as sensitive to the anomalous behavior of the Moderna stock price.

### Optimize Your Portfolio

Although we only considered healthcare, tech, retail and finance, the methods we discussed can easily be modified to consider additional industries. For example, maybe we are more interested in constructing a portfolio of companies in the energy, real estate and materials industries. An example of this sort of portfolio could be made up of stocks such as ExxonMobil (XOM), DuPont (DD), and American Tower (AMT). We encourage you to play around with different sectors in constructing your portfolio.

What we discussed provides a solid foundation for those interested in portfolio optimization methods in Python. Knowing both the methods and the tools available for portfolio optimization allows quants and data scientists to run quicker experiments for optimizing investment portfolios.