## Overview

This file compares two optimization techniques for determining the weights of a portfolio. The first is one I built myself, and the second uses a library that finds the weights from the efficient frontier.

The first is a method that I built with the "scipy.optimize" library. I obtained the relevant data from Yahoo Finance, cleaned and organized it into the correct format, defined the objective function, estimated all of the parameters from the data, and then optimized. From the resulting weights, I then computed the Sharpe ratio of the portfolio.

The second uses "pypfopt.efficient_frontier", which includes a built-in function that determines the weights of the portfolio with the highest Sharpe ratio. This works well and is how the optimization is performed in this project.

The two techniques are very similar, but there are some subtle differences:
- The first one does not exactly find the portfolio with the highest Sharpe ratio. This is because you have to define a risk level factor, which I set to 1. This can also be thought of as a required return, since it impacts the overall portfolio return.
- Since the first technique is the one I created, it can be customized more than the efficient frontier technique. With the first approach, I would be able to more easily add a value-at-risk measure, use a risk parity approach, implement nested clustered optimization, use the Black-Litterman model, etc.

For the purposes of this assignment, we are going to continue with the second technique. It is exactly what we are looking for, as it allows us to easily implement the Markowitz approach and perform the necessary analysis on it. It will allow us to choose the portfolio that has the highest mean return relative to its risk.

### Now to discuss the theory:
After significant research into Markowitz optimization, I have learned a lot about how it works and its potential issues. In the form used here, the Markowitz approach optimizes a portfolio by choosing the weights that give the highest Sharpe ratio. There are many modifications, such as allowing short positions, taking into account transaction costs, or optimizing solely to minimize risk.

Harry Markowitz introduced the model in 1952 and shared the Nobel Memorial Prize in Economic Sciences in 1990 for this work. It has been widely used in financial optimization and portfolio construction. The model provides a quantitative and logically sound way to construct portfolios, and it has many applications and modifications. It works by estimating the risk measure, which is the variance of the portfolio, as well as the expected return for each security that is being considered. The optimization evaluates the different possible portfolio allocations and chooses the one with the highest return relative to its risk; in other words, it chooses the one with the highest Sharpe ratio. The other conditions are that the portfolio weights sum to 1 and that each weight is between 0 and 1 (which I narrowed to between 0 and 0.5), although it is possible to allow an allocation less than 0 to take short selling into account. It is incredibly important to have the correct data in the right format, as the model outputs are only as good as the inputs.

The model appears to be simple, but under the hood, choosing the correct risk measure and combining it with the return to maximize the Sharpe ratio gets very confusing. Maximizing the Sharpe ratio directly is a nonlinear problem (a linear return divided by the square root of a quadratic risk term), which can be difficult to solve even with the correct software. There are tricks that introduce auxiliary variables to turn it into a standard quadratic optimization problem, but in the model that I created, I took a slightly simpler approach. My model still maximizes return relative to risk, but it uses a predefined risk aversion level, so the result depends on the level you choose. Overall, the model does reach a relatively high Sharpe ratio; it is just not necessarily the highest possible value. The second model does find the highest possible Sharpe ratio, which is why we decided to use it in this project.
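To make the difference between the two models concrete, the two problems can be written side by side. The notation is introduced here for illustration and is not taken from either library: $w$ is the vector of weights, $\mu$ the vector of expected returns, $\Sigma$ the covariance matrix, and $\lambda$ the risk aversion level (`risk_level` in the code). The risk-free rate is assumed to be zero, as in the mean-over-standard-deviation Sharpe calculation used in the code.

$$
\max_{w}\; \mu^{\top} w - \lambda\, w^{\top} \Sigma\, w
\quad \text{s.t.} \quad \mathbf{1}^{\top} w = 1,\quad 0 \le w_i \le 0.5
$$

$$
\max_{w}\; \frac{\mu^{\top} w}{\sqrt{w^{\top} \Sigma\, w}}
\quad \text{s.t.} \quad \mathbf{1}^{\top} w = 1,\quad 0 \le w_i \le 0.5
$$

The first is the problem my model solves and the second is the maximum Sharpe ratio problem. Different values of $\lambda$ generally pick out different portfolios on the efficient frontier, so the first problem should only be expected to match the second for one particular choice of $\lambda$.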

### My Progress
Throughout this project, I spent a significant amount of time researching the model before I dove into the actual implementation. It was incredibly important to understand the theory so that I thoroughly understood what I was doing. I read through various publications (websites, articles, journals, textbooks, etc.) that covered portfolio construction, and I also spoke to professionals in the field for advice. I even spoke with Reha Tutuncu, who is on the Industry Advisory Council and is very experienced with portfolio construction (he even wrote the textbook that we are using in the financial optimization class). He recommended some great resources and provided great advice.

After gaining significant knowledge about the Markowitz model and its potential modifications, I constructed my own model in Python. I understood the theory, so that was not the difficult aspect, but it took a significant amount of time to figure out how to construct the model in Python. There were specific aspects of Python that I had to adapt to and was not expecting, such as only being able to minimize an optimization problem, understanding how to optimize using the "scipy.optimize" library, and having the data in the right format while using the correct Python syntax. After I figured all of this out, I was able to easily construct the optimization model.

The way I created the model, it works for any given stock, and all you have to do is indicate which securities you want to include in the code below as well as the start and end dates for the portfolio construction. The code works for any number of stocks as long as they have stock price data for the entirety of the dates indicated. The model will also determine the portfolio returns based on the optimal weights and will give you the Sharpe ratio as well.

Through my research, I have also learned to avoid the common mistakes, such as a misalignment between the return and risk models. The return and risk models need to use the same data over the same time period, or else the model will be off. In addition, when using the model to rebalance, it is important to check the outputs and make sure the optimal weights have not changed too much, as that could lead to large transaction costs. Making sure the portfolio is reasonable is very important.
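The rebalancing check described above can be sketched in a few lines. This is only a minimal illustration: the function names and the 25% threshold are hypothetical choices made here, and the weights are made up.

```python
import numpy as np

def turnover(old_weights, new_weights):
    """One-way turnover: half the sum of the absolute weight changes."""
    old = np.asarray(old_weights, dtype=float)
    new = np.asarray(new_weights, dtype=float)
    return 0.5 * np.abs(new - old).sum()

def needs_review(old_weights, new_weights, max_turnover=0.25):
    """Flag a rebalance whose turnover exceeds a (hypothetical) threshold."""
    return turnover(old_weights, new_weights) > max_turnover

# made-up weights from two consecutive optimizations
old_w = [0.50, 0.30, 0.20]
new_w = [0.10, 0.40, 0.50]
print(turnover(old_w, new_w))
print(needs_review(old_w, new_w))
```

A large turnover does not mean the new weights are wrong, only that they deserve a second look before trading on them.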

After constructing my own model, we wanted to take it one step further and make sure that the maximum Sharpe ratio is used. We discovered the package "pypfopt.efficient_frontier", which has a built-in function to find the maximum Sharpe ratio. You do not need to worry about putting the optimization in the right form, as the package does that for you.

Using this second model, the optimal portfolio weights are determined for the different predicted values. The model has three stocks: AAPL, COKE, and GOOGL. There are three different predictions used. For each one, we determine the portfolio returns based on the weights that maximize the Sharpe ratio and compare them with the actual returns to see how well each prediction works.

We can also graph the different models against the actual results, since we are backtesting with historical data. We can see which predictions are best and use those when we optimize other portfolios in the future.

### Moving Forward
The model has some potential issues, which are areas to improve upon.
- The model works best if the securities are not correlated, but this is rarely the case in practice. Most stock prices move together, and if those correlations are misestimated, the optimizer can exploit them and shift the allocation toward assets that only appear to lower the variance, which is the risk measure.
    - Nested Clustered Optimization aims to reduce this risk by first separating the securities into clusters. The model chooses an allocation among the clusters and an allocation among the securities within each cluster. This way, it allocates more efficiently among different industries.
- The model does not differentiate among the different distributions of the returns. Consider one stock whose returns are skewed to the left, meaning that very large negative returns are possible, and another stock whose returns are skewed to the right, meaning that very large positive returns are possible. We would prefer the stock without the large negative returns, as most investors are risk averse. However, if these two stocks have the same mean and variance, the model would not differentiate between the two.
    - This can be taken into account by adding a value-at-risk (VaR) measure or a conditional value-at-risk (CVaR) measure. VaR finds a cutoff that losses exceed only with a small, chosen probability, and CVaR averages the losses beyond that cutoff. With such a measure, the model can distinguish between the two stocks and favor the one with the less severe losses in the tail.
- The model is entirely quantitative and is based on the numerical returns. These values can be hard to predict, although in this case that is not an issue, as a very large component of the project was dedicated to estimating the returns. However, it would be nice to be able to take into account your own personal views on each stock, which would add a qualitative component to the model.
    - This is accomplished by implementing a Black-Litterman Model. This is a different model than the Markowitz Model, but it would be interesting to compare results and see how the Sharpe ratio differs.
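To illustrate the point about skewed returns and the VaR/CVaR measures in the list above, here is a minimal sketch of the historical (empirical) versions of both measures. The function name, the 5% tail probability, and the synthetic data are assumptions made for the example; this is not the code used in this project.

```python
import numpy as np

def historical_var_cvar(returns, alpha=0.05):
    """Historical VaR and CVaR at tail probability alpha, as positive losses."""
    r = np.asarray(returns, dtype=float)
    cutoff = np.quantile(r, alpha)   # return at the alpha-quantile
    tail = r[r <= cutoff]            # the worst outcomes, at or below the cutoff
    return -cutoff, -tail.mean()

# Two synthetic return series with the same mean and standard deviation:
# one is the mirror image of the other, so only the skew differs.
rng = np.random.default_rng(0)
x = rng.exponential(1.0, 10_000)
x = 0.01 * (x - x.mean())
skew_right = x      # occasional very large gains
skew_left = -x      # occasional very large losses

var_right, cvar_right = historical_var_cvar(skew_right)
var_left, cvar_left = historical_var_cvar(skew_left)
print("right-skewed: VaR", var_right, "CVaR", cvar_right)
print("left-skewed:  VaR", var_left, "CVaR", cvar_left)
```

The mean-variance model sees these two series as equivalent, while the CVaR of the left-skewed series is clearly larger.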

In this project, I chose to keep the Markowitz Model simple when finding the optimal weights. While researching, I saw many different directions to take the model, and I was excited to add a value-at-risk measure to improve on it. When I got to the actual implementation, I realized how difficult it is to construct the model. I was able to build the basic model, and once I figured out the correct syntax it was not that difficult. However, adding a value-at-risk measure or implementing nested clustered optimization requires extensive knowledge of not only the theory but also Python as it relates to optimization, and there is a machine learning aspect as well. I planned to take these extra steps, but they exceeded what I was able to create. I believe I understand each of these modifications in theory; the most difficult part was the implementation in Python.

If anyone works on this project further, which I hope is the case, I would recommend that they explore these ideas. The basic Markowitz Model is already in place, so someone with the additional knowledge to tackle these modifications could expand on the project. Having taken ISE 447: Financial Optimization would be very helpful for the project.

In [1]:
import numpy as np
import pandas as pd
#import matplotlib.pyplot as plt
#from scipy.stats import norm
#from scipy.stats import linregress
#import statsmodels.formula.api as smf
import csv
import scipy.optimize as opt
import pandas_datareader as pdr
from datetime import datetime

In [2]:
# function to read in data from a csv
def readPricesCSV(file):
    df = pd.read_csv(file)
    dfPrices = df["Adj Close"]
    return dfPrices

In [3]:
# function to read in stock prices from yahoo finance
def readPricesYF(start_date, end_date, tickers):
    stockPrices = pdr.get_data_yahoo(tickers, start=start_date, end=end_date)   # read in the data
    stockPrices = stockPrices.filter(like='Adj Close')   # tickers are columns and data values are adjusted closing prices with the date as the index
    stockPrices.columns = tickers   # change column names to be their tickers
    stockPrices = stockPrices.stack().swaplevel().sort_index().reset_index()    # make the data tall and clean it up so it is easier to perform analysis on
    stockPrices.columns = ['Firm','Date','Adj Close']     # rename the columns
#    stockPrices['Return'] = stockPrices.groupby('Firm')['Adj Close'].pct_change()    # get the daily returns for each ticker
    return stockPrices


In [59]:
# the objective function (uses the global variables means, cov_matr and risk_level)
def obj(weights):
    weights = np.asarray(weights)

    # expected portfolio return given the weights and the mean returns
    sum_return = np.dot(weights, means)

    # the risk measure is the portfolio variance given the weights: w' * cov_matr * w
    risk_measure = weights @ cov_matr @ weights

    # scipy can only minimize, so return the negative of (return - risk_level * risk)
    return -(sum_return - (risk_level * risk_measure))
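For comparison, the same scipy machinery can also be pointed at the Sharpe ratio directly by minimizing its negative. The sketch below is an illustration rather than the method used in this file: the function names, the choice of the SLSQP solver, a risk-free rate of zero, and the made-up inputs are all assumptions.

```python
import numpy as np
import scipy.optimize as opt

def neg_sharpe(weights, means, cov_matr):
    """Negative Sharpe ratio of the portfolio (risk-free rate assumed to be 0)."""
    port_return = weights @ means
    port_vol = np.sqrt(weights @ cov_matr @ weights)
    return -port_return / port_vol

def max_sharpe_weights(means, cov_matr, upper=0.5):
    """Maximize the Sharpe ratio with weights in [0, upper] that sum to 1."""
    n = len(means)
    x0 = np.full(n, 1.0 / n)                                    # start from equal weights
    constraints = {"type": "eq", "fun": lambda w: w.sum() - 1.0}
    bounds = [(0.0, upper)] * n
    result = opt.minimize(neg_sharpe, x0, args=(means, cov_matr),
                          method="SLSQP", bounds=bounds, constraints=constraints)
    return result.x

# made-up daily mean returns and covariance matrix for three stocks
demo_means = np.array([0.0010, 0.0005, 0.0020])
demo_cov = np.array([[4e-4, 1e-4, 0.0],
                     [1e-4, 2e-4, 0.0],
                     [0.0, 0.0, 9e-4]])
demo_weights = max_sharpe_weights(demo_means, demo_cov)
print(demo_weights, -neg_sharpe(demo_weights, demo_means, demo_cov))
```

Because the Sharpe ratio is not a convex function of the weights, a general-purpose solver like this can depend on its starting point, which is one reason to prefer a dedicated library for this step.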


### To Change:
**The following block of code is the only one in this file that should be changed.**
- Add more stocks by including them in the list of tickers
- Change the starting and ending dates

**If there are any changes made, rerun the entire code after making the changes**

**Nothing else should be changed in this file**

In [5]:
#choose which stocks to include in the optimization of the portfolio (list the tickers)
tickers = ['MSFT', 'AAPL', 'AMZN', 'NFLX', 'DIS', 'TSLA']
tickers = sorted(tickers)

#set the start and end dates- choose last six months in this case
start_date = datetime(2020, 8, 25) #year, month, day
end_date = datetime(2021,2,25) #year, month, day

stockPrices = readPricesYF(start_date, end_date, tickers)    # use the function defined above to get data from Yahoo Finance
#stockPrices['ret'] = stockPrices['Adj Close'].pct_change()
stockPrices

     Firm        Date   Adj Close
0    AAPL  2020-08-25  124.213104
1    AAPL  2020-08-26  125.902283
2    AAPL  2020-08-27  124.397202
3    AAPL  2020-08-28  124.195694
4    AAPL  2020-08-31  128.407440
..    ...         ...         ...
757  TSLA  2021-02-19  781.299988
758  TSLA  2021-02-22  714.500000
759  TSLA  2021-02-23  698.840027
760  TSLA  2021-02-24  742.020020


In [6]:
#stockPrices

In [6]:
# reformat the data
stock_prices_format = stockPrices.set_index(['Date', 'Firm']).unstack()   # change format so that date and each company are columns
stock_prices_format.columns = tickers
stock_prices_pct_change = stock_prices_format.pct_change()[1:]
stock_prices_pct_change

                 AAPL       AMZN        DIS       MSFT       NFLX       TSLA
Date
2020-08-26   0.013599   0.028496   0.018414   0.021620   0.116087   0.064166
2020-08-27  -0.011954  -0.012159   0.011726   0.024554  -0.038829   0.039746
2020-08-28  -0.001620   0.000529   0.013535   0.010283  -0.004522  -0.011323
2020-08-31   0.033912   0.014451  -0.027077  -0.014766   0.010823   0.125689
2020-09-01   0.039833   0.013956   0.012740   0.007715   0.050967  -0.046697
...               ...        ...        ...        ...        ...        ...
2021-02-19   0.001234  -0.023535   0.003552  -0.011567  -0.014593  -0.007722
2021-02-22  -0.029799  -0.021281   0.044160  -0.026808  -0.011921  -0.085499
2021-02-23  -0.001111   0.004326   0.027795  -0.005288   0.023174  -0.021917
2021-02-24  -0.004052  -0.010947   0.002131   0.005487   0.013293   0.061788


In [7]:
# get the different number of stocks and the number of stock prices 
num_stocks = len(tickers)   # the number of stocks
prices_per_stock = stock_prices_format.shape[0]   # the number of stock prices

In [8]:
# get the covariance matrix
cov_matr = np.cov(np.array(stock_prices_pct_change.iloc[:,range(0,num_stocks)].T))   # get the columns and transpose it so it is in right format, then turn it into covariance matrix
#corr_matr = stock_prices_pct_change.corr()
cov_matr

array([[ 5.59896290e-04,  3.73559102e-04,  4.32429540e-05,
         3.05618601e-04,  3.38402256e-04,  5.54343292e-04],
       [ 3.73559102e-04,  4.41159509e-04,  5.04481211e-05,
         3.06976679e-04,  3.98375581e-04,  4.11422252e-04],
       [ 4.32429540e-05,  5.04481211e-05,  5.77167770e-04,
         8.61969935e-05,  1.64317959e-05, -3.94774403e-05],
       [ 3.05618601e-04,  3.06976679e-04,  8.61969935e-05,
         3.32178259e-04,  2.66808739e-04,  3.76651700e-04],
       [ 3.38402256e-04,  3.98375581e-04,  1.64317959e-05,
         2.66808739e-04,  9.12232700e-04,  4.15956589e-04],
       [ 5.54343292e-04,  4.11422252e-04, -3.94774403e-05,
         3.76651700e-04,  4.15956589e-04,  2.22980118e-03]])

In [9]:
# get the mean daily return for each stock

means = np.array(stock_prices_pct_change.mean().to_list())    # mean of each return column, converted to a list and then to a numpy array
means

array([ 5.71133153e-05, -4.98897312e-04,  3.34623847e-03,  6.51091710e-04,
        1.30126183e-03,  5.28170546e-03])

In [65]:
# set the risk level
risk_level = 1   # risk loving < 0; risk neutral = 0; risk averse > 0

In [11]:
# set an initial value for the weights, which is an even composition
weights = np.array([1/num_stocks]*num_stocks)

### At this point, we have the following information:
 - **"tickers"** is a list of the tickers, and it is in alphabetical order
 - **"num_stocks"** is the number of different stocks
 - **"prices_per_stock"** is the number of observations, i.e. the number of prices obtained for each stock

For the Optimization:
 - **"cov_matr"** is the covariance matrix
 - **"means"** is a numpy array that consists of the mean daily return of each stock during the time frame **(in alphabetical order of the tickers)**
 - **"risk_level"** is the risk level
 - **"weights"** is a numpy array that consists of the initial value for the weights, which is just an even composition and will be changed later **(in alphabetical order of the tickers)**

In [66]:
# perform the optimization!
lin_constr = opt.LinearConstraint([1]*num_stocks, [1], [1])    # the lower and upper bound on the sum of the weights are both 1, so the weights must sum to exactly 1
bounds = opt.Bounds([0]*num_stocks, [0.5]*num_stocks)    # each portfolio weight is between 0 and 0.5 (inclusive)
result = opt.minimize(obj, x0=[1/num_stocks]*num_stocks, method="trust-constr", constraints=lin_constr, bounds=bounds)    # start from equal weights and minimize the negated objective

optimal_weights = pd.DataFrame({'stock': tickers, 'weights': result.x.tolist()})

print(-1*result.fun)    # obj returns the negative of the value being maximized, so flip the sign back
optimal_weights

0.0035937398363597435


   stock   weights
0   AAPL   0.00265
1   AMZN  0.002229
2    DIS  0.491988
3   MSFT  0.003388
4   NFLX  0.005351
5   TSLA  0.494393


In [67]:
opt_wghts = np.array(optimal_weights.set_index('stock')).flatten()

weighted_rets = opt_wghts * stock_prices_pct_change
weighted_rets

port_rets = weighted_rets.sum(axis=1)   # axis=1 tells pandas to sum across the columns of each row
print(port_rets)

mean_ret = port_rets.mean()
std_ret = port_rets.std()
sharpe = mean_ret/std_ret
print("\n\n", sharpe, "  ", mean_ret, "  ", std_ret)


Date
2020-08-26    0.041577
2020-08-27    0.025236
2020-08-28    0.001068
2020-08-31    0.048948
2020-09-01   -0.016383
                ...   
2021-02-19   -0.002237
2021-02-22   -0.020825
2021-02-23    0.002952
2021-02-24    0.031651
2021-02-25   -0.056419
Length: 126, dtype: float64


 0.16455298468054466    0.004265757747744657    0.02592330826466622
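The Sharpe ratio printed above is computed from daily returns, so it is a daily figure. A common convention, assumed here rather than taken from this project, is to annualize it by multiplying by the square root of the number of trading days per year (252), which treats the daily returns as independent. The function name and the sample returns below are made up for illustration.

```python
import numpy as np

def annualized_sharpe(period_returns, periods_per_year=252, risk_free_per_period=0.0):
    """Sharpe ratio of per-period returns, scaled to one year."""
    excess = np.asarray(period_returns, dtype=float) - risk_free_per_period
    # ddof=1 matches the sample standard deviation that pandas .std() uses
    return np.sqrt(periods_per_year) * excess.mean() / excess.std(ddof=1)

# made-up daily portfolio returns
sample_returns = [0.01, -0.02, 0.03, 0.00]
print(annualized_sharpe(sample_returns))
```

Under the same assumptions, the daily value of about 0.1646 above corresponds to roughly 0.1646 × √252 ≈ 2.61 on an annual basis, although six months of data is a short window for that kind of estimate.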


Next steps:
- Tweak the model to use the predicted prices instead of historical prices
- Test to see how well the portfolio works
    - Active share
        - take proportion from an existing index and tweak weights
    - Compare to performance of S&P 500 or something like that
    - Can create a graph to visualize the data
    - Calculate the information ratio
- Figure out how to add a Conditional Value at Risk measure
- Explore Nested Clustered Optimization
- Could incorporate elements from stochastic calculus about estimating stock prices
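Two of the measures in the list above, the information ratio and active share, are simple enough to sketch now. This is a minimal illustration with hypothetical function names and made-up benchmark data; it is not yet wired into the rest of the file.

```python
import numpy as np

def information_ratio(portfolio_returns, benchmark_returns):
    """Mean active return divided by the volatility of the active return."""
    active = (np.asarray(portfolio_returns, dtype=float)
              - np.asarray(benchmark_returns, dtype=float))
    return active.mean() / active.std(ddof=1)

def active_share(portfolio_weights, benchmark_weights):
    """Half the sum of absolute weight differences; weights are dicts keyed by ticker."""
    all_tickers = set(portfolio_weights) | set(benchmark_weights)
    return 0.5 * sum(abs(portfolio_weights.get(t, 0.0) - benchmark_weights.get(t, 0.0))
                     for t in all_tickers)

# made-up example: portfolio versus a hypothetical benchmark
port_w = {"AAPL": 0.5, "TSLA": 0.5}
bench_w = {"AAPL": 0.25, "MSFT": 0.25, "TSLA": 0.5}
print(active_share(port_w, bench_w))

port_r = [0.02, -0.01, 0.03, 0.01]
bench_r = [0.01, -0.02, 0.01, 0.02]
print(information_ratio(port_r, bench_r))
```

An active share of 0 means the portfolio matches the benchmark exactly, and a value of 1 means the two hold nothing in common.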

In [28]:
#!pip install PyPortfolioOpt #need to install C++ first by doing (xcode-select --install) in terminal 

#LINK to article: https://towardsdatascience.com/automating-portfolio-optimization-using-python-9f344b9380b9#

In [23]:
from pypfopt.efficient_frontier import EfficientFrontier
from pypfopt import risk_models
from pypfopt import expected_returns
from pypfopt.cla import CLA
from pypfopt import plotting
from matplotlib.ticker import FuncFormatter
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import pandas_datareader as web

In [24]:
nullin_df = pd.DataFrame(stock_prices_format)
print(nullin_df.isnull().sum()) #no null values (good!)
 

AAPL    0
AMZN    0
DIS     0
MSFT    0
NFLX    0
TSLA    0
dtype: int64


In [25]:
#Annualized expected return of each stock, estimated from the historical prices
mu = expected_returns.mean_historical_return(stock_prices_format)
#Annualized sample covariance matrix of the stock returns
Sigma = risk_models.sample_cov(stock_prices_format)


#mu = stock_returns.mean()    # get means for each column, convert to a list, convert to a numpy array
#print(exp_returns)

#Sigma = np.cov(np.array(stock_returns.iloc[:,range(0,len(tickers))].T))   # get the columns and transpose it so it is in right format, then turn it into covariance matrix
#Sigma = pd.DataFrame(cov_matr,columns=tickers)



In [26]:
print(mu,"\n\n",Sigma)
#Sigma
#mu


AAPL   -0.054438
AMZN   -0.165441
DIS     1.165177
MSFT    0.130074
NFLX    0.241877
TSLA    1.842177
dtype: float64 

           AAPL      AMZN       DIS      MSFT      NFLX      TSLA
AAPL  0.141094  0.094137  0.010897  0.077016  0.085277  0.139695
AMZN  0.094137  0.111172  0.012713  0.077358  0.100391  0.103678
DIS   0.010897  0.012713  0.145446  0.021722  0.004141 -0.009948
MSFT  0.077016  0.077358  0.021722  0.083709  0.067236  0.094916
NFLX  0.085277  0.100391  0.004141  0.067236  0.229883  0.104821
TSLA  0.139695  0.103678 -0.009948  0.094916  0.104821  0.561910


In [19]:
# #why doesn't this match up?
# stock_firm = stockPrices.groupby(['Firm'])['ret'].mean()
# stock_firm

In [27]:
#Max Sharpe Ratio - Tangent to the EF
ef = EfficientFrontier(mu, Sigma, weight_bounds=(0,0.5)) #each weight is between 0 and 0.5; a negative lower bound would allow shorting of stocks
sharpe_pfolio=ef.max_sharpe() #an extra objective could be added with ef.add_objective, e.g. to reduce the number of zero weights
sharpe_pwt=ef.clean_weights()
print(sharpe_pwt)

OrderedDict([('AAPL', 0.0), ('AMZN', 0.0), ('DIS', 0.5), ('MSFT', 0.00984), ('NFLX', 0.09967), ('TSLA', 0.39049)])


In [28]:
sharpe_pfolio

OrderedDict([('AAPL', 0.0),
             ('AMZN', 0.0),
             ('DIS', 0.5),
             ('MSFT', 0.0098416639777681),
             ('NFLX', 0.0996682170432335),
             ('TSLA', 0.3904901189789986)])

In [29]:
optimal_weights

   stock   weights
0   AAPL   0.00265
1   AMZN  0.002229
2    DIS  0.491988
3   MSFT  0.003388
4   NFLX  0.005351
5   TSLA  0.494393


In [30]:
stock_returns = stock_prices_format.pct_change()[1:]

weighted_returns = weights * stock_returns    # "weights" still holds the equal weights defined earlier
stock_returns

                 AAPL       AMZN        DIS       MSFT       NFLX       TSLA
Date
2020-08-26   0.013599   0.028496   0.018414   0.021620   0.116087   0.064166
2020-08-27  -0.011954  -0.012159   0.011726   0.024554  -0.038829   0.039746
2020-08-28  -0.001620   0.000529   0.013535   0.010283  -0.004522  -0.011323
2020-08-31   0.033912   0.014451  -0.027077  -0.014766   0.010823   0.125689
2020-09-01   0.039833   0.013956   0.012740   0.007715   0.050967  -0.046697
...               ...        ...        ...        ...        ...        ...
2021-02-19   0.001234  -0.023535   0.003552  -0.011567  -0.014593  -0.007722
2021-02-22  -0.029799  -0.021281   0.044160  -0.026808  -0.011921  -0.085499
2021-02-23  -0.001111   0.004326   0.027795  -0.005288   0.023174  -0.021917
2021-02-24  -0.004052  -0.010947   0.002131   0.005487   0.013293   0.061788


In [31]:
#port returns are the sum of the weighted returns 

port_ret = weighted_returns.sum(axis=1)   # axis=1 tells pandas to sum across the columns of each row
port_ret

Date
2020-08-26    0.043730
2020-08-27    0.002181
2020-08-28    0.001147
2020-08-31    0.023839
2020-09-01    0.013086
                ...   
2021-02-19   -0.008772
2021-02-22   -0.021858
2021-02-23    0.004497
2021-02-24    0.011283
2021-02-25   -0.036111
Length: 126, dtype: float64

In [32]:
#Portfolio statistics (equal weighted)
mean_ret = port_ret.mean()
std_ret = port_ret.std()
sharpe = mean_ret/std_ret
sharpe

#std_ret
#mean_ret



0.08939518325928911

In [39]:
opt_weights = optimal_weights['weights'].to_list()
weighted_ret_opt = opt_weights * stock_returns
weighted_ret_opt

port_ret_opt = weighted_ret_opt.sum(axis=1)   # axis=1 tells pandas to sum across the columns of each row
port_ret_opt

Date
2020-08-26    0.041577
2020-08-27    0.025236
2020-08-28    0.001068
2020-08-31    0.048948
2020-09-01   -0.016383
                ...   
2021-02-19   -0.002237
2021-02-22   -0.020825
2021-02-23    0.002952
2021-02-24    0.031651
2021-02-25   -0.056419
Length: 126, dtype: float64

In [40]:
#Portfolio statistics (optimally weighted)
mean_ret_opt = port_ret_opt.mean()
std_ret_opt = port_ret_opt.std()
#std_ret
#mean_ret
sharpe_opt = mean_ret_opt/std_ret_opt

mean_diff = mean_ret_opt - mean_ret #want this number to be '+'
sigma_diff = std_ret_opt - std_ret #want this number to be '-' (but doesn't have to be)

sharpe_diff = sharpe_opt - sharpe #DEFINITELY want this number to be '+'
print('mean diff: ' + str((round(mean_diff, 4))) + ' sigma diff: ' + str(round(sigma_diff,4)))
print("sharpe ratio equ weight: " + str(round(sharpe,4)) + ", sharpe ratio optimal weight: " + str(round(sharpe_opt,4)))
print('sharpe ratio diff: ' + str(round(sharpe_diff,4)))


mean diff: 0.0026 sigma diff: 0.007
sharpe ratio equ weight: 0.0894, sharpe ratio optimal weight: 0.1646
sharpe ratio diff: 0.0752


In [102]:
stock_returns

                 AAPL       AMZN        DIS       MSFT       NFLX       TSLA
Date
2020-08-26   0.013599   0.028496   0.018414   0.021620   0.116087   0.064166
2020-08-27  -0.011954  -0.012159   0.011726   0.024554  -0.038829   0.039746
2020-08-28  -0.001620   0.000529   0.013535   0.010283  -0.004522  -0.011323
2020-08-31   0.033912   0.014451  -0.027077  -0.014766   0.010823   0.125689
2020-09-01   0.039833   0.013956   0.012740   0.007715   0.050967  -0.046697
...               ...        ...        ...        ...        ...        ...
2021-02-19   0.001234  -0.023535   0.003552  -0.011567  -0.014593  -0.007722
2021-02-22  -0.029799  -0.021281   0.044160  -0.026808  -0.011921  -0.085499
2021-02-23  -0.001111   0.004326   0.027795  -0.005288   0.023174  -0.021917
2021-02-24  -0.004052  -0.010947   0.002131   0.005487   0.013293   0.061788


In [104]:
#Annualized expected return of each stock, estimated from the historical prices
exp_rets = expected_returns.mean_historical_return(stock_prices_format)
#Annualized sample covariance matrix of the stock returns
cov_matrix = risk_models.sample_cov(stock_prices_format)

In [105]:
ef = EfficientFrontier(exp_rets, cov_matrix, weight_bounds=(0,0.5)) #each weight is between 0 and 0.5; a negative lower bound would allow shorting of stocks
sharpe_port=ef.max_sharpe() #an extra objective could be added with ef.add_objective, e.g. to reduce the number of zero weights
#sharpe_port
sharpe_weights=ef.clean_weights()
sharpe_weights


OrderedDict([('AAPL', 0.0),
             ('AMZN', 0.0),
             ('DIS', 0.5),
             ('MSFT', 0.00984),
             ('NFLX', 0.09967),
             ('TSLA', 0.39049)])

In [106]:
weighted_returns = sharpe_weights * stock_returns   # the weights (keyed by ticker) are matched to the ticker columns of stock_returns
#weighted_returns

In [107]:
port_ret = weighted_returns.sum(axis=1)   # axis=1 tells pandas to sum across the columns of each row
port_ret

Date
2020-08-26    0.046047
2020-08-27    0.017755
2020-08-28    0.001996
2020-08-31    0.036475
2020-09-01   -0.006709
                ...   
2021-02-19   -0.002808
2021-02-22   -0.012758
2021-02-23    0.007597
2021-02-24    0.026572
2021-02-25   -0.049442
Length: 126, dtype: float64

In [108]:
mean_ret = port_ret.mean()
std_ret = port_ret.std()
sharpe = mean_ret/std_ret
sharpe


0.17039838911847183

In [162]:
stocks = ['AAPL', 'COKE', 'GOOGL']
ret = pd.read_csv('exp_returns3.csv').set_index('Date')
ret

           AAPL_NN    COKE_NN   GOOGL_NN
Date
2019-01   0.060652  -0.000458   0.021834
2019-02  -0.005493  -0.002889  -0.036394
2019-03   0.082441  -0.055919   0.041015
2019-04  -0.007477  -0.093085  -0.038141
2019-05   0.044749  -0.172228   0.007834
2019-06   0.149206   0.059261   0.099788
2019-07   0.107595   0.007572   0.063157
2019-08   0.031272  -0.007347   0.063157
2019-09   0.113121   0.001431   0.063157
2019-10   0.124122   0.018626   0.063157


In [163]:
# "ret" already contains predicted monthly returns rather than prices
#exp_rets = expected_returns.mean_historical_return(ret)
# NOTE: sample_cov treats its input as prices by default, so cov_matrix is not used below
cov_matrix = risk_models.sample_cov(ret)

#print(type(exp_rets))
#print(type(cov_matrix))

exp_returns = ret.mean()    # mean predicted return for each column (a pandas Series indexed by column name)
#print(exp_returns)

cov_matr = np.cov(np.array(ret.iloc[:,range(0,len(stocks))].T))   # transpose so that each row is one stock, then compute the covariance matrix of the returns
cov_matr = pd.DataFrame(cov_matr,columns=stocks)




In [164]:
ef = EfficientFrontier(exp_returns, cov_matr, weight_bounds=(0,0.5)) #each weight is between 0 and 0.5; a negative lower bound would allow shorting of stocks
sharpe_port=ef.max_sharpe() #an extra objective could be added with ef.add_objective, e.g. to reduce the number of zero weights
#sharpe_port
sharpe_weights=ef.clean_weights()
sharpe_weights


OrderedDict([('AAPL_NN', 0.5), ('COKE_NN', 0.0), ('GOOGL_NN', 0.5)])

In [165]:
weighted_returns = sharpe_weights * ret   # the weights (keyed by column name) are matched to the columns of ret
#weighted_returns

In [166]:
port_ret = weighted_returns.sum(axis=1)   # axis=1 tells pandas to sum across the columns of each row
port_ret

Date
2019-01    0.041243
2019-02   -0.020944
2019-03    0.061728
2019-04   -0.022809
2019-05    0.026291
2019-06    0.124497
2019-07    0.085376
2019-08    0.047215
2019-09    0.088139
2019-10    0.093640
dtype: float64

In [167]:
mean_ret = port_ret.mean()
std_ret = port_ret.std()
sharpe = mean_ret/std_ret
sharpe


1.0799436849133355

Some notes after speaking with Reha Tutuncu:
- look up paper
    - about optimization
    - http://faculty.london.edu/avmiguel/DeMiguel-Garlappi-Uppal-RFS.pdf
    - enhanced portfolio optimization; authors (Pedersen, Ari Levine)
        - one issue with mean-variance optimization is that if the correlations are overestimated, the optimization takes advantage of that and reallocates to the pairs with high correlation (on the positive side or negative side)
        - shrink them towards 0 and put more emphasis on correlations 
    - nested clustered optimization
        - may be useful with multi asset class optimization- useful with bonds and stuff
        - may not be the most useful if just equities
    - mistakes
        - a misalignment between risk model and return model- portfolios may have very large rates as optimization may think there is arbitrage
        - make sure to review optimal portfolio weights and how they change from one time period to the next- keep an eye out for big changes as transaction costs could be big; make sure numbers are reasonable
        - look at results and make sure they do not change too much
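The note above about shrinking correlations towards 0 can be sketched as follows. This is a simple linear shrinkage written for illustration, not necessarily the exact method in the paper; the function name, the shrinkage value of 0.5, and the covariance matrix are made up.

```python
import numpy as np

def shrink_correlations(cov, shrinkage=0.5):
    """Shrink the correlations in a covariance matrix towards 0, keeping the variances."""
    cov = np.asarray(cov, dtype=float)
    vols = np.sqrt(np.diag(cov))                 # standard deviation of each asset
    scale = np.outer(vols, vols)
    corr = cov / scale                           # correlation matrix
    shrunk = (1.0 - shrinkage) * corr + shrinkage * np.eye(len(cov))
    return shrunk * scale                        # back to a covariance matrix

# made-up covariance matrix for two stocks
demo_cov2 = np.array([[4e-4, 1e-4],
                      [1e-4, 2e-4]])
shrunk_cov = shrink_correlations(demo_cov2, shrinkage=0.5)
print(shrunk_cov)
```

With a shrinkage of 0 the matrix is unchanged, and with a shrinkage of 1 the stocks are treated as uncorrelated, so the parameter controls how much the optimizer is allowed to rely on the estimated correlations.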