# Vanguard Data Analytics Worksheet
# Portfolio Asset Allocation Optimization Code


**Christian Rivera**
- Github: https://github.com/crivera2013

### Given a list of securities in a portfolio and a time period, find the optimal allocations of stocks to maximize the Sharpe ratio of the portfolio. 
**Example Portfolio**

| NIKE | GOOGLE | JP MORGAN| NVIDIA |
|-----|--------|----------|--------|
|   25.4%|23.6%|30.4%  |20.6%|



#### We are going to build this:

In [None]:
# run me
from PortAllocation import *
allocations, cum_return, avg_daily_return, stddev, sharpe_ratio = AllocationSample()


### Definitions
#### Sharpe Ratio 
A measure of risk-adjusted return: excess return per unit of volatility. For simplicity, we assume a risk-free rate of 0 in this exercise.

$$\text{Sharpe Ratio} = \frac{R_p - R_f}{\sigma_p} $$ 


$$ R_p = \text{expected portfolio return } $$
$$ R_f = \text{risk free rate} $$
$$ \sigma_p = \text{portfolio standard deviation  }$$
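A minimal sketch of the formula in Python. The function name `compute_sharpe` is hypothetical (it is not part of the worksheet), and the inputs are illustrative numbers, not market data:

```python
def compute_sharpe(expected_return: float, volatility: float, risk_free_rate: float = 0.0) -> float:
    """Sharpe ratio = (R_p - R_f) / sigma_p."""
    if volatility <= 0:
        raise ValueError("volatility must be positive")
    return (expected_return - risk_free_rate) / volatility

# illustrative: 12% expected return, 20% volatility, 0% risk-free rate
print(compute_sharpe(0.12, 0.20))  # ≈ 0.6
```

With the risk-free rate fixed at 0, as in this exercise, the ratio reduces to return divided by volatility.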

#### Volatility 
Measured as the standard deviation of daily returns. Portfolio volatility is the square root of the portfolio variance. In this worksheet, the daily variance is annualized by multiplying it by 252 trading days.

$$ \sigma_p = \sqrt{\sigma_p^2} $$
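A short sketch of measuring volatility on a handful of hypothetical daily returns (made-up numbers). It uses the sample standard deviation (`ddof=1`), which matches pandas' default, and annualizes with √252, assuming about 252 trading days and roughly independent daily returns:

```python
import numpy as np

# hypothetical daily returns, for illustration only
daily_returns = np.array([0.010, -0.005, 0.002, 0.007, -0.012])

# sample standard deviation (ddof=1), matching pandas' default
daily_vol = daily_returns.std(ddof=1)

# sigma_p = sqrt(sigma_p^2): the same number obtained via the variance
assert np.isclose(daily_vol, np.sqrt(daily_returns.var(ddof=1)))

# annualize; equivalent to multiplying the daily variance by 252 before the square root
annual_vol = daily_vol * np.sqrt(252)
print(daily_vol, annual_vol)
```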

#### Covariance
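Covariance measures the directional relationship between the returns on two assets. For **T** daily observations, the sample covariance of asset **i** and asset **j** is:

$$ \sigma_{ij} = \frac{1}{T-1}\sum_{t=1}^{T}(r_{i,t} - \mu_i)(r_{j,t} - \mu_j) $$

where μ<sub>i</sub> and μ<sub>j</sub> are the mean returns of the two assets.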
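The formula can be checked directly against NumPy. This sketch uses two short series of hypothetical returns; `np.cov` (like pandas' `DataFrame.cov()`, which the worksheet uses later) divides by T − 1 by default:

```python
import numpy as np

# hypothetical daily returns for two assets, T = 5 observations
r_i = np.array([0.010, -0.005, 0.002, 0.007, -0.012])
r_j = np.array([0.008, -0.002, 0.001, 0.009, -0.010])

T = len(r_i)
# sum of products of deviations from each mean, divided by T - 1
cov_manual = np.sum((r_i - r_i.mean()) * (r_j - r_j.mean())) / (T - 1)
# off-diagonal entry of NumPy's 2x2 covariance matrix
cov_numpy = np.cov(r_i, r_j)[0, 1]
print(cov_manual, cov_numpy)
```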

#### Portfolio Variance
A portfolio of **N** securities has an **N**x**N** covariance matrix **Σ** and an array of weight allocations **w** (**w<sup>T</sup>** is the transpose of **w**):

$$ \text{Portfolio Variance} = \sigma_p^2 = w^T\Sigma w = w^T\begin{pmatrix}
  \sigma_{1,1} & \sigma_{1,2} & \cdots & \sigma_{1,N} \\
  \sigma_{2,1} & \sigma_{2,2} & \cdots & \sigma_{2,N} \\
  \vdots  & \vdots  & \ddots & \vdots  \\
  \sigma_{N,1} & \sigma_{N,2} & \cdots & \sigma_{N,N}
 \end{pmatrix} w $$

The diagonal entries σ<sub>i,i</sub> are the variances of the individual assets.
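One way to see why w<sup>T</sup>Σw is the portfolio variance: it equals the sample variance of the daily portfolio return series itself. A sketch on synthetic random returns (not real market data) with equal weights:

```python
import numpy as np

rng = np.random.default_rng(0)
# synthetic daily returns: 250 days x 4 assets
returns = rng.normal(loc=0.0005, scale=0.01, size=(250, 4))
w = np.array([0.25, 0.25, 0.25, 0.25])

cov = np.cov(returns, rowvar=False)       # 4x4 sample covariance matrix (Sigma)
var_matrix = w @ cov @ w                  # w^T Sigma w (the transpose is implicit for a 1-D array)
var_direct = np.var(returns @ w, ddof=1)  # variance of the daily portfolio return series

print(var_matrix, var_direct)             # equal up to floating-point rounding
```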

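**3-asset portfolio example** (toy numbers chosen to illustrate the arithmetic; a real covariance matrix is symmetric, σ<sub>i,j</sub> = σ<sub>j,i</sub>, which this one is not):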

$$ \sigma_p^2 = \begin{bmatrix} 0.3 & 0.4 & 0.3 \end{bmatrix} \bullet \begin{pmatrix}  \begin{bmatrix} 0.1 & 0.2 & 0.3 \\
0.4 & 0.5 & 0.4 \\ 0.3 & 0.2 & 0.1 \end{bmatrix} \bullet \begin{bmatrix} 0.3 \\ 0.4 \\ 0.3 \end{bmatrix} \end{pmatrix}$$

$$ \sigma_p^2 = \begin{bmatrix} 0.3 & 0.4 & 0.3 \end{bmatrix} \bullet \begin{bmatrix} 0.2 \\ 0.44 \\ 0.2 \end{bmatrix} $$

$$ \sigma_p^2 = 0.296 $$
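The same arithmetic in NumPy, using the toy weights and matrix from the example:

```python
import numpy as np

w = np.array([0.3, 0.4, 0.3])
sigma = np.array([[0.1, 0.2, 0.3],
                  [0.4, 0.5, 0.4],
                  [0.3, 0.2, 0.1]])

inner = sigma @ w                # matrix-vector product in parentheses
portfolio_variance = w @ inner   # dot product with the weights
print(inner)                     # ≈ [0.2, 0.44, 0.2]
print(portfolio_variance)        # ≈ 0.296
```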

### With all the math notation out of the way, let's get coding!

### Import the packages

In [None]:
import numpy as np  # that linear algebra package
import pandas as pd # columns and rows + general data analysis
import datetime as dt # creating datetime variables
import matplotlib.pyplot as plt  # visualizations
import scipy.optimize as spo  # scientific library with optimizer functions


### Define a date range you want to use

In [None]:
# use the datetime module to create two datetime variables:
# a starting date of June 1st 2009 and an ending date of June 1st 2010
import datetime as dt

starting_date = dt.datetime(2009, 6, 1)
ending_date = dt.datetime(2010, 6, 1)

#Don't touch below code
###########################
print(starting_date)
print(type(starting_date))

In [None]:
# Create a pandas DatetimeIndex containing every calendar day in the range
dates = pd.date_range(starting_date,ending_date)

#Don't touch below code
###########################
print(dates)

### Grab data
In a production environment, this data would come from a SQL database or a third-party data vendor; here we read it from local CSV files, but the basic principle is the same.

In [None]:
def get_stock_data(symbols, dates, addSPY=True, colname='Adj Close'):
    """Read stock data (adjusted close) for the given symbols from CSV files in ./stocks/"""

    df = pd.DataFrame(index=dates)
    # SPY (prepended when addSPY=True) serves as the reference for market trading days
    if addSPY and 'SPY' not in symbols:
        symbols = ['SPY'] + symbols

    for symbol in symbols:
        path = "stocks/{}.csv".format(str(symbol))
        df_temp = pd.read_csv(path, index_col='Date',
                              parse_dates=True,
                              usecols=['Date', colname], na_values=['nan'])
        df_temp = df_temp.rename(columns={colname: symbol})
        df = df.join(df_temp)  # left join onto the calendar dates
        if symbol == 'SPY':
            # drop weekends and holidays: keep only days on which SPY traded
            df = df.dropna(subset=['SPY'])
    return df
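To see the join-then-`dropna` pattern without the `stocks/` folder, here is a sketch that reads small in-memory CSV strings instead of files. The function `load_prices_from_strings`, the `fake_csvs` dictionary, the ticker `XYZ`, and all prices are hypothetical:

```python
import io
import pandas as pd

# hypothetical CSV contents standing in for stocks/SPY.csv and stocks/XYZ.csv
fake_csvs = {
    "SPY": "Date,Adj Close\n2009-06-01,100.0\n2009-06-02,101.0\n2009-06-03,99.5\n",
    "XYZ": "Date,Adj Close\n2009-06-01,50.0\n2009-06-03,51.0\n",  # 2009-06-02 missing
}

def load_prices_from_strings(symbols, dates, colname="Adj Close"):
    df = pd.DataFrame(index=dates)  # every calendar day
    for symbol in ["SPY"] + [s for s in symbols if s != "SPY"]:
        df_temp = pd.read_csv(io.StringIO(fake_csvs[symbol]), index_col="Date",
                              parse_dates=True, usecols=["Date", colname])
        df = df.join(df_temp.rename(columns={colname: symbol}))
        if symbol == "SPY":
            df = df.dropna(subset=["SPY"])  # keep only days SPY traded
    return df

prices = load_prices_from_strings(["XYZ"], pd.date_range("2009-06-01", "2009-06-07"))
print(prices)
```

Seven calendar days shrink to the three days that have a SPY price; a gap in another symbol (XYZ on 2009-06-02) stays as `NaN`.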


In [None]:
syms = ['NKE','GOOG','JPM', 'NVDA']

stocks_all = get_stock_data(syms, dates)
stocks_all.head(3)

In [None]:
# filter out the SPY column

stocks = stocks_all[ syms  ]

In [None]:
# create a 1-D numpy array of evenly distributed allocation weights.  ex: 2 syms = [0.5,0.5] 

initial_weights = [1 / len(syms)] * len(syms)

#Don't touch below code
###########################
weights = np.array(initial_weights)
print(weights)

In [None]:
# compute daily log returns: ln( price_today / price_yesterday )
# "np.log" from numpy and "shift" in pandas do the work; the first row is NaN (no previous price)

normalized_returns = np.log(stocks / stocks.shift(1))


#Don't touch below code
###########################
normalized_returns.head(3)
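A small sketch of the same `shift` pattern on a single hypothetical price series. It also shows a convenient property of log returns: they add up across days to the log of the total price ratio:

```python
import numpy as np
import pandas as pd

prices = pd.Series([100.0, 110.0, 99.0, 108.9])   # hypothetical prices
log_returns = np.log(prices / prices.shift(1))     # first value is NaN (no previous price)

# pandas' sum() skips the leading NaN
total = log_returns.sum()
assert np.isclose(total, np.log(prices.iloc[-1] / prices.iloc[0]))
print(log_returns.tolist())
```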

In [None]:
# annualize: weighted sum of the mean daily log returns, times 252 trading days per year

exp_port_return = np.sum(normalized_returns.mean() * weights) * 252

###########################
print(exp_port_return)

In [None]:
# Find the portfolio variance.  See the math equation above for reference.
# "np.dot" computes the dot product (matrix multiplication)
# "dataframe.cov()" returns the covariance matrix of the DataFrame's columns

# weights^T dot_product (covariance of normalized returns * 252 trading days dot_product weights)

variance = np.dot(weights.T, np.dot(normalized_returns.cov() * 252, weights))

###########################
print(variance)

In [None]:
# Find the portfolio volatility:
# volatility is the square root of the variance

exp_port_vola = np.sqrt(variance)

###########################
print(exp_port_vola)

In [None]:
# compute the Sharpe ratio with a 0% risk-free rate of return

sharpe_ratio = exp_port_return / exp_port_vola

###########################
print(sharpe_ratio)

### Create a function of what you just did
Let's wrap everything we did so far into a function called "Statistics" that takes "weights" as input and returns the expected portfolio return, the portfolio volatility, and the Sharpe ratio, in that order.



In [None]:
# "stocks" is a global variable the function can pull from the outside.
def Statistics(weights):
    
    weights = np.array(weights)
    
    normalized_returns = np.log(stocks / stocks.shift(1))
    
    exp_port_return = np.sum(normalized_returns.mean() * weights) * 252
    
    variance = np.dot(weights.T, np.dot(normalized_returns.cov() * 252, weights))
    
    exp_port_vola = np.sqrt(variance)
    
    sharpe_ratio = exp_port_return / exp_port_vola
    
    return np.array([exp_port_return, exp_port_vola, sharpe_ratio ])
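A variant sketch, under the assumption that you would rather not depend on the global `stocks`: the hypothetical `portfolio_statistics` below takes the log returns as an argument, so they are computed once instead of on every optimizer call. The prices are synthetic and the tickers `AAA`, `BBB`, `CCC` are placeholders:

```python
import numpy as np
import pandas as pd

def portfolio_statistics(weights, log_returns, trading_days=252):
    """Return [annual return, annual volatility, Sharpe ratio] for the given weights."""
    w = np.asarray(weights, dtype=float)
    exp_return = float(np.sum(log_returns.mean() * w) * trading_days)
    variance = float(w @ (log_returns.cov().to_numpy() * trading_days) @ w)
    volatility = np.sqrt(variance)
    return np.array([exp_return, volatility, exp_return / volatility])

# synthetic price paths for three placeholder tickers (not real data)
rng = np.random.default_rng(42)
steps = rng.normal(0.0005, 0.01, size=(300, 3))
fake_prices = pd.DataFrame(100 * np.exp(np.cumsum(steps, axis=0)), columns=["AAA", "BBB", "CCC"])
log_rets = np.log(fake_prices / fake_prices.shift(1))

print(portfolio_statistics([1 / 3, 1 / 3, 1 / 3], log_rets))
```

The calculations mirror `Statistics`; only the data is passed in explicitly.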

### Now let's implement an optimizer to find the best allocation of weights

#### The optimizer we will be using is a minimizer, but we want to find the maximum Sharpe ratio. How do we do that?
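The standard trick is that maximizing f(x) is the same as minimizing −f(x). A toy sketch with `scipy.optimize.minimize` on a hypothetical function whose maximum (3) is at x = 2:

```python
import numpy as np
import scipy.optimize as spo

def f(x):
    # concave toy function: maximum value 3 at x = 2
    return -(x[0] - 2.0) ** 2 + 3.0

# minimize the negative; the minimum of -f sits at the maximum of f
res = spo.minimize(lambda x: -f(x), x0=np.array([0.0]))
print(res.x, -res.fun)   # x ≈ 2, maximum ≈ 3
```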

In [None]:
# maximizing the Sharpe ratio is the same as minimizing its negative
def min_function_sharpe(weights):
    return -1 * Statistics(weights)[2]

In [None]:
# remember we imported the scipy optimization module above
import scipy.optimize as spo  # scientific library with optimizer functions
# https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.minimize.html

# we will be using the minimize function with the
# Sequential Least Squares Programming method ('SLSQP')

# Bounds parameter: each allocation can only be between 0 and 1
bounds = tuple((0, 1) for _ in range(len(syms)))

# Constraints parameter: the sum of the weights must equal 1
constraints = ({'type': 'eq', 'fun': lambda w: np.sum(w) - 1})

# run the optimizer, starting from the equal-weight allocation
opts = spo.minimize(fun = min_function_sharpe,
                   x0 = initial_weights,
                   method = "SLSQP",
                   bounds = bounds,
                   constraints = constraints)


print(opts)
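For experimenting without the CSV files, here is a self-contained sketch of the same SLSQP setup (0 to 1 bounds, weights summing to 1) on synthetic daily log returns for four hypothetical assets. The return means and volatilities are made up, so the resulting weights carry no meaning beyond the illustration:

```python
import numpy as np
import scipy.optimize as spo

rng = np.random.default_rng(7)
# synthetic daily log returns: 2000 days x 4 hypothetical assets
daily = rng.normal(loc=[0.0008, 0.0004, 0.0006, 0.0002],
                   scale=[0.02, 0.01, 0.015, 0.012], size=(2000, 4))
mean_annual = daily.mean(axis=0) * 252
cov_annual = np.cov(daily, rowvar=False) * 252

def neg_sharpe(w):
    # negative Sharpe ratio with a 0% risk-free rate
    ret = w @ mean_annual
    vol = np.sqrt(w @ cov_annual @ w)
    return -ret / vol

n = daily.shape[1]
x0 = np.full(n, 1.0 / n)  # equal-weight starting point
res = spo.minimize(neg_sharpe, x0, method="SLSQP",
                   bounds=[(0.0, 1.0)] * n,
                   constraints=[{"type": "eq", "fun": lambda w: np.sum(w) - 1.0}])
print(res.x.round(3), -res.fun)
```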


The optimal allocations are held in the **'x'** key of **"opts"**.  Let's attach them to the symbols they represent.

In [None]:
results = dict(zip(syms,opts['x']))
print(results)

In [None]:
# turn that dictionary into a list

allocations = list(results.values())

## We have all of our data! Now it's time for some visualizations with pandas and matplotlib

In [None]:
%matplotlib inline
# the line above renders plots inline in the notebook output, so no explicit plt.show() call is needed

import pandas as pd # columns and rows + general data analysis


prices_SPY = stocks_all['SPY']

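# normalize prices so each series starts at 1.0
prices_norm = stocks / stocks.iloc[0]
# buy-and-hold portfolio value: allocation-weighted sum of the normalized prices
weighted_norm_returns = (prices_norm * allocations).sum(axis=1)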

gen_plot = prices_SPY / prices_SPY.iloc[0]

portfolio = (weighted_norm_returns).to_frame().join(gen_plot.to_frame())
portfolio.columns = ['Portfolio','SPY']

portfolio.plot(figsize=(10,6), title='Daily Portfolio Value and SPY')
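The "Portfolio" line is a buy-and-hold value: the allocations are applied once on the first day and never rebalanced. A sketch of that calculation on two hypothetical assets over three days (made-up prices):

```python
import pandas as pd

# hypothetical prices for two assets over three days
prices = pd.DataFrame({"AAA": [10.0, 11.0, 12.0], "BBB": [20.0, 18.0, 19.0]})
allocations = [0.5, 0.5]

prices_norm = prices / prices.iloc[0]                       # each column starts at 1.0
portfolio_value = (prices_norm * allocations).sum(axis=1)   # starts at 1.0 (sum of allocations)
print(portfolio_value.tolist())  # ≈ [1.0, 1.0, 1.075]
```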



## Congrats!  You successfully wrote portfolio optimization code in Python.  Wrapping that code into a function will enable you to explore different portfolio allocations over different time frames

In [None]:
from PortAllocation import *
import datetime as dt
startdate = dt.date(2009,1,1)
enddate = dt.date(2012,6,1)
symbols = ['NVDA','JPM','AAPL']

results = optimize_portfolio(sd=startdate, ed=enddate,
                             syms=symbols, 
                             gen_plot=True)