# Smart Beta Portfolio Optimization Project

## Portfolio Construction Overview


- Objective 1:   Minimize portfolio variance.
- Objective 2:   Match the returns of an index with less volatility (a higher risk-adjusted return).
- Methodology:   The Smart Beta Portfolio weights are based on the amount of dividends issued, not on market capitalization.
- Constraints:   The Smart Beta Portfolio is long-only.
- Evaluation:    The performance of the Smart Beta Portfolio will be compared to that of the index.





### Load Packages

In [1]:
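import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# Local helper module used in later cells (large_dollar_volume_stocks, plot_benchmark_returns)
import project_helper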

## Research Universe

- Stocks with the largest dollar volume are selected as the research universe because they are highly liquid.

In [None]:
df = pd.read_csv('../../data/project_3/eod-quotemedia.csv')

percent_top_dollar = 0.2
high_volume_symbols = project_helper.large_dollar_volume_stocks(df, 'adj_close', 'adj_volume', percent_top_dollar)
df = df[df['ticker'].isin(high_volume_symbols)]

close = df.reset_index().pivot(index='date', columns='ticker', values='adj_close')
volume = df.reset_index().pivot(index='date', columns='ticker', values='adj_volume')
dividends = df.reset_index().pivot(index='date', columns='ticker', values='dividends')
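
The three `pivot` calls above reshape the long-format quote table (one row per date and ticker) into wide tables with one column per ticker. A minimal sketch of that reshape on made-up data; the tickers and prices here are illustrative only and are not taken from the project's CSV:

```python
import pandas as pd

# Toy long-format quotes: one row per (date, ticker) pair. Values are made up.
long_df = pd.DataFrame({
    'date': ['2020-01-01', '2020-01-01', '2020-01-02', '2020-01-02'],
    'ticker': ['AAA', 'BBB', 'AAA', 'BBB'],
    'adj_close': [10.0, 20.0, 11.0, 19.0],
})

# Dates become the index, tickers become the columns.
wide_close = long_df.pivot(index='date', columns='ticker', values='adj_close')
print(wide_close)
```

Every later function in this notebook assumes this wide layout: rows are dates and columns are tickers.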

## Smart Beta Portfolio

- Methodology: Portfolio weights are based on each stock's cumulative dividends. The portfolio is then compared to the index, which is weighted by dollar volume as a stand-in for market cap, to see how well it performs.


In [None]:
def generate_dollar_volume_weights(close, volume):
    """
    Generate dollar volume weights.

    Parameters
    ----------
    close : DataFrame
        Close price for each ticker and date
    volume : DataFrame
        Volume for each ticker and date

    Returns
    -------
    dollar_volume_weights : DataFrame
        The dollar volume weights for each ticker and date
    """
    assert close.index.equals(volume.index)
    assert close.columns.equals(volume.columns)

    # Dollar volume traded per ticker and date
    dollar_volume = close * volume
    # Normalize each date (row) so that the weights sum to 1
    dollar_volume_weights = dollar_volume.div(dollar_volume.sum(axis=1), axis=0)
    return dollar_volume_weights

In [None]:
index_weights = generate_dollar_volume_weights(close, volume)
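
As a worked example of the dollar volume weighting, the sketch below applies the same arithmetic (price times volume, then row-wise normalization) to two made-up tickers over two dates. The names `close_toy` and `volume_toy` and all values are hypothetical:

```python
import pandas as pd

dates = ['d1', 'd2']
close_toy = pd.DataFrame({'AAA': [10.0, 12.0], 'BBB': [20.0, 16.0]}, index=dates)
volume_toy = pd.DataFrame({'AAA': [100.0, 100.0], 'BBB': [150.0, 50.0]}, index=dates)

# Dollar volume: d1 -> AAA 1000, BBB 3000; d2 -> AAA 1200, BBB 800
dollar_volume_toy = close_toy * volume_toy

# Divide each row by its total so the weights on each date sum to 1
weights_toy = dollar_volume_toy.div(dollar_volume_toy.sum(axis=1), axis=0)
print(weights_toy)
```

On `d1` the weights come out as 0.25 and 0.75; on `d2` as 0.6 and 0.4.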

In [None]:
def calculate_dividend_weights(dividends):
    """
    Calculate dividend weights.

    Parameters
    ----------
    dividends : DataFrame
        Dividend for each stock and date

    Returns
    -------
    dividend_weights : DataFrame
        Weights for each stock and date
    """
    
    # Running total of dividends paid by each stock up to each date
    cumulative_dividends = dividends.cumsum()
    # Normalize each date (row) so that the weights sum to 1
    dividend_weights = cumulative_dividends.div(cumulative_dividends.sum(axis=1), axis=0)

    return dividend_weights

In [None]:
etf_weights = calculate_dividend_weights(dividends)
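
To make the dividend weighting concrete, here is a small sketch on made-up dividend payments. It mirrors the cumulative-sum-then-normalize logic; the name `dividends_toy` and its values are hypothetical:

```python
import pandas as pd

dividends_toy = pd.DataFrame(
    {'AAA': [0.0, 1.0, 0.0], 'BBB': [0.5, 0.0, 0.5]},
    index=['d1', 'd2', 'd3'],
)

# Running totals: AAA -> 0, 1, 1 and BBB -> 0.5, 0.5, 1.0
cumulative_toy = dividends_toy.cumsum()

# Each stock's share of all dividends paid so far
dividend_weights_toy = cumulative_toy.div(cumulative_toy.sum(axis=1), axis=0)
print(dividend_weights_toy)
```

Note that on a date where no stock has paid a dividend yet, the row total is zero and the division yields `NaN` weights, so real data may need the first rows checked.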

## Returns
- Generate returns data for all the stocks and dates from price data.

In [None]:
def generate_returns(prices):
    """
    Generate returns for ticker and date.

    Parameters
    ----------
    prices : DataFrame
        Price for each ticker and date

    Returns
    -------
    returns : DataFrame
        The returns for each ticker and date
    """
    
    return (prices - prices.shift(1)) / prices.shift(1)


In [None]:
returns = generate_returns(close)
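
The shift-based formula is the simple one-period return, $(p_t - p_{t-1}) / p_{t-1}$. As a quick sanity check on made-up prices, the sketch below compares it with pandas' built-in `pct_change`; `fill_method=None` is passed so that no forward-filling is applied:

```python
import numpy as np
import pandas as pd

prices_toy = pd.DataFrame({'AAA': [10.0, 11.0, 9.9]})

# Same arithmetic as generate_returns
manual_returns = (prices_toy - prices_toy.shift(1)) / prices_toy.shift(1)

# Built-in equivalent; the first row is NaN in both because there is no prior price
builtin_returns = prices_toy.pct_change(fill_method=None)

print(manual_returns)
print(np.allclose(manual_returns, builtin_returns, equal_nan=True))
```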

## Weighted Returns

- Create weighted returns using the returns and weights.

In [None]:
def generate_weighted_returns(returns, weights):
    """
    Generate weighted returns.

    Parameters
    ----------
    returns : DataFrame
        Returns for each ticker and date
    weights : DataFrame
        Weights for each ticker and date

    Returns
    -------
    weighted_returns : DataFrame
        Weighted returns for each ticker and date
    """
    assert returns.index.equals(weights.index)
    assert returns.columns.equals(weights.columns)
    
    weighted_returns = returns * weights

    return weighted_returns

In [None]:
index_weighted_returns = generate_weighted_returns(returns, index_weights)
etf_weighted_returns = generate_weighted_returns(returns, etf_weights)


## Cumulative Returns
- Calculate the cumulative returns over time given the returns.

In [None]:
def calculate_cumulative_returns(returns):
    """
    Calculate cumulative returns.

    Parameters
    ----------
    returns : DataFrame
        Returns for each ticker and date

    Returns
    -------
    cumulative_returns : Pandas Series
        Cumulative returns for each date
    """
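    # Sum the weighted returns across tickers to get the portfolio return per date,
    # then compound those returns over time.
    return (1 + returns.sum(axis=1)).cumprod()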
    

In [None]:
index_weighted_cumulative_returns = calculate_cumulative_returns(index_weighted_returns)
etf_weighted_cumulative_returns = calculate_cumulative_returns(etf_weighted_returns)
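
Weighted returns summed across tickers give one portfolio return per date, and compounding those daily returns gives a growth curve over time. A toy sketch with made-up returns and equal weights (all names and values here are hypothetical):

```python
import pandas as pd

returns_toy = pd.DataFrame({'AAA': [0.10, -0.05], 'BBB': [0.00, 0.10]})
weights_toy = pd.DataFrame({'AAA': [0.5, 0.5], 'BBB': [0.5, 0.5]})

# Per-ticker contribution to the portfolio return on each date
weighted_toy = returns_toy * weights_toy

# Portfolio return per date: 0.05 on the first date, 0.025 on the second
portfolio_returns_toy = weighted_toy.sum(axis=1)

# Compounded growth of one unit invested: 1.05, then 1.05 * 1.025
growth_toy = (1 + portfolio_returns_toy).cumprod()
print(growth_toy)
```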

## Tracking Error

- Annualized tracking error between the portfolio and the index

The formula is as follows:
$$ TE = \sqrt{252} \cdot \mathrm{SampleStdev}(r_p - r_b) $$

where $r_p$ is the portfolio/ETF returns, $r_b$ is the benchmark returns, and 252 is the approximate number of trading days in a year.

In [None]:
def tracking_error(benchmark_returns_by_date, etf_returns_by_date):
    """
    Calculate the tracking error.

    Parameters
    ----------
    benchmark_returns_by_date : Pandas Series
        The benchmark returns for each date
    etf_returns_by_date : Pandas Series
        The ETF returns for each date

    Returns
    -------
    tracking_error : float
        The tracking error
    """
    assert benchmark_returns_by_date.index.equals(etf_returns_by_date.index)
    return np.sqrt(252) * np.std(etf_returns_by_date - benchmark_returns_by_date, ddof=1)


In [28]:
smart_beta_tracking_error = tracking_error(np.sum(index_weighted_returns, 1), np.sum(etf_weighted_returns, 1))
print('Smart Beta Tracking Error: {}'.format(smart_beta_tracking_error))

Smart Beta Tracking Error: 0.1020761483200753
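
To see the tracking error arithmetic on a small scale, the sketch below computes it for four made-up daily returns. The series `benchmark_toy` and `portfolio_toy` are hypothetical and unrelated to the result printed above:

```python
import numpy as np
import pandas as pd

benchmark_toy = pd.Series([0.01, 0.02, -0.01, 0.00])
portfolio_toy = pd.Series([0.02, 0.01, -0.01, 0.01])

# Daily active return: how far the portfolio deviates from the benchmark
active_toy = portfolio_toy - benchmark_toy

# Sample standard deviation (ddof=1), annualized with sqrt(252)
tracking_error_toy = np.sqrt(252) * active_toy.std(ddof=1)
print(tracking_error_toy)
```

Because only the difference series enters the formula, a portfolio that lags the benchmark by a constant amount every day would still have a tracking error of zero.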


## Portfolio Optimization: Part 2

Create a second portfolio. This portfolio is independent of the dividend-weighted portfolio created in Part 1.

- Some investors evaluate a fund by how well it tracks its index. However, the fund is still expected to deviate from the index within a certain range in order to improve fund performance.

- Objective: minimize the portfolio variance while closely tracking a market cap weighted index. Hence, in addition to the variance, the objective minimizes the distance between the weights of our portfolio and the weights of the index.

$Minimize \left [ \sigma^2_p + \lambda \sqrt{\sum_{i=1}^{m}(weight_i - indexWeight_i)^2} \right ]$ where $m$ is the number of stocks in the portfolio, and $\lambda$ is a scaling factor (hyperparameter).

- By minimizing a linear combination of both the portfolio risk and distance between portfolio and benchmark weights, we attempt to balance the desire to minimize portfolio variance with the goal of tracking the index.

## Covariance
-  Calculate the covariance of the returns

In [39]:
def get_covariance_returns(returns):
    """
    Calculate covariance matrices.

    Parameters
    ----------
    returns : DataFrame
        Returns for each ticker and date

    Returns
    -------
    returns_covariance  : 2 dimensional Ndarray
        The covariance of the returns
    """
    # Treat missing returns as 0 so every ticker has the same number of observations
    returns_filled = returns.fillna(0)
    # np.cov expects one row per variable, so transpose to tickers x dates
    returns_covariance = np.cov(returns_filled.T)
    print(returns_covariance)
    
    return returns_covariance



[[ 0.89856076  0.7205586   0.8458721 ]
 [ 0.7205586   0.78707297  0.76450378]
 [ 0.8458721   0.76450378  0.83182775]]
Tests Passed


In [43]:
covariance_returns = get_covariance_returns(returns)
covariance_returns = pd.DataFrame(covariance_returns, index = returns.columns, columns = returns.columns)

[[  5.31791451e-04   7.21251856e-05   9.55847963e-05 ...,   4.55526842e-05
    1.26756746e-04   4.27885833e-05]
 [  7.21251856e-05   2.14185487e-04   4.87000321e-05 ...,   3.43023358e-05
    8.17766960e-05   4.13786801e-05]
 [  9.55847963e-05   4.87000321e-05   2.64537496e-04 ...,   3.32734532e-05
    1.11616723e-04   5.40669112e-05]
 ..., 
 [  4.55526842e-05   3.43023358e-05   3.32734532e-05 ...,   1.14424083e-04
    3.48568802e-05   2.79122492e-05]
 [  1.26756746e-04   8.17766960e-05   1.11616723e-04 ...,   3.48568802e-05
    7.17162013e-04   8.42633668e-05]
 [  4.27885833e-05   4.13786801e-05   5.40669112e-05 ...,   2.79122492e-05
    8.42633668e-05   1.29561995e-04]]
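
`np.cov` treats each row as a variable by default, which is why the returns are transposed before the call. A short sketch on made-up returns showing that the transpose, the `rowvar=False` argument, and pandas' `DataFrame.cov` all agree when there are no missing values:

```python
import numpy as np
import pandas as pd

returns_toy = pd.DataFrame({
    'AAA': [0.01, -0.02, 0.03],
    'BBB': [0.00, 0.01, 0.02],
})

# Transposed input: rows are tickers, columns are dates
cov_transposed = np.cov(returns_toy.values.T)

# Same result without transposing, by telling NumPy that columns are the variables
cov_rowvar = np.cov(returns_toy.values, rowvar=False)

# pandas equivalent (also normalizes by N - 1)
cov_pandas = returns_toy.cov()

print(cov_transposed)
```

With missing values the approaches can differ: the function above fills them with 0 first, while `DataFrame.cov` excludes missing pairs.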


### Portfolio variance
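Formula: $\sigma^2_p = \mathbf{x}^T \mathbf{P} \mathbf{x}$, where $\mathbf{x}$ is the vector of portfolio weights and $\mathbf{P}$ is the covariance matrix of the returns.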
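
As a small numeric illustration of the quadratic form, the sketch below evaluates it for two assets with made-up weights and a made-up covariance matrix:

```python
import numpy as np

x_toy = np.array([0.6, 0.4])          # portfolio weights
P_toy = np.array([[0.04, 0.01],       # covariance matrix of returns
                  [0.01, 0.09]])

# x^T P x = 0.36 * 0.04 + 2 * 0.6 * 0.4 * 0.01 + 0.16 * 0.09
portfolio_variance_toy = x_toy @ P_toy @ x_toy
print(portfolio_variance_toy)
```

The cross term shows why correlated holdings raise portfolio variance: the off-diagonal covariance is counted twice.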

### Distance from index weights
The L2 norm: $\sqrt{\sum_{i=1}^{m}(weight_i - indexWeight_i)^2}$, or $\left \| \mathbf{x} - \mathbf{index} \right \|_2$.

### Objective function
-  Minimize the sum of the portfolio variance and the distance of the portfolio weights from the index weights, with the distance scaled by a chosen hyperparameter $\lambda$ ('lambda').

### Constraints
Go long only: every weight is non-negative and the weights sum to 1.  Constraints = [x >= 0, sum(x) == 1].
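
The definition of `get_optimal_weights` does not appear in this section, so the following is only a minimal sketch of how the objective and constraints above could be solved, not the notebook's own implementation. It assumes SciPy's general-purpose SLSQP solver; the function name `get_optimal_weights_sketch`, the `scale` parameter (standing in for $\lambda$), the default value 2.0, and the toy inputs are all hypothetical:

```python
import numpy as np
from scipy.optimize import minimize


def get_optimal_weights_sketch(covariance_returns, index_weights, scale=2.0):
    """Hypothetical stand-in: minimize x^T P x + scale * ||x - index||_2, long-only."""
    P = np.asarray(covariance_returns, dtype=float)
    index = np.asarray(index_weights, dtype=float)
    m = len(index)

    def objective(x):
        portfolio_variance = x @ P @ x
        # Assumption: a tiny constant keeps the square root smooth at x == index,
        # which helps a gradient-based solver; it changes the objective negligibly.
        distance_to_index = np.sqrt(np.sum((x - index) ** 2) + 1e-12)
        return portfolio_variance + scale * distance_to_index

    result = minimize(
        objective,
        x0=np.full(m, 1.0 / m),                 # start from equal weights
        method='SLSQP',
        bounds=[(0.0, 1.0)] * m,                # long only: x >= 0
        constraints=[{'type': 'eq', 'fun': lambda x: np.sum(x) - 1.0}],  # sum(x) == 1
    )
    return result.x


# Toy inputs (made up): three assets and their index weights
toy_cov = np.array([[4.0e-4, 1.0e-4, 0.0],
                    [1.0e-4, 9.0e-4, 2.0e-4],
                    [0.0, 2.0e-4, 2.5e-4]])
toy_index = np.array([0.5, 0.3, 0.2])
toy_weights = get_optimal_weights_sketch(toy_cov, toy_index)
print(toy_weights, toy_weights.sum())
```

A larger `scale` pulls the solution toward the index weights, while a smaller one lets the variance term dominate. Since the problem is convex, a dedicated convex optimization library would be a natural alternative to SLSQP.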

## Optimized Portfolio

In [None]:
raw_optimal_single_rebalance_etf_weights = get_optimal_weights(covariance_returns.values, index_weights.iloc[-1])
optimal_single_rebalance_etf_weights = pd.DataFrame(
    np.tile(raw_optimal_single_rebalance_etf_weights, (len(returns.index), 1)),
    returns.index,
    returns.columns)

With the ETF weights built, let's compare the ETF to the index. Run the next cell to calculate the ETF returns and compare them to the index returns.

In [None]:
optim_etf_returns = generate_weighted_returns(returns, optimal_single_rebalance_etf_weights)
optim_etf_cumulative_returns = calculate_cumulative_returns(optim_etf_returns)
project_helper.plot_benchmark_returns(index_weighted_cumulative_returns, optim_etf_cumulative_returns, 'Optimized ETF vs Index')

optim_etf_tracking_error = tracking_error(np.sum(index_weighted_returns, 1), np.sum(optim_etf_returns, 1))
print('Optimized ETF Tracking Error: {}'.format(optim_etf_tracking_error))