Smart Beta Portfolio and Portfolio Optimization

## Overview

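Smart beta has a broad meaning, but in practice, taking a universe of stocks from an index and applying a weighting scheme other than market-cap weighting can be regarded as a kind of smart beta fund. Smart beta portfolios generally give investors diversified, broad exposure to a particular market while also providing exposure, or "beta", to one or more types of market characteristics (or factors) that are believed to predict prices. They typically target momentum, earnings quality, low volatility, dividends, or some combination of these. Smart beta portfolios are usually rebalanced infrequently and follow relatively simple rules or algorithms that are passively managed. Model changes to these types of funds are also rare and, for U.S.-focused mutual funds or ETFs, require prospectus filings with the U.S. Securities and Exchange Commission.

In contrast, a purely alpha-focused quantitative fund may use multiple models or algorithms to create a portfolio. The portfolio manager retains discretion over upgrading or changing the types of models and over how often to rebalance the portfolio in order to maximize performance relative to a stock benchmark. The manager may also have discretion to hold short positions in stocks.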

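Imagine you're a portfolio manager and want to try out a few different portfolio weighting methods.

One way to design a portfolio is to look at certain accounting measures (fundamentals) that, based on past trends, indicate stocks that produce better results.

For instance, you may start with the hypothesis that dividend-issuing stocks tend to perform better than stocks that do not. This does not hold for every company: Apple, for example, paid no dividends for many years before resuming them in 2012, yet its historical performance over that period was strong. The hypothesis about dividend-paying stocks could go something like this:

Companies that regularly issue dividends may also be more prudent in allocating their available cash, and may be more conscious of prioritizing shareholder interests. For example, a CEO may decide to reinvest cash into pet projects that produce low returns. Or the CEO may do some analysis, find that reinvesting within the company produces lower returns than a diversified portfolio, and so decide that shareholders would be better served by receiving the cash (in the form of dividends). So according to this hypothesis, dividends may be both a proxy for how the company is doing (in terms of earnings and cash flow) and a signal that the company acts in the best interest of its shareholders. Of course, it's important to test whether this works in practice.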

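You may also have another hypothesis, with which you wish to design a portfolio that can then be made into an ETF. Investors may want to invest in passive beta funds but also want to reduce the risk exposure (the volatility) of their investments. The goal of a low-volatility fund that still produces returns similar to an index may be appealing to investors who have a shorter investment horizon and are therefore more risk averse.

With that in mind, the objective of the proposed portfolio is to closely track an index while also minimizing the portfolio variance. If this portfolio can match the returns of the index with lower volatility, it has a higher risk-adjusted return (same return, lower volatility).

Smart beta ETFs can be designed with both of these general methods (among others): alternative weighting and minimum volatility.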

In [None]:
import sys
!{sys.executable} -m pip install -r requirements.txt

### Load Packages

In [None]:
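import os
import pandas as pd
import numpy as np
from tqdm.notebook import tqdm
import project_helper  # local course module used below for data selection and plots
import project_tests   # local course module with the unit tests called below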

In [None]:
from IPython.display import display, HTML
import plotly.graph_objs as go
import plotly.figure_factory as ff

import plotly.offline as offline_py
offline_py.init_notebook_mode(connected=True)

color_scheme = {
    'index': '#B6B2CF',
    'etf': '#2D3ECF',
    'tracking_error': '#6F91DE',
    'df_header': 'silver',
    'df_value': 'white',
    'df_line': 'silver',
    'heatmap_colorscale': [(0, '#6F91DE'), (0.5, 'grey'), (1, 'red')],
    'background_label': '#9dbdd5',
    'low_value': '#B6B2CF',
    'high_value': '#2D3ECF',
    'y_axis_2_text_color': 'grey',
    'shadow': 'rgba(0, 0, 0, 0.75)',
    'major_line': '#2D3ECF',
    'minor_line': '#B6B2CF',
    'main_line': 'black'}


def generate_config():
    return {'showLink': False, 'displayModeBar': False, 'showAxisRangeEntryBoxes': True}


def _generate_hover_text(x_text, y_text, z_values, x_label, y_label, z_label):
    float_to_str = np.vectorize('{:.7f}'.format)

    x_hover_text_values = np.tile(x_text.strftime("%Y-%m-%d") , (len(y_text), 1))
    y_hover_text_values = np.tile(y_text, (len(x_text), 1))

    padding_len = np.full(3, max(len(x_label), len(y_label), len(z_label))) - [len(x_label), len(y_label), len(z_label)]

    # Additional padding added to ticker and date to align
    hover_text = x_label + ':  ' + padding_len[0] * ' ' + x_hover_text_values + '<br>' + y_label + ':  ' + padding_len[1] * ' ' + y_hover_text_values.T + '<br>' +  z_label + ': ' + padding_len[2] * ' ' + float_to_str(z_values)

    return hover_text


def _generate_heatmap_trace(df, x_label, y_label, z_label, scale_min, scale_max):
    hover_text = _generate_hover_text(df.index, df.columns, df.values.T, x_label, y_label, z_label)

    return go.Heatmap(
        x=df.index,
        y=df.columns,
        z=df.values.T,
        zauto=False,
        zmax=scale_max,
        zmin=scale_min,
        colorscale=color_scheme['heatmap_colorscale'],
        text=hover_text,
        hoverinfo='text')



def _sanatize_string(string):
    return ''.join([i for i in string if i.isalpha()])

def plot_weights(weights, title):
    config = generate_config()
    graph_path = 'graphs/{}.html'.format(_sanatize_string(title))
    trace = _generate_heatmap_trace(weights.sort_index(axis=1, ascending=False), 'Date', 'Ticker', 'Weight', 0.0, 0.2)
    layout = go.Layout(
        title=title,
        xaxis={'title': 'Dates'},
        yaxis={'title': 'Tickers'})

    fig = go.Figure(data=[trace], layout=layout)
    offline_py.plot(fig, config=config, filename=graph_path, auto_open=False)
    display(HTML('The graph for {} is too large. You can view it <a href="{}" target="_blank">here</a>.'
                 .format(title, graph_path)))

## Market Data
### Load Data
For the stock universe, we use the stocks with the largest traded value (price times volume), which are considered highly liquid.
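
The loading cell in this section calls `project_helper.large_dollar_volume_stocks`, a course helper whose source is not shown in this notebook. As a rough sketch of the idea only, the hypothetical function below ranks tickers by total traded value and keeps the top fraction. The function name `top_traded_value_tickers` and the toy data are assumptions made for illustration, not the helper's actual implementation.

```python
import pandas as pd


def top_traded_value_tickers(df, price_col, volume_col, top_fraction):
    """Hypothetical helper: tickers in the top `top_fraction` by total traded value."""
    # Traded value per row, summed over all dates for each ticker
    traded_value = (df[price_col] * df[volume_col]).groupby(df['ticker']).sum()
    n_keep = int(len(traded_value) * top_fraction)
    return traded_value.sort_values(ascending=False).head(n_keep).index.tolist()


# Made-up long-format data: one row per ticker and date
toy = pd.DataFrame({
    'ticker':     ['A', 'A', 'B', 'B', 'C', 'C', 'D', 'D'],
    'adj_close':  [10.0, 11.0, 5.0, 5.0, 100.0, 90.0, 1.0, 1.0],
    'adj_volume': [100, 100, 10, 10, 50, 50, 20, 20],
})

selected = top_traded_value_tickers(toy, 'adj_close', 'adj_volume', 0.5)
print(selected)
```

With four tickers and `top_fraction=0.5`, the two tickers with the largest summed traded value are kept.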

In [None]:
df = pd.read_csv('../../data/project_3/eod-quotemedia.csv')

percent_top_dollar = 0.2
high_volume_symbols = project_helper.large_dollar_volume_stocks(df, 'adj_close', 'adj_volume', percent_top_dollar)
df = df[df['ticker'].isin(high_volume_symbols)]

close = df.reset_index().pivot(index='date', columns='ticker', values='adj_close')
volume = df.reset_index().pivot(index='date', columns='ticker', values='adj_volume')
dividends = df.reset_index().pivot(index='date', columns='ticker', values='dividends')

In [None]:
%%time
nikkei255 = pd.read_csv('../nikkei255.csv', index_col=0)
path = os.path.join(os.environ['CSVDIR'] ,'daily')
print(path)

li_close = []
li_volume = []
li_dividend = []

for code, company in tqdm(nikkei255.iterrows(), total=len(nikkei255)):
    # Read each per-stock CSV once, indexed by the parsed trading date
    df_each = pd.read_csv(
        os.path.join(path, str(code) + '.csv'),
        usecols=['date', 'close', 'volume', 'dividend'],
        parse_dates=['date'],
        index_col='date')

    # Name each series after the company so that it becomes the column label
    li_close.append(df_each['close'].rename(company['name']))
    li_volume.append(df_each['volume'].rename(company['name']))
    li_dividend.append(df_each['dividend'].rename(company['name']))

# Combine into date x company matrices; `dividends` is the name used later on
close = pd.concat(li_close, axis=1)
volume = pd.concat(li_volume, axis=1)
dividends = pd.concat(li_dividend, axis=1)

### View Data
To see what one of these 2-d matrices looks like, let's take a look at the closing prices matrix.

In [None]:
close.head()

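# Part 1: Smart Beta Portfolio
In Part 1 of this project, you'll build a portfolio using dividend yield to choose the portfolio weights. A portfolio such as this could be incorporated into a smart beta ETF. You'll compare this portfolio to a market-cap-weighted index to see how well it performs.

In practice, you'd probably get the index weights from a data vendor, but here we simulate a market-cap-weighted index.

## Index Weights
The index we'll be using is based on stocks with large traded value. Implement `generate_yen_volume_weights` to generate the weights for this index. For each date, generate the weights based on the yen volume traded on that date. For example, assume the following are the close prices and volume data: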
```
                 Prices
               A         B         ...
2013-07-08     2         2         ...
2013-07-09     5         6         ...
2013-07-10     1         2         ...
2013-07-11     6         5         ...
...            ...       ...       ...

                 Volume
               A         B         ...
2013-07-08     100       340       ...
2013-07-09     240       220       ...
2013-07-10     120       500       ...
2013-07-11     10        100       ...
...            ...       ...       ...
```
If A and B were the only two tickers, the weights created by the function `generate_yen_volume_weights` would be the following (each row sums to 1):
```
               A         B
2013-07-08     0.227..   0.772..
2013-07-09     0.476..   0.523..
2013-07-10     0.107..   0.892..
2013-07-11     0.107..   0.892..
...            ...       ...
```
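
As a quick check of the per-date normalization, here is a minimal sketch on the toy prices and volumes above. It assumes, for illustration only, that A and B are the only two tickers, so the numbers differ from any table that includes further columns.

```python
import pandas as pd

dates = pd.to_datetime(['2013-07-08', '2013-07-09', '2013-07-10', '2013-07-11'])
toy_close = pd.DataFrame({'A': [2, 5, 1, 6], 'B': [2, 6, 2, 5]}, index=dates)
toy_volume = pd.DataFrame({'A': [100, 240, 120, 10], 'B': [340, 220, 500, 100]}, index=dates)

# Traded value per ticker and date, then divide each row by its own total
traded_value = toy_close * toy_volume
toy_weights = traded_value.div(traded_value.sum(axis=1), axis=0)
print(toy_weights.round(3))
```

Because the division runs along `axis=0` with a per-date total, every row of `toy_weights` sums to 1.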

In [None]:
def generate_yen_volume_weights(close, volume):
    """
    Generate yen volume weights.

    Parameters
    ----------
    close : DataFrame
        Close price for each ticker and date
    volume : DataFrame
        Volume for each ticker and date

    Returns
    -------
    yen_volume_weights : DataFrame
        The yen volume weights for each ticker and date
    """
    assert close.index.equals(volume.index)
    assert close.columns.equals(volume.columns)

    # Traded value in yen for each ticker and date
    yen_volume = close * volume
    # Total traded value across all tickers on each date
    date_sum_of_yen_volume = yen_volume.sum(axis=1)
    # Normalize each row so that the weights on each date sum to 1
    yen_volume_weights = yen_volume.div(date_sum_of_yen_volume, axis=0)

    return yen_volume_weights

### View Data
Let's generate the index weights using `generate_yen_volume_weights` and view them using a heatmap.

In [None]:
index_weights = generate_yen_volume_weights(close, volume)
plot_weights(index_weights, 'Index Weights')

## Portfolio Weights
Now that we have the index weights, let's choose the portfolio weights based on dividend. You would normally calculate the weights based on trailing dividend yield, but we'll simplify this by just calculating the total dividend yield over time.

Implement `calculate_dividend_weights` to return the weights for each stock based on its total dividend yield over time. This is similar to generating the weight for the index, but it's using dividend data instead.
For example, assume the following is `dividends` data:
```
                 Dividends
               A         B
2013-07-08     0         0
2013-07-09     0         1
2013-07-10     0.5       0
2013-07-11     0         0
2013-07-12     2         0
...            ...       ...
```
The weights created from the function `calculate_dividend_weights` should be the following:
```
               A         B
2013-07-08     NaN       NaN
2013-07-09     0         1
2013-07-10     0.333..   0.666..
2013-07-11     0.333..   0.666..
2013-07-12     0.714..   0.285..
...            ...       ...
```
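
The example above can be reproduced with a short sketch: take the running total of dividends per stock, then divide each row by the row total. The variable names are chosen for illustration.

```python
import pandas as pd

dates = pd.to_datetime(['2013-07-08', '2013-07-09', '2013-07-10',
                        '2013-07-11', '2013-07-12'])
toy_dividends = pd.DataFrame({'A': [0, 0, 0.5, 0, 2], 'B': [0, 1, 0, 0, 0]}, index=dates)

# Running total of dividends paid by each stock
cumulative = toy_dividends.cumsum()
# Each stock's share of all dividends paid so far
toy_dividend_weights = cumulative.div(cumulative.sum(axis=1), axis=0)
print(toy_dividend_weights.round(3))
```

The first row is `NaN` because no dividends have been paid yet, so the row total is zero.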

In [None]:
def calculate_dividend_weights(dividends):
    """
    Calculate dividend weights.

    Parameters
    ----------
    dividends : DataFrame
        Dividend for each stock and date

    Returns
    -------
    dividend_weights : DataFrame
        Weights for each stock and date
    """
    # Weight = a stock's cumulative dividends / cumulative dividends of all stocks
    dividends_cumsum = dividends.cumsum()
    date_sum = dividends_cumsum.sum(axis=1)
    dividend_weights = dividends_cumsum.div(date_sum, axis=0)

    return dividend_weights

project_tests.test_calculate_dividend_weights(calculate_dividend_weights)

### View Data
Just like the index weights, let's generate the ETF weights and view them using a heatmap.

In [None]:
etf_weights = calculate_dividend_weights(dividends)
project_helper.plot_weights(etf_weights, 'ETF Weights')

## Returns
Implement `generate_returns` to generate returns data for all the stocks and dates from price data. You might notice we're implementing returns and not log returns. Since we're not dealing with volatility, we don't have to use log returns.
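
To make the distinction concrete, the following sketch computes both simple returns and log returns on made-up prices. The data and variable names are illustrative assumptions.

```python
import numpy as np
import pandas as pd

toy_prices = pd.DataFrame({'A': [100.0, 110.0, 99.0], 'B': [50.0, 50.0, 55.0]})

# Simple return: price today / price yesterday - 1 (first row has no predecessor)
simple_returns = toy_prices / toy_prices.shift(1) - 1
# Log return: natural log of the same price ratio
log_returns = np.log(toy_prices / toy_prices.shift(1))

print(simple_returns)
print(log_returns)
```

Log returns add up over time to the log of the total price ratio, while simple returns compound multiplicatively; for small moves the two are close.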

In [None]:
def generate_returns(prices):
    """
    Generate returns for ticker and date.

    Parameters
    ----------
    prices : DataFrame
        Price for each ticker and date

    Returns
    -------
    returns : DataFrame
        The returns for each ticker and date
    """
    # Simple (arithmetic) return: today's price over the previous price, minus 1
    return (prices / prices.shift(1)) - 1

project_tests.test_generate_returns(generate_returns)

### View Data
Let's generate the closing returns using `generate_returns` and view them using a heatmap.

In [None]:
returns = generate_returns(close)
project_helper.plot_returns(returns, 'Close Returns')

## Weighted Returns
With the returns of each stock computed, we can use it to compute the returns for an index or ETF. Implement `generate_weighted_returns` to create weighted returns using the returns and weights.
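
As a small illustration with made-up numbers, multiplying returns by weights element-wise gives each stock's contribution, and summing across stocks gives the portfolio return for each date.

```python
import pandas as pd

toy_returns = pd.DataFrame({'A': [0.10, -0.05], 'B': [0.02, 0.04]})
toy_weights = pd.DataFrame({'A': [0.25, 0.50], 'B': [0.75, 0.50]})

# Contribution of each stock to the portfolio return on each date
toy_weighted = toy_returns * toy_weights
# Portfolio return per date is the sum of the contributions
portfolio_return = toy_weighted.sum(axis=1)
print(portfolio_return)
```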

In [None]:
def generate_weighted_returns(returns, weights):
    """
    Generate weighted returns.

    Parameters
    ----------
    returns : DataFrame
        Returns for each ticker and date
    weights : DataFrame
        Weights for each ticker and date

    Returns
    -------
    weighted_returns : DataFrame
        Weighted returns for each ticker and date
    """
    assert returns.index.equals(weights.index)
    assert returns.columns.equals(weights.columns)
    
    # Element-wise product: each stock's contribution to the portfolio return

    return returns * weights

project_tests.test_generate_weighted_returns(generate_weighted_returns)

### View Data
Let's generate the ETF and index returns using `generate_weighted_returns` and view them using a heatmap.

In [None]:
index_weighted_returns = generate_weighted_returns(returns, index_weights)
etf_weighted_returns = generate_weighted_returns(returns, etf_weights)
project_helper.plot_returns(index_weighted_returns, 'Index Returns')
project_helper.plot_returns(etf_weighted_returns, 'ETF Returns')

## Cumulative Returns
To compare performance between the ETF and the index, we're going to calculate the tracking error. Before we do that, we first need to calculate the index and ETF cumulative returns. Implement `calculate_cumulative_returns` to calculate the cumulative returns over time given the returns.
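
A minimal sketch of the compounding step, on made-up weighted returns: sum the per-stock contributions for each date, add 1, and take the running product.

```python
import pandas as pd

toy_weighted = pd.DataFrame({'A': [0.025, -0.025], 'B': [0.015, 0.020]})

# Daily portfolio return is the sum of weighted returns across stocks
daily = toy_weighted.sum(axis=1)
# Growth of one unit of currency invested at the start
cumulative = (1 + daily).cumprod()
print(cumulative)
```

Here the portfolio grows by 4% on the first day and loses 0.5% on the second, so the cumulative value ends at 1.04 times 0.995.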

In [None]:
def calculate_cumulative_returns(returns):
    """
    Calculate cumulative returns.

    Parameters
    ----------
    returns : DataFrame
        Returns for each ticker and date

    Returns
    -------
    cumulative_returns : Pandas Series
        Cumulative returns for each date
    """
    # Sum across tickers to get the daily portfolio return, then compound it
    return (returns.sum(axis=1) + 1).cumprod()

project_tests.test_calculate_cumulative_returns(calculate_cumulative_returns)

### View Data
Let's generate the ETF and index cumulative returns using `calculate_cumulative_returns` and compare the two.

In [None]:
index_weighted_cumulative_returns = calculate_cumulative_returns(index_weighted_returns)
etf_weighted_cumulative_returns = calculate_cumulative_returns(etf_weighted_returns)
project_helper.plot_benchmark_returns(index_weighted_cumulative_returns, etf_weighted_cumulative_returns, 'Smart Beta ETF vs Index')

## Tracking Error
In order to check the performance of the smart beta portfolio, we can calculate the annualized tracking error against the index. Implement `tracking_error` to return the tracking error between the ETF and benchmark.

For reference, we'll be using the following annualized tracking error function:
$$ TE = \sqrt{252} * SampleStdev(r_p - r_b) $$

Where $ r_p $ is the portfolio/ETF returns and $ r_b $ is the benchmark returns.

_Note: When calculating the sample standard deviation, the delta degrees of freedom is 1, which is also the default value for the pandas `std` method (NumPy's `np.std` defaults to 0)._
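
The formula can be tried on a few made-up daily returns. The sketch below also shows why the degrees-of-freedom setting matters when switching between pandas and NumPy.

```python
import numpy as np
import pandas as pd

etf = pd.Series([0.010, -0.020, 0.015, 0.000])
benchmark = pd.Series([0.012, -0.018, 0.010, 0.001])

# Active return: portfolio return minus benchmark return, per date
active = etf - benchmark

# pandas uses ddof=1 (sample standard deviation) by default
te_pandas = np.sqrt(252) * active.std()
# NumPy needs ddof=1 passed explicitly to match
te_numpy = np.sqrt(252) * np.std(active.values, ddof=1)
# NumPy's default (ddof=0) gives a smaller, population-style figure
te_population = np.sqrt(252) * np.std(active.values)

print(te_pandas, te_numpy, te_population)
```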

In [None]:
def tracking_error(benchmark_returns_by_date, etf_returns_by_date):
    """
    Calculate the tracking error.

    Parameters
    ----------
    benchmark_returns_by_date : Pandas Series
        The benchmark returns for each date
    etf_returns_by_date : Pandas Series
        The ETF returns for each date

    Returns
    -------
    tracking_error : float
        The tracking error
    """
    assert benchmark_returns_by_date.index.equals(etf_returns_by_date.index)
    
    # Active return per date: ETF return minus benchmark return
    error = etf_returns_by_date - benchmark_returns_by_date
    # Annualize the sample standard deviation (pandas default ddof=1)
    return np.sqrt(252) * error.std()

project_tests.test_tracking_error(tracking_error)

### View Data
Let's generate the tracking error using `tracking_error`.

In [None]:
smart_beta_tracking_error = tracking_error(np.sum(index_weighted_returns, 1), np.sum(etf_weighted_returns, 1))
print('Smart Beta Tracking Error: {}'.format(smart_beta_tracking_error))

# Part 2: Portfolio Optimization

Now, let's create a second portfolio.  We'll still reuse the market cap weighted index, but this will be independent of the dividend-weighted portfolio that we created in part 1.

We want to both minimize the portfolio variance and also want to closely track a market cap weighted index.  In other words, we're trying to minimize the distance between the weights of our portfolio and the weights of the index.

$Minimize \left [ \sigma^2_p + \lambda \sqrt{\sum_{i=1}^{m}(weight_i - indexWeight_i)^2} \right  ]$ where $m$ is the number of stocks in the portfolio, and $\lambda$ is a scaling factor that you can choose.

Why are we doing this? One way that investors evaluate a fund is by how well it tracks its index. The fund is still expected to deviate from the index within a certain range in order to improve fund performance.  A way for a fund to track the performance of its benchmark is by keeping its asset weights similar to the weights of the index.  We’d expect that if the fund has the same stocks as the benchmark, and also the same weights for each stock as the benchmark, the fund would yield about the same returns as the benchmark. By minimizing a linear combination of both the portfolio risk and distance between portfolio and benchmark weights, we attempt to balance the desire to minimize portfolio variance with the goal of tracking the index.


## Covariance
Implement `get_covariance_returns` to calculate the covariance of the `returns`. We'll use this to calculate the portfolio variance.

If we have $m$ stock series, the covariance matrix is an $m \times m$ matrix containing the covariance between each pair of stocks.  We can use [`Numpy.cov`](https://docs.scipy.org/doc/numpy/reference/generated/numpy.cov.html) to get the covariance.  We give it a 2D array in which each row is a stock series, and each column is an observation at the same period of time. For any `NaN` values, you can replace them with zeros using the [`DataFrame.fillna`](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.fillna.html) function.

The covariance matrix $\mathbf{P} = 
\begin{bmatrix}
\sigma^2_{1,1} & \cdots & \sigma_{1,m} \\ 
\vdots & \ddots & \vdots \\
\sigma_{m,1} & \cdots & \sigma^2_{m,m}  \\
\end{bmatrix}$
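
As a small sketch of the orientation that `numpy.cov` expects, the example below uses made-up returns with one `NaN`. The returns DataFrame has dates as rows and stocks as columns, so it is transposed before being passed to `numpy.cov`; pandas' `DataFrame.cov` works on columns directly and gives the same matrix.

```python
import numpy as np
import pandas as pd

toy_returns = pd.DataFrame({
    'A': [0.01, -0.02, 0.03, np.nan],
    'B': [0.00, 0.01, -0.01, 0.02],
})

# Replace missing returns with zero before estimating the covariance
filled = toy_returns.fillna(0)

# numpy.cov treats each ROW as one variable, hence the transpose
cov_numpy = np.cov(filled.values.T)
# DataFrame.cov treats each COLUMN as one variable
cov_pandas = filled.cov().values

print(cov_numpy)
```

The diagonal holds each stock's variance and the off-diagonal entries hold the covariance between the two stocks.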

In [None]:
def get_covariance_returns(returns):
    """
    Calculate covariance matrices.

    Parameters
    ----------
    returns : DataFrame
        Returns for each ticker and date

    Returns
    -------
    returns_covariance  : 2 dimensional Ndarray
        The covariance of the returns
    """
    # Treat missing returns as zero, then take the pairwise covariance of the columns
    return returns.fillna(0).cov().values

project_tests.test_get_covariance_returns(get_covariance_returns)

### View Data
Let's look at the covariance generated from `get_covariance_returns`.

In [None]:
covariance_returns = get_covariance_returns(returns)
covariance_returns = pd.DataFrame(covariance_returns, returns.columns, returns.columns)

covariance_returns_correlation = np.linalg.inv(np.diag(np.sqrt(np.diag(covariance_returns))))
covariance_returns_correlation = pd.DataFrame(
    covariance_returns_correlation.dot(covariance_returns).dot(covariance_returns_correlation),
    covariance_returns.index,
    covariance_returns.columns)

project_helper.plot_covariance_returns_correlation(
    covariance_returns_correlation,
    'Covariance Returns Correlation Matrix')

### Portfolio Variance
We can write the portfolio variance $\sigma^2_p = \mathbf{x^T} \mathbf{P} \mathbf{x}$

Recall that the $\mathbf{x^T} \mathbf{P} \mathbf{x}$ is called the quadratic form.
We can use the cvxpy function `quad_form(x,P)` to get the quadratic form.
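
For fixed numeric weights, the same quadratic form can be evaluated with plain NumPy. The covariance matrix and weights below are made up for illustration.

```python
import numpy as np

# Assumed 2x2 covariance matrix: variances on the diagonal, covariance off it
P = np.array([[0.04, 0.006],
              [0.006, 0.01]])
x = np.array([0.5, 0.5])

# Quadratic form x^T P x
variance = x @ P @ x

# The same value expanded term by term for two assets
expanded = x[0] ** 2 * P[0, 0] + 2 * x[0] * x[1] * P[0, 1] + x[1] ** 2 * P[1, 1]

print(variance, expanded)
```

Both expressions give a portfolio variance of about 0.0155 for this equal-weighted pair.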

### Distance from Index Weights
We want portfolio weights that track the index closely, so we want to minimize the distance between them.
Recall from the Pythagorean theorem that you can get the distance between two points in an x,y plane by adding the squares of the x and y distances and taking the square root. Extended to any number of dimensions, this is called the L2 norm. So $\sqrt{\sum_{i=1}^{m}(weight_i - indexWeight_i)^2}$ can also be written as $\left \| \mathbf{x} - \mathbf{index} \right \|_2$. cvxpy provides a function called [norm()](https://www.cvxpy.org/api_reference/cvxpy.atoms.other_atoms.html#norm) with the signature `norm(x, p=2, axis=None)`. The default already computes the L2 norm, so you only need to pass in one argument: the difference between your portfolio weights and the index weights.

### Objective Function
We want to minimize both the portfolio variance and the distance of the portfolio weights from the index weights.
We also want to choose a `scale` constant, which is $\lambda$ in the expression. 

$\mathbf{x^T} \mathbf{P} \mathbf{x} + \lambda \left \| \mathbf{x} - \mathbf{index} \right \|_2$


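This lets us choose how much priority we give to minimizing the difference from the index, relative to minimizing the variance of the portfolio. A higher value for `scale` ($\lambda$) penalizes deviation from the index more heavily, so the optimized weights stay closer to the index weights at the cost of less variance reduction.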
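
The effect of $\lambda$ can be seen by evaluating the objective for two candidate weight vectors with plain NumPy. This is only a sketch on assumed numbers; the function name `objective`, the covariance matrix, and both candidates are made up for illustration.

```python
import numpy as np

P = np.array([[0.04, 0.006],
              [0.006, 0.01]])
index = np.array([0.5, 0.5])


def objective(x, scale):
    """Portfolio variance plus scale times the L2 distance from the index weights."""
    return x @ P @ x + scale * np.linalg.norm(x - index)


at_index = np.array([0.5, 0.5])      # identical to the index: zero distance
low_variance = np.array([0.2, 0.8])  # tilted toward the less volatile asset

for scale in (0.001, 1.0):
    print(scale, objective(at_index, scale), objective(low_variance, scale))
```

With the small `scale`, the tilted portfolio has the lower objective because its variance saving outweighs the distance penalty; with the large `scale`, staying at the index weights wins.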

We can find the objective function using cvxpy `objective = cvx.Minimize()`.  Can you guess what to pass into this function?



### Constraints
We can also define our constraints in a list. For example, you'd want the weights to sum to one, so $\sum_{i=1}^{m} x_i = 1$. You may also need to go long only, which means no shorting and therefore no negative weights, so $x_i \geq 0$ for all $i$. You could save the constraints in a variable as `[x >= 0, sum(x) == 1]`, where `x` was created using `cvx.Variable()`.

### Optimization
So now that we have our objective function and constraints, we can solve for the values of $\mathbf{x}$.
cvxpy has the constructor `Problem(objective, constraints)`, which returns a `Problem` object.

The `Problem` object has a method `solve()`, which returns the minimum value of the objective. In this case, that is the minimized sum of the portfolio variance and the scaled distance from the index weights, not the variance alone.

It also updates the vector $\mathbf{x}$.

We can inspect the weights in $\mathbf{x}$ that gave this minimum by using `x.value`.

In [None]:
import cvxpy as cvx

def get_optimal_weights(covariance_returns, index_weights, scale=2.0):
    """
    Find the optimal weights.

    Parameters
    ----------
    covariance_returns : 2 dimensional Ndarray
        The covariance of the returns
    index_weights : Pandas Series
        Index weights for all tickers at a period in time
    scale : float
        The penalty factor for weights that deviate from the index
    Returns
    -------
    x : 1 dimensional Ndarray
        The solution for x
    """
    assert len(covariance_returns.shape) == 2
    assert len(index_weights.shape) == 1
    assert covariance_returns.shape[0] == covariance_returns.shape[1]  == index_weights.shape[0]

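    # cf. https://www.cvxpy.org/tutorial/intro/index.html
    # Decision variable: one weight per ticker
    x = cvx.Variable(len(index_weights))
    # Penalty for drifting away from the index weights (L2 norm scaled by lambda)
    distance = scale * cvx.norm(x - index_weights, p=2, axis=None)
    # Portfolio variance as the quadratic form x^T P x
    variance = cvx.quad_form(x, covariance_returns)
    # Constraints: long only and fully invested
    problem = cvx.Problem(cvx.Minimize(variance + distance),
                          [x >= 0, sum(x) == 1])
    problem.solve()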

    return x.value

project_tests.test_get_optimal_weights(get_optimal_weights)

## Optimized Portfolio
Using the `get_optimal_weights` function, let's generate the optimal ETF weights without rebalancing. We can do this by feeding in the covariance of the entire history of data. We also need to feed in a set of index weights. We'll use the index weights from the last date in the data.

In [None]:
raw_optimal_single_rebalance_etf_weights = get_optimal_weights(covariance_returns.values, index_weights.iloc[-1])
optimal_single_rebalance_etf_weights = pd.DataFrame(
    np.tile(raw_optimal_single_rebalance_etf_weights, (len(returns.index), 1)),
    returns.index,
    returns.columns)

With our ETF weights built, let's compare it to the index. Run the next cell to calculate the ETF returns and compare it to the index returns.

In [None]:
optim_etf_returns = generate_weighted_returns(returns, optimal_single_rebalance_etf_weights)
optim_etf_cumulative_returns = calculate_cumulative_returns(optim_etf_returns)
project_helper.plot_benchmark_returns(index_weighted_cumulative_returns, optim_etf_cumulative_returns, 'Optimized ETF vs Index')

optim_etf_tracking_error = tracking_error(np.sum(index_weighted_returns, 1), np.sum(optim_etf_returns, 1))
print('Optimized ETF Tracking Error: {}'.format(optim_etf_tracking_error))

## Rebalance Portfolio Over Time
The single optimized ETF portfolio used the same weights for the entire history. These might not be the optimal weights for the entire period. Let's rebalance the portfolio over the same period instead of using the same weights. Implement `rebalance_portfolio` to rebalance a portfolio.

Rebalance the portfolio every n days, where n is given as `shift_size`. When rebalancing, you should look back a certain number of days of data in the past, denoted as `chunk_size`. Using this data, compute the optimal weights using `get_optimal_weights` and `get_covariance_returns`.
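
To see which rows each rebalance looks at, the sketch below lists the look-back windows for a small example. The helper name `rebalance_windows` is hypothetical; it mirrors a loop that starts once `chunk_size` rows are available and then steps forward by `shift_size`.

```python
def rebalance_windows(n_rows, shift_size, chunk_size):
    """Return (start, end) row positions of each look-back window; end is exclusive."""
    return [(end - chunk_size, end)
            for end in range(chunk_size, n_rows, shift_size)]


# 12 rows of data, rebalance every 3 rows, look back over 4 rows
windows = rebalance_windows(n_rows=12, shift_size=3, chunk_size=4)
print(windows)  # [(0, 4), (3, 7), (6, 10)]
```

Each window overlaps the previous one whenever `chunk_size` is larger than `shift_size`, which is the usual case for a rolling covariance estimate.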

In [None]:
def rebalance_portfolio(returns, index_weights, shift_size, chunk_size):
    """
    Get weights for each rebalancing of the portfolio.

    Parameters
    ----------
    returns : DataFrame
        Returns for each ticker and date
    index_weights : DataFrame
        Index weight for each ticker and date
    shift_size : int
        The number of days between each rebalance
    chunk_size : int
        The number of days to look in the past for rebalancing

    Returns
    -------
    all_rebalance_weights  : list of Ndarrays
        The ETF weights for each point they are rebalanced
    """
    assert returns.index.equals(index_weights.index)
    assert returns.columns.equals(index_weights.columns)
    assert shift_size > 0
    assert chunk_size >= 0
    
    all_rebalance_weights = []
    for i in range(chunk_size, len(returns), shift_size):
        # Covariance of the look-back window that ends just before row i
        chunked_cov_returns = get_covariance_returns(returns.iloc[i - chunk_size:i])
        # Track the index weights from the last day of the window
        # (averaging index_weights over the window did not pass the project test)
        chunked_weights = index_weights.iloc[i - 1]
        optimal_weights_in_chunk = get_optimal_weights(chunked_cov_returns,
                                                       chunked_weights)
        all_rebalance_weights.append(optimal_weights_in_chunk)
    
    return all_rebalance_weights

project_tests.test_rebalance_portfolio(rebalance_portfolio)

Run the following cell to rebalance the portfolio using `rebalance_portfolio`.

In [None]:
chunk_size = 250
shift_size = 5
all_rebalance_weights = rebalance_portfolio(returns, index_weights, shift_size, chunk_size)

## Portfolio Turnover
With the portfolio rebalanced, we need to use a metric to measure the cost of rebalancing the portfolio. Implement `get_portfolio_turnover` to calculate the annual portfolio turnover. We'll be using the formulas used in the classroom:

$ AnnualizedTurnover =\frac{SumTotalTurnover}{NumberOfRebalanceEvents} * NumberofRebalanceEventsPerYear $

$ SumTotalTurnover =\sum_{t,n}{\left | x_{t,n} - x_{t+1,n} \right |} $ Where $ x_{t,n} $ are the weights at time $ t $ for equity $ n $.

$ SumTotalTurnover $ is just a different way of writing $ \sum \left | x_{t_1,n} - x_{t_2,n} \right | $
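
A short worked example of the two formulas, using three made-up rebalance weight vectors for two stocks and a rebalance every five trading days:

```python
import numpy as np

# Weights after three consecutive rebalances (two stocks)
weights = [np.array([0.5, 0.5]), np.array([0.6, 0.4]), np.array([0.3, 0.7])]

# Sum of absolute weight changes between consecutive rebalances: 0.2 + 0.6
sum_total_turnover = np.abs(np.diff(weights, axis=0)).sum()

rebalance_count = len(weights) - 1   # two weight changes
shift_size = 5
events_per_year = 252 // shift_size  # 50 rebalance events per trading year

annualized_turnover = sum_total_turnover / rebalance_count * events_per_year
print(sum_total_turnover, annualized_turnover)
```

The average turnover per rebalance is about 0.4, which scales to roughly 20 per year at 50 rebalance events.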

In [None]:
def get_portfolio_turnover(all_rebalance_weights, shift_size, rebalance_count, n_trading_days_in_year=252):
    """
    Calculate portfolio turnover.

    Parameters
    ----------
    all_rebalance_weights : list of Ndarrays
        The ETF weights for each point they are rebalanced
    shift_size : int
        The number of days between each rebalance
    rebalance_count : int
        Number of times the portfolio was rebalanced
    n_trading_days_in_year: int
        Number of trading days in a year

    Returns
    -------
    portfolio_turnover  : float
        The portfolio turnover
    """
    assert shift_size > 0
    assert rebalance_count > 0
    
    # Sum of absolute weight changes between consecutive rebalances, over all tickers
    sum_total_turnover = np.sum(np.abs(np.diff(all_rebalance_weights, axis=0)))
    # Number of rebalance events in one trading year
    rebalance_events_per_year = n_trading_days_in_year // shift_size
    portfolio_turnover = (sum_total_turnover * rebalance_events_per_year) / rebalance_count
    
    return portfolio_turnover

project_tests.test_get_portfolio_turnover(get_portfolio_turnover)

Run the following cell to get the portfolio turnover from `get_portfolio_turnover`.

In [None]:
print(get_portfolio_turnover(all_rebalance_weights, shift_size, len(all_rebalance_weights) - 1))

That's it! You've built a smart beta portfolio in Part 1 and optimized a portfolio in Part 2. You can now submit your project.
