# Model Specification
We will use the Black-Litterman (BL) approach for calculating weights of assets in our portfolio. We use the classical approach with some adjustments, that will be discussed in what is to follow.

Under BL, the expected returns of the assets in our portfolio are given by
$$
\mu_{BL} = \left[ (\tau \Sigma)^{-1} + P^\top \Omega^{-1} P \right]^{-1} \left[ (\tau \Sigma)^{-1} \Pi + P^\top \Omega^{-1} Q \right],
$$
where $\mu_{BL}$ is the expected return of the portfolio, $\tau$ is the uncertainty in the prior, $\Sigma$ is the covariance matrix of asset returns, $P$ is a picking matrix, with row vectors that represent the weights used in assets for expressing relative / absolute views, $\Omega$ is the diagonal covariance matrix of confidences of our expected returns $Q$ (view specified by investor), and $\Pi$ is the implied excess return of our assets.

Now,
$$
\Pi = \lambda \Sigma w_{mkt},
$$
where $\lambda$ is the investor risk aversion co-efficient, and $w_{mkt}$ are the market capitalization weights of our assets, determined by TVL in our case (*_todo: expand on this_*).

Before we jump into modelling, let's first explore our data. We extract the largest 40 assets by TVL.

In [9]:
# Exploring the universe of data
import pandas as pd
import numpy as np
from main_app.infrastructure.defi_llama import get_pool_summary_data

pool_map = get_pool_summary_data()
df = pd.DataFrame()
for symbol, pools in pool_map.items():
    tvl = 0
    weighted_apy = 0
    num_pools = len(pools)
    largest_pool_id = ""
    largest_pool_tvl = 0
    for pool in pools:
        tvl += pool.tvlUsd
        weighted_apy += pool.tvlUsd * pool.apy / 100
        if pool.tvlUsd > largest_pool_tvl:
            largest_pool_tvl = pool.tvlUsd
            largest_pool_id = pool.pool

    weighted_apy = weighted_apy / tvl

    df = pd.concat([df, pd.DataFrame({'symbol': [symbol], 'tvlUsd': [tvl], 'apy': [weighted_apy], 'pools': [num_pools], 'largest_pool_id': [largest_pool_id], 'largest_pool_tvl': [largest_pool_tvl], 'largest_pool_pct_of_tvl': [largest_pool_tvl / tvl * 100]})])

df.sort_values(by='tvlUsd', ascending=False)
largest_assets_df = df.head(40)

print(largest_assets_df.head(5))

  symbol       tvlUsd       apy  pools                       largest_pool_id  \
0  STETH  22834298418  0.026812     18  747c1d2a-c668-4682-b9f9-296708a3dd90   
0  WEETH  11436105113  0.018039     55  46bd2bdf-6d92-4066-b482-e885ee172264   
0  WBETH   5898393331  0.025862      7  80b8bf92-b953-4c20-98ea-c9653ef2bb98   
0   WBTC   6977942217  0.000604    132  7e382157-b1bc-406d-b17b-facba43b716e   
0   RETH   3558687185  0.025322     30  d4b3c522-6127-4b89-bedf-83641cdcd2eb   

   largest_pool_tvl  largest_pool_pct_of_tvl  
0       22694796255                99.389067  
0        6111246697                53.438182  
0        5361948613                90.905240  
0        4136468186                59.279198  
0        3355782692                94.298333  


Importantly, note the _largest_pool_pct_of_tvl_, being the percentage of the total TVL (across all pools) that consists of the largest pool. We note that the largest pool is always significant enough to only look at the data from the largest pool as representative.

# Parameter estimation

## Determining the risk aversion co-efficient
We wish to determine the risk aversion coefficient, $\lambda$, that represents the amount of risk an investor is willing to take. Higher $\lambda$ means less risky, and conversely, lower $\lambda$ means higher risk tolerance.

In the Black-Litterman model used for Equities this is often set to 2.5, based on long-run estimates of equity risk premiums and volatility. Works as a good default for institutional settings.

We wish to adapt this for the defi market by considering a DeFi benchmark portfolio, to be used as an index $I$, using the same method by using:
$$
\lambda = \frac{\mathbb{E}[R_I]-r_f}{\sigma_P^2},
$$
where $R_I$ are the index (portfolio) returns, $r_f$ is the risk free rate, and $\sigma$ is the standard deviation of the index returns.

First we load the data for the largest assets that will constitute the index.

In [17]:
import os

from main_app.infrastructure.market_data import get_historical_data_for_symbol, load_symbol_to_address_mapping

symbols = largest_assets_df['symbol'].to_list()
largest_pools = largest_assets_df['largest_pool_id'].to_list()

df_tvl = pd.DataFrame()
df_apy = pd.DataFrame()
df_price = pd.DataFrame()

# Get current path and join with static data path
current_path = os.getcwd()
mapping_file_path = os.path.join(current_path, "..\static_data\symbol_to_contract_address_map.json")
symbol_to_address_mapping = load_symbol_to_address_mapping(mapping_file_path)

for symbol, pool in zip(symbols, largest_pools):
    historic_data_df = get_historical_data_for_symbol(symbol, symbol_to_address_mapping, [pool])

    # Convert the 'date' column to datetime
    historic_data_df['date'] = pd.to_datetime(historic_data_df['date'])

    # Select only the last entry for each day
    historic_data_df = historic_data_df.sort_values(by='date').groupby(historic_data_df['date'].dt.date).last()

    # Create TVL and APY series indexed by date
    df_tvl[symbol] = historic_data_df['tvlUsd']
    df_apy[symbol] = historic_data_df['apy'] / 100
    df_price[symbol] = historic_data_df['price'] / 100

# Only keep rows where there are no NaN values across all symbols
df_tvl.dropna(inplace=True)
df_apy.dropna(inplace=True)
df_price.dropna(inplace=True)

# Make sure that TVL, APY, and price dataframes align on the same set of dates
common_days = df_tvl.index.intersection(df_apy.index).intersection(df_price.index)
df_tvl = df_tvl.loc[common_days]
df_apy = df_apy.loc[common_days]
df_price = df_price.loc[common_days]

  mapping_file_path = os.path.join(current_path, "..\static_data\symbol_to_contract_address_map.json")
  mapping_file_path = os.path.join(current_path, "..\static_data\symbol_to_contract_address_map.json")


TypeError: expected str, bytes or os.PathLike object, not dict

Next we construct the index and perform the calcs to get $\lambda$

In [None]:
# Construct index and perform calcs

# Calculate daily log returns for df_price
df_price_log_returns = np.log(df_price / df_price.shift(1)).dropna().to_numpy()

# Calculate weight_market and target_return
mean_market_weight_vector = df_tvl.mean().to_numpy()

# create index and calculate prices and returns
index_0 = 100
index_weights = mean_market_weight_vector / sum(mean_market_weight_vector)
index_daily_returns = (index_weights.transpose() @ df_price_log_returns.transpose()) * (1 + df_apy.to_numpy().transpose() / 365.0)
index_prices = index_0 * np.cumprod(np.exp(index_daily_returns))

# calculate index mean return and std dev
index_mean_return = np.mean(index_prices)
index_std_dev = np.std(index_prices)

# specify risk-free rate
risk_free_rate = 0.02 / 365

# calculate lambda
l = (index_mean_return - risk_free_rate) / index_std_dev ** 2

# Calculate covariance matrix of target_return with itself
# cov_target_return = np.cov(target_return_matrix, rowvar=False, ddof=0)

# Calculate lambda
#numerator = mean_market_weight_vector.dot(mean_target_return_vector)
#denominator = mean_market_weight_vector.transpose() @ cov_target_return @ mean_market_weight_vector
#l = numerator / denominator

print(f"Lambda = {l}")