# Using the NonLinear Prior

## Basic Non-Linear Portfolio Optimization

### Background

### High-Level Outline

When using non-linear instruments we have to be careful about how we prepare the return distributions we feed into SKfolio's optimizations. For one, we will need to entirely dispose with the usual usage-pattern of feeding a history of returns to the optimization. The structure of these non-linear instruments necessarily means that their future return distributions should differ from their historical distributions (think of a bond whose duration decreases as it approaches maturity, and therefore the volatility of its returns also decreases). Therfore, instead of using return histories, we must use return-distributions designed for a specific point in time. In the case of a bond one approach may be:

1. Transform the bond price history into z-spreads using historical discount curves.
2. Compute the daily returns of the z-spreads.
3. Multiply the current z-spreads by the historical z-spread returns to derive a distribution for tomorrow's z-spread.
4. Take the distribution of tomorrow's z-spread and convert them to bond prices, using tomorrow as the pricing date, to derive a distribution of tomorrow's bond prices.
5. Divide the distribution of tomorrow's bond prices by today's bond prices to derive a distribution of bond-returns.
6. Feed this distribution of bond returns into an optimizer.

#### Using InCoviants

Note that this process relies on the key assumption that the distribution of the z-spread remains the same historically. Transforming prices of such non-linear instruments into quantities with a return distribution that, approximately, stays inCoviant historically, will be the general approach for generating the return distributions of such instruments. 

However, even if the returns of this "inCoviant" Coviable really are invariant, there remains one big bugbear that threatens to throw a wrench in the works of SKfolio's elegant design patterns: while SKfolio's optimizations library only take in returns without regard for absolute prices, the return of a derivative linked to an invariant depends on both the return of the invariant and its original magnitude. For example, a bond's duration changes with its price, meaning that its sensitivity to the return of the z-spread changes with the absolute value. Similarly, a stock option's delta changes with the absolute price of the underlying equity. So even if we have a return distribution for all the relevant invariants, we'll still need some reference prices before we're able to compute the return distribution of our actual portfolio. We will see later that with some careful manipulation of SKfolio's `Prior` interface this becomes relatively painless. 

#### What Type of Historical Return to Use?

However, this assumption may be inappropriate in many cases. Z-spreads, for example, are often thought to be mean-reverting. This means that taking 1-week or 1-month returns, for example, will yield very different price distributions than those you would get by assuming daily z-spread returns were i.i.d and extrapolating. Therefore, when we calculate the returns we will want to be able to specify the differencing period.

#### Using Different Return-Types for Different Instruments

In the above example we held rates constant. This may be reasonable if you have a low-cost, frequently-rebalanced duration hedging programme you're applying to your bond portfolio. But what if we're also interested in the interest rates risk? Then there is an important nuance to the returns on interest-rates we must consider. Because interest rates can go negative and the absolute changes in interest rates are largely independent of the current level of rates, linear returns ($\frac{\text{IR}_{t+1}}{\text{IR}_t} - 1$) are inappropriate. A better choice would be arithmetic returns: $\text{IR}_{t+1} - \text{IR}_t$. Thus we would like to be able to specify one type of return for a given set of market prices and another type of return for another set of market prices, and aggregate them all into one forward-looking return-distribution. 

#### A Note About Model Risk

Something to bear in mind is how this approach changes how we evaluate models out-of-sample. Here the approach would be to generate one return distribution using the **training** dataset and fit the optimizer on this distribution, and then to test the optimization we use the same methodology to generate a new return distribution using the **testing** dataset. Thus we are no longer evaluating the performance of the optimization on a return history either. This would be impossible because we are optimizing our portfolios for returns at a single point in time: there will only ever be one realized return at that point in time making it impossible to evaluate risk measures such as standard deviation. Using the return distribution generated using the out-of-sample dataset is, therefore, the only viable approach. However, this introduces a model risk that is not present when evaluating portfolio performance directly on the out-of-sample return history (as is the norm with equity portfolio optimizations). We are once again relying on the return history of the *invariant* vairable being random independent realizations of the same underlying probability distribution. This is never truly the case in financial time series, but it is necessary to make such an approximation. In short, be sure to understand why the returns of the invariant may not necessarily be i.i.d. and how it introduces a "model risk" into out-of-sample testing results. 

#### Handling Cashflows

Finally, there is the issue of cashflows. When we are generating a return distribution for our portfolio we must make sure not to exclude the return that comes from cash flows received during this period. What should we do with the cash we receive? One option is to set up a cash pocket and let it grow at rates. However, this would require us always feeding in overnight rate data and restructuring SKfolio's architecture a little to add cash pockets. In a perfect world we would model all cashflows precisely and reinvest them. However, this would require having the return distribution at the time of every cashflow, and setting up such a complex multi-period return distribution is likely not to be worthwhile for something that will likely only have a marginal influence on the ultimate portfolio weights. Simply adding the paid cashflows to the final prices computed in the price distribution should be a good enough approximation, even if we do miss out on some growth due to rates on the paid out coupon.

### Implementation

Notice that the above process makes important use of the notion of "today's" prices. This is not something we see in the usual equity-type usage of SKfolio. Notice also that we are using prices from instruments (interest rate swaps to build our discount curves) that are not actually in our portfolio. This is also a deviation from the usual SKfolio, where all instruments are assumed to be a part of the portfolio. These facts significantly complicate the design patterns needed to implement these transformations into the usual SKlearn / SKfolio framework.

The transformation of the prices and market context into a return distribution for the instruments in our portfolio is something we'd like to introudce as a pre-processing layer for the SKfolio optimizations that will also work as part of cross-validation. To do so, we have to design the necessary transformations such that they conform to the SKlearn `TransfomerMixin` interface. 

The `transform` method of the `TransformerMixin` takes in one dataframe and spits out one dataframe. Therefore, although we're dealing with data from many different sources, we'll keep what we can in one single large dataframe and keep what other data we need in global variables that may be updated as side effects of each transformer. Why not use metadata for passing in some of the data? The issue with metadata is that it cannot be conveniently pre-processed within the pipeline itself.

### Demo

#### Setting Up the Portfolio Instruments and Calculating Z-Scores

In [1]:
import QuantLib as ql
import pandas as pd
from sklearn.compose import make_column_selector
from typing import Callable, Any, Dict, Literal
import numpy as np
import plotly.express as px
from tqdm import tqdm
import datetime as dt

In [82]:
%load_ext autoreload
%autoreload 2

from skfolio import Population, RiskMeasure, Portfolio
from skfolio.preprocessing import prices_to_returns
from skfolio.datasets import load_bond_dataset, load_bond_metadata_dataset, load_usd_rates_dataset
from skfolio.prior import (NonLinearPrior, 
                           MarketContext, 
                           PortfolioInstruments, 
                           EmpiricalPrior, 
                           ReturnsProcessor,
                           BlackLitterman,
                           price_df,
                           calculate_sensis)
from sklearn.model_selection import train_test_split
from skfolio.optimization import ObjectiveFunction, MaximumDiversification, EqualWeighted, MeanRisk
from skfolio.moments import LedoitWolf
from skfolio.preprocessing import prices_to_returns
import sklearn

from quantlib_adapter import QLMarketContext, QLInstrumentAdapter, parse_ql_date

sklearn.set_config(enable_metadata_routing=True)

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


In [3]:
PCT = 0.01
BPS = PCT * PCT

bond_prices = load_bond_dataset()
bond_metadata = load_bond_metadata_dataset()
rates = load_usd_rates_dataset().reindex(bond_prices.index).interpolate().add_prefix("rate_") * PCT

X = prices_to_returns(bond_prices)

In [4]:
def create_fixed_rate_bond(
        issue_date,
        maturity,
        coupon,
    ) -> ql.FixedRateBond:

    tenor = ql.Period(ql.Semiannual)
    calendar = ql.UnitedStates(ql.UnitedStates.GovernmentBond)
    business_convention = ql.Unadjusted
    date_generation = ql.DateGeneration.Backward
    month_end = False
    settlement_days = 0 # If settlement days is greater than 0, the dirty price of a bond will no longer equal its NPV
    face_value = 100
    day_count = ql.Thirty360(ql.Thirty360.BondBasis)
    
    schedule = ql.Schedule(
        parse_ql_date(issue_date),
        parse_ql_date(maturity),
        tenor,
        calendar,
        business_convention,
        business_convention,
        date_generation,
        month_end
    )
    bond = ql.FixedRateBond(
        settlement_days,
        face_value,
        schedule,
        [coupon * PCT],
        day_count
    )

    return bond

Now we have set up the code to create bonds in quantlib. However, SKfolio does not know anything about QuantLib. That's why we need our separate skfolio_quantlib_adapter package. It contains some relatively basic code that adapts the `Instrument` and `MarketContext` classes to work with QuantLib. All we need to do to make the two compatible here is to pass our QuantLib bonds to the `QLInstrumentAdapter` class constructor before the adding it to our `PortfolioInstruments`.

In [5]:
# Create a portfolio of instruments using the the parameters defined in the bond_metadata dataset
portfolio_instruments = PortfolioInstruments(**{
    isin: QLInstrumentAdapter(create_fixed_rate_bond(
        row["issue_date"],
        row["maturity_date"],
        row["coupon_rate"], 
    )) for isin, row in bond_metadata.iterrows()})

In [6]:
# Calculate the accrued coupons and add them to the clean prices to get dirty prices
accrued_coupons = {}
for date in bond_prices.index:
    ql.Settings.instance().evaluationDate = parse_ql_date(date)
    accrued_coupons[date] = {}
    for isin, bond in portfolio_instruments.items():
        accrued_coupons[date][isin] = bond.accruedAmount()

accrued_coupons = pd.DataFrame.from_dict(accrued_coupons, orient="index")
dirty_bond_prices = bond_prices + accrued_coupons

In [7]:
# Keep track of obersvables used in quantlib pricings
ql_env = {}

# Create discount curve from SOFR OIS rates
index = ql.Sofr()
settlement_days = 2

ois_helpers = ql.RateHelperVector()

cal = ql.BespokeCalendar("my-cal")
cal.addWeekend(ql.Saturday)
cal.addWeekend(ql.Sunday)

for tenor, quote in rates.iloc[0].items():
    q = ql.SimpleQuote(quote)
    ql_env[tenor] = q
    ois_helpers.append(
        ql.OISRateHelper(
            settlement_days,
            ql.Period(tenor),
            ql.QuoteHandle(q),
            index,
            paymentFrequency=ql.Annual,
        )
    )

sofr_curve = ql.PiecewiseLinearZero(
    0,
    cal,
    ois_helpers,
    ql.Actual360(),
)

discount_curve = ql.RelinkableYieldTermStructureHandle(sofr_curve)

times = np.linspace(0.0, discount_curve.maxTime(), 2500)
px.line(x=times, y=[discount_curve.zeroRate(t, ql.Continuous).rate() for t in times])

In [8]:
# Set up the pricing engines for each bond in the portfolio and link them to a zero spreaded curve.
# We store the curve and z-spread quote in the ql_env dictionary for later use
for isin, bond in portfolio_instruments.items():
    spread_quote = ql.SimpleQuote(0.0)
    risky_curve = ql.RelinkableYieldTermStructureHandle(ql.ZeroSpreadedTermStructure(discount_curve, ql.QuoteHandle(spread_quote)))
    ql_env[f"curve_{isin}"] = risky_curve
    bond.setPricingEngine(ql.DiscountingBondEngine(risky_curve))
    ql_env[f"z_spread_{isin}"] = spread_quote

In [9]:
def calculate_z_spread(isin, market_price):
 
    # The root-finding function for finding spread that makes the bond 
    # price close to market price

    spread = ql_env[f"z_spread_{isin}"]
    bond = portfolio_instruments[isin]

    def z_spread_func(z_spread):
        spread.setValue(z_spread)
        return bond.NPV() - market_price
     
    # Create and configure the Brent solver 
    accuracy = 1e-6
    min = -1e-5
    max = 0.1
    solver = ql.Ridder()
    solver.setMaxEvaluations(10000)
 
    # Solve the spread
    z_spread = solver.solve(z_spread_func, accuracy, spread.value(), min, max)
 
    return z_spread

In [10]:
# Convert the bond prices to z_spreads

z_spreads = {}

for date, row in tqdm(list(dirty_bond_prices.join(rates, how="inner").iterrows())):
    market_context = QLMarketContext(date, ql_env=ql_env, **row)
    market_context.update_ql_env()
    z_spreads[date] = {}
    for id, market_price in row[bond_prices.columns].items():
        isin = id.replace("price_", "")
        z_spread = calculate_z_spread(isin, market_price)
        z_spreads[date]["z_spread_" + isin] = z_spread

z_spreads = pd.DataFrame.from_dict(z_spreads, orient="index")
px.line(z_spreads)

100%|██████████| 1231/1231 [01:23<00:00, 14.82it/s]


We're going to do a quick sanity check to make sure we've computed the z-spreads correctly. To do so, we reprice our bonds using our z-spreads and see if our new prices match the original prices within a given error. To do so, we'll be using a little convenience function I've added to SKfolio called `price_df`. What it does is it loops over every row in a dataframe, updates the reference_market_context provided with the data in the row, and reprices the portfolio. It then concatenates all the new portfolio prices into a new df and returns it. 

In [11]:
# Sanity check - reprice bonds using calculated z_spreads
repriced_bonds = price_df(z_spreads.join(rates), portfolio_instruments, QLMarketContext(ql_env=ql_env))

# Check max repricing error
max_pricing_error = np.max(np.abs(repriced_bonds - dirty_bond_prices))
print("Largest pricing error:", max_pricing_error)

# These bonds are quoted to tenths of a cent, thus the maximum repricing error cannot be more than 0.0005 (half a cent)
assert max_pricing_error < 0.0005

Largest pricing error: 1.2663235764875935e-05


Now that we have our pricers set up and transformed our data into something a little closer to an "invariant," we can start building return distributions for our portfolio. We do this using a new prior I introduced, the `NonLinearPrior`. The `NonLinearPrior` is fitted with a history of market quotes, not a history of returns. It then takes the latest market quotes (the reference quotes) and uses it to price the portfolio. These are the reference prices with respect to which returns will be measured. The same reference quotes used to make are reference prices are then multiplied by the historical returns of the quotes to build a distribution of the following day's market quotes. On this distribution of market_quotes we call the `price_df` function to turn them back into prices and then simply divide them by our reference_prices to create a return distribution. 

In [12]:
pricing_context = QLMarketContext(ql_env=ql_env).update_from_series(z_spreads.join(rates).iloc[-1])

prior = NonLinearPrior(
        portfolio_instruments,
        reference_market_context=pricing_context,
    )
returns = pd.DataFrame(prior.fit(X, market_quotes=z_spreads).return_distribution_.returns,
                        columns=bond_prices.columns)

px.histogram(returns, labels={"value": "Bond Return"})

As a sanity check to make sure the above actually works, we compare the above price distribution to that created using the first-order approximation of the bond returns. The first-order approximation is simply the change in z-spread multiplied by the given bond's duration.

In [13]:
z_spread_movements = (z_spreads.iloc[-1] * (1 + prices_to_returns(z_spreads))) - z_spreads.iloc[-1]
durations = calculate_sensis(pricing_context, portfolio_instruments, keys=z_spreads.columns)
estimated_bond_movements = z_spread_movements @ durations
px.histogram(estimated_bond_movements, labels={"value": "Bond Return"})

It looks pretty similar, but as one final check let's plot the difference in the returns estimated by our NonLinearPrior and our first-order approximation against the size of the z-spread move:

In [14]:
z_spread_movements.columns = estimated_bond_movements.columns
df = pd.concat([
        z_spread_movements.stack().droplevel(0), 
        estimated_bond_movements.stack().droplevel(0), 
        returns.stack().droplevel(0)
    ], keys=[
        "z_spread_movement",
        "estimated_bond_movement", 
        "repriced_bond_movement"],
    axis=1) \
    .reset_index().rename({"index": "isin"}, axis=1)

px.scatter(
    x=df["z_spread_movement"],
    y=df["estimated_bond_movement"] - df["repriced_bond_movement"], 
    color=df["isin"], title="Approximation Error vs Z-Spread Movement",
    labels={"x": "Z-Spread Movement", "y": "Approximation Error"})

As expected, the error is close to zero when the z-spread movement is close to 0 and grows quadratically with the size of the z-spread movement. This is just as expected for a first-order approximation and shows that our pricer is accurately capturing the convexity of the bonds.

#### Most Basic Example -- Using Z-Scores as Market Quotes

In [93]:
pricing_context = QLMarketContext(ql_env=ql_env).update_from_series(z_spreads.join(rates).iloc[-1])

max_div_model = MaximumDiversification(
    prior_estimator=NonLinearPrior(
        portfolio_instruments,
        reference_market_context=pricing_context,
    ).set_fit_request(market_quotes=True)
)

max_div_model.fit(X, market_quotes=z_spreads)
return_dist = pd.DataFrame(max_div_model.prior_estimator_.return_distribution_.returns, columns=max_div_model.feature_names_in_)
max_div_portfolio = max_div_model.predict(return_dist)
max_div_portfolio.plot_composition()

In [63]:
max_div_portfolio.plot_returns_distribution()

#### Estimating the Moments of the Market Quotes

We would like to be able to use SKfolio's large suite of moment estimators for the prior of the market_quotes. But how can we relate the market_quotes' moments to the moments of the return distribution of our portfolio instruments? We can only rely on the empirical distribution when our moment estimators are the empirical estimators `EmpiricalMu` and `EmpiricalCovariance`. Otherwise, we will have to make the following approximation. Let $X$ be a vector of market quotes, and $f(X): \mathbb{R}^n \rightarrow \mathbb{R}^k$ is a function that maps the market quotes to the prices of the instruments in our portfolio. We are interested in $\mathbb{E}[f(X)]$ and $\text{Cov}[f(X)] = \mathbb{E}\left[(f(X) - \mathbb{E}[f(X)])(f(X) - \mathbb{E}[f(X)])^T\right]$. If we assume our pricing function is approximately linear $f(X) = a_{(k \times n)}X_{(1 \times n)}+b_{(1 \times k)}$ then we have $\mathbb{E}[f(X)] = a \mathbb{E}[X] + b$ and $\text{Cov}[f(X)] = \text{Cov}[aX + b] = \text{Cov}[aX] = a\text{Cov}[X]a^T$. 

If we wished we could even use a second-order approximation for $\mathbb{E}[f(X)]$. If $f(X) \approx a + bX + cXX^T$ then $\mathbb{E}[f(X)] = a + b\mathbb{E}[X] + c\mathbb{E}[XX^T] = a + b\mathbb{E}[X] + c(\text{Cov}[X] + \mathbb{E}[X]\mathbb{E}[X]^T)$. The more moments of $X$ we have available to us, the better our approximation for the moments of $f(X)$. For now, SKfolio's prior API only includes two moments, so that's as far as we can go for now. However, one can still manipulate higher moments of the distribution of $X$ directly using `EntropyPooling`.

To demonstrate this manipulation of the moments of the market_quotes we decrease the expected returns of the z_spread of one of the bonds in the portfolio using the `BlackLittermanPrior`. We also try to improve the conditioning of the covariance matrix by applying LedoitWolf shrinkage to the MarketQuote returns covariance matrix.

In [100]:
max_sharpe_model = MeanRisk(
    risk_measure=RiskMeasure.STANDARD_DEVIATION,
    objective_function=ObjectiveFunction.MAXIMIZE_RATIO,
    prior_estimator=NonLinearPrior(
        portfolio_instruments,
        reference_market_context=pricing_context,
        market_quotes_prior=BlackLitterman(
            views=[
                "z_spread_US606822BH67 == -0.02"
            ],
            prior_estimator=EmpiricalPrior(
                covariance_estimator=LedoitWolf()
            )
        )
    ).set_fit_request(market_quotes=True)
)

max_sharpe_model.fit(X, market_quotes=z_spreads)
return_dist = pd.DataFrame(max_sharpe_model.prior_estimator_.return_distribution_.returns, columns=max_sharpe_model.feature_names_in_)
max_sharpe_portfolio = max_sharpe_model.predict(return_dist)
max_sharpe_portfolio.plot_composition()

In [99]:
print("Condition number of Max Sharpe return covariance:", 
      np.linalg.cond(max_sharpe_model.prior_estimator_.return_distribution_.covariance),
      "\nCondition number of Max Div return covariance:", 
      np.linalg.cond(max_div_model.prior_estimator_.return_distribution_.covariance)
      )

Condition number of Max Sharpe return covariance: 216.3440252250777 
Condition number of Max Div return covariance: 464.05106706696336


As expected, the max-sharpe portfolio allocates most of its portfolio to the bond whose z-spread is expected to shrink significantly. Furthermore, the covariance matrix of the returns of the portfolio instruments is better conditioned in the max-sharpe model than in the max-diversification model since we shrunk the covariance of the market quote returns in the former.

#### Adding Rates to the Market Quotes

So far we've only built our returns distribution using z-spread movements, but what if we want to add rates? Doing so is fairly straightforward -- we only have to join the rates df to the z_spreads df and pass the joined df to the market_quotes metadata parameter when fitting. However, we're going to make it a little trickier by setting rates to be almost 0 accross the curve.

In [64]:
# Update the discount curve to be almost 0

pricing_context.update({
    rate: 1 * BPS # Quantlib does not like it when spreads are zero
    for rate in rates.columns
})
pricing_context.update_ql_env()

For reference, we first create a basic returns distribution using only z-spreads. Below we plot the distribution of the returns of the equally weighted portfolio.

In [65]:
prior = NonLinearPrior(
        portfolio_instruments=portfolio_instruments,
        reference_market_context=pricing_context,
    ).set_fit_request(market_quotes=True)

prior.fit(X, market_quotes=z_spreads)
return_dist = pd.DataFrame(prior.return_distribution_.returns, columns=model.feature_names_in_)
Portfolio(return_dist, weights = np.ones(len(return_dist.columns)) / len(return_dist.columns)).plot_returns_distribution()

Now we add rates to our market_quotes. Notice that the distribution does not change. Obviously we're doing something wrong here -- clearly our portfolio should be more volatile when it is also exposed to rates and not just z-spreads. 

What's happening is that because our reference rates are essentially zero, when we build our market_quotes distribution by multiplying the reference rates by the historical returns of the rates, the rates remain pretty much zero. 

In [66]:
prior.fit(X, market_quotes=z_spreads.join(rates))
return_dist = pd.DataFrame(prior.return_distribution_.returns, columns=model.feature_names_in_)
Portfolio(return_dist, weights = np.ones(len(return_dist.columns)) / len(return_dist.columns)).plot_returns_distribution()

A better way to handle this scenario is to use the arithmetic returns $\text{IR}_t - \text{IR}_{t-1}$ of the rates instead of the linear returns $\frac{\text{IR}_t}{\text{IR}_{t-1}} - 1$. To do so we pass a `ReturnsProcessor` instance to the NonLinearPrior. When we create the `ReturnsProcessor` instance we define what returns are to be applied to which columns (among other parameters) and then the NonLinearPrior calls its `df_to_returns` method to convert `market_quotes` to returns and then converts the returns back to quotes again using the `returns_to_df` method. 

In [67]:
select_rates = make_column_selector(pattern="^rate_")
select_z_scores = make_column_selector(pattern="^z_score_")
return_types = {
    select_rates: "arithmetic",
    select_z_scores: "linear"
}

prior = NonLinearPrior(
        portfolio_instruments=portfolio_instruments,
        returns_processor=ReturnsProcessor(return_types=return_types),
        reference_market_context=pricing_context,
    ).set_fit_request(market_quotes=True)

prior.fit(X, market_quotes=z_spreads.join(rates))
return_dist = pd.DataFrame(prior.return_distribution_.returns, columns=model.feature_names_in_)
Portfolio(return_dist, weights = np.ones(len(return_dist.columns)) / len(return_dist.columns)).plot_returns_distribution()

Something worth bearing in mind is that the `ReturnsProcessor` only affects how the `NonLinearPrior` computes the distribution of the market quotes internally and does not affect the type of return used for the ultimate return distribution of the portfolio instruments. The NonLinearPrior only ever uses linear returns for its return distribution. 

#### Stress Test with Vine Copula

In [108]:
from skfolio.distribution import VineCopula
from skfolio.prior import SyntheticData, NonLinearPrior

vine = VineCopula(
    central_assets=["z_score_US606822BR40"], n_jobs=-1, random_state=0
)

def get_copula_model_returns(**copula_params) -> pd.DataFrame:
    return NonLinearPrior(
        portfolio_instruments=portfolio_instruments,
        reference_market_context=pricing_context,
        market_quotes_prior=SyntheticData(
            n_samples=10_000,
            sample_args=copula_params,
        )
    ).set_fit_request(market_quotes=True).fit(X, market_quotes=z_spreads).return_distribution_.returns


portfolio_weights = model.weights_ # Reuse previous portfolio's weights
Population([
    Portfolio(get_copula_model_returns(), 
              name="Unstressed Vine Copula Model", weights=portfolio_weights),
    Portfolio(get_copula_model_returns(conditioning={"z_spread_US606822BR40": 0.2}),
               name="Stressed Vine Copula Model", weights=portfolio_weights),
]).plot_returns_distribution()


When performing conditional sampling, it is recommended to set conditioning assets as central during Vine Copula construction. The following conditioning assets were not set as central: {0}. This can be achieved by using the `central_assets` parameter.



### Using Curves

A general feature of trading (most) non-linear instruments is curves. Rates traders think about swap curves, credit traders about z-spread curves or default probability curves, commodities traders the futures curve, options traders the volatility surface, etc. In fact, it may be preferable to abandon directly using the price history of the given instruments in our portfolio altogether when estimating return distributions for use in portfolio optimizations. 

In the basic example given previously we transformed our bond price history into a distribution of z-spread returns which we treated as an "invariant," that is to say an i.i.d random variable. Now our invariants will be the returns of our credit curve. Bear in mind that the returns of our credit curve are computed from the prices of whichever bonds we like and those bonds may not even include the bonds in our portfolio!  

Before getting into that, we should clarify why one might like to use curves at all. Since the risk profile of instruments such as bonds change with their maturity, it is sensible that their pricing parameters should change as well. If we purely use the price history of a given bond to project its future returns (when it will have a shorter maturity) we will completely be ignoring how the market prices the risk of shorter-dated bonds. How the risk of a short-dated bond is priced versus that of a long-dated may be very different. Typically markets reward longer-dated bonds (which carry more risk) with a higher yield over risk-free rates (z-spread). A part of the carry of this bond is, indeed, its yield, but also the price increase from it becoming shorter-dated, with a correspondingly lower z-spread. We need to understand both to properly compute the carry of the bond, and the latter requires a curve. 

Normally using the price history of short-dated bonds is tricky as, by definition, they don't have a long price history! If we want to run an optimization based on 5-years of data, we will necessarily only be able to use bonds with a maturity greater than 5-years. However, if we instead use a history of curves for our optimization, we don't have to worry about the expiries of the underlying bonds used to generate the curve on any gievn historical date. On each date, we simply pick the most suitable bonds available at the time. Thus our curve price history can stretch back for as long as there are enough bonds in our dataset -- we are no longer limited by issue dates or expiration dates of any given bond in particular. 

#### Dimensionality Reduction

When assessing the risk of a portfolio with respect to a curve, traders typically talk about risk to parallel curve moves or steepening/flattening/rotation moves, or sometimes even curve twists, but less often do they talk about moves in specific points on the curve. This is because individual points on a curve rarely move idiosyncratically. Typically most of the variance in curve movements, often on the order of 90+ percent, is in parallel shifts of the curve, and in steepening/flattening moves. This is typically shown using PCA, where the first PCA vector corresponds very closesly to a parralel move, the next a flattening/steepening move, and the third a twisting move. Impressive considering PCA is entirely non-parametric. However, the implementation of PCA in the setting of a credit-curve is problematic because the instruments used to construct the curve vary over time. It works well for futures curves, where one typically ignores the individual contracts when performing PCA and thinks of contracts only in terms of first-month, second-month, etc. This gives you a fixed number of assets when doing PCA for futures, even if the actual underlying instruments changed many times over the given time period. One may think of applying a similar approach when building a bond curve, having the 1-year duration bond, 2-year duration bond etc., but as old bonds expire and new bonds are issued there is no guarantee we'll find a suitable bond of each duration to fit our curve. Not just the identity of the individual underlying bonds will change throughout our price history, but the absolute number of bonds used to fit the curve may vary significantly as well.

There are a few ways to tackle this problem. One is to parameterize the curve (think Nelson-Siegel, Svensson, etc.) and consider the returns of the parameters. Although I have not tested out this procedure myself I worry whether the notion of a linear return or arithmetic return maps well onto the parameter values. Furthermore, reducing dimensionality by using an interpolation with fewer parameters necessarily means that your curve no longer reprices the bonds used to create it. Another approach is functional PCA. The downside to this approach is that I do not understand it. Lastly, there is the straightforard appraoch of first interpolating your curve, then reading off the interpolated values at a set of fixed tenors (1Y - 10Y with 1Y intervals, for example), and using the returns at these fixed points to perform PCA. Creating a new curve based on the interpolated values from the original curve at the given arbitrary grid of points means the new curve will generally not reprice the original bonds. A small workaround is to add the specific tenors of the bonds you want to reprice to the curve. However, in many cases we may want there to be some basis between the bond prices and the curve we construct. 

#### Handling Basis to the Curve

Often with curves (interest rates being the typical example) we bootstrap them from market prices, meaning we ensure that the resulting curve reprices all the instruments used to construct the curve. However, this may be impossible or undesirable in many cases. Often times, the curve may be "overdetermined" by the market prices we use to construct the curve i.e. there are not enough free parameters given by our interpolation method to ensure the curve reprices all instruments. We may deliberately want to impose a more restrictive parameterization on our curve for any number of reasons, possibly for ease of interpretation or because some believe in the fundamental justification of a given curve parameterization. There is typically a reasonably high variance of spreads within a given sector and credit rating, so bootstrapping methods tend to produce "jagged" curves that poorly align with our fundamental intuition, so it is typically desirable to employ a stricter parameterization of the credit curve. Whatever the reason, we may often times face the problem of our curve not repricing our portfolio accurately. 

Two approaches come to mind to tackle this problem. One is to keep track of how the basis changed historically and use that when building the return distribution. This has the downfall that if the basis is currently large compared to its history it may overstate the probability of the basis widening. This relies on the common assumption that basis with respect to the credit curve is mean-reverting in the long-term. using the The second approach to handling the basis is to assume that the basis versus the curve will tend to 0 at the maturity of the bond and decrease linearly with time. If $T_M$ is the bond's maturity, $T_D$ is the time for which we are creating our return distribution, and $t$ is the reference time, we would use a basis ($B$) of $\left( 1 - \frac{T_D - t}{T_M - t} \right ) B$. This latter type of methodology is appropriate if one is using the curve as an indicator of relative-value, whereas the former approach is likely better suited for risk-management.

#### Generating Our Invariants

In [None]:
curve_nodes = pd.DataFrame()
curve_basis = pd.DataFrame()

#### Basic Curve Setting

In [None]:
from skfolio.prior import EntropyPooling

X = curve_nodes.join(curve_basis)
X_train, X_test = train_test_split(X, test_size=0.5, shuffle=False)
pricing_context = MarketContext(date=dt.date(2016,7,26), discount_curve=discount_curve)

model = MeanRisk(
    risk_measure=RiskMeasure.STANDARD_DEVIATION,
    objective_function=ObjectiveFunction.MAXIMIZE_RATIO,
    portfolio_params=dict(name="Max Sharpe", annualized_factor=252),
    prior_estimator=NonLinearPrior(
        portfolio_instruments=portfolio,
        reference_market_context=pricing_context,
        prior_estimator=EntropyPooling(
            mean_views=[
                "basis_US606822BR40": -0.05,
            ]
        )
    )
)

model.fit(X_train)
model.predict(X_test).plot_returns_distribution()

#### Adding Relative Value Views with Entropy Pooling

In [None]:
from sklearn.compose import make_column_selector
from typing import List

select_rates = make_column_selector(pattern="^rate_")
select_z_scores = make_column_selector(pattern="^z_score_")

def column_selectors_to_groups(
        selectors: Dict[make_column_selector, str | List[str] ], 
        X: pd.DataFrame
    ) -> Dict[str, List[str]]:

    groups = {}
    for selector, group_names in selectors.items():
        if isinstance(group_names, str):
            group_names = [group_names]
        selected_columns = selector(X)
        for column in selected_columns:
            groups[column] = groups.get(column, []) + group_names

    return groups

In [None]:
from skfolio.prior import EntropyPooling

X = curve_nodes.join(curve_basis)
X_train, X_test = train_test_split(X, test_size=0.5, shuffle=False)
pricing_context = MarketContext(date=dt.date(2016,7,26), discount_curve=discount_curve)

groups = column_selectors_to_groups({
    select_z_scores: "z_scores",
    select_rates: "rates"
}, X)

model = MeanRisk(
    risk_measure=RiskMeasure.STANDARD_DEVIATION,
    objective_function=ObjectiveFunction.MAXIMIZE_RATIO,
    portfolio_params=dict(name="Max Sharpe", annualized_factor=252),
    prior_estimator=NonLinearPrior(
        portfolio_instruments=portfolio,
        reference_market_context=pricing_context,
        prior_estimator=EntropyPooling(
            mean_views=[
                "basis_US606822BR40 == -0.05",
                "z_scores == 0",
                "rates == 0"
            ],
            groups=groups
        )
    )
)

model.fit(X_train)
model.predict(X_test).plot_returns_distribution()

Ellipsis

#### Stressing Curve Principal Components with Entropy Pooling

##### Stressing the First Principal Component with a Curve Steepener Trade

In [None]:
from skfolio.utils.figure import plot_kde_distributions
from plotly.io import show

prior = NonLinearPrior(
        portfolio_instruments=portfolio,
        reference_market_context=pricing_context,
        prior_estimator=EntropyPooling(
            mean_views=[
                "pca_1 == -0.20",
                "pca_2 == 0", # Only look at the effect of first PC
                "rates == 0" # Isolate effect from rates
            ],
            groups=groups
        )
    ).fit(X)

fig = plot_kde_distributions(
    prior.return_distribution_,
    sample_weight=prior.sample_weights_,
    percentile_cutoff=0.1,
    title="Distribution of Asset Returns (Prior vs. Posterior)",
    unweighted_suffix="Prior",
    weighted_suffix="Posterior",
)
show(fig)

##### Stressing the Second Principal Component with a Curve Steepener Trade

In [None]:
prior = NonLinearPrior(
        portfolio_instruments=portfolio,
        reference_market_context=pricing_context,
        prior_estimator=EntropyPooling(
            mean_views=[
                "pca_1 == 0", # Only look at the effect of second PC
                "pca_2 == -0.20",
                "rates == 0"
            ],
            groups=groups
        )
    ).fit(X)

fig = plot_kde_distributions(
    prior.return_distribution_,
    sample_weight=prior.sample_weights_,
    percentile_cutoff=0.1,
    title="Distribution of Asset Returns (Prior vs. Posterior)",
    unweighted_suffix="Prior",
    weighted_suffix="Posterior",
)
show(fig)

## Scenario-Based Portfolio Optimization

A simpler approach for optimizing credit portfolios than estimating return distributions is using a scenario-based optimization. A common feature in risk management is defining stressed market scenarios and applying them to one's portfolio to see how it holds up. This same methodology can be applied to portfolio optimization. Here we follow the methodlogy set out by [Martin, 2020](https://arxiv.org/abs/2004.02312).

An important distinction between the rest of SKfolio's suite of optimizations is that scenario-based optimizations do not estimate the probabilities of each scenario. 
MAKE SURE IT'S POSSIBLE TO USE `GridSearchCV` WITH THE SCENARIOS

### Overview

### Some Example Scenarios



In [None]:
### Basic design pattern

from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer, make_column_selector, ColumnSelector
from skfolio.optimization import InverseVolatility, MeanRisk, ObjectiveFunction
from skfolio import Population, RiskMeasure
from skfolio.prior import EmpiricalPrior, BasePrior
from skfolio.moments import ShrunkMu
from sklearn.model_selection import train_test_split



class PricesToZScores(MarketDataProcessor):
    pass

class ZScoresToPrices(MarketDataProcessor):
    pass


RETURN_TYPES = Literal["linear", "log", "arithmetic", "zero"]

class NonLinearReturns(BasePrior):
    def __init__(self,
                 portfolio_instruments: PortfolioInstruments,
                 return_type: RETURN_TYPES | Dict[str, RETURN_TYPES] | Dict[make_column_selector, RETURN_TYPES] = "linear",
                 reference_prices: Literal["last"] | pd.Series = "last",
                 prior_estimator: BasePrior = EmpiricalPrior(prices_to_returns="linear"),
                 reference_market_context: MarketContext | None = None,
                 ):
        self.portfolio_instruments = portfolio_instruments
        self.return_type = return_type  
        self.reference_prices = reference_prices
        self.prior_estimator = prior_estimator

    def fit(self, X: pd.DataFrame, y: pd.DataFrame | None = None):
        return self


z_scores = ZScoresToPrices().fit_transform(X)

# Most basic example

X = z_scores
X_train, X_test = train_test_split(bond_prices.join(rates), test_size=0.33, shuffle=False)
pricing_context = MarketContext(date=dt.date(2016,7,26), discount_curve=discount_curve)

model = MeanRisk(
    risk_measure=RiskMeasure.STANDARD_DEVIATION,
    objective_function=ObjectiveFunction.MAXIMIZE_RATIO,
    portfolio_params=dict(name="Max Sharpe", annualized_factor=252),
    prior_estimator=NonLinearReturns(
        portfolio_instruments=portfolio,
        reference_market_context=,
    )
)

model.fit(X)

# Use z-scores as invariants but keep rates constant
X = z_scores.join(rates)

select_rates = make_column_selector(pattern="^rate_")
select_z_scores = make_column_selector(pattern="^z_score_")
return_types = {
    select_rates: "arithmetic",
    select_z_scores: "linear"
}

MeanRisk(
    risk_measure=RiskMeasure.STANDARD_DEVIATION,
    objective_function=ObjectiveFunction.MAXIMIZE_RATIO,
    portfolio_params=dict(name="Max Sharpe", annualized_factor=252),
    prior_estimator=NonLinearReturns(
        portfolio_instruments=portfolio,
        return_type=return_types,
        prior_estimator=EmpiricalPrior(
            mu_estimator=ShrunkMu(),
            prices_to_returns=return_types
        )
    )
)

# Basic Curve Methodology
Pipeline([
    ("Curve Nodes from Bond Prices and Rates", SaveMarketContext(reference_market_context)),
    ("Calculate Portfolio Basis to Curve", SaveMarketContext(reference_market_context)),
    ("Set reference curve, rates, and bases", SaveMarketContext(reference_market_context)),
    ("Drop Rates", ColumnTransformer([("Drop Rates", 'drop', select_rates)], remainder='passthrough')),
    ("Get Curve Node Distributions", ExtrapolateReturns()),
    ("Reconstruct Bond Prices from Curve Nodes and Bases", ZScoresToPrices(reference_market_context=reference_market_context)),
    ("Compute Returns", PricesToReturns())
    ])

# Use entropy pooling to 

SyntaxError: invalid syntax. Perhaps you forgot a comma? (3026828915.py, line 32)

## Entropy Pooling, Synthetic Data, and Stress Testing

We want to be able to take two of the most powerful tools in the SKfolio toolbox: entropy pooling and vine copulas, and apply them directly to the invariant returns. 

1. Stress Test on the Z-Scores of Bond Portfolio

2. Add Mean Reversion View to Curve Basis

3. Stress Test Principal Component of Curve Returns

## Monte-Carlo

Thus far we have focused exclusively on return distributions at a given point in time while entirely ignoring how prices might move between now and that one point in time. This is ultimately quite limiting in the world of credit. As we discussed before, the question of reinvesting coupons introduces a mild path-dependency. But path dependency becomes more explicit with products such as callable bonds (not to mention the prevalance of path-dependent products outside of credit e.g. american options and a whole universe of exotic derivatives). Additionally, we have entirely neglected the very path-dependent event of default! Thus far we have been largely non-parametric in our construction of return distributions, however to model return paths of these products one needs to impose some more structure. 

Even when we are not considering products with path-dependencies, risk measures such as maximum drawdown are path-dependent. However, it is not clear how the current optimization framework may be adapted to use multiple paths. 


## References

[Fixed income portfolio optimisation: Interest rates, credit, and the efficient frontier, Richard J. Martin, 2020](https://arxiv.org/abs/2004.02312)

In [None]:
from sklearn.base import BaseEstimator, TransformerMixin
from skfolio.prior import BasePrior
from pandas.tseries.frequencies import to_offset

class TransformPrice(TransformerMixin, BaseEstimator):
    def __init__(self,
                portfolio_dict: Dict[str, ql.Instrument],
                quote_dict: Dict[str, ql.SimpleQuote],
                update_dates: bool = True,
                 ):
        self.portfolio_dict = portfolio_dict
        self.quote_dict = quote_dict
        self.update_dates = update_dates
    
    def update_market_data(self, market_quotes: Dict[str, float]):
        for quote_id, quote in self.quote_dict.items():
            quote.setValue(market_quotes[quote_id])

    def fit(self, X: pd.DataFrame, y: pd.DataFrame | None = None):
        return self
    
    def transform(self, X: pd.DataFrame) -> pd.DataFrame:
        # Save original market data
        original_date = ql.Settings.instance().evaluationDate
        original_quotes = {quote_id: quote.value() for quote_id, quote in self.quote_dict.items()}

        df_rows = X.to_dict(orient='records')
        for i, row in tqdm(list(enumerate(df_rows))):
            if self.update_dates:
                try:
                    raw_date = X.index[i]
                    date = parse_ql_date(raw_date)
                except Exception as e:
                    raise ValueError(f"Could not parse date: {raw_date}, {e}")
                
                # Add a check here to see if the eval date changed because Quantlib does not 
                # check if it's the same date and the update can be quite slow
                if ql.Settings.instance().evaluationDate != date:
                    ql.Settings.instance().evaluationDate = date

            self.update_market_data(row)
            self.transform_row(row)

        # Restore original market data

        # Add a check here to see if the eval date changed because Quantlib does not 
        # check if it's the same date and the update can be quite slow
        if ql.Settings.instance().evaluationDate != original_date:
            ql.Settings.instance().evaluationDate = original_date

        for quote_id, quote in self.quote_dict.items():
            quote.setValue(original_quotes[quote_id])

        return pd.DataFrame.from_records(df_rows, index=X.index)
    
    @abstractmethod
    def transform_row(self, row: Dict[str, float]):
        pass

class QuoteToPrice(TransformPrice):
    def transform_row(self, row: Dict[str, float]):
        for security_id, instrument in self.portfolio_dict.items():
            row[security_id] = instrument.NPV()

class PriceToQuote(TransformPrice):
    def __init__(self,
                portfolio_dict: Dict[str, ql.Instrument],
                quote_dict: Dict[str, ql.SimpleQuote],
                price_transformer: Callable[[ql.Instrument, float], float],
                update_dates: bool = True,
                 ):
        self.price_transformer = price_transformer
        super().__init__(portfolio_dict, quote_dict, update_dates)

    def transform_row(self, row: Dict[str, float]):
        for security_id, instrument in self.portfolio_dict.items():
            row[security_id] = self.price_transformer(instrument, row[security_id])

class PricesToReturns(TransformerMixin, BaseEstimator):
    def __init__(self,
                periods: int = 1,
                freq: str | None = None,
                return_type: Literal["linear", "log", "arithmetic"] = "linear"
                ):
        self.periods = periods
        self.freq = freq
        self.return_type = return_type

    def fit(self, X: pd.DataFrame, y: pd.DataFrame | None = None):
        return self
    
    def linear_returns(self, X: pd.DataFrame) -> pd.DataFrame:
        if np.any(X <= 0):
            raise ValueError("Prices must be positive to compute returns.")

        return X.pct_change(freq=self.freq, periods=self.periods).iloc[self.periods:]
    
    def log_returns(self, X: pd.DataFrame) -> pd.DataFrame:
        return np.log1p(self.linear_returns(X))
    
    def arithmetic_returns(self, X: pd.DataFrame) -> pd.DataFrame:
        return X.diff(periods=self.periods)
    
    def transform(self, X: pd.DataFrame) -> pd.DataFrame:
        if self.return_type == "linear":
            return self.linear_returns(X)
        elif self.return_type == "arithmetic":
            return self.arithmetic_returns(X)
        elif self.return_type == "log":
            return self.log_returns(X)
        else:
            raise ValueError(f"Unknown return_type: {self.return_type}")

class EmpiricalPriceDistribution(TransformerMixin, BaseEstimator):
    def __init__(self,
                 periods: int = 1,
                 freq: str = "D",
                 return_type: Literal["linear", "log", "arithmetic"] = "linear"
                 ):
        self.periods = periods
        self.freq = freq
        self.return_type = return_type

    def fit(self, X: pd.DataFrame, y: pd.DataFrame | None = None):
        return self
    
    def transform(self, X: pd.DataFrame) -> pd.DataFrame:
        returns = PricesToReturns(
            periods=self.periods,
        ).transform(X)

        output_index = None
        if isinstance(X.index, pd.DatetimeIndex):
            last_date = max(X.index)
            reference_prices = X.loc[last_date]
            output_date = last_date + to_offset(self.freq) * self.periods
            output_index = pd.Index([output_date] * len(returns), name=returns.index.name)
        else:
            reference_prices = X.iloc[-1]

        if self.return_type == "arithmetic":
            prices = reference_prices.values + returns.values
        else:
            prices = reference_prices.values * (1 + returns.values)

        return pd.DataFrame(prices, index=output_index, columns=X.columns)

class EmpiricalReturnDistribution(TransformerMixin, BaseEstimator):
    def __init__(self,
                 portfolio_dict: Dict[str, ql.Instrument],
                 return_type: Literal["linear", "log", "arithmetic"] = "linear",
                 ):
        self.return_type = return_type
        self.portfolio_dict = portfolio_dict

    def fit(self, X: pd.DataFrame, y: pd.DataFrame | None = None):
        return self
    
    def transform(self, X: pd.DataFrame) -> pd.DataFrame:
        reference_prices = pd.Series({
        security_id: instrument.NPV() for security_id, instrument in self.portfolio_dict.items()})

        if self.return_type == "linear":
            return X[reference_prices.index] / reference_prices - 1
        elif self.return_type == "log":
            return np.log1p(X[reference_prices.index] / reference_prices - 1)
        elif self.return_type == "arithmetic":
            return X[reference_prices.index] - reference_prices

class columnDropperTransformer():
    def __init__(self,columns):
        self.columns=columns

    def transform(self,X,y=None):
        return X.drop(self.columns,axis=1)

    def fit(self, X, y=None):
        return self 
    
class UpdateQuotes(TransformerMixin, BaseEstimator):
    def __init__(self,
                 quote_dict: Dict[str, ql.SimpleQuote],
                 ):
        self.quote_dict = quote_dict

    def fit(self, X: pd.DataFrame, y: pd.DataFrame | None = None):
        return self
    
    def transform(self, X: pd.DataFrame) -> pd.DataFrame:
        if isinstance(X.index, pd.DatetimeIndex):
            data_row = X.loc[max(X.index)]
        else:
            data_row = X.iloc[-1]
        
        for quote_id, quote in self.quote_dict.items():
            quote.setValue(data_row[quote_id])
        
        return X

In [None]:
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn import set_config
from sklearn.model_selection import train_test_split

from skfolio.optimization import InverseVolatility, MeanRisk, ObjectiveFunction
from skfolio import Population, RiskMeasure

X_train, X_test = train_test_split(bond_prices.join(rates), test_size=0.33, shuffle=False)

returns_period = 1

set_config(transform_output='pandas')
model = Pipeline([
    ("Impute Missing Rates", SimpleImputer(strategy="mean")),
    ('Bond Prices to Z-Spreads', PriceToQuote(portfolio, ois_quotes, bond_price_to_spread)),
    ("Update Z-Spread and Rates Quotes", UpdateQuotes(spread_quotes | ois_quotes)),
    ("Remove Rates Columns", columnDropperTransformer(rates.columns)),
    # ("Z-Spread Distribution", EmpiricalPriceDistribution(periods=returns_period)),
    # ("Reprice Bonds", QuoteToPrice(portfolio, spread_quotes)),
    # ("Prices to Returns", EmpiricalReturnDistribution(portfolio)),
    # ("Optimize Portfolio", MeanRisk(
    #     risk_measure=RiskMeasure.STANDARD_DEVIATION,
    #     objective_function=ObjectiveFunction.MAXIMIZE_RATIO,
    #     portfolio_params=dict(name="Max Sharpe", annualized_factor=252),
    # ))
]
)

# model.fit(X_train)
# portfolio_weightings = model.predict(X_test)

In [None]:
bond_spreads = model.fit_transform(X_train)

100%|██████████| 824/824 [00:40<00:00, 20.26it/s]


In [None]:
new_z_spreads = EmpiricalPriceDistribution().transform(bond_spreads[spread_quotes.keys()])

In [None]:
ql.Settings.instance().evaluationDate = ql.Date(15, 3, 2024)

In [None]:
def calculate_z_spread(bond, market_price, curve):
 
    # The root-finding function for finding spread that makes the bond 
    # price close to market price

    bond_copy = bond.copy()
    spread_quote = ql.QuoteHandle(ql.SimpleQuote(z_spread))
    risky_curve = ql.ZeroSpreadedTermStructure(ql.YieldTermStructureHandle(curve), spread_quote)
    z_spread_handle = ql.YieldTermStructureHandle(risky_curve)
    bond_copy.setPricingEngine(ql.DiscountingBondEngine(z_spread_handle))

    def z_spread_func(z_spread):
        spread_quote.setValue(z_spread)
        return bond_copy.cleanPrice() - market_price
     
    # Create and configure the Brent solver 
    accuracy = 1e-6
    guess = 0.005
    min = 1e-6
    max = 1
    solver = ql.Brent()
    solver.setMaxEvaluations(1000)
 
    # Solve the spread
    z_spread = solver.solve(z_spread_func, accuracy, guess, min, max)
 
    return z_spread

def bond_price_to_spread(bond: ql.Bond, price: float) -> float:
    eval_date = ql.Settings.instance().evaluationDate
    return ql.BondFunctions.zSpread(
            bond, 
            ql.BondPrice(price, ql.BondPrice.Clean),
            ql.ImpliedTermStructure(discount_curve, eval_date),
            ql.Actual360(),
            ql.Compounded,
            ql.Semiannual,
            eval_date + 2
            )

# def bond_price_to_spread(bond: ql.Bond, price: float) -> float:    
    #  return calculate_z_spread(bond, price, sofr_curve)

In [None]:
isin = list(portfolio.keys())[0]
bond_price = X_train[isin].iloc[-1]
print(bond_price)
z_spread = bond_price_to_spread(
    portfolio[isin],
    bond_price
)
print(z_spread)
spread_quotes[isin].setValue(z_spread)
print(bond_price - portfolio[isin].cleanPrice())

86.602
-0.01083368030046245
-0.20330138175151546


In [None]:
X_train.iloc[-1] - QuoteToPrice(portfolio_dict=portfolio, quote_dict=spread_quotes).transform(PriceToQuote(portfolio, ois_quotes, bond_price_to_spread).transform(X_train.iloc[[-1]])).iloc[0]

100%|██████████| 1/1 [00:00<00:00, 17.25it/s]
100%|██████████| 1/1 [00:00<00:00, 22.63it/s]


US606822BR40   -1.291912e-08
US172967EW71    1.428868e-06
US86562MBP41   -3.133408e-09
US925524AH30    7.736216e-07
US233835AQ08    7.084765e-07
US172967BL44   -7.604418e-06
US606822BH67    6.951740e-05
US904764AH00   -3.027480e-09
US172967HS33    1.057656e-04
US254687FX90    7.016221e-05
1Y              0.000000e+00
2Y              0.000000e+00
3Y              0.000000e+00
5Y              0.000000e+00
7Y              0.000000e+00
10Y             0.000000e+00
30Y             0.000000e+00
Name: 2024-03-15 00:00:00, dtype: float64

In [None]:
new_z_spreads

Unnamed: 0_level_0,US606822BR40,US172967EW71,US86562MBP41,US925524AH30,US233835AQ08,US172967BL44,US606822BH67,US904764AH00,US172967HS33,US254687FX90
date,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1
2024-03-16,0.009161,0.013942,0.008854,0.029884,0.005632,0.013334,0.006983,0.007169,0.014581,0.006553
2024-03-16,0.009142,0.015403,0.008805,0.029319,0.005361,0.013685,0.007097,0.007432,0.015678,0.006032
2024-03-16,0.009279,0.014001,0.009019,0.029454,0.005358,0.012890,0.006424,0.006911,0.015899,0.006313
2024-03-16,0.009330,0.013776,0.008883,0.030035,0.005541,0.014156,0.007559,0.007474,0.015704,0.006580
2024-03-16,0.008748,0.013716,0.008387,0.028411,0.005010,0.013080,0.006222,0.007124,0.014902,0.006062
...,...,...,...,...,...,...,...,...,...,...
2024-03-16,0.009169,0.013913,0.008729,0.028858,0.005437,0.013484,0.006771,0.006939,0.015083,0.006261
2024-03-16,0.008767,0.013983,0.008748,0.028676,0.005018,0.012932,0.006607,0.007222,0.015589,0.006274
2024-03-16,0.009423,0.014367,0.008973,0.028976,0.005578,0.013244,0.007589,0.007245,0.015407,0.006237
2024-03-16,0.008228,0.013095,0.007853,0.027115,0.004347,0.012646,0.006208,0.006097,0.014431,0.005142


In [None]:
simulated_prices = QuoteToPrice(portfolio_dict=portfolio, quote_dict = spread_quotes, update_dates=True).transform(new_z_spreads)

100%|██████████| 823/823 [00:00<00:00, 11217.19it/s]


In [None]:
for isin, quote in spread_quotes.items():
    quote.setValue(0)

In [None]:
simulated_prices

Unnamed: 0_level_0,US606822BR40,US172967EW71,US86562MBP41,US925524AH30,US233835AQ08,US172967BL44,US606822BH67,US904764AH00,US172967HS33,US254687FX90
date,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1
2024-03-16,86.754822,127.741786,90.523538,104.114942,122.601447,108.383771,94.388775,108.566929,98.11822,87.630472
2024-03-16,86.754822,127.741786,90.523538,104.114942,122.601447,108.383771,94.388775,108.566929,98.11822,87.630472
2024-03-16,86.754822,127.741786,90.523538,104.114942,122.601447,108.383771,94.388775,108.566929,98.11822,87.630472
2024-03-16,86.754822,127.741786,90.523538,104.114942,122.601447,108.383771,94.388775,108.566929,98.11822,87.630472
2024-03-16,86.754822,127.741786,90.523538,104.114942,122.601447,108.383771,94.388775,108.566929,98.11822,87.630472
...,...,...,...,...,...,...,...,...,...,...
2024-03-16,86.754822,127.741786,90.523538,104.114942,122.601447,108.383771,94.388775,108.566929,98.11822,87.630472
2024-03-16,86.754822,127.741786,90.523538,104.114942,122.601447,108.383771,94.388775,108.566929,98.11822,87.630472
2024-03-16,86.754822,127.741786,90.523538,104.114942,122.601447,108.383771,94.388775,108.566929,98.11822,87.630472
2024-03-16,86.754822,127.741786,90.523538,104.114942,122.601447,108.383771,94.388775,108.566929,98.11822,87.630472


In [None]:
UpdateQuotes(quote_dict=spread_quotes | ois_quotes).transform(PriceToQuote(portfolio_dict=portfolio, quote_dict=spread_quotes, price_transformer=bond_price_to_spread).transform(X_train.iloc[[-1]]))

100%|██████████| 1/1 [00:00<00:00, 22.38it/s]


Unnamed: 0_level_0,US606822BR40,US172967EW71,US86562MBP41,US925524AH30,US233835AQ08,US172967BL44,US606822BH67,US904764AH00,US172967HS33,US254687FX90,1Y,2Y,3Y,5Y,7Y,10Y,30Y
date,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1
2024-03-15,0.009025,0.01379,0.008685,0.028499,0.005278,0.013115,0.006922,0.007034,0.015076,0.006277,0.05339,0.04847,0.04571,0.04322,0.04219,0.04187,0.03959


In [None]:
spread_quotes["US606822BR40"].setValue(0.2)

In [None]:
{isin: bond.NPV() for isin, bond in portfolio.items()}

{'US606822BR40': 86.7137879485556,
 'US172967EW71': 127.6760848375164,
 'US86562MBP41': 90.48097472865405,
 'US925524AH30': 104.04877372213608,
 'US233835AQ08': 122.54726727259445,
 'US172967BL44': 108.32847055501144,
 'US606822BH67': 94.3458477201645,
 'US904764AH00': 108.51702483958923,
 'US172967HS33': 98.06684011423165,
 'US254687FX90': 87.59103241263284}

In [None]:
np.sign(simulated_prices - pd.Series({isin: bond.NPV() for isin, bond in portfolio.items()})).mean()

US606822BR40    0.0
US172967EW71    0.0
US86562MBP41    0.0
US925524AH30    0.0
US233835AQ08    0.0
US172967BL44    0.0
US606822BH67    0.0
US904764AH00    0.0
US172967HS33    0.0
US254687FX90    0.0
dtype: float64

In [None]:
simulated_prices

Unnamed: 0_level_0,US606822BR40,US172967EW71,US86562MBP41,US925524AH30,US233835AQ08,US172967BL44,US606822BH67,US904764AH00,US172967HS33,US254687FX90
date,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1
2024-03-16,86.713788,127.676085,90.480975,104.048774,122.547267,108.328471,94.345848,108.517025,98.06684,87.591032
2024-03-16,86.713788,127.676085,90.480975,104.048774,122.547267,108.328471,94.345848,108.517025,98.06684,87.591032
2024-03-16,86.713788,127.676085,90.480975,104.048774,122.547267,108.328471,94.345848,108.517025,98.06684,87.591032
2024-03-16,86.713788,127.676085,90.480975,104.048774,122.547267,108.328471,94.345848,108.517025,98.06684,87.591032
2024-03-16,86.713788,127.676085,90.480975,104.048774,122.547267,108.328471,94.345848,108.517025,98.06684,87.591032
...,...,...,...,...,...,...,...,...,...,...
2024-03-16,86.713788,127.676085,90.480975,104.048774,122.547267,108.328471,94.345848,108.517025,98.06684,87.591032
2024-03-16,86.713788,127.676085,90.480975,104.048774,122.547267,108.328471,94.345848,108.517025,98.06684,87.591032
2024-03-16,86.713788,127.676085,90.480975,104.048774,122.547267,108.328471,94.345848,108.517025,98.06684,87.591032
2024-03-16,86.713788,127.676085,90.480975,104.048774,122.547267,108.328471,94.345848,108.517025,98.06684,87.591032


In [None]:
px.histogram(bond_spreads.iloc[-1].loc[spread_quotes.keys()] *  (1 + PricesToReturns().transform(bond_spreads[spread_quotes.keys()])))

In [None]:
px.histogram(return_dist.iloc[:, 9])

In [None]:
from sklearn.compose import ColumnTransformer

model = Pipeline([
    ("Impute Missing Rates", SimpleImputer(strategy="mean")),
    ('Bond Prices to Z-Spreads', PriceToQuote(portfolio, ois_quotes, bond_price_to_spread)),
    ("Update Z-Spread and Rates Quotes", UpdateQuotes(spread_quotes | ois_quotes)),
    ("Calculate Returns", ColumnTransformer(
        [("Linear Returns for Z-Spreads", EmpiricalPriceDistribution(), list(spread_quotes.keys())),
         ("Arithmetic Returns for Rates", EmpiricalPriceDistribution(return_type="arithmetic"), list(rates.columns))]
    ))]
)

In [None]:

model.fit_transform(X_train.head(200)).std() * np.sqrt(20 / returns_period)

100%|██████████| 200/200 [00:10<00:00, 18.85it/s]


US606822BR40    0.110474
US172967EW71    0.079012
US86562MBP41    0.097505
US925524AH30    0.101156
US233835AQ08    0.086913
US172967BL44    0.074141
US606822BH67    0.129966
US904764AH00    0.130331
US172967HS33    0.065051
US254687FX90    0.127736
dtype: float64

In [None]:
portfolio_weightings.returns

array([0.2896497 , 0.29607414, 0.2949828 , 0.29030795, 0.28933105,
       0.29102225, 0.29488628, 0.29430498, 0.29385894, 0.29585359,
       0.29201797, 0.29595029, 0.29561104, 0.29401284, 0.29497736,
       0.28017004, 0.29073462, 0.29476491, 0.28857081, 0.29072286,
       0.29881657, 0.29218816, 0.29327977, 0.29527859, 0.2958412 ,
       0.29669764, 0.29281508, 0.29432106, 0.29634262, 0.29240398,
       0.2949327 , 0.29300067, 0.29111449, 0.29270612, 0.29016121,
       0.29215397, 0.29248056, 0.29389782, 0.2934716 , 0.29447929,
       0.28774148, 0.29415148, 0.29244845, 0.2903648 , 0.29021327,
       0.28850386, 0.29166511, 0.2902649 , 0.28772556, 0.28914969,
       0.29006377, 0.29042604, 0.29357909, 0.29051428, 0.29008924,
       0.29309282, 0.29368981, 0.29028709, 0.29261972, 0.29289208,
       0.29397657, 0.28836304, 0.29139928, 0.29422042, 0.29507522,
       0.29092615, 0.29298683, 0.29452922, 0.29234144, 0.29343989,
       0.29493588, 0.29022896, 0.29068005, 0.29203734, 0.29387

In [None]:
portfolio_weightings.summary()

Mean                                            0.0054%
Annualized Mean                                   1.36%
Variance                                        0.0087%
Annualized Variance                               2.18%
Semi-Variance                                   0.0034%
Annualized Semi-Variance                          0.85%
Standard Deviation                                0.93%
Annualized Standard Deviation                    14.78%
Semi-Deviation                                    0.58%
Annualized Semi-Deviation                         9.23%
Mean Absolute Deviation                           0.50%
CVaR at 95%                                       1.65%
EVaR at 95%                                       4.73%
Worst Realization                                 7.62%
CDaR at 95%                                       4.44%
MAX Drawdown                                      7.92%
Average Drawdown                                  1.34%
EDaR at 95%                                     

In [None]:
portfolio.plot_cumulative_returns()

In [None]:
portfolio.plot_composition()

In [None]:
def bond_price_to_spread(bond: ql.Bond, price: float) -> float:
    return ql.BondFunctions.zSpread(
            bond, 
            ql.BondPrice(price, ql.BondPrice.Clean),
            ql.ZeroSpreadedTermStructure(discount_curve, ql.QuoteHandle(ql.SimpleQuote(0.0))),
            bond.dayCounter(),
            ql.Compounded,
            ql.Semiannual,
            ql.Settings.instance().evaluationDate)

bond_spreads = QLPriceTransformer(portfolio, bond_price_to_spread, quote_dict=ois_quotes).transform(prices=bond_prices, market_data=rates)
px.line(bond_spreads)

100%|██████████| 1231/1231 [01:01<00:00, 19.96it/s]


In [None]:
spread_quotes = {}
for isin, bond in portfolio.items():
    spread_quotes[isin] = ql.SimpleQuote(0.0)
    risky_curve = ql.YieldTermStructureHandle(ql.ZeroSpreadedTermStructure(ql.YieldTermStructureHandle(sofr_curve), ql.QuoteHandle(spread_quotes[isin])))
    bond.setPricingEngine(ql.DiscountingBondEngine(risky_curve))

def spread_to_bond_price(bond):
    return bond.cleanPrice()

# px.line(QLPriceTransformer(portfolio, spread_to_bond_price, quote_dict=spread_quotes | ois_quotes).transform(market_data=bond_spreads.join(rates).ffill().dropna()))

In [None]:
simulated_spreads = bond_spreads.iloc[-1] * (1 + prices_to_returns(bond_spreads))
simulated_prices = QLPriceTransformer(portfolio, spread_to_bond_price, quote_dict=spread_quotes, date_index=False).transform(market_data=simulated_spreads)
simulated_returns = simulated_prices / bond_prices.iloc[-1]
px.histogram(simulated_returns)

100%|██████████| 1230/1230 [00:00<00:00, 4577.80it/s]


In [None]:
from sklearn.impute import SimpleImputer