# Collateral Shortfall Forecast

This notebook is derived from a previous one available on our website around [Collateral optimization](https://github.com/atoti/notebooks/blob/master/notebooks/collateral-shortfall-monitoring/main.ipynb). For context, we’re copying here the first part of this notebook by [Hui Fang Yeo](https://www.linkedin.com/in/huifang-yeo/). The scenario starts diverging at the ‘What-If’ stage to integrate predictive machine learning algorithms. Jump to that part if you are already familiar with the use case.

For more context and definitions around collateral shortfall monitoring, [check out our article on atoti.io](https://www.atoti.io/rapid-collateral-modelling-and-simulation-with-atoti/).

### Introduction

In this notebook, we showcase how quickly a dashboard can be put together with atoti for a simplified Collateral Shortfall monitoring use case.  
  
Collateral is a form of credit risk mitigation where an asset is accepted as security for extending a loan.  
The market value of collateral changes over time and the lender has to account for it. As such, depending on the amount of risk associated with the asset, a percentage known as the haircut is deducted from the asset's market value. This gives the value of the collateral that can be used for the loan, also known as the collateral value.  
  
Collateral Shortfall occurs when the collateral value goes below the cash out value. This means that the collateral is worth less than expected, due to a variety of factors such as market fluctuations, contract enforceability, etc.
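
As a quick illustration of these definitions, the sketch below applies a haircut to a market value and compares the result with a cash out amount. The figures and function names are hypothetical and chosen only for the example. The sign convention (collateral value minus cash out, so a negative result means a shortfall) follows the formula used later in this notebook.

```python
def collateral_value(market_value: float, haircut: float) -> float:
    """Value of the collateral usable for the loan once the haircut is deducted."""
    return market_value * (1.0 - haircut)


def collateral_shortfall(collateral: float, cash_out: float) -> float:
    """Collateral value minus cash out; a negative result signals a shortfall."""
    return collateral - cash_out


# Hypothetical figures, for illustration only.
market_value = 1_000_000.0
haircut = 0.10
cash_out = 950_000.0

value = collateral_value(market_value, haircut)
shortfall = collateral_shortfall(value, cash_out)

print(f"Collateral value: {value:,.0f}")
print(f"Collateral shortfall: {shortfall:,.0f}")
```

With these numbers, the 10% haircut leaves a collateral value below the cash out amount, so the account is in shortfall.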

We will create a multidimensional data cube and derive measures such as the market value, the collateral value after haircut and the cash out value per account, and from these the Collateral Shortfall for each account. 

Leveraging the data cube and atoti's data visualization, we will put together dashboards that reflect the collateral status of the accounts. As the cherry on top, we will perform some *What-if Analysis* based on price simulation to demonstrate the impact on collateral in the following scenarios:  

- Asset price forecast at 1-day horizon
- Asset price forecast at 3-day horizon
- Asset price forecast at 1-week horizon

<div style="text-align: center;" ><a href="https://www.atoti.io/?utm_source=gallery&utm_content=collateral-monitoring" target="_blank" rel="noopener noreferrer"><img src="https://data.atoti.io/notebooks/banners/discover.png" alt="Try atoti"></a></div>

#### Dependencies

As the data used in this notebook is stored on AWS S3, it is necessary to install the [atoti-aws plugin](https://docs.atoti.io/latest/plugins.html#available-plugins).

```
!pip install atoti-aws
# or
!conda install atoti-aws
```

In [1]:
import os

import atoti as tt
import numpy as np
import pandas as pd
import utils
from natsort import natsorted
from sklearn.metrics import explained_variance_score, mean_squared_error, r2_score
from tabulate import tabulate

In [2]:
# a session has to be created for atoti
# dashboards are persisted in the content storage
session = tt.Session(user_content_storage="./content")

### Data loading
A session can read data from CSV files, Parquet files, pandas dataframes, NumPy arrays and Spark dataframes.   
Refer to https://docs.atoti.io/0.3.1/tutorial/08-data-sources.html 
   
#### Loading csv

In [3]:
asset_positions_table = session.read_csv(
    "s3://data.atoti.io/notebooks/collateral-shortfall-monitoring/assets_positions.csv",
    keys=["Account", "Asset_Code"],
    table_name="asset_positions",
)
asset_positions_table.head()

| Account | Asset_Code | Quantity |
|---|---|---|
| Niel | CAP.PA | 100000.0 |
| Niel | SAN.PA | 100000.0 |
| Musk | ENI.PA | 100000.0 |
| Musk | ENGI.PA | 100000.0 |
| Bezos & MacKenzie | AC.PA | 100000.0 |


#### Loading parquet

In [4]:
assets_table = session.read_parquet(
    "s3://data.atoti.io/notebooks/collateral-shortfall-monitoring/assets_attributes.parquet",
    keys=["Asset_Code"],
    table_name="assets",
)
assets_table.head()

| Asset_Code | Sector | Country | Haircut |
|---|---|---|---|
| BNP.PA | Financial Services | France | 0.1 |
| CA.PA | Consumer Defensive | France | 0.1 |
| AC.PA | Consumer Cyclical | France | 0.1 |
| ENGI.PA | Utilities | France | 0.1 |
| CAP.PA | Technology | France | 0.1 |


#### Loading csv via pandas before the session loads the pandas dataframe   
Being able to load pandas dataframes gives us the flexibility to manipulate them before loading, or later on when we do simulations.

In [5]:
assets_prices_df = pd.read_csv(os.path.join("../data", "assets-prices-test.csv"))
assets_prices_df.head()

FileNotFoundError: [Errno 2] No such file or directory: '../data/assets-prices-test.csv'

Let's transform the dataframe to obtain the desired shape: we want all the asset names in one column, and all their prices in another column.

To achieve that, we use the ***melt()*** method of the Pandas Dataframe class, available in the Pandas library.

In [None]:
assets_prices_df = assets_prices_df.melt(
    id_vars="Date", var_name="Asset_Code", value_name="Price"
).set_index(["Asset_Code", "Date"])
assets_prices_df.head()
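
To see what `melt()` does on a small scale, here is a minimal sketch on a toy wide-format price table. The dates and prices are made up for illustration, and `wide_df` and `long_df` are hypothetical names. Only the column names (`Date`, `Asset_Code`, `Price`) mirror the cell above.

```python
import pandas as pd

# Hypothetical wide-format prices: one row per date, one column per asset.
wide_df = pd.DataFrame(
    {
        "Date": ["2021-04-08", "2021-04-09"],
        "AC.PA": [33.0, 33.5],
        "BNP.PA": [52.0, 51.5],
    }
)

# After melting, each row holds one (asset, date) pair and its price.
long_df = wide_df.melt(id_vars="Date", var_name="Asset_Code", value_name="Price")
long_df = long_df.set_index(["Asset_Code", "Date"])

print(long_df)
```

The two asset columns become values of `Asset_Code`, so the two-row wide table turns into a four-row long table indexed by asset and date.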

In [None]:
assets_prices_table = session.read_pandas(
    assets_prices_df,
    keys=["Asset_Code", "Date"],
    table_name="assets_prices",
)
assets_prices_table.head()

In [None]:
loans_positions_df = pd.read_csv(
    "http://data.atoti.io/notebooks/collateral-shortfall-monitoring/loans_positions.csv"
)
loans_positions_df.head()

In [None]:
loans_positions_table = session.read_pandas(
    loans_positions_df, keys=["Account"], table_name="loans_positions"
)
loans_positions_table.head()

### Joining data tables

In [None]:
asset_positions_table.join(assets_table, mapping={"Asset_Code": "Asset_Code"})

In [None]:
asset_positions_table.join(loans_positions_table, mapping={"Account": "Account"})

In [None]:
asset_positions_table.join(assets_prices_table, mapping={"Asset_Code": "Asset_Code"})

### Cube creation

In [None]:
cube = session.create_cube(asset_positions_table, "Collateral_Management")

In [None]:
cube.schema

### Quick analysis with session.visualize

In [None]:
# we can perform drill-down to different hierarchies in a pivot table
session.visualize("explore-dataset-using-pivot-table")

### Cube structure
During cube creation, numeric columns are automatically turned into measures. Non-numeric columns are automatically translated to levels under a hierarchy of the same name. This can be [configured](https://www.atoti.io/documentation/lib/atoti.html#atoti.session.Session.create_cube) differently.

In [None]:
m = cube.measures
h = cube.hierarchies
lvl = cube.levels

Before we proceed with the data aggregation aspects, let's inspect the hierarchies created:

In [None]:
h

We are going to set the hierarchy *Date* as a slicing hierarchy. A slicing hierarchy will not aggregate the data on all its members.  
This means that we always view a subset of the cube by one date by default, which is usually what is needed.

In [None]:
h["Date"].slicing = True
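
The effect of slicing can be pictured with plain pandas. This is only an analogy on made-up data, not atoti's implementation: summing across all dates mixes daily snapshots together, whereas viewing one date at a time keeps each snapshot separate.

```python
import pandas as pd

# Hypothetical daily snapshots of market value for two assets.
snapshots = pd.DataFrame(
    {
        "Date": ["2021-04-08", "2021-04-08", "2021-04-09", "2021-04-09"],
        "Asset_Code": ["AC.PA", "BNP.PA", "AC.PA", "BNP.PA"],
        "Market Value": [100.0, 200.0, 110.0, 190.0],
    }
)

# Without slicing: a grand total across dates, which double counts the portfolio.
total_all_dates = snapshots["Market Value"].sum()

# With slicing: one total per date, and no total across dates.
total_per_date = snapshots.groupby("Date")["Market Value"].sum()

print(total_all_dates)
print(total_per_date)
```

The grand total is twice the size of either daily total, which is rarely a meaningful figure for a portfolio valued day by day.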

In [None]:
m

### Creating new measures  
From the data we have, we can derive the following:   
$\text{Market Value} = \text{Price} \times \text{Quantity}$  
$\text{Collateral Value} = \text{Market Value} \times (1 - \text{Haircut})$   
  
The above measures are aggregated over the Account and Asset Code levels in order to compute the Collateral Shortfall at account level:  
$\text{Collateral Shortfall} = \text{Collateral Value} - \text{Cash Out}$   
where Cash Out is also aggregated at account level.
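
Before defining these measures in the cube, the same arithmetic can be checked with pandas on a toy data set. The accounts, assets and amounts below are hypothetical. The sketch shows the order of operations: values are computed per position, summed per account, and only then compared with the account's cash out.

```python
import pandas as pd

# Hypothetical positions, already enriched with price and haircut.
positions = pd.DataFrame(
    {
        "Account": ["A", "A", "B"],
        "Asset_Code": ["X", "Y", "X"],
        "Quantity": [100.0, 50.0, 200.0],
        "Price": [10.0, 20.0, 10.0],
        "Haircut": [0.1, 0.2, 0.1],
    }
)
# Hypothetical loans, one cash out amount per account.
loans = pd.DataFrame(
    {"Account": ["A", "B"], "Cash_Out": [1500.0, 2000.0]}
).set_index("Account")

# Per-position values: the haircut differs per asset, so it is applied before summing.
positions["Market Value"] = positions["Price"] * positions["Quantity"]
positions["Collateral Value"] = positions["Market Value"] * (1 - positions["Haircut"])

# Aggregate per account, then subtract the account's cash out.
per_account = positions.groupby("Account")[["Market Value", "Collateral Value"]].sum()
per_account["Cash Out"] = loans["Cash_Out"]
per_account["Collateral Shortfall"] = (
    per_account["Collateral Value"] - per_account["Cash Out"]
)

print(per_account)
```

In this toy example, account A ends with a positive value (no shortfall) while account B ends with a negative one (shortfall), even though both hold the same market value.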

In [None]:
m["Price"] = tt.value(assets_prices_table["Price"])

In [None]:
m["Market Value"] = tt.agg.sum(
    m["Price"] * m["Quantity.SUM"],
    scope=tt.scope.origin(lvl["Sector"], lvl["Country"], lvl["Account"]),
)

In [None]:
m["Haircut"] = tt.agg.sum(assets_table["Haircut"])

m["Collateral Value"] = tt.agg.sum(
    m["Price"] * m["Quantity.SUM"] * (1 - m["Haircut"]),
    scope=tt.scope.origin(lvl["Sector"], lvl["Country"]),
)

In [None]:
cash_out = tt.value(loans_positions_table["Cash_Out"])
m["Cash Out"] = tt.agg.sum(cash_out, scope=tt.scope.origin(lvl["Account"]))

In [None]:
m["Collateral Shortfall"] = m["Collateral Value"] - m["Cash Out"]

In [None]:
m

Let's explore these new measures.

In [None]:
# we can look at the Price measure across Date, and further split the charts by Asset_Code
session.visualize("times-series")

In [None]:
# give a meaningful title to the visualization: it conveys the objective of the visual
# and can also serve as the widget title when the visual is published
session.visualize("haircut-value")

### Monitoring Collateral Shortfall  
We created a pivot table with the Collateral Shortfall, Market Value, Cash Out and Collateral Value for each account.  
Negative Collateral Shortfall values are highlighted in red. Feel free to click on the `>` to drill down to other hierarchies such as Sector to account for the shortfall. 

In [None]:
session.visualize("Collateral Shortfall")

Looking at all accounts together, a Gauge chart shows that we are not yet in shortfall and how far we are from it.   
The red marker shows the total Market Value, which is the maximum threshold before a shortfall occurs.

In [None]:
session.visualize("total-cash-out")

### Total Cash out Bank wide  
  
We can use a Tree map to visualize the asset concentration. A well diversified portfolio will help to reduce the collateral risks.

In [None]:
session.visualize("Asset Concentration")

## atoti UI and Dashboard creation
Until now, we have created a few visualizations. We can right-click on the visuals to publish them as widgets.
These widgets can then be used to build a dashboard.  

<img src="http://data.atoti.io/notebooks/collateral-shortfall-forecast/dashboard.gif" alt="collateral_dashboard" style="zoom:40%;" />

In [None]:
session.link(path="#/dashboard/3a7")

Click on the URL above to view the dashboard that was prepared. We can use the quick filter to select an account for viewing.  
We can also do a right-click drillthrough to investigate the underlying data.  
  
To play with the UI and explore the data, [you can have a look at our UI documentation here](https://www.activeviam.com/activeui/documentation/index.html).

## Simulations
Now that we have basic monitoring on Collateral Shortfall, we can do some simulations in the data cube.

### Setup Price Simulation

The two kinds of simulation in atoti are:

- Measure simulation
- Source simulation

In a Measure simulation, we modify the value of the measures in the scenarios of the simulation without duplicating data.  
A Source simulation, on the other hand, is a simulation created by loading a new source of modified data into the cube.
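
The difference between the two kinds can be sketched in plain pandas. This is a conceptual analogy on made-up data and does not use the atoti API. The function `market_value` and its `multiplier` parameter are hypothetical.

```python
import pandas as pd

# Hypothetical quantities and base prices for two assets.
quantities = pd.Series({"X": 100.0, "Y": 50.0}, name="Quantity")
base_prices = pd.Series({"X": 10.0, "Y": 20.0}, name="Price")


def market_value(prices: pd.Series, quantities: pd.Series, multiplier: float = 1.0) -> float:
    """Total market value, optionally scaled by a scenario multiplier."""
    return float((prices * quantities).sum() * multiplier)


base = market_value(base_prices, quantities)

# Measure-style simulation: same input data, only a parameter of the calculation changes.
stressed = market_value(base_prices, quantities, multiplier=0.9)

# Source-style simulation: the calculation is unchanged, but it runs on a new data set.
forecast_prices = pd.Series({"X": 11.0, "Y": 19.0}, name="Price")
forecast = market_value(forecast_prices, quantities)

print(base, stressed, forecast)
```

Price forecasts are a new data set rather than a uniform adjustment of the existing prices, which is why the source-style approach suits this notebook.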

Here, we will use the Source simulation to simulate variations of the price by considering the price forecast at the following different time horizons:

- 1 day
- 3 days
- 1 week

For each forecasting time horizon, we will predict the Collateral Shortfall.

First, let's load the price forecast files into a single table.

In [None]:
# collect every CSV file of predictions
files = [
    os.path.join("../results/predictions", f)
    for f in os.listdir("../results/predictions")
    if f.endswith(".csv")
]

# read all the files and concatenate them once, instead of growing a dataframe in a loop
price_predictions_df = pd.concat(
    [pd.read_csv(f, index_col=0) for f in files]
).sort_index()

In [None]:
price_predictions_df = price_predictions_df.reset_index().rename(
    columns={
        "index": "Date",
        "y": "Price",
        "ŷ": "Price Prediction",
        "Asset Code": "Asset_Code",
        "Model Name": "Best Model",
    }
)

In [None]:
price_predictions_df_1_day = price_predictions_df[
    price_predictions_df["forecasting horizon in days"] == 1
][["Date", "Asset_Code", "Price Prediction"]].copy()
price_predictions_df_1_day = pd.melt(
    price_predictions_df_1_day.sort_values(["Asset_Code", "Date"]),
    id_vars=["Date", "Asset_Code"],
    value_vars=["Price Prediction"],
    value_name="Price",
)
price_predictions_df_1_day = price_predictions_df_1_day[
    ["Asset_Code", "Date", "Price"]
].set_index(["Asset_Code", "Date"])
price_predictions_df_1_day

In [None]:
price_predictions_df_3_days = price_predictions_df[
    price_predictions_df["forecasting horizon in days"] == 3
][["Date", "Asset_Code", "Price Prediction"]].copy()
price_predictions_df_3_days = pd.melt(
    price_predictions_df_3_days.sort_values(["Asset_Code", "Date"]),
    id_vars=["Date", "Asset_Code"],
    value_vars=["Price Prediction"],
    value_name="Price",
)
price_predictions_df_3_days = price_predictions_df_3_days[
    ["Asset_Code", "Date", "Price"]
].set_index(["Asset_Code", "Date"])
price_predictions_df_3_days

In [None]:
price_predictions_df_1_week = price_predictions_df[
    price_predictions_df["forecasting horizon in days"] == 7
][["Date", "Asset_Code", "Price Prediction"]].copy()
price_predictions_df_1_week = pd.melt(
    price_predictions_df_1_week.sort_values(["Asset_Code", "Date"]),
    id_vars=["Date", "Asset_Code"],
    value_vars=["Price Prediction"],
    value_name="Price",
)
price_predictions_df_1_week = price_predictions_df_1_week[
    ["Asset_Code", "Date", "Price"]
].set_index(["Asset_Code", "Date"])
price_predictions_df_1_week
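
The three cells above repeat the same steps with a different horizon each time. As a possible simplification, the sketch below wraps those steps in one helper. The function name `select_horizon` and the toy data are hypothetical. The column names are the ones used in the cells above, and renaming the prediction column replaces the `melt` step while producing the same shape.

```python
import pandas as pd


def select_horizon(predictions: pd.DataFrame, horizon_in_days: int) -> pd.DataFrame:
    """Return the predicted prices for one horizon, indexed by (Asset_Code, Date)."""
    mask = predictions["forecasting horizon in days"] == horizon_in_days
    return (
        predictions.loc[mask, ["Asset_Code", "Date", "Price Prediction"]]
        .rename(columns={"Price Prediction": "Price"})
        .sort_values(["Asset_Code", "Date"])
        .set_index(["Asset_Code", "Date"])
    )


# Hypothetical predictions, for illustration only.
toy_predictions = pd.DataFrame(
    {
        "Date": ["2021-04-09", "2021-04-08", "2021-04-08"],
        "Asset_Code": ["AC.PA", "AC.PA", "AC.PA"],
        "Price Prediction": [33.5, 33.0, 34.0],
        "forecasting horizon in days": [1, 1, 7],
    }
)

toy_1_day = select_horizon(toy_predictions, 1)
toy_1_week = select_horizon(toy_predictions, 7)
print(toy_1_day)
```

With such a helper, each horizon would be obtained with a single call, for instance `select_horizon(price_predictions_df, 3)` for the 3-day forecast.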

In [None]:
assets_prices_table.head(10000)

Notice that the dataframes with the predictions do not have exactly the same size. This is because we consider different time horizons for the predictions.
**To be able to compare the different scenarios, we will filter them on the Date column and consider the period from 2021/04/08 to 2022/01/24.**

To achieve that, we consider only the dates in the following index for all the scenarios:

In [None]:
index = price_predictions_df_1_week.index
index
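
The notebook takes the index of the 1-week forecast as the common reference. A variant of the same idea, shown here on toy data, is to intersect the indexes explicitly so that no assumption is made about which dataframe covers the shortest period. The helper `make_frame` and the values are hypothetical.

```python
import pandas as pd


def make_frame(rows):
    """Build a price dataframe indexed by (Asset_Code, Date) from (asset, date, price) tuples."""
    idx = pd.MultiIndex.from_tuples(
        [(asset, date) for asset, date, _ in rows], names=["Asset_Code", "Date"]
    )
    return pd.DataFrame({"Price": [price for _, _, price in rows]}, index=idx)


# Hypothetical forecasts covering slightly different periods.
toy_one_day = make_frame(
    [
        ("AC.PA", "2021-04-07", 32.0),
        ("AC.PA", "2021-04-08", 33.0),
        ("AC.PA", "2021-04-09", 33.5),
    ]
)
toy_one_week = make_frame(
    [("AC.PA", "2021-04-08", 34.0), ("AC.PA", "2021-04-09", 34.5)]
)

# Keep only the (asset, date) pairs present in both dataframes.
common_index = toy_one_day.index.intersection(toy_one_week.index)
aligned_one_day = toy_one_day.loc[common_index]
aligned_one_week = toy_one_week.loc[common_index]

print(aligned_one_day)
print(aligned_one_week)
```

After alignment, both dataframes contain the same rows, so the scenarios can be compared date by date.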

Now, let's replace the Price column of the assets_prices_df dataframe that was previously loaded into the cube with the values of the price forecast.
For our *Price simulation*, we load this modified dataframe directly into the table as a scenario.

### Base scenario: Ground Truth Price

In [None]:
assets_prices_df_base = assets_prices_df.loc[index]
assets_prices_df = assets_prices_df_base.copy()
assets_prices_df

In [None]:
# In Source Simulation, we do not perform simulation_setup. We just load it to the table as scenarios.
with session.start_transaction(scenario_name="Actual"):
    # assets_prices_table.scenarios["Actual Collateral Shortfall"].drop()  # Clear the data from the "base" scenario before loading our new data
    assets_prices_table.scenarios["Actual"].load_pandas(assets_prices_df)

### Scenario 1: Forecast At 1 Day

In [None]:
assets_prices_df = price_predictions_df_1_day.loc[index].copy()
assets_prices_df

In [None]:
with session.start_transaction(scenario_name="Forecast At 1 Day"):
    # assets_prices_table.scenarios["Forecast At 1 Day"].drop()  # Clear the data from the "base" scenario before loading our new data
    assets_prices_table.scenarios["Forecast At 1 Day"].load_pandas(assets_prices_df)

### Scenario 2: Forecast At 3 Days

In [None]:
assets_prices_df = price_predictions_df_3_days.loc[index].copy()
assets_prices_df

In [None]:
# In Source Simulation, we do not perform simulation_setup. We just load it to the table as scenarios.
with session.start_transaction(scenario_name="Forecast At 3 Days"):
    # assets_prices_table.scenarios["Forecast At 3 Days"].drop()  # Clear the data from the "base" scenario before loading our new data
    assets_prices_table.scenarios["Forecast At 3 Days"].load_pandas(assets_prices_df)

### Scenario 3: Forecast At 1 Week

In [None]:
assets_prices_df = price_predictions_df_1_week.loc[index].copy()
assets_prices_df

In [None]:
with session.start_transaction(scenario_name="Forecast At 1 Week"):
    # assets_prices_table.scenarios["Forecast At 1 Week"].drop()  # Clear the data from the "base" scenario before loading our new data
    assets_prices_table.scenarios["Forecast At 1 Week"].load_pandas(assets_prices_df)

#### Comparing the different scenarios
Now, let's compare the impact on the collateral shortfall when considering different time horizons for the forecast.

In [None]:
session.visualize("Collateral Shortfall Forecasts comparison - Table")

In [None]:
session.visualize("Collateral Shortfall Forecasts comparison - Plot")

We observe that the projections are close to the actual Collateral Shortfall value for Daniel EK and Musk. They are also not far from the actual value for Buffet, Gates and Niel.

However, in the case of Bezos & MacKenzie, we observe that the projections are less accurate.

Additionally, we note that for most of the accounts, the forecasts at 1 and 3 days are close to each other, whereas the one at the 1-week horizon tends to differ significantly. 

These observations are probably due to the fact that these portfolios do not comprise exactly the same assets, while we have used the same model and assumptions to forecast all of them. This can lead to great accuracy for some assets and less for others. As a consequence, depending on which assets are held in the account, the projections can be more or less accurate in comparison to the actual Collateral Shortfall. 

We will examine the performance of the forecasts in more detail later in this notebook.

Check out the dashboard that was prepared in advance.

In [None]:
session.link(path="#/dashboard/69e")

### Quality of the Predictions

In [None]:
for asset in set(index.get_level_values(0)):
    # Ground truth
    y = assets_prices_df_base.loc[index].loc[asset]["Price"]

    # Prediction at 1 day
    y_pred_1d = price_predictions_df_1_day.loc[index].loc[asset]["Price"]
    r2_1d = r2_score(y, y_pred_1d)
    # mean_squared_error returns the MSE; take the square root to get the RMSE
    rmse_1d = np.sqrt(mean_squared_error(y, y_pred_1d))
    mape_1d = utils.mean_absolute_percentage_error(y, y_pred_1d)

    # Prediction at 3 days
    y_pred_3d = price_predictions_df_3_days.loc[index].loc[asset]["Price"]
    r2_3d = r2_score(y, y_pred_3d)
    rmse_3d = np.sqrt(mean_squared_error(y, y_pred_3d))
    mape_3d = utils.mean_absolute_percentage_error(y, y_pred_3d)

    # Prediction at 1 week
    y_pred_1w = price_predictions_df_1_week.loc[index].loc[asset]["Price"]
    r2_1w = r2_score(y, y_pred_1w)
    rmse_1w = np.sqrt(mean_squared_error(y, y_pred_1w))
    mape_1w = utils.mean_absolute_percentage_error(y, y_pred_1w)

    # Results summary table
    results_df = pd.DataFrame(
        index=pd.Series(
            ["Prediction At 1 Day", "Prediction At 3 Days", "Prediction At 1 Week"]
        ),
        columns=["y_mean", "y_std", "ŷ_mean", "ŷ_std", "R2", "RMSE", "MAPE"],
    )

    metrics = {
        "Prediction At 1 Day": {
            "y_mean": np.mean(y),
            "y_std": np.std(y),
            "ŷ_mean": np.mean(y_pred_1d),
            "ŷ_std": np.std(y_pred_1d),
            "R2": r2_1d,
            "RMSE": rmse_1d,
            "MAPE": mape_1d,
        },
        "Prediction At 3 Days": {
            "y_mean": np.mean(y),
            "y_std": np.std(y),
            "ŷ_mean": np.mean(y_pred_3d),
            "ŷ_std": np.std(y_pred_3d),
            "R2": r2_3d,
            "RMSE": rmse_3d,
            "MAPE": mape_3d,
        },
        "Prediction At 1 Week": {
            "y_mean": np.mean(y),
            "y_std": np.std(y),
            "ŷ_mean": np.mean(y_pred_1w),
            "ŷ_std": np.std(y_pred_1w),
            "R2": r2_1w,
            "RMSE": rmse_1w,
            "MAPE": mape_1w,
        },
    }

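    for k, v in metrics.items():
        # use a name other than `m`, which already refers to cube.measures
        for metric_name, metric_value in v.items():
            results_df.loc[k, metric_name] = utils.truncate(metric_value, 3)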

    print(f"Result summary for asset code {asset}:\n{results_df.to_markdown()}\n\n")

The result tables show the following statistics and regression evaluation metrics for each prediction time horizon:

- **y_mean:** The average of the actual price of the considered asset
- **y_std:** The standard deviation of the actual price of the considered asset
- **ŷ_mean:** The average of the predicted price of the considered asset
- **ŷ_std:** The standard deviation of the predicted price of the considered asset
- **R2:** The coefficient of determination (R squared) or regression score of the model
- **RMSE:** The Root Mean Squared Error
- **MAPE:** The Mean Absolute Percentage Error

Refer to https://scikit-learn.org/stable/modules/model_evaluation.html#regression-metrics for the definitions.
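
For reference, the three metrics can be written out in a few lines of NumPy. This is a minimal sketch using the standard textbook definitions on made-up prices. The notebook's `utils.mean_absolute_percentage_error` helper is not shown here, so the `mape` function below is an assumption about what it computes (a fraction rather than a percentage).

```python
import numpy as np


def r2(y_true, y_pred) -> float:
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    y_true, y_pred = np.asarray(y_true, dtype=float), np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


def rmse(y_true, y_pred) -> float:
    """Root mean squared error, in the same unit as the prices."""
    y_true, y_pred = np.asarray(y_true, dtype=float), np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def mape(y_true, y_pred) -> float:
    """Mean absolute percentage error, expressed as a fraction (0.05 means 5%)."""
    y_true, y_pred = np.asarray(y_true, dtype=float), np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs((y_true - y_pred) / y_true)))


# Hypothetical actual and predicted prices.
actual = [100.0, 102.0, 104.0, 106.0]
predicted = [101.0, 101.0, 105.0, 105.0]

print(r2(actual, predicted), rmse(actual, predicted), mape(actual, predicted))
```

Because the RMSE is expressed in price units, it is best read relative to the price level of each asset, whereas R2 and MAPE are scale-free and can be compared across assets.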

<font color='blue' size=3>The results tables show that the forecasting models are good in general:

- <font color='blue' size=3>On average, they predict values close to the actual values, with standard deviations comparable to those of the actual values as well. This demonstrates a good fit of the models with a low bias, except for the asset CAP.PA;
    
- <font color='blue' size=3>The R2 values are also good: except for a few assets such as CAP.PA, ENGI.PA and TIT.MI, they are above 0.65. This shows good correlations between the actual values and the predictions;
    
- <font color='blue' size=3>The RMSE values are low compared to the actual prices, except for the asset CAP.PA;
    
- <font color='blue' size=3>The MAPE values are good as they are lower than 0.05. This means that, on average, the relative error between the prediction and the actual price is less than 5%. 
    
<font color='blue' size=3> Here, we can see the difference in performance between the models corresponding to the different time horizons. **As expected with time series forecasts, we observe that the closer the time horizon, the better the forecast.**
    
<font color='blue' size=3> Furthermore, we can see different performance levels at different time horizons for the same asset. We can also see different performance levels at the same time horizon (that is, with the same model) for different assets. This is because the different assets do not necessarily have the same characteristics in terms of trends and seasonality, but for simplicity we have used the same assumptions to create their predictive features and forecast models. This is not the best solution, and is definitely an avenue for improvement of our approach.

In [None]:
%matplotlib inline
from matplotlib import pyplot as plt

plt.rcParams["figure.figsize"] = (20, 13)

colors = {
    "Actual Price": "r",
    "Prediction at 1 day": "g",
    "Prediction at 3 days": "b",
    "Prediction at 1 week": "orange",
}
styles = {
    "Actual Price": "--",
    "Prediction at 1 day": "--",
    "Prediction at 3 days": "--",
    "Prediction at 1 week": "--",
}

for asset in set(index.get_level_values(0)):
    y = (
        assets_prices_df_base.loc[index]
        .loc[asset]
        .reset_index()
        .rename(columns={"Price": "Actual Price"})
    )
    y_pred_1d = (
        price_predictions_df_1_day.loc[index]
        .loc[asset]
        .reset_index()
        .rename(columns={"Price": "Prediction at 1 day"})
    )
    y_pred_3d = (
        price_predictions_df_3_days.loc[index]
        .loc[asset]
        .reset_index()
        .rename(columns={"Price": "Prediction at 3 days"})
    )
    y_pred_1w = (
        price_predictions_df_1_week.loc[index]
        .loc[asset]
        .reset_index()
        .rename(columns={"Price": "Prediction at 1 week"})
    )

    # df = pd.merge(left=y, right=y_pred_1d, on='Date')
    df = y.merge(
        y_pred_1d.merge(
            y_pred_3d.merge(y_pred_1w, how="inner", on="Date"), how="inner", on="Date"
        ),
        how="inner",
        on="Date",
    )
    df = df.set_index("Date")

    df.plot(
        color=[colors.get(x) for x in df.columns],
        style=[styles.get(x) for x in df.columns],
    )
    plt.title(f"Actual Price vs Predictions for asset code {asset}")
    plt.legend(loc="upper right")
    plt.show()

<font color='blue' size=3>The visualizations confirm the previous observations, as well as the good fit of the different models in general:

- <font color='blue' size=3>In general, the curves of predicted prices correspond well to those of actual prices, in terms of amplitude and trend;
    
- <font color='blue' size=3>The forecast at 1 day appears to be the best globally;
    
- <font color='blue' size=3>Although the 1-week forecast seems to be less accurate than the other models, its curves are still quite good since they follow the trend of the actual prices well in most cases, albeit with some relatively large deviations in amplitude for the different assets except BNP.PA, RACE.MI, TIT.MI, and ENGI.PA.

## Conclusion

<font size=3>In this analysis, we have projected the risk of Collateral Shortfall for some portfolios into the near future at different time horizons.

<font size=3>While the different time horizons used show an overall good forecast accuracy, we observe that the models' performance decreases when we increase the forecast horizon: in general, the closer the forecast horizon, the better the forecast values.

<font size=3>We show how machine learning can be very effective in helping portfolio managers make informed decisions and manage risk taking into account different considerations such as the time/accuracy trade-off. In fact, machine learning can help assess the best strategy for the firm between a longer-term view with relatively low accuracy, and a shorter-term view with more accurate projections.

<div style="text-align: center;" ><a href="https://www.atoti.io/?utm_source=gallery&utm_content=collateral-monitoring" target="_blank" rel="noopener noreferrer"><img src="https://data.atoti.io/notebooks/banners/discover-try.png" alt="Try atoti"></a></div>