# Atoti vs pandas comparative analysis for Value at Risk: pandas notebook

Last tested version: <img src="https://img.shields.io/badge/Atoti-v0.8.14-blue">

## Introduction

To compare the advanced analytics capabilities of Atoti with those of pandas, we will build a notebook that implements, **using pandas**, the main components of a Value at Risk (VaR) use case:

* Compute VaR and Expected Shortfall (ES):
    * At two different confidence levels: 95% and 99%; 
    * At three different granularities: top-of-house (global value for entire financial institution), combination of book and trade, and combination of all attributes.
* Track, for each of those queries, the:
    * Response time;
    * Memory usage.
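
Outside a notebook, the same two measurements can be approximated with the standard library. The sketch below is an illustration only: `measure` is a hypothetical helper, not part of this notebook, and `tracemalloc` reports Python-level allocations, so its figures will not match the process-level numbers that `%%memit` reports.

```python
import time
import tracemalloc

import numpy as np


def measure(func, *args, **kwargs):
    """Run func once and return (result, elapsed seconds, peak traced bytes)."""
    tracemalloc.start()
    start = time.perf_counter()
    try:
        result = func(*args, **kwargs)
    finally:
        elapsed = time.perf_counter() - start
        _, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
    return result, elapsed, peak


# Hypothetical workload standing in for a VaR query.
result, elapsed, peak = measure(
    lambda: np.quantile(np.arange(1_000_000, dtype=float), 0.05)
)
print(f"{elapsed:.4f} s, peak {peak / 1e6:.1f} MB traced")
```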


We will also enrich the use case by:
* Computing the incremental VaR
* Performing simulations

💡 **Note:** This notebook uses a dataset of ~5.5GB, which may take several minutes to load.


In [None]:
!pip install memory_profiler
!pip install s3fs

In [None]:
%load_ext memory_profiler

In [None]:
#Package Imports
import pandas as pd
import numpy as np
import s3fs

## Loading the data into a pandas DataFrame

### Trade PnL Table

Our trade table is the main table that contains all the trades, some of their attributes, and, most importantly, their Profit-and-Loss (PnL) vectors.

We will load the table from our S3 repository. This is a rather large dataset, which lets us perform our comparative analysis at a meaningful scale.

In [None]:
s3 = s3fs.S3FileSystem()
trades=pd.read_parquet("bd-connect-london-hybrid-demo-202306/data/pnl/", filesystem=s3)

### Book Table

The book table will enrich the data model with information about the books that contain our trades.

In [None]:
books = pd.read_csv("./data/books.csv")

## Merge Tables

In [None]:
merged=trades.merge(books, how='left',  on="BOOKID")

## Computing the VaR Metrics

pandas does not manage dynamic aggregation natively, so a pre-aggregation step has to be coded for each desired level of aggregation before the statistical function is applied to obtain the final metric.

The groupby() function performs that initial aggregation of the PnL vectors, and the quantile/mean functions are then applied to the resulting set.

The downside is that each level of aggregation requires its own multi-line code block, which introduces **inefficiencies** and **redundancy**.
In Atoti, however, a measure is configured only once, with one line of code, and can subsequently be evaluated at any granularity.
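
One way to contain that redundancy in pandas is to wrap the aggregate-then-measure pattern in a helper. The sketch below runs on a small synthetic frame; `var_es_by` and the toy data are hypothetical and not part of this notebook. It also assumes ES is the mean of the scenarios at or below the VaR threshold, which differs from the fixed count of worst scenarios used in the cells that follow.

```python
import numpy as np
import pandas as pd


def var_es_by(df, levels, vector_col="PNL_VECTOR", confidences=(0.95, 0.99)):
    """Sum PnL vectors per group, then compute VaR and ES for each group."""
    records = []
    for key, group in df.groupby(list(levels)):
        # Older pandas versions yield a scalar key for a single grouping level.
        key = key if isinstance(key, tuple) else (key,)
        # Add the vectors of the group scenario by scenario.
        total = np.sum(np.stack(group[vector_col].to_list()), axis=0)
        record = dict(zip(levels, key))
        for conf in confidences:
            threshold = np.quantile(total, 1.0 - conf)
            label = int(round(conf * 100))
            record[f"VaR{label}"] = threshold
            # Assumption: ES is the mean of scenarios at or below the VaR threshold.
            record[f"ES{label}"] = total[total <= threshold].mean()
        records.append(record)
    return pd.DataFrame(records)


# Synthetic data standing in for the merged trade table.
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "ASOFDATE": ["2023-06-01"] * 4,
    "BOOK": ["A", "A", "B", "B"],
    "TRADEID": [1, 2, 3, 4],
    "PNL_VECTOR": [rng.normal(size=250) for _ in range(4)],
})

top_of_house = var_es_by(toy, ["ASOFDATE"])
by_book = var_es_by(toy, ["ASOFDATE", "BOOK"])
print(by_book)
```

Each new granularity then costs one call, although every call still re-aggregates the vectors from scratch.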

### Computing Top of House VaR and ES

In [None]:
%%time
%%memit

VaRTopOfHouse=merged.groupby(['ASOFDATE'])['PNL_VECTOR'].sum().reset_index()
VaRTopOfHouse['VaR95']=VaRTopOfHouse.apply(lambda x: np.quantile(x['PNL_VECTOR'], 0.05), axis=1)
VaRTopOfHouse['VaR99']=VaRTopOfHouse.apply(lambda x: np.quantile(x['PNL_VECTOR'], 0.01), axis=1)
# ES95 as the mean of the 12 lowest PnL values: sort first, then slice
VaRTopOfHouse['ES95']=VaRTopOfHouse.apply(lambda x: np.mean(np.sort(x['PNL_VECTOR'])[:12]), axis=1)
VaRTopOfHouse

### Computing VaR and ES at BookId and TradeId Level

In [None]:
%%time
%%memit

VaRByBookandTrade=merged.groupby(['ASOFDATE', 'BOOK', 'TRADEID'])['PNL_VECTOR'].sum().reset_index()
VaRByBookandTrade['VaR95']=VaRByBookandTrade.apply(lambda x: np.quantile(x['PNL_VECTOR'], 0.05), axis=1)
VaRByBookandTrade['VaR99']=VaRByBookandTrade.apply(lambda x: np.quantile(x['PNL_VECTOR'], 0.01), axis=1)
# ES95 as the mean of the 12 lowest PnL values: sort first, then slice
VaRByBookandTrade['ES95']=VaRByBookandTrade.apply(lambda x: np.mean(np.sort(x['PNL_VECTOR'])[:12]), axis=1)
VaRByBookandTrade

### Computing VaR and ES at the most granular level (combination of all available qualitative hierarchies) 

In [None]:
%%time
%%memit

VaRGranular=merged.groupby(['ASOFDATE', 'BUSINESS_UNIT', 'SUB_BUSINESS_UNIT', 'TRADING_DESK' , 'BOOKID', 'RISKCLASS', 'TRADEID'])['PNL_VECTOR'].sum().reset_index()
VaRGranular['VaR95']=VaRGranular.apply(lambda x: np.quantile(x['PNL_VECTOR'], 0.05), axis=1)
VaRGranular['VaR99']=VaRGranular.apply(lambda x: np.quantile(x['PNL_VECTOR'], 0.01), axis=1)
# ES95 as the mean of the 12 lowest PnL values: sort first, then slice
VaRGranular['ES95']=VaRGranular.apply(lambda x: np.mean(np.sort(x['PNL_VECTOR'])[:12]), axis=1)
VaRGranular

### Incremental VaR (Parent VaR - Parent VaR excluding self)

pandas columns carry no semantic information, so pandas has no notion of the hierarchical or ordering relationships that may exist between them (for example, that books belong to trading desks). Implementing a measure such as incremental VaR therefore requires us to hard-code every step of that logic.
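
The logic itself is short once written out. The sketch below works on synthetic data with one row per book; the frame, the `var95` helper and the column values are hypothetical and only illustrate the parent-minus-parent-excluding-self calculation.

```python
import numpy as np
import pandas as pd

# Synthetic book-level PnL vectors: one row per book (assumption for this sketch).
rng = np.random.default_rng(1)
toy = pd.DataFrame({
    "TRADING_DESK": ["D1", "D1", "D2"],
    "BOOK": ["A", "B", "C"],
    "PNL_VECTOR": [rng.normal(size=250) for _ in range(3)],
})


def var95(vector):
    """95% VaR taken as the 5th percentile of the PnL vector."""
    return np.quantile(vector, 0.05)


# Desk-level vector: element-wise sum of the vectors of its books.
desk_vectors = {
    desk: np.sum(np.stack(group["PNL_VECTOR"].to_list()), axis=0)
    for desk, group in toy.groupby("TRADING_DESK")
}

rows = []
for row in toy.itertuples(index=False):
    desk_vector = desk_vectors[row.TRADING_DESK]
    # Parent vector with this book's contribution removed.
    without_book = desk_vector - row.PNL_VECTOR
    rows.append({
        "TRADING_DESK": row.TRADING_DESK,
        "BOOK": row.BOOK,
        "IncrementalVaR": var95(desk_vector) - var95(without_book),
    })

incremental = pd.DataFrame(rows)
print(incremental)
```

For a desk with a single book, the vector excluding the book is all zeros, so the incremental VaR equals the desk VaR.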

In [None]:
%%time
%%memit

bookpnl=merged.groupby(['BOOK'])['PNL_VECTOR'].sum().reset_index()
deskpnl=merged.groupby(['TRADING_DESK'])['PNL_VECTOR'].sum().reset_index()
deskandbook=merged[['TRADING_DESK', 'BOOK']].drop_duplicates().sort_values(by=['TRADING_DESK', 'BOOK']).reset_index(drop=True)

deskandbook['DESKPNL']=''
deskandbook['BOOKPNL']=''

for i in range(len(deskandbook)):
    book=deskandbook.loc[i, 'BOOK']
    desk=deskandbook.loc[i, 'TRADING_DESK']
    # .iloc[0] extracts the PnL vector itself rather than a one-element array wrapping it
    bookpl=bookpnl[bookpnl['BOOK']==book]['PNL_VECTOR'].iloc[0]
    deskpl=deskpnl[deskpnl['TRADING_DESK']==desk]['PNL_VECTOR'].iloc[0]
    deskandbook.at[i, 'BOOKPNL']=bookpl
    deskandbook.at[i, 'DESKPNL']=deskpl

# Desk PnL vector with the book's contribution removed
deskandbook['DESKWITHOUTBOOK']=deskandbook['DESKPNL']-deskandbook['BOOKPNL']
deskandbook['VaRDesk95']=deskandbook.apply(lambda x: np.quantile(x['DESKPNL'], 0.05), axis=1)
deskandbook['VaRDeskWITHOUTBOOK95']=deskandbook.apply(lambda x: np.quantile(x['DESKWITHOUTBOOK'], 0.05), axis=1)

deskandbook['IncrementalVaR']=deskandbook['VaRDesk95']-deskandbook['VaRDeskWITHOUTBOOK95']

In [None]:
deskandbook[['TRADING_DESK', 'BOOK', 'IncrementalVaR']]

## What If Scenarios

pandas does not support branching either, so creating a simulation or new scenario means loading a full copy of the dataset into a separate data structure, rather than only the deltas (the data points that differ). This doubles the memory footprint and limits any possibility of contained, side-by-side analytics.

We will start by making a copy of the original dataset, then enrich it with the stressed data points, and finally drop the duplicates manually to remove the old, unstressed records.
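
The copy, append and de-duplicate pattern can be sketched on a toy frame first. The data and the scalar `PNL` column below are hypothetical and stand in for the real PnL vectors. The key point is `keep="last"`: because the stressed rows are appended after the originals, they are the ones that survive.

```python
import pandas as pd

# Toy base scenario (hypothetical data).
base = pd.DataFrame({
    "BOOKID": [1, 1, 2],
    "TRADEID": ["T1", "T2", "T3"],
    "PNL": [10.0, -5.0, 3.0],
})

# Hypothetical stressed records: only trade T2 changes.
delta = pd.DataFrame({"BOOKID": [1], "TRADEID": ["T2"], "PNL": [-50.0]})

# Append the stressed rows, then keep the last record for each key.
stressed = (
    pd.concat([base, delta], ignore_index=True)
    .drop_duplicates(subset=["BOOKID", "TRADEID"], keep="last")
    .reset_index(drop=True)
)
print(stressed)
```

The base frame is left untouched, so both scenarios live in memory side by side as two separate DataFrames.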

In [None]:
stressed=merged.copy()

In [None]:
addition=pd.read_parquet("bd-connect-london-hybrid-demo-202306/data/simulation/pnl_16.parquet", filesystem=s3)
stressed=pd.concat([stressed, addition])

In [None]:
stressed=stressed.drop_duplicates(['BOOKID', 'ASOFDATE', 'TRADEID', 'DATASET', 'RISKFACTOR', 'RISKCLASS'], keep='last')  # assign back: drop_duplicates returns a new DataFrame
stressed
