# Atoti vs pandas comparative analysis for Value at Risk: Atoti notebook

Last tested version: <img src="https://img.shields.io/badge/Atoti-v0.8.14-blue">

## Introduction

Aiming to compare the advanced analytics capabilities of Atoti with those of pandas, we will build a notebook that implements, **using Atoti**, the main components of a Value at Risk (VaR) use case:

* Compute VaR and Expected Shortfall (ES):
    * At two confidence levels, 95% and 99% (ES is computed at 95% only);
    * At three different granularities: top-of-house (global value for entire financial institution), combination of book and trade, and combination of all attributes.
* Track, for each of those queries, the:
    * Response time;
    * Memory usage.


We will also enrich the use case by:
* Computing the incremental VaR
* Performing simulations

💡 **Note:** This notebook uses a dataset of ~5.5 GB, which may take several minutes to load initially.

<div style="text-align: center;" ><a href="https://www.atoti.io/?utm_source=gallery&utm_content=atoti-pandas-comparison" target="_blank" rel="noopener noreferrer"><img src="https://data.atoti.io/notebooks/banners/Discover-Atoti-now.png" alt="Try Atoti"></a></div>

# Imports

In [None]:
# !pip install memory_profiler
# !pip install boto3

In [None]:
%load_ext memory_profiler

In [None]:
import atoti as tt
import boto3

# Session Creation

Let's start by creating an Atoti session to host our in-memory data cube.

In [None]:
session = tt.Session()

# Configuration of the Data Model

Once our session is up, we can start developing and configuring our data model. It forms the basis on which we build our cube, and lets us further enrich that cube into an extensible semantic layer.

### Trade PnL Table

Our trade table is the main table that contains all the trades, some of their attributes, and, most importantly, their Profit-and-Loss (PnL) vectors.

We will define the structure of the table, then feed it from our S3 bucket. This is a rather large dataset, which allows us to perform our comparative analysis at a meaningful scale.

In [None]:
trades_atoti = session.create_table(
    'TRADE_PNLS',
    types={
        'BOOKID': tt.type.DOUBLE,
        'ASOFDATE': tt.type.LOCAL_DATE,
        'TRADEID': tt.type.STRING,
        'DATASET': tt.type.STRING,
        'RISKFACTOR': tt.type.STRING,
        'RISKCLASS': tt.type.STRING,
        'SENSITIVITYNAME': tt.type.STRING,
        'CCY': tt.type.STRING,
        'TID': tt.type.DOUBLE,
        'PNL_VECTOR': tt.type.DOUBLE_ARRAY,
    },
    default_values={"BOOKID": 0.0, "TID": 0.0}    
)

In [None]:
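# S3 bucket holding the PnL Parquet files
bucket = "bd-connect-london-hybrid-demo-202306"
s3 = boto3.resource("s3")
pnl_bucket = s3.Bucket(bucket)

In [None]:
# Load every Parquet file under data/pnl/ into the trade table,
# skipping the "folder" placeholder key (data/pnl/) that S3 lists alongside the files
for object_summary in pnl_bucket.objects.filter(Prefix="data/pnl/"):
    if object_summary.key.endswith("/"):
        continue
    trades_atoti.load_parquet(f"s3://{bucket}/{object_summary.key}")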
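
The key-to-URI step of the loading loop can be isolated into a small function that is easy to test without S3 access. This is only a sketch: `parquet_uris` is a hypothetical helper and the keys below are made up; in the notebook, the keys would come from `objects.filter(Prefix="data/pnl/")`.

```python
def parquet_uris(bucket: str, keys: list[str]) -> list[str]:
    """Turn S3 object keys into s3:// URIs, skipping folder placeholder keys."""
    # Keys ending in "/" are zero-byte "directory" markers, not data files
    return [f"s3://{bucket}/{key}" for key in keys if not key.endswith("/")]


# Hypothetical listing: the prefix placeholder plus two data files
keys = ["data/pnl/", "data/pnl/pnl_01.parquet", "data/pnl/pnl_02.parquet"]
print(parquet_uris("my-bucket", keys))
# ['s3://my-bucket/data/pnl/pnl_01.parquet', 's3://my-bucket/data/pnl/pnl_02.parquet']
```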

### Book Table

The book table will enrich the data model with information about the books that contain our trades.

In [None]:
books_atoti = session.read_csv("./data/books.csv", table_name="BOOKS", keys=['BOOKID'])

## Join tables

In [None]:
trades_atoti.join(books_atoti, trades_atoti["BOOKID"] == books_atoti["BOOKID"])

# Semantic Layer

## Cube creation

Our underlying data model is defined by joining the different tables. All that's left is to build a cube on top of it.

In [None]:
cube = session.create_cube(trades_atoti, mode='no_measures')

m, l, h = cube.measures, cube.levels, cube.hierarchies

In [None]:
session.tables.schema

## Hierarchy configuration

Here we add the BOOKID hierarchy to be used in the queries, and create a multi-level hierarchy that will allow us to define and compute an incremental VaR measure down the line.

In [None]:
h["BOOKID"] = [trades_atoti['BOOKID']]                            

In [None]:
h["Trading Book Hierarchy"] = {
    "Desk": l["TRADING_DESK"],
    "Book": l["BOOK"],
}

## Measure Configuration

This is where most of the relevant metrics are defined and added to the base cube. We start by wrapping our PnL vectors in a measure, which makes them queryable, reusable by subsequent measures, and aggregated at any requested level.

In [None]:
m["PNL_VECTOR"] = tt.agg.sum(trades_atoti['PNL_VECTOR'])
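
Summing a vector measure adds the vectors element-wise, scenario by scenario, so risk statistics are computed on the aggregated vector rather than added up across children (VaR is not additive). A minimal pure-Python sketch of that aggregation, with hypothetical trades, books and four scenarios; `sum_vectors` is an illustrative helper, not an Atoti function:

```python
def sum_vectors(vectors):
    """Element-wise sum of equal-length PnL vectors (one value per scenario)."""
    vectors = list(vectors)
    if not vectors:
        raise ValueError("need at least one vector")
    length = len(vectors[0])
    if any(len(v) != length for v in vectors):
        raise ValueError("all PnL vectors must have the same length")
    return [sum(values) for values in zip(*vectors)]


# Hypothetical 4-scenario PnL vectors for three trades in two books
trades = {
    ("BOOK_A", "T1"): [1.0, -2.0, 0.5, 3.0],
    ("BOOK_A", "T2"): [-1.0, 1.0, -0.5, 2.0],
    ("BOOK_B", "T3"): [2.0, -4.0, 1.0, -1.0],
}

# Book level: only the trades of BOOK_A; top of house: every trade
book_a = sum_vectors(v for (book, _), v in trades.items() if book == "BOOK_A")
top_of_house = sum_vectors(trades.values())
print(book_a)        # [0.0, -1.0, 0.0, 5.0]
print(top_of_house)  # [2.0, -5.0, 1.0, 4.0]
```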

### Computing VaR and ES, defined once, yet applicable to any level of aggregation

Since the "PNL_VECTOR" measure defined above already aggregates the vectors at whatever level is queried, all that is left to obtain our VaR metrics is to apply the relevant statistical functions on top of it.
Each metric is defined only once, yet it can be queried and computed at **any granularity or level of aggregation**. This saves a lot of typing and redundancy compared to pandas, where each level of aggregation requires its own block of code.
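
For reference, the measures below follow the usual historical-simulation definitions. Writing $\mathbf{p} = (p_1, \dots, p_N)$ for the aggregated PnL vector of a cell, $p_{(1)} \le \dots \le p_{(N)}$ for its sorted values and $\alpha$ for the confidence level, and reporting VaR as a PnL quantile (so losses are negative numbers):

```latex
\mathrm{VaR}_{\alpha}(\mathbf{p}) = Q_{1-\alpha}(\mathbf{p}),
\qquad
\mathrm{ES}_{\alpha}(\mathbf{p}) = \frac{1}{n} \sum_{i=1}^{n} p_{(i)}
```

Here $Q_q$ is the empirical $q$-quantile and $n$ is the number of tail scenarios. The notebook hard-codes $n = 12$ for ES95, which matches $\lfloor 0.05\,N \rfloor$ if each vector holds about 250 scenarios; that vector length is an assumption, as it is not stated in this notebook.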

In [None]:
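# VaR: the 5% and 1% quantiles of the aggregated PnL vector
m["VaR95"] = tt.array.quantile(m["PNL_VECTOR"], 1 - 0.95)
m["VaR99"] = tt.array.quantile(m["PNL_VECTOR"], 1 - 0.99)
# ES95: average of the 12 worst PnL scenarios (tail size is hard-coded)
m["ES95"] = tt.array.mean(tt.array.n_lowest(m["PNL_VECTOR"], n=12))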
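
To make the two statistics concrete, here is a minimal pure-Python sketch of a linearly interpolated quantile and of the mean of the n lowest values, applied to a hypothetical PnL vector. The helper names are illustrative, and the interpolation convention (position $q(N-1)$ in the sorted vector, the `numpy.quantile` default) is an assumption; check the `tt.array.quantile` documentation for the convention your Atoti version applies.

```python
import math


def quantile_inc(values, q):
    """Empirical q-quantile with linear interpolation between sorted values.

    The position in the sorted data is q * (N - 1), the numpy.quantile default.
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be between 0 and 1")
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lower, upper = math.floor(pos), math.ceil(pos)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (pos - lower)


def expected_shortfall(values, n):
    """Mean of the n lowest PnL values, i.e. the average of the worst scenarios."""
    tail = sorted(values)[:n]
    return sum(tail) / len(tail)


# Hypothetical PnL vector with 21 scenarios: -10.0, -9.0, ..., 10.0
pnl = [float(x) for x in range(-10, 11)]
print(quantile_inc(pnl, 0.05))     # -9.0, the 5% quantile (VaR95 as a PnL figure)
print(expected_shortfall(pnl, 2))  # -9.5, the mean of the 2 worst scenarios
```

On short vectors, a different quantile convention (nearest rank, for instance) can shift the VaR figure noticeably, so it is worth aligning conventions before comparing two implementations.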

### Incremental VaR (Parent VaR - Parent VaR excluding self)

Going even further, the semantics introduced by Atoti mean that the cube understands the hierarchical order of, and relationships between, the levels of a multi-level hierarchy. Atoti provides building blocks that leverage those relationships to derive new measures, namely **the `tt.SiblingsScope` scope and the `tt.parent_value()` function**.

In [None]:
# Aggregated vector at parent level, excluding self
m["Parent PnL Vector Ex"] = tt.agg.sum(
    m["PNL_VECTOR"],
    scope=tt.SiblingsScope(hierarchy=h["Trading Book Hierarchy"], exclude_self=True),
)

# VaR at the parent level
m["Parent VaR95"] = tt.parent_value(m["VaR95"], degrees={h["Trading Book Hierarchy"]:1})

# VaR at the parent level excluding self
m["Parent VaR95 Ex"] = tt.array.quantile(m["Parent PnL Vector Ex"], 1 - 0.95)

# Incremental VaR
m["Incremental VaR95"] = m["Parent VaR95"] - m["Parent VaR95 Ex"]
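
The same logic can be reproduced outside the cube to check what these measures compute. This sketch uses a hypothetical desk of three books with five scenarios each, and q = 0.25 only so that the toy quantile lands exactly on a sorted value (the notebook uses 1 - 0.95). For each book it sums the siblings, as `tt.SiblingsScope(..., exclude_self=True)` does, and subtracts the resulting VaR from the desk VaR. All function names are illustrative.

```python
import math


def quantile_inc(values, q):
    """Linearly interpolated empirical quantile (position q * (N - 1))."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lower, upper = math.floor(pos), math.ceil(pos)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (pos - lower)


def sum_vectors(vectors):
    """Element-wise sum of equal-length PnL vectors."""
    return [sum(column) for column in zip(*vectors)]


def incremental_var(children, q):
    """Incremental VaR of each child: VaR(parent) - VaR(parent excluding the child)."""
    if len(children) < 2:
        raise ValueError("need at least two children so that siblings exist")
    parent_var = quantile_inc(sum_vectors(children.values()), q)
    result = {}
    for name in children:
        # Sum of the siblings only, mirroring SiblingsScope(exclude_self=True)
        siblings = [v for other, v in children.items() if other != name]
        result[name] = parent_var - quantile_inc(sum_vectors(siblings), q)
    return result


# Hypothetical desk with three books and five PnL scenarios each
desk = {
    "B1": [-5.0, -3.0, 2.0, 0.0, 1.0],
    "B2": [1.0, 1.0, 0.0, -1.0, 0.0],
    "B3": [0.0, 1.0, -1.0, 1.0, 2.0],
}
print(incremental_var(desk, q=0.25))  # {'B1': -1.0, 'B2': 1.0, 'B3': 1.0}
```

With VaR expressed as a PnL quantile, a negative incremental VaR (B1) means the book makes the desk's VaR worse, while a positive one (B2, B3) means the book offsets risk elsewhere in the desk.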

# Performing the analytics

## Using Atoti UI widgets

We will use the Atoti UI JupyterLab extension to embed widgets and visualize our data within the notebook.
A single widget lets us visualize and analyze all of our VaR metrics at all three aggregation levels: the top-of-house figures sit on the totals row, and a couple of clicks drill down to the book and trade levels.

In [None]:
session.widget

## Performing queries against the cube

Although this bypasses Atoti UI, the native and integral part of Atoti that provides readily accessible, streamlined, optimized analytics on top of the cube, Atoti also supports code-based queries against the cube that return a pandas DataFrame.

We will now run these queries in order to have **a fair basis for comparison** against the outputs of the pandas implementation.

Keep in mind, however, that although these Atoti queries are optimized and outperform the pandas approach every time, Atoti UI provides an even better experience.

The **`%%time` and `%%memit` magic commands** track, respectively, the **response time** and **memory usage** of each query.

### Computing Top of House VaR and ES

In [None]:
%%time
%%memit

cube.query(m['VaR95'], m['VaR99'], m['ES95'], mode='raw')

### Computing VaR and ES at BookId and TradeId Level

In [None]:
%%time
%%memit

cube.query(m['VaR95'], m['VaR99'], m['ES95'], levels=[l['ASOFDATE'], l['BOOKID'], l['TRADEID']], mode='raw')

### Computing VaR and ES at the most granular level (combination of all available qualitative hierarchies) 

In [None]:
%%time
%%memit

cube.query(
    m['VaR95'], m['VaR99'], m['ES95'],
    levels=[
        l['ASOFDATE'], l['BUSINESS_UNIT'], l['SUB_BUSINESS_UNIT'],
        l['TRADING_DESK'], l['BOOKID'], l['RISKCLASS'], l['TRADEID'],
    ],
    mode='raw',
)

# What If Scenarios

Going beyond the VaR metrics, Atoti facilitates other components of advanced analytics, namely performing simulations by leveraging its built-in branching capabilities.

Here, we create a new branch, or scenario, into which we load a stressed version of our dataset. Then, by simply adding the "Source Simulation" or "Scenario" hierarchy to the columns of a widget, we obtain a side-by-side comparison of how each measure evaluates in each scenario.

This is highly optimized, since each additional branch only stores and tracks the data points that differ from the base branch.
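
The idea behind this storage model can be illustrated with Python's `collections.ChainMap`, which layers a small dictionary of overrides on top of a base mapping. This is only a conceptual analogy with hypothetical trades and vectors, not a description of how Atoti stores branches internally.

```python
from collections import ChainMap

# Hypothetical base branch: trade id -> PnL vector
base = {
    "T1": [1.0, -2.0, 0.5],
    "T2": [-1.0, 1.0, -0.5],
    "T3": [2.0, -4.0, 1.0],
}

# The stressed branch stores only the trades whose vectors differ from the base
stress_overrides = {"T2": [-3.0, 0.0, -2.5]}
stress = ChainMap(stress_overrides, base)

print(stress["T2"])  # [-3.0, 0.0, -2.5] (read from the branch's own data)
print(stress["T1"])  # [1.0, -2.0, 0.5] (falls through to the base)
print(f"{len(stress_overrides)} of {len(stress)} trades stored in the branch")
# 1 of 3 trades stored in the branch

# Writes through the ChainMap land in the overrides only; the base is untouched
stress["T3"] = [0.0, -6.0, 1.0]
```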

In [None]:
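# Create a "Stress-Test" branch of the trade table
addon_scenario = trades_atoti.scenarios["Stress-Test"]

# Load the stressed PnL vectors into that branch only; the base data is unchanged
addon_scenario.load_parquet('s3://bd-connect-london-hybrid-demo-202306/data/simulation/pnl_16.parquet')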

In [None]:
session.widget

<div style="text-align: center;" ><a href="https://www.atoti.io/?utm_source=gallery&utm_content=atoti-pandas-comparison" target="_blank" rel="noopener noreferrer"><img src="https://data.atoti.io/notebooks/banners/Your-turn-to-try-Atoti.jpg" alt="Try Atoti"></a></div>