New function: index of the k-th smallest #153

APolyakova · 2020-09-18T07:03:00Z (edited by jbe456)

Description

I need a function to compute the index of the k-th smallest value in an array.

Context

I'm trying to solve the following task: I want to find the VaR scenario for a global portfolio and then use this scenario index to filter the PL vector. I'll then break that single PL value (equal to the "simple" + "lower" quantile) down by sub-portfolio in an additive manner; this type of calculation is called "component VaR" or "LEstimated VaR". It seems that I need more functions in atoti to operate efficiently on the indices of a sorted array.

I guess what I need is similar to https://stackoverflow.com/questions/34226400/find-the-index-of-the-k-smallest-values-of-a-numpy-array
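NumPy answers to this kind of question typically use `numpy.argpartition`, which finds the k-th smallest element without fully sorting. As a rough, library-free illustration of the same idea (the helper name `kth_smallest_index` is hypothetical, `k` is 0-based, and the values are assumed to contain no NaN), here is a quickselect on positions:

```python
import random


def kth_smallest_index(values, k):
    """Return the position of the k-th smallest value (k is 0-based).

    Quickselect on positions: average O(n), in the spirit of numpy.argpartition.
    All positions holding the same value stay together, so with ties the
    lowest position among them is returned.
    """
    if not 0 <= k < len(values):
        raise IndexError("k out of range")
    positions = list(range(len(values)))
    while True:
        pivot = values[random.choice(positions)]
        lower = [i for i in positions if values[i] < pivot]
        equal = [i for i in positions if values[i] == pivot]
        if k < len(lower):
            positions = lower
        elif k < len(lower) + len(equal):
            return equal[0]
        else:
            k -= len(lower) + len(equal)
            positions = [i for i in positions if values[i] > pivot]


pl = [100.0, 50.0, 200.0, 150.0, 0.0]
print(kth_smallest_index(pl, 0))  # 4 (0.0 is the smallest)
print(kth_smallest_index(pl, 1))  # 1 (50.0 is the second smallest)
```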

Other information (if relevant)

I want to add the LEstimated VaR to the VaR notebook in the atoti gallery. So far, the only solution I have found is the following:

# VaR: this should match the k-th smallest scenario, but it doesn't
m["VaR"] = atoti.array.quantile(
    m["Position Vector"],
    (1 - m["Confidence Level"]),
    interpolation="lower",
    mode="simple",
)


# Since VaR doesn't match the k-th smallest, I'm picking it manually
vectorSize = atoti.array.len(m["Position Vector"])
m["VaR Rank Current Portfolio"] = atoti.math.floor((1 - m["Confidence Level"]) * vectorSize)
# This measure should match VaR, but for some reason it does not
m["VaR Scenario PL"] = atoti.array.sort(m["Position Vector"])[m["VaR Rank Current Portfolio"]]


# Scenarios
cube.create_parameter_hierarchy(
    "Scenario Id", [i for i in range(272)], index_measure="Scenario Index",
)

m["PL at Index"] = m["Position Vector"][m["Scenario Index"]]

# Picking Scenario Id
var_scenario = atoti.where(m["PL at Index"] == m["VaR Scenario PL"], m["Scenario Index"], None)
h['Scenario Id'].slicing = False
# Using "min" to select a non-empty id across scenarios
m['VaR Scenario Id'] = atoti.agg.min(var_scenario, scope=atoti.scope.origin(lvl['Scenario Id']))
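Stripped of atoti, the workaround above sorts the vector, takes the value at the computed rank, and then searches back for the scenario(s) holding that value, keeping the smallest id. A plain-Python sketch of those steps (the function name and the data are made up for illustration) also shows why `min` matters when several scenarios share the same PL:

```python
import math


def var_scenario_id(position_vector, confidence_level):
    """Mimic the workaround: pick the k-th smallest PL, then find its scenario id."""
    n = len(position_vector)
    # Same formula as the "VaR Rank Current Portfolio" measure
    rank = math.floor((1 - confidence_level) * n)
    # Same as atoti.array.sort(...)[rank]
    var_pl = sorted(position_vector)[rank]
    # Every scenario whose PL equals the VaR PL is a candidate;
    # keep the smallest id, like atoti.agg.min over the "Scenario Id" members
    candidates = [i for i, pl in enumerate(position_vector) if pl == var_pl]
    return min(candidates)


pl = [5.0, -3.0, -1.0, 2.0, -1.0, 7.0, 0.0, 4.0]
print(var_scenario_id(pl, 0.75))  # 2 (scenarios 2 and 4 both hold -1.0)
```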

robbiemouat · 2020-09-22T00:53:39Z

numpy contains arg* versions of array functions that return indices instead of values from an array (and there are similar methods in our IVector.java).

Similar functions in atoti.array would make a nice complement to the "element at index" operation, especially for capital allocation/additive decomposition.
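To illustrate the additive decomposition: once a scenario index is chosen from the total portfolio's vector, taking the element at that index in each sub-portfolio's vector gives components that sum to the total PL in that scenario. A plain-Python sketch with made-up sub-portfolio vectors (the `component_pl` helper is hypothetical; for brevity it uses the worst scenario, i.e. k = 0):

```python
def component_pl(sub_vectors, scenario_index):
    """PL of each sub-portfolio in one scenario (the "element at index" operation)."""
    return {name: vec[scenario_index] for name, vec in sub_vectors.items()}


subs = {
    "Rates": [1.0, -4.0, 2.0, 0.5],
    "FX": [-2.0, 1.0, -3.0, 1.5],
}
total = [sum(pls) for pls in zip(*subs.values())]  # [-1.0, -3.0, -1.0, 2.0]
worst = min(range(len(total)), key=total.__getitem__)  # index 1
parts = component_pl(subs, worst)  # {"Rates": -4.0, "FX": 1.0}
print(sum(parts.values()) == total[worst])  # True: the components are additive
```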

fabiencelier · 2020-09-22T10:41:35Z

Hello,

If I understand correctly, you want to get the index corresponding to a quantile of a vector, then take the value at this index in another vector?
I'm not sure how adding the index of the k-th smallest would help with that: what "k" would you take?

> numpy contains arg* versions of array functions that return indices instead of values from an array (and there are similar methods in our IVector.java).

In the ActivePivot IVector API, there are bottomKIndices, topKIndices and quantileIndex.

I am trying to think of new functions that we can add to solve this issue that are general enough to be reused.

index_of

You could compute the quantile, then take the index of this quantile and use it on other arrays.

var_index = tt.array.index_of(m["quantile"], m["vector"])
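`tt.array.index_of` is only a proposal at this point. A hypothetical pure-Python equivalent might look like this; returning -1 for a missing value is an assumption, not a settled design. Note that this approach only works when the quantile is an actual element of the array (e.g. "lower" or "higher" interpolation), not an interpolated value:

```python
def index_of(value, values):
    """Hypothetical index_of(value, array): first position of value, or -1 if absent."""
    for i, v in enumerate(values):
        if v == value:
            return i
    return -1


vector = [100.0, 50.0, 200.0, 150.0, 0.0]
other = [5, 6, 7, 8, 9]
q_value = 50.0  # e.g. a "lower" quantile, which is always an element of the array
i = index_of(q_value, vector)
print(i, other[i])  # 1 6
```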

sort_indices

That would return the array of indices that sorts the vector (like numpy.argsort):

m["vector"] = [100.0, 50.0, 200.0, 150.0, 0.0]
m["sorted indices"] = tt.array.sort_indices(m["vector"])
cube.query(m["sorted indices"])
# returns [4, 1, 0, 3, 2] (0.0 at position 4 is the smallest, then 50.0 at position 1, ...)
m["k smallest index"] = m["sorted indices"][k]
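The proposed `sort_indices` behaves like an argsort. A plain-Python stand-in (hypothetical, not atoti code) reproduces the output shown above:

```python
def sort_indices(values):
    """Positions of values, ordered from smallest to largest value (like numpy.argsort)."""
    return sorted(range(len(values)), key=lambda i: values[i])


vector = [100.0, 50.0, 200.0, 150.0, 0.0]
indices = sort_indices(vector)
print(indices)  # [4, 1, 0, 3, 2]
k = 1
print(indices[k])  # 1, i.e. 50.0, the 2nd smallest (k is 0-based)
```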

quantile_index

That would return the index of the quantile instead of its value.
It is very specific to quantiles and not really reusable.
It exists in the Java API, but as atoti has its own quantile implementation, we would need to reimplement quantile_index anyway.

var_index = tt.array.quantile_index(m["vector"], q=...)
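What `quantile_index` returns depends on the quantile convention. As an illustration only, this hypothetical helper assumes NumPy's convention for `method="lower"` (the sorted element at position floor(q * (n - 1))); atoti's own modes such as "simple" may place the quantile differently:

```python
import math


def quantile_index(values, q):
    """Hypothetical quantile_index: position of the "lower" q-quantile.

    Assumes NumPy's "lower" convention: sorted element at floor(q * (n - 1)).
    """
    n = len(values)
    order = sorted(range(n), key=values.__getitem__)
    return order[math.floor(q * (n - 1))]


vector = [100.0, 50.0, 200.0, 150.0, 0.0]
i = quantile_index(vector, 0.25)
print(i, vector[i])  # 1 50.0
```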

fabiencelier · 2020-09-22T12:13:21Z

Also, here is an example workaround showing how I think you can do it by exploding your vector and using the rank function:

import atoti as tt
import pandas as pd
df = pd.DataFrame({
    "id": [0,1],
    "value": [[10.0,20.0,30.0,40.0,50.0], [45.0,0.0,20.0,50.0,60.0]],
    "other": [[50,100,150,200,250], [50,100,150,200,250]]
})
session = tt.create_session()

store = session.read_pandas(df, store_name="vectors")
cube = session.create_cube(store)
m, l, h = cube.measures, cube.levels, cube.hierarchies

# Explode the vector with a date hierarchy
cube.create_parameter_hierarchy("date",["day1", "day2", "day3", "day4", "day5"], index_measure="date_index")
h["date"].slicing = False

m["value at date"] = m["value.SUM"][m["date_index"]]
m["other at date"] = m["other.SUM"][m["date_index"]]

# Rank the dates according to the value at each date
m["day rank"] = tt.rank(m["value at date"], hierarchy=h["date"])
cube.query(m["value at date"], m["day rank"], m["other at date"], levels=l["date"])

date   value at date   day rank   other at date
day1           55.00          3             100
day2           20.00          1             200
day3           50.00          2             300
day4           90.00          4             400
day5          110.00          5             500

# Use this rank to get the value at the rank we want
m["2nd worst value"] = tt.agg.single_value(
    tt.where(m["day rank"] == 2, m["value at date"])
)
m["other at 2nd worst value"] = tt.agg.single_value(
    tt.where(m["day rank"] == 2, m["other at date"])
)
cube.query(m["2nd worst value"], m["other at 2nd worst value"])

2nd worst value   other at 2nd worst value
          50.00                        300
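The same explode, rank, and filter logic can be checked without a session. This plain-Python sketch sums the two rows per date (as the `.SUM` measures do), ranks the dates with rank 1 as the lowest value, and picks the date with rank 2. How `tt.rank` handles ties is not covered here; this sketch simply breaks ties by position:

```python
values = [[10.0, 20.0, 30.0, 40.0, 50.0], [45.0, 0.0, 20.0, 50.0, 60.0]]
others = [[50, 100, 150, 200, 250], [50, 100, 150, 200, 250]]
dates = ["day1", "day2", "day3", "day4", "day5"]

# Sum over ids, like "value.SUM" / "other.SUM" at each date index
value_at_date = [sum(col) for col in zip(*values)]  # [55.0, 20.0, 50.0, 90.0, 110.0]
other_at_date = [sum(col) for col in zip(*others)]  # [100, 200, 300, 400, 500]

# Rank 1 = lowest value, matching the "day rank" column
order = sorted(range(len(dates)), key=value_at_date.__getitem__)
day_rank = [0] * len(dates)
for rank, i in enumerate(order, start=1):
    day_rank[i] = rank

i = day_rank.index(2)  # the "2nd worst" date
print(dates[i], value_at_date[i], other_at_date[i])  # day3 50.0 300
```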

robbiemouat · 2020-09-22T22:02:34Z

Hi @fabiencelier, the bottomKIndices (and topKIndices) from IVector would also be useful. Beyond VaR, we have the concept of a "tail measure" where only the worst scenarios are considered.

For example, the 99% ES (Expected Shortfall) is the average over the worst 1% of values, and it is more amenable to LEstimator capital allocation than VaR.
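A small plain-Python sketch of why tail indices help with ES allocation: the worst scenarios are selected once on the total, and each sub-portfolio is averaged over those same scenarios, so the components add up to the total ES. The data, the helper names, and the tiny tail (k = 2 out of 10 scenarios, rather than 1%) are all made up for readability:

```python
import heapq


def tail_indices(pl, k):
    """Positions of the k worst (lowest) PL values."""
    return heapq.nsmallest(k, range(len(pl)), key=pl.__getitem__)


def expected_shortfall(pl, k):
    """Average PL over the k worst scenarios."""
    idx = tail_indices(pl, k)
    return sum(pl[i] for i in idx) / k


rates = [1.0, -5.0, 0.0, -1.0, 2.0, -2.0, 0.0, 1.0, -1.0, 2.0]
fx = [2.0, -3.0, 1.0, -1.0, 3.0, -4.0, 0.0, 1.0, 0.0, 2.0]
total = [r + f for r, f in zip(rates, fx)]

k = 2  # worst 20% of 10 scenarios
idx = tail_indices(total, k)  # [1, 5]
es_total = expected_shortfall(total, k)  # -7.0
# Components are averaged over the total's tail scenarios
es_rates = sum(rates[i] for i in idx) / k  # -3.5
es_fx = sum(fx[i] for i in idx) / k  # -3.5
print(es_total, es_rates + es_fx)  # -7.0 -7.0
```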

Another example: for component VaR, the Harrell-Davis quantile estimator (i.e. weights taken from a Beta distribution) has its support (non-zero weights) clustered around the quantile, and the quantile is calculated as the dot product of these weights and the (sorted) array. The Beta distribution approaches a Gaussian for large arrays, and this gives a more stable calculation of the quantile (especially for component VaR and Monte Carlo simulations).
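For reference, the Harrell-Davis weights are W_i = I_{i/n}(a, b) - I_{(i-1)/n}(a, b), with a = p(n + 1), b = (1 - p)(n + 1) and I the regularized incomplete beta function; the estimate is the dot product of W and the sorted array. The sketch below computes I by simple midpoint integration to stay dependency-free, which is an accuracy shortcut and assumes a, b >= 1 (i.e. p not too close to 0 or 1 for the given n). A production version would use a proper incomplete-beta routine:

```python
import math


def _beta_cdf(x, a, b, steps=2000):
    """Regularized incomplete beta I_x(a, b) by midpoint integration (assumes a, b >= 1)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    h = x / steps
    total = 0.0
    for j in range(steps):
        t = (j + 0.5) * h
        total += math.exp((a - 1) * math.log(t) + (b - 1) * math.log1p(-t) - log_beta)
    return total * h


def harrell_davis_weights(n, p):
    """Weight of each sorted element: Beta(a, b) mass on ((i - 1)/n, i/n]."""
    a, b = p * (n + 1), (1 - p) * (n + 1)
    if a < 1 or b < 1:
        raise ValueError("this sketch assumes p * (n + 1) >= 1 and (1 - p) * (n + 1) >= 1")
    cdf = [_beta_cdf(i / n, a, b) for i in range(n + 1)]
    return [cdf[i] - cdf[i - 1] for i in range(1, n + 1)]


def harrell_davis_quantile(values, p):
    """Dot product of the weights and the sorted array."""
    w = harrell_davis_weights(len(values), p)
    return sum(wi * xi for wi, xi in zip(w, sorted(values)))


print(round(harrell_davis_quantile([9.0, 1.0, 5.0, 3.0, 7.0, 2.0, 8.0, 4.0, 6.0], 0.5), 3))  # 5.0
```

The same weights applied to the sub-portfolio vectors (reordered by the total's sort indices) would give additive components, which is where sort_indices comes in.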

For these cases, sort_indices would work, but we might only need the bottom 1,000 values of an array of size 1M (e.g. for FRTB IMA DRC), so bottomKIndices would save a lot of sorting effort.
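To illustrate the cost argument: selecting the k smallest positions with a bounded heap costs roughly O(n log k) instead of O(n log n) for a full sort. A minimal Python sketch using the standard `heapq` module; the helper names only mirror bottomKIndices/topKIndices and are hypothetical:

```python
import heapq


def bottom_k_indices(values, k):
    """Positions of the k smallest values, smallest first (ties: lower position first)."""
    return heapq.nsmallest(k, range(len(values)), key=values.__getitem__)


def top_k_indices(values, k):
    """Positions of the k largest values, largest first."""
    return heapq.nlargest(k, range(len(values)), key=values.__getitem__)


vector = [100.0, 50.0, 200.0, 150.0, 0.0]
print(bottom_k_indices(vector, 2))  # [4, 1]
print(top_k_indices(vector, 2))  # [2, 3]
```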

fabiencelier · 2020-09-23T09:10:19Z

Ok, I am now convinced by bottomKIndices/topKIndices.

It is as general as sort_indices if you take the length of the array as "k", and it can be faster when only a few indices are needed.

I will add them and call them n_lowest_indices and n_greatest_indices, to be consistent with n_greatest and n_lowest.

The n-th lowest index will then be available like this:

m["nth lowest index"] = atoti.array.n_lowest_indices(m["array"], n)[n - 1]
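A plain-Python stand-in for that expression, assuming (as the `[n - 1]` access implies) that the returned indices are ordered from the lowest value upward. `n_lowest_indices` here is a hypothetical helper, not atoti's implementation. With `n` equal to the array length, it returns the full set of sorted indices:

```python
import heapq


def n_lowest_indices(values, n):
    """Hypothetical stand-in: positions of the n lowest values, lowest first."""
    return heapq.nsmallest(n, range(len(values)), key=values.__getitem__)


array = [100.0, 50.0, 200.0, 150.0, 0.0]
n = 3
nth_lowest_index = n_lowest_indices(array, n)[n - 1]
print(nth_lowest_index, array[nth_lowest_index])  # 0 100.0

# With n = len(array), this is the same as sort_indices
print(n_lowest_indices(array, len(array)))  # [4, 1, 0, 3, 2]
```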

APolyakova · 2020-09-24T00:49:18Z

Thanks!

fabiencelier · 2020-09-24T07:34:32Z

n_lowest_indices and n_greatest_indices will be available in the next release.

You can try them in the latest continuous build.

APolyakova · 2020-09-28T04:17:27Z

Hi @fabiencelier

I'm computing the rank of the VaR as follows:

m["VaR Rank Current Portfolio"] = atoti.math.floor((1 - m["Confidence Level"]) * vectorSize)

For my confidence level and vector length, it gives me 13.00.
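One general caveat with this rank formula: when (1 - confidence) * n should be an exact integer, binary floating point can land just below it, and floor then returns one less. A plain-Python illustration; the `tail_rank_*` helpers and the confidence levels are examples only, not the values used in the notebook:

```python
import math
from fractions import Fraction


def tail_rank_float(confidence, n):
    return math.floor((1 - confidence) * n)


def tail_rank_exact(confidence, n):
    # Fraction(str(x)) uses the decimal literal, so 0.9 becomes exactly 9/10
    return math.floor((1 - Fraction(str(confidence))) * n)


print(tail_rank_float(0.9, 10))  # 0, because (1 - 0.9) * 10 evaluates to 0.9999999999999998
print(tail_rank_exact(0.9, 10))  # 1
print(tail_rank_exact(0.95, 272))  # 13
```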

Then I want to pick the VaR scenario index as follows:

m['Tail Indices'] = atoti.array.n_lowest_indices(m["Position Vector"], m["VaR Rank Current Portfolio"])
m['VaR Index'] = m['Tail Indices'][m["VaR Rank Current Portfolio"]-1]

The n_lowest_indices function raises this exception: UnsupportedOperationException: Cannot read 'int' value from an instance of ArrayChunkLongNullable

Do I need to convert data types somehow? If so, is there a way?

Thank you!

fabiencelier · 2020-09-28T09:56:40Z

I managed to reproduce it. As it is more general than just this function, I have opened a new issue: #159

There is no way to cast in atoti, but we should handle int and long automatically for you.

fabiencelier · 2020-09-28T15:52:09Z

Side note: once #149 is added, you will be able to get the last element of the array with -1 as the index:

m['VaR Index'] = m['Tail Indices'][-1]
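Python lists follow the same convention, so a quick sanity check of the equivalence: when the tail array holds exactly "VaR Rank" indices, `[rank - 1]` and `[-1]` pick the same element. The values below are a made-up example:

```python
# Hypothetical result of n_lowest_indices(array, 3), lowest value first
tail_indices = [4, 1, 0]
var_rank = len(tail_indices)

# When the array holds exactly var_rank indices, both expressions pick the last one
print(tail_indices[var_rank - 1], tail_indices[-1])  # 0 0
```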

APolyakova added the enhancement ✨ label Sep 18, 2020

tibdex added this to the Next release milestone Sep 24, 2020

fabiencelier self-assigned this Sep 28, 2020

APolyakova mentioned this issue Sep 29, 2020 in atoti/notebooks#70 "Lestimated var" (Merged)

patachoux bot closed this as completed Dec 8, 2020
