New function: index of the k-th smallest #153

Closed
APolyakova opened this issue Sep 18, 2020 · 10 comments

@APolyakova

APolyakova commented Sep 18, 2020

Description

I need a function to compute the index of the k-th smallest value in an array.

Context

I'm trying to solve the following task: I want to find the VaR scenario for a global portfolio and then use that scenario's index to pick a single value from the PL vector. That PL value (equal to the quantile computed with mode "simple" and interpolation "lower") can then be broken down by sub-portfolio in an additive manner. This is known as a "component VaR" or "LEstimated VaR" calculation. It seems that I need more functions in atoti to operate efficiently on the indices of a sorted array.

I guess what I need is similar to https://stackoverflow.com/questions/34226400/find-the-index-of-the-k-smallest-values-of-a-numpy-array
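For reference, the requested operation can be sketched in plain Python. This only illustrates the semantics and is not atoti code: the helper name `kth_smallest_index` is hypothetical, `k` is assumed to be 0-based, and ties are assumed to resolve to the earliest position.

```python
import heapq


def kth_smallest_index(values, k):
    """Return the position of the k-th smallest element (k is 0-based).

    Hypothetical helper for illustration only; it is not part of atoti.
    """
    if not 0 <= k < len(values):
        raise IndexError("k is out of range")
    # nsmallest tracks only k + 1 candidates instead of sorting the whole list.
    lowest = heapq.nsmallest(k + 1, range(len(values)), key=values.__getitem__)
    return lowest[-1]


pl_vector = [100.0, 50.0, 200.0, 150.0, 0.0]
var_index = kth_smallest_index(pl_vector, 1)
print(var_index, pl_vector[var_index])
```

The returned position can then be used to read the same scenario from any other vector of the same length, which is what the additive breakdown needs.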

Other information (if relevant)

I want to add the LEstimated VaR to the VaR notebook in the atoti gallery. So far, the only solution I have found is as follows:

# VaR - this should match the k-th smallest scenario, but it doesn't
m["VaR"] = atoti.array.quantile(
    m["Position Vector"],
    (1 - m["Confidence Level"]),
    interpolation="lower",
    mode="simple",
)


# Since VaR doesn't match the k-th smallest value, I'm picking it manually
vectorSize = atoti.array.len(m["Position Vector"])
m["VaR Rank Current Portfolio"] = atoti.floor((1 - m["Confidence Level"]) * vectorSize)
# This measure should match VaR, but for some reason it does not
m["VaR Scenario PL"] = atoti.array.sort(m["Position Vector"])[m["VaR Rank Current Portfolio"]]


# Scenarios
cube.create_parameter_hierarchy(
    "Scenario Id", [i for i in range(272)], index_measure="Scenario Index",
)

m["PL at Index"] = m["Position Vector"][m["Scenario Index"]]

# Picking Scenario Id
var_scenario = atoti.where(m["PL at Index"] == m["VaR Scenario PL"], m["Scenario Index"], None)
h["Scenario Id"].slicing = False
# Using "min" to select a non-empty id across scenarios...
m["VaR Scenario Id"] = atoti.agg.min(var_scenario, scope=atoti.scope.origin(lvl["Scenario Id"]))
@robbiemouat

numpy contains arg* versions of array functions that return indices instead of values from an array (and there are similar methods in our IVector.java).

Similar functions in atoti.array would make a nice complement to the "element at index" operation, especially for capital allocation/additive decomposition.

@fabiencelier
Contributor

Hello,

If I understand correctly, you want to get the index corresponding to a quantile of a vector and then take the value at this index in another vector?
I'm not sure how adding the index of the k-th smallest would help with that: what "k" would you take?

> numpy contains arg* versions of array functions that return indices instead of values from an array (and there are similar methods in our IVector.java).

In the ActivePivot IVector API there are bottomKIndices, topKIndices and quantileIndex

I am trying to think of new functions that we can add to solve this issue that are general enough to be reused.

  • index_of

You could compute the quantile then take the index of this quantile to use it on other arrays.

var_index = tt.array.index_of(m["quantile"], m["vector"])
  • sort_indices

That would return an array of sorted indices

m["vector"] = [ 100.0, 50.0, 200.0, 150.0, 0.0]
m["sorted indices"] = tt.array.sort_indices(m["vector"])
cube.query(m["sorted indices"])
> [ 4, 1, 0, 3, 2]  (0.0 at position 4 is the smallest, then 50.0 at position 1 ...)
m["k smallest index"] = m["sorted indices"][k]
  • quantile_index

That would return the index of the quantile instead of the value.
It is very specific to quantile and not really reusable.
It exists in the Java API, but as atoti has its own quantile implementation we would need to reimplement quantile_index anyway.

var_index = tt.array.quantile_index(m["vector"], q=...)
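The three proposals above can be compared with a plain-Python sketch. None of this is atoti code: the function bodies only illustrate the intended semantics, and the rank used in `quantile_index` (floor of `q * len(vector)`, clamped to the last position) is an assumption that mirrors the rank formula used earlier in this thread, not necessarily how atoti computes quantiles.

```python
import math


def index_of(value, vector):
    """Position of the first element equal to value."""
    return vector.index(value)


def sort_indices(vector):
    """Positions of the elements, ordered from smallest to largest value."""
    return sorted(range(len(vector)), key=vector.__getitem__)


def quantile_index(vector, q):
    """Position of the element at rank floor(q * n) in sorted order.

    The rank formula is an assumption made for this sketch.
    """
    rank = min(math.floor(q * len(vector)), len(vector) - 1)
    return sort_indices(vector)[rank]


vector = [100.0, 50.0, 200.0, 150.0, 0.0]
print(sort_indices(vector))          # same order as in the example above
print(index_of(150.0, vector))
print(quantile_index(vector, 0.5))
```

In this sketch `quantile_index` is just `sort_indices` followed by an element lookup, which is one way to see why a general index-returning function covers the quantile case too.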

@fabiencelier
Contributor

Also here is an example/workaround of how I think you can do it by exploding your vector and using the rank function:

import atoti as tt
import pandas as pd
df = pd.DataFrame({
    "id": [0,1],
    "value": [[10.0,20.0,30.0,40.0,50.0], [45.0,0.0,20.0,50.0,60.0]],
    "other": [[50,100,150,200,250], [50,100,150,200,250]]
})
session = tt.create_session()

store = session.read_pandas(df, store_name="vectors")
cube = session.create_cube(store)
m,l,h = cube.measures, cube.levels, cube.hierarchies

# Explode the vector with a date hierarchy
cube.create_parameter_hierarchy("date",["day1", "day2", "day3", "day4", "day5"], index_measure="date_index")
h["date"].slicing = False

m["value at date"] = m["value.SUM"][m["date_index"]]
m["other at date"] = m["other.SUM"][m["date_index"]]

# Let's rank the date according to the value at the date
m["day rank"] = tt.rank(m["value at date"], hierarchy=h["date"])
cube.query( m["value at date"], m["day rank"], m["other at date"], levels=l["date"])
date   value at date   day rank   other at date
day1           55.00          3             100
day2           20.00          1             200
day3           50.00          2             300
day4           90.00          4             400
day5          110.00          5             500
# Use this rank to get the value at the rank we want
m["2nd worst value"] = tt.agg.single_value(
    tt.where(m["day rank"] == 2, m["value at date"])
)
m["other at 2nd worst value"] = tt.agg.single_value(
    tt.where(m["day rank"] == 2,m["other at date"])
)
cube.query(m["2nd worst value"], m["other at 2nd worst value"] )
2nd worst value   other at 2nd worst value
          50.00                        300
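The logic of this workaround can be checked outside atoti with a small plain-Python sketch. It starts from the aggregated values shown in the first query result; the variable names are illustrative and it does not use the atoti API.

```python
# Aggregated "value" and "other" vectors, as shown in the first query result.
values = [55.0, 20.0, 50.0, 90.0, 110.0]
others = [100, 200, 300, 400, 500]

# Rank each position by its value (1 = smallest), like tt.rank over the dates.
order = sorted(range(len(values)), key=values.__getitem__)
ranks = [0] * len(values)
for rank, position in enumerate(order, start=1):
    ranks[position] = rank

# Select the position whose rank is 2 and read both vectors at that position.
position = ranks.index(2)
second_worst_value = values[position]
other_at_second_worst = others[position]
print(ranks, second_worst_value, other_at_second_worst)
```

The key step is the same as in the measures above: the rank is computed on one vector, and the resulting position is reused to read the other one.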

@robbiemouat

Hi @fabiencelier, the bottomKIndices (and topKIndices) from IVector would also be useful. Beyond VaR, we have the concept of a "tail measure" where only the worst scenarios are considered.

For example, the 99% ES (Expected Shortfall) is the average over the worst 1% of values, and it is more amenable to LEstimator capital allocation than VaR.

Another example, for component VaR: the Harrell-Davis Quantile Estimator (i.e. the Beta function) has its support (non-zero values) clustered around the quantile. The quantile is calculated as the dot product of the Beta function weights and the (sorted) array. The Beta function approximates a Gaussian (for large arrays), and this gives a more stable calculation of the quantile (especially for component VaR and Monte Carlo simulations).

For these cases, sort_indices will work. But we might only be looking at the bottom 1000 values of an array of size 1m (e.g. for FRTB IMA DRC), so bottomKIndices will save a lot of sorting effort.
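As a rough illustration of the tail-measure case, here is a plain-Python sketch of an Expected Shortfall that is decomposed additively across two sub-portfolios. The data is made up, `bottom_k_indices` is a hypothetical stand-in for IVector's bottomKIndices, and a 20% tail is used only to keep the example small.

```python
import heapq


def bottom_k_indices(vector, k):
    """Positions of the k smallest values, without sorting the full vector."""
    return heapq.nsmallest(k, range(len(vector)), key=vector.__getitem__)


def tail_average(vector, indices):
    """Average of the vector over the given scenario positions."""
    return sum(vector[i] for i in indices) / len(indices)


# Made-up PL vectors for two sub-portfolios over 10 scenarios.
sub_a = [-5.0, 2.0, -1.0, 4.0, -3.0, 1.0, 0.5, -2.0, 3.0, 6.0]
sub_b = [-4.0, 1.0, 2.0, -6.0, -1.0, 0.0, 2.5, -3.0, 1.0, 2.0]
total = [a + b for a, b in zip(sub_a, sub_b)]

# The tail scenarios are chosen on the total portfolio only.
tail = bottom_k_indices(total, 2)

es_total = tail_average(total, tail)
es_a = tail_average(sub_a, tail)  # contribution of sub-portfolio A
es_b = tail_average(sub_b, tail)  # contribution of sub-portfolio B
print(tail, es_total, es_a, es_b)
```

Because every component is averaged over the same scenario positions, the contributions add up to the total, which is the property that makes index-returning functions useful for capital allocation.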

@fabiencelier
Contributor

fabiencelier commented Sep 23, 2020

Ok, I am now convinced by bottomKIndices/topKIndices.

It can be as general as sort_indices if you take the length of the array as "k" and it can be more performant if necessary.

I will add them and call them n_lowest_indices and n_greatest_indices, to be consistent with n_greatest and n_lowest.

Getting the nth lowest index will be possible like this:

m["nth lowest index"] = atoti.array.n_lowest_indices(m["array"], n)[n - 1]
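The indexing in that expression can be sanity-checked with a plain-Python stand-in. This sketch is not the atoti implementation; it assumes the returned indices are ordered from the lowest value upwards, which is what the `[n - 1]` lookup relies on.

```python
import heapq


def n_lowest_indices(array, n):
    """Stand-in for the proposed function: positions of the n lowest values,
    ordered from the lowest value upwards (an assumption of this sketch)."""
    return heapq.nsmallest(n, range(len(array)), key=array.__getitem__)


array = [100.0, 50.0, 200.0, 150.0, 0.0]
n = 3

# The last of the n lowest indices points at the n-th lowest value.
nth_lowest_index = n_lowest_indices(array, n)[n - 1]

# With n equal to the array length, the result is a full ordering of indices.
all_indices = n_lowest_indices(array, len(array))
print(nth_lowest_index, array[nth_lowest_index], all_indices)
```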

@APolyakova
Author

Thanks!

@tibdex tibdex added this to the Next release milestone Sep 24, 2020
@fabiencelier
Contributor

fabiencelier commented Sep 24, 2020

n_lowest_indices and n_greatest_indices will be available in the next release.

You can try them in the latest continuous build.

@APolyakova
Author

Hi @fabiencelier

I'm computing the rank of the VaR as follows:

m["VaR Rank Current Portfolio"] = atoti.math.floor((1 - m["Confidence Level"]) * vectorSize)

For my confidence level and vector length, it gives me 13.00.

Then I want to pick the VaR scenario index as follows:

m['Tail Indices'] = atoti.array.n_lowest_indices(m["Position Vector"], m["VaR Rank Current Portfolio"])
m['VaR Index'] = m['Tail Indices'][m["VaR Rank Current Portfolio"]-1]

The n_lowest_indices function throws this exception: UnsupportedOperationException: Cannot read 'int' value from an instance of ArrayChunkLongNullable

Do I need to convert the data types somehow, and if so, is there a way to do it?

Thank you!

@fabiencelier
Contributor

I managed to reproduce it. As it is more general than just this function, I have opened a new issue: #159

There is no way to cast in atoti, but we should automatically handle int and long for you.

@fabiencelier fabiencelier self-assigned this Sep 28, 2020
@fabiencelier
Contributor

Side note: when #149 is added you will be able to get the last element of the array with -1 as index:

m['VaR Index'] = m['Tail Indices'][-1]
