## SI Calculation: an example using generated sample trades

Here we will generate sample trades from the RTS 2 Annex III taxonomy.  Each sample trade is then enriched with the information needed run an SI calculation.

Once the trade data is assembled the the data normally provided by the regulator is synthesised.

Lastly, the SI calculations are run

The SI calculation includes a number of tests.  See the official word:
https://ec.europa.eu/transparency/regdoc/rep/3/2016/EN/3-2016-2398-EN-F1-1.PDF

# Step 1 - Prepare the trade data

The first step is to use the RTS 2 Annex III taxonomy to generate some sample trades.


In [1]:
import rts2_annex3
import random
import json

random.seed()

root = rts2_annex3.class_root

asset_class = root.asset_class_by_name("Credit Derivatives")

# Ask the Asset class to generate some sample trade
sample_trades = asset_class.make_test_samples(number=500)

# Print the one of the generated trades
print(vars(random.choice(sample_trades)))


{'to_date': datetime.date(2018, 5, 24), 'ref_entity_type': 'ref_entity_type.value', 'asset_class_name': 'Credit Derivatives', 'sub_asset_class_name': 'Single name credit default swap (CDS)', 'from_date': datetime.date(2018, 4, 24), 'notional_currency': 'notional_currency.value', 'underlying_ref_entity': 'underlying_ref_entity.value'}


## LEIs

In a real firm with real trades we would need to know the LEI (Legal Entity Identifier) of the legal entity which did each trade because SI status is reported distinctly for each legal entity (LEI).

Quite often firms will do trades within a single legal entity, perhaps to move risk from one trading desk to another.  These are called intra-entity trades and must be filtered out before the SI calculation.  For this example we'll say that all the trades we generated are inter-entity trades, trades between distinct legal entities, so we count them all.

In this example we'll use just one LEI, and not even a valid one, but it will suffice for the example.

In [2]:
# Add our LEI to each trade
our_lei = 'Our_LEI_here'
for sample_trade in sample_trades:
    sample_trade.our_lei = our_lei

# Print the one of the modified sample trades
print(vars(random.choice(sample_trades)))

{'our_lei': 'Our_LEI_here', 'to_date': datetime.date(2018, 7, 23), 'asset_class_name': 'Credit Derivatives', 'sub_asset_class_name': 'Single name CDS options', 'cds_sub_class': 'cds_sub_class.value', 'from_date': datetime.date(2018, 4, 24)}


## Trade Date
The SI calculation includes checks for frequency, the number of trades done in a single week.  To work that out we need a trade date for each trade.  Here we'll just use a few dates and add these to our sample trades.

In [3]:
# We give each sample trade a trade date in a 30 day range of dates
import datetime

sample_dates = []
today = datetime.date.today()
for day_number in range(-30, 0):
    a_date =  today + datetime.timedelta(day_number)
    if a_date.weekday() < 6:
        sample_dates.append(a_date)

for sample_trade in sample_trades:
    sample_trade.trade_date = random.choice(sample_dates)
    
# Print the one of the modified sample trades
print(vars(random.choice(sample_trades)))

{'asset_class_name': 'Credit Derivatives', 'sub_asset_class_name': 'Bespoke basket credit default swap (CDS)', 'our_lei': 'Our_LEI_here', 'trade_date': datetime.date(2018, 4, 12)}


## MIC
The Market Identifier Code (MIC) is the ISO 10383 ID for a trading venue, for example a stock exchange.  The regulator is expected to provide a list of MIC values which identify venues which are recognised for the purposes of the SI calculation.  Trades which are done on vs. off recognised venues are counted differently.

In [4]:
# We define our MICs.  A MIC value is always 4 charcters in length.  The values used
# here are made-up nonsense, but good enough for an illustration

eea_mics = ['EEA1', 'EEA2', 'EEA3']
non_eea_mics = ['OFF1', 'OFF2', 'OFF3', 'OFF4']
all_mics = eea_mics + non_eea_mics

# Add a MIC to each sample trade
for sample_trade in sample_trades:
    sample_trade.mic = random.choice(all_mics)

# Print the one of the modified sample trades
print(vars(random.choice(sample_trades)))

{'trade_date': datetime.date(2018, 4, 2), 'sub_asset_class_name': 'CDS index options', 'from_date': datetime.date(2018, 4, 24), 'mic': 'OFF3', 'to_date': datetime.date(2021, 1, 18), 'asset_class_name': 'Credit Derivatives', 'our_lei': 'Our_LEI_here', 'cds_index_sub_class': 'cds_index_sub_class.value'}


## Own Account

We need to know if a trade was done on the firms own account.  Such trades are counted differently. 

In [5]:
# Own Account is simply a boolean.  Either this is a trade which the regulator views
# as being on own account, or not.  I use a random boolean with a probability.

own_account_probability = 0.25

for sample_trade in sample_trades:
    sample_trade.own_account = random.random() < own_account_probability
    
# Print the one of the modified sample trades
print(vars(random.choice(sample_trades)))

{'trade_date': datetime.date(2018, 3, 31), 'sub_asset_class_name': 'Single name CDS options', 'cds_sub_class': 'cds_sub_class.value', 'from_date': datetime.date(2018, 4, 24), 'mic': 'EEA3', 'to_date': datetime.date(2021, 1, 18), 'own_account': False, 'asset_class_name': 'Credit Derivatives', 'our_lei': 'Our_LEI_here'}


## Client Order

We need to know if a trade was done in response to a client order.  Such trades are counted differently. 

In [6]:
# Client Order is also simply a boolean.  Either this is a trade which was done
# in response to a client order, or not.  I use a random boolean.

client_order_probability = 0.5

for sample_trade in sample_trades:
    sample_trade.client_order = random.random() < client_order_probability
    
# Print the one of the modified sample trades
print(vars(random.choice(sample_trades)))

{'ref_entity_type': 'ref_entity_type.value', 'trade_date': datetime.date(2018, 4, 18), 'sub_asset_class_name': 'Single name credit default swap (CDS)', 'from_date': datetime.date(2018, 4, 24), 'client_order': True, 'mic': 'EEA1', 'to_date': datetime.date(2018, 7, 23), 'own_account': True, 'asset_class_name': 'Credit Derivatives', 'our_lei': 'Our_LEI_here', 'notional_currency': 'notional_currency.value', 'underlying_ref_entity': 'underlying_ref_entity.value'}


## EUR Notional
Another measure used by the SI calculation is the EUR notional value of each trade.  Here we assign a notional value to each trade.

In [7]:
# Add a random-ish Euro Notional amount of n million EUR to each trade
notional_amounts = [x * 1000000 for x in [1, 1, 1, 2, 2, 5, 10, 25]]

for sample_trade in sample_trades:
    sample_trade.eur_notional = random.choice(notional_amounts)

# Print the one of the modified sample trades
print(vars(random.choice(sample_trades)))

{'eur_notional': 2000000, 'own_account': False, 'asset_class_name': 'Credit Derivatives', 'sub_asset_class_name': 'Other credit derivatives', 'our_lei': 'Our_LEI_here', 'client_order': True, 'mic': 'OFF3', 'trade_date': datetime.date(2018, 4, 18)}


## RTS 2 Annex III Classification
The last step before we start the SI calculation is to add the RTS 2 Annex III classification to each trade.

In [8]:
# Now classify each trade and add the JSON classification back to the trade
for sample_trade in sample_trades:
    classification = root.classification_for(subject=sample_trade)
    json_classification = json.dumps(classification.classification_dict())
    sample_trade.rts2_classification = json_classification

print(random.choice(sample_trades).rts2_classification)

{"RTS2 version": "EU 2017/583 of 14 July 2016", "Asset class": "Credit Derivatives", "Sub-asset class": "Single name CDS options", "Segmentation criterion 1 description": "single name CDS sub-class as specified for the sub-asset class of single name CDS", "Segmentation criterion 1": "cds_sub_class.value", "Segmentation criterion 2 description": "time maturity bucket of the option defined as follows:", "Segmentation criterion 2": "Maturity bucket 1: Zero to 6 months"}


## Put the trade data in to Pandas tables

The SI calculation requires a number of selections of the trade population.  See the comments below for details of each selection.

In [9]:
# Put the essential information for each trade into a Pandas table

import pandas as pd

def si_details_from_sample(sample_trade):
    return dict(
        lei=sample_trade.our_lei,
        trade_date=sample_trade.trade_date,
        mic=sample_trade.mic,
        own_account=sample_trade.own_account,
        client_order=sample_trade.client_order,
        eur_notional=sample_trade.eur_notional,
        rts2_classification=sample_trade.rts2_classification,
        )

# The set of all trades (by LEI if there is more than one)
all_trades = pd.DataFrame.from_records([si_details_from_sample(s) for s in sample_trades])

# The subset of all trades which were not done on an EEA venue (i.e. OTC trades)
otc_trades = all_trades[~all_trades.mic.isin(eea_mics)]

# The subset of OTC trades which were done on the banks own account
own_account_otc_trades = otc_trades[otc_trades.own_account]

# The subset of own account OTC trades which were done in response to client orders
client_own_account_otc_trades = own_account_otc_trades[own_account_otc_trades.client_order]


# Step 2 - The Regulator Supplied Data

The regulator is expected to provide information about each sub class:
* Is the sub class liquid?
* How many trades of that sub class were done in the whole EU?
* What is the total EUR notional value traded in that sub class in the whole EU?

We don't have any regulator supplied data here so we synthesise some.

In [10]:
# For every RTS 2 sub class we need to decide if it is liquid or not
# We simply generate a random true/false for each sub class and use
# a dictionary to hold the result so we can look it up later.

distinct_sub_classes = all_trades.rts2_classification.unique()
liquidity_dictionary = dict()
for sub_class in distinct_sub_classes:
    is_liquid = random.random() < 0.5
    liquidity_dictionary[sub_class] = is_liquid
liquidity_dictionary.values()


dict_values([True, False, False, False, False, True, False, False, True, False, True, False])

In [15]:
# The values which will be compared with the EU trade count and sum(eur_notional)
# are the counts and totals of the own account OTC trades which were done in 
# response to client orders.  We synthesise the test EU numbers from these.

# First we get the counts and sum of notional, grouping by sub class (RTS 2 string)
notional_by_sub_class = client_own_account_otc_trades[['rts2_classification', 'eur_notional']] \
    .groupby(by='rts2_classification')
sums_series = notional_by_sub_class.agg(['count', 'sum'])
sums_df = pd.DataFrame(sums_series)

# We add a column for the EU trade count for each sub class.  For this exercise, the
# threshold for being an SI is if our LEO count for the subclass is >= 2.5% of
# the EU count.  The EU figure is randomly set to be a bit more or a bit less than
# will trigger SI status.

sums_df['eu_count'] = sums_df['eur_notional']['count']\
    .apply(lambda x: x * 40 + random.choice([x * -1, x]) )

# Now we ass a columns for the EU notional for each sub class.  The threshold for
# notional is 1% of the EU figure.  Again the EU number is randomly tweaked
sums_df['eu_eur_notional'] = sums_df['eur_notional']['sum']\
    .apply(lambda x: x * 100 + random.choice([x * -1, x]) )

sums_df.head(2)

Unnamed: 0_level_0,eur_notional,eur_notional,eu_count,eu_eur_notional
Unnamed: 0_level_1,count,sum,Unnamed: 3_level_1,Unnamed: 4_level_1
rts2_classification,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2
"{""RTS2 version"": ""EU 2017/583 of 14 July 2016"", ""Asset class"": ""Credit Derivatives"", ""Sub-asset class"": ""Bespoke basket credit default swap (CDS)""}",5,13000000,195,1287000000
"{""RTS2 version"": ""EU 2017/583 of 14 July 2016"", ""Asset class"": ""Credit Derivatives"", ""Sub-asset class"": ""CDS index options"", ""Segmentation criterion 1 description"": ""CDS index sub-class as specified for the sub-asset class of index credit default swap (CDS )"", ""Segmentation criterion 1"": ""cds_index_sub_class.value"", ""Segmentation criterion 2 description"": ""time maturity bucket of the option defined as follows:"", ""Segmentation criterion 2"": ""Maturity bucket 1: Zero to 6 months""}",4,21000000,164,2121000000


# Step 3 - Do the SI calculation

The "calculation" is really a set of filters which might catch an RTS 2 subclass for an LEI

1. If the RTS 2 Annex III sub class is liquid
   - and the count of client own-account otc trades >= 2.5% of eu_rts2_trade_count
   - and average weekly number of client own-account otc trades >= 1
2. If the RTS 2 Annex III sub class is not liquid 
   - and average weekly number of client own-account otc trade >= 1
3. If the sum of EUR notional for client own-account otc trades is
   - \>= 25% of all trades notional for the LEI
   - **or** >= 1% of EU trade notional

In [None]:
# Filter 1:  For the sub_classes which are liquid check the counts and frequency ...
