# Tutorial - Evaluate DNB's Additional Rules (ARS)

This notebook shows how to evaluate DNB's additional validation rules for the annual Solvency II reports of solo entities.

Make sure the file 'ars_patterns_additional_rules.xlsx' exists in the solvency2-rules folder.
If it does not, run the 'Tutorial Convert DNBs Additional Validation Rules to Patterns' notebook first.

## Import packages

In [None]:
import pandas as pd  # dataframes
import numpy as np  # mathematical functions, arrays and matrices
from os.path import join, isfile  # some os dependent functionality
import data_patterns  # evaluation of patterns
from pprint import pprint  # pretty print
import logging

## General parameters

In [None]:
# DATAPOINTS_PATH: path to the excel-file containing all possible datapoints (simplified taxonomy)
# RULES_PATH: path to the excel-file with the additional rules
# INSTANCES_DATA_PATH: path to the source data
# RESULTS_PATH: path to the results
DATAPOINTS_PATH = join('..', 'data', 'datapoints')
RULES_PATH = join('..', 'solvency2-rules')
INSTANCES_DATA_PATH = join('..', 'data', 'instances', 'all')
RESULTS_PATH = join('..', 'results') 

In [None]:
# We log to rules.log in the data/instances path
logging.basicConfig(filename = join(INSTANCES_DATA_PATH, 'rules.log'),level = logging.INFO, 
                    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s')

## Read file with all possible datapoints

We use a simplified taxonomy with all possible datapoints, located in the data/datapoints directory.

In [None]:
df_datapoints = pd.read_csv(join(DATAPOINTS_PATH, 'ARS.csv'), sep=";").fillna("")  # load file to dataframe
df_datapoints.head()

## Read input data

We distinguish 2 types of tables: with a closed axis, and with an open axis.

An example of a table with an open axis is the list of assets: an entity reports several 'rows of data' in the relevant table. An example of a closed axis is the balance sheet: an entity reports only 1 balance sheet per period.
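
The distinction can be illustrated with toy data. The sketch below assumes that every table is indexed by (entity, period) and that an open-axis table carries at least one extra index level; the table codes, column names, values and the helper `has_open_axis` are hypothetical.

```python
import pandas as pd

# Hypothetical closed-axis table: exactly one row per (entity, period)
closed = pd.DataFrame(
    {"S.02.01.02.01,R0030,C0010": [100.0]},
    index=pd.MultiIndex.from_tuples(
        [("entity_a", "2019-12-31")], names=["entity", "period"]
    ),
)

# Hypothetical open-axis table: several rows per (entity, period),
# told apart by an extra index level
open_ = pd.DataFrame(
    {"S.06.02.01.01,C0040": ["fund_1", "fund_2"]},
    index=pd.MultiIndex.from_tuples(
        [("entity_a", "2019-12-31", "asset_1"),
         ("entity_a", "2019-12-31", "asset_2")],
        names=["entity", "period", "asset_id"],
    ),
)


def has_open_axis(df):
    """Return True if the table has index levels beyond (entity, period)."""
    return df.index.nlevels > 2


print(has_open_axis(closed), has_open_axis(open_))
```

The number of index levels is the same criterion that is used further on to sort the input tables into the two groups.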

### Read tables from source path

We combine all tables with closed axes into one DataFrame. This DataFrame is then used for all validation rules for closed axes tables. 

Tables with an open axis are put in a dictionary of DataFrames. We perform validation rules per table for tables with an open axis.

In [None]:
tables_closed_axis = []  # for listing all input tables with closed axis
tables_open_axis = []  # for listing all input tables with open axis
df_closed_axis = pd.DataFrame()  # one dataframe with all data from closed axis tables
dict_open_axis = {}  # dictionary with all open axis tables

In [None]:
tables_complete_set = df_datapoints.tabelcode.sort_values().unique().tolist()  # list of all ARS tables
tables = [table for table in tables_complete_set 
          if isfile(join(INSTANCES_DATA_PATH, table + '.pickle'))]  # ARS tables found in the specified INSTANCES_DATA_PATH
tables_not_reported = [table for table in tables_complete_set if table not in tables]  # ARS tables not found
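
As a rough illustration of the split into reported and not reported tables, the sketch below runs the same file check against a temporary folder. The helper `split_reported`, the table codes and the empty placeholder file are hypothetical.

```python
import tempfile
from os.path import join, isfile


def split_reported(tables_complete_set, data_path):
    """Split table codes into those with a pickle file in data_path and those without."""
    reported = [t for t in tables_complete_set
                if isfile(join(data_path, t + ".pickle"))]
    not_reported = [t for t in tables_complete_set if t not in reported]
    return reported, not_reported


# Toy setup: a temporary folder holding an empty file for one of two tables
with tempfile.TemporaryDirectory() as tmp:
    open(join(tmp, "S.02.01.02.01.pickle"), "w").close()
    reported, not_reported = split_reported(
        ["S.02.01.02.01", "S.06.02.01.01"], tmp
    )

print(reported)
print(not_reported)
```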

In [None]:
for table in tables:
    df = pd.read_pickle(join(INSTANCES_DATA_PATH, table + '.pickle'))  # read dataframe
    
    if df.index.nlevels > 2:  # if more than 2 indexes (entity, period), then the table has an open axis
        # Add to relevant list/dict
        tables_open_axis.append(table)
        dict_open_axis[table] = df 
        
        # Identify which columns within the open axis table make a 'table row' unique (index-columns):
        index_columns_open_axis = [col for col in list(df.index.names) if col not in ['entity','period']]
        
        # Duplicate index-columns to data columns:
        df.reset_index(level=index_columns_open_axis, inplace=True)
        for i in range(len(index_columns_open_axis)):
            df['index_col_' + str(i)] = df[index_columns_open_axis[i]].astype(str)
            df.set_index(['index_col_' + str(i)], append=True, inplace=True)
    else:  # closed axis
        tables_closed_axis.append(table)  # add to relevant list
        
        # Add table to dataframe with all data from closed axis tables
        if len(df_closed_axis) == 0:  # no data yet --> copy dataframe
            df_closed_axis = df.copy()
        else:  # join to existing dataframe
            df_closed_axis = df_closed_axis.join(df)

print('Closed-axis tables:')
pprint(tables_closed_axis)
print()
print('Open-axis tables:')
pprint(tables_open_axis)
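
The index handling for open-axis tables is easier to follow on a small example. The sketch below applies the same steps to a toy table with one extra index level; the level name `asset_id` and the values are hypothetical.

```python
import pandas as pd

df = pd.DataFrame(
    {"value": [10.0, 20.0]},
    index=pd.MultiIndex.from_tuples(
        [("entity_a", "2019", "asset_1"), ("entity_a", "2019", "asset_2")],
        names=["entity", "period", "asset_id"],
    ),
)

# Index levels that make a row unique within (entity, period)
extra_levels = [name for name in df.index.names
                if name not in ("entity", "period")]

# Move these levels to data columns ...
df = df.reset_index(level=extra_levels)

# ... and add a string copy of each one back to the index
for i, name in enumerate(extra_levels):
    df["index_col_" + str(i)] = df[name].astype(str)
    df = df.set_index("index_col_" + str(i), append=True)

print(df.index.names)
print(list(df.columns))
```

After these steps the identifying values are available twice: as a data column that patterns can refer to, and as an index level that keeps the rows unique.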

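### Add unreported datapoints as zeros to the dataframes

Datapoints that were not reported are missing from the data. We add them as columns and replace all missing values with 0, so that every datapoint in the taxonomy is present in the dataframes.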

In [None]:
# List with all possible datapoints:
all_datapoints = [x.replace(',,',',') for x in 
                  list(df_datapoints['tabelcode'] + ',' + df_datapoints['rij'] + ',' + df_datapoints['kolom'])]
# List with all possible datapoints for closed axis tables:
all_datapoints_closed = [x for x in all_datapoints if x[:13] in tables_closed_axis]
# List with all possible datapoints for open axis tables:
all_datapoints_open = [x for x in all_datapoints if x[:13] in tables_open_axis]

# Add not reported datapoints to the dataframe with data from closed axis tables
for col in [column for column in all_datapoints_closed if column not in list(df_closed_axis.columns)]:
    df_closed_axis[col] = np.nan
df_closed_axis.fillna(0, inplace = True)

# Add not reported datapoints to the dataframes with data from open axis tables
for table in [table for table in dict_open_axis.keys()]:
    all_datapoints_table = [x for x in all_datapoints_open if x[:13] == table]
    for col in [column for column in all_datapoints_table if column not in list(dict_open_axis[table].columns)]:
        dict_open_axis[table][col] = np.nan
    dict_open_axis[table].fillna(0, inplace = True)
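
A small example may help to see how the datapoint names are built and how the missing columns are filled. The taxonomy rows and reported values below are hypothetical; the sketch assumes that 'rij' (row) is empty for tables without row codes, which is why the double comma is collapsed.

```python
import numpy as np
import pandas as pd

# Hypothetical slice of the simplified taxonomy ('rij' = row, 'kolom' = column)
df_datapoints = pd.DataFrame({
    "tabelcode": ["S.02.01.02.01", "S.02.01.02.01", "S.06.02.01.01"],
    "rij": ["R0030", "R0040", ""],
    "kolom": ["C0010", "C0010", "C0040"],
})

# 'table,row,column'; an empty row code would give ',,', which becomes ','
names = [x.replace(",,", ",") for x in
         df_datapoints["tabelcode"] + "," + df_datapoints["rij"] + ","
         + df_datapoints["kolom"]]

# Toy data in which only one of the two closed-axis datapoints was reported
reported = pd.DataFrame({"S.02.01.02.01,R0030,C0010": [100.0, np.nan]})

# The first 13 characters of a datapoint name are the table code
closed_names = [x for x in names if x[:13] == "S.02.01.02.01"]
for col in closed_names:
    if col not in reported.columns:
        reported[col] = np.nan  # add the unreported datapoint
reported = reported.fillna(0)   # missing values become 0

print(names)
print(reported)
```

Note that `fillna(0)` also replaces missing values in columns that were reported, not only in the columns that were added.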

## Read DNB's Additional Validation Rules

DNB's additional validation rules are currently published as an Excel file on the DNB statistics website. We included the Excel file in the project under data/downloaded files.

The rules are already converted to a syntax Python can interpret, using the notebook: 'Convert DNBs Additional Validation Rules to Patterns'. In the next line of code we read these converted rules (patterns).

In [None]:
df_patterns = pd.read_excel(join(RULES_PATH, 'ars_patterns_additional_rules.xlsx')).fillna("").set_index('index')

First, we are interested in patterns for closed-axis tables. To select them, we need to filter out:
- patterns pointing to tables that are not reported;
- patterns for open-axis tables.

In [None]:
df_patterns_closed_axis = df_patterns.copy()
df_patterns_closed_axis = df_patterns_closed_axis[df_patterns_closed_axis['pandas ex'].apply(
    lambda expr: not any(table in expr for table in tables_not_reported) 
    and not any(table in expr for table in tables_open_axis))]
df_patterns_closed_axis.head()
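
The filter condition can be tried out on plain strings. In the sketch below the helper `is_closed_axis_pattern` and the three expressions are hypothetical; the real expressions come from the 'pandas ex' column and may look different.

```python
def is_closed_axis_pattern(expr, tables_not_reported, tables_open_axis):
    """True if expr mentions neither a not-reported table nor an open-axis table."""
    return (not any(table in expr for table in tables_not_reported)
            and not any(table in expr for table in tables_open_axis))


# Hypothetical pattern expressions
exprs = [
    'df[df["S.02.01.02.01,R0030,C0010"] >= 0]',  # closed axis, reported
    'df[df["S.06.02.01.01,C0040"] != ""]',       # open axis
    'df[df["S.28.01.01.01,R0010,C0010"] >= 0]',  # table not reported
]

kept = [e for e in exprs
        if is_closed_axis_pattern(e, ["S.28.01.01.01"], ["S.06.02.01.01"])]
print(kept)
```

Because the check is a substring test on the table code, a pattern is dropped as soon as it mentions one excluded table, even if it also refers to closed-axis tables.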

## Evaluate patterns for tables with a closed axis

We now have:
- the data for closed-axis tables in a dataframe;
- the patterns for closed-axis tables in a dataframe.

To evaluate the patterns we need to create a 'PatternMiner' (part of the data_patterns package), and run the analyze function.

In [None]:
miner = data_patterns.PatternMiner(df_patterns=df_patterns_closed_axis)
df_results_closed_axis = miner.analyze(df_closed_axis)
df_results_closed_axis.head()

## Evaluate patterns for tables with an open axis

First, find the patterns defined for open-axis tables.

In [None]:
df_patterns_open_axis = df_patterns.copy()
df_patterns_open_axis = df_patterns_open_axis[df_patterns_open_axis['pandas ex'].apply(
    lambda expr: any(table in expr for table in tables_open_axis))]

Patterns involving multiple open-axis tables are not yet supported, so we keep only the patterns that refer to exactly one table.

In [None]:
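import re  # regular expressions (standard library)
# Keep patterns that mention exactly one distinct table code (raw string, dots escaped)
df_patterns_open_axis = df_patterns_open_axis[df_patterns_open_axis['pandas ex'].apply(
    lambda expr: len(set(re.findall(r'S\.\d\d\.\d\d\.\d\d\.\d\d', expr)))) == 1]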
df_patterns_open_axis.head()
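
To see what the table-code check does, the sketch below counts the distinct table codes in two hypothetical expressions. It uses the standard-library `re` module with a raw string and escaped dots; the helper `tables_in` and both expressions are assumptions made for illustration.

```python
import re

# A table code looks like 'S.06.02.01.01': 'S' followed by four two-digit groups
TABLE_CODE = re.compile(r"S\.\d\d\.\d\d\.\d\d\.\d\d")


def tables_in(expr):
    """Return the set of distinct table codes mentioned in a pattern expression."""
    return set(TABLE_CODE.findall(expr))


# Hypothetical pattern expressions
single = 'df[df["S.06.02.01.01,C0040"] == df["S.06.02.01.01,C0050"]]'
multiple = 'df[df["S.06.02.01.01,C0040"] == df["S.06.02.01.02,C0290"]]'

print(tables_in(single))
print(len(tables_in(multiple)))
```

Only expressions for which the set holds exactly one code, such as `single`, pass the filter.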

Next, we loop through the open-axis tables and evaluate the corresponding patterns on the data.

In [None]:
output_open_axis = {}  # dictionary with input and results per table
for table in tables_open_axis:  # loop through open-axis tables
    if df_patterns_open_axis['pandas ex'].apply(lambda expr: table in expr).sum() > 0:  # check if there are patterns
        info = {}
        info['data'] = dict_open_axis[table]  # select data
        info['patterns'] = df_patterns_open_axis[df_patterns_open_axis['pandas ex'].apply(
            lambda expr: table in expr)]  # select patterns
        miner = data_patterns.PatternMiner(df_patterns=info['patterns'])
        info['results'] = miner.analyze(info['data'])  # evaluate patterns
        output_open_axis[table] = info

Print the results for the first table (if there are rules for tables with an open axis).

In [None]:
if len(output_open_axis.keys()) > 0:
    display(output_open_axis[list(output_open_axis.keys())[0]]['results'].head())

## Combine and export results for closed and open axis tables

In [None]:
# Function to transform results for open-axis tables, so it can be appended to results for closed-axis tables
# The 'extra' index columns are converted to data columns
def transform_results_open_axis(df):
    if df.index.nlevels > 2:
        reset_index_levels = list(range(2, df.index.nlevels))
        df = df.reset_index(level=reset_index_levels)
        rename_columns={}
        for x in reset_index_levels:
            rename_columns['level_' + str(x)] = 'id_column_' + str(x - 1)
        df.rename(columns=rename_columns, inplace=True)
    return df
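
The renaming step relies on how pandas names index levels that have no name. The sketch below shows this on a toy results table, under the assumption that the extra index levels of the results are unnamed; if they carry names, `reset_index` keeps those names and the rename has no effect. The data is hypothetical.

```python
import pandas as pd

# Hypothetical results with an unnamed third index level
results = pd.DataFrame(
    {"pattern_id": [1, 1]},
    index=pd.MultiIndex.from_tuples(
        [("entity_a", "2019", "asset_1"), ("entity_a", "2019", "asset_2")],
        names=["entity", "period", None],
    ),
)

extra = list(range(2, results.index.nlevels))  # positions of the extra levels

# An unnamed level at position 2 becomes the column 'level_2'
flat = results.reset_index(level=extra)

# 'level_2' -> 'id_column_1', 'level_3' -> 'id_column_2', ...
flat = flat.rename(
    columns={"level_" + str(x): "id_column_" + str(x - 1) for x in extra}
)

print(list(flat.columns))
print(flat.index.names)
```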

In [None]:
df_results = df_results_closed_axis.copy()  # results for closed axis tables
for table in list(output_open_axis.keys()):  # for all open axis tables with rules -> append and sort patterns and results
    # DataFrame.append was removed in pandas 2.0; pd.concat gives the same result
    df_results = pd.concat([transform_results_open_axis(output_open_axis[table]['results']), df_results],
                           sort=False).sort_values(by=['pattern_id']).sort_index()

Change column order so the dataframe starts with the identifying columns:

In [None]:
list_col_order = []
for i in range(1, len([col for col in list(df_results.columns) if col[:10] == 'id_column_']) + 1):
    list_col_order.append('id_column_' + str(i))
list_col_order.extend(col for col in list(df_results.columns) if col not in list_col_order)
df_results = df_results[list_col_order]
df_results.head()
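
The reordering logic does not depend on pandas and can be checked on a plain list of column names. The helper `order_columns` and the column list below are hypothetical.

```python
def order_columns(columns):
    """Put 'id_column_1', 'id_column_2', ... first; keep the other columns in their original order."""
    n_id = len([col for col in columns if col[:10] == "id_column_"])
    ordered = ["id_column_" + str(i) for i in range(1, n_id + 1)]
    ordered.extend(col for col in columns if col not in ordered)
    return ordered


columns = ["pattern_id", "id_column_2", "result_type", "id_column_1"]
print(order_columns(columns))
```

The approach assumes that the identifying columns are numbered consecutively starting at 1; a gap in the numbering would produce a name that does not exist in the dataframe.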

## Save results

The dataframe df_results contains all output of the evaluation of the validation rules.

In [None]:
# To save all results use df_results
# To save all exceptions use df_results['result_type']==False 
# To save all confirmations use df_results['result_type']==True

# Here we save only the exceptions to the validation rules
df_results[df_results['result_type']==False].to_excel(join(RESULTS_PATH, "results.xlsx"))

In [None]:
# Get the pandas code of a single pattern (here the one with index 12) and evaluate it on the closed-axis data
s = df_patterns.loc[12, 'pandas ex'].replace('df', 'df_closed_axis')
print(s)
with pd.option_context('display.max_rows', None, 'display.max_columns', None):  # print whole dataframe
    display(eval(s).T)
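
Replacing the text 'df' in the expression also changes any other occurrence of those two letters, for example inside a column name. A possible alternative, sketched below with toy data, is to leave the expression unchanged and bind the name `df` in the namespace passed to `eval`. The dataframe and the expression are hypothetical, and the sketch assumes the expression refers only to `df`; other names it needs (such as `np`) would have to be added to the dictionary.

```python
import pandas as pd

# Toy stand-in for the closed-axis data
df_closed_axis = pd.DataFrame({"a": [1, 5], "b": [2, 3]})

# Hypothetical pattern expression written against the name 'df'
s = 'df[df["a"] > df["b"]]'

# Bind 'df' in the namespace handed to eval instead of rewriting the string
result = eval(s, {"df": df_closed_axis})
print(result)
```

As with any use of `eval`, this should only be applied to expressions from a trusted source.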