### In this first example, we will explain the functionality of the SnapshotRollRateTable class

### SnapshotRollRateTable

In contrast with month-over-month roll rates, the bad definition is sometimes derived by comparing the worst delinquency of each client across two time windows. An observation window and a performance window are defined, along with a snapshot month. Window lengths are measured in months, so for example a 6x6 snapshot roll rate indicates a 6-month observation window and a 6-month performance window. The snapshot month specifies which accounts we want to monitor. **The snapshot month is also the last month of the observation window**. What is measured is the performance of the accounts during the performance window relative to their performance in the observation window.
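The window layout described above can be sketched with a small helper. This is only an illustration and not part of `roll_rate_analysis`: the function `split_windows` and its parameters are hypothetical names, and it assumes one file per month, ordered chronologically.

```python
def split_windows(monthly_files, snapshot_index, obs_len, perf_len):
    """Split chronologically ordered monthly files into two windows.

    The snapshot month is the LAST month of the observation window;
    the performance window starts in the month right after it.
    """
    obs_start = snapshot_index - obs_len + 1
    perf_end = snapshot_index + perf_len
    if obs_start < 0 or perf_end >= len(monthly_files):
        raise ValueError("not enough months around the snapshot month")
    obs = monthly_files[obs_start : snapshot_index + 1]
    perf = monthly_files[snapshot_index + 1 : perf_end + 1]
    return obs, perf


# Ten monthly files, snapshot at index 4, 5x5 windows
files = [f"test_sample_{i}.csv" for i in range(10)]
obs_window, perf_window = split_windows(files, snapshot_index=4, obs_len=5, perf_len=5)
snapshot = obs_window[-1]  # the snapshot month closes the observation window
```

With ten monthly files and the snapshot at index 4, this yields the same 5x5 split that is used in the example that follows.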

In [1]:
import os
import polars as pl
from roll_rate_analysis import SnapshotRollRateTable

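# Move to the sample data folder; os.path.join keeps the path portable across operating systems
os.chdir(os.path.join("..", "tests", "simulation_data"))
# The snapshot month is the last month of the observation window
snapshot_file = "test_sample_4.csv"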

obs_files = [
    "test_sample_0.csv",
    "test_sample_1.csv",
    "test_sample_2.csv",
    "test_sample_3.csv",
    "test_sample_4.csv",
]
perf_files = [
    "test_sample_5.csv",
    "test_sample_6.csv",
    "test_sample_7.csv",
    "test_sample_8.csv",
    "test_sample_9.csv",
]

The files have the same structure as the files in the previous examples.

In [2]:
srr_table = SnapshotRollRateTable(
    snapshot_file=snapshot_file,
    unique_key_col="id",
    delinquency_col="delq",
    obs_files=obs_files,
    perf_files=perf_files,
    max_delq=6,
)

In [3]:
srr_table.compute()

In [4]:
srr_table.get_roll_rates()

| | 0_cycle_delinquent | 1_cycle_delinquent | 2_cycle_delinquent | 3_cycle_delinquent | 4_cycle_delinquent | 5_cycle_delinquent | 6+_cycle_delinquent |
|---|---|---|---|---|---|---|---|
| 0_cycle_delinquent | 85156 | 17604 | 1670 | 406 | 188 | 139 | 13 |
| 1_cycle_delinquent | 23022 | 28245 | 5476 | 937 | 318 | 223 | 138 |
| 2_cycle_delinquent | 3962 | 7626 | 2795 | 1738 | 676 | 319 | 357 |
| 3_cycle_delinquent | 1418 | 2676 | 1206 | 572 | 1219 | 735 | 942 |
| 4_cycle_delinquent | 430 | 666 | 308 | 153 | 119 | 544 | 1074 |
| 5_cycle_delinquent | 151 | 157 | 77 | 63 | 48 | 50 | 891 |
| 6+_cycle_delinquent | 198 | 162 | 82 | 50 | 19 | 23 | 2509 |

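The table above holds raw account counts, so deciding where to draw the line for bads usually means turning each row into a rate. The sketch below is one way to do that by hand. It assumes that rows are observation-window buckets and columns are performance-window buckets, and it defines the roll-up rate as the share of a row that lands in a strictly worse bucket. That definition and the helper name `roll_up_rate` are assumptions made for illustration, not part of the library.

```python
buckets = ["0", "1", "2", "3", "4", "5", "6+"]

# Counts copied from the table above (rows: observation, columns: performance)
counts = [
    [85156, 17604, 1670, 406, 188, 139, 13],
    [23022, 28245, 5476, 937, 318, 223, 138],
    [3962, 7626, 2795, 1738, 676, 319, 357],
    [1418, 2676, 1206, 572, 1219, 735, 942],
    [430, 666, 308, 153, 119, 544, 1074],
    [151, 157, 77, 63, 48, 50, 891],
    [198, 162, 82, 50, 19, 23, 2509],
]


def roll_up_rate(counts, row):
    """Share of accounts in observation bucket `row` that end up in a worse bucket."""
    total = sum(counts[row])
    worse = sum(counts[row][row + 1 :])  # columns to the right of the diagonal
    return worse / total if total else 0.0


roll_up = {bucket: roll_up_rate(counts, i) for i, bucket in enumerate(buckets)}
for bucket, rate in roll_up.items():
    print(f"{bucket}_cycle_delinquent: {rate:.1%} rolled up")
```

A bucket whose rate is close to one half is a natural candidate for the closer look described next.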

Now, if it's not clear where the line for bads should be drawn, for example if the roll-up rate for 3 and 4 cycle delinquent accounts is around 50%, you can produce a more detailed view for those buckets.

In [11]:
srr_table = SnapshotRollRateTable(
    snapshot_file=snapshot_file,
    unique_key_col="id",
    delinquency_col="delq",
    obs_files=obs_files,
    perf_files=perf_files,
    max_delq=6,
    detailed=True,
    granularity=2,
)

In [12]:
srr_table.compute()

In [13]:
srr_table.get_roll_rates()

| | 0_cycle_delinquent | 1_cycle_delinquent | 2_cycle_delinquent | 3_cycle_delinquent | 4_cycle_delinquent | 5_cycle_delinquent | 6+_cycle_delinquent |
|---|---|---|---|---|---|---|---|
| 0_cycle_delinquent | 85156 | 17604 | 1670 | 406 | 188 | 139 | 13 |
| 1_cycle_delinquent | 23022 | 28245 | 5476 | 937 | 318 | 223 | 138 |
| 2_cycle_delinquent | 3962 | 7626 | 2795 | 1738 | 676 | 319 | 357 |
| 3x1_cycle_delinquent | 1393 | 2626 | 1173 | 544 | 1194 | 713 | 915 |
| 3x2+_cycle_delinquent | 25 | 50 | 33 | 28 | 25 | 22 | 27 |
| 4x1_cycle_delinquent | 427 | 656 | 305 | 147 | 113 | 541 | 1063 |
| 4x2+_cycle_delinquent | 3 | 10 | 3 | 6 | 6 | 3 | 11 |
| 5_cycle_delinquent | 151 | 157 | 77 | 63 | 48 | 50 | 891 |
| 6+_cycle_delinquent | 198 | 162 | 82 | 50 | 19 | 23 | 2509 |


The detailed argument must be **True** for this view, and the granularity argument can't be lower than 2 when detailed is **True**.

So, what do those buckets mean? In the first table, each bucket is the **maximum delinquency** an account reached in the observation window: 0, 1, 2, etc. The detailed buckets additionally show how many times the account reached that **maximum delinquency**. The + sign means "that many times or more" (e.g. 3x2+ means the account was 3 cycle delinquent 2 or more times).
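To make the bucket labels concrete, here is a minimal sketch of how a single account's observation-window history could be mapped to a detailed bucket. It is not the library's implementation: the function `detailed_bucket`, the `detailed_buckets` parameter, and the label format for granularity values above 2 are assumptions made for illustration.

```python
def detailed_bucket(delq_history, max_delq=6, detailed_buckets=(3, 4), granularity=2):
    """Label an account from its monthly delinquency values in the observation window."""
    capped = [min(d, max_delq) for d in delq_history]  # everything above max_delq is "6+"
    worst = max(capped)
    if worst not in detailed_buckets:
        suffix = "+" if worst == max_delq else ""
        return f"{worst}{suffix}_cycle_delinquent"
    # Count how many months the account sat at its maximum delinquency
    times = sum(1 for d in capped if d == worst)
    if times >= granularity:
        return f"{worst}x{granularity}+_cycle_delinquent"
    return f"{worst}x{times}_cycle_delinquent"


once = detailed_bucket([0, 1, 3, 2, 0])      # reached 3 cycles a single time
repeated = detailed_bucket([3, 3, 1, 0, 3])  # reached 3 cycles three times
plain = detailed_bucket([0, 1, 2, 1, 0])     # bucket 2 is not split further
capped_out = detailed_bucket([0, 0, 7, 0, 0])
```

Here `once` falls in the 3x1 bucket and `repeated` in the 3x2+ bucket, while accounts whose maximum is outside the detailed buckets keep their original label.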

# Miscellaneous

The class also has a build() method, which constructs the underlying dataset in **lazy** form. Because the result is a **Polars** object, you can additionally call its collect() method to materialize the dataset, as shown below.

In [16]:
srr_table = SnapshotRollRateTable(
    snapshot_file=snapshot_file,
    unique_key_col="id",
    delinquency_col="delq",
    obs_files=obs_files,
    perf_files=perf_files,
    max_delq=6,
    detailed=True,
    granularity=2,
)

In [18]:
srr_table.build().collect().head()

| id | obs_max_delq | obs_times_3_cycle | obs_times_4_cycle | perf_max_delq |
|---|---|---|---|---|
| str | i64 | u32 | u32 | i64 |
| "ChdjDnG" | 0 | 0 | 0 | 0 |
| "gohvKpu" | 1 | 0 | 0 | 0 |
| "WUzgWOB" | 2 | 0 | 0 | 4 |
| "eoqIvwb" | 0 | 0 | 0 | 1 |
| "SHPGWxV" | 0 | 0 | 0 | 1 |


We can see that there are 5 columns:

**id**: Account id, the unique key of the dataset. \
**obs_max_delq**: The account's maximum delinquency in the observation window. \
**obs_times_3_cycle**: The number of times the account reached a maximum delinquency equal to 3. \
**obs_times_4_cycle**: The number of times the account reached a maximum delinquency equal to 4. \
**perf_max_delq**: The account's maximum delinquency in the performance window.

If the table wasn't in detailed form, there would be 3 columns: **id**, **obs_max_delq** & **perf_max_delq**.
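The account-level dataset relates to the roll-rate table in a simple way: counting accounts per (**obs_max_delq**, **perf_max_delq**) pair gives a cross-tabulation of the same shape. The sketch below does this in plain Python for the five sample rows shown above. It is an illustration of the idea under that assumption, not the library's actual aggregation code, and the variable names are hypothetical.

```python
from collections import Counter

# (id, obs_max_delq, perf_max_delq) for the five rows shown in the head() output
rows = [
    ("ChdjDnG", 0, 0),
    ("gohvKpu", 1, 0),
    ("WUzgWOB", 2, 4),
    ("eoqIvwb", 0, 1),
    ("SHPGWxV", 0, 1),
]

max_delq = 6  # values above this are folded into the last ("6+") bucket

# Count accounts for every (observation bucket, performance bucket) pair
pair_counts = Counter(
    (min(obs, max_delq), min(perf, max_delq)) for _, obs, perf in rows
)

# Rows are observation buckets, columns are performance buckets
table = [
    [pair_counts[(obs, perf)] for perf in range(max_delq + 1)]
    for obs in range(max_delq + 1)
]

for obs, row in enumerate(table[:3]):
    print(obs, row)
```

Every account contributes to exactly one cell, so the cell counts add up to the number of accounts in the dataset.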