### In this example, we explain how binary columns are integrated into MOMRollRateTable

In [1]:
import polars as pl

data_i = pl.scan_csv("../tests/simulation_data/test_sample_0.csv").collect()
data_i_1 = pl.scan_csv("../tests/simulation_data/test_sample_1.csv").collect()

In [2]:
data_i.head()

id,delq,Open,Active,Deactive,Closed,bin_ind_1,bin_ind_2
str,i64,i64,i64,i64,i64,i64,i64
"""nRHuJNu""",1,1,1,0,0,1,1
"""AGFlmZR""",1,1,1,0,0,0,1
"""gLtoZLI""",0,1,1,0,0,0,1
"""yRBhkMC""",3,1,1,0,0,1,1
"""VPQuDtr""",0,1,1,0,0,1,0


In [3]:
data_i_1.head()

id,delq,Open,Active,Deactive,Closed,bin_ind_1,bin_ind_2
str,i64,i64,i64,i64,i64,i64,i64
"""pIQRjkz""",0,1,1,0,0,1,0
"""XzevMfz""",0,1,1,0,0,1,0
"""AcVYRoO""",0,1,1,0,0,1,1
"""QetPjCL""",1,1,1,0,0,0,0
"""iYwhasV""",0,1,1,0,0,0,0


We can see that there are 8 columns:

**id**: Account id, the unique key of the dataset. \
**delq**: The delinquency of each account. \
**bin_ind_1**: A binary indicator. \
**bin_ind_2**: A binary indicator. \
**Open, Active, Deactive, Closed**: Indicate whether the account is open, active, deactivated or closed in that month.

The four status columns (Open, Active, Deactive, Closed) are not used in this example.

Suppose now that we want to include the 2 binary columns from our data in the roll rate table. Before adding them to the MOMRollRateTable object, we have to decide which indicator gets priority, i.e. if an account has both indicators set, which one do we consider more important?

In [4]:
from roll_rate_analysis import MOMRollRateTable

table = MOMRollRateTable(
    unique_key_col="id",
    delinquency_col="delq",
    path_i="../tests/simulation_data/test_sample_0.csv",
    path_i_1="../tests/simulation_data/test_sample_1.csv",
    max_delq=6,
    binary_cols=["bin_ind_1", "bin_ind_2"],
)

The columns listed first in `binary_cols` have the highest priority; in other words, priority is in descending order [bin_ind_1 > bin_ind_2]. Consequently, accounts on row "bin_ind_2" always have bin_ind_1 **equal to zero**, whereas accounts on row "bin_ind_1" could have **both indicators equal to 1 or just bin_ind_1**.
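
The priority rule can be illustrated with a small, self-contained sketch. This is not the library's implementation: `pick_indicator` and the sample dictionaries are hypothetical names introduced here, and the only assumption is that list order in `binary_cols` determines priority, as described above.

```python
def pick_indicator(row, binary_cols):
    """Return the first column in binary_cols whose value is 1, else None.

    Hypothetical helper for illustration only; an earlier position in
    binary_cols means higher priority.
    """
    for col in binary_cols:
        if row.get(col, 0) == 1:
            return col
    return None


binary_cols = ["bin_ind_1", "bin_ind_2"]

# Made-up accounts covering the three possible situations.
both = {"bin_ind_1": 1, "bin_ind_2": 1}
only_second = {"bin_ind_1": 0, "bin_ind_2": 1}
neither = {"bin_ind_1": 0, "bin_ind_2": 0}

print(pick_indicator(both, binary_cols))         # bin_ind_1
print(pick_indicator(only_second, binary_cols))  # bin_ind_2
print(pick_indicator(neither, binary_cols))      # None
```

Reversing the list to `["bin_ind_2", "bin_ind_1"]` would, under the same assumption, move accounts with both indicators set to the "bin_ind_2" row.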

In [5]:
table.build()

In [6]:
table.get_roll_rates()

Unnamed: 0,0_cycle_deliqnuent,1_cycle_deliqnuent,2_cycle_deliqnuent,3_cycle_deliqnuent,4_cycle_deliqnuent,5_cycle_deliqnuent,6+_cycle_deliqnuent,bin_ind_2,bin_ind_1
0_cycle_deliqnuent,1840,336,0,0,0,0,0,2238,4402
1_cycle_deliqnuent,486,628,133,0,0,0,0,1197,2542
2_cycle_deliqnuent,67,156,35,46,0,0,0,266,568
3_cycle_deliqnuent,12,21,3,3,36,0,0,67,145
4_cycle_deliqnuent,5,2,0,0,1,15,0,17,60
5_cycle_deliqnuent,1,0,0,0,1,0,7,20,29
6+_cycle_deliqnuent,2,1,0,0,0,0,21,22,55
bin_ind_2,2365,1096,156,49,31,16,36,3863,7585
bin_ind_1,4839,2170,340,115,56,33,70,7701,15444
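
To make the cell counts easier to interpret, here is a minimal sketch of how such a two-month table could be tallied. It assumes that rows correspond to month i and columns to month i+1, that accounts are matched on `id`, and that accounts missing from either month are skipped; none of this is confirmed by the output above. `tally_transitions`, the ids and the shortened labels are all hypothetical.

```python
from collections import Counter


def tally_transitions(labels_i, labels_i_1):
    """Count (label in month i, label in month i+1) pairs.

    Hypothetical helper: both arguments map an account id to its row or
    column label. Only ids present in both months are counted (assumption).
    """
    counts = Counter()
    for account_id, label_i in labels_i.items():
        if account_id in labels_i_1:
            counts[(label_i, labels_i_1[account_id])] += 1
    return counts


# Made-up labels for four accounts in two consecutive months.
labels_i = {"a": "1_cycle", "b": "bin_ind_1", "c": "bin_ind_2", "d": "1_cycle"}
labels_i_1 = {"a": "0_cycle", "b": "bin_ind_1", "c": "bin_ind_1", "d": "0_cycle"}

counts = tally_transitions(labels_i, labels_i_1)
print(counts[("1_cycle", "0_cycle")])      # 2
print(counts[("bin_ind_2", "bin_ind_1")])  # 1
```

Under these assumptions, each cell of the table is one entry of `counts`, with the first element of the key selecting the row and the second selecting the column.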


Also, by default the indicators take priority over the delinquency columns. \
For example, if an account is 2 cycles delinquent and both indicator columns are equal to 1, then it will appear on the row of the highest-priority indicator, namely bin_ind_1. The table below shows how the counts are distributed without the binary indicators.

In [7]:
table = MOMRollRateTable(
    unique_key_col="id",
    delinquency_col="delq",
    path_i="../tests/simulation_data/test_sample_0.csv",
    path_i_1="../tests/simulation_data/test_sample_1.csv",
    max_delq=6,
)

table.build()

table.get_roll_rates()

Unnamed: 0,0_cycle_deliqnuent,1_cycle_deliqnuent,2_cycle_deliqnuent,3_cycle_deliqnuent,4_cycle_deliqnuent,5_cycle_deliqnuent,6+_cycle_deliqnuent
0_cycle_deliqnuent,29372,6009,3,0,0,0,5
1_cycle_deliqnuent,7626,9491,2308,1,0,1,0
2_cycle_deliqnuent,1096,2196,393,770,0,0,0
3_cycle_deliqnuent,238,384,57,25,463,0,0
4_cycle_deliqnuent,53,41,3,6,29,276,0
5_cycle_deliqnuent,27,8,0,1,1,19,160
6+_cycle_deliqnuent,20,3,0,0,1,1,364
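
The precedence described earlier, where a set indicator overrides the delinquency bucket, can be sketched as follows. This is an illustration under stated assumptions rather than the library's code: `row_label` is a hypothetical function, the label strings simply copy the spelling printed in the outputs above, and capping at `max_delq` is inferred from the "6+" label.

```python
def row_label(delq, indicators, binary_cols, max_delq=6):
    """Return the table label for one account (hypothetical helper).

    Indicators are checked first, in list order; the delinquency bucket
    is used only when no indicator equals 1.
    """
    for col in binary_cols:
        if indicators.get(col, 0) == 1:
            return col
    # Assumption: delinquencies at or above max_delq share one "N+" bucket.
    if delq >= max_delq:
        return f"{max_delq}+_cycle_deliqnuent"
    return f"{delq}_cycle_deliqnuent"


cols = ["bin_ind_1", "bin_ind_2"]

# 2 cycles delinquent with both indicators set: the indicator wins.
print(row_label(2, {"bin_ind_1": 1, "bin_ind_2": 1}, cols))  # bin_ind_1

# Same account with no binary columns configured: delinquency bucket.
print(row_label(2, {"bin_ind_1": 1, "bin_ind_2": 1}, []))    # 2_cycle_deliqnuent
```

Passing an empty `binary_cols` list in this sketch corresponds to building the table without `binary_cols`, so every account falls into a delinquency bucket.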
