title: Who are our TBIL Public Beta Users?
author: Helder Silva 
date: 2021-02-15 
region: EU
tags: tbil, transaction based instalment loans, bank products, lisbon, schufa, balance, mcc, memberships, overdraft, credit
summary: Only 1.2% of the users that were whitelisted ended up using the product. The biggest drops are happening at the top of the funnel, between being whitelisted and having an eligible transaction. When looking into loan buckets, 71% of the users are taking loans between 50€ and 200€ (the 2 lowest buckets). The balance buckets with more TBIL users are between 50€ and 500€, with 45.6% of all TBIL users combined - which is very much in line with the current offer of loan amounts, that also range from 50€ to 500€. The proportion of users that are using Overdraft (i.e Users with Overdraft enabled and negative balance) increases across the funnel (reaching 9.8% for users that have TBIL). In terms of memberships, the biggest increases are in Flex (with 30.4% of TBIL users having this membership), and Metal (reaching 10.6% for TBIL users).

<div class="alert alert-block alert-success">
    <H1>Who are our TBIL Public Beta Users?</H1>

</div>

This research aims to better understand the progress of the public beta of our Transaction Based Instalment Loans product (henceforth TBIL). 

This public beta is a Data Science experiment to test credit decisioning with Lisbon, a data science model that uses a customer’s transaction data and credit bureau score to decide whether to grant a loan, you can find more details on the experiment design [here](https://number26-jira.atlassian.net/wiki/spaces/ProdTech/pages/1898021050/Transaction-based+instalment+loans+public+beta).

This experiment consisted on whitelisting users with favorable credit scores and German T&Cs, and splitting them into 3 groups:
 - **Group 1**: customers that would be eligible based on both Schufa and Lisbon
 - **Group 2**: customers that would be eligible based on Schufa but not Lisbon
 - **Group 3**: customers that would be eligible based on Lisbon but not Schufa

After that, these users are eligible to get a TBIL if they make a transaction with one of the MCCs selected for this public beta.
 
Therefore, these research aims to explore:
 - [How many whitelisted users end up using TBIL?](#section1)
 - [What is the average balance of TBIL users?](#section2)
 - [What transaction categories can we add to have more eligible transactions for TBIL?](#section3)
 - [How are TBIL users using other products?](#section4)
 - [Recommendations for future whitelisting for TBIL](#section5)
 
 
## Here are the highlights of what we've found:
 - Only **1.2%** of the users that were whitelisted **ended up using the product**. The biggest drops are happening at the top of the funnel, between being whitelisted and having an eligible transaction.
 - When looking into loan buckets, **71% of the users are taking loans between 50€ and 200€** (the 2 lowest buckets).
 - The **balance buckets with more TBIL users are between 50€ and 500€**, with 45.6% of all TBIL users combined - which is very much in line with the current offer of loan amounts, that also range from 50€ to 500€.
 - The **proportion of users that are using Overdraft** (i.e Users with Overdraft enabled and negative balance) **increases across the funnel** (reaching 9.8% for users that have TBIL).
 - In terms of memberships, the biggest increases are in **Flex** (with **30.4%** of TBIL users having this membership), and **Metal** (reaching **10.6%** for TBIL users)

In [46]:
import pandas as pd
import numpy as np
import altair as alt
from utils.datalib_database import df_from_sql
from IPython.display import HTML

import utils.altair_functions as af

In [3]:
query_all_tbil = """
with loan_users as (
select user_id,
min(created) as first_tbil_created,
max(created) as last_tbil_created,
count(*) as n_loans
from nh_transaction_instalment_loan
where created between '2020-10-20' and '2021-01-31'
group by 1
)
select *
from dev_dbt.temp_tbil_user_analysis
left join loan_users lu using(user_id)
"""

query_happy_funnel = """
with loan_users as (
select distinct(user_id)
from nh_transaction_instalment_loan
where created between '2020-10-20' and '2021-01-31'
),
unions as (
select 
'all_whitelisted_users' as label,
risk_provider_group,
count (*) as value
from dev_dbt.temp_tbil_user_analysis
group by 1, 2
union all 
select
'has_kycc' as label,
risk_provider_group,
count(case when kyc_first_completed is not null then 1 end) as value
from dev_dbt.temp_tbil_user_analysis
group by 1, 2
union all 
select
'is_mau' as label,
risk_provider_group,
count(case when is_mau is true then 1 end) as value
from dev_dbt.temp_tbil_user_analysis
group by 1, 2
union all 
select
'has_eligible_transaction' as label,
risk_provider_group,
count(case when first_eligible_txn is not null then 1 end) as value
from dev_dbt.temp_tbil_user_analysis
group by 1, 2
union all 
select
'viewed_infocard' as label,
risk_provider_group,
count(case when viewed_infocard then 1 end) as value
from dev_dbt.temp_tbil_user_analysis
group by 1, 2
union all 
select
'received_infocard' as label,
risk_provider_group,
count(case when received_infocard then 1 end) as value
from dev_dbt.temp_tbil_user_analysis
group by 1, 2
union all 
select
'tbil_users' as label,
risk_provider_group,
count(case when lu.user_id is not null then 1 end) as value
from dev_dbt.temp_tbil_user_analysis
left join loan_users lu using(user_id)
group by 1, 2
)
select 
label, 
'All' as risk_provider_group,  
sum(value)
from unions
group by 1, 2
union all 
select 
label, 
risk_provider_group,  
value
from unions
order by 2 desc
"""

query_maus = """ with loan_users as (
select distinct(user_id)
from nh_transaction_instalment_loan
where created between '2020-10-20' and '2021-01-31'
)
select 
case when is_mau then 'MAU when whitelisted' else 'Not MAU when whitelisted' end as mau_status,
count(case when first_eligible_txn is not null then 1 end) as has_eligible_txn,
count(case when lu.user_id is not null then 1 end) as has_tbil
from dev_dbt.temp_tbil_user_analysis
left join loan_users lu using(user_id)
where kyc_first_completed is not null 
group by 1"""

In [5]:
query_avg_balance = """
select 
risk_provider_group, 
right(wave, 2) as wave,
round(avg(avg_month_balance_eur), 2) as avg_balance
from dev_dbt.temp_tbil_user_analysis
where risk_provider_group != 'Other'
and kyc_first_completed is not null 
group by 1, 2
order by 1, 2
"""

query_balance_buckets = """
with loan_users as (
select distinct(user_id)
from nh_transaction_instalment_loan
where created between '2020-10-20' and '2021-01-31'
)
select 
case when avg_month_balance_eur < 0 then '1. negative balance'
when avg_month_balance_eur = 0 then '2. 0 balance'
when avg_month_balance_eur between 0 and 50 then '3. btw 0€ and 50€'
when avg_month_balance_eur between 50 and 250 then '4. btw 50€ and 250€'
when avg_month_balance_eur between 250 and 500 then '5. btw 250€ and 500€'
when avg_month_balance_eur between 500 and 1000 then '6. btw 500€ and 1000€'
when avg_month_balance_eur between 1000 and 2500 then '7. btw 1000€ and 2500€'
when avg_month_balance_eur between 2500 and 5000 then '8. btw 2500€ and 5000€'
else '9. balance > 5000€' end as balance_bucket,
count(*) as all_kycc_whitelisted,
count(case when first_eligible_txn is not null then 1 end) as has_eligible_txn,
count(case when lu.user_id is not null then 1 end) as has_tbil
from dev_dbt.temp_tbil_user_analysis
left join loan_users lu using(user_id)
where risk_provider_group != 'Other'
and kyc_first_completed is not null 
group by 1
order by 1
"""

query_loan_buckets = """ 
with totals as (
select 
risk_provider_group,
case when amount >= 400  and amount < 500 then '400€ to 500€'
when amount between 300 and 400 then '300€ to 400€'
when amount between 200 and 300 then '200€ to 300€'
when amount between 100 and 200 then '100€ to 200€'
when amount between 50 and 100 then ' 50€ to 100€'
when amount between 0 and 50 then ' 0€ to 50€'
else round(amount)::text
end as loan_buckets, 
count(*) as users
from nh_transaction_instalment_loan
inner join dev_dbt.temp_tbil_user_analysis using(user_id)
where created between '2020-10-20' and '2021-01-31'
group by 1, 2
)
select 
risk_provider_group,
loan_buckets,
users
from totals 
union all 
select 
'All' as risk_provider_group,
loan_buckets,
sum(users) 
from totals 
group by 1, 2
"""

query_deu_mcc = """ select * from dev_dbt.temp_tbil_top_mcc_codes"""

In [8]:
query_subscriptions = """
with loan_users as (
select distinct(user_id)
from nh_transaction_instalment_loan
where created between '2020-10-20' and '2021-01-31'
), totals as (
select 
user_id,
product_id,
has_p_account, 
has_od_enabled,
using_od_status, 
has_consumer_credit,
case when first_eligible_txn is not null then 1 end as has_eligible_txn,
case when lu.user_id is not null then 1 end as has_tbil
from dev_dbt.temp_tbil_user_analysis
left join loan_users lu using(user_id)
),
unions as (
select 
user_id, 
'ALL' as subscription,
has_eligible_txn,
has_tbil
from totals 
union all 
select 
user_id, 
product_id as subscription,
has_eligible_txn,
has_tbil
from totals 
union all 
select 
user_id, 
case when has_od_enabled is true then 'HAS_OD_ENABLED' end as subscription,
has_eligible_txn,
has_tbil
from totals 
union all 
select 
user_id, 
case when using_od_status is true then 'IS_USING_OD' end as subscription,
has_eligible_txn,
has_tbil
from totals 
union all 
select 
user_id, 
case when has_consumer_credit is true then 'HAS_CONSUMER_CREDIT' end as subscription,
has_eligible_txn,
has_tbil
from totals 
union all 
select 
user_id, 
case when has_consumer_credit is true then 'HAS_P_ACCOUNT' end as subscription,
has_eligible_txn,
has_tbil
from totals 
)
select
'1. All Whitelisted Users' as group, 
subscription,
count(*)
from unions 
where subscription is not null
group by 1, 2
union all 
select
'2. Has eligible txn' as group, 
subscription,
count(*)
from unions 
where subscription is not null
and has_eligible_txn = 1
group by 1, 2
union all 
select
'3. Has TBIL' as group, 
subscription,
count(*)
from unions 
where subscription is not null
and has_tbil = 1
group by 1, 2
"""

query_flex_groups = """ 
select 
risk_provider_group,
count(*) all_tbil_users,
count(case when product_id = 'FLEX_ACCOUNT_MONTHLY' then 1 end) as flex_tbil_users
from nh_transaction_instalment_loan
inner join dev_dbt.temp_tbil_user_analysis using(user_id)
group by 1
order by 1
"""

<a id='section1'></a>
# How many whitelisted users end up using TBIL?

In this analysis, we considered the following steps between whitelisting users and seeing these applying to a TBIL:

All whitelisted users ▶ have KYCC ▶ are MAUs ▶ have one eligible transaction ▶ received TBIL infocard ▶ viewed TBIL infocard ▶ are TBIL users

The time frame considered for the funnels below are between October 20th 2020 (date of the launch of the TBIL public beta) and end of January 2021.

When looking into the funnel below, we can see that only 1.2% of the users that were whitelisted ended up using the product. The biggest drops are happening at the top of the funnel:
 - All whitelisted users ▶ have KYCC: 18.3% drop
 - have KYCC ▶ are MAUs: 18.5% drop
 - are MAUs ▶ have one eligible transaction: 42.2% drop

This means that if we start by whitelisting MAUs only, we can avoid some of these biggest drops. 

Also, one possible solution for the MAUs ▶ have one eligible transaction drop is to add more MCCs to the eligible transactions for this product (see section "[What transaction categories can we add to have more eligible transactions for TBIL?](#section3)")

In [38]:
df_happy_funnel = df_from_sql("redshiftreader", query_happy_funnel)

In [12]:
df_happy_funnel = df_happy_funnel.rename(columns={"sum": "users"})
df_happy_funnel = df_happy_funnel.sort_values(by=["users"], ascending=False)

#### Whitelisted users funnel count & percentage

In [52]:
df_happy_funnel_all = (
    df_happy_funnel[df_happy_funnel["risk_provider_group"] == "All"]
    .sort_values(by=["users"], ascending=False)
    .reset_index()
)
df_happy_funnel_all["percent_users"] = round(
    (df_happy_funnel_all["users"] / df_happy_funnel_all["users"][0]) * 100, 1
)
df_happy_funnel_all.drop(columns=["index"])

Unnamed: 0,label,risk_provider_group,users,percent_users
0,all_whitelisted_users,All,50531,100
1,has_kycc,All,41305,82
2,is_mau,All,31951,63
3,has_eligible_transaction,All,10604,21
4,received_infocard,All,10538,21
5,viewed_infocard,All,8541,17
6,tbil_users,All,602,1


In [53]:
af.bar_single_label(
    df_happy_funnel_all, af.teal, "percent_users:Q", "label:O", 600, 250, "-x"
).properties(title="TBIL Whitelisted Users Funnel").configure_axis(labelLimit=500)

### Are we excluding too many users if we filter for MAUs?

Looking at the experiment results so far, the answer is no. Only 1% of users with eligible transactions were not MAUs at the time of whitelisting. This proportion gets even smaller when we look into the percentage of users that have TBIL and weren't MAUs at the time of whitelisting (0.5%). 

In [39]:
df_maus = df_from_sql("redshiftreader", query_maus)

In [16]:
df_maus[["perc_eligible_txns"]] = df_maus[["has_eligible_txn"]].apply(
    lambda x: round((x / x.sum()) * 100, 1), axis=0
)
df_maus[["perc_has_tbil"]] = df_maus[["has_tbil"]].apply(
    lambda x: round((x / x.sum()) * 100, 1), axis=0
)
df_maus[["has_eligible_txn", "perc_eligible_txns", "has_tbil", "perc_has_tbil"]]

Unnamed: 0,has_eligible_txn,perc_eligible_txns,has_tbil,perc_has_tbil
0,10496,99.0,599,99.5
1,108,1.0,3,0.5


When looking at the funnel differences between the 3 experiment groups, we can see that Group 2 has the lowest conversion (0.1% between whitelisting and using TBIL), and the biggest drops happen once again in the first steps of the funnel - only 27.7% of Group 2 whitelisted users are MAUs, whereas these percentages are 86.6% and 97.9% for groups 1 and 3 respectively. 

These findings further support the suggestion of whitelisting MAUs only, since this would reduce these differences between groups.

#### Whitelisted users funnel count & percentage per group

In [17]:
# Group 1
df_happy_funnel_1 = (
    df_happy_funnel[df_happy_funnel["risk_provider_group"] == "Group 1"]
    .sort_values(by=["users"], ascending=False)
    .reset_index()
)
df_happy_funnel_1["percent_users"] = round(
    (df_happy_funnel_1["users"] / df_happy_funnel_1["users"][0]) * 100, 1
)

# Group 2
df_happy_funnel_2 = (
    df_happy_funnel[df_happy_funnel["risk_provider_group"] == "Group 2"]
    .sort_values(by=["users"], ascending=False)
    .reset_index()
)
df_happy_funnel_2["percent_users"] = round(
    (df_happy_funnel_2["users"] / df_happy_funnel_2["users"][0]) * 100, 1
)

# Group 2
df_happy_funnel_3 = (
    df_happy_funnel[df_happy_funnel["risk_provider_group"] == "Group 3"]
    .sort_values(by=["users"], ascending=False)
    .reset_index()
)
df_happy_funnel_3["percent_users"] = round(
    (df_happy_funnel_3["users"] / df_happy_funnel_3["users"][0]) * 100, 1
)

# Group Other
df_happy_funnel_other = (
    df_happy_funnel[df_happy_funnel["risk_provider_group"] == "Other"]
    .sort_values(by=["users"], ascending=False)
    .reset_index()
)
df_happy_funnel_other["percent_users"] = round(
    (df_happy_funnel_other["users"] / df_happy_funnel_other["users"][0]) * 100, 1
)

df_happy_funnel_groups = pd.concat(
    [df_happy_funnel_1, df_happy_funnel_2, df_happy_funnel_3],
    ignore_index=True,
    sort=False,
)

pivot = df_happy_funnel_groups.rename(columns={"risk_provider_group": "i"})
pivot = pivot.pivot(index="label", columns="i", values="users").reset_index()
pivot.fillna(0).sort_values(by=["Group 1"], ascending=False)

i,label,Group 1,Group 2,Group 3
0,all_whitelisted_users,17296,21873,10852
2,has_kycc,16434,13515,10852
3,is_mau,14984,6069,10626
1,has_eligible_transaction,5249,997,4321
4,received_infocard,5213,993,4296
6,viewed_infocard,4186,784,3563
5,tbil_users,142,22,437


In [54]:
af.bar_multi(
    df_happy_funnel_groups,
    "risk_provider_group:N",
    "percent_users:Q",
    "label:N",
    "risk_provider_group:N",
    700,
    100,
    "-x",
).properties(title="TBIL Whitelisted Users Funnel Per Group")

<a id='section2'></a>
# What is the average balance of TBIL users?

In [40]:
df_avg_balance = df_from_sql("redshiftreader", query_avg_balance)

In order to calculate the average balance per group, we looked into daily balances between August 2020 and January 2021, made a monthly average and finally calculated the average for those 6 months. We can see that Group 1 users have the highest monthly average with 4.2K€, whereas the group 2 users are the lowest with a 1.1k€ monthly balance average - this corresponds to a 3.1K€ difference between these two groups. 

The only relevant trend found in the balance of these groups across the rollout waves (i.e. the timings when the product was made available for these users) is that Group 2 average balance seems to have dropped from 1.9K€ to 0.9K€ between waves 3 and 4, and never made it above 1.1K€ ever since. 

#### Average Balance per Group

In [20]:
df_avg_balance_mean = df_avg_balance.groupby("risk_provider_group").mean().reset_index()
df_avg_balance_mean["avg_balance"] = round(df_avg_balance_mean["avg_balance"], 2)
df_avg_balance_mean

Unnamed: 0,risk_provider_group,avg_balance
0,Group 1,4204.22
1,Group 2,1076.7
2,Group 3,3448.96


In [55]:
af.line_multi(
    df_avg_balance, "risk_provider_group:N", "wave:O", "avg_balance:Q", 800, 400, "x"
).properties(title="TBIL Whitelisted Users Balance Per Group")

In [41]:
df_loan_buckets = df_from_sql("redshiftreader", query_loan_buckets)

In [56]:
# Group 1
df_loan_buckets_1 = (
    df_loan_buckets[df_loan_buckets["risk_provider_group"] == "Group 1"]
    .sort_values(by=["loan_buckets"], ascending=False)
    .reset_index()
)
df_loan_buckets_1["percent_users"] = df_loan_buckets_1[["users"]].apply(
    lambda x: round((x / x.sum()) * 100, 1), axis=0
)

# Group 2
df_loan_buckets_2 = (
    df_loan_buckets[df_loan_buckets["risk_provider_group"] == "Group 2"]
    .sort_values(by=["loan_buckets"], ascending=False)
    .reset_index()
)
df_loan_buckets_2["percent_users"] = df_loan_buckets_2[["users"]].apply(
    lambda x: round((x / x.sum()) * 100, 1), axis=0
)

# Group 2
df_loan_buckets_3 = (
    df_loan_buckets[df_loan_buckets["risk_provider_group"] == "Group 3"]
    .sort_values(by=["loan_buckets"], ascending=False)
    .reset_index()
)
df_loan_buckets_3["percent_users"] = df_loan_buckets_3[["users"]].apply(
    lambda x: round((x / x.sum()) * 100, 1), axis=0
)

# All groups
df_loan_buckets_all = (
    df_loan_buckets[df_loan_buckets["risk_provider_group"] == "All"]
    .sort_values(by=["loan_buckets"], ascending=False)
    .reset_index()
)
df_loan_buckets_all["percent_users"] = df_loan_buckets_all[["users"]].apply(
    lambda x: round((x / x.sum()) * 100, 1), axis=0
)

df_loan_buckets_groups = pd.concat(
    [df_loan_buckets_1, df_loan_buckets_2, df_loan_buckets_3, df_loan_buckets_all],
    ignore_index=True,
    sort=False,
)

### What are the most used loan buckets?

By focusing on All TBIL users, we can see that the higher the loan bucket, the lower the percentage of users in that bucket - 71% of these users are taking loans between 50€ and 200€.

We also see a similar trend when comparing among groups (since Group 2 has such a low number of TBIL users, their distribution across the buckets looks more irregular).

#### Loan Bucket User Count per Group

In [24]:
pivot = df_loan_buckets.pivot(
    index="loan_buckets", columns="risk_provider_group", values="users"
)
pd.options.display.float_format = "{:,.0f}".format
pivot.fillna(0)

risk_provider_group,All,Group 1,Group 2,Group 3,Other
loan_buckets,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
50€ to 100€,270,51,11,208,0
100€ to 200€,187,49,6,131,1
200€ to 300€,80,21,0,59,0
300€ to 400€,40,10,2,28,0
400€ to 500€,25,11,3,11,0


In [57]:
af.column_multi(
    df_loan_buckets_groups,
    "risk_provider_group:N",
    "risk_provider_group:N",
    "percent_users:Q",
    "loan_buckets:N",
    100,
    400,
    "x",
).properties(title="TBIL Loan Buckets Per Group")

In [42]:
df_balance_buckets = df_from_sql("redshiftreader", query_balance_buckets)

We also created balance buckets for the average balances mentioned above, and compared them between whitelisted users, users with eligible transactions, and users with TBIL. The main differences between these groups are:
- While 5.8% of the whitelisted users have a 0 balance, none of them have eligible transactions (nor TBIL).
- The balance bucket with the most eligible transaction users (22.7%)is the highest one (balance > 5000€), however this proportion drops to 6.7% in the group of TBIL users. 
- The balance buckets with more TBIL users are between 50€ and 250€ and between 250€ and 500€, with 45.6% of all TBIL users combined - which is very much in line with the current offer of loan amounts, that also range from 50€ to 500€.

#### User Count and Percentage per Balance Bucket

In [27]:
df_balance_buckets[["perc_whitelisted"]] = df_balance_buckets[
    ["all_kycc_whitelisted"]
].apply(lambda x: round((x / x.sum()) * 100, 1), axis=0)
df_balance_buckets[["perc_eligible_txns"]] = df_balance_buckets[
    ["has_eligible_txn"]
].apply(lambda x: round((x / x.sum()) * 100, 1), axis=0)
df_balance_buckets[["perc_has_tbil"]] = df_balance_buckets[["has_tbil"]].apply(
    lambda x: round((x / x.sum()) * 100, 1), axis=0
)
df_balance_buckets[
    [
        "balance_bucket",
        "all_kycc_whitelisted",
        "perc_whitelisted",
        "has_eligible_txn",
        "perc_eligible_txns",
        "has_tbil",
        "perc_has_tbil",
    ]
]

Unnamed: 0,balance_bucket,all_kycc_whitelisted,perc_whitelisted,has_eligible_txn,perc_eligible_txns,has_tbil,perc_has_tbil
0,1. negative balance,3071,8,831,8,58,10
1,2. 0 balance,2377,6,0,0,0,0
2,3. btw 0€ and 50€,5581,14,201,2,16,3
3,4. btw 50€ and 250€,7356,18,1175,11,145,24
4,5. btw 250€ and 500€,4428,11,1220,12,129,22
5,6. btw 500€ and 1000€,4500,11,1482,14,114,19
6,7. btw 1000€ and 2500€,5092,12,1926,18,74,12
7,8. btw 2500€ and 5000€,3163,8,1329,13,25,4
8,9. balance > 5000€,5233,13,2403,23,40,7


In [58]:
whitelisted = af.bar_single_label(
    df_balance_buckets,
    af.wheat,
    "perc_whitelisted:Q",
    "balance_bucket:N",
    150,
    300,
    "y",
).properties(title="Whitelisted Users Avg. Balance")
eligible_txn = af.bar_single_label(
    df_balance_buckets,
    af.petrol,
    "perc_eligible_txns:Q",
    "balance_bucket:N",
    150,
    300,
    "y",
).properties(title="Eligible Txn Users Avg. Balance")
has_tbil = af.bar_single_label(
    df_balance_buckets, af.teal, "perc_has_tbil:Q", "balance_bucket:N", 150, 300, "y"
).properties(title="TBIL Users Avg. Balance")

whitelisted | eligible_txn | has_tbil

<a id='section3'></a>
# What transaction categories can we add to have more eligible transactions for TBIL?

In order to evaluate what other MCCs we could add into this product, we looked into transactions within the 50€ to 500€ range (the range of the TBIL amounts) for users with German T&Cs between August 2020 and January 2021.

Below we can see that the MCCs that have more than 3% of the transactions in this period are ATM withdrawals, groceries stores, service stations, book stores, restaurants and gambling transactions. 

You can find an overview of the MCCs and merchants currently being used for TBIL [here](https://metabase-product.tech26.de/question/2721)

#### Top MCCs Transaction Count and Average Volume per Transaction

In [43]:
df_deu_mcc = df_from_sql("redshiftreader", query_deu_mcc)

In [30]:
df_deu_mcc_filter = df_deu_mcc.sort_values(by=["n_txns"], ascending=False).reset_index()
df_deu_mcc_filter["percent_txns"] = df_deu_mcc_filter[["n_txns"]].apply(
    lambda x: round((x / x.sum()) * 100, 1), axis=0
)
pd.options.display.max_colwidth = 150
df_deu_mcc_filter[["mcc", "mastercard_name", "n_txns", "avg_volume_per_txn"]].head(20)

Unnamed: 0,mcc,mastercard_name,n_txns,avg_volume_per_txn
0,6011,AUTOMATED CASH DISBURSEMENTS-CUSTOMER FINANCIAL INSTITUTION,1809796,145
1,5411,"GROCERY STORES, SUPERMARKETS",953727,83
2,5541,SERVICE STATIONS WITH OR WITHOUT ANCILLARY SERVICE,306370,66
3,5942,BOOK STORES,302714,111
4,5812,"EATING PLACES, RESTAURANTS",223051,87
5,7995,GAMBLING TRANSACTIONS,219354,101
6,5651,FAMILY CLOTHING STORES,162574,111
7,5691,MENS AND WOMENS CLOTHING STORES,162044,117
8,4829,MONEY TRANSFER,139927,170
9,7011,"LODGING-HOTELS,MOTELS,RESORTS-NOT CLASSIFIED",129460,152


In [59]:
af.bar_single_label(
    df_deu_mcc_filter.head(20),
    af.wheat,
    "percent_txns:Q",
    "mastercard_name:O",
    500,
    500,
    "-x",
).properties(title="Top 20 MMCs for DEU Transactions").configure_axis(labelLimit=500)

<a id='section4'></a>
# How are TBIL users using other products?

In [44]:
df_subscriptions = df_from_sql("redshiftreader", query_subscriptions)

Finally, we compared whitelisted users, users with eligible transactions, and users with TBIL based on their subscription and whether they have an Arranged Overdraft, Consumer Credit, or a P-Account on the time of their whitelisting. The main differences between these groups are:
 - The proportion of users that are using Overdraft (i.e Users with Overdraft enabled and negative balance) increases across these groups (reaching 9.8% for users that have TBIL). This finding is consistent with the increasing proportion of users with a negative balance seen above).
 - In terms of memberships, the biggest increases are in Flex (with 30.4% of TBIL users having this membership), and Metal (reaching 10.6% for TBIL users)
 - On the other hand, we also have membership proportions decreasing from group to group: Standard decreased from 78.7% of all whitelisted users to 45% of TBIL users, and Business Standard also dropped from 11.9% whitelisted users to 4.8% TBIL Users.
 
#### Memberships and Bank Products User Count

In [33]:
df_subscriptions.head()

pivot = df_subscriptions.pivot(index="subscription", columns="group", values="count")
pivot.fillna(0)

group,1. All Whitelisted Users,2. Has eligible txn,3. Has TBIL
subscription,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
ALL,50531,10604,602
BLACK_CARD_MONTHLY,1584,728,42
BUSINESS_BLACK,334,133,9
BUSINESS_CARD,5998,1196,29
BUSINESS_METAL,64,31,2
BUSINESS_SMART,1,0,0
FLEX_ACCOUNT_MONTHLY,1779,751,183
HAS_CONSUMER_CREDIT,313,160,10
HAS_OD_ENABLED,9924,2952,134
HAS_P_ACCOUNT,313,160,10


In [34]:
# Whitelisted
df_subscriptions_whitelisted = (
    df_subscriptions[df_subscriptions["group"] == "1. All Whitelisted Users"]
    .sort_values(by=["count"], ascending=False)
    .reset_index()
)
df_subscriptions_whitelisted["percent_users"] = round(
    (df_subscriptions_whitelisted["count"] / df_subscriptions_whitelisted["count"][0])
    * 100,
    1,
)

# has eligible txn
df_subscriptions_eligible_txn = (
    df_subscriptions[df_subscriptions["group"] == "2. Has eligible txn"]
    .sort_values(by=["count"], ascending=False)
    .reset_index()
)
df_subscriptions_eligible_txn["percent_users"] = round(
    (df_subscriptions_eligible_txn["count"] / df_subscriptions_eligible_txn["count"][0])
    * 100,
    1,
)

# has TBIL
df_subscriptions_tbil = (
    df_subscriptions[df_subscriptions["group"] == "3. Has TBIL"]
    .sort_values(by=["count"], ascending=False)
    .reset_index()
)
df_subscriptions_tbil["percent_users"] = round(
    (df_subscriptions_tbil["count"] / df_subscriptions_tbil["count"][0]) * 100, 1
)


df_subscriptions_clean = pd.concat(
    [
        df_subscriptions_whitelisted,
        df_subscriptions_eligible_txn,
        df_subscriptions_tbil,
    ],
    ignore_index=True,
    sort=False,
)
df_subscriptions_clean = df_subscriptions_clean[
    df_subscriptions_clean["subscription"] != "ALL"
].sort_values(by=["count"], ascending=False)

In [60]:
af.column_multi(
    df_subscriptions_clean,
    "subscription:N",
    "subscription:N",
    "percent_users:Q",
    "group:N",
    220,
    400,
    "-y",
).properties(title="Percentage of users with Memberships and Bank Products")

### From which group are flex users coming from?

Given that Flex users are by default users with an unfavorable credit score, we were wondering in which of the experiment groups these users were assigned to: the answer is Group 3 (Lisbon only) for the vast majority of these users.

#### Flex Users Count per Group

In [45]:
df_flex_groups = df_from_sql("redshiftreader", query_flex_groups)

In [37]:
df_flex_groups["perc_flex"] = round(
    (df_flex_groups["flex_tbil_users"] / df_flex_groups["all_tbil_users"]) * 100, 1
)
df_flex_groups

Unnamed: 0,risk_provider_group,all_tbil_users,flex_tbil_users,perc_flex
0,Group 1,170,2,1
1,Group 2,24,0,0
2,Group 3,492,199,40
3,Other,70,0,0


<a id='section5'></a>
# Recommendations for future whitelisting for TBIL
 - Only whitelist customers with KYCC (since we have a drop of 18.3% of the users in this step)
 - Only include MAUs at the time of the whitelisting  (we lose another 18.5% of the users in this one)
 - Only include users with an average balance different than 0€ at the time of the whitelisting (no whitelisted users with an average balance of 0€ have eligible transactions)