title: How big is the potential market for N26 Installments in France, Italy and Spain? (Part II)
author: Helder Silva 
date: 2021-10-08
region: EU
tags: N26 Installments, TBIL, bank products, credit, transactions, italy, france, spain, lisbon scores
summary: The main take-away of this research is that a potential exclusion of users with riskier scores (10 through 12) in the expansion of Installments into new markets (France, Spain and Italy) would limit the risk of launching in these markets, but also reduce the potential user base from 513K to 180.5K eligible users.

<div class="alert alert-block alert-success">
    <H1>How big is the potential market for N26 Installments in France, Italy and Spain? (Part II)</H1>
</div>

This research comes as a follow-up to a [previous investigation](https://research.tech26.de/reports/20210705_potential_european_market_TBIL.html) where we looked into how many potential users we have for this product in the French, Spanish, and Italian markets, based on data from the first half of 2021. At the time we looked into users that had Lisbon scores between 7 (best eligible score) and 12 (worst eligible score), and this time around, in order to reduce risk, we'll only be looking into scores between 7 and 9.

In order to assess the potential eligibility of these users, we'll be looking into 2 main factors:
 - Users in these markets should have a Lisbon score between 7 and 9 on September 27th 2021. Since credit scoring with Lisbon isn't yet live in these markets, we will be looking into Beta scores. For this we extracted CSV files with these scores directly from Lisbon, since they aren't being passed into our DWH.
 - Users should also have had at least 1 transaction between 50€ and 500€ between January 1st and August 31st 2021 with a merchant that hasn't been excluded from this product. [Here](https://docs.google.com/spreadsheets/d/1Bn-0BHBPztNrjDj1yudYwDDjJFzLnmDXAbmEMApMFJ4/edit#gid=975436947) you can find a comprehensive list of the excluded MCCs.
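
As a rough illustration of how the two criteria combine, here is a minimal pandas sketch on toy data. The function name `eligible_users`, the `amount_eur` column and the three-item `EXCLUDED_MCCS` set are assumptions made for this example; the notebook itself applies the rule in SQL against the full excluded-MCC list.

```python
import pandas as pd

# Illustrative subset of the excluded MCCs; the full list lives in the linked sheet.
EXCLUDED_MCCS = {"5411", "5541", "6011"}


def eligible_users(scores, txns, min_score=7, max_score=9, min_eur=50, max_eur=500):
    """Return the sorted user_ids that pass both the score and the transaction criterion."""
    # Criterion 1: Lisbon rating class inside the accepted range.
    good_score = scores.loc[
        scores["rating_class"].between(min_score, max_score), "user_id"
    ]
    # Criterion 2: at least one transaction in the amount range at a non-excluded merchant.
    amount_ok = txns["amount_eur"].between(min_eur, max_eur)
    merchant_ok = ~txns["mcc"].isin(EXCLUDED_MCCS)
    has_txn = txns.loc[amount_ok & merchant_ok, "user_id"]
    return sorted(set(good_score) & set(has_txn))


# Hypothetical toy data: user "a" passes both checks, "b" fails on score,
# "c" fails because the only qualifying amount is at an excluded MCC.
scores = pd.DataFrame({"user_id": ["a", "b", "c"], "rating_class": [7, 10, 9]})
txns = pd.DataFrame(
    {
        "user_id": ["a", "a", "b", "c"],
        "amount_eur": [120.0, 20.0, 300.0, 75.0],
        "mcc": ["5651", "5411", "5651", "5411"],
    }
)
result = eligible_users(scores, txns)
print(result)
```

The thresholds are keyword arguments so that the wider 7 to 12 score range from the previous report can be compared against the 7 to 9 range used here.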
 
We will be answering the following questions:
 - [How many users in these markets have a favorable score?](#section1)
 - [How many MAUs in these markets have eligible scores and transactions?](#section2)
 - [How much are eligible users spending in eligible transactions?](#section3)
 - [Which merchant categories get the biggest portion of the eligible transactions?](#section4)
 
The main take-away of this research is that excluding users with riskier scores (10 through 12) from the expansion of Installments into new markets (France, Spain and Italy) would limit the risk of launching in these markets, but would also reduce the potential user base from 513K to 180.5K eligible users.

In [1]:
%%capture
cd /app/

In [2]:
%%capture

!pip install duckdb
!pip install altair

In [3]:
import pandas as pd
from utils.datalib_database import df_from_sql
import utils.altair_functions as af
import duckdb

con = duckdb.connect(database=":memory:", read_only=False)

In [4]:
esp_df = pd.read_csv(
    "research/product/bank_products/20210920_potential_european_market_TBIL_v2/ls_pd_esp.csv"
)
fra_df = pd.read_csv(
    "research/product/bank_products/20210920_potential_european_market_TBIL_v2/ls_pd_fra.csv"
)
ita_df = pd.read_csv(
    "research/product/bank_products/20210920_potential_european_market_TBIL_v2/ls_pd_ita.csv"
)

esp_df["market"] = "ESP"
fra_df["market"] = "FRA"
ita_df["market"] = "ITA"

input_df = pd.concat([esp_df, fra_df, ita_df])

<a id='section1'></a>
# How many users in these markets have a favorable score?

Below we can see that less than 30% of users in each of the selected markets have a favorable Lisbon score. Even though Italy has the highest percentage of favorable scores (27.2%), France has the highest number of eligible users (about 93.4K).

In [5]:
scores_query = """
select 
market,
round(count(case when rating_class <= 9 then 1 end)::numeric/1000, 1) as eligible_k_users,
round(count(case when rating_class > 9 then 1 end)::numeric/1000, 1) as non_eligible_k_users,
round(count(case when rating_class <= 9 then 1 end)::numeric/count(*)::numeric, 3)*100 as perc_eligible_users
from input_df 
group by 1
order by 1
"""
con.register("input_df", input_df)
scores_df = con.execute(scores_query).fetchdf()
scores_df

| market | eligible_k_users | non_eligible_k_users | perc_eligible_users |
|---|---|---|---|
| ESP | 34.2 | 96.2 | 26.2 |
| FRA | 93.4 | 376.4 | 19.9 |
| ITA | 73.3 | 196.4 | 27.2 |
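
The percentages in the last column follow directly from the two count columns. As a quick sanity check, the pandas sketch below recomputes them from the rounded counts reported above. The DataFrame is typed in by hand for illustration and is not read from the notebook's data.

```python
import pandas as pd

# Rounded counts (in thousands) copied from the output above.
scores = pd.DataFrame(
    {
        "market": ["ESP", "FRA", "ITA"],
        "eligible_k_users": [34.2, 93.4, 73.3],
        "non_eligible_k_users": [96.2, 376.4, 196.4],
    }
)

# Share of eligible users = eligible / (eligible + non-eligible), in percent.
total_k_users = scores["eligible_k_users"] + scores["non_eligible_k_users"]
scores["perc_eligible_users"] = (
    scores["eligible_k_users"] / total_k_users * 100
).round(1)
print(scores)
```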


In [6]:
af.bar_single_label(
    scores_df, af.petrol, "perc_eligible_users", "market", 700, 200, "-x"
).properties(title="Ratio of Lisbon Eligible Users per country")

In [7]:
maus_txns_df = df_from_sql(
    "redshiftreader",
    """with maus_last_6_mo as (
select 
user_id, 
tnc_country_group
from dwh_cohort_weeks as d
inner join dbt.zrh_users as u
on d.end_time >= u.user_created
and d.end_time between '2021-01-01' and '2021-08-31'
and tnc_country_group in ('ESP', 'FRA', 'ITA')
inner join dbt.zrh_user_activity_txn as a
on a.user_created = u.user_created
and d.end_time between a.activity_start and least(u.closed_at,a.activity_end)
and activity_type = '1_tx_35'
group by 1, 2
),
eligible_txns_last_6_mo as ( 
select 
user_id,
count(*) as n_txns, 
sum(amount_cents_eur)::numeric/100 as sum_amount_eur,
sum_amount_eur::numeric/n_txns::numeric as avg_volume_eur_per_txn
from dbt.zrh_card_transactions
inner join dbt.zrh_users using(user_created)
where type = 'PT' 
and mcc not in (
'1343','1381','1454','1500','1761','1771','2490','3014','3028','3039',
'3048','3049','3267','3302','3351','3352','3353','3354','3356','3358',
'3359','3360','3361','3362','3363','3364','3365','3366','3367','3368',
'3369','3370','3371','3372','3373','3374','3375','3376','3377','3378',
'3379','3380','3381','3382','3383','3384','3385','3386','3387','3388',
'3390','3391','3392','3393','3394','3395','3396','3397','3398','3399',
'3400','3401','3402','3403','3404','3406','3407','3408','3409','3410',
'3411','3412','3413','3414','3415','3416','3417','3418','3419','3420',
'3421','3422','3423','3424','3425','3426','3427','3428','3429','3430',
'3431','3432','3433','3434','3435','3436','3437','3438','3439','3440',
'3520','3526','3544','3615','3635','3641','3670','3672','3716','3754',
'3777','3824','4121','4411','4511','4722','4812','4814','4816','4829',
'5039','5094','5122','5231','5310','5399','5411','5541','5812','5814',
'5815','5816','5912','5933','5960','5962','5963','5964','5965','5966',
'5967','5972','5983','5993','6011','6012','6051','6211','6513','6538',
'6540','7012','7021','7273','7277','7311','7361','7372','7399','7519',
'7841','7922','7988','7993','7994','7995','7997','8062','8071','8099',
'8211','8389','8999','9222','9223','9311','9399','9405','5813','5499',
'7523','8111','5948','5047','6300','5099','1520','7997','8299','3790') --Blacklisted MCCs provided by product
and created::date between '2021-01-01' and '2021-08-31'
and tnc_country_group in ('ESP', 'FRA', 'ITA')
and amount_cents_eur::numeric/100 between 50 and 499
group by 1
)
select *
from maus_last_6_mo 
left join eligible_txns_last_6_mo using (user_id)
""",
)

<a id='section2'></a>
# How many MAUs in these markets have eligible scores and transactions?

In total, we found about 180.5K eligible users in these 3 markets. Even though Italy has the highest percentage of MAUs with eligible scores and transactions out of all MAUs in their population, once again France has by far the highest number of eligible MAUs (82.9K).
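
The funnel behind these numbers (all MAUs, MAUs with an eligible score, MAUs with an eligible transaction, MAUs with both) can be sketched in pandas as follows. This is an illustrative equivalent of the conditional counts used in the SQL, run on hypothetical toy data; the flag columns `elig_score`, `elig_txn` and `elig_both` are names invented for this example.

```python
import pandas as pd

# Toy stand-in for the MAU table: n_txns is missing when the user
# had no eligible transaction in the period.
maus = pd.DataFrame(
    {
        "user_id": ["a", "b", "c", "d"],
        "market": ["FRA", "FRA", "ITA", "ITA"],
        "n_txns": [3, None, 1, 2],
    }
)
eligible_score_ids = {"a", "b", "d"}  # hypothetical users with a score <= 9

maus["elig_score"] = maus["user_id"].isin(eligible_score_ids)
maus["elig_txn"] = maus["n_txns"].notna()
maus["elig_both"] = maus["elig_score"] & maus["elig_txn"]

# One row per market, one column per funnel step.
funnel = maus.groupby("market").agg(
    n_maus=("user_id", "size"),
    n_elig_scores=("elig_score", "sum"),
    n_elig_txns=("elig_txn", "sum"),
    n_elig_both=("elig_both", "sum"),
)
funnel["perc_elig_both"] = (funnel["n_elig_both"] / funnel["n_maus"] * 100).round(1)
print(funnel)
```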

In [8]:
users_query = """
with elig_users as (
select distinct user_id 
from input_df
where rating_class <= 9
),
unions as (
select tnc_country_group as market, '  n_maus' as label, count(*)::numeric/1000 as n_users_k 
from maus_txns_df 
group by 1
union all
select tnc_country_group, ' n_maus_elig_scores', round(count(case when e.user_id is not null then 1 end)::numeric/1000, 1)
from maus_txns_df 
left join elig_users e using (user_id)
group by 1
union all 
select tnc_country_group, '  n_maus_elig_txns', round(count(case when n_txns is not null then 1 end)::numeric/1000, 1)
from maus_txns_df 
left join elig_users e using (user_id)
group by 1
union all
select tnc_country_group, 'n_maus_elig_scores_and_txns', round(count(case when e.user_id is not null and n_txns is not null then 1 end)::numeric/1000, 1)
from maus_txns_df 
left join elig_users e using (user_id)
group by 1
), 
n_maus as (
select market, n_users_k as n_maus_k
from unions where trim(label) = 'n_maus'
)
select *,
round(n_users_k::numeric/ n_maus_k::numeric, 3)*100 as perc_users
from unions
inner join n_maus using(market)
"""
con.register("maus_txns_df", maus_txns_df)
grouped_users_df = con.execute(users_query).fetchdf()
grouped_users_df

| market | label | n_users_k | n_maus_k | perc_users |
|---|---|---|---|---|
| ITA | n_maus | 388.016 | 388.016 | 100.0 |
| FRA | n_maus | 703.066 | 703.066 | 100.0 |
| ESP | n_maus | 229.1 | 229.1 | 100.0 |
| FRA | n_maus_elig_scores | 93.4 | 703.066 | 13.3 |
| ITA | n_maus_elig_scores | 73.3 | 388.016 | 18.9 |
| ESP | n_maus_elig_scores | 34.2 | 229.1 | 14.9 |
| FRA | n_maus_elig_txns | 380.9 | 703.066 | 54.2 |
| ITA | n_maus_elig_txns | 209.0 | 388.016 | 53.9 |
| ESP | n_maus_elig_txns | 97.6 | 229.1 | 42.6 |
| FRA | n_maus_elig_scores_and_txns | 82.9 | 703.066 | 11.8 |


In [9]:
af.column_multi(
    grouped_users_df, "market:N", "label:N", "perc_users:Q", "market:N", 100, 400, "x"
).properties(title="Ratio of eligible MAUs per country")

In [10]:
af.bar_single_label(
    grouped_users_df[grouped_users_df["label"] == "n_maus_elig_scores_and_txns"],
    af.petrol,
    "n_users_k:Q",
    "market",
    700,
    200,
    "-x",
).properties(title="Number of eligible MAUs per country")

In [11]:
elig_users_query = """ 
select user_id, 
trim(tnc_country_group) as tnc_country_group
from maus_txns_df
left join input_df i using (user_id)
where i.user_id is not null 
and n_txns is not null 
and rating_class <= 9
"""
elig_users_df = con.execute(elig_users_query).fetchdf()

In [12]:
elig_users_ids = elig_users_df.user_id.tolist()
elig_users_ids = ",".join(map("'{0}'".format, elig_users_ids))

In [14]:
mcc_df = df_from_sql(
    "redshiftreader",
    """
select 
tnc_country_group,
mcc_category,
case when amount_cents_eur::numeric/100 < 100 then '>= 50 and < 100'
when amount_cents_eur::numeric/100 < 200 then '>= 100 and < 200'
when amount_cents_eur::numeric/100 < 300 then '>= 200 and < 300'
when amount_cents_eur::numeric/100 < 400 then '>= 300 and < 400'
when amount_cents_eur::numeric/100 < 500 then '>= 400 and < 500'
end as loan_buckets,
count(*) as n_txns, 
count(distinct user_created) as n_users,
sum(amount_cents_eur)::numeric/100 as sum_amount_eur
from dbt.zrh_card_transactions 
inner join dbt.zrh_users using(user_created)
where amount_cents_eur::numeric/100 between 50 and 499
and mcc not in (
'1343','1381','1454','1500','1761','1771','2490','3014','3028','3039',
'3048','3049','3267','3302','3351','3352','3353','3354','3356','3358',
'3359','3360','3361','3362','3363','3364','3365','3366','3367','3368',
'3369','3370','3371','3372','3373','3374','3375','3376','3377','3378',
'3379','3380','3381','3382','3383','3384','3385','3386','3387','3388',
'3390','3391','3392','3393','3394','3395','3396','3397','3398','3399',
'3400','3401','3402','3403','3404','3406','3407','3408','3409','3410',
'3411','3412','3413','3414','3415','3416','3417','3418','3419','3420',
'3421','3422','3423','3424','3425','3426','3427','3428','3429','3430',
'3431','3432','3433','3434','3435','3436','3437','3438','3439','3440',
'3520','3526','3544','3615','3635','3641','3670','3672','3716','3754',
'3777','3824','4121','4411','4511','4722','4812','4814','4816','4829',
'5039','5094','5122','5231','5310','5399','5411','5541','5812','5814',
'5815','5816','5912','5933','5960','5962','5963','5964','5965','5966',
'5967','5972','5983','5993','6011','6012','6051','6211','6513','6538',
'6540','7012','7021','7273','7277','7311','7361','7372','7399','7519',
'7841','7922','7988','7993','7994','7995','7997','8062','8071','8099',
'8211','8389','8999','9222','9223','9311','9399','9405','5813','5499',
'7523','8111','5948','5047','6300','5099','1520','7997','8299','3790') --Blacklisted MCCs provided by product
and created::date between '2021-01-01' and '2021-08-31'
and user_id in("""
    + elig_users_ids
    + """)
group by 1, 2, 3
""",
)

In [15]:
mcc_df["tnc_country_group"] = mcc_df["tnc_country_group"].str.strip()

<a id='section3'></a>
# How much are eligible users spending in eligible transactions?

Overall, users spent 515M€ in eligible transactions between January and August 2021. Since France has the highest number of eligible users, it makes sense that it also has the highest transaction volume (267M€). In all markets, users make many more transactions below 200€, so even though each transaction in these buckets is smaller, together they account for the majority of the transaction volume.
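
The transaction buckets used in this section are half-open intervals (lower bound included, upper bound excluded). A minimal sketch of the same bucketing with `pandas.cut`, on made-up amounts, is shown below; the column name `amount_eur` and the sample values are assumptions for illustration only.

```python
import pandas as pd

# Made-up transaction amounts in euros.
txns = pd.DataFrame({"amount_eur": [50.0, 99.99, 100.0, 250.0, 499.0, 60.0]})

edges = [50, 100, 200, 300, 400, 500]
labels = [
    ">= 50 and < 100",
    ">= 100 and < 200",
    ">= 200 and < 300",
    ">= 300 and < 400",
    ">= 400 and < 500",
]

# right=False makes every bucket [lower, upper), matching the labels.
txns["loan_bucket"] = pd.cut(
    txns["amount_eur"], bins=edges, labels=labels, right=False
)

# Number of transactions and total volume per bucket; observed=True
# drops buckets that have no transactions in this toy sample.
summary = txns.groupby("loan_bucket", observed=True)["amount_eur"].agg(
    n_txns="count", volume_eur="sum"
)
print(summary)
```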

In [16]:
txn_bucket_query = """
select
tnc_country_group as market, 
case when loan_buckets = '>= 50 and < 100' then ' >= 50 and < 100'
else loan_buckets end as loan_buckets, -- leading space makes the lowest bucket sort first
round(sum(sum_amount_eur)::float/1000000::int) as total_txn_volume_M€, 
sum(sum_amount_eur)::numeric/ sum(n_users)::numeric as avg_volume_per_user, 
sum(n_txns)::numeric/ sum(n_users)::numeric as avg_txns_per_user, 
sum(sum_amount_eur)::numeric/ sum(n_txns)::numeric as avg_volume_per_txn
from mcc_df 
group by 1, 2
order by 2
"""

In [17]:
con.register("mcc_df", mcc_df)
txn_bucket_df = con.execute(txn_bucket_query).fetchdf()
# Sum only the volume column per market; summing the per-bucket averages would be meaningless.
grouped_txns = txn_bucket_df.groupby("market", as_index=False)[
    "total_txn_volume_m€"
].sum()
grouped_txns

| market | total_txn_volume_m€ |
|---|---|
| ESP | 68.0 |
| FRA | 267.0 |
| ITA | 180.0 |


In [18]:
af.column_multi(
    txn_bucket_df,
    "market:N",
    "loan_buckets:N",
    "total_txn_volume_m€:Q",
    "market:N",
    250,
    400,
    "x",
).properties(title="Volume of transactions per transaction bucket and country")

In [19]:
af.column_multi(
    txn_bucket_df,
    "market:N",
    "market:N",
    "avg_txns_per_user:Q",
    "loan_buckets:N",
    100,
    400,
    "x",
).properties(
    title="Average number of transactions per user per transaction bucket and country"
)

<a id='section4'></a>
# Which merchant categories get the biggest portion of the eligible transactions?

The answer to this question is quite different for each of the selected markets:
 - Looking at total transaction volume and excluding the uncategorized merchant group (no_cat), clothing department stores have the highest volume in Spain and Italy (10.7M€ and 36.9M€ respectively) and the third highest in France, where household stores lead with 46.1M€.

In [20]:
mcc_cat_query = """
select
tnc_country_group as market, 
case when trim(mcc_category) = 'clothing_depart_store' then 'clothing_dpt_store'
when trim(mcc_category) = 'computer_electronic_stores' then 'pc_electronic_stores'
when trim(mcc_category) = 'money_cash_financial' then 'money_cash_fin'
when trim(mcc_category) = 'local_transport_railway' then 'local_trans_railway'
when trim(mcc_category) = 'gambling_gaming' then 'gambling'
else trim(mcc_category) end as mcc_category,
round(sum(sum_amount_eur)::numeric/1000000, 1) as total_txn_volume_M€, 
sum(n_users),
round(sum(sum_amount_eur)::numeric/ sum(n_users)::numeric, 1) as avg_volume_per_user, 
round(sum(n_txns)::numeric/ sum(n_users)::numeric, 1) as avg_txns_per_user, 
round(sum(sum_amount_eur)::numeric/ sum(n_txns)::numeric) as avg_volume_per_txn
from mcc_df 
group by 1, 2
order by 2
"""

In [21]:
mcc_cat_df = con.execute(mcc_cat_query).fetchdf()

In [22]:
af.column_multi(
    mcc_cat_df,
    "market:N",
    "mcc_category:N",
    "total_txn_volume_m€:Q",
    "market:N",
    400,
    400,
    "-y",
).properties(
    title="Total transaction volume in million euros per merchant group and country"
)

In [23]:
mcc_cat_top10_query = """
with ita as (
select
tnc_country_group as market, 
mcc_category,
sum(sum_amount_eur) as total_txn_volume_M€, 
round(sum(sum_amount_eur)::numeric/ sum(n_users)::numeric, 1) as avg_volume_per_user, 
round(sum(n_txns)::numeric/ sum(n_users)::numeric, 1) as avg_txns_per_user, 
round(sum(sum_amount_eur)::numeric/ sum(n_txns)::numeric) as avg_volume_per_txn
from mcc_df 
where tnc_country_group = 'ITA'
group by 1, 2
order by 3 desc 
limit 10
), 
fra as(
select
tnc_country_group as market, 
mcc_category,
sum(sum_amount_eur) as total_txn_volume_M€, 
round(sum(sum_amount_eur)::numeric/ sum(n_users)::numeric, 1) as avg_volume_per_user, 
round(sum(n_txns)::numeric/ sum(n_users)::numeric, 1) as avg_txns_per_user, 
round(sum(sum_amount_eur)::numeric/ sum(n_txns)::numeric) as avg_volume_per_txn
from mcc_df 
where tnc_country_group = 'FRA'
group by 1, 2
order by 3 desc 
limit 10
), 
esp as (
select
tnc_country_group as market, 
mcc_category,
sum(sum_amount_eur) as total_txn_volume_M€, 
round(sum(sum_amount_eur)::numeric/ sum(n_users)::numeric, 1) as avg_volume_per_user, 
round(sum(n_txns)::numeric/ sum(n_users)::numeric, 1) as avg_txns_per_user, 
round(sum(sum_amount_eur)::numeric/ sum(n_txns)::numeric) as avg_volume_per_txn
from mcc_df 
where tnc_country_group = 'ESP'
group by 1, 2
order by 3 desc 
limit 10
)
select * from ita 
union all select * from fra
union all select * from esp
"""

In [24]:
mcc_cat_top10_df = con.execute(mcc_cat_top10_query).fetchdf()

After identifying the top 10 merchant groups in each country based on the volume of transactions, we will look into these on the user/transaction level:
 - As for the average number of transactions per user for the top 10 groups, the gas service station group is the highest one in France and Italy, with an average of 6 transactions in both, while clothing department stores lead in Spain (with 3.9 transactions per user).
 - With regard to the average volume per transaction, airline is on top in France (with 178€ per transaction), hotel lodging leads in Italy (with 152€ on average), and car rental has the highest average in Spain (173€).
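
Selecting the top merchant groups per country by volume is a "top N per group" problem. The notebook solves it in SQL with one block per market; the sketch below shows one possible pandas equivalent on invented numbers. The helper name `top_n_per_market` and the sample rows are hypothetical.

```python
import pandas as pd

# Invented per-category totals for two markets.
mcc = pd.DataFrame(
    {
        "market": ["FRA", "FRA", "FRA", "ITA", "ITA", "ITA"],
        "mcc_category": [
            "airline", "household_stores", "gas_station",
            "hotel", "clothing", "gas_station",
        ],
        "sum_amount_eur": [300.0, 900.0, 600.0, 200.0, 800.0, 500.0],
        "n_txns": [2, 9, 12, 1, 8, 10],
    }
)


def top_n_per_market(df, n):
    """Keep the n merchant categories with the highest volume in each market."""
    agg = df.groupby(["market", "mcc_category"], as_index=False)[
        ["sum_amount_eur", "n_txns"]
    ].sum()
    agg["avg_volume_per_txn"] = agg["sum_amount_eur"] / agg["n_txns"]
    # Sort by volume within each market, then take the first n rows per market.
    ranked = agg.sort_values(["market", "sum_amount_eur"], ascending=[True, False])
    return ranked.groupby("market").head(n).reset_index(drop=True)


top2 = top_n_per_market(mcc, 2)
print(top2)
```

Because the ranking is done within each market in a single pass, adding a fourth market would not require another copy of the query.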

In [25]:
af.column_multi(
    mcc_cat_top10_df,
    "market:N",
    "mcc_category:N",
    "avg_txns_per_user:Q",
    "market:N",
    250,
    400,
    "x",
).properties(
    title="Avg Transactions per user for the top 10 merchant groups per country"
)

In [26]:
af.column_multi(
    mcc_cat_top10_df,
    "market:N",
    "mcc_category:N",
    "avg_volume_per_txn:Q",
    "market:N",
    250,
    400,
    "x",
).properties(title="Avg Volume per txn for the top 10 merchant groups per country")

# Appendix

### Could we use Klarna transactions as a proxy for the appetite for this product in the selected markets?

By looking at direct debit requests with the keyword 'klarna' in the creditor name, we found only a very small sample of users using Klarna outside of Germany between January and August 2021: 119 users in Italy, 97 in France and 89 in Spain, compared with about 39.1K in Germany.
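
The filter behind this count has two parts: a case-insensitive match on the creditor name and a conversion of the creation time from epoch milliseconds into a timestamp. A small pandas sketch of both steps, on invented rows, is given here; the column names `created_ms` and `created_at` are assumptions for this example.

```python
import pandas as pd

# Invented direct debit requests; created_ms is in epoch milliseconds.
dd = pd.DataFrame(
    {
        "creditor_name": ["Klarna Bank AB", "KLARNA GMBH", "Some Gym", "klarna"],
        "created_ms": [1609459200000, 1625097600000, 1612137600000, 1633046400000],
        "user_id": ["a", "a", "b", "c"],
    }
)

# Epoch milliseconds to timestamps.
dd["created_at"] = pd.to_datetime(dd["created_ms"], unit="ms")

# Case-insensitive substring match, similar in spirit to ILIKE '%KLARNA%'.
is_klarna = dd["creditor_name"].str.contains("klarna", case=False, regex=False)
in_window = dd["created_at"].between(
    pd.Timestamp("2021-01-01"), pd.Timestamp("2021-08-31")
)

klarna = dd[is_klarna & in_window]
n_txns = len(klarna)
n_users = klarna["user_id"].nunique()
print(n_txns, n_users)
```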

In [27]:
klarna_txns_df = df_from_sql(
    "redshiftreader",
    """
select
tnc_country_group,
count(*) as n_tnxs,
count(distinct user_id) as n_users
from etl_reporting.ch_direct_debit_request_incoming dd
inner join etl_reporting.cr_account_external_id ea
on dd.debitor_iban = ea.external_id
inner join dbt.zrh_users using (account_id)
where creditor_name ilike '%KLARNA%'
and (timestamp 'epoch' + dd.created/1000 * INTERVAL '1 Second ') between '2021-01-01' and '2021-08-31'
group by 1
order by 2 desc
""",
)

In [28]:
klarna_txns_df

| tnc_country_group | n_tnxs | n_users |
|---|---|---|
| DEU | 409377 | 39101 |
| AUT | 9651 | 2154 |
| GrE | 2851 | 473 |
| NEuro | 981 | 131 |
| ITA | 795 | 119 |
| FRA | 793 | 97 |
| ESP | 544 | 89 |


In [None]:
# Below you can find the queries used to build a business case to assess the risk of getting this product
# into those new markets
export_input_df = df_from_sql(
    "redshiftreader",
    """
with current_arrears_users as (
select 
distinct user_created
from cbs_arrear_original 
inner join dbt.zrh_users using (user_created)
where closed_at is null
and end_tstmp is null 
)
select 
to_char(created, 'YYYY-MM') as txn_month, 
user_id,
tnc_country_group,
legal_entity,
case when amount_cents_eur::numeric/100 < 100 then '50 - 99'
when amount_cents_eur::numeric/100 < 150 then '100 - 149'
when amount_cents_eur::numeric/100 < 200 then '150 - 199'
when amount_cents_eur::numeric/100 < 250 then '200 - 249'
when amount_cents_eur::numeric/100 < 300 then '250 - 299'
when amount_cents_eur::numeric/100 < 350 then '300 - 349'
when amount_cents_eur::numeric/100 < 400 then '350 - 399'
when amount_cents_eur::numeric/100 < 450 then '400 - 449'
when amount_cents_eur::numeric/100 <= 500 then '450 - 500'
end as loan_buckets,
count(*) as n_txns,
sum(amount_cents_eur)::numeric/100 as sum_amount_eur
from dbt.zrh_card_transactions 
inner join dbt.zrh_users using(user_created)
left join current_arrears_users cau using (user_created)
where cau.user_created is null
and amount_cents_eur::numeric/100 between 50 and 500
and mcc not in (
'1343','1381','1454','1500','1761','1771','2490','3014','3028','3039',
'3048','3049','3267','3302','3351','3352','3353','3354','3356','3358',
'3359','3360','3361','3362','3363','3364','3365','3366','3367','3368',
'3369','3370','3371','3372','3373','3374','3375','3376','3377','3378',
'3379','3380','3381','3382','3383','3384','3385','3386','3387','3388',
'3390','3391','3392','3393','3394','3395','3396','3397','3398','3399',
'3400','3401','3402','3403','3404','3406','3407','3408','3409','3410',
'3411','3412','3413','3414','3415','3416','3417','3418','3419','3420',
'3421','3422','3423','3424','3425','3426','3427','3428','3429','3430',
'3431','3432','3433','3434','3435','3436','3437','3438','3439','3440',
'3520','3526','3544','3615','3635','3641','3670','3672','3716','3754',
'3777','3824','4121','4411','4511','4722','4812','4814','4816','4829',
'5039','5094','5122','5231','5310','5399','5411','5541','5812','5814',
'5815','5816','5912','5933','5960','5962','5963','5964','5965','5966',
'5967','5972','5983','5993','6011','6012','6051','6211','6513','6538',
'6540','7012','7021','7273','7277','7311','7361','7372','7399','7519',
'7841','7922','7988','7993','7994','7995','7997','8062','8071','8099',
'8211','8389','8999','9222','9223','9311','9399','9405','5813','5499',
'7523','8111','5948','5047','6300','5099','1520','7997','8299','3790') --Blacklisted MCCs provided by product
and created::date between '2020-11-01' and '2021-08-31'
and is_mau
and not is_fraudster
and tnc_country_group in ('FRA', 'ESP', 'ITA')
group by 1, 2, 3, 4, 5
""",
)

In [None]:
legal_entities_df = df_from_sql(
    "redshiftreader",
    """
select user_id, legal_entity from dbt.zrh_users 
where tnc_country_group in ('FRA', 'ESP', 'ITA')
""",
)

In [None]:
export_query = """
select 
txn_month || '-01' as txn_month,
tnc_country_group,
legal_entity,
loan_buckets,
rating_class,
round(pd, 2) as pd,
count(*) as n_users
from export_input_df
inner join input_df using (user_id)
where rating_class <= 12
group by 1, 2, 3, 4, 5, 6
order by 1, 2, 3, 4, 5, 6
"""
con.register("export_input_df", export_input_df)
export_df = con.execute(export_query).fetchdf()

In [None]:
export_grouped_query = """
with all_elig_users as (
select 
tnc_country,
legal_entity,
rating_class,
round(pd, 2) as pd,
count(*) as n_elig_users
from input_df
inner join legal_entities_df using (user_id)
where rating_class <= 12
group by 1, 2, 3, 4
),
txn_users as (
select 
txn_month || '-01' as txn_month,
tnc_country,
legal_entity,
rating_class,
round(pd, 2) as pd,
count(distinct user_id) as n_txn_users
from export_input_df
inner join input_df using (user_id)
where rating_class <= 12
group by 1, 2, 3, 4, 5
)
select 
txn_month,
tnc_country,
legal_entity,
rating_class,
pd,
n_txn_users,
n_elig_users - n_txn_users as n_elig_users_without_transactions,
n_elig_users
from txn_users
inner join all_elig_users using (tnc_country, legal_entity, rating_class, pd)
order by 1, 2, 3, 4, 5
"""
con.register("legal_entities_df", legal_entities_df)
export_grouped_df = con.execute(export_grouped_query).fetchdf()

In [None]:
export_df.to_csv(
    "research/product/bank_products/[UPDATED]20210705_potential_european_market_TBIL/gsheets_output.csv",
    sep="\t",
)

In [None]:
export_grouped_df.to_csv(
    "research/product/bank_products/[UPDATED]20210705_potential_european_market_TBIL/gsheets_grouped_output.csv",
    sep="\t",
)