title: Which User Clusters are most common for our Bank Products users?
author: Helder Silva 
date: 2021-05-11
region: EU
link: https://docs.google.com/presentation/d/10jBvJM6MYIgp_NopTuOw1EJbMIu1H_v06VIzQbVIoIY
tags: user clusters, early activity, bank products, installment loans, overdraft, easyflex savings, consumer credit
summary: In this deep dive we look into how the Early Users Clusters research by Wendy Vu applies to the users who held one of our Bank Products on April 30th 2021. The share of Holding Account users is lower than the baseline for all products except Fixed Term Savings (which makes sense, since this product is designed for long-term savings and doesn't require user activity on its own). On the other hand, the share of Barely Active users is drastically higher for users in Overdraft Arrears and users Using Overdraft (40 and 18 percentage points above the baseline, respectively). Also, the proportion of Spaces Power Users among savings users is higher than in the baseline or any other Bank Product (31 percentage points higher for EasyFlex Savings and 16 percentage points higher for Fixed Term Savings).

In [43]:
!pip install duckdb

<div class="alert alert-block alert-success">
    <H1>Which User Clusters are most common for our Bank Products users?</H1>
</div>

In this deep dive we'll be looking into how the [Early Users Clusters](https://docs.google.com/presentation/d/1lLOP1JsVTasj2hTf6Mseu0BHg0NZslY1lq-pyPN8kZk) research by Wendy Vu applies to our Bank Products users that were using these products on April 30th 2021. Our Bank Products at the time of this research are:
 - **Overdraft.** Since we have 130K+ users with this product, we are splitting it into 3 buckets:
      - **Overdraft - Not Using** (users that have enabled the overdraft product and didn't have a negative balance at the time of this research)
      - **Overdraft - Using** (users that have enabled the overdraft product and had a negative balance within their allowed limit at the time of this research)
      - **Overdraft - In Arrears** (users that have enabled the overdraft product and had a negative balance higher than their allowed limit at the time of this research)
 - **Consumer Credit**
 - **Transaction Based Instalment Loans** (referred to as TBIL henceforth)
 - **Fixed Term Savings**
 - **EasyFlex Savings**
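
The three Overdraft buckets above can be sketched as a small classifier. This is a minimal illustration in plain Python; the function name `overdraft_bucket` and its arguments are hypothetical, and it assumes the outstanding balance is stored as a positive amount owed (and is missing when the user has no negative balance), which is how the query further down treats it.

```python
from typing import Optional


def overdraft_bucket(outstanding_balance_eur: Optional[float], max_amount_eur: float) -> str:
    """Assign an overdraft-enabled user to one of the three buckets.

    Hypothetical helper: `outstanding_balance_eur` is assumed to be the
    positive amount owed, or None when the user has no negative balance.
    """
    if outstanding_balance_eur is None:
        return "Overdraft - Not Using"
    if outstanding_balance_eur > max_amount_eur:
        # Owes more than the allowed limit
        return "Overdraft - In Arrears"
    return "Overdraft - Using"


# Illustrative values only
print(overdraft_bucket(None, 500.0))
print(overdraft_bucket(120.0, 500.0))
print(overdraft_bucket(650.0, 500.0))
```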
 
 In this deep dive, we'll be answering the following questions:
  - [How many early clusters are missing for each Bank Product?](#section1)
  - [What is the proportion of early user clusters per Bank Product?](#section2)
  - [How does each Bank Product differ from the baseline?](#section3)
  - [What is the proportion of bank product users in the DEU baseline?](#section4)
  - [Are there any early cluster trends for written-off users?](#section5)
  - [How many users sign up for a Bank Product after the assignment of the early clusters?](#section6)
  - [Do we find the same patterns for users who got their Bank Product after their early activity?](#section7)
  
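 Here are our main findings:
  - The share of Holding Account users is lower than the baseline for all products except Fixed Term Savings.
  - The share of Barely Active users is drastically higher for users in Overdraft Arrears and users Using Overdraft (40 and 18 percentage points above the baseline, respectively).
  - The proportion of Spaces Power Users among savings users is higher than in the baseline or any other Bank Product (31 percentage points higher for EasyFlex Savings and 16 for Fixed Term Savings).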

In [44]:
import pandas as pd
from utils.datalib_database import df_from_sql

import utils.altair_functions as af
import altair as alt

import duckdb

con = duckdb.connect(database=":memory:", read_only=False)

In [3]:
bp_users_query = """
with bp_users as (
select 
case when outstanding_balance_eur is null then 'Overdraft - Not Using'
when outstanding_balance_eur > max_amount_cents::numeric/100  then 'Overdraft - In Arrears'
else 'Overdraft - Using' end as label,
user_created
from dbt.bp_overdraft_users od
where od_enabled_flag 
and end_time::date = '2021-04-30'
and timeframe = 'month'
union all 
select 
'FT savings users' as label, 
user_created
from dbt.bank_products_users
where has_ft_savings
and end_time::date = '2021-04-30'
union all 
select 
'Easyflex savings users' as label,
user_created
from dbt.bank_products_users
where has_easyflex_savings
and end_time::date = '2021-04-30'
union all 
select 
'Credit users' as label, 
user_created
from dbt.bank_products_users
where has_consumer_credit
and end_time::date = '2021-04-30'
union all 
select 
'TBIL users' as label, 
user_created
from dbt.bank_products_users
where has_tbil
and end_time::date = '2021-04-30'
)
select bp.*, uc.*, tnc_country_group
from bp_users bp
inner join dbt.zrh_users using (user_created)
left join dev_dbt.early_cluster_3M_labels_April142021 uc using (user_id)
"""

In [4]:
grouped_clusters_query = """
select
cluster_12, 
early_cluster,
case when early_cluster in ('E01','E02','E03','E04','E05','E06','E12') then  'High_Activity'
when early_cluster in ('E07','E08','E09','E10','E11') then 'Low_Activity' end as activity_group, 
case when early_cluster = 'E01' then 'Spaces Power Users'
when early_cluster = 'E02' then 'International Travelers'
when early_cluster = 'E03' then  'Euro Travelers'
when early_cluster = 'E04' then  'Cash26ers'
when early_cluster = 'E05' then  'Domestic Spenders'
when early_cluster = 'E06' then  'Mobile Spenders'
when early_cluster = 'E07' then  'Online Spenders'
when early_cluster = 'E08' then  'Barely Active'
when early_cluster = 'E09' then  'Cash Spenders'
when early_cluster = 'E10' then  'Holding Account'
when early_cluster = 'E11' then  'Refer Friends'
when early_cluster = 'E12' then  'Primary Account Users'
end as cluster_name,
tnc_country_group,
count (*) as n_users
from dev_dbt.early_cluster_3M_labels_April142021 uc
inner join dbt.zrh_users using (user_id)
group by 1, 2, 3, 4, 5
order by 2
"""

In [45]:
bp_users_df = df_from_sql("redshiftreader", bp_users_query)

In [46]:
grouped_clusters_df = df_from_sql("redshiftreader", grouped_clusters_query)

In [48]:
# First we register the table name to existing dataframe
con.register("bp_users_df", bp_users_df)
con.register("grouped_clusters_df", grouped_clusters_df)

<a id='section1'></a>
# How many early clusters are missing for each Bank Product?

In [8]:
monthly_missing_clusters_query = """ 
select 
date_trunc('month', user_created::date) as month, 
count(*) n_users,
count(case when user_id is null then 1 end) as missing_users,
round(count(case when user_id is null then 1 end)::numeric/count(*)::numeric, 2)*100 as perc_missing_users
from bp_users_df
group by 1
order by 1 desc
"""

In [9]:
monthly_missing_clusters_df = con.execute(monthly_missing_clusters_query).fetchdf()

The vast majority of Bank Products users who signed up with N26 before June 2016 or after February 2021 are missing an early user cluster, and therefore won't be included in the user cluster analysis in this research.

In [10]:
af.area_single(
    monthly_missing_clusters_df,
    af.pink,
    "month:O",
    "perc_missing_users:Q",
    800,
    400,
    "x",
)

In [11]:
missing_clusters_product_query = """ 
select label, 
count(*) n_users,
count(case when user_id is null then 1 end) as missing_users,
round(count(case when user_id is null then 1 end)::numeric/count(*)::numeric, 2)*100 as perc_missing_users
from bp_users_df
group by 1
"""

In [12]:
missing_clusters_product_df = con.execute(missing_clusters_product_query).fetchdf()

When looking into how this affects the user base of each of our bank products, it seems that our older products tend to have about half of their users with a missing early cluster, whereas our most recent products such as TBIL and EasyFlex Savings only have 32% and 22% of users with missing early clusters respectively.

In [13]:
missing_clusters_product_df

| | label | n_users | missing_users | perc_missing_users |
|---|---|---|---|---|
| 0 | Overdraft - In Arrears | 2384 | 905 | 38.0 |
| 1 | Overdraft - Using | 37543 | 18129 | 48.0 |
| 2 | Overdraft - Not Using | 91252 | 48507 | 53.0 |
| 3 | FT savings users | 972 | 304 | 31.0 |
| 4 | Easyflex savings users | 1308 | 292 | 22.0 |
| 5 | Credit users | 4678 | 2599 | 56.0 |
| 6 | TBIL users | 524 | 170 | 32.0 |
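
As a sanity check, the `perc_missing_users` column can be recomputed from the two count columns. The snippet below is a minimal plain-Python sketch using the counts shown in the table; the helper name `perc_missing` is an assumption introduced here for illustration.

```python
# (n_users, missing_users) per product, copied from the table above
counts = {
    "Overdraft - In Arrears": (2384, 905),
    "Overdraft - Using": (37543, 18129),
    "Overdraft - Not Using": (91252, 48507),
    "FT savings users": (972, 304),
    "Easyflex savings users": (1308, 292),
    "Credit users": (4678, 2599),
    "TBIL users": (524, 170),
}


def perc_missing(n_users: int, missing_users: int) -> int:
    """Share of users without an early cluster, as a whole percentage."""
    return round(100 * missing_users / n_users)


perc_by_product = {label: perc_missing(n, m) for label, (n, m) in counts.items()}

for label, perc in perc_by_product.items():
    print(f"{label}: {perc}%")
```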


In [14]:
af.column_single_label(
    missing_clusters_product_df,
    af.rhubarb,
    "label:N",
    "perc_missing_users:Q",
    600,
    400,
    "-y",
)

<a id='section2'></a>
# What is the proportion of early user clusters per Bank Product?

Below you can see the proportion of each early cluster per Bank Product, as well as a comparison with the baseline of all users with an early cluster. 

We can see that the top 3 combinations are: 
 - Barely Active users correspond to 45% of all users in Overdraft Arrears
 - Spaces Power Users correspond to 40% of all EasyFlex Savings Users 
 - Domestic Spenders correspond to 28% of all TBIL Users
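
The percentages above are shares within each Bank Product: the count for a (product, cluster) pair divided by the total count for that product. Here is a minimal plain-Python sketch of that calculation, assuming one row per user; the rows are invented toy data and the function name `cluster_share_per_label` is hypothetical.

```python
from collections import Counter

# Toy (label, early_cluster) rows, invented for illustration only
rows = [
    ("TBIL users", "E05"),
    ("TBIL users", "E05"),
    ("TBIL users", "E01"),
    ("TBIL users", "E08"),
    ("Credit users", "E08"),
    ("Credit users", "E12"),
]


def cluster_share_per_label(rows):
    """Return {(label, cluster): percentage of that label's users}."""
    pair_counts = Counter(rows)                           # users per (label, cluster)
    label_totals = Counter(label for label, _ in rows)    # users per label
    return {
        (label, cluster): round(100 * n / label_totals[label])
        for (label, cluster), n in pair_counts.items()
    }


shares = cluster_share_per_label(rows)
print(shares)
```

Because the denominator is the product's own user count, the shares of each product sum to roughly 100% (up to rounding), which makes products of very different sizes comparable.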

In [15]:
bp_clusters_query = """
select 
label,
early_cluster || ' | ' ||cluster_name as cluster_name,
count(*) as n_users,
round(count(*)::numeric/(sum(count(*)) over (partition by label))::numeric, 2)*100 as perc_users
from bp_users_df
inner join grouped_clusters_df using (early_cluster)
group by 1, 2
union all
select 
' User Cluster Baseline' as label,
early_cluster || ' | ' ||cluster_name,
sum(n_users),
round(sum(n_users)::numeric/(sum(sum(n_users)) over ())::numeric, 2)*100 as perc_users
from grouped_clusters_df
group by 1, 2
order by 1, 2
"""

In [16]:
# Then we execute the query and store the output in a different df (we could store it in the same one, ofc)
bp_clusters_df = con.execute(bp_clusters_query).fetchdf()

In [17]:
heatmap = (
    alt.Chart(bp_clusters_df)
    .mark_rect()
    .encode(
        x="label:N", y="cluster_name:N", color="perc_users:Q", tooltip="perc_users:Q"
    )
    .properties(
        width=600, height=500, title="Early User Clusters per Bank Product and Baseline"
    )
)

# Configure text
text = heatmap.mark_text(baseline="middle").encode(
    text="perc_users:Q",
    color=alt.condition(
        alt.datum.perc_users < 20, alt.value("black"), alt.value("white")
    ),
)

graph = (
    alt.hconcat(heatmap + text)
    .configure_axis(
        labelFontSize=12,
        titleFontSize=14,
        labelAngle=30,
        labelColor="#666666",
        titleColor="#266678",
        grid=False,
    )
    .configure_title(fontSize=15)
)

graph

<a id='section3'></a>
# How does each Bank Product differ from the baseline?

When looking into how the proportion of early clusters for each Bank Product differs from the baseline, we can see the following trends:
 - The share of Holding Account users is lower than the baseline for all products except Fixed Term Savings (which makes sense, since this product is designed for long-term savings and doesn't require user activity on its own).
 - On the other hand, the share of Barely Active users is drastically higher for users in Overdraft Arrears and users Using Overdraft (40 and 18 percentage points above the baseline, respectively).
 - Also, the proportion of Spaces Power Users among savings users is higher than in the baseline or any other Bank Product (31 percentage points higher for EasyFlex Savings and 16 percentage points higher for Fixed Term Savings).
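
Note that the differences quoted above are differences between two percentages (percentage points), since the query below subtracts the baseline share from the product share. A minimal sketch of the distinction, using the Barely Active figures reported in this deep dive (45% of Overdraft Arrears users vs. 5% of the baseline); both function names are hypothetical.

```python
def perc_point_diff(bp_perc_users: float, baseline_perc_users: float) -> float:
    """Difference in percentage points (what the heatmap below shows)."""
    return bp_perc_users - baseline_perc_users


def relative_change(bp_perc_users: float, baseline_perc_users: float) -> float:
    """Relative change in percent, shown only for contrast."""
    return 100 * (bp_perc_users - baseline_perc_users) / baseline_perc_users


# Barely Active: 45% of Overdraft Arrears users vs. 5% of the baseline
print(perc_point_diff(45, 5))   # difference in percentage points
print(relative_change(45, 5))   # relative change, a much larger number
```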

In [18]:
bp_clusters_diff_query = """
with cluster_groups as (
select
cluster_12, 
early_cluster,
cluster_name,
sum(n_users) as n_users,
round(sum(n_users)::numeric/(sum(sum(n_users)) over ())::numeric, 2)*100 as perc_users
from grouped_clusters_df
group by 1, 2, 3
),
totals as (
select 
label,
early_cluster || ' | ' ||cluster_name as cluster_name,
perc_users,
count(*) as n_users,
round(count(*)::numeric/(sum(count(*)) over (partition by label))::numeric, 2)*100 as bp_perc_users
from bp_users_df
inner join cluster_groups using (early_cluster)
group by 1, 2, 3
)
select 
label,
cluster_name,
n_users, 
bp_perc_users - perc_users as perc_diff,
bp_perc_users,
perc_users
from totals
"""

In [19]:
# Then we execute the query and store the output in a different df (we could store it in the same one, ofc)
bp_clusters_diff_df = con.execute(bp_clusters_diff_query).fetchdf()

In [20]:
heatmap = (
    alt.Chart(bp_clusters_diff_df)
    .mark_rect()
    .encode(
        x="label:N",
        y="cluster_name:N",
        color=alt.Color("perc_diff:Q", scale=alt.Scale(domain=(-10, 10))),
        tooltip="perc_diff:Q",
    )
    .properties(
        width=600,
        height=500,
        title="Bank Products Early User Cluster diff from baseline",
    )
)

# Configure text
text = heatmap.mark_text(baseline="middle").encode(
    text="perc_diff:Q",
    color=alt.condition(
        alt.datum.perc_diff < 0, alt.value("black"), alt.value("white")
    ),
)

graph = (
    alt.hconcat(heatmap + text)
    .configure_axis(
        labelFontSize=12,
        titleFontSize=14,
        labelAngle=30,
        labelColor="#666666",
        titleColor="#266678",
        grid=False,
    )
    .configure_title(fontSize=15)
)

graph

<a id='section4'></a>
# What is the proportion of bank product users in the DEU baseline?

When looking into the proportion of Bank Product users among all Germany users with early user clusters, we can see that only the Overdraft product has relevant percentages (which makes sense, since this product has the largest user base by far). Once again, the biggest early cluster percentage for Overdraft users is Barely Active (18% of all Germany users in this cluster have at least an Overdraft enabled), and surprisingly the second biggest bucket is International Travelers (15% of all Germany users in this cluster).
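
Unlike the previous sections, the denominator here is the size of each cluster in the Germany baseline rather than the size of each product. A minimal sketch with invented counts, chosen only so that the results match the 18% and 15% quoted above; the dictionaries and the `perc_in_baseline` helper are hypothetical.

```python
# Invented counts for illustration; only the resulting percentages come from the text
cluster_n_users = {"E08 | Barely Active": 1000, "E02 | International Travelers": 400}
overdraft_n_users = {"E08 | Barely Active": 180, "E02 | International Travelers": 60}


def perc_in_baseline(bp_n_users: int, cluster_n_users: int) -> int:
    """Share of a cluster's users that hold the Bank Product."""
    return round(100 * bp_n_users / cluster_n_users)


penetration = {
    cluster: perc_in_baseline(overdraft_n_users[cluster], total)
    for cluster, total in cluster_n_users.items()
}
print(penetration)
```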

In [21]:
bp_perc_baseline_query = """
with cluster_groups as (
select
cluster_12, 
early_cluster,
cluster_name,
sum(n_users) as cluster_n_users
from grouped_clusters_df
where tnc_country_group like '%DEU%'
group by 1, 2, 3
),
totals as (
select 
case when label like 'Overdraft%' then 'Overdraft' else label end as label,
early_cluster || ' | ' ||cluster_name as cluster_name,
cluster_n_users,
count(*) as bp_n_users
from bp_users_df bp
inner join cluster_groups gc using (early_cluster)
where tnc_country_group like '%DEU%'
group by 1, 2, 3
)
select 
label,
cluster_name,
cluster_n_users,
bp_n_users,
round(bp_n_users::numeric/ cluster_n_users::numeric, 2) * 100 as perc_in_baseline
from totals
"""

In [22]:
# Then we execute the query and store the output in a different df (we could store it in the same one, ofc)
bp_perc_baseline_df = con.execute(bp_perc_baseline_query).fetchdf()

In [24]:
heatmap = (
    alt.Chart(bp_perc_baseline_df)
    .mark_rect()
    .encode(
        x="label:N",
        y="cluster_name:N",
        color="perc_in_baseline:Q",
        tooltip="perc_in_baseline:Q",
    )
    .properties(
        width=600,
        height=500,
        title="Proportion of Bank Product users in the DEU baseline",
    )
)

# Configure text
text = heatmap.mark_text(baseline="middle").encode(
    text="perc_in_baseline:Q",
    color=alt.condition(
        alt.datum.perc_in_baseline < 10, alt.value("black"), alt.value("white")
    ),
)

graph = (
    alt.hconcat(heatmap + text)
    .configure_axis(
        labelFontSize=12,
        titleFontSize=14,
        labelAngle=30,
        labelColor="#666666",
        titleColor="#266678",
        grid=False,
    )
    .configure_title(fontSize=15)
)

graph

<a id='section5'></a>
# Are there any early cluster trends for written-off users?

In [25]:
write_off_users_query = """
select 
reason as label,
user_created,
user_id,
early_cluster,
tnc_country_group
from dbt.write_off
inner join dbt.zrh_users using (user_created)
left join dev_dbt.early_cluster_3M_labels_April142021 uc using (user_id)
where (od_status = 'Arranged' or reason in ('Credit', 'TBIL'))
"""

In [49]:
write_off_users_df = df_from_sql("redshiftreader", write_off_users_query)

In [27]:
write_off_clusters_query = """
select 
label,
early_cluster || ' | ' ||cluster_name as cluster_name,
sum(case when user_created is not null then 1 else 0 end) as n_users,
round(count(*)::numeric/(sum(count(*)) over (partition by label))::numeric, 2)*100 as perc_users
from write_off_users_df
inner join grouped_clusters_df using (early_cluster)
group by 1, 2
union all
select 
' User Cluster Baseline' as label,
early_cluster || ' | ' ||cluster_name,
sum(n_users),
round(sum(n_users)::numeric/(sum(sum(n_users)) over ())::numeric, 2)*100 as perc_users
from grouped_clusters_df
group by 1, 2
order by 1, 2
"""

In [28]:
grouped_write_off_clusters_query = """
select 
label,
count(*) as n_users
from write_off_users_df
group by 1
"""

In [29]:
# First we register the table name to existing dataframe
con.register("write_off_users_df", write_off_users_df)

# Then we execute the query and store the output in a different df (we could store it in the same one, ofc)
write_off_clusters_df = con.execute(write_off_clusters_query).fetchdf()
grouped_write_off_clusters_df = con.execute(grouped_write_off_clusters_query).fetchdf()

In [30]:
grouped_write_off_clusters_df

| | label | n_users |
|---|---|---|
| 0 | Arranged Overdraft | 10633 |
| 1 | Credit | 326 |
| 2 | TBIL | 7 |


By now it should be no surprise that, among written-off users with early user clusters, Barely Active is the most represented cluster for both Arranged Overdraft and Consumer Credit. As for TBIL, since we only had 7 write-offs for this product at the time of this deep dive, it may be worth revisiting once we have more data.

In [42]:
heatmap = (
    alt.Chart(write_off_clusters_df)
    .mark_rect()
    .encode(
        x="label:N", y="cluster_name:N", color="perc_users:Q", tooltip="perc_users:Q"
    )
    .properties(
        width=600,
        height=500,
        title="Early User Clusters per Written-off Bank Product and Baseline",
    )
)

# Configure text
text = heatmap.mark_text(baseline="middle").encode(
    text="perc_users:Q",
    color=alt.condition(
        alt.datum.perc_users < 20, alt.value("black"), alt.value("white")
    ),
)

graph = (
    alt.hconcat(heatmap + text)
    .configure_axis(
        labelFontSize=12,
        titleFontSize=14,
        labelAngle=30,
        labelColor="#666666",
        titleColor="#266678",
        grid=False,
    )
    .configure_title(fontSize=15)
)

graph

<a id='section6'></a>
# How many users sign up for a Bank Product after the assignment of the early clusters?

In [32]:
product_adoption_query = """
with unions as (
select
'ft_savings' as label, 
user_created,
min(activation_date) as min_product_date
from st_fixed_term_plan st
where activation_date is not null
group by 1, 2
union all 
select 
'od_first_enabled' as label,
user_created,
min(end_time) 
from dbt.bp_overdraft_users
inner join dbt.zrh_users using (user_created)
where timeframe = 'day'
and od_enabled_flag 
group by 1, 2
union all 
select 
'od_first_using' as label,
user_created,
min(end_time) 
from dbt.bp_overdraft_users
inner join dbt.zrh_users using (user_created)
where timeframe = 'day'
and od_enabled_flag 
and outstanding_balance_eur is not null 
group by 1, 2
union all 
select 
'consumer_credit' as label,
user_created,
min(disbursed) 
from cc_credit_draft as a 
where disbursed is not null
group by 1, 2
union all 
select 
'easyflex_savings' as label,
user_created,
min(st.updated) 
from st_overnight_plan st 
inner join st_customer sc 
on st.customer_id = sc.id
and st.status = 'ACTIVE'
group by 1, 2
union all
select 
'tbil' as label,
user_created,
min(disbursement_date) 
from nh_transaction_instalment_loan
inner join dbt.zrh_users using (user_id)
where disbursement_date is not null
group by 1, 2
)
select 
label,
user_id,
user_created,
to_char(min_product_date, 'YYYY-MM-DD') as min_product_date,
to_char(ft_mau + interval '35 day', 'YYYY-MM-DD') as early_activity_end,
min_product_date > early_activity_end  as user_after_early_activity, 
tnc_country_group
from unions
inner join dbt.zrh_users using (user_created)
where ft_mau >= '2016-01-01'
"""

In [50]:
product_adoption_df = df_from_sql("redshiftreader", product_adoption_query)

In [34]:
total_adoption_split_query = """
with totals as (
select 
p.label, 
sum(case when not user_after_early_activity then 1 else 0 end) as n_users_before_early_activity,
sum(case when user_after_early_activity then 1 else 0 end) as n_users_after_early_activity,
count(*) as n_users
from product_adoption_df p
inner join bp_users_df bp using (user_id)
where early_cluster is not null
group by 1
)
select *, 
round(n_users_after_early_activity::numeric/n_users::numeric, 3) *100 as perc_after_early_activity
from totals
"""

Finally, we try to understand what percentage of users get their Bank Product after being assigned an early cluster. Higher percentages of users getting their product after their cluster assignment mean that eventual early cluster patterns can be highly correlated with users getting a certain Bank Product in the future. 

We can verify that for more recent products such as EasyFlex Savings and TBIL the majority of users have indeed signed up for the product after the assignment of the early user cluster.

Another interesting finding is that only 47% of the users with overdraft enabled got their product after the assignment. This proportion will most likely change moving forward, since we are increasingly granting Overdraft based on our internal score, which requires at least 3 months of activity before being calculated (therefore these users will be way past their early cluster assignment when they get the product).
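
The "after the assignment" flag comes from the adoption query above, which treats the early activity window as ending 35 days after `ft_mau` and flags a user when their first product date is strictly later than that. A minimal plain-Python sketch of that rule; the function names and the example dates are hypothetical.

```python
from datetime import date, timedelta

EARLY_ACTIVITY_DAYS = 35  # mirrors `ft_mau + interval '35 day'` in the query


def early_activity_end(ft_mau: date) -> date:
    """End of the early activity window for a user."""
    return ft_mau + timedelta(days=EARLY_ACTIVITY_DAYS)


def product_after_early_activity(ft_mau: date, first_product_date: date) -> bool:
    """True when the product was obtained strictly after the window ended."""
    return first_product_date > early_activity_end(ft_mau)


# Illustrative dates only
ft_mau = date(2021, 1, 1)
print(early_activity_end(ft_mau))
print(product_after_early_activity(ft_mau, date(2021, 1, 20)))
print(product_after_early_activity(ft_mau, date(2021, 3, 1)))
```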

In [35]:
# First we register the table name to existing dataframe
con.register("product_adoption_df", product_adoption_df)
# Then we execute the query and store the output in a different df (we could store it in the same one, ofc)
total_adoption_split_df = con.execute(total_adoption_split_query).fetchdf()
total_adoption_split_df

| | label | n_users_before_early_activity | n_users_after_early_activity | n_users | perc_after_early_activity |
|---|---|---|---|---|---|
| 0 | ft_savings | 258.0 | 672.0 | 930 | 72.3 |
| 1 | od_first_enabled | 34637.0 | 30457.0 | 65094 | 46.8 |
| 2 | od_first_using | 10671.0 | 35091.0 | 45762 | 76.7 |
| 3 | consumer_credit | 755.0 | 2799.0 | 3554 | 78.8 |
| 4 | easyflex_savings | 66.0 | 1253.0 | 1319 | 95.0 |
| 5 | tbil | 0.0 | 617.0 | 617 | 100.0 |


In [36]:
monhtly_adoption_split_query = """
with totals as (
select p.label, 
date_trunc('month', early_activity_end::date) as early_activity_month,
sum(case when not user_after_early_activity then 1 else 0 end) as n_users_before_early_activity,
sum(case when user_after_early_activity then 1 else 0 end) as n_users_after_early_activity,
count(*) as n_users
from product_adoption_df p
inner join bp_users_df bp using (user_id)
where early_cluster is not null
group by 1, 2
)
select *, 
round(n_users_after_early_activity::numeric/n_users::numeric, 3) *100 as perc_after_early_activity
from totals
where n_users_after_early_activity > 0
"""

In [37]:
monhtly_adoption_split_df = con.execute(monhtly_adoption_split_query).fetchdf()

In [38]:
af.column_multi(
    monhtly_adoption_split_df,
    "label:N",
    "early_activity_month:O",
    "perc_after_early_activity:Q",
    "label:N",
    100,
    400,
    "x",
)

<a id='section7'></a>
# Do we find the same patterns for users who got their Bank Product after their early activity?

After finding which users got their Bank Products after the early cluster assignment, we can end by comparing the cluster proportions of these users with those of all Bank Products users shown in the [What is the proportion of early user clusters per Bank Product?](#section2) section. Curiously, the main difference between the two is that users who sign up for Overdraft after the cluster assignment don't show the substantial increase in Barely Active users seen before for Overdraft Arrears. In fact, these users have the same proportion of Barely Active users as the baseline (5%). These deltas could be due to differences in activity between users who got their overdraft through our internal credit score and those who got it through external ones, so it might be worth comparing these groups moving forward.

In [39]:
product_after_ea_query = """
with pa_label_match as (
select *, 
case when label = 'od_first_enabled' then 'Overdraft'
when label = 'ft_savings' then 'FT savings users'
when label = 'consumer_credit' then 'Credit users'
when label = 'easyflex_savings' then 'Easyflex savings users'
when label = 'tbil' then 'TBIL users'
end as fixed_label
from product_adoption_df
),
bpu_label_match as (
select *,
case when label like '%Overdraft%' then 'Overdraft'
else label end as fixed_label 
from bp_users_df
)
select 
bpu.label,
early_cluster || ' | ' ||cluster_name as cluster_name,
count(*) as n_users,
round(count(*)::numeric/(sum(count(*)) over (partition by bpu.label))::numeric, 2)*100 as perc_users
from bpu_label_match bpu
inner join pa_label_match using (user_created, fixed_label)
inner join grouped_clusters_df using (early_cluster)
where user_after_early_activity
group by 1, 2
union all
select 
' User Cluster Baseline' as label,
early_cluster || ' | ' ||cluster_name,
sum(n_users),
round(sum(n_users)::numeric/(sum(sum(n_users)) over ())::numeric, 2)*100 as perc_users
from grouped_clusters_df
group by 1, 2
order by 1, 2
"""

In [40]:
# Then we execute the query and store the output in a different df (we could store it in the same one, ofc)
product_after_ea_df = con.execute(product_after_ea_query).fetchdf()

In [41]:
product_after_ea = (
    alt.Chart(product_after_ea_df)
    .mark_rect()
    .encode(
        x="label:N", y="cluster_name:N", color="perc_users:Q", tooltip="perc_users:Q"
    )
    .properties(
        width=250, height=500, title="Bank Products Acquired After Early Activity"
    )
)

all_products = (
    alt.Chart(bp_clusters_df)
    .mark_rect()
    .encode(
        x="label:N", y="cluster_name:N", color="perc_users:Q", tooltip="perc_users:Q"
    )
    .properties(width=250, height=500, title="All Early User Clusters per Bank Product")
)


# Configure text
text_product_after_ea = product_after_ea.mark_text(baseline="middle").encode(
    text="perc_users:Q",
    color=alt.condition(
        alt.datum.perc_users < 20, alt.value("black"), alt.value("white")
    ),
)

# Configure text
text_all_products = all_products.mark_text(baseline="middle").encode(
    text="perc_users:Q",
    color=alt.condition(
        alt.datum.perc_users < 20, alt.value("black"), alt.value("white")
    ),
)


graph = (
    alt.hconcat(
        product_after_ea + text_product_after_ea | all_products + text_all_products
    )
    .configure_axis(
        labelFontSize=12,
        titleFontSize=14,
        labelAngle=30,
        labelColor="#666666",
        titleColor="#266678",
        grid=False,
    )
    .configure_title(fontSize=15)
)
graph

# Future Recommendations

- Split Overdraft users per credit score provider
- Look into dbt.nps_score to check if users truly use their N26 account as their main bank account 
- Compare primary vs. secondary account users
- Compare Bank Products users with mature clusters once available