title: Did the overdraft risk reduction measures impact user behaviour?
author: Helder Silva 
date: 2022-01-22
region: EU 
tags: bank products, balance, transactions, overdraft, reduction, cancellation, correlation
summary: Between the start of the overdraft reduction process in June 2021 and the end of the year, roughly 11.7K users had their overdraft reduced or cancelled (58% of these were cancellations). For these users, the total limit decrease was 14.8M€ (61% of this came from reductions). Looking at the Pearson correlations between having a limit reduction and the other variables considered in this deep dive, we see a strong correlation with the average probability of default (PD), which is expected since this is how users were selected for the reduction in the first place. We also see moderate negative correlations with the average limit and outstanding balance, which is also expected. For the remaining variables (percentage of users in arrears, percentage and count of write-off users, and transaction related metrics), we only observe weak correlations. This suggests that the limit reduction measures are not related to user behaviour in those areas.

<div class="alert alert-block alert-success">
    <H1>Did the overdraft risk reduction measures impact user behaviour?</H1>
</div>

Currently, the Overdraft portfolio consists of 2 types of users based on the underwriting decision:

- OD granted based on Schufa Score 😞 (pre-2020)
- OD granted based on Lisbon score 🙂 (50% of decisions from July 2020 onwards, 100% of decisions from January 2021 onwards)

The old cohorts are showing significantly higher losses as they mature at month-on-book 6 & 12 compared to the new Lisbon cohorts. The current monitoring process does not perform limit reduction on an ongoing basis based on score deterioration once the user has enabled an overdraft.

This is leading to a situation where users whose current Lisbon score is higher than risk rating class 12 (N26’s max risk appetite 5% PD) are able to access overdraft and build up balances. To control the losses and avoid users taking on more debt than they can repay, we need to perform ongoing limit management.

[Here](https://number26-jira.atlassian.net/wiki/spaces/ProdTech/pages/2691959597/Improvements+Overdraft+Monitoring+Limit+Reduction) you can find more context on this project.

In this research, we looked into users who had an overdraft limit reduction or cancellation from the beginning of this process in June 2021 to the end of 2021. The difference between these 2 groups is as follows:
- **Overdraft Limit Reduction:** Users who are eligible from a risk perspective and have used Overdraft in the previous 6 months (i.e. went into a negative balance)
- **Overdraft Cancellation:** Users who are eligible from a risk perspective and have not used Overdraft in the previous 6 months 

For simplicity, we will refer to both reduction and cancellation users as "reduced users" moving forward, and compare them with users that were not part of this process, which we refer to as "baseline users".

#### Here are the main topics we looked into:

- [How much did we reduce in 2021?](#section1)
- [How do reduced users differ from baseline users?](#section2)
 - [Users in Dec 2020 comparison](#section2.1)
   - [How many users had OD enabled in December 2020 or were in the reduction list?](#section2.1.1)
   - [How many users have used OD in 2021?](#section2.1.2)
   - [How has usage changed for users who were reduced and the ones who weren't?](#section2.1.3)
 - [Cohort Analysis](#section2.2)
   - [How many users do we have per cohort?](#section2.2.1)
   - [Is there a correlation between being reduced or cancelled and our other variables?](#section2.2.2)
   - [Overdraft Enabled users cohorts](#section2.2.3)
   - [Overdraft Arrears Users cohorts](#section2.2.4)
   - [Overdraft Cumulative Written Off User Percentage cohorts](#section2.2.5)
   - [Overdraft Cumulative Written Off User Count cohorts](#section2.2.6)
   - [Avg outstanding balance cohorts](#section2.2.7)
   - [Avg limit cohorts](#section2.2.8)
   - [Avg percentage usage cohorts](#section2.2.9)
   - [Transaction Behaviour](#section2.2.10)
   
#### Our main findings are:
- Between the start of the overdraft reduction process in June 2021 and the end of the year, roughly 11.7K users had their overdraft reduced or cancelled (58% of these were cancellations). For these users, the total limit decrease was 14.8M€ (61% of this came from reductions).
- Looking at the Pearson correlations between having a limit reduction and the other variables considered in this deep dive, we see a strong correlation with the average probability of default (PD), which is expected since this is how users were selected for the reduction in the first place. We also see moderate negative correlations with the average limit and outstanding balance, which is also expected.
- For the remaining variables (percentage of users in arrears, percentage and count of write-off users, and transaction related metrics), we only observe weak correlations. This suggests that the limit reduction measures are not related to user behaviour in those areas.
- There seems to be a slight increase of arrears users after limit reductions started, but since these are not yet materialising into write-offs and these users can still bounce back to be within their overdraft limit, these rates aren't necessarily worrying (we should look into arrears and write-off rates over a longer period of time to check if these users do leave arrears through top-ups).

#### Future Recommendations:
Since we will continue running these limit reductions on a regular basis, we suggest creating a dashboard that looks into the metrics analysed in this research, with a special focus on:
 - Arrears and write-off rates, as mentioned in our main findings.
 - In this research, we didn't focus on the split between users with limit reductions and users with cancellations given the small number of users in these groups. Moving forward, with an ever increasing population of users going through this process, we recommend splitting them into these 2 groups to see if any distinctive behaviours arise between these.
 - In the eventuality of having a visible decrease in the transaction volumes over time, we can also look into the potential revenue lost in transactions interchange, and compare it with potential write-off losses, to check if this risk reduction limits other sources of revenue.

In [1]:
%%capture

!pip install jupyter_contrib_nbextensions
!jupyter contrib nbextension install --user
!jupyter nbextension enable spellchecker/main
!pip install duckdb
!pip install altair

!pip uninstall typing pyserial --yes # this line might be needed if pandas-profiling cannot be installed
!pip install pandas-profiling
!pip install ipywidgets # this is needed if you get an error ModuleNotFoundError: No module named 'ipywidgets'

In [2]:
%%capture
cd /app/

In [3]:
import pandas as pd
from utils.datalib_database import df_from_sql
import utils.altair_functions as af
import duckdb
import altair as alt
import pandas_profiling as pp
from IPython.display import display_html, Markdown as md

con = duckdb.connect(database=":memory:", read_only=False)
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np

In [4]:
# Chart functions


def heatmap(df, color, x, y, color_condition, width, height, tooltip):
    heatmap = (
        alt.Chart(df)
        .mark_rect()
        .encode(alt.X(x), alt.Y(y), color=color, tooltip=tooltip)
        .properties(width=width, height=height)
    )

    # Configure text
    text = heatmap.mark_text(baseline="middle").encode(
        text=color,
        color=alt.condition(color_condition, alt.value("black"), alt.value("white")),
    )

    return heatmap + text


def graph(variable, title, color_condition):
    graph = (
        alt.vconcat(
            heatmap(
                quarterly_users_df[
                    quarterly_users_df["user_split"] == "Reduction Users"
                ],
                variable,
                "quarter_diff:O",
                "cohort_start:O",
                color_condition,
                800,
                600,
                ["cohort_start", "quarter_diff", variable],
            ).properties(title=title + " - Reduction Users"),
            heatmap(
                quarterly_users_df[
                    quarterly_users_df["user_split"] == "Baseline Users"
                ],
                variable,
                "quarter_diff:O",
                "cohort_start:O",
                color_condition,
                800,
                600,
                ["cohort_start", "quarter_diff", variable],
            ).properties(title=title + " - Baseline Users"),
            heatmap(
                quarterly_users_df[
                    quarterly_users_df["user_split"] == "Baseline minus reduction"
                ],
                variable,
                "quarter_diff:O",
                "cohort_start:O",
                color_condition,
                800,
                600,
                ["cohort_start", "quarter_diff", variable],
            ).properties(title=title + " - Baseline minus reduction"),
        )
        .configure_axis(
            labelFontSize=12,
            titleFontSize=14,
            labelAngle=30,
            labelColor="#666666",
            titleColor="#266678",
            grid=False,
        )
        .configure_title(fontSize=15)
    )
    return graph


# Functions to fetch values
def two_col_cell(
    df, filter_column1, filter_row1, filter_column2, filter_row2, output_column
):
    filtered_df = df[
        (df[filter_column1] == filter_row1) & (df[filter_column2] == filter_row2)
    ]
    string = filtered_df.iloc[0][output_column].astype(str)

    return string


def one_col_cell(df, filter_column, filter_row, output_column):
    filtered_df = df[df[filter_column] == filter_row]
    string = filtered_df.iloc[0][output_column].astype(str)

    return string

In [6]:
cohorts_df = df_from_sql(
    "redshiftreader",
    "research/product/bank_products/20211209_overdraft_risk_reduction_measures/risk_reduction_cohorts_query.sql",
)

In [7]:
od_users_df = df_from_sql(
    "redshiftreader",
    "research/product/bank_products/20211209_overdraft_risk_reduction_measures/od_users_2021_query.sql",
)

In [8]:
reduced_users_df = df_from_sql(
    "redshiftreader",
    "research/product/bank_products/20211209_overdraft_risk_reduction_measures/reduced_users_and_volume_query.sql",
)

<a id='section1'></a>
# How much did we reduce in 2021?

Between the start of the reduction process in June 2021 and the end of the year, roughly 11.7K users had their overdraft reduced or cancelled (58% of these were cancellations). For these users, the total limit decrease was 14.8M€ (61% of this came from reductions).

In [9]:
reduced_query = """
with totals as (
select
month, 
case when limit_eur = 0 then 'Cancellation' else 'Reduction' end as split,
round(sum(reduced_volume)::numeric/1000000, 2) as total_reduced_volume_m_euro,
round(sum(reduced_volume)::numeric/count(*), 2) as avg_reduction_per_user,
count(*) as n_users
from reduced_users_df
group by 1, 2
)
select 
*,
round(sum(total_reduced_volume_m_euro) over(partition by split order by month rows unbounded preceding), 2) as cumulative_sum_m_euro,
(sum(n_users) over(partition by split order by month rows unbounded preceding))::int as cumulative_users
from totals
order by 2, 1
"""

con.register("reduced_users_df", reduced_users_df)
reduced_df = con.execute(reduced_query).fetchdf()

In [10]:
user_chart = af.line_multi(
    reduced_df, "split", "month", "cumulative_users", 350, 400, "x"
)
volume_chart = af.line_multi(
    reduced_df, "split", "month", "cumulative_sum_m_euro", 350, 400, "x"
)

user_chart.properties(
    title="Number of users with an Overdraft Cancellation vs. Overdraft Reduction"
) | volume_chart.properties(title="Cumulative reduced volume (M€)")

In [11]:
reduced_df

| | month | split | total_reduced_volume_m_euro | avg_reduction_per_user | n_users | cumulative_sum_m_euro | cumulative_users |
|---|---|---|---|---|---|---|---|
| 0 | 2021-06 | Cancellation | 4.0 | 649.18 | 6166 | 4.0 | 6166 |
| 1 | 2021-08 | Cancellation | 1.08 | 2897.72 | 372 | 5.08 | 6538 |
| 2 | 2021-10 | Cancellation | 0.31 | 2809.63 | 109 | 5.39 | 6647 |
| 3 | 2021-12 | Cancellation | 0.44 | 2802.23 | 157 | 5.83 | 6804 |
| 4 | 2021-06 | Reduction | 3.37 | 1368.99 | 2464 | 3.37 | 2464 |
| 5 | 2021-08 | Reduction | 2.44 | 2292.74 | 1063 | 5.81 | 3527 |
| 6 | 2021-10 | Reduction | 2.23 | 2486.19 | 896 | 8.04 | 4423 |
| 7 | 2021-12 | Reduction | 0.93 | 2093.17 | 443 | 8.97 | 4866 |
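
The headline figures for this section can be re-derived from the last cumulative row of each split in the table above. A minimal sketch, with the values copied by hand from that table (the variable names are only illustrative and are not part of the notebook):

```python
# Cumulative values at the end of 2021, copied from the table above
cumulative_users = {"Cancellation": 6804, "Reduction": 4866}
cumulative_volume_m_eur = {"Cancellation": 5.83, "Reduction": 8.97}

total_users = sum(cumulative_users.values())
total_volume_m_eur = round(sum(cumulative_volume_m_eur.values()), 2)

# Share of each split in users and in reduced volume, in percent
user_share = {
    split: round(n / total_users * 100, 1) for split, n in cumulative_users.items()
}
volume_share = {
    split: round(v / total_volume_m_eur * 100, 1)
    for split, v in cumulative_volume_m_eur.items()
}

print(total_users, total_volume_m_eur)
print(user_share)
print(volume_share)
```

This gives roughly 11.7K users and 14.8M€ in total, with cancellations making up about 58% of the users and reductions about 61% of the reduced volume.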


<a id='section2'></a>
# How do reduced users differ from baseline users?

In order to answer this question, we will start by looking into all overdraft enabled users we had at the end of December 2020, split them between "in reduction list" and "not in reduction list", and see how the behaviour of these 2 groups evolves over 2021. 

<a id='section2.1'></a>
## Users in Dec 2020 comparison

<a id='section2.1.1'></a>
### How many users had OD enabled in December 2020 or were in the reduction list?

Out of the roughly 129K users that had overdraft enabled in December 2020 or were in the reduction list, 1% were in the reduction list but did not have overdraft enabled in December (i.e. they only had overdraft enabled in 2021). Given this small percentage, we exclude these users moving forward to guarantee a fair comparison of all users that had overdraft enabled at the end of 2020.

In [12]:
groups_query = """
with totals as (
select 
case when od_enabled_in_dec and reduced_or_cancelled then 'Enabled in Dec and in reduction list'
when od_enabled_in_dec then 'Enabled in Dec and NOT in reduction list'
else 'NOT Enabled in Dec and in reduction list' end as group,
count(distinct user_id)  as n_users
from od_users_df
group by 1
)
select *, 
round(n_users::numeric/sum(n_users) over(), 2)*100 as perc_users
from totals
"""

con.register("od_users_df", od_users_df)
con.execute(groups_query).fetchdf()

| | group | n_users | perc_users |
|---|---|---|---|
| 0 | Enabled in Dec and NOT in reduction list | 116592 | 90.0 |
| 1 | Enabled in Dec and in reduction list | 11050 | 9.0 |
| 2 | NOT Enabled in Dec and in reduction list | 1390 | 1.0 |


In [13]:
users_query = """
select 
*
from od_users_df
where od_enabled_in_dec
"""

users_df = con.execute(users_query).fetchdf()

<a id='section2.1.2'></a>
### How many users have used OD in 2021?

After filtering for users with overdraft enabled in December 2020, we can see that 8.7% of these were in the reduction list. Also, out of all users in the reduction list, only 39.4% used overdraft (went into a negative balance) on at least 1 day in 2021, whereas this percentage increases to 65% for the users that weren't in the reduction list.

In [14]:
no_usage_query = """
with totals as (
select
user_id,
case when reduced_or_cancelled then 'reduced' else 'not reduced' end as label,
sum(n_days_using) as total_days_using_od
from users_df
where month >= '2021-01-01'
group by 1, 2
)
select
'all users' as od_usage_split,
' ' || label as label,
count(*) as n_users,
round(count(*)::numeric/ sum(count(*)) over(), 3)*100 as perc_users
from totals
group by 1, 2
union all
select
case when total_days_using_od = 0 then 'never used od' else 'used od at least 1 day' end as od_usage_split,
label,
count(*) as n_users,
round(count(*)::numeric/ sum(count(*)) over(partition by label), 3)*100 as perc_users
from totals
group by 1, 2
order by 2, 1
"""

no_usage_df = con.execute(no_usage_query).fetchdf()
no_usage_df

| | od_usage_split | label | n_users | perc_users |
|---|---|---|---|---|
| 0 | all users | not reduced | 115924 | 91.3 |
| 1 | all users | reduced | 11050 | 8.7 |
| 2 | never used od | not reduced | 40542 | 35.0 |
| 3 | used od at least 1 day | not reduced | 75382 | 65.0 |
| 4 | never used od | reduced | 6694 | 60.6 |
| 5 | used od at least 1 day | reduced | 4356 | 39.4 |


<a id='section2.1.3'></a>
### How has usage changed for users who were reduced and the ones who weren't?

Here we compare reduced users with baseline users and check how the two groups differ over 2021 across the following variables:
- **Average outstanding balance:** Below we can see a relatively stable average throughout 2021 for the reduced users (from 955€ at the end of 2020 to 1093€ at the end of 2021), whereas baseline users started with a substantially higher average and also registered a steep increase throughout 2021 (from 1464€ to 2115€).


- **Average limit:** Here we can see a similar increasing pattern for baseline users, reaching an average of 4258€ at the end of 2021. Reduced users started the year with a lower limit (1098€ less than their counterparts) which is expected since they are riskier in terms of credit scoring. We can also see the result of our measures as a sharp decrease of their limits between June and December 2021, hitting an average of 1188€ by the end of the year.


- **Average percentage of overdraft usage:** For baseline users, we can see a stable figure of the percentage of overdraft usage (outstanding balance divided by the overdraft limit), ending 2021 with an average usage of 54%. As for the reduced users, since they had a stable average balance and saw their limit being reduced from June onwards, we saw a spike of the percentage of usage from June 2021 onwards, from about 43% in May to 69% in December.


- **Average number of incoming/ outgoing transactions:** As for number of transactions, both groups seem to have similar patterns and no irregular changes in 2021, meaning that the limit reductions don't seem to have impacted this metric.


- **Average volume of incoming/ outgoing transactions:** The same holds for the volume of transactions: although reduced users generally have lower transaction volumes throughout 2021, there is no visible decrease in these volumes over the year. Reduced users actually peaked in December 2021 in the average volume of both incoming and outgoing transactions (with 2142€ and 1942€ respectively).
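
The jump in percentage usage for reduced users follows directly from the definition (outstanding balance divided by the overdraft limit): if the balance stays flat while the limit is cut, the ratio rises. A minimal sketch with a hypothetical user (the `usage_pct` helper and the amounts are made up for illustration and are not taken from the data):

```python
def usage_pct(outstanding_balance_eur: float, limit_eur: float) -> float:
    """Overdraft usage as a percentage of the overdraft limit."""
    return round(outstanding_balance_eur / limit_eur * 100, 1)


# Hypothetical user whose balance is unchanged while the limit is halved
balance_eur = 1000.0
usage_before = usage_pct(balance_eur, limit_eur=2500.0)
usage_after = usage_pct(balance_eur, limit_eur=1250.0)

print(usage_before, usage_after)
```

Note that the charts below average this ratio per user, which is not the same as dividing the average balance by the average limit.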

In [15]:
group_comparison_query = """
with all_users as (
select 
case when reduced_or_cancelled then 'reduced' else 'baseline' end as label,
count(distinct user_id) as all_users 
from users_df
group by 1
),
totals as (
select
user_id,
sum(n_days_using) as total_days_using_od
from users_df
where month >= '2021-01-01'
group by 1
)
select 
'reduced' as label,
month,
all_users,
count(distinct user_id),
count(distinct user_id)::numeric/min(all_users) as perc_remaining_users,
count(distinct case when od_enabled_flag then user_id end)::numeric/min(all_users) as perc_enabled_users,
count(distinct case when outstanding_balance_eur is not null then user_id end)::numeric/min(all_users) as perc_using_users,
round(avg(outstanding_balance_eur), 2) as avg_outstanding_balance_eur,
round(avg(max_amount_eur), 2) as avg_limit,
round(avg(perc_usage)*100, 2) as avg_perc_usage,
round(avg(n_ext_total_out), 2) as n_ext_total_out,
round(avg(n_ext_total_in), 2) as n_ext_total_in,
round(avg(total_volume_eur_out), 2) as total_volume_eur_out,  
round(avg(total_volume_eur_in), 2) as total_volume_eur_in
from users_df
inner join totals using (user_id)
inner join all_users on 1=1
where reduced_or_cancelled
and label = 'reduced'
and total_days_using_od > 0
group by 1, 2, 3
union all 
select 
'baseline' as label,
month,
all_users,
count(distinct user_id),
count(distinct user_id)::numeric/min(all_users) as perc_remaining_users,
count(distinct case when od_enabled_flag then user_id end)::numeric/min(all_users) as perc_enabled_users,
count(distinct case when outstanding_balance_eur is not null then user_id end)::numeric/min(all_users) as perc_using_users,
round(avg(outstanding_balance_eur), 2) as avg_outstanding_balance_eur,
round(avg(max_amount_eur), 2) as avg_limit,
round(avg(perc_usage)*100, 2) as avg_perc_usage,
round(avg(n_ext_total_out), 2) as n_ext_total_out,
round(avg(n_ext_total_in), 2) as n_ext_total_in,
round(avg(total_volume_eur_out), 2) as total_volume_eur_out,  
round(avg(total_volume_eur_in), 2) as total_volume_eur_in
from users_df
inner join totals using (user_id)
inner join all_users on 1=1
where not reduced_or_cancelled
and label = 'baseline'
and total_days_using_od > 0
group by 1, 2, 3
order by 1, 2, 3
"""

con.register("users_df", users_df)
group_comparison_df = con.execute(group_comparison_query).fetchdf()

In [16]:
balance_chart = af.line_multi(
    group_comparison_df, "label", "month", "avg_outstanding_balance_eur", 230, 400, "x"
)
limit_chart = af.line_multi(
    group_comparison_df, "label", "month", "avg_limit", 230, 400, "x"
)
perc_usage_chart = af.line_multi(
    group_comparison_df, "label", "month", "avg_perc_usage", 230, 400, "x"
)

balance_chart.properties(
    title="Avg. Outstanding Balance Split"
) | limit_chart.properties(title="Avg. Limit Split") | perc_usage_chart.properties(
    title="Avg. Percentage of Overdraft Usage Split"
)

In [17]:
out_txns_chart = af.line_multi(
    group_comparison_df, "label", "month", "n_ext_total_out", 330, 400, "x"
)
in_txns_chart = af.line_multi(
    group_comparison_df, "label", "month", "n_ext_total_in", 330, 400, "x"
)

out_txns_chart.properties(
    title="Avg. number of outgoing transactions per user"
) | in_txns_chart.properties(title="Avg. number of incoming transactions per user")

In [18]:
out_volume_chart = af.line_multi(
    group_comparison_df, "label", "month", "total_volume_eur_out", 330, 400, "x"
)
in_volume_chart = af.line_multi(
    group_comparison_df, "label", "month", "total_volume_eur_in", 330, 400, "x"
)

out_volume_chart.properties(
    title="Avg. volume of outgoing transactions per user"
) | in_volume_chart.properties(title="Avg. volume of incoming transactions per user")

<a id='section2.2'></a>
## Cohort Analysis

<a id='section2.2.1'></a>
### How many users do we have per cohort? 

Below we can see the split of baseline users vs. reduction users based on the first quarter they had overdraft enabled.

In [19]:
users_per_cohort_query = """
select 
enabled_quarter,
case when reduced_or_cancelled then 'Reduction Users' else 'Baseline Users' end as user_split,
count(distinct user_id) as n_cohort_users
from cohorts_df 
where enabled_quarter is not null
group by 1, 2
"""

con.register("cohorts_df", cohorts_df)
users_per_cohort_df = con.execute(users_per_cohort_query).fetchdf()

In [20]:
af.column_multi(
    users_per_cohort_df,
    "user_split",
    "enabled_quarter",
    "n_cohort_users",
    "user_split",
    400,
    400,
    "x",
).properties(title="Baseline vs. Reduction Users per Enabled Cohort Quarter")

<a id='section2.2.2'></a>
### Is there a correlation between being reduced or cancelled and our other variables?

When looking into the Pearson correlations between having a limit reduction and the other variables considered for this deep dive, we can see a strong correlation with the average probability of default (*avg_pd*), which is expected since these were the users selected for this reduction in the first place. We can also see some moderate negative correlations with the average limit and outstanding balance (*avg_limit_eur* and *avg_balance_eur* respectively), which is once again expected as seen in the previous sections.

As for the remaining variables - percentage of overdraft enabled users (*perc_users*), percentage of users in arrears (*perc_arrears_users*), percentage and count of write-off users (*perc_write_off_users* and *n_write_off_users* respectively), and transaction related metrics (*avg_outgoing_txns*, *avg_incoming_txns*, *avg_volume_out_eur*, and *avg_volume_in_eur*), we only observe weak correlations. This suggests that the limit reduction measures are not related to user behaviour in those areas.
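
To illustrate what this correlation measures: the reduction flag is treated as 0/1 and correlated with each numeric column. A minimal sketch on made-up data (`toy_df` and all of its values are hypothetical; only the column names mirror the query below):

```python
import pandas as pd

# Hypothetical cohort-level rows; the values are made up for illustration
toy_df = pd.DataFrame(
    {
        "reduced_or_cancelled": [True, True, True, False, False, False],
        "avg_pd": [0.09, 0.11, 0.10, 0.02, 0.03, 0.02],
        "avg_incoming_txns": [3.0, 5.0, 4.0, 4.0, 3.0, 5.0],
    }
)

# Cast the flag to 0/1 so it can be correlated with the numeric columns
toy_df["reduced_or_cancelled"] = toy_df["reduced_or_cancelled"].astype(int)

# Pearson correlation of every other column with the flag
corr_with_flag = toy_df.corr(method="pearson")["reduced_or_cancelled"].drop(
    "reduced_or_cancelled"
)
print(corr_with_flag.round(3))
```

In this toy data, `avg_pd` is clearly higher for flagged rows and so correlates strongly with the flag, while `avg_incoming_txns` has the same mean in both groups and so shows no linear relationship with it.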

In [21]:
summary_users_query = """
with cohort_start_users as (
select enabled_month,
reduced_or_cancelled,
count(distinct user_id) as n_cohort_users
from cohorts_df
group by 1, 2
)
select 
reduced_or_cancelled,
enabled_month,
n_cohort_users,
enabled_month || ' | ' || n_cohort_users as cohort_start,
month_diff::int as month_diff, 
round(count(*)::numeric/ min(n_cohort_users), 2)*100 as perc_users,
round(count(case when outstanding_balance_eur > max_amount_eur then 1 end)::numeric/ count(*), 2)*100 as perc_arrears_users,
round(avg(outstanding_balance_eur)) as avg_balance_eur, 
round(avg(max_amount_eur)) as avg_limit_eur,
round(avg(outstanding_balance_eur::numeric/ max_amount_eur::numeric), 2)*100 as avg_perc_usage,
round(avg(pd), 3) as avg_pd,
count(case when eur_written_off is not null then 1 end)::numeric as n_write_off_users,
round(count(case when eur_written_off is not null then 1 end)::numeric/ count(*), 4)*100 as perc_write_off_users,
round(avg(n_ext_total_out), 2) as avg_outgoing_txns,
round(avg(n_ext_total_in), 2) as avg_incoming_txns,
round(avg(total_volume_eur_out)) as avg_volume_out_eur,
round(avg(total_volume_eur_in)) as avg_volume_in_eur
from cohorts_df
inner join cohort_start_users using (enabled_month, reduced_or_cancelled)
group by 1, 2, 3, 4, 5

"""

summary_users_df = con.execute(summary_users_query).fetchdf()

In [22]:
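# Run correlation (pandas uses Pearson by default). Only numeric and boolean
# columns are kept, since string columns such as cohort_start cannot be correlated.
corr_df = (
    summary_users_df.select_dtypes(include=["number", "bool"])
    .corr()
    .round(3)
    .reset_index()
)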

In [23]:
corr_df = corr_df.iloc[:, 0:2].rename({"index": "variable"}, axis="columns")

In [24]:
# exclude irrelevant variables
corr_df = corr_df[
    ~corr_df["variable"].isin(["reduced_or_cancelled", "month_diff", "n_cohort_users"])
]

In [25]:
af.column_single(
    corr_df, af.teal, "variable", "reduced_or_cancelled", 800, 600, "-y"
).properties(
    title="Pearson Correlation between being reduced/ cancelled and all other variables"
)

Below we look into the quarterly cohorts for our variables. For each of those, you can find 3 charts:
1. Reduction users cohorts
2. Baseline users cohorts
3. Baseline user values minus Reduction user values
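
The third chart is the element-wise difference between the two groups for each cohort and quarter. The notebook computes it in SQL in the query further below; a rough pandas equivalent on hypothetical data could look like this (`toy_df`, its values, and the `baseline_minus_reduction` name are assumptions for illustration):

```python
import pandas as pd

# Hypothetical long-format cohort metrics, shaped like quarterly_users_df
toy_df = pd.DataFrame(
    {
        "user_split": ["Baseline Users", "Reduction Users"] * 2,
        "cohort_start": ["2020-Q1"] * 4,
        "quarter_diff": [0, 0, 1, 1],
        "avg_limit_eur": [3000.0, 2000.0, 3200.0, 1500.0],
    }
)

# One row per cohort and quarter, one column per user group
wide = toy_df.pivot(
    index=["cohort_start", "quarter_diff"],
    columns="user_split",
    values="avg_limit_eur",
)

# Positive values mean the baseline group has the higher value
diff = (wide["Baseline Users"] - wide["Reduction Users"]).rename(
    "baseline_minus_reduction"
)
print(diff.reset_index())
```

A cohort and quarter that exists for only one of the two groups ends up as a missing value in the difference, which matches what the SQL version does when one side of the subtraction is null.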

<a id='section2.2.3'></a>
### Overdraft Enabled users cohorts

Here we can see that even though the percentage of baseline users who kept their overdraft enabled has slowly decreased over time, for reduction users this percentage was quite stable around 100% up until Q3 2021, when we started with the overdraft cancellations. These cancellations seem to be more visible in older cohorts, while more recent cohorts tend to have similar percentages to baseline users in Q3 and Q4 2021.

In [26]:
quarterly_users_query = """
with cohort_start_users as (
select enabled_quarter,
reduced_or_cancelled,
count(distinct user_id) as n_cohort_users
from cohorts_df
group by 1, 2
), 
totals as (
select 
case when reduced_or_cancelled then 'Reduction Users' else 'Baseline Users' end as user_split,
enabled_quarter,
n_cohort_users,
enabled_quarter || ' | ' || n_cohort_users as cohort_start,
quarter_diff::int as quarter_diff, 
round(count(*)::numeric/ min(n_cohort_users), 2)*100 as perc_users,
round(count(case when outstanding_balance_eur > max_amount_eur then 1 end)::numeric/ count(*), 2)*100 as perc_arrears_users,
round(avg(outstanding_balance_eur)) as avg_balance_eur, 
round(avg(max_amount_eur)) as avg_limit_eur,
round(avg(outstanding_balance_eur::numeric/ max_amount_eur::numeric), 2)*100 as avg_perc_usage,
round(avg(pd), 3) as avg_pd,
count(case when eur_written_off is not null then 1 end)::numeric as n_write_off_users,
round(count(case when eur_written_off is not null then 1 end)::numeric/ count(*), 4)*100 as perc_write_off_users,
round(avg(n_ext_total_out), 2) as avg_outgoing_txns,
round(avg(n_ext_total_in), 2) as avg_incoming_txns,
round(avg(total_volume_eur_out)) as avg_volume_out_eur,
round(avg(total_volume_eur_in)) as avg_volume_in_eur
from cohorts_df
inner join cohort_start_users using (enabled_quarter, reduced_or_cancelled)
where month = quarter_date
group by 1, 2, 3, 4, 5
)
select 
user_split,
cohort_start,
quarter_diff,
perc_users,
perc_arrears_users,
avg_balance_eur, 
avg_limit_eur,
avg_perc_usage,
avg_pd,
n_write_off_users,
perc_write_off_users,
avg_outgoing_txns,
avg_incoming_txns,
avg_volume_out_eur,
avg_volume_in_eur
from totals 
union all 
select 
'Baseline minus reduction',
enabled_quarter,
quarter_diff,
min(case when user_split = 'Baseline Users' then perc_users end) - min(case when user_split = 'Reduction Users' then perc_users end),
min(case when user_split = 'Baseline Users' then perc_arrears_users end) - min(case when user_split = 'Reduction Users' then perc_arrears_users end),
min(case when user_split = 'Baseline Users' then avg_balance_eur end) - min(case when user_split = 'Reduction Users' then avg_balance_eur end),
min(case when user_split = 'Baseline Users' then avg_limit_eur end) - min(case when user_split = 'Reduction Users' then avg_limit_eur end),
min(case when user_split = 'Baseline Users' then avg_perc_usage end) - min(case when user_split = 'Reduction Users' then avg_perc_usage end),
min(case when user_split = 'Baseline Users' then avg_pd end) - min(case when user_split = 'Reduction Users' then avg_pd end),
min(case when user_split = 'Baseline Users' then n_write_off_users end) - min(case when user_split = 'Reduction Users' then n_write_off_users end),
min(case when user_split = 'Baseline Users' then perc_write_off_users end) - min(case when user_split = 'Reduction Users' then perc_write_off_users end),
min(case when user_split = 'Baseline Users' then avg_outgoing_txns end) - min(case when user_split = 'Reduction Users' then avg_outgoing_txns end),
min(case when user_split = 'Baseline Users' then avg_incoming_txns end) - min(case when user_split = 'Reduction Users' then avg_incoming_txns end),
min(case when user_split = 'Baseline Users' then avg_volume_out_eur end) - min(case when user_split = 'Reduction Users' then avg_volume_out_eur end),
min(case when user_split = 'Baseline Users' then avg_volume_in_eur end) - min(case when user_split = 'Reduction Users' then avg_volume_in_eur end)
from totals 
group by 1, 2, 3
"""

quarterly_users_df = con.execute(quarterly_users_query).fetchdf()

In [27]:
graph("perc_users", "Percentage of Overdraft Enabled Users", alt.datum.perc_users < 40)

<a id='section2.2.4'></a>
### Overdraft Arrears Users cohorts
As for arrears users, we tend to see a similar pattern to the one above. Since these are not yet materialising into write-offs, as we will see below, and these users can still bounce back to be within their overdraft limit, these rates aren't necessarily worrying, but we should look into these rates over a longer period of time to check if these users do leave arrears through top-ups.
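
In the cohort query, a user counts as being in arrears when the outstanding balance exceeds the overdraft limit. A minimal pandas sketch of that rate for a single cohort and quarter (the `snapshot_df` rows are hypothetical; the column names follow the query):

```python
import pandas as pd

# Hypothetical user-level snapshot for one cohort and quarter
snapshot_df = pd.DataFrame(
    {
        "user_id": [1, 2, 3, 4],
        "outstanding_balance_eur": [500.0, 1200.0, None, 300.0],
        "max_amount_eur": [1000.0, 1000.0, 2000.0, 300.0],
    }
)

# A user is in arrears when the balance exceeds the limit; a missing balance
# (no overdraft usage) compares as False, so it never counts as arrears
in_arrears = snapshot_df["outstanding_balance_eur"] > snapshot_df["max_amount_eur"]

# Share of all users in the snapshot that are in arrears, in percent
perc_arrears_users = round(in_arrears.mean() * 100, 2)
print(perc_arrears_users)
```

A user sitting exactly at the limit is not counted, since the comparison is strict.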

In [28]:
graph(
    "perc_arrears_users",
    "Percentage of Overdraft Arrears Users",
    alt.datum.perc_arrears_users < 1,
)

<a id='section2.2.5'></a>
### Overdraft Cumulative Written Off User Percentage cohorts
As we saw above, since nearly all reduction users kept their overdraft enabled up until the limit reductions started, it is expected that their write-offs would only start from that point onwards. Still, in H2 2021, only a small percentage of users in the reduction group was written-off (some of the percentages may look bigger than the baseline group, but these differences may come from the size of the groups as we will see in the next section).
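
The caveat about group sizes is easy to see with a small worked example: a handful of write-offs in a small cohort can produce a higher percentage than many write-offs in a large one. The counts below are hypothetical and `write_off_pct` is only an illustrative helper:

```python
def write_off_pct(n_written_off: int, n_users: int) -> float:
    """Share of written-off users in a cohort, in percent."""
    return round(n_written_off / n_users * 100, 2)


# Hypothetical small reduction cohort vs. large baseline cohort
small_cohort_pct = write_off_pct(n_written_off=2, n_users=100)
large_cohort_pct = write_off_pct(n_written_off=50, n_users=5000)

print(small_cohort_pct, large_cohort_pct)
```

Here the small cohort shows twice the rate of the large one even though it has far fewer written-off users, which is why the counts in the next section are worth reading alongside the percentages.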

In [29]:
graph(
    "perc_write_off_users",
    "Percentage of Overdraft Written-off Users",
    alt.datum.perc_write_off_users < 1,
)

<a id='section2.2.6'></a>
### Overdraft Cumulative Written Off User Count cohorts

When looking into the actual number of users written off, we can see that only a very small number of reduced users were written off in H2 2021. Once again, it looks like the limit reduction didn't impact this metric, but similar to arrears rates, we should keep monitoring it over a longer period of time.

In [30]:
graph(
    "n_write_off_users",
    "Count of Overdraft Written-off Users",
    alt.datum.n_write_off_users < 40,
)

<a id='section2.2.7'></a>
### Avg outstanding balance cohorts

As expected, the balances of baseline users are generally higher than those of reduced users. Another interesting trend we can see in this cohort view is the general increase of outstanding balances in both groups for the 2021 cohorts.

In [31]:
graph(
    "avg_balance_eur",
    "Average outstanding balance (€)",
    alt.datum.avg_balance_eur < 1500,
)

<a id='section2.2.8'></a>
### Avg limit cohorts

This cohort limit trend is very similar to the balances seen above. The main difference is that for the reduced users we can see the sharp decrease in limits from Q2 2021 onwards.

In [32]:
graph("avg_limit_eur", "Average Overdraft Limit (€)", alt.datum.avg_limit_eur < 3000)

<a id='section2.2.9'></a>
### Avg percentage usage cohorts

And once again, if balances stay about the same and the overdraft limits decrease, we can see an increase of the percentage of overdraft usage for the reduction users in the last 3 quarters of 2021.

In [33]:
graph(
    "avg_perc_usage",
    "Average Percentage of Overdraft Usage",
    alt.datum.avg_perc_usage < 25,
)

<a id='section2.2.10'></a>
### Transaction Behaviour 
Below you can see the evolution of the transaction behaviour per group and cohort. We didn't find any relevant patterns here, which seems to indicate that there were no relevant changes in transaction behaviour after the limit reductions were applied.

### Avg incoming txn count cohorts

In [34]:
graph(
    "avg_incoming_txns",
    "Average incoming transaction count",
    alt.datum.avg_incoming_txns < 2,
)

### Avg outgoing txn count cohorts

In [35]:
graph(
    "avg_outgoing_txns",
    "Average outgoing transaction count",
    alt.datum.avg_outgoing_txns < 20,
)

### Avg incoming txn volume cohorts

In [36]:
graph(
    "avg_volume_in_eur",
    "Average incoming transaction volume (€)",
    alt.datum.avg_volume_in_eur < 2500,
)

### Avg outgoing txn volume cohorts

In [37]:
graph(
    "avg_volume_out_eur",
    "Average outgoing transaction volume (€)",
    alt.datum.avg_volume_out_eur < 2500,
)