
title: How desirable is it to use both overdraft and installments at the same time? 
author: Helder Silva 
date: 2021-10-07 
region: EU 
tags: N26 Installments, TBIL, overdraft, bank products, credit, balance 
summary: When comparing the balances of all installment loan and overdraft users, we can find three main trends: 1. The share of users in the <= 10% usage bucket decreased substantially, from 22% on the day before the loan to 15% on the day of the loan and 11% 30 days after; 2. This decrease was accompanied by a substantial increase in users with a non-negative balance: by day 30, 11% of users were in this bucket; 3. As expected, the percentage of users using more than 90% of their overdraft limit decreased from 20% on the day before the loan to 16% on the day of the loan. However, this trend reversed 30 days after the loan, when the share peaked at 25%. Since the overdraft product currently has a fixed interest rate of 8.9% APR (Annual Percentage Rate), and installment interest rates range between 7.24% APR and 11.38% APR, we split these users into 2 groups: 1. Users whose installment interest is below their overdraft interest: these tend to use less of their overdraft, with only 17% of them using more than 90% of their overdraft limit after 30 days. 2. Users whose installment interest is above their overdraft interest: these tend to be in the higher overdraft usage buckets, with 36% of them using more than 90% of their overdraft limit after 30 days. Also, the majority of users in the installments and overdraft group (60.0%) is within the more beneficial group, with installment interest below overdraft interest.

<div class="alert alert-block alert-success">
    <H1>How desirable is it to use both overdraft and installments at the same time?</H1>
</div>

To answer this question, we looked at users who took an installment loan and had an enabled overdraft with a negative balance on the day before the loan. We compared these users with the remaining users who took an installment loan, and also checked how their balance evolved on the day of the loan and 30 days after it. As for the timeframe, we considered loans disbursed between July 1st and August 31st, 2021.

Here are the questions we looked into:
- [How many users have both an installment loan and an arranged overdraft with a negative balance?](#section1)
- [How do users who have both an installment loan and a negative balance differ from the remaining installment loan users?](#section2)
 - [Number of loans](#section2.1)
 - [Interest rate](#section2.2)
 - [Loan account state](#section2.3)
 - [MCC Categories](#section2.4)
- [What is the evolution of users' balance 30 days after the loan?](#section3)
 - [How many users do we have in each loan bucket over time?](#section3.1)
 - [How do users move in between buckets over time?](#section3.2)
 - [How do users in each balance bucket before the loan move in between buckets over time?](#section3.3)
 - [Bucket to bucket comparison - to which bucket do users move over time?](#section3.4)
 
And here are our main findings:
- When comparing the balances of all installment loan and overdraft users, we can find three main trends:
 1. The share of users in the <= 10% usage bucket decreased substantially, from 22% on the day before the loan to 15% on the day of the loan and 11% 30 days after;
 2. This decrease was accompanied by a substantial increase in users with a non-negative balance: by day 30, 11% of users were in this bucket;
 3. As expected, the percentage of users using more than 90% of their overdraft limit decreased from 20% on the day before the loan to 16% on the day of the loan. However, this trend reversed 30 days after the loan, when the share peaked at 25%.
- Since the overdraft product currently has a fixed interest rate of 8.9% APR (Annual Percentage Rate), and installment interest rates range between 7.24% APR and 11.38% APR, we split these users into 2 groups:
 1. **Users who have an installment interest below their overdraft interest:** these tend to use less of their overdraft, with only 17% of them using more than 90% of their overdraft limit after 30 days.
 2. **Users who have an installment interest above their overdraft interest:** these tend to be in the higher overdraft usage buckets, with 36% of them using more than 90% of their overdraft limit after 30 days.
- Also, the majority of users in the installments and overdraft group (60.0%) is within the more beneficial group, with installment interest below overdraft interest.

In [1]:
%%capture

!pip install jupyter_contrib_nbextensions
!jupyter contrib nbextension install --user
!jupyter nbextension enable spellchecker/main
!pip install duckdb
!pip install altair

In [2]:
%%capture
cd /app/

In [3]:
import pandas as pd
from utils.datalib_database import df_from_sql
import utils.altair_functions as af
import duckdb
import altair as alt
from IPython.display import display_html, Markdown as md

con = duckdb.connect(database=":memory:", read_only=False)

In [4]:
# Chart functions


def heatmap(df, color, x, y, color_condition, width, height, tooltip):
    """Build a heatmap with the cell value printed on top of each cell."""
    base = (
        alt.Chart(df)
        .mark_rect()
        .encode(alt.X(x), alt.Y(y), color=color, tooltip=tooltip)
        .properties(width=width, height=height)
    )

    # Overlay the value as text; color_condition picks black or white text
    text = base.mark_text(baseline="middle").encode(
        text=color,
        color=alt.condition(color_condition, alt.value("black"), alt.value("white")),
    )

    return base + text


def graph(heatmap1, title1, heatmap2, title2):
    """Place two titled heatmaps side by side with shared axis and title styling."""
    return (
        alt.hconcat(
            heatmap1.properties(title=title1), heatmap2.properties(title=title2)
        )
        .configure_axis(
            labelFontSize=12,
            titleFontSize=14,
            labelAngle=30,
            labelColor="#666666",
            titleColor="#266678",
            grid=False,
        )
        .configure_title(fontSize=15)
    )


# Functions to fetch values
def two_col_cell(
    df, filter_column1, filter_row1, filter_column2, filter_row2, output_column
):
    filtered_df = df[
        (df[filter_column1] == filter_row1) & (df[filter_column2] == filter_row2)
    ]
    string = str(filtered_df.iloc[0][output_column])

    return string


def one_col_cell(df, filter_column, filter_row, output_column):
    filtered_df = df[df[filter_column] == filter_row]
    string = str(filtered_df.iloc[0][output_column])

    return string
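
As a quick illustration of how the lookup helpers are used further down, here is a minimal sketch on a toy DataFrame. The frame `toy_df` is hypothetical and only reuses the figures from the first results table; the helper is re-declared so the snippet runs on its own, with `str(...)` used for the text conversion.

```python
import pandas as pd


def one_col_cell(df, filter_column, filter_row, output_column):
    # First row matching the filter, with the requested column returned as text
    filtered_df = df[df[filter_column] == filter_row]
    return str(filtered_df.iloc[0][output_column])


# Hypothetical toy frame shaped like the query results used in this post
toy_df = pd.DataFrame(
    {
        "split": ["tbil and od users", "baseline users"],
        "n_users": [1760, 2849],
        "perc_users": [38.2, 61.8],
    }
)

n_users = one_col_cell(toy_df, "split", "tbil and od users", "n_users")
perc_users = one_col_cell(toy_df, "split", "tbil and od users", "perc_users")
print(f"{n_users} users ({perc_users}%)")
```

Note that if no row matches the filter, `iloc[0]` raises an `IndexError`, so a misspelled label fails loudly rather than silently returning a wrong number.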

In [5]:
tbil_od_df = df_from_sql(
    "redshiftreader",
    "research/product/bank_products/20210921_overdraft_and_installment_users/tbil_and_od_users.sql",
)

In [7]:
baseline_df = df_from_sql(
    "redshiftreader",
    "research/product/bank_products/20210921_overdraft_and_installment_users/loan_baseline.sql",
)

<a id='section1'></a>
# How many users have both an installment loan and an arranged overdraft with a negative balance?

To answer this and the next few questions, we compare users that have both installments and overdraft (tbil and od users) with the remaining installment loan users in the selected timeframe (baseline users).
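
The split described above can be sketched in pandas as follows. This is a simplified illustration on hypothetical toy data, not the query used in this post: it assumes a `user_id` column in both frames and counts distinct `user_id` values, whereas the SQL counts a `user_created` column.

```python
import pandas as pd

# Hypothetical toy data: one row per user in the baseline set, and one row per
# (user, day) observation in the installments-and-overdraft set
baseline = pd.DataFrame({"user_id": [1, 2, 3, 4, 5]})
tbil_od = pd.DataFrame({"user_id": [2, 2, 5], "diff": [-1, 0, -1]})

# A user belongs to the 'tbil and od users' group if they appear in tbil_od
is_od_user = baseline["user_id"].isin(tbil_od["user_id"])
baseline["split"] = is_od_user.map(
    {True: "tbil and od users", False: "baseline users"}
)

# Distinct users per group and their share of all users, in percent
summary = (
    baseline.groupby("split")["user_id"]
    .nunique()
    .rename("n_users")
    .reset_index()
)
summary["perc_users"] = (summary["n_users"] / summary["n_users"].sum() * 100).round(1)
print(summary)
```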

In [8]:
## Add user counts of baseline and tbil od users here
n_users_query = """
with users as (
select 
distinct user_id
from tbil_od_df t 
), 
totals as (
select 
case when u.user_id is not null then 'tbil and od users'
else 'baseline users' end as split,
count(distinct user_created) as n_users
from baseline_df b
left join users u using (user_id)
group by 1
)
select *,
round(sum(n_users)::numeric/ (sum(n_users) over())::numeric, 3)*100 as perc_users
from totals
group by 1, 2 
"""

n_users_df = con.execute(n_users_query).fetchdf()
n_users_df

| | split | n_users | perc_users |
|---|---|---|---|
| 0 | tbil and od users | 1760 | 38.2 |
| 1 | baseline users | 2849 | 61.8 |


In [9]:
n_users = one_col_cell(n_users_df, "split", "tbil and od users", "n_users")
perc_users = one_col_cell(n_users_df, "split", "tbil and od users", "perc_users")
md(
    "In the selected timeframe (between July 1st and August 31st, 2021), "
    f"we had {n_users} users who were using their overdraft on the day before they got their loan, "
    f"which corresponds to {perc_users}% of all installment loan users."
)

In the selected timeframe (between July 1st and August 31st, 2021), we had 1760 users who were using their overdraft on the day before they got their loan, which corresponds to 38.2% of all installment loan users.

In [10]:
af.bar_single_label(n_users_df, af.teal, "n_users", "split", 800, 150, "x")

<a id='section2'></a>
# How do users who have both an installment loan and a negative balance differ from the remaining installment loan users?

<a id='section2.1'></a>
## Number of loans

In [11]:
n_loans_query = """
with users as (
select 
distinct user_id
from tbil_od_df t 
),
totals as (
select 
case when u.user_id is not null then 'tbil and od users'
else 'baseline users' end as split,
user_id, 
n_loans
from baseline_df b
left join users u using (user_id)
group by 1, 2, 3
)
select 
split,
n_loans,
count(*) as n_users, 
round(count(*)::numeric/ sum(count(*)) over(partition by split)::numeric, 3)*100 as perc_users
from totals
group by 1, 2
order by 1, 2
"""

n_loans_df = con.execute(n_loans_query).fetchdf()
n_loans_df.pivot(index="n_loans", columns="split", values=["n_users"])

| n_loans | n_users (baseline users) | n_users (tbil and od users) |
|---|---|---|
| 1 | 1879.0 | 776.0 |
| 2 | 514.0 | 388.0 |
| 3 | 206.0 | 219.0 |
| 4 | 119.0 | 155.0 |
| 5 | 56.0 | 97.0 |
| 6 | 42.0 | 55.0 |
| 7 | 19.0 | 40.0 |
| 8 | 7.0 | 16.0 |
| 9 | 4.0 | 7.0 |
| 10 | 1.0 | 6.0 |


In [12]:
perc_baseline = two_col_cell(
    n_loans_df, "split", "baseline users", "n_loans", 1, "perc_users"
)
perc_tbil_od = two_col_cell(
    n_loans_df, "split", "tbil and od users", "n_loans", 1, "perc_users"
)
md(
    "In terms of number of loans, a larger share of baseline users has a single installment loan "
    f"({perc_baseline}% of all baseline users), whereas only {perc_tbil_od}% "
    "of the installment and overdraft users have just 1 loan. To make these groups comparable going forward, "
    "we look at the first loan of tbil and od users, and at baseline users who only have 1 loan."
)

In terms of number of loans, a larger share of baseline users has a single installment loan (66.0% of all baseline users), whereas only 44.1% of the installment and overdraft users have just 1 loan. To make these groups comparable going forward, we look at the first loan of tbil and od users, and at baseline users who only have 1 loan.
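
The two single-loan shares can be reproduced with a short calculation. The sketch below assumes that the denominators are the group totals from the first table (2849 baseline users and 1760 tbil and od users); the column names are illustrative.

```python
import pandas as pd

# Users with exactly one loan and group totals, taken from the tables in this post
counts = pd.DataFrame(
    {
        "split": ["baseline users", "tbil and od users"],
        "n_users_one_loan": [1879, 776],
        "n_users_total": [2849, 1760],  # assumed denominators
    }
)

# Share of each group that has a single installment loan, in percent
counts["perc_users"] = (
    counts["n_users_one_loan"] / counts["n_users_total"] * 100
).round(1)
print(counts[["split", "perc_users"]])
```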

In [13]:
af.column_multi(n_loans_df, "split", "split", "perc_users", "n_loans", 50, 400, "x")

<a id='section2.3'></a>
## Loan account state

In [14]:
account_state_query = """
with totals as (
select 
account_state,
count(distinct b.user_created) - count(distinct t.user_id) as n_baseline_users,
count(distinct t.user_id) as n_tbil_users
from baseline_df b
left join tbil_od_df t 
on t.nh_id = b.nh_id
where t.user_id is not null or n_loans =1
group by 1
)
select 
'tbil and od users' as label,
account_state,
n_tbil_users as n_users,
round(sum(n_tbil_users)::numeric/(sum(n_tbil_users) over())::numeric, 3)*100 as perc_users
from totals
group by 1, 2, 3
union all
select 
'baseline users' as label,
account_state,
n_baseline_users as n_users,
round(sum(n_baseline_users)::numeric/(sum(n_baseline_users) over())::numeric, 3)*100 as perc_baseline_users
from totals
group by 1, 2, 3
order by 1
"""
account_state_df = con.execute(account_state_query).fetchdf()
account_state_df

| | label | account_state | n_users | perc_users |
|---|---|---|---|---|
| 0 | baseline users | ACTIVE | 719 | 38.3 |
| 1 | baseline users | CLOSED | 1141 | 60.7 |
| 2 | baseline users | ACTIVE_IN_ARREARS | 19 | 1.0 |
| 3 | tbil and od users | ACTIVE | 825 | 46.9 |
| 4 | tbil and od users | CLOSED | 853 | 48.5 |
| 5 | tbil and od users | ACTIVE_IN_ARREARS | 82 | 4.7 |


When comparing installment loan account states, we can't find any major differences between the baseline and the installment and overdraft users.

In [15]:
af.column_multi(
    account_state_df, "label", "label", "perc_users", "account_state", 100, 400, "x"
)

<a id='section2.4'></a>
## MCC Categories

In [16]:
mcc_cat_query = """
with totals as (
select 
mcc_category,
count(distinct b.user_created) - count(distinct t.user_id) as n_baseline_users,
count(distinct t.user_id) as n_tbil_users
from baseline_df b
left join tbil_od_df t 
on t.nh_id = b.nh_id
where t.user_id is not null or n_loans =1
group by 1
), 
tbil as (
select 
'tbil and od users' as label,
mcc_category,
n_tbil_users as n_users,
round(sum(n_tbil_users)::numeric/(sum(n_tbil_users) over())::numeric, 3)*100 as perc_users
from totals
group by 1, 2, 3
order by 4 desc 
limit 10
),
baseline as (
select 
'baseline users' as label,
mcc_category,
n_baseline_users as n_users,
round(sum(n_baseline_users)::numeric/(sum(n_baseline_users) over())::numeric, 3)*100 as perc_baseline_users
from totals
group by 1, 2, 3
order by 4 desc
limit 10
)
select * from tbil
union all 
select * from baseline
order by 1, 4 desc
"""
mcc_cat_df = con.execute(mcc_cat_query).fetchdf()
mcc_cat_df.pivot(index="mcc_category", columns="label", values=["n_users"])

| mcc_category | n_users (baseline users) | n_users (tbil and od users) |
|---|---|---|
| airline | 141 | 89 |
| bookstores | 140 | 141 |
| clothing_depart_store | 408 | 405 |
| computer_electronic_stores | 206 | 161 |
| hotel_lodging | 187 | 158 |
| household_store | 178 | 188 |
| local_transport_railway | 75 | 58 |
| no_cat | 125 | 142 |
| retail_store | 133 | 134 |
| subscriptions | 91 | 82 |


The same holds for MCC categories: there are no relevant differences between the baseline and the installment and overdraft users.

In [17]:
af.column_multi(
    mcc_cat_df, "label", "label", "perc_users", "mcc_category", 100, 400, "-y"
)

<a id='section2.2'></a>
## Interest rate

In [18]:
apr_query = """
with totals as (
select 
interest_rate,
count(distinct b.user_created) - count(distinct t.user_id) as n_baseline_users,
count(distinct t.user_id) as n_tbil_users
from baseline_df b
left join tbil_od_df t 
on t.nh_id = b.nh_id
where t.user_id is not null or n_loans =1
group by 1
)
select 
'tbil and od users' as label,
interest_rate,
n_tbil_users as n_users,
round(sum(n_tbil_users)::numeric/(sum(n_tbil_users) over())::numeric, 3)*100 as perc_users
from totals
group by 1, 2, 3
union all
select 
'baseline users' as label,
interest_rate,
n_baseline_users as n_users,
round(sum(n_baseline_users)::numeric/(sum(n_baseline_users) over())::numeric, 3)*100 as perc_baseline_users
from totals
group by 1, 2, 3
order by 1, 2
"""
apr_df = con.execute(apr_query).fetchdf()
apr_df

| | label | interest_rate | n_users | perc_users |
|---|---|---|---|---|
| 0 | baseline users | 7.24 | 125 | 6.7 |
| 1 | baseline users | 7.71 | 585 | 31.1 |
| 2 | baseline users | 8.64 | 507 | 27.0 |
| 3 | baseline users | 9.56 | 269 | 14.3 |
| 4 | baseline users | 10.47 | 198 | 10.5 |
| 5 | baseline users | 11.38 | 195 | 10.4 |
| 6 | tbil and od users | 7.24 | 80 | 4.5 |
| 7 | tbil and od users | 7.71 | 547 | 31.1 |
| 8 | tbil and od users | 8.64 | 433 | 24.6 |
| 9 | tbil and od users | 9.56 | 263 | 14.9 |


As for interest rates, the percentage of users in each bucket is once again very similar between the 2 groups. 

In [19]:
af.column_multi(apr_df, "label", "label", "perc_users", "interest_rate", 100, 400, "x")

In [20]:
apr_split_query = """
with totals as (
select 
case when interest_rate < 8.9 then 'Interest below od'
else 'Interest above od' end as interest_split,
count(distinct b.user_created) - count(distinct t.user_id) as n_baseline_users,
count(distinct t.user_id) as n_tbil_users
from baseline_df b
left join tbil_od_df t 
on t.nh_id = b.nh_id
where t.user_id is not null or n_loans =1
group by 1
)
select 
'tbil and od users' as label,
interest_split,
n_tbil_users as n_users,
round(sum(n_tbil_users)::numeric/(sum(n_tbil_users) over())::numeric, 2)*100 as perc_users
from totals
group by 1, 2, 3
union all
select 
'baseline users' as label,
interest_split,
n_baseline_users as n_users,
round(sum(n_baseline_users)::numeric/(sum(n_baseline_users) over())::numeric, 3)*100 as perc_baseline_users
from totals
group by 1, 2, 3
order by 1, 2
"""
apr_split_df = con.execute(apr_split_query).fetchdf()

In the next section we look into the balance evolution of the installments and overdraft users over time. Since the overdraft product currently has a fixed interest rate of 8.9% APR (Annual Percentage Rate), and installment interest rates range between 7.24% APR and 11.38% APR, we split these users into:
 - **Users who have an installment interest below their overdraft interest** (for these it would be more beneficial to use installment loans, since this credit is cheaper for them)
 - **Users who have an installment interest above their overdraft interest** (for these it might not be advisable to use installment loans, since this credit is more expensive for them)
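
The split above boils down to a single comparison against the overdraft rate. A minimal sketch, mirroring the `case when interest_rate < 8.9` expression used in the queries; the function name `interest_split` and the constants are illustrative:

```python
OVERDRAFT_APR = 8.9  # fixed overdraft rate stated in this post

# Installment rates that appear in the interest-rate table of this post
INSTALLMENT_APRS = [7.24, 7.71, 8.64, 9.56, 10.47, 11.38]


def interest_split(installment_apr: float, overdraft_apr: float = OVERDRAFT_APR) -> str:
    """Strictly below the overdraft rate counts as 'below'; everything else as 'above'."""
    if installment_apr < overdraft_apr:
        return "Interest below od"
    return "Interest above od"


for apr in INSTALLMENT_APRS:
    print(f"{apr:>5.2f}% APR -> {interest_split(apr)}")
```

With this rule, a rate exactly equal to the overdraft rate would land in the "above" group, although none of the listed installment rates equals 8.9%.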

In [21]:
interest_below_users = two_col_cell(
    apr_split_df,
    "label",
    "tbil and od users",
    "interest_split",
    "Interest below od",
    "perc_users",
)
md(
    "In the chart below, we can actually see that the majority of users in the "
    f"installments and overdraft users group ({interest_below_users}%) is within the more beneficial group "
    "of installment interest below overdraft interest."
)

In the chart below, we can actually see that the majority of users in the installments and overdraft users group (60.0%) is within the more beneficial group of installment interest below overdraft interest.

In [22]:
apr_split_df

| | label | interest_split | n_users | perc_users |
|---|---|---|---|---|
| 0 | baseline users | Interest above od | 662 | 35.2 |
| 1 | baseline users | Interest below od | 1217 | 64.8 |
| 2 | tbil and od users | Interest above od | 700 | 40.0 |
| 3 | tbil and od users | Interest below od | 1060 | 60.0 |


In [23]:
af.column_multi(
    apr_split_df, "label", "label", "perc_users", "interest_split", 100, 400, "x"
)

<a id='section3'></a>
# What is the evolution of users' balance after the loan?

To make this comparison, we looked at the balance users had on the day before the loan and at what the balance of those same users looked like on the day of the installment loan and 30 days after it. Since we filter for users who had a negative balance before taking a loan, only negative balances are expected on the day before the loan. These negative balances are then split into overdraft usage buckets, i.e., the percentage of the maximum overdraft limit that is being used.
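
A minimal sketch of how a usage share could be mapped to a bucket, mirroring the `case` expressions used in the queries of this post. It assumes that usage is expressed as a fraction of the overdraft limit and is missing when the balance is non-negative; the function name `usage_bucket` is illustrative, and the labels are written without the leading spaces the SQL labels carry.

```python
from typing import Optional


def usage_bucket(perc_usage: Optional[float]) -> str:
    """Map overdraft usage (fraction of the limit) to a bucket label.

    Conditions are checked in the same order as in the SQL, so the first
    matching branch wins at the bucket boundaries.
    """
    if perc_usage is None:
        return "non-negative balance"
    if perc_usage > 1:
        return "in arrears"
    if perc_usage < 0.1:
        return "<=10%"
    if perc_usage <= 0.2:
        return "10% to 20%"
    if perc_usage <= 0.4:
        return "20% to 40%"
    if perc_usage <= 0.6:
        return "40% to 60%"
    if perc_usage <= 0.8:
        return "60% to 80%"
    if perc_usage <= 0.9:
        return "80% to 90%"
    return "> 90%"


for value in [None, 0.05, 0.35, 0.95, 1.2]:
    print(value, "->", usage_bucket(value))
```

As in the SQL, a usage of exactly 10% falls into the "10% to 20%" bucket, because the first bucket is tested with a strict `<`.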

<a id='section3.1'></a>
## How many users do we have in each loan bucket over time?

In [24]:
usage_query = """
with unions as (
select 
case when interest_rate < 8.9 then 'Interest below od'
else 'Interest above od' end as interest_split,
case when diff = -1 then ' day before loan' 
when diff = 0 then ' day of loan'
when diff = 30 then '30 days after loan'
end as label,
usage_buckets,
count(*) as n_users
from tbil_od_df
inner join baseline_df using (nh_id)
where diff in (-1, 0, 30)
group by 1, 2, 3
union all
select 
'All Users' as interest_split,
case when diff = -1 then ' day before loan' 
when diff = 0 then ' day of loan'
when diff = 30 then '30 days after loan'
end as label,
usage_buckets,
count(*) as n_users
from tbil_od_df
inner join baseline_df using (nh_id)
where diff in (-1, 0, 30)
group by 1, 2, 3
)
select *,
(round(sum(n_users)::numeric/ (sum(n_users) over(partition by interest_split, label))::numeric, 2)*100)::int as perc_users
from unions
group by 1, 2, 3, 4
order by 1, 2, 3
"""
usage_df = con.execute(usage_query).fetchdf()

### All users

In [25]:
day_loan_0 = two_col_cell(
    usage_df[usage_df["interest_split"] == "All Users"],
    "label",
    " day of loan",
    "usage_buckets",
    "   non-negative balance",
    "perc_users",
)
day_thirty_0 = two_col_cell(
    usage_df[usage_df["interest_split"] == "All Users"],
    "label",
    "30 days after loan",
    "usage_buckets",
    "   non-negative balance",
    "perc_users",
)
day_before_10 = two_col_cell(
    usage_df[usage_df["interest_split"] == "All Users"],
    "label",
    " day before loan",
    "usage_buckets",
    "  <=10%",
    "perc_users",
)
day_loan_10 = two_col_cell(
    usage_df[usage_df["interest_split"] == "All Users"],
    "label",
    " day of loan",
    "usage_buckets",
    "  <=10%",
    "perc_users",
)
day_thirty_10 = two_col_cell(
    usage_df[usage_df["interest_split"] == "All Users"],
    "label",
    "30 days after loan",
    "usage_buckets",
    "  <=10%",
    "perc_users",
)
day_before_90 = two_col_cell(
    usage_df[usage_df["interest_split"] == "All Users"],
    "label",
    " day before loan",
    "usage_buckets",
    " > 90%",
    "perc_users",
)
day_loan_90 = two_col_cell(
    usage_df[usage_df["interest_split"] == "All Users"],
    "label",
    " day of loan",
    "usage_buckets",
    " > 90%",
    "perc_users",
)
day_thirty_90 = two_col_cell(
    usage_df[usage_df["interest_split"] == "All Users"],
    "label",
    "30 days after loan",
    "usage_buckets",
    " > 90%",
    "perc_users",
)

md(
    "When comparing the balances of all installment loan and overdraft users, we can find three main trends:<br>"
    "<li> The share of users in the <= 10% usage bucket decreased substantially, "
    f"from {day_before_10}% on the day before the loan to {day_loan_10}% on the day of the loan "
    f"and {day_thirty_10}% 30 days after;</li>"
    "<li> This decrease was accompanied by a substantial increase in users with a non-negative balance: "
    f"by day 30, {day_thirty_0}% of users were in this bucket;</li>"
    "<li> As expected, the percentage of users using more than 90% of their overdraft limit decreased from "
    f"{day_before_90}% on the day before the loan to {day_loan_90}% on the day of the loan. However, this "
    f"trend reversed 30 days after the loan, when the share peaked at {day_thirty_90}%.</li>"
)

When comparing the balances of all installment loan and overdraft users, we can find three main trends:<br><li> The share of users in the <= 10% usage bucket decreased substantially, from 22% on the day before the loan to 15% on the day of the loan and 11% 30 days after;</li><li> This decrease was accompanied by a substantial increase in users with a non-negative balance: by day 30, 11% of users were in this bucket;</li><li> As expected, the percentage of users using more than 90% of their overdraft limit decreased from 20% on the day before the loan to 16% on the day of the loan. However, this trend reversed 30 days after the loan, when the share peaked at 25%.</li>

In [26]:
af.column_multi(
    usage_df[usage_df["interest_split"] == "All Users"],
    "label",
    "usage_buckets",
    "perc_users",
    "label",
    250,
    400,
    "x",
)

### Users with an installment interest below their overdraft interest

In [27]:
day_thirty_90 = two_col_cell(
    usage_df[usage_df["interest_split"] == "Interest below od"],
    "label",
    "30 days after loan",
    "usage_buckets",
    " > 90%",
    "perc_users",
)

md(
    "By filtering for users with an installment interest below their overdraft interest, we can see that this "
    "subset tends to have a bigger proportion of users in the lower usage buckets, and only "
    f"{day_thirty_90}% of the users are using more than 90% of their overdraft limit after 30 days."
)

By filtering for users with an installment interest below their overdraft interest, we can see that this subset tends to have a bigger proportion of users in the lower usage buckets, and only 17% of the users are using more than 90% of their overdraft limit after 30 days.

In [28]:
af.column_multi(
    usage_df[usage_df["interest_split"] == "Interest below od"],
    "label",
    "usage_buckets",
    "perc_users",
    "label",
    250,
    400,
    "x",
)

### Users with an installment interest above their overdraft interest

In [29]:
day_thirty_90 = two_col_cell(
    usage_df[usage_df["interest_split"] == "Interest above od"],
    "label",
    "30 days after loan",
    "usage_buckets",
    " > 90%",
    "perc_users",
)

md(
    "On the other hand, users with an installment interest above their overdraft interest tend to be in the "
    f"higher overdraft usage buckets, with {day_thirty_90}% of the users using more than 90% of their "
    "overdraft limit after 30 days."
)

On the other hand, users with an installment interest above their overdraft interest tend to be in the higher overdraft usage buckets, with 36% of the users using more than 90% of their overdraft limit after 30 days.

In [30]:
af.column_multi(
    usage_df[usage_df["interest_split"] == "Interest above od"],
    "label",
    "usage_buckets",
    "perc_users",
    "label",
    250,
    400,
    "x",
)

<a id='section3.2'></a>
## How do users move in between buckets over time?

In [31]:
day_0_query = """
with totals as (
select 
t.user_id,
interest_rate,
sum(case when diff = -1 then perc_usage end) as day_before_usage,
sum(case when diff = 0 then perc_usage end) as day_0_usage
from tbil_od_df t
inner join baseline_df using (nh_id)
group by 1, 2
)
select
case when interest_rate < 8.9 then ' Interest below od'
else 'Interest above od' end as interest_split,
case 
when day_before_usage is null then '   non-negative balance'
when day_before_usage > 1 then 'in arrears'
when day_before_usage < 0.1 then '  <=10%'
when day_before_usage between 0.1 and 0.2 then ' 10% to 20%'
when day_before_usage between 0.2 and 0.4 then ' 20% to 40%'
when day_before_usage between 0.4 and 0.6 then ' 40% to 60%'
when day_before_usage between 0.6 and 0.8 then ' 60% to 80%'
when day_before_usage between 0.8 and 0.9 then ' 80% to 90%'
when day_before_usage > 0.9 then ' > 90%'
end as day_before_perc_usage,
case 
when day_0_usage is null then '   non-negative balance'
when day_0_usage > 1 then 'in arrears'
when day_0_usage < 0.1 then '  <=10%'
when day_0_usage between 0.1 and 0.2 then ' 10% to 20%'
when day_0_usage between 0.2 and 0.4 then ' 20% to 40%'
when day_0_usage between 0.4 and 0.6 then ' 40% to 60%'
when day_0_usage between 0.6 and 0.8 then ' 60% to 80%'
when day_0_usage between 0.8 and 0.9 then ' 80% to 90%'
when day_0_usage > 0.9 then ' > 90%'
end as day_0_perc_usage,
user_id,
day_before_usage,
day_0_usage
from totals
"""
day_0_df = con.execute(day_0_query).fetchdf()

In [32]:
day_30_query = """
with totals as (
select 
t.user_id,
interest_rate,
sum(case when diff = -1 then perc_usage end) as day_before_usage,
sum(case when diff = 30 then perc_usage end) as day_30_usage
from tbil_od_df t
inner join baseline_df using (nh_id)
group by 1, 2
)
select
case when interest_rate < 8.9 then ' Interest below od'
else 'Interest above od' end as interest_split,
case 
when day_before_usage is null then '   non-negative balance'
when day_before_usage > 1 then 'in arrears'
when day_before_usage < 0.1 then '  <=10%'
when day_before_usage between 0.1 and 0.2 then ' 10% to 20%'
when day_before_usage between 0.2 and 0.4 then ' 20% to 40%'
when day_before_usage between 0.4 and 0.6 then ' 40% to 60%'
when day_before_usage between 0.6 and 0.8 then ' 60% to 80%'
when day_before_usage between 0.8 and 0.9 then ' 80% to 90%'
when day_before_usage > 0.9 then ' > 90%'
end as day_before_perc_usage,
case 
when day_30_usage is null then '   non-negative balance'
when day_30_usage > 1 then 'in arrears'
when day_30_usage < 0.1 then '  <=10%'
when day_30_usage between 0.1 and 0.2 then ' 10% to 20%'
when day_30_usage between 0.2 and 0.4 then ' 20% to 40%'
when day_30_usage between 0.4 and 0.6 then ' 40% to 60%'
when day_30_usage between 0.6 and 0.8 then ' 60% to 80%'
when day_30_usage between 0.8 and 0.9 then ' 80% to 90%'
when day_30_usage > 0.9 then ' > 90%'
end as day_30_perc_usage,
user_id,
day_before_usage,
day_30_usage
from totals
order by 1
"""
day_30_df = con.execute(day_30_query).fetchdf()

In [33]:
all_day_0_usage_query = """
with unions as (
select
' day of loan' as day,
interest_split as interest_split,
case when day_before_perc_usage = day_0_perc_usage then ' Same Usage Bucket'
when day_before_usage < day_0_usage then 'Increased Usage'
else ' Decreased Usage'
end as usage_shift,
count(*) as n_users
from day_0_df
group by 1, 2, 3
union all
select
' day of loan' as day,
'All Users' as interest_split,
case when day_before_perc_usage = day_0_perc_usage then ' Same Usage Bucket'
when day_before_usage < day_0_usage then 'Increased Usage'
else ' Decreased Usage'
end as usage_shift,
count(*) as n_users
from day_0_df
group by 1, 2, 3
)
select *, 
round(sum(n_users)::numeric/ sum(n_users) over (partition by interest_split), 3)*100 as perc_users
from unions
group by 1, 2, 3, 4
order by 1, 2, 3
"""
con.register("day_0_df", day_0_df)
all_day_0_usage_df = con.execute(all_day_0_usage_query).fetchdf()

In [34]:
all_day_30_usage_query = """
with unions as (
select
'30 days after' as day,
interest_split,
case when day_before_perc_usage = day_30_perc_usage then ' Same Usage Bucket'
when day_before_usage < day_30_usage then 'Increased Usage'
else ' Decreased Usage'
end as usage_shift,
count(*) as n_users
from day_30_df
group by 1, 2, 3
union all
select
'30 days after' as day,
'All Users' as interest_split,
case when day_before_perc_usage = day_30_perc_usage then ' Same Usage Bucket'
when day_before_usage < day_30_usage then 'Increased Usage'
else ' Decreased Usage'
end as usage_shift,
count(*) as n_users
from day_30_df
group by 1, 2, 3
)
select *, 
round(sum(n_users)::numeric/ sum(n_users) over (partition by interest_split), 3)*100 as perc_users
from unions
group by 1, 2, 3, 4
order by 1, 2, 3
"""
con.register("day_30_df", day_30_df)
all_day_30_usage_df = con.execute(all_day_30_usage_query).fetchdf()

In [35]:
all_usage_df = pd.concat([all_day_0_usage_df, all_day_30_usage_df])
all_usage_df

| | day | interest_split | usage_shift | n_users | perc_users |
|---|---|---|---|---|---|
| 0 | day of loan | Interest below od | Decreased Usage | 348 | 32.8 |
| 1 | day of loan | Interest below od | Same Usage Bucket | 649 | 61.2 |
| 2 | day of loan | Interest below od | Increased Usage | 63 | 5.9 |
| 3 | day of loan | All Users | Decreased Usage | 568 | 32.3 |
| 4 | day of loan | All Users | Same Usage Bucket | 1064 | 60.5 |
| 5 | day of loan | All Users | Increased Usage | 128 | 7.3 |
| 6 | day of loan | Interest above od | Decreased Usage | 220 | 31.4 |
| 7 | day of loan | Interest above od | Same Usage Bucket | 415 | 59.3 |
| 8 | day of loan | Interest above od | Increased Usage | 65 | 9.3 |
| 0 | 30 days after | Interest below od | Decreased Usage | 332 | 31.3 |


### All users

In [36]:
day_loan_increase = two_col_cell(
    all_usage_df[all_usage_df["interest_split"] == "All Users"],
    "day",
    " day of loan",
    "usage_shift",
    "Increased Usage",
    "perc_users",
)
day_30_increase = two_col_cell(
    all_usage_df[all_usage_df["interest_split"] == "All Users"],
    "day",
    "30 days after",
    "usage_shift",
    "Increased Usage",
    "perc_users",
)

md(
    "When looking into specific balance variations between the day before the loan and the day of the loan, "
    "we can see that the majority of users stayed in the same overdraft usage bucket, and only "
    f"{day_loan_increase}% of users increased their usage on this day. However, 30 days after the loan this "
    f"proportion increases considerably and becomes the biggest bucket, with {day_30_increase}% of the users."
)

When looking into specific balance variations between the day before the loan and the day of the loan, we can see that the majority of users stayed in the same overdraft usage bucket, and only 7.3% of users increased their usage on this day. However, 30 days after the loan this proportion increases considerably and becomes the biggest bucket, with 36.5% of the users.
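
The three usage-shift groups follow a simple rule: same bucket first, then a comparison of the raw usage values. A minimal pandas sketch on hypothetical toy data, mirroring the `case` expression in the queries above (the column names are illustrative):

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: usage as a fraction of the overdraft limit,
# missing (NaN) when the balance is non-negative
toy = pd.DataFrame(
    {
        "day_before_bucket": ["<=10%", "40% to 60%", "> 90%", "10% to 20%"],
        "day_30_bucket": ["<=10%", "60% to 80%", "80% to 90%", "non-negative balance"],
        "day_before_usage": [0.05, 0.50, 0.95, 0.15],
        "day_30_usage": [0.08, 0.70, 0.85, np.nan],
    }
)

# Conditions are evaluated in order; anything that matches neither is a decrease.
# A comparison with NaN is False, so moving to a non-negative balance counts as
# decreased usage, as in the SQL.
conditions = [
    toy["day_before_bucket"] == toy["day_30_bucket"],
    toy["day_before_usage"] < toy["day_30_usage"],
]
toy["usage_shift"] = np.select(
    conditions, ["Same Usage Bucket", "Increased Usage"], default="Decreased Usage"
)

# Share of users per usage shift, in percent
shares = toy["usage_shift"].value_counts(normalize=True) * 100
print(shares)
```

Because the bucket comparison comes first, a user whose usage rises within the same bucket is counted as "Same Usage Bucket" rather than "Increased Usage".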

In [37]:
af.column_multi(
    all_usage_df[all_usage_df["interest_split"] == "All Users"],
    "usage_shift",
    "usage_shift",
    "perc_users",
    "day",
    200,
    400,
    "x",
)

### Users with an installment interest below their overdraft interest

In [38]:
md(
    "Once again, users with an installment interest below their overdraft interest tend to show a smaller increase "
    "in overdraft usage compared to the group of all users. By day 30 after the loan we can see an almost exact "
    "three-way split between the 3 buckets."
)

Once again, users with an installment interest below their overdraft interest tend to show a smaller increase in overdraft usage compared to the group of all users. By day 30 after the loan we can see an almost exact three-way split between the 3 buckets.

In [39]:
af.column_multi(
    all_usage_df[all_usage_df["interest_split"] == " Interest below od"],
    "usage_shift",
    "usage_shift",
    "perc_users",
    "day",
    200,
    400,
    "x",
)

### Users with an installment interest above their overdraft interest

In [40]:
day_30_increase = two_col_cell(
    all_usage_df[all_usage_df["interest_split"] == "Interest above od"],
    "day",
    "30 days after",
    "usage_shift",
    "Increased Usage",
    "perc_users",
)
md(
    "As expected, we see a trend toward more intense overdraft usage for users that have an installment "
    f"interest above their overdraft interest, with {day_30_increase}% of users increasing their overdraft usage "
    "30 days after the loan."
)

As expected, we see a trend toward more intense overdraft usage for users that have an installment interest above their overdraft interest, with 39.7% of users increasing their overdraft usage 30 days after the loan.

In [41]:
af.column_multi(
    all_usage_df[all_usage_df["interest_split"] == "Interest above od"],
    "usage_shift",
    "usage_shift",
    "perc_users",
    "day",
    200,
    400,
    "x",
)

<a id='section3.3'></a>
## How do users in each balance bucket before the loan move in between buckets over time?
In the comparisons below, we go one step more granular and look at how users in each balance bucket on the day before the loan increase or decrease their overdraft usage on the day of the loan and 30 days after the loan. Each row of the charts below therefore represents 100% of the users whose overdraft balance was in the specified bucket on the day before the loan.


The main trend we can identify is that users in the extreme buckets (<= 10% and > 90%) are less likely to have increased their overdraft usage after 30 days than users in the intermediate buckets. It therefore seems that using both overdraft and installment loans can be more beneficial for these users in particular.

In [42]:
day_0_usage_query = """
with unions as (
select
interest_split,
day_before_perc_usage,
case when day_before_perc_usage = day_0_perc_usage then ' Same Usage Bucket'
when day_before_usage < day_0_usage then 'Increased Usage in day of loan'
else ' Decreased Usage in day of loan'
end as usage_shift,
count(*) as n_users
from day_0_df
group by 1, 2, 3
union all
select
'All Users' as interest_split,
day_before_perc_usage,
case when day_before_perc_usage = day_0_perc_usage then ' Same Usage Bucket'
when day_before_usage < day_0_usage then 'Increased Usage in day of loan'
else ' Decreased Usage in day of loan'
end as usage_shift,
count(*) as n_users
from day_0_df
group by 1, 2, 3
)
select *, 
round(sum(n_users)::numeric/ sum(n_users) over (partition by interest_split, day_before_perc_usage), 3)*100 as perc_users
from unions
group by 1, 2, 3, 4
order by 1, 2
"""
day_0_usage_df = con.execute(day_0_usage_query).fetchdf()

In [43]:
day_30_usage_query = """
with unions as (
select
interest_split,
day_before_perc_usage,
case when day_before_perc_usage = day_30_perc_usage then ' Same Usage Bucket'
when day_before_usage < day_30_usage then 'Increased Usage after 30 days'
else ' Decreased Usage after 30 days'
end as usage_shift,
count(*) as n_users
from day_30_df
group by 1, 2, 3
union all
select
'All Users' as interest_split,
day_before_perc_usage,
case when day_before_perc_usage = day_30_perc_usage then ' Same Usage Bucket'
when day_before_usage < day_30_usage then 'Increased Usage after 30 days'
else ' Decreased Usage after 30 days'
end as usage_shift,
count(*) as n_users
from day_30_df
group by 1, 2, 3
)
select *, 
round(sum(n_users)::numeric/ sum(n_users) over (partition by interest_split, day_before_perc_usage), 3)*100 as perc_users
from unions
group by 1, 2, 3, 4
order by 1, 2, 3
"""
day_30_usage_df = con.execute(day_30_usage_query).fetchdf()
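
Both queries above label each user with the same three-way `CASE` expression. The sketch below restates that logic as a plain Python function to make the order of the checks explicit: bucket equality is tested first, so a user whose usage changes without leaving the bucket still counts as "Same Usage Bucket". The function name and the example values are hypothetical, and usage is assumed here to be the share of the overdraft limit in use.

```python
def usage_shift(before_bucket, after_bucket, before_usage, after_usage):
    """Mirror the CASE expression: the bucket comparison comes first."""
    if before_bucket == after_bucket:
        return "Same Usage Bucket"
    if before_usage < after_usage:
        return "Increased Usage"
    # Everything else, including a move to a lower bucket, lands here.
    return "Decreased Usage"


# Illustrative values only, not taken from the data.
print(usage_shift("<= 10%", "<= 10%", 0.04, 0.09))  # Same Usage Bucket
print(usage_shift("<= 10%", "> 90%", 0.04, 0.95))   # Increased Usage
print(usage_shift("> 90%", "<= 10%", 0.95, 0.04))   # Decreased Usage
```

Because of this ordering, the "Increased" and "Decreased" shares only capture moves across bucket boundaries, not smaller changes inside a bucket.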

### All users

In [44]:
# add heatmap with before and after
day0 = heatmap(
    day_0_usage_df[day_0_usage_df["interest_split"] == "All Users"],
    "perc_users:Q",
    "usage_shift:N",
    "day_before_perc_usage:N",
    alt.datum.perc_users < 30,
    250,
    400,
    ["n_users", "day_0_perc_usage:N", "day_before_perc_usage:N"],
)
day30 = heatmap(
    day_30_usage_df[day_30_usage_df["interest_split"] == "All Users"],
    "perc_users:Q",
    "usage_shift:N",
    "day_before_perc_usage:N",
    alt.datum.perc_users < 30,
    250,
    400,
    ["n_users", "day_30_perc_usage:N", "day_before_perc_usage:N"],
)
graph(
    day0,
    "Difference between day before loan and day of loan",
    day30,
    "Difference between day before loan and 30 days after loan",
)

### Users with an installment interest below their overdraft interest

In [45]:
# add heatmap with before and after
day0 = heatmap(
    day_0_usage_df[day_0_usage_df["interest_split"] == " Interest below od"],
    "perc_users:Q",
    "usage_shift:N",
    "day_before_perc_usage:N",
    alt.datum.perc_users < 30,
    250,
    400,
    ["n_users", "day_0_perc_usage:N", "day_before_perc_usage:N"],
)
day30 = heatmap(
    day_30_usage_df[day_30_usage_df["interest_split"] == " Interest below od"],
    "perc_users:Q",
    "usage_shift:N",
    "day_before_perc_usage:N",
    alt.datum.perc_users < 30,
    250,
    400,
    ["n_users", "day_30_perc_usage:N", "day_before_perc_usage:N"],
)
graph(
    day0,
    "Difference between day before loan and day of loan",
    day30,
    "Difference between day before loan and 30 days after loan",
)

### Users with an installment interest above their overdraft interest

In [46]:
# add heatmap with before and after
day0 = heatmap(
    day_0_usage_df[day_0_usage_df["interest_split"] == "Interest above od"],
    "perc_users:Q",
    "usage_shift:N",
    "day_before_perc_usage:N",
    alt.datum.perc_users < 30,
    250,
    400,
    ["n_users", "day_0_perc_usage:N", "day_before_perc_usage:N"],
)
day30 = heatmap(
    day_30_usage_df[day_30_usage_df["interest_split"] == "Interest above od"],
    "perc_users:Q",
    "usage_shift:N",
    "day_before_perc_usage:N",
    alt.datum.perc_users < 30,
    250,
    400,
    ["n_users", "day_30_perc_usage:N", "day_before_perc_usage:N"],
)
graph(
    day0,
    "Difference between day before loan and day of loan",
    day30,
    "Difference between day before loan and 30 days after loan",
)

<a id='section3.4'></a>
## Bucket to bucket comparison - to which bucket do users move over time?

We can go one step further and compare every overdraft usage bucket on the day before the loan with the buckets on the day of the loan and 30 days after the loan. The main trend at this granularity is that users in the two buckets between 60% and 90% of overdraft usage tend to move up to the > 90% bucket 30 days after the loan. Also, the majority of users in the <= 10% bucket either stay in the same bucket or decrease their usage 30 days after the loan, which again suggests that using both products can have a beneficial effect on overdraft usage for these users.
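
A bucket-to-bucket comparison of this kind is a row-normalised transition table: for each starting bucket, the shares over the destination buckets add up to 100%. The snippet below is a minimal sketch of that idea in plain Python. The user pairs are made up for illustration and `transition_shares` is a hypothetical helper, not a function from the notebook.

```python
from collections import Counter, defaultdict

# Made-up (day-before bucket, day-30 bucket) pairs, one per user.
transitions = [
    ("<= 10%", "<= 10%"),
    ("<= 10%", "<= 10%"),
    ("<= 10%", "> 90%"),
    ("> 90%", "> 90%"),
    ("> 90%", "<= 10%"),
    ("> 90%", "> 90%"),
    ("> 90%", "> 90%"),
]


def transition_shares(pairs):
    """Return {(before, after): percent of the users that started in `before`}."""
    counts = Counter(pairs)
    row_totals = defaultdict(int)
    for (before, _after), n_users in counts.items():
        row_totals[before] += n_users
    return {
        (before, after): round(100 * n_users / row_totals[before], 1)
        for (before, after), n_users in counts.items()
    }


shares = transition_shares(transitions)
print(shares[("> 90%", "> 90%")])  # 75.0
```

Normalising per starting bucket keeps small and large buckets comparable, at the cost of hiding how many users each row actually contains, which is why the count is worth keeping alongside the percentage.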

In [47]:
day_b4_and_0_usage_query = """
with unions as (
select
interest_split,
day_before_perc_usage,
day_0_perc_usage,
count(*) as n_users
from day_0_df
group by 1, 2, 3
union all
select
'All Users' as interest_split,
day_before_perc_usage,
day_0_perc_usage,
count(*) as n_users
from day_0_df
group by 1, 2, 3
)
select *, 
round(sum(n_users)::numeric/ sum(n_users) over (partition by interest_split, day_before_perc_usage), 3)*100 as perc_users
from unions
group by 1, 2, 3, 4
"""
day_b4_and_0_usage_df = con.execute(day_b4_and_0_usage_query).fetchdf()

In [48]:
day_b4_and_30_usage_query = """
with unions as (
select
interest_split,
day_before_perc_usage,
day_30_perc_usage,
count(*) as n_users
from day_30_df
group by 1, 2, 3
union all
select
'All Users' as interest_split,
day_before_perc_usage,
day_30_perc_usage,
count(*) as n_users
from day_30_df
group by 1, 2, 3
)
select *, 
round(sum(n_users)::numeric/ sum(n_users) over (partition by interest_split, day_before_perc_usage), 3)*100 as perc_users
from unions
group by 1, 2, 3, 4
"""
day_b4_and_30_usage_df = con.execute(day_b4_and_30_usage_query).fetchdf()

### All users

In [49]:
# add heatmap with before and after
day0 = heatmap(
    day_b4_and_0_usage_df[day_b4_and_0_usage_df["interest_split"] == "All Users"],
    "perc_users:Q",
    "day_0_perc_usage:N",
    "day_before_perc_usage:N",
    alt.datum.perc_users < 30,
    370,
    400,
    ["n_users", "day_0_perc_usage:N", "day_before_perc_usage:N"],
)
day30 = heatmap(
    day_b4_and_30_usage_df[day_b4_and_30_usage_df["interest_split"] == "All Users"],
    "perc_users:Q",
    "day_30_perc_usage:N",
    "day_before_perc_usage:N",
    alt.datum.perc_users < 30,
    370,
    400,
    ["n_users", "day_30_perc_usage:N", "day_before_perc_usage:N"],
)
graph(
    day0,
    "Difference between day before loan and day of loan",
    day30,
    "Difference between day before loan and 30 days after loan",
)

### Users with an installment interest below their overdraft interest

In [50]:
# add heatmap with before and after
day0 = heatmap(
    day_b4_and_0_usage_df[
        day_b4_and_0_usage_df["interest_split"] == " Interest below od"
    ],
    "perc_users:Q",
    "day_0_perc_usage:N",
    "day_before_perc_usage:N",
    alt.datum.perc_users < 30,
    370,
    400,
    ["n_users", "day_0_perc_usage:N", "day_before_perc_usage:N"],
)
day30 = heatmap(
    day_b4_and_30_usage_df[
        day_b4_and_30_usage_df["interest_split"] == " Interest below od"
    ],
    "perc_users:Q",
    "day_30_perc_usage:N",
    "day_before_perc_usage:N",
    alt.datum.perc_users < 30,
    370,
    400,
    ["n_users", "day_30_perc_usage:N", "day_before_perc_usage:N"],
)
graph(
    day0,
    "Difference between day before loan and day of loan",
    day30,
    "Difference between day before loan and 30 days after loan",
)

### Users with an installment interest above their overdraft interest

In [51]:
# add heatmap with before and after
day0 = heatmap(
    day_b4_and_0_usage_df[
        day_b4_and_0_usage_df["interest_split"] == "Interest above od"
    ],
    "perc_users:Q",
    "day_0_perc_usage:N",
    "day_before_perc_usage:N",
    alt.datum.perc_users < 30,
    370,
    400,
    ["n_users", "day_0_perc_usage:N", "day_before_perc_usage:N"],
)
day30 = heatmap(
    day_b4_and_30_usage_df[
        day_b4_and_30_usage_df["interest_split"] == "Interest above od"
    ],
    "perc_users:Q",
    "day_30_perc_usage:N",
    "day_before_perc_usage:N",
    alt.datum.perc_users < 30,
    370,
    400,
    ["n_users", "day_30_perc_usage:N", "day_before_perc_usage:N"],
)
graph(
    day0,
    "Difference between day before loan and day of loan",
    day30,
    "Difference between day before loan and 30 days after loan",
)

# Recommendations:
 - Keep the possibility to use both products, especially for users whose installment interest is below their overdraft interest and for users who are using either <= 10% or > 90% of their overdraft.
 - We should also re-run this research once the installment product has been running long enough to see how the overdraft balance of these users evolves 60 and 90 days after the loan.
 - We could also re-run the research once these users have been charged overdraft interest, and use the timestamp of the charged interest to compare them with the remaining overdraft users that didn't take a loan.