title: Who are our overdraft users?
author: Helder Silva 
date: 2020-11-16
region: EU  
tags: overdraft, bank products, platform, mau, arrears, write-off, balance, transactions
summary: - Most overdraft users (60.9%) use this product as intended. <br> - Users with overdraft enabled have a higher proportion of MAUs than our user base.  <br> - The misusing overdraft group has a considerably larger proportion of users who disabled their overdraft between enabled periods, but they keep it disabled for shorter periods than the other 2 groups. <br> - In the group properly using overdraft, most users use overdraft on a substantially smaller share of their enabled days than users misusing overdraft. The misusing overdraft group has the highest percentage of premium users of all groups (which makes sense since premium fees are a common cause for users to exceed their overdraft limit and go into arrears). <br> - Users who never needed to use overdraft have a higher average balance (2094.71€) than users who do (575.63€ for the properly using overdraft group, while the misusing overdraft group has a negative average balance of -180.61€).

<div class="alert alert-block alert-success">
    <H1>Who are our overdraft users? </H1>

</div>

In this investigation we try to better understand how our customers use our arranged overdraft product (hereinafter overdraft). To do this, we split our overdraft users into 3 groups and compare them across several parameters. These groups are:
1. __Never used overdraft__ - users that have enabled the overdraft product for more than 30 days and haven't had a negative balance on any day with overdraft enabled
2. __Properly using overdraft__ - users that have enabled the overdraft product for more than 30 days, have ended at least 1 day with a negative balance, but were never in arrears for this overdraft
3. __Misusing overdraft__ - users that have enabled the overdraft product for more than 30 days, have ended at least 1 day with a negative balance, and were in arrears for this overdraft
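
The three definitions above can be expressed as a simple classification rule. The sketch below is illustrative only: the per-user columns `enabled_days`, `negative_balance_days` and `was_in_arrears` are hypothetical stand-ins, and the actual logic lives in the `dev_dbt.temp_overdraft_users_groups` model, which is not shown here.

```python
import numpy as np
import pandas as pd

# Hypothetical per-user summary; column names are illustrative assumptions.
users = pd.DataFrame(
    {
        "user_created": ["u1", "u2", "u3", "u4"],
        "enabled_days": [400, 120, 90, 20],
        "negative_balance_days": [0, 15, 60, 3],
        "was_in_arrears": [False, False, True, False],
    }
)

# Only users with overdraft enabled for more than 30 days are classified.
eligible = users["enabled_days"] > 30
used_od = users["negative_balance_days"] > 0

conditions = [
    eligible & ~used_od,
    eligible & used_od & ~users["was_in_arrears"],
    eligible & used_od & users["was_in_arrears"],
]
labels = ["1. never used od", "2. properly using od", "3. misusing od"]

# np.select takes the first matching condition; users matching none get the default.
users["main_group"] = np.select(conditions, labels, default="not classified")
print(users[["user_created", "main_group"]])
```

Here `u4` falls outside all three groups because it had overdraft enabled for only 20 days.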

This research includes all users who had their overdraft enabled at any point in time since the overdraft product was launched up until October 31st 2020. 

Here are our main findings:
- Most overdraft users (60.9%) use this product as intended.
- Users with __overdraft enabled have a higher proportion of MAUs__ than our user base.
- The misusing overdraft group has a considerably larger proportion of users who disabled their overdraft between enabled periods, but they keep it disabled for shorter periods than the other 2 groups.
- In the group properly using overdraft, most users use overdraft on a substantially smaller share of their enabled days than users misusing overdraft.
- The __misusing overdraft__ group has the __highest percentages of premium users__ of all groups (which makes sense since premium fees are a common cause for users to exceed their overdraft limit and go into arrears).
- Users who never needed to use overdraft have a higher __average balance__ (2094.71€) than users who do (575.63€ for the properly using overdraft group, while the misusing overdraft group has a negative average balance of -180.61€).
- When it comes to __transaction types__:
  - users who never used overdraft have a higher percentage of SEPA transfers,
  - users properly using overdraft have a higher percentage of Presentment transactions,
  - and users misusing overdraft have a lower percentage of Direct Debits and a higher percentage of Authorization Rejects, Fees by N26 and Direct Debit Reversal than the other groups.

In [2]:
import pandas as pd
import numpy as np
from utils.datalib_database import df_from_sql

import utils.altair_functions as af
import altair as alt

In [3]:
od_users_query = """
select *,
to_char(first_enabled, 'YYYY') as first_enabled_year
from dev_dbt.temp_overdraft_users_groups
"""


balance_overview_query = """
select 
main_group,
date_trunc('month', date) as month,
count(distinct g.user_created) as users,
avg(balance_eur) as avg_balance_eur,
count(case when balance_eur <0 then 1 end) as negative_balance_days,
negative_balance_days::numeric/users::numeric as neg_balance_days_per_user
from dev_dbt.temp_overdraft_users_groups g
inner join dbt.mmb_daily_balance_aud ba
on g.user_created = ba.user_created 
and ba.date <= '2020-10-31'
and product_key_group = 'PRIMARY'
and enabled_more_than_30days
group by 1, 2
"""

transactions_overview_query = """
select 
main_group,
date_trunc('month', created) as month,
type,
count(distinct g.user_created) as users,
count(*) as txn_count,
txn_count::numeric/ users::numeric as txns_per_user,
sum(amount_cents::numeric)/100 as txn_total_volume,
avg(amount_cents::numeric)/100 as txn_avg_volume,
txn_avg_volume::numeric/ users::numeric as txn_avg_volume_per_user
from dev_dbt.temp_overdraft_users_groups g
inner join dwh_sneaky_transaction st
on g.user_created = st.user_created 
and created between '2019-11-01' and '2020-10-31'
and enabled_more_than_30days
group by 1, 2, 3
"""

mau_user_base_query = """
select 
coalesce(up.product_id, 'STANDARD') as product_id,
count(*) as user_count
from dbt.zrh_users u
left join dbt.zrh_user_product up
on u.user_created = up.user_created
and '2020-10-31' between subscription_valid_from  and subscription_valid_until
where closed_at is null
and is_mau
and tnc_country_group in ('DEU', 'AUT')
group by 1
"""

mau_end_of_oct_query = """
select 
o.main_group, 
case when at.user_created is not null then 'Is MAU' 
else 'Is not MAU' end as is_mau,
count(*)
from dev_dbt.temp_overdraft_users_groups o
inner join dbt.zrh_users u using (user_created)
left join dbt.zrh_user_activity_txn at
on o.user_created = at.user_created
and activity_type = '1_tx_35'
and '2020-10-31' between activity_start and activity_end
where u.closed_at is null
group by 1, 2
union all 
select 
'All DEU AUT Users',
case when at.user_created is not null then 'Is MAU' 
else 'Is not MAU' end as is_mau,
count(*) as user_count
from dbt.zrh_users u
left join dbt.zrh_user_activity_txn at
on u.user_created = at.user_created
and activity_type = '1_tx_35'
and '2020-10-31' between activity_start and activity_end
where u.closed_at is null
and tnc_country_group in ('DEU', 'AUT')
group by 1, 2
order by 1
"""

In [34]:
df_od_users = df_from_sql("redshiftreader", od_users_query)

# Overdraft user groups 
### What percentage of users default on us?

For this analysis we look at users who have enabled our overdraft product. To make sure users have had some time with the product before we assess how they use it, we only include users who have had overdraft enabled for more than 30 days (97% of all overdraft users).

In [5]:
df_30_day_filter = (
    df_od_users.groupby(["enabled_more_than_30days"]).count().reset_index()
)
df_30_day_filter[["Percentage Users"]] = df_30_day_filter[["user_created"]].apply(
    lambda x: round((x / x.sum()) * 100, 1), axis=0
)
df_30_day_filter[["enabled_more_than_30days", "user_created", "Percentage Users"]]

| enabled_more_than_30days | user_created | Percentage Users |
|---|---|---|
| False | 4502 | 2.9 |
| True | 148865 | 97.1 |
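
The percentage split above can also be computed in one step with `value_counts(normalize=True)`. A minimal sketch on a small made-up sample standing in for `df_od_users`, assuming one row per user (the original cell counts non-null `user_created` values, which is equivalent under that assumption):

```python
import pandas as pd

# Made-up sample standing in for df_od_users (one row per user).
sample = pd.DataFrame(
    {
        "user_created": ["u1", "u2", "u3", "u4"],
        "enabled_more_than_30days": [True, True, True, False],
    }
)

# Share of users per flag value, in percent, rounded to one decimal.
split = (
    sample["enabled_more_than_30days"]
    .value_counts(normalize=True)
    .mul(100)
    .round(1)
)
print(split)
```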


In [6]:
df_od_users = df_od_users[df_od_users["enabled_more_than_30days"] == True]

Below we can verify that most overdraft users (60.9%) use this product as intended.

In [7]:
df_main_group_filter = df_od_users.groupby(["main_group"]).count().reset_index()
df_main_group_filter[["Percentage Users"]] = df_main_group_filter[
    ["user_created"]
].apply(lambda x: round((x / x.sum()) * 100, 1), axis=0)
df_main_group_filter[["main_group", "user_created", "Percentage Users"]]

| main_group | user_created | Percentage Users |
|---|---|---|
| 1. never used od | 42359 | 28.5 |
| 2. properly using od | 90680 | 60.9 |
| 3. misusing od | 15826 | 10.6 |


In [8]:
af.bar_single_label(
    df_main_group_filter, af.teal, "Percentage Users:Q", "main_group:N", 800, 200, "y"
).properties(title="Percentage of users per group")

# User Groups Comparison

Next, we compare these 3 groups across the following parameters:
- [Arrears vs. Write-off split](#section1)
- [MAU split](#section2)
- [Overdraft enabled days overview](#section3)
- [Overdraft disabled days overview](#section4)
- [Percentage of days using overdraft](#section5)
- [Membership per group](#section6)
- [Monthly average balance per group](#section7)
- [Transaction types per group](#section8)


In [9]:
# Split df into 3 cases
df_never_used = df_od_users[df_od_users["main_group"] == "1. never used od"]
df_properly_using = df_od_users[df_od_users["main_group"] == "2. properly using od"]
df_misusing = df_od_users[df_od_users["main_group"] == "3. misusing od"]

<a id='section1'></a>
## Arrears vs. Write-off split
### How many users misusing overdraft actually get written off?

Within the group of users misusing overdraft, we can split users that were in arrears and users with write-offs. The difference between these 2 subgroups is that the first subgroup only includes users that at some point exceeded their allowed overdraft limit but didn't get a write-off, whereas the second subgroup had a write-off after going into arrears. 

As the table below shows, about 44.6% of users misusing overdraft have a write-off. The chart that follows breaks this split down by the year in which these users first enabled their overdraft.

For more details on write-off cohorts and volumes, you can check [this dashboard](https://metabase-product.tech26.de/dashboard/299).

In [10]:
df_arrears_type = df_misusing.groupby(["arrears_type"]).count().reset_index()
df_arrears_type[["Percentage Users"]] = df_arrears_type[["user_created"]].apply(
    lambda x: round((x / x.sum()) * 100, 1), axis=0
)
df_arrears_type[["arrears_type", "user_created", "Percentage Users"]]

| arrears_type | user_created | Percentage Users |
|---|---|---|
| has write-off | 7066 | 44.6 |
| was in arrears | 8760 | 55.4 |


In [11]:
df_arrears_type_year = (
    df_misusing.groupby(["arrears_type", "first_enabled_year"]).count().reset_index()
)
df_arrears_type_year[["Percentage Users"]] = df_arrears_type_year[
    ["user_created"]
].apply(lambda x: round((x / x.sum()) * 100, 1), axis=0)

In [12]:
af.column_multi(
    df_arrears_type_year,
    "arrears_type:N",
    "arrears_type:N",
    "Percentage Users:Q",
    "first_enabled_year:O",
    150,
    300,
    "-y",
).properties(
    title="Percentage Arrears and Write-off users per first enabled overdraft year"
)
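
Note that the cell computing `df_arrears_type_year` divides each count by the total across all years, so the chart's percentages sum to 100% over the whole chart rather than within each year. To read the write-off ratio within each first-enabled year instead, one option is to normalize per year with `groupby(...).transform("sum")`. A minimal sketch with made-up counts:

```python
import pandas as pd

# Made-up counts per arrears type and first-enabled year (illustrative only).
counts = pd.DataFrame(
    {
        "arrears_type": ["has write-off", "was in arrears"] * 2,
        "first_enabled_year": ["2018", "2018", "2019", "2019"],
        "user_created": [30, 70, 50, 50],
    }
)

# Divide each count by the total of its own first-enabled year,
# so the percentages add up to 100 within each year.
year_totals = counts.groupby("first_enabled_year")["user_created"].transform("sum")
counts["Percentage Users (within year)"] = (
    counts["user_created"] / year_totals * 100
).round(1)
print(counts)
```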

<a id='section2'></a>
## MAU split

### Are overdraft users more active?

To compare each of the overdraft user groups with our whole user base in Germany and Austria (the countries where the overdraft product is currently available), we looked at the activity status of all users with an open account in these markets on October 31st 2020 and split them into those who are MAUs and those who aren't.

Generally, groups of users with overdraft enabled have a higher proportion of MAUs than our user base. In particular, the groups properly using overdraft and misusing overdraft have 88.6% and 83.4% MAUs respectively, whereas our user base had only 40.3% MAUs on that date.

In [35]:
df_mau_split = df_from_sql("redshiftreader", mau_end_of_oct_query)

In [14]:
df_mau_split_grouped = df_mau_split.groupby(["main_group", "is_mau"]).agg(
    {"count": "sum"}
)
df_mau_split_grouped = round(
    df_mau_split_grouped.groupby(level=0).apply(lambda x: 100 * x / float(x.sum())), 1
)
df_mau_split_grouped = df_mau_split_grouped.reset_index()
df_mau_split_grouped = df_mau_split_grouped[df_mau_split_grouped["is_mau"] == "Is MAU"]
df_mau_split_grouped["Percentage MAUs"] = df_mau_split_grouped["count"]

In [15]:
alt.Chart(df_mau_split_grouped).mark_bar().encode(
    alt.X("Percentage MAUs:Q", sort="-y"),
    alt.Y("main_group:N"),
    color="main_group:N",
    tooltip="Percentage MAUs:Q",
).properties(width=600, height=300, title="Percentage of MAUs per group")
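
The MAU percentages above come from dividing each group's MAU count by the group's total. The same calculation can be written with `pivot_table`, which keeps the MAU and non-MAU shares side by side. A minimal sketch with made-up counts, in the `main_group` / `is_mau` / `count` shape returned by `mau_end_of_oct_query`:

```python
import pandas as pd

# Made-up counts in the shape returned by mau_end_of_oct_query.
mau_counts = pd.DataFrame(
    {
        "main_group": ["All DEU AUT Users", "All DEU AUT Users",
                       "3. misusing od", "3. misusing od"],
        "is_mau": ["Is MAU", "Is not MAU", "Is MAU", "Is not MAU"],
        "count": [400, 600, 80, 20],
    }
)

# One row per group, one column per MAU status.
table = mau_counts.pivot_table(
    index="main_group", columns="is_mau", values="count", aggfunc="sum"
)

# Divide each row by its own total to get percentages per group.
mau_share = table.div(table.sum(axis=1), axis=0).mul(100).round(1)
print(mau_share)
```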

<a id='section3'></a>
## Overdraft enabled days overview

### Which groups have their overdraft enabled for the longest period?

Here we look at how many days our users have had their overdraft enabled. For each group:
 - __1. never used overdraft__: 25% of users have had overdraft enabled for __394__ days or fewer
 - __2. properly using overdraft__: 25% of users have had overdraft enabled for __377__ days or fewer
 - __3. misusing overdraft__: 25% of users have had overdraft enabled for __231__ days or fewer
 
This means that users misusing overdraft tend to have the product enabled for a shorter period of time (which makes sense, since about half of these users have their overdraft cancelled early when they don't bring their balance back within the allowed limit). The remaining 2 groups show a similar pattern of overdraft enabled days.

In [16]:
# cumulative share of users by number of overdraft enabled days
df_enabled_group1 = df_never_used.groupby(["enabled_days_count"]).count().reset_index()
df_enabled_group1["label"] = "1. never used od"
df_enabled_group1["percent_users"] = round(
    (
        df_enabled_group1["user_created"].cumsum()
        / df_enabled_group1["user_created"].sum()
    )
    * 100,
    1,
)

df_enabled_group2 = (
    df_properly_using.groupby(["enabled_days_count"]).count().reset_index()
)
df_enabled_group2["label"] = "2. properly using od"
df_enabled_group2["percent_users"] = round(
    (
        df_enabled_group2["user_created"].cumsum()
        / df_enabled_group2["user_created"].sum()
    )
    * 100,
    1,
)

df_enabled_group3 = df_misusing.groupby(["enabled_days_count"]).count().reset_index()
df_enabled_group3["label"] = "3. misusing od"
df_enabled_group3["percent_users"] = round(
    (
        df_enabled_group3["user_created"].cumsum()
        / df_enabled_group3["user_created"].sum()
    )
    * 100,
    1,
)


df_enabled_group = pd.concat([df_enabled_group1, df_enabled_group2, df_enabled_group3])

af.line_multi(
    df_enabled_group,
    "label:N",
    "enabled_days_count:O",
    "percent_users:Q",
    800,
    400,
    "x",
).properties(title="Percentage of users per overdraft enabled days")
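
The 25% figures quoted above are the first quartile of each group's distribution, i.e. the point where the cumulative curve crosses 25%. They can also be computed directly with `groupby(...).quantile(0.25)`. A minimal sketch on made-up values (the real numbers come from the `enabled_days_count` column of `df_od_users`):

```python
import pandas as pd

# Made-up enabled-day counts per user (illustrative only).
sample = pd.DataFrame(
    {
        "main_group": ["3. misusing od"] * 4 + ["2. properly using od"] * 4,
        "enabled_days_count": [100, 200, 300, 400, 150, 350, 500, 700],
    }
)

# First quartile of enabled days per group; interpolation="lower" returns
# an observed day count, matching a reading of the cumulative curve.
q25 = sample.groupby("main_group")["enabled_days_count"].quantile(
    0.25, interpolation="lower"
)
print(q25)
```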

<a id='section4'></a>

## Overdraft disabled days overview

### Which group has more users disabling and reenabling their overdraft?

In this section we explore how many users in each group disabled their overdraft between their first and last overdraft enabled day, and for how long. The first trend we can see is that the misusing overdraft group has a considerably larger proportion (18.2%) of users who disabled their overdraft at some point between enabled periods.

In [17]:
df_disabled_status = df_od_users.groupby(["main_group", "disabled_status"]).agg(
    {"user_created": "count"}
)
df_disabled_status = round(
    df_disabled_status.groupby(level=0).apply(lambda x: 100 * x / float(x.sum())), 1
)
df_disabled_status = df_disabled_status.reset_index()
df_disabled_status["Percentage Users"] = df_disabled_status["user_created"]
df_disabled_status = df_disabled_status[
    df_disabled_status["disabled_status"] == "has disabled days in between"
]

In [18]:
alt.Chart(df_disabled_status).mark_bar().encode(
    alt.X("Percentage Users:Q", sort="-y"),
    alt.Y("main_group:N"),
    color="main_group:N",
    tooltip="Percentage Users:Q",
).properties(width=600, height=300, title="Percentage of users with overdraft disabled")

Here we can see for how many days users with disabled days in between had their overdraft disabled. For each group:
 - __1. never used overdraft__: 25% of users had overdraft disabled for __70__ days or fewer
 - __2. properly using overdraft__: 25% of users had overdraft disabled for __31__ days or fewer
 - __3. misusing overdraft__: 25% of users had overdraft disabled for __10__ days or fewer
 
This means that users misusing overdraft tend to also have the product disabled for a shorter period of time, and those who never used overdraft tend to have it disabled for a longer period of time (which makes sense since they are not using this product).

In [19]:
# cumulative share of users by number of overdraft disabled days
df_disabled_group1 = df_never_used[
    df_never_used["disabled_status"] == "has disabled days in between"
]
df_disabled_group1 = df_disabled_group1.groupby(["disabled_days"]).count().reset_index()
df_disabled_group1["label"] = "1. never used od"
df_disabled_group1["percent_users"] = round(
    (
        df_disabled_group1["user_created"].cumsum()
        / df_disabled_group1["user_created"].sum()
    )
    * 100,
    1,
)

df_disabled_group2 = df_properly_using[
    df_properly_using["disabled_status"] == "has disabled days in between"
]
df_disabled_group2 = df_disabled_group2.groupby(["disabled_days"]).count().reset_index()
df_disabled_group2["label"] = "2. properly using od"
df_disabled_group2["percent_users"] = round(
    (
        df_disabled_group2["user_created"].cumsum()
        / df_disabled_group2["user_created"].sum()
    )
    * 100,
    1,
)

df_disabled_group3 = df_misusing[
    df_misusing["disabled_status"] == "has disabled days in between"
]
df_disabled_group3 = df_disabled_group3.groupby(["disabled_days"]).count().reset_index()
df_disabled_group3["label"] = "3. misusing od"
df_disabled_group3["percent_users"] = round(
    (
        df_disabled_group3["user_created"].cumsum()
        / df_disabled_group3["user_created"].sum()
    )
    * 100,
    1,
)


df_disabled_group = pd.concat(
    [df_disabled_group1, df_disabled_group2, df_disabled_group3]
)

af.line_multi(
    df_disabled_group, "label:N", "disabled_days:O", "percent_users:Q", 800, 400, "x"
).properties(title="Percentage of users per overdraft disabled days")
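
The three near-identical blocks in the cell above (and in the enabled-days and days-using cells) all build the same cumulative distribution. A small helper could replace them. A sketch, where `cumulative_share` is a hypothetical name and the demo data is made up:

```python
import pandas as pd


def cumulative_share(df: pd.DataFrame, value_col: str, label: str) -> pd.DataFrame:
    """Cumulative percentage of users at or below each value of `value_col`."""
    # Number of users per distinct value; groupby sorts the values ascending.
    counts = df.groupby(value_col)["user_created"].count().reset_index()
    counts["label"] = label
    counts["percent_users"] = (
        counts["user_created"].cumsum() / counts["user_created"].sum() * 100
    ).round(1)
    return counts


# Made-up disabled-day counts for one group.
demo = pd.DataFrame(
    {"user_created": ["a", "b", "c", "d"], "disabled_days": [5, 5, 10, 40]}
)
out = cumulative_share(demo, "disabled_days", "3. misusing od")
print(out)
```

For the disabled-days chart, each group would first be filtered to `disabled_status == "has disabled days in between"`, and the three results concatenated with `pd.concat`.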

<a id='section5'></a>
## Percentage of days using overdraft

### How often do users actually use overdraft?

Another metric worth considering is the percentage of days users use overdraft out of all days they have overdraft enabled. For this we will only consider the groups 'properly using overdraft' and 'misusing overdraft' (since by definition the group 'never used overdraft' has no days of using overdraft).

In the group properly using overdraft, most users use overdraft on a substantially smaller share of their enabled days than users misusing overdraft.

In [20]:
df_perc_days_using = (
    df_properly_using.groupby(["perc_days_using_od"]).count().reset_index()
)
using_od_chart = af.column_single(
    df_perc_days_using,
    "#e78a39",
    "perc_days_using_od:O",
    "user_created:Q",
    400,
    400,
    "x",
).properties(title="Properly Using Overdraft - Percentage of days using Overdraft")

df_perc_days_using = df_misusing.groupby(["perc_days_using_od"]).count().reset_index()
misusing_od_chart = af.column_single(
    df_perc_days_using,
    "#d45f5b",
    "perc_days_using_od:O",
    "user_created:Q",
    400,
    400,
    "x",
).properties(title="Misusing Overdraft - Percentage of days using Overdraft")

using_od_chart | misusing_od_chart

Below we will explore what percentage of users corresponds to what percentage of days using overdraft. For each group, we have:
 - __2. properly using overdraft__: 25% of the users use overdraft for __8%__ of the enabled days or less
 - __3. misusing overdraft__: 25% of the users use overdraft for __70%__ of the enabled days or less
 
Given such a distinct pattern between these 2 groups, in a future analysis we can try to predict whether users who use overdraft on a high share of their enabled days are at risk of going into arrears.
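
A first, rule-based version of that idea could flag users whose share of overdraft days exceeds a cut-off and compare how often each group gets flagged. A minimal sketch: the threshold of 50 is a hypothetical value, the sample is made up, and `perc_days_using_od` is assumed to be expressed in percent (0-100).

```python
import pandas as pd

# Hypothetical cut-off: share (in %) of enabled days spent in overdraft.
HIGH_USAGE_THRESHOLD = 50

# Made-up sample; perc_days_using_od is assumed to be in percent (0-100).
sample = pd.DataFrame(
    {
        "main_group": ["2. properly using od"] * 4 + ["3. misusing od"] * 4,
        "perc_days_using_od": [5, 8, 20, 60, 40, 70, 85, 95],
    }
)

sample["high_usage"] = sample["perc_days_using_od"] > HIGH_USAGE_THRESHOLD

# Share of users above the threshold in each group (in %).
flag_rate = sample.groupby("main_group")["high_usage"].mean().mul(100)
print(flag_rate)
```

Any real threshold would need to be chosen and validated against actual arrears outcomes before being used for prediction.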

In [21]:
# cumulative share of users by percentage of days using overdraft
df_days_using_group1 = (
    df_never_used.groupby(["perc_days_using_od"]).count().reset_index()
)
df_days_using_group1["label"] = "1. never used od"
df_days_using_group1["percent_users"] = round(
    (
        df_days_using_group1["user_created"].cumsum()
        / df_days_using_group1["user_created"].sum()
    )
    * 100,
    1,
)

df_days_using_group2 = (
    df_properly_using.groupby(["perc_days_using_od"]).count().reset_index()
)
df_days_using_group2["label"] = "2. properly using od"
df_days_using_group2["percent_users"] = round(
    (
        df_days_using_group2["user_created"].cumsum()
        / df_days_using_group2["user_created"].sum()
    )
    * 100,
    1,
)

df_days_using_group3 = df_misusing.groupby(["perc_days_using_od"]).count().reset_index()
df_days_using_group3["label"] = "3. misusing od"
df_days_using_group3["percent_users"] = round(
    (
        df_days_using_group3["user_created"].cumsum()
        / df_days_using_group3["user_created"].sum()
    )
    * 100,
    1,
)


df_days_using_group = pd.concat(
    [df_days_using_group1, df_days_using_group2, df_days_using_group3]
)

af.line_multi(
    df_days_using_group,
    "label:N",
    "perc_days_using_od:O",
    "percent_users:Q",
    800,
    400,
    "x",
).properties(title="Percentage of users per percentage of days using overdraft")

<a id='section6'></a>
## Membership per group

### Which group has the biggest percentage of premium users?

When exploring memberships for each of our overdraft groups, we looked at each user's membership at the time they enabled overdraft. Since these users are expected to be active, we use our MAUs in DEU and AUT as of October 31st 2020 as a proxy for the user base.

In [22]:
df_product_group = (
    df_od_users.groupby(["main_group", "product_id"]).count().reset_index()
)
df_product = df_product_group.groupby(["product_id"]).sum().reset_index()
df_product[["Percentage Users"]] = df_product[["user_created"]].apply(
    lambda x: round((x / x.sum()) * 100, 1), axis=0
)
df_product[["product_id", "user_created", "Percentage Users"]].sort_values(
    by=["Percentage Users"], ascending=False
)

| product_id | user_created | Percentage Users |
|---|---|---|
| STANDARD | 125653 | 84.4 |
| BUSINESS_CARD | 12018 | 8.1 |
| BLACK_CARD_MONTHLY | 6487 | 4.4 |
| METAL_CARD_MONTHLY | 3163 | 2.1 |
| BUSINESS_BLACK | 1028 | 0.7 |
| FLEX_ACCOUNT_MONTHLY | 470 | 0.3 |
| BUSINESS_METAL | 46 | 0.0 |


In [36]:
df_product_userbase = df_from_sql("redshiftreader", mau_user_base_query)

The most interesting differences when comparing our groups to the user base are:
 - __MAU user base__: has higher percentages of standard business users and flex users (and therefore a lower percentage of standard users) than our overdraft groups
 - __1. never used overdraft__: has the highest percentage of standard users and very low percentages of premium users
 - __2. properly using overdraft__: has the highest percentage of standard business users out of the overdraft enabled groups
 - __3. misusing overdraft__: has the highest percentages of premium users of all groups (which makes sense since premium fees are a common cause for users to exceed their overdraft limit and go into arrears)

In [38]:
# .copy() avoids SettingWithCopyWarning when adding columns below
df_product_group1 = df_product_group[
    df_product_group["main_group"] == "1. never used od"
].copy()
df_product_group2 = df_product_group[
    df_product_group["main_group"] == "2. properly using od"
].copy()
df_product_group3 = df_product_group[
    df_product_group["main_group"] == "3. misusing od"
].copy()

df_product_group1[["Percentage Users"]] = df_product_group1[["user_created"]].apply(
    lambda x: round((x / x.sum()) * 100, 1), axis=0
)
df_product_group2[["Percentage Users"]] = df_product_group2[["user_created"]].apply(
    lambda x: round((x / x.sum()) * 100, 1), axis=0
)
df_product_group3[["Percentage Users"]] = df_product_group3[["user_created"]].apply(
    lambda x: round((x / x.sum()) * 100, 1), axis=0
)
df_product_userbase[["Percentage Users"]] = df_product_userbase[["user_count"]].apply(
    lambda x: round((x / x.sum()) * 100, 1), axis=0
)

product_group1 = af.column_single_label(
    df_product_group1, "#5579a4", "product_id:N", "Percentage Users:Q", 130, 300, "x"
)
product_group2 = af.column_single_label(
    df_product_group2, "#e78a39", "product_id:N", "Percentage Users:Q", 130, 300, "x"
)
product_group3 = af.column_single_label(
    df_product_group3, "#d45f5b", "product_id:N", "Percentage Users:Q", 130, 300, "x"
)
product_userbase = af.column_single_label(
    df_product_userbase, "#82b5b2", "product_id:N", "Percentage Users:Q", 130, 300, "x"
)

product_userbase.properties(
    title="MAU user base as of 2020-10-31"
) | product_group1.properties(
    title="Users who never used od per membership"
) | product_group2.properties(
    title="Users properly using od per membership"
) | product_group3.properties(
    title="Users misusing od per membership"
)
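
To quantify the differences listed above, each group's membership shares can be compared with the MAU user base in percentage points. A minimal sketch with made-up shares, assuming tables shaped like `df_product_group1` and `df_product_userbase` after the `Percentage Users` column has been added:

```python
import pandas as pd

# Made-up membership shares (%) for one overdraft group and the MAU user base.
group_share = pd.DataFrame(
    {
        "product_id": ["STANDARD", "BLACK_CARD_MONTHLY", "METAL_CARD_MONTHLY"],
        "Percentage Users": [80.0, 15.0, 5.0],
    }
)
userbase_share = pd.DataFrame(
    {
        "product_id": ["STANDARD", "BLACK_CARD_MONTHLY", "FLEX_ACCOUNT_MONTHLY"],
        "Percentage Users": [85.0, 10.0, 5.0],
    }
)

# Outer merge keeps memberships present in only one table; missing shares become 0.
comparison = group_share.merge(
    userbase_share, on="product_id", how="outer", suffixes=("_group", "_userbase")
).fillna(0)

# Positive values: membership over-represented in the group vs. the user base.
comparison["diff_pp"] = (
    comparison["Percentage Users_group"] - comparison["Percentage Users_userbase"]
)
print(comparison.sort_values("diff_pp", ascending=False))
```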

<a id='section7'></a>
## Monthly average balance per group
### How often do overdraft users actually have a negative balance?

We can find very distinct patterns when comparing our groups' monthly average balances. As expected, users who never needed to use overdraft have a higher average balance (2094.71€) than users who do (575.63€ for the properly using overdraft group, while the misusing overdraft group has a negative average balance of -180.61€).

Consistent with their more intense usage, users misusing overdraft have a negative balance on 16 days per month on average, whereas users properly using overdraft do so on only 8 days per month.

In [39]:
df_balances = df_from_sql("redshiftreader", balance_overview_query)

In [26]:
df_balances_overview = (
    df_balances.groupby(["main_group"]).mean(numeric_only=True).reset_index()
)
df_balances_overview["avg_balance_eur"] = round(
    df_balances_overview["avg_balance_eur"], 2
)
df_balances_overview["neg_balance_days_per_user"] = round(
    df_balances_overview["neg_balance_days_per_user"]
).astype(int)
df_balances_overview[["main_group", "avg_balance_eur", "neg_balance_days_per_user"]]

| main_group | avg_balance_eur | neg_balance_days_per_user |
|---|---|---|
| 1. never used od | 2094.71 | 0 |
| 2. properly using od | 575.63 | 8 |
| 3. misusing od | -180.61 | 16 |
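
Note that the overview averages the monthly rows with equal weight, so a month with few users counts as much as a month with many. If a user-weighted figure is preferred, `np.average` with the monthly `users` count as weights is one option. This is only an approximation: an exact overall average would weight by the number of user-days per month, which `balance_overview_query` does not return. A minimal sketch with made-up monthly values:

```python
import numpy as np
import pandas as pd

# Made-up monthly rows in the shape returned by balance_overview_query.
monthly = pd.DataFrame(
    {
        "main_group": ["2. properly using od"] * 2,
        "users": [100, 300],
        "avg_balance_eur": [800.0, 400.0],
    }
)

# Equal weight per month (what groupby(...).mean() does).
unweighted = monthly.groupby("main_group")["avg_balance_eur"].mean()

# Weight each month by its number of users.
weighted = monthly.groupby("main_group")[["users", "avg_balance_eur"]].apply(
    lambda g: np.average(g["avg_balance_eur"], weights=g["users"])
)
print(unweighted, weighted, sep="\n")
```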


In [27]:
df_balances_grouped = (
    df_balances.groupby(["main_group", "month"]).mean(numeric_only=True).reset_index()
)
df_balances_grouped["avg_balance_eur"] = round(
    df_balances_grouped["avg_balance_eur"], 2
)
df_balances_grouped["neg_balance_days_per_user"] = round(
    df_balances_grouped["neg_balance_days_per_user"], 1
)

When it comes to average balances, the average balance of users who never used overdraft tends to increase over time, whereas for users misusing overdraft we see a decreasing trend.

We can also see a positive impact on balances from March 2020 to April 2020, most likely due to the first Covid-19 lockdown, when users were spending less.

In [28]:
af.column_multi(
    df_balances_grouped,
    "main_group:N",
    "month:O",
    "avg_balance_eur:Q",
    "main_group:N",
    250,
    400,
    "x",
).properties(title="Average monthly balance per overdraft group")
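
The month-to-month shifts described above, such as the March to April 2020 increase, can be read off more precisely as differences from the previous month within each group. A minimal sketch with made-up values, assuming rows shaped like `df_balances_grouped`:

```python
import pandas as pd

# Made-up monthly average balances for one group (illustrative only).
monthly = pd.DataFrame(
    {
        "main_group": ["1. never used od"] * 3,
        "month": pd.to_datetime(["2020-02-01", "2020-03-01", "2020-04-01"]),
        "avg_balance_eur": [1900.0, 2000.0, 2250.0],
    }
)

# Change vs. the previous month within each group (the first month has none).
monthly = monthly.sort_values(["main_group", "month"])
monthly["mom_change_eur"] = monthly.groupby("main_group")["avg_balance_eur"].diff()
print(monthly)
```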

Looking at the average number of days per month with a negative balance, we can see that for users properly using overdraft this average has been increasing over time, reaching its highest value in October 2020 (13.7 days). For the misusing overdraft group, the average number of days with a negative balance peaked in October 2019 at 20.8 days and has been generally decreasing since, reaching 15.6 days in October 2020.

We can also see that the group of users who never used overdraft sometimes has a marginal number of negative balance days. This happens when these users go into a negative balance while their overdraft is disabled (and are therefore not using overdraft at the time).

In [29]:
af.column_multi(
    df_balances_grouped,
    "main_group:N",
    "month:O",
    "neg_balance_days_per_user:Q",
    "main_group:N",
    250,
    400,
    "x",
).properties(title="Average number of days negative balance per month per group")

<a id='section8'></a>
## Transaction types per group

### Are there transaction patterns that can help us differentiate our overdraft groups?

In [40]:
df_txn_types = df_from_sql("redshiftreader", transactions_overview_query)

In [31]:
df_txn_types_overview = (
    df_txn_types.groupby(["main_group"]).mean(numeric_only=True).reset_index()
)
df_txn_types_overview["txn_avg_volume"] = round(
    df_txn_types_overview["txn_avg_volume"], 2
)
df_txn_types_overview["txns_per_user"] = round(
    df_txn_types_overview["txns_per_user"], 1
)

df_txn_grouped = (
    df_txn_types.groupby(["main_group", "type"]).mean(numeric_only=True).reset_index()
)

df_txn_grouped2 = df_txn_types[df_txn_types["type"] != "AA"]
df_txn_grouped2 = df_txn_grouped2.groupby(["main_group", "type"]).agg(
    {"txn_count": "mean"}
)
df_txn_grouped2 = round(
    df_txn_grouped2.groupby(level=0).apply(lambda x: 100 * x / float(x.sum())), 1
)
df_txn_grouped2 = df_txn_grouped2.reset_index()
df_txn_grouped2["perc_txns"] = df_txn_grouped2["txn_count"]

Our last comparison looks at which transaction types are used most by each of the overdraft groups. The first chart shows, for each group, the percentage of transactions of each type (excluding transaction Authorizations, AA), based on average monthly transaction counts. The most interesting patterns are:

 - __1. never used overdraft__: has a higher percentage of SEPA transfers (Direct Transfers (DT) and Credit Transfers (CT)) than the other groups
 - __2. properly using overdraft__: has a higher percentage of Presentment transactions (PT) than the other groups
 - __3. misusing overdraft__: has a lower percentage of Direct Debits (DD) and a higher percentage of Authorization Rejects (AR), Fees by N26 (WEE) and Direct Debit Reversal (DR) than the other groups

In [32]:
af.column_multi(
    df_txn_grouped2,
    "main_group:N",
    "type:N",
    "perc_txns:Q",
    "main_group:N",
    250,
    300,
    "-y",
).properties(title="Percentage of transactions per type per group")

The chart below compares the average number of monthly transactions per user for each transaction type and group, including AAs:

In [33]:
df_txn_grouped["txns_per_user"] = round(df_txn_grouped["txns_per_user"], 1)
af.column_multi(
    df_txn_grouped,
    "main_group:N",
    "main_group:N",
    "txns_per_user:Q",
    "type:N",
    50,
    300,
    "-y",
).properties(title="Average number of transactions per user per month per group")

# Future Research Suggestions

This research can be complemented with more deep-dives to better understand how we can get more users to enable the overdraft product and which of these will default on us. Here are some suggestions on topics to be explored in these deep-dives:

## Adding More Overdraft Users
- Check for users stuck at early stages (Requested, Generated) who haven't enabled their overdraft
- Check for potentially eligible users who didn't get an accepted/rejected overdraft
- Check how long after opening their accounts users apply for overdraft
- Use user research insights to guide further questions on this research

## Arrears Overview 
- How is overdraft interest pushing people to arrears? (overdraft days vs. volume)
- On average, how close are users to their allowed overdraft limit on each month?
- How to distinguish users that go into overdraft due to a limit change vs users who surpass their set limit?
- Why are there flex account users with overdraft?
- Can we predict which users in good standing will go into arrears?