title: ARs due to insufficient funds overview
author: Helder Silva 
date: 2020-09-01
region: EU  
tags: transactions, rejected, balance, engage
summary:  - We had an average of 1.45 Million ARs per month, which translates into an average volume of 66.5M€ per month. - 5.6% of the transactions could have been funded through the user balance in spaces. - We found an inconsistency that would imply that the users had enough funds to perform these transactions (25.1%) - these need further investigation. - 8.2% of the users with an AR who don't have an Arranged Overdraft would potentially be eligible and open to acquire this product. - As for potential revenue we could have for the ARs for which we didn't find a successful PT, our calculations led to a monthly average of 127.4K €. France is the country registering the biggest potential loss of revenue (with a monthly average of 59.7K € for ARs excl. PTs). As for memberships, Business Card is ahead with a monthly average of 76.7K € for ARs excl. PTs. - One possible way to tap into this potential revenue is keeping on converting potential users into getting an overdraft. We need to have a deeper understanding of how these ARs relate to the user's balance at the time of the transaction to be able to identify and fix potential issues.

<div class="alert alert-block alert-success">
    <H1>ARs due to insufficient funds overview</H1>

</div>

## Summary:
This analysis aims to better understand ARs due to insufficient funds (referred to simply as ARs from here on): what the user's balance looked like at the time of the transaction, the user's overdraft status, and the fees potentially lost due to these rejected transactions.

We looked into ARs that happened between June and August 2020. 

These are our main findings:
 - We had an average of 1.45 Million ARs per month, which translates into an average volume of 66.5M€ per month.
 - 5.6% of the transactions could have been funded through the user balance in spaces. 
 - For 25.1% of the ARs from users without an Arranged Overdraft, the data implies that the user had enough funds in the primary account to perform the transaction. This inconsistency needs further investigation.
 - 8.2% of the users with an AR who don't have an Arranged Overdraft would potentially be eligible and open to acquire this product.
 - For the ARs without a matching successful PT, we estimate the potentially lost revenue at a monthly average of 127.4K €. France registers the biggest potential loss (monthly average of 59.7K € for ARs excl. PTs). Among memberships, Business Card leads with a monthly average of 76.7K € for ARs excl. PTs.
 - One way to tap into this potential revenue is to keep converting eligible users to an overdraft. We also need a deeper understanding of how these ARs relate to the user's balance at the time of the transaction in order to identify and fix potential issues.

In [53]:
cd /app/

import pandas as pd
import altair as alt
import numpy as np
from utils.datalib_database import df_from_sql
from pathlib import Path

#N26 colors
#Primary
teal = '#48AC98' 
rhubarb = '#CB7C7A'
petrol = '#266678'
wheat = '#CDA35F'

#Secondary
pink = '#E5C3C7' #Goes with Teal
green= '#CAD7CA' #Goes with Rhubarb
blue = '#C8D7E5' #Goes with Wheat
beige = '#F5D5B9' #Goes with Petrol

In [4]:
query_jun = """
select * from dev_dbt.temp_ar_insuf_funds where txn_created::date between '2020-06-01' and '2020-06-30'
"""
query_jul = """
select * from dev_dbt.temp_ar_insuf_funds where txn_created::date between '2020-07-01' and '2020-07-31'
"""
query_aug = """
select * from dev_dbt.temp_ar_insuf_funds where txn_created::date between '2020-08-01' and '2020-08-31'
"""

In [55]:
df_jun = df_from_sql("redshiftreader", query_jun)

In [56]:
df_jul = df_from_sql("redshiftreader", query_jul)

In [57]:
df_aug = df_from_sql("redshiftreader", query_aug)

In [54]:
categories_df = pd.concat([df_jun, df_jul, df_aug])
categories_df.info()

In [11]:
# Year-month label (e.g. "2020-06") used for the monthly split
categories_df["created_month"] = categories_df["txn_created"].dt.strftime("%Y-%m")
categories_df.fillna(0, inplace=True)

In [14]:
# Monthly Split
month_ar_count_df = categories_df.groupby("created_month").count().reset_index()
month_ar_count_df["txn_created"] = round(month_ar_count_df["txn_created"] / 1000000, 1)
month_ar_count_df = month_ar_count_df[["created_month", "txn_created"]]

month_ar_sum_df = categories_df.groupby("created_month").sum().reset_index()
month_ar_sum_df["txn_amount_eur"] = round(
    month_ar_sum_df["txn_amount_eur"] / 1000000, 1
)
month_ar_sum_df = month_ar_sum_df[["created_month", "txn_amount_eur"]]


# Country Split
country_ar_count_df = categories_df.groupby("tnc_country_group").count().reset_index()
country_ar_count_df["txn_created"] = round(
    country_ar_count_df["txn_created"] / 1000000, 1
)
country_ar_count_df = country_ar_count_df[["tnc_country_group", "txn_created"]]

country_ar_sum_df = categories_df.groupby("tnc_country_group").sum().reset_index()
country_ar_sum_df["txn_amount_eur"] = round(
    country_ar_sum_df["txn_amount_eur"] / 1000000, 1
)
country_ar_sum_df = country_ar_sum_df[["tnc_country_group", "txn_amount_eur"]]

# Membership Split
memb_ar_count_df = categories_df.groupby("product_id").count().reset_index()
memb_ar_count_df["txn_created"] = round(memb_ar_count_df["txn_created"] / 1000000, 2)
memb_ar_count_df = memb_ar_count_df[["product_id", "txn_created"]]

memb_ar_sum_df = categories_df.groupby("product_id").sum().reset_index()
memb_ar_sum_df["txn_amount_eur"] = round(memb_ar_sum_df["txn_amount_eur"] / 1000000, 1)
memb_ar_sum_df = memb_ar_sum_df[["product_id", "txn_amount_eur"]]

In [17]:
# Monthly Chart
month_txn_count = (
    alt.Chart(month_ar_count_df)
    .mark_rect(color=petrol, size=70)
    .encode(
        alt.X("created_month:O", title="Transaction Month"),
        alt.Y("txn_created:Q", title="Count of Rejected transactions (Millions)"),
    )
    .properties(width=400, height=400)
)


month_txn_volume = (
    alt.Chart(month_ar_sum_df)
    .mark_rect(color=blue, size=70)
    .encode(
        alt.X("created_month:O", title="Transaction Month"),
        alt.Y("txn_amount_eur:Q", title="Volume of Rejected transactions (Million €)"),
    )
    .properties(width=400, height=400)
)

month_count_text = month_txn_count.mark_text(
    align="center",
    baseline="middle",
    dy=-10,  # Nudges text up so it doesn't overlap the bar
).encode(text="txn_created:Q")

month_volume_text = month_txn_volume.mark_text(
    align="center",
    baseline="middle",
    dy=-10,  # Nudges text up so it doesn't overlap the bar
).encode(text="txn_amount_eur:Q")

In [18]:
# Country Chart
country_txn_count = (
    alt.Chart(country_ar_count_df)
    .mark_rect(color=wheat, size=40)
    .encode(
        alt.X("tnc_country_group:N", title="TNC Country Group"),
        alt.Y("txn_created:Q", title="Count of Rejected transactions (Millions)"),
    )
    .properties(width=400, height=400)
)


country_txn_volume = (
    alt.Chart(country_ar_sum_df)
    .mark_rect(color=beige, size=40)
    .encode(
        alt.X("tnc_country_group:N", title="TNC Country Group"),
        alt.Y("txn_amount_eur:Q", title="Volume of Rejected transactions (Million €)"),
    )
    .properties(width=400, height=400)
)

country_count_text = country_txn_count.mark_text(
    align="center",
    baseline="middle",
    dy=-10,  # Nudges text up so it doesn't overlap the bar
).encode(text="txn_created:Q")

country_volume_text = country_txn_volume.mark_text(
    align="center",
    baseline="middle",
    dy=-10,  # Nudges text up so it doesn't overlap the bar
).encode(text="txn_amount_eur:Q")

In [19]:
# Membership Chart
memb_txn_count = (
    alt.Chart(memb_ar_count_df)
    .mark_rect(color=teal, size=40)
    .encode(
        alt.X("product_id:N", title="Membership"),
        alt.Y("txn_created:Q", title="Count of Rejected transactions (Millions)"),
    )
    .properties(width=400, height=400)
)


memb_txn_volume = (
    alt.Chart(memb_ar_sum_df)
    .mark_rect(color=green, size=40)
    .encode(
        alt.X("product_id:N", title="Membership"),
        alt.Y("txn_amount_eur:Q", title="Volume of Rejected transactions (Million €)"),
    )
    .properties(width=400, height=400)
)

memb_count_text = memb_txn_count.mark_text(
    align="center",
    baseline="middle",
    dy=-10,  # Nudges text up so it doesn't overlap the bar
).encode(text="txn_created:Q")

memb_volume_text = memb_txn_volume.mark_text(
    align="center",
    baseline="middle",
    dy=-10,  # Nudges text up so it doesn't overlap the bar
).encode(text="txn_amount_eur:Q")

## ARs per Month

We had an average of 1.45 Million ARs per month, which translates into an average volume of 66.5M€ per month.
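The monthly average can be cross-checked against the transaction counts reported in the Primary Balance Split table further down in this post. This is a minimal sketch that assumes every AR in the three-month window falls into exactly one row of that table; the variable names are ours.

```python
# Counts per balance split, copied from the Primary Balance Split table in this post.
balance_split_counts = {
    "NON-NEGATIVE BALANCE & UNARRANGED OD": 3_644_326,
    "NEGATIVE BALANCE & UNARRANGED OD": 585_370,
    "NEGATIVE BALANCE & ARRANGED OD": 109_276,
    "NON-NEGATIVE BALANCE & ARRANGED OD": 19_662,
}
months_in_scope = 3  # June, July and August 2020

# Total ARs in the window, then the average per month expressed in millions.
total_ars = sum(balance_split_counts.values())
monthly_avg_millions = round(total_ars / months_in_scope / 1_000_000, 2)

print(total_ars, monthly_avg_millions)
```

The result lines up with the 1.45 Million ARs per month quoted above.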

In [20]:
(month_txn_count + month_count_text) | (month_txn_volume + month_volume_text)

## ARs per TNC Country Group

In [21]:
(country_txn_count + country_count_text) | (country_txn_volume + country_volume_text)

## ARs per Membership

In [22]:
(memb_txn_count + memb_count_text) | (memb_txn_volume + memb_volume_text)

# Primary Balance Split 
### What do the balances in the primary account look like at the time of the transaction?

3% of ARs belong to users with an Arranged Overdraft enabled.

84.1% of all ARs happened while the user's primary account had a non-negative balance.
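The four-way split behind these numbers can be illustrated on toy data. The sketch below uses `np.select`, a flatter alternative to nested `np.where` calls; the rows in `toy_df` are made up, and only the column names and labels follow the notebook.

```python
import numpy as np
import pandas as pd

# Toy rows (made-up values) using the same column names as categories_df.
toy_df = pd.DataFrame(
    {
        "primary_balance": [12.5, 0.0, -3.2, -40.0],
        "od_enabled": [False, True, False, True],
    }
)

non_negative = toy_df["primary_balance"] >= 0
arranged = toy_df["od_enabled"].astype(bool)

# Conditions are mutually exclusive; each one pairs with the label at the same index.
conditions = [
    non_negative & arranged,
    non_negative & ~arranged,
    ~non_negative & arranged,
    ~non_negative & ~arranged,
]
labels = [
    "NON-NEGATIVE BALANCE & ARRANGED OD",
    "NON-NEGATIVE BALANCE & UNARRANGED OD",
    "NEGATIVE BALANCE & ARRANGED OD",
    "NEGATIVE BALANCE & UNARRANGED OD",
]
toy_df["balance_split"] = np.select(conditions, labels, default="UNEXPECTED BALANCE")

print(toy_df)
```

Note that a balance of exactly 0 counts as non-negative, which is why the text above talks about non-negative rather than strictly positive balances.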


In [23]:
categories_df["balance_split"] = np.where(
    (categories_df["primary_balance"] >= 0) & (categories_df["od_enabled"] == True),
    "NON-NEGATIVE BALANCE & ARRANGED OD",
    np.where(
        (categories_df["primary_balance"] >= 0)
        & (categories_df["od_enabled"] == False),
        "NON-NEGATIVE BALANCE & UNARRANGED OD",
        np.where(
            (categories_df["primary_balance"] < 0)
            & (categories_df["od_enabled"] == True),
            "NEGATIVE BALANCE & ARRANGED OD",
            np.where(
                (categories_df["primary_balance"] < 0)
                & (categories_df["od_enabled"] == False),
                "NEGATIVE BALANCE & UNARRANGED OD",
                "UNEXPECTED BALANCE",
            ),
        ),
    ),
)

In [24]:
balance_split_df = (
    categories_df.groupby("balance_split")
    .count()
    .reset_index()
    .sort_values(by=["user_created"], ascending=False)
)
balance_split_df[["percentage"]] = balance_split_df[["txn_created"]].apply(
    lambda x: round((x / x.sum()) * 100, 1), axis=0
)
balance_split_perc_df = balance_split_df[["balance_split", "txn_created", "percentage"]]
balance_split_perc_df

Unnamed: 0,balance_split,txn_created,percentage
3,NON-NEGATIVE BALANCE & UNARRANGED OD,3644326,83.6
1,NEGATIVE BALANCE & UNARRANGED OD,585370,13.4
0,NEGATIVE BALANCE & ARRANGED OD,109276,2.5
2,NON-NEGATIVE BALANCE & ARRANGED OD,19662,0.5


In [25]:
bars = (
    alt.Chart(balance_split_df)
    .mark_rect(color=wheat, size=50)
    .encode(
        alt.X("percentage:Q", title="% of rejected transactions"),
        alt.Y("balance_split:N", title="Balance Split", sort="-x"),
    )
    .properties(width=700, height=400)
)

text = bars.mark_text(
    align="center",
    baseline="middle",
    dx=25,  # Nudges text to right so it doesn't appear on top of the bar
).encode(text="percentage:Q")

(bars + text).configure_axis(labelLimit=1000)

# Potential Balance 
### Would users with unarranged Overdraft have enough funds in their spaces?
When filtering out users with arranged overdraft at the time of the transaction, we found out that 5.6% of the transactions could have been funded through the user balance in spaces. 

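However, we also came across an inconsistency: for 25.1% of these ARs, the primary balance at the time of the transaction was already at least the transaction amount, which would imply that the user had enough funds to perform the transaction.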
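The order of the checks matters for how an AR is labelled. The sketch below restates the classification as a plain function; `classify_funds` is a hypothetical helper, the example amounts are made up, and we assume that `total_balance` is the primary balance plus the balance held in spaces.

```python
def classify_funds(txn_amount_eur, primary_balance, total_balance):
    """Label an AR from a user without an Arranged Overdraft.

    Assumption: total_balance = primary balance + balance in spaces.
    """
    # 1. The primary account alone would have covered the transaction.
    if txn_amount_eur <= primary_balance:
        return "HAS SUFFICIENT FUNDS (somehow)"
    # 2. Primary account plus spaces would have covered it.
    if txn_amount_eur <= total_balance:
        return "WOULD HAVE ENOUGH FUNDS IN SPACES"
    # 3. Not even the total balance covers the transaction.
    return "DOESNT HAVE SUFFICIENT FUNDS"


# Made-up examples: 50 EUR in the primary account, 80 EUR including spaces.
print(classify_funds(20, 50, 80))
print(classify_funds(60, 50, 80))
print(classify_funds(100, 50, 80))
```

Only the second case is one where moving money out of spaces would have let the transaction go through.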

In [26]:
# .copy() so that adding columns later does not trigger SettingWithCopyWarning
unarranged_od_txns_df = categories_df[categories_df.od_enabled == False].copy()

In [60]:
unarranged_od_txns_df["funds_categories"] = np.where(
    unarranged_od_txns_df["txn_amount_eur"] <= unarranged_od_txns_df["primary_balance"],
    "HAS SUFFICIENT FUNDS (somehow)",
    np.where(
        unarranged_od_txns_df["txn_amount_eur"]
        <= unarranged_od_txns_df["total_balance"],
        "WOULD HAVE ENOUGH FUNDS IN SPACES",
        np.where(
            unarranged_od_txns_df["txn_amount_eur"]
            > unarranged_od_txns_df["total_balance"],
            "DOESNT HAVE SUFFICIENT FUNDS",
            "EXCEPTION - MISSING FILTERS",
        ),
    ),
)

In [28]:
funds_categories_df = (
    unarranged_od_txns_df.groupby("funds_categories")
    .count()
    .reset_index()
    .sort_values(by=["user_created"], ascending=False)
)
funds_categories_df[["percentage"]] = funds_categories_df[["txn_created"]].apply(
    lambda x: round((x / x.sum()) * 100, 1), axis=0
)
funds_categories_perc_df = funds_categories_df[
    ["funds_categories", "txn_created", "percentage"]
]
funds_categories_perc_df

Unnamed: 0,funds_categories,txn_created,percentage
0,DOESNT HAVE SUFFICIENT FUNDS,2929652,69.3
1,HAS SUFFICIENT FUNDS (somehow),1061648,25.1
2,WOULD HAVE ENOUGH FUNDS IN SPACES,238396,5.6


In [29]:
bars = (
    alt.Chart(funds_categories_df)
    .mark_rect(color=petrol, size=50)
    .encode(
        alt.X("percentage:Q", title="% of rejected transactions"),
        alt.Y("funds_categories:N", title="Funds Categories", sort="-x"),
    )
    .properties(width=700, height=400)
)

text = bars.mark_text(
    align="center",
    baseline="middle",
    dx=25,  # Nudges text to right so it doesn't appear on top of the bar
).encode(text="percentage:Q")

(bars + text).configure_axis(labelLimit=1000)

# 'HAS SUFFICIENT FUNDS' label detail

Most of these unexpected cases (62.2%) had insufficient funds for the transaction at some point during the transaction day, which could mean there is a time lag between the AR decision and the balance calculation in Mambu. Still, a sizeable share (37.8%) had sufficient funds throughout the whole transaction day.

One potential explanation could be blocking rules applied to fraudsters. However, when looking into it, we found that only 0.9% of the users who apparently had sufficient funds at the time of the transaction were identified as fraudsters.

Therefore, these discrepancies should be investigated further by looking into other possible reasons for the mismatch, such as potential differences between Mambu and Aurum, or by checking whether customers are reaching out to us about this issue.
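The intraday check behind the 62.2% / 37.8% split can be sketched as a small function. `classify_intraday` is a hypothetical helper and the example amounts are made up; the parameter names follow the `min_balance_in_txn_day` and `max_balance_in_txn_day` columns used in the notebook.

```python
def classify_intraday(txn_amount_eur, min_balance_in_txn_day, max_balance_in_txn_day):
    """Check whether the balance dipped to or below the transaction amount that day."""
    # The amount sits inside the day's balance range, so at some point
    # during the day the balance was not above the transaction amount.
    if min_balance_in_txn_day <= txn_amount_eur <= max_balance_in_txn_day:
        return "HAD INSUFFICIENT VALUE IN DAY"
    return "ALWAYS HAD SUFFICIENT FUNDS IN DAY (somehow)"


# Made-up examples for a 30 EUR transaction.
print(classify_intraday(30, 10, 200))  # balance dropped to 10 EUR during the day
print(classify_intraday(30, 45, 200))  # balance never went below 45 EUR
```

The second case is the one that a timing difference alone cannot explain, since the balance stayed above the transaction amount for the whole day.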

In [30]:
# .copy() so that adding columns later does not trigger SettingWithCopyWarning
suff_funds_df = unarranged_od_txns_df[
    unarranged_od_txns_df.funds_categories == "HAS SUFFICIENT FUNDS (somehow)"
].copy()
len(suff_funds_df)

1061648

In [61]:
suff_funds_df["value_in_day"] = np.where(
    (suff_funds_df["txn_amount_eur"] >= suff_funds_df["min_balance_in_txn_day"])
    & (suff_funds_df["txn_amount_eur"] <= suff_funds_df["max_balance_in_txn_day"]),
    "HAD INSUFFICIENT VALUE IN DAY",
    "ALWAYS HAD SUFFICIENT FUNDS IN DAY (somehow)",
)

In [32]:
suff_funds_group_df = (
    suff_funds_df.groupby("value_in_day")
    .count()
    .reset_index()
    .sort_values(by=["user_created"], ascending=False)
)
suff_funds_group_df[["percentage"]] = suff_funds_group_df[["txn_created"]].apply(
    lambda x: round((x / x.sum()) * 100, 1), axis=0
)
suff_funds_group_perc_df = suff_funds_group_df[
    ["value_in_day", "txn_created", "percentage"]
]
suff_funds_group_perc_df

Unnamed: 0,value_in_day,txn_created,percentage
1,HAD INSUFFICIENT VALUE IN DAY,660287,62.2
0,ALWAYS HAD SUFFICIENT FUNDS IN DAY (somehow),401361,37.8


In [33]:
bars = (
    alt.Chart(suff_funds_group_df)
    .mark_rect(color=blue, size=50)
    .encode(
        alt.X("percentage:Q", title="% of rejected transactions"),
        alt.Y("value_in_day:N", title="Funds in transaction day", sort="-x"),
    )
    .properties(width=600, height=400)
)

text = bars.mark_text(
    align="center",
    baseline="middle",
    dx=25,  # Nudges text to right so it doesn't appear on top of the bar
).encode(text="percentage:Q")

(bars + text).configure_axis(labelLimit=1000)

In [34]:
fraudster_query = """
select 
tnc_country_group,
count(distinct case when n.user_created is not null then taif.user_created end) as fraudster_user_count,
count(distinct case when n.user_created is null then taif.user_created end) as non_fraudster_user_count,
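-- share of fraudsters among all users in the group (fraudsters + non-fraudsters)
round((fraudster_user_count::numeric / (fraudster_user_count + non_fraudster_user_count)::numeric)*100, 1) as perc_fraudsters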
from dev_dbt.temp_ar_insuf_funds taif 
left join  ni_fraudster_category n using(user_created)
where txn_amount_eur <= primary_balance
group by 1
"""

In [62]:
fraudster_df = df_from_sql("redshiftreader", fraudster_query)

In [36]:
fraudster_df

Unnamed: 0,tnc_country_group,fraudster_user_count,non_fraudster_user_count,perc_fraudsters
0,NEuro,52,2837,1.8
1,AUT,22,9698,0.2
2,GrE,189,28237,0.7
3,ESP,135,15672,0.9
4,ITA,238,31152,0.8
5,DEU,367,71962,0.5
6,FRA,1497,127260,1.2


# Arranged Overdraft Users detail

As expected, the majority of ARs for users with Overdraft Enabled would have exceeded their overdraft limit (54.5% of these transactions). However, we still found balance inconsistencies: 41.1% of these transactions seem to have happened when the user could have transacted without exceeding the overdraft limit (35.1% within the limit, plus 6.0% already covered by the primary balance).
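The overdraft check compares the overdraft that would be in use after the transaction with the arranged limit. Below is a minimal sketch of that logic; `classify_od_ar` is a hypothetical helper, the example amounts are made up, and the labels are copied verbatim from the notebook output.

```python
def classify_od_ar(txn_amount_eur, primary_balance, total_balance, arranged_od_limit_eur):
    """Label an AR from a user with an Arranged Overdraft, in the notebook's order."""
    if txn_amount_eur <= primary_balance:
        return "HAS SUFFICIENT FUNDS (somehow)"
    if txn_amount_eur <= total_balance:
        return "WOULD HAVE ENOUGH FUNDS IN SPACES"
    # Overdraft that would be in use once the transaction is booked.
    od_needed = -primary_balance + txn_amount_eur
    if od_needed > arranged_od_limit_eur:
        return "TXN EXCEEDS OD LIMIT"
    return "TXN DOESNT EXCEED OD LIMIT (somewhow)"


# Made-up examples: 100 EUR transaction, 500 EUR overdraft limit, no spaces.
print(classify_od_ar(100, -150, -150, 500))  # 250 EUR of overdraft needed
print(classify_od_ar(100, -450, -450, 500))  # 550 EUR of overdraft needed
```

In the first example the user would end up 250 EUR overdrawn, well within the 500 EUR limit, so a rejection for insufficient funds is unexpected. In the second example the transaction would push the user past the limit.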

In [38]:
# .copy() so that adding columns later does not trigger SettingWithCopyWarning
od_users_df = categories_df[categories_df.od_enabled == True].copy()

In [63]:
od_users_df["funds_categories"] = np.where(
    od_users_df["txn_amount_eur"] <= od_users_df["primary_balance"],
    "HAS SUFFICIENT FUNDS (somehow)",
    np.where(
        od_users_df["txn_amount_eur"] <= od_users_df["total_balance"],
        "WOULD HAVE ENOUGH FUNDS IN SPACES",
        np.where(
            (-1 * od_users_df["primary_balance"]) + od_users_df["txn_amount_eur"]
            > od_users_df["arranged_od_limit_eur"],
            "TXN EXCEEDS OD LIMIT",
            np.where(
                (-1 * od_users_df["primary_balance"]) + od_users_df["txn_amount_eur"]
                <= od_users_df["arranged_od_limit_eur"],
                "TXN DOESNT EXCEED OD LIMIT (somewhow)",
                "EXCEPTION - MISSING FILTERS",
            ),
        ),
    ),
)

In [40]:
od_users_group_df = (
    od_users_df.groupby("funds_categories")
    .count()
    .reset_index()
    .sort_values(by=["user_created"], ascending=False)
)
od_users_group_df[["percentage"]] = od_users_group_df[["txn_created"]].apply(
    lambda x: round((x / x.sum()) * 100, 1), axis=0
)
od_users_group_perc_df = od_users_group_df[
    ["funds_categories", "txn_created", "percentage"]
]
od_users_group_perc_df

Unnamed: 0,funds_categories,txn_created,percentage
2,TXN EXCEEDS OD LIMIT,70301,54.5
1,TXN DOESNT EXCEED OD LIMIT (somewhow),45222,35.1
0,HAS SUFFICIENT FUNDS (somehow),7696,6.0
3,WOULD HAVE ENOUGH FUNDS IN SPACES,5719,4.4


In [41]:
bars = (
    alt.Chart(od_users_group_perc_df)
    .mark_rect(color="#266678", size=50)
    .encode(
        alt.X("percentage:Q", title="% of rejected transactions"),
        alt.Y("funds_categories:N", title="", sort="-x"),
    )
    .properties(width=600, height=400)
)

text = bars.mark_text(
    align="center",
    baseline="middle",
    dx=25,  # Nudges text to right so it doesn't appear on top of the bar
).encode(text="percentage:Q")

(bars + text).configure_axis(labelLimit=1000)

# Which Unarranged Overdraft Users are potentially eligible/open to apply for Overdraft?

In this case we consider any user that has German or Austrian T&Cs and an internal rating class up to 12 as a proxy of eligibility for Allowed Overdraft.

Also, in the last few months we have launched some Overdraft infocard campaigns. Assuming that users who saw these infocards and did not apply for Overdraft are currently not interested in the product, we consider that only the users who didn't see these infocards would potentially be open to applying for it.

Therefore, we can assume that 8.2% of the users who don't have an Arranged Overdraft would be eligible and potentially open to acquire this product.

In [42]:
od_eligibility_query = """
with od_inofcard_viewed as (
select 
u.user_created,
min(collector_tstamp) as first_viewed
from dbt.snowplow s 
inner join mcv_infocard on mcv_infocard.id = se_property
inner join mcv_infocard_template on mcv_infocard.template_id = mcv_infocard_template.id
inner join dbt.zrh_users u using (user_id)
where event_type in ('686', '687', '454131163','881898916','-14084404','-56984998','-894','6257')
and name in ('OVERDRAFT_CRM_ONTOP', 'OVERDRAFT_ELIGIBLE_NOW', 'OVERDRAFT_CRM_STANDALONE')
and collector_tstamp >= '2020-06-01'
group by 1
),
od_eligibility as (
select 
f.user_created, 
case when f.tnc_country_group not in ('DEU', 'AUT') then 'Outside OD Markets'
when lsa.user_created is not null then 'Potentially Eligible for OD' 
else 'In OD Markets but not eligible for OD' end as eligible_for_od,
min(txn_created) as min_txn_created
from dev_dbt.temp_ar_insuf_funds f
left join etl_reporting.ls_score_aud lsa 
on f.user_created = lsa.user_created  
and f.txn_created between lsa.rev_timestamp and lsa.end_timestamp 
and score_status = 'VALID'
and purpose = 'OVERDRAFT'
where f.od_enabled is false
group by 1, 2
)
select 
case when iv.user_created is not null and eligible_for_od = 'Potentially Eligible for OD' then 'Eligible for OD and viewed infocard' 
when iv.user_created is null and eligible_for_od = 'Potentially Eligible for OD' then 'Eligible for OD and did not view infocard' 
else eligible_for_od
end as eligible_for_od,
count(*) as distinct_users
from od_eligibility e
left join od_inofcard_viewed iv
on e.user_created = iv.user_created 
and first_viewed < min_txn_created
group by 1
"""

In [64]:
od_eligibility_df = df_from_sql("redshiftreader", od_eligibility_query)
od_eligibility_df[["percentage"]] = od_eligibility_df[["distinct_users"]].apply(
    lambda x: round((x / x.sum()) * 100, 1), axis=0
)
od_eligibility_df.head()

In [44]:
bars = (
    alt.Chart(od_eligibility_df)
    .mark_rect(color=teal, size=50)
    .encode(
        alt.X("percentage:Q", title="% of users"),
        alt.Y("eligible_for_od:N", title="Arranged OD eligibility", sort="-x"),
    )
    .properties(width=600, height=400)
)

text = bars.mark_text(
    align="center",
    baseline="middle",
    dx=25,  # Nudges text to right so it doesn't appear on top of the bar
).encode(text="percentage:Q")

(bars + text).configure_axis(labelLimit=1000)

# Potentially Lost Revenue

In order to calculate the potentially lost revenue of these ARs, we looked at the average revenue per PT in the same period and multiplied it by the number of ARs. 

Since users can get an AR and act on it immediately (e.g. by transferring funds from spaces into the primary account), we also looked for AAs on the same day with the same characteristics as the ARs, and then checked whether these AAs turned into PTs.
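The estimate described above can be sketched on toy data. The SQL behind `revenue_df` is not shown in this post, so the formulas below are our reading of the description rather than the exact query: the fee for all ARs is the average fee per PT times the AR count, and the "excl. PTs" variant drops the ARs that also led to a PT. The segment names and numbers are made up; the column names follow `revenue_df`.

```python
import pandas as pd

# Made-up per-segment inputs; column names follow revenue_df in this notebook.
toy_revenue_df = pd.DataFrame(
    {
        "product_id": ["SEGMENT_A", "SEGMENT_B"],
        "avg_fee_per_pt_eur": [0.25, 0.5],
        "ar_count": [1000, 200],
        "ar_with_also_pt_count": [100, 20],
    }
)

# Assumed formula: average fee per PT multiplied by the number of ARs.
toy_revenue_df["all_ar_estimated_fee"] = (
    toy_revenue_df["avg_fee_per_pt_eur"] * toy_revenue_df["ar_count"]
)

# Assumed formula: same as above, excluding ARs that were followed by a PT.
toy_revenue_df["ar_excl_pt_estimated_fee"] = toy_revenue_df["avg_fee_per_pt_eur"] * (
    toy_revenue_df["ar_count"] - toy_revenue_df["ar_with_also_pt_count"]
)

print(toy_revenue_df)
```

Under these assumptions, a segment with a negative average fee per PT would also show a negative estimate.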

In [65]:
revenue_df = df_from_sql(
    "redshiftreader",
    "Research/ar_transactions_due_to_insufficient_funds_20200826/ar_potential_revenue.sql",
)

In [46]:
monthly_revenue_mean_df = revenue_df.groupby("month").mean().reset_index()
monthly_revenue_mean_df = monthly_revenue_mean_df[["month", "avg_fee_per_pt_eur"]]
monthly_revenue_mean_df

Unnamed: 0,month,avg_fee_per_pt_eur
0,2020-06-01,0.2548
1,2020-07-01,0.272041
2,2020-08-01,0.264898


In [47]:
monthly_revenue_df = revenue_df.groupby("month").sum().reset_index()
monthly_revenue_df["month"] = monthly_revenue_df["month"].apply(
    lambda x: x.strftime("%Y-%m")
)
monthly_revenue_df["estimated_all_ar_fee_k"] = round(
    monthly_revenue_df["all_ar_estimated_fee"] / 1000, 1
)
monthly_revenue_df["estimated_ar_excl_pt_fee_k"] = round(
    monthly_revenue_df["ar_excl_pt_estimated_fee"] / 1000, 1
)
monthly_revenue_df = monthly_revenue_df[
    [
        "month",
        "ar_count",
        "ar_with_also_pt_count",
        "all_ar_estimated_fee",
        "estimated_all_ar_fee_k",
        "ar_excl_pt_estimated_fee",
        "estimated_ar_excl_pt_fee_k",
    ]
]

## Potentially Lost Revenue per Month


In the charts below, you can find the potential revenue for all ARs (on the left), and ARs excluding PTs (on the right).

We can find a monthly average of 134.5 thousand euros for all ARs and 127.4 thousand euros for ARs excluding PTs. This corresponds to a difference of 7.1 thousand euros between these 2 averages.

In [66]:
al_ar_bars = (
    alt.Chart(monthly_revenue_df)
    .mark_rect(color=pink, size=100)
    .encode(
        alt.X("month:O", title="Transaction Month"),
        alt.Y(
            "estimated_all_ar_fee_k:Q", title="Potentially Lost Revenue (Thousands €)"
        ),
    )
    .properties(width=400, height=400)
)

all_ar_text = al_ar_bars.mark_text(
    align="center",
    baseline="middle",
    dy=-10,  # Nudges text up so it doesn't overlap the bar
).encode(text="estimated_all_ar_fee_k:Q")


ar_excl_pt_bars = (
    alt.Chart(monthly_revenue_df)
    .mark_rect(color=rhubarb, size=100)
    .encode(
        alt.X("month:O", title="Transaction Month"),
        alt.Y(
            "estimated_ar_excl_pt_fee_k:Q",
            title="Potentially Lost Revenue (Thousands €)",
        ),
    )
    .properties(width=400, height=400)
)

ar_excl_pt_text = ar_excl_pt_bars.mark_text(
    align="center",
    baseline="middle",
    dy=-10,  # Nudges text up so it doesn't overlap the bar
).encode(text="estimated_ar_excl_pt_fee_k:Q")

(al_ar_bars.properties(title="All ARs") + all_ar_text) | (
    ar_excl_pt_bars.properties(title="ARs without identified PTs") + ar_excl_pt_text
)

In [67]:
country_revenue_df = revenue_df.groupby("tnc_country_group").sum().reset_index()
country_revenue_df["estimated_all_ar_fee_k"] = round(
    country_revenue_df["all_ar_estimated_fee"] / 1000, 1
)
country_revenue_df["estimated_ar_excl_pt_fee_k"] = round(
    country_revenue_df["ar_excl_pt_estimated_fee"] / 1000, 1
)
country_revenue_df = country_revenue_df[
    [
        "tnc_country_group",
        "ar_count",
        "ar_with_also_pt_count",
        "all_ar_estimated_fee",
        "estimated_all_ar_fee_k",
        "ar_excl_pt_estimated_fee",
        "estimated_ar_excl_pt_fee_k",
    ]
]

## Potentially Lost Revenue per TNC Country

France is the country registering the biggest potential loss of revenue (with a monthly average of 59.7K € for ARs excl. PTs), which is consistent with the fact that France also had the highest total count and volume of ARs.

In [68]:
al_ar_bars = (
    alt.Chart(country_revenue_df)
    .mark_rect(color=beige, size=50)
    .encode(
        alt.X("tnc_country_group:O", title="Transaction TNC Country"),
        alt.Y(
            "estimated_all_ar_fee_k:Q", title="Potentially Lost Revenue (Thousands €)"
        ),
    )
    .properties(width=400, height=400)
)

all_ar_text = al_ar_bars.mark_text(
    align="center",
    baseline="middle",
    dy=-10,  # Nudges text up so it doesn't overlap the bar
).encode(text="estimated_all_ar_fee_k:Q")


ar_excl_pt_bars = (
    alt.Chart(country_revenue_df)
    .mark_rect(color=wheat, size=50)
    .encode(
        alt.X("tnc_country_group:O", title="Transaction TNC Country"),
        alt.Y(
            "estimated_ar_excl_pt_fee_k:Q",
            title="Potentially Lost Revenue (Thousands €)",
        ),
    )
    .properties(width=400, height=400)
)

ar_excl_pt_text = ar_excl_pt_bars.mark_text(
    align="center",
    baseline="middle",
    dy=-10,  # Nudges text up so it doesn't overlap the bar
).encode(text="estimated_ar_excl_pt_fee_k:Q")

(al_ar_bars.properties(title="All ARs") + all_ar_text) | (
    ar_excl_pt_bars.properties(title="ARs without identified PTs") + ar_excl_pt_text
)

In [51]:
memb_revenue_df = revenue_df.groupby("product_id").sum().reset_index()
memb_revenue_df["estimated_all_ar_fee_k"] = round(
    memb_revenue_df["all_ar_estimated_fee"] / 1000, 1
)
memb_revenue_df["estimated_ar_excl_pt_fee_k"] = round(
    memb_revenue_df["ar_excl_pt_estimated_fee"] / 1000, 1
)
memb_revenue_df = memb_revenue_df[
    [
        "product_id",
        "ar_count",
        "ar_with_also_pt_count",
        "all_ar_estimated_fee",
        "estimated_all_ar_fee_k",
        "ar_excl_pt_estimated_fee",
        "estimated_ar_excl_pt_fee_k",
    ]
]

## Potentially Lost Revenue per Membership

For memberships, there doesn't seem to be a clear relationship between the count/volume of ARs and the potentially lost revenue. For example, even though Standard had by far the highest count/volume of ARs, the Business Card membership seems to have the highest potentially lost revenue (monthly average of 76.7K € for ARs excl. PTs).

We can also see that Flex Account Monthly shows a negative revenue. It seems that users on this product had higher volumes of ATM withdrawals than of other card transactions, which generated a negative average fee per PT in this case.

In [69]:
al_ar_bars = (
    alt.Chart(memb_revenue_df)
    .mark_rect(color=green, size=40)
    .encode(
        alt.X("product_id:O", title="Transaction Membership"),
        alt.Y(
            "estimated_all_ar_fee_k:Q", title="Potentially Lost Revenue (Thousands €)"
        ),
    )
    .properties(width=400, height=400)
)

all_ar_text = al_ar_bars.mark_text(
    align="center",
    baseline="middle",
    dy=-10,  # Nudges text up so it doesn't overlap the bar
).encode(text="estimated_all_ar_fee_k:Q")


ar_excl_pt_bars = (
    alt.Chart(memb_revenue_df)
    .mark_rect(color=teal, size=40)
    .encode(
        alt.X("product_id:O", title="Transaction Membership"),
        alt.Y(
            "estimated_ar_excl_pt_fee_k:Q",
            title="Potentially Lost Revenue (Thousands €)",
        ),
    )
    .properties(width=400, height=400)
)

ar_excl_pt_text = ar_excl_pt_bars.mark_text(
    align="center",
    baseline="middle",
    dy=-10,  # Nudges text up so it doesn't overlap the bar
).encode(text="estimated_ar_excl_pt_fee_k:Q")

(al_ar_bars.properties(title="All ARs") + all_ar_text) | (
    ar_excl_pt_bars.properties(title="ARs without identified PTs") + ar_excl_pt_text
)

# Future Recommendations:
Assuming the goal is to tap into this potentially lost revenue, here are some recommendations based on the results of this analysis:
- Keep converting eligible users to an overdraft. These ARs can be a powerful contextual trigger for offering an overdraft, since this product would make it less likely for users to have their transactions rejected due to insufficient funds.
- Conduct more in-depth research into how these ARs relate to the user's balance at the time of the transaction. As mentioned above, we should look into potential differences between Mambu and Aurum balances at the time of these transactions, and investigate whether customers are reaching out to us about this issue.