title: What happens after transactions fail? 
author: Helder Silva 
date: 2020-09-22 
region: EU
link: https://docs.google.com/presentation/d/1CtA6llm9O_3wrJ8Hi75ZweRIh-MFM0CSgGg3Gn54gKE
tags: transactions, rejected, cs contacts, engage 
summary: - We have a monthly average of 2.67 million rejected transactions. Of these, 96% are card transactions, and insufficient funds account for 56% of the failed card transactions in this period. - About 2% of the users with a rejected transaction contact CS on the same day as the transaction. - The total volume of CS contacts by these users leads to an average cost of 116.45K€ per month, of which 38.37K€ (33%) is due to insufficient funds. - 'Suspected fraud' and 'PIN tries' lead to the largest proportion of contacts per failed card transaction (7.2% and 5.8% respectively). - The 2 card rejection reasons with the most successful retries on the same day are 'Activity count limit exceeded' (a PT was found for 60% of these failures) and 'Incorrect PIN' (55%). - Also, the most common rejection reasons have a low percentage of successful retries (e.g. for the most common rejection reason, 'Insufficient funds', a PT was found for only 5.5% of failures).

<div class="alert alert-block alert-success">
    <H1>What happens after transactions fail? </H1>

</div>

This analysis is a follow-up to the analysis [Why do transactions fail?](https://docs.google.com/presentation/d/1BlSEc2HGTCRhxOylcXDteZl3XKuU-4vOC_VFyGZw_pQ) by Wendy Vu, and aims to look into what happens after a failed transaction, more specifically whether our users contact our Customer Support or have a successful attempt at completing the transaction. 

You can find a slide deck summarizing this research [here](https://docs.google.com/presentation/d/1CtA6llm9O_3wrJ8Hi75ZweRIh-MFM0CSgGg3Gn54gKE/).

We'll be looking at rejected transactions that occurred between June and August 2020, and the results are split into the following sections:
 1. [Failed Card Transactions vs Failed Direct Debits](#section1)
 2. [How much are we spending in CS contacts after failed transactions?](#section2)
 3. [How many of the failed card transactions are successful after the first failure?](#section3)
 

### Here are the main insights: 
- We have a **monthly average of 2.67 million rejected transactions**. Of these, 96% are card transactions, and insufficient funds account for 56% of the failed card transactions in this period.
- About **2% of the users with a rejected transaction contact CS** on the same day as the transaction.
- The total volume of CS contacts by these users leads to an **average cost of 116.45K€ per month**, of which 38.37K€ (33%) is due to insufficient funds.
- 'Suspected fraud' and 'PIN tries' lead to the largest proportion of contacts per failed card transaction (7.2% and 5.8% respectively).
- The 2 card rejection reasons with the most successful retries on the same day are 'Activity count limit exceeded' (a PT was found for 60% of these failures) and 'Incorrect PIN' (55%).
- Also, the most common rejection reasons have a low percentage of successful retries (e.g. for the most common rejection reason, 'Insufficient funds', a PT was found for only 5.5% of failures).


In [5]:
import pandas as pd
import numpy as np
import altair as alt
from utils.datalib_database import df_from_sql

import utils.altair_functions as af

<a id='section1'></a>

# Failed Card Transactions vs Failed Direct Debits
When looking into the whole volume of rejected transactions, we can see a monthly average of 2.67 million rejected transactions. Out of these, 96% correspond to Card Transactions, and 4% correspond to Direct Debits.

In [6]:
df_other_reasons = df_from_sql("redshiftreader", "other_reasons_query.sql")



In [10]:
split_ar_contacts_per_month = df_from_sql(
    "redshiftreader", "split_ar_contacts_per_month_query.sql"
)
split_ar_contacts_per_month.head()



| | txn_month | reject_reason | transaction_group | label | value |
|---|---|---|---|---|---|
| 0 | 2020-07 | Count limit exceeded | card | distinct_txn_users | 81834.0 |
| 1 | 2020-07 | other | card | distinct_txn_users | 38213.0 |
| 2 | 2020-07 | other | card | distinct_contact_users | 2269.0 |
| 3 | 2020-07 | Count limit exceeded | card | distinct_contact_users | 736.0 |
| 4 | 2020-07 | Count limit exceeded | card | sum_txn | 133224.0 |


In [11]:
df_failed_txns_count = split_ar_contacts_per_month[
    split_ar_contacts_per_month["label"] == "sum_txn"
]
df_failed_txns_count = df_failed_txns_count[
    (df_failed_txns_count["reject_reason"] == "ALL AR")
    | (df_failed_txns_count["reject_reason"] == "ALL FAILED DD")
]
df_failed_txns_count = (
    df_failed_txns_count.groupby(["transaction_group", "txn_month"]).sum().reset_index()
)
df_failed_txns_count["count (K)"] = round(df_failed_txns_count["value"] / 1000, 2)
af.column_multi(
    df_failed_txns_count,
    "transaction_group:N",
    "txn_month:O",
    "count (K):Q",
    "transaction_group:N",
    250,
    400,
    "x",
).properties(title="All Failed Transactions")

In [12]:
df_failed_txns_count = split_ar_contacts_per_month[
    split_ar_contacts_per_month["label"] == "sum_txn"
]
df_failed_txns_count = df_failed_txns_count[
    (df_failed_txns_count["reject_reason"] == "ALL AR")
    | (df_failed_txns_count["reject_reason"] == "ALL FAILED DD")
]
df_failed_txns_count = df_failed_txns_count.groupby(["txn_month"]).sum().reset_index()
# Values are divided by 1,000,000, so this column holds millions, not thousands
df_failed_txns_count["count (M)"] = round(df_failed_txns_count["value"] / 1000000, 2)
df_failed_txns_count["total"] = "total"
df_failed_txns_count = df_failed_txns_count.groupby("total").mean().reset_index()
df_failed_txns_count  # provides monthly avg of all txns

df_failed_txns_count = split_ar_contacts_per_month[
    split_ar_contacts_per_month["label"] == "sum_txn"
]
df_failed_txns_count = df_failed_txns_count[
    (df_failed_txns_count["reject_reason"] == "ALL AR")
    | (df_failed_txns_count["reject_reason"] == "ALL FAILED DD")
]
df_failed_txns_count = (
    df_failed_txns_count.groupby(["transaction_group"]).sum().reset_index()
)
# Monthly average in millions: total over the 3 months (June to August) divided by 3
df_failed_txns_count["count (M)"] = (
    round(df_failed_txns_count["value"] / 1000000, 2) / 3
)
df_failed_txns_count["Percentage ARs"] = df_failed_txns_count[["count (M)"]].apply(
    lambda x: round((x / x.sum()) * 100, 1), axis=0
)

# df_failed_txns_count # provides percentage of all ARs based on monthly average
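
The monthly average and the card vs direct debit split computed above can be sketched on a small example. This is a minimal illustration, assuming a long-format table shaped like `split_ar_contacts_per_month`; the `toy` DataFrame and all of its values are made up and are not the real figures.

```python
import pandas as pd

# Hypothetical rows shaped like split_ar_contacts_per_month (values are invented)
toy = pd.DataFrame(
    [
        ("2020-06", "ALL AR", "card", "sum_txn", 960.0),
        ("2020-07", "ALL AR", "card", "sum_txn", 1000.0),
        ("2020-08", "ALL AR", "card", "sum_txn", 920.0),
        ("2020-06", "ALL FAILED DD", "direct debit", "sum_txn", 40.0),
        ("2020-07", "ALL FAILED DD", "direct debit", "sum_txn", 45.0),
        ("2020-08", "ALL FAILED DD", "direct debit", "sum_txn", 35.0),
    ],
    columns=["txn_month", "reject_reason", "transaction_group", "label", "value"],
)

# Keep only the transaction counts of the two "all reasons" aggregate rows
totals = toy[toy["label"] == "sum_txn"]
totals = totals[totals["reject_reason"].isin(["ALL AR", "ALL FAILED DD"])]

# Monthly average per transaction group, then each group's share of the total
n_months = totals["txn_month"].nunique()
monthly_avg = totals.groupby("transaction_group")["value"].sum() / n_months
share = (monthly_avg / monthly_avg.sum() * 100).round(1)
print(share)
```

Deriving the number of months from the data, rather than hard-coding the division by 3, keeps the average correct if the date range of the query changes.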

## Reasons for failed card transactions in detail
Given that the vast majority of failed transactions are card transactions, we look into these in more detail. Below are the reasons we found for these rejected transactions. The main insight is that insufficient funds account for more than half of the failed card transactions in this period (56%). We therefore explored this reason further in the analysis [ARs due to insufficient funds overview](http://research.tech26.de/reports/ar_transactions_due_to_insufficient_funds_20200901.html).

In [13]:
df_other_reasons[["Percentage ARs"]] = df_other_reasons[["ar_count"]].apply(
    lambda x: round((x / x.sum()) * 100, 1), axis=0
)
df_other_reasons

| | reason_detail | ar_count | Percentage ARs |
|---|---|---|---|
| 0 | Insufficient funds | 4360616 | 56.2 |
| 1 | Do not honour | 507264 | 6.5 |
| 2 | Lost card pick up | 458777 | 5.9 |
| 3 | Transaction not allowed for cardholder | 432311 | 5.6 |
| 4 | Activity count limit exceeded | 391966 | 5.1 |
| 5 | Settings | 383271 | 4.9 |
| 6 | Incorrect PIN | 241695 | 3.1 |
| 7 | Card limits | 231936 | 3.0 |
| 8 | Expired card | 141500 | 1.8 |
| 9 | Account seized | 135573 | 1.7 |


In [14]:
df_other_reasons["Reason Detail"] = df_other_reasons["reason_detail"]
df_other_reasons = df_other_reasons[df_other_reasons["Percentage ARs"] > 0]
af.column_single_label(
    df_other_reasons, af.petrol, "Reason Detail:O", "Percentage ARs:Q", 800, 400, "-y"
).configure_axis(labelLimit=1000)

<a id='section2'></a>

# How much are we spending in CS contacts after failed transactions?

Since we cannot directly link a rejected transaction to a CS contact, we need to make some assumptions to estimate the cost of the CS contacts we get due to failed transactions. Here are those assumptions:


### 1. CS Tags Match
First, we looked into which CS tags might correspond to these failed transactions. Since there is no specific tag for this, we consider any tag under Manage Payments in the [Tagging System list](https://number26-jira.atlassian.net/wiki/spaces/CSKB/pages/1169227779/Tagging+System+2.0) for this analysis (the table below lists the tags, with a flag marking the ones being considered).

In [15]:
conatcts_per_tag_query = """
with ar as (
select user_created,
created::date as ar_date, 
count(*) as ar_count
from dwh_sneaky_transaction st
where type = 'AR'
and created between '2020-06-01' and '2020-08-31'
group by 1, 2
),
contacts as (
select user_created, 
contact_date::date as contact_date,
cs_tag,
count(*) as contact_count
from dbt.sf_all_contacts 
inner join dbt.zrh_users u
using(user_id)
where contact_date between '2020-06-01' and '2020-08-31'
and c_level_report = true
group by 1, 2, 3
), joins as (
select * from ar 
left join contacts c 
on ar.user_created = c.user_created 
and ar_date = contact_date 
where contact_count is not null
order by 6 desc 
)
select 
cs_tag, 
case when cs_tag in ('limits', 'pos/e_commerce', 'nfc', 'apple/google_pay', 'atm/fair_use', 'cash26', 'chargeback', 'transferwise', 'dt/standing_order', 'moneybeam', 'ct/missing_ct', 'direct_debit', 'top_up', 'fx_account_funding', 'payment_investigation')
then 'Y' else '' end as is_relevant_tag,
sum(contact_count) as contact_count
from joins 
group by 1,2
order by 3 desc
"""

In [16]:
df_contacts_per_tag = df_from_sql("redshiftreader", conatcts_per_tag_query)



In [17]:
df_contacts_per_tag[["Percentage Contacts"]] = df_contacts_per_tag[
    ["contact_count"]
].apply(lambda x: round((x / x.sum()) * 100, 1), axis=0)
df_contacts_per_tag

| | cs_tag | is_relevant_tag | contact_count | Percentage Contacts |
|---|---|---|---|---|
| 0 | pos/e_commerce | Y | 23192 | 22.5 |
| 1 | contact_data | | 12457 | 12.1 |
| 2 | card_activation/pin | | 10747 | 10.4 |
| 3 | login | | 8456 | 8.2 |
| 4 | chargeback | Y | 5855 | 5.7 |
| 5 | atm/fair_use | Y | 5734 | 5.6 |
| 6 | ct/missing_ct | Y | 3571 | 3.5 |
| 7 | garnishment | | 3187 | 3.1 |
| 8 | other | | 2930 | 2.8 |
| 9 | aml | | 2760 | 2.7 |
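
As a rough check on how much of the same-day contact volume the tag filter keeps, the percentages of the ten rows shown above can be added up per flag. This is only a sketch: it uses the displayed rows, so tags outside the top ten are not covered and the result is a lower bound for each group, not the final split.

```python
# Rows copied from the table above: (cs_tag, is_relevant_tag, Percentage Contacts)
top_tags = [
    ("pos/e_commerce", "Y", 22.5),
    ("contact_data", "", 12.1),
    ("card_activation/pin", "", 10.4),
    ("login", "", 8.2),
    ("chargeback", "Y", 5.7),
    ("atm/fair_use", "Y", 5.6),
    ("ct/missing_ct", "Y", 3.5),
    ("garnishment", "", 3.1),
    ("other", "", 2.8),
    ("aml", "", 2.7),
]

# Share of all same-day contacts carried by flagged vs unflagged tags (top ten only)
relevant_share = round(sum(pct for _, flag, pct in top_tags if flag == "Y"), 1)
other_share = round(sum(pct for _, flag, pct in top_tags if flag != "Y"), 1)
print(relevant_share, other_share)
```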


In [18]:
df_failed_txns_contacts = split_ar_contacts_per_month[
    split_ar_contacts_per_month["label"] == "sum_contact"
]
df_failed_txns_contacts = df_failed_txns_contacts[
    (df_failed_txns_contacts["reject_reason"] == "ALL AR")
    | (df_failed_txns_contacts["reject_reason"] == "ALL FAILED DD")
]
df_failed_txns_contacts = (
    df_failed_txns_contacts.groupby(["transaction_group", "txn_month"])
    .sum()
    .reset_index()
)
df_failed_txns_contacts["count (K)"] = round(df_failed_txns_contacts["value"] / 1000, 2)

### 2. We will only consider CS contacts that happen on the same day as the transaction.

For this, we look at users who have a failed transaction and contact us on the day of that transaction, under the CS tags filtered above. For card transactions, we see a monthly average of 556.1K users with an AR, of which 10.9K (2%) contact CS on the same day as the transaction.
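
The same-day matching rule can be illustrated with a small pandas join. This is a minimal sketch under assumptions: the `failed` and `contacts` frames, the user ids and the dates are hypothetical, and `relevant_tags` is only a subset of the tag list used in the query above.

```python
import pandas as pd

# Hypothetical failed transactions: one row per user and day with a failure
failed = pd.DataFrame(
    {
        "user_created": ["u1", "u1", "u2", "u3"],
        "ar_date": pd.to_datetime(
            ["2020-07-01", "2020-07-02", "2020-07-01", "2020-07-03"]
        ),
    }
)
# Hypothetical CS contacts
contacts = pd.DataFrame(
    {
        "user_created": ["u1", "u2", "u4"],
        "contact_date": pd.to_datetime(["2020-07-01", "2020-07-05", "2020-07-03"]),
        "cs_tag": ["pos/e_commerce", "limits", "chargeback"],
    }
)
relevant_tags = {"pos/e_commerce", "limits", "chargeback"}  # subset, for illustration

# A contact counts only if the same user has a failure on the same calendar day
same_day = failed.merge(
    contacts[contacts["cs_tag"].isin(relevant_tags)],
    left_on=["user_created", "ar_date"],
    right_on=["user_created", "contact_date"],
    how="inner",
)

users_with_failure = failed["user_created"].nunique()
users_with_contact = same_day["user_created"].nunique()
contact_rate = round(users_with_contact / users_with_failure * 100, 1)
print(users_with_failure, users_with_contact, contact_rate)
```

In this toy example only `u1` has a contact on a day with a failure: `u2` contacts CS four days later and `u4` has no failure at all, so neither is counted.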

In [19]:
card_contacts_per_month = split_ar_contacts_per_month[
    split_ar_contacts_per_month["transaction_group"] == "card"
]

# .copy() makes these independent DataFrames, so the column assignments
# on them do not trigger pandas' SettingWithCopyWarning
ar_contacts_per_month = card_contacts_per_month[
    card_contacts_per_month["reject_reason"] == "ALL AR"
].copy()

ar_contacts_per_month["value"] = round(ar_contacts_per_month["value"], 2)
ar_contacts_per_month["count (K)"] = round(ar_contacts_per_month["value"] / 1000, 2)


dd_contacts_per_month = split_ar_contacts_per_month[
    split_ar_contacts_per_month["transaction_group"] == "direct debit"
]
all_dd_contacts_per_month = dd_contacts_per_month[
    dd_contacts_per_month["reject_reason"] == "ALL FAILED DD"
].copy()



In [20]:
users_ars = af.column_single_label(
    ar_contacts_per_month[ar_contacts_per_month["label"] == "distinct_txn_users"],
    af.teal,
    "txn_month:O",
    "count (K):Q",
    350,
    400,
    "x",
)
users_contact = af.column_single_label(
    ar_contacts_per_month[ar_contacts_per_month["label"] == "distinct_contact_users"],
    af.rhubarb,
    "txn_month:O",
    "count (K):Q",
    350,
    400,
    "x",
)

users_ars.properties(title="All Users with ARs") | users_contact.properties(
    title="All Users with CS contacts in same day of AR"
)

In [21]:
all_dd_contacts_per_month["value"] = round(all_dd_contacts_per_month["value"], 2)
all_dd_contacts_per_month["count (K)"] = round(
    all_dd_contacts_per_month["value"] / 1000, 2
)



As for failed direct debits, we have 62.5K users with this transaction type on average per month. Out of these, 1.1K (2%) contacted CS on the same day of the transaction.

When looking into all users with failed transactions, we have 618.6K users, and 2% of these contacted CS on the same day of the transaction (12K per month on average).
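
The figures quoted in this step can be reproduced with simple arithmetic. The sketch below uses only the rounded numbers from the text (in thousands of users); adding card and direct debit users assumes that the two groups do not overlap, which is what the 618.6K total implies.

```python
# Monthly averages quoted in the text, in thousands of users
card_users_k, card_contact_users_k = 556.1, 10.9
dd_users_k, dd_contact_users_k = 62.5, 1.1


def pct(part, whole):
    """Share of `part` in `whole`, as a percentage rounded to one decimal."""
    return round(part / whole * 100, 1)


total_users_k = round(card_users_k + dd_users_k, 1)
total_contact_users_k = round(card_contact_users_k + dd_contact_users_k, 1)

card_rate = pct(card_contact_users_k, card_users_k)
dd_rate = pct(dd_contact_users_k, dd_users_k)
total_rate = pct(total_contact_users_k, total_users_k)
print(total_users_k, total_contact_users_k, card_rate, dd_rate, total_rate)
```

All three rates round to the 2% stated above.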

In [22]:
users_dd = af.column_single_label(
    all_dd_contacts_per_month[
        all_dd_contacts_per_month["label"] == "distinct_txn_users"
    ],
    af.green,
    "txn_month:O",
    "count (K):Q",
    350,
    400,
    "x",
)
users_contact_dd = af.column_single_label(
    all_dd_contacts_per_month[
        all_dd_contacts_per_month["label"] == "distinct_contact_users"
    ],
    af.pink,
    "txn_month:O",
    "count (K):Q",
    350,
    400,
    "x",
)

users_dd.properties(title="All Users with failed DDs") | users_contact_dd.properties(
    title="All Users with CS contacts in same day of DD"
)

### 3. We sum the number of contacts found for the users above

Since the same user can have multiple failed transactions (see [Why do transactions fail?](https://docs.google.com/presentation/d/1BlSEc2HGTCRhxOylcXDteZl3XKuU-4vOC_VFyGZw_pQ)) and therefore multiple contacts, we will be focusing on the total number of contacts moving forward. Below is an overview of the total number of contacts that happened on the same day of a failed transaction. We have an average of 16.6K monthly contacts, where 92% corresponds to card transactions and 8% are attributed to direct debits. 

Direct debits account for only 4% of failed transactions but 8% of same-day CS contacts, so their share doubles (an increase of 4 percentage points) when we look at CS contacts instead of failed transactions.

In [23]:
af.column_multi(
    df_failed_txns_contacts,
    "transaction_group:N",
    "txn_month:O",
    "count (K):Q",
    "transaction_group:N",
    250,
    400,
    "x",
).properties(title="All CS contacts in same day as Failed Transactions")

In [24]:
df_failed_txns_contacts["total"] = "total"
df_failed_txns_contacts = df_failed_txns_contacts.groupby("total").mean().reset_index()
df_failed_txns_contacts  # all contacts count

df_failed_txns_count = split_ar_contacts_per_month[
    split_ar_contacts_per_month["label"] == "sum_contact"
]
df_failed_txns_count = df_failed_txns_count[
    (df_failed_txns_count["reject_reason"] == "ALL AR")
    | (df_failed_txns_count["reject_reason"] == "ALL FAILED DD")
]
df_failed_txns_count = (
    df_failed_txns_count.groupby(["transaction_group"]).sum().reset_index()
)
df_failed_txns_count["count (K)"] = round(df_failed_txns_count["value"] / 1000, 2) / 3
df_failed_txns_count["Percentage contacts"] = df_failed_txns_count[["count (K)"]].apply(
    lambda x: round((x / x.sum()) * 100, 1), axis=0
)
# df_failed_txns_count # perc contacts per transaction group

And below we can see the detail of contact volume per transaction rejection reason. Here we can once again see that the reason 'Insufficient funds' has the highest volume both for card transactions and direct debits.

In [25]:
card_contacts_reason = card_contacts_per_month[
    card_contacts_per_month["label"] == "sum_contact"
]
card_contacts_reason = card_contacts_reason[
    card_contacts_reason["reject_reason"] != "ALL AR"
]
af.column_multi(
    card_contacts_reason,
    "reject_reason:N",
    "txn_month:O",
    "value:Q",
    "reject_reason:N",
    80,
    400,
    "x",
).properties(title="Volume of Contacts per Card Rejection Reason")

In [26]:
dd_contacts_reason = dd_contacts_per_month[
    dd_contacts_per_month["label"] == "sum_contact"
]
dd_contacts_reason = dd_contacts_reason[
    dd_contacts_reason["reject_reason"] != "ALL FAILED DD"
]

af.column_multi(
    dd_contacts_reason,
    "reject_reason:N",
    "txn_month:O",
    "value:Q",
    "reject_reason:N",
    135,
    400,
    "x",
).properties(title="Volume of Contacts per Direct Debit Rejection Reason")

In [27]:
card_contacts_perc = card_contacts_per_month[
    (card_contacts_per_month["label"] == "sum_contact")
    | (card_contacts_per_month["label"] == "sum_txn")
]
card_contacts_perc = (
    card_contacts_perc.groupby(["reject_reason", "label"]).mean().reset_index()
)
card_contacts_perc["value"] = round(card_contacts_perc["value"], 2)
card_contacts_perc = card_contacts_perc.pivot(
    index="reject_reason", columns="label", values="value"
).reset_index()
card_contacts_perc["perc_contacts"] = round(
    (card_contacts_perc["sum_contact"] / card_contacts_perc["sum_txn"]) * 100, 1
)
card_contacts_perc.sort_values(by=["perc_contacts"], ascending=False, inplace=True)


dd_contacts_perc = dd_contacts_per_month[
    (dd_contacts_per_month["label"] == "sum_contact")
    | (dd_contacts_per_month["label"] == "sum_txn")
]
dd_contacts_perc = (
    dd_contacts_perc.groupby(["reject_reason", "label"]).mean().reset_index()
)
dd_contacts_perc["value"] = round(dd_contacts_perc["value"], 2)
dd_contacts_perc = dd_contacts_perc.pivot(
    index="reject_reason", columns="label", values="value"
).reset_index()
dd_contacts_perc["perc_contacts"] = round(
    (dd_contacts_perc["sum_contact"] / dd_contacts_perc["sum_txn"]) * 100, 1
)
dd_contacts_perc.sort_values(by=["perc_contacts"], ascending=False, inplace=True)

Another way to understand which failures generate the most contacts is the number of same-day contacts per 100 failed transactions, by rejection reason. For card transactions, 'Suspected fraud' and 'PIN tries' have the highest rates (7.2% and 5.8% respectively); for direct debits, 'incorrect account number' has the highest rate, at 15.8%.
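
The card and direct debit calculations above follow the same steps, so they could be expressed as one helper. This is a sketch, not the notebook's actual code: `contact_rate_per_reason` is a hypothetical function name, and the `toy` data is invented.

```python
import pandas as pd


def contact_rate_per_reason(df: pd.DataFrame) -> pd.DataFrame:
    """Monthly-average contacts per 100 failed transactions, by rejection reason."""
    subset = df[df["label"].isin(["sum_contact", "sum_txn"])]
    # Average each label over the months, then put the labels side by side
    avg = subset.groupby(["reject_reason", "label"])["value"].mean().unstack("label")
    avg["perc_contacts"] = (avg["sum_contact"] / avg["sum_txn"] * 100).round(1)
    return avg.sort_values("perc_contacts", ascending=False).reset_index()


# Hypothetical long-format input with two reasons over two months
toy = pd.DataFrame(
    [
        ("2020-06", "Reason A", "sum_txn", 1000.0),
        ("2020-07", "Reason A", "sum_txn", 3000.0),
        ("2020-06", "Reason A", "sum_contact", 50.0),
        ("2020-07", "Reason A", "sum_contact", 70.0),
        ("2020-06", "Reason B", "sum_txn", 200.0),
        ("2020-07", "Reason B", "sum_txn", 200.0),
        ("2020-06", "Reason B", "sum_contact", 10.0),
        ("2020-07", "Reason B", "sum_contact", 20.0),
    ],
    columns=["txn_month", "reject_reason", "label", "value"],
)
rates = contact_rate_per_reason(toy)
print(rates)
```

Like the cells above, the helper divides the monthly average of contacts by the monthly average of failed transactions, which weights busy months more heavily than averaging the monthly ratios would.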

In [28]:
card_perc = af.column_single_label(
    card_contacts_perc, af.pink, "reject_reason:N", "perc_contacts:Q", 550, 400, "-y"
)
dd_perc = af.column_single_label(
    dd_contacts_perc, af.blue, "reject_reason:N", "perc_contacts:Q", 250, 400, "-y"
)
card_perc.properties(
    title="Percentage of Contacts per failed Card Transaction"
) | dd_perc.properties(title="Percentage of Contacts per failed DD Transaction")

### 4. Multiply the number of contacts by an estimate of cost per contact

In this last step, we multiply the average monthly contact volume by an estimated cost per contact (7€). The main takeaway is that we spent on average 116.45K€ per month on contacts following failed card and direct debit transactions, of which 38.37K€ (33%) is due to insufficient funds.
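
The cost estimate is a single multiplication, which can be checked against the rounded figures quoted in this analysis. The sketch below uses only numbers from the text; the small gap between 116.2K€ and 116.45K€ is consistent with the contact volume having been rounded to 16.6K.

```python
COST_PER_CONTACT_EUR = 7  # estimated cost per contact, as stated in the text

avg_monthly_contacts_k = 16.6  # rounded monthly contacts (thousands), from step 3
reported_total_cost_k = 116.45  # reported monthly cost (K€)
reported_insufficient_funds_cost_k = 38.37  # reported insufficient-funds cost (K€)

# Cost implied by the rounded contact volume
estimated_cost_k = round(avg_monthly_contacts_k * COST_PER_CONTACT_EUR, 2)

# Contact volume implied by the reported cost
implied_contacts_k = round(reported_total_cost_k / COST_PER_CONTACT_EUR, 2)

# Share of the monthly cost attributed to insufficient funds
insufficient_funds_share = (
    reported_insufficient_funds_cost_k / reported_total_cost_k * 100
)
print(estimated_cost_k, implied_contacts_k, round(insufficient_funds_share))
```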

In [29]:
df_contact_cost = card_contacts_per_month[
    card_contacts_per_month["label"] == "sum_contact"
]
df_contact_cost = df_contact_cost.groupby(["reject_reason"]).mean().reset_index()
df_contact_cost["avg_month_contact_cost"] = round(df_contact_cost["value"] * 7, 2)
df_contact_cost["avg_month_contact_cost (K)"] = round(
    df_contact_cost["avg_month_contact_cost"] / 1000, 2
)

In [30]:
df_dd_contact_cost = dd_contacts_per_month[
    dd_contacts_per_month["label"] == "sum_contact"
]
df_dd_contact_cost = df_dd_contact_cost.groupby(["reject_reason"]).mean().reset_index()
df_dd_contact_cost["avg_month_contact_cost"] = round(df_dd_contact_cost["value"] * 7, 2)
df_dd_contact_cost["avg_month_contact_cost (K)"] = round(
    df_dd_contact_cost["avg_month_contact_cost"] / 1000, 2
)

In [31]:
# .copy() avoids pandas' SettingWithCopyWarning when adding the new column
df_contact_cost_excl_all = df_contact_cost[
    df_contact_cost["reject_reason"] != "ALL AR"
].copy()
df_contact_cost_excl_all["Percentage Cost"] = df_contact_cost_excl_all[
    ["avg_month_contact_cost (K)"]
].apply(lambda x: round((x / x.sum()) * 100, 1), axis=0)

df_dd_contact_cost_excl_all = df_dd_contact_cost[
    df_dd_contact_cost["reject_reason"] != "ALL FAILED DD"
].copy()
df_dd_contact_cost_excl_all["Percentage Cost"] = df_dd_contact_cost_excl_all[
    ["avg_month_contact_cost (K)"]
].apply(lambda x: round((x / x.sum()) * 100, 1), axis=0)



In [32]:
af.column_single_label(
    df_contact_cost,
    af.rhubarb,
    "reject_reason:N",
    "avg_month_contact_cost (K):Q",
    800,
    400,
    "-y",
).properties(
    title="Monthly cost of contacts in the same day as failed card transactions"
)

In [33]:
af.column_single_label(
    df_dd_contact_cost,
    af.pink,
    "reject_reason:N",
    "avg_month_contact_cost (K):Q",
    400,
    300,
    "-y",
).properties(
    title="Monthly cost of contacts in the same day as failed direct debit transactions"
)

Below we can see the percentage of the costs per failed transaction reason. The most interesting aspect here is that even though only 3% of the failed card transactions due to insufficient funds result in a contact, these contact volumes correspond to 22.7% of our cost in CS contacts.

In [34]:
card = af.bar_single_label(
    df_contact_cost_excl_all,
    af.petrol,
    "Percentage Cost:Q",
    "reject_reason:N",
    300,
    500,
    "-x",
).properties(title="Percentage of costs per card rejection reason")
dd = af.bar_single_label(
    df_dd_contact_cost_excl_all,
    af.blue,
    "Percentage Cost:Q",
    "reject_reason:N",
    300,
    500,
    "-x",
).properties(title="Percentage of costs per direct debit rejection reason")
card | dd

<a id='section3'></a>
# How many of the failed card transactions are successful after the first failure?

Since users can take immediate action after a failed card transaction (e.g. entering the correct PIN after an attempt with an incorrect one), we also looked for AAs on the same day and with the same characteristics as the ARs, and then checked whether these AAs turned into PTs. We exclude failed direct debits here since these are triggered by the merchant, and new attempts are unlikely to happen on the same day.

Below you can find a chart with the percentage of PTs found for these failed card transactions:

In [35]:
df_success_ar = df_from_sql("redshiftreader", "all_ar_potential_revenue.sql")



In [36]:
df_success_ar_reasons = df_success_ar.groupby(["reject_reason"]).sum().reset_index()
df_success_ar_reasons["perc_pts"] = round(
    (df_success_ar_reasons["ar_with_also_pt_count"] / df_success_ar_reasons["ar_count"])
    * 100,
    1,
)
df_success_ar_reasons = df_success_ar_reasons[
    ["reject_reason", "ar_count", "ar_with_also_pt_count", "perc_pts"]
].sort_values(by=["ar_count"], ascending=False)
df_success_ar_reasons

| | reject_reason | ar_count | ar_with_also_pt_count | perc_pts |
|---|---|---|---|---|
| 8 | Insufficient funds | 4360616 | 238797.0 | 5.5 |
| 5 | Do not honour | 506802 | 74912.0 | 14.8 |
| 9 | Lost card pick up | 458761 | 29262.0 | 6.4 |
| 4 | Count limit exceeded | 391966 | 236260.0 | 60.3 |
| 11 | Settings | 383271 | 46432.0 | 12.1 |
| 13 | other | 259800 | 13988.0 | 5.4 |
| 3 | Cardholder Not allowed | 250713 | 10810.0 | 4.3 |
| 7 | Incorrect PIN | 241613 | 131695.0 | 54.5 |
| 2 | Card limits | 231936 | 67987.0 | 29.3 |
| 6 | Expired card | 139572 | 41233.0 | 29.5 |


Here we can see that the 2 rejection reasons with the most successful retries are 'Activity count limit exceeded' (listed as 'Count limit exceeded' in the table above; a PT was found for 60.3% of these ARs) and 'Incorrect PIN' (54.5%). We can also verify that the most common rejection reasons have a low percentage of PTs (e.g. for the most common rejection reason, 'Insufficient funds', a PT was found for only 5.5% of ARs).
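
The `perc_pts` values can be recomputed directly from the counts in the table above. The sketch below copies four rows from that table and applies the same formula as the notebook cell (`ar_with_also_pt_count / ar_count * 100`, rounded to one decimal).

```python
# (ar_count, ar_with_also_pt_count) per rejection reason, copied from the table above
counts = {
    "Insufficient funds": (4360616, 238797),
    "Do not honour": (506802, 74912),
    "Count limit exceeded": (391966, 236260),
    "Incorrect PIN": (241613, 131695),
}

# Percentage of ARs for which a PT was also found
perc_pts = {
    reason: round(pt_count / ar_count * 100, 1)
    for reason, (ar_count, pt_count) in counts.items()
}

# Reasons ordered from most to least successful retries
ranked = sorted(perc_pts, key=perc_pts.get, reverse=True)
print(perc_pts)
print(ranked)
```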

In [37]:
af.column_single_label(
    df_success_ar_reasons, af.petrol, "reject_reason:N", "perc_pts:Q", 800, 400, "-y"
)

In [38]:
df_success_ar_country = df_success_ar.groupby(["tnc_country_group"]).sum().reset_index()
df_success_ar_country["perc_pts"] = round(
    (df_success_ar_country["ar_with_also_pt_count"] / df_success_ar_country["ar_count"])
    * 100,
    1,
)

df_success_ar_product = df_success_ar.groupby(["product_id"]).sum().reset_index()
df_success_ar_product["perc_pts"] = round(
    (df_success_ar_product["ar_with_also_pt_count"] / df_success_ar_product["ar_count"])
    * 100,
    1,
)

### Successful PTs after ARs - Country & Membership details

On average, Austria and Germany have the highest percentage of PTs (17.1%), whereas France has the lowest (8.5%). Differences across Memberships are smaller, with You leading at 13.9% of found PTs and Flex last at 9.1%.

In [39]:
country = af.column_single_label(
    df_success_ar_country, af.teal, "tnc_country_group:N", "perc_pts:Q", 400, 400, "-y"
)
product = af.column_single_label(
    df_success_ar_product, af.wheat, "product_id:N", "perc_pts:Q", 400, 400, "-y"
)

country.properties(title="Percentage of PTs after AR per Country") | product.properties(
    title="Percentage of PTs after AR per Membership"
)

In [40]:
df_success_ar_product = (
    df_success_ar.groupby(["reject_reason", "product_id"]).sum().reset_index()
)
# Multiply by 100 so the values are percentages, matching the chart title
df_success_ar_product["perc_pts"] = round(
    (df_success_ar_product["ar_with_also_pt_count"] / df_success_ar_product["ar_count"])
    * 100,
    1,
)

Below you can find a detailed view of the percentage of PTs per rejection reason, split first by Membership and then by Country:

In [41]:
af.column_multi(
    df_success_ar_product,
    "reject_reason:N",
    "product_id:N",
    "perc_pts:Q",
    "reject_reason:N",
    80,
    400,
    "-y",
).properties(title="Percentage of PTs after AR per Membership")

In [42]:
df_success_ar_country = (
    df_success_ar.groupby(["reject_reason", "tnc_country_group"]).sum().reset_index()
)
# Multiply by 100 so the values are percentages, matching the chart title
df_success_ar_country["perc_pts"] = round(
    (df_success_ar_country["ar_with_also_pt_count"] / df_success_ar_country["ar_count"])
    * 100,
    1,
)
df_success_ar_country["country"] = df_success_ar_country["tnc_country_group"]

In [43]:
af.column_multi(
    df_success_ar_country,
    "reject_reason:N",
    "country:N",
    "perc_pts:Q",
    "reject_reason:N",
    80,
    400,
    "-y",
).properties(title="Percentage of PTs after AR per Country")