title: Quick look at increase of chargebacks after API automation was implemented
author: Vladas Jankus 
date: 2021-01-08
region: EU  
tags: chargeback, cards, card, product, automation, api, unauthorized, unauthorised, back office ticket, bo ticket, bo, back office
summary: - The volume of unauthorized chargebacks spiked after API automation was implemented: in May for ITA, AUT and FRA, and in November for the other markets. This quick analysis finds that around 40-50% of the spike volume is caused by customers raising repeated BO tickets and by invalid tickets that get rejected. The remaining volume comes from distinct customers with valid requests.

In [1]:
import pandas as pd
import altair as alt
from utils.datalib_database import df_from_sql
from IPython.display import HTML

In [39]:
# monthly aggregated data of overall volumes
cbs_m = df_from_sql(
    "redshiftreader",
    """
        select
            to_char(sf_back_office_ticket.created_date, 'yyyy-mm') as created,
            case
                when zrh_users.country_tnc_legal in ('DEU','AUT','FRA','ITA','ESP','GBR') 
                    then trim(zrh_users.country_tnc_legal)
                when zrh_users.country_tnc_legal in ('NLD','IRL','GRC','PRT','BEL','LUX','FIN','EST','LVA','LTU','SVK','SVN') 
                    then 'GrE'
                when zrh_users.country_tnc_legal in ('POL', 'NOR', 'SWE', 'DNK', 'ISL', 'LIE', 'CHE') 
                    then 'NEuro'
                else 'RoE'
            end as market,
            'automated' as type,
            count(*)
        from sf_back_office_ticket
        inner join dbt.zrh_users
            on sf_back_office_ticket.contact = zrh_users.contact_id
        where chargeback_type = 'Unauthorized Transaction'
            and created_date >= date('2020-01-01')
            and created_date < date('2021-01-01')
            and created_by_role = 'API User'
        group by 1, 2
        
        union all 
        
        select
            to_char(sf_back_office_ticket.created_date, 'yyyy-mm') as created,
            case
                when zrh_users.country_tnc_legal in ('DEU','AUT','FRA','ITA','ESP','GBR') 
                    then trim(zrh_users.country_tnc_legal)
                when zrh_users.country_tnc_legal in ('NLD','IRL','GRC','PRT','BEL','LUX','FIN','EST','LVA','LTU','SVK','SVN') 
                    then 'GrE'
                when zrh_users.country_tnc_legal in ('POL', 'NOR', 'SWE', 'DNK', 'ISL', 'LIE', 'CHE') 
                    then 'NEuro'
                else 'RoE'
            end as market,
            'total' as type,
            count(*)
        from sf_back_office_ticket
        inner join dbt.zrh_users
            on sf_back_office_ticket.contact = zrh_users.contact_id
        where chargeback_type = 'Unauthorized Transaction'
            and created_date >= date('2020-01-01')
            and created_date < date('2021-01-01')
        group by 1, 2
    """,
)

# repeated BO tickets aggregated monthly
cbs_u = df_from_sql(
    "redshiftreader",
    """
        select 
            to_char(created, 'yyyy-mm') as created,
            market,
            automated,
            count(*)
        from (
            select
                sf_back_office_ticket.created_date as created,
                case
                    when zrh_users.country_tnc_legal in ('DEU','AUT','FRA','ITA','ESP','GBR') 
                        then trim(zrh_users.country_tnc_legal)
                    when zrh_users.country_tnc_legal in ('NLD','IRL','GRC','PRT','BEL','LUX','FIN','EST','LVA','LTU','SVK','SVN') 
                        then 'GrE'
                    when zrh_users.country_tnc_legal in ('POL', 'NOR', 'SWE', 'DNK', 'ISL', 'LIE', 'CHE') 
                        then 'NEuro'
                    else 'RoE'
                end as market,
                created_by_role = 'API User' as automated,
                lag(id) over (partition by contact order by created_date) is not null as repeated_chargeback
            from sf_back_office_ticket
            inner join dbt.zrh_users
                on sf_back_office_ticket.contact = zrh_users.contact_id
            where chargeback_type = 'Unauthorized Transaction'
                and created_date >= date('2020-01-01')
                and created_date < date('2021-01-01')
        )
        where repeated_chargeback
        group by 1, 2, 3
    """,
)

# amount and count aggregated (taken from a separate sql query)
raw_data = {
    "created": [
        "2020-05",
        "2020-09",
        "2020-11",
        "2020-10",
        "2020-04",
        "2020-07",
        "2020-08",
        "2020-06",
        "2020-12",
    ],
    "cbs_count": ["2960", "1042", "4654", "1766", "22", "769", "752", "1262", "6600"],
    "cbs_value": [
        "154529.64",
        "84300.03",
        "303169.06",
        "126786.94",
        "598.86",
        "40547.16",
        "43495.55",
        "95279.41",
        "475103.45",
    ],
    "avg_value": [
        "52.20595946",
        "80.90214012",
        "65.14161152",
        "71.79328426",
        "27.22090909",
        "52.72712614",
        "57.83982713",
        "75.4987401",
        "71.98537121",
    ],
}

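cbs_calc = pd.DataFrame(
    raw_data, columns=["created", "cbs_count", "cbs_value", "avg_value"]
)
# the values above are strings; convert them so the charts treat them as numbers
value_cols = ["cbs_count", "cbs_value", "avg_value"]
cbs_calc[value_cols] = cbs_calc[value_cols].apply(pd.to_numeric)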

# rejected BO tickets aggregated monthly
cbs_reject = df_from_sql(
    "redshiftreader",
    """
        select 
            to_char(created_date, 'yyyy-mm') as created,
            sum((resolution_type = 'Rejected' and created_by_role = 'API User')::int) as rejections_api,
            avg((resolution_type = 'Rejected')::int::float) as reject_ratio
        from sf_back_office_ticket
        where chargeback_type = 'Unauthorized Transaction'
            and created_date >= date('2020-01-01')
            and created_date < date('2021-01-01')
        group by 1
        order by 1 desc
    """,
)

# removed rejected and repeated to flatten the chart
cbs_flat_1 = df_from_sql(
    "redshiftreader",
    """
        with base as (
            select
                sf_back_office_ticket.created_date as created,
                created_by_role = 'API User' as automated,
                resolution_type = 'Rejected' as rejected,
                resolution_type is null as no_resolution,
                lag(id) over (partition by contact order by created_date) is not null as repeated_chargeback
            from sf_back_office_ticket
            where chargeback_type = 'Unauthorized Transaction'
                and created_date >= date('2020-01-01')
                and created_date < date('2021-01-01')
        )
        select 
            to_char(created, 'yyyy-mm') as created,
            count(*)
        from base 
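        where not repeated_chargeback
            -- keep unresolved tickets: "not rejected" is null (row dropped) when resolution_type is null
            and (not rejected or no_resolution)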
        group by 1 
    """,
)


# Quick look at increase of chargebacks after API automation was implemented
Vladas Jankus<br/>
2021-01-08

### Contents:
* [1. Spike in May](#spike_may)
* [2. Spike in November-December](#spike_novdec)
* [Assumption 1: People are filing for more smaller value amount chargebacks](#ass1)
  * [Repeated ticket volumes](#sub1)
  * [Average chargeback value](#sub2)
* [Assumption 2: There is an increase in card fraud](#ass2)
* [Assumption 3: People like the convenience of filing a CB via in-app form](#ass3)
* [Assumption 4: People are incorrectly using the in-app](#ass4)
* [Flattening the spikes](#conc)



## TL;DR

After API automation was introduced, the number of unauthorized chargebacks raised increased. Automation was introduced in three markets (ITA, AUT, FRA) in May, and in the remaining markets in November. FRA was discontinued in April, because the chargebacks spiked. The same behaviour now appears to be repeating in November.

The chart below visualizes the issue:

In [4]:
source = cbs_m.groupby(["created", "type"])["count"].sum().reset_index()

alt.Chart(source).mark_area(opacity=0.3).encode(
    x="created:T", y=alt.Y("count:Q", stack=None), color="type:N"
).properties(width=500, height=200, title="Chargeback volume by created type").display()

Unauthorized transaction chargeback volumes spiked in May and in November-December. In both cases the spikes appear to be caused solely by automated chargebacks.

Unautomated ticket volumes decreased slightly, from 1500-2000 in Q1 to around 1000-1200 for the remainder of the year. In June-October, the volume of automated tickets seems reasonable, at fewer than 1000 cases. That roughly replaces the volume lost on unautomated tickets.

May, November and December show much larger volumes of automated tickets: 2000-2500.
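The query behind this chart returns only `automated` and `total` rows, so the unautomated volume discussed above has to be derived as the difference. A minimal sketch of that step, assuming a frame shaped like `cbs_m`; the counts below are made up for illustration and are not query results:

```python
import pandas as pd

# Toy monthly counts in the same long format as cbs_m (values are invented).
toy = pd.DataFrame(
    {
        "created": ["2020-04", "2020-04", "2020-05", "2020-05"],
        "type": ["automated", "total", "automated", "total"],
        "count": [0, 1500, 2400, 3600],
    }
)

# One row per month and one column per type; summing also collapses markets.
wide = toy.pivot_table(
    index="created", columns="type", values="count", aggfunc="sum", fill_value=0
)

# Unautomated tickets are whatever is left of the total.
wide["unautomated"] = wide["total"] - wide["automated"]
print(wide)
```

Applied to the real data, the `unautomated` column could be plotted next to `automated` instead of reading the gap between the two areas off the chart.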

The purpose of this research is to quickly look into the numbers to understand what was causing the spikes.

The spikes turn out to be partially caused by users who created repeated BO tickets and by incorrectly raised requests that got rejected. Together these account for roughly 40-50% of the spike volume.

The remaining volume comes from distinct customers raising valid requests. It could have behavioural reasons, especially if customers were notified about the launch and started using the in-app form more.

## 1. Spike in May <a class="anchor" id="spike_may"></a>

The spike in May could only have been caused by Italy, Austria and/or France, because only these markets had automation launched in May. Let's look at these markets separately first.

Note: scroll right to see all charts. The charts have different scales.

In [22]:
charts = []
markets = ["FRA", "AUT", "ITA"]
for market in markets:
    source = cbs_m.loc[cbs_m["market"] == market].sort_values(by=["type"])
    charts.append(
        alt.Chart(source)
        .mark_area(opacity=0.3)
        .encode(x="created:T", y=alt.Y("count:Q", stack=None), color="type:N")
        .properties(width=600, height=200, title=market)
    )
alt.hconcat(*charts)

There are several things to be emphasized here:

1. The first month after automation was introduced saw a huge spike. In the AUT and ITA markets this spike was not repeated during the year.

2. In AUT and ITA, ticket volume remained elevated throughout the year: lower than in May, but at least 2-3 times higher than before. In France, the volume dropped back to its usual level after the automation was discontinued.

3. July-October looked normal in the overall chart because the volume from the AUT and ITA markets is small compared with the other markets. Looking at these markets separately, it is evident that ticket volumes remained elevated in July-October.

4. There is no spike in November-December. There is an increase, but it is not significant compared to May.
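Statements such as "2-3 times higher than before" can be made explicit by dividing each month by the pre-launch average of the same market. The sketch below is one way to do that. The market code `XXX`, the launch month variable and all counts are hypothetical and only illustrate the calculation:

```python
import pandas as pd

# Hypothetical monthly ticket counts for one made-up market.
toy = pd.DataFrame(
    {
        "created": ["2020-03", "2020-04", "2020-05", "2020-06"],
        "market": ["XXX", "XXX", "XXX", "XXX"],
        "count": [100, 100, 600, 250],
    }
)
launch_month = "2020-05"  # assumed launch month for this market

# Baseline: average monthly volume before launch, per market.
# 'yyyy-mm' strings sort chronologically, so a plain string comparison works.
baseline = toy.loc[toy["created"] < launch_month].groupby("market")["count"].mean()

# Each month expressed as a multiple of the market's own baseline.
toy["vs_baseline"] = toy["count"] / toy["market"].map(baseline)
print(toy)
```

Because each market is compared with its own baseline, small markets are not hidden behind large ones the way they are in the overall chart.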

## 2. Spike in November-December <a class="anchor" id="spike_novdec"></a>

Since the FRA, AUT and ITA markets did not see a significant spike in November-December, let's look only at the other markets.

Note: scroll right to see all charts. The charts have different scales.

In [25]:
charts = []
markets = ["DEU", "ESP", "GrE", "NEuro"]

for market in markets:
    source = cbs_m.loc[cbs_m["market"] == market].sort_values(by=["type"])
    charts.append(
        alt.Chart(source)
        .mark_area(opacity=0.3)
        .encode(x="created:T", y=alt.Y("count:Q", stack=None), color="type:N")
        .properties(width=600, height=200, title=market)
    )
alt.hconcat(*charts)

All markets show significant increases in November-December. The volume of unautomated chargebacks appears to have shrunk in most of them, but this was outweighed by immense automated ticket volumes.

This could be partly comparable to the May spike in the AUT, ITA and FRA markets. If so, volumes in 2021 should drop, albeit to a level that is still higher than before.

The issue does not seem to be market specific. The same pattern repeats in all markets.

## Assumption 1: People are filing for more smaller value amount chargebacks <a class="anchor" id="ass1"></a>

#### Repeated ticket volumes. <a class="anchor" id="sub1"></a>

First, let's look at the overall volume of repeated chargebacks. The chart below shows the volume of unauthorised chargeback tickets created by users who already had at least one ticket in 2020. It excludes every user's first ticket and shows only the repeated volume.

NOTE: This chart shows automated volume vs. unautomated volume. The previous charts displayed automated volume inside the total volume of tickets.
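The "repeated ticket" flag is computed in SQL with `lag(id) over (partition by contact order by created_date) is not null`, which marks every ticket except a contact's first one in the period. A rough pandas equivalent, shown on invented ticket rows (the ids, contacts and dates are placeholders):

```python
import pandas as pd

# Invented tickets; only the column names mirror sf_back_office_ticket.
tickets = pd.DataFrame(
    {
        "id": ["t1", "t2", "t3", "t4", "t5"],
        "contact": ["c1", "c2", "c1", "c1", "c3"],
        "created_date": pd.to_datetime(
            ["2020-05-02", "2020-05-03", "2020-05-10", "2020-11-20", "2020-11-21"]
        ),
    }
)

tickets = tickets.sort_values(["contact", "created_date"])

# cumcount() numbers each contact's tickets 0, 1, 2, ... in date order,
# so anything above 0 already has an earlier ticket from the same contact.
tickets["repeated_chargeback"] = tickets.groupby("contact").cumcount() > 0

# Monthly volume of repeated tickets only.
repeated_per_month = (
    tickets.loc[tickets["repeated_chargeback"]]
    .assign(created=lambda d: d["created_date"].dt.strftime("%Y-%m"))
    .groupby("created")
    .size()
)
print(repeated_per_month)
```

Note that a ticket raised in November counts as repeated even if the contact's first ticket was raised in May, since the window covers the whole year.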

In [26]:
source = cbs_u.groupby(["created", "automated"])["count"].sum().reset_index()

alt.Chart(source).mark_area(opacity=0.3).encode(
    x="created:T", y=alt.Y("count:Q", stack=None), color="automated:N"
).properties(width=600, height=200, title="Repeated chargebacks").display()

There are evident spikes of repeated tickets in the same months as the overall spikes.

However, it remains unknown why the volume looks normal in June-October. Also, spikes of ~800 repeated tickets do not explain the overall spikes, which are around 1500-2000 tickets above the usual level.

Next, let's focus only on automated tickets. The chart below shows the ratio of repeated tickets among all automated tickets.

In [28]:
# take only automated tickets and group only by date
cbs_m_ovr = (
    cbs_m.loc[cbs_m["type"] == "automated"]
    .groupby("created")["count"]
    .sum()
    .reset_index()
)
cbs_u_ovr = (
    cbs_u.loc[cbs_u["automated"] == True]
    .groupby("created")["count"]
    .sum()
    .reset_index()
)

# join both on created month and get the percentage of repeated tickets
cbs_c = pd.merge(cbs_m_ovr, cbs_u_ovr, on="created", how="left")
cbs_c["repeat_ratio"] = cbs_c["count_y"] / cbs_c["count_x"]

# draw chart
source = cbs_c
alt.Chart(source).mark_line().encode(
    x="created:T", y=alt.Y("repeat_ratio:Q")
).properties(width=600, height=200, title="Repeat ratio").display()

Although repeated tickets peaked at fewer than 800 in volume, which does not account for all of the overall spikes, their share of all automated tickets still increased significantly during the spikes.

This suggests that repeated chargebacks contributed around 20% of the spike. The origin of the remaining volume is still unknown.
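One way to put a number on "contributed to the spike" is to compare the excess of repeated tickets over their usual level with the excess of all tickets over their usual level. The helper below is a sketch of that arithmetic. The function name and all four input values are illustrative assumptions, not figures from the queries:

```python
def share_of_spike(component_spike, component_baseline, total_spike, total_baseline):
    """Fraction of the total excess volume that one component accounts for."""
    excess_total = total_spike - total_baseline
    if excess_total <= 0:
        raise ValueError("total volume is not above its baseline")
    return (component_spike - component_baseline) / excess_total


# Illustrative inputs: 700 repeated tickets in the spike month vs. 250 usually,
# 3200 tickets overall in the spike month vs. 1400 usually.
share = share_of_spike(700, 250, 3200, 1400)
print(f"{share:.0%}")
```

The result depends heavily on what is chosen as the baseline, so it is worth stating which months define the "usual level" whenever such a share is quoted.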

The chart below is a quick look at the volume of repeated tickets compared to all automated tickets.

In [30]:
# take only automated tickets and group only by date
cbs_m_ovr = (
    cbs_m.loc[cbs_m["type"] == "automated"]
    .groupby("created")["count"]
    .sum()
    .reset_index()
)
cbs_m_ovr["legend"] = "all_automated"
cbs_u_ovr = (
    cbs_u.loc[cbs_u["automated"] == True]
    .groupby("created")["count"]
    .sum()
    .reset_index()
)
cbs_u_ovr["legend"] = "repeated_tickets"

# stack both frames so they can be drawn as two areas on one chart
cbs_c = pd.concat([cbs_m_ovr, cbs_u_ovr])

# draw chart
source = cbs_c
alt.Chart(source).mark_area(opacity=0.3).encode(
    x="created:T", y=alt.Y("count:Q", stack=None), color="legend:N"
).properties(width=600, height=200, title="Repeated vs. all automated").display()

#### Average chargeback value <a class="anchor" id="sub2"></a>

Below is an estimate of the average chargeback value.

In [32]:
source = cbs_calc

alt.Chart(source).mark_area(opacity=0.3).encode(
    x="created:T", y="avg_value:Q"
).properties(width=600, height=200, title="Average chargeback value").display()

This does not seem to correspond to the spikes in May and November-December. There is no evidence that our customers are creating more chargebacks of lower value.
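The average value can be recomputed from the monthly count and total value, which also allows a count-weighted comparison between spike months and the other months. The sketch below reuses the figures from `raw_data` in this notebook (with `2020-4` written as `2020-04`). Treating May, November and December as the spike months is an assumption taken from the charts above:

```python
import pandas as pd

# Monthly figures copied from raw_data, sorted by month.
monthly = pd.DataFrame(
    {
        "created": [
            "2020-04", "2020-05", "2020-06", "2020-07", "2020-08",
            "2020-09", "2020-10", "2020-11", "2020-12",
        ],
        "cbs_count": [22, 2960, 1262, 769, 752, 1042, 1766, 4654, 6600],
        "cbs_value": [
            598.86, 154529.64, 95279.41, 40547.16, 43495.55,
            84300.03, 126786.94, 303169.06, 475103.45,
        ],
    }
)

# Average value per chargeback in each month.
monthly["avg_value"] = monthly["cbs_value"] / monthly["cbs_count"]

# Label spike months (assumption: May, November, December).
spike_months = ["2020-05", "2020-11", "2020-12"]
monthly["period"] = (
    monthly["created"].isin(spike_months).map({True: "spike", False: "other"})
)

# Count-weighted average per period, so tiny months such as April barely matter.
sums = monthly.groupby("period")[["cbs_value", "cbs_count"]].sum()
weighted_avg = sums["cbs_value"] / sums["cbs_count"]

print(monthly[["created", "avg_value"]].round(2))
print(weighted_avg.round(2))
```

Weighting by count avoids giving a month with 22 tickets the same influence as a month with several thousand.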

## Assumption 2: There is an increase in card fraud <a class="anchor" id="ass2"></a>

This assumption is not investigated at the moment. We first need to work out how to quantify card fraud. It remains an open question.

## Assumption 3: People like the convenience of filing a CB via in-app form <a class="anchor" id="ass3"></a>

That could be true, but it is difficult to confirm from the data.

Since we collect no feedback on the in-app form (my guess), we cannot be certain on this point.

We can only speculate, but the spikes appear to be behavioural, since they happen right after the feature is launched in every market. Or at least they did in ITA and AUT, with a hoped-for drop in January for the other markets.

If no other assumption fully explains the spikes, they could probably be attributed to people liking the in-app form and using it more.

However, this assumption only holds if customers were aware of the launch. If they were not notified, such a sudden behavioural spike would make no sense.

## Assumption 4: People are incorrectly using the in-app <a class="anchor" id="ass4"></a>

There was a suggestion to look at the number of rejections. The charts below show that rejections do seem to have at least some influence on the spikes.

In [35]:
source = cbs_reject

chart_ratio = (
    alt.Chart(source)
    .mark_line()
    .encode(x="created:T", y=alt.Y("reject_ratio:Q"))
    .properties(width=600, height=200, title="Reject ratio")
)

chart_count = (
    alt.Chart(source)
    .mark_area(opacity=0.3)
    .encode(x="created:T", y=alt.Y("rejections_api:Q"))
    .properties(width=600, height=200, title="Rejections for API chargebacks")
)

chart_ratio | chart_count

There was an obvious spike in May, amounting to around 400 additional rejected tickets.

There is also a visible increase in November, though not a large one. I don't know how long the chargeback process takes, but this volume might still grow as open tickets are resolved. For December at least, it should definitely still grow to some extent.
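Because recent tickets may not have a resolution yet, the reject ratio for the latest months rests on fewer resolved tickets. In SQL, `avg` skips the nulls that come from a missing `resolution_type`; the sketch below mirrors that in pandas and reports the number of unresolved tickets alongside the ratio. The rows are invented, and `Accepted` is a placeholder for any non-rejected resolution value:

```python
import pandas as pd

# Invented tickets; None means the ticket has no resolution yet.
toy = pd.DataFrame(
    {
        "created": ["2020-11"] * 4 + ["2020-12"] * 4,
        "resolution_type": [
            "Rejected", "Accepted", "Accepted", None,
            "Rejected", None, None, None,
        ],
    }
)

resolved = toy["resolution_type"].notna()
toy["unresolved"] = ~resolved

# 1.0 / 0.0 for resolved tickets, NaN for unresolved ones,
# so that mean() ignores tickets without a resolution.
toy["rejected"] = (toy["resolution_type"] == "Rejected").astype(float).where(resolved)

summary = toy.groupby("created").agg(
    reject_ratio=("rejected", "mean"),
    unresolved=("unresolved", "sum"),
)
print(summary)
```

Showing the unresolved count next to the ratio makes it easier to judge how much the latest months can still move.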

## Flattening the spikes <a class="anchor" id="conc"></a>

Rejected tickets and recontacts were found to contribute to the spikes. The chart below shows the volume of chargebacks without repeated tickets and rejected tickets.

In [40]:
source = cbs_flat_1

alt.Chart(source).mark_area(opacity=0.3).encode(
    x="created:T", y=alt.Y("count:Q")
).properties(
    width=600, height=200, title="Non-repeated non-rejected chargebacks"
).display()

After removing recontacts and rejected tickets, part of the spikes still remains. These are all unique customers raising valid chargeback requests (except in Nov-Dec, where some tickets have not been rejected yet).
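The flattening filter can be sketched in pandas as follows, treating tickets without a resolution as not rejected, in line with the Nov-Dec caveat above. The ticket rows are invented and `Accepted` is a placeholder resolution value:

```python
import pandas as pd

# Invented tickets; None means the ticket has no resolution yet.
tickets = pd.DataFrame(
    {
        "contact": ["c1", "c1", "c2", "c3", "c4"],
        "created_date": pd.to_datetime(
            ["2020-05-01", "2020-05-09", "2020-05-12", "2020-12-03", "2020-12-15"]
        ),
        "resolution_type": ["Accepted", "Accepted", "Rejected", None, "Accepted"],
    }
).sort_values(["contact", "created_date"])

# First ticket per contact within the period.
first_ticket = tickets.groupby("contact").cumcount() == 0

# ne() returns True for missing values, so unresolved tickets are kept.
not_rejected = tickets["resolution_type"].ne("Rejected")

flat = (
    tickets.loc[first_ticket & not_rejected]
    .assign(created=lambda d: d["created_date"].dt.strftime("%Y-%m"))
    .groupby("created")
    .size()
)
print(flat)
```

How missing resolutions are handled matters most for the latest months, where many tickets are still open.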

This could be related to customers being aware of the feature launch, so that activity increases after launch as more people start using it. This could partially explain why volume decreased after a month in ITA and AUT, and why it did not spike in ITA and AUT in November-December.

In [43]:
HTML(
    """
<script>
    code_show=true; 
    function code_toggle() {
        if (code_show){
            $('div.input').hide();
        } else {
            $('div.input').show();
            }
        code_show = !code_show
 
        $('div.output_subarea').css("text-align", "center"); 
        $('body').css("font-family", "Montserrat, sans-serif");
        $('h1').css("font-family", "Karla, sans-serif");
        $('h2').css("font-family", "Karla, sans-serif");
    } 
    $( document ).ready(code_toggle);
    
    </script>
    <form action="javascript:code_toggle()">
        <input type="submit" value="Click here to toggle on/off the raw code.">
    </form>
"""
)