title: Do Overdraft infocards help users top-up and avoid arrears?
author: Helder Silva 
date: 2021-07-16
region: EU
tags: overdraft, infocards, bank products, credit, top-up, arrears, ab test
summary:  In order to answer this question, we ran a couple of A/B test experiments: Sixty days usage infocard test - Displays information that users have started using their arranged overdraft, and is shown again 60 days after it was last shown, if applicable. Fifty percent usage infocard test - Displays information that users have crossed 50% of their arranged overdraft limit. We recommend adopting the Sixty days usage infocard going forward, and do not recommend using the Fifty percent usage infocard in its current form (a potential alternative could be sending a reminder when users are closer to falling into arrears).

<div class="alert alert-block alert-success">
    <H1>Do Overdraft infocards help users top-up and avoid arrears? (Part I)</H1>
</div>

In order to answer the question above, we ran a couple of A/B test experiments:
 - <font color='#48AC98'><b>Sixty days usage infocard test</b></font> - Displays information that users have started using their arranged overdraft, and is shown again 60 days after it was last shown, if applicable
 - <font color='#48AC98'><b>Fifty percent usage infocard test</b></font>  - Displays information that users have crossed 50% of their arranged overdraft limit. In this test we have 2 variants:
    - One infocard that displays the amount of overdraft users are using
    - One infocard that displays that they are using 50% of their overdraft limit
    
[Here](https://www.figma.com/file/kaE5VHKLfu82RekVXnd8Aa/Overdraft-experiments?node-id=7%3A182) you can find the designs used for each of these infocards.

## Contents
 - [Sixty days usage infocard test results](#section1)
 - [Fifty Percent usage infocard test results](#section2)
 - [Do we have users in both experiments?](#section3)
 - [Do all users in the experiment have OD Enabled?](#section4)
 - [Were experiment users part of the overdraft reduction initiative?](#section5)

## Experimental Design
For each of the 3 infocards we want to test, users that meet the infocard trigger conditions are randomly split into 2 groups:
 - One control group that will not receive the infocard
 - One variant group that will receive the infocard
 
This split was automatically handled by plutonium (the overdraft microservice) as soon as the users met the trigger condition.


### Hypothesis

For each of the experiments, we have 2 success measures, so we define a pair of hypotheses for each of them:

<font color='#266678'><b>Top-up rates increase</b></font>
 - **H0:** There are no significant differences in top-up rate within 7 days of meeting the trigger condition
 - **H1:** The top-up rates for the variant group are significantly higher than the control group within 7 days of meeting the trigger condition

<font color='#266678'><b>Arrears rates decrease</b></font>
 - **H0:** There are no significant differences in arrears rate within 30 days of meeting the trigger condition
 - **H1:** The arrears rates for the variant group are significantly lower than the control group within 30 days of meeting the trigger condition
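
The two outcome windows can be made concrete with a small sketch. It assumes, as in the `experiment_query` further down, that an event counts when its date falls between the trigger date and the trigger date plus the window length, both ends inclusive. The helper name `has_event_in_window` and the sample dates are hypothetical.

```python
from datetime import date, timedelta


def has_event_in_window(trigger_date, event_dates, window_days):
    """Return 1 if any event date falls in [trigger_date, trigger_date + window_days]."""
    window_end = trigger_date + timedelta(days=window_days)
    return int(any(trigger_date <= d <= window_end for d in event_dates))


# Hypothetical user who met the trigger condition on 2021-05-03
trigger = date(2021, 5, 3)
top_ups = [date(2021, 5, 9)]   # 6 days after the trigger: inside the 7-day window
arrears = [date(2021, 6, 10)]  # 38 days after the trigger: outside the 30-day window

has_top_up = has_event_in_window(trigger, top_ups, window_days=7)
has_arrears = has_event_in_window(trigger, arrears, window_days=30)
print(has_top_up, has_arrears)
```

Under these assumptions the user counts towards the top-up rate but not towards the arrears rate.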

### Experiment Analysis

The experiment ran between 2021-04-29 and 2021-06-04. We will be running a Z-Test for each of the experiments and success measures, and assume a significance level of 0.05.
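
For reference, the one-sided two-proportion Z-test can be written as follows. This is the standard pooled formulation, and it matches what the `z_test` helper defined later in this notebook computes. The symbols are notation introduced here: $x$ is the number of users with the outcome, $n$ is the group size, and the subscripts $v$ and $c$ stand for variant and control.

$$
\hat{p}_v = \frac{x_v}{n_v}, \qquad \hat{p}_c = \frac{x_c}{n_c}, \qquad \hat{p} = \frac{x_v + x_c}{n_v + n_c}
$$

$$
z = \frac{\hat{p}_v - \hat{p}_c}{\sqrt{\hat{p}\,(1 - \hat{p})\left(\frac{1}{n_v} + \frac{1}{n_c}\right)}}
$$

For the top-up hypothesis the p-value is the upper tail $P(Z \ge z)$, and for the arrears hypothesis it is the lower tail $P(Z \le z)$. In both cases it is compared against the 0.05 significance level.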

## Key take-aways:
### Sixty Day Infocard Experiment:
 - Top-up rates are significantly higher for the variant group in all tests of the sixty day experiment. Across all users, receiving the infocard is associated with a 1 percentage point increase in top-up rates (from 67.8% in control to 68.8% in variant).
 - However, we don't have significantly lower arrears rates for the variant group in the sixty day experiment.
 - **Recommendation**: Given that the top-up rate results for this experiment were positive, <font color='#48AC98'><b>we would recommend adopting this infocard moving forward.</b></font> 
 

### Fifty Percent Infocard Experiment:
 - Both variants of this experiment had similar results.
 -  We have significantly higher top-up rates for variant users who clicked the infocards (vs. control users who logged in). However, we found the opposite effect for the variant users who viewed this infocard.
 - We did not find significantly lower arrears rates for the variant users in the Percent Reminder Infocard. However, we have significantly higher arrears rates for variant users who viewed the infocard (vs. control users who logged in).
 - We also found that only about 64% of the users in the fifty percent usage infocard experiment had an overdraft enabled when they met the trigger condition.
 - **Recommendation**: Since we didn't find a consistently favorable trend for variant users in this experiment (neither for top-up rates nor arrears rates), <font color='#CB7C7A'><b>we would not recommend using this infocard in its current form moving forward.</b></font>  A potential alternative could be sending a reminder when users are closer to falling into arrears (e.g. at 90%+ usage of their overdraft). We would also suggest including only users with overdraft enabled in future iterations.

In [1]:
cd /app/

In [2]:
!pip install duckdb
!pip install altair
!pip install statsmodels

In [85]:
import pandas as pd
from utils.datalib_database import df_from_sql

import utils.altair_functions as af
import altair as alt
import numpy as np
import scipy.stats as stats
import statsmodels.api as sm


import duckdb

con = duckdb.connect(database=":memory:", read_only=False)

# define colors
color_scale_1 = alt.Scale(domain=["variant", "control"], range=[af.teal, "#98A5A5"])

color_scale_2 = alt.Scale(
    domain=["control", "reminder_use", "reminder_percent"],
    range=["#98A5A5", af.petrol, af.teal],
)

In [4]:
experiment_query = """ 
with dups as (
select user_id, 
count(*) as n_experiments
from pu_experiments_result
where created between '2021-04-29' and '2021-06-04' -- excluding dates when card was not visible in dash
group by 1
),
top_ups as (
select 
user_id,
to_char(txn_date, 'YYYY-MM-DD') as top_up_date
from dbt.zrh_txn_day 
inner join dbt.zrh_users using (user_created)
inner join pu_experiments_result pu using (user_id)
where n_ext_total_in > 0
and txn_date >= '2021-04-29'
group by 1, 2
), 
arrears as (
select 
user_id,
to_char(end_time, 'YYYY-MM-DD') as arrears_date
from dbt.bp_overdraft_users
inner join dbt.zrh_users using (user_created)
inner join pu_experiments_result using (user_id)
where timeframe = 'day'
and od_enabled_flag 
and outstanding_balance_eur > max_amount_cents::numeric/100
and end_time >= '2021-04-29'
group by 1, 2
)
select 
to_char(created, 'YYYY-MM-DD') as created, 
pu.user_id,
experiment_name, 
user_in_control_group,
coalesce(experiment_outcome, 'control') as experiment_outcome,
n_experiments,
case when max_amount_cents::numeric = 0 then 0 else round(outstanding_balance_eur::numeric/(max_amount_cents::numeric/100), 2)*100 end as perc_usage,
max(case when t.user_id is not null then 1 else 0 end) as has_top_up,
max(case when a.user_id is not null then 1 else 0 end) as has_arrears,
max(case when ld.user_created is not null then 1 else 0 end) as has_login,
n_experiments - 1 as both_exp_dummy,
case when user_in_control_group then 1 else 0 end as control_dummy
from pu_experiments_result pu
inner join dbt.zrh_users u using (user_id)
left join dups using (user_id)
left join top_ups t 
on t.user_id = pu.user_id
and top_up_date::date between created::date and created::date + interval '7 day'
left join arrears a 
on a.user_id = pu.user_id
and arrears_date::date between created::date and created::date + interval '30 day'
left join dbt.zrh_login_day ld
on ld.user_created = u.user_created
and login_date between created::date and created::date + interval '7 day'
left join dbt.bp_overdraft_users ou
on u.user_created = ou.user_created 
and timeframe = 'day' 
and end_time::date = created::date
where created between '2021-04-29' and '2021-06-04' -- excluding dates when card was not visible in dash
group by 1, 2, 3, 4, 5, 6, 7
"""

sixty_day_infocards_query = """
with infocards as (
select 
user_id,
name as infocard_name,
count(case when se_label = 'cta1' then 1 end) as n_action_clicked, 
count(case when se_label = 'cta2' then 1 end) as n_dismissed_clicked, 
count(case when se_label is null then 1 end) as n_views
from dbt.stg_txn_events c
inner join mcv_infocard mi on mi.id = c.se_property
inner join mcv_infocard_template mit on mi.template_id = mit.id
where c.event_type in ('686', '687', '454131163','881898916','-14084404','-56984998','-894','6257')
and c.event_dt between '2021-04-29' and '2021-06-04'
and mi.created between '2021-04-29' and '2021-06-04'
and name in ('infocard_overdraft_first_sixty_day')
group by 1, 2
)
select 
user_id,
experiment_name, 
coalesce(experiment_outcome, 'control') as experiment_outcome, 
user_in_control_group,
infocard_name,
coalesce(n_action_clicked, 0) as n_action_clicked, 
coalesce(n_dismissed_clicked, 0) as n_dismissed_clicked, 
coalesce(n_views, 0) as n_views
from pu_experiments_result pu 
left join infocards
using (user_id)
where created between '2021-04-29' and '2021-06-04'
and experiment_name = 'sixty-days-usage-info-card'
"""

fifty_perc_infocards_query = """
with infocards as (
select 
user_id,
name as infocard_name,
count(case when se_label = 'cta1' then 1 end) as n_action_clicked, 
count(case when se_label = 'cta2' then 1 end) as n_dismissed_clicked, 
count(case when se_label is null then 1 end) as n_views
from dbt.stg_txn_events c
inner join mcv_infocard mi on mi.id = c.se_property
inner join mcv_infocard_template mit on mi.template_id = mit.id
where c.event_type in ('686', '687', '454131163','881898916','-14084404','-56984998','-894','6257')
and c.event_dt between '2021-04-29' and '2021-06-04'
and mi.created between '2021-04-29' and '2021-06-04'
and name in ('infocard_overdraft_limit_reminder_percent', 'infocard_overdraft_limit_reminder_use')
group by 1, 2
)
select 
user_id,
experiment_name, 
coalesce(experiment_outcome, 'control') as experiment_outcome, 
user_in_control_group,
infocard_name,
coalesce(n_action_clicked, 0) as n_action_clicked, 
coalesce(n_dismissed_clicked, 0) as n_dismissed_clicked, 
coalesce(n_views, 0) as n_views
from pu_experiments_result pu 
left join infocards
using (user_id)
where created between '2021-04-29' and '2021-06-04'
and experiment_name = 'fifty-percent-usage-exceeded'
"""

In [6]:
experiment_df = df_from_sql("redshiftreader", experiment_query)

In [7]:
sixty_day_infocards_df = df_from_sql("redshiftreader", sixty_day_infocards_query)

In [9]:
fifty_perc_infocards_df = df_from_sql("redshiftreader", fifty_perc_infocards_query)

<a id='section1'></a>
# Sixty days usage infocard test results

## The Sample

Below we can see that the sample for this experiment is evenly split, with about 16.8K users in each group (16,879 control and 16,781 variant).

In [21]:
sixty_day_query = """
select * from experiment_df
where experiment_name = 'sixty-days-usage-info-card'
"""

sixty_day_sample_query = """
select 
case when user_in_control_group then 'control' else 'variant' end as user_split,
count(*) as n_users
from sixty_day_df
group by 1
"""

In [22]:
sixty_day_df = con.execute(sixty_day_query).fetchdf()

con.register("sixty_day_df", sixty_day_df)
sixty_day_sample_df = con.execute(sixty_day_sample_query).fetchdf()

In [23]:
# build chart
chart = (
    alt.Chart(sixty_day_sample_df)
    .mark_bar()
    .encode(
        x=alt.X("n_users:Q", title="Customers"),
        y=alt.Y("user_split:N", title=None),
        color=alt.Color(
            "user_split:N",
            scale=color_scale_1,
        ),
    )
    .properties(
        width=600,
        height=200,
        title="Customers in the sixty days usage infocard experiment",
    )
)

text = chart.mark_text(
    align="center",
    baseline="middle",
    # Nudges text to right so it doesn't appear on top of the bar
    dx=20,
).encode(text="n_users:Q")

chart + text
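
One way to back up the "evenly split" reading is a quick check of the observed split against a 50/50 allocation. The sketch below is not part of the original analysis. It assumes the group sizes implied by the result tables further down (16,879 control and 16,781 variant users) and uses a normal approximation to the binomial, with only the standard library. The function name `split_balance_p_value` is hypothetical.

```python
import math


def split_balance_p_value(n_control, n_variant, expected_share=0.5):
    """Two-sided p-value for H0: users land in control with probability expected_share."""
    n_total = n_control + n_variant
    expected = n_total * expected_share
    std = math.sqrt(n_total * expected_share * (1 - expected_share))
    z = (n_control - expected) / std
    # Two-sided tail of the standard normal: P(|Z| >= |z|) = erfc(|z| / sqrt(2))
    return math.erfc(abs(z) / math.sqrt(2))


# Group sizes taken from the result tables in this notebook
p_value = split_balance_p_value(n_control=16879, n_variant=16781)
print(f"p-value for a 50/50 split: {p_value:.2f}")
```

A large p-value here means the observed difference in group sizes is compatible with random 50/50 assignment.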

## Infocard Interactions

Below you have an overview of how users interact with the N26 app after being selected for this experiment.

In [24]:
sixty_day_infocards_query = """
with unions as (
select 
'logins' as event_type,
count(case when not user_in_control_group and has_login = 0 then 1 end) as no_event,
count(case when not user_in_control_group and has_login > 0 then 1 end) as has_event
from sixty_day_df
union all 
select 
'views' as event_type,
count(case when not user_in_control_group and n_views = 0 then 1 end) as no_event,
count(case when not user_in_control_group and n_views > 0 then 1 end) as has_event
from sixty_day_infocards_df
union all 
select 
'dismissed_clicked' as event_type,
count(case when not user_in_control_group and n_dismissed_clicked = 0 then 1 end) as no_event,
count(case when not user_in_control_group and n_dismissed_clicked > 0 then 1 end) as has_event
from sixty_day_infocards_df
union all 
select 
'action_clicked' as event_type,
count(case when not user_in_control_group and n_action_clicked = 0 then 1 end) as no_event,
count(case when not user_in_control_group and n_action_clicked > 0 then 1 end) as has_event
from sixty_day_infocards_df
)
select *,
round(has_event::numeric/ (no_event + has_event), 3) * 100 as perc_has_event
from unions
"""

In [25]:
con.register("sixty_day_infocards_df", sixty_day_infocards_df)
sixty_day_infocards_chart_df = con.execute(sixty_day_infocards_query).fetchdf()
sixty_day_infocards_chart_df

| | event_type | no_event | has_event | perc_has_event |
|---|---|---|---|---|
| 0 | logins | 1138 | 15643 | 93.2 |
| 1 | views | 7805 | 8976 | 53.5 |
| 2 | dismissed_clicked | 13407 | 3374 | 20.1 |
| 3 | action_clicked | 15904 | 877 | 5.2 |


In [26]:
# plot the share of variant users with each event type (logins, views, clicks)
af.column_single_label(
    sixty_day_infocards_chart_df,
    af.teal,
    "event_type:O",
    "perc_has_event:Q",
    300,
    400,
    "-y",
).properties(
    title="% of events out of all users that received the sixty days usage infocard"
)
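
The `perc_has_event` column is the share of variant users with at least one event of each type. A minimal sketch that recomputes it from the counts in the table above follows; the dictionary layout and the helper name `share_with_event` are illustrations, not the notebook's own data structures.

```python
# (no_event, has_event) counts per event type, copied from the table above
interaction_counts = {
    "logins": (1138, 15643),
    "views": (7805, 8976),
    "dismissed_clicked": (13407, 3374),
    "action_clicked": (15904, 877),
}


def share_with_event(no_event, has_event):
    """Percentage of users with at least one event, rounded to one decimal."""
    return round(100 * has_event / (no_event + has_event), 1)


shares = {
    event_type: share_with_event(no_event, has_event)
    for event_type, (no_event, has_event) in interaction_counts.items()
}
for event_type, share in shares.items():
    print(f"{event_type}: {share}%")
```

Every row uses the same denominator (all 16,781 variant users), so the percentages describe reach among everyone who was assigned the infocard rather than conversion between funnel steps.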

## Top-up Rates

In [28]:
# Helper to read one aggregate from the single-row test dataframe (the Z-test itself is defined below)
def sixty_day_user_data(col):
    return sixty_day_test_df.iloc[0][col].astype(float)


sixty_day_test_query = """
select 
--All Sixty Day Users
sum(case when sd.user_in_control_group then 1 end) as sixty_day_control_users,
sum(case when not sd.user_in_control_group then 1 end) as sixty_day_variant_users,
sum(case when sd.user_in_control_group and has_top_up = 1 then 1 end) as sixty_day_control_users_with_top_up,
sum(case when not sd.user_in_control_group and has_top_up = 1 then 1 end) as sixty_day_variant_users_with_top_up,
sum(case when sd.user_in_control_group and has_arrears = 1 then 1 end) as sixty_day_control_users_with_arrears,
sum(case when not sd.user_in_control_group and has_arrears = 1 then 1 end) as sixty_day_variant_users_with_arrears,
-- Sixty Day Users With Login
sum(case when sd.user_in_control_group and has_login = 1 then 1 end) as sixty_day_control_users_with_login,
sum(case when not sd.user_in_control_group and has_login = 1 then 1 end) as sixty_day_variant_users_with_login,
sum(case when sd.user_in_control_group and has_login = 1 and has_top_up = 1 then 1 end) as sixty_day_control_users_with_top_up_and_login,
sum(case when not sd.user_in_control_group and has_login = 1 and has_top_up = 1 then 1 end) as sixty_day_variant_users_with_top_up_and_login,
sum(case when sd.user_in_control_group and has_login = 1 and has_arrears = 1 then 1 end) as sixty_day_control_users_with_arrears_and_login,
sum(case when not sd.user_in_control_group and has_login = 1 and has_arrears = 1 then 1 end) as sixty_day_variant_users_with_arrears_and_login,
-- Sixty Day Users With Infocard Views
sum(case when not sd.user_in_control_group and n_views >0 then 1 end) as sixty_day_variant_users_with_view,
sum(case when not sd.user_in_control_group and n_views >0 and has_top_up = 1 then 1 end) as sixty_day_variant_users_with_top_up_and_view,
sum(case when not sd.user_in_control_group and n_views >0 and has_arrears = 1 then 1 end) as sixty_day_variant_users_with_arrears_and_view, 
-- Sixty Day Users With Infocard Clicks
sum(case when not sd.user_in_control_group and n_action_clicked >0 then 1 end) as sixty_day_variant_users_with_click,
sum(case when not sd.user_in_control_group and n_action_clicked >0 and has_top_up = 1 then 1 end) as sixty_day_variant_users_with_top_up_and_click,
sum(case when not sd.user_in_control_group and n_action_clicked >0 and has_arrears = 1 then 1 end) as sixty_day_variant_users_with_arrears_and_click
from sixty_day_df sd
left join sixty_day_infocards_df using (user_id)
order by 2, 1
"""
sixty_day_test_df = con.execute(sixty_day_test_query).fetchdf()


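def z_test(variant_success, variant_obs, control_success, control_obs, alternative):
    """Pooled two-proportion Z-test; returns the p-value rounded to 2 decimals.

    `alternative` describes the variant rate relative to control:
    "two-sided", "larger" or "smaller".
    """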
    p_variant = variant_success / variant_obs
    p_control = control_success / control_obs

    diff = p_variant - p_control

    p_pooled = (variant_success + control_success) * (
        (variant_obs + control_obs) ** (-1)
    )

    n_obs_fact = 1 / variant_obs + 1 / control_obs

    var = (p_pooled * (1 - p_pooled)) * n_obs_fact
    std_diff = var**0.5

    z = diff / std_diff

    if alternative == "two-sided":
        pvalue = stats.norm.sf(abs(z)) * 2
    elif alternative == "larger":
        pvalue = stats.norm.sf(z)
    elif alternative == "smaller":
        pvalue = stats.norm.cdf(z)
    else:
        raise ValueError(
            "Invalid alternative: alternative must be one of two-sided, larger or smaller"
        )

    return round(pvalue, 2)

In [29]:
sixty_day_top_up_query = """ 
with totals as (
select 
case when pu.user_in_control_group then 'control' else 'variant' end as user_split,
case when has_top_up = 1 then 'Has Top-up' else 'No Top-up' end as top_up_type,
count(*) as n_users,
count(case when has_login = 1 then 1 end) as login_users, 
count(case when n_views>0 then 1 end) as infocard_view_users,
count(case when n_action_clicked>0 then 1 end) as infocard_clicked_users
from sixty_day_df pu
left join sixty_day_infocards_df using (user_id)
group by 1, 2
)
select *, 
round(n_users::numeric/ (sum(n_users) over (partition by user_split))::numeric, 3)*100 as perc_all_users,
round(login_users::numeric/ (sum(login_users) over (partition by user_split))::numeric, 3)*100 as perc_login_users,
round(infocard_view_users::numeric/ (sum(infocard_view_users) over (partition by user_split))::numeric, 3)*100 as perc_infocard_view_users,
round(infocard_clicked_users::numeric/ (sum(infocard_clicked_users) over (partition by user_split))::numeric, 3)*100 as perc_infocard_click_users
from totals
order by 2, 1
"""

In [30]:
sixty_day_top_up_df = con.execute(sixty_day_top_up_query).fetchdf()
sixty_day_top_up_df

| | user_split | top_up_type | n_users | login_users | infocard_view_users | infocard_clicked_users | perc_all_users | perc_login_users | perc_infocard_view_users | perc_infocard_click_users |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | control | Has Top-up | 11450 | 11137 | 0 | 0 | 67.8 | 70.7 | | |
| 1 | variant | Has Top-up | 11551 | 11237 | 6460 | 669 | 68.8 | 71.8 | 72.0 | 76.3 |
| 2 | control | No Top-up | 5429 | 4612 | 0 | 0 | 32.2 | 29.3 | | |
| 3 | variant | No Top-up | 5230 | 4406 | 2516 | 208 | 31.2 | 28.2 | 28.0 | 23.7 |


In [31]:
# build chart
all_chart = (
    alt.Chart(sixty_day_top_up_df[sixty_day_top_up_df["top_up_type"] == "Has Top-up"])
    .mark_rect(size=80)
    .encode(
        x=alt.X("user_split:N", title=None),
        y=alt.Y(
            "perc_all_users:Q", title="% of users", scale=alt.Scale(domain=(0, 100))
        ),
        color=alt.Color(
            "user_split:N",
            scale=color_scale_1,
        ),
    )
    .properties(width=200, height=400, title="All users with top_up")
)

all_text = all_chart.mark_text(
    align="center",
    baseline="middle",
    # Nudges text up so it sits above the bar
    dy=-10,
).encode(text="perc_all_users:Q")

login_chart = (
    alt.Chart(sixty_day_top_up_df[sixty_day_top_up_df["top_up_type"] == "Has Top-up"])
    .mark_rect(size=80)
    .encode(
        x=alt.X("user_split:N", title=None),
        y=alt.Y(
            "perc_login_users:Q", title="% of users", scale=alt.Scale(domain=(0, 100))
        ),
        color=alt.Color(
            "user_split:N",
            scale=color_scale_1,
        ),
    )
    .properties(width=200, height=400, title="Login users with top_up")
)

login_text = login_chart.mark_text(
    align="center",
    baseline="middle",
    # Nudges text up so it sits above the bar
    dy=-10,
).encode(text="perc_login_users:Q")

infocard_view_chart = (
    alt.Chart(sixty_day_top_up_df[sixty_day_top_up_df["top_up_type"] == "Has Top-up"])
    .mark_rect(size=80)
    .encode(
        x=alt.X("user_split:N", title=None),
        y=alt.Y(
            "perc_infocard_view_users:Q",
            title="% of users",
            scale=alt.Scale(domain=(0, 100)),
        ),
        color=alt.Color(
            "user_split:N",
            scale=color_scale_1,
        ),
    )
    .properties(width=100, height=400, title="Infocard view users with top_up")
)

infocard_view_text = infocard_view_chart.mark_text(
    align="center",
    baseline="middle",
    # Nudges text up so it sits above the bar
    dy=-10,
).encode(text="perc_infocard_view_users:Q")


infocard_clicked_chart = (
    alt.Chart(sixty_day_top_up_df[sixty_day_top_up_df["top_up_type"] == "Has Top-up"])
    .mark_rect(size=80)
    .encode(
        x=alt.X("user_split:N", title=None),
        y=alt.Y(
            "perc_infocard_click_users:Q",
            title="% of users",
            scale=alt.Scale(domain=(0, 100)),
        ),
        color=alt.Color(
            "user_split:N",
            scale=color_scale_1,
        ),
    )
    .properties(width=100, height=400, title="Infocard clicked users with top_up")
)

infocard_clicked_text = infocard_clicked_chart.mark_text(
    align="center",
    baseline="middle",
    # Nudges text up so it sits above the bar
    dy=-10,
).encode(text="perc_infocard_click_users:Q")

all_chart + all_text | login_chart + login_text | infocard_view_chart + infocard_view_text | infocard_clicked_chart + infocard_clicked_text

#### Top-up Z-Test results for the Sixty Day Reminder Infocard

In [88]:
pd.set_option("display.max_colwidth", None)

In [89]:
top_up_sixty_all_larger = z_test(
    sixty_day_user_data("sixty_day_variant_users_with_top_up"),
    sixty_day_user_data("sixty_day_variant_users"),
    sixty_day_user_data("sixty_day_control_users_with_top_up"),
    sixty_day_user_data("sixty_day_control_users"),
    "larger",
)
top_up_sixty_login_larger = z_test(
    sixty_day_user_data("sixty_day_variant_users_with_top_up_and_login"),
    sixty_day_user_data("sixty_day_variant_users_with_login"),
    sixty_day_user_data("sixty_day_control_users_with_top_up_and_login"),
    sixty_day_user_data("sixty_day_control_users_with_login"),
    "larger",
)
top_up_sixty_view_larger = z_test(
    sixty_day_user_data("sixty_day_variant_users_with_top_up_and_view"),
    sixty_day_user_data("sixty_day_variant_users_with_view"),
    sixty_day_user_data("sixty_day_control_users_with_top_up_and_login"),
    sixty_day_user_data("sixty_day_control_users_with_login"),
    "larger",
)
top_up_sixty_click_larger = z_test(
    sixty_day_user_data("sixty_day_variant_users_with_top_up_and_click"),
    sixty_day_user_data("sixty_day_variant_users_with_click"),
    sixty_day_user_data("sixty_day_control_users_with_top_up_and_login"),
    sixty_day_user_data("sixty_day_control_users_with_login"),
    "larger",
)

p_value_top_up_sixty_days_df = pd.DataFrame(
    {
        "test": [
            "All users with top-up",
            "Users that logged in",
            "Viewed infocard (vs. login in control)",
            "Clicked infocard (vs. login in control)",
        ],
        "larger_p_value": [
            top_up_sixty_all_larger,
            top_up_sixty_login_larger,
            top_up_sixty_view_larger,
            top_up_sixty_click_larger,
        ],
    }
)

p_value_top_up_sixty_days_df["smaller_p_value"] = (
    1 - p_value_top_up_sixty_days_df["larger_p_value"]
)
p_value_top_up_sixty_days_df["result"] = np.where(
    p_value_top_up_sixty_days_df["larger_p_value"] < 0.05,
    "Variant top-up rates are significantly higher than the control (reject H0)",
    "Samples have similar top-up rates (fail to reject H0)",
)

p_value_top_up_sixty_days_df

| | test | larger_p_value | smaller_p_value | result |
|---|---|---|---|---|
| 0 | All users with top-up | 0.02 | 0.98 | Variant top-up rates are significantly higher than the control (reject H0) |
| 1 | Users that logged in | 0.01 | 0.99 | Variant top-up rates are significantly higher than the control (reject H0) |
| 2 | Viewed infocard (vs. login in control) | 0.02 | 0.98 | Variant top-up rates are significantly higher than the control (reject H0) |
| 3 | Clicked infocard (vs. login in control) | 0.0 | 1.0 | Variant top-up rates are significantly higher than the control (reject H0) |


**Conclusion:** Top-up rates are significantly higher for the variant group in all tests of the sixty day experiment. Across all users, receiving the infocard is associated with a 1 percentage point increase in top-up rates (from 67.8% in control to 68.8% in variant).
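
As a sanity check on the "all users" comparison, the test can be reproduced from the counts in the top-up table: 11,551 of 16,781 variant users and 11,450 of 16,879 control users had a top-up. The sketch below re-implements the pooled one-sided test with the standard library only. `one_sided_p_value` is a hypothetical helper name, not the notebook's `z_test`, and it skips the rounding that `z_test` applies.

```python
import math


def one_sided_p_value(variant_success, variant_obs, control_success, control_obs):
    """P-value for H1: variant rate > control rate (pooled two-proportion z-test)."""
    p_variant = variant_success / variant_obs
    p_control = control_success / control_obs
    p_pooled = (variant_success + control_success) / (variant_obs + control_obs)
    std_diff = math.sqrt(
        p_pooled * (1 - p_pooled) * (1 / variant_obs + 1 / control_obs)
    )
    z = (p_variant - p_control) / std_diff
    # Upper tail of the standard normal: P(Z >= z) = 0.5 * erfc(z / sqrt(2))
    return 0.5 * math.erfc(z / math.sqrt(2))


# Counts from the "Has Top-up" and "No Top-up" rows of the table above
lift_pp = 100 * (11551 / 16781 - 11450 / 16879)
p_value = one_sided_p_value(11551, 16781, 11450, 16879)
print(f"lift: {lift_pp:.1f} pp, one-sided p-value: {p_value:.3f}")
```

The unrounded p-value should land close to the 0.02 reported in the first row, and the lift close to the 1 percentage point quoted in the conclusion.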

## Going into arrears

In [33]:
sixty_day_arrears_query = """
with totals as (
select 
case when pu.user_in_control_group then 'control' else 'variant' end as user_split,
case when has_arrears = 1 then 'Has Arrears' else 'No Arrears' end as arrears_type,
count(*) as n_users,
count(case when has_login = 1 then 1 end) as login_users, 
count(case when n_views>0 then 1 end) as infocard_view_users,
count(case when n_action_clicked>0 then 1 end) as infocard_clicked_users
from sixty_day_df pu
left join sixty_day_infocards_df using (user_id)
group by 1, 2
)
select *, 
round(n_users::numeric/ (sum(n_users) over (partition by user_split))::numeric, 3)*100 as perc_all_users,
round(login_users::numeric/ (sum(login_users) over (partition by user_split))::numeric, 3)*100 as perc_login_users,
round(infocard_view_users::numeric/ (sum(infocard_view_users) over (partition by user_split))::numeric, 3)*100 as perc_infocard_view_users,
round(infocard_clicked_users::numeric/ (sum(infocard_clicked_users) over (partition by user_split))::numeric, 3)*100 as perc_infocard_click_users
from totals
order by 2, 1
"""

In [34]:
sixty_day_arrears_df = con.execute(sixty_day_arrears_query).fetchdf()
sixty_day_arrears_df

| | user_split | arrears_type | n_users | login_users | infocard_view_users | infocard_clicked_users | perc_all_users | perc_login_users | perc_infocard_view_users | perc_infocard_click_users |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | control | Has Arrears | 216 | 208 | 0 | 0 | 1.3 | 1.3 | | |
| 1 | variant | Has Arrears | 248 | 245 | 169 | 13 | 1.5 | 1.6 | 1.9 | 1.5 |
| 2 | control | No Arrears | 16663 | 15541 | 0 | 0 | 98.7 | 98.7 | | |
| 3 | variant | No Arrears | 16533 | 15398 | 8807 | 864 | 98.5 | 98.4 | 98.1 | 98.5 |


In [35]:
# build chart
all_chart = (
    alt.Chart(
        sixty_day_arrears_df[sixty_day_arrears_df["arrears_type"] == "Has Arrears"]
    )
    .mark_rect(size=80)
    .encode(
        x=alt.X("user_split:N", title=None),
        y=alt.Y("perc_all_users:Q", title="% of users", scale=alt.Scale(domain=(0, 3))),
        color=alt.Color(
            "user_split:N",
            scale=color_scale_1,
        ),
    )
    .properties(width=200, height=400, title="All users with arrears")
)

all_text = all_chart.mark_text(
    align="center",
    baseline="middle",
    # Nudges text up so it sits above the bar
    dy=-10,
).encode(text="perc_all_users:Q")

login_chart = (
    alt.Chart(
        sixty_day_arrears_df[sixty_day_arrears_df["arrears_type"] == "Has Arrears"]
    )
    .mark_rect(size=80)
    .encode(
        x=alt.X("user_split:N", title=None),
        y=alt.Y(
            "perc_login_users:Q", title="% of users", scale=alt.Scale(domain=(0, 3))
        ),
        color=alt.Color(
            "user_split:N",
            scale=color_scale_1,
        ),
    )
    .properties(width=200, height=400, title="Login users with arrears")
)

login_text = login_chart.mark_text(
    align="center",
    baseline="middle",
    # Nudges text up so it sits above the bar
    dy=-10,
).encode(text="perc_login_users:Q")

infocard_view_chart = (
    alt.Chart(
        sixty_day_arrears_df[sixty_day_arrears_df["arrears_type"] == "Has Arrears"]
    )
    .mark_rect(size=80)
    .encode(
        x=alt.X("user_split:N", title=None),
        y=alt.Y(
            "perc_infocard_view_users:Q",
            title="% of users",
            scale=alt.Scale(domain=(0, 3)),
        ),
        color=alt.Color(
            "user_split:N",
            scale=color_scale_1,
        ),
    )
    .properties(width=100, height=400, title="Infocard view users with arrears")
)

infocard_view_text = infocard_view_chart.mark_text(
    align="center",
    baseline="middle",
    # Nudges text up so it sits above the bar
    dy=-10,
).encode(text="perc_infocard_view_users:Q")


infocard_clicked_chart = (
    alt.Chart(
        sixty_day_arrears_df[sixty_day_arrears_df["arrears_type"] == "Has Arrears"]
    )
    .mark_rect(size=80)
    .encode(
        x=alt.X("user_split:N", title=None),
        y=alt.Y(
            "perc_infocard_click_users:Q",
            title="% of users",
            scale=alt.Scale(domain=(0, 3)),
        ),
        color=alt.Color(
            "user_split:N",
            scale=color_scale_1,
        ),
    )
    .properties(width=100, height=400, title="Infocard clicked users with arrears")
)

infocard_clicked_text = infocard_clicked_chart.mark_text(
    align="center",
    baseline="middle",
    # Nudges text up so it sits above the bar
    dy=-10,
).encode(text="perc_infocard_click_users:Q")

all_chart + all_text | login_chart + login_text | infocard_view_chart + infocard_view_text | infocard_clicked_chart + infocard_clicked_text

#### Arrears Z-Test results for the Sixty Day Reminder Infocard

In [106]:
arrears_sixty_all_larger = z_test(
    sixty_day_user_data("sixty_day_variant_users_with_arrears"),
    sixty_day_user_data("sixty_day_variant_users"),
    sixty_day_user_data("sixty_day_control_users_with_arrears"),
    sixty_day_user_data("sixty_day_control_users"),
    "larger",
)
arrears_sixty_login_larger = z_test(
    sixty_day_user_data("sixty_day_variant_users_with_arrears_and_login"),
    sixty_day_user_data("sixty_day_variant_users_with_login"),
    sixty_day_user_data("sixty_day_control_users_with_arrears_and_login"),
    sixty_day_user_data("sixty_day_control_users_with_login"),
    "larger",
)
arrears_sixty_view_larger = z_test(
    sixty_day_user_data("sixty_day_variant_users_with_arrears_and_view"),
    sixty_day_user_data("sixty_day_variant_users_with_view"),
    sixty_day_user_data("sixty_day_control_users_with_arrears_and_login"),
    sixty_day_user_data("sixty_day_control_users_with_login"),
    "larger",
)
arrears_sixty_click_larger = z_test(
    sixty_day_user_data("sixty_day_variant_users_with_arrears_and_click"),
    sixty_day_user_data("sixty_day_variant_users_with_click"),
    sixty_day_user_data("sixty_day_control_users_with_arrears_and_login"),
    sixty_day_user_data("sixty_day_control_users_with_login"),
    "larger",
)

p_value_arrears_sixty_days_df = pd.DataFrame(
    {
        "test": [
            "All users with arrears",
            "Users that logged in",
            "Viewed infocard (vs. login in control)",
            "Clicked infocard (vs. login in control)",
        ],
        "larger_p_value": [
            arrears_sixty_all_larger,
            arrears_sixty_login_larger,
            arrears_sixty_view_larger,
            arrears_sixty_click_larger,
        ],
    }
)

p_value_arrears_sixty_days_df["smaller_p_value"] = (
    1 - p_value_arrears_sixty_days_df["larger_p_value"]
)
p_value_arrears_sixty_days_df["result"] = np.where(
    p_value_arrears_sixty_days_df["smaller_p_value"] <= 0.05,
    "Variant arrears rates are significantly lower than the control (reject H0)",
    np.where(
        p_value_arrears_sixty_days_df["smaller_p_value"] >= 0.95,
        "Variant arrears rates are significantly higher than the control (reject H0 but undesirable effect)",
        "Samples have similar arrears rates (fail to reject H0)",
    ),
)
p_value_arrears_sixty_days_df

| | test | larger_p_value | smaller_p_value | result |
|---|---|---|---|---|
| 0 | All users with arrears | 0.06 | 0.94 | Samples have similar arrears rates (fail to reject H0) |
| 1 | Users that logged in | 0.03 | 0.97 | Variant arrears rates are significantly higher than the control (reject H0 but undesirable effect) |
| 2 | Viewed infocard (vs. login in control) | 0.0 | 1.0 | Variant arrears rates are significantly higher than the control (reject H0 but undesirable effect) |
| 3 | Clicked infocard (vs. login in control) | 0.34 | 0.66 | Samples have similar arrears rates (fail to reject H0) |


**Conclusion:** The variant group in the sixty day experiment does not show significantly lower arrears rates. In some cases the variant arrears rates are even significantly higher, namely for logged-in users and for variant users who viewed the infocard (vs. login in control).
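
The way a one-sided p-value is turned into one of the three result labels can be sketched in plain Python. This is an illustrative restatement of the `np.where` logic above, not the notebook's own code; `classify_arrears` and `alpha` are hypothetical names. Since `z_test` rounds to two decimals, `1 - larger_p_value` only approximates the exact "smaller" p-value.

```python
def classify_arrears(larger_p_value, alpha=0.05):
    """Label an arrears test from the p-value of H1: variant rate > control rate."""
    # Approximate p-value of the opposite alternative, H1: variant rate < control rate.
    smaller_p_value = 1 - larger_p_value
    if smaller_p_value <= alpha:
        return "lower"  # desirable: fewer variant users go into arrears
    if larger_p_value <= alpha:
        return "higher"  # undesirable: more variant users go into arrears
    return "similar"  # fail to reject H0


# Rounded "larger" p-values reported in the table above, keyed by comparison.
reported = {"all": 0.06, "login": 0.03, "view": 0.0, "click": 0.34}
labels = {test: classify_arrears(p) for test, p in reported.items()}
for test, label in labels.items():
    print(f"{test}: {label}")
```

The second branch checks `larger_p_value <= alpha` directly, which is the same condition as `smaller_p_value >= 0.95` in the cell above but avoids a floating-point comparison at the boundary.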

<a id='section2'></a>
# Fifty Percent usage infocard test results

Each of the two infocards tested in this experiment had its own even split between variant and control. When the control users of both infocards are grouped together, the control group therefore looks about twice as large as each variant, but per infocard the split is once again even (about 16.8K users per group).
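
As a quick sanity check of this split, the group sizes can be recomputed from the counts reported in the top-up table further down in this section. The snippet below is only a sketch that hard-codes those reported numbers; the variable names are illustrative.

```python
# (users with a top-up, users without a top-up) per group,
# copied from the top-up table reported later in this section.
counts = {
    "control": (22579, 11204),
    "reminder_percent": (11222, 5579),
    "reminder_use": (11259, 5566),
}

group_sizes = {group: sum(pair) for group, pair in counts.items()}

# The pooled control holds the control users of both infocards,
# so half of it is the control size that belongs to a single infocard.
control_per_infocard = group_sizes["control"] / 2

print(group_sizes)
print(control_per_infocard)
```

Each variant and half of the pooled control should come out at roughly 16.8K to 16.9K users, which is what "evenly split" means here.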

In [37]:
fifty_perc_query = """
select * from experiment_df
where experiment_name = 'fifty-percent-usage-exceeded'
"""

fifty_perc_sample_query = """
select 
case when user_in_control_group then 'control' else replace(experiment_outcome, 'infocard_overdraft_limit_', '') end as user_split,
count(*) as n_users
from fifty_perc_df
group by 1
"""

In [38]:
fifty_perc_df = con.execute(fifty_perc_query).fetchdf()

con.register("fifty_perc_df", fifty_perc_df)
fifty_perc_sample_df = con.execute(fifty_perc_sample_query).fetchdf()

In [39]:
# build chart
chart = (
    alt.Chart(fifty_perc_sample_df)
    .mark_bar()
    .encode(
        x=alt.X("n_users:Q", title="Customers"),
        y=alt.Y("user_split:N", title=None, sort="-x"),
        color=alt.Color(
            "user_split:N",
            scale=color_scale_2,
        ),
    )
    .properties(
        width=600,
        height=200,
        title="Customers in the fifty percent usage infocard experiment",
    )
)

text = chart.mark_text(
    align="center",
    baseline="middle",
    # Nudges text to right so it doesn't appear on top of the bar
    dx=20,
).encode(text="n_users:Q")

chart + text

## Infocard Interactions

Below you can see an overview of how users interact with the N26 app after being selected for this experiment, this time with a split between the 2 infocards tested.

In [40]:
fifty_perc_infocards_chart_query = """
with unions as (
select 
'logins' as event_type,
replace(experiment_outcome, 'infocard_overdraft_limit_', '') as experiment_outcome,
count(case when not user_in_control_group and has_login = 0 then 1 end) as no_event,
count(case when not user_in_control_group and has_login > 0 then 1 end) as has_event
from fifty_perc_df
group by 1, 2
union all 
select 
'views' as event_type,
replace(experiment_outcome, 'infocard_overdraft_limit_', '') as experiment_outcome,
count(case when not user_in_control_group and n_views = 0 then 1 end) as no_event,
count(case when not user_in_control_group and n_views > 0 then 1 end) as has_event
from fifty_perc_infocards_df
group by 1, 2
union all 
select 
'dismissed_clicked' as event_type,
replace(experiment_outcome, 'infocard_overdraft_limit_', '') as experiment_outcome,
count(case when not user_in_control_group and n_dismissed_clicked = 0 then 1 end) as no_event,
count(case when not user_in_control_group and n_dismissed_clicked > 0 then 1 end) as has_event
from fifty_perc_infocards_df
group by 1, 2
union all 
select 
'action_clicked' as event_type,
replace(experiment_outcome, 'infocard_overdraft_limit_', '') as experiment_outcome,
count(case when not user_in_control_group and n_action_clicked = 0 then 1 end) as no_event,
count(case when not user_in_control_group and n_action_clicked > 0 then 1 end) as has_event
from fifty_perc_infocards_df
group by 1, 2
)
select *,
round(has_event::numeric/ (no_event + has_event), 3) * 100 as perc_has_event
from unions
where experiment_outcome != 'control'
"""

In [41]:
fifty_perc_infocards_chart_df = con.execute(fifty_perc_infocards_chart_query).fetchdf()
fifty_perc_infocards_chart_df

Unnamed: 0,event_type,experiment_outcome,no_event,has_event,perc_has_event
0,logins,reminder_use,2155,14670,87.2
1,logins,reminder_percent,2124,14677,87.4
2,views,reminder_use,3706,13119,78.0
3,views,reminder_percent,3674,13127,78.1
4,dismissed_clicked,reminder_use,8351,8474,50.4
5,dismissed_clicked,reminder_percent,9287,7514,44.7
6,action_clicked,reminder_use,14402,2423,14.4
7,action_clicked,reminder_percent,14518,2283,13.6
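
The `perc_has_event` column is the share of variant users with at least one event of the given type. A minimal Python restatement of the SQL expression `round(has_event / (no_event + has_event), 3) * 100`, using two `reminder_use` rows from the table above (the function name is illustrative):

```python
def perc_has_event(no_event, has_event):
    """Share of users with at least one event, as a percentage."""
    return round(has_event / (no_event + has_event), 3) * 100


# reminder_use rows from the table above: (no_event, has_event)
logins_use = perc_has_event(2155, 14670)
action_clicks_use = perc_has_event(14402, 2423)
print(f"{logins_use:.1f}% logged in, {action_clicks_use:.1f}% clicked the action")
```

Note that every percentage uses all variant users of that infocard as the denominator, so the action-click rate is relative to everyone assigned to the infocard, not only to those who viewed it.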


In [42]:
percent_chart = af.column_single_label(
    fifty_perc_infocards_chart_df[
        fifty_perc_infocards_chart_df["experiment_outcome"] == "reminder_percent"
    ],
    af.teal,
    "event_type:O",
    "perc_has_event:Q",
    300,
    400,
    "-y",
).properties(
    title="% of events out of all users that received the reminder percent usage infocard"
)
use_chart = af.column_single_label(
    fifty_perc_infocards_chart_df[
        fifty_perc_infocards_chart_df["experiment_outcome"] == "reminder_use"
    ],
    af.petrol,
    "event_type:O",
    "perc_has_event:Q",
    300,
    400,
    "-y",
).properties(
    title="% of events out of all users that received the reminder usage infocard"
)
percent_chart | use_chart

## Top-up Rates

In [43]:
# Helper: read one count from the single-row test dataframe as a float
def fifty_perc_user_data(col):
    return fifty_perc_test_df.iloc[0][col].astype(float)


fifty_perc_test_query = """
select 
--All Fifty Percent Users
sum(case when sd.user_in_control_group then 1 end) as fifty_perc_control_users,
sum(case when sd.experiment_outcome = 'infocard_overdraft_limit_reminder_use' then 1 end) as fifty_perc_variant_use_users,
sum(case when sd.experiment_outcome = 'infocard_overdraft_limit_reminder_percent' then 1 end) as fifty_perc_variant_perc_users,
sum(case when sd.user_in_control_group and has_top_up = 1 then 1 end) as fifty_perc_control_users_with_top_up,
sum(case when sd.experiment_outcome = 'infocard_overdraft_limit_reminder_use' and has_top_up = 1 then 1 end) as fifty_perc_variant_use_users_with_top_up,
sum(case when sd.experiment_outcome = 'infocard_overdraft_limit_reminder_percent' and has_top_up = 1 then 1 end) as fifty_perc_variant_perc_users_with_top_up,
sum(case when sd.user_in_control_group and has_arrears = 1 then 1 end) as fifty_perc_control_users_with_arrears,
sum(case when sd.experiment_outcome = 'infocard_overdraft_limit_reminder_use' and has_arrears = 1 then 1 end) as fifty_perc_variant_use_users_with_arrears,
sum(case when sd.experiment_outcome = 'infocard_overdraft_limit_reminder_percent' and has_arrears = 1 then 1 end) as fifty_perc_variant_perc_users_with_arrears,
-- Fifty Percent Users With Login
sum(case when sd.user_in_control_group and has_login = 1 then 1 end) as fifty_perc_control_users_with_login,
sum(case when sd.experiment_outcome = 'infocard_overdraft_limit_reminder_use' and has_login = 1 then 1 end) as fifty_perc_variant_use_users_with_login,
sum(case when sd.experiment_outcome = 'infocard_overdraft_limit_reminder_percent' and has_login = 1 then 1 end) as fifty_perc_variant_perc_users_with_login,
sum(case when sd.user_in_control_group and has_login = 1 and has_top_up = 1 then 1 end) as fifty_perc_control_users_with_top_up_and_login,
sum(case when sd.experiment_outcome = 'infocard_overdraft_limit_reminder_use' and has_login = 1 and has_top_up = 1 then 1 end) as fifty_perc_variant_use_users_with_top_up_and_login,
sum(case when sd.experiment_outcome = 'infocard_overdraft_limit_reminder_percent' and has_login = 1 and has_top_up = 1 then 1 end) as fifty_perc_variant_perc_users_with_top_up_and_login,
sum(case when sd.user_in_control_group and has_login = 1 and has_arrears = 1 then 1 end) as fifty_perc_control_users_with_arrears_and_login,
sum(case when sd.experiment_outcome = 'infocard_overdraft_limit_reminder_use' and has_login = 1 and has_arrears = 1 then 1 end) as fifty_perc_variant_use_users_with_arrears_and_login,
sum(case when sd.experiment_outcome = 'infocard_overdraft_limit_reminder_percent' and has_login = 1 and has_arrears = 1 then 1 end) as fifty_perc_variant_perc_users_with_arrears_and_login,
-- Fifty Percent Users With Infocard Views
sum(case when sd.experiment_outcome = 'infocard_overdraft_limit_reminder_use' and n_views >0 then 1 end) as fifty_perc_variant_use_users_with_view,
sum(case when sd.experiment_outcome = 'infocard_overdraft_limit_reminder_percent' and n_views >0 then 1 end) as fifty_perc_variant_perc_users_with_view,
sum(case when sd.experiment_outcome = 'infocard_overdraft_limit_reminder_use' and n_views >0 and has_top_up = 1 then 1 end) as fifty_perc_variant_use_users_with_top_up_and_view,
sum(case when sd.experiment_outcome = 'infocard_overdraft_limit_reminder_percent' and n_views >0 and has_top_up = 1 then 1 end) as fifty_perc_variant_perc_users_with_top_up_and_view,
sum(case when sd.experiment_outcome = 'infocard_overdraft_limit_reminder_use' and n_views >0 and has_arrears = 1 then 1 end) as fifty_perc_variant_use_users_with_arrears_and_view,
sum(case when sd.experiment_outcome = 'infocard_overdraft_limit_reminder_percent' and n_views >0 and has_arrears = 1 then 1 end) as fifty_perc_variant_perc_users_with_arrears_and_view,
-- Fifty Percent Users With Infocard Clicks
sum(case when sd.experiment_outcome = 'infocard_overdraft_limit_reminder_use' and n_action_clicked >0 then 1 end) as fifty_perc_variant_use_users_with_click,
sum(case when sd.experiment_outcome = 'infocard_overdraft_limit_reminder_percent' and n_action_clicked >0 then 1 end) as fifty_perc_variant_perc_users_with_click,
sum(case when sd.experiment_outcome = 'infocard_overdraft_limit_reminder_use' and n_action_clicked >0 and has_top_up = 1 then 1 end) as fifty_perc_variant_use_users_with_top_up_and_click,
sum(case when sd.experiment_outcome = 'infocard_overdraft_limit_reminder_percent' and n_action_clicked >0 and has_top_up = 1 then 1 end) as fifty_perc_variant_perc_users_with_top_up_and_click,
sum(case when sd.experiment_outcome = 'infocard_overdraft_limit_reminder_use' and n_action_clicked >0 and has_arrears = 1 then 1 end) as fifty_perc_variant_use_users_with_arrears_and_click,
sum(case when sd.experiment_outcome = 'infocard_overdraft_limit_reminder_percent' and n_action_clicked >0 and has_arrears = 1 then 1 end) as fifty_perc_variant_perc_users_with_arrears_and_click
from fifty_perc_df sd
left join fifty_perc_infocards_df using (user_id)
order by 2, 1
"""
fifty_perc_test_df = con.execute(fifty_perc_test_query).fetchdf()


def z_test(variant_success, variant_obs, control_success, control_obs, alternative):
    """Pooled two-proportion z-test; returns the p-value rounded to 2 decimals.

    alternative: "two-sided", "larger" (H1: variant rate > control rate)
    or "smaller" (H1: variant rate < control rate).
    """
    p_variant = variant_success / variant_obs
    p_control = control_success / control_obs

    diff = p_variant - p_control

    # Pooled success rate under H0 (both groups share the same rate)
    p_pooled = (variant_success + control_success) / (variant_obs + control_obs)

    n_obs_fact = 1 / variant_obs + 1 / control_obs

    # Standard error of the difference between the two proportions
    var = (p_pooled * (1 - p_pooled)) * n_obs_fact
    std_diff = var**0.5

    z = diff / std_diff

    if alternative == "two-sided":
        pvalue = stats.norm.sf(abs(z)) * 2
    elif alternative == "larger":
        pvalue = stats.norm.sf(z)
    elif alternative == "smaller":
        pvalue = stats.norm.cdf(z)
    else:
        raise ValueError(
            "Invalid alternative: alternative must be one of two-sided, larger or smaller"
        )

    return round(pvalue, 2)
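
To make the test above easier to check by hand, here is a dependency-free sketch of the same pooled two-proportion z-test for the "larger" alternative. It is not the notebook's implementation: `one_sided_p_value` is a hypothetical name, and the normal survival function is written with `math.erfc` instead of `scipy.stats`. The counts are the ones reported in the top-up table of this section for all users in the Percent Reminder variant and in the control.

```python
import math


def one_sided_p_value(variant_success, variant_obs, control_success, control_obs):
    """P-value for H1: variant rate > control rate (pooled two-proportion z-test)."""
    p_variant = variant_success / variant_obs
    p_control = control_success / control_obs
    # Pooled rate under H0, where both groups share one success probability.
    p_pooled = (variant_success + control_success) / (variant_obs + control_obs)
    std_diff = math.sqrt(
        p_pooled * (1 - p_pooled) * (1 / variant_obs + 1 / control_obs)
    )
    z = (p_variant - p_control) / std_diff
    # Survival function of the standard normal: P(Z > z) = 0.5 * erfc(z / sqrt(2)).
    return 0.5 * math.erfc(z / math.sqrt(2))


# reminder_percent: 11222 users with a top-up out of 16801;
# control: 22579 users with a top-up out of 33783.
p_larger = one_sided_p_value(11222, 16801, 22579, 33783)
print(round(p_larger, 2))
```

With these counts the two top-up rates differ by less than 0.1 percentage points, so the p-value lands close to 0.5, in line with the first row of the Percent Reminder top-up results.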

In [44]:
fifty_perc_top_up_query = """ 
with totals as (
select 
case when pu.user_in_control_group then 'control' else replace(pu.experiment_outcome, 'infocard_overdraft_limit_', '') end as user_split,
case when has_top_up = 1 then 'Has Top-up' else 'No Top-up' end as top_up_type,
count(*) as n_users,
count(case when has_login = 1 then 1 end) as login_users, 
count(case when n_views>0 then 1 end) as infocard_view_users,
count(case when n_action_clicked>0 then 1 end) as infocard_clicked_users
from fifty_perc_df pu
left join fifty_perc_infocards_df using (user_id)
group by 1, 2
)
select *, 
round(n_users::numeric/ (sum(n_users) over (partition by user_split))::numeric, 3)*100 as perc_all_users,
round(login_users::numeric/ (sum(login_users) over (partition by user_split))::numeric, 3)*100 as perc_login_users,
round(infocard_view_users::numeric/ (sum(infocard_view_users) over (partition by user_split))::numeric, 3)*100 as perc_infocard_view_users,
round(infocard_clicked_users::numeric/ (sum(infocard_clicked_users) over (partition by user_split))::numeric, 3)*100 as perc_infocard_click_users
from totals
order by 2, 1
"""

In [45]:
fifty_perc_top_up_df = con.execute(fifty_perc_top_up_query).fetchdf()
fifty_perc_top_up_df

Unnamed: 0,user_split,top_up_type,n_users,login_users,infocard_view_users,infocard_clicked_users,perc_all_users,perc_login_users,perc_infocard_view_users,perc_infocard_click_users
0,control,Has Top-up,22579,21666,0,0,66.8,73.4,,
1,reminder_percent,Has Top-up,11222,10783,9455,1744,66.8,73.5,72.0,76.4
2,reminder_use,Has Top-up,11259,10788,9458,1866,66.9,73.5,72.1,77.0
3,control,No Top-up,11204,7850,0,0,33.2,26.6,,
4,reminder_percent,No Top-up,5579,3894,3672,539,33.2,26.5,28.0,23.6
5,reminder_use,No Top-up,5566,3882,3661,557,33.1,26.5,27.9,23.0


In [46]:
# build chart
all_chart = (
    alt.Chart(fifty_perc_top_up_df[fifty_perc_top_up_df["top_up_type"] == "Has Top-up"])
    .mark_rect(size=60)
    .encode(
        x=alt.X("user_split:N", title=None),
        y=alt.Y(
            "perc_all_users:Q", title="% of users", scale=alt.Scale(domain=(0, 100))
        ),
        color=alt.Color(
            "user_split:N",
            scale=color_scale_2,
        ),
    )
    .properties(width=200, height=400, title="All users with top_up")
)

all_text = all_chart.mark_text(
    align="center",
    baseline="middle",
    # Nudges text upward so it doesn't overlap the top of the bar
    dy=-10,
).encode(text="perc_all_users:Q")

login_chart = (
    alt.Chart(fifty_perc_top_up_df[fifty_perc_top_up_df["top_up_type"] == "Has Top-up"])
    .mark_rect(size=60)
    .encode(
        x=alt.X("user_split:N", title=None),
        y=alt.Y(
            "perc_login_users:Q", title="% of users", scale=alt.Scale(domain=(0, 100))
        ),
        color=alt.Color(
            "user_split:N",
            scale=color_scale_2,
        ),
    )
    .properties(width=200, height=400, title="Login users with top_up")
)

login_text = login_chart.mark_text(
    align="center",
    baseline="middle",
    # Nudges text upward so it doesn't overlap the top of the bar
    dy=-10,
).encode(text="perc_login_users:Q")

infocard_view_chart = (
    alt.Chart(fifty_perc_top_up_df[fifty_perc_top_up_df["top_up_type"] == "Has Top-up"])
    .mark_rect(size=50)
    .encode(
        x=alt.X("user_split:N", title=None),
        y=alt.Y(
            "perc_infocard_view_users:Q",
            title="% of users",
            scale=alt.Scale(domain=(0, 100)),
        ),
        color=alt.Color(
            "user_split:N",
            scale=color_scale_2,
        ),
    )
    .properties(width=120, height=400, title="Infocard view users with top_up")
)

infocard_view_text = infocard_view_chart.mark_text(
    align="center",
    baseline="middle",
    # Nudges text upward so it doesn't overlap the top of the bar
    dy=-10,
).encode(text="perc_infocard_view_users:Q")


infocard_clicked_chart = (
    alt.Chart(fifty_perc_top_up_df[fifty_perc_top_up_df["top_up_type"] == "Has Top-up"])
    .mark_rect(size=50)
    .encode(
        x=alt.X("user_split:N", title=None),
        y=alt.Y(
            "perc_infocard_click_users:Q",
            title="% of users",
            scale=alt.Scale(domain=(0, 100)),
        ),
        color=alt.Color(
            "user_split:N",
            scale=color_scale_2,
        ),
    )
    .properties(width=120, height=400, title="Infocard clicked users with top_up")
)

infocard_clicked_text = infocard_clicked_chart.mark_text(
    align="center",
    baseline="middle",
    # Nudges text upward so it doesn't overlap the top of the bar
    dy=-10,
).encode(text="perc_infocard_click_users:Q")

all_chart + all_text | login_chart + login_text | infocard_view_chart + infocard_view_text | infocard_clicked_chart + infocard_clicked_text

#### Top-up Z-Test results for the Percent Reminder Infocard

In [105]:
# Reminder percent
top_up_fifty_perc_all_larger = z_test(
    fifty_perc_user_data("fifty_perc_variant_perc_users_with_top_up"),
    fifty_perc_user_data("fifty_perc_variant_perc_users"),
    fifty_perc_user_data("fifty_perc_control_users_with_top_up"),
    fifty_perc_user_data("fifty_perc_control_users"),
    "larger",
)
top_up_fifty_perc_login_larger = z_test(
    fifty_perc_user_data("fifty_perc_variant_perc_users_with_top_up_and_login"),
    fifty_perc_user_data("fifty_perc_variant_perc_users_with_login"),
    fifty_perc_user_data("fifty_perc_control_users_with_top_up_and_login"),
    fifty_perc_user_data("fifty_perc_control_users_with_login"),
    "larger",
)
top_up_fifty_perc_view_larger = z_test(
    fifty_perc_user_data("fifty_perc_variant_perc_users_with_top_up_and_view"),
    fifty_perc_user_data("fifty_perc_variant_perc_users_with_view"),
    fifty_perc_user_data("fifty_perc_control_users_with_top_up_and_login"),
    fifty_perc_user_data("fifty_perc_control_users_with_login"),
    "larger",
)
top_up_fifty_perc_click_larger = z_test(
    fifty_perc_user_data("fifty_perc_variant_perc_users_with_top_up_and_click"),
    fifty_perc_user_data("fifty_perc_variant_perc_users_with_click"),
    fifty_perc_user_data("fifty_perc_control_users_with_top_up_and_login"),
    fifty_perc_user_data("fifty_perc_control_users_with_login"),
    "larger",
)

p_value_top_up_fifty_perc_df = pd.DataFrame(
    {
        "test": [
            "All users with top-up",
            "Users that logged in",
            "Viewed infocard (vs. login in control)",
            "Clicked infocard (vs. login in control)",
        ],
        "larger_p_value": [
            top_up_fifty_perc_all_larger,
            top_up_fifty_perc_login_larger,
            top_up_fifty_perc_view_larger,
            top_up_fifty_perc_click_larger,
        ],
    }
)

p_value_top_up_fifty_perc_df["smaller_p_value"] = (
    1 - p_value_top_up_fifty_perc_df["larger_p_value"]
)

p_value_top_up_fifty_perc_df["result"] = np.where(
    p_value_top_up_fifty_perc_df["larger_p_value"] <= 0.05,
    "Variant top-up rates are significantly higher than the control (reject H0)",
    np.where(
        p_value_top_up_fifty_perc_df["larger_p_value"] >= 0.95,
        "Variant top-up rates are significantly lower than the control (reject H0 but undesirable effect)",
        "Samples have similar top-up rates (fail to reject H0)",
    ),
)
p_value_top_up_fifty_perc_df

Unnamed: 0,test,larger_p_value,smaller_p_value,result
0,All users with top-up,0.54,0.46,Samples have similar top-up rates (fail to reject H0)
1,Users that logged in,0.44,0.56,Samples have similar top-up rates (fail to reject H0)
2,Viewed infocard (vs. login in control),1.0,0.0,Variant top-up rates are significantly lower than the control (reject H0 but undesirable effect)
3,Clicked infocard (vs. login in control),0.0,1.0,Variant top-up rates are significantly higher than the control (reject H0)


**Conclusion:** For the Percent Reminder Infocard, top-up rates are significantly higher for variant users who clicked the infocard (vs. login in control). However, we found the opposite effect for variant users who viewed this infocard.
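
To see the size of the effects behind these p-values, the top-up rates can be recomputed from the counts in the top-up table above. This is a small sketch on the reported numbers for the Percent Reminder Infocard; the function and variable names are illustrative. Both variant subgroups are compared against logged-in control users, as in the z-tests.

```python
# (users with a top-up, users without a top-up), from the top-up table above.
control_login = (21666, 7850)
percent_viewed = (9455, 3672)
percent_clicked = (1744, 539)


def top_up_rate(with_top_up, without_top_up):
    """Share of users in a group with at least one top-up."""
    return with_top_up / (with_top_up + without_top_up)


baseline = top_up_rate(*control_login)
# Differences to the control baseline, in percentage points.
diff_viewed_pp = (top_up_rate(*percent_viewed) - baseline) * 100
diff_clicked_pp = (top_up_rate(*percent_clicked) - baseline) * 100

print(f"control (login): {baseline:.1%}")
print(f"viewed:  {diff_viewed_pp:+.1f} pp vs. control")
print(f"clicked: {diff_clicked_pp:+.1f} pp vs. control")
```

The sign of each difference matches the direction reported by the tests: viewers sit below the logged-in control rate, clickers above it.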

#### Top-up Z-Test results for the Usage Reminder Infocard

In [102]:
# Reminder usage
top_up_fifty_usage_all_larger = z_test(
    fifty_perc_user_data("fifty_perc_variant_use_users_with_top_up"),
    fifty_perc_user_data("fifty_perc_variant_use_users"),
    fifty_perc_user_data("fifty_perc_control_users_with_top_up"),
    fifty_perc_user_data("fifty_perc_control_users"),
    "larger",
)
top_up_fifty_usage_login_larger = z_test(
    fifty_perc_user_data("fifty_perc_variant_use_users_with_top_up_and_login"),
    fifty_perc_user_data("fifty_perc_variant_use_users_with_login"),
    fifty_perc_user_data("fifty_perc_control_users_with_top_up_and_login"),
    fifty_perc_user_data("fifty_perc_control_users_with_login"),
    "larger",
)
top_up_fifty_usage_view_larger = z_test(
    fifty_perc_user_data("fifty_perc_variant_use_users_with_top_up_and_view"),
    fifty_perc_user_data("fifty_perc_variant_use_users_with_view"),
    fifty_perc_user_data("fifty_perc_control_users_with_top_up_and_login"),
    fifty_perc_user_data("fifty_perc_control_users_with_login"),
    "larger",
)
top_up_fifty_usage_click_larger = z_test(
    fifty_perc_user_data("fifty_perc_variant_use_users_with_top_up_and_click"),
    fifty_perc_user_data("fifty_perc_variant_use_users_with_click"),
    fifty_perc_user_data("fifty_perc_control_users_with_top_up_and_login"),
    fifty_perc_user_data("fifty_perc_control_users_with_login"),
    "larger",
)

p_value_top_up_fifty_usage_df = pd.DataFrame(
    {
        "test": [
            "All users with top-up",
            "Users that logged in",
            "Viewed infocard (vs. login in control)",
            "Clicked infocard (vs. login in control)",
        ],
        "larger_p_value": [
            top_up_fifty_usage_all_larger,
            top_up_fifty_usage_login_larger,
            top_up_fifty_usage_view_larger,
            top_up_fifty_usage_click_larger,
        ],
    }
)

p_value_top_up_fifty_usage_df["smaller_p_value"] = (
    1 - p_value_top_up_fifty_usage_df["larger_p_value"]
)

p_value_top_up_fifty_usage_df["result"] = np.where(
    p_value_top_up_fifty_usage_df["larger_p_value"] <= 0.05,
    "Variant top-up rates are significantly higher than the control (reject H0)",
    np.where(
        p_value_top_up_fifty_usage_df["larger_p_value"] >= 0.95,
        "Variant top-up rates are significantly lower than the control (reject H0 but undesirable effect)",
        "Samples have similar top-up rates (fail to reject H0)",
    ),
)
p_value_top_up_fifty_usage_df

Unnamed: 0,test,larger_p_value,smaller_p_value,result
0,All users with top-up,0.43,0.57,Samples have similar top-up rates (fail to reject H0)
1,Users that logged in,0.38,0.62,Samples have similar top-up rates (fail to reject H0)
2,Viewed infocard (vs. login in control),1.0,0.0,Variant top-up rates are significantly lower than the control (reject H0 but undesirable effect)
3,Clicked infocard (vs. login in control),0.0,1.0,Variant top-up rates are significantly higher than the control (reject H0)


**Conclusion:** Same result as for the infocard above: for the Usage Reminder Infocard, top-up rates are significantly higher for variant users who clicked the infocard (vs. login in control), and significantly lower for variant users who viewed it.

## Going into arrears

In [49]:
fifty_perc_arrears_query = """
with totals as (
select 
case when pu.user_in_control_group then 'control' else replace(pu.experiment_outcome, 'infocard_overdraft_limit_', '') end as user_split,
case when has_arrears = 1 then 'Has arrears' else 'No arrears' end as arrears_type,
count(*) as n_users,
count(case when has_login = 1 then 1 end) as login_users, 
count(case when n_views>0 then 1 end) as infocard_view_users,
count(case when n_action_clicked>0 then 1 end) as infocard_clicked_users
from fifty_perc_df pu
left join fifty_perc_infocards_df using (user_id)
group by 1, 2
)
select *, 
round(n_users::numeric/ (sum(n_users) over (partition by user_split))::numeric, 3)*100 as perc_all_users,
round(login_users::numeric/ (sum(login_users) over (partition by user_split))::numeric, 3)*100 as perc_login_users,
round(infocard_view_users::numeric/ (sum(infocard_view_users) over (partition by user_split))::numeric, 3)*100 as perc_infocard_view_users,
round(infocard_clicked_users::numeric/ (sum(infocard_clicked_users) over (partition by user_split))::numeric, 3)*100 as perc_infocard_click_users
from totals
order by 2, 1
"""

In [50]:
fifty_perc_arrears_df = con.execute(fifty_perc_arrears_query).fetchdf()

In [51]:
fifty_perc_arrears_df

Unnamed: 0,user_split,arrears_type,n_users,login_users,infocard_view_users,infocard_clicked_users,perc_all_users,perc_login_users,perc_infocard_view_users,perc_infocard_click_users
0,control,Has arrears,1029,883,0,0,3.0,3.0,,
1,reminder_percent,Has arrears,513,462,435,69,3.1,3.1,3.3,3.0
2,reminder_use,Has arrears,545,472,454,82,3.2,3.2,3.5,3.4
3,control,No arrears,32754,28633,0,0,97.0,97.0,,
4,reminder_percent,No arrears,16288,14215,12692,2214,96.9,96.9,96.7,97.0
5,reminder_use,No arrears,16280,14198,12665,2341,96.8,96.8,96.5,96.6


In [52]:
# build chart
all_chart = (
    alt.Chart(
        fifty_perc_arrears_df[fifty_perc_arrears_df["arrears_type"] == "Has arrears"]
    )
    .mark_rect(size=60)
    .encode(
        x=alt.X("user_split:N", title=None),
        y=alt.Y("perc_all_users:Q", title="% of users", scale=alt.Scale(domain=(0, 4))),
        color=alt.Color(
            "user_split:N",
            scale=color_scale_2,
        ),
    )
    .properties(width=200, height=400, title="All users with arrears")
)

all_text = all_chart.mark_text(
    align="center",
    baseline="middle",
    # Nudges text upward so it doesn't overlap the top of the bar
    dy=-10,
).encode(text="perc_all_users:Q")

login_chart = (
    alt.Chart(
        fifty_perc_arrears_df[fifty_perc_arrears_df["arrears_type"] == "Has arrears"]
    )
    .mark_rect(size=60)
    .encode(
        x=alt.X("user_split:N", title=None),
        y=alt.Y(
            "perc_login_users:Q", title="% of users", scale=alt.Scale(domain=(0, 4))
        ),
        color=alt.Color(
            "user_split:N",
            scale=color_scale_2,
        ),
    )
    .properties(width=200, height=400, title="Login users with arrears")
)

login_text = login_chart.mark_text(
    align="center",
    baseline="middle",
    # Nudges text upward so it doesn't overlap the top of the bar
    dy=-10,
).encode(text="perc_login_users:Q")

infocard_view_chart = (
    alt.Chart(
        fifty_perc_arrears_df[fifty_perc_arrears_df["arrears_type"] == "Has arrears"]
    )
    .mark_rect(size=50)
    .encode(
        x=alt.X("user_split:N", title=None),
        y=alt.Y(
            "perc_infocard_view_users:Q",
            title="% of users",
            scale=alt.Scale(domain=(0, 4)),
        ),
        color=alt.Color(
            "user_split:N",
            scale=color_scale_2,
        ),
    )
    .properties(width=120, height=400, title="Infocard view users with arrears")
)

infocard_view_text = infocard_view_chart.mark_text(
    align="center",
    baseline="middle",
    # Nudges text upward so it doesn't overlap the top of the bar
    dy=-10,
).encode(text="perc_infocard_view_users:Q")


infocard_clicked_chart = (
    alt.Chart(
        fifty_perc_arrears_df[fifty_perc_arrears_df["arrears_type"] == "Has arrears"]
    )
    .mark_rect(size=50)
    .encode(
        x=alt.X("user_split:N", title=None),
        y=alt.Y(
            "perc_infocard_click_users:Q",
            title="% of users",
            scale=alt.Scale(domain=(0, 4)),
        ),
        color=alt.Color(
            "user_split:N",
            scale=color_scale_2,
        ),
    )
    .properties(width=120, height=400, title="Infocard clicked users with arrears")
)

infocard_clicked_text = infocard_clicked_chart.mark_text(
    align="center",
    baseline="middle",
    # Nudges text upward so it doesn't overlap the top of the bar
    dy=-10,
).encode(text="perc_infocard_click_users:Q")

all_chart + all_text | login_chart + login_text | infocard_view_chart + infocard_view_text | infocard_clicked_chart + infocard_clicked_text

#### Arrears Z-Test results for the Percent Reminder Infocard

In [103]:
# Reminder percent
arrears_fifty_perc_all_larger = z_test(
    fifty_perc_user_data("fifty_perc_variant_perc_users_with_arrears"),
    fifty_perc_user_data("fifty_perc_variant_perc_users"),
    fifty_perc_user_data("fifty_perc_control_users_with_arrears"),
    fifty_perc_user_data("fifty_perc_control_users"),
    "larger",
)
arrears_fifty_perc_login_larger = z_test(
    fifty_perc_user_data("fifty_perc_variant_perc_users_with_arrears_and_login"),
    fifty_perc_user_data("fifty_perc_variant_perc_users_with_login"),
    fifty_perc_user_data("fifty_perc_control_users_with_arrears_and_login"),
    fifty_perc_user_data("fifty_perc_control_users_with_login"),
    "larger",
)
arrears_fifty_perc_view_larger = z_test(
    fifty_perc_user_data("fifty_perc_variant_perc_users_with_arrears_and_view"),
    fifty_perc_user_data("fifty_perc_variant_perc_users_with_view"),
    fifty_perc_user_data("fifty_perc_control_users_with_arrears_and_login"),
    fifty_perc_user_data("fifty_perc_control_users_with_login"),
    "larger",
)
arrears_fifty_perc_click_larger = z_test(
    fifty_perc_user_data("fifty_perc_variant_perc_users_with_arrears_and_click"),
    fifty_perc_user_data("fifty_perc_variant_perc_users_with_click"),
    fifty_perc_user_data("fifty_perc_control_users_with_arrears_and_login"),
    fifty_perc_user_data("fifty_perc_control_users_with_login"),
    "larger",
)

p_value_arrears_fifty_perc_df = pd.DataFrame(
    {
        "test": [
            "All users",
            "Users that logged in",
            "Viewed infocard (vs. login in control)",
            "Clicked infocard (vs. login in control)",
        ],
        "larger_p_value": [
            arrears_fifty_perc_all_larger,
            arrears_fifty_perc_login_larger,
            arrears_fifty_perc_view_larger,
            arrears_fifty_perc_click_larger,
        ],
    }
)

p_value_arrears_fifty_perc_df["smaller_p_value"] = (
    1 - p_value_arrears_fifty_perc_df["larger_p_value"]
)

p_value_arrears_fifty_perc_df["result"] = np.where(
    p_value_arrears_fifty_perc_df["smaller_p_value"] <= 0.05,
    "Variant arrears rates are significantly lower than the control (reject H0)",
    np.where(
        p_value_arrears_fifty_perc_df["smaller_p_value"] >= 0.95,
        "Variant arrears rates are significantly higher than the control (reject H0 but undesirable effect)",
        "Samples have similar arrears rates (fail to reject H0)",
    ),
)
p_value_arrears_fifty_perc_df

Unnamed: 0,test,larger_p_value,smaller_p_value,result
0,All users,0.48,0.52,Samples have similar arrears rates (fail to reject H0)
1,Users that logged in,0.18,0.82,Samples have similar arrears rates (fail to reject H0)
2,Viewed infocard (vs. login in control),0.04,0.96,Variant arrears rates are significantly higher than the control (reject H0 but undesirable effect)
3,Clicked infocard (vs. login in control),0.47,0.53,Samples have similar arrears rates (fail to reject H0)


**Conclusion:** We did not find significantly lower arrears rates for the variant users in the Percent Reminder Infocard. Instead, arrears rates are significantly higher for variant users who viewed the infocard (vs. login in control).

#### Arrears Z-Test results for the Usage Reminder Infocard

In [104]:
# Reminder usage
arrears_fifty_usage_all_larger = z_test(
    fifty_perc_user_data("fifty_perc_variant_use_users_with_arrears"),
    fifty_perc_user_data("fifty_perc_variant_use_users"),
    fifty_perc_user_data("fifty_perc_control_users_with_arrears"),
    fifty_perc_user_data("fifty_perc_control_users"),
    "larger",
)
arrears_fifty_usage_login_larger = z_test(
    fifty_perc_user_data("fifty_perc_variant_use_users_with_arrears_and_login"),
    fifty_perc_user_data("fifty_perc_variant_use_users_with_login"),
    fifty_perc_user_data("fifty_perc_control_users_with_arrears_and_login"),
    fifty_perc_user_data("fifty_perc_control_users_with_login"),
    "larger",
)
arrears_fifty_usage_view_larger = z_test(
    fifty_perc_user_data("fifty_perc_variant_use_users_with_arrears_and_view"),
    fifty_perc_user_data("fifty_perc_variant_use_users_with_view"),
    fifty_perc_user_data("fifty_perc_control_users_with_arrears_and_login"),
    fifty_perc_user_data("fifty_perc_control_users_with_login"),
    "larger",
)
arrears_fifty_usage_click_larger = z_test(
    fifty_perc_user_data("fifty_perc_variant_use_users_with_arrears_and_click"),
    fifty_perc_user_data("fifty_perc_variant_use_users_with_click"),
    fifty_perc_user_data("fifty_perc_control_users_with_arrears_and_login"),
    fifty_perc_user_data("fifty_perc_control_users_with_login"),
    "larger",
)

p_value_arrears_fifty_usage_df = pd.DataFrame(
    {
        "test": [
            "All users with top-up",
            "Users that logged-in",
            "Viewed inforcard (vs. login in control)",
            "Clicked inforcard (vs. login in control)",
        ],
        "larger_p_value": [
            arrears_fifty_usage_all_larger,
            arrears_fifty_usage_login_larger,
            arrears_fifty_usage_view_larger,
            arrears_fifty_usage_click_larger,
        ],
    }
)

p_value_arrears_fifty_usage_df["smaller_p_value"] = (
    1 - p_value_arrears_fifty_usage_df["larger_p_value"]
)

p_value_arrears_fifty_usage_df["result"] = np.where(
    p_value_arrears_fifty_usage_df["smaller_p_value"] <= 0.05,
    "Variant arrears rates are significantly lower than the control (reject H0)",
    np.where(
        p_value_arrears_fifty_usage_df["smaller_p_value"] >= 0.95,
        "Variant arrears rates are significantly higher than the control (reject H0 but undesirable effect)",
        "Samples have similar arrears rates (fail to reject H0)",
    ),
)
p_value_arrears_fifty_usage_df

| | test | larger_p_value | smaller_p_value | result |
|---|---|---|---|---|
| 0 | All users with top-up | 0.12 | 0.88 | Samples have similar arrears rates (fail to reject H0) |
| 1 | Users that logged-in | 0.1 | 0.9 | Samples have similar arrears rates (fail to reject H0) |
| 2 | Viewed inforcard (vs. login in control) | 0.01 | 0.99 | Variant arrears rates are significantly higher than the control (reject H0 but undesirable effect) |
| 3 | Clicked inforcard (vs. login in control) | 0.14 | 0.86 | Samples have similar arrears rates (fail to reject H0) |


**Conclusion:** Once again, we did not find significantly lower arrears rates for variant users in the Usage Reminder Infocard, and variant users who viewed the infocard had a significantly higher arrears rate than control users who logged in (one-sided p-value of 0.01).
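
For readers who want to check how the two p-value columns in the tables above relate, here is a minimal sketch of a one-sided two-proportion z-test. The notebook's own `z_test` helper is defined elsewhere, so the function below (`one_sided_p_values`, a hypothetical name) only assumes that it runs a pooled z-test with the alternative "variant rate is larger than control rate". The counts are made up for illustration.

```python
import math


def one_sided_p_values(variant_hits, variant_n, control_hits, control_n):
    """Pooled two-proportion z-test; returns (z, larger_p_value, smaller_p_value)."""
    p_variant = variant_hits / variant_n
    p_control = control_hits / control_n
    pooled = (variant_hits + control_hits) / (variant_n + control_n)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / variant_n + 1 / control_n))
    z = (p_variant - p_control) / std_err
    # P(Z >= z): small when the variant rate is well above the control rate.
    larger_p_value = 0.5 * math.erfc(z / math.sqrt(2))
    # P(Z <= z): the complement, small when the variant rate is well below.
    smaller_p_value = 1 - larger_p_value
    return z, larger_p_value, smaller_p_value


# Made-up counts: 120 of 1,000 variant users vs. 100 of 1,000 control users in arrears.
z, larger_p, smaller_p = one_sided_p_values(120, 1000, 100, 1000)
print(f"z={z:.2f}, larger_p_value={larger_p:.3f}, smaller_p_value={smaller_p:.3f}")
```

Under this convention, `smaller_p_value >= 0.95` is the same condition as `larger_p_value <= 0.05`, which is why that branch is labelled as a significantly higher arrears rate.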

<a id='section3'></a>
# Do we have users in both experiments?

About 15.6K users were in both experiments, meaning that they had their first overdraft usage and also reached 50% of their overdraft limit while the experiments were running. To understand how this overlap affects the top-up rates of the sixty day experiment, we ran an OLS regression of `has_top_up` on the experiment dummies. The coefficient for being in both experiments is negative and significant (-0.043, p < 0.001), which implies that being in both experiments correlates with a top-up rate roughly 4 percentage points lower than being in the sixty day experiment only.

We decided to keep these users in the experiment results regardless, since the two infocards will continue to overlap for such users after the experiment ends.

In [10]:
both_exp_query = """
with max_created as (
select user_id, 
max(created) as last_created
from experiment_df 
where n_experiments > 1
group by 1
)
select 
last_created::date as last_infocard_date,
count (*) as n_duplicate_users
from max_created
group by 1
"""

In [11]:
con.register("experiment_df", experiment_df)
both_exp_df = con.execute(both_exp_query).fetchdf()
af.area_single_label(
    both_exp_df, af.pink, "last_infocard_date:T", "n_duplicate_users:Q", 800, 400, "x"
)

In [12]:
regression_query = """
with variant_opposite_exp as (
select 
user_id,
count(case when experiment_name = 'sixty-days-usage-info-card' and user_in_control_group then 1 end) as exp1_control,
count(case when experiment_name = 'sixty-days-usage-info-card' and not user_in_control_group then 1 end) as exp1_variant,
count(case when experiment_name = 'fifty-percent-usage-exceeded' and user_in_control_group then 1 end) as exp2_control,
count(case when experiment_name = 'fifty-percent-usage-exceeded' and not user_in_control_group then 1 end) as exp2_variant
from experiment_df
where n_experiments = 2
group by 1
), 
opposite_users as (
select
user_id
from variant_opposite_exp 
where (exp1_control > 0 or exp1_variant > 0)
and exp2_variant > 0
)
select *,
case when ou.user_id is not null then 1 else 0 end as variant_exp2_dummy
from experiment_df
left join opposite_users ou using (user_id)
"""

In [3]:
regression_df = con.execute(regression_query).fetchdf()
# Copy the filtered rows so that adding a column does not raise SettingWithCopyWarning
regression_sixty_day_df = regression_df.loc[
    regression_df["experiment_name"] == "sixty-days-usage-info-card"
].copy()
regression_sixty_day_df["constant"] = 1

In [14]:
x = regression_sixty_day_df.loc[
    :, ["both_exp_dummy", "variant_exp2_dummy", "control_dummy", "constant"]
]  # explanatory variables
y = regression_sixty_day_df.loc[:, ["has_top_up"]]  # outcome
results = sm.OLS(y, x).fit()
results.summary()

| | | | |
|---|---|---|---|
| Dep. Variable: | has_top_up | R-squared: | 0.004 |
| Model: | OLS | Adj. R-squared: | 0.004 |
| Method: | Least Squares | F-statistic: | 42.18 |
| Date: | Wed, 14 Jul 2021 | Prob (F-statistic): | 3.41e-27 |
| Time: | 13:53:12 | Log-Likelihood: | -21937.0 |
| No. Observations: | 33660 | AIC: | 43880.0 |
| Df Residuals: | 33656 | BIC: | 43920.0 |
| Df Model: | 3 | | |
| Covariance Type: | nonrobust | | |

| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| both_exp_dummy | -0.0430 | 0.007 | -5.993 | 0.000 | -0.057 | -0.029 |
| variant_exp2_dummy | -0.0240 | 0.010 | -2.362 | 0.018 | -0.044 | -0.004 |
| control_dummy | -0.0212 | 0.007 | -3.062 | 0.002 | -0.035 | -0.008 |
| constant | 0.7194 | 0.005 | 146.962 | 0.000 | 0.710 | 0.729 |

| | | | |
|---|---|---|---|
| Omnibus: | 437053.532 | Durbin-Watson: | 1.953 |
| Prob(Omnibus): | 0.0 | Jarque-Bera (JB): | 6072.304 |
| Skew: | -0.784 | Prob(JB): | 0.0 |
| Kurtosis: | 1.632 | Cond. No. | 6.46 |
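
To make the coefficients above easier to read: because `has_top_up` is a 0/1 outcome, this is a linear probability model, and each dummy's coefficient is read as a difference in top-up rate. The sketch below shows the simplest case, one dummy plus a constant, where the OLS slope is exactly the gap between the two group rates. The data and the function name `fit_single_dummy` are invented for illustration. The real model has three dummies, so its coefficients are not simple differences of raw group means.

```python
def fit_single_dummy(dummy, outcome):
    """OLS of outcome on a constant and one 0/1 dummy; returns (constant, coef)."""
    n = len(dummy)
    mean_x = sum(dummy) / n
    mean_y = sum(outcome) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dummy, outcome))
    var_x = sum((x - mean_x) ** 2 for x in dummy)
    coef = cov_xy / var_x
    constant = mean_y - coef * mean_x
    return constant, coef


# Invented data: 10 users in one experiment (7 top up), 10 users in both (6 top up).
both_exp_dummy = [0] * 10 + [1] * 10
has_top_up = [1] * 7 + [0] * 3 + [1] * 6 + [0] * 4

constant, coef = fit_single_dummy(both_exp_dummy, has_top_up)
rate_one_exp = sum(has_top_up[:10]) / 10
rate_both_exp = sum(has_top_up[10:]) / 10
print(round(constant, 3), round(coef, 3), round(rate_both_exp - rate_one_exp, 3))
```

In this toy case the constant is the top-up rate of the reference group and the coefficient is the gap between the two groups, which is the same reading applied to `both_exp_dummy` in the summary above.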


### Compare top-ups for number of experiments and experiment name

In [15]:
dups_query = """
with totals as (
select 
experiment_name || ' ' || n_experiments::text || ' experiments' as experiment_dups, 
case when user_in_control_group then 'control' else 'variant' end as user_split,
case when has_top_up = 1 then 'Has Top-up' else 'No Top-up' end as top_up_type,
count(distinct user_id) as n_users
from experiment_df
group by 1, 2, 3
)
select *, 
round(n_users::numeric/ (sum(n_users) over (partition by experiment_dups, user_split))::numeric, 3)*100 as perc_all_users
from totals
order by 1, 2, 3 
"""

In [69]:
dups_df = con.execute(dups_query).fetchdf()

In [73]:
af.column_multi(
    dups_df[dups_df["top_up_type"] == "Has Top-up"],
    "user_split:N",
    "user_split",
    "perc_all_users",
    "experiment_dups",
    200,
    400,
    "x",
).properties(title="Top-up rates for users in both experiments")

<a id='section4'></a>
# Do all users in the experiment have OD Enabled?

Whereas all users in the sixty day experiment have an enabled overdraft, only about 64% of the users in the fifty percent usage infocard experiment do. Once again, we decided to keep these users in the experiment since this is how plutonium is currently set up, but we recommend including only users with an enabled overdraft moving forward.

In [18]:
od_enabled_query = """
with fifty_perc_enabled as (
select * from (
select 
experiment_name,
per.user_id, 
poh.created as od_history_updated,
poh.status,
case when user_in_control_group then 'control' else replace(experiment_outcome, 'infocard_overdraft_limit_', '') end as user_split,
row_number() over (partition by per.user_id order by od_history_updated desc) as rn
from pu_experiments_result per
inner join pu_overdraft_history poh 
on per.user_id = poh.user_id 
and poh.created <= per.created
and experiment_name = 'fifty-percent-usage-exceeded'
)
where rn = 1 
),
sixty_day_enabled as (
select * from (
select 
experiment_name,
per.user_id, 
poh.created as od_history_updated,
poh.status,
case when user_in_control_group then 'control' else 'variant' end as user_split,
row_number() over (partition by per.user_id order by od_history_updated desc) as rn
from pu_experiments_result per
inner join pu_overdraft_history poh 
on per.user_id = poh.user_id 
and poh.created <= per.created
and experiment_name = 'sixty-days-usage-info-card'
)
where rn = 1 
)
select 
experiment_name,
status,
user_split,
count(*) as n_users
from fifty_perc_enabled
group by 1, 2, 3
union all 
select 
experiment_name,
status,
user_split,
count(*) as n_users
from sixty_day_enabled
group by 1, 2, 3
order by 1, 2, 3
"""

In [19]:
od_enabled_df = df_from_sql("redshiftreader", od_enabled_query)
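
The `row_number() ... where rn = 1` pattern in the query above keeps, for each user, the most recent overdraft history row created on or before the moment they entered the experiment. A minimal Python sketch of the same idea follows. The rows, the status values and the function name `status_at_entry` are made up for illustration and do not reflect the real tables.

```python
from datetime import date


def status_at_entry(experiment_rows, history_rows):
    """Map user_id -> latest overdraft status recorded on or before experiment entry."""
    latest = {}  # user_id -> (history_created, status)
    for user_id, entered in experiment_rows:
        for hist_user_id, hist_created, status in history_rows:
            if hist_user_id != user_id or hist_created > entered:
                continue  # other user, or a change made after entering the experiment
            if user_id not in latest or hist_created > latest[user_id][0]:
                latest[user_id] = (hist_created, status)
    return {user_id: status for user_id, (_, status) in latest.items()}


# Hypothetical rows: (user_id, experiment entry date) and (user_id, created, status).
experiment_rows = [("u1", date(2021, 6, 1)), ("u2", date(2021, 6, 1))]
history_rows = [
    ("u1", date(2021, 1, 10), "ENABLED"),
    ("u1", date(2021, 5, 20), "DISABLED"),
    ("u1", date(2021, 6, 15), "ENABLED"),  # after entry, so ignored
    ("u2", date(2021, 3, 5), "ENABLED"),
]

print(status_at_entry(experiment_rows, history_rows))
```

As with the inner join in the query, a user with no history row on or before their entry date simply drops out of the result.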

In [72]:
af.column_multi(
    od_enabled_df.groupby(["experiment_name", "status"]).sum().reset_index(),
    "experiment_name",
    "status",
    "n_users",
    "experiment_name",
    300,
    400,
    "-y",
)

# Was the overdraft usage equally distributed?

The charts below show that the random split worked as expected: in both experiments, the control and variant groups have very similar distributions of overdraft usage (as a percentage of the limit) at the moment users met the conditions for the experiment.

In [55]:
sixty_day_od_usage_query = """
with totals as (
select 
case when user_in_control_group then 'control' else 'variant' end as user_split,
case when perc_usage > 100 then 101 else perc_usage end as perc_usage,
count(*) as n_users
from sixty_day_df
where perc_usage is not null
group by 1, 2
order by 3 desc
), 
cumsum as (
select *, 
sum(n_users) over (partition by user_split order by perc_usage rows unbounded preceding) as cumulative_sum
from totals
)
select *,
round(cumulative_sum::numeric/ (sum(n_users::numeric) over (partition by user_split))::numeric, 4)*100 as perc_cumsum_users
from cumsum
"""
sixty_day_od_usage_df = con.execute(sixty_day_od_usage_query).fetchdf()
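
As a rough illustration of what the query above computes, the sketch below caps usage above 100% into a single 101 bucket, counts users per bucket, and turns the running total into a cumulative percentage. The usage values and the function name `cumulative_share` are invented for the example.

```python
from collections import Counter


def cumulative_share(perc_usages):
    """Return [(perc_usage, cumulative % of users)] with values above 100 capped at 101."""
    capped = [101 if usage > 100 else usage for usage in perc_usages if usage is not None]
    counts = Counter(capped)
    total = sum(counts.values())
    running = 0
    rows = []
    for usage in sorted(counts):
        running += counts[usage]
        rows.append((usage, round(running / total * 100, 2)))
    return rows


# Invented usage percentages for one split; None mimics a missing perc_usage.
control_usage = [5, 5, 20, 50, 50, 80, 120, 250, None]
for row in cumulative_share(control_usage):
    print(row)
```

Plotting these rows for each split on the same axes gives the kind of comparison shown in this section: the closer the curves, the better balanced the split.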

#### Cumulative percentage of sixty day experiment users by percentage of overdraft usage

In [56]:
af.line_multi(
    sixty_day_od_usage_df,
    "user_split",
    "perc_usage",
    "perc_cumsum_users",
    800,
    400,
    "y",
)

In [57]:
fifty_perc_od_usage_query = """
with totals as (
select 
replace(experiment_outcome, 'infocard_overdraft_limit_', '') as user_split,
case when perc_usage > 100 then 101 else perc_usage end as perc_usage,
count(*) as n_users
from fifty_perc_df
where perc_usage is not null
group by 1, 2
), 
cumsum as (
select *, 
sum(n_users) over (partition by user_split order by perc_usage rows unbounded preceding) as cumulative_sum
from totals
)
select *,
round(cumulative_sum::numeric/ (sum(n_users::numeric) over (partition by user_split))::numeric, 4)*100 as perc_cumsum_users
from cumsum
"""
fifty_perc_od_usage_df = con.execute(fifty_perc_od_usage_query).fetchdf()

#### Cumulative percentage of fifty percent experiment users by percentage of overdraft usage

In [58]:
af.line_multi(
    fifty_perc_od_usage_df,
    "user_split",
    "perc_usage",
    "perc_cumsum_users",
    800,
    400,
    "y",
)

<a id='section5'></a>
# Were experiment users part of the overdraft reduction initiative?

On June 22nd we ran a batch job to reduce or cancel the arranged overdraft for a list of 16K users. Even though this batch job did not coincide with the experiment period itself, it did fall within the 30-day period we considered for users to potentially go into arrears.

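Below we check whether the users affected by this overdraft reduction were evenly split within each experiment. Once again they were: about 6.4% to 6.5% of users in each group of the sixty day experiment, and 9.1% to 9.4% in each group of the fifty percent experiment, are on the reduction list. This batch job should therefore not have an impact on the experiment results.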

### Sixty day usage experiment

In [59]:
reduction_users_query = """
select *
from read_csv_auto('research/product/bank_products/20210525_overdraft_infocard_ab_tests/od_limit_reduction_users.csv',HEADER=TRUE)
"""

In [60]:
reduction_users_df = con.execute(reduction_users_query).fetchdf()

In [61]:
sixty_days_reduction_query = """ 
with totals as (
select 
case when user_in_control_group then 'control' else 'variant' end as user_split,
case when ru.user_id is null then 'Not in Reduction List' else 'In Reduction List' end as reduction_type,
count(*) as n_users
from sixty_day_df
left join reduction_users_df ru using (user_id)
group by 1, 2
)
select *, 
round(n_users::numeric/ (sum(n_users::numeric) over (partition by user_split))::numeric, 3)*100 as perc_users 
from totals
"""
sixty_days_reduction_df = con.execute(sixty_days_reduction_query).fetchdf()
sixty_days_reduction_df

| | user_split | reduction_type | n_users | perc_users |
|---|---|---|---|---|
| 0 | control | Not in Reduction List | 15804 | 93.6 |
| 1 | control | In Reduction List | 1075 | 6.4 |
| 2 | variant | Not in Reduction List | 15698 | 93.5 |
| 3 | variant | In Reduction List | 1083 | 6.5 |
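
The `perc_users` column is each row's count divided by the total for its split. As a quick sanity check, the sketch below recomputes it in plain Python from the `n_users` values in the table above. Only the function name `share_by_split` is an assumption.

```python
from collections import defaultdict


def share_by_split(rows):
    """rows: (user_split, reduction_type, n_users) -> adds % of users within each split."""
    totals = defaultdict(int)
    for user_split, _, n_users in rows:
        totals[user_split] += n_users
    return [
        (user_split, reduction_type, n_users, round(n_users / totals[user_split] * 100, 1))
        for user_split, reduction_type, n_users in rows
    ]


# Counts copied from the sixty day table above.
sixty_days_counts = [
    ("control", "Not in Reduction List", 15804),
    ("control", "In Reduction List", 1075),
    ("variant", "Not in Reduction List", 15698),
    ("variant", "In Reduction List", 1083),
]

for row in share_by_split(sixty_days_counts):
    print(row)
```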


In [62]:
# build chart
chart = (
    alt.Chart(
        sixty_days_reduction_df[
            sixty_days_reduction_df["reduction_type"] == "In Reduction List"
        ]
    )
    .mark_bar()
    .encode(
            x=alt.X("perc_users:Q", title="% of users"),
        y=alt.Y("user_split:N", title=None),
        color=alt.Color(
            "user_split:N",
            scale=color_scale_1,
        ),
    )
    .properties(
        width=600,
        height=200,
        title="% of users in the sixty days usage infocard experiment and the OD limit reduction List",
    )
)

text = chart.mark_text(
    align="center",
    baseline="middle",
    # Nudges text to right so it doesn't appear on top of the bar
    dx=20,
).encode(text="perc_users:Q")

chart + text

### Fifty percent usage experiment

In [63]:
fifty_perc_reduction_query = """ 
with totals as (
select 
replace(experiment_outcome, 'infocard_overdraft_limit_', '') as user_split,
case when ru.user_id is null then 'Not in Reduction List' else 'In Reduction List' end as reduction_type,
count(*) as n_users
from fifty_perc_df
left join reduction_users_df ru using (user_id)
group by 1, 2
)
select *, 
round(n_users::numeric/ (sum(n_users::numeric) over (partition by user_split))::numeric, 3)*100 as perc_users 
from totals
"""
fifty_perc_reduction_df = con.execute(fifty_perc_reduction_query).fetchdf()
fifty_perc_reduction_df

| | user_split | reduction_type | n_users | perc_users |
|---|---|---|---|---|
| 0 | reminder_use | In Reduction List | 1533 | 9.1 |
| 1 | reminder_use | Not in Reduction List | 15292 | 90.9 |
| 2 | control | In Reduction List | 3159 | 9.4 |
| 3 | control | Not in Reduction List | 30624 | 90.6 |
| 4 | reminder_percent | In Reduction List | 1522 | 9.1 |
| 5 | reminder_percent | Not in Reduction List | 15279 | 90.9 |


In [64]:
# build chart
chart = (
    alt.Chart(
        fifty_perc_reduction_df[
            fifty_perc_reduction_df["reduction_type"] == "In Reduction List"
        ]
    )
    .mark_bar()
    .encode(
            x=alt.X("perc_users:Q", title="% of users"),
        y=alt.Y("user_split:N", title=None),
        color=alt.Color(
            "user_split:N",
            scale=color_scale_2,
        ),
    )
    .properties(
        width=600,
        height=200,
        title="% of users in the fifty percent usage infocard experiment and the OD limit reduction List",
    )
)

text = chart.mark_text(
    align="center",
    baseline="middle",
    # Nudges text to right so it doesn't appear on top of the bar
    dx=20,
).encode(text="perc_users:Q")

chart + text