title: New premium tier positioning    
author: Fabio Schmidt-Fischbach     
date: 2020-08-10   
region: EU   
summary: Existing standard users won’t be downgraded, and we don’t yet know who our future “Eco” users will be. We therefore look at current standard MAUs, the customers who most closely resemble our future “free tier” users. Current standard MAUs use the physical card a lot: >70% of them use it at least once a month. 17% of standard MAUs had more than one space at some point in 2020. 16% of standard MAUs used the ATM more than three times in at least one month of 2020. Only 4% of standard MAUs met both criteria in 2020. How many people have an incentive to upgrade from Eco to Domestic because of spaces & ATM? 29% of standard MAUs.
link: https://docs.google.com/presentation/d/1aX3rY8FMJ-NMiKJ_VNYaCOHm9oKxgenVEBFqSuMqv6E/edit?usp=sharing       
tags: memberships, domestic, eco, tier, premium, product marketing    

In [1]:
import pandas as pd
import numpy as np
import altair as alt

### Part 1. Understanding our up-sell levers for our new tiers 

By introducing the new tiers Eco and Domestic, we add differentiation to the product. Users will be affected to different degrees by the features that are added or removed. The goal of this section is to understand which features affect the most users.


In [None]:
query = """

 


with ucm as ( 

select user_created, 
					 sum(case when month like '2019-%' then value::float/100 end) as ucm_2019,
					 sum(case when month like '2019-%' and value > 0 then value::float/100 end) as rev_2019,
                     sum(value::float/100) as ucm, 
                     sum(case when value > 0 then value::float/100 end) as rev
from dbt.ucm_pnl
group by 1 

), 
sp_tx as (  
select user_created, date_trunc('month', txn_date) as month, sum(n_spaces_dt) as n_spaces_dt, sum(n_spaces_ct) as n_spaces_ct
from dbt.zrh_txn_day 
group by 1,2 
)

select  zu.user_id, 
		dwh.month,
		start_time,
		kyc_first_completed, 
		case when act.user_created is not null then 1 else 0 end as mau, 
		product_id, 
		spaces_balance_cents::float/100 as spaces_euro,
		all_accounts_balance_cents::float/100 as total_euro,
	    tnc_country_group, 
	    n_spaces,
	    age_group, 
	    gender, 
	    is_expat, 
	    ucm,
	    rev,
	    n_spaces_dt, 
	    n_spaces_ct,
		count(distinct case when payment_scheme = 'SPACES' then txn_id end) as n_spaces_tx, 
		sum(case when payment_scheme = 'SPACES' then amount_cents::float/100 end) as vol_spaces,
		count(distinct case when type = 'PT' and card_tx_type = 'atm' then txn_id end) as n_atm,
		count(distinct case when type = 'PT' and card_tx_type = 'ecomm' then txn_id end) as n_ecomm,
		count(distinct case when type = 'PT' and card_tx_type = 'cardpresent' then txn_id end) as n_physical,
		sum(case when type = 'PT' and card_tx_type = 'atm' then amount_cents::float/100 end) as vol_atm,
		sum(case when type = 'PT' and card_tx_type = 'ecomm' then amount_cents::float/100 end) as vol_ecommerce,
		sum(case when type = 'PT' and card_tx_type = 'cardpresent' then amount_cents::float/100 end) as vol_cardpresent,

		count(distinct case when type = 'PT' and wallet in ('apple', 'google') then txn_id end) as n_virtual,
		sum(case when type = 'PT' and wallet in ('apple', 'google') then amount_cents::float/100 end) as vol_virtual
from dwh_cohort_months as dwh 
inner join dbt.zrh_users as zu on zu.kyc_first_completed <= end_time 
inner join dbt.stg_cohort_first_active as stg on stg.user_created = zu.user_created
left join dbt.zrh_transactions as zt on zt.user_created = zu.user_created and date_trunc('month', txn_ts) = start_time 
left join dbt.zrh_user_activity_txn as act on act.user_created = zu.user_created and end_time between activity_start and activity_end and activity_type = '1_tx_35'
left join dbt.zrh_spaces as spaces on spaces.user_created = zu.user_created and spaces.month = to_char(dwh.start_time, 'YYYY-MM-DD')
left join ucm on ucm.user_created = zu.user_created
left join sp_tx on sp_tx.user_created = zu.user_created and sp_tx.month = dwh.start_time 
where start_time >= '2020-01-01' and start_time < date_trunc('month', current_date) and act.user_created is not null and product_id = 'STANDARD'
group by 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17
order by 1,2 
 
"""

70% of standard MAUs in recent months used their card to make a physical payment.    
- Having a physical card is still integral to our active users' financial behavior. 
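
The monthly shares in this part are all computed the same way: flag each user-month row, then take the mean of the flag per month. A minimal sketch of that pattern on toy data follows; the `monthly_share` helper and the toy frame are illustrative assumptions, not part of the original notebook.

```python
import pandas as pd

# Toy stand-in for tier_positioning_recenttx.csv: one row per user and month.
toy = pd.DataFrame(
    {
        "user_id": [1, 2, 3, 4, 1, 2, 3, 4],
        "month": ["2020-06"] * 4 + ["2020-07"] * 4,
        "n_physical": [3, 0, 1, 5, 0, 0, 2, 1],
    }
)


def monthly_share(frame, count_col, threshold=0):
    """Share of rows per month whose count_col exceeds threshold (hypothetical helper)."""
    flag = (frame[count_col] > threshold).astype(int)
    return flag.groupby(frame["month"]).mean().rename("share").reset_index()


physical_share = monthly_share(toy, "n_physical")
print(physical_share)
```

The same helper would cover the virtual, spaces, and ATM charts by swapping the column name and threshold, for example `monthly_share(df, "n_atm", threshold=3)`.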

In [84]:
df = pd.read_csv("tier_positioning_recenttx.csv").fillna(0)

df["physical"] = 0
df.loc[df["n_physical"] > 0, "physical"] = 1

df = df.groupby(["month"])["physical"].agg("mean").reset_index()

alt.Chart(df).mark_bar().encode(
    x=alt.X("month", axis=alt.Axis(title="Month of tx")),
    y=alt.Y(
        "physical",
        axis=alt.Axis(
            format="%", title="% of Standard MAUs that did a tx with a physical card"
        ),
    ),
).properties(
    width=500,
    height=500,
    title="% of standard maus that use the physical card for a payment",
)

However, virtual payment methods are also on the rise. In July, 20% of standard MAUs made at least one Google Pay or Apple Pay transaction. 

In [82]:
df = pd.read_csv("tier_positioning_recenttx.csv").fillna(0)

df["virtual"] = 0
df.loc[df["n_virtual"] > 0, "virtual"] = 1

df = df.groupby(["month"])["virtual"].agg("mean").reset_index()

alt.Chart(df).mark_bar().encode(
    x=alt.X("month", axis=alt.Axis(title="Month of tx")),
    y=alt.Y(
        "virtual",
        axis=alt.Axis(
            format="%", title="% of Standard MAUs that did a tx with a virtual card"
        ),
    ),
).properties(
    width=500, height=500, title="% of standard maus that use google/apple pay"
)

What about spaces? Roughly 18% of standard MAUs make a spaces transaction each month. 

In [89]:
df = pd.read_csv("tier_positioning_recenttx.csv").fillna(0)

df["spaces"] = 0
df.loc[df["n_spaces"] > 0, "spaces"] = 1

df = df.groupby(["month"])["spaces"].agg("mean").reset_index()

alt.Chart(df).mark_bar().encode(
    x=alt.X("month", axis=alt.Axis(title="Month of tx")),
    y=alt.Y(
        "spaces",
        axis=alt.Axis(format="%", title="% of Standard MAUs that have a space"),
    ),
).properties(width=500, height=500, title="% of standard maus that have a space")

In [3]:
df = pd.read_csv("tier_positioning_recenttx.csv").fillna(0)

df["spaces"] = 0
df.loc[df["n_spaces"] > 1, "spaces"] = 1

df = df.groupby(["month"])["spaces"].agg("mean").reset_index()

alt.Chart(df).mark_bar().encode(
    x=alt.X("month", axis=alt.Axis(title="Month of tx")),
    y=alt.Y(
        "spaces",
        axis=alt.Axis(
            format="%", title="% of Standard MAUs that have more than one space"
        ),
    ),
).properties(
    width=500, height=500, title="% of standard maus that have more than one space"
)

In [50]:
df = pd.read_csv("tier_positioning_recenttx.csv").fillna(0)

df = df.loc[df["month"] == "2020-07", :]

df = df.loc[df["n_spaces"] > 0, :]

df = df.groupby(["n_spaces_tx"])["user_id"].agg("nunique").reset_index()


df["perc"] = 100 * df["user_id"] / sum(df["user_id"])
df["cum"] = df["perc"].cumsum()

alt.Chart(df.loc[df["n_spaces_tx"] < 30, :]).mark_line().encode(
    x=alt.X("n_spaces_tx", axis=alt.Axis(title="# of spaces transactions")),
    y=alt.Y("cum", axis=alt.Axis(title="Percentile")),
).properties(
    width=500,
    height=500,
    title="# of spaces transactions (out of those that have a space)",
)

In [47]:
df.head()

   n_spaces_tx  user_id   perc    cum
0            0   808921  100.0  100.0
1            1      472  100.0  200.0
2            2    41712  100.0  300.0
3            3      240  100.0  400.0
4            4    32438  100.0  500.0


In [90]:
df = pd.read_csv("tier_positioning_recenttx.csv").fillna(0)

df["spaces"] = 0
df.loc[df["n_spaces_tx"] > 0, "spaces"] = 1

df = df.groupby(["month"])["spaces"].agg("mean").reset_index()

alt.Chart(df).mark_bar().encode(
    x=alt.X("month", axis=alt.Axis(title="Month of tx")),
    y=alt.Y(
        "spaces",
        axis=alt.Axis(format="%", title="% of Standard MAUs that made a spaces tx"),
    ),
).properties(width=500, height=500, title="% of standard maus that made a spaces tx")

In [2]:
df = pd.read_csv("tier_positioning_recenttx.csv").fillna(0)

# what % of standard MAUs made more than 3 ATM withdrawals in a month?
df["atm"] = 0
df.loc[df["n_atm"] > 3, "atm"] = 1

df = df.groupby(["month"])["atm"].agg("mean").reset_index()

alt.Chart(df).mark_bar().encode(
    x=alt.X("month", axis=alt.Axis(title="Month of tx")),
    y=alt.Y(
        "atm",
        axis=alt.Axis(
            format="%", title="% of Standard MAUs that do more than 3 atm withdrawals"
        ),
    ),
).properties(
    width=500,
    height=500,
    title="% of standard maus that do more than 3 atm withdrawals",
)

In [3]:
df = pd.read_csv("tier_positioning_recenttx.csv").fillna(0)

# what % of standard MAUs made more than 3 ATM withdrawals in a month, by market?
df["atm"] = 0
df.loc[df["n_atm"] > 3, "atm"] = 1

df = df.groupby(["month", "tnc_country_group"])["atm"].agg("mean").reset_index()

alt.Chart(
    df.loc[df["tnc_country_group"].str.contains("GBR") == False, :]
).mark_line().encode(
    x=alt.X("month", axis=alt.Axis(title="Month of tx")),
    y=alt.Y(
        "atm",
        axis=alt.Axis(
            format="%", title="% of Standard MAUs that do more than 3 atm withdrawals"
        ),
    ),
    color="tnc_country_group",
).properties(
    width=500,
    height=500,
    title="% of standard maus that do more than 3 atm withdrawals",
)

In [109]:
df["tnc_country_group"].str.contains("GBR").unique()

array([False,  True])

Let's imagine users are willing to pay 10 euros for a physical card but now need to decide whether to upgrade because of:
- the ATM withdrawal limit 
- the spaces restriction 

How many users are affected by these two constraints? 
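
One way to read this question is as a 2x2 table of shares: ATM users versus spaces users. The sketch below builds that table with `pd.crosstab` on toy data, using the same thresholds as the July cell that follows (`n_spaces > 0`, `n_atm > 2`); the toy values are made up for illustration.

```python
import pandas as pd

# Toy data: one row per user for a single month (values are illustrative).
toy = pd.DataFrame(
    {
        "user_id": [1, 2, 3, 4, 5],
        "n_spaces": [0, 2, 1, 0, 3],
        "n_atm": [4, 0, 5, 1, 2],
    }
)

spaces_user = (toy["n_spaces"] > 0).rename("spaces_user")
atm_user = (toy["n_atm"] > 2).rename("atm_user")

# normalize="all" divides every cell by the total number of users,
# so the four cells sum to 1.
shares = pd.crosstab(atm_user, spaces_user, normalize="all")
print(shares.round(2))
```

Rows are `atm_user`, columns are `spaces_user`; the cell where both are `True` is the share of users affected by both constraints.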

In [115]:
df = pd.read_csv("tier_positioning_recenttx.csv")

df["spaces_user"] = False
df.loc[df["n_spaces"] > 0, "spaces_user"] = True

df["atm_user"] = False
df.loc[df["n_atm"] > 2, "atm_user"] = True

df = df.loc[df["month"] == "2020-07", :]

df = df.groupby(["atm_user", "spaces_user"])["user_id"].agg("nunique").reset_index()

df["perc"] = round(100 * df["user_id"] / sum(df["user_id"]))

# Configure common options
base = alt.Chart(df).encode(
    alt.X("atm_user:O", scale=alt.Scale(paddingInner=0)),
    alt.Y("spaces_user:O", scale=alt.Scale(paddingInner=0)),
)

# Configure heatmap
heatmap = (
    base.mark_rect()
    .encode(
        color=alt.Color(
            "perc:Q",
            scale=alt.Scale(scheme="greenblue"),
            legend=alt.Legend(direction="horizontal"),
        )
    )
    .properties(
        width=500,
        height=500,
        title="% of users that fall into each category (July 2020)",
    )
)

# Configure text
text = base.mark_text(baseline="middle").encode(
    text="perc:Q",
)

# Draw the chart
heatmap + text

In [4]:
df = pd.read_csv("tier_positioning_recenttx.csv")

df["spaces_user"] = False
df.loc[df["n_spaces"] > 0, "spaces_user"] = True

df["atm_user"] = False
df.loc[df["n_atm"] > 3, "atm_user"] = True

df = (
    df.groupby(["user_id"])
    .agg({"spaces_user": "max", "atm_user": "max", "start_time": "min"})
    .reset_index()
)

# drop customers that are too young: keep only users whose first observed month is January 2020.
df = df.loc[pd.to_datetime(df["start_time"]) == pd.to_datetime("2020-01-01"), :]

df = df.groupby(["atm_user", "spaces_user"])["user_id"].agg("nunique").reset_index()

df["perc"] = round(100 * df["user_id"] / sum(df["user_id"]))

# Configure common options
base = alt.Chart(df).encode(
    alt.X("atm_user:O", scale=alt.Scale(paddingInner=0)),
    alt.Y("spaces_user:O", scale=alt.Scale(paddingInner=0)),
)

# Configure heatmap
heatmap = (
    base.mark_rect()
    .encode(
        color=alt.Color(
            "perc:Q",
            scale=alt.Scale(scheme="greenblue"),
            legend=alt.Legend(direction="horizontal"),
        )
    )
    .properties(
        width=500,
        height=500,
        title="% of users that since Jan 2020 fell at some point into either category",
    )
)

# Configure text
text = base.mark_text(baseline="middle").encode(
    text="perc:Q",
)

# Draw the chart
heatmap + text

In [5]:
df = pd.read_csv("tier_positioning_recenttx.csv")

df["spaces_user"] = False
df.loc[df["n_spaces"] > 1, "spaces_user"] = True

df["atm_user"] = False
df.loc[df["n_atm"] > 3, "atm_user"] = True

df = (
    df.groupby(["user_id"])
    .agg({"spaces_user": "max", "atm_user": "max", "start_time": "min"})
    .reset_index()
)

# drop customers that are too young: keep only users whose first observed month is January 2020.
df = df.loc[pd.to_datetime(df["start_time"]) == pd.to_datetime("2020-01-01"), :]

df = df.groupby(["atm_user", "spaces_user"])["user_id"].agg("nunique").reset_index()

df["perc"] = round(100 * df["user_id"] / sum(df["user_id"]))

# Configure common options
base = alt.Chart(df).encode(
    alt.X("atm_user:O", scale=alt.Scale(paddingInner=0)),
    alt.Y("spaces_user:O", scale=alt.Scale(paddingInner=0)),
)

# Configure heatmap
heatmap = (
    base.mark_rect()
    .encode(
        color=alt.Color(
            "perc:Q",
            scale=alt.Scale(scheme="greenblue"),
            legend=alt.Legend(direction="horizontal"),
        )
    )
    .properties(
        width=500,
        height=500,
        title="% of users that since Jan 2020 fell at some point into either category",
    )
)

# Configure text
text = base.mark_text(baseline="middle").encode(
    text="perc:Q",
)

# Draw the chart
heatmap + text

### What % of users would benefit financially from using Domestic rather than adapting their behavior?

In [61]:
df = pd.read_csv("tier_positioning_recenttx.csv").fillna(0)

df.loc[df["n_atm"] >= 5, "n_atm"] = 5

df["Zero free withdrawals"] = df["n_atm"] * 2
df["1 free withdrawal"] = (df["n_atm"] - 1) * 2
df["2 free withdrawals"] = (df["n_atm"] - 2) * 2
df["3 free withdrawals"] = (df["n_atm"] - 3) * 2
df["4 free withdrawals"] = (df["n_atm"] - 4) * 2


df = df.loc[
    :,
    [
        "user_id",
        "month",
        "Zero free withdrawals",
        "1 free withdrawal",
        "2 free withdrawals",
        "3 free withdrawals",
        "4 free withdrawals",
    ],
]

# melt.
df = pd.melt(df, id_vars=["user_id", "month"])

df["profitable"] = 0
df.loc[df["value"] >= 5, "profitable"] = 1

df = df.groupby(["month", "variable"])["profitable"].agg("mean").reset_index()

alt.Chart(df).mark_line().encode(
    x=alt.X("month", axis=alt.Axis(title="Month")),
    y=alt.Y(
        "profitable",
        axis=alt.Axis(
            format="%",
            title="% of standard MAUs for whom domestic is cheaper than paying for all additional atms",
        ),
    ),
    color="variable",
).properties(
    width=500,
    height=500,
    title="% of standard MAUs that financially benefit from domestic due to ATM fees",
)
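
The arithmetic behind this chart can be checked by hand. The sketch below reuses the constants that appear in the cell above (2 per paid withdrawal, a threshold of 5, withdrawals capped at 5), which are assumed here to be euro amounts; the function names are hypothetical. Fees are clipped at zero for readability, which does not change the flag because negative values are below the threshold anyway.

```python
FEE_PER_WITHDRAWAL = 2  # per paid withdrawal, as in the cell above
MONTHLY_PRICE = 5       # threshold used in the cell above
CAP = 5                 # withdrawals are capped at 5 in the cell above


def paid_fees(n_atm, free_withdrawals):
    """Fees for withdrawals beyond the free allowance, never negative."""
    capped = min(n_atm, CAP)
    return max(capped - free_withdrawals, 0) * FEE_PER_WITHDRAWAL


def upgrade_pays_off(n_atm, free_withdrawals):
    """True when the monthly fees reach the threshold."""
    return paid_fees(n_atm, free_withdrawals) >= MONTHLY_PRICE


# Smallest number of monthly withdrawals at which upgrading pays off,
# for 0 to 4 free withdrawals (None if it never does under the cap).
break_even = {
    free: next((n for n in range(CAP + 1) if upgrade_pays_off(n, free)), None)
    for free in range(5)
}
print(break_even)
```

Under these constants a user needs three paid withdrawals to reach the threshold. Because withdrawals are capped at 5, the scenarios with 3 or 4 free withdrawals can never reach it, so those two lines would be expected to sit at 0%.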

###  Part 2.0 Recent activity and demographic info by group   
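
The cells in this part repeat the same classification: a user counts as a spaces user if they ever had more than one space, as an ATM user if they ever made more than three withdrawals in a month, and the two flags are combined into an incentive label. A compact sketch of that step is shown below; `label_incentive` is a hypothetical helper, the toy data is made up, and the January 2020 filter used in the cells is left out for brevity.

```python
import numpy as np
import pandas as pd


def label_incentive(frame, min_spaces=1, min_atm=3):
    """Per-user incentive label from monthly rows (hypothetical helper)."""
    flags = (
        frame.assign(
            spaces_user=frame["n_spaces"] > min_spaces,
            atm_user=frame["n_atm"] > min_atm,
        )
        .groupby("user_id")[["spaces_user", "atm_user"]]
        .max()  # True if the user met the criterion in any month
        .reset_index()
    )
    # np.select picks the first matching condition, so "both" must come first.
    conditions = [
        flags["atm_user"] & flags["spaces_user"],
        flags["atm_user"],
        flags["spaces_user"],
    ]
    labels = ["Both ATM and Spaces", "Only ATM", "Only Spaces"]
    flags["incentive"] = np.select(conditions, labels, default="No incentive")
    return flags


toy = pd.DataFrame(
    {
        "user_id": [1, 1, 2, 2, 3, 3, 4, 4],
        "n_spaces": [0, 2, 0, 0, 3, 0, 1, 1],
        "n_atm": [4, 0, 5, 1, 0, 0, 2, 3],
    }
)
labelled = label_incentive(toy)
print(labelled)
```

Keeping the thresholds as parameters makes it easier to see where a cell deviates from the default (`n_spaces > 1`, `n_atm > 3`).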




In [6]:
df = pd.read_csv("tier_positioning_recenttx.csv")

df["spaces_user"] = False
df.loc[df["n_spaces"] > 1, "spaces_user"] = True

df["atm_user"] = False
df.loc[df["n_atm"] > 3, "atm_user"] = True

df = (
    df.groupby(["user_id", "tnc_country_group"])
    .agg({"spaces_user": "max", "atm_user": "max", "start_time": "min"})
    .reset_index()
)

# drop customers that are too young: keep only users whose first observed month is January 2020.
df = df.loc[pd.to_datetime(df["start_time"]) == pd.to_datetime("2020-01-01"), :]
df["incentive"] = "No incentive"
df.loc[
    (df["atm_user"] == True) & (df["spaces_user"] == True), "incentive"
] = "Both ATM and Spaces"
df.loc[
    (df["atm_user"] == True) & (df["spaces_user"] == False), "incentive"
] = "Only ATM"
df.loc[
    (df["atm_user"] == False) & (df["spaces_user"] == True), "incentive"
] = "Only Spaces"

df = (
    df.groupby(["tnc_country_group", "incentive"])["user_id"]
    .agg("nunique")
    .reset_index()
)
df["perc"] = df["user_id"] / df.groupby(["tnc_country_group"])["user_id"].transform(
    "sum"
)

df = df.loc[df["incentive"] != "No incentive", :]

alt.Chart(
    df.loc[df["tnc_country_group"].str.contains("GBR") == False, :]
).mark_bar().encode(
    x=alt.X("tnc_country_group", axis=alt.Axis(title="Market")),
    y=alt.Y(
        "perc",
        axis=alt.Axis(
            format="%", title="% of standard MAUs that have an incentive to upgrade"
        ),
    ),
    color="incentive",
).properties(
    width=500, height=500, title="% of standard MAUs that have an incentive to upgrade"
)

In [7]:
df = pd.read_csv("tier_positioning_recenttx.csv")

df["spaces_user"] = False
df.loc[df["n_spaces"] > 1, "spaces_user"] = True

df["atm_user"] = False
df.loc[df["n_atm"] > 3, "atm_user"] = True

df = (
    df.groupby(["user_id", "age_group"])
    .agg({"spaces_user": "max", "atm_user": "max", "start_time": "min"})
    .reset_index()
)

# drop customers that are too young: keep only users whose first observed month is January 2020.
df = df.loc[pd.to_datetime(df["start_time"]) == pd.to_datetime("2020-01-01"), :]
df["incentive"] = "No incentive"
df.loc[
    (df["atm_user"] == True) & (df["spaces_user"] == True), "incentive"
] = "Both ATM and Spaces"
df.loc[
    (df["atm_user"] == True) & (df["spaces_user"] == False), "incentive"
] = "Only ATM"
df.loc[
    (df["atm_user"] == False) & (df["spaces_user"] == True), "incentive"
] = "Only Spaces"

df = df.groupby(["age_group", "incentive"])["user_id"].agg("nunique").reset_index()
df["perc"] = df["user_id"] / df.groupby(["age_group"])["user_id"].transform("sum")

df = df.loc[df["incentive"] != "No incentive", :]

alt.Chart(df).mark_bar().encode(
    x=alt.X("age_group", axis=alt.Axis(title="Age group")),
    y=alt.Y(
        "perc",
        axis=alt.Axis(
            format="%", title="% of standard MAUs that have an incentive to upgrade"
        ),
    ),
    color="incentive",
).properties(
    width=500, height=500, title="% of standard MAUs that have an incentive to upgrade"
)

In [8]:
df = pd.read_csv("tier_positioning_recenttx.csv")

df["spaces_user"] = False
df.loc[df["n_spaces"] > 1, "spaces_user"] = True

df["atm_user"] = False
df.loc[df["n_atm"] > 3, "atm_user"] = True

df = (
    df.groupby(["user_id", "age_group"])
    .agg({"spaces_user": "max", "atm_user": "max", "start_time": "min"})
    .reset_index()
)

# drop customers that are too young: keep only users whose first observed month is January 2020.
df = df.loc[pd.to_datetime(df["start_time"]) == pd.to_datetime("2020-01-01"), :]
df["incentive"] = "No incentive"
df.loc[
    (df["atm_user"] == True) & (df["spaces_user"] == True), "incentive"
] = "Both ATM and Spaces"
df.loc[
    (df["atm_user"] == True) & (df["spaces_user"] == False), "incentive"
] = "Only ATM"
df.loc[
    (df["atm_user"] == False) & (df["spaces_user"] == True), "incentive"
] = "Only Spaces"

df = df.groupby(["age_group", "incentive"])["user_id"].agg("nunique").reset_index()
df["perc"] = df["user_id"] / df.groupby(["incentive"])["user_id"].transform("sum")

df["cum"] = df.groupby(["incentive"])["perc"].cumsum()

alt.Chart(df).mark_line().encode(
    x=alt.X("age_group", axis=alt.Axis(title="Age group")),
    y=alt.Y("cum", axis=alt.Axis(format="%", title="Percentile")),
    color="incentive",
).properties(width=500, height=500, title="Age distribution by group")

In [9]:
df = pd.read_csv("tier_positioning_recenttx.csv")

df["spaces_user"] = False
df.loc[df["n_spaces"] > 1, "spaces_user"] = True

df["atm_user"] = False
df.loc[df["n_atm"] > 3, "atm_user"] = True

df = (
    df.groupby(["user_id", "gender"])
    .agg({"spaces_user": "max", "atm_user": "max", "start_time": "min"})
    .reset_index()
)

# drop customers that are too young: keep only users whose first observed month is January 2020.
df = df.loc[pd.to_datetime(df["start_time"]) == pd.to_datetime("2020-01-01"), :]
df["incentive"] = "No incentive"
df.loc[
    (df["atm_user"] == True) & (df["spaces_user"] == True), "incentive"
] = "Both ATM and Spaces"
df.loc[
    (df["atm_user"] == True) & (df["spaces_user"] == False), "incentive"
] = "Only ATM"
df.loc[
    (df["atm_user"] == False) & (df["spaces_user"] == True), "incentive"
] = "Only Spaces"

df = df.groupby(["gender", "incentive"])["user_id"].agg("nunique").reset_index()
df["perc"] = df["user_id"] / df.groupby(["gender"])["user_id"].transform("sum")
df = df.loc[df["incentive"] != "No incentive", :]

alt.Chart(df).mark_bar().encode(
    x=alt.X("gender", axis=alt.Axis(title="Gender")),
    y=alt.Y(
        "perc",
        axis=alt.Axis(
            format="%", title="% of standard MAUs that have an incentive to upgrade"
        ),
    ),
    color="incentive",
).properties(
    width=500, height=500, title="% of standard MAUs that have an incentive to upgrade"
)

In [10]:
df = pd.read_csv("tier_positioning_recenttx.csv")

df["spaces_user"] = False
df.loc[df["n_spaces"] > 1, "spaces_user"] = True

df["atm_user"] = False
df.loc[df["n_atm"] > 3, "atm_user"] = True

df = (
    df.groupby(["user_id", "gender"])
    .agg({"spaces_user": "max", "atm_user": "max", "start_time": "min"})
    .reset_index()
)

# drop customers that are too young: keep only users whose first observed month is January 2020.
df = df.loc[pd.to_datetime(df["start_time"]) == pd.to_datetime("2020-01-01"), :]
df["incentive"] = "No incentive"
df.loc[
    (df["atm_user"] == True) & (df["spaces_user"] == True), "incentive"
] = "Both ATM and Spaces"
df.loc[
    (df["atm_user"] == True) & (df["spaces_user"] == False), "incentive"
] = "Only ATM"
df.loc[
    (df["atm_user"] == False) & (df["spaces_user"] == True), "incentive"
] = "Only Spaces"

df = df.groupby(["gender", "incentive"])["user_id"].agg("nunique").reset_index()
df["perc"] = df["user_id"] / df.groupby(["incentive"])["user_id"].transform("sum")

alt.Chart(df.loc[df["gender"] == "FEMALE", :]).mark_bar().encode(
    x=alt.X("incentive", axis=alt.Axis(title="Group")),
    y=alt.Y("perc", axis=alt.Axis(format="%", title="% female")),
    color="incentive",
).properties(width=500, height=500, title="% female")

In [11]:
df = pd.read_csv("tier_positioning_recenttx.csv")

df["spaces_user"] = False
df.loc[df["n_spaces"] > 1, "spaces_user"] = True

df["atm_user"] = False
df.loc[df["n_atm"] > 3, "atm_user"] = True

df = (
    df.groupby(["user_id", "is_expat"])
    .agg({"spaces_user": "max", "atm_user": "max", "start_time": "min"})
    .reset_index()
)

# drop customers that are too young: keep only users whose first observed month is January 2020.
df = df.loc[pd.to_datetime(df["start_time"]) == pd.to_datetime("2020-01-01"), :]
df["incentive"] = "No incentive"
df.loc[
    (df["atm_user"] == True) & (df["spaces_user"] == True), "incentive"
] = "Both ATM and Spaces"
df.loc[
    (df["atm_user"] == True) & (df["spaces_user"] == False), "incentive"
] = "Only ATM"
df.loc[
    (df["atm_user"] == False) & (df["spaces_user"] == True), "incentive"
] = "Only Spaces"

df = df.groupby(["is_expat", "incentive"])["user_id"].agg("nunique").reset_index()
df["perc"] = df["user_id"] / df.groupby(["incentive"])["user_id"].transform("sum")

alt.Chart(df.loc[df["is_expat"] == True, :]).mark_bar().encode(
    x=alt.X("incentive", axis=alt.Axis(title="Group")),
    y=alt.Y("perc", axis=alt.Axis(format="%", title="% expat")),
    color="incentive",
).properties(width=500, height=500, title="% expat")

In [12]:
df = pd.read_csv("tier_positioning_recenttx.csv").fillna(0)

df["spaces_u"] = False
df.loc[df["n_spaces"] > 1, "spaces_u"] = True

df["atm_u"] = False
df.loc[df["n_atm"] > 3, "atm_u"] = True

df["total"] = df["n_ecomm"] + df["n_physical"]

df["spaces_user"] = df.groupby(["user_id"])["spaces_u"].transform("max")
df["atm_user"] = df.groupby(["user_id"])["atm_u"].transform("max")
df["first_month"] = df.groupby(["user_id"])["start_time"].transform("min")
# keep only users that were there on the first month
df = df.loc[pd.to_datetime(df["first_month"]) == pd.to_datetime("2020-01-01"), :]

# code up incentive structure
df["incentive"] = "No incentive"
df.loc[
    (df["atm_user"] == True) & (df["spaces_user"] == True), "incentive"
] = "Both ATM and Spaces"
df.loc[
    (df["atm_user"] == True) & (df["spaces_user"] == False), "incentive"
] = "Only ATM"
df.loc[
    (df["atm_user"] == False) & (df["spaces_user"] == True), "incentive"
] = "Only Spaces"

df = df.groupby(["incentive", "month"]).agg({"total": "median"}).reset_index()

alt.Chart(df).mark_line().encode(
    x=alt.X("month", axis=alt.Axis(title="Month")),
    y=alt.Y(
        "total:Q",
        axis=alt.Axis(
            title="Median number of card transactions (physical + ecommerce)"
        ),
    ),
    color="incentive",
).properties(
    width=500,
    height=500,
    title="Median number of card transactions (physical + ecommerce)",
)

In [13]:
df = pd.read_csv("tier_positioning_recenttx.csv").fillna(0)

df["spaces_u"] = False
df.loc[df["n_spaces"] > 1, "spaces_u"] = True

df["atm_u"] = False
df.loc[df["n_atm"] > 3, "atm_u"] = True

df["total"] = df["n_ecomm"] + df["n_physical"]

df["spaces_user"] = df.groupby(["user_id"])["spaces_u"].transform("max")
df["atm_user"] = df.groupby(["user_id"])["atm_u"].transform("max")
df["first_month"] = df.groupby(["user_id"])["start_time"].transform("min")
# keep only users that were there on the first month
df = df.loc[pd.to_datetime(df["first_month"]) == pd.to_datetime("2020-01-01"), :]

# code up incentive structure
df["incentive"] = "No incentive"
df.loc[
    (df["atm_user"] == True) & (df["spaces_user"] == True), "incentive"
] = "Both ATM and Spaces"
df.loc[
    (df["atm_user"] == True) & (df["spaces_user"] == False), "incentive"
] = "Only ATM"
df.loc[
    (df["atm_user"] == False) & (df["spaces_user"] == True), "incentive"
] = "Only Spaces"

df = df.groupby(["incentive"]).agg({"total_euro": "median"}).reset_index()

alt.Chart(df).mark_bar().encode(
    x=alt.X("incentive:N", axis=alt.Axis(title="Group")),
    y=alt.Y(
        "total_euro:Q", axis=alt.Axis(title="Median current total account balance")
    ),
).properties(width=500, height=500, title="Median current total account balance")

### 2.1. PNL by group 
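
In the query at the top of the notebook, `ucm` and `rev` are lifetime totals joined once per user, so they appear to repeat on every monthly row of the CSV. Averaging over rows would then weight each user by their number of active months. The sketch below contrasts a row-level mean with a per-user mean on toy data; whether the per-user view is the intended one is an assumption, and the values are made up.

```python
import pandas as pd

# Toy data: user 1 is active in three months, user 2 in one.
# ucm (net) and rev are lifetime totals, repeated on each monthly row.
toy = pd.DataFrame(
    {
        "user_id": [1, 1, 1, 2],
        "month": ["2020-05", "2020-06", "2020-07", "2020-07"],
        "ucm": [-30.0, -30.0, -30.0, 10.0],
        "rev": [20.0, 20.0, 20.0, 40.0],
    }
)

# Mean over rows: user 1 counts three times.
row_mean = toy["ucm"].mean()

# Mean over users: keep one row per user first.
per_user = toy.drop_duplicates("user_id")[["user_id", "ucm", "rev"]].copy()
per_user["cost"] = per_user["ucm"] - per_user["rev"]  # same definition as the cells below
user_mean = per_user["ucm"].mean()

print(row_mean, user_mean)
```

If the two numbers differ noticeably on the real data, the per-user version is probably the better match for a chart labelled "Avg lifetime PNL".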

In [24]:
df = pd.read_csv("tier_positioning_recenttx.csv").fillna(0)

df["spaces_u"] = False
df.loc[df["n_spaces"] > 1, "spaces_u"] = True

df["atm_u"] = False
df.loc[df["n_atm"] > 3, "atm_u"] = True

df["total"] = df["n_ecomm"] + df["n_physical"]

df["spaces_user"] = df.groupby(["user_id"])["spaces_u"].transform("max")
df["atm_user"] = df.groupby(["user_id"])["atm_u"].transform("max")
df["first_month"] = df.groupby(["user_id"])["start_time"].transform("min")

# code up incentive structure
df["incentive"] = "No incentive"
df.loc[
    (df["atm_user"] == True) & (df["spaces_user"] == True), "incentive"
] = "Both ATM and Spaces"
df.loc[
    (df["atm_user"] == True) & (df["spaces_user"] == False), "incentive"
] = "Only ATM"
df.loc[
    (df["atm_user"] == False) & (df["spaces_user"] == True), "incentive"
] = "Only Spaces"

df["cost"] = df["ucm"] - df["rev"]
df["revenue"] = df["rev"]
df["net"] = df["ucm"]

df = pd.melt(
    df, id_vars=["user_id", "incentive"], value_vars=["revenue", "cost", "net"]
)

df = df.groupby(["incentive", "variable"]).agg({"value": "mean"}).reset_index()

alt.Chart(df).mark_bar().encode(
    x=alt.X("variable:N", axis=alt.Axis(title="Group")),
    y=alt.Y("value:Q", axis=alt.Axis(title="Avg lifetime PNL")),
    color="variable",
).properties(width=200, height=200, title="Avg lifetime PNL").facet(
    facet="incentive", columns=2
)

In [16]:
df = pd.read_csv("tier_positioning_recenttx.csv").fillna(0)

df["spaces_u"] = False
df.loc[df["n_spaces"] > 1, "spaces_u"] = True

df["atm_u"] = False
df.loc[df["n_atm"] > 3, "atm_u"] = True

df["total"] = df["n_ecomm"] + df["n_physical"]

df["spaces_user"] = df.groupby(["user_id"])["spaces_u"].transform("max")
df["atm_user"] = df.groupby(["user_id"])["atm_u"].transform("max")
df["first_month"] = df.groupby(["user_id"])["start_time"].transform("min")

# code up incentive structure
df["incentive"] = "No incentive"
df.loc[
    (df["atm_user"] == True) & (df["spaces_user"] == True), "incentive"
] = "Both ATM and Spaces"
df.loc[
    (df["atm_user"] == True) & (df["spaces_user"] == False), "incentive"
] = "Only ATM"
df.loc[
    (df["atm_user"] == False) & (df["spaces_user"] == True), "incentive"
] = "Only Spaces"

df["cost"] = df["ucm"] - df["rev"]
df["revenue"] = df["rev"]
df["net"] = df["ucm"]

df["year"] = pd.to_datetime(df["kyc_first_completed"]).dt.year

df = pd.melt(
    df, id_vars=["user_id", "incentive", "year"], value_vars=["revenue", "cost", "net"]
)

df = df.groupby(["incentive", "variable", "year"]).agg({"value": "mean"}).reset_index()

alt.Chart(df).mark_line().encode(
    x=alt.X("year:N", axis=alt.Axis(title="KYC year")),
    y=alt.Y("value:Q", axis=alt.Axis(title="Avg. lifetime per user Euro value")),
    color="incentive",
    column="variable",
).properties(
    width=300, height=300, title="Avg. lifetime per user Euro value by KYC year"
)

In [25]:
df = pd.read_csv("tier_positioning_recenttx.csv").fillna(0)

df["spaces_u"] = False
df.loc[df["n_spaces"] > 1, "spaces_u"] = True

df["atm_u"] = False
df.loc[df["n_atm"] > 3, "atm_u"] = True

df["total"] = df["n_ecomm"] + df["n_physical"]

df["spaces_user"] = df.groupby(["user_id"])["spaces_u"].transform("max")
df["atm_user"] = df.groupby(["user_id"])["atm_u"].transform("max")
df["first_month"] = df.groupby(["user_id"])["start_time"].transform("min")

# code up incentive structure
df["incentive"] = "No incentive"
df.loc[
    (df["atm_user"] == True) & (df["spaces_user"] == True), "incentive"
] = "Both ATM and Spaces"
df.loc[
    (df["atm_user"] == True) & (df["spaces_user"] == False), "incentive"
] = "Only ATM"
df.loc[
    (df["atm_user"] == False) & (df["spaces_user"] == True), "incentive"
] = "Only Spaces"

df["net"] = round(df["ucm"], -1)
df["year"] = pd.to_datetime(df["kyc_first_completed"]).dt.year

df = df.groupby(["incentive", "net", "year"])["user_id"].agg("nunique").reset_index()

df["perc"] = (
    100 * df["user_id"] / df.groupby(["incentive", "year"])["user_id"].transform("sum")
)
df["cum"] = df.groupby(["incentive", "year"])["perc"].cumsum()

alt.Chart(df.loc[(abs(df["net"]) < 150) & (df["year"] >= 2017), :]).mark_line().encode(
    x=alt.X("net:N", axis=alt.Axis(title="Net PNL lifetime")),
    y=alt.Y("cum:Q", axis=alt.Axis(title="Percentile")),
    color="incentive",
).properties(
    width=300, height=300, title="Distribution of lifetime PNL by KYC year"
).facet(
    facet="year", columns=2
)

In [41]:
alt.Chart(df.loc[(abs(df["net"]) < 150) & (df["year"] >= 2017), :]).mark_line().encode(
    x=alt.X("net:N", axis=alt.Axis(title="Net PNL lifetime")),
    y=alt.Y("cum:Q", axis=alt.Axis(title="Percentile")),
    color="incentive",
).properties(
    width=400, height=300, title="Distribution of lifetime PNL by KYC year"
).facet(
    facet="year", columns=2
)

In [28]:
df = pd.read_csv("tier_positioning_recenttx.csv").fillna(0)

df["spaces_u"] = False
df.loc[df["n_spaces"] > 1, "spaces_u"] = True

df["atm_u"] = False
df.loc[df["n_atm"] > 3, "atm_u"] = True

df["total"] = df["n_ecomm"] + df["n_physical"]

df["spaces_user"] = df.groupby(["user_id"])["spaces_u"].transform("max")
df["atm_user"] = df.groupby(["user_id"])["atm_u"].transform("max")
df["first_month"] = df.groupby(["user_id"])["start_time"].transform("min")

# code up incentive structure
df["incentive"] = "No incentive"
df.loc[
    (df["atm_user"] == True) & (df["spaces_user"] == True), "incentive"
] = "Both ATM and Spaces"
df.loc[
    (df["atm_user"] == True) & (df["spaces_user"] == False), "incentive"
] = "Only ATM"
df.loc[
    (df["atm_user"] == False) & (df["spaces_user"] == True), "incentive"
] = "Only Spaces"

df["net"] = round(df["ucm"], -1)

df = df.groupby(["incentive", "net"])["user_id"].agg("nunique").reset_index()

df["perc"] = 100 * df["user_id"] / df.groupby(["incentive"])["user_id"].transform("sum")
df["cum"] = df.groupby(["incentive"])["perc"].cumsum()

alt.Chart(df.loc[abs(df["net"]) < 100, :]).mark_line().encode(
    x=alt.X("net:N", axis=alt.Axis(title="Net PNL lifetime")),
    y=alt.Y("cum:Q", axis=alt.Axis(title="Percentile")),
    color="incentive",
).properties(width=400, height=400, title="Distribution of lifetime PNL")

In [25]:
df = pd.read_csv("tier_positioning_recenttx.csv").fillna(0)

df["spaces_u"] = False
df.loc[df["n_spaces"] > 0, "spaces_u"] = True

df["atm_u"] = False
df.loc[df["n_atm"] > 2, "atm_u"] = True

df["total"] = df["n_ecomm"] + df["n_physical"]

df["spaces_user"] = df.groupby(["user_id"])["spaces_u"].transform("max")
df["atm_user"] = df.groupby(["user_id"])["atm_u"].transform("max")
df["first_month"] = df.groupby(["user_id"])["start_time"].transform("min")

# keep only users that were there for the entire year of 2019.
df = df.loc[
    pd.to_datetime(df["kyc_first_completed"]) <= pd.to_datetime("2019-01-01"), :
]

# code up incentive structure
df["incentive"] = "No incentive"
df.loc[
    (df["atm_user"] == True) & (df["spaces_user"] == True), "incentive"
] = "Both ATM and Spaces"
df.loc[
    (df["atm_user"] == True) & (df["spaces_user"] == False), "incentive"
] = "Only ATM"
df.loc[
    (df["atm_user"] == False) & (df["spaces_user"] == True), "incentive"
] = "Only Spaces"

df["net"] = round(df["ucm"], -1)

df = df.groupby(["incentive", "net"])["user_id"].agg("nunique").reset_index()

df["perc"] = 100 * df["user_id"] / df.groupby(["incentive"])["user_id"].transform("sum")
df["cum"] = df.groupby(["incentive"])["perc"].cumsum()

alt.Chart(df.loc[abs(df["net"]) < 150, :]).mark_line().encode(
    x=alt.X("net:N", axis=alt.Axis(title="Net PNL per user")),
    y=alt.Y("perc:Q", axis=alt.Axis(title="% of users")),
    color="incentive",
).properties(width=400, height=400, title="Distribution of net PNL")

In [1]:
query = """

select cmd_users.id as user_id, 
		month, 
		product_group,
		revenue_cost,
		sum(value::float/100) as value 
from dbt.ucm_pnl as pnl  
inner join dbt.ucm_mapping as map on map.label = pnl.label  
inner join cmd_users using (user_created) 
group by 1,2,3,4
order by 1,2,3,4 


"""

In [1]:
df = pd.read_csv("tier_positioning_recenttx.csv").fillna(0)

df["spaces_u"] = False
df.loc[df["n_spaces"] > 1, "spaces_u"] = True

df["atm_u"] = False
df.loc[df["n_atm"] > 3, "atm_u"] = True

df["total"] = df["n_ecomm"] + df["n_physical"]

df["spaces_user"] = df.groupby(["user_id"])["spaces_u"].transform("max")
df["atm_user"] = df.groupby(["user_id"])["atm_u"].transform("max")
df["first_month"] = df.groupby(["user_id"])["start_time"].transform("min")

# keep only users that were there for the entire year of 2019.
df = df.loc[
    pd.to_datetime(df["kyc_first_completed"]) <= pd.to_datetime("2019-01-01"), :
]

# code up incentive structure
df["incentive"] = "No incentive"
df.loc[
    (df["atm_user"] == True) & (df["spaces_user"] == True), "incentive"
] = "Both ATM and Spaces"
df.loc[
    (df["atm_user"] == True) & (df["spaces_user"] == False), "incentive"
] = "Only ATM"
df.loc[
    (df["atm_user"] == False) & (df["spaces_user"] == True), "incentive"
] = "Only Spaces"

# go to unique level on user-id
df["rn"] = df.groupby(["user_id"]).cumcount()
df = df.loc[df["rn"] == 0, :]

# load ucm items.
ucm = pd.read_csv("ucm_panel.csv")

# aggregate across time.
ucm = (
    ucm.groupby(["user_id", "product_group", "revenue_cost"])["value"]
    .agg("sum")
    .reset_index()
)

final = df.merge(ucm, on=["user_id"], how="inner")

# pivot table to wide and add zeroes
final = pd.pivot_table(
    final,
    values="value",
    index=["user_id", "incentive", "revenue_cost"],
    columns="product_group",
    fill_value=0,
).reset_index()
# melt table
final = pd.melt(final, id_vars=["user_id", "incentive", "revenue_cost"])

# aggregate to user.
final = final.groupby(["incentive", "user_id", "product_group", "revenue_cost"])[
    "value"
].agg(
    "mean"
)  # compute revenue and cost per user.

final.to_csv("tier_positioning_mergeducm.csv")

In [18]:
df = pd.read_csv("tier_positioning_mergeducm.csv")

df = (
    df.groupby(["incentive", "revenue_cost", "product_group"])["value"]
    .agg("mean")
    .reset_index()
)

df = df.loc[df["revenue_cost"] == "Cost", :]

alt.Chart(df).mark_bar().encode(
    x=alt.X(
        "incentive",
        sort=alt.EncodingSortField(field="value", op="mean", order="descending"),
    ),
    y=alt.Y("value:Q", axis=alt.Axis(title="Avg. lifetime cost per user")),
    color="incentive",
).properties(width=200, height=100).facet(facet="product_group", columns=3)

In [19]:
df = pd.read_csv("tier_positioning_mergeducm.csv")

df = (
    df.groupby(["incentive", "revenue_cost", "product_group"])["value"]
    .agg("mean")
    .reset_index()
)
df = df.loc[df["revenue_cost"] != "Cost", :]

alt.Chart(df).mark_bar().encode(
    x=alt.X(
        "incentive",
        sort=alt.EncodingSortField(field="value", op="mean", order="descending"),
    ),
    y=alt.Y("value:Q", axis=alt.Axis(title="Avg. lifetime revenue")),
    color="incentive",
).properties(width=200, height=100).facet(facet="product_group", columns=3)

In [None]:
query = """
select cmd_users.id as user_id, 
		month, 
		sum(value::float/100) as net, 
		sum(case when value > 0 then value::float/100 end) as rev 
from dbt.ucm_pnl 
inner join cmd_users using (user_created) 
group by 1,2 
"""

In [None]:
df = pd.read_csv("tier_positioning_recenttx.csv").fillna(0)

df["spaces_u"] = False
df.loc[df["n_spaces"] > 0, "spaces_u"] = True

df["atm_u"] = False
df.loc[df["n_atm"] > 2, "atm_u"] = True

df["total"] = df["n_ecomm"] + df["n_physical"]

df["spaces_user"] = df.groupby(["user_id"])["spaces_u"].transform("max")
df["atm_user"] = df.groupby(["user_id"])["atm_u"].transform("max")
df["first_month"] = df.groupby(["user_id"])["start_time"].transform("min")

# keep only users that were there for the entire year of 2019.
df = df.loc[
    pd.to_datetime(df["kyc_first_completed"]) <= pd.to_datetime("2019-01-01"), :
]

# code up incentive structure
df["incentive"] = "No incentive"
df.loc[
    (df["atm_user"] == True) & (df["spaces_user"] == True), "incentive"
] = "Both ATM and Spaces"
df.loc[
    (df["atm_user"] == True) & (df["spaces_user"] == False), "incentive"
] = "Only ATM"
df.loc[
    (df["atm_user"] == False) & (df["spaces_user"] == True), "incentive"
] = "Only Spaces"

# go to unique level on user-id
df["rn"] = df.groupby(["user_id"]).cumcount()
df = df.loc[df["rn"] == 0, :]

# merge to pnl panel

In [30]:
query = """
with ucm as ( 

select user_created, sum(case when month like '2018-%' then value::float/100 end) as ucm_2018,
					 sum(case when month like '2019-%' then value::float/100 end) as ucm_2019 
from dbt.ucm_pnl
group by 1 
)

select 
b.user_id,
c.user_created,
a.*,
ucm_2018, 
ucm_2019, 
c.product_id as current_product, 
count(b.user_id) over (partition by a.cluster_new) as cluster_new_size
from dev_dbt.user_clusters a 
left join dev_dbt.user_clusters_mapping b 
on a.id = b.id
left join dbt.zrh_users c 
on b.user_id = c.user_id
left join ucm on ucm.user_created = c.user_created;

"""

### Part 2.2. Leverage existing clustering work by Wendy


The second piece of this research builds on the user clustering that Wendy Vu conducted in spring 2020. For details on how she used financial transaction behavior to cluster users into groups, see [her slide deck](https://docs.google.com/presentation/d/13Diykdi_HBRrUf_rTiLXcH29OzuSQmRbJO32XQJSnkM/edit#slide=id.g76cb8b17e3_0_485).

Let's first get a feel for our sample.
- We have users from all core EU markets in both premium and standard products.
- All users in the sample completed KYC in September/October 2018.
- In replication studies we found these cluster groups to be robust for cohorts of similar age.


In [116]:
df = pd.read_csv("tier_positioning.csv")
mapping = {
    1: "Primary account",
    2: "Spaces power user",
    3: "Secondary spender",
    4: "International traveller",
    5: "Euro traveller",
    6: "Cash26er",
    7: "Unconvinced",
    8: "Holding account",
    9: "Barely active",
    10: "Referral&Run",
}
df["cluster_str"] = df["cluster_new"].map(mapping)

# first understand the sample.

df = df.groupby(["market", "membership"])["user_id"].agg("nunique").reset_index()

alt.Chart(df).mark_bar().encode(
    x=alt.X("market:N", axis=alt.Axis(title="market")),
    y=alt.Y("user_id:Q", axis=alt.Axis(title="Number of users")),
    color="membership:N",
).properties(width=500, height=500, title="Sample")

### What types of user groups do we see in the transaction behavior? 
This section subsets to "Standard" customers only, as they are predominantly affected by the introduction of the new tiers.

- 11% of users are primary account spenders (most of them in Germany, for these cohorts).
- 22% of users are regular secondary account spenders.
- 8% are international or euro travellers.

These are our most active customer groups.

In [117]:
df = pd.read_csv("tier_positioning.csv")
mapping = {
    1: "Primary account",
    2: "Spaces power user",
    3: "Secondary spender",
    4: "International traveller",
    5: "Euro traveller",
    6: "Cash26er",
    7: "Unconvinced",
    8: "Holding account",
    9: "Barely active",
    10: "Referral&Run",
}
df["cluster_str"] = df["cluster_new"].map(mapping)

# only standard users
df = df.loc[df["membership"] == "standard", :]

# show distribution of clusters
df = df.groupby(["cluster_str", "cluster_new"])["user_id"].agg("nunique").reset_index()
df["group"] = 1
df["perc"] = 100 * df["user_id"] / df.groupby(["group"])["user_id"].transform("sum")

alt.Chart(df).mark_bar().encode(
    x=alt.X(
        "cluster_str:N",
        axis=alt.Axis(title="Cluster"),
        sort=alt.EncodingSortField(
            field="cluster_new", op="average", order="ascending"
        ),
    ),
    y=alt.Y("perc:Q", axis=alt.Axis(title="% of sample")),
    color="cluster_str:N",
).properties(width=500, height=500, title="Distribution of user clusters in sample")

... by market (note that sample sizes are not always large enough for this analysis to make sense).

In [56]:
df = pd.read_csv("tier_positioning.csv")
mapping = {
    1: "Primary account",
    2: "Spaces power user",
    3: "Secondary spender",
    4: "International traveller",
    5: "Euro traveller",
    6: "Cash26er",
    7: "Unconvinced",
    8: "Holding account",
    9: "Barely active",
    10: "Referral&Run",
}
df["cluster_str"] = df["cluster_new"].map(mapping)

# only standard users
df = df.loc[df["membership"] == "standard", :]

# show distribution of clusters
df = (
    df.groupby(["cluster_str", "cluster_new", "market"])["user_id"]
    .agg("nunique")
    .reset_index()
)
df["perc"] = 100 * df["user_id"] / df.groupby(["market"])["user_id"].transform("sum")

alt.Chart(df.loc[df["market"] != "GBR"]).mark_bar().encode(
    x=alt.X(
        "cluster_str:N",
        axis=alt.Axis(title="Cluster"),
        sort=alt.EncodingSortField(
            field="cluster_new", op="average", order="ascending"
        ),
    ),
    y=alt.Y("perc:Q", axis=alt.Axis(title="% of sample")),
    color="cluster_str:N",
).properties(
    width=200, height=200, title="Distribution of user clusters in sample"
).facet(
    facet="market", columns=3
)

### Are all user groups equally profitable? Correlate cluster with PNL 

Financially, the two most important groups are 
- primary account users (frequent transactions generating domestic interchange revenue) 
- international travellers (frequent transactions abroad generating high international interchange revenue).

In [49]:
df = pd.read_csv("tier_positioning.csv")
mapping = {
    1: "Primary account",
    2: "Spaces power user",
    3: "Secondary spender",
    4: "International traveller",
    5: "Euro traveller",
    6: "Cash26er",
    7: "Unconvinced",
    8: "Holding account",
    9: "Barely active",
    10: "Referral&Run",
}
df["cluster_str"] = df["cluster_new"].map(mapping)

# only standard users
df = df.loc[df["membership"] == "standard", :]

# average 2019 PNL per user in each cluster
df = df.groupby(["cluster_str", "cluster_new"])["ucm_2019"].agg("mean").reset_index()

alt.Chart(df).mark_bar().encode(
    x=alt.X(
        "cluster_str:N",
        axis=alt.Axis(title="Cluster"),
        sort=alt.EncodingSortField(
            field="cluster_new", op="average", order="ascending"
        ),
    ),
    y=alt.Y("ucm_2019:Q", axis=alt.Axis(title="Avg. pnl of customers in 2019 in Euro")),
    color="cluster_str:N",
).properties(
    width=500, height=500, title="Avg. per user profitability (PNL) by cluster"
)

In [50]:
df = pd.read_csv("tier_positioning.csv")
mapping = {
    1: "Primary account",
    2: "Spaces power user",
    3: "Secondary spender",
    4: "International traveller",
    5: "Euro traveller",
    6: "Cash26er",
    7: "Unconvinced",
    8: "Holding account",
    9: "Barely active",
    10: "Referral&Run",
}
df["cluster_str"] = df["cluster_new"].map(mapping)

# only standard users
df = df.loc[df["membership"] == "standard", :]

# total 2019 PNL in each cluster
df = df.groupby(["cluster_str", "cluster_new"])["ucm_2019"].agg("sum").reset_index()

alt.Chart(df).mark_bar().encode(
    x=alt.X(
        "cluster_str:N",
        axis=alt.Axis(title="Cluster"),
        sort=alt.EncodingSortField(
            field="cluster_new", op="average", order="ascending"
        ),
    ),
    y=alt.Y("ucm_2019:Q", axis=alt.Axis(title="Sum of customer PNL in 2019 in Euro")),
    color="cluster_str:N",
).properties(width=500, height=500, title="Sum of profitability (PNL) by cluster")

### Users are predominantly in their 30s.

In [48]:
df = pd.read_csv("tier_positioning.csv")
mapping = {
    1: "Primary account",
    2: "Spaces power user",
    3: "Secondary spender",
    4: "International traveller",
    5: "Euro traveller",
    6: "Cash26er",
    7: "Unconvinced",
    8: "Holding account",
    9: "Barely active",
    10: "Referral&Run",
}
df["cluster_str"] = df["cluster_new"].map(mapping)

# only standard users
df = df.loc[df["membership"] == "standard", :]

# average age in each cluster
df = df.groupby(["cluster_str", "cluster_new"])["age"].agg("mean").reset_index()

alt.Chart(df).mark_bar().encode(
    x=alt.X(
        "cluster_str:N",
        axis=alt.Axis(title="Cluster"),
        sort=alt.EncodingSortField(
            field="cluster_new", op="average", order="ascending"
        ),
    ),
    y=alt.Y("age:Q", axis=alt.Axis(title="Avg. age")),
    color="cluster_str:N",
).properties(width=500, height=500, title="Avg. age by cluster")

### The share of female users ranges from 30% to 40% across clusters.

In [53]:
df = pd.read_csv("tier_positioning.csv")
mapping = {
    1: "Primary account",
    2: "Spaces power user",
    3: "Secondary spender",
    4: "International traveller",
    5: "Euro traveller",
    6: "Cash26er",
    7: "Unconvinced",
    8: "Holding account",
    9: "Barely active",
    10: "Referral&Run",
}
df["cluster_str"] = df["cluster_new"].map(mapping)

# only standard users
df = df.loc[df["membership"] == "standard", :]

df["female"] = 0
df.loc[df["gender"] == "FEMALE", "female"] = 1

# share of female users in each cluster
df = df.groupby(["cluster_str", "cluster_new"])["female"].agg("mean").reset_index()

alt.Chart(df).mark_bar().encode(
    x=alt.X(
        "cluster_str:N",
        axis=alt.Axis(title="Cluster"),
        sort=alt.EncodingSortField(
            field="cluster_new", op="average", order="ascending"
        ),
    ),
    y=alt.Y("female:Q", axis=alt.Axis(format="%", title="% female")),
    color="cluster_str:N",
).properties(width=500, height=500, title="% female")

### Primary account users are mostly expats (60%), while international travellers are mostly natives of their country (only 12% expats).

In [54]:
df = pd.read_csv("tier_positioning.csv")
mapping = {
    1: "Primary account",
    2: "Spaces power user",
    3: "Secondary spender",
    4: "International traveller",
    5: "Euro traveller",
    6: "Cash26er",
    7: "Unconvinced",
    8: "Holding account",
    9: "Barely active",
    10: "Referral&Run",
}
df["cluster_str"] = df["cluster_new"].map(mapping)

# only standard users
df = df.loc[df["membership"] == "standard", :]

df["expat"] = 0
df.loc[df["nat_status"] == "expat", "expat"] = 1

# share of expats in each cluster
df = df.groupby(["cluster_str", "cluster_new"])["expat"].agg("mean").reset_index()

alt.Chart(df).mark_bar().encode(
    x=alt.X(
        "cluster_str:N",
        axis=alt.Axis(title="Cluster"),
        sort=alt.EncodingSortField(
            field="cluster_new", op="average", order="ascending"
        ),
    ),
    y=alt.Y("expat:Q", axis=alt.Axis(format="%", title="% expat")),
    color="cluster_str:N",
).properties(width=500, height=500, title="% expat")

## How do users who have an incentive to upgrade differ from those who don't?

- Those who have an incentive to upgrade are predominantly highly active users.
    - 73% are in the three highest-activity clusters: primary account (24%), spaces power user (19%) and secondary spender (30%).
    - By comparison, 47% of those without a concrete incentive to upgrade fall into these high-activity groups.
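
The combined share of the three high-activity clusters can be computed directly per incentive group. The sketch below assumes one row per user with the columns `incentive` and `cluster_new`, as in `final.csv`. The toy rows and the name `HIGH_ACTIVITY` are illustrative only and do not reproduce the figures quoted above.

```python
import pandas as pd

# Toy rows for illustration only; the real input would be final.csv.
toy = pd.DataFrame(
    {
        "incentive": ["Only ATM"] * 4 + ["No incentive"] * 4,
        "cluster_new": [1, 2, 3, 9, 1, 3, 7, 9],
        "user_id": ["u1", "u2", "u3", "u4", "u5", "u6", "u7", "u8"],
    }
)

# Clusters 1-3: primary account, spaces power user, secondary spender.
HIGH_ACTIVITY = [1, 2, 3]

# Flag each user, then average the flag within each incentive group.
toy["high_activity"] = toy["cluster_new"].isin(HIGH_ACTIVITY)
share = toy.groupby("incentive")["high_activity"].mean().mul(100)
print(share)
```

Because the mean is taken over rows, this only gives a per-user share if each user appears exactly once.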


In [20]:
df1 = pd.read_csv("tier_positioning_recenttx.csv")

df1["spaces_user"] = False
df1.loc[df1["n_spaces"] > 1, "spaces_user"] = True

df1["atm_user"] = False
df1.loc[df1["n_atm"] > 3, "atm_user"] = True

df1 = (
    df1.groupby(["user_id"])
    .agg({"spaces_user": "max", "atm_user": "max", "start_time": "min"})
    .reset_index()
)

# drop too young customers: keep only users whose first observed month is January 2020.
df1 = df1.loc[pd.to_datetime(df1["start_time"]) == pd.to_datetime("2020-01-01"), :]

# load clusters.
df2 = pd.read_csv("tier_positioning.csv")
mapping = {
    1: "Primary account",
    2: "Spaces power user",
    3: "Secondary spender",
    4: "International traveller",
    5: "Euro traveller",
    6: "Cash26er",
    7: "Unconvinced",
    8: "Holding account",
    9: "Barely active",
    10: "Referral&Run",
}
df2["cluster_str"] = df2["cluster_new"].map(mapping)

final = df1.merge(df2, on="user_id", how="inner")

final["incentive"] = "No incentive"
final.loc[
    (final["atm_user"] == True) & (final["spaces_user"] == True), "incentive"
] = "Both ATM and Spaces"
final.loc[
    (final["atm_user"] == True) & (final["spaces_user"] == False), "incentive"
] = "Only ATM"
final.loc[
    (final["atm_user"] == False) & (final["spaces_user"] == True), "incentive"
] = "Only Spaces"

final.to_csv("final.csv")

In [21]:
final = pd.read_csv("final.csv")

# show distribution of clusters
final = (
    final.groupby(["cluster_str", "cluster_new", "incentive"])["user_id"]
    .agg("nunique")
    .reset_index()
)
final["perc"] = (
    100 * final["user_id"] / final.groupby(["incentive"])["user_id"].transform("sum")
)

alt.Chart(final).mark_bar().encode(
    x=alt.X(
        "cluster_str:N",
        axis=alt.Axis(title="Cluster"),
        sort=alt.EncodingSortField(
            field="cluster_new", op="average", order="ascending"
        ),
    ),
    y=alt.Y("perc:Q", axis=alt.Axis(title="% of group")),
    color="cluster_str:N",
).properties(width=300, height=300, title="% of users in cluster").facet(
    facet="incentive", columns=2
)


In [23]:
final.loc[final["cluster_new"] <= 3, :].head(30)

Unnamed: 0,cluster_str,cluster_new,incentive,user_id,perc
20,Primary account,1,Both ATM and Spaces,238,44.402985
21,Primary account,1,No incentive,759,12.658439
22,Primary account,1,Only ATM,324,29.562044
23,Primary account,1,Only Spaces,296,20.598469
27,Secondary spender,3,Both ATM and Spaces,101,18.843284
28,Secondary spender,3,No incentive,2235,37.27485
29,Secondary spender,3,Only ATM,373,34.032847
30,Secondary spender,3,Only Spaces,350,24.356298
31,Spaces power user,2,Both ATM and Spaces,139,25.932836
32,Spaces power user,2,No incentive,276,4.603069


In [23]:
final = pd.read_csv("final.csv")

final["ucm_round"] = round(final["ucm_2019"], -1)

final = (
    final.groupby(["incentive", "ucm_round"])["user_id"].agg("nunique").reset_index()
)

final["perc"] = (
    100 * final["user_id"] / final.groupby(["incentive"])["user_id"].transform("sum")
)
final["cum"] = final.groupby(["incentive"])["perc"].cumsum()

alt.Chart(
    final.loc[(final["ucm_round"] < 200) & (final["ucm_round"] > -100), :]
).mark_line().encode(
    x=alt.X("ucm_round:Q", axis=alt.Axis(title="UCM total 2019")),
    y=alt.Y("cum:Q", axis=alt.Axis(title="Percentile")),
    color="incentive:N",
).properties(
    width=400, height=400, title="2019 PNL by user"
)

In [21]:
final = pd.read_csv("final.csv")

final["ucm_round"] = round(final["ucm_2019"], -1)

final = final.groupby(["incentive", "market"])["ucm_2019"].agg("mean").reset_index()

alt.Chart(final.loc[final["market"] != "GBR", :]).mark_bar().encode(
    x=alt.X("incentive:N", axis=alt.Axis(title="Group")),
    y=alt.Y("ucm_2019:Q", axis=alt.Axis(title="Average contribution margin (2019)")),
    color="incentive",
).properties(
    width=200, height=200, title="Average contribution margin per user in 2019"
).facet(
    facet="market:N", columns=3
)

In [13]:
final = pd.read_csv("final.csv")
final["market"]

0       FRA
1       AUT
2       RoE
3       DEU
4       DEU
       ... 
9051    AUT
9052    FRA
9053    DEU
9054    FRA
9055    AUT
Name: market, Length: 9056, dtype: object

In [29]:
final = pd.read_csv("final.csv")
final.shape

(8870, 156)

In [30]:
df = pd.read_csv("tier_positioning.csv")

df.shape

(20000, 150)

### Matching James Crease's user IDs to mine.

In [3]:
df = pd.read_csv("tier_positioning_recenttx.csv").fillna(0)

df["atm_u"] = False
df.loc[df["n_atm"] > 2, "atm_u"] = True

df["total"] = df["n_ecomm"] + df["n_physical"]

df["n_spaces"] = df.groupby(["user_id"])["n_spaces"].transform("max")
df["atm_user"] = df.groupby(["user_id"])["atm_u"].transform("max")
df["first_month"] = df.groupby(["user_id"])["start_time"].transform("min")

# keep only users that were there for the entire year of 2019.
df = df.loc[
    pd.to_datetime(df["kyc_first_completed"]) <= pd.to_datetime("2019-01-01"), :
]

# code up incentive structure
df["incentive_basic"] = "No incentive"
df.loc[
    (df["atm_user"] == True) & (df["n_spaces"] == 1), "incentive_basic"
] = "Both ATM and Spaces (1)"
df.loc[
    (df["atm_user"] == True) & (df["n_spaces"] > 1), "incentive_basic"
] = "Both ATM and Spaces (>1)"

df.loc[(df["atm_user"] == True) & (df["n_spaces"] == 0), "incentive_basic"] = "Only ATM"
df.loc[
    (df["atm_user"] == False) & (df["n_spaces"] == 1), "incentive_basic"
] = "Only Spaces (1)"
df.loc[
    (df["atm_user"] == False) & (df["n_spaces"] > 1), "incentive_basic"
] = "Only Spaces (>1)"

# go to unique level on user-id
df["rn"] = df.groupby(["user_id"]).cumcount()
df = df.loc[df["rn"] == 0, :]

# crease ids
crease = pd.read_csv("Segmentation Participants IDs - ID Mapping.csv")

crease["Corrected"] = crease["Corrected"].str.strip()

final = df.loc[:, ["user_id", "incentive_basic"]].merge(
    crease, left_on="user_id", right_on="Corrected", how="inner"
)

final.to_csv("tier_positioning_survey_19082020.csv")
print(final.shape)

(638, 9)
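
To see which survey IDs fail to match, one option is a left merge from the survey side with `indicator=True`. This is a minimal sketch on toy frames; the column names `user_id`, `incentive_basic` and `Corrected` follow the cell above, while the IDs and the name `audit` are made up for illustration.

```python
import pandas as pd

# Toy frames for illustration only; the IDs are made up.
users = pd.DataFrame(
    {
        "user_id": ["id-1", "id-2", "id-3"],
        "incentive_basic": ["Only ATM", "No incentive", "Only Spaces (1)"],
    }
)
survey = pd.DataFrame({"Corrected": [" id-1", "id-2 ", "id-9"]})

# Strip stray whitespace before joining, as in the cell above.
survey["Corrected"] = survey["Corrected"].str.strip()

# A left merge keeps every survey row; `_merge` records whether it matched.
audit = survey.merge(
    users, left_on="Corrected", right_on="user_id", how="left", indicator=True
)
unmatched = audit.loc[audit["_merge"] == "left_only", "Corrected"]
print(f"matched {len(audit) - len(unmatched)} of {len(audit)} survey IDs")
print(unmatched.tolist())
```

An inner merge silently drops unmatched participants, so an audit like this makes the loss visible before the file is shared.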


In [12]:
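# crease ids (strip stray whitespace so the join keys match)
crease = pd.read_csv("Segmentation Participants IDs - ID Mapping.csv")
crease["Corrected"] = crease["Corrected"].str.strip()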

final = df.loc[:, ["user_id", "incentive_basic"]].merge(
    crease, left_on="user_id", right_on="Corrected", how="inner"
)

final.to_csv("tier_positioning_survey_19082020.csv")