title: Membership subscription payments - deep dive   
author: Fabio Schmidt-Fischbach   
date: 2020-09-09   
region: EU   
summary: In theory, we should collect 12 monthly fees from each premium user, but in practice it is not that simple. Not all users are charged 12 times (employees, discounts, dunning), and not all charges are paid. The goal is to explore the repayment dynamics of our premium users, to measure premium health with actual subscription payments rather than the dunning process, and to provide a reliable way to estimate how much revenue we should expect from premium users. For example, Metal users who joined a year ago paid on average 8 euros per month (rather than 17 euros). 50-60% of new premium users fall into dunning on the initial payment. Recovery rates for users who fall into dunning at some later point are much higher than for users who fall into dunning on the first payment.
link: https://docs.google.com/presentation/d/1Ub_jahc-3BRZJAjp9eqKRlKjOFAOdkjytSG0Ogz7u70/edit?usp=sharing   
tags: memberships, finance, premium, metal, you, dunning, arrears   

In [None]:
import pandas as pd
import altair as alt
import os
import io
from google.colab import files
import numpy as np

In [3]:
# Code to read csv file into Colaboratory:
!pip install -U -q PyDrive
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials

# Authenticate and create the PyDrive client.
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)

## Evaluate zrh_subscription_payments

In [None]:
query = """
select *
from dbt.zrh_subscription_payments
where subscription_valid_from >= '2019-01-01'
"""

In [4]:
link = (
    "https://drive.google.com/file/d/1vbpMqHcP-ggUiKS_sKI3Q4-LvnOn3zVy/view?usp=sharing"
)
# file id taken from the link above (named file_id to avoid shadowing the builtin id)
file_id = "1vbpMqHcP-ggUiKS_sKI3Q4-LvnOn3zVy"

downloaded = drive.CreateFile({"id": file_id})
downloaded.GetContentFile("Filename.csv")
df = pd.read_csv("Filename.csv")

## Model QA

1. What % of users are charged?
  - market
  - product
  - entry flow
  - month of sub
  - depending on how they exited

2. What % of users fall into arrears?
  - market
  - product
  - entry flow
  - month of sub

3. What % of arrears transactions are recovered?
  - all of the dimensions above
  - how long does it take? E.g. the recovery probability given that the payment has not been recovered after t days.
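
The last question can be made concrete with a small sketch. It assumes a table with a `days_delay` column (days until an arrears payment was recovered, missing if it never was), mirroring the column used later in this notebook. The toy data and the helper name `recovery_prob_after` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Toy arrears cases (made-up data): days until recovery, NaN = never recovered.
arrears_cases = pd.DataFrame(
    {"days_delay": [2, 5, 10, 40, np.nan, np.nan, 90, np.nan]}
)


def recovery_prob_after(cases: pd.DataFrame, t: int) -> float:
    """Share of cases still open after t days that are recovered later."""
    delay = cases["days_delay"]
    # Still open after t days: recovered later than t, or never recovered.
    still_open = delay.isna() | (delay > t)
    if still_open.sum() == 0:
        return float("nan")
    recovered_later = (delay > t).sum()
    return recovered_later / still_open.sum()


for t in (0, 7, 30):
    print(t, round(recovery_prob_after(arrears_cases, t), 2))
```

This simple ratio treats every unrecovered case as final. Cases that only recently fell into arrears may still be recovered, so recent cohorts would look worse than they are.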



## Business metrics

1. Expected monthly recurring revenue 
  - show distributions across various dimensions.
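
As a rough illustration of this metric, the sketch below computes the average monthly revenue per subscription on toy data. It assumes the column names used later in this notebook (`product_key`, `product_id`, `amount_cents`, `paid`). The amounts and the three-month horizon are made up to keep the example short.

```python
import pandas as pd

# Toy payment rows (made-up data): one row per scheduled monthly payment.
payments = pd.DataFrame(
    {
        "product_key": ["a"] * 3 + ["b"] * 3,
        "product_id": ["BLACK_CARD_MONTHLY"] * 6,
        "amount_cents": [990] * 6,  # hypothetical fee
        "paid": [True, True, False, True, False, False],
    }
)

# Unpaid fees contribute no revenue.
payments["revenue_cents"] = payments["amount_cents"].where(payments["paid"], 0)

# Total revenue per subscription, then the average across subscriptions,
# spread over the number of scheduled months (3 here, 12 in the real data).
n_months = 3
per_subscription = payments.groupby(["product_id", "product_key"])[
    "revenue_cents"
].sum()
monthly_euro = per_subscription.groupby(level="product_id").mean() / n_months / 100
print(monthly_euro)
```

Averaging per subscription first, and only then per product, keeps subscriptions with many unpaid fees from being weighted differently from fully paid ones.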

  

### 1. What % of users are charged 

With 100% retention, we would charge every customer all 12 monthly fees.

How close to 100% do we get? 



In [25]:
data = df

# trim market
data["market"] = data["market"].str.strip()

# drop UK
data = data.loc[data["market"] != "GBR", :]

data["month"] = pd.to_datetime(data["subscription_valid_from"]).dt.to_period("M")

# keep only subscriptions that started before 2019-09-01 (their 12 months should be over by now).
data = data.loc[data["subscription_valid_from"] < "2019-09-01", :]

data = (
    data.groupby(["product_id", "month", "enter_reason"])["charged"]
    .agg("mean")
    .reset_index()
)

data["month"] = data["month"].astype(str)

alt.Chart(
    data.loc[
        (data["enter_reason"] != "DOWNGRADED") & (data["enter_reason"] != "RENEWED"), :
    ]
).mark_line().encode(
    x=alt.X("month:N", axis=alt.Axis(title="Month of sub start")),
    y=alt.Y("charged:Q", axis=alt.Axis(format="%", title="% subs charged")),
    color="product_id:N",
).properties(
    width=300, height=300, title="% of 12 months charged"
).facet(
    facet="enter_reason:N", columns=3
)

# Note: renewals for Metal don't make much sense prior to June 2019, since we launched in June 2018.



### Why are not all users charged?

Most likely this is because of churn. The next graph shows the % of payments we charged to customers while they still had the product, i.e. after dropping all payment numbers that would have fallen after they were dropped by dunning/AML.

What explains the remaining delta between 100% and the realized values?
- we did not charge fees to customers who should have been charged
- the user had a discount
- the user was an employee
- a problem in our code
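
To make the churn adjustment concrete, here is a minimal sketch on made-up data. It assumes the same column names as the payments table (`subscription_valid_from`, `subscription_valid_until`, `payment_no`, `charged`) and approximates a month as 30.44 days.

```python
import pandas as pd

# Toy data (made up): one subscription that churned after about 3 months,
# with 5 scheduled payments of which only the first 3 were charged.
toy = pd.DataFrame(
    {
        "subscription_valid_from": ["2019-01-01"] * 5,
        "subscription_valid_until": ["2019-04-01"] * 5,
        "payment_no": [1, 2, 3, 4, 5],
        "charged": [True, True, True, False, False],
    }
)

# Subscription age at churn in whole months (average month length, approximate).
age_days = (
    pd.to_datetime(toy["subscription_valid_until"])
    - pd.to_datetime(toy["subscription_valid_from"])
).dt.days
toy["age_churn"] = (age_days / 30.44).round()

# Share charged over all scheduled payments.
share_all = toy["charged"].mean()
# Share charged over payments that fall within the subscription's lifetime.
share_active = toy.loc[toy["age_churn"] >= toy["payment_no"], "charged"].mean()
print(share_all, share_active)
```

In this toy case the charged share rises from 60% to 100% once payments scheduled after churn are excluded, which is the effect the next cell looks for in the real data.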



In [23]:
data = df

# keep only subscriptions that started before 2019-09-01 (their 12 months should be over by now).
data = data.loc[data["subscription_valid_from"] < "2019-09-01", :]

# trim market
data["market"] = data["market"].str.strip()

# drop UK
data = data.loc[data["market"] != "GBR", :]
data["month"] = pd.to_datetime(data["subscription_valid_from"]).dt.to_period("M")

# subscription age at churn in months (average month length; this avoids the
# ambiguous "M" timedelta unit, which newer pandas versions no longer accept)
data["age_churn"] = (
    pd.to_datetime(data["subscription_valid_until"])
    - pd.to_datetime(data["subscription_valid_from"])
) / pd.Timedelta(days=30.44)
data["age_churn"] = data["age_churn"].round()

# keep only payment_no that were still relevant for the user, i.e. drop payment_no that would have happened after churn.
data = data.loc[data["age_churn"] >= data["payment_no"], :]

data = (
    data.groupby(["product_id", "month", "enter_reason"])["charged"]
    .agg("mean")
    .reset_index()
)

data["month"] = data["month"].astype(str)

alt.Chart(
    data.loc[
        (data["enter_reason"] != "DOWNGRADED") & (data["enter_reason"] != "RENEWED"), :
    ]
).mark_line().encode(
    x=alt.X("month:N", axis=alt.Axis(title="Month of sub start")),
    y=alt.Y("charged:Q", axis=alt.Axis(format="%", title="% subs charged")),
    color="product_id:N",
).properties(
    width=300, height=300, title="% of 12 months charged"
).facet(
    facet="enter_reason:N", columns=3
)



Let's drop discounts + employees explicitly. 


In [22]:
data = df

# drop all rows with discount_flg or override_flg set to True
data = data.loc[(data["discount_flg"] == False) & (data["override_flg"] == False), :]

# keep only subscriptions that started before 2019-09-01 (their 12 months should be over by now).
data = data.loc[data["subscription_valid_from"] < "2019-09-01", :]

# trim market
data["market"] = data["market"].str.strip()

# drop UK
data = data.loc[data["market"] != "GBR", :]
data["month"] = pd.to_datetime(data["subscription_valid_from"]).dt.to_period("M")

# subscription age at churn in months (average month length; this avoids the
# ambiguous "M" timedelta unit, which newer pandas versions no longer accept)
data["age_churn"] = (
    pd.to_datetime(data["subscription_valid_until"])
    - pd.to_datetime(data["subscription_valid_from"])
) / pd.Timedelta(days=30.44)
data["age_churn"] = data["age_churn"].round()

# keep only payment_no that were still relevant for the user, i.e. drop payment_no that would have happened after churn.
data = data.loc[data["age_churn"] >= data["payment_no"], :]

data = (
    data.groupby(["product_id", "month", "enter_reason"])["charged"]
    .agg("mean")
    .reset_index()
)

data["month"] = data["month"].astype(str)

alt.Chart(
    data.loc[
        (data["enter_reason"] != "DOWNGRADED") & (data["enter_reason"] != "RENEWED"), :
    ]
).mark_line().encode(
    x=alt.X("month:N", axis=alt.Axis(title="Month of sub start")),
    y=alt.Y("charged:Q", axis=alt.Axis(format="%", title="% subs charged")),
    color="product_id:N",
).properties(
    width=300, height=300, title="% of 12 months charged"
).facet(
    facet="enter_reason:N", columns=3
)

#### Do we find charges AFTER the subscription should already have ended?

In [18]:
data = df

# drop all rows with discount_flg or override_flg set to True
data = data.loc[(data["discount_flg"] == False) & (data["override_flg"] == False), :]

# keep only subscriptions that started before 2019-09-01 (their 12 months should be over by now).
data = data.loc[data["subscription_valid_from"] < "2019-09-01", :]

# trim market
data["market"] = data["market"].str.strip()

# drop UK
data = data.loc[data["market"] != "GBR", :]
data["month"] = pd.to_datetime(data["subscription_valid_from"]).dt.to_period("M")

# subscription age at churn in months (average month length; this avoids the
# ambiguous "M" timedelta unit, which newer pandas versions no longer accept)
data["age_churn"] = (
    pd.to_datetime(data["subscription_valid_until"])
    - pd.to_datetime(data["subscription_valid_from"])
) / pd.Timedelta(days=30.44)
data["age_churn"] = data["age_churn"].round()

# scheduled date, approximating each billing period as 30 days
data["days"] = data["payment_no"].astype(int) * 30
data["scheduled_date"] = (
    pd.to_datetime(data["subscription_valid_from"])
    + pd.to_timedelta(data["days"], unit="D")
).dt.date

# difference in months between scheduled date and subscription end
data["diff"] = (
    pd.to_datetime(data["scheduled_date"])
    - pd.to_datetime(data["subscription_valid_until"])
) / pd.Timedelta(days=30.44)
data["diff"] = data["diff"].round()

data = data.groupby(["product_id", "diff"])["charged"].agg("mean").reset_index()

alt.Chart(data.loc[abs(data["diff"]) <= 3, :]).mark_line().encode(
    x=alt.X("diff:N", axis=alt.Axis(title="Months to subscription end")),
    y=alt.Y("charged:Q", axis=alt.Axis(format="%", title="% subs charged")),
    color="product_id:N",
).properties(width=500, height=500, title="% of payments charged, by months to subscription end")

# Summary: % charged

It looks like our model captures all charges for users that should be charged. We reach 100% once we drop payments that were scheduled after the account was already downgraded and exclude users with discounts and employees.
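
The cells in this notebook repeat the same preprocessing (trim `market`, drop GBR, drop discount and override rows, derive the monthly cohort). One possible way to fold it into a single helper is sketched below. The function name `prepare_payments` is hypothetical, and working on explicit copies is one way to avoid pandas' SettingWithCopyWarning.

```python
import pandas as pd


def prepare_payments(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the shared filters on a copy, leaving the input untouched."""
    out = df.copy()
    out["market"] = out["market"].str.strip()
    keep = (
        ~out["discount_flg"].astype(bool)
        & ~out["override_flg"].astype(bool)
        & (out["market"] != "GBR")
    )
    out = out.loc[keep].copy()
    # Monthly cohort of the subscription start.
    out["month"] = pd.to_datetime(out["subscription_valid_from"]).dt.to_period("M")
    return out


# Toy rows (made up) to show the effect.
toy = pd.DataFrame(
    {
        "market": ["DEU ", "GBR", "FRA", "ITA"],
        "discount_flg": [False, False, True, False],
        "override_flg": [False, False, False, False],
        "subscription_valid_from": [
            "2019-01-15",
            "2019-02-01",
            "2019-03-01",
            "2019-04-20",
        ],
    }
)
clean = prepare_payments(toy)
print(clean[["market", "month"]])
```

Each analysis cell could then start from `prepare_payments(df)` and add only the filters specific to its question.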





## % of charges that are paid

In [9]:
data = df

# drop all rows with discount_flg or override_flg set to True
data = data.loc[(data["discount_flg"] == False) & (data["override_flg"] == False), :]

# trim market
data["market"] = data["market"].str.strip()

# drop UK
data = data.loc[data["market"] != "GBR", :]

# keep only payments that were charged
data = data.loc[data["charged"] == True, :]

# restrict on relevant cohorts
data = data.loc[
    pd.to_datetime(data["subscription_valid_from"]) < pd.to_datetime("2020-08-01"), :
]
data = data.loc[
    pd.to_datetime(data["subscription_valid_from"]) >= pd.to_datetime("2019-01-01"), :
]

# convert to monthly cohorts
data["month"] = pd.to_datetime(data["subscription_valid_from"]).dt.to_period("M")

# drop downgraders
data = data.loc[data["enter_reason"] != "DOWNGRADED", :]

data = (
    data.groupby(["month", "product_id", "enter_reason"])["paid"]
    .agg("mean")
    .reset_index()
)

data["month"] = data["month"].astype(str)

alt.Chart(data).mark_line().encode(
    x=alt.X("month:N", axis=alt.Axis(title="Monthly cohort")),
    y=alt.Y(
        "paid:Q", axis=alt.Axis(format="%", title="% of charged fees that are paid")
    ),
    color="product_id:N",
).properties(width=300, height=300, title="% of charged fees that are paid").facet(
    facet="enter_reason:N", columns=3
)



In [12]:
data = df

# drop all rows with discount_flg or override_flg set to True
data = data.loc[(data["discount_flg"] == False) & (data["override_flg"] == False), :]

# trim market
data["market"] = data["market"].str.strip()

# drop UK
data = data.loc[data["market"] != "GBR", :]

data.loc[
    data["market"].isin(["FRA", "DEU", "ITA", "ESP", "AUT"]) == False, "market"
] = "other"

# keep only payments that were charged
data = data.loc[data["charged"] == True, :]

# restrict on relevant cohorts
data = data.loc[
    pd.to_datetime(data["subscription_valid_from"]) < pd.to_datetime("2020-08-01"), :
]
data = data.loc[
    pd.to_datetime(data["subscription_valid_from"]) >= pd.to_datetime("2019-01-01"), :
]

# convert to monthly cohorts
data["month"] = pd.to_datetime(data["subscription_valid_from"]).dt.to_period("M")

# drop downgraders
data = data.loc[data["enter_reason"] != "DOWNGRADED", :]

data = (
    data.groupby(["month", "market", "enter_reason"])["paid"].agg("mean").reset_index()
)

data["month"] = data["month"].astype(str)

alt.Chart(data).mark_line().encode(
    x=alt.X("month:N", axis=alt.Axis(title="Monthly cohort")),
    y=alt.Y(
        "paid:Q", axis=alt.Axis(format="%", title="% of charged fees that are paid")
    ),
    color="market:N",
).properties(width=300, height=300, title="% of charged fees that are paid").facet(
    facet="enter_reason:N", columns=3
)



In [17]:
data = df

# drop all rows with discount_flg or override_flg set to True
data = data.loc[(data["discount_flg"] == False) & (data["override_flg"] == False), :]

# trim market
data["market"] = data["market"].str.strip()

# drop UK
data = data.loc[data["market"] != "GBR", :]

data.loc[
    data["market"].isin(["FRA", "DEU", "ITA", "ESP", "AUT"]) == False, "market"
] = "other"

# keep only payments that were charged
data = data.loc[data["charged"] == True, :]

# restrict on relevant cohorts
data = data.loc[
    pd.to_datetime(data["subscription_valid_from"]) < pd.to_datetime("2020-08-01"), :
]
data = data.loc[
    pd.to_datetime(data["subscription_valid_from"]) >= pd.to_datetime("2019-01-01"), :
]

# convert to monthly cohorts
data["month"] = pd.to_datetime(data["subscription_valid_from"]).dt.to_period("M")

# drop downgraders
data = data.loc[data["enter_reason"] != "DOWNGRADED", :]

data = (
    data.groupby(["payment_no", "market", "enter_reason"])["arrears"]
    .agg("mean")
    .reset_index()
)

alt.Chart(data).mark_line().encode(
    x=alt.X("payment_no:N", axis=alt.Axis(title="Payment #")),
    y=alt.Y(
        "arrears:Q",
        axis=alt.Axis(format="%", title="% of charged fees that were in arrears"),
    ),
    color="market:N",
).properties(width=300, height=300, title="% of charged fees that were in arrears").facet(
    facet="enter_reason:N", columns=3
)



In [39]:
data = df

# drop all rows with discount_flg or override_flg set to True
data = data.loc[(data["discount_flg"] == False) & (data["override_flg"] == False), :]

# trim market
data["market"] = data["market"].str.strip()

# drop UK
data = data.loc[data["market"] != "GBR", :]

data.loc[
    data["market"].isin(["FRA", "DEU", "ITA", "ESP", "AUT"]) == False, "market"
] = "other"

# keep only payments that fell into arrears
data = data.loc[data["arrears"] == True, :]

data["first_arrears"] = data.groupby(["product_key"])["payment_no"].transform("min")
data = data.loc[data["first_arrears"] == data["payment_no"], :]
# restrict on relevant cohorts
data = data.loc[
    pd.to_datetime(data["subscription_valid_from"]) < pd.to_datetime("2020-08-01"), :
]
data = data.loc[
    pd.to_datetime(data["subscription_valid_from"]) >= pd.to_datetime("2019-01-01"), :
]

# drop downgraders
data = data.loc[data["enter_reason"] != "DOWNGRADED", :]

data = data.groupby(["first_arrears", "enter_reason"])["paid"].agg("mean").reset_index()

alt.Chart(data).mark_line().encode(
    x=alt.X("first_arrears:N", axis=alt.Axis(title="First payment in arrears")),
    y=alt.Y("paid:Q", axis=alt.Axis(format="%", title="% recovered")),
    color="enter_reason:N",
).properties(width=500, height=500, title="% of recovered arrears cases")



In [50]:
# time to recover

data = df

# drop all rows with discount_flg or override_flg set to True
data = data.loc[(data["discount_flg"] == False) & (data["override_flg"] == False), :]

# trim market
data["market"] = data["market"].str.strip()

# drop UK
data = data.loc[data["market"] != "GBR", :]

data.loc[
    data["market"].isin(["FRA", "DEU", "ITA", "ESP", "AUT"]) == False, "market"
] = "other"

# keep only payments that were in arrears
data = data.loc[data["arrears"] == True, :]
# cases without a recorded delay get a large sentinel value, which puts them beyond the plotted range
data.loc[data["days_delay"].isna(), "days_delay"] = 100000

data["weeks"] = (data["days_delay"] / 7).round()

data = data.groupby(["weeks", "paid"])["product_key"].agg("count").reset_index()

data["perc"] = (
    100 * data["product_key"] / data.groupby(["paid"])["product_key"].transform("sum")
)
data["cum"] = data.groupby(["paid"])["perc"].cumsum()


alt.Chart(data.loc[data["weeks"] < 30, :]).mark_line().encode(
    x=alt.X("weeks:N", axis=alt.Axis(title="Delay in weeks")),
    y=alt.Y("cum:Q", axis=alt.Axis(title="Percentile")),
    color="paid:N",
).properties(width=500, height=500, title="Arrears case length by recovery status")



In [64]:
# time to recover

data = df

# drop all rows with discount_flg or override_flg set to True
data = data.loc[(data["discount_flg"] == False) & (data["override_flg"] == False), :]

# trim market
data["market"] = data["market"].str.strip()

# drop UK
data = data.loc[data["market"] != "GBR", :]

data.loc[
    data["market"].isin(["FRA", "DEU", "ITA", "ESP", "AUT"]) == False, "market"
] = "other"

# keep only payments that were in arrears and were later paid
data = data.loc[data["arrears"] == True, :]
data = data.loc[data["paid"] == True, :]

data["weeks"] = (data["days_delay"] / 7).round()

data = data.groupby(["weeks", "enter_reason"])["product_key"].agg("count").reset_index()

data["perc"] = (
    100
    * data["product_key"]
    / data.groupby(["enter_reason"])["product_key"].transform("sum")
)
data["cum"] = data.groupby(["enter_reason"])["perc"].cumsum()

data = data.loc[data["enter_reason"] != "DOWNGRADED", :]

alt.Chart(data.loc[data["weeks"] < 30, :]).mark_line().encode(
    x=alt.X("weeks:N", axis=alt.Axis(title="Delay in weeks")),
    y=alt.Y("cum:Q", axis=alt.Axis(title="% of all recovered cases")),
    color="enter_reason:N",
).properties(width=500, height=500, title="Weeks in arrears for recovered cases")



## MRR 

In [34]:
data = df

# drop all rows with discount_flg or override_flg set to True
data = data.loc[(data["discount_flg"] == False) & (data["override_flg"] == False), :]

# trim market
data["market"] = data["market"].str.strip()

# drop UK
data = data.loc[data["market"] != "GBR", :]

data.loc[
    data["market"].isin(["FRA", "DEU", "ITA", "ESP", "AUT"]) == False, "market"
] = "other"

# restrict on relevant cohorts that completed 12 months already
data = data.loc[
    pd.to_datetime(data["subscription_valid_from"]) < pd.to_datetime("2019-08-01"), :
]

# replace amount cents to 0 if user did not pay
data.loc[(data["paid"] == False), "amount_cents"] = 0

# aggregate on subscription period level
data = (
    data.groupby(["market", "product_id", "product_key", "enter_reason"])[
        "amount_cents"
    ]
    .agg("sum")
    .reset_index()
)

data["amount_euro"] = data["amount_cents"] / 100

data = data.groupby(["market", "product_id"])["amount_euro"].agg("mean").reset_index()

data["amount_euro"] = (data["amount_euro"] / 12).round()

data = data.loc[data["product_id"] != "FLEX_ACCOUNT_MONTHLY", :]
# Configure common options
base = (
    alt.Chart(data)
    .encode(
        alt.X("market:O", scale=alt.Scale(paddingInner=0)),
        alt.Y("product_id:O", scale=alt.Scale(paddingInner=0)),
    )
    .properties(width=300, height=300)
)

# Configure heatmap
heatmap = base.mark_rect().encode(
    color=alt.Color(
        "amount_euro:Q",
        scale=alt.Scale(scheme="viridis"),
        legend=alt.Legend(direction="horizontal"),
    )
)

# Configure text
text = base.mark_text(baseline="middle").encode(text="amount_euro:Q")

# Draw the chart
heatmap + text



In [33]:
data = df

# drop all rows with discount_flg or override_flg set to True
data = data.loc[(data["discount_flg"] == False) & (data["override_flg"] == False), :]

# trim market
data["market"] = data["market"].str.strip()

# drop UK
data = data.loc[data["market"] != "GBR", :]

data.loc[
    data["market"].isin(["FRA", "DEU", "ITA", "ESP", "AUT"]) == False, "market"
] = "other"

# restrict on relevant cohorts that completed 12 months already
data = data.loc[
    pd.to_datetime(data["subscription_valid_from"]) < pd.to_datetime("2019-08-01"), :
]

# replace amount cents to 0 if user did not pay
data.loc[(data["paid"] == False), "amount_cents"] = 0

# aggregate on subscription period level
data = (
    data.groupby(["market", "product_id", "product_key", "enter_reason"])[
        "amount_cents"
    ]
    .agg("sum")
    .reset_index()
)

data["amount_euro"] = data["amount_cents"] / 100

data = (
    data.groupby(["market", "product_id", "enter_reason"])["amount_euro"]
    .agg("mean")
    .reset_index()
)

data["amount_euro"] = (data["amount_euro"] / 12).round()

data = data.loc[data["product_id"] != "FLEX_ACCOUNT_MONTHLY", :]


def plot_heat(data, flow):
    data = data.loc[data["enter_reason"] == flow, :]
    # Configure common options
    base = (
        alt.Chart(data)
        .encode(
            alt.X("market:O", scale=alt.Scale(paddingInner=0)),
            alt.Y("product_id:O", scale=alt.Scale(paddingInner=0)),
        )
        .properties(width=200, height=200, title=flow)
    )

    # Configure heatmap
    heatmap = base.mark_rect().encode(
        color=alt.Color(
            "amount_euro:Q",
            scale=alt.Scale(scheme="viridis"),
            legend=alt.Legend(direction="horizontal"),
        )
    )

    # Configure text
    text = base.mark_text(baseline="middle").encode(text="amount_euro:Q")

    # Layer the text labels on top of the heatmap
    return heatmap + text


signup = plot_heat(data, "SIGNUP")
upgrade = plot_heat(data, "UPGRADED")
renewal = plot_heat(data, "RENEWED")

chart = signup | upgrade | renewal
chart



# Summary: % of fees paid in arrears

In [None]:
data_b = df

# trim market
data_b["market"] = data_b["market"].str.strip()

# drop UK
data_b = data_b.loc[data_b["market"] != "GBR", :]

data_b["month"] = pd.to_datetime(data_b["subscription_valid_from"]).dt.to_period("M")



In [None]:
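# keep only subscriptions that started before 2019-09-01 (their 12 months should be over by now).
data_b = data_b.loc[data_b["subscription_valid_from"] < "2019-09-01", :]

# share of paid fees that were in arrears, per product, cohort, entry flow and payment number
data_b = (
    data_b.loc[data_b["paid"] == True]
    .groupby(["product_id", "month", "enter_reason", "payment_no"])["arrears"]
    .agg("mean")
    .reset_index()
)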
data_b.head(40)

Unnamed: 0,product_id,month,enter_reason,payment_no,arrears
0,BLACK_CARD_MONTHLY,2019-01,DOWNGRADED,1,7
1,BLACK_CARD_MONTHLY,2019-01,DOWNGRADED,2,7
2,BLACK_CARD_MONTHLY,2019-01,DOWNGRADED,3,7
3,BLACK_CARD_MONTHLY,2019-01,DOWNGRADED,4,7
4,BLACK_CARD_MONTHLY,2019-01,DOWNGRADED,5,7


In [None]:
data_b["month"] = data_b["month"].astype(str)

alt.Chart(data_b.loc[data_b["enter_reason"] != "DOWNGRADED"]).mark_line().encode(
    x=alt.X("month:N", axis=alt.Axis(title="Month of sub start")),
    y=alt.Y("arrears:Q", axis=alt.Axis(format="%", title="% of paid fees in arrears")),
    color="product_id:N",
).properties(width=300, height=300, title="% of paid fees that were in arrears").facet(
    facet="enter_reason:N", columns=3
)

# Note: renewals for Metal don't make much sense prior to June 2019, since we launched in June 2018.