title: Paid social - Collected membership revenues per country and ads    
author: Brieuc Van Thienen     
date: 2020-10-12      
region: EU     
summary: Previously, we could not effectively analyze the performance of our paid premium marketing campaigns because we accounted for booked revenues and write-offs. With our new memberships payment model, we can now measure which fees were actually paid. This research has three objectives: (1) Which countries have the highest (realized) revenue to spend ratio? (Once we have predicted LTV, this will be replaced by the LTV to CAC ratio.) (2) Is there a negative correlation between our premium spending and the revenues we collect? In other words, does the efficiency of our spending decrease as we scale campaigns? (3) Which campaigns bring the most revenue or are the most efficient?
tags: marketing, paid social, premium, membership, revenues, profitability, campaign, ads, ad sets


In [11]:
# !pip install altair
# !pip install vega
# !pip install --upgrade notebook  # need jupyter_client >= 4.2 for sys-prefix below
# !jupyter nbextension install --sys-prefix --py vega  # not needed in notebook >= 5.3
# alt.renderers.enable('notebook')

In [29]:
# pandas and altair are used by the cells below; numpy is kept for convenience.
import pandas as pd
import numpy as np
import altair as alt

In [12]:
# warnings is part of the Python standard library, so no pip install is needed.
# import warnings
# warnings.filterwarnings('ignore')

# Introduction
--- 

**Previously, we could not effectively analyze the performance of our paid premium marketing campaigns** because we accounted for booked revenues and write-offs:

- Booked revenues would be similar for all premium users (of a premium product), at least until they churn. We would hence need to wait several months to know which users churned or not.
- Account closures (moment at which write-offs are accounted for) didn't happen continuously. Hence, you could never know if all inactive users of a past cohort had effectively churned.

**With our new memberships payment model, we can now measure which fees were effectively paid.** The objectives of this research are multiple: 

1. Which countries have the highest (realized) revenue to spend ratio? (Once we have predicted LTV, this will be replaced by the LTV to CAC ratio.)


2. Is there a negative correlation between our premium spending and the revenues we collect? In other words, is the efficiency of our spending decreasing as we scale campaigns?


3. Which campaigns bring the most revenue / are the most efficient?


**In addition, this notebook aims to introduce new visualisations for campaign performance measurement**. Apart from smarter country / product budget allocations, not many actionable insights can be derived from the findings below. That said, the findings should become more actionable once:

1. We have the Marketing hierarchy framework: instead of visualising each and every ad, we could group campaigns by labels and characteristics to find patterns.

2. LTV prediction is improved: instead of visualising actuals, we will be able to plot predictions and work with much more actionable data on campaigns that are currently running.

3. We have data-driven attribution in place: revenues from a user can be more fairly attributed to awareness channels.




# Proposed methodology
---

1. Premium campaigns on paid social do not just bring premium users, and vice versa, so we look at metrics holistically, at country level.


2.  **Signup cohort is from -1.5 years to -0.5 year.** We acknowledge that the campaign insights are possibly not actionable due to the period we're looking at. 


3. We look at the **collected membership revenues in the first 6 months after KYC** ("revenues" hereafter). Other revenue sources are disregarded.


4. We join the paid social spend and the paid social signups on weeks and ad ids. The **revenues attributable to paid social campaigns are possibly underestimated** due to last-click attribution.
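
The join described in step 4 can be illustrated on toy data. This is a minimal sketch, assuming two hypothetical frames `spend` and `revenues` keyed by week (`date`) and `ad_id`; the column names mirror the query in the next section, but all values are invented for illustration. A left join keeps every spend row, so ads with no attributed revenue remain in the data instead of disappearing.

```python
import pandas as pd

# Hypothetical weekly spend per ad (values invented for illustration).
spend = pd.DataFrame(
    {
        "date": ["2020-01-06", "2020-01-06", "2020-01-13"],
        "ad_id": ["a1", "a2", "a1"],
        "cost_with_vat": [100.0, 50.0, 120.0],
    }
)

# Hypothetical 6 month membership revenues per signup week and ad.
revenues = pd.DataFrame(
    {
        "date": ["2020-01-06", "2020-01-13"],
        "ad_id": ["a1", "a1"],
        "value_6m": [8.0, 9.0],
    }
)

# Left join on week and ad id: every spend row is kept.
joined = spend.merge(revenues, on=["date", "ad_id"], how="left")

# Ads with spend but no attributed revenue get NaN after the join;
# here we count them explicitly as zero revenue.
joined["value_6m"] = joined["value_6m"].fillna(0.0)
print(joined)
```

Filling with zero is a choice made for this sketch. Summing with pandas skips missing values anyway, so aggregated totals are the same either way.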




# Membership revenues at month 6 - Query
---

In [26]:
# query4 = """


#     with marketing_spend as (

#         select
#             date_trunc('week',date)::date as date,
#             case
#                 when country in ('DEU','AUT','FRA','ITA','ESP','GBR') then trim(country)
#                 when country in ('NLD','IRL','GRC','PRT','BEL','LUX','FIN','EST','LVA','LTU','SVK','SVN') then 'GrE'
#                 else 'Other'
#             end as country,
#             trim(ad_id) as ad_id,
#             trim(dbt.zrh_facebook_ids.ad_name) as ad_name,
#             trim(sm.ad_set_name) as ad_set_name,
#             sum(cost_with_vat) as cost_with_vat
#         from
#             dbt.stg_facebook a
#         left join
#             dbt.zrh_facebook_ids using (ad_id)
#         left join
#             dwh_mktg_smartly_ids sm on a.ad_id = sm.ad_facebook_id
#         where
#             channel = 'paid_social'
#             and date between add_months(current_date, -18)::date and add_months(current_date, -7)::date
#             and country !=  'GBR'
#             and country in ('DEU','AUT','FRA','ITA','ESP','GBR','NLD','IRL','GRC','PRT','BEL','LUX','FIN','EST','LVA','LTU','SVK','SVN','POL', 'NOR', 'SWE','DNK', 'ISL', 'LIE', 'CHE')
#         group by 1,2,3,4,5

#     ), revenues as (

#         select
#             date_trunc('week',u.user_created)::date as date,
#             trim(sm.ad_facebook_id) as ad_id,
#             sum(value) as value_6m
#         from
#             (select user_created, kyc_first_completed, country_tnc_legal from dbt.zrh_users) u
#         inner join
#             dbt.stg_conversions_paid_social ps using (user_created)
#         inner join
#             dwh_mktg_smartly_ids sm on ps.mkt_term = sm.ad_smartly_id
#         inner join
#             dbt.ucm_paid_fees_memberships ucm on u.user_created = ucm.user_created
#                 and to_date(month,'YYYY-MM') <= add_months(date_trunc('month',kyc_first_completed), 6)
#         where
#             kyc_first_completed is not null
#             and u.user_created between add_months(current_date, -18)::date and add_months(current_date, -7)::date
#         group by 1,2

#     )

#     select
#         ms.date,
#         ms.country,
#         ms.ad_set_name,
#         ms.ad_name,
#         ms.cost_with_vat,
#         rev.value_6m
#     from
#         marketing_spend ms
#     left join
#         revenues rev using (date, ad_id)

# """

In [21]:
# File in local directory, removed before pushing to git but query is referenced above.
df_original = pd.read_csv("20201012_Paid_Social_Membership_Revenues.csv")

In [23]:
df_original["value_6m"] = df_original["value_6m"] / 100

In [24]:
df4 = df_original.copy()
df4["country"] = df4["country"].str.strip()

# 1. Which countries have the highest revenue to spend ratio?
---

In [25]:
# By country

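# Select several columns with a list; tuple-style selection on a groupby
# was deprecated and fails on recent pandas versions.
dfc0 = df4.groupby(["country"])[["cost_with_vat", "value_6m"]].sum().reset_index()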
dfc0["rev_to_spend"] = dfc0["value_6m"] / dfc0["cost_with_vat"]

dfc0["rev_to_spend"] = dfc0["rev_to_spend"].round(decimals=3)
dfc0 = dfc0.sort_values(by=["rev_to_spend"], ascending=False).reset_index(drop=True)

**Findings**

France, Austria and Germany (in that order) have the highest ratios (approx. 0.085, 0.076, 0.065), meaning that in France we get ~0.085 euros in membership revenues (in the first 6 months) for every 1 euro spent on paid social (on all campaigns).
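
To make the ratio concrete, here is a small worked example using the approximate figures quoted above. It is only a sketch: the helper names `revenue_for_spend` and `spend_per_revenue_euro` are hypothetical, and the calculation assumes the ratio stays constant as spend changes, which is exactly what part 2 examines.

```python
# Approximate revenue to spend ratios quoted in the findings.
rev_to_spend = {"FRA": 0.085, "AUT": 0.076, "DEU": 0.065}


def revenue_for_spend(ratio, spend_eur):
    """6 month membership revenue for a given spend, assuming a constant ratio."""
    return ratio * spend_eur


def spend_per_revenue_euro(ratio):
    """Euros of paid social spend behind each euro of 6 month membership revenue."""
    return 1 / ratio


for country, ratio in rev_to_spend.items():
    print(
        country,
        round(revenue_for_spend(ratio, 1000), 2),
        round(spend_per_revenue_euro(ratio), 2),
    )
```

Read the other way round, a ratio of ~0.085 means roughly 12 euros of paid social spend for every euro of membership revenue collected in the first 6 months.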

On average, the revenue to spend ratio appears relatively small (again, part of that is likely due to last-click attribution). The ratio basically depends on (1) how aggressively we advertise premium, (2) the ensuing cross-sell rates and (3) how many of the fees are actually paid. To improve the revenue to spend ratio, we should aim at improving at least one of these three factors.


In [30]:
chart = (
    alt.Chart(dfc0[["country", "cost_with_vat", "value_6m", "rev_to_spend"]])
    .mark_bar()
    .encode(
        # The x axis shows countries, so label it accordingly.
        x=alt.X("country:N", axis=alt.Axis(title="Country")),
        y=alt.Y(
            "rev_to_spend:Q",
            axis=alt.Axis(title="Revenue to spend ratio"),
            scale=alt.Scale(domain=[0, 0.15]),
        ),
        color="country:N",
    )
    .properties(
        width=300,
        height=300,
        title="Average 6 month membership revenue for every euro spent on Facebook campaigns",
    )
)

text = chart.mark_text(align="center", baseline="bottom", fontSize=13).encode(
    text="rev_to_spend:Q"
)

chart + text

**Findings** 

Despite spending significantly more in France on paid social than in other countries, it's in France that our spend remains most efficient (from a Membership revenues standpoint).


Paid social spend in Austria is quite minimal as it has historically been challenging to scale ads on that channel. Having said that, ads remain efficient. Same goes for Germany. Premium cohorts in Austria and Germany generally activate and retain more than in other European countries. The goal should be to scale premium ads (at least as a % of total paid social spend in those countries), by allocating more budget or increasing CACs. Also, it should be noted that since we can see if the first fee is paid, we could set up tests without having to wait for months to see the results.

Also, it is worth noting that GrE significantly outperforms Spain and Italy (~0.055 vs. ~0.03), with a similar level of spending.

In [31]:
alt.Chart(
    dfc0[["country", "cost_with_vat", "value_6m", "rev_to_spend"]]
).mark_point().encode(
    x=alt.X(
        "cost_with_vat:Q", axis=alt.Axis(title="Total spend over the period, in euros")
    ),
    y=alt.Y(
        "rev_to_spend:Q",
        axis=alt.Axis(title="Revenue to spend ratio"),
        scale=alt.Scale(domain=[0, 0.15]),
    ),
    color="country:N",
).properties(
    width=300,
    height=300,
    title="Revenue to spend ratio vs. total spend by country",
)

**Methodology**

We plot revenue to spend ratios (efficiency, y axis) against total spend and total 6 month membership revenues (scale, x axis). These visualisations help identify which countries or ads (see part 3) should be scaled up or down. On the revenue chart, the quadrants read as follows:

1. Top-right: spend is efficient and brings in revenues at scale. We are likely spending a decent amount and the campaign is efficient. Great!


2. Bottom-left: spend is neither efficient nor bringing in revenues.


3. Top-left: spend is efficient but brings in relatively lower revenues. Those should be priorities to scale up.


4. Bottom-right: ads are bringing in revenues but aren't the most efficient. Maybe scale down spend a little?
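
The quadrant reading above can also be applied programmatically. The sketch below is an assumption-laden illustration, not part of the original analysis: it splits both axes at their median (the notebook does not define thresholds), uses `value_6m` as the scale axis, and the function name `label_quadrants` and the toy values are hypothetical.

```python
import pandas as pd


def label_quadrants(df, scale_col="value_6m", efficiency_col="rev_to_spend"):
    """Label each row with a quadrant, splitting both axes at their median."""
    high_scale = df[scale_col] > df[scale_col].median()
    high_eff = df[efficiency_col] > df[efficiency_col].median()

    # Default to bottom-left, then overwrite the other three quadrants.
    labels = pd.Series("bottom-left", index=df.index)
    labels[high_scale & high_eff] = "top-right"
    labels[~high_scale & high_eff] = "top-left"
    labels[high_scale & ~high_eff] = "bottom-right"
    return labels


# Toy data, invented for illustration.
toy = pd.DataFrame(
    {
        "country": ["A", "B", "C", "D"],
        "value_6m": [900.0, 100.0, 150.0, 800.0],
        "rev_to_spend": [0.08, 0.02, 0.09, 0.03],
    }
)
toy["quadrant"] = label_quadrants(toy)
print(toy)
```

Median splits are relative: a row lands in "top-right" only compared with the other rows in the frame, so the labels should be read within one country set or one ad set list at a time.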

In [32]:
alt.Chart(
    dfc0[["country", "cost_with_vat", "value_6m", "rev_to_spend"]]
).mark_point().encode(
    x=alt.X(
        "value_6m:Q",
        axis=alt.Axis(title="Total 6 month Membership revenues over the period"),
    ),
    y=alt.Y(
        "rev_to_spend:Q",
        axis=alt.Axis(title="Revenue to spend ratio"),
        scale=alt.Scale(domain=[0, 0.15]),
    ),
    color="country:N",
).properties(
    width=300,
    height=300,
    title="Revenue to spend ratio vs. 6 month membership revenues by country",
)

# 2. In each country, does efficiency decrease with spend?
---

In [33]:
# Weekly totals per country; columns are selected with a list for recent pandas versions.
dfc = df4.groupby(["country", "date"])[["cost_with_vat", "value_6m"]].sum().reset_index()
dfc["rev_to_spend"] = dfc["value_6m"] / dfc["cost_with_vat"]

**Methodology**

Instead of aggregating the results by country for the whole period (as done above), we here look at the efficiency of our campaigns by country, on a weekly basis. The goal is to see if the marginal amount of revenues we earn decreases with the level of spend.


**Findings**

The graph below seems to confirm our initial findings, with France, Austria and Germany ranking on top.

In [34]:
alt.Chart(
    dfc[["country", "cost_with_vat", "value_6m", "rev_to_spend"]]
).mark_boxplot().encode(
    x=alt.X("country:N", axis=alt.Axis(title="Country")),
    y=alt.Y(
        "rev_to_spend:Q",
        axis=alt.Axis(title="Revenue to spend ratio"),
        scale=alt.Scale(domain=[0, 0.3]),
    ),
    color="country:N",
).properties(
    width=300,
    height=300,
    title="Distribution of revenue to spend ratio by country",
)

**Findings**

Below, we plot spend vs. membership revenues (1) side by side, with the same scale for all countries, and (2) separately for each country, to fit curves and observe the trend.

Again, France, Germany, and GrE rank better than other countries even as we scale the spend. Austria is not comparable due to the minimal level of spending there. More specifically, in Germany and to a lower extent France and GrE, the variability of revenues only increases slightly as we scale marketing spend, suggesting that we could test increases in budgets or CACs on premium campaigns.

In Italy and Spain, revenues seem to remain steady / decrease as the level of spending increases, suggesting that (1) we spend relatively less on premium ads when going for growth and / or (2) that cross-sell and / or retention rates tend to decrease with the total level of spending.
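
The visual reading above could be complemented with a number per country. The following is a minimal sketch, assuming a weekly frame shaped like `dfc` (columns `country`, `cost_with_vat`, `value_6m`); the function name `spend_efficiency_corr` and the toy values are hypothetical. It computes the Pearson correlation between weekly spend and the weekly revenue to spend ratio, where a negative value would suggest that efficiency drops as spend scales.

```python
import pandas as pd

# Toy weekly data, invented for illustration:
# in country A the ratio falls as spend grows, in country B it rises.
weekly = pd.DataFrame(
    {
        "country": ["A"] * 4 + ["B"] * 4,
        "cost_with_vat": [100.0, 200.0, 300.0, 400.0] * 2,
        "value_6m": [10.0, 18.0, 24.0, 28.0, 4.0, 10.0, 18.0, 28.0],
    }
)
weekly["rev_to_spend"] = weekly["value_6m"] / weekly["cost_with_vat"]


def spend_efficiency_corr(df):
    """Pearson correlation between weekly spend and efficiency, per country."""
    return df.groupby("country")[["cost_with_vat", "rev_to_spend"]].apply(
        lambda g: g["cost_with_vat"].corr(g["rev_to_spend"])
    )


corr = spend_efficiency_corr(weekly)
print(corr)
```

Pearson correlation only captures linear association and is sensitive to outlier weeks, so it is best read alongside the scatter plots rather than instead of them.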



In [35]:
alt.Chart(dfc[["country", "cost_with_vat", "value_6m"]]).mark_point().encode(
    x=alt.X("cost_with_vat:Q", axis=alt.Axis(title="Weekly campaign spend in euros")),
    y=alt.Y("value_6m:Q", axis=alt.Axis(title="6 month Membership revenues, in euros")),
    color="country:N",
).properties(
    width=200,
    height=200,
    title="Correlation between weekly spend and membership revenues",
).facet(
    facet="country:N",
    columns=3,
).resolve_axis(
    x="independent",
    y="independent",
)

In [36]:
chart = (
    alt.Chart(
        dfc[["country", "cost_with_vat", "value_6m"]].loc[dfc["country"] == "AUT"]
    )
    .mark_point()
    .encode(
        x=alt.X(
            "cost_with_vat:Q", axis=alt.Axis(title="Weekly campaign spend in euros")
        ),
        y=alt.Y(
            "value_6m:Q", axis=alt.Axis(title="6 month Membership revenues, in euros")
        ),
        color="country:N",
    )
    .properties(
        width=300,
        height=300,
        title="Correlation between weekly spend and membership revenues",
    )
)

chart + chart.transform_regression(
    "cost_with_vat", "value_6m", method="poly", groupby=["country"]
).mark_line(size=4)

In [37]:
chart = (
    alt.Chart(
        dfc[["country", "cost_with_vat", "value_6m"]].loc[dfc["country"] == "DEU"]
    )
    .mark_point()
    .encode(
        x=alt.X(
            "cost_with_vat:Q", axis=alt.Axis(title="Weekly campaign spend in euros")
        ),
        y=alt.Y(
            "value_6m:Q", axis=alt.Axis(title="6 month Membership revenues, in euros")
        ),
        color="country:N",
    )
    .properties(
        width=300,
        height=300,
        title="Correlation between weekly spend and membership revenues",
    )
)
chart + chart.transform_regression(
    "cost_with_vat", "value_6m", method="poly", groupby=["country"]
).mark_line(size=4)

In [38]:
chart = (
    alt.Chart(
        dfc[["country", "cost_with_vat", "value_6m"]].loc[dfc["country"] == "ESP"]
    )
    .mark_point()
    .encode(
        x=alt.X(
            "cost_with_vat:Q", axis=alt.Axis(title="Weekly campaign spend in euros")
        ),
        y=alt.Y(
            "value_6m:Q", axis=alt.Axis(title="6 month Membership revenues, in euros")
        ),
        color="country:N",
    )
    .properties(
        width=300,
        height=300,
        title="Correlation between weekly spend and membership revenues",
    )
)
chart + chart.transform_regression(
    "cost_with_vat", "value_6m", method="poly", groupby=["country"]
).mark_line(size=4)

In [39]:
chart = (
    alt.Chart(
        dfc[["country", "cost_with_vat", "value_6m"]].loc[dfc["country"] == "FRA"]
    )
    .mark_point()
    .encode(
        x=alt.X(
            "cost_with_vat:Q", axis=alt.Axis(title="Weekly campaign spend in euros")
        ),
        y=alt.Y(
            "value_6m:Q", axis=alt.Axis(title="6 month Membership revenues, in euros")
        ),
        color="country:N",
    )
    .properties(
        width=300,
        height=300,
        title="Correlation between weekly spend and membership revenues",
    )
)
chart + chart.transform_regression(
    "cost_with_vat", "value_6m", method="poly", groupby=["country"]
).mark_line(size=4)

In [40]:
chart = (
    alt.Chart(
        dfc[["country", "cost_with_vat", "value_6m"]].loc[dfc["country"] == "GrE"]
    )
    .mark_point()
    .encode(
        x=alt.X(
            "cost_with_vat:Q", axis=alt.Axis(title="Weekly campaign spend in euros")
        ),
        y=alt.Y(
            "value_6m:Q", axis=alt.Axis(title="6 month Membership revenues, in euros")
        ),
        color="country:N",
    )
    .properties(
        width=300,
        height=300,
        title="Correlation between weekly spend and membership revenues",
    )
)
chart + chart.transform_regression(
    "cost_with_vat", "value_6m", method="poly", groupby=["country"]
).mark_line(size=4)

In [41]:
chart = (
    alt.Chart(
        dfc[["country", "cost_with_vat", "value_6m"]].loc[dfc["country"] == "ITA"]
    )
    .mark_point()
    .encode(
        x=alt.X(
            "cost_with_vat:Q", axis=alt.Axis(title="Weekly campaign spend in euros")
        ),
        y=alt.Y(
            "value_6m:Q", axis=alt.Axis(title="6 month Membership revenues, in euros")
        ),
        color="country:N",
    )
    .properties(
        width=300,
        height=300,
        title="Correlation between weekly spend and membership revenues",
    )
)
chart + chart.transform_regression(
    "cost_with_vat", "value_6m", method="poly", groupby=["country"]
).mark_line(size=4)

In [42]:
chart = (
    alt.Chart(
        dfc[["country", "cost_with_vat", "value_6m"]].loc[dfc["country"] == "Other"]
    )
    .mark_point()
    .encode(
        x=alt.X(
            "cost_with_vat:Q", axis=alt.Axis(title="Weekly campaign spend in euros")
        ),
        y=alt.Y(
            "value_6m:Q", axis=alt.Axis(title="6 month Membership revenues, in euros")
        ),
        color="country:N",
    )
    .properties(
        width=300,
        height=300,
        title="Correlation between weekly spend and membership revenues",
    )
)
chart + chart.transform_regression(
    "cost_with_vat", "value_6m", method="poly", groupby=["country"]
).mark_line(size=4)

# 3. Which campaigns bring the most revenue / are the most efficient?
---

**Methodology**

We run the same analysis, this time at ad set level, over the whole time period rather than broken down by week. For readability, we keep only ad sets whose names contain black, metal or premium and, as in the code below, whose total spend exceeds 5,000 euros.

In [43]:
alt.data_transformers.disable_max_rows()


In [44]:
dfads = (
    # Select several columns with a list; tuple-style selection fails on recent pandas.
    df4.groupby(["country", "ad_set_name"])[["cost_with_vat", "value_6m"]]
    .sum()
    .reset_index()
)
dfads["rev_to_spend"] = dfads["value_6m"] / dfads["cost_with_vat"]

In [45]:
# Keep premium ad sets only: the name mentions black, metal or premium (case-insensitive).
is_premium = dfads["ad_set_name"].str.contains(
    "black|metal|premium", case=False, na=False
)
# Drop ad sets with 5000 or less in total spend, then sort by spend.
dfads = dfads.loc[is_premium & (dfads["cost_with_vat"] > 5000)]
dfads = dfads.sort_values(by=["cost_with_vat"]).reset_index(drop=True)

In [46]:
alt.Chart(dfads.loc[dfads["country"] == "DEU"]).mark_point().encode(
    x=alt.X("value_6m:Q", axis=alt.Axis(title="Total revenues, in euros")),
    y=alt.Y(
        "rev_to_spend:Q",
        axis=alt.Axis(title="Revenue to spend ratio"),
        scale=alt.Scale(domain=[0, 0.3]),
    ),
    color="ad_set_name:N",
).properties(
    width=300,
    height=300,
    title="6 month membership revenues vs. rev to spend ratio in Germany",
).configure_legend(
    labelLimit=0
)

In [47]:
alt.Chart(dfads.loc[dfads["country"] == "FRA"]).mark_point().encode(
    x=alt.X("value_6m:Q", axis=alt.Axis(title="Total revenues, in euros")),
    y=alt.Y(
        "rev_to_spend:Q",
        axis=alt.Axis(title="Revenue to spend ratio"),
        scale=alt.Scale(domain=[0, 0.3]),
    ),
    color="ad_set_name:N",
).properties(
    width=300,
    height=300,
    title="6 month membership revenues vs. rev to spend ratio in France",
).configure_legend(
    labelLimit=0
)

In [48]:
alt.Chart(dfads.loc[dfads["country"] == "ITA"]).mark_point().encode(
    x=alt.X("value_6m:Q", axis=alt.Axis(title="Total revenues, in euros")),
    y=alt.Y(
        "rev_to_spend:Q",
        axis=alt.Axis(title="Revenue to spend ratio"),
        scale=alt.Scale(domain=[0, 0.3]),
    ),
    color="ad_set_name:N",
).properties(
    width=300,
    height=300,
    title="6 month membership revenues vs. rev to spend ratio in Italy",
).configure_legend(
    labelLimit=0
)

In [49]:
alt.Chart(dfads.loc[dfads["country"] == "ESP"]).mark_point().encode(
    x=alt.X("value_6m:Q", axis=alt.Axis(title="Total revenues, in euros")),
    y=alt.Y(
        "rev_to_spend:Q",
        axis=alt.Axis(title="Revenue to spend ratio"),
        scale=alt.Scale(domain=[0, 0.3]),
    ),
    color="ad_set_name:N",
).properties(
    width=300,
    height=300,
    title="6 month membership revenues vs. rev to spend ratio in Spain",
).configure_legend(
    labelLimit=0
)

In [50]:
alt.Chart(
    dfads.loc[(dfads["country"] == "GrE") & (dfads["rev_to_spend"] < 1)]
).mark_point().encode(
    x=alt.X("value_6m:Q", axis=alt.Axis(title="Total revenues, in euros")),
    y=alt.Y(
        "rev_to_spend:Q",
        axis=alt.Axis(title="Revenue to spend ratio"),
        scale=alt.Scale(domain=[0, 0.3]),
    ),
    color="ad_set_name:N",
).properties(
    width=300,
    height=300,
    title="6 month membership revenues vs. rev to spend ratio in Greater Europe",
).configure_legend(
    labelLimit=0
)

In [51]:
alt.Chart(dfads.loc[dfads["country"] == "Other"]).mark_point().encode(
    x=alt.X("value_6m:Q", axis=alt.Axis(title="Total revenues, in euros")),
    y=alt.Y(
        "rev_to_spend:Q",
        axis=alt.Axis(title="Revenue to spend ratio"),
        scale=alt.Scale(domain=[0, 0.3]),
    ),
    color="ad_set_name:N",
).properties(
    width=300,
    height=300,
    title="6 month membership revenues vs. rev to spend ratio in Non-Euro countries",
).configure_legend(
    labelLimit=0
)