title: Deep dive - UX of new product selection flow
author: Fabio Schmidt-Fischbach   
date: 2020-06-17   
region: EU   
summary: The first step of product selection is business vs standard. Do people even consider our Business products? Yes, 41% of our users explore the Business product at some point. 75% of users who explore Business also end up with it; the other 25% revert and end up with a private account. 70% of users start by viewing the standard product. 77% of users view only a single tier; only 23% of customers consider more than one product before making their choice. Overall, conversion of product selection is > 90%. 46% of customers that fail, fail on the first screen.
tags: product selection, deep dive, premium

In [2]:
import pandas as pd
import os
import altair as alt
import numpy as np

In [None]:
query = """
select wc.dvce_type, se_label, se_property, se_action, ec.event_type, ec.collector_tstamp, cmd_users.id , cmd_users.user_created::date
from ksp_event_crab as ec 
inner join ksp_event_types as et using (event_type) 
inner join ksp_event_userid as uid on uid.event_id = ec.event_id and uid.collector_tstamp >= current_date - interval '1 month'
inner join cmd_shadow_user as cmd on cmd.id = uid.user_id 
inner join cmd_users on cmd_users.user_created = cmd.user_created 
inner join ksp_web_crab as wc on wc.event_id = ec.event_id and wc.collector_tstamp >= current_date - interval '1 month'
where se_action in ('account_purpose.step_entered', 'product_selection.step_entered', 'product_selection.failure', 'product_selection.tab_click', 'product_selection.success')
	and se_property = 'product_selection_new_ui' 
	
""" # query to get snowplow data. 

query = """ with users as (
	select distinct cmd.user_created 
	from ksp_event_crab inner join ksp_event_userid using (event_id) 
	inner join cmd_shadow_user as cmd on cmd.id = ksp_event_userid.user_id 
	where se_property = 'product_selection_new_ui' 
)

select zu.user_id, zu.user_created::date, kyc_first_completed, kyc_first_initiated, country_tnc_legal, coalesce(zup.product_id, zu.product_id) as product, case when fa.user_created is not null then 1 else 0 end as ever_mau 
from dbt.zrh_users as zu 
left join dbt.zrh_user_product as zup on zup.user_created = zu.user_created and enter_reason = 'SIGNUP' 
inner join users as u on u.user_created = zu.user_created 
left join dbt.stg_cohort_first_active as fa on fa.user_created = zu.user_created 
where legal_entity = 'EU' --only EU since IT and ES ask for the occupation even in the normal flow --> in these flows it's impossible to track whether the user entered standard or business flows.
""" # query to get BE data. 


## Summary 


Business vs private
- 41% of users explore the Business section of the product selection funnel (i.e. they click "Yes" on the account purpose screen).
- 25% of those revert and end up with a private account.

How do users navigate the tier selection screens? 

- 70% of users start by viewing the standard product.
- 77% of users view only a single tier; only 23% of customers consider more than one product before making their choice.

How is the funnel conversion working? Where are issues? 
- Overall, conversion of product selection is > 90%.
- 46% of customers that fail, fail on the first screen. 

### Business vs standard: how many users explore the Business section?

The first step of product selection is business vs standard. Do people even consider our Business products? Yes, 41% of our users explore the Business product at some point. 

![](business_selection.png)

In [3]:
# identify users that selected business.
df = pd.read_csv("product_selection_ksp.csv")

df["b"] = False
df.loc[
    df["se_label"].isin(["account-purpose-occupations", "account-purpose-industries"]),
    "b",
] = True

# how many do this?
df = df.groupby(["id"])["b"].agg("max").reset_index()

df = df.groupby(["b"])["id"].agg("nunique").reset_index()

df["perc"] = 100 * df["id"] / df["id"].sum()

alt.Chart(df).mark_bar().encode(
    x=alt.X("b:N", axis=alt.Axis(title="Did the user ever click on business?")),
    y=alt.Y("perc:Q", axis=alt.Axis(title="% of users")),
).properties(width=600, height=500, title="% of users who explore business")

In [7]:
df.head()

Unnamed: 0,b,id,perc
0,False,19896,58.395703
1,True,14175,41.604297


They explore Business, but do they also end up with it? 75% of users who explore Business also end up with it; the remaining 25% revert and end up with a private account.

In [5]:
# compare the business flag against the product each user ended up with.
# load the backend data (one row per user, including the product chosen).
be = pd.read_csv("product_selection_be.csv")

# join this to the front end data
df = pd.read_csv("product_selection_ksp.csv")

df["b"] = False
df.loc[
    df["se_label"].isin(["account-purpose-occupations", "account-purpose-industries"]),
    "b",
] = True
# how many do this?
df = df.groupby(["id"])["b"].agg("max").reset_index()

# join to the backend.
df = df.merge(be, left_on="id", right_on="user_id", how="inner")

df = df.groupby(["b", "product"])["user_id"].agg("nunique").reset_index()

df["total"] = df.groupby("b")["user_id"].transform("sum")
df["perc"] = 100 * df["user_id"] / df["total"]

# replace the boolean flag with readable labels for the chart facets.
df["b"] = df["b"].map(
    {True: "Entered business flow", False: "Never entered the business flow"}
)


alt.Chart(df).mark_bar().encode(
    x=alt.X("product:N", axis=alt.Axis(title="Product chosen")),
    y=alt.Y("perc:Q", axis=alt.Axis(title="% of users")),
    column="b:N",
    color="product:N",
).properties(width=600, height=600)

### Selecting the right product: how do users navigate the detail pages? 

The following analysis drops all desktop traffic, as desktop has a different layout.


![](detail.png)

Almost 70% of our users start by viewing our Standard product.

In [4]:
df = pd.read_csv("product_selection_ksp.csv")

df = df.loc[df["se_action"] == "product_selection.tab_click", :]
df = df.loc[df["dvce_type"] == "Mobile", :]

df["collector_tstamp"] = pd.to_datetime(df["collector_tstamp"])
# keep only days since 10th of june when tracking on the first product selected was implemented.
df = df.loc[df["collector_tstamp"] >= "2020-06-10", :]

# number each tab event by when it was created.
df["rank"] = df.groupby("id")["collector_tstamp"].rank(method="dense", ascending=True)

df = df.loc[df["rank"] == 1, :]
# count which products are looked at first!
df = df.groupby("se_label")["id"].agg("nunique").reset_index()

df["perc"] = 100 * df["id"] / sum(df["id"])

alt.Chart(df).mark_bar().encode(
    x=alt.X("se_label:N", axis=alt.Axis(title="First product viewed")),
    y=alt.Y("perc:Q", axis=alt.Axis(title="% of users")),
).properties(width=500, height=500, title="First product viewed")

It's bad news that almost 70% of our customers start with the Standard product: ideally, we want our users to explore the premium products. The next question is how often they go on to view another product. The chart below shows the % of users that looked at 1, 2, 3, etc. different tiers.

- 77% of users only look at a single tier.
- About 11% look at two tiers and about 12% look at three.

In [4]:
# how often do users look at multiple products?
df = pd.read_csv("product_selection_ksp.csv")

df = df.loc[df["se_action"] == "product_selection.tab_click", :]

df["collector_tstamp"] = pd.to_datetime(df["collector_tstamp"])
# keep only days since 10th of june when tracking on the first product selected was implemented.
df = df.loc[df["collector_tstamp"] >= "2020-06-10", :]

df = df.groupby("id")["se_label"].agg("nunique").reset_index()
df = df.groupby("se_label")["id"].agg("nunique").reset_index()

df["perc"] = 100 * df["id"] / sum(df["id"])

alt.Chart(df).mark_bar().encode(
    x=alt.X("se_label:N", axis=alt.Axis(title="Number of products viewed")),
    y=alt.Y("perc:Q", axis=alt.Axis(title="% of users")),
).properties(width=700, height=700, title="Number of tiers viewed in the funnel")

In [5]:
df.head()

Unnamed: 0,se_label,id,perc
0,1,9547,77.01678
1,2,1371,11.060019
2,3,1438,11.600516
3,4,18,0.145208
4,5,20,0.161342


Depending on your first product viewed, what's your most likely next step? 

Each row of the heatmap shows the first product a customer viewed. The columns show their second step. For example, the cell in row Standard and column Black-Card-Monthly gives the % of users who started with Standard and then moved to You (tracked as Black-Card-Monthly).

- 58% of those that start with Standard look at no further product. 
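
To make the row-wise reading of the heatmap concrete, here is a minimal sketch on made-up data. The click rows and the tier label `Other-Tier` are hypothetical, and the sketch uses `cumcount` and `pd.crosstab` instead of the dense rank and pivot used in this notebook, so it illustrates the idea rather than reproducing the analysis.

```python
import pandas as pd

# Hypothetical tab-click events: one row per click (toy data, not real users).
clicks = pd.DataFrame(
    {
        "id": [1, 1, 2, 3, 3, 4],
        "se_label": [
            "Standard",
            "Black-Card-Monthly",
            "Standard",
            "Black-Card-Monthly",
            "Other-Tier",  # hypothetical tier label
            "Standard",
        ],
        "collector_tstamp": pd.to_datetime(
            [
                "2020-06-10 10:00",
                "2020-06-10 10:01",
                "2020-06-10 11:00",
                "2020-06-11 09:00",
                "2020-06-11 09:02",
                "2020-06-12 08:00",
            ]
        ),
    }
)

# Number each click per user in chronological order (1 = first tab viewed).
clicks = clicks.sort_values(["id", "collector_tstamp"])
clicks["step"] = clicks.groupby("id").cumcount() + 1

first = clicks.loc[clicks["step"] == 1].set_index("id")["se_label"]
second = clicks.loc[clicks["step"] == 2].set_index("id")["se_label"]

# Users without a second click get an explicit placeholder.
paths = pd.DataFrame({"first": first, "second": second}).fillna("No second product")

# Row-normalised shares: each row (first choice) sums to 100%.
transitions = pd.crosstab(paths["first"], paths["second"], normalize="index") * 100
print(transitions.round(0))
```

In this toy data, two of the three users who start on Standard never view a second tier, so the Standard row shows about 67% under "No second product".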

In [6]:
df = pd.read_csv("product_selection_ksp.csv")

df = df.loc[df["se_action"] == "product_selection.tab_click", :]
df = df.loc[df["dvce_type"] == "Mobile", :]

df["collector_tstamp"] = pd.to_datetime(df["collector_tstamp"])
# keep only days since 10th of june when tracking on the first product selected was implemented.
df = df.loc[df["collector_tstamp"] >= "2020-06-10", :]

# number each tab event by when it was created.
df["rank"] = df.groupby("id")["collector_tstamp"].rank(method="dense", ascending=True)

df = df.loc[df["rank"] < 3, ["id", "se_label", "rank"]]

df = (
    pd.pivot_table(df, index=["id"], columns=["rank"], aggfunc="first")
    .fillna("No second product")
    .reset_index()
)
df.columns = ["id", "First choice", "Second choice"]

df = df.groupby(["First choice", "Second choice"])["id"].agg("nunique").reset_index()

df["perc"] = round(
    100 * df["id"] / df.groupby(["First choice"])["id"].transform("sum"), 0
)


base = alt.Chart(df).encode(
    x="Second choice:O",
    y="First choice:O",
)
heatmap = base.mark_rect().encode(color="perc:Q").properties(width=500, height=500)

text = base.mark_text(baseline="middle").encode(text="perc:Q")

heatmap + text

### Understanding conversion in the product selection funnel. Where do users drop off?

I already showed the first screens: account purpose (business or non-business) and the detail screen.

The section above then explored how people switch across tiers.

After this, users choose the colour of their card and a delivery method, and hopefully confirm their purchase on the last screen.


![](rest_funnel.png)

The first thing to understand is how many people we lose in the first place. Below you see that our product selection conversion is above 90%.

In [7]:
df = pd.read_csv("product_selection_ksp.csv")

df = df.loc[
    df["se_action"].isin(
        ["product_selection.step_entered", "product_selection.success"]
    ),
    :,
]

# flag users that finish the flow
df["finish"] = 0
df.loc[df["se_action"] == "product_selection.success", "finish"] = 1

df = df.groupby(["id", "dvce_type"])["finish"].agg("max").reset_index()

df = df.groupby("dvce_type")["finish"].agg("mean").reset_index()


alt.Chart(df).mark_bar().encode(
    x=alt.X("dvce_type:N", axis=alt.Axis(title="Device type")),
    y=alt.Y(
        "finish:Q",
        axis=alt.Axis(title="% of users finish product selection", format="%"),
    ),
).properties(width=500, height=500)

On which screens do we lose customers? The next graph shows, for each screen, what % of the users who fail dropped off on that screen.

Definition of failing on a screen: the screen is the last screen we record for a user who never completes product selection.

- 46.8% of all failures happen on the very first screen (account-purpose-type).
- 20% of failures happen on the detail screens. 
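
The "last recorded screen" definition can be sketched on a toy event log as follows. All rows are hypothetical, including the `se_label` attached to the success event, and the sketch uses `drop_duplicates(keep="last")` in place of the rank-based filter used in this notebook.

```python
import pandas as pd

# Hypothetical event log (toy data): user 2 finishes, users 1 and 3 drop off.
events = pd.DataFrame(
    {
        "id": [1, 1, 2, 2, 2, 3],
        "se_action": [
            "account_purpose.step_entered",
            "product_selection.step_entered",
            "account_purpose.step_entered",
            "product_selection.step_entered",
            "product_selection.success",
            "account_purpose.step_entered",
        ],
        "se_label": [
            "account-purpose-type",
            "DETAIL",
            "account-purpose-type",
            "DETAIL",
            "TERMS",  # label on the success event is an assumption
            "account-purpose-type",
        ],
        "collector_tstamp": pd.to_datetime(
            [
                "2020-06-10 10:00",
                "2020-06-10 10:02",
                "2020-06-10 11:00",
                "2020-06-10 11:01",
                "2020-06-10 11:05",
                "2020-06-11 09:00",
            ]
        ),
    }
)

# Flag every row of a user who has at least one success event.
events["success"] = events["se_action"].eq("product_selection.success")
finished = events.groupby("id")["success"].transform("max")

# Keep only users who never finished, then take their last recorded screen.
dropouts = events.loc[~finished]
last_screen = dropouts.sort_values("collector_tstamp").drop_duplicates(
    "id", keep="last"
)

# Share of drop-offs per last screen, in %.
share = last_screen["se_label"].value_counts(normalize=True) * 100
print(share)
```

Here user 1 fails on DETAIL and user 3 fails on account-purpose-type, so each screen accounts for half of the failures.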

In [8]:
df = pd.read_csv("product_selection_ksp.csv")

df = df.loc[
    df["se_action"].isin(
        [
            "account_purpose.step_entered",
            "product_selection.step_entered",
            "product_selection.success",
        ]
    ),
    :,
]

# flag users that finish the flow
df["finish"] = 0
df.loc[df["se_action"] == "product_selection.success", "finish"] = 1
# drop those that finished
df["finisher"] = df.groupby("id")["finish"].transform("max")

df = df.loc[df["finisher"] == 0, :]

# now rank each screen view
df["collector_tstamp"] = pd.to_datetime(df["collector_tstamp"])
df["rank"] = df.groupby("id")["collector_tstamp"].rank(method="dense", ascending=True)

df["max"] = df.groupby("id")["rank"].transform("max")

# keep only last step
df = df.loc[df["rank"] == df["max"], :]

# compute the % of times that a step was the last step
df = df.groupby("se_label")["id"].agg("nunique").reset_index()

df["perc"] = 100 * df["id"] / sum(df["id"])

alt.Chart(df).mark_bar().encode(
    x=alt.X(
        "se_label:N",
        axis=alt.Axis(title="Step"),
        sort=alt.EncodingSortField(field="perc", op="mean", order="descending"),
    ),
    y=alt.Y("perc:Q", axis=alt.Axis(title="% of users that drop off")),
).properties(width=500, height=500, title="Last screen of users that drop off")

In [12]:
df.head(10)

Unnamed: 0,se_label,id,perc
0,DELIVERY,394,10.965767
1,DESIGN,35,0.974116
2,DETAIL,706,19.649318
3,PREVIEW,562,15.641525
4,TERMS,107,2.978013
5,account-purpose-industries,79,2.19872
6,account-purpose-occupations,26,0.723629
7,account-purpose-preselected-occupations,3,0.083496
8,account-purpose-type,1681,46.785416


### Linking funnel behaviour of a user to real outcomes (activation, KYC completion). 

This is the natural next step. Since the new UX has only been live for a short time, we should revisit this later.
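
A minimal sketch of what that follow-up could look like, assuming the per-user business flag `b` from the frontend data and the `kyc_first_completed` and `ever_mau` columns selected in the backend query. The rows below are made up, and treating a non-null `kyc_first_completed` as "KYC completed" is an assumption about how that column is populated.

```python
import pandas as pd

# Hypothetical per-user frontend flag (True = entered the business flow).
fe = pd.DataFrame({"id": [1, 2, 3, 4], "b": [True, True, False, False]})

# Hypothetical backend rows; column names follow the backend query.
be = pd.DataFrame(
    {
        "user_id": [1, 2, 3, 4],
        "kyc_first_completed": ["2020-06-12", None, "2020-06-11", "2020-06-15"],
        "ever_mau": [1, 0, 1, 0],
    }
)

df = fe.merge(be, left_on="id", right_on="user_id", how="inner")

# Assumption: a missing kyc_first_completed means KYC was never completed.
df["kyc_completed"] = df["kyc_first_completed"].notna()
df["flow"] = df["b"].map({True: "business", False: "private"})

# Outcome rates per funnel behaviour.
outcomes = df.groupby("flow").agg(
    users=("user_id", "nunique"),
    kyc_rate=("kyc_completed", "mean"),
    mau_rate=("ever_mau", "mean"),
)
print(outcomes)
```

The same grouping would work for other funnel behaviours, such as the first tier viewed or the number of tiers viewed, once enough users have had time to complete KYC and become active.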