This is a template for A/B test analysis. Template variables (see string.Template) starting with $ have to be replaced by actual values.

(template: scdata/templates/search_experiment.ipynb)

Motivation, observations, and action items can be found in the proposal (see link below).



In [1]:
%load_ext autoreload
%autoreload 2

import warnings
warnings.filterwarnings('ignore')

from scdata.ab_testing import Experiment
from scdata.ab_testing.analyses import SearchAnalysis
from scdata.metrics.search import RelevanceLabels

experiment = Experiment('search_fuzzy_search')
is_relevant = RelevanceLabels.has_interaction()
analysis = SearchAnalysis(experiment, user_cohort='all_users', spark=spark, is_relevant=is_relevant, num_buckets_per_day=1)
experiment.display_description()
final_evaluation = False


Sanity Checks
In the following, we check basic assignment properties to ensure that each variant was exposed with equal probability. That is, the statistics of the variants should be approximately equal, and the observed differences between them should not be statistically significant.
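
A standard way to make this check quantitative is a sample ratio mismatch (SRM) test: a chi-square goodness-of-fit test of the observed per-variant counts against the planned split. The following is a minimal sketch for two variants, assuming a 50/50 design by default; `srm_p_value` is a hypothetical helper, not part of scdata.

```python
import math


def srm_p_value(count_a: int, count_b: int, expected_share_a: float = 0.5) -> float:
    """Hypothetical helper: chi-square SRM test for two variants (1 degree of freedom)."""
    total = count_a + count_b
    expected_a = total * expected_share_a
    expected_b = total - expected_a
    chi2 = ((count_a - expected_a) ** 2 / expected_a
            + (count_b - expected_b) ** 2 / expected_b)
    # With 1 degree of freedom, P(X > chi2) = P(|Z| > sqrt(chi2)) = erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


print(srm_p_value(5010, 4990))    # nearly balanced split: large p-value
print(srm_p_value(10000, 10600))  # imbalanced split: very small p-value
```

A very small p-value (the cutoff, e.g. 0.001, is a judgment call) suggests the assignment itself is broken, in which case the metric comparisons in this notebook should not be trusted until the cause is found.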

Total number of unique searches per variant

In [None]:
analysis.get_unique_searches_per_variant()

Visitors per Day
The number of visitors is defined as the number of users with at least one search.
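
As an illustration of this definition, the sketch below counts distinct users per day and variant from a list of search events. The tuple layout `(day, variant, user_id)`, one tuple per search, is a hypothetical stand-in for the underlying Spark tables; because every event is a search, each user that appears has at least one search.

```python
from collections import defaultdict


def visitors_per_day(search_events):
    """Count distinct users with at least one search, per (day, variant).

    search_events: iterable of (day, variant, user_id) tuples, one per search
    (hypothetical schema used only for illustration).
    """
    users = defaultdict(set)
    for day, variant, user_id in search_events:
        users[(day, variant)].add(user_id)
    return {key: len(ids) for key, ids in sorted(users.items())}


events = [
    ('day1', 'control', 'u1'),
    ('day1', 'control', 'u1'),  # a repeated search by the same user counts once
    ('day1', 'treatment', 'u2'),
    ('day2', 'control', 'u3'),
]
print(visitors_per_day(events))
```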

In [None]:
%%time
visitors = analysis.get_visitors()
visitors.plot_summary()

Behavior Analysis
How do the variants differ with respect to the presented content?

Collection Distribution Per Position

In [None]:
%%time
collection_distribution = analysis.get_collection_distribution_per_position()
collection_distribution.plot();
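
Conceptually, the collection distribution per position is the share of each collection among the results shown at a given rank, computed separately per variant. A minimal sketch, assuming impression records of `(variant, position, collection)`; the record layout and the collection names are hypothetical and not taken from scdata.

```python
from collections import Counter, defaultdict


def collection_share_per_position(impressions):
    """Map (variant, position) -> {collection: share of impressions at that position}."""
    counts = defaultdict(Counter)
    for variant, position, collection in impressions:
        counts[(variant, position)][collection] += 1
    shares = {}
    for key, counter in counts.items():
        total = sum(counter.values())
        shares[key] = {name: n / total for name, n in counter.items()}
    return shares


impressions = [
    ('control', 1, 'track'), ('control', 1, 'artist'),
    ('control', 2, 'track'), ('treatment', 1, 'artist'),
]
print(collection_share_per_position(impressions))
```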

Performance Comparison
How do the variants differ with respect to our metrics?

Click-through Rate
Click-through rate (CTR), i.e., the complement of the bounce rate (1 - bounce rate), is calculated by summing search-attributed plays, item navigations, and engagement actions. This metric measures candidate selection quality.
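
One way to read this definition is that a search counts as a click-through if it is followed by at least one qualifying action, and the bounce rate is the remaining share. The sketch below assumes exactly that; the action names and the rule that each search counts at most once are assumptions, not the scdata implementation.

```python
QUALIFYING_ACTIONS = {'play', 'navigate', 'engage'}  # hypothetical action names


def ctr_and_bounce_rate(searches):
    """searches: list of sets of action names recorded after each search."""
    if not searches:
        raise ValueError('no searches')
    # A search is a click-through if any of its actions is a qualifying one.
    clicked = sum(1 for actions in searches if actions & QUALIFYING_ACTIONS)
    ctr = clicked / len(searches)
    return ctr, 1 - ctr


ctr, bounce = ctr_and_bounce_rate([{'play'}, set(), {'navigate', 'engage'}, {'scroll'}])
print(ctr, bounce)  # -> 0.5 0.5
```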

Additionally, we consider CTR@k, the complement of the bounce rate within the top k positions. These metrics measure ranking quality.
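
A minimal sketch of CTR@k, assuming each search is represented by the 1-based positions of the results that received an interaction (a hypothetical layout). With k = 'inf' there is no position cutoff, so CTR@inf reduces to the plain CTR, which mirrors the `['inf', 1, 3, 5, 10]` loop in the cell that follows.

```python
import math


def ctr_at(interaction_positions, k):
    """CTR@k: share of searches with at least one interaction in the top k results.

    interaction_positions: one list per search with the 1-based result positions
    that received an interaction; k is an int or 'inf' (no position cutoff).
    """
    limit = math.inf if k == 'inf' else k
    hits = sum(1 for positions in interaction_positions
               if any(p <= limit for p in positions))
    return hits / len(interaction_positions)


searches = [[1], [4], [], [2, 7]]
for k in ['inf', 1, 3, 5, 10]:
    print('CTR@{}: {}'.format(k, ctr_at(searches, k)))
```

Because a search that clicks within the top k also clicks within any larger cutoff, CTR@k is non-decreasing in k.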

In [None]:
for k in ['inf', 1, 3, 5, 10]:
    print("CTR@{}".format(k))
    ctr = analysis.get_ctr_at(k)
    ctr.plot_summary()
    print()

CTR@k statistics for all countries

In [None]:
for k in ['inf', 1, 3, 5, 10]:
    print("statistics@{}".format(k))
    statistics, diff = analysis.get_statistics_at(k)
    display(statistics)
    if not diff.empty:
        print("Statistically significant differences between variants:")
        display(diff)
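
`get_statistics_at` reports which differences are statistically significant, but the notebook does not show which test it applies. For a rate such as CTR@k, a two-sided two-proportion z-test with pooled variance is one common choice. The sketch below assumes that test; the function name and its inputs (click and search counts per variant) are hypothetical.

```python
import math


def two_proportion_z_test(clicks_a, searches_a, clicks_b, searches_b):
    """Two-sided z-test for the difference of two proportions (pooled variance)."""
    p_a = clicks_a / searches_a
    p_b = clicks_b / searches_b
    pooled = (clicks_a + clicks_b) / (searches_a + searches_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / searches_a + 1 / searches_b))
    z = (p_b - p_a) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p_b - p_a, z, p_value


diff, z, p = two_proportion_z_test(4000, 10000, 4200, 10000)
print('diff={:.3f} z={:.2f} p={:.4f}'.format(diff, z, p))
```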

CTR@k statistics for selected countries

In [None]:
countries = ['US']  # default

In [None]:
for k in ['inf', 1, 3, 5, 10]:
    for country in countries:
        print("statistics@{} for country {}".format(k, country))
        statistics, diff = analysis.get_statistics_at(k, country)
        display(statistics)
        if not diff.empty:
            print("Statistically significant differences between variants:")
            display(diff)

Search Success Metric



In [None]:
# search_success = analysis.get_search_success()
# search_success.plot_summary()


Search Success for selected countries

In [None]:
# countries = ['US'] #default

In [None]:
# for country in countries:
#     search_success = analysis.get_search_success(country)
#     search_success.plot_summary()


Normalized Discounted Cumulative Gain

In [None]:
ndcg = analysis.get_ndcg()
ndcg.plot_summary()
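
For reference, NDCG@k is commonly defined as DCG@k = sum over ranks i = 1..k of rel_i / log2(i + 1), divided by the DCG@k of the ideal (relevance-sorted) ranking. The sketch assumes binary relevance labels, such as those from `RelevanceLabels.has_interaction()`, and linear gain; whether `get_ndcg` uses the same gain, discount, and cutoff is not stated in this notebook.

```python
import math


def dcg(relevances, k=None):
    """Discounted cumulative gain over the first k results (all results if k is None)."""
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevances[:k], start=1))


def ndcg(relevances, k=None):
    """DCG normalized by the DCG of the ideal ordering; 0.0 if nothing is relevant."""
    ideal = dcg(sorted(relevances, reverse=True), k)
    return dcg(relevances, k) / ideal if ideal > 0 else 0.0


print(ndcg([1, 0, 0]))  # -> 1.0 (relevant result at the top)
print(ndcg([0, 0, 1]))  # -> 0.5 (relevant result at rank 3)
```

With binary labels, linear gain rel and exponential gain 2^rel - 1 coincide, so the choice only matters for graded relevance.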


Pairwise Accuracy

In [None]:

pairwise_acc = analysis.get_pairwise_accuracy_at_k(20)
pairwise_acc.plot_summary()
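
The notebook does not define pairwise accuracy. A common definition, assumed here, is the share of result pairs within the top k that have different relevance labels and are ordered correctly, i.e., the more relevant result is ranked higher. `pairwise_accuracy_at_k` is a hypothetical helper, not the scdata method.

```python
from itertools import combinations


def pairwise_accuracy_at_k(relevances, k):
    """Share of correctly ordered pairs among the top k results with different labels.

    relevances: relevance labels in ranked order (rank 1 first).
    Returns NaN when no pair in the top k has different labels.
    """
    correct = total = 0
    # combinations() preserves input order, so `higher` is always ranked above `lower`.
    for higher, lower in combinations(relevances[:k], 2):
        if higher == lower:
            continue
        total += 1
        if higher > lower:
            correct += 1
    return correct / total if total else float('nan')


print(pairwise_accuracy_at_k([1, 0, 1, 0], k=20))
```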

Time to First Click
This metric is an indicator of UI quality. Note that it is computed only over searches that led to a click, so a variant with a lower time to first click (TTFC) may still have fewer clicks.
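
The caveat can be made concrete with a small sketch that reports the median TTFC together with the share of searches that received any click. The input layout (one entry per search, `None` for searches without a click) and the toy numbers are hypothetical; `quantile=50` in the cell that follows corresponds to the median.

```python
from statistics import median


def ttfc_summary(first_click_seconds):
    """Median time to first click over clicked searches, plus the clicked share.

    first_click_seconds: one entry per search, None if the search had no click.
    """
    clicked = [t for t in first_click_seconds if t is not None]
    return {
        'median_ttfc': median(clicked) if clicked else None,
        'click_share': len(clicked) / len(first_click_seconds),
    }


variant_a = [2.0, 4.0, 6.0, 8.0]
variant_b = [1.0, None, None, 3.0]
print(ttfc_summary(variant_a))  # -> {'median_ttfc': 5.0, 'click_share': 1.0}
print(ttfc_summary(variant_b))  # -> {'median_ttfc': 2.0, 'click_share': 0.5}
```

Variant B looks faster on TTFC alone, yet only half of its searches received a click, which is why TTFC should be read alongside CTR.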



In [None]:
ttfc = analysis.get_time_to_first_click(quantile=50)
ttfc.plot();

Listening Time
Listening time serves as a guard metric; it is not a KPI for search.
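
The cells that follow compare the median (quantile=50) and the 75th percentile of listening time. A guard check on these percentiles could be sketched as follows. The helper names, the 1% tolerance, and the use of `statistics.quantiles` (with its default 'exclusive' interpolation) are assumptions, not how `get_listening_time` computes its quantiles; the data are toy values.

```python
from statistics import quantiles


def percentile(values, q):
    """q-th percentile (integer 1..99) via statistics.quantiles ('exclusive' method)."""
    return quantiles(values, n=100)[q - 1]


def guard_check(control, treatment, q=50, max_relative_drop=0.01):
    """Relative change of the q-th percentile and whether it stays within tolerance."""
    base = percentile(control, q)
    relative_change = (percentile(treatment, q) - base) / base
    return relative_change, relative_change >= -max_relative_drop


control = [12.0, 30.5, 45.0, 60.0, 75.5, 90.0]  # toy listening minutes per user
treatment = [11.0, 29.0, 44.5, 61.0, 74.0, 88.0]
for q in (50, 75):
    print(q, guard_check(control, treatment, q=q))
```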

In [None]:
listening_time = analysis.get_listening_time(quantile=50)
listening_time.plot_summary()

In [None]:
listening_time = analysis.get_listening_time(quantile=75)
listening_time.plot_summary()

___________________________________

Appendix
Click-through Rate by Dimensions

In [None]:
k = 3
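
Each breakdown below recomputes CTR@k (here k = 3) within the segments of one dimension, which is conceptually a group-by. The sketch assumes per-search records carrying the variant, the segment value (e.g. a user tier), and the 1-based positions of interactions; this layout and the segment names are hypothetical.

```python
from collections import defaultdict


def ctr_at_k_by(records, k):
    """records: iterable of (variant, segment, positions); returns CTR@k per (variant, segment)."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for variant, segment, positions in records:
        totals[(variant, segment)] += 1
        if any(p <= k for p in positions):
            hits[(variant, segment)] += 1
    # Iterate over totals so segments without any hit still appear with CTR 0.0.
    return {key: hits[key] / totals[key] for key in totals}


records = [
    ('control', 'free', [2]), ('control', 'free', []),
    ('control', 'premium', [5]), ('treatment', 'free', [1]),
]
print(ctr_at_k_by(records, k=3))
```

Small segments produce noisy rates, so per-segment differences deserve the same significance checks as the overall comparison.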

Page Name

In [None]:
%%time
ctr = analysis.get_ctr_at_k_by_page_name(k)
ctr.plot();

Click Type

In [None]:
%%time
ctr = analysis.get_ctr_at_k_by_engagement_type(k)
ctr.plot();

User Tier

In [None]:
%%time
ctr = analysis.get_ctr_at_k_by_user_tier(k)
ctr.plot();


Collection


In [None]:
%%time
ctr = analysis.get_ctr_at_k_by_collection(k)
ctr.plot();

Logged-in Status

In [None]:
%%time
if final_evaluation:
    ctr = analysis.get_ctr_at_k_by_logged_in_status(k)
    ctr.plot();

Country


In [None]:
%%time
if final_evaluation:
    ctr = analysis.get_ctr_at_k_by_country(k)
    ctr.plot();
