title: Customer Snapshots split test

author: Nimi Thomas

date: 2021-06-24

region: EU  

tags: assistance, eu, ab, split

summary: Customer Snapshots is a DASH26 module that shows the most frequently accessed user information in a single view. It aggregates data from several modules: personal details, account, KYC, cards and transactions. We want to know whether DASH26 improvements alone can reduce handling times.

# Customer Snapshots

### Contents 
1. [Background](#section1)  
2. [Current Metrics](#section2)  
3. [Hypothesis](#section3)  
4. [Experimental Design](#section4) 
5. [Statistical Analysis](#section5)
6. [Final Results](#section6)
    - a) [Overall](#section6a)
    - b) [Tag split](#section6b)
    - c) [Chat Origin](#section6c)
    - d) [Tenure split](#section6d)


<a id='section1'></a>
## Background

This initiative is the result of [user surveys](https://docs.google.com/spreadsheets/d/17hrHuZmNRnJeG9Pn24Im3Aa2lWFIJlbV0GcFX9L7zKI/edit?usp=sharing) on the performance of DASH26, which highlighted the importance of consolidated customer data.

For the vast majority of contacts processed by Customer Service, only a small fraction of the data available in Dash26 to first-level specialists is actually required to resolve a contact. This data is scattered across multiple modules and tabs, resulting in unnecessarily long loading times and thus longer contact resolution times.

Our goal is to remove unnecessary clicks, navigation and loading time by consolidating most of the necessary data into a “snapshot” of the customer. This should give specialists quicker access to the data they need and help them resolve the majority of contacts faster.

Certain CS tags may be more relevant to this feature, since most of the information the agent needs for these sorts of contacts is in the snapshot:
'sign_up', 'kyc_acceptance', 'card_delivery', 'card_activation/pin', 'login', 'confirmations/statements', 'dt/standing_order', 'ct/missing_ct', 'direct_debit', 'instant_ct_/missing_ct', 'instant_dt', 'savings', 'overdraft'.

For more information, see the [Confluence page](https://number26-jira.atlassian.net/wiki/spaces/CS/pages/2376532754/Customer+Snapshot+in+Dash26).


### TLDR 
Customer Snapshots is a DASH26 module that shows the most frequently accessed user information in a single view. It aggregates data from several modules: personal details, account, KYC, cards and transactions. We want to know whether DASH26 improvements alone can reduce handling times.

### Findings

- Overall, there was statistically significant evidence of a small decrease in handle times (10-20 seconds) in the group of tags we aimed to target.
- When we drilled down to specific tags within this group, the results were inconclusive. The only significant result was for the tag sign_up, which makes up only 6% of our specific tags group. Sample sizes shrink as we drill down further, so these tests are less able to detect an effect. The Snapshot module does not show a clear benefit at tag level, so other factors may be contributing.
- When we drilled down by chat origin, there was a significant result for Support Center chats. Agents handling contacts from Support Center may benefit from the Customer Snapshots module because generic customer information may be required for authentication.

### Limitations 

Handle time is made up of time chatting to a customer (Salesforce time), time investigating (DASH time) and wrap up.

Due to a lack of DASH tracking, we cannot tell which agents in the variant group actually used the new feature during the split test. We also cannot track the total time spent actively in DASH. We plan to add Snowplow tracking in the near future.

In [None]:
pip install seaborn

In [None]:
pip install pymc3

In [131]:
import datetime
from IPython.display import display_html
import pandas as pd
import pymc3 as pm
import numpy as np
from numpy.random import seed
from numpy.random import randn
import matplotlib.pyplot as plt
from scipy.stats import ttest_ind, mannwhitneyu, beta, wilcoxon
import seaborn as sns
from statsmodels.graphics.gofplots import qqplot
from utils.datalib_database import df_from_sql
from utils.helper_functions import get_data
import warnings

warnings.simplefilter(action="ignore", category=FutureWarning)

In [86]:
handled_contacts = df_from_sql(
    "redshiftreader",
    """ 
    select 
        a.initiated_date::date as date, 
        date_trunc('week',a.initiated_date)::date as week,
        coalesce(s.chat_origin,'unknown') as chat_origin,
        left(a.specialist_id,15) as specialist_id,
        specialist_tenure_months,
        a.cs_tag as tag_name,
        case 
            when a.cs_tag = 'wise' then 'Payments' 
            when segment like 'Bank Products%' then 'Bank Products'
            else coalesce(segment, 'Other')
        end as tag_group,
        a.contact_language, 
        a.company_general,
        u.tnc_country_group, 
        case 
            when a.cs_tag in (
                'sign_up','kyc_acceptance','card_delivery','card_activation/pin','login',
                'confirmations/statements','dt/standing_order','ct/missing_ct','direct_debit',
                'instant_ct_/missing_ct','instant_dt','savings','overdraft')
            then true else false 
        end as customer_snapshot_specific_tag, 
        handle_time/60 as handle_time_minutes
    from dbt.sf_all_contacts a 
    inner join dbt.sf_chat_summary s 
        on a.id = s.id
    left join dbt.ucm_cs_mapping cs
        on a.cs_tag = cs.tag_name
    left join dbt.zrh_users u 
        on a.user_id = u.user_id
    where 1=1
    and a.contact_date is not null 
    and a.abandoned is false
    and a.handle_time > 0
    and a.channel_type = '1st level'
    and a.initiated_date >= '2021-04-05' -- from first full week in April
    and a.initiated_date < '2021-06-21' -- test runs for 2-3 weeks in June
    and (tnc_country_group <> 'GBR' or tnc_country_group is null)  -- handful of contacts
    """,
)

In [87]:
handled_contacts["specialist_tenure_months"] = pd.to_numeric(
    handled_contacts["specialist_tenure_months"]
)
handled_contacts["tenure_bins"] = pd.qcut(
    handled_contacts["specialist_tenure_months"], q=4
)

In [88]:
split_groups = get_data(
    "19eND4rsQfdcMa9bc3K5lXIYlVjenfyNTy7fsNlZYGDw",
    "Final List!A1:J1",
    "Final List!A2:J10000000",
)

<a id='section2'></a>
## Current Metrics

In [89]:
stacked = pd.crosstab(
    handled_contacts.week, handled_contacts.customer_snapshot_specific_tag
).plot.bar(stacked=True, figsize=(20, 5))

for c in stacked.containers:
    stacked.bar_label(c, label_type="center")

<font color='blue'>**Findings:**</font> Volumes of 1st level chats can vary week on week. So we'll have to monitor how much volume we get as the A/B test is rolled out. 

In [90]:
fig, ax = plt.subplots(figsize=(20, 5))

# Keyword arguments: recent seaborn releases no longer accept x and y positionally
sns.boxplot(
    x="week", y="handle_time_minutes", data=handled_contacts, ax=ax
).set_title("Handle Time (minutes) percentiles", fontsize=16);

<font color='blue'>**Findings:**</font> The handle time percentiles stay roughly constant from week to week. The median sits at roughly 10-12 minutes. 

Note: handle time is a continuous variable with a long right tail, so there are quite a few outliers beyond the 95th percentile.
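
The box plot shows these percentiles visually; a table makes the week-on-week comparison explicit. The following is a minimal sketch on synthetic data, assuming the same `week` and `handle_time_minutes` column names as `handled_contacts`. The log-normal parameters are illustrative assumptions, not fitted values.

```python
import numpy as np
import pandas as pd

# Synthetic, right-skewed handle times in minutes (NOT real contact data).
rng = np.random.default_rng(0)
weeks = pd.to_datetime(["2021-04-05", "2021-04-12", "2021-04-19"])
toy = pd.DataFrame(
    {
        "week": weeks.repeat(500),
        "handle_time_minutes": rng.lognormal(mean=2.4, sigma=0.6, size=1500),
    }
)

# One row per week, one column per percentile.
weekly_percentiles = (
    toy.groupby("week")["handle_time_minutes"]
    .quantile([0.25, 0.5, 0.75, 0.95])
    .unstack()
)
print(weekly_percentiles.round(1))
```

Running the same `groupby` on `handled_contacts` would give the real figures behind the plot.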

<a id='section3'></a>
## Hypothesis

Making customer data more easily accessible via the Customer Snapshots module will reduce investigation time in Dash26, and thereby reduce overall 1st Level chat handling times.

H0: there is no difference between the 1st Level handling times of the test group and the control group.

H1: the 1st Level handling time of the test group is lower than that of the control group.


<a id='section4'></a>
## Experimental Design

Specialists will be split based on language, company and tenure. 

**A** group is the control; they will see the current version of DASH26. 

**B** group is the variant; they will see the customer snapshots module in DASH26. 

The A/B testing will be implemented from **2021-06-01**.
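
The actual assignment lives in the "Final List" sheet loaded above. As an illustration only, a split balanced on language, company and tenure could be produced as in the sketch below. The roster, the column names, the tenure bands and the `assign_groups` helper are all hypothetical and are not the procedure used for this test.

```python
import numpy as np
import pandas as pd

# Hypothetical specialist roster; every value here is made up for illustration.
roster = pd.DataFrame(
    {
        "specialist_id": [f"S{i:03d}" for i in range(40)],
        "language": ["de", "en", "fr", "es"] * 10,
        "company": ["in-house"] * 20 + ["partner"] * 20,
        "tenure_months": np.arange(40) % 24,
    }
)

# Bucket tenure so it can act as a stratum alongside language and company.
roster["tenure_band"] = pd.cut(
    roster["tenure_months"], bins=[-1, 6, 12, 24], labels=["0-6", "7-12", "13+"]
).astype(str)


def assign_groups(df, strata, seed=42):
    """Shuffle specialists inside each stratum, then alternate A/B labels."""
    rng = np.random.default_rng(seed)
    out = df.copy()
    out["group"] = ""
    for _, idx in out.groupby(strata).groups.items():
        shuffled = rng.permutation(np.asarray(idx))
        # Even positions go to A, odd to B, so group sizes differ by at most
        # one within a stratum (A gets the extra specialist when the size is odd).
        labels = np.where(np.arange(len(shuffled)) % 2 == 0, "A", "B")
        out.loc[shuffled, "group"] = labels
    return out


assigned = assign_groups(roster, ["language", "company", "tenure_band"])
print(pd.crosstab([assigned["language"], assigned["company"]], assigned["group"]))
```

Randomising within strata keeps the groups comparable on the stratifying variables, which is what the pre-test checks on historical data are meant to confirm.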

In [91]:
# merge handle times with A/B groups
handled_contacts_split = pd.merge(
    handled_contacts, split_groups, how="inner", on=["specialist_id"]
)

In [92]:
handled_contacts_split["date"] = pd.to_datetime(
    handled_contacts_split["date"], format="%Y/%m/%d"
)

In [93]:
print(
    "\033[1m" + "Specialist split in test/ dbt.sf_all_contacts from 2021-06-01 onwards"
)

handled_contacts_split[(handled_contacts_split.date > "2021-05-31")].groupby(
    ["Languages", "Company_Group", "test_group"]
)["specialist_id"].nunique().unstack("test_group")

In [94]:
print("\033[1m" + "Handle Time quantiles 2 months before split test")
past_handled_contacts = handled_contacts_split[
    (handled_contacts_split.date < "2021-05-31")
]
past_handled_contacts.describe()

In [95]:
plt.figure(figsize=(20, 5))
# Take x, y and hue from the same frame so the rows stay aligned
vio_split = sns.violinplot(
    x="week",
    y="handle_time_minutes",
    hue="test_group",
    data=past_handled_contacts,
    split=True,
)
vio_split.set_title(
    "Distribution of handle time minutes split by experiment groups", fontsize=16
);

<font color='blue'>**Findings:**</font> The distribution of 1st Level chat handle times is right-skewed.

We have to acknowledge that we do have many outliers in our distribution. This may make it difficult to compare means between A and B groups under the more common statistical tests for normal distributions. 

<a id='section5'></a>
## Statistical Analysis

**Independent variable**: Dash26 version (binary). Agents are split by language (categorical), company (categorical) and tenure (continuous).

**Dependent variable**: 1st Level handle time (continuous, right-skewed with many outliers)

**Statistical Test**: Mann-Whitney U test (also known as the Wilcoxon rank-sum test), which compares the rank sums of two independent samples (often read as a comparison of medians) and works when the Y variable is continuous. It is much more robust to outliers and heavy-tailed distributions than tests that compare means.

**Type I Error** : Significance level, α = 0.05

<font color='grey'> **Type II Error** : β = 0.2 </font> 

**Power**: The Mann-Whitney test is generally somewhat less powerful than a parametric test when that test's assumptions hold, so it is more likely to miss a small true effect (a false negative). The false positive rate is still capped by α, so if we find a significant result we can be reasonably confident that it reflects reality.

<font color='grey'>(**Standard Power** : 1 – β = 1 – 0.2 = 0.8)</font>  
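
Power for a given sample size can be estimated by simulation. The sketch below is a rough illustration under stated assumptions: handle times are log-normal with made-up parameters, and the variant's times are the control's scaled down by a fixed fraction. `simulated_power` is a hypothetical helper, not part of this notebook's analysis.

```python
import numpy as np
from scipy.stats import mannwhitneyu


def simulated_power(n_per_group, shift, n_sims=300, alpha=0.05, seed=0):
    """Share of simulated experiments in which the one-sided test rejects H0.

    Assumption: both groups are log-normal; the variant is scaled by (1 - shift).
    """
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(n_sims):
        control = rng.lognormal(mean=2.4, sigma=0.6, size=n_per_group)
        variant = rng.lognormal(mean=2.4, sigma=0.6, size=n_per_group) * (1 - shift)
        # Lower-tailed test: is the variant stochastically smaller than control?
        _, p = mannwhitneyu(variant, control, alternative="less")
        rejections += int(p < alpha)
    return rejections / n_sims


# Estimated power to detect a 3% reduction at different sample sizes.
for n in (500, 2000, 8000):
    print(n, simulated_power(n, shift=0.03))
```

With `shift=0.0` the function estimates the false positive rate instead, which should sit close to α.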

**Hypothesis** - one tailed test: 

H0: the distributions of both samples are equal

H1: the distribution of the variant sample is shifted lower than the control's (lower-tailed)
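
For the lower-tailed test, the argument order in `scipy.stats.mannwhitneyu` matters: `alternative="less"` tests whether the first sample tends to be smaller than the second. A minimal sketch on synthetic log-normal data (the parameters and the 10% effect are assumptions chosen for illustration):

```python
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(1)
control = rng.lognormal(mean=2.4, sigma=0.6, size=3000)
variant = rng.lognormal(mean=2.4, sigma=0.6, size=3000) * 0.9  # 10% faster

# Variant goes first, so "less" means "variant is lower than control".
stat, p_less = mannwhitneyu(variant, control, alternative="less")
# The opposite tail, shown only to make the direction of the test visible.
_, p_greater = mannwhitneyu(variant, control, alternative="greater")

print(f"U = {stat:.0f}, p(less) = {p_less:.4g}, p(greater) = {p_greater:.4g}")
```

Swapping the two samples while keeping `alternative="less"` would test the opposite hypothesis, so the order used in the functions below (variant first) is the one that matches H1.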


### Historical Data

In [13]:
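# Weekly Mann-Whitney function for different dataframe inputs


def mann_whiteney_test(
    df, df_text, time_frequency, alternative, alpha, cumulative=False
):
    # Appends one [period, label, p-value, verdict] row per period to the
    # global list `mannwhiteney_list`, which must exist before the call.
    for i in df[time_frequency].unique():
        if cumulative: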
            control = df[(df["Group (A/B)"] == "A") & (df[time_frequency] <= i)][
                "handle_time_minutes"
            ]
            variant = df[(df["Group (A/B)"] == "B") & (df[time_frequency] <= i)][
                "handle_time_minutes"
            ]
        else:
            control = df[(df["Group (A/B)"] == "A") & (df[time_frequency] == i)][
                "handle_time_minutes"
            ]
            variant = df[(df["Group (A/B)"] == "B") & (df[time_frequency] == i)][
                "handle_time_minutes"
            ]

        stat, p = mannwhitneyu(variant, control, alternative=alternative)

        # interpret
        if p > alpha:
            mannwhiteney_list.append(
                [
                    i,
                    df_text,
                    p,
                    "Samples have similar distributions (fail to reject H0)",
                ]
            )
            # print('Same distribution (fail to reject H0)')
        else:
            mannwhiteney_list.append(
                [
                    i,
                    df_text,
                    p,
                    f"Variant time distribution is significantly different ({alternative}) (reject H0)",
                ]
            )
            # print('Different distribution (reject H0)')

In [16]:
specific_tags = past_handled_contacts[
    past_handled_contacts["customer_snapshot_specific_tag"] == True
]

mannwhiteney_list = []

mann_whiteney_test(
    past_handled_contacts, "all tags", "week", "two-sided", 0.025, cumulative=True
)
mann_whiteney_test(
    specific_tags, "specific tags", "week", "two-sided", 0.025, cumulative=True
)

In [17]:
pd.set_option("display.max_colwidth", None)
pd.set_option("display.max_rows", None)

print(
    "\033[1m"
    + "Cumulative weekly Mann Whitney U test on historical handle time quantiles 2 months before split test"
)

mannwhiteney = pd.DataFrame(
    mannwhiteney_list, columns=["week", "test", "p_value", "result"]
)
mannwhiteney.groupby(["week", "test"]).min().unstack("test")

### Stratify to control for differences between groups

In [219]:
# Get group splits across all contacts

stratified_df = (
    handled_contacts["tnc_country_group"]
    + " - "
    + handled_contacts["company_general"]
    + " - "
    + handled_contacts["chat_origin"]
)

In [220]:
stratified_df = (stratified_df.value_counts(normalize=True)).to_frame().reset_index()

In [221]:
handled_contacts_split["stratify"] = (
    handled_contacts_split["tnc_country_group"]
    + " - "
    + handled_contacts_split["company_general"]
    + " - "
    + handled_contacts_split["chat_origin"]  # same frame, so rows stay aligned after the merge
)

control = handled_contacts_split[(handled_contacts_split["Group (A/B)"] == "A")]
variant = handled_contacts_split[(handled_contacts_split["Group (A/B)"] == "B")]

In [186]:
def stratify_data(
    df_data,
    stratify_column_name,
    stratify_values,
    stratify_proportions,
    random_state=None,
):
    """Stratifies data according to the values and proportions passed in
    Args:
        df_data (DataFrame): source data
        stratify_column_name (str): The name of the single column in the dataframe that holds the data values that will be used to stratify the data
        stratify_values (list of str): A list of all of the potential values for stratifying e.g. "Male, Graduate", "Male, Undergraduate", "Female, Graduate", "Female, Undergraduate"
        stratify_proportions (list of float): A list of numbers representing the desired proportions for stratifying e.g. 0.4, 0.4, 0.1, 0.1. The list values must add up to 1 and must match the number of values in stratify_values
        random_state (int, optional): sets the random_state. Defaults to None.
    Returns:
        DataFrame: a new dataframe based on df_data that has the new proportions representing the desired strategy for stratifying
    """
    df_stratified = pd.DataFrame(
        columns=df_data.columns
    )  # Create an empty DataFrame with column names matching df_data

    pos = -1
    for i in range(
        len(stratify_values)
    ):  # iterate over the stratify values (e.g. "Male, Undergraduate" etc.)
        pos += 1
        if pos == len(stratify_values) - 1:
            ratio_len = len(df_data) - len(
                df_stratified
            )  # if this is the final iteration make sure we calculate the number of values for the last set such that the return data has the same number of rows as the source data
        else:
            ratio_len = int(
                len(df_data) * stratify_proportions[i]
            )  # Calculate the number of rows to match the desired proportion

        df_filtered = df_data[
            df_data[stratify_column_name] == stratify_values[i]
        ]  # Filter the source data based on the currently selected stratify value

        try:
            df_temp = df_filtered.sample(
                replace=False, n=ratio_len, random_state=random_state
            )  # Sample the filtered data using the calculated ratio
        except ValueError:
            df_temp = df_filtered  # the stratum has fewer rows than requested, so keep the whole (small or empty) filtered dataframe

        df_stratified = pd.concat(
            [df_stratified, df_temp]
        )  # Add the sampled / stratified datasets together to produce the final result

    return df_stratified  # Return the stratified, re-sampled data
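
To make the effect of stratified resampling concrete, here is a small sketch on toy data. It is a stricter variant than `stratify_data` above: it shrinks the total so that every stratum can supply exactly its target share, rather than falling back to all available rows. The `downsample_to_mix` helper and the toy strata are hypothetical.

```python
import pandas as pd

# Toy group in which stratum "x" is over-represented relative to the target mix.
toy_control = pd.DataFrame(
    {
        "stratify": ["x"] * 80 + ["y"] * 20,
        "handle_time_minutes": range(100),
    }
)
target_mix = pd.Series({"x": 0.5, "y": 0.5})


def downsample_to_mix(df, column, mix, random_state=None):
    """Down-sample without replacement so strata shares match `mix`."""
    counts = df[column].value_counts()
    # Largest total size at which every stratum can still supply its share.
    total = int((counts / mix).min())
    parts = [
        df[df[column] == value].sample(n=int(total * share), random_state=random_state)
        for value, share in mix.items()
    ]
    return pd.concat(parts)


balanced = downsample_to_mix(toy_control, "stratify", target_mix, random_state=1)
print(balanced["stratify"].value_counts(normalize=True))
```

The trade-off is sample size: exact proportions discard more rows, whereas the fallback in `stratify_data` keeps more data but only approximates the target mix for small strata.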

In [222]:
control_stratified = stratify_data(
    control,
    "stratify",
    stratified_df.iloc[:, 0],
    stratified_df.iloc[:, 1],
    random_state=1,
)
variant_stratified = stratify_data(
    variant,
    "stratify",
    stratified_df.iloc[:, 0],
    stratified_df.iloc[:, 1],
    random_state=2,
)

In [223]:
# DataFrame.append was removed in pandas 2.0; pd.concat is the equivalent
handled_contacts_strat = pd.concat([control_stratified, variant_stratified])

In [224]:
print("Total removed rows:")
handled_contacts_split.shape[0] - handled_contacts_strat.shape[0]

In [280]:
past_handled_contacts_strat = handled_contacts_strat[
    (handled_contacts_strat.date < "2021-05-31")
]
specific_tags_strat = past_handled_contacts_strat[
    past_handled_contacts_strat["customer_snapshot_specific_tag"] == True
]
other_tags_strat = past_handled_contacts_strat[
    past_handled_contacts_strat["customer_snapshot_specific_tag"] == False
]

In [226]:
mannwhiteney_list = []

mann_whiteney_test(
    past_handled_contacts_strat, "all tags", "week", "two-sided", 0.025, cumulative=True
)
mann_whiteney_test(
    specific_tags_strat, "specific tags", "week", "two-sided", 0.025, cumulative=True
)

In [227]:
pd.set_option("display.max_colwidth", None)
pd.set_option("display.max_rows", None)

print(
    "\033[1m"
    + "Cumulative weekly Mann Whitney U test on historical handle time quantiles 2 months before split test"
)

mannwhiteney = pd.DataFrame(
    mannwhiteney_list, columns=["week", "test", "p_value", "result"]
)
mannwhiteney.groupby(["week", "test"]).min().unstack("test")

<a id='section6'></a>
## Final Results

There was an incident in DASH26 on the first day of release, so we will exclude 2021-06-01 from results.

<a id='section6a'></a>
### Overall results

In [229]:
final_handled_contacts = handled_contacts_strat[
    handled_contacts_strat.date > "2021-06-01"
]
final_specific_tags = final_handled_contacts[
    final_handled_contacts["customer_snapshot_specific_tag"] == True
]
final_other_tags = final_handled_contacts[
    final_handled_contacts["customer_snapshot_specific_tag"] == False
]

In [230]:
# Calculate number of obs per group & median to position labels


def box_plot_with_n_label(df, order, title):
    # Draws on the global `ax` array created by plt.subplots in the next cell
    groups = ["control", "variant"]
    grouped = df.groupby("test_group")["handle_time_minutes"]
    # Reindex both statistics to the plotting order so each label matches its box
    medians = grouped.median().reindex(groups)
    nobs = grouped.size().reindex(groups)

    sns.boxplot(
        x="test_group",
        y="handle_time_minutes",
        data=df,
        ax=ax[order],
        order=groups,
    ).set_title(title, fontsize=16)

    # Annotate each box with its sample size, just above the median line
    for pos, group in enumerate(groups):
        ax[order].text(
            pos,
            medians[group] + 2,
            f"n: {nobs[group]}",
            horizontalalignment="center",
            size="small",
            color="w",
            weight="semibold",
        )

In [231]:
fig, ax = plt.subplots(1, 2, figsize=(20, 7))

box_plot_with_n_label(final_handled_contacts, 0, "Handle Time(minutes) percentiles")
box_plot_with_n_label(
    final_specific_tags, 1, "Specific Tags Handle Time(minutes) percentiles"
)

In [232]:
df1 = (
    final_handled_contacts[["handle_time_minutes", "Group (A/B)"]]
    .groupby("Group (A/B)")
    .describe()
    .unstack(1)
    .reset_index()
    .pivot(index="Group (A/B)", values=0, columns="level_1")
)
df2 = (
    final_specific_tags[["handle_time_minutes", "Group (A/B)"]]
    .groupby("Group (A/B)")
    .describe()
    .unstack(1)
    .reset_index()
    .pivot(index="Group (A/B)", values=0, columns="level_1")
)

df1_styler = df1.style.set_table_attributes("style='display:inline'").set_caption(
    "All tags"
)
df2_styler = df2.style.set_table_attributes("style='display:inline'").set_caption(
    "Specific tags"
)

display_html(df1_styler._repr_html_() + df2_styler._repr_html_(), raw=True)

<font color='blue'>**Findings:**</font> Visually, there is not much difference in handle time between control and variant, either overall or for our specific tags group. In the final figures, the variant's median in the specific tags group is 10-20 seconds lower than the control's.

In [233]:
mannwhiteney_list = []
mann_whiteney_test(
    final_handled_contacts, "all tags", "date", "less", 0.05, cumulative=True
)
mann_whiteney_test(
    final_specific_tags, "specific tags", "date", "less", 0.05, cumulative=True
)
mannwhiteney_cumulative = pd.DataFrame(
    mannwhiteney_list, columns=["date", "test", "p_value", "result"]
)

In [234]:
p_val_plot = (
    mannwhiteney_cumulative.groupby(["date", "test"])["p_value"].min().unstack("test")
)
p_val_plot["alpha; reject null hypothesis area"] = 0.05

In [235]:
p_val_plot.plot(figsize=(20, 5)).set_title(
    "Cumulative Mann Whiteney Test - p value", fontsize=16
);

In [236]:
mannwhiteney_cumulative[
    mannwhiteney_cumulative.date == mannwhiteney_cumulative["date"].max()
]

<font color='blue'>**Findings:**</font> The decrease for the specific tags group is small (~2.24% in the median and ~3.76% in the 25th percentile), but it is significant under the Mann-Whitney U test. There is no significant evidence that the Snapshots module changes the overall handle time distribution.
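
The percentages above are relative changes in a quantile between the two groups. The sketch below shows one way to compute them in both percent and seconds; `quantile_change` is a hypothetical helper and the numbers are toy values, not the experiment's data.

```python
import pandas as pd


def quantile_change(df, group_col, value_col, quantiles=(0.25, 0.5, 0.75),
                    control="A", variant="B"):
    """Change of each quantile, variant vs control, in seconds and in percent."""
    # Rows: quantile level; columns: group label.
    q = df.groupby(group_col)[value_col].quantile(list(quantiles)).unstack(0)
    out = pd.DataFrame({"control_min": q[control], "variant_min": q[variant]})
    out["change_seconds"] = (out["variant_min"] - out["control_min"]) * 60
    out["change_pct"] = (out["variant_min"] / out["control_min"] - 1) * 100
    return out


# Toy handle times in minutes (made up for illustration).
toy = pd.DataFrame(
    {
        "Group (A/B)": ["A"] * 5 + ["B"] * 5,
        "handle_time_minutes": [8, 10, 12, 14, 16, 7, 9, 11.7, 13, 15],
    }
)
result = quantile_change(toy, "Group (A/B)", "handle_time_minutes")
print(result.round(2))
```

As a sanity check on the reported figures: if the median is in the 10-12 minute range seen historically, a 2.24% reduction corresponds to roughly 13-16 seconds, which is consistent with the 10-20 second decrease noted earlier.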

<a id='section6b'></a>
### Tag split
#### Specific Tags

We assumed that the Customer Snapshots module is more relevant to certain CS tags:
- card_activation/pin
- card_delivery
- confirmations/statements
- ct/missing_ct
- direct_debit
- dt/standing_order
- instant_ct_/missing_ct
- instant_dt
- kyc_acceptance
- login
- overdraft
- savings
- sign_up

In [250]:
# Cumulative daily Mann Whitney function for variable split
def variable_mann_whiteney_test(df, variable):
    for date in df["date"].unique():
        for i in df[variable].unique():
            control = df[
                (df["Group (A/B)"] == "A") & (df["date"] <= date) & (df[variable] == i)
            ]["handle_time_minutes"]
            variant = df[
                (df["Group (A/B)"] == "B") & (df["date"] <= date) & (df[variable] == i)
            ]["handle_time_minutes"]

            try:
                stat, p = mannwhitneyu(variant, control, alternative="less")

                # interpret
                alpha = 0.05

                if p > alpha:
                    cumulative_tag_list.append(
                        [
                            date,
                            len(control.index),
                            len(variant.index),
                            i,
                            p,
                            "Samples have similar distributions (fail to reject H0)",
                        ]
                    )
                else:
                    cumulative_tag_list.append(
                        [
                            date,
                            len(control.index),
                            len(variant.index),
                            i,
                            p,
                            "Variant time distribution is significantly lower than the control (reject H0)",
                        ]
                    )
            except Exception:
                # Skip subgroups where the test cannot be computed (e.g. an empty sample)
                pass

In [283]:
cumulative_tag_list = []
variable_mann_whiteney_test(specific_tags_strat, "tag_name")
mannwhiteney_cumulative_tags = pd.DataFrame(
    cumulative_tag_list,
    columns=["date", "control size", "variant size", "tag_name", "p_value", "result"],
)

In [284]:
print("\033[1m" + "Historicals")

mannwhiteney_cumulative_tags[
    mannwhiteney_cumulative_tags.date == mannwhiteney_cumulative_tags["date"].max()
].sort_values(by=["control size"], ascending=False)

In [286]:
cumulative_tag_list = []
variable_mann_whiteney_test(final_specific_tags, "tag_name")
mannwhiteney_cumulative_tags = pd.DataFrame(
    cumulative_tag_list,
    columns=["date", "control size", "variant size", "tag_name", "p_value", "result"],
)

In [287]:
print("\033[1m" + "Experiment results")

mannwhiteney_cumulative_tags[
    mannwhiteney_cumulative_tags.date == mannwhiteney_cumulative_tags["date"].max()
].sort_values(by=["control size"], ascending=False)

<font color='blue'>**Findings:**</font> Looking at individual tags in the specific tags group, there is not sufficient evidence of a decrease in handle time. Results are inconclusive. The tag sign_up gave the only significant result, but its sample size is small and we have to question whether the snapshot module really benefits this tag.

Note: we would have had to run the test for much longer to support further breakdowns. The sample size is already small for some of the tag names at the end of the list.

<a id='section6c'></a>
### Chat origin split

Assumption: Contacts from support center may benefit from the customer snapshots since generic customer information may be required for authentication. 

In [288]:
cumulative_tag_list = []
variable_mann_whiteney_test(past_handled_contacts_strat, "chat_origin")
mannwhiteney_cumulative_tags = pd.DataFrame(
    cumulative_tag_list,
    columns=[
        "date",
        "control size",
        "variant size",
        "chat_origin",
        "p_value",
        "result",
    ],
)

In [289]:
print("\033[1m" + "Historicals")

mannwhiteney_cumulative_tags[
    mannwhiteney_cumulative_tags.date == mannwhiteney_cumulative_tags["date"].max()
].sort_values(by=["control size"], ascending=False)

In [290]:
cumulative_tag_list = []
variable_mann_whiteney_test(final_handled_contacts, "chat_origin")
mannwhiteney_cumulative_tags = pd.DataFrame(
    cumulative_tag_list,
    columns=[
        "date",
        "control size",
        "variant size",
        "chat_origin",
        "p_value",
        "result",
    ],
)

In [291]:
print("\033[1m" + "Experiment results")

mannwhiteney_cumulative_tags[
    mannwhiteney_cumulative_tags.date == mannwhiteney_cumulative_tags["date"].max()
].sort_values(by=["chat_origin"])

<font color='blue'>**Findings:**</font> Looking at chat origin groups, there is significant evidence of a decrease in handle time specifically for Support Center contacts.

<a id='section6d'></a>
### Tenure split

Assumption: newer agents don't yet have a set way of working, so the impact of the snapshot module may be more visible for them.

In [268]:
cumulative_tag_list = []
variable_mann_whiteney_test(past_handled_contacts_strat, "tenure_bins")
mannwhiteney_cumulative_tags = pd.DataFrame(
    cumulative_tag_list,
    columns=[
        "date",
        "control size",
        "variant size",
        "tenure_bins",
        "p_value",
        "result",
    ],
)

In [269]:
print("\033[1m" + "Historicals")

mannwhiteney_cumulative_tags[
    mannwhiteney_cumulative_tags.date == mannwhiteney_cumulative_tags["date"].max()
].sort_values(by=["tenure_bins"])

In [270]:
cumulative_tag_list = []
variable_mann_whiteney_test(final_handled_contacts, "tenure_bins")
mannwhiteney_cumulative_tags = pd.DataFrame(
    cumulative_tag_list,
    columns=[
        "date",
        "control size",
        "variant size",
        "tenure_bins",
        "p_value",
        "result",
    ],
)

In [271]:
print("\033[1m" + "Experiment results")

mannwhiteney_cumulative_tags[
    mannwhiteney_cumulative_tags.date == mannwhiteney_cumulative_tags["date"].max()
].sort_values(by=["tenure_bins"])

<font color='blue'>**Findings:**</font> Looking at tenure groups, there doesn't seem to be sufficient evidence to prove a decrease in handle time distribution. Results are inconclusive.