In [1]:
'''
Welcome to the Carvana Analytics experiment evaluation assignment!

In this exercise you will be presented with data from a fictional A/B test and asked to evaluate and interpret the results.

Carvana.com is testing a new feature and is running an A/B test to quantify the impact it has on different types of users. Before their first search, a user is bucketed into one of two treatment groups:
  Test - the user is exposed to the new feature
  Control - the user is not exposed to the new feature

All bucketed users can be found in the "users" dataframe/table. This data contains:
  user_id
  region
  treatment

All searches done by users can be found in the "searches" dataframe/table. This data contains:
  user_id
  event_date_time
  device_type
  event_id

As users progress through the website, their searches may return vehicles they are interested in. Clicking on a vehicle takes them to a vehicle detail page, or VDP. This is a strong sign of engagement.

All VDPs done by users can be found in the "vdps" dataframe/table. This data contains:
  user_id
  event_date_time
  device_type
  event_id

A user can purchase a vehicle directly from a VDP. This constitutes a sale and a conversion for the user.

All sales completed by users can be found in the "sales" dataframe/table. This data contains:
  user_id
  event_date_time
  device_type
  event_id

Use the four datasets described to accomplish the following tasks:

1) Evaluate the effect of the new feature on engagement (searches and VDPs) and conversion using statistical significance where applicable
2) Summarize and highlight insights (or issues) in user behavior across various segments
3) Provide a recommendation on whether or not to permanently deploy the feature to all users, some users, or no users

Clone or copy this notebook and run this cell to begin. Once you do so you will be able to work with the data in python and/or write sql queries against the data (see example cells below)

When submitting the assignment please provide a link to your notebook in addition to your typed responses to the items above in .pdf format.

The estimated time for this exercise is 3-4 hours. Please submit your answers to your recruiting coordinator. Good luck!

''' 

################################
#### do not alter this code ####
################################

import pandas as pd

! pip install ipython-sql

users_path="https://s3-us-west-2.amazonaws.com/carvana-analytics-assignment/users.csv"
searches_path="https://s3-us-west-2.amazonaws.com/carvana-analytics-assignment/searches.csv"
vdps_path="https://s3-us-west-2.amazonaws.com/carvana-analytics-assignment/vdps.csv"
sales_path="https://s3-us-west-2.amazonaws.com/carvana-analytics-assignment/sales.csv"

users=pd.read_csv(users_path)
searches=pd.read_csv(searches_path)
vdps=pd.read_csv(vdps_path)
sales=pd.read_csv(sales_path)

%load_ext sql
%sql sqlite://

%sql persist users
%sql persist searches
%sql persist vdps
%sql persist sales

################################
#### do not alter this code ####
################################

 * sqlite://
(sqlite3.OperationalError) near "persist": syntax error
[SQL: persist users]
(Background on this error at: https://sqlalche.me/e/14/e3q8)
 * sqlite://
(sqlite3.OperationalError) near "persist": syntax error
[SQL: persist searches]
(Background on this error at: https://sqlalche.me/e/14/e3q8)
 * sqlite://
(sqlite3.OperationalError) near "persist": syntax error
[SQL: persist vdps]
(Background on this error at: https://sqlalche.me/e/14/e3q8)
 * sqlite://
(sqlite3.OperationalError) near "persist": syntax error
[SQL: persist sales]
(Background on this error at: https://sqlalche.me/e/14/e3q8)


In [2]:
# imports that I will need
# ttest_ind is the independent-samples t-test for comparing means
# datetime is for parsing the datetime strings in the event tables
from scipy.stats import ttest_ind, chi2_contingency, chi2
from datetime import datetime as dt

In [3]:
# let's group the events by user (dict: user_id -> DataFrame of that user's events)
searches_by_user = {k: v for k, v in searches.groupby('user_id')}
vdps_by_user = {k: v for k, v in vdps.groupby('user_id')}
sales_by_user = {k: v for k, v in sales.groupby('user_id')}


In [4]:
# This function is the main data cleaning and rearranging step

def attribute_actions_to_user(users_df, searches_by_user, vdps_by_user, sales_by_user):
    """
    Attributes the events (i.e., searches, vdps, sales) to the appropriate user in the study.
    Computes KPIs for each user (duration, count_searches, count_vdps, sale).
    each user entry takes the form of a dict with the keys, 'user_id', 'treatment', 'region',
    'device_type', 'duration', 'count_searches', 'count_vdps', 'sale'.
    
    
    Parameters
    ----------
    users_df : pd.DataFrame object
        Dataframe of the users in the study.
        Must contain the columns 'user_id', 'treatment', 'region'

    searches_by_user : dict
        Maps each 'user_id' to a pd.DataFrame of that user's searches
        (built from searches.groupby('user_id')).
        Each DataFrame must contain the columns 'user_id', 'device_type', 'event_date_time'

    vdps_by_user : dict
        Maps each 'user_id' to a pd.DataFrame of that user's VDP views
        (built from vdps.groupby('user_id')).
        Each DataFrame must contain the columns 'user_id', 'device_type', 'event_date_time'

    sales_by_user : dict
        Maps each 'user_id' to a pd.DataFrame of that user's sales
        (built from sales.groupby('user_id')).
        Each DataFrame must contain the columns 'user_id', 'device_type', 'event_date_time'

    
    Returns
    -------
        summary_df : pd.DataFrame object
            summary KPIs/statistics for each user. Column descriptions:
            'user_id': int 
                unique identifier for a user in the study
            'treatment': str
                'Test' or 'Control'
            'region': str 
                region in ['New England', 'Southwest', 'Southeast', 'Midwest', 'Pacific Northwest']
            'device_type': str
                'Desktop' or 'Mobile' 
            'duration': float
                number of minutes of customer journey
            'count_searches': int
                number of searches performed
            'count_vdps': int
                number of vdps pages visited
            'sale': int
                number of sales completed
    Raises
    ------
        None
    """
    summary_dicts = []
    def get_device(searches_by_user, vdps_by_user, sales_by_user, user_id):
        """
        gets the device type from the first listed event of a user, checking
        searches first, then VDP views, then sales.
        returns None if the user has no events.
        """        
        if user_id in searches_by_user.keys():
            device_name = searches_by_user[user_id]['device_type'].iloc[0]
        elif user_id in vdps_by_user.keys():
            device_name = vdps_by_user[user_id]['device_type'].iloc[0]
        elif user_id in sales_by_user.keys():
            device_name = sales_by_user[user_id]['device_type'].iloc[0]
        else:
            device_name = None
        return device_name
    
    def get_duration(searches_by_user, vdps_by_user, sales_by_user, user_id):
        """
        gets the amount of time in minutes that a user spent on their
        customer journey.
        """
        _start_se, _start_vd, _start_sa = None, None, None
        _end_se, _end_vd, _end_sa = None, None, None
        
        if user_id in searches_by_user.keys():
            _start_se = min(searches_by_user[user_id]['event_date_time'])
            _end_se = max(searches_by_user[user_id]['event_date_time'])
        
        if user_id in vdps_by_user.keys():
            _start_vd = min(vdps_by_user[user_id]['event_date_time'])
            _end_vd = max(vdps_by_user[user_id]['event_date_time'])
        
        if user_id in sales_by_user.keys():
            _start_sa = min(sales_by_user[user_id]['event_date_time'])
            _end_sa = max(sales_by_user[user_id]['event_date_time'])

        
        # a user with no events at all would make min()/max() fail on an empty sequence
        j_start = min(_start for _start in [_start_se, _start_vd, _start_sa] if _start is not None)
        
        j_end = max(_end for _end in [_end_se, _end_vd, _end_sa] if _end is not None)

        # timestamps containing '/' use '%m/%d/%y %H:%M'; all others use '%Y-%m-%d %H:%M:%S'
        j_start_dt = dt.strptime(j_start, '%m/%d/%y %H:%M') if '/' in str(j_start) else dt.strptime(j_start, '%Y-%m-%d %H:%M:%S')
        j_end_dt = dt.strptime(j_end, '%m/%d/%y %H:%M') if '/' in str(j_end) else dt.strptime(j_end, '%Y-%m-%d %H:%M:%S')
            
        duration = j_end_dt - j_start_dt
        mins = duration.total_seconds()/60
        return mins

    def get_action_counts(searches_by_user, vdps_by_user, sales_by_user, user_id):
        """
        gets a count of the three types of actions(searches, vdp views, sales) that a user performed. 
        outputs the counts into 3 item list.
        """
        _count_searches, _count_vdps, _count_sales = 0, 0, 0
        if user_id in searches_by_user.keys():
            _count_searches = len(searches_by_user[user_id])
            
        if user_id in vdps_by_user.keys():
            _count_vdps = len(vdps_by_user[user_id])

        if user_id in sales_by_user.keys():
            _count_sales = len(sales_by_user[user_id])

        counts = [_count_searches, _count_vdps, _count_sales]
        return counts

    for row in users_df.itertuples():
        device = get_device(searches_by_user, vdps_by_user, sales_by_user, row.user_id)
        mins = get_duration(searches_by_user, vdps_by_user, sales_by_user, row.user_id)
        actioncounts = get_action_counts(searches_by_user, vdps_by_user, sales_by_user, row.user_id)
        summary_dicts.append(
                {
                    'user_id':row.user_id,
                    'treatment': row.treatment,
                    'region':row.region,
                    'device_type': device, 
                    'duration':mins,
                    'count_searches':actioncounts[0],
                    'count_vdps': actioncounts[1],
                    'sale':actioncounts[2]
                }
            )
    summary_df = pd.DataFrame(summary_dicts)
    return summary_df

In [5]:
# let's see what the user-level base analytics show
user_summary_df = attribute_actions_to_user(users, searches_by_user, vdps_by_user, sales_by_user)
print(user_summary_df.describe())
print(user_summary_df.head())

            user_id      duration  count_searches    count_vdps          sale
count  13000.000000  13000.000000    13000.000000  13000.000000  13000.000000
mean    7499.500000     16.621123        5.477846      1.364462      0.103077
std     3752.921085     35.156267        9.040641      8.854193      0.304071
min     1000.000000      0.000000        1.000000      0.000000      0.000000
25%     4249.750000      3.100000        4.000000      0.000000      0.000000
50%     7499.500000      4.800000        5.000000      1.000000      0.000000
75%    10749.250000      7.216667        7.000000      2.000000      0.000000
max    13999.000000    131.850000     1000.000000   1000.000000      1.000000
   user_id treatment       region device_type   duration  count_searches  \
0     1000      Test    Southwest      Mobile   8.833333              10   
1     1001      Test  New England     Desktop   4.366667               5   
2     1002      Test    Southeast      Mobile   3.550000              

In [24]:
def get_user_level_t_stats(user_data):
    """
    Generates a summary table of independent-samples Student's t-tests for the four
    KPIs being tested in the AB study. The user level KPIs are:
        'duration': float
            number of minutes of customer journey
        'count_searches': int
            number of searches performed
        'count_vdps': int
            number of vdps pages visited
        'sale': int
            number of sales completed
    
    Parameters
    ----------
    user_data : pd.DataFrame object
        Summary user data generated from the function 'attribute_actions_to_user' earlier in this
        notebook. Must contain the columns, 'duration', 'count_searches', 'count_vdps', 'sale'. 
    
    Returns
    -------
    tstat_df : pd.DataFrame object
        Dataframe object containing the summary Student's t-test info for each KPI. 
        columns include:
            'KPI': str
                the name of the KPI being tested
            'meanTest': float
                the mean KPI for the Test group
            'meanCtrl': float
                the mean KPI for the Control group
            'Tstat': float
                the t-test statistic calculated from the data
            'pvalue': float
                the corresponding p-value for the given test-statistic and degrees of freedom
            'RejH0@0.05': bool
                True if the null hypothesis can be rejected at alpha=0.05.
                False if the null hypothesis cannot be rejected at alpha=0.05.
    Raises
    ------
        None
    
    Notes
    -----
        Depends on scipy.stats.ttest_ind 
    """
    tstat_dicts = []
    by_treatment = list(user_data.groupby('treatment'))
    KPIs = ['duration', 'count_searches', 'count_vdps', 'sale']
    for i in KPIs:
        ttestResultStat, ttestResultpval = ttest_ind(by_treatment[1][1][i], by_treatment[0][1][i])
        _entry = {
        'KPI': i,
        'meanTest': by_treatment[1][1][i].mean(),
        'meanCtrl': by_treatment[0][1][i].mean(),
        'Tstat': ttestResultStat,
        'pvalue': ttestResultpval,
        'RejH0@0.05': True if ttestResultpval < 0.05 else False
        }
        tstat_dicts.append(_entry)
    tstat_df = pd.DataFrame(tstat_dicts)
    print(tstat_df)
    return tstat_df

In [25]:
# Chisquare test
def chi_square_test(user_data,attribute):
    """
    Function for performing Chi-Squared test on sales data because the sales data
    is a categorical value (0, 1). 
    I borrow much of the original Chi-Squared implementation from
    https://machinelearningmastery.com/chi-squared-test-for-machine-learning/
    
    Parameters
    ----------
    user_data : pd.Dataframe object
        Contains the user_data from the function 'attribute_actions_to_user'.
        Must contain the columns 'sale', 'region', 'device_type', and 'treatment'.
    
    attribute : str
        attribute to be testing sales against. Must be 'region', 'device_type', or 'treatment'.
    
    Returns 
    -------
        None
    
    Notes
    -----
        Contains print statements for 
            1. contingency table
            2. Chi-Squared critical value calculation
            3. conditional statement on whether to reject H0 @ alpha = 0.05 for critical value
            4. Chi-Squared p-value calculation
            5. conditional statement on whether to reject H0 @ alpha = 0.05 for p-value
    """
    # build a contingency table of sales against the chosen attribute
    ctable = pd.crosstab(index=user_data[attribute], columns=user_data['sale'], margins=True)
    print(ctable)
    if attribute == 'region':
        regions = ['New England', 'Southwest', 'Southeast', 'Midwest', 'Pacific Northwest']
        ctable_O = [[ctable.at[region, 0] , ctable.at[region, 1]] for region in regions]
    elif attribute == 'device_type':
        ctable_O = [[ctable.at[device, 0] , ctable.at[device, 1]] for device in ['Desktop', 'Mobile']]
    elif attribute == 'treatment':
        ctable_O = [[ctable.at[treat, 0] , ctable.at[treat, 1]] for treat in ['Control', 'Test']]
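    else:
        # without this, ctable_O would be undefined and the next line would raise a NameError
        raise ValueError("unknown attribute: {!r}; expected 'region', 'device_type', or 'treatment'".format(attribute))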
    
    stat, p, dof, expected = chi2_contingency(ctable_O)
    # print('dof=%d' % dof)
    # print(expected)
    # interpret test-statistic
    prob = 0.95
    critical = chi2.ppf(prob, dof)
    print('probability=%.3f, critical=%.3f, stat=%.3f' % (prob, critical, stat))
    if abs(stat) >= critical:
        print('Dependent (reject H0)')
    else:
        print('Independent (fail to reject H0)')
    # interpret p-value
    alpha = 1.0 - prob
    print('significance=%.3f, p=%.3f' % (alpha, p))
    if p <= alpha:
        print('Dependent (reject H0)')
    else:
        print('Independent (fail to reject H0)')
    


In [26]:
### no grouping 
get_user_level_t_stats(user_summary_df)
chi_square_test(user_summary_df, 'treatment')


              KPI   meanTest   meanCtrl     Tstat    pvalue  RejH0@0.05
0        duration  16.679894  16.563516  0.188700  0.850331       False
1  count_searches   5.620047   5.338462  1.775687  0.075808       False
2      count_vdps   1.287024   1.440366 -0.987257  0.323535       False
3            sale   0.101632   0.104494 -0.536508  0.591617       False
sale           0     1    All
treatment                    
Control     5879   686   6565
Test        5781   654   6435
All        11660  1340  13000
probability=0.950, critical=3.841, stat=0.258
Independent (fail to reject H0)
significance=0.050, p=0.612
Independent (fail to reject H0)


In [32]:
### grouping by device
for item, group in user_summary_df.groupby('device_type'):
    print(item)
    get_user_level_t_stats(group)
    # chi-squared test on the sales contingency table for this device type
    chi_square_test(group, 'treatment')

Desktop
              KPI   meanTest   meanCtrl     Tstat    pvalue  RejH0@0.05
0        duration  17.311658  17.162309  0.179144  0.857830       False
1  count_searches   5.424602   5.587370 -0.596829  0.550640       False
2      count_vdps   1.382789   1.589243 -0.764724  0.444460       False
3            sale   0.108713   0.108643  0.009694  0.992266       False
sale          0    1   All
treatment                 
Control    3331  406  3737
Test       3304  403  3707
All        6635  809  7444
probability=0.950, critical=3.841, stat=0.000
Independent (fail to reject H0)
significance=0.050, p=1.000
Independent (fail to reject H0)
Mobile
              KPI   meanTest   meanCtrl      Tstat        pvalue  RejH0@0.05
0        duration  15.821408  15.772254   0.053815  9.570847e-01       False
1  count_searches   5.885630   5.009547  13.761598  2.138779e-42        True
2      count_vdps   1.156891   1.243635  -2.493909  1.266336e-02        True
3            sale   0.092009   0.099010  -0.

In [28]:
#grouping by region
for item, group in user_summary_df.groupby('region'):
    print(item)
    get_user_level_t_stats(group)
    # chi-squared test on the sales contingency table for this region
    chi_square_test(group, 'treatment')

Midwest
              KPI   meanTest   meanCtrl      Tstat        pvalue  RejH0@0.05
0        duration  12.037316  18.840856  -5.231506  1.819126e-07        True
1  count_searches   5.807571   5.773913   0.357704  7.205947e-01       False
2      count_vdps   0.820978   1.429249 -11.643332  1.469912e-30        True
3            sale   0.059148   0.120158  -5.403799  7.133412e-08        True
sale          0    1   All
treatment                 
Control    1113  152  1265
Test       1193   75  1268
All        2306  227  2533
probability=0.950, critical=3.841, stat=28.148
Dependent (reject H0)
significance=0.050, p=0.000
Dependent (reject H0)
New England
              KPI   meanTest   meanCtrl     Tstat    pvalue  RejH0@0.05
0        duration  15.089132  12.324923  2.152508  0.031450        True
1  count_searches   4.537441   4.198157  4.139686  0.000036        True
2      count_vdps   1.164587   1.013057  3.379553  0.000737        True
3            sale   0.096724   0.075269  1.946521  0.

In [29]:
# grouping by device then region
for device, data in user_summary_df.groupby('device_type'):
    for region, device_data in data.groupby('region'):
        print(device, region)
        try:
            tstat_df = get_user_level_t_stats(device_data)
            # chi-squared test on the sales contingency table for this device/region pair
            chi_square_test(device_data, 'treatment')
        except Exception as e:
            # this only happens because Mobile Midwest users
            # were not shown the feature at all
            print(e)

Desktop Midwest
              KPI   meanTest   meanCtrl     Tstat    pvalue  RejH0@0.05
0        duration  17.269652  18.508367 -0.648065  0.517046       False
1  count_searches   5.675599   5.562416  0.919699  0.357883       False
2      count_vdps   1.468265   1.318121  2.160228  0.030918        True
3            sale   0.105783   0.118121 -0.744999  0.456393       False
sale          0    1   All
treatment                 
Control     657   88   745
Test        634   75   709
All        1291  163  1454
probability=0.950, critical=3.841, stat=0.439
Independent (fail to reject H0)
significance=0.050, p=0.508
Independent (fail to reject H0)
Desktop New England
              KPI   meanTest   meanCtrl     Tstat    pvalue  RejH0@0.05
0        duration  14.692739  13.607519  0.629399  0.529184       False
1  count_searches   4.322497   4.317744  0.044516  0.964499       False
2      count_vdps   1.106632   1.024759  1.446428  0.148267       False
3            sale   0.094928   0.085282  0.

In [33]:
import statsmodels.api as sm
from statsmodels.formula.api import ols
def three_way_anova(KPI, catVar1, catVar2, catVar3, dataset):
    """
    quick function to perform a three way anova using statsmodels.api,
    and statsmodels.formula.api.ols.
    See https://www.statsmodels.org/stable/anova.html
    """
    lm = ols('{} ~ C({}, Sum)*C({}, Sum)*C({}, Sum)'.format(KPI, catVar1, catVar2, catVar3), data=dataset).fit()
    table = sm.stats.anova_lm(lm, typ=2)
    print(table.to_markdown())

In [31]:
# performing a three-way anova for each KPI 
KPIs = ['duration', 'count_searches', 'count_vdps', 'sale']

for KPI in KPIs:
    print(KPI)
    three_way_anova(KPI, 'treatment', 'device_type', 'region', user_summary_df)

duration
                                                          sum_sq       df  \
C(treatment, Sum)                                   5.286280e+01      1.0   
C(device_type, Sum)                                 6.339614e+03      1.0   
C(region, Sum)                                      4.269443e+04      4.0   
C(treatment, Sum):C(device_type, Sum)               2.822341e+01      1.0   
C(treatment, Sum):C(region, Sum)                    4.008989e+04      4.0   
C(device_type, Sum):C(region, Sum)                  1.658538e+04      4.0   
C(treatment, Sum):C(device_type, Sum):C(region,...  3.099058e+04      4.0   
Residual                                            1.592819e+07  12980.0   

                                                           F        PR(>F)  
C(treatment, Sum)                                   0.043078  8.355812e-01  
C(device_type, Sum)                                 5.166197  2.304685e-02  
C(region, Sum)                                      8.698000  5.23


# Direct Answers to Questions

**1) Evaluate the effect of the new feature on engagement (searches and VDPs) and conversion using statistical significance where applicable**
## Section 1: User Attributes
Users in this dataset have three categorical attributes, summarized in the table below:

| attribute | python\_name | Description |
| -- | -- | -- |
| Device Type | `device_type` | User's device, can be Desktop or Mobile | 
| Region | `region` | User's region, can be Midwest, New England, Pacific Northwest, Southeast, Southwest | 
| Treatment | `treatment` | What testing group the user is in. Can be Test or Control | 


## Section 2: Description of Metrics and KPIs 
The metrics used to evaluate the engagement of the feature were:

| Metric Name | pd column name | Description |
| --------------- | -------------------- | -------- |
| Duration | `duration` | The amount of time in minutes that a user spent on their customer journey. |
| Searches | `count_searches` | The number of searches performed by the user. |
| VDP Views | `count_vdps` | The number of VDP pages viewed by the user. |
| Sale | `sale` | The number of sales completed by the user. |

Conversion attributable to the feature was evaluated using raw sales counts and sales per user (the conversion rate). Because `sale` is a binary, non-normally distributed value, it required special care (see Section 3.2).
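
The count-based KPIs above can also be derived without a per-user loop. The following is a minimal sketch, not the notebook's implementation: it assumes pandas is available, and the small `users`, `searches`, `vdps`, and `sales` frames are made-up stand-ins for the assignment tables.

```python
import pandas as pd

# Toy stand-ins for the assignment tables (values are made up for illustration).
users = pd.DataFrame({"user_id": [1, 2, 3], "treatment": ["Test", "Control", "Test"]})
searches = pd.DataFrame({"user_id": [1, 1, 2, 3, 3, 3]})
vdps = pd.DataFrame({"user_id": [1, 3]})
sales = pd.DataFrame({"user_id": [3]})

summary = users.set_index("user_id")
event_tables = {"count_searches": searches, "count_vdps": vdps, "sale": sales}
for column, events in event_tables.items():
    # size() counts rows per user; the assignment aligns on the user_id index,
    # so users without events of this type get NaN for now
    summary[column] = events.groupby("user_id").size()

# Users with no events of a given type performed zero of them.
count_columns = list(event_tables)
summary = summary.fillna({c: 0 for c in count_columns}).astype({c: int for c in count_columns})
print(summary)
```

Counting with `groupby(...).size()` keeps users with no events of a given type at zero, which matches how the KPI columns are described in the table above.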

## Section 3: Significance testing for Feature Evaluation
### Section 3.1: Student t-testing 
Student's t-test for independent samples was used to evaluate feature engagement. 
Our goal for the t-test was to compare the means of the populations between Test and Control groups.
The null hypothesis, $H_0$, is that the mean of the tested KPI is the same in the Test and Control groups. The alternate hypothesis, $H_1$, is that the mean of the tested KPI is *different* between the Test and Control groups.  
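
As a worked illustration of the statistic behind this test, here is a minimal sketch of the pooled-variance (equal-variance) t statistic, which is what `scipy.stats.ttest_ind` computes by default. The helper names and the sample values are hypothetical, and the p-value uses a normal approximation that is only reasonable for large groups such as the thousands of users per group in this study.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def pooled_t_statistic(test, control):
    """Student's t statistic for two independent samples with pooled variance."""
    n1, n2 = len(test), len(control)
    # variance() is the sample variance (n - 1 in the denominator)
    pooled_var = ((n1 - 1) * variance(test) + (n2 - 1) * variance(control)) / (n1 + n2 - 2)
    standard_error = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean(test) - mean(control)) / standard_error


def approx_two_sided_p(t_stat):
    """Two-sided p-value using a normal approximation (an assumption for large samples)."""
    return 2 * (1 - NormalDist().cdf(abs(t_stat)))


# Made-up KPI values for two tiny groups, for illustration only.
test_group = [5, 6, 7, 8, 9]
control_group = [4, 5, 6, 7, 8]

t_stat = pooled_t_statistic(test_group, control_group)
p_approx = approx_two_sided_p(t_stat)
print(t_stat, p_approx)
```

A positive statistic means the Test mean is larger than the Control mean, which is the sign convention used in the tables below.
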
#### Table 3.1.1: t-test results over entire dataset
| KPI | meanTest | meanCtrl | Tstat | p-value | RejH0@0.05 | 
| -- | -- | -- | -- | -- | -- |
| duration | 16.679894 | 16.563516 | 0.188700  |0.850331 | False | 
| count_searches |5.620047| 5.338462 |1.775687 |0.075808 | False|
| count_vdps|  1.287024 | 1.440366 | -0.987257 | 0.323535 | False| 
| sale | 0.101632  |0.104494|  -0.536508 | 0.591617 | False |

From this test we can see that at the highest level there is no statistically significant difference between the Test and Control groups for any KPI. 
The `sale` KPI is the average number of sales per user, which is also the conversion rate for the entire population. However, because no user bought more than one car, `sale` is a binary (0/1) variable and the t-test is not well suited to it. We employ the Chi-Squared test instead. 
### Section 3.2: Chi-Squared testing 
The Chi-Squared test is used to determine whether two categorical variables are related or not. Our goal for the Chi-Squared test was to compare the counts of sales between test and control groups. The null hypothesis, $H_0$ is that there is no difference between the distribution of sales between the Test and Control groups. The alternate hypothesis, $H_1$, is that there *is* a difference between the distributions of the test and control groups. 

#### Table 3.2.1: Contingency Table over the entire dataset
|   | No Sale | Sale | Total |
| -- | -- | -- | -- |
| Control |  5879 | 686 | 6565 | 
| Test | 5781 | 654 | 6435 | 
|Total | 11660 | 1340 | 13000 |   

The p-value observed for this contingency table was $0.612$, which is much larger than the $\alpha=0.05$ test level, so we fail to reject $H_0$: there is no evidence that the distribution of sales differs between the Test and Control groups. 
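
The arithmetic behind this result can be checked by hand. The sketch below is not the notebook's code: it re-derives the continuity-corrected (Yates) statistic for a 2×2 table, which is the correction `scipy.stats.chi2_contingency` applies by default when there is one degree of freedom. The function name is hypothetical, and the counts are taken from Table 3.2.1.

```python
from math import erfc, sqrt


def yates_chi2_2x2(table):
    """Chi-squared statistic and p-value for a 2x2 table with Yates' correction.

    Assumes every |observed - expected| is at least 0.5, which holds for the
    table used here.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (abs(observed - expected) - 0.5) ** 2 / expected
    # For 1 degree of freedom, P(chi2 > stat) = erfc(sqrt(stat / 2))
    p_value = erfc(sqrt(stat / 2))
    return stat, p_value


# Rows: Control, Test; columns: No Sale, Sale (from Table 3.2.1)
stat, p_value = yates_chi2_2x2([[5879, 686], [5781, 654]])
print(round(stat, 3), round(p_value, 3))
```

The statistic is far below the critical value of 3.841 reported in the notebook output, which is consistent with failing to reject $H_0$.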

While these initial tests are not promising, let's see if there is a niche population that could benefit from this feature. 
### Section 3.3: Significance Testing on Grouped Data 
The groupable attributes of a user are device and region. Let's look at the significance testing results for users grouped by device type. 

#### Table 3.3.1: t-test results for Desktop users

| KPI | meanTest | meanCtrl  |Tstat | pvalue | RejH0@0.05 |
| -- | -- | -- | -- | -- |-- |
| duration | 17.311658 | 17.162309 | 0.179144 | 0.857830 | False |
| count_searches | 5.424602 | 5.587370 | -0.596829 | 0.550640 | False |
| count_vdps | 1.382789 | 1.589243 | -0.764724 | 0.444460 | False |
| sale | 0.108713 | 0.108643 | 0.009694 | 0.992266 | False |
  
From Table 3.3.1 we can see that this feature has no *significant* effect on the user engagement KPIs for Desktop users.

#### Table 3.3.2: Contingency Table for Desktop users
|   | No Sale | Sale | Total |
| -- | -- | -- | -- |
| **Control** | 3331 | 406 | 3737 |
| **Test** | 3304 | 403 | 3707 |
| **Total** | 6635 | 809 | 7444 |
  
Following the $H_0$, $H_1$ formats presented in Section 3.2, the Chi-Squared test yields a p-value of $1.000$ for this contingency table, much larger than the $\alpha=0.05$ test level, so we fail to reject $H_0$. 

**We can see from this data that Desktop users are unaffected by the Feature.**

#### Table 3.3.3: t-test results for Mobile users
| KPI | meanTest | meanCtrl | Tstat | pvalue | RejH0@0.05 | 
| -- | -- | -- | -- | -- | -- |
| duration | 15.821408 | 15.772254 | 0.053815 | 9.570847e-01 | False |
| count_searches | 5.885630 | 5.009547 | 13.761598 | 2.138779e-42 | True | 
| count_vdps | 1.156891 | 1.243635 | -2.493909 | 1.266336e-02 | True | 
| sale | 0.092009 | 0.099010 | -0.887252 | 3.749819e-01 | False |

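Table 3.3.3 shows that mobile users in the Test group performed *statistically significantly* more searches per user (5.89 vs. 5.01), but viewed *statistically significantly* fewer VDPs per user (1.16 vs. 1.24). They did not spend any more time on the site. The regional tables below suggest that the aggregate decrease in VDP views is concentrated in the Midwest.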

#### Table 3.3.4: Contingency Table for Mobile users
|        | No Sale  | Sale | Total |              
| -- | -- | -- | -- |
| **Control** |  2548 |  280 |  2828 |
| **Test**      | 2477 | 251 | 2728 |
| **All**       | 5025 | 531 |  5556 |

Following the $H_0$, $H_1$ formats presented in Section 3.2, the Chi-Squared test yields a p-value of $0.400$ for this contingency table, much larger than the $\alpha=0.05$ test level, so we fail to reject $H_0$. 

**We can see from this data that Mobile users exposed to this feature search more but, in aggregate, view slightly fewer VDPs, and that they neither buy more often nor spend more time on the site.**

Let's also try grouping by region to identify market segments where the feature might be successful. Since Mobile was identified as the more responsive segment, the following tables summarize the t-test results by region for all users and for Mobile users only. 
#### Significance Testing Summary Tables by Region
####  Table 1.1: Duration 
| Region            | % Change | %Ch. Mobile | Significant | Sign. Mobile |
| ----------------- | -------- | ----------- | ----------- | ------------ |
| Midwest           | \-36%    | \-72%       | TRUE        | TRUE         |
| New England       | 22%      | 47%         | TRUE        | TRUE         |
| Pacific Northwest | 12%      | 16%         | FALSE       | FALSE        |
| Southeast         | \-4%     | 12%         | FALSE       | FALSE        |
| Southwest         | 15%      | 26%         | FALSE       | FALSE        |

#### Table 1.2 : Number of Searches 
| Region            | % Change | %Ch. Mobile | Significant | Sign. Mobile |
| ----------------- | -------- | ----------- | ----------- | ------------ |
| Midwest           | 1%       | \-2%        | FALSE       | TRUE         |
| New England       | 8%       | 20%         | TRUE        | TRUE         |
| Pacific Northwest | 11%      | 22%         | TRUE        | TRUE         |
| Southeast         | 13%      | 29%         | TRUE        | TRUE         |
| Southwest         | \-8%     | 19%         | FALSE       | TRUE         |

#### Table 1.3: Number of VDP visits 
| Region            | % Change | %Ch. Mobile | Significant | Sign. Mobile |
| ----------------- | ----- | ------ | ----- | ---- |
| Midwest           | \-43% | \-100% | TRUE  | TRUE |
| New England       | 15%   | 25%    | TRUE  | TRUE |
| Pacific Northwest | 15%   | 25%    | TRUE  | TRUE |
| Southeast         | 4%    | 20%    | FALSE | TRUE |
| Southwest         | \-38% | 27%    | FALSE | TRUE |

#### Table 1.4: Conversion Rate t-test
Note: Conversion rate = (sales / total number of users in the group). Percent changes in conversion rate are shown.

| Region            | % Change | %Ch. Mobile | Significant | Sign. Mobile |
| ----------------- | -------- | ----------- | ----------- | ------------ |
| Midwest           | \-51%    | \-100%      | TRUE        | TRUE         |
| New England       | 29%      | 59%         | FALSE       | TRUE         |
| Pacific Northwest | 11%      | 14%         | FALSE       | FALSE        |
| Southeast         | \-11%    | 4%          | FALSE       | FALSE        |
| Southwest         | 18%      | 29%         | FALSE       | FALSE        |

#### Table 1.5: Sales Chi-Squared test 
| Region            | Significant | Sign. Mobile |
| ----------------- | ----------- | ------------ |
| Midwest           | TRUE        | TRUE         |
| New England       | FALSE       | TRUE         |
| Pacific Northwest | FALSE       | FALSE        |
| Southeast         | FALSE       | FALSE        |
| Southwest         | FALSE       | FALSE        |
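
The "% Change" columns in the tables above are consistent with the relative change of the Test mean against the Control mean. As a minimal sketch under that assumption, the following reproduces the all-users Midwest figures from the means printed in the notebook's t-test output; the function name `percent_change` is hypothetical.

```python
def percent_change(mean_test: float, mean_ctrl: float) -> float:
    """Relative change of the Test mean versus the Control mean, in percent."""
    if mean_ctrl == 0:
        raise ValueError("Control mean is zero; percent change is undefined")
    return 100.0 * (mean_test - mean_ctrl) / mean_ctrl


# Midwest (all devices) means copied from the notebook output: (meanTest, meanCtrl)
midwest_means = {
    "duration": (12.037316, 18.840856),
    "count_searches": (5.807571, 5.773913),
    "count_vdps": (0.820978, 1.429249),
    "sale": (0.059148, 0.120158),
}

midwest_changes = {
    kpi: round(percent_change(test, ctrl)) for kpi, (test, ctrl) in midwest_means.items()
}
print(midwest_changes)
```

The rounded values match the Midwest "% Change" entries in Tables 1.1 through 1.4 (-36%, 1%, -43%, -51%).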

### Section 3.4 Three-way ANOVA for Attribute and Feature Evaluation
A limitation of the t-test and the Chi-Squared test is that many individual tests must be run to evaluate the feature across the engagement KPIs. One must also run preliminary tests to conclude that an attribute or subgroup would not benefit from the feature. In addition, cross effects (interactions) between attributes cannot be analyzed sufficiently. Three-way ANOVA is a way to evaluate the effect of the categorical variables (device type, region, treatment), alone and in combination, on the numeric outcome variables (duration, searches, VDP views, conversion rate). 
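
One common way to account for running many individual tests is a multiple-comparison correction. The analysis in this notebook does not apply one, so the following is an illustration only: a minimal sketch of the Holm step-down adjustment, applied to the four Mobile p-values from Table 3.3.3. The function name `holm_adjust` is hypothetical.

```python
def holm_adjust(pvalues):
    """Holm step-down adjusted p-values, returned in the original order."""
    m = len(pvalues)
    # indices of the p-values, from smallest to largest
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # the k-th smallest p-value is multiplied by (m - k + 1), capped at 1
        candidate = min(1.0, (m - rank) * pvalues[idx])
        # enforce that adjusted p-values never decrease along the sorted order
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Mobile p-values from Table 3.3.3: duration, count_searches, count_vdps, sale
mobile_pvalues = [9.570847e-01, 2.138779e-42, 1.266336e-02, 3.749819e-01]
adjusted = holm_adjust(mobile_pvalues)
rejected = [p < 0.05 for p in adjusted]
print(adjusted, rejected)
```

With only these four tests, the searches and VDP results remain below 0.05 after adjustment; the correction becomes more demanding as the number of segment-level tests grows.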

A summary of two-way ANOVA can be found on Wikipedia; the three-way case extends it with a third factor. My findings from the three-way ANOVA are summarized in Table 3.4.1.

#### Table 3.4.1: Summary of Three-Way ANOVA results
| KPI | Influencing Attributes | 
| -- | -- |
| Duration | device type, region | 
| Searches | region, treatment |
| VDP Views | region |   
| Conv. Rate | device type, region | 

From Table 3.4.1 we can see which attributes were the key drivers of variation in the mean of each KPI. Surprisingly, treatment is a driver of variation only for searches, even though the t-tests showed differences in means between the Control and Test groups within some segments. Table 3.4.2 examines cross effects between the categorical variables.

#### Table 3.4.2: Summary of crossed attributes that influence each KPI
Abbreviating device type to D, region to R, and treatment to T.

| KPI | Influencing Crossed Attributes |
| -- | -- |
| Duration | T $\times$ R, D $\times$ R, D $\times$ R $\times$ T |
| Searches | T $\times$ D |
| VDP Views | D $\times$ R $\times$ T |
| Conv. Rate | T $\times$ R, D $\times$ R, D $\times$ R $\times$ T |

Table 3.4.2 shows several cross effects. They largely confirm that device type combined with region and treatment is a significant source of variation in the mean of each KPI. 
  
**2) Summarize and highlight insights (or issues) in user behavior across various segments**
### Insights 
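1. Desktop users were unaffected by the feature, while mobile users showed increased engagement in most regions when the feature was deployed.
2. Midwestern mobile users responded negatively to the feature: VDP views and conversion in the Test group fell by 100% relative to Control (Tables 1.3 and 1.4). 
3. With the exception of the Midwest, mobile users spent about 25% more time on the site, averaged across the other four regions.
4. With the exception of the Midwest, mobile users performed about 23% more searches, averaged across the other four regions. 
5. With the exception of the Midwest, mobile users viewed about 24% more VDPs, averaged across the other four regions.
6. With the exception of the Midwest, mobile users in the Test group converted about 26% more often than the Control group, averaged across the other four regions. 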

### Issues
1. Desktop users were unaffected by the feature. 
2. The feature had no *statistically significant* effect on conversion rate for mobile users in Pacific Northwest, Southeast, and Southwest.
3. The feature had no *statistically significant* effect on website duration for mobile users in Pacific Northwest, Southeast and Southwest. 
4. The feature did not have a *statistically significant* effect on VDP visits for Southeast mobile users.

**3) Provide a recommendation on whether or not to permanently deploy the feature to all users, some users, or no users**
Based on the engagement results, I would deploy this feature to Mobile users only, in all regions except the Midwest. 
Based on the conversion results, I would deploy the feature first in New England, and then in the Southwest, Pacific Northwest, and Southeast, in that order. I would not deploy the feature in the Midwest. 


