## Analyzing Farmburg's A/B Test

Brian is a Product Manager at FarmBurg, a company that makes a farming simulation social network game. In the FarmBurg game, you can plow, plant, and harvest different crops. Brian has been conducting an A/B Test with three different variants, and he wants you to help him analyze the results. Using the Python modules pandas and SciPy, you will help him make some important business decisions!

In [19]:
import pandas as pd
import numpy as np
from scipy.stats import chi2_contingency
from scipy.stats import binom_test

abdata = pd.read_csv('clicks.csv')

### Project Requirements

Inspect the data using the .head() method.

In [4]:
print(abdata.head())

    user_id group is_purchase
0  8e27bf9a     A          No
1  eb89e6f0     A          No
2  7119106a     A          No
3  e53781ff     A          No
4  02d48cf1     A         Yes


Note that we have two categorical variables: group and is_purchase. We are interested in whether visitors are more likely to make a purchase if they are in any one group compared to the others.

We first need to create a contingency table of the variables group and is_purchase. Use pd.crosstab() to create this table and name the result Xtab, then print it out. Which group appears to have the highest number of purchases?

Then, use the function chi2_contingency with the data in Xtab to calculate the p-value. Save the p-value to a variable named pval and print the result. Using a significance threshold of 0.05, is there a significant difference in the purchase rate for groups A, B, and C?

In [5]:
Xtab = pd.crosstab(abdata.group, abdata.is_purchase)
print(Xtab)

is_purchase    No  Yes
group                 
A            1350  316
B            1483  183
C            1583   83


In [7]:
chi2, pval, dof, expected = chi2_contingency(Xtab)
print('The p-value is equal to', pval)
print ('The significance threshold is significant' if pval < 0.05 else 'The significance threshold is not significant')

The p-value is equal to 2.4126213546684264e-35
The significance threshold is significant


Our day is a little less busy than expected, so we decide to ask Brian about his test.

Us: Hey Brian! What was that test you were running anyway?

Brian: We are trying to get users to purchase a small FarmBurg upgrade package. It’s called a microtransaction. We’re not sure how much to charge for it, so we tested three different price points: 0.99  (group A), 1.99 (group B), and 4.99 (group C). It looks like significantly more people bought the upgrade package for 0.99, so I guess that’s what we’ll charge.

Us: Oh no! We should have asked you this before we did that Chi-Square test. That wasn’t the right test at all. It’s true that more people wanted to purchase the upgrade at 0.99; you probably expected that. What we really want to know is whether each price point allows us to make enough money that we can exceed some target goal. Brian, how much do you think it cost to build this feature?

Brian: Hmm. I guess that we need to generate a minimum of $1000 in revenue per week in order to justify this project.

Us: We have some work to do!

In order to justify this feature, we will need to calculate the necessary purchase rate for each price point. Let’s start by calculating the number of visitors to the site this week.

It turns out that Brian ran his original test over the course of a week, so the number of visitors in abdata is equal to the number of visitors in a typical week. Calculate the number of visitors in the data and save the value in a variable named num_visits. Make sure to print the value.

In [8]:
num_visits = len(abdata)
print(num_visits)

4998


Now that we know how many visitors we generally get each week (num_visits), we need to calculate the number of visitors who would need to purchase the upgrade package at each price point (0.99, 1.99, 4.99) in order to generate Brian’s minimum revenue target of $1,000 per week.

In [14]:
num_sales_needed_099 = 1000 / 0.99
print(num_sales_needed_099)
p_sales_needed_099 = num_sales_needed_099 / num_visits
print('{:.2%} visitors need to make a purchase'.format(p_sales_needed_099))

1010.1010101010102
20.21% visitors need to make a purchase


In [17]:
num_sales_needed_199 = 1000 / 1.99
print(num_sales_needed_199)
p_sales_needed_199 = num_sales_needed_199 / num_visits
print('{:.2%} visitors need to make a purchase'.format(p_sales_needed_199))

502.51256281407035
10.05% visitors need to make a purchase


In [18]:
num_sales_needed_499 = 1000 / 4.99
print(num_sales_needed_499)
p_sales_needed_499 = num_sales_needed_499 / num_visits
print('{:.2%} visitors need to make a purchase'.format(p_sales_needed_499))

200.40080160320642
4.01% visitors need to make a purchase


Now let’s return to Brian’s question. To start, we want to know if the percent of Group A (the 0.99 price point) that purchased an upgrade package is significantly greater than p_sales_needed_099 (the percent of visitors who need to buy an upgrade package at 0.99 in order to make our minimum revenue target of $1,000).

To answer this question, we want to focus on just the visitors in group A. Then, we want to compare the number of purchases in that group to p_sales_needed_099.

Since we have a single sample of categorical data and want to compare it to a hypothetical population value, a binomial test is appropriate. In order to run a binomial test for group A, we need to know two pieces of information:
- The number of visitors in group A (the number of visitors who were offered the 0.99 price point)
- The number of visitors in Group A who made a purchase

Calculate these two numbers and save them as samp_size_099 and sales_099, respectively. Note that you can use the contingency table that you printed earlier to get these numbers OR you can use Python syntax.

In [22]:
group_a = abdata[abdata.group == 'A']
samp_size_099 = len(group_a)
print(samp_size_099)
sales_099 = len(group_a[group_a.is_purchase == 'Yes'])
print(sales_099)

1666
316


Calculate the sample size and number of purchases in group B (the 1.99 price point) and save them as samp_size_199 and sales_199, respectively. Then do the same for group C (the 4.99 price point) and save them as samp_size_499 and sales_499, respectively.

In [24]:
group_b = abdata[abdata.group == 'B']
samp_size_199 = len(group_b)
print(samp_size_199)
sales_199 = len(group_b[group_b.is_purchase == 'Yes'])
print(sales_199)

1666
183


In [25]:
group_c = abdata[abdata.group == 'C']
samp_size_499 = len(group_c)
print(samp_size_499)
sales_499 = len(group_c[group_c.is_purchase == 'Yes'])
print(sales_499)

1666
83


For Group A (0.99 price point), perform a binomial test using binom_test() to see if the observed purchase rate is significantly greater than p_sales_needed_099. Remember that there are four inputs to binom_test():
- x will be the number of purchases for Group A
- n will be the total number of visitors assigned group A
- p will be the target percent of purchases for the 0.99 price point
- alternative will indicate the alternative hypothesis for this test; in this case, we want to know if the observed purchase rate is significantly 'greater' than the purchase rate that results in the minimum revenue target.

Save the results to pvalueA, and print its value.

In [26]:
pvalueA = binom_test(sales_099, samp_size_099, p_sales_needed_099, alternative = 'greater')
print('The pvalueA is equal to', pval)
print ('The significance threshold is significant' if pvalueA < 0.05 else 'The significance threshold is not significant')

The pvalueA is equal to 2.4126213546684264e-35
The significance threshold is not significant


For Group B (1.99 price point), perform a binomial test to see if the observed purchase rate is significantly greater than p_sales_needed_199.

Save the results to pvalueB, and print its value.

In [27]:
pvalueB = binom_test(sales_199, samp_size_199, p_sales_needed_199, alternative = 'greater')
print('The pvalueB is equal to', pval)
print ('The significance threshold is significant' if pvalueB < 0.05 else 'The significance threshold is not significant')

The pvalueB is equal to 2.4126213546684264e-35
The significance threshold is not significant


For Group C (4.99 price point), perform a binomial test to see if the observed purchase rate is significantly greater than p_sales_needed_499.

Save the results to pvalueC, and print its value.

In [28]:
pvalueC = binom_test(sales_499, samp_size_499, p_sales_needed_499, alternative = 'greater')
print('The pvalueC is equal to', pval)
print ('The significance threshold is significant' if pvalueC < 0.05 else 'The significance threshold is not significant')

The pvalueC is equal to 2.4126213546684264e-35
The significance threshold is significant


Based on the three p-values you calculated for the binomial tests in each group and a significance threshold of 0.05, were there any groups where the purchase rate was significantly higher than the target? Based on this information, what price should Brian charge for the upgrade package?

**The only group where the purchase rate was significantly higher was C so, based on this information, Brian should charge 4.99 dollars for the upgrade package**