### Analyzing FarmBurg's A/B Test

Brian is a Product Manager at FarmBurg, a company that makes a farming simulation social network game. In the FarmBurg game, you can plow, plant, and harvest different crops. Brian has been conducting an A/B Test with three different variants, and he wants you to help him analyze the results. Using the Python modules pandas and SciPy, you will help him make some important business decisions!


Brian ran an A/B test with three different groups: A, B, and C. He has provided us with a CSV file of his results named clicks.csv. It has the following columns:

* user_id: a unique id for each visitor to the FarmBurg site
* group: either 'A', 'B', or 'C' depending on which group the visitor was assigned to
* is_purchase: either 'Yes' if the visitor made a purchase or 'No' if they did not.

In [2]:
# Import libraries
import pandas as pd
import numpy as np

# Read in the `clicks.csv` file as `abdata`
abdata = pd.read_csv('clicks.csv')
# Inspect the first few rows
abdata.head()

Unnamed: 0,user_id,group,is_purchase
0,8e27bf9a,A,No
1,eb89e6f0,A,No
2,7119106a,A,No
3,e53781ff,A,No
4,02d48cf1,A,Yes


Note that we have two categorical variables: group and is_purchase. We are interested in whether visitors are more likely to make a purchase if they are in any one group compared to the others. Because we want to know if there is an association between two categorical variables, we’ll start by using a Chi-Square test to address our question.

In [3]:
# Creating contingency table
Xtab = pd.crosstab(abdata.group, abdata.is_purchase)

# Chi-Square Test
from scipy.stats import chi2_contingency
chi2, pval_chi, dof, expected = chi2_contingency(Xtab)
print("P-value:", pval_chi)
print("Since our p-value is less than the 0.05 threshold, we can conclude that there is a significant difference in the purchase rate among groups A, B, and C.")

P-value: 2.4126213546684264e-35
Since our p-value is less than the 0.05 threshold, we can conclude that there is a significant difference in the purchase rate among groups A, B, and C.
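
The Chi-Square test only tells us that the groups differ, not in which direction. As a rough sketch, the per-group purchase rates and the counts expected under "no association" can be computed directly. The counts below are typed in by hand from the contingency table printed in this notebook, and the names `counts` and `purchase_rate` are ours, chosen for illustration.

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Counts copied by hand from the notebook's contingency table (group x is_purchase)
counts = pd.DataFrame(
    {"No": [1350, 1483, 1583], "Yes": [316, 183, 83]},
    index=pd.Index(["A", "B", "C"], name="group"),
)

# Observed purchase rate per group: purchases / visitors in that group
purchase_rate = counts["Yes"] / counts.sum(axis=1)
print(purchase_rate.round(3))

# Counts we would expect if purchasing were independent of group
chi2, pval, dof, expected = chi2_contingency(counts)
print(pd.DataFrame(expected, index=counts.index, columns=counts.columns).round(1))
```

The purchase rate falls as the group letter advances, which is what drives the very small p-value above.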


#### Problem

Our day is a little less busy than expected, so we decide to ask Brian about his test.

**Us**: Hey Brian! What was that test you were running anyway?

**Brian**: We are trying to get users to purchase a small FarmBurg upgrade package. It’s called a microtransaction. We’re not sure how much to charge for it, so we tested three different price points: 0.99 (group 'A'), 1.99 (group 'B'), and 4.99 (group 'C'). It looks like significantly more people bought the upgrade package for 0.99, so I guess that’s what we’ll charge.

**Us**: Oh no! We should have asked you this before we did that Chi-Square test. That wasn’t the right test at all. It’s true that more people wanted to purchase the upgrade at 0.99; you probably expected that. What we really want to know is whether *each price point* allows us to make enough money that we can *exceed* some *target goal*. Brian, how much do you think it cost to build this feature?

**Brian**: Hmm. I guess that we need to generate a minimum of $1,000 in revenue per week in order to justify this project.

**Us**: We have some work to do!

In order to justify this feature, we will need to calculate the *necessary purchase rate* for each price point. It turns out that Brian ran his original test over the course of a week, so the number of visitors in abdata is equal to the number of visitors in a typical week.

In [9]:
# Calculating the necessary purchase rate
# Number of visitors in a typical week
num_visits = len(abdata)

# Number of sales at 0.99 needed to reach $1000 a week
num_sales_needed_099 = np.ceil(1000/0.99)
# Proportion of weekly visitors who must make a purchase
p_sales_needed_099 = num_sales_needed_099 / num_visits


# Number of sales at 1.99 needed to reach $1000 a week
num_sales_needed_199 = np.ceil(1000/1.99)
# Proportion of weekly visitors who must make a purchase
p_sales_needed_199 = num_sales_needed_199 / num_visits

# Number of sales at 4.99 needed to reach $1000 a week
num_sales_needed_499 = np.ceil(1000/4.99)
# Proportion of weekly visitors who must make a purchase
p_sales_needed_499 = num_sales_needed_499 / num_visits

print(p_sales_needed_099, p_sales_needed_199, p_sales_needed_499)


0.20228091236494597 0.10064025610244097 0.040216086434573826


Now let’s return to Brian’s question. To start, we want to know whether the percentage of Group A (the 0.99 price point) that purchased an upgrade package is significantly greater than p_sales_needed_099 (the percentage of visitors who need to buy an upgrade package at 0.99 in order to reach our minimum revenue target of $1,000 per week).

To answer this question, we want to focus on just the visitors in group A. Then, we want to compare the purchase rate in that group to p_sales_needed_099.

Since we have a single sample of categorical data and want to compare it to a hypothetical population value, a binomial test is appropriate. In order to run a binomial test for group A, we need to know two pieces of information:

* The number of visitors in group A (the number of visitors who were offered the 0.99 price point)
* The number of visitors in Group A who made a purchase
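
To make the test less of a black box, here is a minimal sketch of what a one-sided ("greater") binomial test computes: the probability of seeing at least the observed number of purchases if the true purchase rate were exactly the required rate. The counts for Group A are taken from the contingency table in this notebook; the variable names are ours.

```python
from scipy.stats import binom

n = 1666                  # visitors in Group A (from the contingency table)
k = 316                   # purchases in Group A (from the contingency table)
p_needed = 1011 / 4998    # required rate at 0.99: ceil(1000 / 0.99) sales over 4998 visitors

# One-sided p-value: P(X >= k) for X ~ Binomial(n, p_needed).
# sf(k - 1) is P(X > k - 1), which equals P(X >= k) for integer counts.
p_value = binom.sf(k - 1, n, p_needed)
print(p_value)
```

A large value here means that a result like Group A's is entirely plausible even if the true rate sits at the required level, so it gives no evidence that the rate exceeds it.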

In [18]:
# Let's go back to our contingency table and get those values from there
Xtab

is_purchase,No,Yes
group,,
A,1350,316
B,1483,183
C,1583,83


In [23]:
# Group A: number of visitors and number of purchases
# (label-based .loc lookup avoids relying on positional indexing of a labeled Series)
samp_size_099 = Xtab.loc['A', 'No'] + Xtab.loc['A', 'Yes']
sales_099 = Xtab.loc['A', 'Yes']

# Alternative way
# samp_size_099 = np.sum(abdata.group == 'A')
# sales_099 = np.sum((abdata.group == 'A') & (abdata.is_purchase == 'Yes'))

print(samp_size_099, sales_099)

1666 316


In [24]:
# Group B: number of visitors and number of purchases
samp_size_199 = Xtab.loc['B', 'No'] + Xtab.loc['B', 'Yes']
sales_199 = Xtab.loc['B', 'Yes']

# Alternative way
# samp_size_199 = np.sum(abdata.group == 'B')
# sales_199 = np.sum((abdata.group == 'B') & (abdata.is_purchase == 'Yes'))

print(samp_size_199, sales_199)

1666 183


In [25]:
# Group C: number of visitors and number of purchases
samp_size_499 = Xtab.loc['C', 'No'] + Xtab.loc['C', 'Yes']
sales_499 = Xtab.loc['C', 'Yes']

# Alternative way
# samp_size_499 = np.sum(abdata.group == 'C')
# sales_499 = np.sum((abdata.group == 'C') & (abdata.is_purchase == 'Yes'))

print(samp_size_499, sales_499)

1666 83


For Group A (0.99 price point), perform a binomial test using binom_test() to see if the observed purchase rate is significantly greater than p_sales_needed_099. Remember that there are four inputs to binom_test():

* x will be the number of purchases for Group A
* n will be the total number of visitors assigned to Group A
* p will be the target percent of purchases for the 0.99 price point
* alternative will indicate the alternative hypothesis for this test; in this case, we want to know if the observed purchase rate is significantly 'greater' than the purchase rate that results in the minimum revenue target.
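
A note for readers on recent SciPy versions: to our knowledge `binom_test()` was deprecated and later removed, with `scipy.stats.binomtest()` as its replacement. The sketch below shows the same one-sided test with that function. It takes the success count as `k` rather than `x` and returns a result object whose `pvalue` attribute holds the p-value. The counts and required rates are hard-coded from the values shown elsewhere in this notebook, and the total of 4998 visitors is an assumption based on three groups of 1666.

```python
from scipy.stats import binomtest

num_visits = 3 * 1666  # assumption: three equal groups of 1666 visitors

# (purchases, visitors, required purchase rate) for each group
groups = {
    "A": (316, 1666, 1011 / num_visits),  # 0.99 price point
    "B": (183, 1666, 503 / num_visits),   # 1.99 price point
    "C": (83, 1666, 201 / num_visits),    # 4.99 price point
}

p_values = {}
for name, (sales, visitors, p_needed) in groups.items():
    # alternative="greater" tests whether the true rate exceeds p_needed
    result = binomtest(k=sales, n=visitors, p=p_needed, alternative="greater")
    p_values[name] = result.pvalue
    print(name, result.pvalue)
```

These p-values should agree with the ones `binom_test()` reports in the cells that follow, since both compute the same upper-tail binomial probability.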

In [30]:
# Binomial test Group A
# Note: binom_test is unavailable in recent SciPy releases, which provide scipy.stats.binomtest instead
from scipy.stats import binom_test
p_valueA = binom_test(sales_099, n=samp_size_099, p=p_sales_needed_099, alternative='greater')
print("Binomial test P-value:", p_valueA)
print("Since our p-value is greater than the 0.05 threshold, we conclude that the purchase rate is NOT significantly higher than the target needed to reach $1000 revenue per week.")

Binomial test P-value: 0.9058887362654584
Since our p-value is greater than the 0.05 threshold, we conclude that the purchase rate is NOT significantly higher than the target needed to reach $1000 revenue per week.


In [32]:
# Binomial test Group B
from scipy.stats import binom_test
p_valueB = binom_test(sales_199, n=samp_size_199, p=p_sales_needed_199, alternative='greater')
print("Binomial test P-value:", p_valueB)
print("Since our p-value is greater than the 0.05 threshold, we conclude that the purchase rate is NOT significantly higher than the target needed to reach $1000 revenue per week.")

Binomial test P-value: 0.11441815431122185
Since our p-value is greater than the 0.05 threshold, we conclude that the purchase rate is NOT significantly higher than the target needed to reach $1000 revenue per week.


In [35]:
# Binomial test Group C
from scipy.stats import binom_test
p_valueC = binom_test(sales_499, n=samp_size_499, p=p_sales_needed_499, alternative='greater')
print("Binomial test P-value:", p_valueC)
print("Since our p-value is less than the 0.05 threshold, we can conclude that the purchase rate is significantly higher than the target needed to reach $1000 revenue per week.")

Binomial test P-value: 0.029642608610084057
Since our p-value is less than the 0.05 threshold, we can conclude that the purchase rate is significantly higher than the target needed to reach $1000 revenue per week.


#### Conclusion

After running a binomial test for each of Groups A, B, and C, we find that only Group C has a p-value below the 0.05 threshold. It is therefore the only group whose purchase rate is significantly greater than the rate needed to reach $1,000 in revenue per week. Based on this test, Brian should charge 4.99 for the upgrade.
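
As a rough sanity check on this conclusion, we can project weekly revenue at each price from the observed purchase rates. This is a back-of-the-envelope sketch, not part of the original analysis: it assumes every weekly visitor (taken to be 3 × 1666 = 4998) is offered the same price and buys at the rate observed in the test, and it uses point estimates only, with no allowance for sampling error.

```python
group_size = 1666               # visitors per group (from the contingency table)
weekly_visitors = 3 * group_size  # assumption: a typical week matches the test week

# (price, purchases observed in the group offered that price)
observed = [(0.99, 316), (1.99, 183), (4.99, 83)]

projected = {}
for price, sales in observed:
    rate = sales / group_size
    # Projected weekly revenue if all visitors saw this price
    projected[price] = rate * weekly_visitors * price
    print(price, round(projected[price], 2))
```

Under these assumptions the 1.99 price point also projects above the target, but its binomial test was not significant, so the data do not let us rule out that it falls short. Only the 4.99 price point clears the target with statistical support.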