In [68]:
# Import libraries
import pandas as pd
import numpy as np

# Read in the `clicks.csv` file as `abdata`
abdata = pd.read_csv('clicks.csv')

In [69]:
abdata.head()

Unnamed: 0,user_id,group,is_purchase
0,8e27bf9a,A,No
1,eb89e6f0,A,No
2,7119106a,A,No
3,e53781ff,A,No
4,02d48cf1,A,Yes


In [70]:
Xtab = pd.crosstab(abdata.group, abdata.is_purchase)
Xtab

group,is_purchase=No,is_purchase=Yes
A,1350,316
B,1483,183
C,1583,83


In [71]:
from scipy.stats import chi2_contingency

chi2, pval, dof, expected = chi2_contingency(Xtab)
pval 

2.4126213546684264e-35
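
As a rough cross-check of the p-value above, the chi-square statistic can be recomputed by hand from the crosstab counts. This is only a sketch: the counts are copied from the table above, the variable names are illustrative, and the closed-form p-value used here is valid only because this table has exactly 2 degrees of freedom.

```python
import math

# Observed counts copied from the crosstab above: (No, Yes) per group.
observed = {"A": (1350, 316), "B": (1483, 183), "C": (1583, 83)}

grand_total = sum(sum(row) for row in observed.values())
col_totals = [sum(row[j] for row in observed.values()) for j in range(2)]

# Sum (observed - expected)^2 / expected over every cell of the table.
chi2_stat = 0.0
for row in observed.values():
    row_total = sum(row)
    for j, obs in enumerate(row):
        expected = row_total * col_totals[j] / grand_total
        chi2_stat += (obs - expected) ** 2 / expected

# Degrees of freedom: (rows - 1) * (columns - 1).
dof = (len(observed) - 1) * (2 - 1)

# For exactly 2 degrees of freedom, the chi-square survival function
# reduces to exp(-x / 2). This shortcut does not hold for other dof.
p_value = math.exp(-chi2_stat / 2)

print(chi2_stat, dof, p_value)
```

The result should agree with `pval` from `chi2_contingency` up to floating-point error, since no continuity correction is applied when there is more than one degree of freedom.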

Our day is a little less busy than expected, so we decide to ask Brian about his test.

Us: Hey Brian! What was that test you were running anyway?

Brian: We are trying to get users to purchase a small FarmBurg upgrade package. It’s called a microtransaction. We’re not sure how much to charge for it, so we tested three different price points: $0.99 (group 'A'), $1.99 (group 'B'), and $4.99 (group 'C'). It looks like significantly more people bought the upgrade package for $0.99, so I guess that’s what we’ll charge.

Us: Oh no! We should have asked you this before we ran that Chi-Square test. It wasn’t the right test at all. It’s true that more people purchased the upgrade at $0.99; you probably expected that. What we really want to know is whether each price point brings in enough revenue to exceed a target. Brian, how much do you think it cost to build this feature?

Brian: Hmm. I guess that we need to generate a minimum of $1000 in revenue per week in order to justify this project.

Us: We have some work to do!

In order to justify this feature, we will need to calculate the necessary purchase rate for each price point. Let’s start by calculating the number of visitors to the site this week.

It turns out that Brian ran his original test over the course of a week, so the number of visitors in abdata is equal to the number of visitors in a typical week.

In [72]:
num_visits = len(abdata)
num_visits

4998

In [73]:
num_sales_needed_099 = np.ceil(1000/0.99)
num_sales_needed_099

1011.0

In [74]:
# Express the required purchase rate as a percentage of weekly visitors
p_sales_needed_099 = num_sales_needed_099/num_visits*100
print("The percentage of weekly visitors needed at $0.99 is {} %".format(p_sales_needed_099))

The percentage of weekly visitors needed at $0.99 is 20.228091236494596 %


In [75]:
# Repeat the calculation for the $1.99 and $4.99 price points
num_sales_needed_199 = np.ceil(1000/1.99)
num_sales_needed_499 = np.ceil(1000/4.99)
p_sales_needed_199 = num_sales_needed_199/num_visits*100
print("The percentage of weekly visitors needed at $1.99 is {} %".format(p_sales_needed_199))
p_sales_needed_499 = num_sales_needed_499/num_visits*100
print("The percentage of weekly visitors needed at $4.99 is {} %".format(p_sales_needed_499))


The percentage of weekly visitors needed at $1.99 is 10.064025610244096 %
The percentage of weekly visitors needed at $4.99 is 4.021608643457383 %
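
The three calculations above follow the same pattern, so they could be wrapped in a small helper. The function name `required_purchase_rate` and its parameters are hypothetical, introduced only for illustration. The $1000 target and the 4998 weekly visits are taken from the cells above.

```python
import math

def required_purchase_rate(price, weekly_revenue_target, weekly_visits):
    """Return (sales needed, fraction of visitors who must buy) for one price point."""
    # Round up: a fractional sale cannot close the gap to the revenue target.
    sales_needed = math.ceil(weekly_revenue_target / price)
    return sales_needed, sales_needed / weekly_visits

# Revenue target and visit count as used earlier in the notebook.
for price in (0.99, 1.99, 4.99):
    sales, rate = required_purchase_rate(price, 1000, 4998)
    print(f"${price:.2f}: {sales} sales, {rate:.2%} of weekly visitors")
```

Returning a fraction rather than a percentage means the rate can be passed to a binomial test directly, without dividing by 100 first.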


In [76]:
samp_size_099 = np.sum(abdata.group == 'A')
print(samp_size_099)
sales_099 = np.sum((abdata.group == 'A') & (abdata.is_purchase == 'Yes'))
print(sales_099)

1666
316


In [77]:
samp_size_199 = np.sum(abdata.group == 'B')
sales_199 = np.sum((abdata.group == 'B') & (abdata.is_purchase == 'Yes'))
samp_size_499 = np.sum(abdata.group == 'C')
sales_499 = np.sum((abdata.group == 'C') & (abdata.is_purchase == 'Yes'))

In [79]:
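# binom_test is deprecated and has been removed from recent SciPy releases;
# binomtest(...).pvalue returns the same one-sided p-value.
from scipy.stats import binomtest

# One-sided test per group: is the observed purchase rate greater than the required rate?
pvalueA = binomtest(sales_099, samp_size_099, p_sales_needed_099/100, alternative='greater').pvalue
print("The P value for group A is : {} ".format(pvalueA))
pvalueB = binomtest(sales_199, samp_size_199, p_sales_needed_199/100, alternative='greater').pvalue
print("The P value for group B is : {} ".format(pvalueB))
pvalueC = binomtest(sales_499, samp_size_499, p_sales_needed_499/100, alternative='greater').pvalue
print("The P value for group C is : {} ".format(pvalueC))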


The P value for group A is : 0.9058887362654583 
The P value for group B is : 0.1144181543112181 
The P value for group C is : 0.029642608610084484 
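
To make the one-sided test concrete: with `alternative='greater'`, the p-value is the probability of seeing at least the observed number of sales if the true purchase rate were exactly the required rate. The sketch below recomputes that upper tail with the standard library only. The helper name `binom_upper_tail` is hypothetical, and the counts and required sales figures are copied from the cells above.

```python
import math

def binom_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summed in log space to avoid overflow."""
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for i in range(k, n + 1):
        # log of C(n, i) * p^i * (1 - p)^(n - i)
        log_pmf = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                   + i * log_p + (n - i) * log_q)
        total += math.exp(log_pmf)
    return total

visits = 4998  # weekly visitors, from num_visits above

# (observed sales, group size, required purchase rate) per group
groups = {
    "A": (316, 1666, 1011 / visits),
    "B": (183, 1666, 503 / visits),
    "C": (83, 1666, 201 / visits),
}

for name, (sales, size, needed_rate) in groups.items():
    print(name, binom_upper_tail(sales, size, needed_rate))
```

The printed values should match the SciPy p-values above to within floating-point error.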



pvalueC is the only p-value below the 0.05 threshold, so group C is the only group where we can conclude that the purchase rate is significantly higher than the rate needed to reach $1000 in revenue per week.

Brian should therefore charge $4.99 for the upgrade.