# Analyzing FarmBurg's A/B Test

Brian is a Product Manager at FarmBurg, a company that makes a farming simulation social network game. In the FarmBurg game, you can plow, plant, and harvest different crops. Brian has been running an A/B test with three variants, and he wants your help analyzing the results. Using the Python modules pandas and SciPy, you will help him make some important business decisions!

In [1]:
import pandas as pd
import numpy as np

### Project Requirements 

In [2]:
df = pd.read_csv('clicks.csv')
df.head()

    user_id group is_purchase
0  8e27bf9a     A          No
1  eb89e6f0     A          No
2  7119106a     A          No
3  e53781ff     A          No
4  02d48cf1     A         Yes


In [4]:
df.dtypes

user_id        object
group          object
is_purchase    object
dtype: object

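We are interested in whether visitors are more likely to make a purchase in any one group compared to the others. Because this is a question about the association between two categorical variables (group and is_purchase), a chi-square test is appropriate. We start by building a contingency table of the two variables.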

In [5]:
xtab = pd.crosstab(df['group'], df['is_purchase'])
xtab

is_purchase    No  Yes
group
A            1350  316
B            1483  183
C            1583   83


In [6]:
# Chi square test (2 categorical vars)

from scipy.stats import chi2_contingency
chi2, pval, dof, expected = chi2_contingency(xtab)
pval

2.4126213546684264e-35

(The p-value, about 2.4e-35, is effectively zero. Because it is below the 0.05 threshold, we reject the null hypothesis and conclude that the purchase rate differs significantly among groups A, B, and C.)
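To make the test less of a black box, here is a minimal sketch that reruns it on the counts from the crosstab and inspects the expected frequencies. The array name observed is an assumption introduced for illustration; the counts themselves are copied from the table above.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Observed counts copied from the crosstab above
# (rows: groups A, B, C; columns: No, Yes)
observed = np.array([
    [1350, 316],
    [1483, 183],
    [1583, 83],
])

chi2, pval, dof, expected = chi2_contingency(observed)

# Under the null hypothesis every group shares the pooled purchase rate,
# so each row of `expected` holds the same [No, Yes] counts here
# (all three groups have the same size).
print('degrees of freedom:', dof)
print('expected counts per group:', expected[0])
print('significant at 0.05:', pval < 0.05)
```

Comparing the observed "Yes" counts (316, 183, 83) with the expected count per group shows where the difference comes from: Group A buys far more often than the pooled rate predicts, and Group C far less.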

In [7]:
# Total number of visitors in the dataset

num_visits = len(df)
num_visits

4998

In [9]:
# Number of sales needed each week at $0.99 to reach $1,000 in revenue

num_sales_needed_099 = 1000 / .99
num_sales_needed_099

1010.1010101010102

In [11]:
# Proportion of weekly visitors who must purchase at $0.99

p_sales_needed_099 = num_sales_needed_099 / num_visits
p_sales_needed_099

0.20210104243717691

In [13]:
# Number of sales needed each week at $1.99 to reach $1,000 in revenue

num_sales_needed_199 = 1000 / 1.99
num_sales_needed_199

502.51256281407035

In [14]:
# Proportion of weekly visitors who must purchase at $1.99

p_sales_needed_199 = num_sales_needed_199 / num_visits
p_sales_needed_199

0.10054272965467594

In [15]:
# Number of sales needed each week at $4.99 to reach $1,000 in revenue

num_sales_needed_499 = 1000 / 4.99
num_sales_needed_499

200.40080160320642

In [16]:
# Proportion of weekly visitors who must purchase at $4.99

p_sales_needed_499 = num_sales_needed_499 / num_visits
p_sales_needed_499

0.040096198800161346
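The six cells above repeat the same arithmetic for each price. As a sketch, the calculation can be wrapped in a small helper; the function name purchase_rate_needed and the variable weekly_target are hypothetical names introduced here, while the $1,000 target and the 4998 visitors come from the cells above.

```python
weekly_target = 1000   # weekly revenue goal in dollars, as used in the cells above
num_visits = 4998      # len(df) from the earlier cell

def purchase_rate_needed(price, target=weekly_target, visits=num_visits):
    """Return (sales needed, share of visitors who must buy) for one price point."""
    sales_needed = target / price
    return sales_needed, sales_needed / visits

for price in (0.99, 1.99, 4.99):
    sales, rate = purchase_rate_needed(price)
    print(f"${price:.2f}: {sales:.1f} sales needed, {rate:.2%} of visitors")
```

The higher the price, the fewer sales are needed, so the required purchase rate drops from about 20% at $0.99 to about 4% at $4.99.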

In [17]:
# Sample size of Group A ($0.99)

samp_size_099 = np.sum(df['group'] == 'A')
samp_size_099

1666

In [18]:
# Number of purchases in Group A

sales_099 = np.sum((df['group'] == 'A') & (df['is_purchase'] == 'Yes'))
sales_099

316

In [19]:
# Sample size of Group B ($1.99)

samp_size_199 = np.sum(df['group'] == 'B')
samp_size_199

1666

In [26]:
# Number of purchases in Group B

sales_199 = np.sum((df['group'] == 'B') & (df['is_purchase'] == 'Yes'))
sales_199

183

In [27]:
# Sample size of Group C ($4.99)

samp_size_499 = np.sum(df['group'] == 'C')
samp_size_499

1666

In [28]:
# Number of purchases in Group C

sales_499 = np.sum((df['group'] == 'C') & (df['is_purchase'] == 'Yes'))
sales_499

83
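The per-group sample sizes and purchase counts can also be produced in one step with a groupby. The sketch below is an alternative to the six cells above, not the notebook's own method. Since clicks.csv is not available here, it rebuilds a stand-in DataFrame (named toy, an assumption) from the crosstab counts; with the real data you would use df directly.

```python
import pandas as pd

# Stand-in for clicks.csv, rebuilt from the crosstab counts (No, Yes) per group
counts = {'A': (1350, 316), 'B': (1483, 183), 'C': (1583, 83)}
rows = []
for group, (n_no, n_yes) in counts.items():
    rows += [(group, 'No')] * n_no + [(group, 'Yes')] * n_yes
toy = pd.DataFrame(rows, columns=['group', 'is_purchase'])

# One row per group: sample size, number of purchases, observed purchase rate
summary = (
    toy.assign(purchased=toy['is_purchase'] == 'Yes')
       .groupby('group')['purchased']
       .agg(samp_size='size', sales='sum', rate='mean')
)
print(summary)
```

Having the observed rate next to the counts makes it easy to compare each group against the required proportions computed earlier.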

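Perform a binomial test with binom_test() for each group to see whether the observed purchase rate is significantly greater than the rate needed at that group's price (p_sales_needed_099 for Group A, p_sales_needed_199 for Group B, and p_sales_needed_499 for Group C).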

In [29]:
# binom test
# Group A

from scipy.stats import binom_test

pval = binom_test(sales_099, samp_size_099, p_sales_needed_099, alternative='greater')
pval

0.9028081076188985

In [31]:
# Group B
pvalB = binom_test(sales_199, samp_size_199, p_sales_needed_199, alternative='greater')
pvalB

2.692606807038034e-183

In [32]:
# Group C
pvalC = binom_test(sales_499, samp_size_499, p_sales_needed_499, alternative='greater')
pvalC

0.027944826659907135
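Note that binom_test() was deprecated and later removed from recent SciPy releases in favor of binomtest(), which returns a result object rather than a bare p-value. The sketch below shows how the three tests might look with the newer function. The groups dictionary and the loop are illustrative assumptions; the counts, prices, and visitor total are taken from the cells above.

```python
from scipy.stats import binomtest

num_visits = 4998
groups = {
    # label: (price, purchases, sample size), taken from the cells above
    'A': (0.99, 316, 1666),
    'B': (1.99, 183, 1666),
    'C': (4.99, 83, 1666),
}

pvalues = {}
for label, (price, sales, n) in groups.items():
    # Purchase rate needed to reach $1,000 per week at this price
    p_needed = (1000 / price) / num_visits
    # One-sided test: is the observed rate greater than the rate needed?
    result = binomtest(sales, n, p_needed, alternative='greater')
    pvalues[label] = result.pvalue
    print(label, result.pvalue)
```

Running all three tests in one loop from the same inputs also guards against a stale variable from an earlier cell leaking into one of the results.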

- Based on the three p-values you calculated for the binomial tests in each group and a significance threshold of 0.05, were there any groups where the purchase rate was significantly higher than the target? 
- Based on this information, what price should Brian charge for the upgrade package?

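(Group A's p-value, about 0.90, is far above the 0.05 threshold, while Group C's, about 0.028, is below it. The value printed for Group B does not match its counts: 183 purchases out of 1,666 visitors is roughly 11.0%, only slightly above the 10.05% needed, so that output was most likely left over from an earlier run, and a rerun should give a p-value above 0.05. Group C is therefore the only group where we conclude that the purchase rate is significantly higher than the target needed to reach $1,000 in revenue per week.)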

In [33]:
final_answer = 4.99
print(f'Brian should charge ${final_answer} for the upgrade package.')

Brian should charge $4.99 for the upgrade package.
