# Farm Simulator

This project looks at a hypothetical farm simulator game played on a social media network.
In the game you can plant, grow, and harvest different crops.

While the game is free to play, users can make in-game purchases that help them and their farm in different ways.
The developers want to release a new upgrade package and are unsure how much to charge for it.

## What to Charge?
In this project we will analyze an A/B test that splits our users into groups, each shown a different price, to find the price point that generates the most revenue.

We have 3 price points and groups:
- $0.99 : A group
- $1.99 : B group
- $4.99 : C group

First we will connect to our database and load the `users` table into a dataframe. We could filter with SQL queries, but loading the whole table keeps the demonstration simpler.

In [1]:
import sqlite3
import pandas as pd
# connects to db
conn = sqlite3.connect('users.db')

users = pd.read_sql_query("SELECT * FROM users", conn)

# previewing the table
print(users.head())

    user_id test_group is_purchase
0  8e27bf9a          A          No
1  eb89e6f0          A          No
2  7119106a          A          No
3  e53781ff          A          No
4  02d48cf1          A         Yes
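
If we only needed per-group totals, the aggregation could be pushed into SQL instead. The following is a minimal sketch, assuming the `users` table has the three columns shown in the preview; it uses an in-memory database with a handful of made-up rows rather than the real `users.db`.

```python
import sqlite3
import pandas as pd

# Hypothetical stand-in for users.db: an in-memory table with made-up rows
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (user_id TEXT, test_group TEXT, is_purchase TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [
        ("u1", "A", "No"),
        ("u2", "A", "Yes"),
        ("u3", "B", "No"),
        ("u4", "B", "No"),
        ("u5", "C", "Yes"),
    ],
)

# Count visitors and purchases per test group directly in SQL.
# In SQLite the comparison is_purchase = 'Yes' evaluates to 0 or 1.
query = """
SELECT test_group,
       COUNT(*) AS visitors,
       SUM(is_purchase = 'Yes') AS purchases
FROM users
GROUP BY test_group
ORDER BY test_group
"""
summary = pd.read_sql_query(query, conn)
conn.close()

print(summary)
```

Because the real table is small, loading everything into pandas works just as well here.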


Next we can build a contingency table of the variables `test_group` and `is_purchase`, which is the input a Chi-Square test needs.

In [2]:
cross_tab = pd.crosstab(users.test_group, users.is_purchase)
# viewing the cross_tab
print(cross_tab)

is_purchase    No  Yes
test_group            
A            1350  316
B            1483  183
C            1583   83
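
Raw counts are easier to compare as rates. As a small sketch, the table can be rebuilt from the counts printed above and each group's purchases divided by its group size (1,666 visitors per group). The variable name `purchase_rate` is only for illustration.

```python
import pandas as pd

# Contingency table rebuilt from the counts printed above
cross_tab = pd.DataFrame(
    {"No": [1350, 1483, 1583], "Yes": [316, 183, 83]},
    index=pd.Index(["A", "B", "C"], name="test_group"),
)

# Purchase rate per group = purchases / visitors in that group
purchase_rate = cross_tab["Yes"] / cross_tab.sum(axis=1)

# Roughly 18.97% for A, 10.98% for B, and 4.98% for C
print((purchase_rate * 100).round(2))
```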


Group A, at the $0.99 price point, had the most purchases, but is the difference between the groups statistically significant? Let's test it with a Chi-Square test.

In [3]:
from scipy.stats import chi2_contingency

# Calculate the p-value
chi2, pval, dof, expected = chi2_contingency(cross_tab)

# Print the p-value
print(pval)

2.4126213546684264e-35


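Since the p-value is far below our threshold of 0.05, we reject the null hypothesis that the purchase rate is the same in every group: the price shown makes a significant difference to how many users buy.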
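
To see where that p-value comes from, here is a sketch that recomputes the test by hand from the counts in the contingency table and checks it against `chi2_contingency`. The expected counts assume purchases are independent of the test group; the variable names are illustrative.

```python
import numpy as np
from scipy.stats import chi2, chi2_contingency

# Observed counts from the contingency table: columns are [No, Yes]
observed = np.array([[1350, 316], [1483, 183], [1583, 83]])

# Expected counts if purchasing were independent of the test group
row_totals = observed.sum(axis=1, keepdims=True)
col_totals = observed.sum(axis=0, keepdims=True)
expected = row_totals * col_totals / observed.sum()

# Chi-square statistic: sum of (observed - expected)^2 / expected
statistic = ((observed - expected) ** 2 / expected).sum()
dof = (observed.shape[0] - 1) * (observed.shape[1] - 1)
pval = chi2.sf(statistic, dof)

# The same quantities as computed by SciPy
stat_scipy, pval_scipy, dof_scipy, expected_scipy = chi2_contingency(observed)

print(expected)   # 1472 "No" and 194 "Yes" expected in every group
print(statistic, dof, pval)
```

Group A is well above the 194 purchases expected under independence and group C is well below, which is what drives the large statistic.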

However, does that let us meet our goals?
Let's say the company wants to generate a minimum of $1000 per week in order to justify the project's expense.

To answer this, we need to calculate the necessary purchase rate for each price point.

Let's begin by calculating the number of visitors in a typical week, which is the timeframe the data was collected over.

In [4]:
num_visits = len(users)
print(num_visits)

4998


Now we can derive how many purchases we need at each price point to justify our new product.

In [5]:
# The purchase rate needed at 0.99
num_sales_needed_099 = 1000/0.99
p_sales_needed_099 = num_sales_needed_099/num_visits

# Convert to a percentage before rounding to avoid floating-point noise in the output
print(f'Percent of sales needed at $0.99: {round(p_sales_needed_099*100, 2)}')

# The purchase rate needed at 1.99
num_sales_needed_199 = 1000/1.99
p_sales_needed_199 = num_sales_needed_199/num_visits
print(f'Percent of sales needed at $1.99: {round(p_sales_needed_199*100, 2)}')

# The purchase rate needed at 4.99
num_sales_needed_499 = 1000/4.99
p_sales_needed_499 = num_sales_needed_499/num_visits
print(f'Percent of sales needed at $4.99: {round(p_sales_needed_499*100, 2)}')

Percent of sales needed at $0.99: 20.21
Percent of sales needed at $1.99: 10.05
Percent of sales needed at $4.99: 4.01


So the lower the price, the higher the purchase rate we need to reach our goal: about 20% of visitors at $0.99 versus about 4% at $4.99.
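
As a rough check, we can set each group's observed purchase rate against the rate it needs and project weekly revenue if every visitor saw that price. This is a sketch using the counts from the contingency table; the projections are point estimates only and say nothing yet about statistical significance.

```python
num_visits = 4998        # visitors in a typical week
group_size = 1666        # visitors in each test group
target_revenue = 1000    # minimum weekly revenue in dollars

prices = {"A": 0.99, "B": 1.99, "C": 4.99}
purchases = {"A": 316, "B": 183, "C": 83}

projected_revenue = {}
for group, price in prices.items():
    observed_rate = purchases[group] / group_size
    required_rate = target_revenue / price / num_visits
    # Revenue if all weekly visitors bought at this group's observed rate
    projected_revenue[group] = observed_rate * num_visits * price
    print(
        f"{group}: observed {observed_rate:.2%}, required {required_rate:.2%}, "
        f"projected ${projected_revenue[group]:.2f}"
    )
```

On these point estimates group A falls short of $1000 (about $938.52), while groups B and C land above it (about $1092.51 and $1242.51).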

## Binomial Testing

Now that we know the purchase rate each price point needs to be viable, we can use a binomial test to check whether each group's observed purchase rate is significantly greater than its required rate.
The null hypothesis for each group is that its true purchase rate is no higher than the required rate.

First we'll get the sample size and number of sales for each group.

In [6]:
import numpy as np

# Sample size and sales at 0.99
samp_size_099 = np.sum(users.test_group == 'A')
sales_099 = np.sum((users.test_group == 'A') & (users.is_purchase == 'Yes'))

# Sample size and sales at 1.99
samp_size_199 = np.sum(users.test_group == 'B')
sales_199 = np.sum((users.test_group == 'B') & (users.is_purchase == 'Yes'))

# Sample size and sales at 4.99
samp_size_499 = np.sum(users.test_group == 'C')
sales_499 = np.sum((users.test_group == 'C') & (users.is_purchase == 'Yes'))

Then we'll perform our test. The arguments are:
- `x`: the number of purchases in the group
- `n`: the total number of visitors in the group
- `p`: the target purchase rate for the price point
- `alternative`: the alternative hypothesis for our test; here we use `'greater'` to ask whether the observed purchase rate is significantly greater than the purchase rate needed to meet our minimum revenue target

In [7]:
from scipy.stats import binom_test

# P-value for group A
pvalueA = binom_test(sales_099, n=samp_size_099, p=p_sales_needed_099, alternative='greater')
print(f'P-value for group A: {pvalueA}')

# P-value for group B
pvalueB = binom_test(sales_199, n=samp_size_199, p=p_sales_needed_199, alternative='greater')
print(f'P-value for group B: {pvalueB}')

# P-value for group C
pvalueC = binom_test(sales_499, n=samp_size_499, p=p_sales_needed_499, alternative='greater')
print(f'P-value for group C: {pvalueC}')


P-value for group A: 0.9028081076188554
P-value for group B: 0.11184562623740614
P-value for group C: 0.02794482665983064
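
Recent SciPy releases deprecated `binom_test` and later removed it in favor of `binomtest`, which returns a result object instead of a bare p-value. If the import above fails, the following sketch runs the same one-sided tests with `binomtest`, using the counts from the contingency table. The `groups` dictionary and loop are illustrative and not part of the original notebook.

```python
from scipy.stats import binomtest

num_visits = 4998        # visitors in a typical week
target_revenue = 1000    # minimum weekly revenue in dollars

# price, purchases, and sample size for each test group
groups = {
    "A": (0.99, 316, 1666),
    "B": (1.99, 183, 1666),
    "C": (4.99, 83, 1666),
}

pvalues = {}
for name, (price, sales, samp_size) in groups.items():
    # Purchase rate needed to reach the revenue target at this price
    p_needed = target_revenue / price / num_visits
    result = binomtest(sales, n=samp_size, p=p_needed, alternative="greater")
    pvalues[name] = result.pvalue
    print(f"P-value for group {name}: {result.pvalue}")
```

The p-values should agree with the ones printed above, since both functions compute the same exact one-sided binomial probability.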


## Conclusion

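Group C, at a price point of $4.99, was the only group with a p-value below 0.05, so it is the only one where we can reject the null hypothesis that the purchase rate is no higher than the rate needed to hit our target.

This is the price we should charge for our package: it is the only price point where the data gives significant evidence that we would reach our minimum revenue target of $1000 per week.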
