# A/B Testing at Nosh Mish Mosh

Nosh Mish Mosh is a recipe and ingredient meal delivery service: they ship the raw ingredients, and you cook them at home. They’ve decided to hire a data analyst to help make product and interface decisions. Your first task is to figure out how much data they’ll need to make meaningful decisions.

In [1]:
import pandas as pd
import numpy as np
from scipy.stats import chi2_contingency

## Nosh Mish Mosh: An Assortment of Edible Aliments

1. We’ve collected customer data for the past week and exposed it through a Python library, so first import `noshmishmosh`.

In [2]:
import noshmishmosh

## A/B Testing at Nosh Mish Mosh
3. Nosh Mish Mosh wants to run an experiment to see if we can convince more people to purchase meal plans if we use a more artisanal-looking vegetable selection. We’ve photographed these modern meals with blush tomatoes and graffiti eggplants, but aren’t sure if this strategy will sell enough units to benefit from establishing a business relationship with a new provider.

   Before running this experiment, of course, we need to know the sample size that will be required to detect the difference we are hoping for. There are three things we need to know before we can determine that number.

    -    the **Baseline Conversion Rate**
    -    **Minimum Detectable Effect** (desired lift)
    -    and the **Statistical Significance Threshold**


4. Let’s get the ball rolling on finding those numbers! In order to get our baseline, we need to first know how many users visit the site in a typical week. Let’s grab that logged information, which is stored in `noshmishmosh.customer_visits`. Assign that to a new variable called `all_visitors`.


In [15]:
all_visitors = noshmishmosh.customer_visits
all_visitors[0:3]

[{'purchased': False,
  'clickedthrough': True,
  'id': 83421,
  'moneyspent': 0,
  'name': 'Michael Todd'},
 {'purchased': False,
  'clickedthrough': True,
  'id': 46042,
  'moneyspent': 0,
  'name': 'Brianna Harmon'},
 {'purchased': False,
  'clickedthrough': False,
  'id': 23766,
  'moneyspent': 0,
  'name': 'Mario Arnold'}]

5. Next we need to know how many visitors to the site ultimately end up buying a meal or set of meals in a typical week. That information is stored in the `purchasing_customers` field of `noshmishmosh`. Save it into a variable called `paying_visitors`.

In [16]:
paying_visitors = noshmishmosh.purchasing_customers
paying_visitors[0:3]

[{'purchased': True,
  'clickedthrough': True,
  'id': 15153,
  'moneyspent': 39.01,
  'name': 'Jacob Harmon'},
 {'purchased': True,
  'clickedthrough': True,
  'id': 74271,
  'moneyspent': 10.16,
  'name': 'Wayne Potter'},
 {'purchased': True,
  'clickedthrough': True,
  'id': 83489,
  'moneyspent': 36.88,
  'name': 'Jimmy Carrillo'}]

6. Calculate the lengths of the two lists, saving the results into variables called `total_visitor_count` and `paying_visitor_count`, respectively.

In [9]:
total_visitor_count = len(all_visitors)
paying_visitor_count = len(paying_visitors)
print('The total visitors to the site is: ' + str(total_visitor_count) + '\n')
print('The total of visitors who purchased meals is: ' + str(paying_visitor_count))

The total visitors to the site is: 500

The total of visitors who purchased meals is: 93


7. Now to get the baseline: Divide the number of purchasing visitors by the number of total visitors. Save the result in a variable called `baseline_percent`. Since we want a percentage as our answer, multiply the result by `100.0`.

In [10]:
baseline_percent = paying_visitor_count / total_visitor_count * 100

Print out the `baseline_percent` so we know what to use for our baseline percentage in the A/B Sample Size Calculator.

In [12]:
print('The baseline is ' + str(baseline_percent) + '%.')

The baseline is 18.6%.
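As a cross-check on the baseline, the purchase count can also be derived from the `purchased` flag on each visit record instead of from a separate list. The sketch below uses a small made-up sample in place of `noshmishmosh.customer_visits`; the records and the `conversion_rate_percent` helper are illustrative and not part of the library, and the code assumes every record carries a boolean `purchased` key, as in the output shown earlier.

```python
def conversion_rate_percent(visits):
    """Return the percentage of visit records whose 'purchased' flag is True."""
    if not visits:
        raise ValueError("need at least one visit record")
    buyers = sum(1 for visit in visits if visit["purchased"])
    return buyers / len(visits) * 100


# Hypothetical stand-in for noshmishmosh.customer_visits (made-up records).
sample_visits = [
    {"id": 1, "purchased": True, "moneyspent": 12.50},
    {"id": 2, "purchased": False, "moneyspent": 0},
    {"id": 3, "purchased": False, "moneyspent": 0},
    {"id": 4, "purchased": True, "moneyspent": 30.00},
    {"id": 5, "purchased": False, "moneyspent": 0},
]

sample_baseline = conversion_rate_percent(sample_visits)
print(f"Sample baseline: {sample_baseline:.1f}%")
```

If the two lists in the library are consistent with each other, calling `conversion_rate_percent(all_visitors)` on the real data should agree with `baseline_percent`.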


## Mish Mosh B'Gosh: The Effect Size
9. These rainbow fingerling potatoes don’t come cheap. We’d like to know for sure that, with this change, we’ll be pulling in at least $1240 more every week. In order to figure out how many more customers we need, we’ll have to investigate the average revenue generated from a given sale. Luckily we have a list of the money spent by each customer in a typical week: `noshmishmosh.money_spent`. Save that list into a variable called `payment_history`.

In [14]:
payment_history = noshmishmosh.money_spent
payment_history[0:3]

[39.01, 10.16, 36.88]

10. We need to find how many purchases it would take to reach $1240 in additional revenue using our historical data.

   Let’s start by computing the average payment per paying customer using `np.mean`, saving it as `average_payment`.


In [20]:
average_payment = np.mean(payment_history)
print('The average purchase is $' + str(round(average_payment,2)) + '.')

The average purchase is $26.54.


11. We want to know how many of these “usual” payments it would take to clear our \$1240 mark. Round the number up using `np.ceil` (because that’s how many new customers it takes to bring in more than \$1240). Save that value into a `new_customers_needed` variable.

In [21]:
# np.ceil returns a float, so convert to int for a whole-customer count
new_customers_needed = int(np.ceil(1240 / average_payment))
print(str(new_customers_needed) + ' new customers are needed to clear the $1240 mark.')

47 new customers are needed to clear the $1240 mark.
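
As a quick check of the rounding, here is the same calculation worked by hand. It uses the average payment rounded to the cent, as printed above, so the intermediate values are approximate:

$$
\left\lceil \frac{1240}{26.54} \right\rceil = \lceil 46.72\ldots \rceil = 47,
\qquad
46 \times 26.54 = 1220.84 < 1240 \le 47 \times 26.54 = 1247.38
$$

Forty-six average purchases fall just short of the target, which is why the value is rounded up and not to the nearest whole number.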


12. Now find the additional share of weekly visitors, in percentage points, who must make a purchase for this change to be worthwhile. Do this by dividing `new_customers_needed` by the total visitor count for a typical week (calculated earlier) and multiplying by 100. Save the result in a variable called `percentage_point_increase`, then print it.

In [23]:
percentage_point_increase = (new_customers_needed / total_visitor_count) * 100
print('The additional percent of weekly visitors who must make a purchase in order to make this change worthwhile is: ' + str(percentage_point_increase) + '%')

The additional percent of weekly visitors who must make a purchase in order to make this change worthwhile is: 9.4%


13. In order to find our minimum detectable effect/desired lift, we need to express `percentage_point_increase` as a percent of `baseline_percent`. You can do this by dividing `percentage_point_increase` by `baseline_percent` and multiplying by `100.0`.

    Store the results in a variable called `mde`.

In [25]:
mde = percentage_point_increase / baseline_percent * 100

14. Print out the result `mde`.

In [29]:
print('The minimum detectable effect, or desired lift, is ' + str(round(mde,2)) + '%.')

The minimum detectable effect, or desired lift, is 50.54%.
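
The chain from revenue target to desired lift can be collected into one function, which makes it easier to see how each input feeds the result. This is a minimal sketch: `minimum_detectable_effect` is a hypothetical helper name, and the average payment passed in is the value rounded to the cent as printed earlier, not the unrounded `average_payment`.

```python
import math


def minimum_detectable_effect(total_visitors, paying_visitors,
                              average_payment, revenue_target):
    """Return (baseline %, percentage-point increase, relative lift %)."""
    baseline_percent = paying_visitors / total_visitors * 100
    # Whole customers only: round up so the revenue target is actually cleared.
    new_customers_needed = math.ceil(revenue_target / average_payment)
    percentage_point_increase = new_customers_needed / total_visitors * 100
    # Relative lift: the increase expressed as a percent of the baseline.
    mde = percentage_point_increase / baseline_percent * 100
    return baseline_percent, percentage_point_increase, mde


# Figures from the steps above; 26.54 is the average payment rounded to the cent.
baseline, point_increase, lift = minimum_detectable_effect(500, 93, 26.54, 1240)
print(f"baseline={baseline:.1f}%  increase={point_increase:.1f} points  lift={lift:.2f}%")
```

Note the difference between the two middle quantities: the increase is measured in percentage points of all visitors, while the lift is relative to the baseline conversion rate.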


## Nosh Mish Mosh: Tying It All Together
15. The last input we need to calculate the sample size for Nosh Mish Mosh’s artisanal rebranding is our statistical significance threshold. We’d like to be fairly certain, but this isn’t going to be a million-dollar decision, so let’s go with 10% (equivalently, a 90% confidence level).

16. Now put it all together! Enter the baseline, the minimum detectable effect, and the statistical significance threshold into the A/B Sample Size Calculator to find how many people need to be shown the new assets before we can check whether the results are a significant improvement. Save the result in a variable called `ab_sample_size`.

In [30]:
ab_sample_size = 440
print(str(ab_sample_size) + ' people need to be shown the new assets before we can check if the results are a significant improvement.')

440 people need to be shown the new assets before we can check if the results are a significant improvement.
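
The calculator’s internals are not shown here, so the sketch below is only a rough cross-check. It uses the textbook normal-approximation formula for comparing two proportions, and it assumes a two-sided test and 80% power, neither of which is stated above. `sample_size_per_variant` is an illustrative helper, not part of any library. Different calculators make different assumptions, so the result should not be expected to reproduce the 440 figure.

```python
import math
from statistics import NormalDist


def sample_size_per_variant(baseline_percent, mde_percent, alpha=0.10, power=0.80):
    """Approximate visitors needed per variant to detect a relative lift.

    Normal-approximation formula for a two-sided comparison of two
    proportions. The alpha and power defaults are assumptions.
    """
    p1 = baseline_percent / 100
    p2 = p1 * (1 + mde_percent / 100)  # conversion rate after the desired lift
    if not (0 < p1 < 1 and 0 < p2 < 1 and p1 != p2):
        raise ValueError("rates must lie strictly between 0 and 1 and differ")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # significance threshold
    z_power = NormalDist().inv_cdf(power)          # chance of catching a real lift
    pooled = (p1 + p2) / 2
    numerator = (z_alpha * math.sqrt(2 * pooled * (1 - pooled))
                 + z_power * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(numerator / (p2 - p1) ** 2)


estimate = sample_size_per_variant(18.6, 50.54)
print(f"Roughly {estimate} visitors per variant under these assumptions.")
```

The formula also shows why the three inputs matter: a smaller desired lift shrinks the denominator, and a stricter significance threshold raises `z_alpha`, so either change pushes the required sample size up.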
