Lambda School Data Science

*Unit 1, Sprint 3, Module 4*

---

# Lambda School Data Science Module 144
## Real-world Experiment Design

![Induction experiment](https://upload.wikimedia.org/wikipedia/commons/1/1c/Induction_experiment.png)

[Induction experiment, Wikipedia](https://commons.wikimedia.org/wiki/File:Induction_experiment.png)

## Prepare - Learn about JavaScript and Google Analytics

Python is great - but in web applications, JavaScript is impossible to avoid. As the lingua franca of the web, JavaScript runs in every browser, so all front-end code must either be written in JS or transpiled to it. As a data scientist you don't have to master JavaScript - but you do need to be aware of it, and being able to read snippets of it is invaluable for connecting your work to real-world applications.

So, we leave the warm comfort of Python, and venture to a bigger world - check out the [LambdaSchool/AB-Demo repo](https://github.com/LambdaSchool/AB-Demo) and [live experiment](https://lambdaschool.github.io/AB-Demo/) before class.

Additionally, sign up for [Google Analytics](https://www.google.com/analytics) - if you're not sure of the steps or what "property" to give it, you can enter a placeholder or wait until the live lecture. Google also has [Analytics documentation](https://support.google.com/analytics/) that is worth a look.

Note - if you use any of the various tracker blocking techniques, it's quite likely you won't show up in Google Analytics. You'll have to disable them to be able to fully test your experiment.

## Live Lecture - Using Google Analytics with a live A/B test

Again we won't do much Python here, but we'll put a few notes and results in the notebook as we go.

## Assignment - Set up your own A/B test!

For a baseline, a straight fork of the Lambda School repo is OK. Getting that working with your own Analytics profile is already a task. But if you get through that, stretch goals:

1. Explore Google Analytics - it's big and changes frequently, but powerful (can track conversions and events, flows, etc.)
2. Customize the experiment to be more interesting/different (try colors!)
3. Check out the various tools for setting up A/B experiments (e.g. [Optimizely](https://www.optimizely.com/) and [alternatives](https://alternativeto.net/software/optimizely/))
4. Try to get enough traffic to actually have more real data (don't spam people, but do share with friends)
5. If you do get more traffic, don't just apply a t-test - dig into the results and use both math and writing to describe your findings

Additionally, today it is a good idea to go back and review the frequentist hypothesis testing material from the first two modules. And if you feel on top of things - you can use your newfound GitHub Pages and Google Analytics skills to build/iterate a portfolio page, and maybe even instrument it with Analytics!
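
As a refresher on that frequentist material, here is a minimal sketch of a two-proportion z-test, one common way to compare conversion rates between two variants. The counts are made up for illustration, and the helper name `two_proportion_z_test` is ours rather than anything from the AB-Demo repo.

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference between two conversion rates."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both variants convert equally.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p_a, p_b, z, p_value

# Hypothetical counts, invented for illustration only.
p_a, p_b, z, p_value = two_proportion_z_test(conv_a=30, n_a=1000, conv_b=45, n_b=1000)
print(f"A: {p_a:.3f}  B: {p_b:.3f}  z = {z:.2f}  p = {p_value:.3f}")
```

The normal approximation behind this test gets shaky when a variant has only a handful of conversions, which is one more reason to gather real traffic before drawing conclusions.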

# Lecture Notes:

In [0]:
# import pandas library.
import pandas as pd 

In [0]:
# read in the data set.
df = pd.read_csv('Kevin_Hillstrom_MineThatData_E-MailAnalytics_DataMiningChallenge_2008.03.20.csv')
# show the data frame shape.
print(df.shape)
# show the data set with headers.
df.head()

In [0]:
# check the data for NaN's.
df.isna().sum()

In [0]:
# were the test groups in 'segment' evenly split?
df.segment.value_counts()
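
Eyeballing the counts is usually enough, but the same check can be made a little more formal. The sketch below assumes a toy `segment` Series with invented labels and counts (not the lecture data) and computes each group's share plus a Pearson chi-square statistic against a perfectly even split.

```python
import pandas as pd

# Toy stand-in for df.segment; labels and counts are invented for illustration.
segment = pd.Series(
    ["Email A"] * 334 + ["Email B"] * 331 + ["No Email"] * 335, name="segment"
)

counts = segment.value_counts()
shares = segment.value_counts(normalize=True)

# Under an even split every group expects the same count.
expected = len(segment) / counts.size
# Pearson chi-square statistic: sum of (observed - expected)^2 / expected.
chi2 = ((counts - expected) ** 2 / expected).sum()

print(shares.round(3))
print(f"chi-square statistic: {chi2:.3f} on {counts.size - 1} degrees of freedom")
```

With three groups there are 2 degrees of freedom, so a statistic below roughly 5.99 is consistent with an even split at the 5% level.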

In [0]:
# check the overall visit 'mean'.
df.visit.mean()

In [0]:
# use 'groupby' on 'segment' with 'visit.mean' to see the visit rate for each test group.
df.groupby('segment').visit.mean()

In [0]:
# use 'groupby' on 'segment' with 'conversion.mean' to see the conversion rate for each test group; multiply by 100 to show %.
df.groupby('segment').conversion.mean()*100
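
A bare rate hides how much it could move by chance. As a rough sketch, the cell below builds a synthetic DataFrame (the segment labels and underlying rates are invented, not taken from the lecture data) and attaches an approximate 95% interval to each group's conversion rate using the normal approximation.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the lecture DataFrame; the rates are invented.
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "segment": np.repeat(["Email", "No Email"], 5000),
    "conversion": np.concatenate(
        [rng.binomial(1, 0.012, 5000), rng.binomial(1, 0.006, 5000)]
    ),
})

# Conversion rate and group size per segment.
summary = toy.groupby("segment")["conversion"].agg(rate="mean", n="size")
# Standard error of a proportion: sqrt(p * (1 - p) / n).
summary["se"] = np.sqrt(summary["rate"] * (1 - summary["rate"]) / summary["n"])
# Approximate 95% interval via the normal approximation.
summary["ci_low"] = summary["rate"] - 1.96 * summary["se"]
summary["ci_high"] = summary["rate"] + 1.96 * summary["se"]

print((summary[["rate", "ci_low", "ci_high"]] * 100).round(2))
```

If the intervals for two segments overlap heavily, the difference in their rates deserves a proper test before anyone acts on it.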

In [0]:
# use 'groupby' on 'segment' with 'spend.mean' to see the average spend for each test group.
df.groupby('segment').spend.mean()

In [0]:
# use crosstab to tabulate 'segment' against 'visit', 'conversion' and 'spend' together.
pd.crosstab(df['segment'], [df['visit'], df['conversion'], df['spend']])
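
Because `spend` is continuous, a crosstab that includes it gets one column per distinct amount and quickly becomes hard to read. One possible alternative, sketched here on a tiny invented sample rather than the lecture data, is to cross `segment` with only the binary columns and normalize each row so the cells read as shares within a segment.

```python
import pandas as pd

# Tiny invented sample standing in for the lecture DataFrame.
toy = pd.DataFrame({
    "segment": ["Email"] * 4 + ["No Email"] * 4,
    "visit": [1, 1, 0, 0, 1, 0, 0, 0],
    "conversion": [1, 0, 0, 0, 0, 0, 0, 0],
})

# Row-normalized crosstab: each row sums to 1, so a cell is the share of
# that segment falling into the given (visit, conversion) combination.
table = pd.crosstab(
    toy["segment"], [toy["visit"], toy["conversion"]], normalize="index"
)
print(table)
```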

In [0]:
# import numpy for numbers work.
import numpy as np

In [0]:
# create a pivot table of 'visit', 'conversion' and 'spend' by 'segment', taking the mean of each.
# the string "mean" avoids the FutureWarning newer pandas versions raise for aggfunc=np.mean.
pd.pivot_table(df, index=["segment"], values=["visit", "conversion", "spend"], aggfunc=["mean"])

In [0]:
# create a line plot for the pivot table.
# the string "mean" avoids the FutureWarning newer pandas versions raise for aggfunc=np.mean.
pd.pivot_table(df, index=["segment"], values=["visit", "conversion", "spend"], aggfunc=["mean"]).plot()
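
A line plot implies an ordering between segments that isn't really there, and `spend` sits on a different scale from the two rates. As one possible alternative (a sketch on invented data, assuming matplotlib is installed), each metric can get its own bar chart:

```python
import pandas as pd

# Invented per-row data standing in for the lecture DataFrame.
toy = pd.DataFrame({
    "segment": ["Email", "Email", "Email", "No Email", "No Email", "No Email"],
    "visit": [1, 1, 0, 1, 0, 0],
    "conversion": [1, 0, 0, 0, 0, 0],
    "spend": [29.99, 0.0, 0.0, 0.0, 0.0, 0.0],
})

means = toy.pivot_table(
    index="segment", values=["visit", "conversion", "spend"], aggfunc="mean"
)

# One bar chart per metric, each with its own y-axis, because spend is in
# currency units while visit and conversion are rates between 0 and 1.
axes = means.plot(
    kind="bar", subplots=True, layout=(1, 3), figsize=(12, 3),
    legend=False, sharey=False,
)
```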

## Resources

- [Demo Google Analytics Data](https://support.google.com/analytics/answer/6367342?hl=en) - an Analytics profile you can add to your account with real data from the Google swag store
- [Design of Experiment](https://explorable.com/design-of-experiment) - an essay summarizing some of the things to be aware of when designing and running an experiment