# Project: Product Management From Scratch

The boutique ice cream sandwich startup you've joined, Snackr, has gotten by on the personality of its founder, I. Sandwich (real name?), for too long. The features the team has launched to the website over the last two quarters have not landed with customers.

You've decided it's time to step up and lay out a plan for identifying paths forward that are more likely to succeed than "whatever I. Sandwich happened to promise a random person he met at a bar last night". You pitch the team on an iterative design process that brings qualitative and quantitative techniques together to minimize the industry-standard risk types so that you can move forward confidently.

Someone in your company recently learned about a/b experiments, so they instrumented tracking and ran an experiment based on your founder's unfounded random idea.

Unsurprisingly, the experiment flopped.

You know that a/b experiments give you confidence about whether something you tried worked, but unless they are very carefully designed they don't tell you what to do next.

On the bright side! 

You now have some user event logs you can dig through to surface user personas and a user journey. You tell the team you'd like to collect all of the known assumptions before you launch the next experiment to the site, so you have the best chance of success. You tell the CEO and Board (wait - is it really just his friends from college and mom?) that at the end of your week-long design sprint you're going to launch an experiment.

Your end goal for this project is to identify actionable, data-backed hypotheses for your team to act on, matched to the types of risk you identify.



## NOTES
Along the way you'll:
 - Identify actionable user personas & stages of a user journey
 - Develop hypotheses based on the data
 
Desired Actions to identify:
 - User Interviews
 - Usability Testing
 - A/B Experiments


<!--
%%ulab_page_divider
--><hr/>

## Section 1 - Evaluate previous a/b experiment [STUDENT]

In [96]:
import numpy as np
import pandas as pd
import glob
import datetime
import itertools

## One of your engineers helpfully exported all of the events from the previous a/b experiment for you.  
## She sent them over to you broken down by country and explained how to put them all into one dataframe.
## 1: Ensure you got data
## 2: Check to see if the previous a/b experiment was valid (95% confidence)
## 3: Check if the experiment or control won 
##    by performing a chi-square test on outcomes and reporting whether the difference is significant.

log_files = glob.glob(r'events_*.csv')
df = pd.concat((pd.read_csv(f) for f in log_files))

## Section 1 - Evaluate previous a/b experiment [SOLUTION]

In [95]:
import numpy as np
from scipy import stats
import pandas as pd
import glob
import datetime
import itertools

## One of your engineers helpfully exported all of the events from the previous a/b experiment for you.  
## She sent them over to you broken down by country and explained how to put them all into one dataframe.
## 1: Ensure you got data
## 2: Check to see if the previous a/b experiment was valid
## 3: Check if the experiment or control won
log_files = glob.glob(r'events_*.csv')
df = pd.concat((pd.read_csv(f) for f in log_files))
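
A slightly more defensive take on the load step above, as a sketch rather than the course's official loader. It fails loudly when the glob pattern matches nothing (step 1, "ensure you got data") and renumbers rows across files. The `load_events` helper name is hypothetical; the `events_*.csv` pattern comes from the cell above.

```python
import glob

import pandas as pd


def load_events(pattern="events_*.csv"):
    """Read every CSV matching `pattern` into a single DataFrame."""
    log_files = sorted(glob.glob(pattern))
    if not log_files:
        # pd.concat raises a less helpful ValueError on an empty list.
        raise FileNotFoundError(f"no files match {pattern!r}")
    # ignore_index=True gives the combined frame one continuous 0..n-1 index
    # instead of repeating each file's own row numbers.
    return pd.concat((pd.read_csv(f) for f in log_files), ignore_index=True)
```

With a continuous index, later steps that rely on row labels do not confuse row 0 of one country's file with row 0 of another.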

In [83]:
#S1_1

df.describe()

Unnamed: 0,event_id,user_uuid,event_time,device_type,session_uuid,experiment_group,user_country,event_page,event_type
count,250957,250957,250957,250957,250957,250957,250957,250957,250957
unique,250957,16450,202835,4,45604,2,3,4,5
top,7c6b8c89-10d7-43d0-af33-df2884522815,75b05502-5ed1-48e8-886d-ed093ea46e56,2019-09-28 21:41:08,mobile_web,1b6dfa64-ca7e-496d-9e4b-d5125812df2d,experiment,JP,home_page,view
freq,1,47,6,63336,10,125565,248455,96817,164668


In [88]:
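#S1_2
## Dedupe the events on event_id.
## Row labels can repeat across the concatenated files, so deduplicating on the
## index would risk dropping valid rows; event_id is the per-event key.

df = df.drop_duplicates(subset='event_id')
df[df['event_id'].duplicated()]  # should display an empty frame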

In [87]:
## Get the number of unique sessions by experiment or control group
df.groupby('experiment_group')['session_uuid'].nunique().reset_index(name='count')

Unnamed: 0,experiment_group,count
0,control,22725
1,experiment,22879


In [89]:
## Ensure sessions were split evenly between control and experiment
## (a sample ratio check against the intended 50/50 allocation)
desired_p = 0.5
z_score = 1.960  # two-sided z-score for 95% confidence

total_control = 22725     # unique sessions in control (from the cell above)
total_experiment = 22879  # unique sessions in experiment (from the cell above)

# Standard error of a proportion under the intended split
standard_deviation = np.sqrt((desired_p * (1 - desired_p)) / (total_control + total_experiment))
margin_error = standard_deviation * z_score

confidence_interval = (desired_p - margin_error, desired_p + margin_error)

p_hat = total_control / (total_control + total_experiment)

print("p_hat:\t\t", p_hat)
print("confidence int:\t", confidence_interval)
print("Continue analysis?:", confidence_interval[0] < p_hat < confidence_interval[1])

p_hat:		 0.4983115516182791
confidence int:	 (0.49541093079380116, 0.50458906920619884)
Continue analysis?: True
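
The same split check can be wrapped in a function so the z-score is derived from the confidence level instead of typed in by hand. This is a minimal sketch: the `split_is_balanced` name and its parameters are made up for illustration, and it uses the standard library's `statistics.NormalDist` instead of numpy.

```python
from statistics import NormalDist


def split_is_balanced(n_control, n_experiment, desired_p=0.5, confidence=0.95):
    """Return (p_hat, interval, ok) for the control share of sessions."""
    total = n_control + n_experiment
    # Two-sided critical value, roughly 1.96 for 95% confidence.
    z_score = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    standard_error = (desired_p * (1 - desired_p) / total) ** 0.5
    margin = z_score * standard_error
    interval = (desired_p - margin, desired_p + margin)
    p_hat = n_control / total
    return p_hat, interval, interval[0] < p_hat < interval[1]


# Session counts taken from the groupby output above.
p_hat, interval, ok = split_is_balanced(22725, 22879)
print(p_hat, interval, ok)
```

Because the critical value is computed rather than rounded to 1.960, the interval bounds can differ from the printed ones in the last few decimal places.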


In [90]:
## Get the number of purchases by experiment or control group
df.groupby('experiment_group')['event_type'].apply(lambda x: (x == 'purchase').sum()).reset_index(name='count')


Unnamed: 0,experiment_group,count
0,control,509
1,experiment,458


In [77]:
# Purchases and unique sessions per group, computed once and reused below
purchases_by_group = df.groupby('experiment_group')['event_type'].apply(lambda x: (x == 'purchase').sum())
sessions_by_group = df.groupby('experiment_group')['session_uuid'].nunique()

control_convert = purchases_by_group['control']
control_sessions = sessions_by_group['control']

experiment_convert = purchases_by_group['experiment']
experiment_sessions = sessions_by_group['experiment']

In [92]:
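#S1_3
# Chi-squared test on a 2x2 table of [purchases, sessions without a purchase] per group.
# Note: this treats each purchase event as one converting session.

control_stats = [control_convert, control_sessions - control_convert]
experiment_stats = [experiment_convert, experiment_sessions - experiment_convert]
outcomes = np.array([control_stats, experiment_stats])

# Returns (chi2 statistic, p-value, degrees of freedom, expected counts).
# The p-value here is greater than .05 (our threshold for significance),
# so the difference between the groups could be due to randomness.
stats.chi2_contingency(outcomes)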

(2.9976282106370364,
 0.08338650751208887,
 1,
 array([[   481.86727041,  22243.13272959],
        [   485.13272959,  22393.86727041]]))
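
To see where those numbers come from, here is a hand-rolled sketch of the same test using only the standard library. It assumes the Yates continuity correction, which `chi2_contingency` applies by default to 2x2 tables. The `chi2_2x2_yates` helper is hypothetical, and the counts are copied from the outputs above.

```python
import math


def chi2_2x2_yates(table):
    """Yates-corrected chi-square statistic and p-value for a 2x2 table."""
    (a, b), (c, d) = table
    total = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            # Continuity correction: shrink |observed - expected| by up to 0.5.
            diff = abs(observed - expected)
            diff -= min(0.5, diff)
            chi2 += diff ** 2 / expected
    # With 1 degree of freedom, P(X > chi2) equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# [purchases, sessions without a purchase] for control and experiment.
outcomes = [[509, 22725 - 509], [458, 22879 - 458]]
chi2, p_value = chi2_2x2_yates(outcomes)
print(f"control rate:    {509 / 22725:.4f}")
print(f"experiment rate: {458 / 22879:.4f}")
print(f"chi2={chi2:.4f}, p={p_value:.4f}, significant at 0.05: {p_value < 0.05}")
```

Printing the two conversion rates next to the p-value makes the takeaway concrete: the experiment group converted slightly less often than control, but not by enough to rule out chance at the 95% level.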

## Section 2 - Develop Actionable Personas and Journey Steps [STUDENT]

In [104]:
## Your work goes here

## Section 2 - Develop Actionable Personas and Journey Steps [SOLUTION]

In [17]:
#S1_3
#TODO - MATT - Generate opportunities in dataset.
#1 - bounce
#2 - bug
#3 - typo
#4 - l10n
#5 - price
#6 - flow

In [97]:
#S1_1
devices = df.device_type.unique()

In [98]:
#S1_1
countries = df.user_country.unique()

In [107]:
# Cartesian product of devices and countries: every candidate persona segment
list(itertools.product(devices,countries))

[('desktop_web', 'JP'),
 ('desktop_web', 'FR'),
 ('desktop_web', 'CN'),
 ('desktop_web', 'US'),
 ('desktop_web', 'UK'),
 ('ios', 'JP'),
 ('ios', 'FR'),
 ('ios', 'CN'),
 ('ios', 'US'),
 ('ios', 'UK'),
 ('mobile_web', 'JP'),
 ('mobile_web', 'FR'),
 ('mobile_web', 'CN'),
 ('mobile_web', 'US'),
 ('mobile_web', 'UK'),
 ('android', 'JP'),
 ('android', 'FR'),
 ('android', 'CN'),
 ('android', 'US'),
 ('android', 'UK')]
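
Listing the segments is only a starting point; a persona becomes actionable once you can see how each segment behaves. A minimal sketch of that next step follows, run on a tiny made-up event log that borrows the export's column names. The `segment_conversion` helper and the sample rows are assumptions for illustration, not part of the dataset.

```python
import pandas as pd

# Tiny made-up event log using the same column names as the export.
events = pd.DataFrame({
    "session_uuid": ["s1", "s1", "s2", "s3", "s3", "s4"],
    "device_type":  ["ios", "ios", "ios", "android", "android", "android"],
    "user_country": ["JP", "JP", "JP", "US", "US", "US"],
    "event_type":   ["view", "purchase", "view", "view", "cart", "view"],
})


def segment_conversion(df):
    """Sessions, purchases, and purchases per session for each segment."""
    grouped = df.groupby(["device_type", "user_country"])
    summary = pd.DataFrame({
        "sessions": grouped["session_uuid"].nunique(),
        "purchases": grouped["event_type"].apply(lambda s: (s == "purchase").sum()),
    })
    summary["conversion"] = summary["purchases"] / summary["sessions"]
    return summary.sort_values("conversion", ascending=False)


print(segment_conversion(events))
```

On the real `df`, segments with plenty of sessions but unusually low conversion are natural candidates for a persona worth interviewing or testing with.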

In [105]:
pages = df.event_page.unique()

In [106]:
events = df.event_type.unique()

In [102]:
list(itertools.product(pages, events))

[('home_page', 'view'),
 ('home_page', 'share'),
 ('home_page', 'favorite'),
 ('home_page', 'cart'),
 ('home_page', 'purchase'),
 ('search_page', 'view'),
 ('search_page', 'share'),
 ('search_page', 'favorite'),
 ('search_page', 'cart'),
 ('search_page', 'purchase'),
 ('item_page', 'view'),
 ('item_page', 'share'),
 ('item_page', 'favorite'),
 ('item_page', 'cart'),
 ('item_page', 'purchase'),
 ('marketing_page', 'view'),
 ('marketing_page', 'share'),
 ('marketing_page', 'favorite'),
 ('marketing_page', 'cart'),
 ('marketing_page', 'purchase')]
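
The product above lists every possible page and event pairing, whether or not it ever happens. One way to see which pairings actually occur, and how often, is a cross-tabulation. The sketch below uses a few made-up rows with the export's column names; on the real data you would pass `df["event_page"]` and `df["event_type"]` instead.

```python
import pandas as pd

# Made-up rows for illustration only.
events = pd.DataFrame({
    "event_page": ["home_page", "home_page", "search_page",
                   "item_page", "item_page", "item_page"],
    "event_type": ["view", "view", "view", "view", "cart", "purchase"],
})

# Rows are pages, columns are event types, cells are event counts.
page_event_counts = pd.crosstab(events["event_page"], events["event_type"])
print(page_event_counts)
```

Reading across a row shows how far users get on each page, which is a rough first cut at the stages of the user journey.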

## Section 3 - Generate Hypotheses and Recommended Next Step [STUDENT]

## Section 3 - Generate Hypotheses and Recommended Next Step [SOLUTION]

Because **evidence**   
we believe that by **product change**  
for **defined audience**  
users will be **XX% (more/less)** likely   
to **(view,share,favorite,cart,purchase)**  
  
In order to be certain we will run an a/b test for **DD days**  
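
To fill in **DD days**, you need a rough idea of how many sessions each group requires. The sketch below uses the usual normal-approximation sample size formula for comparing two proportions. The `days_to_run` helper, the 80% power default, the 10% relative lift, and the 3,000 sessions per day are all assumptions chosen for illustration; only the baseline rate (509 purchases over 22,725 control sessions) comes from Section 1.

```python
import math
from statistics import NormalDist


def days_to_run(baseline_rate, relative_lift, sessions_per_day, alpha=0.05, power=0.8):
    """Estimate per-group sample size and test duration for a two-sided test."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)  # relative_lift must be non-zero
    p_bar = (p1 + p2) / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    n_per_group = math.ceil(numerator / (p1 - p2) ** 2)
    # Traffic is split 50/50, so each group sees half of the daily sessions.
    days = math.ceil(n_per_group / (sessions_per_day / 2))
    return n_per_group, days


# Baseline from Section 1; lift and traffic are hypothetical placeholders.
n_per_group, days = days_to_run(509 / 22725, relative_lift=0.10, sessions_per_day=3000)
print(f"sessions per group: {n_per_group}, days to run: {days}")
```

Smaller expected lifts need far more sessions, so it is worth writing the **XX%** in the hypothesis before committing to a run length.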

## Section 5 - Draw Product Insights


In [21]:
#S5_1
df.describe()

Unnamed: 0,event_id,user_uuid,event_time,device_type,session_uuid,experiment_group,user_country,event_page,event_type
count,1247914,1247914,1247914,1247914,1247914,1247914,1247914,1247914,1247914
unique,1247914,81641,525817,4,227081,2,5,4,5
top,bff56d56-b039-44e8-8bb5-9a0e99a83f2c,9d3592ba-f1ed-45ce-a7c2-7f242ee353a9,2019-09-27 16:47:06,mobile_web,68ef8208-8aa6-44e6-a137-d1ae72afa22c,experiment,CN,home_page,view
freq,1,49,12,312812,10,624849,250957,482690,817324


In [None]:
#S5_3
#tktk - out of scope?

In [25]:
#S5_4
#tktk - to do

In [26]:
#S5_5
#tktk - todo

## Section 6

#S6_1 tktk - todo
#S6_2 tktk - todo
#S6_3 tktk - todo
#S6_4 tktk - todo