# Sample Size Calculator (WIP)
Purpose: we want a version of the [Periscope dashboard](https://app.periscopedata.com/app/adrise:tubi/676521/(Official)-Experimentation-Sample-Size-Calculator), but with the additional flexibility to filter for a specific set of users.

At a high level, the tool works in two steps:
1. Dynamically generate a SQL query from a set of user-selected inputs, run it on Redshift, and pull the result into a dataframe.
2. Run the sample size calculations (with user-configurable parameters) and output a table showing the sample size required for every chosen platform.

Most of the work focuses on making step 1 more flexible. The tool is still a work in progress, and the planned improvements are, in order of priority:
- Add analytics_richevent filtering (e.g. can we pull all users who watched 70% of a video and then hit the back button?)
- Allow more than one attribute filter and more than one metric filter
- Let users define their own primary metrics

```python
import tubi_data_runtime as tdr
import math
import pandas as pd
import numpy as np
from datetime import date
from statsmodels.stats.power import tt_ind_solve_power
from ipywidgets import interact, interactive, fixed, interact_manual
import ipywidgets as widgets

from ssc_utils.filter_generator import filter_generator
from ssc_utils.raw_user_data import raw_user_data
from ssc_utils.metric_switcher import metric_switcher
from ssc_utils.metric_summary import metric_summary
from ssc_utils.cuped import cuped
import ssc_utils.calculator as c
```

## Workflow

The general workflow of this tool (written for internal readers; this section will be made more user-friendly in the future):

### 1. User filtering
The goal of this section is to pull a list of specific users who are eligible for the experiment. This is where most of our effort will go.
- We want the query to adapt to the complexity of the chosen filters (see the three levels below).
- We also want this to be interactive, to minimize ad hoc SQL coding. It's daunting (and unscalable) for our stakeholders to alter the SQL in our current calculator to fit their specific needs.

#### Three levels, in order of complexity (easiest to hardest):
1. device_metric_daily
    - all_metric_hourly covers the same filterable metrics/attributes, but device_metric_daily is more performant
2. all_metric_hourly
3. analytics_richevent
    - event-level data adds nearly unlimited flexibility, but makes the problem much harder
    
For the first prototype, we use only all_metric_hourly, which covers most filtering cases. To support more flexible filtering, we will add analytics_richevent. To reduce run time and processing work, we can add device_metric_daily.
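One way to act on this tiering is to route each filter to the cheapest table that can serve it. The sketch below is hedged in two ways. First, the column sets for each table are hypothetical placeholders, because the real schemas are not listed here. Second, it treats analytics_richevent as the fallback for anything event-level. pick_source_table is an illustrative name, not part of ssc_utils.

```python
# Hypothetical column sets: the real schemas are not documented here, so these
# are placeholders that only illustrate "daily is a subset of hourly".
TABLE_FIELDS = {
    "device_metric_daily": {"device_id", "user_id", "platform", "country", "tvt_sec"},
    "all_metric_hourly": {"device_id", "user_id", "platform", "country", "tvt_sec",
                          "os", "content_id"},
    "analytics_richevent": None,  # None = assume it can serve any event-level field
}

# Ordered from cheapest/least flexible to most expensive/most flexible.
TABLE_ORDER = ["device_metric_daily", "all_metric_hourly", "analytics_richevent"]


def pick_source_table(filter_fields):
    """Return the cheapest table whose columns cover every requested filter field."""
    needed = set(filter_fields)
    for table in TABLE_ORDER:
        fields = TABLE_FIELDS[table]
        if fields is None or needed <= fields:
            return table
    raise ValueError(f"no table covers {sorted(needed)}")
```

With these placeholder sets, a platform-only filter stays on device_metric_daily, while adding a content_id filter pushes the query to all_metric_hourly.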

#### In general, there are two ways to filter users:
1. Attributes (e.g. ROKU + AMAZON, a specific OS, kids mode only)
2. Metrics (e.g. watched at least 60 minutes of TVT, completed 3 movies)
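Both kinds of filter reduce to (field, condition, value) triples that are appended to the WHERE clause of the eligibility CTE. They use the same "AND ..." style as the example comments in the generated SQL. The minimal sketch below uses a hypothetical make_where_clause helper, not the actual filter_generator API. It also shows how several filters could be AND-ed together, which is one of the planned improvements. String values are quoted and escaped so that a stray apostrophe does not break the query.

```python
# Hypothetical helper; not the actual filter_generator API.
ALLOWED_CONDITIONS = {"=", "<>", ">", ">=", "<", "<=", "IN"}


def sql_literal(value):
    """Render a Python value as a SQL literal, escaping single quotes in strings."""
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        return str(value)
    return "'" + str(value).replace("'", "''") + "'"


def make_where_clause(filters):
    """Turn [(field, condition, value), ...] into AND-ed SQL predicates."""
    clauses = []
    for field, condition, value in filters:
        if condition not in ALLOWED_CONDITIONS:
            raise ValueError(f"unsupported condition: {condition}")
        if not field.isidentifier():  # reject anything that isn't a bare column name
            raise ValueError(f"suspicious field name: {field}")
        if condition == "IN":
            rendered = "(" + ", ".join(sql_literal(v) for v in value) + ")"
        else:
            rendered = sql_literal(value)
        clauses.append(f"AND {field} {condition} {rendered}")
    return "\n".join(clauses)
```

For example, make_where_clause([("platform", "IN", ["ROKU", "AMAZON"]), ("tvt_sec", ">=", 3600)]) yields two lines: AND platform IN ('ROKU', 'AMAZON') and AND tvt_sec >= 3600. The second line is the "watched at least 60 minutes" metric filter.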

### 2. Raw user data
Catch-all CTE that pulls a standard set of metrics from device_metric_daily for devices active in the last 4 weeks.
- In the future, we may extend this to support more complex metrics that device_metric_daily does not provide
    - e.g. verification rates can only be calculated from analytics_richevent, using is_confirmed = 't'
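The "last 4 weeks" window in the generated SQL uses DATE_TRUNC('week', ...). In Redshift, this truncates a timestamp to the Monday that starts its week. The window therefore covers the 4 most recent complete weeks and excludes the current partial week. The small Python equivalent below helps check which dates a run will cover. It assumes this CTE uses the same window as the elig_devices CTE, and last_complete_weeks is a hypothetical helper.

```python
from datetime import date, timedelta


def last_complete_weeks(today, n_weeks=4):
    """Return [start, end) covering the last n complete Monday-start weeks.

    Mirrors: DATE_TRUNC('week', hs) >= DATEADD('week', -n, DATE_TRUNC('week', GETDATE()))
         AND DATE_TRUNC('week', hs) <  DATE_TRUNC('week', GETDATE())
    """
    this_week_start = today - timedelta(days=today.weekday())  # Monday of current week
    start = this_week_start - timedelta(weeks=n_weeks)
    return start, this_week_start
```

For a run on Wednesday 2021-03-17, the window is 2021-02-15 (inclusive) to 2021-03-15 (exclusive).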

### 3. User data
Dynamic CTE that calculates a specific user-chosen metric.
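The primary metric is picked from a dropdown, so the dynamic CTE can be generated from a whitelist that maps each metric name to its SQL expression. That way a user never types raw SQL. In the minimal sketch below, the SQL expressions are hypothetical placeholders, and so are the column and CTE names (active_days_1_to_8, platform_type, raw_user_data). They are not what metric_switcher actually emits.

```python
# Hypothetical metric definitions: the real SQL inside metric_switcher is not
# shown in this notebook, and the column names are placeholders.
METRIC_SQL = {
    "capped_tvt": "LEAST(tvt_sec, 14400)",  # e.g. cap TVT at 4 hours (assumed cap)
    "new_user_1_to_8_days_retained":
        "CASE WHEN active_days_1_to_8 > 0 THEN 1 ELSE 0 END",
}


def user_data_cte(metric):
    """Return a ',\\nuser_data AS (...)' CTE for one whitelisted metric.

    The leading comma assumes this CTE is appended after earlier CTEs.
    """
    if metric not in METRIC_SQL:
        raise ValueError(f"unknown metric {metric!r}; choose from {sorted(METRIC_SQL)}")
    return (
        ",\nuser_data AS (\n"
        "    SELECT device_id, platform, platform_type,\n"
        f"           '{metric}' AS metric_name,\n"
        f"           {METRIC_SQL[metric]} AS metric_value\n"
        "    FROM raw_user_data\n"
        ")"
    )
```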

### 4. Metric summary
Catch-all CTE that allows us to summarize/prep the data for CUPED.
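For each group, CUPED needs four statistics: the mean and variance of the metric Y, the variance of a pre-experiment covariate X, and the covariance of X and Y. The sketch below computes these per platform, per platform type, and for an ALL rollup. It assumes each input row is (platform, platform_type, x_pre, y) for one device. The choice of covariate (for example, the same metric measured in a pre-period) is also an assumption, since it is not specified here. summarize_for_cuped is a hypothetical name.

```python
from collections import defaultdict


def summarize_for_cuped(rows):
    """rows: iterable of (platform, platform_type, x_pre, y), one per device.

    Returns {group: {n, mean_x, mean_y, var_x, var_y, cov_xy}} for every
    platform, every platform type, and an 'ALL' rollup. Variances and the
    covariance use the sample (n - 1) denominator, so each group needs >= 2 rows.
    """
    groups = defaultdict(list)
    for platform, platform_type, x, y in rows:
        for key in (platform, platform_type, "ALL"):
            groups[key].append((x, y))

    summary = {}
    for key, pairs in groups.items():
        n = len(pairs)
        mean_x = sum(x for x, _ in pairs) / n
        mean_y = sum(y for _, y in pairs) / n
        var_x = sum((x - mean_x) ** 2 for x, _ in pairs) / (n - 1)
        var_y = sum((y - mean_y) ** 2 for _, y in pairs) / (n - 1)
        cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in pairs) / (n - 1)
        summary[key] = {"n": n, "mean_x": mean_x, "mean_y": mean_y,
                        "var_x": var_x, "var_y": var_y, "cov_xy": cov_xy}
    return summary
```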

### 5. CUPED
Catch-all CTE that calculates CUPED-adjusted results for every platform, every platform type, and all of Tubi.
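Given per-group summary statistics, the standard CUPED adjustment is Y_cuped = Y - theta * (X - mean(X)), with theta = cov(X, Y) / var(X). This adjustment leaves the mean unchanged and shrinks the variance to var(Y) - cov(X, Y)^2 / var(X). The sketch below computes theta and the adjusted standard deviation for one group. It assumes the cuped CTE follows this textbook form, which the notebook does not show, and cuped_from_summary is a hypothetical name.

```python
import math


def cuped_from_summary(mean_y, var_x, var_y, cov_xy):
    """Return (theta, cuped_mean, cuped_std) from one group's summary statistics.

    theta = cov(X, Y) / var(X) minimizes var(Y - theta * (X - mean(X))).
    The adjustment term has mean zero, so the CUPED mean equals mean(Y), and
    var(Y_cuped) = var(Y) - cov(X, Y)**2 / var(X) = var(Y) * (1 - rho**2).
    """
    if var_x == 0:
        return 0.0, mean_y, math.sqrt(var_y)  # constant covariate: no adjustment possible
    theta = cov_xy / var_x
    var_cuped = var_y - cov_xy ** 2 / var_x
    return theta, mean_y, math.sqrt(max(var_cuped, 0.0))  # clamp tiny negative rounding
```

A smaller std_cuped_result raises the standardized effect size used in the sample size step, and that lowers sample_required. This is why CUPED is applied before the power calculation.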

## Interactive calculator (end user starts here)

```python
print('choose attribute filter')
attribute_filter_str = interactive(filter_generator().make_sql_where_string, 
                                   field = filter_generator().filter_attributes_choices(), 
                                   condition = filter_generator().attribute_conditions_choices(), 
                                   value = '')
display(attribute_filter_str)

print('')
print('choose metric filter')
metric_filter_str = interactive(filter_generator().make_sql_where_string, 
                                field = filter_generator().filter_metrics_choices(), 
                                condition = filter_generator().metric_conditions_choices(), 
                                value = '')
display(metric_filter_str)

print('')
print('choose your primary metric')
metric_str = interactive(metric_switcher().choose_metric, metric = metric_switcher().possible_metrics())
display(metric_str)
```


(Output: dropdown widgets for the attribute filter field ('no filters', 'user_id', 'device_id', 'platform', …), the metric filter field ('no filters', 'visit_total_count', 'tvt_sec', …), and the primary metric ('capped_tvt', 'new_viewer_first_day_capped_tvt', …).)

```python
print('relative effect size')
# Minimum 0.01: a zero effect has no finite sample size
EFFECT_SIZE_RELATIVE = interactive(c.effect, x=(0.01, 1.0, 0.01))
display(EFFECT_SIZE_RELATIVE)

print('')
print('number of treatments')
# Minimum 1: alpha is later divided by the number of treatments
NUMBER_VARIATIONS = interactive(c.variations, x=(1, 8, 1))
display(NUMBER_VARIATIONS)

print('')
print('allocation per variation (including control)')
# Minimum 0.01: weeks_required divides by the allocation
ALLOCATION = interactive(c.allocation, x=(0.01, 1.0, 0.01))
display(ALLOCATION)

print('')
print('power')
# Power and alpha must lie strictly between 0 and 1 for tt_ind_solve_power
POWER = interactive(c.power, x=(0.01, 0.99, 0.01))
display(POWER)

print('')
print('alpha')
ALPHA = interactive(c.alpha, x=(0.01, 0.99, 0.01))
display(ALPHA)
```

(Output: five sliders, with defaults of relative effect size = 0.01, number of treatments = 1, allocation = 0.5, power = 0.8, alpha = 0.05.)
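Some slider combinations cannot produce a valid result. Zero treatments makes the Bonferroni division in CORRECTED_ALPHA fail, and power or alpha at exactly 0 or 1 has no finite sample size. The hedged validation sketch below could run before the queries; validate_settings is hypothetical. Its allocation check assumes that "allocation per variation (including control)" means each arm, control included, receives that share of traffic.

```python
def validate_settings(effect, n_treatments, allocation, power, alpha):
    """Check slider values and return the Bonferroni-corrected alpha.

    Hypothetical helper; the return value mirrors
    CORRECTED_ALPHA = ALPHA / NUMBER_VARIATIONS.
    """
    if effect <= 0:
        raise ValueError("relative effect size must be positive")
    if n_treatments < 1:
        raise ValueError("need at least one treatment (alpha is divided by it)")
    if not (0 < alpha < 1 and 0 < power < 1):
        raise ValueError("alpha and power must lie strictly between 0 and 1")
    if allocation <= 0:
        raise ValueError("allocation must be positive")
    # Assumption: every arm, control included, gets `allocation` of the traffic.
    n_arms = n_treatments + 1
    if allocation * n_arms > 1 + 1e-9:  # small tolerance for float rounding
        raise ValueError(f"{n_arms} arms x {allocation:.0%} exceeds 100% of traffic")
    return alpha / n_treatments
```

With the defaults shown above (one treatment, allocation 0.5), the two arms use exactly 100% of traffic and the corrected alpha stays at 0.05.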

## After the user specifies the settings, run everything below

```python
step1 = filter_generator().generate_filter_cte(attribute_sql = attribute_filter_str, metric_sql = metric_filter_str)
step2 = raw_user_data().generate_raw_user_data_cte()
step3 = metric_switcher().generate_user_data_cte(metric_str.result) 
step4 = metric_summary().generate_metric_summary_cte() 
step5 = cuped().generate_cuped_cte()

FINAL_SQL = step1 + step2 + step3 + step4 + step5

# output SQL for debugging purposes
# you can manually copy and run this elsewhere
print(FINAL_SQL)

df = tdr.query_redshift(FINAL_SQL).to_df()
```


```sql
WITH elig_devices as (
    -- Pull list of devices that were active (has any row; don't need TVT >0) in the past 4 weeks
    -- Using all_metric_hourly for additional filters
    SELECT DISTINCT device_id
    FROM tubidw.all_metric_hourly
    WHERE DATE_TRUNC('week',hs) >= dateadd('week',-4,DATE_TRUNC('week',GETDATE()))
    AND DATE_TRUNC('week',hs) < DATE_TRUNC('week',GETDATE())
     -- attribute filters dynamically populate here

--     for example:
--     AND user_id is not null AND device_id <> user_id   -- Guest vs signed in device
--     AND platform IN ('ROKU', 'AMAZON')                 -- Platform/Platform type specific
--     AND country in ('US')                              -- Geo specific
--     AND os IN ('abcdefg')                              -- OS/version specific
--     AND content_id IN () AND tvt_sec > 0               -- Browsed/watched specific content/co
```
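The printed SQL shows that step1 opens the WITH clause itself. For FINAL_SQL = step1 + step2 + step3 + step4 + step5 to be valid, each later step presumably has to start with a comma. The alternative design sketched below uses a hypothetical compose_query helper. Each step stays a (name, body) pair, and the helper adds WITH, the commas, and the final SELECT in one place, so the steps can be reordered or dropped freely.

```python
def compose_query(ctes, final_select):
    """Join (name, body) pairs into one 'WITH a AS (...), b AS (...) SELECT ...' statement.

    Hypothetical helper: each body is a bare SELECT with no leading WITH or comma.
    """
    if not ctes:
        return final_select
    parts = [f"{name} AS (\n{body.strip()}\n)" for name, body in ctes]
    return "WITH " + ",\n".join(parts) + "\n" + final_select
```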

```python
# Styler.hide(axis='index') replaces hide_index(), which was removed in pandas 2.0
(df.sort_values('platform')
   .style
   .hide(axis='index'))
```

| metric_name | platform | observations | avg_cuped_result | std_cuped_result |
|---|---|---|---|---|
| new_user_1_to_8_days_retained | ALL | 16244183 | 0.080392 | 0.274442 |
| new_user_1_to_8_days_retained | AMAZON | 1034885 | 0.146588 | 0.353679 |
| new_user_1_to_8_days_retained | ANDROID | 3127529 | 0.122493 | 0.328265 |
| new_user_1_to_8_days_retained | COMCAST | 326671 | 0.095652 | 0.293931 |
| new_user_1_to_8_days_retained | COX | 29023 | 0.091244 | 0.287586 |
| new_user_1_to_8_days_retained | IPAD | 171361 | 0.115078 | 0.319614 |
| new_user_1_to_8_days_retained | IPHONE | 1296564 | 0.098646 | 0.297818 |
| new_user_1_to_8_days_retained | MOBILE | 4645780 | 0.115794 | 0.320264 |
| new_user_1_to_8_days_retained | OTT | 5793258 | 0.116524 | 0.321335 |
| new_user_1_to_8_days_retained | PS4 | 252736 | 0.15352 | 0.360427 |


## Sample size results

```python
# ---------- Constants ---------- #
COL_NAME_P = 'avg_cuped_result'
STD_COL_NAME = 'std_cuped_result'
RATIO = 1
SAMPLING = 1  # TODO: make this dynamic between 1 and 1000 for sampled analytics

# Bonferroni correction: split alpha across the treatment-vs-control comparisons
CORRECTED_ALPHA = ALPHA.result / NUMBER_VARIATIONS.result
P2_MULTIPLICATIVE_FACTOR = 1 + EFFECT_SIZE_RELATIVE.result

# ---------- Functions --------- #
def sample_power_ttest(p1, p2, sd, alpha=0.05, power=0.8, ratio=1, alternative='two-sided'):
    """Per-group sample size for a two-sample t-test.

    sd is the metric's standard deviation, assumed equal in both groups, so the
    standardized effect size is Cohen's d = |p2 - p1| / sd. tt_ind_solve_power
    returns nobs1 (the first group); the second group is ratio * nobs1.
    """
    mean_diff = abs(p2 - p1)
    std_effect_size = mean_diff / sd
    n = tt_ind_solve_power(effect_size=std_effect_size,
                           alpha=alpha,
                           power=power,
                           ratio=ratio,
                           alternative=alternative)  # Potential improvement: handle one-sided tests
    return np.array(n).round()

# ---------- Implementation ---------- #
df['sample_required'] = df.apply(
    lambda row: sample_power_ttest(
        p1=row[COL_NAME_P],
        p2=row[COL_NAME_P] * P2_MULTIPLICATIVE_FACTOR,
        sd=row[STD_COL_NAME],
        alpha=CORRECTED_ALPHA,
        power=POWER.result,
        ratio=RATIO),
    axis=1)

df['weeks_required'] = df['sample_required'] / (df['observations'] * 0.5 * ALLOCATION.result * SAMPLING)

(df.sort_values('platform')
   .style
   .hide(axis='index')      # replaces hide_index(), removed in pandas 2.0
   .format(precision=3))    # replaces set_precision(), removed in pandas 2.0
```

| metric_name | platform | observations | avg_cuped_result | std_cuped_result | sample_required | weeks_required |
|---|---|---|---|---|---|---|
| new_user_1_to_8_days_retained | ALL | 16244183 | 0.08 | 0.274 | 1829409.0 | 0.45 |
| new_user_1_to_8_days_retained | AMAZON | 1034885 | 0.147 | 0.354 | 913812.0 | 3.532 |
| new_user_1_to_8_days_retained | ANDROID | 3127529 | 0.122 | 0.328 | 1127373.0 | 1.442 |
| new_user_1_to_8_days_retained | COMCAST | 326671 | 0.096 | 0.294 | 1482310.0 | 18.15 |
| new_user_1_to_8_days_retained | COX | 29023 | 0.091 | 0.288 | 1559418.0 | 214.922 |
| new_user_1_to_8_days_retained | IPAD | 171361 | 0.115 | 0.32 | 1210895.0 | 28.265 |
| new_user_1_to_8_days_retained | IPHONE | 1296564 | 0.099 | 0.298 | 1430809.0 | 4.414 |
| new_user_1_to_8_days_retained | MOBILE | 4645780 | 0.116 | 0.32 | 1200823.0 | 1.034 |
| new_user_1_to_8_days_retained | OTT | 5793258 | 0.117 | 0.321 | 1193779.0 | 0.824 |
| new_user_1_to_8_days_retained | PS4 | 252736 | 0.154 | 0.36 | 865244.0 | 13.694 |
