In [None]:
import pandas as pd
import numpy as np
import confidence

# Let's generate some example data

In [None]:
data = confidence.examples.example_data()
data.head()

__Cool, now let's walk through some use cases!__


## You have one categorical variable (e.g. a country), and you'd like to see the proportion of successes along with a confidence interval

In [None]:
# Let's choose the test variation for now, and then we'll aggregate over all dates in the test
single_group_df = (data.loc[lambda x: (x.country == 'us') & (x.variation_name == 'test')]
                        .groupby(by='country')[['success', 'total']].sum()
                        .reset_index()
      )
single_group_df

## To get started with this case, instantiate an object for the ChiSquared model

In [None]:
single_group = confidence.ChiSquared(data_frame=single_group_df,
                                     numerator_column='success',
                                     denominator_column='total',
                                     categorical_group_columns='country')

### To see what methods are available, try a couple things:

In [None]:
# This reveals the documentation and can help with syntax
single_group?

In [None]:
# press tab on the line below to reveal the available methods in this class
single_group.

In [None]:
# You can also use this to see what's available from the base module
confidence.

### Let's start with the summary method for now, which will return the probability along with the 95% confidence interval

In [None]:
single_group.summary()
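
To build intuition for the numbers in the summary, here is a minimal sketch of a proportion with a 95% interval computed by hand. It assumes the normal (Wald) approximation, which may not be the exact method the library uses internally, and the counts and the `proportion_with_ci` helper are hypothetical.

```python
import math

def proportion_with_ci(successes, total, z=1.959964):
    """Return (estimate, lower, upper) using the normal approximation.

    z=1.959964 corresponds to a two-sided 95% interval.
    """
    p = successes / total
    standard_error = math.sqrt(p * (1 - p) / total)
    return p, p - z * standard_error, p + z * standard_error

# Hypothetical counts, not taken from the example data
p, lower, upper = proportion_with_ci(successes=120, total=1000)
print(f"p = {p:.3f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```

The interval narrows as `total` grows, which is why aggregating over all dates gives a tighter estimate than any single day.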

In [None]:
# That's fun, but I want to plot the data!
single_group.summary_plot().show()

### We've just created a plot of the confidence interval of the single "us" category

## Your turn! Generate an interval plot for the Canada control group

In [None]:
## Your code goes here


# Now let's look at a categorical variable with multiple levels (e.g. a test variation), where you'd like to make inferences about the difference between those groups

In [None]:
multi_level_df = (data
                .groupby(by='variation_name')[['success', 'total']].sum()
                .reset_index())
multi_level_df

## Your turn! Instantiate the Chi Squared test below

In [None]:
multi_level_test = confidence.ChiSquared(# Fill in the parameters here)

## Notice how the summary in this case outputs a confidence interval for each group

In [None]:
multi_level_test.summary()

## Same with the plot method

In [None]:
multi_level_test.summary_plot().show()

## Now that we have two groups, it's meaningful to consider the difference between the two groups

In [None]:
multi_level_test.difference(level_1='control', level_2='test', absolute=True)
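
As a rough guide to what a two-group comparison involves, the sketch below computes an absolute difference with a 95% interval and a Pearson chi-squared p-value by hand. It assumes a normal approximation for the interval and no continuity correction for the test, so the library's output may differ in detail. The counts and helper names are hypothetical.

```python
import math

def absolute_difference(s1, n1, s2, n2, z=1.959964):
    """Difference p2 - p1 with a normal-approximation 95% interval."""
    p1, p2 = s1 / n1, s2 / n2
    diff = p2 - p1
    standard_error = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return diff, diff - z * standard_error, diff + z * standard_error

def chi_squared_test(s1, n1, s2, n2):
    """Pearson chi-squared test on the 2x2 success/failure table."""
    observed = [[s1, n1 - s1], [s2, n2 - s2]]
    row_sums = [n1, n2]
    col_sums = [s1 + s2, (n1 - s1) + (n2 - s2)]
    grand_total = n1 + n2
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / grand_total
            stat += (observed[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Hypothetical counts: control = 100/1000, test = 130/1000
diff, ci_lower, ci_upper = absolute_difference(100, 1000, 130, 1000)
stat, p_value = chi_squared_test(100, 1000, 130, 1000)
print(f"difference = {diff:.3f}, 95% CI = ({ci_lower:.3f}, {ci_upper:.3f})")
print(f"chi-squared = {stat:.3f}, p-value = {p_value:.4f}")
```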

## Note the columns above -- what does each mean?

## Your turn: Test the difference between control and test, but this time return the relative % difference.

In [None]:
# Todo: Write your code here.

## We can also visualize the confidence interval of the difference in probability between the two groups

In [None]:
multi_level_test.difference_plot(level_1='control', level_2='test').show()

# Multiple categorical groups

In [None]:
multi_group_df = (data.groupby(['variation_name', 'country'])[['success', 'total']].sum()
        .reset_index())
multi_group_df

## Your turn: Implement a chi-squared test with 2 categorical groups

In [None]:
multi_group_test = # Fill in

## Look at how the output changes with multiple groups

In [None]:
multi_group_test.summary()

In [None]:
multi_group_test.summary_plot().show()

## Your turn: Look at how country and variation_name are grouped on the x-axis above. Try to flip the order of the grouping

## Difference works a little differently with multiple groups. For each variation you need to specify levels of both groupings with a tuple or list. E.g. ('test', 'us')

## Your turn: Test the difference between "us" & "test" vs. "us" & "control"

In [None]:
## Your code goes here

## What if you wanted to test the difference between control and test for every level of the "country" variable? For this you can use the `groupby` kwarg.
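
To illustrate the idea of tuple-valued levels and a per-country comparison without relying on the library, here is a plain-Python sketch. The counts, the `counts` dictionary and the `rate` helper are all hypothetical. It only shows the bookkeeping, not the statistical test.

```python
# Hypothetical (success, total) counts keyed by (variation_name, country)
counts = {
    ("control", "us"): (100, 1000),
    ("test", "us"): (130, 1000),
    ("control", "ca"): (40, 500),
    ("test", "ca"): (45, 500),
}

def rate(level):
    """Success rate for one (variation_name, country) level."""
    success, total = counts[level]
    return success / total

# Compare test vs. control separately within each country
countries = sorted({country for _, country in counts})
differences = {
    country: rate(("test", country)) - rate(("control", country))
    for country in countries
}

for country, diff in differences.items():
    print(country, round(diff, 4))
```

Each tuple pins down one level of every grouping column, and grouping by country repeats the same two-level comparison once per country.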

## Your turn: Implement the difference between control and test for every level of "country" by using the `groupby` kwarg. 

In [None]:
multi_group_test.difference(## Your input goes here)

## Look at how the difference plot returns multiple groupbys

In [None]:
multi_group_test.difference_plot(# Fill in the parameters here (same as above) #).show()

# Ordinal variables: We can also use confidence to generate visualizations for ordinal variables like time-series data (e.g. date) or numeric data (e.g. days since registration)


In [None]:
ordinal_group_df = (data.groupby(['variation_name', 'date'])[['success', 'total']].sum()
        .reset_index())
ordinal_group_df

## Note: in order for ordinal data to work properly it must be cast to a numeric or datetime type

In [None]:
ordinal_group_df['date'] = pd.to_datetime(ordinal_group_df['date'])

In [None]:
ordinal_test = confidence.ChiSquared(data_frame=ordinal_group_df,
                                     numerator_column='success',
                                     denominator_column='total',
                                     categorical_group_columns='variation_name',
                                     ordinal_group_column='date')

## Look at how the summary and summary_plot outputs change when an ordinal group is included

In [None]:
ordinal_test.summary()

In [None]:
ordinal_test.summary_plot().show()

# Bayesian models:
- Confidence also includes Bayesian models. We'll walk through an example to see how the output changes.

In [None]:
bayesian_df = (data.groupby(['variation_name'])[['success', 'total']].sum()
        .reset_index())
bayesian_df

## Your turn: Implement a BetaBinomial model with the data above.
- The API is exactly the same as the frequentist case!

In [None]:
bayesian_test = confidence.BetaBinomial(data_frame=bayesian_df,
                                        numerator_column='success',
                                        denominator_column='total',
                                        categorical_group_columns='variation_name')

## The .summary() method output looks similar, but this time the intervals are credible intervals

In [None]:
bayesian_test.summary()
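
For intuition about where a credible interval comes from, the sketch below draws samples from a Beta posterior and reads off the 2.5% and 97.5% quantiles. It assumes a uniform Beta(1, 1) prior, which may not match the library's default, and the counts and the `beta_posterior_summary` helper are hypothetical.

```python
import random

def beta_posterior_summary(successes, total, prior_alpha=1.0, prior_beta=1.0,
                           draws=20000, seed=0):
    """Posterior mean and 95% credible interval for a success rate."""
    rng = random.Random(seed)
    # Conjugate update: add successes to alpha and failures to beta
    alpha = prior_alpha + successes
    beta = prior_beta + (total - successes)
    samples = sorted(rng.betavariate(alpha, beta) for _ in range(draws))
    mean = alpha / (alpha + beta)
    lower = samples[int(0.025 * draws)]
    upper = samples[int(0.975 * draws)]
    return mean, lower, upper

# Hypothetical counts, not taken from the example data
mean, lower, upper = beta_posterior_summary(successes=130, total=1000)
print(f"posterior mean = {mean:.4f}, 95% credible interval = ({lower:.4f}, {upper:.4f})")
```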

## The summary plot outputs the probability density of the posterior distribution for each variation. This represents our belief about the underlying "success rate" for each variation given the data collected so far.

In [None]:
bayesian_test.summary_plot().show()

## The difference method returns:
- p2-p1 mean: Our best estimate of the difference between variation 2 and variation 1 (the posterior mean of p2 - p1).
- ci_lower: Lower bound of the credible interval
- ci_upper: Upper bound of the credible interval
- P(variation_2 > variation_1): Probability that the success rate of variation_2 is greater than that of variation_1
- variation_1 potential loss: The expected loss if we switch to variation 1, but variation 2 is actually better.
- variation_1 potential gain: The expected gain if we switch to variation 1, and variation 1 is actually better.
- variation_2 potential loss: The expected loss if we switch to variation 2, but variation 1 is actually better.
- variation_2 potential gain: The expected gain if we switch to variation 2, and variation 2 is actually better.
    
__If you need to look this up again, refer to the .difference documentation__

In [None]:
bayesian_test.difference(level_1='control', level_2='test')
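
The first few columns can be approximated by hand with Monte Carlo sampling, as in this sketch. It assumes independent Beta(1, 1) priors for each variation, which may differ from what the library uses, and the counts and the `compare_posteriors` helper are hypothetical.

```python
import random

def compare_posteriors(s1, n1, s2, n2, draws=20000, seed=0):
    """Posterior mean of p2 - p1, its 95% credible interval, and P(p2 > p1)."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(draws):
        p1 = rng.betavariate(1 + s1, 1 + n1 - s1)
        p2 = rng.betavariate(1 + s2, 1 + n2 - s2)
        diffs.append(p2 - p1)
    diffs.sort()
    mean_diff = sum(diffs) / draws
    interval = (diffs[int(0.025 * draws)], diffs[int(0.975 * draws)])
    # Share of posterior draws in which variation 2 beats variation 1
    prob_2_better = sum(d > 0 for d in diffs) / draws
    return mean_diff, interval, prob_2_better

# Hypothetical counts: variation 1 = 100/1000, variation 2 = 130/1000
mean_diff, interval, prob_2_better = compare_posteriors(100, 1000, 130, 1000)
print(f"p2-p1 mean = {mean_diff:.4f}")
print(f"95% credible interval = ({interval[0]:.4f}, {interval[1]:.4f})")
print(f"P(variation_2 > variation_1) = {prob_2_better:.3f}")
```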

## We can also visualize the posterior distribution of the difference in probability between the two groups

In [None]:
bayesian_test.difference_plot(level_1='test', level_2='test2').show()

# Bayesian AB testing works a little differently than the Frequentist approach
- Rather than using "statistical significance" as a decision criterion, we use "expected loss" to determine when to end the test. The use of "expected loss" allows us to potentially end tests sooner.
- Expected loss is a measure of risk -- it can be interpreted as the rate difference (e.g. conversion difference) that we would lose if we were to switch to the apparent winning variation given the data so far, but it were _actually_ the losing variation.
- When running a Bayesian AB test we choose a risk threshold in advance. We end the test when potential loss falls under the threshold.
- __We'll follow up with a more detailed explanation of the stats and this approach, both for data scientists and our stakeholders!__ In the meantime you can read more about it here:
    - [AB testing at VWO](https://www.chrisstucchio.com/pubs/slides/gilt_bayesian_ab_2015/slides.html#1)
    - [VWO whitepaper](https://cdn2.hubspot.net/hubfs/310840/VWO_SmartStats_technical_whitepaper.pdf)
    - [Variance Explained blog](http://varianceexplained.org/r/bayesian-ab-testing/)
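
The stopping rule described above can be sketched as follows. This assumes independent Beta(1, 1) priors and defines the expected loss of choosing a variation as the posterior average of how much worse it is than the alternative (zero in draws where it is better). The library's exact definition may differ, and the counts, the threshold and the `expected_losses` helper are hypothetical.

```python
import random

def expected_losses(s1, n1, s2, n2, draws=20000, seed=0):
    """Monte Carlo expected loss of choosing variation 1 or variation 2."""
    rng = random.Random(seed)
    loss_1 = 0.0
    loss_2 = 0.0
    for _ in range(draws):
        p1 = rng.betavariate(1 + s1, 1 + n1 - s1)
        p2 = rng.betavariate(1 + s2, 1 + n2 - s2)
        loss_1 += max(p2 - p1, 0.0)  # rate given up by choosing variation 1
        loss_2 += max(p1 - p2, 0.0)  # rate given up by choosing variation 2
    return loss_1 / draws, loss_2 / draws

# Hypothetical risk threshold, chosen before the test starts
threshold = 0.001

# Hypothetical counts: control = 100/1000, test = 130/1000
loss_control, loss_test = expected_losses(100, 1000, 130, 1000)
can_stop = min(loss_control, loss_test) < threshold

print(f"expected loss if we choose control: {loss_control:.5f}")
print(f"expected loss if we choose test:    {loss_test:.5f}")
print("stop the test" if can_stop else "keep collecting data")
```

The threshold is expressed in the same units as the rate itself, so a value of 0.001 means we accept risking at most 0.1 percentage points of conversion.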