### Background

Let's say we have an investing app (INVESTING, not trading) called LittleJohn. Things are going well - we're getting customers, people are using the app...people are investing!

However, the board want more. 

One major problem the team has identified is in the conversion funnel:

                            \    Registrations    /
                             \   Deposit money   /
                              \     Invest      /
                               \ Invest again  / 

Plenty of people are registering, and once people deposit money, they invest and use the app regularly. The problem is getting users to deposit money after they've registered. The PM suggests redesigning the deposit money page: they think that changing the size and colour of the button that connects an external account will increase conversions. But you're not so sure. How can we test this? Enter the A/B test.

You randomly assign users to either version A (with the changed button) or version B (the original) of the app and, after 2 weeks, collect the results.
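
A minimal sketch of the random assignment, assuming each user is independently sent to A or B by a fair coin flip. The function `assign_variants`, the seed and the integer user IDs are hypothetical, for illustration only. A per-user coin flip naturally produces slightly unequal group sizes rather than an exact 50/50 split.

```python
import random

def assign_variants(user_ids, seed=42):
    """Randomly assign each user to version 'A' or 'B' with probability 1/2 each."""
    rng = random.Random(seed)  # seeded so the assignment is reproducible
    return {uid: ("A" if rng.random() < 0.5 else "B") for uid in user_ids}

# hypothetical user IDs: one per registered user in the experiment
assignment = assign_variants(range(3576))
n_a = sum(1 for v in assignment.values() if v == "A")
n_b = len(assignment) - n_a
print(n_a, n_b)
```

Seeding the generator means the same user always lands in the same group if the assignment is re-run.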

1,837 registered users saw version A and of those 567 deposited money. <br>
1,739 registered users saw version B and of those 421 deposited money.
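
The raw conversion rates are the natural starting point. A quick check in Python (the variable names are my own):

```python
# observed counts from the two-week experiment
n_A, conv_A = 1837, 567
n_B, conv_B = 1739, 421

p_A = conv_A / n_A  # conversion rate for version A
p_B = conv_B / n_B  # conversion rate for version B

print(f"A: {p_A:.1%}  B: {p_B:.1%}  difference: {p_A - p_B:.1%}")
# → A: 30.9%  B: 24.2%  difference: 6.7%
```

Version A converts at roughly 30.9% and B at roughly 24.2%. The question is whether a gap of about 6.7 percentage points could plausibly be down to chance.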

Is this result statistically significant?

### Maths

Each user is one trial: a Bernoulli random variable X with a binary outcome, either success (the user deposits money) or failure:

E[X] = p <br>
Var(X) = p(1-p)

Where p is the probability of success on that trial.
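
A quick simulation can confirm these two formulas. This is a sketch with an illustrative p = 0.3 (an assumption, not a value from the experiment), using NumPy's random generator:

```python
import numpy as np

rng = np.random.default_rng(0)        # seeded for reproducibility
p = 0.3                               # illustrative success probability (assumed)
x = rng.binomial(1, p, size=100_000)  # 100,000 Bernoulli trials: 1 = success, 0 = failure

print(x.mean(), p)            # sample mean vs E[X] = p
print(x.var(), p * (1 - p))   # sample variance vs Var(X) = p(1-p)
```

With this many trials, both sample statistics land very close to the theoretical values.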

According to the central limit theorem, the mean of n independent, identically distributed RVs (here, the observed conversion rate) is approximately normally distributed for large n, with:

$\mu_X = \frac{E[X_1] + E[X_2] + \dots + E[X_n]}{n} = \frac{np}{n} = p$ <br>
$\sigma_X = \frac{\sigma}{\sqrt{n}} = \frac{\sqrt{p(1-p)}}{\sqrt{n}}$
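
To see the central limit theorem result in action, the sketch below simulates many repeats of an experiment with n equal to version A's sample size and an assumed true rate p = 0.31. Drawing `binomial(n, p) / n` is equivalent to averaging n Bernoulli trials. The spread of the simulated proportions should match $\sqrt{p(1-p)}/\sqrt{n}$:

```python
import numpy as np

rng = np.random.default_rng(1)
p, n = 0.31, 1837   # assumed true rate; n matches version A's sample size
reps = 10_000       # number of simulated experiments

# each simulated experiment: n Bernoulli trials summarised by the sample proportion
props = rng.binomial(n, p, size=reps) / n

print(props.mean(), p)                        # close to mu_X = p
print(props.std(), np.sqrt(p * (1 - p) / n))  # close to sigma_X
```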

Our null hypothesis is that $\mu$ is the same for both versions, i.e. $d_H = p_A - p_B = 0$; the alternative hypothesis is that $d_H \neq 0$. <br>
We now need the standard deviation of $d_H$:

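Assuming the two groups are independent, $\mathrm{Var}(d_H) = \mathrm{Var}(p_A - p_B) = \sigma_A^2 + \sigma_B^2$, so $\sigma_{d_H} = \sqrt{\sigma_A^2 + \sigma_B^2}$.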
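
Putting the pieces together on the observed data gives a two-sided z-test. This is a sketch using the unpooled standard errors from the formulas above; a pooled estimate of p under the null is a common alternative and gives a slightly different z. It uses only the standard library (`statistics.NormalDist` for the normal CDF):

```python
import math
from statistics import NormalDist

n_A, conv_A = 1837, 567
n_B, conv_B = 1739, 421
p_A, p_B = conv_A / n_A, conv_B / n_B

# standard deviation of each sample proportion: sqrt(p(1-p)) / sqrt(n)
sigma_A = math.sqrt(p_A * (1 - p_A) / n_A)
sigma_B = math.sqrt(p_B * (1 - p_B) / n_B)

d_H = p_A - p_B                               # observed difference
sigma_d = math.sqrt(sigma_A**2 + sigma_B**2)  # sd of the difference (independent groups)

z = (d_H - 0) / sigma_d                       # z-score under H0: d_H = 0
p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided p-value

print(f"z = {z:.2f}, p-value = {p_value:.1e}")
```

Here z comes out at about 4.47, well beyond the 1.96 cutoff for a two-sided test at the 5% level, so under these assumptions the difference is statistically significant.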

In [1]:
import numpy as np
import pandas as pd
import nbimporter
from Some_Distributions import normal

In [2]:
normal(3, 2)


In [3]:
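import math

class AB_vars:
    """Conversion statistics for one version of the app."""

    def __init__(self, sample, conv):
        self.sample = sample        # number of registered users who saw this version
        self.conv = conv            # number of those users who deposited money
        self.convp = conv / sample  # observed conversion rate, our estimate of p

    def exp_ber(self):
        # E[X] = p for a single Bernoulli trial
        return self.convp

    def var_ber(self):
        # Var(X) = p(1 - p) for a single Bernoulli trial
        return self.convp * (1 - self.convp)

    def mew_bi(self):
        # mean of the sample proportion: mu_X = p
        return self.exp_ber()

    def sigma_bi(self):
        # standard deviation of the sample proportion: sqrt(p(1 - p)) / sqrt(n)
        return math.sqrt(self.var_ber()) / math.sqrt(self.sample)


class AB_test:
    """Compares two AB_vars objects (version A and version B)."""

    def __init__(self, A, B):
        self.A = A
        self.B = B

    def d_H(self):
        # observed difference in conversion rates, p_A - p_B
        return self.A.convp - self.B.convp

    def mew_d_H(self):
        # under the null hypothesis the difference has mean 0
        return 0

    def sigma_d_H(self):
        # sqrt(sigma_A^2 + sigma_B^2), assuming the two groups are independent
        return math.sqrt(self.A.sigma_bi()**2 + self.B.sigma_bi()**2)

    # cumulative probability P(d_H <= k) under the null hypothesis:
    # convert to a standard normal, then use math.erf in place of the z-table lookup
    def cdf(self, k):
        snorm = (k - self.mew_d_H()) / self.sigma_d_H()  # standard normal z-score
        return 0.5 * (1 + math.erf(snorm / math.sqrt(2)))

    def signi(self, alpha=0.05):
        # two-sided test: p-value = 2 * P(d_H >= |observed d_H|) under the null
        p_value = 2 * (1 - self.cdf(abs(self.d_H())))
        return p_value, p_value < alpha  # (p-value, significant at level alpha?)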

In [21]:
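# look for rows where any column equals 0.95 (e.g. to read off a critical z-value);
# an exact match finds nothing, so the values still need rounding before the search
z_table = normal(1, 2).z_table
z_table[z_table.eq(0.95).any(axis=1)]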

Unnamed: 0,0.00,0.01,0.02,0.03,0.04,0.05,0.06,0.07,0.08,0.09
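
Searching the z-table for 0.95 amounts to inverting the standard normal CDF. As an alternative sketch (not a method of the `normal` class), Python's standard-library `statistics.NormalDist` does that inversion directly, so no rounding or table search is needed:

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

# z such that P(Z <= z) = 0.95: critical value for a one-sided test at the 5% level
z_one_sided = std_normal.inv_cdf(0.95)
# z such that P(Z <= z) = 0.975: critical value for a two-sided test at the 5% level
z_two_sided = std_normal.inv_cdf(0.975)

print(round(z_one_sided, 3), round(z_two_sided, 3))  # → 1.645 1.96
```

Since the alternative hypothesis here is $d_H \neq 0$, the two-sided value of about 1.96 is the relevant cutoff.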
