## Business Problem

### Electronic House
Electronic House is an online (e-commerce) store selling computer products for homes and offices. Customers can buy mice, monitors, keyboards, computers, laptops, HDMI cables, headphones, and webcams, among other items, through the website and receive them in the comfort of their homes.
The UX design team has been working on a new sales page with the objective of increasing the conversion rate of one store product, a bluetooth keyboard. According to the product manager, the conversion rate of the current page averaged 13% over the last year.
The product manager's goal is to increase the conversion rate by 2 percentage points; that is, the new sales page developed by the UX team would be a success if its conversion rate reached 15%.
The bluetooth keyboard sells for R$ 4,500.00 in cash or in 12 interest-free installments on a credit card.
Before replacing the old sales page with the new one, the product manager would like to test the new page on a smaller group of customers, to limit the risk of losing conversions if the new page performs worse than the current one.

### Challenge
Validate the effectiveness of the new page with greater confidence and statistical rigor.
The work should answer the following questions:
1. Is the conversion rate of the new page actually better than that of the current page?
2. What is the potential number of sales that the new page can bring?
3. What is the total revenue from the sale of the bluetooth keyboard through the new page?

### Dataset
https://www.kaggle.com/datasets/zhangluyuan/ab-testing?select=ab_data.csv


## Import

In [9]:
import pandas as pd
import numpy as np
import math 
from statsmodels.stats import api as sms
from scipy.stats import chi2_contingency
import lux
from lux.vis.VisList import VisList

## Loading

In [2]:
df_raw=pd.read_csv('C:/Users/Utente77/repos/AB_Test/dataset/ab_data.csv')

In [48]:
df_raw.head()

Unnamed: 0,user_id,timestamp,group,landing_page,converted
0,851104,2017-01-21 22:11:48.556739,control,old_page,0
1,804228,2017-01-12 08:01:45.159739,control,old_page,0
2,661590,2017-01-11 16:55:06.154213,treatment,new_page,0
3,853541,2017-01-08 18:28:03.143765,treatment,new_page,0
4,864975,2017-01-21 01:52:26.210827,control,old_page,1


In [14]:
VisList(["converted","landing_page",],df_raw)

LuxWidget(recommendations=[{'action': 'Vis List', 'description': 'Shows a vis list defined by the intent', 'vs…

In [49]:
df_raw.shape

(294478, 5)

## DOE

### Hypotheses

In [5]:
# H0: the conversion rate of the new page is 13%
# H1: the conversion rate of the new page is different from 13%

### Parameters

In [6]:
## Confidence level
confidence_level = 0.95

## Significance level
significance_level = 0.05

## Web page conversion rates
p1 = 0.13  # current page
p2 = 0.15  # target for the new page
p = 0.1  ## historical rate; used below as the 10% email open rate

## Effect size
effect_size = sms.proportion_effectsize(p1, p2)

## Statistical power
statistic_power = 0.8
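For two proportions, `proportion_effectsize` returns Cohen's h, the difference between the arcsine-transformed rates. The sketch below reproduces that calculation with the standard library only, as a sanity check on the effect size used in the power analysis. The helper name `cohens_h` is hypothetical and not part of the notebook.

```python
import math

def cohens_h(prop1: float, prop2: float) -> float:
    """Cohen's h: 2*asin(sqrt(prop1)) - 2*asin(sqrt(prop2))."""
    return 2 * math.asin(math.sqrt(prop1)) - 2 * math.asin(math.sqrt(prop2))

# Current page (13%) versus target for the new page (15%)
h = cohens_h(0.13, 0.15)
print(round(h, 4))
```

The sign only reflects the order of the arguments; the power calculation depends on the magnitude of h, which is small here because the two rates are close.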


### Sampling - Simple random sampling

In [7]:
## Sample size
sample_n=sms.NormalIndPower().solve_power( ## size of each group (half of the total sample)
    effect_size,
    power=statistic_power,
    alpha=significance_level
    )

sample_n=math.ceil(sample_n)

sample_n ## control group size

4720
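The per-group size can be cross-checked with the usual closed-form approximation for two independent samples of equal size, n = 2 * ((z_{1-alpha/2} + z_{power}) / h)^2. The sketch below is a simplified approximation, not the solver that `NormalIndPower` runs internally, so the result may differ by a unit or so from the value above. The function name `sample_size_per_group` is hypothetical.

```python
import math
from statistics import NormalDist

def sample_size_per_group(p1: float, p2: float,
                          alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate per-group sample size for a two-sided test of two proportions."""
    # Cohen's h effect size
    h = 2 * math.asin(math.sqrt(p1)) - 2 * math.asin(math.sqrt(p2))
    # Standard normal quantiles for the significance level and the power
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    # The factor 2 accounts for two groups of equal size
    n = 2 * ((z_alpha + z_beta) / h) ** 2
    return math.ceil(n)

n_per_group = sample_size_per_group(0.13, 0.15)
print(n_per_group)
```

Because n grows with 1/h^2, halving the detectable difference roughly quadruples the number of users needed in each group.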

In [8]:
sample_total= 2*sample_n # total sample for both groups (p1 and p2)
sample_total

9440

In [9]:
# With a 10% email open rate, we need to send 10 times as many emails as the sample we want (sample_n / p)
n_invio_mail=sample_n/p
n_invio_mail

47200.0

## Data preparation

In [10]:
# Users that appear more than once in the raw data
df_aux=df_raw[['user_id','group']].groupby('user_id').count().reset_index().query('group >1')
# Keep only users that appear exactly once
df3=df_raw[~df_raw['user_id'].isin(df_aux['user_id'])]

## Sampling

In [11]:
df_control_sample=df3[df3['group']=='control'].sample(n=sample_n, random_state=32)
print('Size of control Group:{}'.format(df_control_sample.shape[0]))

Size of control Group:4720


In [12]:
df_treatment_sample=df3[df3['group']=='treatment'].sample(n=sample_n, random_state=32)
print('Size of treatment Group:{}'.format(df_treatment_sample.shape[0]))

Size of treatment Group:4720


## Conversion rate

In [13]:
# Conversion rate = converted users / users in the sample, i.e. the mean of the 0/1 column
conversion_rate_control=df_control_sample['converted'].mean()
print('Conversion Rate - Control Group:{}'.format(conversion_rate_control))

conversion_rate_treatment=df_treatment_sample['converted'].mean()
print('Conversion Rate - Treatment Group:{}'.format(conversion_rate_treatment))

df_ab=pd.concat([df_control_sample,df_treatment_sample])


Conversion Rate - Control Group:0.11864406779661017
Conversion Rate - Treatment Group:0.11970338983050847


## Hypothesis test

In [14]:
df_table=df_ab[['group','converted']].groupby('group').agg({'converted':['sum','count']})
df_table.columns=['converted','total']
# chi2_contingency expects disjoint cells, so use non-converted users instead of the group total
df_table['non_converted']=df_table['total']-df_table['converted']
df_table=df_table[['converted','non_converted']]
chi_val, pval, dof, expected = chi2_contingency(df_table)
print('p-value:{:.2f}'.format(pval))
if pval < significance_level:
    print('Reject the null hypothesis')
else:
    print('Fail to reject the null hypothesis')
df_table


p-value:0.90
Fail to reject the null hypothesis


group,converted,non_converted
control,560,4160
treatment,565,4155
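As a cross-check on the test above, the chi-square statistic for a 2x2 table can be computed by hand from the converted counts (560 and 565) and the group size (4,720 each). The sketch below applies the Yates continuity correction, which `chi2_contingency` uses by default for 2x2 tables, and builds each row as converted / not converted, since the cells of a contingency table must be disjoint counts rather than group totals. The function name `chi2_2x2_yates` is hypothetical, and the p-value uses the identity that a chi-square variable with one degree of freedom is a squared standard normal.

```python
import math

def chi2_2x2_yates(conv_a: int, n_a: int, conv_b: int, n_b: int):
    """Yates-corrected chi-square test for two groups with binary outcomes."""
    # Rows: groups; columns: converted, not converted
    observed = [[conv_a, n_a - conv_a], [conv_b, n_b - conv_b]]
    row_tot = [sum(row) for row in observed]
    col_tot = [sum(col) for col in zip(*observed)]
    total = sum(row_tot)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_tot[i] * col_tot[j] / total
            # Yates correction: shrink |observed - expected| by 0.5, never below 0
            diff = max(abs(observed[i][j] - expected) - 0.5, 0.0)
            chi2 += diff ** 2 / expected
    # Survival function of chi-square with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

chi2, p_value = chi2_2x2_yates(560, 4720, 565, 4720)
print('chi2: {:.4f}, p-value: {:.2f}'.format(chi2, p_value))
```

A p-value this far above 0.05 means the difference of five conversions between the groups is well within what chance alone would produce at this sample size.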


## Converting the result to R$

In [15]:
# current page conversion = 13%
# new page conversion = 15%

# buyers = daily visitors * current page conversion rate
# GMV = buyers * average ticket (4500)

In [16]:
df4=df3.copy()

In [17]:
# Truncate each timestamp to its day, then count visitors per day
df4['timestamp']=pd.to_datetime(df4['timestamp']).dt.strftime('%Y-%m-%d')
df5=df4[['user_id','timestamp']].groupby('timestamp').count().reset_index()

In [30]:
# Current GMV (13% conversion, R$ 4,500 per sale)
df5['current_purchase']=np.ceil(df5['user_id']*0.13).astype(int)
df5['current_GMV']=df5['current_purchase']*4500
current_gmv=df5['current_GMV'].sum()


In [38]:
# New GMV, assuming a 14% conversion rate for the new page
# (between the current 13% and the 15% target)
df5['new_purchase']=np.ceil(df5['user_id']*0.14).astype(int)
df5['new_GMV']=df5['new_purchase']*4500
new_GMV=df5['new_GMV'].sum()


In [42]:
lift_abs=new_GMV-current_gmv
lift=100*(new_GMV-current_gmv)/current_gmv

In [43]:

print('GMV on period:{}'.format(current_gmv))
print('New GMV on period:{}'.format(new_GMV))
print('Abs Lift: {}'.format(lift_abs))
print('Expected Lift:{:.2f}%'.format(lift))



GMV on period:167760000
New GMV on period:180666000
Abs Lift: 12906000
Expected Lift:7.69%
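The printed figures can be tied back to the inputs with a little arithmetic. The sketch below recomputes the relative lift both from the conversion rates used in the cells above (13% and 14%) and from the printed GMV totals, and converts the absolute GMV lift into extra keyboard sales at R$ 4,500 each. The helper name `relative_lift` is hypothetical.

```python
def relative_lift(current: float, new: float) -> float:
    """Relative change from current to new, in percent."""
    return 100 * (new - current) / current

TICKET = 4500  # sale price of the keyboard, in R$

# Lift implied by the conversion rates used in the GMV cells
lift_from_rates = relative_lift(0.13, 0.14)

# Lift recomputed from the printed GMV totals
current_gmv = 167_760_000
new_gmv = 180_666_000
lift_from_gmv = relative_lift(current_gmv, new_gmv)

# Extra sales behind the absolute GMV lift
extra_sales = (new_gmv - current_gmv) // TICKET

print('{:.2f}% {:.2f}% {}'.format(lift_from_rates, lift_from_gmv, extra_sales))
```

The two lift values agree to two decimals because the per-day rounding with `np.ceil` changes the totals only marginally; the lift is essentially the ratio of the assumed conversion rates.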


In [22]:
df5.head()

Unnamed: 0,timestamp,user_id,current_purchase,current_GMV,new_purchase,new_GMV
0,2017-01-02,5631,732.03,3294135.0,844.65,3800925.0
1,2017-01-03,13025,1693.25,7619625.0,1953.75,8791875.0
2,2017-01-04,12929,1680.77,7563465.0,1939.35,8727075.0
3,2017-01-05,12748,1657.24,7457580.0,1912.2,8604900.0
4,2017-01-06,13168,1711.84,7703280.0,1975.2,8888400.0
