<a href="https://colab.research.google.com/github/RaniereRamos/Consultoria/blob/master/AA_test_results.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>


<h1>What is A/B testing?</h1>

**A/B testing** is a framework for <font color = "#0097a7">**testing different ideas**</font> for improving an existing design, often a website. With A/B testing you can take a set of new ideas, <font color = "#0097a7">**test them with a new experiment**</font>, <font color = "#0097a7">**statistically analyze the results**</font> to say with confidence which idea is better, update your website or app to <font color = "#0097a7">**use the winning idea**</font>, and then repeat the cycle.


Let's walk through a simplified set of steps with an experiment. To do that, let's say we run a shopping website.

We want to know if a **different recommendation algorithm** would result in more visitors clicking the **"BUY"** button.


*   The share of visitors who click is referred to as the **"conversion rate"**. If someone clicks, we say they "converted".
*   The **conversion rate** is generally the number of people who did an action (for example, clicked a button) <font color = "#0097a7">**(orders)**</font> divided by the number of people who went to the page <font color = "#0097a7">**(visits)**</font>.
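
As a minimal sketch of the definition above, the conversion rate can be computed from two counts. The function name `conversion_rate` and the example counts are hypothetical and are not taken from the experiment data.

```python
def conversion_rate(orders: int, visits: int) -> float:
    """Return orders divided by visits, or 0.0 when there were no visits."""
    if orders < 0 or visits < 0:
        raise ValueError("counts must be non-negative")
    return orders / visits if visits else 0.0


# Hypothetical counts, for illustration only
rate = conversion_rate(orders=30, visits=1200)
print(f"{rate:.2%}")  # prints 2.50%
```

Returning 0.0 for zero visits is a design choice that avoids a division error on empty groups; depending on the analysis, treating such groups as missing may be more appropriate.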

**Note:** To test this, we need three conditions:

*   One, a **control** (our current recommendation algorithm).
*   Two, a **treatment 1** (a new recommendation algorithm).
*   Three, a **treatment 2** (another new recommendation algorithm).

>***Our hypothesis is that a new recommendation algorithm will make people more likely to buy what they need.***

### Variables
*   **Question:** Will changing the recommendation algorithm result in more visitors clicking the **"BUY"** button?
*   **Hypothesis:** Using a new recommendation algorithm will result in more **"BUY"** clicks.
*   **Dependent variable:** Clicked **"BUY"** button or not.
*   **Independent variable:** Buy Box's new recommendation algorithm.
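
One common way to check a hypothesis of this kind is a one-sided two-proportion z-test that compares a treatment's conversion rate against the control's. The sketch below assumes that test; the source does not say which statistical test the notebook uses. The function name and the counts are hypothetical.

```python
import math


def two_proportion_z_test(orders_a, visits_a, orders_b, visits_b):
    """One-sided z-test for H1: rate of B > rate of A. Returns (z, p_value)."""
    rate_a = orders_a / visits_a
    rate_b = orders_b / visits_b
    # Pooled rate under the null hypothesis that both groups convert equally
    pooled = (orders_a + orders_b) / (visits_a + visits_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / visits_a + 1 / visits_b))
    if std_err == 0:
        raise ValueError("standard error is zero; the test is undefined")
    z = (rate_b - rate_a) / std_err
    # Upper-tail probability of the standard normal distribution
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p_value


# Hypothetical counts: A = control, B = treatment 1
z, p_value = two_proportion_z_test(200, 10_000, 250, 10_000)
print(f"z = {z:.3f}, p = {p_value:.4f}")
```

With two treatments, each one would be compared against the control separately, and the significance threshold may need adjusting for the multiple comparisons.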

## Get the data

### Import libraries
Here, we import the necessary libraries used in this notebook.

In [None]:
# Install the BigQuery client library with its BigQuery Storage and pandas extras
!pip install --upgrade 'google-cloud-bigquery[bqstorage,pandas]'

### Provide your credentials to the runtime

**Note:** You need to authenticate your user account and specify the project to be accessed in BigQuery.

In [18]:
# User authentication
from google.colab import auth
auth.authenticate_user()
print('Authenticated')

Authenticated


In [19]:
# Authentication of credentials
from oauth2client.client import GoogleCredentials
credentials = GoogleCredentials.get_application_default()
import getpass

In [20]:
import google.auth
from google.cloud import bigquery
from google.cloud import bigquery_storage

# Explicitly create a credentials object. This allows you to use the same
# credentials for both the BigQuery and BigQuery Storage clients,
# avoiding unnecessary API calls to fetch duplicate authentication tokens.

credentials, your_project_id = google.auth.default(
    scopes = ["https://www.googleapis.com/auth/cloud-platform"]
)

# Make clients
project_bq = 'utility-canto-251714'       # Project name in BigQuery

bqclient = bigquery.Client(credentials = credentials, project = project_bq)
bqstorageclient = bigquery_storage.BigQueryReadClient(credentials = credentials)

### Enable data table display
Colab includes the ```google.colab.data_table``` package that can be used to display large pandas dataframes as an interactive data table. It can be enabled with:

In [21]:
%load_ext google.colab.data_table

### Import dataset
Query the experiment data from BigQuery and display the result.

#### Use BigQuery via magics
The ```google.cloud.bigquery``` library also includes a magic command which runs a query and either displays the result or saves it to a variable as a ```DataFrame```.

In [30]:
%%bigquery --project utility-canto-251714
-- Display query output immediately

WITH brand_name AS (
 SELECT
     ARRAY(SELECT * FROM UNNEST(["ACOM"])) AS arr
),
dates AS (
 SELECT
     DATE("2021-06-23") AS min_date,
     DATE("2021-07-03") AS max_date
),
sort AS (
 SELECT
     sk_timestamp AS sort_date,
     UPPER(brand) AS brand,
     bboxtoken,
     DATE(sk_timestamp) 
           || '-' || 
         EXTRACT(hour FROM sk_timestamp)
           || '-' || 
         EXTRACT(minute FROM sk_timestamp)
           || '-' || 
         EXTRACT(second FROM sk_timestamp) 
           || '-' || 
         UPPER(brand) 
           || '-' || 
         CASE WHEN contextregion IS NULL THEN 'NAO IDENTIFICADO' ELSE contextregion END
           || '-' ||
         CASE WHEN context_opn IS NULL THEN 'NAO IDENTIFICADO' ELSE context_opn END
           || '-' || 
         productid 
           || '-' || 
         isfinance
           || '-' || 
         isprime
           || '-' || 
         CASE WHEN salesSolution IS NULL THEN 'NAO IDENTIFICADO' ELSE salesSolution END AS token_geral,
     buyboxTestAB,
     contextregion,
     departmentid,
  FROM
     utility-canto-251714.raw_buybox.ordenacao_2021,
     dates
 WHERE
     DATE(sk_timestamp) BETWEEN min_date AND max_date
     AND UPPER(brand) IN (SELECT pair FROM brand_name, UNNEST(arr) AS pair)
),
sales_ AS (
 SELECT
     partition_date,
     buy_box_token,
     department_id,
     device_subtype,
     order_id,
     order_line_id
 FROM
     b2w-bee-analytics.evaluated_sales.sales,
     dates
 WHERE
     DATE(partition_date) BETWEEN min_date AND max_date
     AND UPPER(brand) IN (SELECT pair FROM brand_name, UNNEST(arr) AS pair)
     AND buy_box_token LIKE 'smartbuybox-acom-v2%'
     AND delivery_type = 'VENDA'
     AND high_value_flag = 'N'
     AND payment_status = 'APROVADO'
     AND sku_type = 'PRODUTO'
     AND department_id NOT IN ('9087')
)
        SELECT
            DATE(sort_date) AS data,
            ord.brand,
            CASE 
              WHEN buyboxTestAB = 'control-abexperiment20210622' THEN 'Control'
              WHEN buyboxTestAB = 'treatment1-abexperiment20210622' THEN "Treatment 1"
              ELSE 'Treatment 2'
            END 
              AS AB_Test,
            ord.departmentid,
            contextregion,
            COUNT(DISTINCT order_id) AS Orders,
            COUNT(DISTINCT token_geral) AS Visits          
        FROM
            sales_ sales RIGHT OUTER JOIN sort ord
                           ON sales.buy_box_token = ord.bboxtoken
        WHERE buyboxTestAB IN ('control-abexperiment20210622','treatment1-abexperiment20210622','treatment2-abexperiment20210622')
        GROUP BY 1,2,3,4,5
        ORDER BY 1 


Query complete after 0.01s: 100%|██████████| 15/15 [00:00<00:00, 1047.83query/s]
Downloading: 100%|██████████| 568110/568110 [00:01<00:00, 374565.43rows/s]


| | data | brand | AB_Test | departmentid | contextregion | Orders | Visits |
|---|---|---|---|---|---|---|---|
| 0 | 2021-06-23 | ACOM | Treatment 2 | 9105 | NORTHEAST_CAPITAL_2601_992601 | 5 | 332 |
| 1 | 2021-06-23 | ACOM | Treatment 1 | 9072 | NORTHEAST_INTERIOR_2604_992604 | 0 | 46 |
| 2 | 2021-06-23 | ACOM | Treatment 2 | 60 | SP_CAPITAL_3501_935030 | 65 | 3431 |
| 3 | 2021-06-23 | ACOM | Treatment 2 | 9072 | NORTHEAST_CAPITAL_2301_992301 | 18 | 2338 |
| 4 | 2021-06-23 | ACOM | Control | 9072 | NORTHEAST_INTERIOR_2602_992602 | 0 | 113 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 568105 | 2021-07-03 | ACOM | Treatment 2 | 9076 | SP_INTERIOR_3505_350018 | 0 | 1 |
| 568106 | 2021-07-03 | ACOM | Treatment 1 | 9084 | RJ_CAPITAL_3301_933028 | 0 | 1 |
| 568107 | 2021-07-03 | ACOM | Treatment 1 | 9105 | MG_INTERIOR_3106_310030 | 1 | 19 |
| 568108 | 2021-07-03 | ACOM | Control | 9065 | MG_INTERIOR_3109_310052 | 0 | 3 |
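
Each row of the query output holds the orders and visits for one combination of date, department, and region, so a per-condition conversion rate requires summing both counts by `AB_Test` first. The sketch below does this for the first five rows shown in the output. It assumes that visit tokens do not repeat across rows, so that summing `Visits` is valid, and the variable names are illustrative.

```python
from collections import defaultdict

# First five rows of the query output: (AB_Test, Orders, Visits)
rows = [
    ("Treatment 2", 5, 332),
    ("Treatment 1", 0, 46),
    ("Treatment 2", 65, 3431),
    ("Treatment 2", 18, 2338),
    ("Control", 0, 113),
]

# Sum orders and visits per experiment condition
totals = defaultdict(lambda: [0, 0])
for arm, orders, visits in rows:
    totals[arm][0] += orders
    totals[arm][1] += visits

# Conversion rate per condition (0.0 when a condition has no visits)
rates = {arm: (o / v if v else 0.0) for arm, (o, v) in totals.items()}
for arm in sorted(rates):
    orders, visits = totals[arm]
    print(f"{arm}: orders={orders}, visits={visits}, rate={rates[arm]:.4f}")
```

Five rows are far too few to draw any conclusion; the same aggregation would be applied to the full result before running a statistical test.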


## Preliminary data exploration

### Install packages
Here, we install and load the necessary packages used in this notebook.

In [2]:
# To enable the magics below and use R within Python
%load_ext rpy2.ipython

In [None]:
%%R
# Install packages
install.packages("tidyverse")     # Packages for data science

In [None]:
%%R
# Load packages
library(tidyverse)

In [None]:
%%R
# Read in data
