# Marketing A/B Testing

A/B testing is a method used to **compare two versions of a website or app** to determine which one performs better. It involves randomly showing users version A or version B and then measuring a specific metric, such as click-through rate or conversion rate, to determine which version is more successful. A/B testing is commonly used in digital product development to optimize user experience and increase conversions.
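
To make the comparison concrete, here is a minimal sketch of a two-sided two-proportion z-test written with the standard library only. The function name `two_proportion_ztest` and the visitor and conversion counts are hypothetical and chosen purely for illustration; they do not come from the data set analyzed in this notebook.

```python
from math import erf, sqrt


def two_proportion_ztest(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for the difference between two conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Pooled rate under the null hypothesis that both versions convert equally
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal CDF
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return p_a, p_b, z, p_value


# Hypothetical counts, for illustration only
rate_a, rate_b, z, p_value = two_proportion_ztest(conv_a=200, n_a=5000, conv_b=250, n_b=5000)
print(f"A: {rate_a:.3f}  B: {rate_b:.3f}  z = {z:.2f}  p = {p_value:.4f}")
```

A small p-value suggests the difference in conversion rate is unlikely to be due to chance alone. The threshold to use (0.05 is a common convention) is a choice made before the test starts.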


<p><img src="https://images.unsplash.com/photo-1616418625172-c607e16733ca?ixlib=rb-4.0.3&ixid=MnwxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHx8&auto=format&fit=crop&w=2069&q=80" alt></p>

Using data from an anonymous company, <a href="https://www.kaggle.com/datasets/faviovaz/marketing-ab-testing">collected on Kaggle</a>, we will analyze the impact of a new website feature on conversion rates.
The test ran for one month. The company now has to decide which strategy to adopt:

- A. Keep the new feature
- B. Keep the old version
- C. Extend the test

https://www.kaggle.com/code/mysticmedons/a-b-testing-for-landingpage/notebook
https://thecleverprogrammer.com/2022/11/14/a-b-testing-using-python/

In [21]:
#Libraries 
import pandas as pd 
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import statsmodels.api as sm
from pandas_profiling import ProfileReport

## 1. The AB-test data

<p>Below is a description of this data set:</p>
<ul>
<li><code>Campaign Name</code> - target campaign for ad landing page</li>
<li><code>Spend [USD]</code> - the amount of money spent on advertising in the campaign</li>
<li><code># of Impressions</code> - the number of times the ad was viewed in the campaign (repeat views by the same person are counted).</li>
<li><code>Reach</code> - the number of unique people who saw the ad in the campaign.</li>
<li><code># of Website Clicks</code> - the number of users who clicked on the website link in the campaign's advertisement. </li>
<li><code># of Searches</code> - the number of users who performed a search on the website </li>
<li><code># of View Content</code> - the number of users who viewed product details.</li>
<li><code># of Add to Cart</code> - the number of users who added the product to the cart.</li>
<li><code># of Purchase</code> - the number of users who purchased the product.</li>
</ul>
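
These raw counts are usually turned into rates before two campaigns are compared. The sketch below does this for the first Control Campaign row of this data set (1.08.2019). The metric names `ctr`, `frequency`, `purchase_rate`, and `cost_per_purchase`, and the helper `funnel_metrics`, are illustrative assumptions and not part of the data set.

```python
# First Control Campaign row (1.08.2019) of this data set
row = {
    "spend_usd": 2280, "impressions": 82702, "reach": 56930, "website_clicks": 7016,
    "searches": 2290, "view_content": 2159, "add_to_cart": 1819, "purchase": 618,
}


def funnel_metrics(r):
    """Derive rate metrics from the raw daily counts (names are illustrative)."""
    return {
        "ctr": r["website_clicks"] / r["impressions"],           # clicks per impression
        "frequency": r["impressions"] / r["reach"],              # average views per person
        "purchase_rate": r["purchase"] / r["website_clicks"],    # purchases per click
        "cost_per_purchase": r["spend_usd"] / r["purchase"],     # USD per purchase
    }


metrics = funnel_metrics(row)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```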

In [22]:
# Load files
control_data = pd.read_csv('datasets/control_group.csv', sep=";")
test_data = pd.read_csv('datasets/test_group.csv', sep=';')

In [23]:
print(control_data.head())
print(test_data.head())

      Campaign Name       Date  ...  # of Add to Cart  # of Purchase
0  Control Campaign  1.08.2019  ...            1819.0          618.0
1  Control Campaign  2.08.2019  ...            1219.0          511.0
2  Control Campaign  3.08.2019  ...            1134.0          372.0
3  Control Campaign  4.08.2019  ...            1183.0          340.0
4  Control Campaign  5.08.2019  ...               NaN            NaN

[5 rows x 10 columns]
   Campaign Name       Date  ...  # of Add to Cart  # of Purchase
0  Test Campaign  1.08.2019  ...               894            255
1  Test Campaign  2.08.2019  ...               879            677
2  Test Campaign  3.08.2019  ...              1268            578
3  Test Campaign  4.08.2019  ...               566            340
4  Test Campaign  5.08.2019  ...               956            768

[5 rows x 10 columns]


## 2. Data Preparation

As a first step, we explore the data.
The two data sets contain the same number of records. However, the collected data has inconsistent column names, type errors, and null values in the control data set.
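
These issues can also be checked without a full profiling report. The following is a minimal sketch on a tiny frame built from two rows of the control `head()` output; the variable names `sample`, `null_counts`, and `parsed` are illustrative.

```python
import pandas as pd

# Two rows copied from the control head() output; the second has missing values
sample = pd.DataFrame({
    "Date": ["4.08.2019", "5.08.2019"],
    "# of Add to Cart": [1183.0, None],
    "# of Purchase": [340.0, None],
})

print(sample.dtypes)                 # 'Date' is stored as text, not as a datetime
null_counts = sample.isna().sum()    # number of missing values per column
print(null_counts)

# Day-first strings need an explicit format to avoid month/day swaps
parsed = pd.to_datetime(sample["Date"], format="%d.%m.%Y")
print(parsed.dt.month.tolist())
```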

In [24]:
control_report = ProfileReport(control_data, title = 'Control')
test_report = ProfileReport(test_data, title = 'Test')
test_report.compare(control_report)


In [25]:
# Normalize column names: lower-case, drop the "# of " prefix, use underscores
def to_clean(val):
    return val.strip().lower().replace("# ", "").replace("of ", "").replace(" ", "_").replace("[usd]", "usd")

control_data.rename(columns=to_clean, inplace=True)
test_data.rename(columns=to_clean, inplace=True)
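
As a quick check of the renaming rule, the sketch below applies the same function to a few of the original column names. The list `raw_names` is just a sample picked for illustration.

```python
def to_clean(val):
    return val.strip().lower().replace("# ", "").replace("of ", "").replace(" ", "_").replace("[usd]", "usd")


raw_names = ["Campaign Name", "Spend [USD]", "# of Impressions", "# of Add to Cart"]
cleaned = [to_clean(name) for name in raw_names]
print(cleaned)  # ['campaign_name', 'spend_usd', 'impressions', 'add_to_cart']
```

Note that the `"of "` replacement is a plain substring match, so it would also alter any column name that merely contains those characters. That is harmless for the columns in this data set.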

In [26]:
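# Merge both groups. With no `on=` argument the outer merge joins on all shared columns.
# Note: 'date' is still a string here, so the sort is lexicographic (1.08, 10.08, 11.08, ...)
campaign_data = control_data.merge(test_data, how='outer').sort_values(['date']).reset_index(drop=True)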
campaign_data.head()

Unnamed: 0,campaign_name,date,spend_usd,impressions,reach,website_clicks,searches,view_content,add_to_cart,purchase
0,Control Campaign,1.08.2019,2280,82702.0,56930.0,7016.0,2290.0,2159.0,1819.0,618.0
1,Test Campaign,1.08.2019,3008,39550.0,35820.0,3038.0,1946.0,1069.0,894.0,255.0
2,Test Campaign,10.08.2019,2790,95054.0,79632.0,8125.0,2312.0,1804.0,424.0,275.0
3,Control Campaign,10.08.2019,2149,117624.0,91257.0,2277.0,2475.0,1984.0,1629.0,734.0
4,Test Campaign,11.08.2019,2420,83633.0,71286.0,3750.0,2893.0,2617.0,1075.0,668.0
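
Because both frames share exactly the same columns, the outer merge effectively stacks them. A minimal sketch with two illustrative one-row frames (`control` and `test` are hypothetical stand-ins, not the full data) shows the equivalent `pd.concat` call, which may express the intent more directly:

```python
import pandas as pd

control = pd.DataFrame({"campaign_name": ["Control Campaign"], "date": ["1.08.2019"], "purchase": [618]})
test = pd.DataFrame({"campaign_name": ["Test Campaign"], "date": ["1.08.2019"], "purchase": [255]})

# Outer merge with no `on=`: joins on every shared column, so distinct rows are all kept
merged = control.merge(test, how="outer")

# Explicit row-wise stacking of the two groups
stacked = pd.concat([control, test], ignore_index=True)

print(len(merged), len(stacked))
```

One difference to keep in mind: a row that is identical in both frames would be matched into a single row by the merge, whereas `pd.concat` would keep both copies.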


In [27]:
# correct data type: the dates are day-first (1.08.2019 is 1 August 2019),
# so pass the format explicitly to avoid day/month swaps
campaign_data['date'] = pd.to_datetime(campaign_data['date'], format='%d.%m.%Y')

In [29]:
# rows with null values
campaign_data[campaign_data.isnull().any(axis=1)]

Unnamed: 0,campaign_name,date,spend_usd,impressions,reach,website_clicks,searches,view_content,add_to_cart,purchase
51,Control Campaign,2019-08-05,1835,,,,,,,
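
The row above has spend recorded but no funnel values. The notebook does not state how it is handled, so the following is only a sketch of two common options, shown on a small illustrative frame whose `purchase` values are taken from the control `head()` output. The names `df`, `dropped`, and `filled` are hypothetical.

```python
import pandas as pd

# Small illustrative frame: two complete control days and one incomplete day
df = pd.DataFrame({
    "campaign_name": ["Control Campaign"] * 3,
    "purchase": [372.0, 340.0, None],
})

# Option 1: drop incomplete days
dropped = df.dropna()

# Option 2: fill each gap with the mean of the same campaign
filled = df.copy()
filled["purchase"] = filled.groupby("campaign_name")["purchase"].transform(
    lambda s: s.fillna(s.mean())
)

print(len(dropped), filled["purchase"].tolist())
```

Dropping the day keeps only observed values but removes one day from one group. Mean imputation keeps the groups the same length but slightly understates the variance.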


## 3. Distribution and EDA

In [28]:
# summary statistics of the merged campaign data
campaign_data.describe()

In [None]:
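# NOTE: this cell expects a user-level DataFrame `ab_testing_data_cleaned`
# (columns `test_group`, `userid`, `converted`), which is not defined earlier in this notebook.
# Count users per group and compute each group's share of the total
ab_group = ab_testing_data_cleaned.groupby('test_group')['userid'].count()
ab_group_count = pd.concat([ab_group, ab_group / ab_group.sum()],
                           keys=('counts', 'percentage'), axis=1)
ab_group_count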

Unnamed: 0_level_0,counts,percentage
test_group,Unnamed: 1_level_1,Unnamed: 2_level_1
ad,564577,0.96
psa,23524,0.04


In [None]:
# encode the boolean conversion flag as 1/0
ab_testing_data_cleaned['converted'] = ab_testing_data_cleaned['converted'].astype(int)
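
With the flag encoded as 1/0, the conversion rate per group is simply the mean of that column. The sketch below uses a hypothetical six-row frame named `users`; only the column names `userid`, `test_group`, and `converted` follow the cells above, and the values are invented for illustration.

```python
import pandas as pd

# Hypothetical user-level rows, for illustration only
users = pd.DataFrame({
    "userid": [1, 2, 3, 4, 5, 6],
    "test_group": ["ad", "ad", "ad", "ad", "psa", "psa"],
    "converted": [True, False, False, True, False, True],
})

users["converted"] = users["converted"].astype(int)  # True/False -> 1/0

# Conversions, group size, and conversion rate per group
conversion = users.groupby("test_group")["converted"].agg(["sum", "count", "mean"])
conversion.columns = ["conversions", "users", "conversion_rate"]
print(conversion)
```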