This notebook uses an A/B testing dataset published by İlker Yıldız on Kaggle. The dataset contains the following features:

**Campaign Name:** The name of the campaign

**Date:** Date of the record

**Spend:** Amount spent on the campaign in dollars

**# of Impressions:** Number of times the campaign's ad was displayed

**Reach:** Number of unique users who saw the ad

**# of Website Clicks:** Number of website clicks received through the ads

**# of Searches:** Number of users who performed searches on the website

**# of View Content:** Number of users who viewed content and products on the website

**# of Add to Cart:** Number of users who added products to the cart

**# of Purchase:** Number of purchases


The company ran **two campaigns**:

**Control Campaign & Test Campaign**

The goal is to use A/B testing to find which campaign brings the company more customers.


In [1]:
import pandas as pd
import datetime
from datetime import date, timedelta
import plotly.graph_objects as go
import plotly.express as px
import plotly.io as pio
pio.templates.default = "plotly_white"

control_data = pd.read_csv("control_group.csv", sep = ";")
test_data = pd.read_csv("test_group.csv", sep = ";")

In [2]:
control_data.head()

Unnamed: 0,Campaign Name,Date,Spend [USD],# of Impressions,Reach,# of Website Clicks,# of Searches,# of View Content,# of Add to Cart,# of Purchase
0,Control Campaign,1.08.2019,2280,82702.0,56930.0,7016.0,2290.0,2159.0,1819.0,618.0
1,Control Campaign,2.08.2019,1757,121040.0,102513.0,8110.0,2033.0,1841.0,1219.0,511.0
2,Control Campaign,3.08.2019,2343,131711.0,110862.0,6508.0,1737.0,1549.0,1134.0,372.0
3,Control Campaign,4.08.2019,1940,72878.0,61235.0,3065.0,1042.0,982.0,1183.0,340.0
4,Control Campaign,5.08.2019,1835,,,,,,,


In [4]:
control_data.shape

(30, 10)

In [3]:
test_data.head()

Unnamed: 0,Campaign Name,Date,Spend [USD],# of Impressions,Reach,# of Website Clicks,# of Searches,# of View Content,# of Add to Cart,# of Purchase
0,Test Campaign,1.08.2019,3008,39550,35820,3038,1946,1069,894,255
1,Test Campaign,2.08.2019,2542,100719,91236,4657,2359,1548,879,677
2,Test Campaign,3.08.2019,2365,70263,45198,7885,2572,2367,1268,578
3,Test Campaign,4.08.2019,2710,78451,25937,4216,2216,1437,566,340
4,Test Campaign,5.08.2019,2297,114295,95138,5863,2106,858,956,768


In [5]:
test_data.shape

(30, 10)

**The column names in both datasets contain spaces, `#` symbols and units in brackets, which makes them awkward to work with. Let’s assign new column names before moving forward:**

In [6]:
# Use one shared list so both data frames get identical column names
new_columns = ["Campaign_Name", "Date", "Amount_Spent", "Number_of_Impressions", "Reach",
               "Website_Clicks", "Searches_Received", "Content_Viewed", "Added_to_Cart", "Purchases"]

control_data.columns = new_columns
test_data.columns = new_columns
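
Assigning a list to `.columns` depends on the columns arriving in exactly the expected order. As a hedged alternative, here is a minimal sketch that renames by name instead. The raw headers are copied from the `head()` output above; the helper `rename_campaign_columns` and the tiny `demo` frame are my own illustration, not part of the dataset.

```python
import pandas as pd

# Raw header -> clean name, copied from the head() output shown above.
COLUMN_MAP = {
    "Campaign Name": "Campaign_Name",
    "Date": "Date",
    "Spend [USD]": "Amount_Spent",
    "# of Impressions": "Number_of_Impressions",
    "Reach": "Reach",
    "# of Website Clicks": "Website_Clicks",
    "# of Searches": "Searches_Received",
    "# of View Content": "Content_Viewed",
    "# of Add to Cart": "Added_to_Cart",
    "# of Purchase": "Purchases",
}


def rename_campaign_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with cleaned column names (hypothetical helper)."""
    unknown = set(df.columns) - set(COLUMN_MAP)
    if unknown:
        # Fail loudly instead of silently mislabeling a column.
        raise KeyError(f"Unexpected columns: {sorted(unknown)}")
    return df.rename(columns=COLUMN_MAP)


# Tiny demo frame with two of the raw headers.
demo = pd.DataFrame({"Spend [USD]": [2280], "# of Purchase": [618]})
renamed = rename_campaign_columns(demo)
print(list(renamed.columns))
```

If the raw file ever gains or reorders a column, this version raises an error rather than attaching the wrong label to the data.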

In [8]:
control_data.head()

Unnamed: 0,Campaign_Name,Date,Amount_Spent,Number_of_Impressions,Reach,Website_Clicks,Searches_Received,Content_Viewed,Added_to_Cart,Purchases
0,Control Campaign,1.08.2019,2280,82702.0,56930.0,7016.0,2290.0,2159.0,1819.0,618.0
1,Control Campaign,2.08.2019,1757,121040.0,102513.0,8110.0,2033.0,1841.0,1219.0,511.0
2,Control Campaign,3.08.2019,2343,131711.0,110862.0,6508.0,1737.0,1549.0,1134.0,372.0
3,Control Campaign,4.08.2019,1940,72878.0,61235.0,3065.0,1042.0,982.0,1183.0,340.0
4,Control Campaign,5.08.2019,1835,,,,,,,


In [9]:
test_data.head()

Unnamed: 0,Campaign_Name,Date,Amount_Spent,Number_of_Impressions,Reach,Website_Clicks,Searches_Received,Content_Viewed,Added_to_Cart,Purchases
0,Test Campaign,1.08.2019,3008,39550,35820,3038,1946,1069,894,255
1,Test Campaign,2.08.2019,2542,100719,91236,4657,2359,1548,879,677
2,Test Campaign,3.08.2019,2365,70263,45198,7885,2572,2367,1268,578
3,Test Campaign,4.08.2019,2710,78451,25937,4216,2216,1437,566,340
4,Test Campaign,5.08.2019,2297,114295,95138,5863,2106,858,956,768


In [10]:
#Check for missing values in both datasets
control_data.isnull().sum()

Campaign_Name            0
Date                     0
Amount_Spent             0
Number_of_Impressions    1
Reach                    1
Website_Clicks           1
Searches_Received        1
Content_Viewed           1
Added_to_Cart            1
Purchases                1
dtype: int64

In [35]:
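# Show any rows that still contain missing values
# (this cell was executed after the fill step below, so the result is empty)
control_data[control_data.isnull().any(axis=1)]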

Unnamed: 0,Campaign_Name,Date,Amount_Spent,Number_of_Impressions,Reach,Website_Clicks,Searches_Received,Content_Viewed,Added_to_Cart,Purchases


In [11]:
test_data.isnull().sum()

Campaign_Name            0
Date                     0
Amount_Spent             0
Number_of_Impressions    0
Reach                    0
Website_Clicks           0
Searches_Received        0
Content_Viewed           0
Added_to_Cart            0
Purchases                0
dtype: int64

**The control campaign dataset has missing values in one row (the 5th row, dated 5.08.2019). Let’s fill these missing values with the mean of each column:**

In [13]:
# Fill every numeric column with its own mean in one step.
# Assigning the result back avoids the chained fillna(inplace=True) pattern,
# which newer pandas versions warn about.
numeric_cols = ["Number_of_Impressions", "Reach", "Website_Clicks",
                "Searches_Received", "Content_Viewed", "Added_to_Cart", "Purchases"]
control_data[numeric_cols] = control_data[numeric_cols].fillna(control_data[numeric_cols].mean())

In [30]:
control_data.head(6)

Unnamed: 0,Campaign_Name,Date,Amount_Spent,Number_of_Impressions,Reach,Website_Clicks,Searches_Received,Content_Viewed,Added_to_Cart,Purchases
0,Control Campaign,1.08.2019,2280,82702.0,56930.0,7016.0,2290.0,2159.0,1819.0,618.0
1,Control Campaign,2.08.2019,1757,121040.0,102513.0,8110.0,2033.0,1841.0,1219.0,511.0
2,Control Campaign,3.08.2019,2343,131711.0,110862.0,6508.0,1737.0,1549.0,1134.0,372.0
3,Control Campaign,4.08.2019,1940,72878.0,61235.0,3065.0,1042.0,982.0,1183.0,340.0
4,Control Campaign,5.08.2019,1835,109559.758621,88844.931034,5320.793103,2221.310345,1943.793103,1300.0,522.793103
5,Control Campaign,6.08.2019,3083,109076.0,87998.0,4028.0,1709.0,1249.0,784.0,764.0


## **Check the basic information of both data frames.**

In [31]:
control_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 30 entries, 0 to 29
Data columns (total 10 columns):
 #   Column                 Non-Null Count  Dtype  
---  ------                 --------------  -----  
 0   Campaign_Name          30 non-null     object 
 1   Date                   30 non-null     object 
 2   Amount_Spent           30 non-null     int64  
 3   Number_of_Impressions  30 non-null     float64
 4   Reach                  30 non-null     float64
 5   Website_Clicks         30 non-null     float64
 6   Searches_Received      30 non-null     float64
 7   Content_Viewed         30 non-null     float64
 8   Added_to_Cart          30 non-null     float64
 9   Purchases              30 non-null     float64
dtypes: float64(7), int64(1), object(2)
memory usage: 2.5+ KB


In [32]:
test_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 30 entries, 0 to 29
Data columns (total 10 columns):
 #   Column                 Non-Null Count  Dtype 
---  ------                 --------------  ----- 
 0   Campaign_Name          30 non-null     object
 1   Date                   30 non-null     object
 2   Amount_Spent           30 non-null     int64 
 3   Number_of_Impressions  30 non-null     int64 
 4   Reach                  30 non-null     int64 
 5   Website_Clicks         30 non-null     int64 
 6   Searches_Received      30 non-null     int64 
 7   Content_Viewed         30 non-null     int64 
 8   Added_to_Cart          30 non-null     int64 
 9   Purchases              30 non-null     int64 
dtypes: int64(8), object(2)
memory usage: 2.5+ KB


In [15]:
#Merge both datasets

ab_data = control_data.merge(test_data, how="outer").sort_values(["Date"])
ab_data = ab_data.reset_index(drop=True)
ab_data.head()



Unnamed: 0,Campaign_Name,Date,Amount_Spent,Number_of_Impressions,Reach,Website_Clicks,Searches_Received,Content_Viewed,Added_to_Cart,Purchases
0,Control Campaign,1.08.2019,2280,82702.0,56930.0,7016.0,2290.0,2159.0,1819.0,618.0
1,Test Campaign,1.08.2019,3008,39550.0,35820.0,3038.0,1946.0,1069.0,894.0,255.0
2,Test Campaign,10.08.2019,2790,95054.0,79632.0,8125.0,2312.0,1804.0,424.0,275.0
3,Control Campaign,10.08.2019,2149,117624.0,91257.0,2277.0,2475.0,1984.0,1629.0,734.0
4,Test Campaign,11.08.2019,2420,83633.0,71286.0,3750.0,2893.0,2617.0,1075.0,668.0
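
Note that `Date` is still a string column, so the sort above is alphabetical: `10.08.2019` comes right after `1.08.2019`. Here is a minimal sketch of a chronological sort, assuming the dates are in day.month.year format (my reading of the values shown above). It also stacks the frames with `pd.concat`, which is a substitute for the `merge` call used in this notebook, not the original method. The two small frames are made up for the demo.

```python
import pandas as pd

# Small stand-ins for control_data and test_data (only the columns needed here).
control = pd.DataFrame({
    "Campaign_Name": ["Control Campaign"] * 3,
    "Date": ["1.08.2019", "2.08.2019", "10.08.2019"],
})
test = pd.DataFrame({
    "Campaign_Name": ["Test Campaign"] * 3,
    "Date": ["1.08.2019", "2.08.2019", "10.08.2019"],
})

# Stack the rows of both frames; concat never collapses identical rows.
combined = pd.concat([control, test], ignore_index=True)

# Parse day.month.year strings so that sorting follows the calendar.
combined["Date"] = pd.to_datetime(combined["Date"], format="%d.%m.%Y")
combined = combined.sort_values(["Date", "Campaign_Name"]).reset_index(drop=True)

print(combined["Date"].dt.day.tolist())  # [1, 1, 2, 2, 10, 10]
```

The scatter plots and totals below do not depend on row order, so this only matters if you later plot or compare the campaigns over time.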


In [33]:
ab_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 60 entries, 0 to 59
Data columns (total 10 columns):
 #   Column                 Non-Null Count  Dtype  
---  ------                 --------------  -----  
 0   Campaign_Name          60 non-null     object 
 1   Date                   60 non-null     object 
 2   Amount_Spent           60 non-null     int64  
 3   Number_of_Impressions  60 non-null     float64
 4   Reach                  60 non-null     float64
 5   Website_Clicks         60 non-null     float64
 6   Searches_Received      60 non-null     float64
 7   Content_Viewed         60 non-null     float64
 8   Added_to_Cart          60 non-null     float64
 9   Purchases              60 non-null     float64
dtypes: float64(7), int64(1), object(2)
memory usage: 4.8+ KB


**After the merge, the columns that were int64 in the test group have been converted to float64. They now match the control group, whose columns were already float64 because of the missing values.**
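
To see this conversion in isolation, here is a small sketch with two one-column frames. The values are taken from the `Purchases` column shown above; the frame names `ints`, `floats` and `stacked` are only for illustration, and I use `pd.concat` here to keep the demo short.

```python
import pandas as pd

ints = pd.DataFrame({"Purchases": [255, 677]})              # int64, like test_data
floats = pd.DataFrame({"Purchases": [618.0, 522.793103]})   # float64, like control_data

# Combining an integer column with a float column yields a float column,
# because float64 is the common type that can hold both sets of values.
stacked = pd.concat([floats, ints], ignore_index=True)

print(ints["Purchases"].dtype, floats["Purchases"].dtype, stacked["Purchases"].dtype)
```

Nothing is lost here, since every whole number in this range is represented exactly as a float64.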

In [17]:
# Before moving forward, check that the dataset has an equal number of samples for both campaigns, to confirm that the merge was successful and did not generate any duplicates:
ab_data['Campaign_Name'].value_counts()

Control Campaign    30
Test Campaign       30
Name: Campaign_Name, dtype: int64

**The dataset has 30 samples for each campaign. Now let’s start with A/B testing to find the best marketing strategy.**

To get started with A/B testing, I will first analyze
**the relationship between the number of impressions we got from both campaigns and the amount spent on both campaigns:**

In [18]:
figure = px.scatter(data_frame = ab_data, 
                    x="Number_of_Impressions",
                    y="Amount_Spent", 
                    size="Amount_Spent", 
                    color= "Campaign_Name", 
                    trendline="ols")
figure.show()

**For a given amount spent, the control campaign produced more impressions than the test campaign.**
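
The same comparison can be expressed as a number. Below is a minimal sketch that computes impressions per dollar and cost per thousand impressions (CPM) for each campaign. It uses only the first two days of each campaign from the tables above, so the figures are an illustration of the calculation and not the result for the full dataset; in the notebook you would pass `ab_data` instead of `rows`.

```python
import pandas as pd

# First two days of each campaign, copied from the head() outputs above.
rows = pd.DataFrame({
    "Campaign_Name": ["Control Campaign", "Control Campaign",
                      "Test Campaign", "Test Campaign"],
    "Amount_Spent": [2280, 1757, 3008, 2542],
    "Number_of_Impressions": [82702, 121040, 39550, 100719],
})

# Sum spend and impressions per campaign, then take the ratios.
totals = rows.groupby("Campaign_Name")[["Amount_Spent", "Number_of_Impressions"]].sum()
totals["Impressions_per_USD"] = totals["Number_of_Impressions"] / totals["Amount_Spent"]
totals["CPM_USD"] = 1000 * totals["Amount_Spent"] / totals["Number_of_Impressions"]

print(totals.round(2))
```

Ratios of totals are used here instead of the mean of daily ratios, so days with a larger budget carry more weight.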

**Now let’s have a look at the number of searches performed on the website from both campaigns:**

In [19]:
label = ["Total Searches from Control Campaign", "Total Searches from Test Campaign"]
counts = [sum(control_data["Searches_Received"]),sum(test_data["Searches_Received"])]
colors = ['gold','lightgreen']
fig = go.Figure(data=[go.Pie(labels=label, values=counts)])
fig.update_layout(title_text='Control Vs Test: Searches')
fig.update_traces(hoverinfo='label+percent', textinfo='value', 
                  textfont_size=30,
                  marker=dict(colors=colors, 
                              line=dict(color='black', width=3)))
fig.show()

**The test campaign resulted in more searches on the website.**

**Now let’s have a look at the number of website clicks from both campaigns:**

In [20]:
label = ["Website Clicks from Control Campaign","Website Clicks from Test Campaign"]
counts = [sum(control_data["Website_Clicks"]), 
          sum(test_data["Website_Clicks"])]
colors = ['gold','lightgreen']
fig = go.Figure(data=[go.Pie(labels=label, values=counts)])
fig.update_layout(title_text='Control Vs Test: Website Clicks')
fig.update_traces(hoverinfo='label+percent', textinfo='value', 
                  textfont_size=30,
                  marker=dict(colors=colors, 
                              line=dict(color='black', width=3)))
fig.show()

**The test campaign wins in the number of website clicks.**

**Now let’s have a look at the amount of content viewed after reaching the website from both campaigns:**

In [21]:
label = ["Content Viewed from Control Campaign","Content Viewed from Test Campaign"]
counts = [sum(control_data["Content_Viewed"]), 
          sum(test_data["Content_Viewed"])]
colors = ['gold','lightgreen']
fig = go.Figure(data=[go.Pie(labels=label, values=counts)])
fig.update_layout(title_text='Control Vs Test: Content Viewed')
fig.update_traces(hoverinfo='label+percent', textinfo='value', 
                  textfont_size=30,
                  marker=dict(colors=colors, 
                              line=dict(color='black', width=3)))
fig.show()

**The audience of the control campaign viewed more content than the audience of the test campaign. The difference is small, but because the control campaign had fewer website clicks, its engagement on the website is higher than that of the test campaign.**

**Now let’s have a look at the number of products added to the cart from both campaigns:**

In [23]:
label = ["Products added to cart from Control Campaign", "Products added to cart from Test Campaign"]
# Store the totals in `counts`, the name passed to go.Pie below
counts = [sum(control_data["Added_to_Cart"]),
          sum(test_data["Added_to_Cart"])]
colors = ["gold" ,"lightgreen"]

fig = go.Figure(data=[go.Pie(labels=label, values=counts)])
fig.update_layout(title_text='Control Vs Test: Added to Cart')
fig.update_traces(hoverinfo='label+percent', textinfo='value', 
                  textfont_size=30,
                  marker=dict(colors=colors, 
                              line=dict(color='black', width=3)))
fig.show()

**Despite fewer website clicks, more products were added to the cart from the control campaign.**

**Now let’s have a look at the amount spent on both campaigns:**

In [25]:
label = ["Amount Spent in Control Campaign", 
         "Amount Spent in Test Campaign"]
counts = [sum(control_data["Amount_Spent"]), 
          sum(test_data["Amount_Spent"])]
colors = ['gold','lightgreen']
fig = go.Figure(data=[go.Pie(labels=label, values=counts)])
fig.update_layout(title_text='Control Vs Test: Amount Spent')
fig.update_traces(hoverinfo='label+percent', textinfo='value', 
                  textfont_size=30,
                  marker=dict(colors=colors, 
                              line=dict(color='black', width=3)))
fig.show()

**The amount spent on the test campaign is higher than the amount spent on the control campaign. Since the control campaign still resulted in more content views and more products in the cart, it is more efficient than the test campaign.**

**Now let’s have a look at the purchases made by both campaigns:**

In [26]:
label = ["Purchases Made by Control Campaign", 
         "Purchases Made by Test Campaign"]
counts = [sum(control_data["Purchases"]), 
          sum(test_data["Purchases"])]
colors = ['gold','lightgreen']
fig = go.Figure(data=[go.Pie(labels=label, values=counts)])
fig.update_layout(title_text='Control Vs Test: Purchases')
fig.update_traces(hoverinfo='label+percent', textinfo='value', 
                  textfont_size=30,
                  marker=dict(colors=colors, 
                              line=dict(color='black', width=3)))
fig.show()

**There’s only a difference of around 1% in the purchases made from both ad campaigns. As the control campaign resulted in more sales for less marketing spend, the control campaign wins here!**
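
With a gap this small, it is worth asking whether the difference in daily purchases could be due to chance. The sketch below is one possible check, not something this notebook originally did: a two-sided permutation test on the difference in mean daily purchases, written with NumPy only. The function name `permutation_p_value` and its parameters are my own, and the demo uses just the first four days from the tables above; in the notebook you would pass the full `Purchases` columns.

```python
import numpy as np


def permutation_p_value(a, b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in means (illustrative helper)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    observed = abs(a.mean() - b.mean())
    pooled = np.concatenate([a, b])
    rng = np.random.default_rng(seed)

    hits = 0
    for _ in range(n_permutations):
        # Reassign the pooled days to two groups at random.
        shuffled = rng.permutation(pooled)
        diff = abs(shuffled[:a.size].mean() - shuffled[a.size:].mean())
        hits += int(diff >= observed)

    # The add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_permutations + 1)


# First four days of daily purchases, copied from the head() outputs above.
control_purchases = [618, 511, 372, 340]
test_purchases = [255, 677, 578, 340]

p = permutation_p_value(control_purchases, test_purchases)
print(f"p-value: {p:.3f}")
```

A large p-value would mean the observed gap is well within what random day-to-day variation produces, in which case "wins" should be read as a tendency and not as a proven effect.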

**Now let’s analyze some metrics to find which ad campaign converts more. I will first look at the relationship between the number of website clicks and content viewed from both campaigns:**

In [27]:
figure = px.scatter(data_frame = ab_data, 
                    x="Content_Viewed",
                    y="Website_Clicks", 
                    size="Website_Clicks", 
                    color= "Campaign_Name", 
                    trendline="ols")
figure.show()

**The website clicks are higher in the test campaign, but the engagement from website clicks is higher in the control campaign. So the control campaign wins!** 

**Now I will analyze the relationship between the amount of content viewed and the number of products added to the cart from both campaigns:**

In [28]:
figure = px.scatter(data_frame = ab_data, 
                    x="Added_to_Cart",
                    y="Content_Viewed", 
                    size="Added_to_Cart", 
                    color= "Campaign_Name", 
                    trendline="ols")
figure.show()

**Again, the control campaign wins!**

**Now let’s have a look at the relationship between the number of products added to the cart and the number of sales from both campaigns:**

In [29]:
figure = px.scatter(data_frame = ab_data, 
                    x="Purchases",
                    y="Added_to_Cart", 
                    size="Purchases", 
                    color= "Campaign_Name", 
                    trendline="ols")
figure.show()

**Although the control campaign resulted in more sales and more products in the cart, the conversion rate of the test campaign is higher.**
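
The three scatter plots above compare consecutive funnel steps visually. Here is a minimal sketch that turns the same comparison into step-to-step conversion rates per campaign. The helper `funnel_rates` and the `FUNNEL` list are my own names, and the demo again uses only the first two days of each campaign from the tables above, so the printed rates illustrate the calculation and not the full result.

```python
import pandas as pd

# Funnel steps in the order analyzed above.
FUNNEL = ["Website_Clicks", "Content_Viewed", "Added_to_Cart", "Purchases"]


def funnel_rates(df: pd.DataFrame) -> pd.DataFrame:
    """Conversion rate between consecutive funnel steps, per campaign."""
    totals = df.groupby("Campaign_Name")[FUNNEL].sum()
    rates = pd.DataFrame(index=totals.index)
    for prev, step in zip(FUNNEL, FUNNEL[1:]):
        # Share of the previous step's total that reached the next step.
        rates[f"{prev} -> {step}"] = totals[step] / totals[prev]
    return rates


# First two days of each campaign, copied from the head() outputs above.
rows = pd.DataFrame({
    "Campaign_Name": ["Control Campaign"] * 2 + ["Test Campaign"] * 2,
    "Website_Clicks": [7016, 8110, 3038, 4657],
    "Content_Viewed": [2159, 1841, 1069, 1548],
    "Added_to_Cart": [1819, 1219, 894, 879],
    "Purchases": [618, 511, 255, 677],
})

rates = funnel_rates(rows)
print(rates.round(3))
```

Calling `funnel_rates(ab_data)` in the notebook would give the rates for all 30 days of each campaign.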

### Conclusion

From the above A/B tests, we found that the control campaign resulted in more sales and more engagement from visitors. More products were viewed from the control campaign, resulting in more products in the cart and more sales. But the conversion rate of products in the cart is higher in the test campaign: the test campaign resulted in more sales relative to the products viewed and added to the cart, while the control campaign resulted in more sales overall. So, the test campaign can be used to market a specific product to a specific audience, and the control campaign can be used to market multiple products to a wider audience.