Dataset source: https://www.kaggle.com/datasets/ilkeryildiz/example-dataset-for-ab-test


A/B testing helps a business find a better way to acquire customers, market products, extend its reach, or otherwise convert more of its target audience into actual customers. It means comparing two marketing strategies and choosing the one that converts more traffic into sales (or into any other desired goal) effectively and efficiently.

**About Dataset**

1. Campaign Name: The name of the campaign
2. Date: Date of the record
3. Spend: Amount spent on the campaign in dollars
4. Number of Impressions: Number of impressions the ad received through the campaign
5. Reach: Number of unique impressions the ad received
6. Number of Website Clicks: Number of website clicks received through the ads
7. Number of Searches: Number of users who performed searches on the website 
8. Number of View Content: Number of users who viewed content and products on the website
9. Number of Add to Cart: Number of users who added products to the cart
10. Number of Purchase: Number of purchases

Two campaigns were performed by the company:

1. Control Campaign
2. Test Campaign

**TASK :** Perform A/B testing to find the best campaign for the company to get more customers.

In [None]:
# Import the libraries used in this notebook
import pandas as pd
import plotly.graph_objects as go
import plotly.express as px
import plotly.io as pio
pio.templates.default = "plotly_white"

# The CSV files use ";" as the field separator
control_data = pd.read_csv("control_group.csv", sep=";")
test_data = pd.read_csv("test_group.csv", sep=";")

In [None]:
# viewing top 5 rows in control dataset
print(control_data.head())

      Campaign Name       Date  Spend [USD]  # of Impressions     Reach  \
0  Control Campaign  1.08.2019         2280           82702.0   56930.0   
1  Control Campaign  2.08.2019         1757          121040.0  102513.0   
2  Control Campaign  3.08.2019         2343          131711.0  110862.0   
3  Control Campaign  4.08.2019         1940           72878.0   61235.0   
4  Control Campaign  5.08.2019         1835               NaN       NaN   

   # of Website Clicks  # of Searches  # of View Content  # of Add to Cart  \
0               7016.0         2290.0             2159.0            1819.0   
1               8110.0         2033.0             1841.0            1219.0   
2               6508.0         1737.0             1549.0            1134.0   
3               3065.0         1042.0              982.0            1183.0   
4                  NaN            NaN                NaN               NaN   

   # of Purchase  
0          618.0  
1          511.0  
2          372.0  
3          340.0  
4            NaN  

In [None]:
# viewing top 5 rows in test dataset
print(test_data.head())

   Campaign Name       Date  Spend [USD]  # of Impressions  Reach  \
0  Test Campaign  1.08.2019         3008             39550  35820   
1  Test Campaign  2.08.2019         2542            100719  91236   
2  Test Campaign  3.08.2019         2365             70263  45198   
3  Test Campaign  4.08.2019         2710             78451  25937   
4  Test Campaign  5.08.2019         2297            114295  95138   

   # of Website Clicks  # of Searches  # of View Content  # of Add to Cart  \
0                 3038           1946               1069               894   
1                 4657           2359               1548               879   
2                 7885           2572               2367              1268   
3                 4216           2216               1437               566   
4                 5863           2106                858               956   

   # of Purchase  
0            255  
1            677  
2            578  
3            340  
4            768  


In [None]:
# Data preparation: normalize the column names to snake_case
def to_clean(val):
    # e.g. "# of Website Clicks" -> "website_clicks", "Spend [USD]" -> "spend_usd"
    return val.strip().lower().replace("# ", "").replace("of ", "").replace(" ","_").replace("[usd]", "usd")

control_data.rename(columns=to_clean, inplace=True)
test_data.rename(columns=to_clean, inplace=True)
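
As a quick check of what `to_clean` produces, the sketch below applies the same chain of string operations to the raw headers copied from the `head()` output above. It is a standalone illustration rather than part of the original notebook; `raw_columns` and `cleaned` are names introduced here.

```python
def to_clean(val):
    # Same chain of string operations as the notebook's helper
    return (
        val.strip()
        .lower()
        .replace("# ", "")
        .replace("of ", "")
        .replace(" ", "_")
        .replace("[usd]", "usd")
    )

# Raw column headers as printed by head() before renaming
raw_columns = [
    "Campaign Name", "Date", "Spend [USD]", "# of Impressions", "Reach",
    "# of Website Clicks", "# of Searches", "# of View Content",
    "# of Add to Cart", "# of Purchase",
]

cleaned = [to_clean(col) for col in raw_columns]
print(cleaned)
```

The resulting names match the columns listed by `control_data.info()` further down.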


In [None]:
control_data.head()

| | campaign_name | date | spend_usd | impressions | reach | website_clicks | searches | view_content | add_to_cart | purchase |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Control Campaign | 1.08.2019 | 2280 | 82702.0 | 56930.0 | 7016.0 | 2290.0 | 2159.0 | 1819.0 | 618.0 |
| 1 | Control Campaign | 2.08.2019 | 1757 | 121040.0 | 102513.0 | 8110.0 | 2033.0 | 1841.0 | 1219.0 | 511.0 |
| 2 | Control Campaign | 3.08.2019 | 2343 | 131711.0 | 110862.0 | 6508.0 | 1737.0 | 1549.0 | 1134.0 | 372.0 |
| 3 | Control Campaign | 4.08.2019 | 1940 | 72878.0 | 61235.0 | 3065.0 | 1042.0 | 982.0 | 1183.0 | 340.0 |
| 4 | Control Campaign | 5.08.2019 | 1835 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |


In [None]:
test_data.head()

| | campaign_name | date | spend_usd | impressions | reach | website_clicks | searches | view_content | add_to_cart | purchase |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Test Campaign | 1.08.2019 | 3008 | 39550 | 35820 | 3038 | 1946 | 1069 | 894 | 255 |
| 1 | Test Campaign | 2.08.2019 | 2542 | 100719 | 91236 | 4657 | 2359 | 1548 | 879 | 677 |
| 2 | Test Campaign | 3.08.2019 | 2365 | 70263 | 45198 | 7885 | 2572 | 2367 | 1268 | 578 |
| 3 | Test Campaign | 4.08.2019 | 2710 | 78451 | 25937 | 4216 | 2216 | 1437 | 566 | 340 |
| 4 | Test Campaign | 5.08.2019 | 2297 | 114295 | 95138 | 5863 | 2106 | 858 | 956 | 768 |


In [None]:
control_data.shape

(30, 10)

In [None]:
test_data.shape

(30, 10)

In [None]:
control_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 30 entries, 0 to 29
Data columns (total 10 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   campaign_name   30 non-null     object 
 1   date            30 non-null     object 
 2   spend_usd       30 non-null     int64  
 3   impressions     29 non-null     float64
 4   reach           29 non-null     float64
 5   website_clicks  29 non-null     float64
 6   searches        29 non-null     float64
 7   view_content    29 non-null     float64
 8   add_to_cart     29 non-null     float64
 9   purchase        29 non-null     float64
dtypes: float64(7), int64(1), object(2)
memory usage: 2.5+ KB


In [None]:
#Checking for null values 
print(control_data.isnull().sum())

campaign_name     0
date              0
spend_usd         0
impressions       1
reach             1
website_clicks    1
searches          1
view_content      1
add_to_cart       1
purchase          1
dtype: int64


In [None]:
#Checking for null values 
print(test_data.isnull().sum())

campaign_name     0
date              0
spend_usd         0
impressions       0
reach             0
website_clicks    0
searches          0
view_content      0
add_to_cart       0
purchase          0
dtype: int64


In [None]:
# The control campaign has one row with missing values.
# Fill each missing value with the mean of its column.
# Assigning the result back avoids calling fillna(inplace=True) on a
# column selection, which newer pandas versions warn about.
cols_with_missing = ["impressions", "reach", "website_clicks", "searches",
                     "view_content", "add_to_cart", "purchase"]
control_data[cols_with_missing] = control_data[cols_with_missing].fillna(
    control_data[cols_with_missing].mean())
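
Mean imputation has a property that is easy to see on a small example: the filled column keeps the same mean it had before. The sketch below uses a made-up three-row frame; `toy` and its values are assumptions for illustration, and only the column names mirror the notebook.

```python
import pandas as pd

# Made-up numbers, not taken from the campaign data
toy = pd.DataFrame({
    "spend_usd": [100, 200, 300],
    "impressions": [10.0, None, 30.0],
    "purchase": [1.0, None, 5.0],
})

# Locate rows that contain at least one missing value
incomplete_rows = toy[toy.isnull().any(axis=1)]
print(incomplete_rows)

cols_with_gaps = ["impressions", "purchase"]
means_before = toy[cols_with_gaps].mean()  # NaN is skipped by default

# fillna accepts a Series of per-column fill values
toy[cols_with_gaps] = toy[cols_with_gaps].fillna(means_before)
means_after = toy[cols_with_gaps].mean()

print(toy)
print(means_before.equals(means_after))
```

The trade-off is that the imputed row carries no information of its own, so it slightly understates the day-to-day variation in each column.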

In [None]:
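# Create a single dataset by stacking the two datasets on top of each other
# (an outer merge on all shared columns keeps every row from both frames).
# Note: date is still a string here, so the sort is lexicographic
# (1.08, 10.08, 11.08, ...) rather than chronological.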
ab_data = control_data.merge(test_data, how="outer").sort_values(["date"])
ab_data = ab_data.reset_index(drop=True)
print(ab_data.head())

      campaign_name        date  spend_usd  impressions    reach  \
0  Control Campaign   1.08.2019       2280      82702.0  56930.0   
1     Test Campaign   1.08.2019       3008      39550.0  35820.0   
2     Test Campaign  10.08.2019       2790      95054.0  79632.0   
3  Control Campaign  10.08.2019       2149     117624.0  91257.0   
4     Test Campaign  11.08.2019       2420      83633.0  71286.0   

   website_clicks  searches  view_content  add_to_cart  purchase  
0          7016.0    2290.0        2159.0       1819.0     618.0  
1          3038.0    1946.0        1069.0        894.0     255.0  
2          8125.0    2312.0        1804.0        424.0     275.0  
3          2277.0    2475.0        1984.0       1629.0     734.0  
4          3750.0    2893.0        2617.0       1075.0     668.0  
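
Because `date` is still a string at this point, `sort_values` orders it lexicographically, which is why `10.08.2019` appears right after `1.08.2019` in the output above. If chronological order is wanted, one option is to parse the column first. A minimal sketch, using `pd.concat` for the stacking and a handful of `purchase` values copied from the outputs above (`control`, `test`, and `stacked` are illustrative names, not part of the notebook):

```python
import pandas as pd

# A few rows copied from the printed outputs; the real frames have 30 rows each
control = pd.DataFrame({
    "campaign_name": ["Control Campaign"] * 3,
    "date": ["1.08.2019", "2.08.2019", "10.08.2019"],
    "purchase": [618, 511, 734],
})
test = pd.DataFrame({
    "campaign_name": ["Test Campaign"] * 3,
    "date": ["1.08.2019", "2.08.2019", "10.08.2019"],
    "purchase": [255, 677, 275],
})

# Stack the two frames row-wise
stacked = pd.concat([control, test], ignore_index=True)

# Parse day.month.year strings so that sorting follows the calendar
stacked["date"] = pd.to_datetime(stacked["date"], format="%d.%m.%Y")
stacked = stacked.sort_values(["date", "campaign_name"]).reset_index(drop=True)
print(stacked)
```

None of the charts in this notebook depend on row order, so this only matters if the data is later plotted or compared day by day.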



In [None]:
# Check whether the dataset has an equal number of samples for both campaigns
print(ab_data["campaign_name"].value_counts())

Control Campaign    30
Test Campaign       30
Name: campaign_name, dtype: int64


The dataset has 30 samples for each campaign.

**A/B Testing to Find the Best Marketing Strategy**

1. Analyze the relationship between the number of impressions we got from both campaigns and the amount spent on both campaigns

In [None]:
figure = px.scatter(data_frame = ab_data, 
                    x="impressions",
                    y="spend_usd", 
                    size="spend_usd", 
                    color= "campaign_name", 
                    trendline="ols")
figure.show()

The control campaign resulted in more impressions relative to the amount spent.
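
One way to put a number on impressions relative to spend is to divide total impressions by total spend for each campaign. The sketch below does this for only the first four days shown in the `head()` outputs above, so the ratios illustrate the calculation rather than the full 30-day result; `sample` and `impressions_per_usd` are names introduced here.

```python
import pandas as pd

# First four days of each campaign, copied from the head() outputs
sample = pd.DataFrame({
    "campaign_name": ["Control Campaign"] * 4 + ["Test Campaign"] * 4,
    "spend_usd": [2280, 1757, 2343, 1940, 3008, 2542, 2365, 2710],
    "impressions": [82702, 121040, 131711, 72878, 39550, 100719, 70263, 78451],
})

# Total spend and impressions per campaign
totals = sample.groupby("campaign_name")[["spend_usd", "impressions"]].sum()

# Impressions obtained per dollar spent
totals["impressions_per_usd"] = totals["impressions"] / totals["spend_usd"]
print(totals.round(2))
```

On this four-day sample the control campaign's ratio comes out higher; running the same calculation on `ab_data` would give the figure for the whole period.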

2. Look at the number of searches performed on the website from both campaigns

In [None]:
label = ["Total Searches from Control Campaign", 
         "Total Searches from Test Campaign"]
counts = [sum(control_data["searches"]), 
          sum(test_data["searches"])]
colors = ['gold','lightgreen']
fig = go.Figure(data=[go.Pie(labels=label, values=counts)])
fig.update_layout(title_text='Control Vs Test: Searches')
fig.update_traces(hoverinfo='label+percent', textinfo='value', 
                  textfont_size=30,
                  marker=dict(colors=colors, 
                              line=dict(color='black', width=3)))
fig.show()

The test campaign resulted in more searches on the website. 
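
The pie chart encodes each campaign's share of the combined total. The same shares can be computed directly; the sketch below uses only the first four days of `searches` from the `head()` outputs above, so its numbers are illustrative and do not replace the 30-day totals shown in the chart. The variable names are introduced here for the example.

```python
# First four days of searches, copied from the head() outputs
control_searches = [2290, 2033, 1737, 1042]
test_searches = [1946, 2359, 2572, 2216]

totals = {
    "Control Campaign": sum(control_searches),
    "Test Campaign": sum(test_searches),
}
combined = sum(totals.values())

# Share of the combined total, which is what a pie slice represents
shares = {name: total / combined for name, total in totals.items()}

for name in totals:
    print(f"{name}: {totals[name]} searches, {shares[name]:.1%} of combined total")
```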

3. Look at the number of website clicks from both campaigns

In [None]:
label = ["Website Clicks from Control Campaign", 
         "Website Clicks from Test Campaign"]
counts = [sum(control_data["website_clicks"]), 
          sum(test_data["website_clicks"])]
colors = ['gold','lightgreen']
fig = go.Figure(data=[go.Pie(labels=label, values=counts)])
fig.update_layout(title_text='Control Vs Test: Website Clicks')
fig.update_traces(hoverinfo='label+percent', textinfo='value', 
                  textfont_size=30,
                  marker=dict(colors=colors, 
                              line=dict(color='black', width=3)))
fig.show()

The test campaign wins in the number of website clicks.

4. Look at the amount of content viewed after reaching the website from both campaigns



In [None]:
label = ["Content Viewed from Control Campaign", 
         "Content Viewed from Test Campaign"]
counts = [sum(control_data["view_content"]), 
          sum(test_data["view_content"])]
colors = ['gold','lightgreen']
fig = go.Figure(data=[go.Pie(labels=label, values=counts)])
fig.update_layout(title_text='Control Vs Test: Content Viewed')
fig.update_traces(hoverinfo='label+percent', textinfo='value', 
                  textfont_size=30,
                  marker=dict(colors=colors, 
                              line=dict(color='black', width=3)))
fig.show()

The audience of the control campaign viewed more content than that of the test campaign. The difference is small, but because the control campaign had fewer website clicks, its engagement per click is higher than the test campaign's.

5. Look at the number of products added to the cart from both campaigns

In [None]:
label = ["Products Added to Cart from Control Campaign", 
         "Products Added to Cart from Test Campaign"]
counts = [sum(control_data["add_to_cart"]), 
          sum(test_data["add_to_cart"])]
colors = ['gold','lightgreen']
fig = go.Figure(data=[go.Pie(labels=label, values=counts)])
fig.update_layout(title_text='Control Vs Test: Added to Cart')
fig.update_traces(hoverinfo='label+percent', textinfo='value', 
                  textfont_size=30,
                  marker=dict(colors=colors, 
                              line=dict(color='black', width=3)))
fig.show()

Despite fewer website clicks, more products were added to the cart from the control campaign.

6. Look at the amount spent on both campaigns

In [None]:
label = ["Amount Spent in Control Campaign", 
         "Amount Spent in Test Campaign"]
counts = [sum(control_data["spend_usd"]), 
          sum(test_data["spend_usd"])]
colors = ['gold','lightgreen']
fig = go.Figure(data=[go.Pie(labels=label, values=counts)])
fig.update_layout(title_text='Control Vs Test: Amount Spent')
fig.update_traces(hoverinfo='label+percent', textinfo='value', 
                  textfont_size=30,
                  marker=dict(colors=colors, 
                              line=dict(color='black', width=3)))
fig.show()

The amount spent on the test campaign is higher than on the control campaign. Since the control campaign still produced more content views and more products added to the cart, it is more efficient than the test campaign.

7. Look at the purchases made from both campaigns

In [None]:
label = ["Purchases Made by Control Campaign", 
         "Purchases Made by Test Campaign"]
counts = [sum(control_data["purchase"]), 
          sum(test_data["purchase"])]
colors = ['gold','lightgreen']
fig = go.Figure(data=[go.Pie(labels=label, values=counts)])
fig.update_layout(title_text='Control Vs Test: Purchases')
fig.update_traces(hoverinfo='label+percent', textinfo='value', 
                  textfont_size=30,
                  marker=dict(colors=colors, 
                              line=dict(color='black', width=3)))
fig.show()

There is only a difference of around 1% in the purchases made from the two ad campaigns. Because the control campaign produced more sales for less marketing spend, the control campaign wins here!
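
Efficiency in terms of purchases can be expressed as cost per purchase, that is, spend divided by purchases. The helper `cost_per_purchase` below is a hypothetical name introduced for illustration, and the inputs are only the first four days from the `head()` outputs above, so the figures are not the full-period result.

```python
def cost_per_purchase(spend, purchases):
    """Return total spend divided by total purchases."""
    return sum(spend) / sum(purchases)

# First four days of each campaign, copied from the head() outputs
control_spend = [2280, 1757, 2343, 1940]
control_purchases = [618, 511, 372, 340]
test_spend = [3008, 2542, 2365, 2710]
test_purchases = [255, 677, 578, 340]

control_cpp = cost_per_purchase(control_spend, control_purchases)
test_cpp = cost_per_purchase(test_spend, test_purchases)

print(f"Control: {control_cpp:.2f} USD per purchase")
print(f"Test:    {test_cpp:.2f} USD per purchase")
```

A lower value means each purchase cost less to obtain. On this four-day sample the control campaign has the lower cost per purchase.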

**Now let’s analyze some metrics to find which ad campaign converts more.**

In [None]:
# Analyze the relationship between the number of website clicks and content viewed from both campaigns
figure = px.scatter(data_frame = ab_data, 
                    x="view_content",
                    y="website_clicks", 
                    size="website_clicks", 
                    color= "campaign_name", 
                    trendline="ols")
figure.show()

The website clicks are higher in the test campaign, but the engagement from website clicks is higher in the control campaign. So the control campaign wins!

In [None]:
# Analyze the relationship between the amount of content viewed and the number of products added to the cart from both campaigns
figure = px.scatter(data_frame = ab_data, 
                    x="add_to_cart",
                    y="view_content", 
                    size="add_to_cart", 
                    color= "campaign_name", 
                    trendline="ols")
figure.show()


Again, the control campaign wins! 

In [None]:
# Analyze the relationship between the number of products added to the cart and the number of sales from both campaigns
figure = px.scatter(data_frame = ab_data, 
                    x="purchase",
                    y="add_to_cart", 
                    size="purchase", 
                    color= "campaign_name", 
                    trendline="ols")
figure.show()

Although the control campaign resulted in more sales and more products in the cart, the conversion rate of the test campaign is higher.
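
The conversion rate referred to here is the ratio between consecutive funnel stages, for example purchases divided by products added to the cart. A minimal sketch with pandas, again restricted to the first four days from the `head()` outputs above (`funnel`, `rates`, and the `*_rate` column names are assumptions made for this example):

```python
import pandas as pd

# First four days of each campaign, copied from the head() outputs
funnel = pd.DataFrame({
    "campaign_name": ["Control Campaign"] * 4 + ["Test Campaign"] * 4,
    "website_clicks": [7016, 8110, 6508, 3065, 3038, 4657, 7885, 4216],
    "view_content": [2159, 1841, 1549, 982, 1069, 1548, 2367, 1437],
    "add_to_cart": [1819, 1219, 1134, 1183, 894, 879, 1268, 566],
    "purchase": [618, 511, 372, 340, 255, 677, 578, 340],
})

# Total count at each funnel stage per campaign
totals = funnel.groupby("campaign_name").sum()

# Ratio of each stage to the stage before it
rates = pd.DataFrame({
    "view_per_click_rate": totals["view_content"] / totals["website_clicks"],
    "cart_per_view_rate": totals["add_to_cart"] / totals["view_content"],
    "purchase_per_cart_rate": totals["purchase"] / totals["add_to_cart"],
})
print(rates.round(3))
```

On this four-day sample the test campaign's purchase-per-cart rate comes out higher, in line with the statement above. The other stage rates on such a short sample need not match what the full 30 days show.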

**Conclusion**

From the above A/B tests, the control campaign resulted in more sales and more engagement from visitors. More products were viewed from the control campaign, which led to more products in the cart and more sales. However, the conversion rate from cart to purchase is higher in the test campaign: it produced more sales relative to the products viewed and added to the cart, while the control campaign produced more sales overall. So the test campaign can be used to market a specific product to a specific audience, and the control campaign can be used to market multiple products to a wider audience.