# A/B Testing

A/B testing means comparing two marketing strategies to find out which one converts more traffic into sales (or into whatever goal you are pursuing) more effectively and efficiently.

In A/B testing, we analyze the results of two marketing strategies to choose the best one for future marketing campaigns. For example, the first time I ran an Instagram ad campaign to promote one of my posts, I targeted a different audience than in my second ad campaign. After analyzing the results of both campaigns, I preferred the audience of the second one because it gave better reach and more followers than the first.

That is what A/B testing means. Your goal might be to boost sales, followers, or traffic; whatever it is, choosing the best marketing strategy based on the results of previous campaigns is A/B testing.

In [1]:
```python
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
import plotly.io as pio
pio.templates.default = 'plotly_white'

# both CSV files use ';' rather than ',' as the delimiter
control_data = pd.read_csv('control_group.csv', sep=';')
test_data = pd.read_csv('test_group.csv', sep=';')

print(control_data.head())
print(test_data.head())
```

```
      Campaign Name       Date  Spend [USD]  # of Impressions     Reach  \
0  Control Campaign  1.08.2019         2280           82702.0   56930.0   
1  Control Campaign  2.08.2019         1757          121040.0  102513.0   
2  Control Campaign  3.08.2019         2343          131711.0  110862.0   
3  Control Campaign  4.08.2019         1940           72878.0   61235.0   
4  Control Campaign  5.08.2019         1835               NaN       NaN   

   # of Website Clicks  # of Searches  # of View Content  # of Add to Cart  \
0               7016.0         2290.0             2159.0            1819.0   
1               8110.0         2033.0             1841.0            1219.0   
2               6508.0         1737.0             1549.0            1134.0   
3               3065.0         1042.0              982.0            1183.0   
4                  NaN            NaN                NaN               NaN   

   # of Purchase  
0          618.0  
1          511.0  
2          372.0  
3   
```

## Data Preparation
The column names contain symbols such as `#` and `[USD]`, which make them awkward to work with. Let's give the columns new names before moving forward:

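Ways to rename columns:

1. Assign a complete list of new names (it must match the number and order of the columns): `df.columns = ['a', 'b', 'c']`
2. Map selected old names to new ones: `df.rename({'a': 'X', 'b': 'Y'}, axis=1, inplace=True)`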
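A minimal sketch of both options on a small hypothetical frame (`df`, `by_position`, `mapping`, and `by_mapping` are illustrative names, not from the original). Note that `rename` returns a new frame unless `inplace=True` is passed, and unlike positional assignment it does not depend on the column order:

```python
import pandas as pd

# Hypothetical two-column frame using two of the original column names
df = pd.DataFrame({'Spend [USD]': [2280, 1757], '# of Impressions': [82702.0, 121040.0]})

# Option 1: positional assignment; the list must match the column count and order
by_position = df.copy()
by_position.columns = ['Amount Spent', 'Number of Impressions']

# Option 2: map old names to new ones; unmapped columns would be left untouched
mapping = {'Spend [USD]': 'Amount Spent', '# of Impressions': 'Number of Impressions'}
by_mapping = df.rename(columns=mapping)

print(list(by_position.columns))
print(list(by_mapping.columns))
```

Both print `['Amount Spent', 'Number of Impressions']`, while `df` itself keeps its original names.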


In [2]:
```python
list(control_data.columns)
list(test_data.columns)  # only this last expression is displayed by the notebook
```

```
['Campaign Name',
 'Date',
 'Spend [USD]',
 '# of Impressions',
 'Reach',
 '# of Website Clicks',
 '# of Searches',
 '# of View Content',
 '# of Add to Cart',
 '# of Purchase']
```

In [3]:
```python
# both files share the same layout, so one list of names serves both
new_columns = ['Campaign Name', 'Date', 'Amount Spent',
               'Number of Impressions', 'Reach', 'Website Clicks',
               'Searches', 'Content Viewed', 'Added to Cart', 'Purchases']
control_data.columns = new_columns
test_data.columns = new_columns
```

In [4]:
```python
list(control_data.columns)
list(test_data.columns)
```

```
['Campaign Name',
 'Date',
 'Amount Spent',
 'Number of Impressions',
 'Reach',
 'Website Clicks',
 'Searches',
 'Content Viewed',
 'Added to Cart',
 'Purchases']
```

Now let's check whether the datasets contain null values:

In [5]:
```python
print(control_data.isna().sum())
```

```
Campaign Name            0
Date                     0
Amount Spent             0
Number of Impressions    1
Reach                    1
Website Clicks           1
Searches                 1
Content Viewed           1
Added to Cart            1
Purchases                1
dtype: int64
```


In [6]:
```python
print(test_data.isna().sum())
```

```
Campaign Name            0
Date                     0
Amount Spent             0
Number of Impressions    0
Reach                    0
Website Clicks           0
Searches                 0
Content Viewed           0
Added to Cart            0
Purchases                0
dtype: int64
```


The control dataset has one missing value in each of these columns, all on the same day (5.08.2019): Number of Impressions, Reach, Website Clicks, Searches, Content Viewed, Added to Cart, and Purchases. Let's impute them with the column averages.

Ways to fill NA values:

1. Fill one column with its mean: `df['col1'] = df['col1'].fillna(df['col1'].mean())`
2. Fill several columns with their means: `df[['col1', 'col2']] = df[['col1', 'col2']].fillna(df[['col1', 'col2']].mean())`
3. Fill every numeric column with its mean: `df = df.fillna(df.mean(numeric_only=True))`
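A minimal sketch of option 3 on a hypothetical frame that, like the campaign data, has a text column. With pandas 2.x, calling `df.mean()` on a frame that contains strings raises a `TypeError`, so `numeric_only=True` is needed; the frame and its values are illustrative only:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: one text column and two numeric columns with a gap
df = pd.DataFrame({
    'Campaign Name': ['Control Campaign'] * 3,
    'Reach': [56930.0, np.nan, 110862.0],
    'Purchases': [618.0, np.nan, 372.0],
})

# numeric_only=True skips the text column when computing the means
means = df.mean(numeric_only=True)

# fillna with a Series fills each column whose name appears in its index
filled = df.fillna(means)
print(filled)
```

The missing `Reach` becomes 83896.0 and the missing `Purchases` becomes 495.0; the text column is left untouched because it has no entry in `means`.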

In [10]:
```python
# fillna(..., inplace=True) on a selected column is chained assignment, which
# pandas 2.2+ warns about; assigning the filled columns back avoids it
cols_with_na = ['Number of Impressions', 'Reach', 'Website Clicks', 'Searches',
                'Content Viewed', 'Added to Cart', 'Purchases']
control_data[cols_with_na] = control_data[cols_with_na].fillna(control_data[cols_with_na].mean())
```


In [11]:
```python
print(control_data.isna().sum())
```

```
Campaign Name            0
Date                     0
Amount Spent             0
Number of Impressions    0
Reach                    0
Website Clicks           0
Searches                 0
Content Viewed           0
Added to Cart            0
Purchases                0
dtype: int64
```


Before starting the analysis, I looked into the difference between reach and impressions:

#### What's the difference between reach and impressions?
**Reach** is the total number of unique people who see your content.

**Impressions** are the number of times your content is displayed, no matter if it was clicked or not.

Think of reach as the number of unique people who see your content. In a perfect world, every one of your followers would see every piece of content you post. Unfortunately, that's not how things work on social media, and not all of your followers will see every single post you publish.
For instance, Groupon has 17 million followers, but their organic content doesn’t come close to getting that number of engagements because only a fraction of their audience sees it.

An impression, on the other hand, means that content was delivered to someone's feed. A viewer doesn't have to engage with the post for it to count as an impression, and one person can generate multiple impressions for a single piece of content.
For example, a Facebook post could show up in the News Feed from the original publisher and appear again when a friend shares the publisher’s post. If you saw both forms of activity in your feed, that counts as two impressions for the same post.
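The two metrics can be related through average frequency, i.e. impressions divided by reach. A minimal sketch using the first four control-campaign days printed earlier (the `Frequency` column is an added illustration, not part of the original analysis):

```python
import pandas as pd

# Impressions and reach for the first four control-campaign days shown above
days = pd.DataFrame({
    'Number of Impressions': [82702.0, 121040.0, 131711.0, 72878.0],
    'Reach': [56930.0, 102513.0, 110862.0, 61235.0],
})

# Average frequency: impressions delivered per unique person reached.
# Since one person can see a post several times, this should not drop below 1.
days['Frequency'] = days['Number of Impressions'] / days['Reach']
print(days.round(2))
```

On the first day, for instance, 82,702 impressions over 56,930 people works out to roughly 1.45 impressions per person.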

#### To compare the campaigns, let's merge the two datasets:

In [13]:
```python
print(control_data.head())
print(test_data.head())
```

```
      Campaign Name       Date  Amount Spent  Number of Impressions  \
0  Control Campaign  1.08.2019          2280           82702.000000   
1  Control Campaign  2.08.2019          1757          121040.000000   
2  Control Campaign  3.08.2019          2343          131711.000000   
3  Control Campaign  4.08.2019          1940           72878.000000   
4  Control Campaign  5.08.2019          1835          109559.758621   

           Reach  Website Clicks     Searches  Content Viewed  Added to Cart  \
0   56930.000000     7016.000000  2290.000000     2159.000000         1819.0   
1  102513.000000     8110.000000  2033.000000     1841.000000         1219.0   
2  110862.000000     6508.000000  1737.000000     1549.000000         1134.0   
3   61235.000000     3065.000000  1042.000000      982.000000         1183.0   
4   88844.931034     5320.793103  2221.310345     1943.793103         1300.0   

    Purchases  
0  618.000000  
1  511.000000  
2  372.000000  
3  340.000000  
4  522.79310
```

In [19]:
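```python
# an outer merge on all shared columns keeps every row from both frames, like a
# SQL FULL OUTER JOIN; since no control row equals a test row, this stacks them.
# 'Date' is still a string here, so sort_values orders it alphabetically
data = control_data.merge(test_data, how='outer').sort_values(by='Date')
data = data.reset_index(drop=True)
print(data.head())
```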

```
      Campaign Name        Date  Amount Spent  Number of Impressions    Reach  \
0  Control Campaign   1.08.2019          2280                82702.0  56930.0   
1     Test Campaign   1.08.2019          3008                39550.0  35820.0   
2     Test Campaign  10.08.2019          2790                95054.0  79632.0   
3  Control Campaign  10.08.2019          2149               117624.0  91257.0   
4     Test Campaign  11.08.2019          2420                83633.0  71286.0   

   Website Clicks  Searches  Content Viewed  Added to Cart  Purchases  
0          7016.0    2290.0          2159.0         1819.0      618.0  
1          3038.0    1946.0          1069.0          894.0      255.0  
2          8125.0    2312.0          1804.0          424.0      275.0  
3          2277.0    2475.0          1984.0         1629.0      734.0  
4          3750.0    2893.0          2617.0         1075.0      668.0  
```
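Because `Date` is still a string, the sort above is alphabetical (`1.08.2019` is followed by `10.08.2019`). A sketch of an alternative, assuming every date is written as day.month.year: stack the frames with `pd.concat` and parse the dates before sorting. The mini-frames `control` and `test` below are hypothetical excerpts of the merged output:

```python
import pandas as pd

# Hypothetical mini-frames with identical columns (values from the merged output)
control = pd.DataFrame({'Campaign Name': ['Control Campaign'] * 2,
                        'Date': ['1.08.2019', '10.08.2019'],
                        'Amount Spent': [2280, 2149]})
test = pd.DataFrame({'Campaign Name': ['Test Campaign'] * 2,
                     'Date': ['1.08.2019', '10.08.2019'],
                     'Amount Spent': [3008, 2790]})

# Stack the rows; no key matching is needed because both frames share all columns
combined = pd.concat([control, test], ignore_index=True)

# Parse day.month.year strings so sorting is chronological, not alphabetical
combined['Date'] = pd.to_datetime(combined['Date'], format='%d.%m.%Y')
combined = combined.sort_values(['Date', 'Campaign Name']).reset_index(drop=True)
print(combined)
```

Sorting on both `Date` and `Campaign Name` also makes the order of same-day rows deterministic.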




In [20]:
```python
print(data["Campaign Name"].value_counts())
```

```
Control Campaign    30
Test Campaign       30
Name: Campaign Name, dtype: int64
```


In [28]:
```python
# trendline="ols" requires the statsmodels package to be installed
figure = px.scatter(data, y='Amount Spent', x='Number of Impressions',
                    title='Amount Spent vs Number of Impressions',
                    color='Campaign Name',
                    trendline="ols",
                    size='Amount Spent')
figure.show()
```

From this chart we can see that the amount spent on both campaigns is similar, but the control campaign gets more impressions.
#### Control Campaign Win 1
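One way to put "similar spend, more impressions" into a single number is cost per thousand impressions (CPM). This metric is an addition, not part of the original analysis. The sketch uses three sample days per campaign from the rows printed earlier, so its values only illustrate the calculation:

```python
import pandas as pd

# Three sample days per campaign, taken from the rows printed earlier
sample = pd.DataFrame({
    'Campaign Name': ['Control Campaign'] * 3 + ['Test Campaign'] * 3,
    'Amount Spent': [2280, 1757, 2343, 3008, 2790, 2420],
    'Number of Impressions': [82702.0, 121040.0, 131711.0, 39550.0, 95054.0, 83633.0],
})

# Total spend and impressions per campaign
totals = sample.groupby('Campaign Name')[['Amount Spent', 'Number of Impressions']].sum()

# Cost per thousand impressions: lower means more impressions per dollar
totals['CPM'] = totals['Amount Spent'] / totals['Number of Impressions'] * 1000
print(totals.round(2))
```

On the full dataset, the same `groupby` could be run on `data` instead of `sample`.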

In [30]:
```python
figure = px.scatter(data, y='Amount Spent', x='Reach',
                    title='Amount Spent vs Reach',
                    color='Campaign Name',
                    trendline="ols",
                    size='Amount Spent')
figure.show()
```

Once again, the amount spent on both campaigns is similar, but the control campaign reaches more people.
#### Control Campaign Win 2

In [58]:
```python
total = round(data.groupby('Campaign Name')['Website Clicks'].sum().reset_index())
print(total)

figure = px.bar(total, x='Campaign Name', y='Website Clicks',
                title='Number of Clicks for Each Campaign',
                text='Website Clicks',
                color='Campaign Name')
figure.show()
```

```
      Campaign Name  Website Clicks
0  Control Campaign        159624.0
1     Test Campaign        180970.0
```


In [59]:
```python
total = round(data.groupby('Campaign Name')['Searches'].sum().reset_index())
print(total)

figure = px.bar(total, x='Campaign Name', y='Searches',
                title='Number of Searches for Each Campaign',
                text='Searches',
                color='Campaign Name')
figure.show()
```

```
      Campaign Name  Searches
0  Control Campaign   66639.0
1     Test Campaign   72569.0
```


Although the control campaign has more impressions and a larger reach, the test campaign gets more website clicks (180,970 vs. 159,624) and more searches (72,569 vs. 66,639).

#### Test Campaign Wins 1 and 2
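Fewer impressions but more clicks implies a higher click-through rate (CTR = website clicks / impressions), a metric not computed in the original analysis. A sketch with a hypothetical helper `rate_by_campaign` and three sample days per campaign from the rows printed earlier; the numbers are illustrative only:

```python
import pandas as pd

# Three sample days per campaign, taken from the rows printed earlier
sample = pd.DataFrame({
    'Campaign Name': ['Control Campaign'] * 3 + ['Test Campaign'] * 3,
    'Number of Impressions': [82702.0, 121040.0, 131711.0, 39550.0, 95054.0, 83633.0],
    'Website Clicks': [7016.0, 8110.0, 6508.0, 3038.0, 8125.0, 3750.0],
})

def rate_by_campaign(df, numerator, denominator):
    """Hypothetical helper: ratio of two column totals for each campaign."""
    totals = df.groupby('Campaign Name')[[numerator, denominator]].sum()
    return totals[numerator] / totals[denominator]

# Click-through rate: website clicks per impression
ctr = rate_by_campaign(sample, 'Website Clicks', 'Number of Impressions')
print(ctr.round(4))
```

Dividing the totals, rather than averaging the daily ratios, weights each day by its number of impressions. The same helper could be applied to `data` for the full 30 days.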

Now let’s have a look at the amount of content viewed after reaching the website from both campaigns:

In [57]:
```python
total = round(data.groupby('Campaign Name')['Content Viewed'].sum().reset_index())
print(total)

figure = px.bar(total, x='Campaign Name', y='Content Viewed',
                title='Content Viewed for Each Campaign',
                text='Content Viewed',
                color='Campaign Name')

figure.show()
```

```
      Campaign Name  Content Viewed
0  Control Campaign         58314.0
1     Test Campaign         55740.0
```


Although the test campaign has more website clicks and searches, slightly more content was viewed through the control campaign (58,314 vs. 55,740).

#### Control Campaign Win 3

Now let's look at how viewing content leads customers to add products to their cart:

In [64]:
```python
figure = px.scatter(data, x='Added to Cart', y='Content Viewed',
                    title='Added to Cart vs Content Viewed',
                    color='Campaign Name',
                    trendline="ols",
                    size='Added to Cart')
figure.show()
```

More products were added to the cart through the control campaign.

#### Control Campaign Win 4

Before we get to the most important part, how many purchases were made after products were added to the cart, let's look at the total purchases for each campaign:


In [65]:
```python
total = round(data.groupby('Campaign Name')['Purchases'].sum().reset_index())
print(total)

figure = px.bar(total, x='Campaign Name', y='Purchases',
                title='Total Purchases for Each Campaign',
                text='Purchases',
                color='Campaign Name')

figure.show()
```

```
      Campaign Name  Purchases
0  Control Campaign    15684.0
1     Test Campaign    15637.0
```


Total purchases are nearly identical for the two campaigns, with the control campaign slightly ahead (15,684 vs. 15,637). Note that the control total includes the one day whose value was imputed with the mean.

#### Control Campaign Win 5

Now let's get to the most important part: how many purchases were made after products were added to the cart in each campaign:

In [63]:
```python
figure = px.scatter(data, x='Purchases', y='Added to Cart',
                    title='Added to Cart vs Purchases',
                    color='Campaign Name',
                    trendline="ols",
                    size='Purchases')
figure.show()
```

Although the control campaign resulted in more sales and more products in the cart, the test campaign has the higher conversion rate (purchases per product added to the cart).

#### Test Campaign Win 3
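The scatter plot shows this visually; the rate itself can be computed as total purchases divided by total products added to the cart. A minimal sketch on three sample days per campaign from the rows printed earlier (illustrative only; running the same `groupby` on `data` would give the full-period figures):

```python
import pandas as pd

# Three sample days per campaign, taken from the rows printed earlier
sample = pd.DataFrame({
    'Campaign Name': ['Control Campaign'] * 3 + ['Test Campaign'] * 3,
    'Added to Cart': [1819.0, 1219.0, 1134.0, 894.0, 424.0, 1075.0],
    'Purchases': [618.0, 511.0, 372.0, 255.0, 275.0, 668.0],
})

# Total carts and purchases per campaign
totals = sample.groupby('Campaign Name')[['Added to Cart', 'Purchases']].sum()

# Cart-to-purchase conversion rate: purchases per product added to the cart
totals['Conversion Rate'] = totals['Purchases'] / totals['Added to Cart']
print(totals.round(3))
```

In these sample days, the control campaign converts about 36% of carted products and the test campaign about 50%.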


## Conclusion

### Control campaign
From the above A/B test, we found that the control campaign resulted in more sales and more engagement from visitors. More content was viewed through the control campaign, resulting in more products in the cart and more sales overall.
The control campaign can be used to market multiple products to a wider audience.

### Test campaign
The conversion rate of products in the cart is higher in the test campaign: relative to the content viewed and the products added to the cart, it produced more sales.
The test campaign can be used to market a specific product to a specific audience.