### Introduction:
This notebook is a practice A/B test that evaluates the performance of two advertising campaigns: the Control Campaign and the Test Campaign. The objective is to measure how effectively each campaign drives user interaction, product engagement, and conversions on the website. By analyzing key performance metrics (searches, website clicks, content views, cart additions, and purchases), the study aims to identify meaningful differences between the two campaigns.

### 1. Ask: 
* How did the changes in user behavior (searches, clicks, content views, cart additions, purchases) affect the overall engagement between the control and test campaigns?
* How did the conversion rate vary between the control and test campaigns?
* Considering the impressions, spend, and actions taken by users, which campaign demonstrated better efficiency in terms of cost and performance?
* What correlations can we identify between various user actions (content views, cart additions, purchases) and how did they differ between the campaigns?

### 2. Prepare

#### 2.1 Import packages

In [1]:
import pandas as pd
import datetime 
from datetime import date, timedelta #'timedelta' represents a duration or difference between two dates or times.
import plotly.graph_objects as go
import plotly.express as px
import plotly.io as pio
pio.templates.default = "plotly_white"

#### 2.2 Import data

In [2]:
control_data = pd.read_csv("/kaggle/input/ab-test/control_group.csv", sep = ";")
test_data = pd.read_csv("/kaggle/input/ab-test/test_group.csv", sep = ";")

#### 2.3 Inspect the datasets

In [3]:
print(control_data.head())

      Campaign Name       Date  Spend [USD]  # of Impressions     Reach  \
0  Control Campaign  1.08.2019         2280           82702.0   56930.0   
1  Control Campaign  2.08.2019         1757          121040.0  102513.0   
2  Control Campaign  3.08.2019         2343          131711.0  110862.0   
3  Control Campaign  4.08.2019         1940           72878.0   61235.0   
4  Control Campaign  5.08.2019         1835               NaN       NaN   

   # of Website Clicks  # of Searches  # of View Content  # of Add to Cart  \
0               7016.0         2290.0             2159.0            1819.0   
1               8110.0         2033.0             1841.0            1219.0   
2               6508.0         1737.0             1549.0            1134.0   
3               3065.0         1042.0              982.0            1183.0   
4                  NaN            NaN                NaN               NaN   

   # of Purchase  
0          618.0  
1          511.0  
2          372.0  
3          340.0  
4            NaN  

In [4]:
print(test_data.head())

   Campaign Name       Date  Spend [USD]  # of Impressions  Reach  \
0  Test Campaign  1.08.2019         3008             39550  35820   
1  Test Campaign  2.08.2019         2542            100719  91236   
2  Test Campaign  3.08.2019         2365             70263  45198   
3  Test Campaign  4.08.2019         2710             78451  25937   
4  Test Campaign  5.08.2019         2297            114295  95138   

   # of Website Clicks  # of Searches  # of View Content  # of Add to Cart  \
0                 3038           1946               1069               894   
1                 4657           2359               1548               879   
2                 7885           2572               2367              1268   
3                 4216           2216               1437               566   
4                 5863           2106                858               956   

   # of Purchase  
0            255  
1            677  
2            578  
3            340  
4            768  


##### Some columns are float in control_data but int in test_data. Let's rename the columns in both datasets to short, consistent names and convert the values to the same data type.
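The float columns in control_data are most likely a side effect of the missing day (5.08.2019): a plain int64 column cannot hold NaN, so pandas stores any numeric column containing a missing value as float64. That would also explain why spend, which has no gaps, is still int64 in control_data. The toy series below, built from the control purchases printed above, illustrates the upcast; the nullable "Int64" dtype is shown only as one possible alternative, not as something this notebook uses.

```python
import numpy as np
import pandas as pd

# Control purchases for days 1 to 4, as printed above: all whole numbers
complete = pd.Series([618, 511, 372, 340])
# Appending the missing day 5 as NaN (a float) upcasts the whole column
with_gap = pd.Series([618, 511, 372, 340, np.nan])

print(complete.dtype)  # int64
print(with_gap.dtype)  # float64

# pandas' nullable integer dtype keeps whole numbers and stores the gap as <NA>
nullable = with_gap.astype("Int64")
print(nullable.dtype)  # Int64
```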

### 3. Process

#### 3.1 Correct the column names

In [5]:
control_data.columns = ["name", "date", "spend", "impressions", "reach", "clicks", "searches", "views", "cart", "purchase"]
test_data.columns = ["name", "date", "spend", "impressions", "reach", "clicks", "searches", "views", "cart", "purchase"]

#### Double check

In [6]:
print(control_data.head())
print(test_data.head())

               name       date  spend  impressions     reach  clicks  \
0  Control Campaign  1.08.2019   2280      82702.0   56930.0  7016.0   
1  Control Campaign  2.08.2019   1757     121040.0  102513.0  8110.0   
2  Control Campaign  3.08.2019   2343     131711.0  110862.0  6508.0   
3  Control Campaign  4.08.2019   1940      72878.0   61235.0  3065.0   
4  Control Campaign  5.08.2019   1835          NaN       NaN     NaN   

   searches   views    cart  purchase  
0    2290.0  2159.0  1819.0     618.0  
1    2033.0  1841.0  1219.0     511.0  
2    1737.0  1549.0  1134.0     372.0  
3    1042.0   982.0  1183.0     340.0  
4       NaN     NaN     NaN       NaN  
            name       date  spend  impressions  reach  clicks  searches  \
0  Test Campaign  1.08.2019   3008        39550  35820    3038      1946   
1  Test Campaign  2.08.2019   2542       100719  91236    4657      2359   
2  Test Campaign  3.08.2019   2365        70263  45198    7885      2572   
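3  Test Campaign  4.08.2019   2710        78451  25937    4216      2216   
4  Test Campaign  5.08.2019   2297       114295  95138    5863      2106   

   views  cart  purchase  
0   1069   894       255  
1   1548   879       677  
2   2367  1268       578  
3   1437   566       340  
4    858   956       768  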
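Assigning a list to `.columns` relies on the CSV columns always arriving in the same order. A minimal sketch of a more defensive alternative, assuming the raw headers are exactly those printed by `head()` above, renames by header instead of by position. `COLUMN_MAP` is a hypothetical name introduced here, and the single row is the test campaign's first day with its columns deliberately shuffled.

```python
import pandas as pd

# Raw CSV headers (as printed above) mapped to the short names used in this notebook
COLUMN_MAP = {
    "Campaign Name": "name",
    "Date": "date",
    "Spend [USD]": "spend",
    "# of Impressions": "impressions",
    "Reach": "reach",
    "# of Website Clicks": "clicks",
    "# of Searches": "searches",
    "# of View Content": "views",
    "# of Add to Cart": "cart",
    "# of Purchase": "purchase",
}

# Test campaign, day 1, with the column order shuffled on purpose
raw = pd.DataFrame(
    [[39550, "Test Campaign", 3008, "1.08.2019", 35820, 3038, 1946, 1069, 894, 255]],
    columns=["# of Impressions", "Campaign Name", "Spend [USD]", "Date", "Reach",
             "# of Website Clicks", "# of Searches", "# of View Content",
             "# of Add to Cart", "# of Purchase"],
)

# errors="raise" raises a KeyError if any expected header is missing
tidy = raw.rename(columns=COLUMN_MAP, errors="raise")
print(tidy[["name", "spend", "impressions"]])
```

With positional assignment, the shuffled frame would silently label impressions as `name`; the mapping keeps each value under its correct name.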

#### 3.2 Check the data type of each column in both datasets

In [7]:
print(control_data.dtypes)
print(test_data.dtypes)

name            object
date            object
spend            int64
impressions    float64
reach          float64
clicks         float64
searches       float64
views          float64
cart           float64
purchase       float64
dtype: object
name           object
date           object
spend           int64
impressions     int64
reach           int64
clicks          int64
searches        int64
views           int64
cart            int64
purchase        int64
dtype: object


#### 3.3 Change the data type

In [8]:
# Convert every numeric column to float so that both datasets share the same data types
numeric_cols = ["spend", "impressions", "reach", "clicks", "searches", "views", "cart", "purchase"]
control_data[numeric_cols] = control_data[numeric_cols].astype(float)
test_data[numeric_cols] = test_data[numeric_cols].astype(float)

#### Double check

In [9]:
print(control_data.dtypes)
print(test_data.dtypes)

name            object
date            object
spend          float64
impressions    float64
reach          float64
clicks         float64
searches       float64
views          float64
cart           float64
purchase       float64
dtype: object
name            object
date            object
spend          float64
impressions    float64
reach          float64
clicks         float64
searches       float64
views          float64
cart           float64
purchase       float64
dtype: object
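Rather than listing columns by hand, one frame can also be cast to the other's dtypes in a single call, because `DataFrame.astype` accepts a `{column: dtype}` mapping. The sketch below is an illustrative alternative, not the notebook's approach; it uses two toy rows per frame with values copied from the outputs above. Note that it would keep spend as int64 in both frames instead of converting it to float.

```python
import numpy as np
import pandas as pd

# Control has a missing day, so its clicks column is float64; spend stays int64
control = pd.DataFrame({"spend": [2280, 1835], "clicks": [7016.0, np.nan]})
# Test has no gaps, so both columns are int64
test = pd.DataFrame({"spend": [3008, 2297], "clicks": [3038, 5863]})

# Cast each test column to the dtype of the same-named control column
test_aligned = test.astype(control.dtypes.to_dict())

print(test_aligned.dtypes)
print(test_aligned.dtypes.equals(control.dtypes))  # True
```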


#### 3.4 Check for null values

In [10]:
print(control_data.isnull().sum())

name           0
date           0
spend          0
impressions    1
reach          1
clicks         1
searches       1
views          1
cart           1
purchase       1
dtype: int64


In [11]:
print(test_data.isnull().sum())

name           0
date           0
spend          0
impressions    0
reach          0
clicks         0
searches       0
views          0
cart           0
purchase       0
dtype: int64


##### control_data is missing one day of values in every metric column. Let's fill the missing values with the mean of each column:
##### (Mean imputation leaves each column's mean unchanged and keeps all 30 days in the control group, so both campaigns retain the same number of observations. Dropping the row would instead leave the control group with one fewer day than the test group.)

In [12]:
# Columns with a missing value for one day (5.08.2019) in control_data
fill_cols = ["impressions", "reach", "clicks", "searches", "views", "cart", "purchase"]
# Fill each column with its own mean and assign the result back.
# Calling fillna(inplace=True) on a selected column is chained assignment:
# recent pandas 2.x versions warn about it, and under Copy-on-Write it no
# longer modifies control_data at all.
control_data[fill_cols] = control_data[fill_cols].fillna(control_data[fill_cols].mean())

#### Double Check

In [13]:
print(control_data.isnull().sum())

name           0
date           0
spend          0
impressions    0
reach          0
clicks         0
searches       0
views          0
cart           0
purchase       0
dtype: int64
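Mean imputation keeps a column's mean unchanged, but it slightly shrinks the spread, because the filled value sits exactly at the mean. The sketch below checks both effects on the impressions values printed in `control_data.head()` (four observed days plus the missing one); these five values are only an illustration, and with one missing day out of 30 in the full data the effect should be small.

```python
import numpy as np
import pandas as pd

# Impressions for days 1 to 5 of the control campaign, as printed above
impressions = pd.Series([82702.0, 121040.0, 131711.0, 72878.0, np.nan])

# Fill the gap with the mean of the observed values
filled = impressions.fillna(impressions.mean())

# The mean is unchanged (both equal 102082.75)...
print(impressions.mean(), filled.mean())
# ...but the sample standard deviation shrinks: same squared deviations, larger n
print(impressions.std(), filled.std())
```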


#### 3.5 Merge the two datasets into one

In [14]:
ab_data = control_data.merge(test_data, 
                             how="outer").sort_values(["date"])
ab_data = ab_data.reset_index(drop=True)
print(ab_data.head())

               name        date   spend  impressions    reach  clicks  \
0  Control Campaign   1.08.2019  2280.0      82702.0  56930.0  7016.0   
1     Test Campaign   1.08.2019  3008.0      39550.0  35820.0  3038.0   
2     Test Campaign  10.08.2019  2790.0      95054.0  79632.0  8125.0   
3  Control Campaign  10.08.2019  2149.0     117624.0  91257.0  2277.0   
4     Test Campaign  11.08.2019  2420.0      83633.0  71286.0  3750.0   

   searches   views    cart  purchase  
0    2290.0  2159.0  1819.0     618.0  
1    1946.0  1069.0   894.0     255.0  
2    2312.0  1804.0   424.0     275.0  
3    2475.0  1984.0  1629.0     734.0  
4    2893.0  2617.0  1075.0     668.0  


#### Double check that the merged dataset has an equal number of samples for both campaigns

In [15]:
print(ab_data["name"].value_counts())

name
Control Campaign    30
Test Campaign       30
Name: count, dtype: int64
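Because `date` is still a string, `sort_values` orders it alphabetically, which is why 10.08.2019 appears right after 1.08.2019 in the merged output above. A minimal sketch, assuming the dates are always in day.month.year form, parses them with `pd.to_datetime` before sorting. It also uses `pd.concat`, which simply stacks the two frames; here that yields the same rows as the outer merge, because no row appears in both campaigns. The two rows per campaign are copied from the outputs above.

```python
import pandas as pd

# Two illustrative rows per campaign, copied from the outputs above
control = pd.DataFrame({
    "name": ["Control Campaign", "Control Campaign"],
    "date": ["1.08.2019", "10.08.2019"],
    "spend": [2280.0, 2149.0],
})
test = pd.DataFrame({
    "name": ["Test Campaign", "Test Campaign"],
    "date": ["1.08.2019", "2.08.2019"],
    "spend": [3008.0, 2542.0],
})

# Stack the rows; no row is shared between the campaigns, so nothing needs matching
ab = pd.concat([control, test], ignore_index=True)

# Parse day.month.year strings so that sorting is chronological, not alphabetical
ab["date"] = pd.to_datetime(ab["date"], format="%d.%m.%Y")
ab = ab.sort_values(["date", "name"]).reset_index(drop=True)
print(ab)
```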


### 4. Analyze

#### 4.1 Compare the number of searches for both campaigns

In [16]:
label = ["Total Searches for Control Campaign", "Total Searches for Test Campaign"]
counts = [round(sum(control_data["searches"]), 2), 
          round(sum(test_data["searches"]), 2)]
colors = ["mediumturquoise", "lightgreen"]

fig = go.Figure(data=[go.Pie(values=counts, labels=label, sort=False)])
fig.update_layout(title_text='Control VS Test: Searches', title_x=0.5)
fig.update_traces(hoverinfo='label+percent', textinfo='value', textfont_size=20,
                  marker=dict(colors=colors, line=dict(color='#000000', width=2)))
fig.show()

##### From the graph, we can see that the test campaign resulted in more searches (52.1%).  

#### 4.2 Compare the website clicks for both campaigns

In [17]:
label = ["Total clicks for Control Campaign", "Total clicks for Test Campaign"]
counts = [round(sum(control_data["clicks"]), 2), 
          round(sum(test_data["clicks"]), 2)]
colors = ["mediumturquoise", "lightgreen"]

fig = go.Figure(data=[go.Pie(values=counts, labels=label, sort=False)])
fig.update_layout(title_text='Control VS Test: Clicks', title_x=0.5)
fig.update_traces(hoverinfo='label+percent', textinfo='value', textfont_size=20,
                  marker=dict(colors=colors, line=dict(color='#000000', width=2)))
fig.show()

##### From the graph, we can see that the test campaign resulted in more website clicks (53.1%).

#### 4.3 Compare the Content Viewed for both campaigns

In [18]:
label = ["Content Viewed for Control Campaign", "Content Viewed for Test Campaign"]
counts = [round(sum(control_data["views"])),
         round(sum(test_data["views"]))]
colors = ["mediumturquoise", "lightgreen"]

fig = go.Figure(data=[go.Pie(values=counts, labels=label, sort=False)])
fig.update_layout(title_text='Control VS Test: Content Viewed', title_x=0.5)
fig.update_traces(hoverinfo='label+percent', textinfo='value', textfont_size=20,
                  marker=dict(colors=colors, line=dict(color='#000000', width=2)))
fig.show()

##### From the graph, we can see that the control campaign resulted in more content views (51.1%).

#### 4.4 Compare the Number of Products Added to Cart for both campaigns

In [19]:
label = ["Products added to cart for Control Campaign", "Products added to cart for Test Campaign"]
counts = [round(sum(control_data["cart"])),
         round(sum(test_data["cart"]))]
colors = ["mediumturquoise", "lightgreen"]

fig = go.Figure(data=[go.Pie(values=counts, labels=label, sort=False)])
fig.update_layout(title_text='Control VS Test: Number of Products Added to Cart', title_x=0.5)
fig.update_traces(hoverinfo='label+percent', textinfo='value', textfont_size=20,
                  marker=dict(colors=colors, line=dict(color='#000000', width=2)))
fig.show()

##### From the graph, we can see that the control campaign resulted in more products added to cart (59.6%).

#### 4.5 Compare the Products Purchased for both campaigns

In [20]:
label = ["Products Purchased for Control Campaign", "Products Purchased for Test Campaign"]
counts = [round(sum(control_data["purchase"])),
         round(sum(test_data["purchase"]))]
colors = ["mediumturquoise", "lightgreen"]

fig = go.Figure(data=[go.Pie(values=counts, labels=label, sort=False)])
fig.update_layout(title_text='Control VS Test: Products Purchased', title_x=0.5)
fig.update_traces(hoverinfo='label+percent', textinfo='value', textfont_size=20,
                  marker=dict(colors=colors, line=dict(color='#000000', width=2)))
fig.show()

##### From the graph, we can see that the number of products purchased is about the same for both campaigns: their shares differ by only 0.2 percentage points. The control campaign still resulted in slightly more products purchased (50.1%).
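Each pie chart above shows one campaign's share of the combined total for a single metric. The same percentages can be computed for several metrics at once with `groupby`. The rows below are only the first three days of each campaign, copied from the earlier `head()` outputs, so the resulting shares are illustrative and will not match the charts.

```python
import pandas as pd

# Illustrative subset: days 1 to 3 of each campaign, copied from the outputs above
ab = pd.DataFrame({
    "name": ["Control Campaign"] * 3 + ["Test Campaign"] * 3,
    "searches": [2290.0, 2033.0, 1737.0, 1946.0, 2359.0, 2572.0],
    "purchase": [618.0, 511.0, 372.0, 255.0, 677.0, 578.0],
})

# Total per campaign for each metric (what the pie charts feed in as values)
totals = ab.groupby("name")[["searches", "purchase"]].sum()

# Each campaign's percentage of the combined total, i.e. what a pie chart displays
shares = totals / totals.sum() * 100
print(shares.round(1))
```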

#### 4.6 Relationship between the Number of Impressions and the Amount Spent on both campaigns

In [21]:
fig=px.scatter(data_frame = ab_data, 
                    x="impressions",
                    y="spend", 
                    size="spend", 
                    color= "name",
                    trendline="ols") #"ols" stands for "Ordinary Least Squares"
fig.update_layout(title_text="Number of Impressions VS Amount Spent", title_x=0.5,
                  xaxis_title="Number of Impressions",
                  yaxis_title="Amount Spent",
                  legend_title="Campaign Name")
fig.show()

##### From the graph, we can see that the control campaign generated more impressions for a given amount spent.
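The scatter plot compares impressions and spend visually. Two common efficiency ratios make the comparison explicit: cost per thousand impressions (CPM) and cost per purchase, where lower is better for both. The sketch uses only days 1 to 4 of each campaign, copied from the `head()` outputs above, so its numbers are illustrative rather than campaign totals.

```python
import pandas as pd

# Illustrative subset: days 1 to 4 of each campaign, copied from the outputs above
ab = pd.DataFrame({
    "name": ["Control Campaign"] * 4 + ["Test Campaign"] * 4,
    "spend": [2280.0, 1757.0, 2343.0, 1940.0, 3008.0, 2542.0, 2365.0, 2710.0],
    "impressions": [82702.0, 121040.0, 131711.0, 72878.0,
                    39550.0, 100719.0, 70263.0, 78451.0],
    "purchase": [618.0, 511.0, 372.0, 340.0, 255.0, 677.0, 578.0, 340.0],
})

totals = ab.groupby("name")[["spend", "impressions", "purchase"]].sum()
efficiency = pd.DataFrame({
    # Cost per thousand impressions: lower means cheaper reach
    "cpm": totals["spend"] / totals["impressions"] * 1000,
    # Cost per purchase: lower means cheaper conversions
    "cost_per_purchase": totals["spend"] / totals["purchase"],
})
print(efficiency.round(2))
```

Ratios of totals are used rather than averages of daily ratios, so that high-spend days carry proportionally more weight.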

#### 4.7 Relationship between the Content Viewed and the Product Added to Cart on both campaigns

In [22]:
fig=px.scatter(data_frame = ab_data, 
                    x="views",
                    y="cart", 
                    size="cart", 
                    color= "name",
                    trendline="ols") 
fig.update_layout(title_text="Content Viewed VS Products Added to Cart", title_x=0.5,
                  xaxis_title="Content Viewed",
                  yaxis_title="Products Added to Cart",
                  legend_title="Campaign Name")
fig.show()

##### From the graph, we can see that the control campaign has larger markers, which indicates that it had more products added to the cart.
##### However, despite its smaller markers, the test campaign shows a steeper positive relationship between content views and products added to the cart: its trendline has a larger slope, meaning more cart additions per additional content view.
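A steeper trendline means a larger least-squares slope, which is not the same thing as a stronger correlation: the slope equals Pearson's r multiplied by the ratio of the standard deviations (slope = r * s_y / s_x). The sketch below computes both per campaign on the few rows printed earlier, so the values are illustrative only; the standard deviations are stored just to show that identity.

```python
import numpy as np
import pandas as pd

# Illustrative subset from the outputs above (control day 5 is missing, so it has 4 rows)
ab = pd.DataFrame({
    "name": ["Control Campaign"] * 4 + ["Test Campaign"] * 5,
    "views": [2159.0, 1841.0, 1549.0, 982.0, 1069.0, 1548.0, 2367.0, 1437.0, 858.0],
    "cart": [1819.0, 1219.0, 1134.0, 1183.0, 894.0, 879.0, 1268.0, 566.0, 956.0],
})

results = {}
for name, group in ab.groupby("name"):
    # Least-squares slope: extra cart additions per extra content view (the trendline)
    slope, _intercept = np.polyfit(group["views"], group["cart"], deg=1)
    # Pearson r: how tightly points cluster around that line, independent of steepness
    r = group["views"].corr(group["cart"])
    results[name] = {
        "slope": slope,
        "r": r,
        "sx": group["views"].std(),  # spread of content views
        "sy": group["cart"].std(),   # spread of cart additions
    }
    print(f"{name}: slope={slope:.3f}, r={r:.3f}")
```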

#### 4.8 Relationship between the Number of Products Added to Cart and the Number of Products Purchased on both campaigns

In [23]:
fig=px.scatter(data_frame = ab_data, 
                    x="cart",
                    y="purchase", 
                    size="purchase", 
                    color= "name",
                    trendline="ols") 
fig.update_layout(title_text="Products Added to Cart VS Products Purchased", title_x=0.5,
                  xaxis_title="Products Added to Cart",
                  yaxis_title="Products Purchased",
                  legend_title="Campaign Name")
fig.show()

##### From the graph, we can see that the test campaign has a better conversion rate: its trendline is steeper, so each additional product added to the cart is associated with more products purchased.
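The trendline gives a visual impression of conversion. A more direct measure is the cart-to-purchase conversion rate (purchases divided by cart additions) for each campaign, with a two-proportion z-test to check whether the difference is larger than chance would explain. This sketch assumes each cart addition is an independent trial that either ends in a purchase or not, which the daily aggregated counts do not guarantee, and it uses only days 1 to 4 summed from the `head()` outputs above. `two_proportion_ztest` is a hypothetical helper written here with the standard library, not a library function.

```python
import math


def two_proportion_ztest(success_a, trials_a, success_b, trials_b):
    """Two-sided z-test for p_b - p_a using the pooled proportion (hypothetical helper)."""
    p_a = success_a / trials_a
    p_b = success_b / trials_b
    # Under the null hypothesis both groups share one conversion rate
    pooled = (success_a + success_b) / (trials_a + trials_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / trials_a + 1 / trials_b))
    z = (p_b - p_a) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p_a, p_b, z, p_value


# Days 1 to 4 only (not the 30-day totals).
# Assumption: each cart addition is one trial and each purchase is one success.
control_cart, control_purchase = 5355, 1841
test_cart, test_purchase = 3607, 1850

p_control, p_test, z, p = two_proportion_ztest(control_purchase, control_cart,
                                                test_purchase, test_cart)
print(f"control {p_control:.1%}, test {p_test:.1%}, z = {z:.2f}, p = {p:.2g}")
```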

### 5. Conclusion
Findings: 
1. The test campaign resulted in more searches.
2. The test campaign resulted in more website clicks.
3. The control campaign resulted in more content views.
4. The control campaign resulted in more products added to cart.
5. The control campaign resulted in slightly more products purchased.
6. The control campaign generated more impressions for a given amount spent.
7. The control campaign had more products added to the cart, while the test campaign showed a steeper relationship between content views and cart additions.
8. The test campaign had a better conversion rate from cart additions to purchases.

Conclusion:

The A/B test revealed clear contrasts between the control and test campaigns. The test campaign drove more engagement at the top of the funnel (searches and website clicks), while the control campaign performed better on content views, cart additions, and purchases, and delivered more impressions for the amount spent. The test campaign, however, converted cart additions into purchases more efficiently. Each campaign therefore has distinct strengths, and these differences in user behavior highlight opportunities for optimizing future campaign strategies.