# Dataset on Kaggle or any data source that illustrates two versions
### A/B Testing Dataset
### https://www.kaggle.com/datasets/amirmotefaker/ab-testing-dataset

# Features of the Dataset
| **Feature**             | **Description**                                                 |
|-------------------------|-----------------------------------------------------------------|
| Campaign Name           | Name of the campaign (test or control)                          |
| Date                    | Date of the record                                              |
| Spend [USD]             | Amount spent on the campaign (in US dollars)                    |
| # of Impressions        | Total number of times the ad was shown                          |
| Reach                   | Number of unique users who saw the ad                           |
| # of Website Clicks     | Number of clicks leading to the website                         |
| # of Searches           | Number of users performing searches on the website              |
| # of View Content       | Number of users viewing content/products on the website         |
| # of Add to Cart        | Number of users adding products to their cart                   |
| # of Purchase           | Number of purchases made                                        |

# Context of the A/B testing

In [None]:
# The goal of the A/B test is to determine which campaign performed better and to assess which marketing strategy was more effective

## Hypothesis
# NULL HYPOTHESIS - The two campaigns performed the same
# ALTERNATE HYPOTHESIS - One of the campaigns performed better than the other
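
The hypotheses above can be checked with a significance test on a daily metric such as purchases. The following is a minimal sketch, not the method used in this notebook: it assumes a two-sided permutation test on the difference in means, and the function name `permutation_test` and the toy daily counts are hypothetical, chosen only for illustration.

```python
import numpy as np

def permutation_test(a, b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in means (illustrative helper)."""
    rng = np.random.default_rng(seed)
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    observed = a.mean() - b.mean()
    pooled = np.concatenate([a, b])
    extreme = 0
    for _ in range(n_permutations):
        # Shuffle the group labels and recompute the difference in means
        shuffled = rng.permutation(pooled)
        diff = shuffled[:a.size].mean() - shuffled[a.size:].mean()
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly above zero
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value

# Toy daily purchase counts (made up for illustration, not taken from the dataset)
control_daily = [520, 480, 510, 495, 530, 505]
test_daily = [515, 470, 525, 500, 540, 496]

diff, p = permutation_test(control_daily, test_daily)
print(f"difference in means: {diff:.2f}, p-value: {p:.3f}")
```

With the real data, the two inputs would be the daily `# of Purchase` columns of the control and test frames. A small p-value would be evidence against the null hypothesis; a large one would mean the observed difference is compatible with chance.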

In [20]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

In [5]:
control_df = pd.read_csv("/Users/strangemax/Library/Mobile Documents/iCloud~AsheKube~Carnets/Documents/Alt/alt/data/A:B testing/control_group.csv", sep=';', parse_dates=['Date']) # read control csv
test_df = pd.read_csv("/Users/strangemax/Library/Mobile Documents/iCloud~AsheKube~Carnets/Documents/Alt/alt/data/A:B testing/test_group.csv", sep=';', parse_dates=['Date']) # read test csv

In [None]:
control_df.isna().sum() #check number of missing values

Campaign Name          0
Date                   0
Spend [USD]            0
# of Impressions       1
Reach                  1
# of Website Clicks    1
# of Searches          1
# of View Content      1
# of Add to Cart       1
# of Purchase          1
dtype: int64

In [None]:
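num_cols = control_df.select_dtypes(include='number').columns  # interpolate numeric columns only
control_df[num_cols] = control_df[num_cols].interpolate(method='linear')  # fill in missing values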

In [None]:
test_df.isna().sum() # check number of missing values in test df

Campaign Name          0
Date                   0
Spend [USD]            0
# of Impressions       0
Reach                  0
# of Website Clicks    0
# of Searches          0
# of View Content      0
# of Add to Cart       0
# of Purchase          0
dtype: int64
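
To make the missing-value step concrete, here is a small sketch of what linear interpolation does to a single gap. The `toy` frame and its values are made up for illustration; only the column names mirror the dataset. Restricting the call to numeric columns is an assumption made here so that the text column is left alone.

```python
import numpy as np
import pandas as pd

# Toy frame with one missing purchase count (values are illustrative only)
toy = pd.DataFrame({
    "Campaign Name": ["Control"] * 4,
    "# of Purchase": [100.0, np.nan, 300.0, 400.0],
})

# Interpolate only the numeric columns; the text column is untouched
numeric_cols = toy.select_dtypes(include="number").columns
toy[numeric_cols] = toy[numeric_cols].interpolate(method="linear")

print(toy)
```

The gap is filled with the midpoint of its neighbours (200.0 here), which is why the control group's total purchases can come out as a non-integer value after interpolation.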

In [30]:
# Summary statistics for each campaign
summary_stats = pd.DataFrame({
    'Campaign': ['Control', 'Test'],
    'Average Spend': [control_df['Spend [USD]'].mean(), test_df['Spend [USD]'].mean()],
    'Total Spend': [control_df['Spend [USD]'].sum(), test_df['Spend [USD]'].sum()],
    'Total Purchases': [control_df['# of Purchase'].sum(), test_df['# of Purchase'].sum()]
})

print(summary_stats)

  Campaign  Average Spend  Total Spend  Total Purchases
0  Control    2288.433333        68653          15713.0
1     Test    2563.066667        76892          15637.0


In [None]:
# Conclusion
# The test campaign had a higher average and total spend than the control campaign, yet the control campaign recorded slightly more purchases (15,713 vs. 15,637)
# The control campaign therefore converted its spend into purchases more efficiently than the test campaign


In [31]:
# Add cost per purchase to summary

control_cost_per_purchase = control_df['Spend [USD]'].sum() / control_df['# of Purchase'].sum()
test_cost_per_purchase = test_df['Spend [USD]'].sum() / test_df['# of Purchase'].sum()

summary_stats = pd.DataFrame({
    'Campaign': ['Control', 'Test'],
    'Average Spend': [control_df['Spend [USD]'].mean(), test_df['Spend [USD]'].mean()],
    'Total Spend': [control_df['Spend [USD]'].sum(), test_df['Spend [USD]'].sum()],
    'Total Purchases': [control_df['# of Purchase'].sum(), test_df['# of Purchase'].sum()],
    'Cost per Purchase': [control_cost_per_purchase, test_cost_per_purchase]  # Add the new column
})

print(summary_stats)

  Campaign  Average Spend  Total Spend  Total Purchases  Cost per Purchase
0  Control    2288.433333        68653          15713.0           4.369185
1     Test    2563.066667        76892          15637.0           4.917312


In [None]:
# Cost per purchase as a KPI indicates that the control campaign was more efficient at generating sales: about $4.37 per purchase versus $4.92 for the test campaign
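
The summary table is built twice in this notebook with nearly identical code. One way to avoid the repetition is a small helper that returns one row per campaign. This is a sketch only: `summarize_campaign` is a hypothetical name, and the toy frames below are invented so the arithmetic is easy to check by hand.

```python
import pandas as pd

def summarize_campaign(df, name):
    """Return one summary row for a campaign (illustrative helper)."""
    total_spend = df["Spend [USD]"].sum()
    total_purchases = df["# of Purchase"].sum()
    return {
        "Campaign": name,
        "Average Spend": df["Spend [USD]"].mean(),
        "Total Spend": total_spend,
        "Total Purchases": total_purchases,
        # Cost per purchase = total spend divided by total purchases
        "Cost per Purchase": total_spend / total_purchases,
    }

# Toy data (made up): control spends 300 for 50 purchases, test spends 400 for 50
toy_control = pd.DataFrame({"Spend [USD]": [100, 200], "# of Purchase": [20, 30]})
toy_test = pd.DataFrame({"Spend [USD]": [150, 250], "# of Purchase": [25, 25]})

summary = pd.DataFrame([
    summarize_campaign(toy_control, "Control"),
    summarize_campaign(toy_test, "Test"),
])
print(summary)
```

On the toy data the control row costs 6.0 per purchase and the test row 8.0. Passing `control_df` and `test_df` instead would produce the same columns as the table above.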