## Main Goal
A company recently introduced a new bidding type, “average bidding”, as an alternative to its existing bidding type, “maximum bidding”.  
One of our clients, TEST.com, has decided to try this new feature and wants to run an A/B test to determine whether average bidding brings more conversions than maximum bidding.

The A/B test ran for one month, and TEST.com now expects you to analyze and present its results.

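> **A/B testing** compares two versions of something (here, two bidding types) by exposing each to a separate group and measuring which one performs better. It helps a business find better ways to acquire customers, market products, increase reach, or otherwise convert more of its target audience into actual customers.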

#### Hypotheses:

**Null Hypothesis (H₀)**: There is no significant difference in conversions between average bidding and maximum bidding.  
**Alternative Hypothesis (H₁)**: Average bidding leads to significantly more conversions than maximum bidding.
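
Because H₁ is directional ("more conversions"), these hypotheses call for a one-sided test. The notebook does not specify a test at this point, so the following is only a minimal sketch of one option: a Monte Carlo permutation test on daily purchase counts, assuming "conversions" means `# of Purchase` per day. The function `permutation_test_greater` is hypothetical, and the sample values are just the first days visible in the `head()` outputs below (the control group's fifth day is missing), not the full month.

```python
import random


def permutation_test_greater(control, test, n_resamples=10_000, seed=0):
    """One-sided permutation test of H1: mean(test) > mean(control).

    Returns the observed difference in means and a Monte Carlo p-value estimate.
    """
    def mean(values):
        return sum(values) / len(values)

    observed = mean(test) - mean(control)
    pooled = list(control) + list(test)
    n_test = len(test)
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    at_least_as_extreme = 0
    for _ in range(n_resamples):
        # Under H0 the group labels are exchangeable, so shuffle and re-split
        rng.shuffle(pooled)
        diff = mean(pooled[:n_test]) - mean(pooled[n_test:])
        if diff >= observed:
            at_least_as_extreme += 1
    # Add-one correction so the estimated p-value is never exactly zero
    p_value = (at_least_as_extreme + 1) / (n_resamples + 1)
    return observed, p_value


# Daily purchases from the first rows of each group (control day 5 is missing)
control_purchases = [618, 511, 372, 340]
test_purchases = [255, 677, 578, 340, 768]

observed, p_value = permutation_test_greater(control_purchases, test_purchases)
print(f"difference in means: {observed:.2f}, one-sided p-value: {p_value:.3f}")
```

H₀ would be rejected only if the p-value falls below the chosen significance level (e.g. 0.05). With a handful of days the test has very little power, so in practice it should be run on the full month, e.g. `permutation_test_greater(control_data['Number of Purchase'], test_data['Number of Purchase'])` after preprocessing. SciPy (1.8+) also offers a library version, `scipy.stats.permutation_test`, with `alternative='greater'`.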

### Data Understanding
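The dataset consists of two files, one for the control group (maximum bidding) and one for the test group (average bidding). Each has one row per day and the following columns: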

1. **Campaign Name**: The name of the campaign
2. **Date**: Date of the record (day.month.year)
3. **Spend [USD]**: Amount spent on the campaign, in US dollars
4. **# of Impressions**: Number of impressions the ad received through the campaign
5. **Reach**: Number of unique users who saw the ad (unique impressions)
6. **# of Website Clicks**: Number of website clicks received through the ads
7. **# of Searches**: Number of users who performed searches on the website
8. **# of View Content**: Number of users who viewed content and products on the website
9. **# of Add to Cart**: Number of users who added products to the cart
10. **# of Purchase**: Number of purchases
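
Together these columns describe a conversion funnel from impressions down to purchases. As a minimal sketch, step-to-step conversion rates can be computed by dividing each stage by the one before it. The stage ordering and the `funnel_rates` helper are assumptions for illustration, and the numbers are the first test-group row (1.08.2019) shown below. Note that the stages are not strictly nested (on 4.08.2019 the control group logged 1183 add-to-cart events but only 982 content views), so a ratio above 1 is possible.

```python
import pandas as pd

# Funnel stages in the order a user is assumed to move through them
FUNNEL = ["# of Impressions", "# of Website Clicks", "# of Searches",
          "# of View Content", "# of Add to Cart", "# of Purchase"]


def funnel_rates(df: pd.DataFrame) -> pd.DataFrame:
    """Return each funnel stage divided by the stage before it, row by row."""
    rates = {}
    for previous, current in zip(FUNNEL, FUNNEL[1:]):
        label = f"{current.removeprefix('# of ')} / {previous.removeprefix('# of ')}"
        rates[label] = df[current] / df[previous]
    return pd.DataFrame(rates, index=df.index)


# First row of the test group (1.08.2019)
day = pd.DataFrame([{"# of Impressions": 39550, "# of Website Clicks": 3038,
                     "# of Searches": 1946, "# of View Content": 1069,
                     "# of Add to Cart": 894, "# of Purchase": 255}])
print(funnel_rates(day).round(3))
```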

The dataset is available on Kaggle: <https://www.kaggle.com/datasets/ilkeryildiz/example-dataset-for-ab-test>

```python
import pandas as pd
```

```python
control_data = pd.read_csv("datasets/ab/control_group.csv", sep=";")
test_data = pd.read_csv("datasets/ab/test_group.csv", sep=";")
```

```python
control_data.head()
```

|   | Campaign Name | Date | Spend [USD] | # of Impressions | Reach | # of Website Clicks | # of Searches | # of View Content | # of Add to Cart | # of Purchase |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Control Campaign | 1.08.2019 | 2280 | 82702.0 | 56930.0 | 7016.0 | 2290.0 | 2159.0 | 1819.0 | 618.0 |
| 1 | Control Campaign | 2.08.2019 | 1757 | 121040.0 | 102513.0 | 8110.0 | 2033.0 | 1841.0 | 1219.0 | 511.0 |
| 2 | Control Campaign | 3.08.2019 | 2343 | 131711.0 | 110862.0 | 6508.0 | 1737.0 | 1549.0 | 1134.0 | 372.0 |
| 3 | Control Campaign | 4.08.2019 | 1940 | 72878.0 | 61235.0 | 3065.0 | 1042.0 | 982.0 | 1183.0 | 340.0 |
| 4 | Control Campaign | 5.08.2019 | 1835 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |


```python
test_data.head()
```

|   | Campaign Name | Date | Spend [USD] | # of Impressions | Reach | # of Website Clicks | # of Searches | # of View Content | # of Add to Cart | # of Purchase |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Test Campaign | 1.08.2019 | 3008 | 39550 | 35820 | 3038 | 1946 | 1069 | 894 | 255 |
| 1 | Test Campaign | 2.08.2019 | 2542 | 100719 | 91236 | 4657 | 2359 | 1548 | 879 | 677 |
| 2 | Test Campaign | 3.08.2019 | 2365 | 70263 | 45198 | 7885 | 2572 | 2367 | 1268 | 578 |
| 3 | Test Campaign | 4.08.2019 | 2710 | 78451 | 25937 | 4216 | 2216 | 1437 | 566 | 340 |
| 4 | Test Campaign | 5.08.2019 | 2297 | 114295 | 95138 | 5863 | 2106 | 858 | 956 | 768 |
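
The `Date` column is read as plain text in day.month.year form (e.g. `1.08.2019`). If real dates are needed later, for sorting or for lining the two groups up by day, a minimal sketch is to parse the column with an explicit format. The small `sample` frame is illustrative; on the real data the same call would apply to `control_data["Date"]` and `test_data["Date"]`.

```python
import pandas as pd

# Dates as they appear in the CSV files: day.month.year, day without a leading zero
sample = pd.DataFrame({"Date": ["1.08.2019", "2.08.2019", "5.08.2019"]})

# An explicit format stops pandas from guessing, which could otherwise read
# 1.08.2019 month-first; %d also accepts a single-digit day such as "1"
sample["Date"] = pd.to_datetime(sample["Date"], format="%d.%m.%Y")
iso = sample["Date"].dt.strftime("%Y-%m-%d").tolist()
print(iso)  # ['2019-08-01', '2019-08-02', '2019-08-05']
```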


### Data Preprocessing
Some column names contain special characters (`#`, `[USD]`) that make them awkward to work with, so let's rename them first!

```python
new_columnNames = ['Campaign Name', 'Date', 'Amount Spent', 'Number of Impressions', 'Reach',
                   'Number of Website Clicks', 'Number of Searches', 'Number of View Content',
                   'Number of Add to Cart', 'Number of Purchase']
control_data.columns = new_columnNames
test_data.columns = new_columnNames
```
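
Assigning a list to `.columns` relies on both files having exactly these columns in exactly this order. A slightly safer alternative, sketched here assuming the raw names shown in the `head()` outputs above, renames by name with an explicit mapping and fails loudly if an expected column is missing. `COLUMN_MAPPING` and `clean_column_names` are hypothetical names.

```python
import pandas as pd

# Raw names as they appear in the CSV headers -> cleaner names
COLUMN_MAPPING = {
    "Spend [USD]": "Amount Spent",
    "# of Impressions": "Number of Impressions",
    "# of Website Clicks": "Number of Website Clicks",
    "# of Searches": "Number of Searches",
    "# of View Content": "Number of View Content",
    "# of Add to Cart": "Number of Add to Cart",
    "# of Purchase": "Number of Purchase",
}


def clean_column_names(df: pd.DataFrame) -> pd.DataFrame:
    """Rename columns by name instead of by position."""
    # errors="raise" makes rename raise KeyError for any mapping key not in df
    return df.rename(columns=COLUMN_MAPPING, errors="raise")


raw = pd.DataFrame(columns=["Campaign Name", "Date", "Spend [USD]", "# of Impressions",
                            "Reach", "# of Website Clicks", "# of Searches",
                            "# of View Content", "# of Add to Cart", "# of Purchase"])
print(clean_column_names(raw).columns.tolist())
```

Columns not in the mapping (`Campaign Name`, `Date`, `Reach`) keep their names, and a reordered file would still be renamed correctly.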

```python
# Check that the new column names were applied
test_data.head(2)
```

|   | Campaign Name | Date | Amount Spent | Number of Impressions | Reach | Number of Website Clicks | Number of Searches | Number of View Content | Number of Add to Cart | Number of Purchase |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Test Campaign | 1.08.2019 | 3008 | 39550 | 35820 | 3038 | 1946 | 1069 | 894 | 255 |
| 1 | Test Campaign | 2.08.2019 | 2542 | 100719 | 91236 | 4657 | 2359 | 1548 | 879 | 677 |


```python
# Checking null values
test_data.isnull().sum()
```

```
Campaign Name               0
Date                        0
Amount Spent                0
Number of Impressions       0
Reach                       0
Number of Website Clicks    0
Number of Searches          0
Number of View Content      0
Number of Add to Cart       0
Number of Purchase          0
dtype: int64
```

```python
control_data.isnull().sum()
```

```
Campaign Name               0
Date                        0
Amount Spent                0
Number of Impressions       1
Reach                       1
Number of Website Clicks    1
Number of Searches          1
Number of View Content      1
Number of Add to Cart       1
Number of Purchase          1
dtype: int64
```
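
The counts show exactly one missing value per column in the control group. Before deciding how to fill them, it helps to look at which rows are affected. A minimal sketch using a boolean row mask follows; the small frame mimics the control data's 4th and 5th days (values taken from the `head()` output above), and on the real data the same expression would be `control_data[control_data.isnull().any(axis=1)]`.

```python
import pandas as pd

nan = float("nan")
# Illustrative frame: on 5.08.2019 only the spend value was recorded
df = pd.DataFrame({
    "Date": ["4.08.2019", "5.08.2019"],
    "Amount Spent": [1940, 1835],
    "Reach": [61235.0, nan],
    "Number of Purchase": [340.0, nan],
})

# True for every row that has at least one missing value
rows_with_nan = df[df.isnull().any(axis=1)]
print(rows_with_nan)
```

If the missing values all sit in a single day, dropping that day with `dropna()` is an alternative to mean imputation; the trade-off is one fewer control day versus one synthetic day.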

```python
# Deal with missing values (one control-group day has no data apart from the spend)
# by filling them with the mean of each respective column
columns_with_nan = ['Number of Impressions', 'Reach', 'Number of Website Clicks',
                    'Number of Searches', 'Number of View Content',
                    'Number of Add to Cart', 'Number of Purchase']
for column in columns_with_nan:
    control_data[column] = control_data[column].fillna(value=control_data[column].mean())
```

```python
control_data.isnull().sum()
```

```
Campaign Name               0
Date                        0
Amount Spent                0
Number of Impressions       0
Reach                       0
Number of Website Clicks    0
Number of Searches          0
Number of View Content      0
Number of Add to Cart       0
Number of Purchase          0
dtype: int64
```
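
With both groups cleaned and sharing the same column names, one way to put them side by side is to stack them with `pd.concat` and aggregate by `Campaign Name`. This is a sketch, not necessarily the notebook's next step, and the cost-per-purchase metric is an assumption added for illustration. The small frames reuse only the first two days of each group from the outputs above; `control_sample` and `test_sample` are hypothetical names chosen so they don't overwrite the real `control_data` and `test_data`.

```python
import pandas as pd

control_sample = pd.DataFrame({
    "Campaign Name": ["Control Campaign"] * 2,
    "Amount Spent": [2280, 1757],
    "Number of Purchase": [618.0, 511.0],
})
test_sample = pd.DataFrame({
    "Campaign Name": ["Test Campaign"] * 2,
    "Amount Spent": [3008, 2542],
    "Number of Purchase": [255, 677],
})

# One long frame, then totals per group
combined = pd.concat([control_sample, test_sample], ignore_index=True)
summary = combined.groupby("Campaign Name")[["Amount Spent", "Number of Purchase"]].sum()
# Cost per purchase: total spend divided by total purchases for each group
summary["Cost per Purchase"] = summary["Amount Spent"] / summary["Number of Purchase"]
print(summary.round(2))
```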