# Marketing A/B Testing Project

## Introduction
A/B testing, also known as split testing, is a common practice in marketing ad campaigns aimed at optimizing conversion rates and maximizing performance. It involves comparing two versions of a marketing asset, such as an advertisement or a landing page, by randomly assigning users to one of the versions and measuring their response. By analyzing the results, marketers can identify which version performs better in terms of predefined metrics, such as click-through rates, conversion rates, or user engagement. A/B testing provides valuable insights into the effectiveness of different marketing strategies, allowing marketers to make data-driven decisions and refine their campaigns to achieve the desired outcomes.
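
The random-assignment step described above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the variant names, the number of users, and the two "true" conversion probabilities are made up for illustration and are not taken from any real campaign.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # fixed seed so the sketch is reproducible

n_users = 10_000
# Hypothetical true conversion probabilities for the two variants (assumed values)
true_rates = {"A": 0.020, "B": 0.025}

# Randomly assign each user to variant A or B with equal probability
assignment = rng.choice(["A", "B"], size=n_users)

# Simulate whether each user converts, given the variant they were assigned
p = np.where(assignment == "A", true_rates["A"], true_rates["B"])
converted = rng.random(n_users) < p

# Compare the observed conversion rate of each variant
for variant in ("A", "B"):
    mask = assignment == variant
    print(variant, "users:", mask.sum(), "rate:", round(converted[mask].mean(), 4))
```

Because assignment is random, the two groups should be comparable apart from the variant they saw, which is what lets a difference in observed rates be attributed to the variant.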

## Data
In this notebook, I demonstrate A/B testing in Python on [an example ad campaign dataset on Kaggle](https://www.kaggle.com/datasets/faviovaz/marketing-ab-testing). In this dataset, the majority of people are exposed to ads (the experimental group), while a small portion (the control group) instead see a Public Service Announcement (PSA), or nothing, in the exact size and place the ad would normally occupy.

The goal is to analyze the groups, determine whether the ads were successful, estimate how much the company can make from the ads, and test whether the difference between the groups is statistically significant.

Data dictionary:

- Index: Row index  
- user id: User ID (unique)  
- test group: "ad" if the person saw the advertisement, "psa" if they only saw the public service announcement  
- converted: True if the person bought the product, otherwise False  
- total ads: Number of ads seen by the person  
- most ads day: Day of the week on which the person saw the most ads  
- most ads hour: Hour of the day in which the person saw the most ads  
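
The data dictionary can be turned into a few sanity checks. The sketch below is a minimal example, not part of the original analysis: `check_schema` is a hypothetical helper, the three rows are a small sample of the dataset used only for illustration, and the 0 to 23 range for the hour column is an assumption.

```python
import pandas as pd

# A few rows from the dataset, used only to illustrate the checks
sample = pd.DataFrame({
    "user id": [1069124, 1119715, 1144181],
    "test group": ["ad", "ad", "ad"],
    "converted": [False, False, False],
    "total ads": [130, 93, 21],
    "most ads day": ["Monday", "Tuesday", "Tuesday"],
    "most ads hour": [20, 22, 18],
})

def check_schema(frame):
    """Return a list of problems found; an empty list means every check passed."""
    problems = []
    if not frame["user id"].is_unique:
        problems.append("duplicate user ids")
    if not frame["test group"].isin(["ad", "psa"]).all():
        problems.append("unexpected test group label")
    if frame["converted"].dtype != bool:
        problems.append("converted is not boolean")
    if (frame["total ads"] < 0).any():
        problems.append("negative ad count")
    # Assumption: hours are recorded on a 0-23 clock
    if not frame["most ads hour"].between(0, 23).all():
        problems.append("hour outside 0-23")
    return problems

print(check_schema(sample))
```

The same function could be called on the full dataframe once it is loaded.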

In [1]:
#import libraries
import numpy as np
import pandas as pd
from scipy.stats import ttest_ind

import matplotlib.pyplot as plt

In [2]:
#load the dataset
df = pd.read_csv('/kaggle/input/marketing-ab-testing/marketing_AB.csv', index_col=0)
df.head(10)

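|   | user id | test group | converted | total ads | most ads day | most ads hour |
|---|---------|------------|-----------|-----------|--------------|---------------|
| 0 | 1069124 | ad | False | 130 | Monday | 20 |
| 1 | 1119715 | ad | False | 93 | Tuesday | 22 |
| 2 | 1144181 | ad | False | 21 | Tuesday | 18 |
| 3 | 1435133 | ad | False | 355 | Tuesday | 10 |
| 4 | 1015700 | ad | False | 276 | Friday | 14 |
| 5 | 1137664 | ad | False | 734 | Saturday | 10 |
| 6 | 1116205 | ad | False | 264 | Wednesday | 13 |
| 7 | 1496843 | ad | False | 17 | Sunday | 18 |
| 8 | 1448851 | ad | False | 21 | Tuesday | 19 |
| 9 | 1446284 | ad | False | 142 | Monday | 14 |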


In [3]:
#show dataframe info
df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 588101 entries, 0 to 588100
Data columns (total 6 columns):
 #   Column         Non-Null Count   Dtype 
---  ------         --------------   ----- 
 0   user id        588101 non-null  int64 
 1   test group     588101 non-null  object
 2   converted      588101 non-null  bool  
 3   total ads      588101 non-null  int64 
 4   most ads day   588101 non-null  object
 5   most ads hour  588101 non-null  int64 
dtypes: bool(1), int64(3), object(2)
memory usage: 27.5+ MB


In this example, the dataset has 6 columns, and the main outcome variable `converted` takes Boolean values of True or False.
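
Because `converted` is Boolean, its mean is the share of True values, which is the conversion rate. A minimal sketch with made-up values (not taken from the dataset):

```python
import pandas as pd

# Toy Boolean outcome (made-up values, not taken from the dataset)
converted = pd.Series([True, False, False, True, False])

print(converted.sum())   # number of True values
print(converted.mean())  # share of True values, i.e. the conversion rate
```

This is also why a test that compares group means can be applied to a True/False outcome: comparing means is the same as comparing conversion rates.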

## Data Analysis

### Contingency table

In [4]:
# Create a summary table (contingency table) using pd.crosstab()
summary_table = pd.crosstab(df['test group'], df['converted'], margins=True)
summary_table

| test group | converted = False | converted = True | All |
|------------|-------------------|------------------|-----|
| ad | 550154 | 14423 | 564577 |
| psa | 23104 | 420 | 23524 |
| All | 573258 | 14843 | 588101 |
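
The conversion rate of each group, and the lift of the ad group over the PSA group, follow directly from the counts in the contingency table. The sketch below hard-codes those counts so that it runs on its own. The variable names and the lift definitions (absolute difference, and that difference relative to the PSA rate) are my own choices for illustration.

```python
# Counts copied from the contingency table above
ad_converted, ad_total = 14423, 564577
psa_converted, psa_total = 420, 23524

# Conversion rate = converted users / all users in the group
ad_rate = ad_converted / ad_total
psa_rate = psa_converted / psa_total

# Absolute lift: difference in rates; relative lift: difference as a share of the PSA rate
absolute_lift = ad_rate - psa_rate
relative_lift = absolute_lift / psa_rate

print(f"ad conversion rate:  {ad_rate:.4f}")
print(f"psa conversion rate: {psa_rate:.4f}")
print(f"absolute lift:       {absolute_lift:.4f}")
print(f"relative lift:       {relative_lift:.1%}")
```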


### t-test

In [5]:
# Perform t-test
group_A = df[df['test group'] == 'ad']['converted']
group_B = df[df['test group'] == 'psa']['converted']
t_statistic, p_value = ttest_ind(group_A, group_B)

# Print t-test results
print("T-test results:")
print("Group 'ad' conversion rate:", round(group_A.mean(), 4))
print("Group 'psa' conversion rate:", round(group_B.mean(), 4))
print("T-statistic: {:.2f}".format(t_statistic))
print("P-value: {:.2e}".format(p_value))

T-test results:
Group 'ad' conversion rate: 0.0255
Group 'psa' conversion rate: 0.0179
T-statistic: 7.37
P-value: 1.70e-13
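
As a cross-check, a binary outcome such as `converted` is often analyzed with a two-proportion z-test. This is a different technique from the t-test used above, sketched here only for comparison. It uses the counts from the contingency table and the standard pooled-proportion formula; the variable names are my own. With samples this large, the z statistic should come out very close to the t statistic.

```python
import math
from scipy.stats import norm

# Counts copied from the contingency table
ad_converted, ad_total = 14423, 564577
psa_converted, psa_total = 420, 23524

p_ad = ad_converted / ad_total
p_psa = psa_converted / psa_total

# Pooled conversion rate under the null hypothesis that both groups share one rate
p_pool = (ad_converted + psa_converted) / (ad_total + psa_total)

# Standard error of the difference in proportions under the null hypothesis
se = math.sqrt(p_pool * (1 - p_pool) * (1 / ad_total + 1 / psa_total))

z = (p_ad - p_psa) / se
p_value = 2 * norm.sf(abs(z))  # two-sided p-value from the standard normal

print("Z-statistic: {:.2f}".format(z))
print("P-value: {:.2e}".format(p_value))
```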
