<a href="https://colab.research.google.com/github/RajkumarShenigaram/Marketing-Campaign-Effectiveness/blob/main/AB_Testing_Marketing_Campaign_Effectiveness.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

#A/B testing helps a business find a better approach to acquiring customers, marketing products, and extending its reach, with the goal of converting more of its target customers into actual customers.

Below are all the features in the dataset:

- `Campaign Name`: The name of the campaign
- `Date`: Date of the record
- `Spend [USD]`: Amount spent on the campaign in dollars
- `# of Impressions`: Number of impressions the ad received during the campaign
- `Reach`: Number of unique impressions the ad received
- `# of Website Clicks`: Number of website clicks received through the ads
- `# of Searches`: Number of users who performed searches on the website
- `# of View Content`: Number of users who viewed content and products on the website
- `# of Add to Cart`: Number of users who added products to the cart
- `# of Purchase`: Number of purchases

Two campaigns were performed by the company:

Control Campaign  
Test Campaign

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.stats import mannwhitneyu, ttest_ind, levene, shapiro

import warnings
warnings.simplefilter('ignore')


#Importing and combining Control and Test data sets
- Reading the files with the `;` delimiter splits them into proper columns, so pandas can infer numeric data types, which is very beneficial. The resulting data types still need to be quality-assured.

In [None]:
import pandas as pd
import gdown

# File ID from Google Drive link
file_id = "1TSMfQt0yGswOzf4gKn1MitZ21G_j1xml"
url = f"https://drive.google.com/uc?id={file_id}"

# Download the file
output = "control_group.csv"
gdown.download(url, output, quiet=False)

# Read the CSV file into Pandas
control_group = pd.read_csv(output, sep=";")
control_group.info()

The integer and float data types are as we expect for our analysis, except for the date column, which we will convert in a later step.
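As a rough illustration of why the delimiter matters for type inference, the sketch below reads a tiny in-memory sample with and without `sep=";"` and then checks two numeric columns. The sample rows and the `expected` mapping are made up for this example and are not taken from the real files.

```python
import io

import pandas as pd
from pandas.api.types import is_float_dtype, is_integer_dtype

# Hypothetical two-row sample in the same semicolon-delimited layout
sample_csv = (
    "Campaign Name;Date;Spend [USD];# of Impressions\n"
    "Control Campaign;01.08.2019;100;1000.0\n"
    "Control Campaign;02.08.2019;200;2000.0\n"
)

with_semicolon = pd.read_csv(io.StringIO(sample_csv), sep=";")
with_default = pd.read_csv(io.StringIO(sample_csv))  # default comma delimiter

# Without the right delimiter, every line collapses into a single text column
print(with_default.shape[1], "column vs", with_semicolon.shape[1], "columns")

# Assumed expectations for this sample: which type check each column should pass
expected = {"Spend [USD]": is_integer_dtype, "# of Impressions": is_float_dtype}
checks = {col: bool(check(with_semicolon[col])) for col, check in expected.items()}
print(checks)
```

The same kind of check could be run against `control_group` and `test_group` once they are loaded.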

In [None]:
control_group.head()

In [None]:
# File ID from Google Drive link
file_id = "10SANed-I7S1eiW-29u2q3np823r6zo-x"
url = f"https://drive.google.com/uc?id={file_id}"

# Download the file
output = "test_group.csv"
gdown.download(url, output, quiet=False)

# Read the CSV file into Pandas
test_group = pd.read_csv(output, sep=";")
test_group.head()

In [None]:
full_df = pd.concat([control_group, test_group], axis=0)

#View a random sample of the full dataset

In [None]:
full_df.sample(n=5)

#Renaming the columns for ease of use

In [None]:
# Renaming the columns for ease of use
df_renamed = full_df.rename(
    columns=lambda x: x.replace("# of ", "") if x.startswith("# of ") else x
)
df_renamed = df_renamed.rename(columns={"Campaign Name": "Campaign"})

In [None]:
df_renamed[df_renamed.isnull().any(axis=1)]

Only one row has missing values. Since this is only about 3% of the data, we can safely drop it. As a rule of thumb, dropping up to 5-10% of the rows is tolerable; beyond that, we would have to use imputation methods.
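The drop-or-impute decision can be sketched as a small calculation. The frame below is a made-up stand-in with 30 rows, one of which has a missing value, and the `threshold` of 5% is an assumption taken from the lower end of the rule of thumb.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: 30 rows, exactly one of them with a missing value
toy = pd.DataFrame(
    {"Spend": range(30), "Purchase": [float(i) for i in range(30)]}
)
toy.loc[4, "Purchase"] = np.nan

# Share of rows that contain at least one missing value
missing_share = toy.isnull().any(axis=1).mean()

threshold = 0.05  # assumed tolerance (lower end of the 5-10% rule of thumb)
action = "drop rows" if missing_share <= threshold else "consider imputation"
print(f"{missing_share:.1%} of rows have missing values -> {action}")
```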

In [None]:
df_cleaned = df_renamed.dropna()
df_cleaned["Campaign"].value_counts()

In [None]:
df_cleaned.dtypes

Converting the Date field to a datetime type and the float columns to integers

In [None]:
df = df_cleaned.copy()
df["Date"] = pd.to_datetime(df["Date"], format="%d.%m.%Y")
df[df.select_dtypes(include=["float64"]).columns] = df.select_dtypes(
    include=["float64"]
).astype("int")

df.dtypes

Checking the data set date range

In [None]:
df["Date"].dt.year.value_counts(), df["Date"].dt.month.value_counts()

All of the dates fall in August 2019.
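Another way to confirm the date range is to look at the minimum, the maximum, and the distinct year-month periods. This is only a sketch on made-up dates that follow the same day.month.year layout; the three dates below are assumptions, not rows from the dataset.

```python
import pandas as pd

# Hypothetical dates in the same day.month.year layout as the dataset
dates = pd.to_datetime(
    pd.Series(["01.08.2019", "15.08.2019", "30.08.2019"]), format="%d.%m.%Y"
)

# Earliest and latest record
print(dates.min(), dates.max())

# Distinct year-month periods; a single entry means one calendar month
months = dates.dt.to_period("M").unique()
print(months)
```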

#Summary of the data

In [None]:
pd.set_option("display.max_columns", None)
df.drop(columns=["Date"]).groupby("Campaign").describe()

Correlation Analysis

In [None]:
corr_matrix = df.drop(columns=["Campaign", "Date"]).corr()
sns.heatmap(corr_matrix, annot=True, cmap="coolwarm")

1. Ad spend is only weakly correlated with purchases (0.031 correlation): higher spend is not associated with noticeably more sales, which suggests inefficient budget allocation.
2. Searches and View Content are strongly correlated (0.89 correlation): users who search also engage with content, but many don’t convert to buyers.
3. "Add to Cart" and Purchases are only moderately correlated (0.39 correlation): cart abandonment appears to be an issue, which calls for checkout optimization and retargeting.
4. Website clicks are barely correlated with purchases (~0.03 correlation): traffic is not leading to sales, so landing pages and CTAs need improvement.
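Note that the matrix above pools both campaigns, because the `Campaign` column is dropped before calling `corr`. A pooled correlation can differ from the correlation inside each campaign, so it may be worth comparing the two. The sketch below uses made-up numbers chosen to exaggerate the effect; it says nothing about the real data.

```python
import pandas as pd

# Hypothetical data: within each campaign Spend and Purchase rise together,
# but the Test campaign spends more and sells less overall
toy = pd.DataFrame(
    {
        "Campaign": ["Control Campaign"] * 3 + ["Test Campaign"] * 3,
        "Spend": [1, 2, 3, 11, 12, 13],
        "Purchase": [10, 11, 12, 1, 2, 3],
    }
)

# Correlation over all rows, ignoring the campaign label
pooled = toy["Spend"].corr(toy["Purchase"])

# Correlation computed separately inside each campaign
per_campaign = {
    name: group["Spend"].corr(group["Purchase"])
    for name, group in toy.groupby("Campaign")
}

print(f"pooled: {pooled:.2f}")
print(per_campaign)
```

In this toy case the pooled correlation is negative even though both within-campaign correlations are positive.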

Distribution Analysis

In [None]:
def plot_numeric(func):
    fig, axes = plt.subplots(2, 4, figsize=(15, 10))
    axes = axes.flatten()

    numeric_cols = df.select_dtypes(include=["int"]).columns.to_list()


    for i, col in enumerate(numeric_cols):
        func(data=df, x="Campaign", y=col, ax=axes[i], hue="Campaign")
        axes[i].set_title(col)
        axes[i].set_ylabel("")

    plt.tight_layout()
    plt.show()


plot_numeric(sns.violinplot)

1. Test Campaign Shows Lower Variability in Searches & View Content → Users in the Test group are more consistent in engagement, and the smaller range suggests fewer outliers.

2. Higher Spend in the Test Campaign Does Not Clearly Translate to More Purchases → Despite spending more consistently, purchase distributions remain similar, indicating potential inefficiencies in conversion.

3. Control Campaign Has a Higher Spread in Key Metrics (Impressions, Clicks, Add to Cart) → The wider distributions suggest that the Control campaign may be reaching a broader audience, but engagement consistency is lower.
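The visual impression of spread can be backed up with numbers such as the standard deviation and the interquartile range per campaign. The sketch below runs on a made-up `Searches` column; the `iqr` helper and the toy values are assumptions for illustration.

```python
import pandas as pd

# Hypothetical values: Control is spread out, Test is tightly clustered
toy = pd.DataFrame(
    {
        "Campaign": ["Control Campaign"] * 4 + ["Test Campaign"] * 4,
        "Searches": [10, 20, 30, 40, 24, 25, 26, 27],
    }
)


def iqr(s):
    """Interquartile range: distance between the 75th and 25th percentiles."""
    return s.quantile(0.75) - s.quantile(0.25)


# One row per campaign with two spread measures
spread = toy.groupby("Campaign")["Searches"].agg(std="std", iqr=iqr)
print(spread)
```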

#Normality tests

We use the Shapiro-Wilk test, since it works well on small samples (n < 50).
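To get a feel for how the test behaves at this sample size, the sketch below applies `shapiro` to two constructed samples of 30 values: one built from ideal normal quantiles and one with a single extreme outlier. Both samples are assumptions made for illustration.

```python
import numpy as np
from scipy.stats import norm, shapiro

n = 30
# Assumed illustration data: ideal normal quantiles, bell-shaped by construction
bell_shaped = norm.ppf((np.arange(1, n + 1) - 0.5) / n)
# The values 1..29 plus one extreme outlier, also 30 observations
with_outlier = np.append(np.arange(1.0, n), 10_000.0)

for label, sample in [("bell-shaped", bell_shaped), ("with outlier", with_outlier)]:
    stat, p = shapiro(sample)
    # Same decision rule as in the notebook: p < 0.05 means "Not Normal"
    verdict = "Not Normal" if p < 0.05 else "Normal"
    print(f"{label}: W={stat:.3f}, p={p:.4f} -> {verdict}")
```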

In [None]:
def check_normality(col, alpha=0.05):
    stat, p = shapiro(col)

    if p < alpha:
        return "Not Normal"
    return "Normal"


numeric_cols = df.select_dtypes(include=["int"]).columns.to_list()


for col in numeric_cols:
    control = df[df["Campaign"] == "Control Campaign"][col]
    test = df[df["Campaign"] == "Test Campaign"][col]
    print(
        f"{col}: Control is {check_normality(control)} and Test is {check_normality(test)}"
    )

#Hypothesis Testing

In [None]:
def ab_test(col, alpha=0.05, alt="two-sided"):
    control = df[df["Campaign"] == "Control Campaign"][col]
    test = df[df["Campaign"] == "Test Campaign"][col]

    if check_normality(control) == "Normal" and check_normality(test) == "Normal":
        stat, p_lev = levene(control, test)
        equal_var = p_lev > alpha

        stat, p = ttest_ind(control, test, equal_var=equal_var, alternative=alt)
    else:
        stat, p = mannwhitneyu(control, test, alternative=alt)

    # Compare against the alpha argument rather than a hard-coded 0.05
    conclusion = "Reject H0" if p < alpha else "Fail to Reject H0"
    print(f"{col}: {p} and we {conclusion}")


###Spend

*Interpretation*: We check whether the spend on the test group is significantly different from the control.

- **Null Hypothesis:** The means of the two campaigns are equal.
- **Alternative Hypothesis:** The means of the two campaigns are not equal.
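When both groups pass the normality check, `ab_test` uses Levene's test to decide between Student's t-test (equal variances) and Welch's t-test (unequal variances). That step can be seen in isolation in the sketch below; the `control` and `test` arrays are constructed for illustration and are not the campaign data.

```python
import numpy as np
from scipy.stats import levene, norm, ttest_ind

n = 30
base = norm.ppf((np.arange(1, n + 1) - 0.5) / n)  # bell-shaped by construction
control = 100 + 5 * base   # hypothetical values with a narrow spread
test = 103 + 20 * base     # hypothetical values with a wide spread

# Levene's test: a small p-value suggests the variances differ
_, p_lev = levene(control, test)
equal_var = bool(p_lev > 0.05)

# equal_var=False makes ttest_ind run Welch's t-test
result = ttest_ind(control, test, equal_var=equal_var, alternative="two-sided")
name = "Student's t-test" if equal_var else "Welch's t-test"
print(f"Levene p={p_lev:.4f} -> {name}, p={result.pvalue:.4f}")
```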

In [None]:
ab_test("Spend [USD]", alt="two-sided")

Because the p-value is below the 5% significance level, we reject the null hypothesis and conclude that **the two campaigns DO NOT have equal means**.

The plots suggested that the test campaign has higher Spend [USD] values; let's confirm this with a one-sided test.

- **Null Hypothesis:** The means of the two campaigns are equal.
- **Alternative Hypothesis:** The mean of the test campaign is higher than the control.
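The direction of a one-sided test depends on the argument order. With the control sample passed first, `alternative="less"` asks whether control values tend to be lower than test values. The sketch below shows this on made-up samples where every test value exceeds every control value.

```python
from scipy.stats import mannwhitneyu

# Hypothetical samples: every test value exceeds every control value
control = list(range(1, 11))   # 1..10
test = list(range(11, 21))     # 11..20

# "less": is control stochastically lower than test?
_, p_less = mannwhitneyu(control, test, alternative="less")
# "greater": is control stochastically higher than test?
_, p_greater = mannwhitneyu(control, test, alternative="greater")

print(f"less: p={p_less:.5f}, greater: p={p_greater:.5f}")
```

Here only the `"less"` alternative yields a small p-value, which matches the `# control < test` reading used in the next cell.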

In [None]:
ab_test("Spend [USD]", alt="less")  # control < test

####Conclusion

The spend on the test campaign is statistically significantly higher than on the control campaign.

###Purchase

*Interpretation*: There doesn't seem to be a significant difference between the campaigns.

- **Null Hypothesis:** There is no difference between the two campaigns with respect to Purchases.
- **Alternative Hypothesis:** Purchases differ between the two campaigns.

In [None]:
ab_test("Purchase", alt="two-sided")

####Conclusion

As expected, there is no statistically significant difference between the two campaigns with respect to Purchases.

###Reach, Website Clicks, Searches, and Add to Cart

In [None]:
ab_test("Reach", alt="greater")  # control > test

In [None]:
ab_test("Website Clicks", alt="two-sided")

In [None]:
ab_test("Searches", alt="two-sided")

In [None]:
ab_test("Add to Cart", alt="greater")