Visual Clash: A/B Testing Watch Screens to Elevate Marketing Strategies
The watch company 'Timetrends' is running a study to boost sales by investigating changes to its watch screen design. Two campaigns have been set up: a control campaign using the existing watch screen design (Design A) and a test campaign using a slightly modified design with a different call to action (CTA) at the bottom of the page. Rather than attempting a complete redesign, the best practice is to make targeted changes to specific parts of the screen and measure how these adjustments affect profit and engagement.

Dataset Features:

Campaign Name: Identifies each advertising campaign in the dataset.

Date: Records the date of each entry, providing a temporal dimension to the analysis.

Spend: Reflects the amount invested in each campaign in dollars.

Impressions and Reach: Measure the visibility and unique impressions achieved by the ads.

Website Clicks, Searches, View Content, Add to Cart, Purchase: User interactions captured during the campaigns, providing insights into engagement and conversion.

Hypothesis: Our project hinges on a clear hypothesis:

Null Hypothesis (H0): The new watch screen design does not yield a significant improvement in marketing campaign performance, measured by user engagement and sales metrics, compared to the existing design.

Alternative Hypothesis (H1): The new watch screen design outperforms the existing design, resulting in a notable enhancement of user engagement and sales metrics in marketing campaigns.

Key Performance Indicators (KPIs):

In selecting KPIs for this test, high-level metrics are prioritized to ensure a comprehensive evaluation of campaign success. The primary KPI should be a measure of profitability, such as ROI, to reflect the overall impact of the campaign. While CTR is also an important metric, it should serve as a secondary KPI. If the test design yields a higher CTR but does not improve ROI, it may not be worth implementing.

Primary KPI:

Return on Investment (ROI): This metric reflects the profitability of the campaign and should be the main focus.

Secondary KPIs:

Click-Through Rate (CTR): Percentage of impressions that resulted in a click.

Conversion Rate: Percentage of clicks that resulted in a purchase.
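
To make the two secondary KPI definitions concrete, here is a minimal sketch with made-up figures (they are not taken from the Timetrends dataset, and the function names are only illustrative):

```python
def click_through_rate(clicks, impressions):
    """Return CTR as the percentage of impressions that resulted in a click."""
    return clicks / impressions * 100


def conversion_rate(purchases, clicks):
    """Return the percentage of clicks that resulted in a purchase."""
    return purchases / clicks * 100


# Hypothetical figures for a single campaign day
example_ctr = click_through_rate(clicks=500, impressions=10_000)
example_conversion = conversion_rate(purchases=25, clicks=500)

print(f"CTR: {example_ctr:.2f}%")
print(f"Conversion Rate: {example_conversion:.2f}%")
```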

Your company has tasked you with determining which watch screen design changes work better for boosting sales and engagement. To achieve this, an A/B test will be conducted. Your job is to be the data detective, analyzing the numbers and providing a clear verdict on which watch screen design is the superior choice. Your findings will guide future decisions on how Timetrends showcases its watches to the world.

Get ready to dig into the data and let the company know which watch screen design is the real superstar for our ads. Your analysis will shape the path for our marketing strategies moving forward.

Task 1: Loading Control Campaign Data
Before we start our analysis, we need to view the control campaign dataset. It's essential to inspect the data and identify the columns it contains. Let's take a look.

In [None]:
#--- Import Pandas ---
import pandas as pd

#--- Read in dataset (control_group.csv) ----
# ---WRITE YOUR CODE FOR TASK 1 ---
# Define the file path
file_path = './control_group.csv'

# Load the data into a DataFrame using ';' as the separator
control_df = pd.read_csv(file_path, sep=';')

# Display the first few rows of the DataFrame to confirm it's loaded correctly
control_df.head()
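
The `sep=';'` argument matters because `pd.read_csv` assumes comma-separated values by default. The sketch below illustrates the difference on a tiny in-memory sample; the column names and values are invented for illustration and are not the real contents of control_group.csv.

```python
import io

import pandas as pd

# Hypothetical semicolon-delimited content (not the real file)
sample_csv = "Campaign Name;Date;Spend\nDemo Campaign;2024-01-01;100\n"

# With the default comma separator, each row is read as one single column
wrong_df = pd.read_csv(io.StringIO(sample_csv))

# With sep=';' the three fields are split into separate columns
right_df = pd.read_csv(io.StringIO(sample_csv), sep=';')

print(wrong_df.shape)
print(right_df.shape)
```

If a freshly loaded DataFrame shows a single wide column, a mismatched separator is a likely cause.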

In [None]:
Task 2: Loading Test Campaign Data
Excellent work on reviewing the control campaign data! Now, let's continue the progress. We're loading the data from the test campaign.

In [None]:
#--- Read in dataset (test_group.csv) ----
# ---WRITE YOUR CODE FOR TASK 2 ---
import pandas as pd

# Define the file path
file_path = './test_group.csv'

# Load the data into a DataFrame using ';' as the separator
test_df = pd.read_csv(file_path, sep=';')

# Display the first few rows of the DataFrame to confirm it's loaded correctly
test_df.head()


In [None]:
Task 3: Simplifying Column Names for Better Understanding
Good job with the test campaign data! The column names in the control campaign are a bit tricky. Let's make them simpler and easier to understand. Ready for the next step?

In [None]:
# --- WRITE YOUR CODE FOR TASK 3 ---
# Rename the columns
control_df.columns = ['Campaign Name', 'Date', 'Amount Spent', 'Impressions', 'Reach', 'Number of Clicks', 'Number of Searches', 'Number of views', 'Number Added to cart', 'Purchase Number']

# Display the first few rows of the DataFrame to confirm the column names are updated
control_df.head()

#--- Inspect data ---

In [None]:
Task 4: Friendly Names for the Test Campaign Data
Awesome work organizing the control campaign columns! Now, let's do the same for the test campaign. We need simpler and clearer names for better understanding. Can you assist by suggesting clearer names? Your input is valuable in making our data more user-friendly!



In [None]:
# --- WRITE YOUR CODE FOR TASK 4 ---

# Rename the columns
test_df.columns = ['Campaign Name', 'Date', 'Amount Spent', 'Impressions', 'Reach', 'Number of Clicks', 'Number of Searches', 'Number of views', 'Number Added to cart', 'Purchase Number']
# Display the first few rows of the DataFrame to confirm the column names are updated
test_df.head()

#--- Inspect data ---

In [None]:
Task 5: Checking for Missing Values in Control Campaign Data
Fantastic progress! Now, let's check for any missing values in our control campaign data. We need to count how many blanks each column contains. Let's check it.

In [None]:
# --- WRITE YOUR CODE FOR TASK 5 ---

# Count the number of null values in each column
null_sum_controldf = control_df.isnull().sum()

# Display the result
null_sum_controldf
#--- Inspect data ---

In [None]:
Task 6: Handling Missing Values in Control Campaign Data
Excellent job spotting missing values in the control campaign data! Now, let's ensure our data is complete. We're filling in missing values using the average for each column. Let's make our data more complete.

In [None]:
# --- WRITE YOUR CODE FOR TASK 6 ---
# Fill missing values with the mean of their respective columns
control_df.fillna(control_df.mean(numeric_only=True), inplace=True)

# Display the first few rows of the DataFrame to confirm the missing values are filled
control_df.head()

#--- Inspect data ---
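
As a small illustration of what mean imputation does, the sketch below applies the same `fillna` pattern to a toy DataFrame. The values are invented, and the toy frame returns a new DataFrame instead of modifying it in place so the before and after can be compared.

```python
import numpy as np
import pandas as pd

# Toy data with one missing value in each numeric column (hypothetical values)
demo_df = pd.DataFrame({
    'Campaign Name': ['Demo', 'Demo', 'Demo'],
    'Impressions': [100.0, np.nan, 300.0],
    'Purchase Number': [10.0, 20.0, np.nan],
})

# Column means are computed over the non-missing values only
column_means = demo_df.mean(numeric_only=True)

# Each NaN is replaced by the mean of its own column; text columns are untouched
filled_df = demo_df.fillna(column_means)

print(filled_df)
```

Mean imputation keeps each column's average unchanged but slightly reduces its variance, which is worth keeping in mind when many values are missing.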

In [None]:
Task 7: Checking for Missing Values in Test Campaign Data
Great work so far! Now, let's do the same for the test campaign. We're looking for any missing values. Ready to ensure our test data is complete and good to go?

In [None]:
# --- WRITE YOUR CODE FOR TASK 7 ---
# Count the number of null values in each column
null_sum_testdf = test_df.isnull().sum()

# Display the result
null_sum_testdf

#--- Inspect data ---

In [None]:
Task 8: Understanding Control Campaign Data Numbers
Good work! Now, let's look into the control campaign data. We want to find out important details like average spending, ad reach, clicks, and more. Your task is to get these insights from the numbers. Let's dig in and guide our marketing strategy. Let's go!

In [None]:
# --- WRITE YOUR CODE FOR TASK 8 ---
# Generate descriptive statistics for the control group data
control_describe = control_df.describe()

# Display the result
control_describe

#--- Inspect data ---

In [None]:
Task 9: Understanding Test Campaign Numbers
Making good progress! Now, let's focus on our test campaign. We're interested in basics like average spending, reach, and more. Your task is to dive into these details and simplify the numbers for us. Let's explore the insights that will guide our marketing strategy. Let's get started!

In [None]:
# --- WRITE YOUR CODE FOR TASK 9 ---
# Generate descriptive statistics for the test group data
test_describe = test_df.describe()

# Display the result
test_describe

#--- Inspect data ---

In [None]:
Task 10: Understanding Purchase Numbers Distribution
Fantastic! Now, let's look into the distribution of our key metric, the 'Purchase Number.' We aim to understand how it's spread out. Your task is to run a normality check (the Shapiro-Wilk test) on this metric for both groups and note the results.

In [None]:
#--- Import shapiro from scipy.stats ---
from scipy.stats import shapiro
# --- WRITE YOUR CODE FOR TASK 10 ---
# Perform Shapiro-Wilk tests for 'Purchase Number' column
shapiro_control = shapiro(control_df['Purchase Number'])
shapiro_test = shapiro(test_df['Purchase Number'])

# Create DataFrame for Shapiro-Wilk test results
shapiro_results = pd.DataFrame({
    'Group': ['Control', 'Test'],
    'Test Statistic': [shapiro_control.statistic, shapiro_test.statistic],
    'P-value': [shapiro_control.pvalue, shapiro_test.pvalue]
})

# Display the Shapiro-Wilk test results
shapiro_results

#--- Inspect data ---

In [None]:
When the p-value is higher than 0.05 (common significance level), it means we don't have enough evidence to say the data significantly deviates from a normal distribution. On the other hand, if the p-value is lower than 0.05, it suggests the data significantly deviates from a normal distribution.
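
The decision rule above can be tried out on synthetic data. The sketch below draws one roughly normal sample and one strongly skewed sample (both generated here purely for illustration, unrelated to the campaign data) and applies the 0.05 threshold to each.

```python
import numpy as np
from scipy.stats import shapiro

# Synthetic samples for illustration only
rng = np.random.default_rng(seed=0)
normal_sample = rng.normal(loc=500, scale=50, size=200)
skewed_sample = rng.exponential(scale=500, size=200)

alpha = 0.05  # common significance level

for name, sample in [('normal', normal_sample), ('skewed', skewed_sample)]:
    result = shapiro(sample)
    if result.pvalue < alpha:
        verdict = 'significant deviation from normality'
    else:
        verdict = 'no evidence against normality'
    print(f"{name}: W={result.statistic:.3f}, p={result.pvalue:.4f} -> {verdict}")
```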

In [None]:
Task 11: Comparing Purchase Numbers with a T-Test
Great job so far! Now, let's compare the 'Purchase Number' between our control and test groups. We'll use a t-test to check for a significant difference. Your task is to run this test and share what the numbers reveal. Ready to see if there's a standout performer between the groups? Let's find out!

In [None]:
#--- Import ttest_ind from scipy.stats ---
from scipy.stats import ttest_ind
# --- WRITE YOUR CODE FOR TASK 11 ---
# Perform an independent two-sample t-test on the 'Purchase Number' column
t_stat, p_value = ttest_ind(control_df['Purchase Number'], test_df['Purchase Number'])

# Display the t-statistic and p-value together
# (a notebook cell only shows the value of its last expression)
t_stat, p_value

#--- Inspect data ---
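
Since H1 is directional (the new design outperforms the existing one), a one-sided test may match the hypothesis more closely than the default two-sided test. The following is a sketch on invented numbers, assuming a SciPy version (1.6 or later) in which `ttest_ind` accepts the `alternative` argument.

```python
from scipy.stats import ttest_ind

# Hypothetical daily purchase counts, for illustration only
control_purchases = [10, 12, 11, 13, 12]
test_purchases = [14, 15, 13, 16, 15]

# Default: two-sided test (is there any difference in means?)
two_sided = ttest_ind(test_purchases, control_purchases)

# One-sided test: is the mean of the FIRST sample (test) greater than the second (control)?
one_sided = ttest_ind(test_purchases, control_purchases, alternative='greater')

print(f"two-sided p-value: {two_sided.pvalue:.4f}")
print(f"one-sided p-value: {one_sided.pvalue:.4f}")
```

Note that the argument order determines what 'greater' means, so the test group is passed first here.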

In [None]:
Task 12: Comparing Cost per Conversion in Control and Test Campaigns
Good work! Now, let's discuss cost-effectiveness. Your task is to compute the 'Cost per Conversion' (amount spent divided by the number of purchases) for both the control and test groups, then determine the average for each. Let's see which campaign is more budget-friendly for turning spend into purchases. Let's analyze the numbers!

In [None]:
# --- WRITE YOUR CODE FOR TASK 12 ---
# Calculate 'Cost per Conversion' (amount spent per purchase) for both control group and test group
control_df['Cost per Conversion'] = control_df['Amount Spent'] / control_df['Purchase Number']
test_df['Cost per Conversion'] = test_df['Amount Spent'] / test_df['Purchase Number']

# Compute the mean of 'Cost per Conversion' for both control group and test group
average_cost_control = control_df['Cost per Conversion'].mean()
average_cost_test = test_df['Cost per Conversion'].mean()


#--- Inspect data ---
# Show both averages together (a notebook cell only displays its last expression)
average_cost_control, average_cost_test
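
One thing to watch for when dividing by 'Purchase Number': a day with zero purchases produces an infinite cost per conversion, which makes the mean infinite as well. The sketch below, on invented numbers, shows one possible guard (treating zero-purchase days as missing) and also the pooled alternative of total spend over total purchases.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one zero-purchase day
demo_df = pd.DataFrame({
    'Amount Spent': [100.0, 200.0, 150.0],
    'Purchase Number': [10, 0, 5],
})

# Naive ratio: the zero-purchase day becomes inf and so does the mean
naive_cost = demo_df['Amount Spent'] / demo_df['Purchase Number']

# Guarded ratio: zero purchases become NaN, which mean() skips by default
safe_cost = demo_df['Amount Spent'] / demo_df['Purchase Number'].replace(0, np.nan)

# Pooled alternative: total spend divided by total purchases
pooled_cost = demo_df['Amount Spent'].sum() / demo_df['Purchase Number'].sum()

print(naive_cost.mean(), safe_cost.mean(), pooled_cost)
```

The mean of daily ratios and the pooled ratio answer slightly different questions, so it helps to state which one is being reported.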

In [None]:
Task 13: Merging Datasets for In-Depth Analysis
Fantastic job! Now, to calculate the primary (ROI) and secondary KPIs (CTR, Conversion Rate, CPC), we need to merge the control and test datasets. Let's proceed with merging the datasets to prepare for deeper analysis.


In [None]:
# --- WRITE YOUR CODE FOR TASK 13 ---
# Add a 'Group' column to identify each campaign, then concatenate the two DataFrames
merged = pd.concat(
    [control_df.assign(Group='Control'), test_df.assign(Group='Test')],
    ignore_index=True
)

# Display the first few rows of the merged DataFrame
merged.head()

#--- Inspect data ---
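
Once both campaigns live in one DataFrame, per-campaign summaries become a single `groupby` call. The sketch below uses two tiny invented frames; the 'Campaign Name' values shown are assumptions for illustration and may differ from the labels in the real files.

```python
import pandas as pd

# Hypothetical stand-ins for the control and test DataFrames
control_demo = pd.DataFrame({
    'Campaign Name': ['Control Campaign'] * 2,
    'Amount Spent': [100.0, 200.0],
    'Purchase Number': [10, 20],
})
test_demo = pd.DataFrame({
    'Campaign Name': ['Test Campaign'] * 2,
    'Amount Spent': [150.0, 250.0],
    'Purchase Number': [30, 50],
})

# Stack the rows and rebuild a clean 0..n-1 index
merged_demo = pd.concat([control_demo, test_demo], ignore_index=True)

# Average spend and purchases per campaign
summary = merged_demo.groupby('Campaign Name')[['Amount Spent', 'Purchase Number']].mean()

print(summary)
```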

In [None]:
Task 14: Enhancing Dataset with ROI, CTR, Conversion Rate, and CPC
Great progress! Now, let's take it a step further by incorporating additional metrics such as Click-Through Rate (CTR), Conversion Rate, Cost Per Click (CPC), and Return On Investment (ROI). These metrics are pivotal in assessing campaign effectiveness. Calculate the primary KPI, Return On Investment (ROI), and the secondary KPIs, Conversion Rate, Click-Through Rate (CTR), and Cost Per Click (CPC), to provide comprehensive insights into our campaign strategies.

Primary KPI:

Return On Investment (ROI): reflects the profitability of our campaigns.

Secondary KPIs:

Click-Through Rate (CTR): This measures how successful an ad has been in capturing users' attention. The higher the CTR, the more successful the ad has been in generating interest.

Conversion Rate: This is the ratio of users who take a desired action (e.g., making a purchase) to the total number of users who clicked on the ad.

CPC (Cost Per Click): This metric determines how much advertisers pay for ads based on the number of clicks. It's essential for marketers to understand the price for their paid advertising campaigns.

Your next task is to smoothly incorporate these metrics into our dataset. Let's go ahead and code it!

In [None]:
# --- WRITE YOUR CODE FOR TASK 14 ---
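# CTR: percentage of impressions that resulted in a click
merged['CTR'] = (merged['Number of Clicks'] / merged['Impressions']) * 100
# Conversion Rate: percentage of clicks that resulted in a purchase
merged['Conversion Rate'] = (merged['Purchase Number'] / merged['Number of Clicks']) * 100
# CPC: amount spent per click
merged['CPC'] = merged['Amount Spent'] / merged['Number of Clicks']
# ROI: the dataset has no revenue column, so 'Purchase Number' (a count) stands in for revenue here;
# treat the result as a rough proxy rather than a true monetary ROI
merged['ROI'] = ((merged['Purchase Number'] - merged['Amount Spent']) / merged['Amount Spent']) * 100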

# Display the first few rows of the merged DataFrame with new metrics
merged.head()
#--- Inspect data ---

In [None]:
Task 15: Comparing Metrics in Control vs. Test Campaigns
Congratulations on reaching the final task! We're now analyzing key metrics—Return On Investment (ROI) as our primary KPI, and Click-Through Rate (CTR), Conversion Rate, and Cost Per Click (CPC) as secondary KPIs—to compare our control and test campaigns.

Using t-tests, we'll identify differences in these metrics and interpret the results based on statistical significance (p-values) and the sign and magnitude of the t-statistics. Note that a t-statistic is not itself an effect size, because it grows with sample size. By comparing Cost per Conversion in Control and Test Campaigns, we'll gain insights into overall campaign performance.

With these insights, we'll conclude our A/B project by addressing the initial Null Hypothesis (H0) and Alternative Hypothesis (H1), guiding future marketing strategies for optimal results.

Your task is to conduct t-tests for each metric and provide insights into whether these differences are significant.

Let's wrap up this journey and determine the winning campaign strategy. Your efforts have been truly commendable!

In [None]:
# Compute the KPI columns on each group's DataFrame if they are not already present
if 'CTR' not in control_df.columns:
    control_df['CTR'] = (control_df['Number of Clicks'] / control_df['Impressions']) * 100
    control_df['Conversion Rate'] = (control_df['Purchase Number'] / control_df['Number of Clicks']) * 100
    control_df['CPC'] = control_df['Amount Spent'] / control_df['Number of Clicks']
    control_df['ROI'] = ((control_df['Purchase Number'] - control_df['Amount Spent']) / control_df['Amount Spent']) * 100

if 'CTR' not in test_df.columns:
    test_df['CTR'] = (test_df['Number of Clicks'] / test_df['Impressions']) * 100
    test_df['Conversion Rate'] = (test_df['Purchase Number'] / test_df['Number of Clicks']) * 100
    test_df['CPC'] = test_df['Amount Spent'] / test_df['Number of Clicks']
    test_df['ROI'] = ((test_df['Purchase Number'] - test_df['Amount Spent']) / test_df['Amount Spent']) * 100

# Perform t-tests for ROI, CTR, Conversion Rate, and CPC
metrics = ['ROI', 'CTR', 'Conversion Rate', 'CPC']
t_test_results = []

for metric in metrics:
    t_stat, p_value = ttest_ind(control_df[metric], test_df[metric])
    t_test_results.append({
        'Metric': metric,
        'T-Statistic': t_stat,
        'P-Value': p_value
    })

# Convert results to DataFrame
t_test_results = pd.DataFrame(t_test_results)

# Display the t-test results
t_test_results
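
A t-statistic depends on sample size, so it is not by itself a measure of how large a difference is. One common standardized effect size is Cohen's d (the difference in means divided by the pooled standard deviation). The helper below is a hypothetical sketch, not part of the original notebook, and is shown on toy numbers.

```python
import numpy as np


def cohens_d(sample_a, sample_b):
    """Return Cohen's d for two independent samples using the pooled standard deviation."""
    a = np.asarray(sample_a, dtype=float)
    b = np.asarray(sample_b, dtype=float)
    n_a, n_b = len(a), len(b)
    # Pooled variance weights each sample variance by its degrees of freedom
    pooled_var = ((n_a - 1) * a.var(ddof=1) + (n_b - 1) * b.var(ddof=1)) / (n_a + n_b - 2)
    return (a.mean() - b.mean()) / np.sqrt(pooled_var)


# Toy example: the means differ by exactly one pooled standard deviation
d_example = cohens_d([1, 2, 3], [2, 3, 4])
print(f"Cohen's d: {d_example:.2f}")
```

Applied to the notebook's data, the same helper could be called with `control_df[metric]` and `test_df[metric]` for each metric in the loop above.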

In [None]:
Understanding p-values in Statistical Testing
In statistical hypothesis testing, the p-value is the probability of observing results at least as extreme as those in the data, assuming the null hypothesis is true. A small p-value (typically ≤ 0.05) suggests strong evidence against the null hypothesis, leading to its rejection. Conversely, a larger p-value means the data do not provide sufficient evidence to reject the null hypothesis; it does not prove that the null hypothesis is true.