
# Task 4: A/B Testing Framework

In [None]:
# Step-by-Step A/B Testing Framework Implementation
# 1. Load and Explore the Data

import pandas as pd

# Load the dataset
df = pd.read_csv('/content/Apple Search Ads Campaigns Cleaned_Dataset.csv')

# Display the first few rows to understand the structure
print(df.head())


   campaign_id                                  campaign_name  start_date  \
0  470311161.0                         0.54 - May 2022 - Arab  31/08/2020   
1  481689442.0                   0.54 - Tier 1 & 2 - May 2022  04/10/2020   
2  469927825.0                       0.54 - Tier 1 - May 2022  29/08/2020   
3  474269632.0                             1.00 - My Keywords  14/09/2020   
4  484006307.0  1.01 - US/OZ - May 2022 - Performing Keywords  12/10/2020   

   status                    app_name              ad_placement  \
0  paused  Sleep Habits: Sleep Better  App Store Search Results   
1  paused  Sleep Habits: Sleep Better  App Store Search Results   
2  paused  Sleep Habits: Sleep Better  App Store Search Results   
3  paused  Sleep Habits: Sleep Better  App Store Search Results   
4  paused  Sleep Habits: Sleep Better  App Store Search Results   

   lifetime_budget  daily_budget   spend  average_cost_per_tap_(cpt)  ...  \
0           5000.0         500.0  103.98                 

# 2. Define Objectives and Hypotheses
**Objective**

- Improve key performance metrics (e.g., conversion rate, CPA) by testing different ad variations.

**Example Hypotheses**

- Ad copy A with a stronger call-to-action will result in a higher conversion rate than ad copy B.
- A budget increase for keyword X will lead to more installs compared to keyword Y.
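
The first hypothesis can be written as a one-sided test on conversion rates. This is one possible formalization, assuming conversion rate is measured as installs per tap; the symbols $p_A$ and $p_B$ are introduced here for illustration only.

```latex
H_0:\; p_A = p_B
\qquad \text{vs.} \qquad
H_1:\; p_A > p_B,
\qquad \text{where } p_g = \frac{\text{installs}_g}{\text{taps}_g}
```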

# 3. Identify Variables for Testing
Select the variables to test, for instance:

- Ad Creatives: campaign_name
- Targeting Options: impressions, taps, installs, conversion_rate_(cr)
- Audience Segments: lat_on_installs, lat_off_installs, new_downloads, redownloads

# 4. Create Control and Variation Groups
**a. Define Control and Variation Groups**

- **Control Group**: Current campaign setup or the baseline version of an ad.
- **Variation Group**: New campaign setup or the modified version of an ad.
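
Group membership is usually assigned independently of campaign attributes, so that outcome differences can be attributed to the change being tested. A minimal sketch of one way to do this, assuming a stable identifier such as `campaign_id` is available: hash the identifier and use the result to pick a group. The function name `assign_group` and the salt value are hypothetical, and the identifiers are written as integers rather than the floats shown in the dataset preview.

```python
import hashlib

def assign_group(campaign_id, salt="ab-test-1"):
    """Deterministically map an identifier to 'Control' or 'Variation'.

    `salt` is a hypothetical experiment label; changing it reshuffles the groups.
    """
    digest = hashlib.sha256(f"{salt}:{campaign_id}".encode("utf-8")).hexdigest()
    # Use the parity of the hash value for an approximately 50/50 split.
    return "Control" if int(digest, 16) % 2 == 0 else "Variation"

# Identifiers taken from the first rows of the dataset preview.
campaign_ids = [470311161, 481689442, 469927825, 474269632, 484006307]
groups = {cid: assign_group(cid) for cid in campaign_ids}
print(groups)
```

The same identifier always lands in the same group, so the assignment can be recomputed at any time without storing it.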

In [None]:
# b. Set Up Test and Control Campaigns in the Dataset

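# Split campaigns into groups for A/B testing.
# Missing campaign names get their own 'NaN' label; names containing an
# uppercase 'A' become 'Control', and all other names become 'Variation'.
df['test_group'] = df['campaign_name'].apply(
    lambda x: 'NaN' if pd.isna(x) else ('Control' if 'A' in x else 'Variation')
)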

# Display the updated dataset
print(df[['campaign_name', 'test_group']].head())



                                   campaign_name test_group
0                         0.54 - May 2022 - Arab    Control
1                   0.54 - Tier 1 & 2 - May 2022  Variation
2                       0.54 - Tier 1 - May 2022  Variation
3                             1.00 - My Keywords  Variation
4  1.01 - US/OZ - May 2022 - Performing Keywords  Variation


In [None]:
# Alternative: fill missing campaign names before labelling
# Load the DataFrame from the CSV file
# df = pd.read_csv('/content/Apple Search Ads Campaigns Cleaned_Dataset.csv')

# Replace NaN values in 'campaign_name' with empty strings
df['campaign_name'] = df['campaign_name'].fillna('')

# Label each campaign; empty names contain no 'A', so they fall into 'Variation'
df['test_group'] = df['campaign_name'].apply(lambda x: 'Control' if 'A' in x else 'Variation')

# Display the updated dataset with the 'campaign_name' and 'test_group' columns
print(df[['campaign_name', 'test_group']].head())


                                   campaign_name test_group
0                         0.54 - May 2022 - Arab    Control
1                   0.54 - Tier 1 & 2 - May 2022  Variation
2                       0.54 - Tier 1 - May 2022  Variation
3                             1.00 - My Keywords  Variation
4  1.01 - US/OZ - May 2022 - Performing Keywords  Variation
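
Before comparing groups, it is worth checking how the labelling rule distributes campaigns. The sketch below applies the same `'A' in name` rule to the five campaign names shown in the output above and counts the result; the helper `label` and the variable names are illustrative.

```python
from collections import Counter

# Campaign names copied from the printed output.
campaign_names = [
    "0.54 - May 2022 - Arab",
    "0.54 - Tier 1 & 2 - May 2022",
    "0.54 - Tier 1 - May 2022",
    "1.00 - My Keywords",
    "1.01 - US/OZ - May 2022 - Performing Keywords",
]

def label(name):
    # Same rule as the notebook: an uppercase 'A' anywhere in the name means Control.
    return "Control" if "A" in name else "Variation"

group_counts = Counter(label(name) for name in campaign_names)
print(group_counts)  # Counter({'Variation': 4, 'Control': 1})
```

Because the rule depends on the campaign name, group membership follows how campaigns were named rather than chance, so differences between the groups may reflect the naming pattern instead of the change being tested.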


In [None]:
# 5. Sample Size Calculation
# Calculate the required sample size for statistically significant results

from statsmodels.stats.power import TTestIndPower

# Parameters for sample size calculation
effect_size = 0.2  # Cohen's d; 0.2 is conventionally a small effect
alpha = 0.05  # Significance level
power = 0.8  # Power of the test

# Calculate the sample size per group
analysis = TTestIndPower()
sample_size = analysis.solve_power(effect_size, power=power, alpha=alpha)
print(f'Required sample size per group: {sample_size}')


Required sample size per group: 393.4056989990335
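
A sample size has to be a whole number, so the figure above is rounded up in practice. As a rough cross-check, the sketch below uses the standard normal approximation for a two-sided, two-sample comparison, $n \approx 2(z_{1-\alpha/2} + z_{\text{power}})^2 / d^2$, with only the Python standard library. This is an approximation and not what `TTestIndPower` computes internally, so it should land slightly below the t-based value.

```python
import math
from statistics import NormalDist

effect_size = 0.2  # Cohen's d, as in the cell above
alpha = 0.05       # two-sided significance level
power = 0.8        # desired power

# Standard normal quantiles for the significance level and the power.
z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
z_power = NormalDist().inv_cdf(power)

# Normal-approximation sample size per group.
n_approx = 2 * (z_alpha + z_power) ** 2 / effect_size ** 2

# The t-based value printed above, rounded up to a whole observation.
n_per_group = math.ceil(393.4056989990335)

print(f"Normal approximation: {n_approx:.1f} per group")
print(f"t-based requirement, rounded up: {n_per_group} per group")
```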


In [None]:
# 6. Implement Tracking and Measurement
# a. Track Metrics for Each Group

# Aggregate metrics per group: totals for volume and spend,
# plus the unweighted mean of the per-campaign CPA values
group_metrics = df.groupby('test_group').agg({
    'impressions': 'sum',
    'taps': 'sum',
    'installs': 'sum',
    'spend': 'sum',
    'average_cost_per_acquisition_(cpa)': 'mean'
}).reset_index()

print(group_metrics)


  test_group  impressions   taps  installs    spend  \
0    Control       105351   2675       882   641.08   
1  Variation      1012141  23797      5392  2406.45   

   average_cost_per_acquisition_(cpa)  
0                            2.170000  
1                            0.460909  
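
The group totals above can be turned into comparable rates and, for illustration, into a two-proportion z-test on installs per tap. This is a sketch that uses the printed totals and only the standard library; the helpers `rates` and `two_proportion_ztest` are hypothetical names. Because the groups were formed by campaign name rather than by random assignment, the p-value is descriptive and is not evidence of a causal effect.

```python
import math

# Totals copied from the group_metrics output above.
groups = {
    "Control":   {"impressions": 105351,  "taps": 2675,  "installs": 882,  "spend": 641.08},
    "Variation": {"impressions": 1012141, "taps": 23797, "installs": 5392, "spend": 2406.45},
}

def rates(g):
    """Derive rate metrics from one group's totals."""
    return {
        "tap_through_rate": g["taps"] / g["impressions"],
        "conversion_rate": g["installs"] / g["taps"],
        "cost_per_install": g["spend"] / g["installs"],
    }

def two_proportion_ztest(x1, n1, x2, n2):
    """Two-sided z-test for equal proportions using the pooled estimate."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return z, p_value

summary = {name: rates(g) for name, g in groups.items()}
for name, r in summary.items():
    print(name, {k: round(v, 4) for k, v in r.items()})

z, p_value = two_proportion_ztest(
    groups["Control"]["installs"], groups["Control"]["taps"],
    groups["Variation"]["installs"], groups["Variation"]["taps"],
)
print(f"z = {z:.2f}, p-value = {p_value:.3g}")
```

Spend divided by installs gives a pooled cost per install, which weights campaigns by volume. The `mean` aggregation above averages per-campaign CPA values instead, so the two figures need not agree.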


In [None]:
# Install the schedule library
# !pip install schedule

In [None]:
# b. Set Up Automated Tracking
# Automate data extraction and performance tracking
import schedule
import time

def track_performance():
    # Example function to extract and log performance data
    current_metrics = df.groupby('test_group').agg({
        'impressions': 'sum',
        'taps': 'sum',
        'installs': 'sum',
        'spend': 'sum',
        'average_cost_per_acquisition_(cpa)': 'mean'
    }).reset_index()
    print(current_metrics)
    # Log or save the current_metrics for future analysis

# Schedule the tracking to run daily at 09:00
schedule.every().day.at("09:00").do(track_performance)

# This loop runs until the cell is interrupted, so it blocks the notebook
while True:
    schedule.run_pending()
    time.sleep(60)  # wait one minute
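
The tracking function above prints metrics but does not yet persist them. One way to fill in the "log or save" step is to append a timestamped row per group to a CSV file. The sketch below uses only the standard library; the function name `append_snapshot`, the file name, and the column layout are assumptions made for illustration.

```python
import csv
import os
import tempfile
from datetime import datetime, timezone

FIELDS = ["logged_at", "test_group", "impressions", "taps", "installs", "spend"]

def append_snapshot(path, rows, logged_at=None):
    """Append one row per group to a CSV log, writing the header on first use."""
    logged_at = logged_at or datetime.now(timezone.utc).isoformat()
    write_header = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if write_header:
            writer.writeheader()
        for row in rows:
            writer.writerow({"logged_at": logged_at, **row})

# Example rows using the group totals printed earlier in the notebook.
rows = [
    {"test_group": "Control", "impressions": 105351, "taps": 2675, "installs": 882, "spend": 641.08},
    {"test_group": "Variation", "impressions": 1012141, "taps": 23797, "installs": 5392, "spend": 2406.45},
]

log_path = os.path.join(tempfile.mkdtemp(), "ab_test_metrics_log.csv")
append_snapshot(log_path, rows)
append_snapshot(log_path, rows)  # a later run appends without repeating the header

with open(log_path, newline="") as f:
    logged = list(csv.DictReader(f))
print(len(logged))  # 4
```

Inside `track_performance`, rows in this shape could be produced from `current_metrics` with `to_dict('records')` after selecting the columns in `FIELDS` other than `logged_at`.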


# In Progress