# Google Ads Device Data Segment Validation with 2% Threshold

This notebook performs **segment-level validation** for Google Ads device data with a **2% tolerance threshold**.

**Files:**
- Growth: `growth/merged_ads_device_google(growth).csv` (488 rows)
- Gold: `gold/merged_ads_device_google(gold).xlsx` (974 rows)

**Column Mapping:**
- `Campaign` → campaign column
- `Day` → date column
- `Device` → device segment (needs normalization)
- `Cost` → cost metric
- `Impr.`/`Impr` → impressions metric
- `Clicks` → clicks metric

**Device Mapping:**
- Growth "Computers" → Gold "DESKTOP"
- Growth "Mobile phones" → Gold "MOBILE"
- Growth "Tablets" → Gold "TABLET"
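- Growth "Other" → Gold "OTHER"
- Growth "TV screens" → Gold "OTHER"
- Unmapped or missing Growth values and missing Gold values → "OTHER"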

**Validation Segments:**
- Overall Totals
- By Date
- By Campaign
- By Device
- By Campaign + Date

## Configuration: Set Threshold

In [1]:
# CONFIGURATION: Set your threshold here
THRESHOLD_PERCENT = 2.0  # Accept differences up to 2%

print("="*80)
print("GOOGLE ADS DEVICE DATA VALIDATION CONFIGURATION")
print("="*80)
print(f"\nThreshold: {THRESHOLD_PERCENT}%")
print(f"Differences of up to {THRESHOLD_PERCENT}% (inclusive) will be marked as MATCHED")
print("\nYou can change THRESHOLD_PERCENT above to adjust tolerance")

GOOGLE ADS DEVICE DATA VALIDATION CONFIGURATION

Threshold: 2.0%
Differences of up to 2.0% (inclusive) will be marked as MATCHED

You can change THRESHOLD_PERCENT above to adjust tolerance
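
As a minimal sketch of how the threshold is applied in the later steps, the check below computes the signed percentage difference of Growth relative to Gold and compares its absolute value with the threshold using `<=`. The function names `pct_diff` and `within_threshold` are hypothetical and the sample values are illustrative, not taken from the data files. Unlike the notebook, this sketch does not round the percentage to two decimals before comparing.

```python
def pct_diff(growth: float, gold: float) -> float:
    """Signed percentage difference of Growth relative to Gold."""
    return (growth - gold) / gold * 100


def within_threshold(growth: float, gold: float, threshold_pct: float = 2.0) -> bool:
    """True when the absolute percentage difference is at or below the threshold."""
    return abs(pct_diff(growth, gold)) <= threshold_pct


# Illustrative values, not taken from the data files
print(within_threshold(101.0, 100.0))  # 1% difference
print(within_threshold(102.0, 100.0))  # exactly 2%: the comparison is inclusive
print(within_threshold(103.0, 100.0))  # 3% difference
```

Because the comparison is inclusive, a segment that differs by exactly the threshold still counts as matched.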


## Step 1: Import Libraries

In [2]:
# Install openpyxl if needed
import sys
!{sys.executable} -m pip install openpyxl -q

import pandas as pd
import numpy as np
from datetime import datetime

# Display settings
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', 100)
pd.set_option('display.float_format', '{:.2f}'.format)

print("✓ Libraries imported successfully")
print(f"Analysis started: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")

✓ Libraries imported successfully
Analysis started: 2025-12-21 19:13:46


## Step 2: Load and Prepare Data

In [3]:
# Load Growth CSV (skip 2 header rows)
print("Loading Growth CSV...")
growth_df = pd.read_csv("growth/merged_ads_device_google(growth).csv", skiprows=2)

# Clean data
growth_df['Impr.'] = growth_df['Impr.'].astype(str).str.replace(',', '').astype(int)
growth_df['Clicks'] = pd.to_numeric(growth_df['Clicks'].astype(str).str.replace(',', ''), errors='coerce').fillna(0).astype(int)
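# Strip thousands separators from Cost as well, so values like "1,234.56" are not coerced to NaN
growth_df['Cost'] = pd.to_numeric(growth_df['Cost'].astype(str).str.replace(',', ''), errors='coerce')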

# Normalize device names to match Gold format
device_mapping = {
    'Computers': 'DESKTOP',
    'Mobile phones': 'MOBILE',
    'Tablets': 'TABLET',
    'Other': 'OTHER',
    'TV screens': 'OTHER'
}
growth_df['Device'] = growth_df['Device'].map(device_mapping).fillna('OTHER')

# Rename columns to match Gold
growth_df = growth_df.rename(columns={
    'Impr.': 'Impr',
    'Currency code': 'Currency_code'
})

# Parse Day (DD-MM-YYYY) and store it as a YYYY-MM-DD string to match Gold
growth_df['Day'] = pd.to_datetime(growth_df['Day'], format='%d-%m-%Y').dt.strftime('%Y-%m-%d')

print(f"✓ Growth loaded: {len(growth_df):,} rows")
print(f"  Columns: {growth_df.columns.tolist()}")

# Load Gold Excel
print("\nLoading Gold Excel...")
gold_df = pd.read_excel("gold/merged_ads_device_google(gold).xlsx")

# Convert Day to string format for consistent comparison
gold_df['Day'] = pd.to_datetime(gold_df['Day']).dt.strftime('%Y-%m-%d')

# Ensure Device is uppercase and handle NaN
gold_df['Device'] = gold_df['Device'].fillna('OTHER').str.upper()

print(f"✓ Gold loaded: {len(gold_df):,} rows")
print(f"  Columns: {gold_df.columns.tolist()}")

print("\n" + "="*80)
print("DATA SUMMARY")
print("="*80)
print(f"\nGrowth Date Range: {growth_df['Day'].min()} to {growth_df['Day'].max()}")
print(f"Gold Date Range: {gold_df['Day'].min()} to {gold_df['Day'].max()}")
print(f"\nGrowth Unique Campaigns: {growth_df['Campaign'].nunique()}")
print(f"Gold Unique Campaigns: {gold_df['Campaign'].nunique()}")
print(f"\nGrowth Unique Devices: {sorted(growth_df['Device'].unique())}")
print(f"Gold Unique Devices: {sorted(gold_df['Device'].unique())}")

Loading Growth CSV...
✓ Growth loaded: 488 rows
  Columns: ['Campaign', 'Day', 'Device', 'Currency_code', 'Cost', 'Impr', 'Clicks']

Loading Gold Excel...
✓ Gold loaded: 974 rows
  Columns: ['Campaign', 'Day', 'Device', 'Currency_code', 'Cost', 'Impr', 'Clicks']

DATA SUMMARY

Growth Date Range: 2025-11-01 to 2025-11-30
Gold Date Range: 2025-11-01 to 2025-11-30

Growth Unique Campaigns: 7
Gold Unique Campaigns: 7

Growth Unique Devices: ['DESKTOP', 'MOBILE', 'OTHER', 'TABLET']
Gold Unique Devices: ['DESKTOP', 'MOBILE', 'OTHER', 'TABLET']
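
The loading cell sends every unmapped or missing device label to `OTHER` without reporting it, which can hide a labelling problem. The sketch below keeps the same mapping but also returns the labels that were not found in it. The helper name `normalize_devices` and the sample series are hypothetical; the real exports may contain different labels.

```python
import pandas as pd

# Same mapping as the notebook's `device_mapping`
DEVICE_MAPPING = {
    "Computers": "DESKTOP",
    "Mobile phones": "MOBILE",
    "Tablets": "TABLET",
    "Other": "OTHER",
    "TV screens": "OTHER",
}


def normalize_devices(devices: pd.Series, mapping: dict) -> tuple:
    """Map raw labels and return (normalized series, counts of unmapped labels)."""
    mapped = devices.map(mapping)
    # Rows whose label was missing or not in the mapping
    unmapped = devices[mapped.isna()].value_counts(dropna=False)
    return mapped.fillna("OTHER"), unmapped


# Hypothetical sample, not the real export
raw = pd.Series(["Computers", "Mobile phones", "Smart watches", None, "Tablets"])
normalized, unmapped = normalize_devices(raw, DEVICE_MAPPING)
print(normalized.tolist())
print(unmapped)
```

Inspecting `unmapped` before comparing by device shows whether `OTHER` is absorbing labels that belong elsewhere. The same idea applies to Gold: counting missing `Device` values before `fillna('OTHER')` shows how many rows were reassigned.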


## Step 3: Overall Totals Comparison

In [4]:
print("="*80)
print(f"OVERALL TOTALS COMPARISON (with {THRESHOLD_PERCENT}% threshold)")
print("="*80)

# Calculate totals
growth_totals = growth_df[['Cost', 'Impr', 'Clicks']].sum()
gold_totals = gold_df[['Cost', 'Impr', 'Clicks']].sum()

# Create comparison dataframe
overall_comparison = pd.DataFrame({
    'Metric': ['Cost', 'Impressions', 'Clicks'],
    'Growth': [growth_totals['Cost'], growth_totals['Impr'], growth_totals['Clicks']],
    'Gold': [gold_totals['Cost'], gold_totals['Impr'], gold_totals['Clicks']],
})

overall_comparison['Difference'] = overall_comparison['Growth'] - overall_comparison['Gold']
overall_comparison['Diff %'] = (overall_comparison['Difference'] / overall_comparison['Gold'] * 100).round(2)
overall_comparison['Match'] = overall_comparison['Diff %'].abs() <= THRESHOLD_PERCENT
overall_comparison['Status'] = overall_comparison['Match'].apply(lambda x: '✓ PASS' if x else '✗ FAIL')

display(overall_comparison)

# Summary
matches = overall_comparison['Match'].sum()
print(f"\n✓ Matches (within {THRESHOLD_PERCENT}%): {matches}/3 metrics")
if matches == 3:
    print(f"✓✓✓ ALL OVERALL TOTALS MATCH (within {THRESHOLD_PERCENT}% threshold)! ✓✓✓")
else:
    print(f"⚠ {3-matches} metric(s) exceed {THRESHOLD_PERCENT}% threshold")

OVERALL TOTALS COMPARISON (with 2.0% threshold)


| | Metric | Growth | Gold | Difference | Diff % | Match | Status |
|---|---|---|---|---|---|---|---|
| 0 | Cost | 401869.07 | 451154.45 | -49285.38 | -10.92 | False | ✗ FAIL |
| 1 | Impressions | 3324955.0 | 3561483.0 | -236528.0 | -6.64 | False | ✗ FAIL |
| 2 | Clicks | 101520.0 | 120558.0 | -19038.0 | -15.79 | False | ✗ FAIL |



✓ Matches (within 2.0%): 0/3 metrics
⚠ 3 metric(s) exceed 2.0% threshold
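
Growth has 488 rows and Gold has 974, and all three totals are lower in Growth. One possible explanation, which is an assumption the outputs do not confirm, is that some `Campaign` + `Day` + `Device` keys occur more than once in one source. A sketch of a check for repeated keys follows; `duplicate_key_report` is a hypothetical helper and the sample frame is made up for illustration.

```python
import pandas as pd

KEYS = ["Campaign", "Day", "Device"]


def duplicate_key_report(df: pd.DataFrame, keys: list) -> pd.DataFrame:
    """Return one row per key combination that occurs more than once, with its row count."""
    counts = df.groupby(keys, dropna=False).size().reset_index(name="rows")
    return counts[counts["rows"] > 1].reset_index(drop=True)


# Hypothetical sample with one repeated key
sample = pd.DataFrame({
    "Campaign": ["A", "A", "A", "B"],
    "Day": ["2025-11-26", "2025-11-26", "2025-11-27", "2025-11-26"],
    "Device": ["MOBILE", "MOBILE", "MOBILE", "TABLET"],
    "Cost": [10.0, 10.0, 5.0, 2.0],
})
report = duplicate_key_report(sample, KEYS)
print(report)
```

Running the same function on `growth_df` and `gold_df` would show whether either source repeats a key. An empty result for both would point the investigation elsewhere, for example to rows that exist in only one source.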


## Step 4: Validation by Date

In [5]:
print("="*80)
print(f"SEGMENT VALIDATION: BY DATE (with {THRESHOLD_PERCENT}% threshold)")
print("="*80)

# Aggregate by date
growth_by_date = growth_df.groupby('Day').agg({
    'Cost': 'sum',
    'Impr': 'sum',
    'Clicks': 'sum'
}).reset_index()
growth_by_date.columns = ['Day', 'cost_growth', 'impr_growth', 'clicks_growth']

gold_by_date = gold_df.groupby('Day').agg({
    'Cost': 'sum',
    'Impr': 'sum',
    'Clicks': 'sum'
}).reset_index()
gold_by_date.columns = ['Day', 'cost_gold', 'impr_gold', 'clicks_gold']

# Merge and compare
date_comparison = pd.merge(growth_by_date, gold_by_date, on='Day', how='inner')

# Calculate percentage differences
date_comparison['cost_diff_pct'] = ((date_comparison['cost_growth'] - date_comparison['cost_gold']) / date_comparison['cost_gold'] * 100).round(2)
date_comparison['impr_diff_pct'] = ((date_comparison['impr_growth'] - date_comparison['impr_gold']) / date_comparison['impr_gold'] * 100).round(2)
date_comparison['clicks_diff_pct'] = ((date_comparison['clicks_growth'] - date_comparison['clicks_gold']) / date_comparison['clicks_gold'] * 100).round(2)

# Apply threshold matching
date_comparison['perfect_match'] = (
    (date_comparison['cost_diff_pct'].abs() <= THRESHOLD_PERCENT) & 
    (date_comparison['impr_diff_pct'].abs() <= THRESHOLD_PERCENT) & 
    (date_comparison['clicks_diff_pct'].abs() <= THRESHOLD_PERCENT)
)
date_comparison['status'] = date_comparison['perfect_match'].apply(lambda x: '✓ PASS' if x else '✗ FAIL')

print(f"\nTotal dates compared: {len(date_comparison)}")
print(f"✓ Matches (within {THRESHOLD_PERCENT}%): {date_comparison['perfect_match'].sum()}")
print(f"✗ Exceeds threshold: {(~date_comparison['perfect_match']).sum()}")

print("\nDetailed comparison:")
display(date_comparison[['Day', 'cost_growth', 'cost_gold', 'cost_diff_pct',
                          'impr_growth', 'impr_gold', 'impr_diff_pct',
                          'clicks_growth', 'clicks_gold', 'clicks_diff_pct', 'status']].sort_values('Day'))

SEGMENT VALIDATION: BY DATE (with 2.0% threshold)

Total dates compared: 29
✓ Matches (within 2.0%): 24
✗ Exceeds threshold: 5

Detailed comparison:


| | Day | cost_growth | cost_gold | cost_diff_pct | impr_growth | impr_gold | impr_diff_pct | clicks_growth | clicks_gold | clicks_diff_pct | status |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2025-11-01 | 10925.15 | 10925.15 | -0.0 | 236435 | 236435 | 0.0 | 7798 | 7798 | 0.0 | ✓ PASS |
| 1 | 2025-11-03 | 1096.78 | 1096.78 | 0.0 | 10160 | 10160 | 0.0 | 254 | 254 | 0.0 | ✓ PASS |
| 2 | 2025-11-04 | 11854.51 | 11854.49 | 0.0 | 131891 | 131891 | 0.0 | 3100 | 3100 | 0.0 | ✓ PASS |
| 3 | 2025-11-05 | 1770.34 | 1770.34 | -0.0 | 20631 | 20631 | 0.0 | 930 | 930 | 0.0 | ✓ PASS |
| 4 | 2025-11-06 | 5881.03 | 5881.83 | -0.01 | 85302 | 85302 | 0.0 | 3149 | 3150 | -0.03 | ✓ PASS |
| 5 | 2025-11-07 | 12110.28 | 12113.55 | -0.03 | 147661 | 147661 | 0.0 | 6123 | 6129 | -0.1 | ✓ PASS |
| 6 | 2025-11-08 | 14108.5 | 14108.91 | -0.0 | 147160 | 147166 | -0.0 | 4239 | 4241 | -0.05 | ✓ PASS |
| 7 | 2025-11-09 | 22861.73 | 22876.05 | -0.06 | 192827 | 192866 | -0.02 | 6082 | 6091 | -0.15 | ✓ PASS |
| 8 | 2025-11-10 | 20059.33 | 20059.34 | -0.0 | 136257 | 136257 | 0.0 | 2761 | 2761 | 0.0 | ✓ PASS |
| 9 | 2025-11-11 | 15728.38 | 15728.39 | -0.0 | 140885 | 140885 | 0.0 | 2570 | 2570 | 0.0 | ✓ PASS |
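
The merge in this step uses `how='inner'`, so a date present in only one source is dropped silently instead of being reported. The output compares 29 dates while both sources span 2025-11-01 to 2025-11-30, so at least one date may be absent from one source or from both. A sketch of an outer merge with `indicator=True` that lists such keys follows; `unmatched_keys` is a hypothetical helper and the sample frames are illustrative.

```python
import pandas as pd


def unmatched_keys(growth: pd.DataFrame, gold: pd.DataFrame, keys: list) -> pd.DataFrame:
    """List key combinations present in only one of the two sources."""
    merged = pd.merge(
        growth[keys].drop_duplicates(),
        gold[keys].drop_duplicates(),
        on=keys,
        how="outer",
        indicator=True,  # adds a `_merge` column: left_only / right_only / both
    )
    only = merged[merged["_merge"] != "both"].copy()
    only["source"] = only["_merge"].astype(str).map(
        {"left_only": "growth only", "right_only": "gold only"}
    )
    return only[keys + ["source"]].reset_index(drop=True)


# Illustrative frames: one date exists only in the second source
growth_sample = pd.DataFrame({"Day": ["2025-11-01", "2025-11-03"]})
gold_sample = pd.DataFrame({"Day": ["2025-11-01", "2025-11-02", "2025-11-03"]})
missing = unmatched_keys(growth_sample, gold_sample, ["Day"])
print(missing)
```

The same helper works for any key list, for example `['Campaign', 'Day']` in the Campaign + Date step.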


## Step 5: Validation by Campaign

In [6]:
print("="*80)
print(f"SEGMENT VALIDATION: BY CAMPAIGN (with {THRESHOLD_PERCENT}% threshold)")
print("="*80)

# Aggregate by campaign
growth_by_campaign = growth_df.groupby('Campaign').agg({
    'Cost': 'sum',
    'Impr': 'sum',
    'Clicks': 'sum'
}).reset_index()
growth_by_campaign.columns = ['Campaign', 'cost_growth', 'impr_growth', 'clicks_growth']

gold_by_campaign = gold_df.groupby('Campaign').agg({
    'Cost': 'sum',
    'Impr': 'sum',
    'Clicks': 'sum'
}).reset_index()
gold_by_campaign.columns = ['Campaign', 'cost_gold', 'impr_gold', 'clicks_gold']

# Merge and compare
campaign_comparison = pd.merge(growth_by_campaign, gold_by_campaign, on='Campaign', how='inner')

# Calculate percentage differences
campaign_comparison['cost_diff_pct'] = ((campaign_comparison['cost_growth'] - campaign_comparison['cost_gold']) / campaign_comparison['cost_gold'] * 100).round(2)
campaign_comparison['impr_diff_pct'] = ((campaign_comparison['impr_growth'] - campaign_comparison['impr_gold']) / campaign_comparison['impr_gold'] * 100).round(2)
campaign_comparison['clicks_diff_pct'] = ((campaign_comparison['clicks_growth'] - campaign_comparison['clicks_gold']) / campaign_comparison['clicks_gold'] * 100).round(2)

# Apply threshold matching
campaign_comparison['perfect_match'] = (
    (campaign_comparison['cost_diff_pct'].abs() <= THRESHOLD_PERCENT) & 
    (campaign_comparison['impr_diff_pct'].abs() <= THRESHOLD_PERCENT) & 
    (campaign_comparison['clicks_diff_pct'].abs() <= THRESHOLD_PERCENT)
)
campaign_comparison['status'] = campaign_comparison['perfect_match'].apply(lambda x: '✓ PASS' if x else '✗ FAIL')

print(f"\nTotal campaigns compared: {len(campaign_comparison)}")
print(f"✓ Matches (within {THRESHOLD_PERCENT}%): {campaign_comparison['perfect_match'].sum()}")
print(f"✗ Exceeds threshold: {(~campaign_comparison['perfect_match']).sum()}")

print("\nDetailed comparison:")
display(campaign_comparison[['Campaign', 'cost_growth', 'cost_gold', 'cost_diff_pct',
                              'impr_growth', 'impr_gold', 'impr_diff_pct',
                              'clicks_growth', 'clicks_gold', 'clicks_diff_pct', 'status']].sort_values('Campaign'))

SEGMENT VALIDATION: BY CAMPAIGN (with 2.0% threshold)

Total campaigns compared: 7
✓ Matches (within 2.0%): 2
✗ Exceeds threshold: 5

Detailed comparison:


| | Campaign | cost_growth | cost_gold | cost_diff_pct | impr_growth | impr_gold | impr_diff_pct | clicks_growth | clicks_gold | clicks_diff_pct | status |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Cadiveu_Instamart_External_20th_Nov_2025 | 5499.5 | 7830.87 | -29.77 | 342 | 463 | -26.13 | 26 | 39 | -33.33 | ✗ FAIL |
| 1 | IKONIC-AMZ-Glide-Peach-14-Oct-2025 | 30429.55 | 35091.47 | -13.29 | 287835 | 324314 | -11.25 | 10622 | 12189 | -12.86 | ✗ FAIL |
| 2 | ME_Search_\|_Oct_25 | 111296.43 | 131696.03 | -15.49 | 646629 | 665130 | -2.78 | 13091 | 15462 | -15.33 | ✗ FAIL |
| 3 | Me_Sales_P-Max_Oct25 | 58665.02 | 58677.53 | -0.02 | 950096 | 950106 | -0.0 | 24400 | 24411 | -0.05 | ✓ PASS |
| 4 | Nykaa_Black_Friday_Traffic | 3499.35 | 6058.63 | -42.24 | 216816 | 376666 | -42.44 | 16089 | 28485 | -43.52 | ✗ FAIL |
| 5 | PRO_Search_\|_Oct_25 | 109587.91 | 128901.3 | -14.98 | 186906 | 208435 | -10.33 | 12231 | 14904 | -17.93 | ✗ FAIL |
| 6 | Pro_Sales_P-Max_Oct25 | 82891.31 | 82898.63 | -0.01 | 1036331 | 1036369 | -0.0 | 25061 | 25068 | -0.03 | ✓ PASS |
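
Steps 4 to 7 repeat the same aggregate, merge and compare logic with different grouping keys. One way to consolidate it is a single helper, sketched below. `compare_segments` and its column naming (`Cost_growth`, `Cost_diff_pct`, `within_threshold`) are hypothetical and differ slightly from the notebook's lowercase names; the sample frames are illustrative.

```python
import pandas as pd

METRICS = ["Cost", "Impr", "Clicks"]


def compare_segments(growth: pd.DataFrame, gold: pd.DataFrame,
                     keys: list, threshold_pct: float = 2.0) -> pd.DataFrame:
    """Aggregate both sources by `keys`, merge them, and flag segments within the threshold."""
    g = growth.groupby(keys, as_index=False)[METRICS].sum()
    d = gold.groupby(keys, as_index=False)[METRICS].sum()
    merged = pd.merge(g, d, on=keys, how="inner", suffixes=("_growth", "_gold"))

    within = pd.Series(True, index=merged.index)
    for metric in METRICS:
        diff_pct = ((merged[f"{metric}_growth"] - merged[f"{metric}_gold"])
                    / merged[f"{metric}_gold"] * 100).round(2)
        merged[f"{metric}_diff_pct"] = diff_pct
        # A segment passes only if every metric is within the threshold
        within = within & (diff_pct.abs() <= threshold_pct)
    merged["within_threshold"] = within
    return merged


# Illustrative data: campaign A is close, campaign B is half of Gold
growth_sample = pd.DataFrame({"Campaign": ["A", "B"], "Cost": [100.0, 50.0],
                              "Impr": [1000, 500], "Clicks": [10, 5]})
gold_sample = pd.DataFrame({"Campaign": ["A", "B"], "Cost": [101.0, 100.0],
                            "Impr": [1000, 1000], "Clicks": [10, 10]})
result = compare_segments(growth_sample, gold_sample, ["Campaign"])
print(result[["Campaign", "Cost_diff_pct", "within_threshold"]])
```

Calling it with `['Day']`, `['Device']` or `['Campaign', 'Day']` would reproduce the other segment views with one code path.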


## Step 6: Validation by Device

In [7]:
print("="*80)
print(f"SEGMENT VALIDATION: BY DEVICE (with {THRESHOLD_PERCENT}% threshold)")
print("="*80)

# Aggregate by device
growth_by_device = growth_df.groupby('Device').agg({
    'Cost': 'sum',
    'Impr': 'sum',
    'Clicks': 'sum'
}).reset_index()
growth_by_device.columns = ['Device', 'cost_growth', 'impr_growth', 'clicks_growth']

gold_by_device = gold_df.groupby('Device').agg({
    'Cost': 'sum',
    'Impr': 'sum',
    'Clicks': 'sum'
}).reset_index()
gold_by_device.columns = ['Device', 'cost_gold', 'impr_gold', 'clicks_gold']

# Merge and compare
device_comparison = pd.merge(growth_by_device, gold_by_device, on='Device', how='inner')

# Calculate percentage differences
device_comparison['cost_diff_pct'] = ((device_comparison['cost_growth'] - device_comparison['cost_gold']) / device_comparison['cost_gold'] * 100).round(2)
device_comparison['impr_diff_pct'] = ((device_comparison['impr_growth'] - device_comparison['impr_gold']) / device_comparison['impr_gold'] * 100).round(2)
device_comparison['clicks_diff_pct'] = ((device_comparison['clicks_growth'] - device_comparison['clicks_gold']) / device_comparison['clicks_gold'] * 100).round(2)

# Apply threshold matching
device_comparison['perfect_match'] = (
    (device_comparison['cost_diff_pct'].abs() <= THRESHOLD_PERCENT) & 
    (device_comparison['impr_diff_pct'].abs() <= THRESHOLD_PERCENT) & 
    (device_comparison['clicks_diff_pct'].abs() <= THRESHOLD_PERCENT)
)
device_comparison['status'] = device_comparison['perfect_match'].apply(lambda x: '✓ PASS' if x else '✗ FAIL')

print(f"\nTotal device types compared: {len(device_comparison)}")
print(f"✓ Matches (within {THRESHOLD_PERCENT}%): {device_comparison['perfect_match'].sum()}")
print(f"✗ Exceeds threshold: {(~device_comparison['perfect_match']).sum()}")

print("\nDetailed comparison:")
display(device_comparison)

SEGMENT VALIDATION: BY DEVICE (with 2.0% threshold)

Total device types compared: 4
✓ Matches (within 2.0%): 0
✗ Exceeds threshold: 4

Detailed comparison:


| | Device | cost_growth | impr_growth | clicks_growth | cost_gold | impr_gold | clicks_gold | cost_diff_pct | impr_diff_pct | clicks_diff_pct | perfect_match | status |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | DESKTOP | 20084.7 | 37213 | 1754 | 14687.81 | 8870 | 1028 | 36.74 | 319.54 | 70.62 | False | ✗ FAIL |
| 1 | MOBILE | 380007.3 | 3269015 | 99266 | 244555.11 | 1320618 | 50730 | 55.39 | 147.54 | 95.68 | False | ✗ FAIL |
| 2 | OTHER | 584.23 | 7511 | 186 | 191311.2 | 2229720 | 68681 | -99.69 | -99.66 | -99.73 | False | ✗ FAIL |
| 3 | TABLET | 1192.84 | 11216 | 314 | 600.33 | 2275 | 119 | 98.7 | 393.01 | 163.87 | False | ✗ FAIL |
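
In the table above, `OTHER` carries roughly 42% of Gold cost (191,311.20 of 451,154.45) but well under 1% of Growth cost (584.23 of 401,869.07). That pattern suggests the two sources classify devices differently, although the outputs alone do not establish why. Comparing each device's share of a metric side by side makes such shifts easy to see. The sketch below assumes frames with `Device` and `Cost` columns; `device_share` is a hypothetical helper and the sample values are made up.

```python
import pandas as pd


def device_share(df: pd.DataFrame, metric: str = "Cost") -> pd.Series:
    """Percentage of a metric's total contributed by each device."""
    totals = df.groupby("Device")[metric].sum()
    return (totals / totals.sum() * 100).round(2)


# Made-up values that mimic a shift of volume into OTHER
growth_sample = pd.DataFrame({"Device": ["MOBILE", "DESKTOP"], "Cost": [90.0, 10.0]})
gold_sample = pd.DataFrame({"Device": ["MOBILE", "DESKTOP", "OTHER"],
                            "Cost": [50.0, 10.0, 40.0]})

shares = pd.DataFrame({
    "growth_pct": device_share(growth_sample),
    "gold_pct": device_share(gold_sample),
}).fillna(0.0)
print(shares)
```

A device whose share differs sharply between sources is a candidate for a mapping or missing-value problem rather than a genuine metric discrepancy.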


## Step 7: Validation by Campaign + Date

In [8]:
print("="*80)
print(f"SEGMENT VALIDATION: BY CAMPAIGN + DATE (with {THRESHOLD_PERCENT}% threshold)")
print("="*80)

# Aggregate by campaign and date
growth_by_camp_date = growth_df.groupby(['Campaign', 'Day']).agg({
    'Cost': 'sum',
    'Impr': 'sum',
    'Clicks': 'sum'
}).reset_index()
growth_by_camp_date.columns = ['Campaign', 'Day', 'cost_growth', 'impr_growth', 'clicks_growth']

gold_by_camp_date = gold_df.groupby(['Campaign', 'Day']).agg({
    'Cost': 'sum',
    'Impr': 'sum',
    'Clicks': 'sum'
}).reset_index()
gold_by_camp_date.columns = ['Campaign', 'Day', 'cost_gold', 'impr_gold', 'clicks_gold']

# Merge and compare
camp_date_comparison = pd.merge(growth_by_camp_date, gold_by_camp_date, on=['Campaign', 'Day'], how='inner')

# Calculate percentage differences
camp_date_comparison['cost_diff_pct'] = ((camp_date_comparison['cost_growth'] - camp_date_comparison['cost_gold']) / camp_date_comparison['cost_gold'] * 100).round(2)
camp_date_comparison['impr_diff_pct'] = ((camp_date_comparison['impr_growth'] - camp_date_comparison['impr_gold']) / camp_date_comparison['impr_gold'] * 100).round(2)
camp_date_comparison['clicks_diff_pct'] = ((camp_date_comparison['clicks_growth'] - camp_date_comparison['clicks_gold']) / camp_date_comparison['clicks_gold'] * 100).round(2)

# Apply threshold matching
camp_date_comparison['perfect_match'] = (
    (camp_date_comparison['cost_diff_pct'].abs() <= THRESHOLD_PERCENT) & 
    (camp_date_comparison['impr_diff_pct'].abs() <= THRESHOLD_PERCENT) & 
    (camp_date_comparison['clicks_diff_pct'].abs() <= THRESHOLD_PERCENT)
)

print(f"\nTotal campaign+date segments: {len(camp_date_comparison)}")
print(f"✓ Matches (within {THRESHOLD_PERCENT}%): {camp_date_comparison['perfect_match'].sum()}")
print(f"✗ Exceeds threshold: {(~camp_date_comparison['perfect_match']).sum()}")

# Show mismatches if any
if (~camp_date_comparison['perfect_match']).sum() > 0:
    print("\nSample mismatches (first 10):")
    mismatches = camp_date_comparison[~camp_date_comparison['perfect_match']]
    display(mismatches[['Campaign', 'Day', 'cost_diff_pct', 'impr_diff_pct', 'clicks_diff_pct']].head(10))
else:
    print("\n✓✓✓ ALL CAMPAIGN+DATE SEGMENTS MATCH! ✓✓✓")

SEGMENT VALIDATION: BY CAMPAIGN + DATE (with 2.0% threshold)

Total campaign+date segments: 162
✓ Matches (within 2.0%): 136
✗ Exceeds threshold: 26

Sample mismatches (first 10):


| | Campaign | Day | cost_diff_pct | impr_diff_pct | clicks_diff_pct |
|---|---|---|---|---|---|
| 0 | Cadiveu_Instamart_External_20th_Nov_2025 | 2025-11-20 | NaN | 0.0 | NaN |
| 6 | Cadiveu_Instamart_External_20th_Nov_2025 | 2025-11-26 | -50.0 | -50.0 | -50.0 |
| 7 | Cadiveu_Instamart_External_20th_Nov_2025 | 2025-11-27 | -50.0 | -50.0 | -50.0 |
| 8 | Cadiveu_Instamart_External_20th_Nov_2025 | 2025-11-28 | -50.0 | -50.0 | -50.0 |
| 9 | Cadiveu_Instamart_External_20th_Nov_2025 | 2025-11-29 | -50.0 | -50.0 | -50.0 |
| 10 | Cadiveu_Instamart_External_20th_Nov_2025 | 2025-11-30 | -50.0 | -50.0 | -50.0 |
| 35 | IKONIC-AMZ-Glide-Peach-14-Oct-2025 | 2025-11-26 | -50.0 | -50.0 | -50.0 |
| 36 | IKONIC-AMZ-Glide-Peach-14-Oct-2025 | 2025-11-27 | -50.0 | -50.0 | -50.0 |
| 37 | IKONIC-AMZ-Glide-Peach-14-Oct-2025 | 2025-11-28 | -50.0 | -50.01 | -50.0 |
| 38 | IKONIC-AMZ-Glide-Peach-14-Oct-2025 | 2025-11-29 | -50.0 | -50.02 | -50.0 |
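
The first mismatch row has missing (NaN) cost and clicks percentages, which is what pandas produces for `0 / 0`. Because `NaN <= THRESHOLD_PERCENT` evaluates to `False`, a segment where both sources report zero is counted as a failure. A sketch of a percentage difference that handles zero denominators explicitly follows. `safe_diff_pct` is a hypothetical helper, and treating a nonzero Growth value against a zero Gold value as infinite is one possible convention, not the notebook's.

```python
import numpy as np
import pandas as pd


def safe_diff_pct(growth: pd.Series, gold: pd.Series) -> pd.Series:
    """Percentage difference that treats 0 vs 0 as 0% and x vs 0 as infinite."""
    growth = growth.astype(float)
    gold = gold.astype(float)
    both_zero = (growth == 0) & (gold == 0)
    gold_zero_only = (gold == 0) & ~both_zero
    # Zero denominators become NaN here and are filled explicitly below
    diff = (growth - gold) / gold.where(gold != 0) * 100
    diff = diff.mask(both_zero, 0.0)
    diff = diff.mask(gold_zero_only, np.inf)
    return diff.round(2)


# Illustrative values: both zero, Gold zero only, and a regular case
diffs = safe_diff_pct(pd.Series([0, 5, 50]), pd.Series([0, 0, 100]))
print(diffs.tolist())
```

With this convention, zero-versus-zero segments pass the threshold check and nonzero-versus-zero segments always fail, which matches the intuition behind both cases.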


## Step 8: Final Summary Report

In [9]:
print("="*80)
print(f"GOOGLE ADS DEVICE DATA VALIDATION SUMMARY (with {THRESHOLD_PERCENT}% threshold)")
print("="*80)
print(f"\nAnalysis completed: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")

# Create summary table
summary_data = [
    ['Overall Totals', 3, overall_comparison['Match'].sum(), 3 - overall_comparison['Match'].sum()],
    ['By Date', len(date_comparison), date_comparison['perfect_match'].sum(), 
     (~date_comparison['perfect_match']).sum()],
    ['By Campaign', len(campaign_comparison), campaign_comparison['perfect_match'].sum(), 
     (~campaign_comparison['perfect_match']).sum()],
    ['By Device', len(device_comparison), device_comparison['perfect_match'].sum(), 
     (~device_comparison['perfect_match']).sum()],
    ['By Campaign+Date', len(camp_date_comparison), camp_date_comparison['perfect_match'].sum(), 
     (~camp_date_comparison['perfect_match']).sum()]
]

summary_df = pd.DataFrame(summary_data, 
                         columns=['Segment Type', 'Total Segments', 'Matches', 'Exceeds Threshold'])
summary_df['Match %'] = (summary_df['Matches'] / summary_df['Total Segments'] * 100).round(2)

print("\n")
display(summary_df)

# Overall assessment
total_segments = summary_df['Total Segments'].sum()
total_matches = summary_df['Matches'].sum()
overall_match_pct = (total_matches / total_segments * 100)

print("\n" + "="*80)
print(f"OVERALL MATCH RATE (within {THRESHOLD_PERCENT}%): {total_matches}/{total_segments} ({overall_match_pct:.1f}%)")
print("="*80)

if overall_match_pct == 100:
    print(f"\n✓✓✓ PERFECT VALIDATION! All segments within {THRESHOLD_PERCENT}% threshold! ✓✓✓")
elif overall_match_pct >= 95:
    print(f"\n✓ EXCELLENT! {overall_match_pct:.1f}% of segments within {THRESHOLD_PERCENT}% threshold")
elif overall_match_pct >= 80:
    print(f"\n⚠ GOOD: {overall_match_pct:.1f}% within threshold. Some segments need review.")
else:
    print(f"\n⚠ ATTENTION: Only {overall_match_pct:.1f}% within {THRESHOLD_PERCENT}% threshold. Review required.")

print("\n" + "-"*80)
print("KEY INSIGHTS:")
print("-"*80)
print(f"• Threshold used: {THRESHOLD_PERCENT}%")
print(f"• Segments passing: {total_matches}/{total_segments}")
print(f"• Segments exceeding threshold: {total_segments - total_matches}")
print(f"• Growth rows: {len(growth_df):,}")
print(f"• Gold rows: {len(gold_df):,}")
print(f"• Device types validated: {sorted(device_comparison['Device'].unique())}")

print("\n" + "="*80)
print("VALIDATION COMPLETE")
print("="*80)

GOOGLE ADS DEVICE DATA VALIDATION SUMMARY (with 2.0% threshold)

Analysis completed: 2025-12-21 19:13:48




| | Segment Type | Total Segments | Matches | Exceeds Threshold | Match % |
|---|---|---|---|---|---|
| 0 | Overall Totals | 3 | 0 | 3 | 0.0 |
| 1 | By Date | 29 | 24 | 5 | 82.76 |
| 2 | By Campaign | 7 | 2 | 5 | 28.57 |
| 3 | By Device | 4 | 0 | 4 | 0.0 |
| 4 | By Campaign+Date | 162 | 136 | 26 | 83.95 |



OVERALL MATCH RATE (within 2.0%): 162/205 (79.0%)

⚠ ATTENTION: Only 79.0% within 2.0% threshold. Review required.

--------------------------------------------------------------------------------
KEY INSIGHTS:
--------------------------------------------------------------------------------
• Threshold used: 2.0%
• Segments passing: 162/205
• Segments exceeding threshold: 43
• Growth rows: 488
• Gold rows: 974
• Device types validated: ['DESKTOP', 'MOBILE', 'OTHER', 'TABLET']

VALIDATION COMPLETE
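
The overall match rate pools segments from all five views, so the 162 Campaign+Date segments dominate the 205 total and the same underlying rows are counted several times. As a rough cross-check, the sketch below recomputes the pooled rate from the summary table and compares it with an unweighted mean of the per-view rates. The unweighted mean is an alternative summary introduced here for illustration, not a figure the notebook reports.

```python
# Counts copied from the summary table: (segment type, total segments, matches)
summary_counts = [
    ("Overall Totals", 3, 0),
    ("By Date", 29, 24),
    ("By Campaign", 7, 2),
    ("By Device", 4, 0),
    ("By Campaign+Date", 162, 136),
]

total = sum(n for _, n, _ in summary_counts)
matched = sum(m for _, _, m in summary_counts)

# Pooled rate: what the notebook reports as the overall match rate
pooled_pct = matched / total * 100
# Unweighted mean: every segment type counts equally, regardless of size
macro_pct = sum(m / n * 100 for _, n, m in summary_counts) / len(summary_counts)

print(f"Pooled: {pooled_pct:.1f}%  Unweighted mean: {macro_pct:.1f}%")
```

The gap between the two figures shows how much the pooled rate depends on the largest view, so the per-view rows of the summary table are the safer basis for a pass or fail decision.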


In [None]:
print("="*80)
print("EXPORTING HTML REPORT")
print("="*80)

import webbrowser
import os

# Create HTML report
html_content = f'''
<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Google Ads Device Validation Report</title>
    <style>
        * {{
            margin: 0;
            padding: 0;
            box-sizing: border-box;
        }}
        body {{
            font-family: 'Segoe UI', Tahoma, Geneva, Verdana, sans-serif;
            background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
            padding: 20px;
            min-height: 100vh;
        }}
        .container {{
            max-width: 1400px;
            margin: 0 auto;
            background: white;
            border-radius: 15px;
            box-shadow: 0 20px 60px rgba(0,0,0,0.3);
            overflow: hidden;
        }}
        .header {{
            background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
            color: white;
            padding: 40px;
            text-align: center;
        }}
        .header h1 {{
            font-size: 2.5em;
            margin-bottom: 10px;
            text-shadow: 2px 2px 4px rgba(0,0,0,0.2);
        }}
        .header p {{
            font-size: 1.2em;
            opacity: 0.9;
        }}
        .metrics-grid {{
            display: grid;
            grid-template-columns: repeat(auto-fit, minmax(250px, 1fr));
            gap: 20px;
            padding: 30px;
            background: #f8f9fa;
        }}
        .metric-card {{
            background: white;
            padding: 25px;
            border-radius: 10px;
            box-shadow: 0 4px 6px rgba(0,0,0,0.1);
            text-align: center;
            transition: transform 0.3s ease;
        }}
        .metric-card:hover {{
            transform: translateY(-5px);
            box-shadow: 0 8px 12px rgba(0,0,0,0.15);
        }}
        .metric-card h3 {{
            color: #667eea;
            font-size: 1em;
            margin-bottom: 10px;
            text-transform: uppercase;
            letter-spacing: 1px;
        }}
        .metric-card .value {{
            font-size: 2.5em;
            font-weight: bold;
            color: #2c3e50;
        }}
        .content {{
            padding: 30px;
        }}
        .section {{
            margin-bottom: 40px;
        }}
        .section h2 {{
            color: #667eea;
            font-size: 1.8em;
            margin-bottom: 20px;
            padding-bottom: 10px;
            border-bottom: 3px solid #667eea;
        }}
        table {{
            width: 100%;
            border-collapse: collapse;
            margin-top: 15px;
            box-shadow: 0 2px 4px rgba(0,0,0,0.1);
        }}
        th {{
            background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
            color: white;
            padding: 15px;
            text-align: left;
            font-weight: 600;
            text-transform: uppercase;
            font-size: 0.9em;
            letter-spacing: 0.5px;
        }}
        td {{
            padding: 12px 15px;
            border-bottom: 1px solid #ecf0f1;
        }}
        tr:hover {{
            background-color: #f8f9fa;
        }}
        .pass {{
            color: #27ae60;
            font-weight: bold;
        }}
        .fail {{
            color: #e74c3c;
            font-weight: bold;
        }}
        .footer {{
            background: #2c3e50;
            color: white;
            text-align: center;
            padding: 20px;
            font-size: 0.9em;
        }}
    </style>
</head>
<body>
    <div class="container">
        <div class="header">
            <h1>📊 Google Ads Device Validation Report</h1>
            <p>Generated: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}</p>
            <p>Threshold: {THRESHOLD_PERCENT}% | Device Normalization: Enabled</p>
        </div>
        
        <div class="metrics-grid">
            <div class="metric-card">
                <h3>Overall Match Rate</h3>
                <div class="value">{overall_match_pct:.1f}%</div>
            </div>
            <div class="metric-card">
                <h3>Dates Matched</h3>
                <div class="value">{date_comparison['perfect_match'].sum()}/{len(date_comparison)}</div>
            </div>
            <div class="metric-card">
                <h3>Campaigns Matched</h3>
                <div class="value">{campaign_comparison['perfect_match'].sum()}/{len(campaign_comparison)}</div>
            </div>
            <div class="metric-card">
                <h3>Devices Matched</h3>
                <div class="value">{device_comparison['perfect_match'].sum()}/{len(device_comparison)}</div>
            </div>
            <div class="metric-card">
                <h3>Threshold</h3>
                <div class="value">±{THRESHOLD_PERCENT}%</div>
            </div>
        </div>
        
        <div class="content">
            <div class="section">
                <h2>📊 Overall Totals Comparison</h2>
                <table>
                    <tr>
                        <th>Metric</th>
                        <th>Growth</th>
                        <th>Gold</th>
                        <th>Difference</th>
                        <th>Diff %</th>
                        <th>Status</th>
                    </tr>
'''

for _, row in overall_comparison.iterrows():
    status_class = 'pass' if row['Match'] else 'fail'
    html_content += f'''
                    <tr>
                        <td>{row['Metric']}</td>
                        <td>{row['Growth']:,.2f}</td>
                        <td>{row['Gold']:,.2f}</td>
                        <td>{row['Difference']:,.2f}</td>
                        <td>{row['Diff %']:.2f}%</td>
                        <td class="{status_class}">{row['Status']}</td>
                    </tr>
'''

html_content += f'''
                </table>
            </div>
            
            <div class="section">
                <h2>💻 Validation by Device (All {len(device_comparison)} Devices)</h2>
                <table>
                    <tr>
                        <th>Device</th>
                        <th>Cost Growth</th>
                        <th>Cost Gold</th>
                        <th>Cost Diff %</th>
                        <th>Impressions Growth</th>
                        <th>Impressions Gold</th>
                        <th>Impr Diff %</th>
                        <th>Clicks Growth</th>
                        <th>Clicks Gold</th>
                        <th>Clicks Diff %</th>
                        <th>Status</th>
                    </tr>
'''

for _, row in device_comparison.sort_values('Device').iterrows():
    status_class = 'pass' if row['perfect_match'] else 'fail'
    html_content += f'''
                    <tr>
                        <td>{row['Device']}</td>
                        <td>{row['cost_growth']:,.2f}</td>
                        <td>{row['cost_gold']:,.2f}</td>
                        <td>{row['cost_diff_pct']:.2f}%</td>
                        <td>{row['impr_growth']:,.0f}</td>
                        <td>{row['impr_gold']:,.0f}</td>
                        <td>{row['impr_diff_pct']:.2f}%</td>
                        <td>{row['clicks_growth']:,.0f}</td>
                        <td>{row['clicks_gold']:,.0f}</td>
                        <td>{row['clicks_diff_pct']:.2f}%</td>
                        <td class="{status_class}">{row['status']}</td>
                    </tr>
'''

html_content += f'''
                </table>
            </div>
            
            <div class="section">
                <h2>🎯 Validation by Campaign (All {len(campaign_comparison)} Campaigns)</h2>
                <table>
                    <tr>
                        <th>Campaign</th>
                        <th>Cost Growth</th>
                        <th>Cost Gold</th>
                        <th>Cost Diff %</th>
                        <th>Impressions Growth</th>
                        <th>Impressions Gold</th>
                        <th>Impr Diff %</th>
                        <th>Clicks Growth</th>
                        <th>Clicks Gold</th>
                        <th>Clicks Diff %</th>
                        <th>Status</th>
                    </tr>
'''

for _, row in campaign_comparison.sort_values('Campaign').iterrows():
    status_class = 'pass' if row['perfect_match'] else 'fail'
    html_content += f'''
                    <tr>
                        <td>{row['Campaign']}</td>
                        <td>{row['cost_growth']:,.2f}</td>
                        <td>{row['cost_gold']:,.2f}</td>
                        <td>{row['cost_diff_pct']:.2f}%</td>
                        <td>{row['impr_growth']:,.0f}</td>
                        <td>{row['impr_gold']:,.0f}</td>
                        <td>{row['impr_diff_pct']:.2f}%</td>
                        <td>{row['clicks_growth']:,.0f}</td>
                        <td>{row['clicks_gold']:,.0f}</td>
                        <td>{row['clicks_diff_pct']:.2f}%</td>
                        <td class="{status_class}">{row['status']}</td>
                    </tr>
'''

html_content += f'''
                </table>
            </div>
            
            <div class="section">
                <h2>📋 Summary</h2>
                <table>
                    <tr>
                        <th>Segment Type</th>
                        <th>Total Segments</th>
                        <th>Matches</th>
                        <th>Exceeds Threshold</th>
                        <th>Match %</th>
                    </tr>
'''

for _, row in summary_df.iterrows():
    html_content += f'''
                    <tr>
                        <td>{row['Segment Type']}</td>
                        <td>{row['Total Segments']}</td>
                        <td>{row['Matches']}</td>
                        <td>{row['Exceeds Threshold']}</td>
                        <td>{row['Match %']:.2f}%</td>
                    </tr>
'''

html_content += f'''
                </table>
            </div>
        </div>
        
        <div class="footer">
            <p>Google Ads Device Validation Report | Generated with Python & Pandas</p>
            <p>Threshold: ±{THRESHOLD_PERCENT}% | Overall Match Rate: {overall_match_pct:.1f}%</p>
            <p>Device Normalization: Computers→DESKTOP, Mobile phones→MOBILE, Tablets→TABLET, Other/TV screens→OTHER</p>
        </div>
    </div>
</body>
</html>
'''

# Save HTML report
with open('google_ads_device_validation_report.html', 'w', encoding='utf-8') as f:
    f.write(html_content)

print("\n✓ HTML report saved as 'google_ads_device_validation_report.html'")
print("\n" + "="*80)
print("HTML EXPORT COMPLETE")
print("="*80)
print("\nGenerated files:")
print("  • google_ads_device_validation_report.html")

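# Automatically open HTML report in browser
# Path.as_uri() builds a valid file:// URL on every platform, including Windows
from pathlib import Path
html_path = Path('google_ads_device_validation_report.html').resolve()
print("\n🌐 Opening report in browser...")
webbrowser.open(html_path.as_uri())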

print("\n✓ Report opened in your default browser!")
print("\nYou can also manually open 'google_ads_device_validation_report.html' anytime.")


EXPORTING HTML REPORT

✓ HTML report saved as 'google_ads_device_validation_report.html'

HTML EXPORT COMPLETE

Generated files:
  • google_ads_device_validation_report.html

🌐 Opening report in browser...

✓ Report opened in your default browser!

You can also manually open 'google_ads_device_validation_report.html' anytime.
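
The export cell builds each table row by hand and inserts values such as campaign names into the HTML without escaping them. A shorter alternative, sketched below, lets `DataFrame.to_html` render the table and escape cell text, and uses `html.escape` for the heading. `table_section` is a hypothetical helper, and the sample frame is illustrative rather than taken from the report.

```python
import html

import pandas as pd


def table_section(title: str, df: pd.DataFrame) -> str:
    """Render a titled HTML section; text in the title and in cells is escaped."""
    table_html = df.to_html(
        index=False,
        escape=True,   # escape <, > and & in cell text
        border=0,
        float_format=lambda v: f"{v:,.2f}",
    )
    return f'<div class="section"><h2>{html.escape(title)}</h2>{table_html}</div>'


# Illustrative frame with characters that need escaping
sample = pd.DataFrame({"Campaign": ["A & <B>"], "Cost": [1234.5]})
section = table_section("Validation by Campaign", sample)
print(section)
```

The existing CSS still applies because the output uses plain `table`, `th` and `td` elements. Per-row pass or fail styling would need an extra step, since this sketch does not add the `pass` and `fail` classes.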

