# Meta Age/Gender Data Segment Validation with 2% Threshold

This notebook performs **segment-level validation** of Meta age/gender data: it compares the Growth file against the Gold file and treats differences within a **2% tolerance threshold** as matches.

**Files:**
- Growth: `growth/meta_age_gender(growth).xlsx` (1,680 rows)
- Gold: `gold/meta_age_gender(gold).xlsx` (3,251 rows)

**Column Mapping:**
- `Day` → date column
- `Campaign name` → campaign column
- `Gender` → gender segment
- `Age` → age segment
- `Amount spent (INR)` → cost metric
- `Impressions` → impressions metric
- `Link clicks` → clicks metric

**Validation Segments:**
- Overall Totals
- By Date
- By Campaign
- By Gender
- By Age Group
- By Campaign + Date
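
Every segment check listed above follows the same pattern: aggregate both files by the segment key, join the results, and compare each metric against the threshold. As a rough sketch (not the notebook's own code), that pattern could be factored into one helper. The name `compare_segment`, the `METRICS` mapping, and the toy data are assumptions made for illustration; the source column names come from the column mapping above. Note that this sketch names the impressions column `impressions_diff_pct`, whereas the notebook uses `impr_diff_pct`.

```python
import pandas as pd

# Source column -> short prefix used for the output columns (assumed naming)
METRICS = {
    'Amount spent (INR)': 'amount',
    'Impressions': 'impressions',
    'Link clicks': 'clicks',
}

def compare_segment(growth_df, gold_df, keys, threshold_percent=2.0):
    """Aggregate both sources by `keys`, join them, and flag rows within the threshold."""
    keys = [keys] if isinstance(keys, str) else list(keys)
    cols = list(METRICS)

    growth = growth_df.groupby(keys, as_index=False)[cols].sum().rename(
        columns={c: f'{METRICS[c]}_growth' for c in cols})
    gold = gold_df.groupby(keys, as_index=False)[cols].sum().rename(
        columns={c: f'{METRICS[c]}_gold' for c in cols})

    out = growth.merge(gold, on=keys, how='inner')  # same join type as the notebook
    within = pd.Series(True, index=out.index)
    for short in METRICS.values():
        diff = (out[f'{short}_growth'] - out[f'{short}_gold']) / out[f'{short}_gold'] * 100
        out[f'{short}_diff_pct'] = diff.round(2)
        within &= out[f'{short}_diff_pct'].abs() <= threshold_percent
    out['perfect_match'] = within
    return out

# Toy data (hypothetical): the second day differs by 5% on link clicks.
growth_toy = pd.DataFrame({
    'Day': ['2025-11-01', '2025-11-01', '2025-11-02'],
    'Amount spent (INR)': [10.0, 5.0, 20.0],
    'Impressions': [100, 50, 200],
    'Link clicks': [10, 5, 21],
})
gold_toy = pd.DataFrame({
    'Day': ['2025-11-01', '2025-11-02'],
    'Amount spent (INR)': [15.0, 20.0],
    'Impressions': [150, 200],
    'Link clicks': [15, 20],
})

result = compare_segment(growth_toy, gold_toy, 'Day')
print(result[['Day', 'clicks_diff_pct', 'perfect_match']])
```

Passing a list such as `['Campaign name', 'Day']` as `keys` would cover the combined campaign and date level with the same function.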

## Configuration: Set Threshold

In [9]:
# CONFIGURATION: Set your threshold here
THRESHOLD_PERCENT = 2.0  # Accept differences up to 2%

print("="*80)
print("META AGE/GENDER DATA VALIDATION CONFIGURATION")
print("="*80)
print(f"\nThreshold: {THRESHOLD_PERCENT}%")
print(f"Differences up to {THRESHOLD_PERCENT}% will be marked as MATCHED")
print("\nYou can change THRESHOLD_PERCENT above to adjust tolerance")

META AGE/GENDER DATA VALIDATION CONFIGURATION

Threshold: 2.0%
Differences up to 2.0% will be marked as MATCHED

You can change THRESHOLD_PERCENT above to adjust tolerance
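
The pass/fail rule that the later cells apply can be written down as a small function. This is a minimal sketch rather than code from the notebook: `pct_diff` and `within_threshold` are hypothetical names and the sample numbers are made up. The comparison is inclusive (`<=`), matching the comparisons used in the validation cells. Zero denominators are deliberately not handled here; plain Python raises `ZeroDivisionError` for them, whereas pandas returns `NaN` or `inf`.

```python
THRESHOLD_PERCENT = 2.0  # same value as the configuration cell

def pct_diff(growth: float, gold: float) -> float:
    """Signed difference of growth relative to gold, in percent."""
    return (growth - gold) / gold * 100

def within_threshold(growth: float, gold: float,
                     threshold: float = THRESHOLD_PERCENT) -> bool:
    """True when the absolute percentage difference does not exceed the threshold."""
    return abs(pct_diff(growth, gold)) <= threshold

print(within_threshold(101, 100))  # 1% above gold
print(within_threshold(103, 100))  # 3% above gold
print(within_threshold(97, 100))   # 3% below gold
```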


## Step 1: Import Libraries

In [10]:
# Install openpyxl if needed
import sys
!{sys.executable} -m pip install openpyxl -q

import pandas as pd
import numpy as np
from datetime import datetime

# Display settings
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', 100)
pd.set_option('display.float_format', '{:.2f}'.format)

print("✓ Libraries imported successfully")
print(f"Analysis started: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")

✓ Libraries imported successfully
Analysis started: 2025-12-18 04:04:31





## Step 2: Load and Prepare Data

In [11]:
# Load Growth Excel
print("Loading Growth Excel...")
growth_df = pd.read_excel("growth/meta_age_gender(growth).xlsx")

# Convert Day to string format for consistent comparison
growth_df['Day'] = pd.to_datetime(growth_df['Day']).dt.strftime('%Y-%m-%d')

print(f"✓ Growth loaded: {len(growth_df):,} rows")
print(f"  Columns: {growth_df.columns.tolist()}")

# Load Gold Excel
print("\nLoading Gold Excel...")
gold_df = pd.read_excel("gold/meta_age_gender(gold).xlsx")

# Convert Day to string format for consistent comparison
gold_df['Day'] = pd.to_datetime(gold_df['Day']).dt.strftime('%Y-%m-%d')

print(f"✓ Gold loaded: {len(gold_df):,} rows")
print(f"  Columns: {gold_df.columns.tolist()}")

print("\n" + "="*80)
print("DATA SUMMARY")
print("="*80)
print(f"\nGrowth Date Range: {growth_df['Day'].min()} to {growth_df['Day'].max()}")
print(f"Gold Date Range: {gold_df['Day'].min()} to {gold_df['Day'].max()}")
print(f"\nGrowth Unique Campaigns: {growth_df['Campaign name'].nunique()}")
print(f"Gold Unique Campaigns: {gold_df['Campaign name'].nunique()}")
print(f"\nGrowth Unique Genders: {growth_df['Gender'].nunique()}")
print(f"Gold Unique Genders: {gold_df['Gender'].nunique()}")
print(f"\nGrowth Unique Age Groups: {growth_df['Age'].nunique()}")
print(f"Gold Unique Age Groups: {gold_df['Age'].nunique()}")

Loading Growth Excel...
✓ Growth loaded: 1,680 rows
  Columns: ['Day', 'Campaign name', 'Gender', 'Age', 'Amount spent (INR)', 'Impressions', 'Link clicks', 'Purchases conversion value', 'Purchases', 'Reporting starts', 'Reporting ends']

Loading Gold Excel...
✓ Gold loaded: 3,251 rows
  Columns: ['Day', 'Campaign name', 'Gender', 'Age', 'Amount spent (INR)', 'Impressions', 'Link clicks', 'Purchases conversion value', 'Purchases', 'Reporting starts', 'Reporting ends']

DATA SUMMARY

Growth Date Range: 2025-11-01 to 2025-11-30
Gold Date Range: 2025-11-01 to 2025-11-30

Growth Unique Campaigns: 4
Gold Unique Campaigns: 4

Growth Unique Genders: 3
Gold Unique Genders: 3

Growth Unique Age Groups: 7
Gold Unique Age Groups: 7
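
Matching counts of campaigns, genders, and age groups do not by themselves show that both files contain the same key combinations, and the row counts differ (1,680 vs 3,251). The source does not say why. One way to investigate, sketched below under the assumption that the finest grain is `Day` × `Campaign name` × `Gender` × `Age`, is to count repeated keys in each file and list keys that appear in only one of them. `grain_report`, `key_coverage`, and the toy frames are illustrative names and data, not part of the notebook.

```python
import pandas as pd

KEYS = ['Day', 'Campaign name', 'Gender', 'Age']  # assumed finest grain

def grain_report(df: pd.DataFrame, keys=KEYS) -> dict:
    """Count rows, distinct key combinations, and rows repeating an earlier key."""
    return {
        'rows': len(df),
        'distinct_keys': len(df.drop_duplicates(subset=keys)),
        'repeated_rows': int(df.duplicated(subset=keys).sum()),
    }

def key_coverage(growth: pd.DataFrame, gold: pd.DataFrame, keys=KEYS) -> pd.DataFrame:
    """Outer-join the distinct keys; `_merge` marks keys found in only one file."""
    left = growth[keys].drop_duplicates()
    right = gold[keys].drop_duplicates()
    return left.merge(right, on=keys, how='outer', indicator=True)

# Toy data (hypothetical): gold splits one growth row in two and has an extra key.
growth_toy = pd.DataFrame({
    'Day': ['2025-11-01', '2025-11-01'],
    'Campaign name': ['Campaign A', 'Campaign A'],
    'Gender': ['female', 'male'],
    'Age': ['18-24', '18-24'],
    'Impressions': [100, 80],
})
gold_toy = pd.DataFrame({
    'Day': ['2025-11-01'] * 4,
    'Campaign name': ['Campaign A'] * 4,
    'Gender': ['female', 'female', 'male', 'unknown'],
    'Age': ['18-24'] * 4,
    'Impressions': [60, 40, 80, 5],
})

print(grain_report(growth_toy))
print(grain_report(gold_toy))
coverage = key_coverage(growth_toy, gold_toy)
print(coverage[coverage['_merge'] != 'both'])
```

Because the validation cells join with `how='inner'`, a key present in only one file would drop out of the comparison silently; a coverage check like this makes such keys visible.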


## Step 3: Overall Totals Comparison

In [12]:
print("="*80)
print(f"OVERALL TOTALS COMPARISON (with {THRESHOLD_PERCENT}% threshold)")
print("="*80)

# Calculate totals
growth_totals = growth_df[['Amount spent (INR)', 'Impressions', 'Link clicks']].sum()
gold_totals = gold_df[['Amount spent (INR)', 'Impressions', 'Link clicks']].sum()

# Create comparison dataframe
overall_comparison = pd.DataFrame({
    'Metric': ['Amount Spent (INR)', 'Impressions', 'Link Clicks'],
    'Growth': [growth_totals['Amount spent (INR)'], growth_totals['Impressions'], growth_totals['Link clicks']],
    'Gold': [gold_totals['Amount spent (INR)'], gold_totals['Impressions'], gold_totals['Link clicks']],
})

overall_comparison['Difference'] = overall_comparison['Growth'] - overall_comparison['Gold']
overall_comparison['Diff %'] = (overall_comparison['Difference'] / overall_comparison['Gold'] * 100).round(2)
overall_comparison['Match'] = overall_comparison['Diff %'].abs() <= THRESHOLD_PERCENT
overall_comparison['Status'] = overall_comparison['Match'].apply(lambda x: '✓ PASS' if x else '✗ FAIL')

display(overall_comparison)

# Summary
matches = overall_comparison['Match'].sum()
print(f"\n✓ Matches (within {THRESHOLD_PERCENT}%): {matches}/3 metrics")
if matches == 3:
    print(f"✓✓✓ ALL OVERALL TOTALS MATCH (within {THRESHOLD_PERCENT}% threshold)! ✓✓✓")
else:
    print(f"⚠ {3-matches} metric(s) exceed {THRESHOLD_PERCENT}% threshold")

OVERALL TOTALS COMPARISON (with 2.0% threshold)


| | Metric | Growth | Gold | Difference | Diff % | Match | Status |
|---|---|---|---|---|---|---|---|
| 0 | Amount Spent (INR) | 566297.06 | 566308.69 | -11.63 | -0.0 | True | ✓ PASS |
| 1 | Impressions | 5076744.0 | 5076850.0 | -106.0 | -0.0 | True | ✓ PASS |
| 2 | Link Clicks | 82102.0 | 82104.0 | -2.0 | -0.0 | True | ✓ PASS |



✓ Matches (within 2.0%): 3/3 metrics
✓✓✓ ALL OVERALL TOTALS MATCH (within 2.0% threshold)! ✓✓✓
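
The `Diff %` column shows `-0.0` because the percentage is rounded to two decimals, and the threshold test in the cell runs on that rounded value. Recomputing from the totals reported above at higher precision shows how small the gaps actually are. The snippet is only an illustrative recalculation; the variable names are not from the notebook.

```python
# Totals copied from the Step 3 output: (Growth, Gold)
totals = {
    'Amount Spent (INR)': (566297.06, 566308.69),
    'Impressions': (5076744.0, 5076850.0),
    'Link Clicks': (82102.0, 82104.0),
}

unrounded = {
    metric: (growth - gold) / gold * 100
    for metric, (growth, gold) in totals.items()
}

for metric, pct in unrounded.items():
    # Four decimals keep the sign and magnitude that round(2) collapses to -0.0
    print(f"{metric}: {pct:.4f}%")
```

All three differences are negative and a few thousandths of a percent in size, far inside the 2% tolerance.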


## Step 4: Validation by Date

In [13]:
print("="*80)
print(f"SEGMENT VALIDATION: BY DATE (with {THRESHOLD_PERCENT}% threshold)")
print("="*80)

# Aggregate by date
growth_by_date = growth_df.groupby('Day').agg({
    'Amount spent (INR)': 'sum',
    'Impressions': 'sum',
    'Link clicks': 'sum'
}).reset_index()
growth_by_date.columns = ['Day', 'amount_growth', 'impressions_growth', 'clicks_growth']

gold_by_date = gold_df.groupby('Day').agg({
    'Amount spent (INR)': 'sum',
    'Impressions': 'sum',
    'Link clicks': 'sum'
}).reset_index()
gold_by_date.columns = ['Day', 'amount_gold', 'impressions_gold', 'clicks_gold']

# Merge and compare
date_comparison = pd.merge(growth_by_date, gold_by_date, on='Day', how='inner')

# Calculate percentage differences
date_comparison['amount_diff_pct'] = ((date_comparison['amount_growth'] - date_comparison['amount_gold']) / date_comparison['amount_gold'] * 100).round(2)
date_comparison['impr_diff_pct'] = ((date_comparison['impressions_growth'] - date_comparison['impressions_gold']) / date_comparison['impressions_gold'] * 100).round(2)
date_comparison['clicks_diff_pct'] = ((date_comparison['clicks_growth'] - date_comparison['clicks_gold']) / date_comparison['clicks_gold'] * 100).round(2)

# Apply threshold matching
date_comparison['perfect_match'] = (
    (date_comparison['amount_diff_pct'].abs() <= THRESHOLD_PERCENT) & 
    (date_comparison['impr_diff_pct'].abs() <= THRESHOLD_PERCENT) & 
    (date_comparison['clicks_diff_pct'].abs() <= THRESHOLD_PERCENT)
)
date_comparison['status'] = date_comparison['perfect_match'].apply(lambda x: '✓ PASS' if x else '✗ FAIL')

print(f"\nTotal dates compared: {len(date_comparison)}")
print(f"✓ Matches (within {THRESHOLD_PERCENT}%): {date_comparison['perfect_match'].sum()}")
print(f"✗ Exceeds threshold: {(~date_comparison['perfect_match']).sum()}")

print("\nDetailed comparison:")
display(date_comparison[['Day', 'amount_growth', 'amount_gold', 'amount_diff_pct',
                          'impressions_growth', 'impressions_gold', 'impr_diff_pct',
                          'clicks_growth', 'clicks_gold', 'clicks_diff_pct', 'status']].sort_values('Day'))

SEGMENT VALIDATION: BY DATE (with 2.0% threshold)

Total dates compared: 30
✓ Matches (within 2.0%): 30
✗ Exceeds threshold: 0

Detailed comparison:


| | Day | amount_growth | amount_gold | amount_diff_pct | impressions_growth | impressions_gold | impr_diff_pct | clicks_growth | clicks_gold | clicks_diff_pct | status |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2025-11-01 | 13084.64 | 13084.64 | -0.0 | 345244 | 345244 | 0.0 | 1655.0 | 1655 | 0.0 | ✓ PASS |
| 1 | 2025-11-02 | 14197.48 | 14197.43 | 0.0 | 73582 | 73582 | 0.0 | 483.0 | 483 | 0.0 | ✓ PASS |
| 2 | 2025-11-03 | 11293.66 | 11293.67 | -0.0 | 61257 | 61257 | 0.0 | 608.0 | 608 | 0.0 | ✓ PASS |
| 3 | 2025-11-04 | 26489.13 | 26489.09 | 0.0 | 275517 | 275517 | 0.0 | 3139.0 | 3139 | 0.0 | ✓ PASS |
| 4 | 2025-11-05 | 17204.24 | 17204.21 | 0.0 | 120019 | 120019 | 0.0 | 2056.0 | 2056 | 0.0 | ✓ PASS |
| 5 | 2025-11-06 | 16253.66 | 16253.67 | -0.0 | 111832 | 111832 | 0.0 | 1795.0 | 1795 | 0.0 | ✓ PASS |
| 6 | 2025-11-07 | 24867.92 | 24867.91 | 0.0 | 232865 | 232865 | 0.0 | 2887.0 | 2887 | 0.0 | ✓ PASS |
| 7 | 2025-11-08 | 22173.71 | 22173.69 | 0.0 | 150184 | 150184 | 0.0 | 1938.0 | 1938 | 0.0 | ✓ PASS |
| 8 | 2025-11-09 | 21629.76 | 21629.76 | 0.0 | 180267 | 180267 | 0.0 | 1723.0 | 1723 | 0.0 | ✓ PASS |
| 9 | 2025-11-10 | 16802.72 | 16802.73 | -0.0 | 119316 | 119316 | 0.0 | 1846.0 | 1846 | 0.0 | ✓ PASS |


## Step 5: Validation by Campaign

In [14]:
print("="*80)
print(f"SEGMENT VALIDATION: BY CAMPAIGN (with {THRESHOLD_PERCENT}% threshold)")
print("="*80)

# Aggregate by campaign
growth_by_campaign = growth_df.groupby('Campaign name').agg({
    'Amount spent (INR)': 'sum',
    'Impressions': 'sum',
    'Link clicks': 'sum'
}).reset_index()
growth_by_campaign.columns = ['Campaign name', 'amount_growth', 'impressions_growth', 'clicks_growth']

gold_by_campaign = gold_df.groupby('Campaign name').agg({
    'Amount spent (INR)': 'sum',
    'Impressions': 'sum',
    'Link clicks': 'sum'
}).reset_index()
gold_by_campaign.columns = ['Campaign name', 'amount_gold', 'impressions_gold', 'clicks_gold']

# Merge and compare
campaign_comparison = pd.merge(growth_by_campaign, gold_by_campaign, on='Campaign name', how='inner')

# Calculate percentage differences
campaign_comparison['amount_diff_pct'] = ((campaign_comparison['amount_growth'] - campaign_comparison['amount_gold']) / campaign_comparison['amount_gold'] * 100).round(2)
campaign_comparison['impr_diff_pct'] = ((campaign_comparison['impressions_growth'] - campaign_comparison['impressions_gold']) / campaign_comparison['impressions_gold'] * 100).round(2)
campaign_comparison['clicks_diff_pct'] = ((campaign_comparison['clicks_growth'] - campaign_comparison['clicks_gold']) / campaign_comparison['clicks_gold'] * 100).round(2)

# Apply threshold matching
campaign_comparison['perfect_match'] = (
    (campaign_comparison['amount_diff_pct'].abs() <= THRESHOLD_PERCENT) & 
    (campaign_comparison['impr_diff_pct'].abs() <= THRESHOLD_PERCENT) & 
    (campaign_comparison['clicks_diff_pct'].abs() <= THRESHOLD_PERCENT)
)
campaign_comparison['status'] = campaign_comparison['perfect_match'].apply(lambda x: '✓ PASS' if x else '✗ FAIL')

print(f"\nTotal campaigns compared: {len(campaign_comparison)}")
print(f"✓ Matches (within {THRESHOLD_PERCENT}%): {campaign_comparison['perfect_match'].sum()}")
print(f"✗ Exceeds threshold: {(~campaign_comparison['perfect_match']).sum()}")

print("\nDetailed comparison:")
display(campaign_comparison[['Campaign name', 'amount_growth', 'amount_gold', 'amount_diff_pct',
                              'impressions_growth', 'impressions_gold', 'impr_diff_pct',
                              'clicks_growth', 'clicks_gold', 'clicks_diff_pct', 'status']].sort_values('Campaign name'))

SEGMENT VALIDATION: BY CAMPAIGN (with 2.0% threshold)

Total campaigns compared: 4
✓ Matches (within 2.0%): 4
✗ Exceeds threshold: 0

Detailed comparison:


| | Campaign name | amount_growth | amount_gold | amount_diff_pct | impressions_growth | impressions_gold | impr_diff_pct | clicks_growth | clicks_gold | clicks_diff_pct | status |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Ikonic -Scalp-Massager-Amazon-1-Nov2025 | 8967.01 | 8967.0 | 0.0 | 56943 | 56943 | 0.0 | 254.0 | 254 | 0.0 | ✓ PASS |
| 1 | Ikonic ME \| Sales Retargeting | 207786.87 | 207787.01 | -0.0 | 1514495 | 1514502 | -0.0 | 21140.0 | 21140 | 0.0 | ✓ PASS |
| 2 | Ikonic ME \| Sales Prospecting | 193185.73 | 193189.58 | -0.0 | 1752762 | 1752793 | -0.0 | 26509.0 | 26511 | -0.01 | ✓ PASS |
| 3 | Ikonic Me \| Sales Catalogue | 156357.45 | 156365.1 | -0.0 | 1752544 | 1752612 | -0.0 | 34199.0 | 34199 | 0.0 | ✓ PASS |


## Step 6: Validation by Gender

In [15]:
print("="*80)
print(f"SEGMENT VALIDATION: BY GENDER (with {THRESHOLD_PERCENT}% threshold)")
print("="*80)

# Aggregate by gender
growth_by_gender = growth_df.groupby('Gender').agg({
    'Amount spent (INR)': 'sum',
    'Impressions': 'sum',
    'Link clicks': 'sum'
}).reset_index()
growth_by_gender.columns = ['Gender', 'amount_growth', 'impressions_growth', 'clicks_growth']

gold_by_gender = gold_df.groupby('Gender').agg({
    'Amount spent (INR)': 'sum',
    'Impressions': 'sum',
    'Link clicks': 'sum'
}).reset_index()
gold_by_gender.columns = ['Gender', 'amount_gold', 'impressions_gold', 'clicks_gold']

# Merge and compare
gender_comparison = pd.merge(growth_by_gender, gold_by_gender, on='Gender', how='inner')

# Calculate percentage differences
gender_comparison['amount_diff_pct'] = ((gender_comparison['amount_growth'] - gender_comparison['amount_gold']) / gender_comparison['amount_gold'] * 100).round(2)
gender_comparison['impr_diff_pct'] = ((gender_comparison['impressions_growth'] - gender_comparison['impressions_gold']) / gender_comparison['impressions_gold'] * 100).round(2)
gender_comparison['clicks_diff_pct'] = ((gender_comparison['clicks_growth'] - gender_comparison['clicks_gold']) / gender_comparison['clicks_gold'] * 100).round(2)

# Apply threshold matching
gender_comparison['perfect_match'] = (
    (gender_comparison['amount_diff_pct'].abs() <= THRESHOLD_PERCENT) & 
    (gender_comparison['impr_diff_pct'].abs() <= THRESHOLD_PERCENT) & 
    (gender_comparison['clicks_diff_pct'].abs() <= THRESHOLD_PERCENT)
)
gender_comparison['status'] = gender_comparison['perfect_match'].apply(lambda x: '✓ PASS' if x else '✗ FAIL')

print(f"\nTotal gender segments compared: {len(gender_comparison)}")
print(f"✓ Matches (within {THRESHOLD_PERCENT}%): {gender_comparison['perfect_match'].sum()}")
print(f"✗ Exceeds threshold: {(~gender_comparison['perfect_match']).sum()}")

print("\nDetailed comparison:")
display(gender_comparison)

SEGMENT VALIDATION: BY GENDER (with 2.0% threshold)

Total gender segments compared: 3
✓ Matches (within 2.0%): 3
✗ Exceeds threshold: 0

Detailed comparison:


| | Gender | amount_growth | impressions_growth | clicks_growth | amount_gold | impressions_gold | clicks_gold | amount_diff_pct | impr_diff_pct | clicks_diff_pct | perfect_match | status |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | female | 413298.46 | 3096857 | 59065.0 | 413303.26 | 3096941 | 59066 | -0.0 | -0.0 | -0.0 | True | ✓ PASS |
| 1 | male | 151275.38 | 1965546 | 22799.0 | 151282.22 | 1965567 | 22800 | -0.0 | -0.0 | -0.0 | True | ✓ PASS |
| 2 | unknown | 1723.22 | 14341 | 238.0 | 1723.21 | 14342 | 238 | 0.0 | -0.01 | 0.0 | True | ✓ PASS |


## Step 7: Validation by Age Group

In [16]:
print("="*80)
print(f"SEGMENT VALIDATION: BY AGE GROUP (with {THRESHOLD_PERCENT}% threshold)")
print("="*80)

# Aggregate by age
growth_by_age = growth_df.groupby('Age').agg({
    'Amount spent (INR)': 'sum',
    'Impressions': 'sum',
    'Link clicks': 'sum'
}).reset_index()
growth_by_age.columns = ['Age', 'amount_growth', 'impressions_growth', 'clicks_growth']

gold_by_age = gold_df.groupby('Age').agg({
    'Amount spent (INR)': 'sum',
    'Impressions': 'sum',
    'Link clicks': 'sum'
}).reset_index()
gold_by_age.columns = ['Age', 'amount_gold', 'impressions_gold', 'clicks_gold']

# Merge and compare
age_comparison = pd.merge(growth_by_age, gold_by_age, on='Age', how='inner')

# Calculate percentage differences
age_comparison['amount_diff_pct'] = ((age_comparison['amount_growth'] - age_comparison['amount_gold']) / age_comparison['amount_gold'] * 100).round(2)
age_comparison['impr_diff_pct'] = ((age_comparison['impressions_growth'] - age_comparison['impressions_gold']) / age_comparison['impressions_gold'] * 100).round(2)
age_comparison['clicks_diff_pct'] = ((age_comparison['clicks_growth'] - age_comparison['clicks_gold']) / age_comparison['clicks_gold'] * 100).round(2)

# Apply threshold matching
age_comparison['perfect_match'] = (
    (age_comparison['amount_diff_pct'].abs() <= THRESHOLD_PERCENT) & 
    (age_comparison['impr_diff_pct'].abs() <= THRESHOLD_PERCENT) & 
    (age_comparison['clicks_diff_pct'].abs() <= THRESHOLD_PERCENT)
)
age_comparison['status'] = age_comparison['perfect_match'].apply(lambda x: '✓ PASS' if x else '✗ FAIL')

print(f"\nTotal age groups compared: {len(age_comparison)}")
print(f"✓ Matches (within {THRESHOLD_PERCENT}%): {age_comparison['perfect_match'].sum()}")
print(f"✗ Exceeds threshold: {(~age_comparison['perfect_match']).sum()}")

print("\nDetailed comparison:")
display(age_comparison.sort_values('Age'))

SEGMENT VALIDATION: BY AGE GROUP (with 2.0% threshold)

Total age groups compared: 7
✓ Matches (within 2.0%): 6
✗ Exceeds threshold: 1

Detailed comparison:


| | Age | amount_growth | impressions_growth | clicks_growth | amount_gold | impressions_gold | clicks_gold | amount_diff_pct | impr_diff_pct | clicks_diff_pct | perfect_match | status |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 18-24 | 83407.78 | 1351352 | 14641.0 | 83408.57 | 1351350 | 14641 | -0.0 | 0.0 | 0.0 | True | ✓ PASS |
| 1 | 25-34 | 235278.51 | 2334647 | 34483.0 | 235284.91 | 2334713 | 34484 | -0.0 | -0.0 | -0.0 | True | ✓ PASS |
| 2 | 35-44 | 172653.31 | 1076844 | 24183.0 | 172660.88 | 1076875 | 24184 | -0.0 | -0.0 | -0.0 | True | ✓ PASS |
| 3 | 45-54 | 55768.24 | 237686 | 6824.0 | 55766.25 | 237697 | 6824 | 0.0 | -0.0 | 0.0 | True | ✓ PASS |
| 4 | 55-64 | 14534.83 | 49196 | 1371.0 | 14535.28 | 49197 | 1371 | -0.0 | -0.0 | 0.0 | True | ✓ PASS |
| 5 | 65+ | 4644.2 | 26954 | 600.0 | 4642.62 | 26953 | 600 | 0.03 | 0.0 | 0.0 | True | ✓ PASS |
| 6 | Unknown | 10.18 | 65 | 0.0 | 10.18 | 65 | 0 | 0.01 | 0.0 | NaN | False | ✗ FAIL |
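
The `Unknown` row fails even though both sources report 0 link clicks: 0 / 0 evaluates to `NaN` in pandas, and `NaN <= 2.0` is `False`. A possible adjustment, sketched here as an assumption about the desired behaviour rather than as the notebook's logic, is to count two zero totals as a match while still failing a non-zero value compared against zero. `safe_diff_pct` is a hypothetical helper name; the first two sample rows reuse the click totals of the `65+` and `Unknown` rows, and the third is made up.

```python
import numpy as np
import pandas as pd

def safe_diff_pct(growth: pd.Series, gold: pd.Series) -> pd.Series:
    """Percentage difference that treats 0 vs 0 as an exact match."""
    growth = growth.astype(float)
    gold = gold.astype(float)
    # Replace zero denominators with NaN so the division never yields inf
    diff = (growth - gold) / gold.replace(0, np.nan) * 100
    both_zero = (growth == 0) & (gold == 0)
    # Assumption: two zero totals agree, so report 0% instead of NaN
    diff = diff.mask(both_zero, 0.0)
    # Any remaining NaN (gold is 0 or missing, growth is not) still fails the check
    return diff.round(2)

clicks_growth = pd.Series([600.0, 0.0, 5.0])
clicks_gold = pd.Series([600, 0, 0])

diff = safe_diff_pct(clicks_growth, clicks_gold)
within = diff.abs() <= 2.0
print(pd.DataFrame({'diff_pct': diff, 'within_threshold': within}))
```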


## Step 8: Validation by Campaign + Date

In [17]:
print("="*80)
print(f"SEGMENT VALIDATION: BY CAMPAIGN + DATE (with {THRESHOLD_PERCENT}% threshold)")
print("="*80)

# Aggregate by campaign and date
growth_by_camp_date = growth_df.groupby(['Campaign name', 'Day']).agg({
    'Amount spent (INR)': 'sum',
    'Impressions': 'sum',
    'Link clicks': 'sum'
}).reset_index()
growth_by_camp_date.columns = ['Campaign name', 'Day', 'amount_growth', 'impressions_growth', 'clicks_growth']

gold_by_camp_date = gold_df.groupby(['Campaign name', 'Day']).agg({
    'Amount spent (INR)': 'sum',
    'Impressions': 'sum',
    'Link clicks': 'sum'
}).reset_index()
gold_by_camp_date.columns = ['Campaign name', 'Day', 'amount_gold', 'impressions_gold', 'clicks_gold']

# Merge and compare
camp_date_comparison = pd.merge(growth_by_camp_date, gold_by_camp_date, on=['Campaign name', 'Day'], how='inner')

# Calculate percentage differences
camp_date_comparison['amount_diff_pct'] = ((camp_date_comparison['amount_growth'] - camp_date_comparison['amount_gold']) / camp_date_comparison['amount_gold'] * 100).round(2)
camp_date_comparison['impr_diff_pct'] = ((camp_date_comparison['impressions_growth'] - camp_date_comparison['impressions_gold']) / camp_date_comparison['impressions_gold'] * 100).round(2)
camp_date_comparison['clicks_diff_pct'] = ((camp_date_comparison['clicks_growth'] - camp_date_comparison['clicks_gold']) / camp_date_comparison['clicks_gold'] * 100).round(2)

# Apply threshold matching
camp_date_comparison['perfect_match'] = (
    (camp_date_comparison['amount_diff_pct'].abs() <= THRESHOLD_PERCENT) & 
    (camp_date_comparison['impr_diff_pct'].abs() <= THRESHOLD_PERCENT) & 
    (camp_date_comparison['clicks_diff_pct'].abs() <= THRESHOLD_PERCENT)
)

print(f"\nTotal campaign+date segments: {len(camp_date_comparison)}")
print(f"✓ Matches (within {THRESHOLD_PERCENT}%): {camp_date_comparison['perfect_match'].sum()}")
print(f"✗ Exceeds threshold: {(~camp_date_comparison['perfect_match']).sum()}")

# Show mismatches if any
if (~camp_date_comparison['perfect_match']).sum() > 0:
    print("\nSample mismatches (first 10):")
    mismatches = camp_date_comparison[~camp_date_comparison['perfect_match']]
    display(mismatches[['Campaign name', 'Day', 'amount_diff_pct', 'impr_diff_pct', 'clicks_diff_pct']].head(10))
else:
    print("\n✓✓✓ ALL CAMPAIGN+DATE SEGMENTS MATCH! ✓✓✓")

SEGMENT VALIDATION: BY CAMPAIGN + DATE (with 2.0% threshold)

Total campaign+date segments: 101
✓ Matches (within 2.0%): 100
✗ Exceeds threshold: 1

Sample mismatches (first 10):


| | Campaign name | Day | amount_diff_pct | impr_diff_pct | clicks_diff_pct |
|---|---|---|---|---|---|
| 10 | Ikonic -Scalp-Massager-Amazon-1-Nov2025 | 2025-11-11 | NaN | NaN | NaN |
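
The mismatch listing shows only the percentage columns, and all three are missing for this row, so the cause is not visible from the output. Missing percentages typically arise when the gold value is zero or absent, but the source does not show the raw values for this segment. A hedged sketch of a more informative listing follows; `explain_mismatches`, `PCT_COLUMNS`, and the toy frame are hypothetical, while the column names mirror those of `camp_date_comparison`.

```python
import pandas as pd

# Metric prefix -> percentage column, as named in the notebook
PCT_COLUMNS = {
    'amount': 'amount_diff_pct',
    'impressions': 'impr_diff_pct',
    'clicks': 'clicks_diff_pct',
}

def explain_mismatches(df: pd.DataFrame, threshold_percent: float = 2.0) -> pd.DataFrame:
    """List failing segments with the raw values behind each failing metric."""
    records = []
    for _, row in df.iterrows():
        reasons = []
        for metric, pct_col in PCT_COLUMNS.items():
            pct = row[pct_col]
            growth_val = row[f'{metric}_growth']
            gold_val = row[f'{metric}_gold']
            if pd.isna(pct):
                reasons.append(f'{metric}: undefined (growth={growth_val}, gold={gold_val})')
            elif abs(pct) > threshold_percent:
                reasons.append(f'{metric}: {pct:+.2f}% (growth={growth_val}, gold={gold_val})')
        if reasons:
            records.append({
                'Campaign name': row['Campaign name'],
                'Day': row['Day'],
                'reasons': '; '.join(reasons),
            })
    return pd.DataFrame(records, columns=['Campaign name', 'Day', 'reasons'])

# Toy frame (hypothetical values): the second row has zero totals in both sources.
toy = pd.DataFrame({
    'Campaign name': ['Campaign A', 'Campaign A'],
    'Day': ['2025-11-10', '2025-11-11'],
    'amount_growth': [100.0, 0.0], 'amount_gold': [100.0, 0.0],
    'impressions_growth': [1000.0, 0.0], 'impressions_gold': [1000.0, 0.0],
    'clicks_growth': [10.0, 0.0], 'clicks_gold': [10.0, 0.0],
    'amount_diff_pct': [0.0, float('nan')],
    'impr_diff_pct': [0.0, float('nan')],
    'clicks_diff_pct': [0.0, float('nan')],
})

report = explain_mismatches(toy)
print(report.to_string(index=False))
```

Listing the raw values next to each failing metric separates genuine discrepancies from rows where the percentage is simply undefined.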


## Step 9: Final Summary Report

In [18]:
print("="*80)
print(f"META AGE/GENDER DATA VALIDATION SUMMARY (with {THRESHOLD_PERCENT}% threshold)")
print("="*80)
print(f"\nAnalysis completed: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")

# Create summary table
summary_data = [
    ['Overall Totals', 3, overall_comparison['Match'].sum(), 3 - overall_comparison['Match'].sum()],
    ['By Date', len(date_comparison), date_comparison['perfect_match'].sum(), 
     (~date_comparison['perfect_match']).sum()],
    ['By Campaign', len(campaign_comparison), campaign_comparison['perfect_match'].sum(), 
     (~campaign_comparison['perfect_match']).sum()],
    ['By Gender', len(gender_comparison), gender_comparison['perfect_match'].sum(), 
     (~gender_comparison['perfect_match']).sum()],
    ['By Age Group', len(age_comparison), age_comparison['perfect_match'].sum(), 
     (~age_comparison['perfect_match']).sum()],
    ['By Campaign+Date', len(camp_date_comparison), camp_date_comparison['perfect_match'].sum(), 
     (~camp_date_comparison['perfect_match']).sum()]
]

summary_df = pd.DataFrame(summary_data, 
                         columns=['Segment Type', 'Total Segments', 'Matches', 'Exceeds Threshold'])
summary_df['Match %'] = (summary_df['Matches'] / summary_df['Total Segments'] * 100).round(2)

print("\n")
display(summary_df)

# Overall assessment
total_segments = summary_df['Total Segments'].sum()
total_matches = summary_df['Matches'].sum()
overall_match_pct = (total_matches / total_segments * 100)

print("\n" + "="*80)
print(f"OVERALL MATCH RATE (within {THRESHOLD_PERCENT}%): {total_matches}/{total_segments} ({overall_match_pct:.1f}%)")
print("="*80)

if overall_match_pct == 100:
    print(f"\n✓✓✓ PERFECT VALIDATION! All segments within {THRESHOLD_PERCENT}% threshold! ✓✓✓")
elif overall_match_pct >= 95:
    print(f"\n✓ EXCELLENT! {overall_match_pct:.1f}% of segments within {THRESHOLD_PERCENT}% threshold")
elif overall_match_pct >= 80:
    print(f"\n⚠ GOOD: {overall_match_pct:.1f}% within threshold. Some segments need review.")
else:
    print(f"\n⚠ ATTENTION: Only {overall_match_pct:.1f}% within {THRESHOLD_PERCENT}% threshold. Review required.")

print("\n" + "-"*80)
print("KEY INSIGHTS:")
print("-"*80)
print(f"• Threshold used: {THRESHOLD_PERCENT}%")
print(f"• Segments passing: {total_matches}/{total_segments}")
print(f"• Segments exceeding threshold: {total_segments - total_matches}")
print(f"• Growth rows: {len(growth_df):,}")
print(f"• Gold rows: {len(gold_df):,}")
if 'gender_comparison' in locals():
    print(f"• Unique genders: {gender_comparison['Gender'].nunique()}")
if 'age_comparison' in locals():
    print(f"• Unique age groups: {age_comparison['Age'].nunique()}")

print("\n" + "="*80)
print("VALIDATION COMPLETE")
print("="*80)


META AGE/GENDER DATA VALIDATION SUMMARY (with 2.0% threshold)

Analysis completed: 2025-12-18 04:04:32




| | Segment Type | Total Segments | Matches | Exceeds Threshold | Match % |
|---|---|---|---|---|---|
| 0 | Overall Totals | 3 | 3 | 0 | 100.0 |
| 1 | By Date | 30 | 30 | 0 | 100.0 |
| 2 | By Campaign | 4 | 4 | 0 | 100.0 |
| 3 | By Gender | 3 | 3 | 0 | 100.0 |
| 4 | By Age Group | 7 | 6 | 1 | 85.71 |
| 5 | By Campaign+Date | 101 | 100 | 1 | 99.01 |



OVERALL MATCH RATE (within 2.0%): 146/148 (98.6%)

✓ EXCELLENT! 98.6% of segments within 2.0% threshold

--------------------------------------------------------------------------------
KEY INSIGHTS:
--------------------------------------------------------------------------------
• Threshold used: 2.0%
• Segments passing: 146/148
• Segments exceeding threshold: 2
• Growth rows: 1,680
• Gold rows: 3,251

VALIDATION COMPLETE
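
The overall rate pools all 148 checks, so the 101 campaign+date segments dominate it, and the first row counts metrics rather than segments. As an illustrative alternative (not something the notebook computes), the per-level match percentages can also be averaged with equal weight. The figures below are copied from the summary table; the variable names are assumptions.

```python
# (level, total checks, matches) copied from the summary table
summary = [
    ('Overall Totals', 3, 3),
    ('By Date', 30, 30),
    ('By Campaign', 4, 4),
    ('By Gender', 3, 3),
    ('By Age Group', 7, 6),
    ('By Campaign+Date', 101, 100),
]

total_checks = sum(total for _, total, _ in summary)
total_matches = sum(matched for _, _, matched in summary)

# Pooled rate: every check has equal weight (this is what the notebook reports)
pooled_rate = total_matches / total_checks * 100

# Equal-weight rate: every validation level has equal weight
level_rates = [matched / total * 100 for _, total, matched in summary]
equal_weight_rate = sum(level_rates) / len(level_rates)

print(f"Pooled: {pooled_rate:.1f}%  Equal-weight: {equal_weight_rate:.1f}%")
```

Which figure is more useful depends on whether a failure in a small level, such as the single failing age group, should count as much as a failure among the many campaign+date segments.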


## Step 10: Generate Interactive HTML Dashboard

In [None]:
# ================================================================================
# COMPREHENSIVE HTML REPORT GENERATION
# ================================================================================
import os
import webbrowser

from html import escape  # imported by name because `html` is a local variable below

def create_table_html(df, title):
    """Render a DataFrame as an HTML table, colouring rows by their status column."""
    if df is None or len(df) == 0:
        return f"<div class='no-data'>No data available for {escape(str(title))}</div>"

    # Identify the status column used for row colour coding
    status_col = 'status' if 'status' in df.columns else ('Status' if 'Status' in df.columns else None)

    html = f"<h3>{escape(str(title))}</h3>"
    html += "<div class='table-container'><table><thead><tr>"
    for col in df.columns:
        html += f"<th>{escape(str(col))}</th>"
    html += "</tr></thead><tbody>"

    for _, row in df.iterrows():
        row_style = ""
        if status_col:
            val = str(row[status_col])
            if 'FAIL' in val or 'Mismatch' in val or '✗' in val:
                row_style = " class='row-fail'"
            elif 'PASS' in val or 'Match' in val or '✓' in val:
                row_style = " class='row-pass'"

        html += f"<tr{row_style}>"
        for col in df.columns:
            val = row[col]
            if isinstance(val, float):
                html += f"<td>{val:,.2f}</td>"
            else:
                # Escape text so characters such as & or < in names cannot break the markup
                html += f"<td>{escape(str(val))}</td>"
        html += "</tr>"
    html += "</tbody></table></div>"
    return html

# Prepare Metrics for Dashboard Cards
total_segments_count = summary_df['Total Segments'].sum()
matches_count = summary_df['Matches'].sum()
match_rate = (matches_count / total_segments_count * 100)

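# Prepare Chart Data
import json  # used to serialize chart data as valid JavaScript literals

overall_metrics_labels = []
overall_metrics_growth = []
overall_metrics_gold = []
if 'overall_comparison' in dir():
    for _, row in overall_comparison.iterrows():
        overall_metrics_labels.append(row['Metric'])
        overall_metrics_growth.append(float(row['Growth']))
        overall_metrics_gold.append(float(row['Gold']))

# json.dumps writes NaN (valid in JavaScript) and properly quoted strings,
# whereas Python's list repr writes nan, which would break the <script> block.
overall_metrics_labels = json.dumps(overall_metrics_labels)
overall_metrics_growth = json.dumps(overall_metrics_growth)
overall_metrics_gold = json.dumps(overall_metrics_gold)

segment_labels = json.dumps(summary_df['Segment Type'].tolist())
segment_matches = json.dumps(summary_df['Match %'].tolist())

# Date chart data (limit to the last 15 rows for readability)
date_labels = json.dumps(date_comparison['Day'].tail(15).tolist())
date_diffs = json.dumps(date_comparison['amount_diff_pct'].tail(15).tolist())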

report_html = f"""
<!DOCTYPE html>
<html>
<head>
    <title>Meta Age/Gender Validation Report</title>
    <script src="https://cdn.jsdelivr.net/npm/chart.js"></script>
    <style>
        :root {{
            --primary: #1877F2; /* Meta Blue */
            --bg: #f0f2f5;
            --card: #ffffff;
            --text: #1c1e21;
            --pass: #e7f3ff;
            --pass-text: #1877F2;
            --fail: #fff0f0;
            --fail-text: #d70000;
        }}
        body {{ 
            font-family: 'Segoe UI', Tahoma, Geneva, Verdana, sans-serif; 
            margin: 0; 
            padding: 20px; 
            background-color: var(--bg);
            color: var(--text);
        }}
        .header {{ 
            background: linear-gradient(135deg, #0062E0 0%, #1877F2 100%);
            color: white; 
            padding: 30px; 
            border-radius: 12px; 
            margin-bottom: 25px;
            box-shadow: 0 4px 6px rgba(0,0,0,0.1);
        }}
        .stats-grid {{ 
            display: grid; 
            grid-template-columns: repeat(auto-fit, minmax(200px, 1fr)); 
            gap: 20px; 
            margin-bottom: 25px; 
        }}
        .stat-card {{ 
            background: var(--card); 
            padding: 25px; 
            border-radius: 12px; 
            text-align: center; 
            box-shadow: 0 2px 4px rgba(0,0,0,0.05);
            border-top: 5px solid var(--primary);
        }}
        .stat-value {{ font-size: 32px; font-weight: bold; margin: 10px 0; color: var(--primary); }}
        .stat-label {{ color: #65676b; font-size: 14px; text-transform: uppercase; letter-spacing: 1px; }}
        
        .charts-grid {{ 
            display: grid; 
            grid-template-columns: repeat(auto-fit, minmax(450px, 1fr)); 
            gap: 20px; 
            margin-bottom: 25px; 
        }}
        .chart-container {{ 
            background: var(--card); 
            padding: 20px; 
            border-radius: 12px; 
            box-shadow: 0 2px 4px rgba(0,0,0,0.05);
            height: 350px;
        }}
        
        .table-section {{ 
            background: var(--card); 
            padding: 25px; 
            border-radius: 12px; 
            margin-bottom: 25px; 
            box-shadow: 0 2px 4px rgba(0,0,0,0.05);
        }}
        .table-container {{ overflow-x: auto; }}
        table {{ width: 100%; border-collapse: collapse; margin-top: 15px; font-size: 14px; }}
        th {{ background-color: #f8f9fa; padding: 12px; text-align: left; border-bottom: 2px solid #ddd; }}
        td {{ padding: 12px; border-bottom: 1px solid #eee; }}
        .row-pass {{ background-color: var(--pass); color: var(--pass-text); }}
        .row-fail {{ background-color: var(--fail); color: var(--fail-text); font-weight: bold; }}
        
        @media (max-width: 600px) {{
            .charts-grid {{ grid-template-columns: 1fr; }}
        }}
    </style>
</head>
<body>
    <div class='header'>
        <h1 style='margin:0'>Meta Age/Gender Validation Dashboard</h1>
        <p style='margin:10px 0 0 0; opacity: 0.9;'>Validation Report Generated on {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}</p>
    </div>
    
    <div class='stats-grid'>
        <div class='stat-card'>
            <div class='stat-label'>Overall Match Rate</div>
            <div class='stat-value'>{match_rate:.1f}%</div>
            <div class='stat-label'>{matches_count}/{total_segments_count} Segments</div>
        </div>
        <div class='stat-card'>
            <div class='stat-label'>Validation Strategy</div>
            <div class='stat-value'>6 Levels</div>
            <div class='stat-label'>Cross-Segment Checking</div>
        </div>
        <div class='stat-card'>
            <div class='stat-label'>Threshold Applied</div>
            <div class='stat-value'>{THRESHOLD_PERCENT}%</div>
            <div class='stat-label'>Acceptable Variance</div>
        </div>
    </div>

    <div class='charts-grid'>
        <div class='chart-container'>
            <canvas id='overallChart'></canvas>
        </div>
        <div class='chart-container'>
            <canvas id='segmentsChart'></canvas>
        </div>
        <div class='chart-container'>
            <canvas id='trendChart'></canvas>
        </div>
    </div>

    <div class='table-section'>
        <h2>Validation Details</h2>
        {create_table_html(summary_df, '📊 Summary Overview')}
        {create_table_html(overall_comparison, '🔍 Overall Totals')}
        {create_table_html(gender_comparison, '🚻 By Gender')}
        {create_table_html(age_comparison, '🎂 By Age Group')}
        {create_table_html(campaign_comparison.head(20), '🚀 By Campaign (First 20)')}
        {create_table_html(date_comparison.sort_values('Day', ascending=False).head(15), '📅 By Date (Recent 15)')}
    </div>

    <script>
        // Overall Metrics Chart
        new Chart(document.getElementById('overallChart'), {{
            type: 'bar',
            data: {{
                labels: {overall_metrics_labels},
                datasets: [
                    {{ label: 'Growth', data: {overall_metrics_growth}, backgroundColor: '#1877F2' }},
                    {{ label: 'Gold', data: {overall_metrics_gold}, backgroundColor: '#42b72a' }}
                ]
            }},
            options: {{ 
                responsive: true, 
                maintainAspectRatio: false,
                plugins: {{ title: {{ display: true, text: 'Overall Totals: Growth vs Gold' }} }}
            }}
        }});

        // Match Rate by Segment
        new Chart(document.getElementById('segmentsChart'), {{
            type: 'polarArea',
            data: {{
                labels: {segment_labels},
                datasets: [{{ 
                    data: {segment_matches},
                    backgroundColor: [
                        'rgba(24, 119, 242, 0.7)',
                        'rgba(66, 183, 42, 0.7)',
                        'rgba(245, 120, 0, 0.7)',
                        'rgba(64, 191, 255, 0.7)',
                        'rgba(147, 51, 234, 0.7)',
                        'rgba(236, 72, 153, 0.7)'
                    ]
                }}]
            }},
            options: {{ 
                responsive: true, 
                maintainAspectRatio: false,
                plugins: {{ title: {{ display: true, text: 'Match % by Validation Level' }} }}
            }}
        }});

        // Trend Chart
        new Chart(document.getElementById('trendChart'), {{
            type: 'line',
            data: {{
                labels: {date_labels},
                datasets: [{{ 
                    label: 'Difference %',
                    data: {date_diffs},
                    borderColor: '#1877F2',
                    tension: 0.3,
                    fill: true,
                    backgroundColor: 'rgba(24, 119, 242, 0.1)'
                }}]
            }},
            options: {{ 
                responsive: true, 
                maintainAspectRatio: false,
                plugins: {{ title: {{ display: true, text: 'Cost Difference % Trend (Recent 15 Days)' }} }},
                scales: {{ y: {{ beginAtZero: false }} }}
            }}
        }});
    </script>
</body>
</html>
"""

with open('meta_age_gender_validation_report.html', 'w', encoding='utf-8') as f:
    f.write(report_html)

print("="*80)
print("INTERACTIVE DASHBOARD GENERATED")
print("="*80)
print(f"Report saved to: {os.path.abspath('meta_age_gender_validation_report.html')}")

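# Try to open in browser automatically
from pathlib import Path

# as_uri() builds a well-formed file:// URL on Windows as well as POSIX systems
report_uri = Path('meta_age_gender_validation_report.html').resolve().as_uri()
try:
    opened = webbrowser.open(report_uri)  # returns False if no browser could be launched
except Exception:
    opened = False

if opened:
    print("Dashboard opened in default browser.")
else:
    print("Could not open browser automatically. Please open the file manually.")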
