# Is there variance between Site Scanning and the Chrome User Experience Report (CrUX)?

Here, we compare CrUX data, which reports the 75th-percentile (p75) values of the Core Web Vitals at the origin level, with the synthetic performance test results in the Site Scanning report.
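CrUX condenses many real page loads across an origin into one p75 value, while a synthetic scan records a single measurement. The sketch below uses made-up LCP samples in milliseconds (not CrUX data) to show how one lab run can land well below the p75. It relies on pandas' default linear interpolation, which may differ from how CrUX computes its published percentiles.

```python
import pandas as pd

# Hypothetical real-user LCP samples (ms) for one origin; not taken from CrUX.
field_lcp_ms = pd.Series([1200, 1500, 1800, 2100, 2600, 3000, 4200, 5200])

# 75th percentile using pandas' default linear interpolation.
p75_lcp = field_lcp_ms.quantile(0.75)

# Hypothetical single synthetic (lab) measurement of the same site.
synthetic_lcp_ms = 1800

print(f"p75 field LCP: {p75_lcp:.0f} ms")
print(f"synthetic LCP: {synthetic_lcp_ms} ms")
print(f"difference (field - synthetic): {p75_lcp - synthetic_lcp_ms:.0f} ms")
```

Here the p75 is 3300 ms. A quarter of these hypothetical visits are slower than that, even though the single synthetic run reports 1800 ms.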

We learned during our interviews that most of a site's traffic arrives from search rather than through the home page. Looking at what real users experience across the whole origin may therefore reveal sites where the site scan suggested good performance but the real-user data tells a different story.
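One way to find "good in the scan, not good for real users" cases is to bucket each value with the Core Web Vitals thresholds published on web.dev (LCP good at or below 2500 ms and poor above 4000 ms; CLS good at or below 0.1 and poor above 0.25) and compare the ratings. This is a minimal sketch: the `rate` helper and `THRESHOLDS` table are hypothetical, and the three rows are copied from the merged output later in this notebook.

```python
import pandas as pd

# Core Web Vitals thresholds (web.dev): at or below the first number is "good",
# above the second number is "poor". LCP is in milliseconds; CLS is unitless.
THRESHOLDS = {"lcp": (2500, 4000), "cls": (0.1, 0.25)}


def rate(value: float, metric: str) -> str:
    """Hypothetical helper: bucket a metric value into a Core Web Vitals rating."""
    good, poor = THRESHOLDS[metric]
    if value <= good:
        return "good"
    if value > poor:
        return "poor"
    return "needs improvement"


# Three rows copied from the merged CrUX / Site Scanning output.
df = pd.DataFrame(
    {
        "origin": [
            "https://apps.nea.gov",
            "https://passport.intelink.gov",
            "https://gravelocator.cem.va.gov",
        ],
        "lcp_crux_p75": [25000.0, 20600.0, 900.0],
        "lcp_site_scanning": [107.1, 1286.399, 896.5],
    }
)

df["lcp_rating_crux"] = df["lcp_crux_p75"].apply(rate, metric="lcp")
df["lcp_rating_scan"] = df["lcp_site_scanning"].apply(rate, metric="lcp")

# Origins the scan rates "good" but real users do not.
hidden_problems = df[
    (df["lcp_rating_scan"] == "good") & (df["lcp_rating_crux"] != "good")
]
print(hidden_problems[["origin", "lcp_rating_scan", "lcp_rating_crux"]])
```

The first two origins are rated "good" by the scan but "poor" by CrUX, while the third is "good" in both.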

```python
import os

import pandas as pd


def load_results_to_dataframe(file_name, notebook_dir):
    """Read a CSV file whose path is given relative to notebook_dir."""
    file_path = os.path.join(notebook_dir, file_name)
    return pd.read_csv(file_path)
```

```python
crux = load_results_to_dataframe(
    file_name="data/crux-data-202406.csv",
    notebook_dir=os.getcwd(),
)

sitescanning = load_results_to_dataframe(
    file_name="data/site-scanning-weekly-snapshot-20240722.csv",
    notebook_dir=os.getcwd(),
)

# CrUX origins include the scheme, so prefix the Site Scanning URLs to match.
sitescanning['final_url_website'] = 'https://' + sitescanning['final_url_website']

# Inner join: keep only sites present in both datasets, one row per origin.
merged_df = pd.merge(sitescanning, crux, left_on='final_url_website', right_on='origin')
merged_df = merged_df.drop_duplicates(subset='origin', keep='first')

required_columns = ['origin', 'p75_lcp', 'largest_contentful_paint', 'p75_cls', 'cumulative_layout_shift', 'p75_inp']
merged_df = merged_df[required_columns].dropna()
merged_df = merged_df.rename(columns={
    'p75_lcp': 'lcp_crux_p75',
    'p75_cls': 'cls_crux_p75',
    'largest_contentful_paint': 'lcp_site_scanning',
    'cumulative_layout_shift': 'cls_site_scanning',
})

# Signed differences: positive means CrUX reported the higher value.
merged_df['lcp_difference_crux_vs_scan'] = merged_df['lcp_crux_p75'] - merged_df['lcp_site_scanning']
merged_df['cls_difference_crux_vs_scan'] = merged_df['cls_crux_p75'] - merged_df['cls_site_scanning']
# Absolute differences, used to rank the largest disagreements.
merged_df['lcp_abs_difference_crux_vs_scan'] = merged_df['lcp_difference_crux_vs_scan'].abs()
merged_df['cls_abs_difference_crux_vs_scan'] = merged_df['cls_difference_crux_vs_scan'].abs()

merged_df.sort_values(by='lcp_abs_difference_crux_vs_scan', ascending=False).head(15000)
```

| | origin | lcp_crux_p75 | lcp_site_scanning | cls_crux_p75 | cls_site_scanning | p75_inp | lcp_difference_crux_vs_scan | cls_difference_crux_vs_scan | lcp_abs_difference_crux_vs_scan | cls_abs_difference_crux_vs_scan |
|---|---|---|---|---|---|---|---|---|---|---|
| 5437 | https://apps.nea.gov | 25000.0 | 107.100 | 0.00 | 0.000000 | 175.0 | 24892.900 | 0.000000 | 24892.900 | 0.000000 |
| 6851 | https://invitation.nasa.gov | 4900.0 | 26472.400 | 0.05 | 0.000045 | 175.0 | -21572.400 | 0.049955 | 21572.400 | 0.049955 |
| 1143 | https://passport.intelink.gov | 20600.0 | 1286.399 | 0.00 | 0.038292 | 25.0 | 19313.601 | -0.038292 | 19313.601 | 0.038292 |
| 8523 | https://crg.health.mil | 5400.0 | 23345.399 | 0.00 | 0.000000 | 75.0 | -17945.399 | 0.000000 | 17945.399 | 0.000000 |
| 2223 | https://cce-datasharing.gsfc.nasa.gov | 4100.0 | 21531.400 | 0.00 | 0.000000 | 50.0 | -17431.400 | 0.000000 | 17431.400 | 0.000000 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 6689 | https://extranet.nichd.nih.gov | 500.0 | 503.800 | 0.00 | 0.000403 | 0.0 | -3.800 | -0.000403 | 3.800 | 0.000403 |
| 924 | https://gravelocator.cem.va.gov | 900.0 | 896.500 | 0.00 | 0.000000 | 75.0 | 3.500 | 0.000000 | 3.500 | 0.000000 |
| 3998 | https://meps.ahrq.gov | 500.0 | 502.500 | 0.00 | 0.000000 | 25.0 | -2.500 | 0.000000 | 2.500 | 0.000000 |
| 8699 | https://techpartnerships.noaa.gov | 3200.0 | 3201.000 | 0.00 | 0.002884 | 25.0 | -1.000 | -0.002884 | 1.000 | 0.002884 |
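`pd.merge` performs an inner join by default, so origins that appear in only one of the two files are dropped without warning (for example, origins that may not have enough Chrome traffic to be included in CrUX). A sketch of how to count them with the `indicator` option of `pd.merge`, using small hypothetical frames that keep only the join keys from the merge above:

```python
import pandas as pd

# Hypothetical stand-ins for the Site Scanning and CrUX tables (join keys only).
sitescanning = pd.DataFrame(
    {"final_url_website": ["https://a.example.gov", "https://b.example.gov"]}
)
crux = pd.DataFrame({"origin": ["https://b.example.gov", "https://c.example.gov"]})

# An outer join with indicator=True adds a "_merge" column recording whether
# each row came from the left frame only, the right frame only, or both.
coverage = pd.merge(
    sitescanning,
    crux,
    left_on="final_url_website",
    right_on="origin",
    how="outer",
    indicator=True,
)

counts = coverage["_merge"].value_counts()
print(counts)
```

On the real data, the `left_only` count would show how many scanned sites have no CrUX record and are therefore absent from the comparison.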


## Data differences

In the table below, each difference is the CrUX p75 value minus the Site Scanning value. A positive mean or median difference means CrUX reported the higher value; a negative one means Site Scanning reported the higher value. Because lower is better for both LCP (measured in milliseconds) and CLS (unitless), a positive difference means real users fared worse than the synthetic scan suggested.
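The sign convention can be applied row by row with `np.sign`. A minimal sketch, using the CLS differences for three origins copied from the output table above:

```python
import numpy as np
import pandas as pd

# CLS differences (CrUX p75 minus Site Scanning) copied from the output table.
cls_diff = pd.Series(
    [0.0, 0.049955, -0.038292],
    index=[
        "https://apps.nea.gov",
        "https://invitation.nasa.gov",
        "https://passport.intelink.gov",
    ],
)

# np.sign returns 1.0, -1.0, or 0.0; map each sign to its interpretation.
direction = np.sign(cls_diff).map(
    {1.0: "CrUX higher", -1.0: "Site Scanning higher", 0.0: "equal"}
)
print(direction)
```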

```python
# Summary statistics of the signed differences (CrUX minus Site Scanning).
summary = pd.DataFrame({
    'Measure': ['LCP', 'CLS'],
    'Mean Difference': [merged_df['lcp_difference_crux_vs_scan'].mean(), merged_df['cls_difference_crux_vs_scan'].mean()],
    'Median Difference': [merged_df['lcp_difference_crux_vs_scan'].median(), merged_df['cls_difference_crux_vs_scan'].median()],
    'Standard Deviation': [merged_df['lcp_difference_crux_vs_scan'].std(), merged_df['cls_difference_crux_vs_scan'].std()],
})
summary
```

| | Measure | Mean Difference | Median Difference | Standard Deviation |
|---|---|---|---|---|
| 0 | LCP | 1356.205682 | 1189.9 | 1581.717067 |
| 1 | CLS | -0.094991 | -0.011646 | 0.279291 |
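The same three statistics can also be produced in one call with `DataFrame.agg`, transposed so each measure becomes a row. A minimal sketch on a small hypothetical frame that reuses the notebook's column names; note that pandas' `std` is the sample standard deviation (`ddof=1`) by default.

```python
import pandas as pd

# Hypothetical differences (CrUX p75 minus Site Scanning); not the real data.
merged_df = pd.DataFrame(
    {
        "lcp_difference_crux_vs_scan": [100.0, 200.0, 600.0],
        "cls_difference_crux_vs_scan": [0.0, -0.02, -0.1],
    }
)

summary = (
    merged_df[["lcp_difference_crux_vs_scan", "cls_difference_crux_vs_scan"]]
    .agg(["mean", "median", "std"])
    .T  # one row per measure, one column per statistic
    .rename(
        index={
            "lcp_difference_crux_vs_scan": "LCP",
            "cls_difference_crux_vs_scan": "CLS",
        }
    )
)
print(summary)
```

When the mean sits well above the median, as it does for LCP in the real results, a minority of origins with very large differences is pulling the average up.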
