-
Notifications
You must be signed in to change notification settings - Fork 4
Strict merge of CrUX and crawl #85
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR enforces a strict merge between CrUX and crawl data by aligning the calculation logic for origins across reports. The key changes include:
- Replacing custom origin computations with direct references to the tech_report_adoption table.
- Updating the SQL query structure for both versions and technologies reports to use unified filtering and aggregation.
- Incorporating CrUX data in the categories report via a merged pages CTE, ensuring only relevant pages present in both datasets are considered.
Reviewed Changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| definitions/output/reports/tech_report_versions.js | Replaces manual origin calculation with tech_report_adoption. |
| definitions/output/reports/tech_report_technologies.js | Updates the origins query and UNION ALL logic for technologies. |
| definitions/output/reports/tech_report_categories.js | Introduces CrUX data merge and adjusts join conditions accordingly. |
Comments suppressed due to low confidence (3)
definitions/output/reports/tech_report_versions.js:11
- Ensure that the 'adoption' value from tech_report_adoption fully replicates both version-specific and total origins as done by the original union query.
adoption AS origins
definitions/output/reports/tech_report_technologies.js:51
- Verify that filtering with 'WHERE technology = 'ALL'' correctly aggregates total origins as intended, matching previous logic.
technology,
definitions/output/reports/tech_report_categories.js:60
- Confirm that joining on merged_pages.technologies yields the expected behavior, especially if 'technologies' is stored as an array or complex structure.
INNER JOIN merged_pages.technologies AS tech
|
@tunetheweb so here is the version without any extensions of CrUX or crawl data, using only direct device matches. |
| INNER JOIN merged_pages.technologies AS tech | ||
| INNER JOIN tech.categories AS category |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should these be outer joins for pages with no technologies?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We aggregate by known categories here, so pages without any technologies will be excluded here.
If not here, then in the next steps:
INNER JOIN technology_stats ON category_stats.category IN UNNEST(technology_stats.categories)- or
INNER JOIN category_descriptions USING (category)
For the pages without any technologies (and thus no categories) we have a part after UNION ALL (based on merged_pages).
Co-authored-by: Barry Pollard <barrypollard@google.com>
Following #42
Matched calculation of origins in
category,technologiesandversionsdictionaries to the logic inadoptionmetric.Including only phone/mobile and desktop origins that are present in both crawl and CrUX datasets.