@max-ostapenko
Contributor

Following #42

Matched the calculation of origins in the categories, technologies and versions dictionaries to the logic used for the adoption metric.
Only phone/mobile and desktop origins that are present in both the crawl and CrUX datasets are included.
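
As a rough illustration of that rule (not the actual Dataform SQL), the matching can be thought of as an intersection on (origin, device) pairs. The sample rows and the `matched_origins` helper are hypothetical, and the sketch assumes both datasets already use the same device labels.

```python
def matched_origins(crawl_rows, crux_rows):
    """Return the (origin, device) pairs present in both datasets.

    Both arguments are iterables of (origin, device) tuples.
    """
    return set(crawl_rows) & set(crux_rows)


# Hypothetical sample data, for illustration only.
crawl = [
    ("https://a.example", "desktop"),
    ("https://a.example", "mobile"),
    ("https://b.example", "mobile"),
]
crux = [
    ("https://a.example", "desktop"),
    ("https://b.example", "desktop"),  # other device only: not a match
]

result = matched_origins(crawl, crux)
```

Here only the desktop row for `https://a.example` survives, because it is the only pair that appears in both inputs with the same device.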

@max-ostapenko max-ostapenko mentioned this pull request Apr 15, 2025
2 tasks
@max-ostapenko max-ostapenko marked this pull request as ready for review April 21, 2025 13:03
@max-ostapenko max-ostapenko requested a review from Copilot April 21, 2025 13:03
Contributor

Copilot AI left a comment

Pull Request Overview

This PR enforces a strict merge between CrUX and crawl data by aligning the calculation logic for origins across reports. The key changes include:

  • Replacing custom origin computations with direct references to the tech_report_adoption table.
  • Updating the SQL query structure for both versions and technologies reports to use unified filtering and aggregation.
  • Incorporating CrUX data in the categories report via a merged pages CTE, ensuring only relevant pages present in both datasets are considered.

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| definitions/output/reports/tech_report_versions.js | Replaces manual origin calculation with tech_report_adoption. |
| definitions/output/reports/tech_report_technologies.js | Updates the origins query and UNION ALL logic for technologies. |
| definitions/output/reports/tech_report_categories.js | Introduces CrUX data merge and adjusts join conditions accordingly. |
Comments suppressed due to low confidence (3)

definitions/output/reports/tech_report_versions.js:11

  • Ensure that the 'adoption' value from tech_report_adoption fully replicates both version-specific and total origins as done by the original union query.
 adoption AS origins

definitions/output/reports/tech_report_technologies.js:51

  • Verify that filtering with WHERE technology = 'ALL' correctly aggregates total origins as intended, matching the previous logic.
technology,

definitions/output/reports/tech_report_categories.js:60

  • Confirm that joining on merged_pages.technologies yields the expected behavior, especially if 'technologies' is stored as an array or complex structure.
INNER JOIN merged_pages.technologies AS tech

@max-ostapenko
Contributor Author

@tunetheweb here is the version without any extension of the CrUX or crawl data, using only direct device matches.
We do not substitute missing CrUX origin metrics with measurements from another device.

Comment on lines +60 to 61
INNER JOIN merged_pages.technologies AS tech
INNER JOIN tech.categories AS category
Member

@tunetheweb tunetheweb Apr 22, 2025

Should these be outer joins for pages with no technologies?

Contributor Author

@max-ostapenko max-ostapenko Apr 22, 2025

We aggregate by known categories here, so pages without any technologies are excluded at this step.

If not here, then in the next steps:

  • INNER JOIN technology_stats ON category_stats.category IN UNNEST(technology_stats.categories)
  • or INNER JOIN category_descriptions USING (category)

For the pages without any technologies (and thus no categories) we have a part after UNION ALL (based on merged_pages).
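
To make the join behaviour concrete, here is a small Python sketch of the same idea, not the actual query. Flattening pages by technology and then by category behaves like the two INNER JOINs: a page with an empty technologies list produces no rows. A separate count over all merged pages stands in for the part after UNION ALL. The field names, helper functions and sample data are assumptions for illustration.

```python
from collections import defaultdict

# Hypothetical merged pages (already present in both crawl and CrUX).
merged_pages = [
    {"origin": "https://a.example", "technologies": [
        {"technology": "WordPress", "categories": ["CMS", "Blogs"]},
    ]},
    {"origin": "https://b.example", "technologies": [
        {"technology": "React", "categories": ["JavaScript frameworks"]},
        {"technology": "Drupal", "categories": ["CMS"]},
    ]},
    {"origin": "https://c.example", "technologies": []},  # no technologies
]


def origins_per_category(pages):
    """Count distinct origins per category, like the INNER JOIN branch."""
    origins = defaultdict(set)
    for page in pages:
        for tech in page["technologies"]:  # empty list yields no rows
            for category in tech["categories"]:
                origins[category].add(page["origin"])
    return {category: len(found) for category, found in origins.items()}


def total_origins(pages):
    """Count all distinct origins, like the branch based on merged_pages."""
    return len({page["origin"] for page in pages})


per_category = origins_per_category(merged_pages)
total = total_origins(merged_pages)
```

In this sample `https://c.example` contributes to `total` but to none of the per-category counts, which is the effect an outer join would otherwise be needed for.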

@max-ostapenko max-ostapenko merged commit 4732f85 into main Apr 22, 2025
27 checks passed
@max-ostapenko max-ostapenko deleted the lexical-toad branch April 22, 2025 18:34
