Country-level origin counts should always increase by rank #13
Comments
Are you sure this is the SQL that was used? I can't get this part to run: cwv-tech-report/sql/monthly.sql, lines 50 to 63 in 3e64d0b
The date is missing from the second UNION clause. |
Yes, that query was used. It no longer works because the responsiveness metric has since been added to the materialized CrUX tables. So to get it working you need to add |
This is interesting:
SELECT device, rank, COUNT(0) AS count
FROM `chrome-ux-report.materialized.country_summary`
WHERE yyyymm = 202202 AND
  country_code = 'us'
GROUP BY device, rank
ORDER BY device, rank
The last number looks wrong in all of them. |
Unless the materialized tables already account for it, the rank in the raw data is exclusive: each page belongs to exactly one rank group, so the count for each rank excludes the pages in every other rank. |
Sorry, at least that is the case in the HTTP Archive (HA) data. I assume the same is true for the CrUX data, where each origin belongs to a single rank. |
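To make the exclusive-versus-inclusive distinction concrete, here is a minimal Python sketch (not the project's SQL). It assumes per-bucket counts where each origin is counted only in its own rank bucket, and turns them into running totals. The numbers are made up for illustration and are not taken from CrUX.

```python
from itertools import accumulate

# Hypothetical per-bucket counts: each origin appears in exactly one rank bucket.
exclusive_counts = {1_000: 40, 10_000: 310, 100_000: 2_500, 1_000_000: 19_000}


def to_inclusive(counts):
    """Convert per-bucket counts into running totals ordered by rank.

    After conversion, the value for 10_000 also includes the origins
    from the 1_000 bucket, and so on for each larger bucket.
    """
    ranks = sorted(counts)
    totals = accumulate(counts[r] for r in ranks)
    return dict(zip(ranks, totals))


inclusive_counts = to_inclusive(exclusive_counts)
print(inclusive_counts)
```

With exclusive buckets, a larger rank can legitimately have a smaller count than the rank before it. Running totals can never decrease.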
Ah yes, that's true. I should have used this SQL, which works:
SELECT device, _rank, COUNT(0) AS count, COUNT(DISTINCT origin) AS origins
FROM `chrome-ux-report.materialized.country_summary`,
  UNNEST([1000, 10000, 100000, 1000000, 10000000, 100000000]) AS _rank
WHERE yyyymm = 202202 AND
  country_code = 'us' AND
  rank <= _rank
GROUP BY device, _rank
ORDER BY device, _rank
It's odd that the 10M bucket has fewer US sites than the 1M bucket, though not impossible, I guess. We know the 10M bucket is incomplete and, as you say, doesn't include the 1M origins, so it covers roughly ranks 1.00001M to 8M. |
Ok, I'm closer to the answer.
WITH tech AS (
  SELECT DISTINCT
    url
  FROM
    `httparchive.technologies.2022_02_01_mobile`
  WHERE
    app = 'React'
), crux AS (
  SELECT
    CONCAT(origin, '/') AS url,
    rank
  FROM
    `chrome-ux-report.materialized.country_summary`
  WHERE
    yyyymm = 202202 AND
    country_code = 'us' AND
    device = 'phone'
)
SELECT
  rank,
  COUNT(0) AS origins
FROM
  tech
JOIN
  crux
USING
  (url)
GROUP BY
  rank
ORDER BY
  origins
Similar to @tunetheweb's query in #13 (comment), this query intentionally does not nest smaller ranks under larger ones (i.e., 1k is not a subset of 10k). These results align perfectly with the React stats in #13 (comment). So the bug is that the ranks are not inclusive of smaller ranks. The expected behavior is:
WITH tech AS (
  SELECT DISTINCT
    url
  FROM
    `httparchive.technologies.2022_02_01_mobile`
  WHERE
    app = 'React'
), crux AS (
  SELECT
    CONCAT(origin, '/') AS url,
    rank
  FROM
    `chrome-ux-report.materialized.country_summary`
  WHERE
    yyyymm = 202202 AND
    country_code = 'us' AND
    device = 'phone'
)
SELECT
  _rank,
  COUNT(0) AS origins
FROM
  tech
JOIN
  crux
USING
  (url),
  UNNEST([1000, 10000, 100000, 1000000, 10000000, 100000000]) AS _rank
WHERE
  rank <= _rank
GROUP BY
  _rank
ORDER BY
  origins
I think there's a bug in my query that selects/groups by cwv-tech-report/sql/monthly.sql Lines 64 to 92 in 3e64d0b
I'll update the all/monthly queries and regenerate the data to fix the issue in the dataset/dashboard. |
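For readers less familiar with the `UNNEST(...) AS _rank` plus `rank <= _rank` pattern in the expected-behavior query, the following Python sketch mimics its logic on in-memory data. It is an illustration only, not the code that generates the dataset: the function name `count_origins_by_rank` and the sample URLs are hypothetical.

```python
RANK_THRESHOLDS = [1_000, 10_000, 100_000, 1_000_000, 10_000_000, 100_000_000]


def count_origins_by_rank(tech_urls, crux_rows, thresholds=RANK_THRESHOLDS):
    """Count matching origins under every rank threshold they qualify for.

    tech_urls: set of page URLs that use the technology (like the `tech` CTE).
    crux_rows: iterable of (origin, rank) pairs (like the `crux` CTE).
    Unlike the SQL, thresholds with no matching rows are reported as 0.
    """
    counts = {t: 0 for t in thresholds}
    for origin, rank in crux_rows:
        # Mirrors CONCAT(origin, '/') AS url followed by JOIN ... USING (url).
        if origin + "/" not in tech_urls:
            continue
        # Mirrors the cross join with UNNEST and the filter rank <= _rank.
        for threshold in thresholds:
            if rank <= threshold:
                counts[threshold] += 1
    return counts


# Hypothetical sample data: three origins use the technology, one does not.
tech_urls = {"https://a.example/", "https://b.example/", "https://c.example/"}
crux_rows = [
    ("https://a.example", 1_000),
    ("https://b.example", 100_000),
    ("https://c.example", 10_000_000),
    ("https://d.example", 1_000),
]
result = count_origins_by_rank(tech_urls, crux_rows)
print(result)
```

Because each origin is counted once per threshold at or above its own rank, the counts can only stay flat or grow as the threshold grows.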
FYI, this is why I used |
Misremembered. It was |
Thank you for uncovering this! |
Updated the dashboard. I think it's working now.
SELECT
  rank, origins
FROM
  `httparchive.core_web_vitals.technologies`
WHERE
  date = '2022-02-01' AND
  geo = 'United States of America' AND
  app = 'React' AND
  client = 'mobile'
ORDER BY
  origins
Note that I also made the ranks more human-readable and changed the biggest rank, 100M, to "ALL" for consistency with other fields. (We're close to exceeding 10M origins, so I figured 100M was more forward-compatible.) One more thing: I was able to recover the data from before March 2021, when the rank field was added to CrUX. So if you want historical data going as far back as January 2020, you'll need to select the "ALL" rank. |
@tunetheweb and @shappir pointed out that the number of origins for a given technology and country does not always increase by rank.
According to the results, there are 83k React websites in the US among the top 1M. However, in the top 10M segment, there are only 45k websites. This doesn't make sense because every website in the top 1M should also be in the top 10M.
The query that generates this table should count every website in a more popular rank toward each less popular rank as well:
cwv-tech-report/sql/monthly.sql
Lines 86 to 92 in 3e64d0b
Something is clearly not working.
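One way to catch this kind of regression is a small monotonicity check over the per-rank counts. The Python sketch below is an assumption about how such a check could look, not something that exists in the repository. The helper name `non_monotonic_ranks` is hypothetical, and the two figures are the approximate ones quoted above.

```python
def non_monotonic_ranks(origins_by_rank):
    """Return (smaller_rank, larger_rank) pairs where the count drops.

    With inclusive ranks, the count for a larger rank should never be
    lower than the count for the rank just before it.
    """
    ranks = sorted(origins_by_rank)
    return [
        (lo, hi)
        for lo, hi in zip(ranks, ranks[1:])
        if origins_by_rank[hi] < origins_by_rank[lo]
    ]


# Approximate figures from the report above: React websites in the US.
reported = {1_000_000: 83_000, 10_000_000: 45_000}
violations = non_monotonic_ranks(reported)
print(violations)
```

An empty list means the counts are consistent with inclusive ranks. Any pair returned points at the rank boundary where the data needs a closer look.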