Performance 2020 queries #1091

Merged
merged 41 commits into from Oct 12, 2020

Contributor

@max-ostapenko max-ostapenko commented Jul 24, 2020

Progress on #905

distribution of LH performance category scores

  • Distribution of Performance Score (slow, moderate, fast) on mobile in LH5
  • Changes to Performance Score from LH5 to LH6
  • Distribution of Performance Score (slow, moderate, fast) on mobile in LH6
  • Average/median changes in performance score between versions
  • LH5 ([?] 2020) vs LH6 (August 2020)

LH audits

  • all scores and weightings per performance category

LCP

  • % of CrUX origins that meet the "good" threshold (75%+ < 2.5s)
  • % of CrUX origins that meet the "NI" threshold (not good or poor)
  • % of CrUX origins that meet the "poor" threshold (25%+ >= 4.0s)
  • distribution by device
  • Segment by device, country and ECT
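As a rough illustration of the LCP thresholds listed above, the good/NI/poor classification could be sketched like this. The `sample_origins` rows and the `fast_lcp`, `avg_lcp`, `slow_lcp` column names are made up for the example and are not taken from the queries in this PR. Counting an origin that meets both thresholds as good is also an assumption.

```sql
#standardSQL
# Sketch only: classify origins as good / NI / poor for LCP.
# sample_origins is hypothetical inline data. fast/avg/slow are assumed to be
# the fractions of page loads with LCP < 2.5s, 2.5s to 4.0s, and >= 4.0s.
WITH sample_origins AS (
  SELECT 'https://a.example' AS origin, 0.80 AS fast_lcp, 0.15 AS avg_lcp, 0.05 AS slow_lcp UNION ALL
  SELECT 'https://b.example', 0.60, 0.25, 0.15 UNION ALL
  SELECT 'https://c.example', 0.40, 0.30, 0.30
),

classified AS (
  SELECT
    origin,
    # good: 75%+ of loads under 2.5s
    SAFE_DIVIDE(fast_lcp, fast_lcp + avg_lcp + slow_lcp) >= 0.75 AS is_good,
    # poor: 25%+ of loads at or above 4.0s
    SAFE_DIVIDE(slow_lcp, fast_lcp + avg_lcp + slow_lcp) >= 0.25 AS is_poor
  FROM
    sample_origins
)

SELECT
  COUNT(0) AS total_origins,
  COUNTIF(is_good) / COUNT(0) AS pct_good,
  # NI: neither good nor poor
  COUNTIF(NOT is_good AND NOT is_poor) / COUNT(0) AS pct_ni,
  # assumption: an origin meeting both thresholds is counted as good
  COUNTIF(is_poor AND NOT is_good) / COUNT(0) AS pct_poor
FROM
  classified
```

With the three sample rows, each bucket receives one origin.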

FID

  • % of CrUX origins that meet the good/NI/poor thresholds
  • distribution by device
  • Segment by device, country and ECT

CLS

  • % of CrUX origins that meet the good/NI/poor thresholds
  • distribution by device
  • Segment by device, country and ECT

FCP

  • % of CrUX origins that meet the fast/moderate/slow thresholds
  • Segment by device, country and ECT
  • distribution by device
  • Compare against 2019

TTFB

  • % of CrUX origins that meet the fast/moderate/slow thresholds
  • Segment by device, country and ECT
  • distribution by device
  • Compare against 2019

Field Data

  • % of websites using the new PerformanceObserver in JS (source)

@max-ostapenko max-ostapenko mentioned this pull request Jul 24, 2020
65 tasks
@max-ostapenko max-ostapenko mentioned this pull request Jul 25, 2020
20 tasks
@rviscomi rviscomi changed the title Performance sql 2020 Performance 2020 queries Jul 25, 2020
@rviscomi rviscomi added the analysis Querying the dataset label Jul 25, 2020
@rviscomi rviscomi added this to TODO in 2020 via automation Jul 25, 2020
@rviscomi rviscomi added this to the 2020 Analysis milestone Jul 25, 2020
@rviscomi
Member

Thanks @max-ostapenko! Could you edit the PR description to include a checklist of the metrics needed by the chapter and check off the ones implemented in this PR so far? This will help us see at a glance how much work is still left to do.

@dooman87
Contributor

dooman87 commented Jul 25, 2020

(See #1091 (comment))

@rviscomi
Member

Thanks @max-ostapenko and @dooman87. I've moved the checklist up to the top of the PR.

sql/2020/09_Performance/lcp_score_by_month.sql (outdated, resolved)
sql/2020/09_Performance/offline_origins.sql (outdated, resolved)
sql/2020/09_Performance/web_vitals_by_country.sql (outdated, resolved)
sql/2020/09_Performance/web_vitals_by_device.sql (outdated, resolved)
sql/2020/09_Performance/web_vitals_by_device.sql (outdated, resolved)
@thefoxis
Contributor

@max-ostapenko @dooman87 the list looks good to me, with the exception of changing "Distribution of Performance Score (slow, moderate, fast) by device in LH5" to "Distribution of Performance Score (slow, moderate, fast) on mobile in LH5" to match the capabilities of HA + the dataset from LH6. 👍🏻

@dooman87
Contributor

@thefoxis thanks for reviewing the checklist. I've updated it on the ticket. I'm currently trying to add a query for the "Average/median changes in performance score between versions" metric and I'm struggling to understand what it should look like. Could you please explain what the result of the query would look like?

We already have two metrics that measure LH5 vs LH6 performance scores:

  • The number of sites where the performance score change was low (< 10), medium (> 10 && < 30) or big (> 30).
  • The average change in score from LH5 to LH6, i.e. take the delta for every URL and calculate the average.
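For reference, the two metrics above could be sketched together roughly as follows. This is not the query from the PR: `sample_scores` is hypothetical inline data, scores are assumed to be on Lighthouse's 0 to 1 scale and converted to points, the buckets are assumed to apply to the absolute change, and deltas of exactly 10 or 30 (which the thresholds above leave open) are assigned to the medium bucket.

```sql
#standardSQL
# Sketch only: bucket the LH5 -> LH6 performance score change per URL.
# sample_scores is hypothetical inline data with scores on a 0-1 scale.
WITH sample_scores AS (
  SELECT 'https://a.example/' AS url, 0.90 AS lh5_score, 0.85 AS lh6_score UNION ALL
  SELECT 'https://b.example/', 0.70, 0.50 UNION ALL
  SELECT 'https://c.example/', 0.80, 0.40
),

deltas AS (
  SELECT
    url,
    # convert to whole points on a 0-100 scale
    ROUND((lh6_score - lh5_score) * 100) AS delta
  FROM
    sample_scores
)

SELECT
  COUNT(0) AS total,
  COUNTIF(ABS(delta) < 10) AS low_change,
  # assumption: boundary values 10 and 30 fall into the medium bucket
  COUNTIF(ABS(delta) >= 10 AND ABS(delta) <= 30) AS medium_change,
  COUNTIF(ABS(delta) > 30) AS big_change,
  AVG(delta) AS avg_delta
FROM
  deltas
```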

* Using last year dataset to show distribution of LH performance score
@rviscomi
Member

rviscomi commented Oct 4, 2020

this will calculate percentage of fast, avg, slow websites given the LH performance score. @rviscomi is this what you meant by distribution in the comment above?

Not sure if this was my original comment, but to see the distribution of LH audit scores, I think this approach is straightforward and a bit more versatile; it counts the # and % of pages having each score. So we would query the raw distribution data and sum up those pages in the results sheet so they're grouped in fast/avg/slow buckets as needed, and/or we can chart the full distribution if that'd be interesting.
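A minimal sketch of that raw distribution, using hypothetical inline rows in `sample_pages` rather than the real Lighthouse table. Against the actual data the score would presumably be read out of the `report` JSON first, which is an assumption here and not shown.

```sql
#standardSQL
# Sketch only: count the # and % of pages having each performance score.
# sample_pages is hypothetical inline data.
WITH sample_pages AS (
  SELECT 'https://a.example/' AS url, 0.35 AS score UNION ALL
  SELECT 'https://b.example/', 0.35 UNION ALL
  SELECT 'https://c.example/', 0.72 UNION ALL
  SELECT 'https://d.example/', 0.95
)

SELECT
  score,
  COUNT(0) AS freq,
  # window over the grouped rows gives the overall page count
  SUM(COUNT(0)) OVER () AS total,
  COUNT(0) / SUM(COUNT(0)) OVER () AS pct
FROM
  sample_pages
GROUP BY
  score
ORDER BY
  score
```

The rows can then be summed into fast/avg/slow buckets in the results sheet, or charted as is.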

This one finds the minimum, maximum and avg delta between the performance score of the latest LH5 results and LH6. Correct me if I'm wrong @thefoxis, we'd like to see how the score changed because users of LH were surprised by the sudden change of their score in some cases.

For this query I do think it makes sense to represent the distribution in terms of percentiles. For min and max you can use the 0 and 100th percentiles. The rest can be summarized by the 10, 25, 50 (median), 75, and 90th percentiles. Kind of like this query except we'd be distributing the difference of newScore - oldScore.
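Something along these lines, as a sketch over hypothetical inline data (`sample_scores`, `old_score` and `new_score` are illustrative names, not the columns used in the PR):

```sql
#standardSQL
# Sketch only: percentile distribution of newScore - oldScore.
# sample_scores is hypothetical inline data with scores on a 0-1 scale.
WITH sample_scores AS (
  SELECT 'https://a.example/' AS url, 0.90 AS old_score, 0.85 AS new_score UNION ALL
  SELECT 'https://b.example/', 0.70, 0.50 UNION ALL
  SELECT 'https://c.example/', 0.80, 0.40 UNION ALL
  SELECT 'https://d.example/', 0.55, 0.60
)

SELECT
  percentile,
  # 1000 buckets, so percentile p sits at offset p * 10;
  # offsets 0 and 1000 are the min and max
  APPROX_QUANTILES(new_score - old_score, 1000)[OFFSET(percentile * 10)] AS delta
FROM
  sample_scores,
  UNNEST([0, 10, 25, 50, 75, 90, 100]) AS percentile
GROUP BY
  percentile
ORDER BY
  percentile
```

Cross joining with the list of percentiles yields one row per percentile, which is convenient for pasting into the results sheet.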

this one shows the percentage of websites where there was a small, medium or big change in performance score when they switched from LH5 to LH6. This one will also use the latest LH5 data and the September data for LH6, for the same reason as above.

SGTM

Let me know if you'd like help with any of these queries.

@rviscomi
Member

rviscomi commented Oct 4, 2020

Also note that the lighthouse.2020_09_01_mobile data is now available.

@dooman87
Contributor

dooman87 commented Oct 4, 2020

Thanks @rviscomi, it all makes perfect sense to me now. I updated the queries as you explained above. Please have a look, and if it's all good we can start filling in the spreadsheets.

Member

@rviscomi rviscomi left a comment

Thanks everyone, we're very close. I just have some final feedback then we should be good to go.

In addition to these suggestions I'd also like to see one more query based on @bazzadp's PWA audits query that counts the percent of non-zero scores for each LH audit in the category:

#standardSQL
# Get summary of all lighthouse scores for a category
# Note scores, weightings, groups and descriptions may be off in mixed months when a new version of Lighthouse rolls out

CREATE TEMPORARY FUNCTION getAudits(report STRING, category STRING)
RETURNS ARRAY<STRUCT<id STRING, weight INT64, audit_group STRING, title STRING, description STRING, score INT64>> LANGUAGE js AS '''
var $ = JSON.parse(report);
var auditrefs = $.categories[category].auditRefs;
var audits = $.audits;
$ = null;
var results = [];
for (var auditref of auditrefs) {
  results.push({
    id: auditref.id,
    weight: auditref.weight,
    audit_group: auditref.group,
    description: audits[auditref.id].description,
    score: audits[auditref.id].score
  });
}
return results;
''';

SELECT
  audits.id AS id,
  COUNTIF(audits.score > 0) AS num_pages,
  COUNT(0) AS total,
  COUNTIF(audits.score > 0) / COUNT(0) AS pct,
  APPROX_QUANTILES(audits.weight, 100)[OFFSET(50)] AS median_weight,
  MAX(audits.audit_group) AS audit_group,
  MAX(audits.description) AS description
FROM
  `httparchive.lighthouse.2020_08_01_mobile`,
  UNNEST(getAudits(report, "performance")) AS audits
WHERE
  LENGTH(report) < 20000000  # necessary to avoid out of memory issues. Excludes 16 very large results
GROUP BY
  audits.id
ORDER BY
  median_weight DESC,
  id

sql/2020/09_Performance/median_lcp_score_by_month.sql (outdated, resolved)
sql/2020/09_Performance/median_lcp_score_by_month.sql (outdated, resolved)
sql/2020/09_Performance/offline_origins.sql (outdated, resolved)
sql/2020/09_Performance/web_vitals_by_country.sql (outdated, resolved)
sql/2020/09_Performance/web_vitals_by_ect.sql (outdated, resolved)
sql/2020/09_Performance/web_vitals_by_ect.sql (outdated, resolved)
@thefoxis
Contributor

thefoxis commented Oct 7, 2020

@rviscomi thanks for stepping in, your explanations/knowledge about retrieving the metrics is definitely more advanced than mine. everything you said makes sense and should give us the right data to comment on 👍

@dooman87 thanks so much for your work! glad that @rviscomi’s explanations make sense 🙌

@rviscomi
Member

rviscomi commented Oct 8, 2020

A few outstanding review comments, otherwise this is ready to merge and we can unblock the first draft. @dooman87 @max-ostapenko would either of you be able to resolve the last of the feedback before the weekend?

max-ostapenko and others added 5 commits October 10, 2020 03:26
Co-authored-by: Rick Viscomi <rviscomi@users.noreply.github.com>
Co-authored-by: Rick Viscomi <rviscomi@users.noreply.github.com>
Co-authored-by: Rick Viscomi <rviscomi@users.noreply.github.com>
Co-authored-by: Rick Viscomi <rviscomi@users.noreply.github.com>
@max-ostapenko max-ostapenko requested review from rviscomi and a team October 10, 2020 11:37
Contributor

@dooman87 dooman87 left a comment

@max-ostapenko Thanks a lot for fixing my stuff. I've just managed to find some time to look through the changes. It all looks good to me, so let's get them in!

@max-ostapenko
Contributor Author

@dooman87 Yeah, I'm adding data into the charts now. Looking forward to seeing how the results look with new data.

Member

@rviscomi rviscomi left a comment

Ship it!

@rviscomi rviscomi merged commit 66b014d into main Oct 12, 2020
2020 automation moved this from In progress to Done Oct 12, 2020
@rviscomi rviscomi deleted the performance-sql-2020 branch October 12, 2020 17:19

5 participants