Triage all proposed metrics (396 of 396 done) #33

rviscomi · 2019-06-04T02:35:02Z

Assigned: @HTTPArchive/data-analysts team

Due date: No later than July 1

Any metrics that require augmenting the test infrastructure (e.g., custom metrics) must be ready to go when the July crawl starts. This ensures that when the crawl completes at the end of July, we can query the dataset and pass it off to authors for interpretation in August.

As of now there are 350+ metrics spread over 20 chapters.

Part	Chapter	Able To Query	Not Feasible	Grand Total
I	01. JavaScript	24	1	25
I	02. CSS	39	7	46
I	03. Markup	4	1	5
I	04. Media	20	5	25
I	05. Third Parties	13		13
I	06. Fonts	40	7	47
II	07. Performance	24		24
II	08. Security	36	5	41
II	09. Accessibility	32	6	38
II	10. SEO	15		15
II	11. PWA	6		6
II	12. Mobile web	19	2	21
III	13. Ecommerce	10	3	13
III	14. CMS	11	1	12
IV	15. Compression	3	1	4
IV	16. Caching	14	1	15
IV	17. CDN	13	3	16
IV	18. Page Weight	3		3
IV	19. Resource Hints	10		10
IV	20. HTTP/2	14	3	17
	Grand Total	350	46	396

I've copied all of the metrics for each chapter to this sheet (named "Metrics Triage"). To edit the sheet please give me your email address to add to the editors list. What we need to do is go through the list of metrics for each chapter and assign a status from one of the following:

- To Be Reviewed (TBR)
- Need More Info (NMI)
- Not Feasible (NF)
- Able To Query (ATQ)
- Custom Metric Required (CMR)
- Custom Metric Written (CMW)
- Query Written (QW)

The lifecycle is:

All metrics start as TBR
- Move to NMI if the metric is vaguely worded or it is otherwise unclear what is being asked for. Get in touch with the chapter author(s) and straighten out what the expected data should look like.
- Move to NF if the metric cannot be queried using the HTTP Archive dataset or other publicly available datasets on BigQuery (e.g., CrUX). This is the "done" state for metrics which cannot progress any further.
- Move to ATQ if the metric is able to be queried from the dataset based on the latest schema.
  - Move to QW if the metric has a corresponding query written. This is the ideal "done" state for all metrics.
- Move to CMR if the metric can only be queried with the addition of a custom metric.
  - Move to CMW if the metric has had a corresponding custom metric written. Metrics in this state must also have a corresponding query written and be moved to QW when complete.
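
The lifecycle above can be read as a small transition table. The following is only a sketch in JavaScript, not part of the triage sheet itself: the helper names `canMove` and `isDone` are hypothetical, and the moves out of NMI to ATQ or CMR are an assumption, since the text only says to follow up with the chapter authors.

```javascript
// Allowed status moves, keyed by the status abbreviations used in the lifecycle.
const TRANSITIONS = {
  TBR: ["NMI", "NF", "ATQ", "CMR"],
  // Assumption: once clarified, an NMI metric is re-triaged into one of these.
  NMI: ["NF", "ATQ", "CMR"],
  NF: [], // "done" state: cannot progress any further
  ATQ: ["QW"],
  CMR: ["CMW"],
  CMW: ["QW"], // a written custom metric still needs its query
  QW: [], // ideal "done" state
};

// Hypothetical helper: is moving a metric from one status to another allowed?
function canMove(from, to) {
  const next = TRANSITIONS[from];
  return Array.isArray(next) && next.includes(to);
}

// Hypothetical helper: has the metric reached one of the two "done" states?
function isDone(status) {
  return status === "NF" || status === "QW";
}
```

Under this reading, a metric in CMW is not yet done: it only finishes once its query is written and it moves to QW.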

Custom metrics should only be added as a last resort and must adhere to strict performance requirements. We test on millions of pages so any complex/slow scripts would impede the crawl. Because we anticipate needing many custom metrics, we'll implement everything as individual functions within a single custom metric whose output is a JSON-encoded object with each result as its own sub-property. More on this when we get there.
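
As a rough illustration of that "many functions, one custom metric" idea, here is a minimal JavaScript sketch. It is not the actual implementation: the sub-metric names (`scriptCount`, `imageCount`) and the `runMetrics` helper are hypothetical, and recording `null` for a failing function is an assumption about how errors might be handled.

```javascript
// Hypothetical sub-metrics: each is a small function that takes the document
// and returns a JSON-serializable value. The names are illustrative only.
const subMetrics = {
  scriptCount: (doc) => doc.querySelectorAll("script").length,
  imageCount: (doc) => doc.querySelectorAll("img").length,
};

// Run every sub-metric and store each result under its own property.
// Assumption: a failing function records null instead of aborting the rest.
function runMetrics(metrics, doc) {
  const results = {};
  for (const [name, fn] of Object.entries(metrics)) {
    try {
      results[name] = fn(doc);
    } catch (err) {
      results[name] = null;
    }
  }
  // The custom metric's output is a single JSON-encoded object.
  return JSON.stringify(results);
}

// In a page this would be invoked as: runMetrics(subMetrics, document)
```

Keeping each function small and independent makes it easier to check that no single one is slow enough to impede the crawl.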

Add your name in the Analyst column to take responsibility for moving it through the metric lifecycle.

Once we're ready to begin writing queries, we will create a thread on https://discuss.httparchive.org for each chapter, listing all queryable metrics. Hopefully we can crowdsource some of the querying by tapping into the power users on the forum.

tjmonsi · 2019-06-04T13:24:31Z

Copy, will do this.

rviscomi · 2019-06-06T18:18:21Z

@HTTPArchive/data-analysts reminder to please go through the Metrics Triage sheet when you have the time.

There was a lot of info in the first post so here's a condensed version:

- Request edit access to the sheet. I don't have everyone's email address; otherwise I'd give access now.
- Go through the Metrics Triage tab and add your GitHub name to the Analyst column for any metrics you'll be responsible for.
- Triage metrics marked To Be Reviewed and change their status depending on their feasibility.

The next step will be to start writing queries and custom metrics using the HTTP Archive forum to discuss solutions.

ymschaap · 2019-06-14T10:39:53Z

I understand we can create custom metrics for the next crawl, which is really cool. I'm just unsure what this enables. For example, for the SEO chapter we would want to count the number of h1, h2, and h3 elements and their string lengths. How would I go about creating a custom metric? Do you have an example of a custom metric (e.g., a piece of code)? Are there docs? Who tests and writes the code?

Once I understand the custom metrics capabilities, I could fill out the Metrics Triage sheet.

rviscomi · 2019-06-14T21:39:18Z

Good question! Custom metrics are JS snippets you can execute on each page. They are run by our legacy crawl system and the code for existing metrics is here: https://github.com/HTTPArchive/legacy.httparchive.org/tree/master/custom_metrics

For example, see the doctype custom metric. To test it, you can run it directly on webpagetest.org under the "Custom" tab.

Note that all WPT custom metrics must have [metricName] at the start of the script. This is excluded in the HTTP Archive code and generated automatically based on the file name.
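
To make that concrete for the heading example asked about above, here is a hedged sketch of what such a metric might look like. It is not an existing HTTP Archive metric: the `headingStats` function and the `headings` metric name are hypothetical, and the output shape is just one possible choice.

```javascript
// Hypothetical helper: count h1-h3 elements and record their text lengths.
function headingStats(doc) {
  const stats = {};
  for (const tag of ["h1", "h2", "h3"]) {
    // Trimmed text length of every matching element, in document order.
    const lengths = Array.from(doc.querySelectorAll(tag), (el) =>
      (el.textContent || "").trim().length
    );
    stats[tag] = { count: lengths.length, lengths };
  }
  return stats;
}

// Pasted into the WPT "Custom" tab, the script would start with the metric
// name in brackets and return a value, roughly:
//
//   [headings]
//   return JSON.stringify(headingStats(document));
//
// with the headingStats definition included in the script body.
```

In the HTTP Archive repository the bracketed name would be left out, since it is generated from the file name.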

You'll see the output in the WPT results.

For complex metrics like almanac.js you will need to inspect the JSON results directly to see the output. The test ID for the results is in the URL. Simply append ?f=json to see the JSON results. For example: http://webpagetest.org/result/190624_6W_f5211bdf38d897fb4cb5a4f0872eb1f6/?f=json

Then you can find the custom metric by going to data.median.firstView.almanac.
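
If you'd rather pull the value out programmatically, something along these lines should work. This is a sketch under assumptions: `jsonResultsUrl` and `getAlmanac` are hypothetical helper names, and the code allows for the `almanac` value arriving either as a JSON-encoded string or as an already-parsed object, since that detail isn't confirmed here.

```javascript
// Build the JSON results URL from a WPT result page URL by adding f=json.
function jsonResultsUrl(resultUrl) {
  const url = new URL(resultUrl);
  url.searchParams.set("f", "json");
  return url.toString();
}

// Walk data.median.firstView.almanac in the parsed JSON results.
// Returns undefined if any step of the path is missing.
function getAlmanac(results) {
  let node = results;
  for (const key of ["data", "median", "firstView", "almanac"]) {
    if (node === null || typeof node !== "object" || !(key in node)) {
      return undefined;
    }
    node = node[key];
  }
  if (typeof node !== "string") return node;
  // Assumption: a string value is a JSON-encoded object; fall back to the
  // raw string if it does not parse.
  try {
    return JSON.parse(node);
  } catch (err) {
    return node;
  }
}
```

Fetch the URL from `jsonResultsUrl` with whatever HTTP client you prefer, parse the response body as JSON, and pass the result to `getAlmanac`.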

Let me know if you have any other questions!

patrickhulce · 2019-06-17T21:55:55Z

Sorry if I missed this somewhere, but do we need to do something extra to get the right permissions to query the sample datasets created in #34 and/or have our test queries not billed to us individually? :)

rviscomi · 2019-06-17T22:36:57Z

I've updated the permissions of the sample_data dataset so anyone can query it.

The goal for that dataset is to allow @HTTPArchive/data-analysts to explore the schema and validate their queries. The table sizes should be small enough so any queries fit comfortably within the free monthly quota. When we run the analysis against the full dataset, I hope to have BQ credits for everyone to cover any expenses.

rviscomi · 2019-06-19T13:15:17Z

@HTTPArchive/data-analysts we're behind on triaging all of the metrics, so I think we need to take a different approach. There are 350 metrics and 12 analysts, so that's an average of about 30 metrics per analyst. If we divide and conquer that way, we should be able to meet the July 1 deadline. I'll go through the triage sheet and assign each analyst approximately 30 metrics, grouped by chapter. I'll update this issue with a table of the assignments.

I've updated the sheet with Analyst assignments and updated the summary table with each analyst's total metric status.

@khempenius and @patrickhulce since you're both authors and expressed interest only in taking on analyst roles for your respective chapters, I didn't add you to any new chapters. @fhoffa I coaxed you into this so I didn't give you too many metrics to work on. Let me know if any of you are willing to take on more metrics; it'd be a big help.

@beouss you expressed an interest in joining the team but never accepted your invitation. If you're still interested I'll assign you some metrics.

rviscomi · 2019-06-30T13:34:44Z

Today's the day! I've marked all 5 remaining Need More Info metrics as Not Feasible. We're finally done with the triage! Thanks again to the entire @HTTPArchive/data-analysts team for your hard work going through these ~400 metrics.

I'll be syncing the custom metrics with the HTTP Archive server today so they're included in tomorrow's July crawl.

rviscomi added this to the Infrastructure prepared milestone Jun 4, 2019

rviscomi assigned rviscomi, patrickhulce, paulcalvano, ymschaap, tjmonsi and voltek62 Jun 4, 2019

rviscomi mentioned this issue Jun 4, 2019

Finalize assignments: Chapter 18. Page weight #20

Closed

3 tasks

rviscomi added the analysis Querying the dataset label Jun 4, 2019

rviscomi mentioned this issue Jun 5, 2019

Form a team of data analysts #23

Closed

rviscomi assigned dougsillars, dotjs and khempenius Jun 5, 2019

rviscomi added this to TODO in Web Almanac 2019 via automation Jun 5, 2019

rviscomi moved this from TODO to In Progress in Web Almanac 2019 Jun 5, 2019

rviscomi added the ASAP This issue is blocking progress label Jun 11, 2019

rviscomi changed the title ~~Triage all proposed metrics~~ Triage all proposed metrics (42 of 281 done) Jun 11, 2019

rviscomi changed the title ~~Triage all proposed metrics (42 of 281 done)~~ Triage all proposed metrics (64 of 318 done) Jun 17, 2019

rviscomi mentioned this issue Jun 17, 2019

Review all chapter metrics by analysts #27

Closed

rviscomi changed the title ~~Triage all proposed metrics (64 of 318 done)~~ Triage all proposed metrics (88 of 350 done) Jun 19, 2019

rviscomi changed the title ~~Triage all proposed metrics (88 of 350 done)~~ Triage all proposed metrics (85 of 372 done) Jun 19, 2019

rviscomi assigned raghuramakrishnan71 and jrharalson Jun 19, 2019

raghuramakrishnan71 mentioned this issue Jun 30, 2019

Finalize assignments: Chapter 3. Markup #5

Closed

3 tasks

rviscomi changed the title ~~Triage all proposed metrics (388 of 396 done)~~ Triage all proposed metrics (390 of 396 done) Jun 30, 2019

rviscomi mentioned this issue Jun 30, 2019

earlyHash custom metric HTTPArchive/legacy.httparchive.org#166

Merged

rviscomi changed the title ~~Triage all proposed metrics (390 of 396 done)~~ Triage all proposed metrics (396 of 396 done) Jun 30, 2019

rviscomi closed this as completed Jun 30, 2019

Web Almanac 2019 automation moved this from In Progress to Done Jun 30, 2019

rviscomi unpinned this issue Jun 30, 2019

rviscomi removed the ASAP This issue is blocking progress label Sep 25, 2019

Tiggerito mentioned this issue Jul 20, 2020

SEO 2020 #908

Closed

10 tasks
