Integrate Lighthouse results #87
Collating notes from previous threads and conversations:
In terms of deployment: gathering LH data will require adding another run at the end of the current set, which will increase the overall load on the infrastructure. On the other hand, it sounds like new agents should consume fewer cycles, so hopefully we'll come out ahead. That said, we're running close to the limit on capacity, so we should monitor things closely. @rviscomi @pmeenan anything else we're missing here or need to think about? In terms of timelines:
|
Here is a sample HAR from this test. It should mirror what we'd get since it was run on desktop Linux with mobile emulation (and the new agent). Adding the LH step is a single additional test option (lighthouse=1) and I can have it applied automatically at the same point where I turn on tracing (2 minutes of work); a sketch of the API call is below. My proposal:
As far as desktop/mobile restrictions go, technically I can run lighthouse on the desktop crawl as well, but a lot of the audits are mobile/PWA-specific (add to homescreen, address bar color, mobile-friendly design). That said, there may be enough value in collecting the best-practices parts for desktop to make it worth running in all cases. |
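For illustration, a minimal sketch of requesting a WPT run with the Lighthouse option through the public HTTP API. The lighthouse=1 parameter is the option described above; the test URL, API key, and emulation settings here are placeholders rather than the crawl's actual configuration.

```python
# Sketch: request a WebPageTest run with the Lighthouse pass enabled (lighthouse=1).
# The URL, API key, and emulation settings below are placeholders.
import requests

params = {
    "url": "https://example.com/",   # page to test (placeholder)
    "k": "YOUR_WPT_API_KEY",         # placeholder API key
    "f": "json",                     # return the test info as JSON
    "lighthouse": 1,                 # run Lighthouse at the end of the test
    "mobile": 1,                     # mobile emulation, as in the mobile crawl
}

resp = requests.get("https://www.webpagetest.org/runtest.php", params=params)
resp.raise_for_status()
print(resp.json()["data"]["testId"])
```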
Looking at the example HAR..
Timeline sounds good. Is there anything we can do upfront to sanity check and validate the bytes numbers? This part sounds pretty meaty on its own and probably deserves a separate bug. |
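On the sanity check: one cheap upfront cross-check is to sum the response sizes recorded in the HAR entries and compare that against whatever page-level byte total we already trust. A rough sketch using only standard HAR fields; the tolerance is arbitrary.

```python
# Sketch: sum response header + body sizes across HAR entries for a quick bytes sanity check.
import json

def har_response_bytes(har_path):
    with open(har_path) as f:
        har = json.load(f)
    total = 0
    for entry in har["log"]["entries"]:
        resp = entry["response"]
        # headersSize/bodySize are -1 when unknown; count those as 0.
        total += max(resp.get("headersSize", 0), 0) + max(resp.get("bodySize", 0), 0)
    return total

def within_tolerance(har_path, expected_total, tolerance=0.05):
    total = har_response_bytes(har_path)
    return abs(total - expected_total) <= tolerance * max(expected_total, 1), total
```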
I was a bit split because at the top-level there aren't really pages either. If I strip out the help text (which I'm considering doing anyway), duplicating the lighthouse JSON across all 3 page load data sets could work. I can make that change when I get back (4/17) and send over a new HAR. As far as throttling:
|
Wait, you mean duplicating the same report in each pages entry? That's probably unnecessary. It's a trivial change in our DataFlow pipeline to pull out the top-level key and stuff it into the pages table (if that's what we decide to go with); a rough sketch of the idea is below. I'm OK with the current setup, just mentioned it to surface that we'll have to do some work.
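For illustration, that pipeline change could look roughly like this; the `_lighthouse` key and the column names are assumptions for the sketch, not the actual Dataflow code (which is Java).

```python
# Sketch: pull an assumed top-level "_lighthouse" report out of the HAR and pair it
# with each page entry for the pages-table row. Key and column names are placeholders.
import json

def page_rows(har_json):
    har = json.loads(har_json)
    lh_report = har.get("_lighthouse")  # assumed location of the report in the WPT HAR
    for page in har["log"]["pages"]:
        yield {
            "pageid": page.get("id"),                                 # standard HAR page id
            "payload": json.dumps(page),                              # page metadata
            "report": json.dumps(lh_report) if lh_report else None,  # LH report column
        }
```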
So, to confirm: you do want to enable network throttling, but via dummynet? If that's the case, do we trust the results? Previously we said that we shouldn't rely on any CPU/network timing data coming back from our agents.. |
We already enable network throttling (Cable for desktop crawls, 3G for mobile). The Lighthouse testing just inherits whatever the test uses. As far as timing goes, things are better than they used to be, with the machines not as oversubscribed as before, but they may still be running hotter than they should for performance measurement. When we added the new hardware and increased the mobile testing I set things up so that the host servers ran ~95% CPU utilized, but individual runs may still be starved. Since then, changes to make the agents cycle faster may have pushed things hotter, but we are in the neighborhood of being able to measure perf. Early testing shows the new agent on Linux uses significantly less CPU (30-40% less), so the headroom may allow us to both add the lighthouse pass and keep the servers from running too hot to measure perf. |
Funny, I didn't realize that's the case. Given that we'll want to update these settings at some point in the future, should we also log what BW/RTT settings we're using -- similar to what LH reports?
Fingers crossed. :-) |
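On logging the BW/RTT settings: something as small as a per-test throttling blob recorded next to the results would cover it. The numbers below are the commonly cited WPT "Cable" and "3G" profiles and are an assumption here; they should be confirmed against the agent configuration before anyone relies on them.

```python
# Sketch: record the traffic-shaping profile used for each test alongside the results.
# The profile values are assumed, not read from the actual agent config.
THROTTLING_PROFILES = {
    "Cable": {"down_kbps": 5000, "up_kbps": 1000, "rtt_ms": 28},   # desktop crawl
    "3G":    {"down_kbps": 1600, "up_kbps": 768,  "rtt_ms": 300},  # mobile crawl
}

def connectivity_metadata(crawl):
    profile = "3G" if crawl == "mobile" else "Cable"
    return {"connectivity": profile, **THROTTLING_PROFILES[profile]}
```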
Yeah, I think so. One less difference between test configs. Is there any separate process for updating LH or is that included in the general WPT update? And is that automatic?
I'm partial to it having its own table but don't feel strongly. Whatever is easiest for analysis. |
It will be "automatic" in the sense that it is manual and needs to be turned on but I'll take care of it ;-) |
Curious, why? FWIW, the reason I'm leaning towards the same table is precisely because doing cross-table joins is both more complicated and expensive in terms of overall query cost + runtime. I think a strong argument in favor of a separate table could be if we could define a full schema for the results.. but I'm not sure how stable that is. My guess is that we shouldn't count on it. |
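To make the trade-off concrete: with the report in the same table, per-audit values can be pulled out with BigQuery's JSON functions in a single scan, no join needed. A rough sketch using the BigQuery Python client; the table and `report` column names are placeholders, not a final schema.

```python
# Sketch: read a value straight out of a Lighthouse JSON column in the pages table.
# Table and column names are placeholders for whatever schema we settle on.
from google.cloud import bigquery

client = bigquery.Client()
query = """
SELECT
  url,
  JSON_EXTRACT_SCALAR(report, '$.lighthouseVersion') AS lh_version
FROM `httparchive.har.2017_06_01_android_pages`
WHERE report IS NOT NULL
LIMIT 10
"""
for row in client.query(query).result():
    print(row.url, row.lh_version)
```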
The lighthouse schema is decidedly not stable and is expected to change in the next couple of releases. |
In that case, let's stick into |
Some joins could be avoided by duplicating page/test metadata in the lighthouse table. But I agree that overall it's more convoluted. My preference was mostly just for separation so it's clearer where the data came from and not commingling/flattening results. Having it in |
Cool. Let's start with stuffing it into |
Blocked by #98 (in progress) |
If we merge and update the GCE worker soon the 6/1 pipeline can pick up the changes. |
Updated GCE. So we should see LH in the 6/1 HAR tables whenever they're available. |
Dataflow is failing with this error message:
The addition of the Lighthouse data must be spilling us over the row size limit. |
On a related note, if the limit is 10 MB, the request_bodies tables are over-truncating at 2 MB. Filed HTTPArchive/bigquery#13 |
Wowza.. how large are the LH reports? Can we pull out the report that's causing the exception? |
This LH report from mobile.twitter.com is 4.1 MB so 10+ MB wouldn't be so unusual. It looks like I'd suggest pruning |
Hah, disregard that. That file is the entire HAR returned from WPT, not just the LH report. |
So now I'm confused.. The LH report in the Twitter case is ~69KB, right? So, where are the exceptions coming from? |
Here are the actual mobile.twitter.com LH results from the 6/1 crawl. The JSON report is 376 KB. But the report size is very site-dependent. For example, Reddit is 2.8 MB. Those are just a couple of examples from the Alexa Top 10. I sampled some of the largest gzipped HAR files using this command: Given that this is difficult to reproduce, I'll run another test using a similar method to request_bodies truncation. I'll measure the row size before appending and set a bool field if it exceeds the limit. Anecdotally, LH significantly adds to the dataflow processing time. A normal job would take ~45 minutes, but the normal+LH jobs are now running ~150 minutes just to fail. If you want to see the temporary results, check out https://bigquery.cloud.google.com/table/httparchive:har.2017_06_20_android_pages?tab=preview while it lasts. |
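The row-size guard described above, sketched in Python for clarity (the real pipeline is the Java Dataflow job, so this is only the idea, using the 10 MB figure quoted earlier):

```python
# Sketch of the guard described above: measure the serialized row before writing it,
# and drop/flag the Lighthouse report when the row would exceed the limit.
import json

MAX_ROW_BYTES = 10 * 1024 * 1024  # row size limit discussed above

def build_row(url, har_payload, lh_report):
    row = {"url": url, "payload": har_payload, "report": json.dumps(lh_report)}
    if len(json.dumps(row).encode("utf-8")) > MAX_ROW_BYTES:
        row["report"] = None              # skip the LH report rather than fail the job
        row["report_truncated"] = True    # the "bool field" mentioned above
    return row
```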
So if the max report is ~2.8MB.. that should fit in the 10MB row -- what's the issue? Is it 1MB? Looking at the reddit report, I'm wondering if a wholesale import is the right approach. For example, it looks like there's some base64'ed screenshots in there, and lots of duplicate helpText? The a11y report, in particular, is massive.. All this is making me think that we might be better off splitting this data into a standalone table with a more crafted schema and filter logic on what we import? |
The 2.8 MB Reddit report actually fits in the row without an error (see the temp 6/20 table). The row limit applies to the sum of all columns: page URL (negligibly small), HAR payload, and LH report. So some combination of large HAR payload and large LH report is overflowing the row. The max gzipped HAR file size in GCS is 1 MB, and none of the 10 largest files I manually ran through dataflow exceeded the limit either. I've added some logging to dataflow and am now rerunning the 6/1 mobile results. It's been going for 60 minutes and has skipped 430 LH reports so far. The I still think there is at least one HAR file that triggers the error. I ran tests without the custom LH counter and they also failed around 2.5 hours in. When I get more time (I'll be unavailable until Tuesday starting in a couple of hours) I'll try bisecting the dataset to try to pin down exactly which HAR is responsible. Edit: Yeah, it failed again after 2.5 hours. There were 1,112 LH reports skipped. |
Hmm, interesting.. As an experiment, could we remove some unknown variables and split the LH data into a separate table? |
The last thing I did before leaving for my break was kick off the pipeline with the 6/1 mobile results and the original row size quota of 2 MB (rather than 10 MB). Checked on it today and it turns out that run was successful! I kept the I'll be committing my changes to get it in for the 6/15 crawl to be processed. |
Hmm, 12K pages is ~bearable, but that also depends on the rank of those pages -- e.g. if we're dropping amazon then that's a problem. Two routes:
|
They used to embed full-resolution screenshots of the filmstrip and I strip those out, but I think they may have snuck in thumbnails of the filmstrip or moved the location. There's nothing in the actual data that should get near 1MB. I'll track down the binary data bloating the files and strip it out of the HAR JSON before the 7/1 crawl starts. |
Yes, it's definitely embedding base64-encoded images in the report. They're hiding under the "screenshot-thumbnails" audit, which is in both the Performance report category and the audits array:
|
We brought back embedding screenshots but just 10 thumbnails (max 60px wide) for the visualization in the timeline :) For httparchive I'm not sure it makes sense to save the |
Another contributing factor is UGC, like the lists of selectors failing the color-contrast or link-name audits. For example, here's a snippet from Reddit's 2.8 MB report:
And that's just one of 300 selectors in the Pruning |
HTTPArchive/bigquery#16 does the pruning. Running the change over the 6/1 data and will report back with the difference in skipped rows. I'll also be looking into Ilya's suggestion to break LH results out into a separate table. |
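For reference, the kind of pruning being discussed, sketched in Python; this is not the actual HTTPArchive/bigquery#16 change, and the report shape it assumes (an `audits` map with `helpText` and `details.items`) may shift as the LH schema evolves.

```python
# Sketch of the pruning discussed above (not the actual #16 implementation).
# Assumes an "audits" map with helpText and details.items; the LH schema is unstable.
MAX_AUDIT_ITEMS = 50  # arbitrary cap on per-audit result lists (e.g. failing selectors)

def prune_report(report):
    for name, audit in report.get("audits", {}).items():
        audit.pop("helpText", None)  # boilerplate text duplicated across every report
        details = audit.get("details")
        if not isinstance(details, dict):
            continue
        items = details.get("items")
        if name == "screenshot-thumbnails":
            details["items"] = []                        # drop base64-encoded thumbnails
        elif isinstance(items, list) and len(items) > MAX_AUDIT_ITEMS:
            details["items"] = items[:MAX_AUDIT_ITEMS]   # cap huge lists of failing selectors
    return report
```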
Dropped rows |
Lighthouse is fully integrated so marking this as closed. |