core(fr): add snapshot support to ImageElements gatherer #12663

adamraine · 2021-06-15T17:12:37Z

This is what splitting ImageElements to add support for snapshot mode will look like.

Most of ImageElements has been moved to ImageElementsSnapshot which makes the diff hard to read. I couldn't extend ImageElementsSnapshot from ImageElements because GathererMeta is not assignable to GathererMeta<'DevtoolsLog'>.

If you want to inspect the changes made after the code was moved, you can use this command:

git diff master:./lighthouse-core/gather/gatherers/image-elements.js fr-image-elements-snapshot:./lighthouse-core/gather/gatherers/image-elements-snapshot.js

I'll also do my best to highlight the differences in comments.

adamraine · 2021-06-15T17:13:50Z

lighthouse-core/gather/gatherers/image-elements-snapshot.js

+      if (element.isPicture || element.isCss || element.srcset) {
+        if (indexedNetworkRecords && !networkRecord) continue;
+        await this.fetchElementWithSizeInformation(driver, element);
+      }


This logic has been changed slightly. The image record will be fetched if we run in snapshot mode.
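For reference, a minimal sketch of the new condition as a standalone predicate. The helper name `shouldFetchSizeInformation` is hypothetical (it is not in this PR), and it assumes that `indexedNetworkRecords` is undefined in snapshot mode and that records are keyed by `element.src`:

```javascript
/**
 * Hypothetical helper mirroring the condition in the diff above.
 * @param {{isPicture: boolean, isCss: boolean, srcset: string, src: string}} element
 * @param {Record<string, object>|undefined} indexedNetworkRecords
 *   Assumed to be undefined when no DevtoolsLog is available (snapshot mode).
 * @return {boolean}
 */
function shouldFetchSizeInformation(element, indexedNetworkRecords) {
  // Only these element kinds need a separate natural-size lookup.
  if (!(element.isPicture || element.isCss || element.srcset)) return false;
  // No network records at all (snapshot mode): always attempt the fetch.
  if (!indexedNetworkRecords) return true;
  // Network records exist: only fetch when this image has a matching record.
  return Boolean(indexedNetworkRecords[element.src]);
}
```

With network records present the behavior is unchanged; without them, every qualifying element is fetched.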

adamraine · 2021-06-15T17:15:49Z

lighthouse-core/gather/gatherers/image-elements-snapshot.js

+      session.sendCommand('CSS.enable'),
+      session.sendCommand('DOM.getDocument', {depth: -1, pierce: true}),
+    ]);
+


Element sorting was removed for convenience of making this draft. Will need to restore it somehow before landing.

patrickhulce

Incredible! This is a great conversation starter about how we should approach splitting gatherers 👍

I see a few generic options:

Fork / Inherit Gatherers (this PR). Return roughly the same artifact in multiple cases but under a different artifact ID to disambiguate which version can be used in different modes.
Optional Gatherer Dependencies. Similar to this approach in that the type of the artifact will be slightly different depending on mode, but contain the implementation in the same gatherer by allowing certain dependencies to be optional.
Split Artifact Information. Instead of forking / inheriting, we could split off the other information entirely. It's always been a bit weird that we duplicate the network record information onto these image elements, and it seems reasonable to have a separate artifact entirely for that data (that could be merged with ImageElements via computed artifact for ease of use). If the information isn't already optional, this would be a breaking change, but worth considering.
Change information / functionality. Always worth considering if the value being added is worth it. For example, in this case it might be worth just trying to fetch natural size information regardless of gather mode. We're far more sophisticated now with our time budgets than we were when those restrictions were put into place.
Some combination of the above. A blend of splitting / changing might work for some artifacts and some might just need a fork.

In this particular instance, I think I'm leaning slightly toward the split + change approach. WDYT?

adamraine · 2021-06-15T18:52:43Z

In this particular instance, I think I'm leaning slightly toward the split + change approach. WDYT?

For ImageElements to work in snapshot mode, I needed to remove the dependency from DevtoolsLog/NetworkRecords. The network records are necessary for two things:

Getting the natural dimensions: The network record isn't really used here, the existence of the network record is used to determine if creating a new image is worthwhile. Additionally, some image elements will include valid natural dimensions without needing to fetch them separately. This isn't really a new source of information IMO.
Getting the mime type: This could be fully extracted to a separate gatherer, but I don't think we need a dedicated ImageMimeType gatherer.

For these reasons, I want to avoid splitting this into two sets of information. I think having ImageElements and ImageElementsSnapshot is more clear about how this single set of information should be used in different modes. I would also be interested in removing the dependency on network records entirely and having a single artifact.

patrickhulce · 2021-06-15T20:05:45Z

I would also be interested in removing the dependency on network records entirely and having a single artifact.

Awesome 👍 that's basically what I was going for.

The network record isn't really used here, the existence of the network record is used to determine if creating a new image is worthwhile. Additionally, some image elements will include valid natural dimensions without needing to fetch them separately. This isn't really a new source of information IMO.

Right on. That's why I think it might be OK if we don't depend on network records at all to fetch natural size :) Mainly because... "We're far more sophisticated now with our time budgets than we were when those restrictions were put into place."

Getting the mime type: This could be fully extracted to a separate gatherer, but I don't think we need a dedicated ImageMimeType gatherer.

"Split" was a little unclear of me, sorry, it's really just removal :) I'm saying this information already exists in another artifact, the DevtoolsLog, and it doesn't need to be duplicated. It was optional to begin with so maybe this is OK without a breaking change? Though it's definitely fudging a bit.

adamraine · 2021-06-15T21:44:49Z

If we are going to fetch the images a second time, can we look for the network requests that are created when we load them?

lighthouse/lighthouse-core/gather/gatherers/image-elements.js

Lines 159 to 172 in 8cd6821

    
    function determineNaturalSize(url) {
      return new Promise((resolve, reject) => {
        const img = new Image();
        img.addEventListener('error', _ => reject(new Error('determineNaturalSize failed img load')));
        img.addEventListener('load', () => {
          resolve({
            naturalWidth: img.naturalWidth,
            naturalHeight: img.naturalHeight,
          });
        });
        img.src = url;
      });
    }

Will this produce a network event that we can use to get the mime type?

Edit: I was unable to get this to work after 20 mins of trying.

patrickhulce · 2021-06-15T22:20:20Z

If we are going to fetch the images a second time, can we look for the network requests that are created when we load them?

The goal would be that most of these never actually trigger a new request because it can use the image from the memory cache.

Will this produce a network event that we can use to get the mime type?
Edit: I was unable to get this to work after 20 mins of trying.

That's a good early sign they're being reused :D

patrickhulce · 2021-06-15T22:25:22Z

Possibly bad idea: make a best effort guess at the mime type based on extension? There are certainly lots of reports where folks serve a webp under a png extension or have extensionless images, but it should cover the common case for snapshots where we're just trying to tell if this thing is a vector to ignore it.

We would of course preserve the network record access validation when the information is available.
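A rough sketch of what the extension-based guess could look like. The function name `guessMimeTypeFromExtension` and the extension table are assumptions made for illustration, not the implementation in this PR:

```javascript
// Hypothetical extension-to-MIME table; the entries are illustrative only.
const EXTENSION_TO_MIME = new Map([
  ['png', 'image/png'],
  ['jpg', 'image/jpeg'],
  ['jpeg', 'image/jpeg'],
  ['gif', 'image/gif'],
  ['webp', 'image/webp'],
  ['avif', 'image/avif'],
  ['svg', 'image/svg+xml'],
]);

/**
 * Best-effort guess from the URL path; returns undefined when unsure.
 * @param {string} url
 * @return {string|undefined}
 */
function guessMimeTypeFromExtension(url) {
  let pathname;
  try {
    // Using the parsed pathname ignores query strings and fragments.
    pathname = new URL(url).pathname;
  } catch (_) {
    return undefined; // Not an absolute URL.
  }
  const match = pathname.match(/\.([a-z0-9]+)$/i);
  if (!match) return undefined; // Extensionless image.
  return EXTENSION_TO_MIME.get(match[1].toLowerCase());
}
```

When a network record is available, its `mimeType` would still take precedence over this guess, since a webp served under a png extension would otherwise be misreported.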

adamraine · 2021-06-16T17:10:07Z

The goal would be that most of these never actually trigger a new request because it can use the image from the memory cache.

I did look for Network.requestServedFromCache events but there were none.

It was optional to begin with so maybe this is OK without a breaking change? Though it's definitely fudging a bit.

I think I'm ready to remove the network records entirely, but this point is still a bit concerning. @paulirish @brendankenny @connorjclark any thoughts?

patrickhulce · 2021-06-16T17:18:04Z

I think I'm ready to remove the network records entirely, but this point is still a bit concerning.

WDYT about my fallback idea with the extension for plugins?

I'm somewhat skeptical that anyone would really notice :)

patrickhulce · 2021-06-16T17:27:10Z

I did look for Network.requestServedFromCache events but there were none.

Yeah you can't see any evidence of a cached network request in DevTools either. My guess is that it's related to the differences between image elements and other requests. It would be kinda weird if you had 100 copies of an <img> to see 100 network requests in DevTools that all say "from cache". I wouldn't really expect Chromium to try to generate fake network requests just for debugging. An image element duplicate should ideally be reusing much more than just a network request (the decoded image data from compositor workers too, etc), so "memory cache" is a bit vague here, more like "some cache somewhere that exists in memory" as opposed to "memory cache for network requests"?

You might have more luck getting a network request for very offscreen images that aren't decoded? Not sure if you were already trying that.

adamraine · 2021-06-16T17:29:39Z

WDYT about my fallback idea with the extension for plugins?

I'm willing to try it, but if it's just for identifying vector images, should we instead add an isVector property to the artifact that is guessed by file extensions if mime type is unavailable?

patrickhulce · 2021-06-16T18:11:20Z

if it's just for identifying vector images

That's definitely not its only purpose, that was just one example I could think of immediately of how it is used for snapshots where we don't have the higher fidelity network record option at all. The concern we face right now isn't with our own audits, but with potential plugins where I think the extension-based version + optionality would likely solve 99% of the use cases (if there even are any out there).

connorjclark · 2021-06-16T18:29:12Z

Removing mime type from the gatherer (even tho a minor breaking change) SGTM. can just use network records in audits to look up mimetype.

too bad we can't know the image type in an image artifact... (await fetch(img.src, {method: 'HEAD', mode: 'cors'})).headers.get('Content-Type') nearly works, but would have issues w/ cross origin images.

also, woah:

fetch Response sets a header for data urls!

patrickhulce · 2021-06-17T16:31:59Z

@adamraine do you feel comfortable with the removal path forward here? Or should we discuss at some point today to unblock you?

adamraine · 2021-06-17T16:34:08Z

do you feel comfortable with the removal path forward here? Or should we discuss at some point today to unblock you?

I'm good with the removal, working on it now. Sorry, should have posted a SGTM :)

adamraine · 2021-06-17T20:43:08Z

lighthouse-core/computed/image-records.js

+    imageRecords.sort((a, b) => {
+      const aRecord = indexedNetworkRecords[a.src] || {};
+      const bRecord = indexedNetworkRecords[b.src] || {};
+      return bRecord.resourceSize - aRecord.resourceSize;
+    });


We can't do this sort in the gatherer anymore? This means the order of the results may be different in some audits.

sort by naturalSize if we have it, display size next could get us most of the way there? goal was to try and spend our gather budget on the most impactful images first. I don't really think the sort here matters all that much.
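A minimal sketch of that ordering, assuming each element carries optional `naturalWidth` / `naturalHeight` and `displayedWidth` / `displayedHeight` fields. The helper names `pixelArea` and `sortByPixelArea` are hypothetical:

```javascript
/**
 * Hypothetical ranking key: natural pixel area when known, else displayed area.
 * @param {{naturalWidth?: number, naturalHeight?: number, displayedWidth?: number, displayedHeight?: number}} element
 * @return {number}
 */
function pixelArea(element) {
  if (element.naturalWidth && element.naturalHeight) {
    return element.naturalWidth * element.naturalHeight;
  }
  return (element.displayedWidth || 0) * (element.displayedHeight || 0);
}

/**
 * Largest images first, so the gather budget goes to the most impactful ones.
 * Copies the input so the caller's array is left untouched.
 */
function sortByPixelArea(elements) {
  return [...elements].sort((a, b) => pixelArea(b) - pixelArea(a));
}
```

This does not reproduce the old `resourceSize` ordering exactly, but it needs no network records, so it works the same way in every gather mode.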

adamraine · 2021-06-18T22:26:15Z

lighthouse-core/computed/image-records.js

+    return networkRecords.reduce((map, record) => {
+      // An image response in newer formats is sometimes incorrectly marked as "application/octet-stream",
+      // so respect the extension too.
+      const isImage = /^image/.test(record.mimeType) || /\.(avif|webp)$/i.test(record.url);


Do we need this anymore?

patrickhulce

thanks for being open to such a large change in direction @adamraine ! :D

patrickhulce · 2021-06-21T15:47:57Z

lighthouse-core/computed/image-records.js

I would say until proven otherwise, yes. AVIF is still unknown enough that I suspect many older servers will serve it as application/octet-stream for quite some time.
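To make the check concrete, here is a small self-contained sketch of the reducer using the same predicate as the diff. The function names `isImageRecord` and `indexImageRecords` are hypothetical, and the real computed artifact may apply further filters not shown in the excerpt:

```javascript
/**
 * Same predicate as the diff: trust an image/* MIME type, and fall back to the
 * extension for newer formats that servers may label "application/octet-stream".
 * @param {{url: string, mimeType: string}} record
 * @return {boolean}
 */
function isImageRecord(record) {
  return /^image/.test(record.mimeType) || /\.(avif|webp)$/i.test(record.url);
}

/**
 * Index image records by URL (sketch only).
 * @param {Array<{url: string, mimeType: string}>} networkRecords
 * @return {Record<string, {url: string, mimeType: string}>}
 */
function indexImageRecords(networkRecords) {
  return networkRecords.reduce((map, record) => {
    if (isImageRecord(record)) map[record.url] = record;
    return map;
  }, {});
}
```

Note that the extension test only matches when the URL ends in `.avif` or `.webp`, so a URL with a query string after the extension relies on the MIME type alone.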

patrickhulce · 2021-06-21T15:57:44Z

lighthouse-core/test/lib/url-shim-test.js

+    it('uses mime type from data URI', () => {
+      expect(URL.guessMimeType('data:image/png;DATA')).toEqual('image/png');
+      expect(URL.guessMimeType('data:image/jpeg;DATA')).toEqual('image/jpeg');
+      expect(URL.guessMimeType('data:image/jpg;DATA')).toEqual('image/jpg');


this is an interesting one, should we use any image/ MIME type even if it's not valid? I'm fine narrowing scope and doing whatever you think is more straightforward @adamraine

I'm thinking get rid of image/jpg support, I'd rather surface undefined for unsupported mime types.
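A sketch of the narrower behavior, assuming an allowlist of known image types. The function name `guessMimeTypeFromDataUri` and the allowlist contents are illustrative assumptions, not the code that landed in `url-shim.js`:

```javascript
// Hypothetical allowlist. 'image/jpg' is deliberately absent: the standard type is 'image/jpeg'.
const KNOWN_IMAGE_MIME_TYPES = new Set([
  'image/png',
  'image/jpeg',
  'image/gif',
  'image/webp',
  'image/avif',
  'image/svg+xml',
  'image/bmp',
]);

/**
 * Read the MIME type declared in a data URI and surface undefined when it is unsupported.
 * @param {string} url
 * @return {string|undefined}
 */
function guessMimeTypeFromDataUri(url) {
  // The media type sits between "data:" and the first ";" or ",".
  const match = url.match(/^data:([^;,]+)/i);
  if (!match) return undefined;
  const mimeType = match[1].toLowerCase();
  return KNOWN_IMAGE_MIME_TYPES.has(mimeType) ? mimeType : undefined;
}
```

Under this sketch, `data:image/png;DATA` yields `image/png` while `data:image/jpg;DATA` yields undefined.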

WIP: Split ImageElements gatherer for snapshot mode · 9ddbfe6

google-cla bot added the cla: yes label Jun 15, 2021

adamraine requested a review from patrickhulce June 15, 2021 17:16

devtools-bot assigned patrickhulce Jun 15, 2021

devtools-bot added the waiting4reviewer label Jun 15, 2021

patrickhulce reviewed Jun 15, 2021

patrickhulce added waiting4committer and removed waiting4reviewer labels Jun 16, 2021

rm snapshot · ee7fc9f

add image record computed · e9eced1

update tests · 5aeb503

adamraine changed the title from "WIP: Split ImageElements gatherer for snapshot mode" to "core(fr): add snapshot support to ImageElements gatherer" Jun 17, 2021

adamraine marked this pull request as ready for review June 17, 2021 20:47

tests · a241066

connorjclark reviewed Jun 18, 2021 · lighthouse-core/gather/gatherers/image-elements.js

connorjclark reviewed Jun 18, 2021 · lighthouse-core/lib/url-shim.js

connorjclark reviewed Jun 18, 2021 · lighthouse-core/lib/url-shim.js

adamraine added 2 commits June 18, 2021 17:38

fix · f3671b3

dt · 779d005

adamraine added 3 commits June 18, 2021 18:03

rduce · d52aa1d

pixel area · cd2978a

fix url ext · 004e7d5

connorjclark reviewed Jun 18, 2021 · lighthouse-core/lib/url-shim.js

connorjclark approved these changes Jun 18, 2021

connorjclark reviewed Jun 18, 2021 · image-elements-pptr.js

cleanup · ca75ddc

move sort · 317c14f

patrickhulce approved these changes Jun 21, 2021

pr comments · 5b9bf3e

rm image/jpg support · 26e9c04

adamraine merged commit 1ef36e8 into master Jun 21, 2021

adamraine deleted the fr-image-elements-snapshot branch June 21, 2021 16:37

adamraine mentioned this pull request Jun 28, 2021: ☂️ Breaking changes for v9 #12614 (Closed, 15 tasks)

adamraine mentioned this pull request Jul 9, 2021: WIP: add snapshot support to ScriptElements #12770 (Closed)

adamraine mentioned this pull request Oct 26, 2021: core(image-elements): remove mimeType from artifact #13265 (Merged)
