Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

core(fr): add snapshot support to ImageElements gatherer #12663

Merged
merged 19 commits into from
Jun 21, 2021

Conversation

adamraine
Copy link
Member

This is what splitting ImageElements to add support for snapshot mode will look like.

Most of ImageElements has been moved to ImageElementsSnapshot which makes the diff hard to read. I couldn't extend ImageElementsSnapshot from ImageElements because GathererMeta is not assignable to GathererMeta<'DevtoolsLog'>.

If you want to inspect the changes made after the code was moved, you can use this command:

git diff master:./lighthouse-core/gather/gatherers/image-elements.js fr-image-elements-snapshot:./lighthouse-core/gather/gatherers/image-elements-snapshot.js

I'll also do my best to highlight the differences in comments.

@google-cla google-cla bot added the cla: yes label Jun 15, 2021
Comment on lines 318 to 321
if (element.isPicture || element.isCss || element.srcset) {
if (indexedNetworkRecords && !networkRecord) continue;
await this.fetchElementWithSizeInformation(driver, element);
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This logic has been changed slightly. The image record will be fetched if we run in snapshot mode.

session.sendCommand('CSS.enable'),
session.sendCommand('DOM.getDocument', {depth: -1, pierce: true}),
]);

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Element sorting was removed for convenience of making this draft. Will need to restore it somehow before landing.

Copy link
Collaborator

@patrickhulce patrickhulce left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Incredible! This is a great conversation starter about how we should approach splitting gatherers 👍

I see a few generic options:

  • Fork / Inherit Gatherers (this PR). Return roughly the same artifact in multiple cases but under a different artifact ID to disambiguate which version can be used in different modes.
  • Optional Gatherer Dependencies. Similar to this approach in that the type of the artifact will be slightly different depending on mode, but contain the implementation in the same gatherer by allowing certain dependencies to be optional.
  • Split Artifact Information. Instead of forking / inheriting, we could split off the other information entirely. It's always been a bit weird that we duplicate the network record information onto these image elements, and it seems reasonable to have a separate artifact entirely for that data (that could be merged with ImageElements via computed artifact for ease of use). If the information isn't already optional, this would be a breaking change, but worth considering.
  • Change information / functionality. Always worth considering if the value being added is worth it. For example, in this case it might be worth just trying to fetch natural size information regardless of gather mode. We're far more sophisticated now with our time budgets than we were when those restrictions were put into place.
  • Some combination of the above. A blend of splitting / changing might fork for some artifacts and some might just need a fork.

In this particular instance, I think I'm leaning slightly toward the split + change approach. WDYT?

@adamraine
Copy link
Member Author

In this particular instance, I think I'm leaning slightly toward the split + change approach. WDYT?

For ImageElements to work in snapshot mode, I needed to remove the dependency from DevtoolsLog/NetworkRecords. The network records are necessary for two things:

  • Getting the natural dimensions: The network record isn't really used here, the existence of the network record is used to determine if creating a new image is worthwhile. Additionally, some image elements will include valid natural dimensions without needing to fetch them separately. This isn't really a new source of information IMO.
  • Getting the mime type: This could be fully extracted to a separate gatherer, but I don't think we need a dedicated ImageMimeType gatherer.

For these reasons, I want to avoid splitting this into two sets of information. I think having ImageElements and ImageElementsSnapshot is more clear about how this single set of information should be used in different modes. I would also be interested in removing the dependency on network records entirely and having a single artifact.

@patrickhulce
Copy link
Collaborator

I would also be interested in removing the dependency on network records entirely and having a single artifact.

Awesome 👍 that's basically what I was going for.

The network record isn't really used here, the existence of the network record is used to determine if creating a new image is worthwhile. Additionally, some image elements will include valid natural dimensions without needing to fetch them separately. This isn't really a new source of information IMO.

Right on. That's why I think it might be OK if we don't depend on network records at all to fetch natural size :) Mainly because... "We're far more sophisticated now with our time budgets than we were when those restrictions were put into place."

Getting the mime type: This could be fully extracted to a separate gatherer, but I don't think we need a dedicated ImageMimeType gatherer.

Split was a little unclear of me, sorry it's really just removal :) I'm saying this information already exists in another artifact, the DevToolsLogs and it doesn't need to be duplicated. It was optional to begin with so maybe this is OK without a breaking change? Though it's definitely fudging a bit.

@adamraine
Copy link
Member Author

adamraine commented Jun 15, 2021

If we are going to fetch the images a second time, can we look for the network requests that are created when we load them?

function determineNaturalSize(url) {
return new Promise((resolve, reject) => {
const img = new Image();
img.addEventListener('error', _ => reject(new Error('determineNaturalSize failed img load')));
img.addEventListener('load', () => {
resolve({
naturalWidth: img.naturalWidth,
naturalHeight: img.naturalHeight,
});
});
img.src = url;
});
}

Will this produce a network event that we can use to get the mime type?

Edit: I was unable to get this to work after 20 mins of trying.

@patrickhulce
Copy link
Collaborator

If we are going to fetch the images a second time, can we look for the network requests that are created when we load them?

The goal would be that most of these never actually trigger a new request because it can use the image from the memory cache.

Will this produce a network event that we can use to get the mime type?
Edit: I was unable to get this to work after 20 mins of trying.

That's a good early sign they're being reused :D

@patrickhulce
Copy link
Collaborator

patrickhulce commented Jun 15, 2021

Possibly bad idea: make a best effort guess at the mime type based on extension? There are certainly lots of reports where folks server a webp under a png extension or have extensionless images, but it should cover the common case for snapshots where we're just trying to tell if this thing is a vector to ignore it.

We would of course preserve the network record access validation when the information is available.

@adamraine
Copy link
Member Author

The goal would be that most of these never actually trigger a new request because it can use the image from the memory cache.

I did look for Network.requestServedFromCache events but there were none.

It was optional to begin with so maybe this is OK without a breaking change? Though it's definitely fudging a bit.

I think I'm ready to remove the network records entirely, but this point is still a bit concerning. @paulirish @brendankenny @connorjclark any thoughts?

@patrickhulce
Copy link
Collaborator

I think I'm ready to remove the network records entirely, but this point is still a bit concerning.

WDYT about my fallback idea with the extension for plugins?

I'm somewhat skeptical that anyone would really notice :)

@patrickhulce
Copy link
Collaborator

patrickhulce commented Jun 16, 2021

I did look for Network.requestServedFromCache events but there were none.

Yeah you can't see any evidence of a cached network request in DevTools either. My guess is that it's related to the differences between image elements and other requests. It would be kinda weird if you had 100 copies of an <img> to see 100 network requests in DevTools that all say "from cache". I wouldn't really expect Chromium to try to generate fake network requests just for debugging. An image element duplicate should ideally be reusing much more than just a network request (the decoded image data from compositor workers too, etc), so "memory cache" is a bit vague here, more like "some cache somewhere that exists in memory" as opposed to "memory cache for network requests"?

You might have more luck getting a network request for very offscreen images that aren't decoded? Not sure if you were already trying that.

@adamraine
Copy link
Member Author

WDYT about my fallback idea with the extension for plugins?

I'm willing to try it, but if it's just for identifying vector images, should we instead add an isVector property to the artifact that is guessed by file extensions if mime type is unavailable?

@patrickhulce
Copy link
Collaborator

if it's just for identifying vector images

That's definitely not its only purpose, that was just one example I could think of immediately of how it is used for snapshots where we don't have the higher fidelity network record option at all. The concern we face right now isn't with our own audits, but with potential plugins where I think the extension-based version + optionality would like solve 99% of the use cases (if there even are any out there).

@connorjclark
Copy link
Collaborator

connorjclark commented Jun 16, 2021

Removing mime type from the gatherer (even tho a minor breaking change) SGTM. can just use network records in audits to look up mimetype.

too bad we can't know the image type in an image artifact... (await fetch(img.src, {method: 'HEAD', mode: 'cors'})).headers.get('Content-Type') nearly works, but would have issues w/ cross origin images.

also, woah:

image

fetch Response sets a header for data urls!

@patrickhulce
Copy link
Collaborator

@adamraine do you feel comfortable with the removal path forward here? Or should we discuss at some point today to unblock you?

@adamraine
Copy link
Member Author

do you feel comfortable with the removal path forward here? Or should we discuss at some point today to unblock you?

I'm good with the removal, working on it now. Sorry, should have posted a SGTM :)

Comment on lines +48 to +52
imageRecords.sort((a, b) => {
const aRecord = indexedNetworkRecords[a.src] || {};
const bRecord = indexedNetworkRecords[b.src] || {};
return bRecord.resourceSize - aRecord.resourceSize;
});
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can't do this sort in the gatherer anymore? This means the order of the results may be different in some audits.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sort by naturalSize if we have it, display size next could get us most of the way there? goal was to try and spend our gather budget on the most impactful images first. I don't really think the sort here matters all that much.

@adamraine adamraine changed the title WIP: Split ImageElements gatherer for snapshot mode core(fr): add snapshot support to ImageElements gatherer Jun 17, 2021
@adamraine adamraine marked this pull request as ready for review June 17, 2021 20:47
return networkRecords.reduce((map, record) => {
// An image response in newer formats is sometimes incorrectly marked as "application/octet-stream",
// so respect the extension too.
const isImage = /^image/.test(record.mimeType) || /\.(avif|webp)$/i.test(record.url);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need this anymore?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would say until proven otherwise, yes. AVIF is still unknown enough that I suspect many older servers will serve it as application/octet-stream for quite some time.

Copy link
Collaborator

@patrickhulce patrickhulce left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for being open to such a large change in direction @adamraine ! :D

return networkRecords.reduce((map, record) => {
// An image response in newer formats is sometimes incorrectly marked as "application/octet-stream",
// so respect the extension too.
const isImage = /^image/.test(record.mimeType) || /\.(avif|webp)$/i.test(record.url);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would say until proven otherwise, yes. AVIF is still unknown enough that I suspect many older servers will serve it as application/octet-stream for quite some time.

lighthouse-core/gather/gatherers/image-elements.js Outdated Show resolved Hide resolved
lighthouse-core/lib/url-shim.js Show resolved Hide resolved
lighthouse-core/lib/url-shim.js Outdated Show resolved Hide resolved
lighthouse-core/test/lib/url-shim-test.js Outdated Show resolved Hide resolved
it('uses mime type from data URI', () => {
expect(URL.guessMimeType('data:image/png;DATA')).toEqual('image/png');
expect(URL.guessMimeType('data:image/jpeg;DATA')).toEqual('image/jpeg');
expect(URL.guessMimeType('data:image/jpg;DATA')).toEqual('image/jpg');
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is an interesting one, should we use any image/ MIME type even if it's not valid? I'm fine narrowing scope and doing whatever you think is more straightforward @adamraine

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm thinking get rid of image/jpg support, I'd rather surface undefined for unsupported mime types.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants