Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(DBW): add audit for detecting unoptimized images #1452

Merged
merged 7 commits into from
Jan 13, 2017

Conversation

patrickhulce
Copy link
Collaborator

Addresses one of several concerns around image bloat and another step towards PSI parity (#909).

Gathers all sameorigin or data URI images off of the network and runs them through canvas to determine a stripped down JPEG and WebP size.

Audit will fail if one of the conditions are met:

  • There is at least one JPEG or bitmap image that was larger than canvas encoded JPEG.
  • There is at least one image that would have saved more than 50KB by using WebP.
  • The savings of moving all images to WebP is greater than 100KB.

Addresses one of several concerns around image bloat and another step towards PSI parity (#909).

Gathers all sameorigin or data URI images off of the network and runs them through canvas to determine a stripped down JPEG and WebP size.

Audit will fail if one of the conditions are met:
* There is at least one JPEG or bitmap image that was larger than canvas encoded JPEG.
* There is at least one image that would have saved more than 50KB by using WebP.
* The savings of moving all images to WebP is greater than 100KB.
@patrickhulce
Copy link
Collaborator Author

Some open questions:

  • What's the right number for both of the byte cutoffs?
  • What's the right number for the quality to use when rendering with canvas?
  • Should we expand to support crossorigin in a future PR?
  • Should we still report all of the WebP savings when it is small (current behavior)?

function getTypeStats(type, quality) {
const dataURI = canvas.toDataURL(type, quality);
const base64 = dataURI.slice(dataURI.indexOf(',') + 1);
return {base64: base64.length, binary: atob(base64).length};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of atob you can use canvasElem.toBlob and then just check the blob.size

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason to avoid atob? toBlob is async no?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔 that looks async to me

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just wondering if there's a benefit I'm missing. Double nested callback just seems to add some awkwardness.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sorry. You're totally right. They changed it after years in the making and I refused to believe it :) https://developer.mozilla.org/en-US/docs/Web/API/HTMLCanvasElement/toBlob

You could do a promise wrapper.

Generally, data urls add about 33% overhead to the filesize. If a page has a lot of images that might eat up some memory but I'm not sure it will be a huge issue. A benefit of passing a blob around would be that the caller can read it in many formats (binary string, data url, etc) using the FileReader API.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haha no worries :) Yeah I'm actually purposefully calculating the data URI size here (though I could check first to see if it wasn't a data URI to begin with and skip it) so that I can compare it under the assumption if you're shipping an image as a data URI you'd likely send the optimized version as one as well. I suppose avoiding large data URI's should really be an audit by itself too though 😄

@patrickhulce
Copy link
Collaborator Author

Another open question is how to best present the information in the report. Here's what it looks like right now with JPEG savings being shown only if the image was JPEG or bitmap to begin with and could save by recompressing at q=0.8

image

Copy link
Member

@brendankenny brendankenny left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looking good. Initial review pass

@@ -102,6 +102,10 @@
</style>
</template>

<template id="unoptimized-images-tmpl">
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are you going to do anything with this in the expectations file? Also, it is a very big addition to this file :)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should I check-in the image separately instead? And yes, haha, done :)

*/

/**
* @fileoverview Determines optimized jpeg/webp filesizes for all same-origin and dataURI images.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same origin makes sense (those are the images you have control over), but warning developers that a cross origin image is hurting them is also helpful. What does PSI do?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would also be useful if this had a description of how this is calculated, especially as we aren't claiming this is 100% foolproof, just an ongoing best effort. Could put the description on top of or in getOptimizedNumBytes, but might be more apparent here at the top.

Uses a canvas in the target browser to re-encode each image at 80% etc etc

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was also thinking we should go x-origin for this one. Images are such a big, wasteful part of apps. 60% or whatever. And a lot of people use images hosted off a CDN. That will be x-origin.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah my main concern was not processing CDN images at all which is most people who make some effort. The issue I ran into was canvas not dealing with them and adding crossorigin='anonymous' didn't necessarily fix. Fine to investigate in future PR?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, crossorigin='anonymous'. File a bug so we don't forget!

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done! #1467

/* global document */

/* istanbul ignore next */
function getOptimizedNumBytes(url) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should indicate in comment description that this runs in the browser

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

class OptimizedImages extends Gatherer {
/**
* @param {string} pageUrl
* @param {NetworkRecords} networkRecords
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

{!NetworkRecords}

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

/**
* @param {string} pageUrl
* @param {NetworkRecords} networkRecords
* @return {!Array.<!{url: string, isBase64DataUri: boolean, mimeType: string, resourceSize: number}>}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, structural types aren't default nullable (and you don't need the .), so you can do @return {!Array<{url: string...}>} here

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh cool, done

}

/**
* @param {Object} driver
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

{!Object}

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done


/**
* @param {Object} driver
* @param {!{url: string, isBase64DataUri: boolean, resourceSize: number}} networkRecord
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can drop the !

/**
* @param {Object} driver
* @param {!{url: string, isBase64DataUri: boolean, resourceSize: number}} networkRecord
* @return {!Promise.<!{originalSize: number, jpegSize: number, webpSize: number}>}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can drop the second ! and the .

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

The following images could have smaller file sizes when compressed with
[WebP](https://developers.google.com/speed/webp/) or JPEG at 80 quality.
[Learn more about image optimization](https://developers.google.com/web/fundamentals/performance/optimizing-content-efficiency/image-optimization).
`.trim(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this still has newlines that only get ignored in HTML. We should probably use the same style of single string per line with + concat as in other audits anyways

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah :/ where is the helpText used outside HTML?

* @return {{bytes: number, kb: number, percent: number}}
*/
static computeSavings(image, type) {
const bytes = image.originalSize - image[type + 'Size'];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there ever a risk for an increase?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but there are checks in the audit logic to see if this is a positive number before reporting

* @return {string}
*/
static getUrl(image) {
return image.isBase64DataUri ? image.url.slice(0, 80) + '...' : image.url;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could we use the brand new URL.getDisplayName to do the pretty name?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

"unused-css-rules": {
"expectedValue": false
}
"unused-css-rules": {},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the first brave soul to omit expectedValue :)

*/

/**
* @fileoverview Determines optimized jpeg/webp filesizes for all same-origin and dataURI images.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was also thinking we should go x-origin for this one. Images are such a big, wasteful part of apps. 60% or whatever. And a lot of people use images hosted off a CDN. That will be x-origin.

canvas.width = img.width;
context.drawImage(img, 0, 0);

const jpeg = getTypeStats('image/jpeg', 0.8);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a thought for @WeiweiAtGit. Might be interesting to be able to adjust the quality value and re-run with perf-x. Maybe make 0.8 a property that can be configured?

let debugString;
if (failedImages.length) {
const urls = failedImages.map(image => UsesOptimizedImages.getUrl(image));
debugString = `Lighthouse was unable to parse some of your images: ${urls.join(', ')}`;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is failed requests for images? I may say something other than "parse".

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No it's usually canvas failing because the image was bad/not something we could read. Decode probably makes more sense here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or "Lighthouse was unable to determine if some of your images could be optimized:.."

}

return UsesOptimizedImages.generateAuditResult({
displayValue, debugString,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we newline just for clarity?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

return {base64: base64.length, binary: atob(base64).length};
}

document.body.appendChild(canvas);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haven't used canvas in a while. It needs to be in the DOM for this to work?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no that was me testing that it didn't, haha I'll remove

return driver.evaluateAsync(script).then(stats => {
const isBase64DataUri = networkRecord.isBase64DataUri;
return {
originalSize: isBase64DataUri ? networkRecord.url.length : networkRecord.resourceSize,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if it's a data url, should we use the full length (data:..;base64,....) as you have) or just the file content part?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes good catch

return networkRecords.reduce((prev, record) => {
const isOptimizableImage = /image\/(png|bmp|jpeg)/.test(record._mimeType);
const isSameOrigin = URL.hostsMatch(pageUrl, record._url);
const isBase64DataUri = /^data.{2,40}base64/.test(record._url);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

more specific: /^data:.{2,40};base64,/

Should include record._mimeType in the regex?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done, are there cases where a URL could begin with ^data: and not be a data URI? don't want to make it brittle/less readable with a bunch of whitespace matchers if it doesn't matter

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not that I'm aware of.

@ebidel
Copy link
Contributor

ebidel commented Jan 13, 2017

What's the right number for the quality to use when rendering with canvas?

IIRC, 0.8 is the default for both encodings when you use tools. So 👍 to what you have.

Should we expand to support crossorigin in a future PR?

As mentioned, I think it would be more useful to include crossorigin images now.

Should we still report all of the WebP savings when it is small (current behavior)?

An improvement is an improvement, but we can see how noisy it gets. Mapping total bytes transferred to a RUM like FMP time might be good in the future. That way, it's not just "you can save x bytes", but "users will see your site Xms faster".

return {base64: base64.length, binary: atob(base64).length};
}

img.addEventListener('load', () => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any value in using a img.addEventListener('error'.. to catch failed img requests? Is there something for failed image requests that I'm not seeing?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well the underlying assumption here was that because we're picking which images to check based on the network requests and mimetype it would already be cached and 🤞 guaranteed to load, but I'll add an explicit error check

Copy link
Member

@brendankenny brendankenny left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM2
📣📏🖼

@brendankenny brendankenny merged commit d49244e into master Jan 13, 2017
@brendankenny brendankenny deleted the optimized-images branch January 13, 2017 23:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants