Refine LCP to take progressive loading into account #71

Open
jonsneyers opened this issue Jan 4, 2021 · 27 comments
@jonsneyers

This is an attempt to summarize the discussion in #68 into a single, simple proposal. The thread over there hasn't seen any new activity in a while, and it's getting too long to navigate comfortably, imo.

The idea is as follows: instead of recording the time at which the final image is shown, the LCP is computed in a somewhat different way. There are three phases:

  • LCP-Placeholder: any blurry preview, e.g. a blurhash or a 1:32 resolution image, or a WebP2 triangular preview, etc. This phase is useful to make it clear that an image is going to appear here, to avoid layout jank, and to prevent flash.
  • LCP-Preview: has to satisfy some minimum resolution/quality, e.g. JPEG DC is OK (since it is basically lossless 1:8), a lossy 1:4 preview might also work. This phase is useful to show progress, and to have a 'semantic' preview that conveys the basic idea of the image (e.g. you can see there are two people and a car in the image, even though you cannot yet see the details in their face or identify the brand of the car).
  • LCP-GoodEnough: the image is "good enough" to consider it 'loaded' for the purpose of interacting with the page – say, a PSNR of 33 dB w.r.t. the final image.

The LCP time would then be defined as a function of these three phases:
LCP = max(LCP-GoodEnough minus 500ms, LCP-Preview minus 200ms, LCP-Placeholder).
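To make the formula concrete, here is a minimal sketch in Python (the function name and the example timings are illustrative, not part of the proposal):

```python
# Phase-based LCP, a sketch of the formula above.
# All timestamps are in milliseconds since navigation start.

def phase_based_lcp(placeholder_ms, preview_ms, good_enough_ms):
    """LCP = max(GoodEnough - 500ms, Preview - 200ms, Placeholder)."""
    return max(good_enough_ms - 500, preview_ms - 200, placeholder_ms)

# A progressive image reaching the phases at 200 / 400 / 900 ms:
print(phase_based_lcp(200, 400, 900))  # -> 400
# A non-progressive image reaching all three phases at once, at 850 ms:
print(phase_based_lcp(850, 850, 850))  # -> 850
```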

It is too complicated to actually keep track of all paint actions and check when exactly the criteria are met. Instead, for every image (sub)format, a statistical analysis can be done on some sufficiently large and representative corpus, and then the criterion for each phase can be simplified to "a paint action happened when at least X % of the total image size was available". For the computation of this truncation percentage, only the payload data is considered (not metadata like XMP or large ICC profiles).

For example, this could lead to a table like this:

| Image type | LCP-Placeholder | LCP-Preview | LCP-GoodEnough |
| --- | --- | --- | --- |
| AVIF | 100% | 100% | 100% |
| WebP | 80% | 90% | 95% |
| sequential JPEG | 80% | 90% | 95% |
| progressive JPEG | 15% | 15% | 50% |
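To illustrate how such a table could be consumed, here is a hypothetical sketch: given per-paint byte fractions, it reports when each phase criterion is first met. The threshold values are from the table above; the function and data layout are mine, not part of the proposal:

```python
# Fractions of payload bytes (metadata excluded) after which each
# phase is considered reached, per the table above.
PHASE_THRESHOLDS = {
    "avif":             {"placeholder": 1.00, "preview": 1.00, "good_enough": 1.00},
    "webp":             {"placeholder": 0.80, "preview": 0.90, "good_enough": 0.95},
    "sequential_jpeg":  {"placeholder": 0.80, "preview": 0.90, "good_enough": 0.95},
    "progressive_jpeg": {"placeholder": 0.15, "preview": 0.15, "good_enough": 0.50},
}

def phase_times(fmt, paints):
    """paints: (timestamp_ms, fraction_of_payload_bytes) per paint event.
    Returns the first paint timestamp satisfying each phase, or None."""
    return {
        phase: next((t for t, frac in paints if frac >= threshold), None)
        for phase, threshold in PHASE_THRESHOLDS[fmt].items()
    }

paints = [(100, 0.2), (300, 0.6), (700, 1.0)]
print(phase_times("progressive_jpeg", paints))
# -> {'placeholder': 100, 'preview': 100, 'good_enough': 300}
```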

Embedded previews and blurhash (or similar approaches) can override the LCP-Placeholder and/or LCP-Preview time to a more favorable one:

| Non-inherent placeholder/preview type | LCP-Placeholder | LCP-Preview | LCP-GoodEnough |
| --- | --- | --- | --- |
| blurhash | (when the blurhash is rendered, probably very early) | | |
| embedded very small preview image | (when preview is shown) | | |
| embedded reasonably large (say at least 1:6 resolution) preview image | (same as LCP-Preview) | (when preview is shown) | |
@DamonHD

DamonHD commented Jan 13, 2021

Does setting the (mean) colour as the background with trivial CSS count as 'placeholder'? B^>

Eg:

<img src="..." style="background:#996">

Because that is very fast and low bandwidth and seems to work nicely with decoding="async". Ask me how I know!

Rgds

Damon

@csswizardry

Fascinating issue on a current client site where we reduced their masthead image weight by over 20% by switching it to WebP. It’s now rendering almost 2× later because WebP doesn’t offer progressive rendering like the previous JPG did:

[screenshot: filmstrip comparing the rendering of the previous progressive JPEG and the new WebP]
@csswizardry

I’m here to lend a voice of support. I’m having more and more discussions with real clients about this topic. Generally it goes:

  • Lighthouse told them to use WebP
  • They implement WebP and get better Lighthouse scores
  • UX is actually worse due to non-progressive nature
  • I recommend reverting back
  • I’m met with resistance because ‘Lighthouse said…’

I don’t blame my clients at all in these scenarios, but I 100% think more consideration should be given to progressive formats.

I lack the sheer technical knowledge that folks like @jonsneyers have around this topic, but please do let me know if I can provide anything at all to help.

@jyrkialakuijala

A bunch of us met up early January to talk this through.

Instead of doing a utility-based or holistic value rating approach to LCP that could include the whole user experience, from blurhashes to background colors to thumbnails to actual progression, we are currently considering measuring LCP when the image can be perceived as final. The arguments about a 200-500 ms preattentive processing phase starting earlier with progressive images were not convincing to all members of the meeting, and it is now clear that this path would require more research. Overall sentiment from the meeting included diverse views: "perceived as final", "barely perceptible difference is a bit too strong requirement", "when it is usable", "it is about content, not aesthetics". We decided to go ahead for now with the "perceived as final" model and to try to understand what that would mean for LCP and for the ecosystem.

Our consensus decision was to conduct more experiments. We are planning to experiment with a JPEG-specific strategy where we trigger LCP even when the last bit of the higher AC components has not yet been sent. This approach is partially inspired by YUV420, where 75% of the high chroma frequencies are never sent but images are still considered final, and by one manufacturer's approach of cutting the lower triangle of the highest AC coefficients altogether while maintaining good image quality. Cutting only one bit in these areas should be less aggressive than what is already in common use for final images. Moritz volunteered to conduct such experiments.

We believe the possible additional on-client computation load is easily manageable. We noted that we can compute the LCP trigger point in O(1) from the scan script during the decoding, although it may require some coding effort because the scan script is scattered within the image. https://lib.rs/crates/cloudflare-soos may turn out to be helpful.

One of us proposed that we may need to exclude the lowest-quality images from the progressive LCP trigger, where those images would degrade further below the "perceived as final" quality. It was proposed that low-quality thresholding could be decided by summing the quantization coefficients, as is done in ImageMagick. However, no consensus was reached, and more experimentation is needed.

We do not yet take a position on whether the new format-specific remove-one-bit-from-higher-AC-components heuristic or the previously presented format-agnostic corpora-PSNR-guided byte-fraction heuristic will give more stable results. Once the new results are available, we will review them carefully and discuss further action at that stage.

We discussed that surfacing earlier points in the image's rendering (e.g. blurhash, reaching 8x8 subresolution, etc.) may be done as a separate effort (e.g. as an issue on Element Timing or HTML).

We discussed the subject of out-of-viewport image parts and concluded that it's orthogonal (but would be nice-to-have for LCP to handle it correctly).

We concluded that Moritz will run another analysis based on the above, to help us determine reasonable thresholds for significant bits and "good enough" images.

@mo271

mo271 commented Feb 4, 2021

I'm experimenting with a large set of progressive real world images in order to find good criteria for the point when the image is perceived as final. Currently not having the last bit for the higher AC coefficients looks quite promising, but more viewing of actual images is required.

@mo271

mo271 commented Feb 22, 2021

> I'm experimenting with a large set of progressive real world images in order to find good criteria for the point when the image is perceived as final. Currently not having the last bit for the higher AC coefficients looks quite promising, but more viewing of actual images is required.

Still working on it.

@yoavweiss
Contributor

Thanks for the update! :)

@mo271

mo271 commented Mar 5, 2021

I did experiments with a large number of random images taken from the dataset provided by @anniesullie. I tried different criteria to find a reasonable threshold for "good enough", and I would like to present a random sample of 100 images where the following criterion is used:

We take the image at the first scan after which we have, for all channels:

  1. all the bits of the DC
  2. all bits but the last bit for the AC.

For example, when the scan script is:

        0,1,2: 0-0,   0, 1 ;
        0:     1-5,   0, 2 ;
        2:     1-63,  0, 1 ;
        1:     1-63,  0, 1 ;
        0:     6-63,  0, 2 ;
        0:     1-63,  2, 1 ;
        0,1,2: 0-0,   1, 0 ;
        #  HERE
        2:     1-63,  1, 0 ;
        1:     1-63,  1, 0 ;
        0:     1-63,  1, 0 ;

it will trigger where I wrote HERE. However, this criterion is applied to whatever scan script comes in, and with a different scan script it might only trigger just before the last scan.
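For readers who want the criterion spelled out, here is a sketch of how it could be evaluated against a scan script like the one above. Parsing is skipped, and the data layout, names, and bookkeeping are all illustrative, not taken from a real decoder:

```python
# "Criterion 0": trigger at the first scan after which, for all
# channels, all DC bits and all AC bits except the last one are in.

COMPONENTS, COEFFS = 3, 64  # YCbCr, 8x8 DCT coefficients per block

def first_trigger_scan(scans):
    # low_bit[c][k] = lowest successive-approximation bit (Al) already
    # delivered for coefficient k of component c; 64 means "none yet".
    low_bit = [[64] * COEFFS for _ in range(COMPONENTS)]
    for i, (comps, ss, se, _ah, al) in enumerate(scans):
        for c in comps:
            for k in range(ss, se + 1):
                low_bit[c][k] = min(low_bit[c][k], al)
        dc_done = all(low_bit[c][0] == 0 for c in range(COMPONENTS))
        ac_done = all(low_bit[c][k] <= 1
                      for c in range(COMPONENTS) for k in range(1, COEFFS))
        if dc_done and ac_done:
            return i  # index of the scan after which LCP would trigger
    return None       # criterion never met before the final scan

# The scan script from above, as (components, Ss, Se, Ah, Al) tuples:
scans = [
    ([0, 1, 2], 0, 0, 0, 1),
    ([0], 1, 5, 0, 2),
    ([2], 1, 63, 0, 1),
    ([1], 1, 63, 0, 1),
    ([0], 6, 63, 0, 2),
    ([0], 1, 63, 2, 1),
    ([0, 1, 2], 0, 0, 1, 0),  # "HERE": criterion 0 is met after this scan
    ([2], 1, 63, 1, 0),
    ([1], 1, 63, 1, 0),
    ([0], 1, 63, 1, 0),
]
print(first_trigger_scan(scans))  # -> 6 (the seventh scan)
```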

Here are the images (partial image fulfilling the criterion on the left, complete image on the right):
https://gist.github.com/mo271/8700cb377eeb587807b7632cd4fe85cb
(In order to compare the images at 100% size, open two in two different tabs and then flicker between them)

  • How applicable is this? About half of the progressive images in the dataset have a scan script where this is triggered before the end of the last scan.
  • How much of the image is loaded on average for those images where it is triggered? About 63% mean (62% median) of the bytes.

On the spectrum of "perceived as final", "barely perceptible difference is a bit too strong requirement", "when it is usable", "it is about content, not aesthetics", the criterion will land at "perceived as final" for almost all images, especially during a short preattentive processing phase.

When making a criterion with stronger requirements (e.g. also requiring all bits for the first 5 AC coefficients for either the luma or the chroma channels), it becomes less applicable: there are fewer files with a scan script that triggers, and the savings when it does trigger are also smaller. With these criteria we would definitely be in the "perceived as final" range of the spectrum.
When the last bit of the DC is also missing, or more than one bit of the AC, artifacts become visible and we are no longer in the "perceived as final" range.

Hence I believe the criterion described above is the sweet spot here.

@yoavweiss
Contributor

Thanks @mo271! That looks extremely promising!!

Would it make sense to also try slightly weaker requirements (e.g. "all DC bits and all AC bits but the last 2"), and see if that gives us even larger savings and at what visual cost?

@jyrkialakuijala

Thank you Yoav and Moritz!

Losing two bits from the AC will create more savings. However, it will no longer maintain the 'perceived as final' look. If we define LCP in a reasonable and clear way, the community will adapt and soon more progressive JPEGs will conform to LCP's requirement; we don't necessarily need to compromise on quality here.

I propose we move forward with Moritz's example scan script and trigger LCP before the last bit of AC coefficients.

@mo271

mo271 commented Mar 8, 2021

> Thanks @mo271! That looks extremely promising!!
>
> Would it make sense to also try slightly weaker requirements (e.g. "all DC bits and all AC bits but the last 2"), and see if that gives us even larger savings and at what visual cost?

That would definitely make sense, @yoavweiss! The particular requirement you mention, "all DC bits and all AC bits but the last 2", would probably work (or even better, if dropping the last 2 bits is only applied to AC coefficients 6 and up), but there just aren't many scan scripts where it would trigger differently from the criterion I described above, i.e. "all DC bits and all AC bits but the last 1"; let's call that criterion 0. Hence for this particular requirement we would get very similar results in practice, and I ran the same image dataset with that requirement to confirm it: for the 100 images, it triggers on the exact same images, and when it triggers it is almost always at the same step in the scan script. On average, 61% (mean and median) of the bytes of the image are loaded in this case.
The same goes for the requirement "all bits but the last one, for both AC and DC".

We can take your suggestion further and consider the following criterion 1:

  1. all the bits but the last bit of the DC
  2. all bits but the last 2 bits for the AC

This triggers about as often, but when it triggers we have more savings: it already triggers when on average 45% (mean) / 44% (median) of the bytes have arrived.
Inspecting the visual quality of those images reveals visible artifacts when viewing at 100% and flickering. As @jyrkialakuijala says, it is indeed no longer "perceived as final", at least not for all images. Some happen to have scan scripts or content where it still works.
Here are sample images for criterion 1:
https://gist.github.com/mo271/0f891b8bcf184cf9d704111ee2c6669a
(Just to avoid confusion: I don't propose to use criterion 1, this was just made in response to @yoavweiss's comment. I'm still in favor of triggering according to criterion 0, demo images here:
https://gist.github.com/mo271/8700cb377eeb587807b7632cd4fe85cb )

@npm1
Collaborator

npm1 commented Mar 8, 2021

Thanks for the investigation! Do these checks make sense for other progressive image formats besides JPEG? If not, how would we translate them into other formats?

@jonsneyers
Author

> Thanks for the investigation! Do these checks make sense for other progressive image formats besides JPEG? If not, how would we translate them into other formats?

For JPEG XL, it would at least make sense for recompressed JPEGs, which could be given an equivalent scan script in JPEG XL. For the general case of VarDCT, I think the straightforward generalization of "criterion 0" would probably just work: all the bits of the LF (the 1:8 'DC' image), all but the least significant bit of the HF ('AC'). It would make sense to do a visual comparison to verify this, of course.

For JPEG 2000, I think we need some input from Safari / Core Image devs. I don't know if JPEG 2000 images are even rendered progressively at all at the moment. The format is not really very widely used on the web. Most likely a similar criterion can be defined but for DWT coefficients instead of DCT coefficients. I don't think it's worth investing a lot of time in this for now though.

As for other progressive formats: there's Adam7-interlaced PNG, which is not really something I would recommend for the web, but I suppose we could take a look at images with the final interlacing scan missing. I don't think it would be good enough though (every odd row is missing, which will be quite noticeable). The same for interlaced GIF still-images (but that's something I would recommend even less).

I'm not aware of any other progressive formats in use on the web. WebP and AVIF do not have progressive modes.

@npm1
Collaborator

npm1 commented Mar 16, 2021

What about interlaced PNGs? Edit: oh you did mention interlaced PNG is not something you would recommend for the web. Why is that? I thought PNGs are fairly common too

@jonsneyers
Author

Interlaced PNG tends to be ~20% larger than non-interlaced PNG, which is a big price to pay for a not-so-great form of progressive decoding (you only get a nearest-neighbor subsampling, not a real downscale or a lower-precision preview). That penalty can be even bigger for the images where PNG actually makes sense on the web, i.e. non-photographic images.
Compare that with JPEG, where progressive JPEG tends to be slightly smaller than non-progressive JPEG...

Also, many PNG images are relatively small images, like logos or UI elements, where progressive decode does not really make sense (those images are also unlikely to be the LCP).

The final scan of an Adam7-interlaced PNG is exactly half of the image data (all the odd rows), which is probably too much missing information to consider it "perceived as final" for the kind of images where PNG makes sense (images containing text, diagrams, plots, comics, logos, icons).
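To make the arithmetic explicit, here is a small sketch using the standard Adam7 pass geometry from the PNG spec (the fraction computation itself is just illustrative):

```python
# Adam7 passes as (x_start, y_start, x_step, y_step); each pass covers
# 1/(x_step * y_step) of the pixels on a large enough image.
ADAM7 = [
    (0, 0, 8, 8), (4, 0, 8, 8), (0, 4, 4, 8), (2, 0, 4, 4),
    (0, 2, 2, 4), (1, 0, 2, 2), (0, 1, 1, 2),
]

for i, (_x0, _y0, dx, dy) in enumerate(ADAM7, start=1):
    print(f"pass {i}: {1.0 / (dx * dy):.4f} of the pixels")
# The 7th (final) pass is (0, 1, 1, 2): every pixel of every odd row,
# i.e. exactly half of the image, as stated above.
```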

@jyrkialakuijala

While Core Web Vitals and LCP are becoming impactful and helping websites become faster, this issue still remains and is likely creating regressions in user experience.

@jonsneyers
Author

I agree. This issue remains open, and alas it is now causing LCP to harm user experience by incentivizing people to 1) not care about progressive versus non-progressive encoding, and 2) use non-progressive formats that are denser than progressive JPEG but actually (appear to) load slower.

Say the same image can be encoded as a 300 KB progressive JPEG, a 250 KB WebP or a 200 KB AVIF.
Assuming decode speed is not an issue, you could have the following LCP times: 1000 ms for the JPEG, 850 ms for the WebP, 700 ms for the AVIF. So obviously if you want to optimize for LCP, you need to switch to AVIF, or at least to WebP.

However, it could also be the case in this example that in terms of user experience, JPEG is vastly superior. Looking at the three alternatives in parallel, it could be e.g. as follows:

  • After 200 ms, you already see a blurry preview of the JPEG (the upsampled DC), while WebP shows a thin strip at the top and AVIF shows nothing.
  • After 600 ms, the JPEG might be hard to distinguish from the final image (say one of the last progressive scans was just shown), while the WebP shows the top half of the image, and AVIF shows nothing.

People are now trying to trick the LCP by using high-resolution but blurry placeholders (see artsy/force#8209) which effectively means they're sending something similar to the DC information twice (once redundantly to get a good LCP score, once again in the actual image). Basically they're manually and cumbersomely implementing a poor man's version of progressive JPEG.

If people are doing that already anyway, then we can just as well define LCP in such a way that they don't need to do the high-res blurry placeholder trick but they can get a good LCP score by using a progressive JPEG (or other progressive format).

To keep things simple: I would propose to define LCP to happen when

  • a paint (not necessarily the final paint) of the (LCP) image happens, such that either
    • the paint covers the whole area and all color channels of the image in 'enough' detail, i.e. close enough to the final image to be 'usable', say a PSNR above 25 dB between paint and final (to give it a format-agnostic definition). For progressive JPEG this could correspond to ~60% of the bytes loaded, and it can be checked by looking at the progressive scan script and knowing which scan was just painted (no need to actually keep preview images around and compute PSNR, which would be a bit of a problem); or
    • the paint covers at least 90% of the area and has final detail (e.g. sequential JPEG, PNG or WebP); a sketch of this decision logic follows below.
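A minimal sketch of that decision logic, assuming the PSNR estimate comes from elsewhere (e.g. derived from the scan script, as described above). All names and numbers other than the 25 dB and 90% thresholds are illustrative:

```python
def paint_triggers_lcp(area_fraction, has_final_detail, psnr_vs_final_db=None):
    """Would this paint of the LCP image count as the LCP?"""
    # Case 1: whole area and all channels painted, 'usable' quality
    # (approximated by PSNR vs. the final image above 25 dB).
    if area_fraction >= 1.0 and psnr_vs_final_db is not None \
            and psnr_vs_final_db > 25:
        return True
    # Case 2: at least 90% of the area painted at final detail
    # (e.g. a sequential JPEG, PNG or WebP most of the way through).
    if area_fraction >= 0.9 and has_final_detail:
        return True
    return False

# Progressive JPEG after a late AC scan: full area, ~28 dB PSNR.
print(paint_triggers_lcp(1.0, False, 28.0))  # -> True
# Sequential JPEG with the top 92% of rows decoded at final quality.
print(paint_triggers_lcp(0.92, True))        # -> True
```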

To avoid cheating, I think the LCP definition should also be changed to not allow an image (or any other element) that gets removed/turned invisible afterwards (i.e. a temporary placeholder) to be the LCP. Otherwise you can have just a solid-single-color huge image as a placeholder, which will compress to almost nothing in any codec, and get a splendid LCP time, but it's not really a great user experience.

@DamonHD

DamonHD commented Sep 17, 2021

> you can have just a solid-single-color huge image as a placeholder

A style=background:#rgb attribute on an img tag can do that already, so ~22 bytes before compression. I do it. I don't think that is taken as LCP with the current Google tools for example.

But I've said this already - I'll get back in my box!

Rgds

Damon

@yoavweiss
Contributor

Hey @jyrkialakuijala and @jonsneyers!
Apologies for the silence on this thread, but please don't mistake it for inactivity on that front :)

I think that at least for progressive JPEGs, we can figure out a reasonable heuristic for "good enough" image quality (thanks to the work @mo271 did on this thread, which I followed up on in https://github.com/WICG/largest-contentful-paint/tree/progressive_jpeg_benchmark/progressive_jpeg_benchmark).

When trying to prioritize that work, a blocker question that came up was: "do we have evidence that progressive image loading creates a better experience?"
Unfortunately, AFAICT, the web performance community has many opinions on this question, but very little data. I'm currently working with partners to try and get such data, which will hopefully prove (or disprove) the benefits of progressive loading.
Once we have that, we'd be able to prioritize this work.

@jonsneyers
Author

> > you can have just a solid-single-color huge image as a placeholder
>
> A style=background:#rgb attribute on an img tag can do that already, so ~22 bytes before compression. I do it. I don't think that is taken as LCP with the current Google tools for example.

Yes, that would be the "nice" way to do it. You could also do it the not-so-nice "trying to fool LCP" way: put a 1001x1001 single-color PNG as a placeholder for your 1000x1000 LCP image. The PNG file can be very small and included as a data URI, and then the real LCP image can take 30 seconds to load, but it's the PNG that will be counted as the LCP, so the metrics will look fine.

> When trying to prioritize that work, a blocker question that came up was: "do we have evidence that progressive image loading creates a better experience?"
> Unfortunately, AFAICT, the web performance community has many opinions on this question, but very little data. I'm currently working with partners to try and get such data, which will hopefully prove (or disprove) the benefits of progressive loading.
> Once we have that, we'd be able to prioritize this work.

What's the methodology to get data on this?

One possible pitfall I can see is that there are multiple ways to do progressive loading, and some might be better for user experience than others. A good way imo to think about progressive loading is in terms of different stages of placeholders/previews:

  • Placeholder: not yet a preview, but just something with the right dimensions to indicate "image will come here" - either no image-dependent info at all, or just a single color or at most a gradient of predominant colors from the image
  • LQIP: low quality image preview: this can be something like a blurhash or some other very rough preview, maybe enough to get a general idea of what might be the overall subject of the image (a person, a car, a landscape) but not more than that
  • MQIP: medium quality image preview: something like the DC of a JPEG (a good 1:8 version of the image), enough to get a good idea of what is in the image (a recognizable face, a car on a road with trees in the background and a traffic light on the left, a panoramic landscape with a lake, forests, mountains and clouds), though it is clearly still blurry and not the final image
  • HQIP: high quality image preview: something like a late progressive AC scan of a JPEG (something with basically all the 1:2 detail and most of the 1:1 detail), hard to distinguish from the final image, and can effectively be considered functionally equivalent to the final image (overlay text is legible, faces can be recognized even of people in the background, etc)
  • Final image: everything is loaded

Some of the intermediate stages likely behave differently from others in terms of benefits vs. disadvantages for user experience, cognitive load, etc., and the timing of these stages also likely makes a difference in how the loading is perceived and evaluated.

I think the Placeholder and HQIP stages do not introduce significant cognitive load and are just beneficial to user experience. The LQIP and MQIP stages however may indeed require more investigation: it could be that the transition from blurry to (close to) final image is causing cognitive load that makes it less beneficial for (some) user(s') experience. If that's the case, then it might be justified to have a client/browser a11y option to opt in or opt out of getting such blurry intermediate LQIP/MQIP stages.

@benkingcode

Is anything happening with this? I'm trying to optimise LCP now, and the obvious solution (using low-quality placeholders) paradoxically makes LCP worse.

@florentb

To complement @dbbk's post, here's an example of an integration with an LQIP approach:
https://codepen.io/twicpics/pen/jwGxZd
Full page here: https://cdpn.io/pen/debug/jwGxZd

The placeholder here is a preview in SVG format (1KB). It's displayed instantly even on slow connections and the aspect-ratio could easily be preserved.

Unfortunately, this behavior will currently negatively impact LCP while user experience is improved.

@yoavweiss
Contributor

yoavweiss commented Mar 29, 2022

@dbbk and @florentb - this issue talks about progressively rendered images, which are different from LQIP.
While some of the user-visible impacts of LQIP may be similar to those of progressively loaded images, they are different in kind, in the sense that LQIP actually makes the final version of the image load slower. So in that sense, I think LCP's current definition is correct.

@benkingcode

In my case, I was planning on first loading a low-quality version of the image (1x DPR, heavy compression) and then later swapping it with a full-res image. The difference is perceptibly minimal. I would expect this to pass LCP properly, as the user effectively sees the image almost straight away, and there is only a little bump in sharpness/resolution later on.

@jonsneyers
Author

Any updates on this?

I have one idea that might help to define/implement a progressive-aware LCP. It might be somewhat costly, but maybe it is still feasible, and it could also be applied to content types other than images: at every repaint (of the currently largest visible object), keep track of the PSNR between this paint and the previous paint. It should be cheap to compute (it doesn't even need to be computed exactly; you can e.g. sample random rows of pixels). Then define the LCP to happen at the last time the change exceeded some threshold (say, 30 dB). Importantly, if only a partial bounding box gets updated by the paint (as could happen with sequentially loading images or tile-based codecs), the PSNR should be computed only on the updated bounding box, not on the full object.

For progressively loading images, this means that the first passes (which still cause significant changes) would count towards the LCP, but the final pass(es) (which only cause small changes) wouldn't count.

For text with font swap, this means the LCP can be at the first render if it is 'close enough' to the final render (the fallback font with annotations etc. is similar enough to the downloaded font to make most letters overlap enough, giving a high enough PSNR between first and final render), but will be at the final render otherwise (e.g. if the words end up in significantly different positions).

This approach would also not care whether the repaints are caused by a single-file progressive loading or by some LQIP approach involving a placeholder that gets replaced later.
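A rough sketch of what that could look like, assuming each repaint of the largest element yields an 8-bit RGB pixel array for the updated region. The row sampling and the 30 dB threshold are from the comment above; numpy, the function names, and the numbers in the demo are illustrative:

```python
import numpy as np

def sampled_psnr(prev, curr, n_rows=16, seed=0):
    """Approximate the PSNR between two paints of the same region by
    comparing only a random subset of pixel rows."""
    rng = np.random.default_rng(seed)
    rows = rng.choice(prev.shape[0], size=min(n_rows, prev.shape[0]),
                      replace=False)
    diff = prev[rows].astype(np.float64) - curr[rows].astype(np.float64)
    mse = np.mean(diff ** 2)
    return float("inf") if mse == 0 else 10 * np.log10(255.0 ** 2 / mse)

def lcp_from_paints(paints, threshold_db=30.0):
    """paints: list of (timestamp_ms, pixels) for the largest element.
    LCP = the last paint whose change from the previous paint was still
    significant, i.e. whose PSNR vs. the previous paint is below the
    threshold."""
    lcp = paints[0][0]
    for (_, prev), (t, curr) in zip(paints, paints[1:]):
        if sampled_psnr(prev, curr) < threshold_db:
            lcp = t  # a big visual change: this paint still counts
    return lcp

# Demo: a big change at 300 ms, then only a tiny refinement at 700 ms.
paints = [
    (100, np.zeros((64, 64, 3), np.uint8)),
    (300, np.full((64, 64, 3), 60, np.uint8)),  # ~12.6 dB vs. previous
    (700, np.full((64, 64, 3), 62, np.uint8)),  # ~42.1 dB vs. previous
]
print(lcp_from_paints(paints))  # -> 300
```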

@yoavweiss
Contributor

Unfortunately, I don't believe PSNR snapshots for each paint would be acceptable, from a performance perspective.

@mlstubblefield

> In my case, I was planning on first loading a low-quality version of the image (1x DPR, heavy compression) and then later swapping it with a full-res image. The difference is perceptibly minimal. I would expect this to pass LCP properly, as the user effectively sees the image almost straight away, and there is only a little bump in sharpness/resolution later on.

I applied this approach hoping to improve our LCP, but yeah, I think since it's counting the final image, it doesn't work.
+1 for tweaking the LCP rules!

@yoavweiss
Contributor

Progress here is dependent on getting hard data RE the benefits (or lack thereof) of progressive image rendering.
