Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove default styles #71

Merged
merged 2 commits into from May 12, 2022
Merged

Conversation

joswhite
Copy link

getComputedStyle returns about 340 CSS styles for an element, almost all of which are default styles. According to my tests, omitting these default styles from the raster image captures makes screenshots 60% faster and shortens the image data URIs by 11x in length. (By "raster" I mean the types of captures not requiring CSS reduction, i.e. all methods besides toSvg, which shortens CSS styles even further but takes much longer to do so).

This is similar to PR #37 (which added the CSS reduction for toSvg), but the code caches the default styles for each type of HTML element, so the calculations are significantly faster and actually speed up screen captures.

This would close #70. It also helps close dom-to-image #245.

Performance comparisons according to local tests:

Test case: Capture a webpage with 5675 HTML elements using toBlob, which generates a raster image (does not use CSS reduction).

Baseline: The task took 7.9 seconds (average between Firefox, Chrome, and Edge), which involved copying 1929500 styles (340 per element) from the webpage to the clone being captured. The clone's data URI length was 32561735 characters (31.1 MB).

With this PR: The task takes 3.1 seconds, which involves copying 39051 styles (6.9 per element) from the webpage to the clone. The clone's data URI length is 2843600 characters (2.7 MB). This is 60% faster, has 49x less styles to copy, and the data URI length is 11x shorter in length.

Test case: Same as above, but capture using toSvg, which generates a non-raster image (uses CSS reduction). This task uses the CSS reduction code from PR #37, which has not been altered significantly by the PR. The baseline, as well as the performance with this PR is the same: The task takes 38.5 seconds, which involves copying 24374 styles (3.5 per element) from the webpage to the clone being captured. The clone's data URI length is 2174916 characters (2.1 MB), which is 15x shorter than the toBlob baseline, at the expense of 5x longer compute time.

Implementation details

For the raster image captures (all methods besides toSvg), the code uses a new copyUserComputedStyleFast function to copy styles from an HTML node to the clone. copyUserComputedStyleFast first obtains the default styles for HTML elements with the same tag name as the node being copied. (It does this by querying the computed styles for an HTML element within a hidden <iframe>. The default styles are identical to the computed styles, except an adjustment we make for width and height - see note [1] below.) It compares the node with the default element styles, then only copies the styles that differ to the clone.

For the non-raster image captures (toSvg), the code uses a new copyUserComputedStyle function in place of the previous getUserComputedStyle. The previous function obtained the non-default styles for an HTML element, then assigned these styles to a dummy DOM element so it could return them. The new function is equivalent, except it no longer needs to assign styles to a dummy DOM element since it only copies (not returns) styles. This might be a tad bit faster, although performance was identical in my tests.

One more minor change: Before this PR, toCanvas used CSS reduction, which I assumed was accidental, so I changed it to not require CSS reduction.

How does the Node environment differ from the browser environment? I'm asking because the code uses iframe.contentWindow.document.body and iframe.contentWindow.getComputedStyle while getting the default styles, which works great on Firefox, Edge, and Chrome (see MDN). I'm not sure whether a different syntax is needed for Node. Feel free to change any implementation details of this PR. Thanks for all your work on this amazing library!

[1]: For block elements, width, height, block-size, inset-block, transform-origin, and perspective-origin are "special" in that the initial and computed values don't match. Let me explain. For example, for width and height, the default style (initial value) is always auto. However, the computed value (as returned by getComputedStyle) is an absolute value measured in px. If the code returned this as the default style, then it wouldn't clone width for certain nodes (block elements with a set width that matched the measurement), which would be a bug. The same goes for height and the other 4 CSS properties listed above. To remedy this, the code returns auto as the default style for width and height, since that is always the initial value. The other 4 CSS properties are used infrequently, and the likelihood that they would be used and happen to match the measurement is very slim, so the code doesn't remedy the issue for them. If you do want to make the code more robust to accommodate these (very rare) situations, let me know and I'd be happy to change the PR to return the initial value for these other 4 properties as well. This could be done programmatically, for example, by simply creating two HTML elements in the hidden <iframe> instead of one, of different sizes, and only returning the default styles that matched between the two elements.

getComputedStyle returns about 340 CSS styles, almost all of which are default styles. Omitting these default styles from the raster image computations makes screenshots 60% faster and shortens the image data URIs by 11x in length.
`raster=true` is now applied to all methods besides `toSvg`, which was the original intention.
sandbox.style.position = 'fixed';
document.body.appendChild(sandbox);
// Ensure that the iframe is rendered in standard mode
sandbox.contentWindow.document.write('<!DOCTYPE html><meta charset="UTF-8"><title>sandbox</title><body>');
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if it's worthwhile reaching up to ensure the meta charset assumption is valid

Copy link
Author

@joswhite joswhite May 12, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not? Let's change this to

sandbox.contentWindow.document.write('<!DOCTYPE html><meta charset="' +
(document.characterSet || 'UTF-8') + '"><title>sandbox</title><body>');

According to tests with the latest Chrome, Edge, and Firefox, document.characterSet is always defined, so we could use document.characterSet in place of (document.characterSet || 'UTF-8'), but I haven't been able to test earlier versions of these browsers so keeping the default as UTF-8 seems to be a good idea.

Do we need a PR for this or would you like to change it?

var defaultElement = document.createElement(tagName);
sandbox.contentWindow.document.body.appendChild(defaultElement);
// Ensure that there is some content, so that properties like margin are applied.
defaultElement.textContent = '.';
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe set this to a zero-width non-breaking space instead?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As in defaultElement.textContent = ' ';?

For block elements like div, that results in the defaultElement having a height of 0px but still having a width. Since we override height to auto, it only changes the block-size, persective-origin, and transform-origin being returned on the default style, to a different number. For inline elements like span, that results in defaultElement having a width of 0px but still having a height. In this case it only changes persective-origin and transform-origin being returned on the default style.

Not sure whether this is desirable or not. Seems to be equivalent. The element being cloned would need to have the CSS property set, and the value would have to match the defaultElement computed value, in order for it to cause problems (i.e. omitting the style from the cloned HTML). Very slim chance this would happen.

I'm inclined to stick with some text content (rather than spaces), due to the StackOverflow answer from which I derived this section of code: https://stackoverflow.com/questions/42025329/how-to-get-the-applied-style-from-an-element/42068963#42068963. In his code snippet the author includes

// ensure that there is some content, so that e.g. margin is applied
elVanilla.textContent = 'foo';

I'm fine with either though. Feel free to change it to a single space if you'd like.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A small PITA from using .... it's the start of a CSS class in a <style> tag, or a invalid dot operator in a embedded <script> tag. I plan to change it to ; as a minimum in #102.

@IDisposable
Copy link
Member

Going to merge this as-is, but would appreciate further discussion on the two comments noted.

@IDisposable IDisposable merged commit 6a72047 into 1904labs:master May 12, 2022
@IDisposable
Copy link
Member

Released in 2.10.1

@joswhite
Copy link
Author

Thanks for merging this!

@zm-cttae
Copy link

I've identified a regression in the original implementation of copyUserComputedStyle - #90.

@joswhite joswhite deleted the remove-default-styles branch January 30, 2023 12:30
zm-cttae added a commit to zm-cttae/dom-to-image-more that referenced this pull request Jan 30, 2023
The other methods were self-cleaning since 1904labs#71
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Performance bottleneck Cannot take screenshot of more complex webpages
3 participants