Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Long attribute value truncation (PR #10984) fires off requests to the truncated URLs #11244

Closed
joshmcarthur opened this issue Aug 10, 2020 · 4 comments

Comments

@joshmcarthur
Copy link

Provide the steps to reproduce

  1. Create an HTML file with two images. The first image should have a src attribute over 75 chraracters (just so it fails the largest contentful paint audit)
  2. Run a static server that logs HTTP requests. I used python3 -m http.server to run a HTTP server on localhost, port 8000.
    3, Audit the page: lighthouse --only-categories=performance http://localhost:8000

There is a repo containing an image, an HTML file and a convenience bash script to audit at https://github.com/joshmcarthur/lighthouse-requesting-ellipsied-urls-repro.

HTTP server output:

127.0.0.1 - - [10/Aug/2020 18:49:17] "GET / HTTP/1.1" 200 -
127.0.0.1 - - [10/Aug/2020 18:49:17] "GET /image.jpg?query-param-to-get-over-75-chararacters-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx HTTP/1.1" 200 -
127.0.0.1 - - [10/Aug/2020 18:49:17] "GET /image.jpg HTTP/1.1" 200 -
127.0.0.1 - - [10/Aug/2020 18:49:17] code 404, message File not found
127.0.0.1 - - [10/Aug/2020 18:49:17] "GET /favicon.ico HTTP/1.1" 404 -
127.0.0.1 - - [10/Aug/2020 18:49:19] "GET /image.jpg?query-param-to-get-over-75-chararacters-xxxxxxxxxxxxxxxxxxxxxxx%E2%80%A6 HTTP/1.1" 200 -
127.0.0.1 - - [10/Aug/2020 18:49:19] code 404, message File not found
127.0.0.1 - - [10/Aug/2020 18:49:19] "GET /asset-manifest.json HTTP/1.1" 404 -

What is the current behavior?

getOuterHTMLSnippet truncates the attribute by cloning the attribute, setting the attribute with the truncated value, and then getting the attribute again. The specific issue is that some elements have a side effect of calling setAttribute - they request the attribute value. Tags that I know of with this behaviour are img and script, but I'm sure there are more. Because of this side effect, setAttribute being called on the cloned element results in a request to the truncated URLs.

This affects me specifically because I have a test run (RSpec/Capybara/Chromedriver) that runs Lighthouse as part of the CI process. Since updating to 6.2.0, the test suite fails, because any server-side failure is counted as a test suite failure - which is reasonable. When lighthouse runs, it tries to request a truncated URL for a webpack asset with a digest that pushes it over 75 chars, resulting in a server-side error being emitted due to the asset file being expected to be present (and it is - but not with a truncated filename).

What is the expected behavior?

In versions of Lighthouse prior to 6.2.0, image src attributes with a length greater than 75 chars would not be truncated, so the node would not be cloned, so the truncated URL would not be requested - thus, no malformed URLs were requested and our test suite was happy.

The frontend-facing truncation behaviour is correct, and appreciated. I'm unsure of an appropriate fix for this in the page functions, as I imagine the clone/setAttribute/getAttribute is being done for a good reason, and I'm not aware of a way to call setAttribute without triggering the effects of setting that attribute.

Environment Information

  • Affected Channels: Repro'd using CLI, but because this problem is in page-functions I think it will affect all channels.
  • Lighthouse version: 6.2.0
  • Chrome version: 84.0.4147.105 (Official Build) (64-bit) (Lighthouse 6.0.0)
  • Node.js version: Reproduced on latest (14.7.0) and current LTS
  • Operating System:

NAME="Pop!_OS"
VERSION="20.04 LTS"
ID=pop
ID_LIKE="ubuntu debian"
PRETTY_NAME="Pop!_OS 20.04 LTS"
VERSION_ID="20.04"
HOME_URL="https://pop.system76.com"
SUPPORT_URL="https://support.system76.com"
BUG_REPORT_URL="https://github.com/pop-os/pop/issues"
PRIVACY_POLICY_URL="https://system76.com/privacy"
VERSION_CODENAME=focal
UBUNTU_CODENAME=focal
LOGO=distributor-logo-pop-os

Related issues

[ No bug reports found in search for: Recent, "truncated", "ellipsis", "truncating", "truncate" ]

@patrickhulce
Copy link
Collaborator

patrickhulce commented Aug 10, 2020

Thanks very much for the detailed bug report with examples @joshmcarthur! I can reproduce the behavior you describe.

Perhaps it's time to convert outer HTML to operate on the stringified version rather than a live node, even if it is detached from the document.

cc @Beytoven for his thoughts

@asarosenberg
Copy link

I have seen this issue on customer sites recently. Running Google PageSpeed against sites that have long resource URLs results in a large amount of 404s in rapid succession. Since WordPress handles 404s internally (via PHP) and those requests are not cached this causes a load spike. On sites with many long URLs and complex 404s it can even exhaust resources on hosting accounts (30+ 404s per second). Here is a sample access log from a test site: pastebin.com/mzmJqvMp

I am afraid I am not able to provide suggestions for a fix but wanted to highlight the potential negative impact of the bug in production environments.

@patrickhulce
Copy link
Collaborator

Ah so we had someone dig more into this in #11465 and we have a proposed solution now. Even though this one is first, that issue has a better discussion history :) Thanks again for filing @joshmcarthur and for the extra info @asarosenberg!

@csabapalfi
Copy link
Contributor

I submitted #11503 that fixes the issue.

127.0.0.1 - - [01/Oct/2020 11:30:13] "GET / HTTP/1.1" 200 -
127.0.0.1 - - [01/Oct/2020 11:30:13] "GET /image.jpg?query-param-to-get-over-75-chararacters-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx HTTP/1.1" 200 -
127.0.0.1 - - [01/Oct/2020 11:30:13] "GET /image.jpg HTTP/1.1" 200 -
127.0.0.1 - - [01/Oct/2020 11:30:13] code 404, message File not found
127.0.0.1 - - [01/Oct/2020 11:30:13] "GET /favicon.ico HTTP/1.1" 404 -
127.0.0.1 - - [01/Oct/2020 11:30:15] code 404, message File not found
127.0.0.1 - - [01/Oct/2020 11:30:15] "GET /asset-manifest.json HTTP/1.1" 404 -

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants