
core(fetcher): fetch over protocol #12199

Merged: adamraine merged 24 commits into master from fetch-protocol on Apr 27, 2021
Conversation

adamraine (Member, author):

Adds fetchResourceOverProtocol, which uses Network.loadNetworkResource instead of injecting an iframe.

The SourceMaps gatherer uses the new function, but keeps the old fetcher as a fallback.

Part of #12070
Closes #12064
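
For context, this is roughly what fetching over the protocol looks like, sketched against a generic session.sendCommand(method, params) DevTools protocol wrapper. The helper names and error handling here are illustrative only; the real implementation lives in lighthouse-core/gather/fetcher.js and driver.js.

// Sketch only: `session` is a hypothetical CDP wrapper, not Lighthouse's actual driver API.
async function fetchOverProtocol(session, url) {
  // Ask the browser to fetch the resource on our behalf; no iframe is injected.
  // (Depending on the Chrome version, Network.loadNetworkResource may also require a frameId.)
  const {resource} = await session.sendCommand('Network.loadNetworkResource', {
    url,
    options: {disableCache: true, includeCredentials: true},
  });
  if (!resource.success || !resource.stream) {
    throw new Error(`Loading network resource failed (status ${resource.httpStatusCode})`);
  }
  // The response body is exposed as an IO stream handle; it is drained with IO.read
  // (see the readIOStream sketch later in this thread) and released with IO.close.
  const data = await readIOStream(session, resource.stream);
  await session.sendCommand('IO.close', {handle: resource.stream});
  return data;
}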

adamraine requested review from paulirish (March 4, 2021).
* @param {{timeout: number}=} options,
* @return {Promise<string>}
*/
async readIOStream(handle, options = {timeout: 5000}) {
adamraine (Member, author):

I was concerned about spending too much time making an unknown number of calls to IO.read, so I added this timeout.
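
A rough sketch of how such a timeout can cap the IO.read loop. This mirrors the idea rather than the landed code; session.sendCommand is the same hypothetical CDP wrapper as above, and the error message matches the one that shows up in the smoke-test output later in this thread.

async function readIOStream(session, handle, options = {timeout: 5000}) {
  const endTime = Date.now() + options.timeout;
  let data = '';
  let eof = false;
  while (!eof) {
    // Quit early instead of issuing yet another IO.read once the deadline has passed.
    if (Date.now() > endTime) {
      throw new Error('Waiting for the end of the IO stream exceeded the allotted time.');
    }
    const ioResponse = await session.sendCommand('IO.read', {handle});
    // Per-chunk base64 decoding is discussed in the thread below and omitted here.
    data += ioResponse.data;
    eof = ioResponse.eof;
  }
  return data;
}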

[Resolved review thread on lighthouse-core/gather/driver.js (outdated)]
}

if (ioResponse.base64Encoded) {
  data = Buffer.from(data, 'base64').toString('utf-8');
Collaborator:

This seems like it would break if there is more than one chunk of base64 data: each chunk gets decoded and saved to the same data variable.

I think if one IO response is encoded, all will be. You could just store the data string as it comes over the protocol, and then at the end of the function check the encoded flag and do a Buffer.from.

EDIT: actually, base64 isn't a streamable format (you can't concat the encoded data and decode it), so you have to decode each chunk as you go. So just do a Buffer.from on each ioResponse before adding it to data.
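
In code, the EDIT's suggestion amounts to decoding each chunk as it arrives instead of once at the end (a sketch; ioResponse is a single IO.read result as in the snippet above):

// Broken for multiple chunks: individually base64-encoded chunks can't be
// concatenated first and decoded in one pass at the end.
// data = Buffer.from(joinedBase64Chunks, 'base64').toString('utf-8');

// Suggested instead: decode every chunk on arrival, then concatenate the decoded text.
data += ioResponse.base64Encoded
  ? Buffer.from(ioResponse.data, 'base64').toString('utf-8')
  : ioResponse.data;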

adamraine (Member, author):

So the data isn't encoded in base64 before splitting into chunks. Each chunk is encoded in base64 individually after splitting.

Am I understanding this correctly?

Collaborator:

I have no idea; what does the Chromium source suggest?

Whatever you find out, it'd be good to document that in the protocol definition file.

adamraine (Member, author):

/**
* @param {string} handle
* @param {{timeout: number}=} options,
* @return {Promise<string>}
Collaborator:

string is fine for now and for any use case I can envision in the future.

If we wanted to fetch image data, this function would need to return a Buffer, and we'd need an option in fetchResourceOverProtocol to return the data as a Buffer or a string (or just return a Buffer and have the caller deal with it).

[Resolved review threads (outdated): lighthouse-core/gather/driver.js (×2), lighthouse-core/gather/gatherers/source-maps.js]
adamraine changed the title from core(driver): fetch over protocol to core(fetcher): fetch over protocol on Mar 5, 2021.
[Resolved review threads (outdated): lighthouse-core/gather/driver.js, lighthouse-core/gather/fetcher.js]
* @param {{timeout: number}=} options timeout is in ms
* @return {Promise<string>}
*/
async _fetchResourceOverProtocol(url, options = {timeout: 500}) {
Member:

this timeout doesn't apply to the actual network request, though, which seems like more of a worry than streaming the response from the backend?

adamraine (Member, author):

We could set the timeout on fetching the resource, then set the remaining time as the timeout for reading the IO stream.
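
A sketch of that timeout structure (illustrative only; readIOStream is the sketch from earlier in the thread and session.sendCommand the same hypothetical CDP wrapper):

async function fetchResourceOverProtocol(session, url, options = {timeout: 500}) {
  const startTime = Date.now();

  // Give the network fetch the full budget...
  const fetchPromise = session.sendCommand('Network.loadNetworkResource', {
    url,
    options: {disableCache: true, includeCredentials: true},
  });
  const timeoutPromise = new Promise((_, reject) =>
    setTimeout(() => reject(new Error('Fetching resource exceeded the allotted time.')), options.timeout));
  const {resource} = await Promise.race([fetchPromise, timeoutPromise]);
  if (!resource.success || !resource.stream) {
    throw new Error(`Loading network resource failed (status ${resource.httpStatusCode})`);
  }

  // ...then hand whatever time is left to reading the IO stream.
  const remainingTime = options.timeout - (Date.now() - startTime);
  return readIOStream(session, resource.stream, {timeout: remainingTime});
}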

adamraine (Member, author):

@brendankenny What do you think of the new timeout structure?

@@ -28,7 +28,7 @@ class SourceMaps extends Gatherer {
    */
   async fetchSourceMap(driver, sourceMapUrl) {
     /** @type {string} */
-    const sourceMapJson = await driver.fetcher.fetchResource(sourceMapUrl, {timeout: 1500});
+    const sourceMapJson = await driver.fetcher.fetchResource(sourceMapUrl, {timeout: 10000});
Member:

what's up with this timeout bump?

adamraine (Member, author):

I was hitting the 1500 timeout pretty consistently on preactjs.com

Member:

> I was hitting the 1500 timeout pretty consistently on preactjs.com

10s might be too long to block on a single gatherer, though :/

Was this before or after you added _fetchResourceOverProtocol, though? Because if it was after and with the current timeout approach, that would mean just readIOStream is often taking longer than 1.5s?

adamraine (Member, author), Mar 17, 2021:

It was after. The iframe method completed within 1.5s.

Never mind, I just tested again and it's timing out with the iframe as well.

Member:

hmm, I think something else is going on with preactjs.

For the iframe method, I can now repro the file download behavior from #12064 using the CLI for the file bundle.dd34e.esm.js.map. That looks like it never resolves the requestInterceptionPromise and so it always times out no matter how long I set the timeout.

For the Network.loadNetworkResource approach it never times out for me. Instrumenting the function and running it several times, I get a mean time of 176ms for the Network.loadNetworkResource call and 14ms for driver.readIOStream(), way below the timeout.

adamraine (Member, author):

Yeah, this appears to be an issue with preactjs.com. I'm successfully fetching source maps on other sites. Leaving this timeout at 1500 should be good.

[Resolved review thread on lighthouse-core/test/gather/driver-test.js]
adamraine (Member, author):

Forcing LH to make multiple calls to IO.read won't help. Long calls to IO.read don't only happen when a single read is required to reach EOF. The method we are using to read the entire stream quits early once the timeout is reached. Temporarily disabling the timeout, I have seen long reads (>1000ms) as the first read, but never as the only one.

It's not all bad though...

The timing issue appears to be fixed in Canary. I have run the new fetcher 50+ times over the past 2 days and never experienced a timeout with Canary. I posted a couple of bisects in the Chromium issue but unfortunately haven't identified the commit responsible for fixing this. @connorjclark suggested this CL but it wasn't in any of my bisects.

connorjclark (Collaborator):

> @connorjclark suggested this CL but it wasn't in any of my bisects.

This was the only recent CL I found to the relevant stream files in Chromium.

brendankenny (Member), Apr 26, 2021:

> The timing issue appears to be fixed in Canary. I have run the new fetcher 50+ times over the past 2 days and never experienced a timeout with Canary.

I did 20 runs each on Chrome 90 and Chrome 92 (of the smoke test source-map-tester.html) and can confirm the same behavior here, too. About half the m90 runs took 4-5s or more; all of the m92 runs were under 30ms.

connorjclark (Collaborator):

> Possible candidate CL:
> chromium.googlesource.com/chromium/src/+/75ff92605fb1581f48a334a7d6a2df4ecc14d1ef

This is certainly the right CL; the regression mimics previous issues we had with startup and tasks: https://chromium-review.googlesource.com/c/chromium/src/+/1882029

brendankenny (Member):

If we double-check that yarn smoke source-maps will fail if the fetcher times out and ends up not getting the source map, that should catch it if this ever regresses in the future. Maybe we just need a comment there to remind us if it suddenly gets flaky one day?

[Resolved review threads (outdated): lighthouse-core/gather/fetcher.js (×2)]
brendankenny (Member) left a review:

LGTM!

Shame we have to wait for Chrome 92 :/

[Resolved review thread on lighthouse-core/gather/fetcher.js]
brendankenny (Member):

> If we double-check that yarn smoke source-maps will fail if the fetcher times out and ends up not getting the source map, that should catch it if this ever regresses in the future. Maybe we just need a comment there to remind us if it suddenly gets flaky one day?

With Chrome stable but using _fetchResourceOverProtocol, yarn smoke source-maps does indeed become flaky, failing often with:

  ✘ difference at SourceMaps artifact[0].map
              expected: {"version":3,"file":"out.js","sourceRoot":"","sources":["foo.js","bar.js"],"names":["src","maps","are","fun"],"mappings":"AAgBC,SAAQ,CAAEA"}
                 found: undefined

          found result:
      [
        {
          "scriptUrl": "http://localhost:10200/source-map/source-map-tester.html",
          "sourceMapUrl": "http://localhost:10200/source-map/script.js.map",
          "errorMessage": "Error: Waiting for the end of the IO stream exceeded the allotted time."
        },
        {
          "scriptUrl": "http://localhost:10200/source-map/source-map-tester.html",
          "sourceMapUrl": "http://localhost:10503/source-map/script.js.map",
          "errorMessage": "Error: Waiting for the end of the IO stream exceeded the allotted time."
        }
      ]

so I think we should be good.

Note to future selves: don't just increase the timeout :P

adamraine (Member, author):

> If we double-check that yarn smoke source-maps will fail if the fetcher times out and ends up not getting the source map

Yeah, a timeout of 1 ms causes the smoke test to fail.

adamraine merged commit 83f8c56 into master on Apr 27, 2021.
adamraine deleted the fetch-protocol branch on April 27, 2021 at 17:47.
Successfully merging this pull request may close these issues:

Source Map is downloaded during performance audit

5 participants