Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

core(main-resource): fix protocol error when page is reloaded #14520

Merged
merged 6 commits into from Sep 6, 2023

Conversation

connorjclark
Copy link
Collaborator

@connorjclark connorjclark commented Nov 10, 2022

Fixes #10876

tldr location.reload during a navigation results in MainDocumentContent artifact trying to fetch the contents of a resource that Chrome backend has already evicted. We only use this artifact in one place charset, FYI.

mainDocumentUrl is defined to be the last document during a navigation, so we should respect that in main-resource by returning the last record that matches this url. Note, the smoke test provided in this PR did not error (w/o these changes) if the page JS-navigates to a different url (ex: ?redirect), because in that case only one record applies and the current code correctly selects it.

Co-authored-by: Wojciech Reszelewski <wreszelewski@gmail.com>

@connorjclark connorjclark requested a review from a team as a code owner November 10, 2022 20:24
@connorjclark connorjclark requested review from adamraine and removed request for a team November 10, 2022 20:24
import redirectsHistoryPushState from './test-definitions/redirects-history-push-state.js';
import redirectsMultipleServer from './test-definitions/redirects-multiple-server.js';
import redirectScripts from './test-definitions/redirects-scripts.js';
import redirectsSelf from './test-definitions/redirects-self.js';
import redirectsSingleClient from './test-definitions/redirects-single-client.js';
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

drive-by sort, btw

@@ -22,7 +22,16 @@ class MainResource {
const {mainDocumentUrl} = data.URL;
if (!mainDocumentUrl) throw new Error('mainDocumentUrl must exist to get the main resource');
const requests = await NetworkRecords.request(data.devtoolsLog, context);
const mainResource = NetworkAnalyzer.findResourceForUrl(requests, mainDocumentUrl);

const mainResourceRequests = requests.filter(request =>
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1- Should we add a new function to NetworkAnalyzer for findLastResourceForUrl?

2- I got rid of the resourceUrl.startsWith(request.url) optimization check because my test case tripped on it, suggesting that it may be broken. I think it might need to become (resourceUrl.startsWith(request.url) || request.url.startsWith(resourceUrl)) to be correct?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we add a new function to NetworkAnalyzer for findLastResourceForUrl?

I think this is a good idea.

I got rid of the resourceUrl.startsWith(request.url) optimization check because my test case tripped on it, suggesting that it may be broken.

What did it trip on?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two urls can be equal ignoring the fragment, without one being a substring of the other. Just think of a url with a fragment and one without, and consider that this filter predicate won't return the same value if you flip the order.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

request.url won't ever have fragments, so it should always be the start of a URL that it's equal to when excluding fragments.

@@ -22,7 +22,16 @@ class MainResource {
const {mainDocumentUrl} = data.URL;
if (!mainDocumentUrl) throw new Error('mainDocumentUrl must exist to get the main resource');
const requests = await NetworkRecords.request(data.devtoolsLog, context);
const mainResource = NetworkAnalyzer.findResourceForUrl(requests, mainDocumentUrl);

const mainResourceRequests = requests.filter(request =>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we add a new function to NetworkAnalyzer for findLastResourceForUrl?

I think this is a good idea.

I got rid of the resourceUrl.startsWith(request.url) optimization check because my test case tripped on it, suggesting that it may be broken.

What did it trip on?

// document request, we should return the last candidate here. Besides, the browser
// would have evicted the first request by the time `MainDocumentRequest` (a consumer
// of this computed artifact) attempts to fetch the contents, resulting in a protocol error.
const mainResource = mainResourceRequests[mainResourceRequests.length - 1];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are a couple other places where we get the main document request this way:

let mainRecord = NetworkAnalyzer.findResourceForUrl(networkRecords, url);

const rootRequest = NetworkAnalyzer.findResourceForUrl(networkRecords, requestedUrl);
if (!rootRequest) throw new Error('rootRequest not found');
const rootNode = networkNodeOutput.idToNodeMap.get(rootRequest.requestId);
if (!rootNode) throw new Error('rootNode not found');
const mainDocumentRequest = NetworkAnalyzer.findResourceForUrl(networkRecords, mainDocumentUrl);

Should we be getting the last request in these places as well?

@brendankenny
Copy link
Member

brendankenny commented Nov 10, 2022

The update:sample-json errors are correct: you can't assume the last request for the main document URL is the actual main document request (thanks dbw for being such a gross site that you'd XHR your own html :)

Is this case generally recoverable? There's an unexpected navigation in the middle of the navigation.

I guess you can think of it as a redirect, but there's going to be a lot of strange stuff going on vs a typical redirect, resources already in the cache, etc

@connorjclark
Copy link
Collaborator Author

connorjclark commented Nov 10, 2022

(authored before reading previous comments..)

Hehe, this is why sample json's critical request chain broke:

image

For dbw_tester.html, that xhr request is now the main resource! Just need to check the type is Document :)

@connorjclark
Copy link
Collaborator Author

I guess you can think of it as a redirect, but there's going to be a lot of strange stuff going on vs a typical redirect, resources already in the cache, etc

This seems better than the alternatives:

  1. catch this error and make the charset audit n/a
  2. re-define what MainDocumentContent/mainDocumentUrl means, if we don't really mean "the last document request"

We could hedge things a bit by emitting a warning when we detect a reload, treating it like the redirect warning.

@brendankenny
Copy link
Member

iframes of itself will also be caught by this but have type Document.

The real alternative is issuing a navigationError in this case. It doesn't seem like any metrics should be trusted for this kind of load.

@adamraine
Copy link
Member

iframes of itself will also be caught by this but have type Document.

Would the initiator type help us in this case? Looks like the main document has type other when iframes have type parser

@wreszelewski
Copy link

Can I help somehow to push this PR further?

core/computed/main-resource.js Show resolved Hide resolved
@connorjclark connorjclark merged commit 09aba86 into main Sep 6, 2023
29 checks passed
@connorjclark connorjclark deleted the main-res-network-res branch September 6, 2023 18:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

web.dev consistently errors on Network.getResponseBody for main content
5 participants