How to stop puppeteer follow redirects #1132
Comments
Same problem here… |
So I have a solution now but I agree with @GuilloOme that followRedirect: false should be a .goto option prop.
then in redirect I have:
|
Thanks a lot @ali-habibzadeh for this workaround! |
Definitely agree. Using let with a side effect, as I have done, is not ideal code. |
@ali-habibzadeh, your solution is good enough if you are sure that there will not be any concurrent requests. In my context, multiple concurrent requests are possible; so, to avoid blocking the wrong request, I store the redirect response and compare its "location" URL with the URL of the intercepted request. Here is my workaround:
let lastRedirectResponse = undefined;
page.setRequestInterceptionEnabled(true);
page.on('response', response => {
    // if this response is a redirect for the main document
    if ([301, 302, 303, 307, 308].includes(response.status)
        && response.request().resourceType === 'document') {
        lastRedirectResponse = response;
    }
});
page.on('request', interceptedRequest => {
    // if this request is the one the last redirect points to, block it
    if (lastRedirectResponse
        && lastRedirectResponse.headers.location === interceptedRequest.url) {
        interceptedRequest.abort();
    } else {
        // with interception enabled, every other request must be continued explicitly
        interceptedRequest.continue();
    }
}); |
@ali-habibzadeh @GuilloOme I'm curious why you would need to load without redirects? |
@GuilloOme Would concurrency of requests be a concern even if chromium is running with
@aslushnikov My rationale for this is mainly based on two things. Firstly, if you're building a technical tool that reports on redirect chains, it suits the context better to make a clean request, get the response and report on it. |
@ali-habibzadeh, my concern is more about pages where multiple resources are requested at the "same" time: if one or more of them returns a redirect, I will block the wrong one… You make me think that I should store all the redirect responses (not only the last one) and check against them all. @aslushnikov, to add some context to @ali-habibzadeh's point: I use puppeteer with chromium to crawl pages. For a given starting url, I need to get all the "outbound" urls without navigating to them. A redirect is a type of "outbound" url since it goes "away" from my starting url. Then, I decide based on multiple factors whether to go through with it or not. In this case, it is very important to have complete control over any navigation (on a side note, that is also why I need a workaround for #823). |
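The idea of storing every redirect response rather than only the last one could look roughly like the sketch below. It is not code from this thread: `RedirectTracker`, `resolveRedirectTarget` and `blockDocumentRedirects` are hypothetical names, the Puppeteer calls use the later method-style API (`response.status()`, `request.url()`), and it assumes request interception has already been enabled on `page`. It also resolves relative Location headers and drops the fragment, on the assumption that intercepted request URLs are reported without one.

```javascript
// Resolve a Location header against the URL of the response that carried it,
// because Location may be relative (for example "/login").
function resolveRedirectTarget(responseUrl, location) {
  const target = new URL(location, responseUrl);
  // Assumption: intercepted request URLs carry no fragment, so drop it here.
  target.hash = '';
  return target.href;
}

// Hypothetical helper that remembers every pending redirect target,
// not only the most recent one.
class RedirectTracker {
  constructor() {
    this.pending = new Set();
  }

  // Remember where a redirect response points to (ignored if no Location).
  record(responseUrl, location) {
    if (location) {
      this.pending.add(resolveRedirectTarget(responseUrl, location));
    }
  }

  // Returns true, and forgets the target, if this URL is a pending redirect target.
  consume(requestUrl) {
    return this.pending.delete(requestUrl);
  }
}

// Wiring sketch. Assumes `page` is a Puppeteer Page on which
// page.setRequestInterception(true) has already been awaited.
function blockDocumentRedirects(page, tracker = new RedirectTracker()) {
  page.on('response', response => {
    const isRedirect = [301, 302, 303, 307, 308].includes(response.status());
    if (isRedirect && response.request().resourceType() === 'document') {
      tracker.record(response.url(), response.headers().location);
    }
  });
  page.on('request', request => {
    if (tracker.consume(request.url())) {
      request.abort();
    } else {
      request.continue();
    }
  });
  return tracker;
}
```

Keeping the targets in a set means two document redirects that arrive close together no longer overwrite each other, which is the failure mode described above.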
We potentially need to bring meta refresh redirects under the umbrella of this. Unfortunately, page scripts cannot stop those redirects. There also isn't a supported way to modify the response text before the page is rendered (e.g. to modify or remove the meta refresh tags). Does anyone have a solution for stopping the meta redirect too? I found this page: https://bugs.chromium.org/p/chromium/issues/detail?id=63107 |
@ali-habibzadeh, I tried, and you can prevent the meta redirect with the extension I wrote as a workaround for the "navigation away" problem (here: #823 (comment)). If you have questions about the extension, you can comment on the related gist. |
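Coupling request interception with the request.isNavigationRequest() and request.redirectChain() checks makes it possible to abort navigation redirects:
await page.setRequestInterception(true);
page.on('request', request => {
    // a navigation request with a non-empty redirect chain is a followed redirect
    if (request.isNavigationRequest() && request.redirectChain().length)
        request.abort();
    else
        request.continue();
});
await page.goto('https://example.com'); |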
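Aborting the redirected request makes the navigation fail, so a caller who wants the first redirect itself (status and Location) has to capture it before the abort and handle the rejected page.goto. The sketch below is one possible way to wrap the snippet above. `gotoWithoutRedirects` and `isRedirectStatus` are hypothetical names, `page` is assumed to be a Puppeteer Page, and the flow has not been verified against every Puppeteer version.

```javascript
// Statuses that carry a Location header and trigger a follow-up request.
const REDIRECT_STATUSES = new Set([301, 302, 303, 307, 308]);

function isRedirectStatus(status) {
  return REDIRECT_STATUSES.has(status);
}

// Hypothetical wrapper: navigates to `url`, aborts the first navigation
// redirect, and reports what that redirect was instead of following it.
async function gotoWithoutRedirects(page, url) {
  let firstRedirect = null;

  const onResponse = response => {
    if (!firstRedirect
        && response.request().isNavigationRequest()
        && isRedirectStatus(response.status())) {
      firstRedirect = {
        url: response.url(),
        status: response.status(),
        location: response.headers().location,
      };
    }
  };
  const onRequest = request => {
    if (request.isNavigationRequest() && request.redirectChain().length) {
      request.abort();
    } else {
      request.continue();
    }
  };

  await page.setRequestInterception(true);
  page.on('response', onResponse);
  page.on('request', onRequest);
  try {
    const response = await page.goto(url);
    // No redirect happened: report the normal response status.
    return { redirected: false, status: response ? response.status() : null, firstRedirect: null };
  } catch (error) {
    // Only swallow the error if it was caused by a redirect we aborted.
    if (!firstRedirect) throw error;
    return { redirected: true, status: firstRedirect.status, firstRedirect };
  } finally {
    page.off('response', onResponse);
    page.off('request', onRequest);
  }
}
```

The result object describes the redirect response only; the page itself is left in whatever state the failed navigation produces.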
@aslushnikov |
@aslushnikov Thank you! Life saver ) |
Commenting here to avoid creating a new issue. I have a problem with this: I'm trying to take screenshots of pages before they redirect. If I use your example I am unable to do this, as it is treated as a navigation error. Is there no real way to prevent the redirects from taking me to another URL? In this case they would be 307 (Temporary Redirect) responses. |
@aslushnikov If all I want is to know whether the URL has a follow-up redirect, is this code enough?
|
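For the narrower question of only detecting whether a URL redirects, one option that needs no interception is to let the navigation finish and inspect the redirect chain of the final request. This is a sketch, not the code the commenter posted: `describeRedirectChain` and `checkForRedirects` are hypothetical names, and `page` is assumed to be a Puppeteer Page.

```javascript
// Turn a redirect chain (an array of request-like objects exposing
// url() and response()) into plain data that is easy to log or compare.
function describeRedirectChain(chain) {
  return chain.map(request => {
    const response = request.response();
    return {
      url: request.url(),
      status: response ? response.status() : null,
    };
  });
}

// Hypothetical helper: follows redirects as usual, then reports them.
async function checkForRedirects(page, url) {
  const response = await page.goto(url);
  // page.goto can resolve to null, so guard before reading the chain.
  const chain = response ? response.request().redirectChain() : [];
  return {
    redirected: chain.length > 0,
    finalUrl: response ? response.url() : null,
    hops: describeRedirectChain(chain),
  };
}
```

Unlike the abort approach, this does load the final page, so it suits reporting on redirects rather than preventing them.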
Any way I can call |
Currently it seems the default behaviour of puppeteer is to follow redirects and return the DOM at the end of the chain.
How can this be changed when you need .goto to stop after the first redirect and simply return the HTML from that first response (a 301 page, for example)?