Get all navigation redirect urls #2163
Comments
Since pptr 1.2.0, you can get a redirect chain for every request. For the main navigation, you're interested in the redirect chain of the main resource:
const response = await page.goto('http://example.com');
const chain = response.request().redirectChain();
console.log(chain.length); // 1
console.log(chain[0].url()); // 'http://example.com'
Hope this helps.
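The same chain can also carry the status code of each hop, which the original question asked about. Below is a minimal sketch, assuming the `response` object returned by `page.goto()`. `summarizeRedirects`, `fakeRequest` and `fakeResponse` are hypothetical names used only for illustration, and the fake objects stand in for Puppeteer's request/response objects so the snippet runs without a browser:

```javascript
// Hypothetical helper: list every hop of the main navigation with its status code.
// Relies only on Puppeteer methods: response.request(), request.redirectChain(),
// request.url(), request.response(), response.url(), response.status().
function summarizeRedirects(response) {
  const hops = response.request().redirectChain().map(request => {
    const redirectResponse = request.response(); // null if no response was received
    return {
      url: request.url(),
      status: redirectResponse ? redirectResponse.status() : null,
    };
  });
  // The final response is not part of the chain, so append it.
  hops.push({ url: response.url(), status: response.status() });
  return hops;
}

// Stand-ins for Puppeteer objects (illustrative data, not real network output).
const fakeRequest = (url, status) => ({
  url: () => url,
  response: () => ({ status: () => status }),
});
const fakeResponse = {
  url: () => 'https://example.com/',
  status: () => 200,
  request: () => ({
    redirectChain: () => [fakeRequest('http://example.com/', 301)],
  }),
};

const hops = summarizeRedirects(fakeResponse);
console.log(hops);
```

With a real page this would be called as `summarizeRedirects(await page.goto(url))`. As far as I know, `redirectChain()` only covers server-side (HTTP) redirects, so JS or meta-refresh hops would not show up in it.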
This seems to be the solution to @leem32's problem (found at https://groups.google.com/forum/#!topic/chrome-debugging-protocol/rPSMWfFD2Jo):
const puppeteer = require('puppeteer');
puppeteer.launch().then(async browser => {
  const page = await browser.newPage();
  // placeholder start URL (not part of the original post); replace with the URL to trace
  let url = 'http://example.com';
  // note: add trailing slash since chrome adds it
  if (!url.endsWith('/'))
    url = url + '/';
  // urls hold redirect chain
  const urls = [url];
  const client = await page.target().createCDPSession();
  await client.send('Network.enable');
  // client.on() registers a listener synchronously, so no await is needed
  client.on('Network.requestWillBeSent', (e) => {
    // only document requests are navigations
    if (e.type !== "Document") {
      return;
    }
    console.log("EVENT INFO: ");
    console.log(e.type);
    console.log(e.documentURL);
    console.log("INITIATOR: " + JSON.stringify(e.initiator, null, 4));
    // check if url redirected
    if (typeof e.redirectResponse !== "undefined") {
      // get redirect info
      console.log("REDIRECT STATUS CODE: ");
      console.log(e.redirectResponse.status);
      console.log("REDIRECT REQUEST URL: ");
      console.log(e.request.url);
      urls.push(e.redirectResponse.status, e.request.url);
    } else {
      // url did not redirect
      if (e.request.url !== urls[urls.length - 1]) {
        console.log("NO REDIRECT REQUEST URL: ");
        console.log(e.request.url);
        urls.push(e.request.url);
      }
    }
  });
  await page.goto(url);
  console.log("Final urls array: ");
  console.log(urls);
  await browser.close();
});
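The listener above mixes status codes and URLs in one flat array. Here is a minimal sketch of the same idea with one record per document request, assuming the `Network.requestWillBeSent` payload fields already used above (`type`, `request.url`, `redirectResponse.status`). `createRedirectRecorder` is a hypothetical name, and the sample events are made up so the snippet runs without a browser:

```javascript
// Hypothetical recorder for CDP Network.requestWillBeSent events.
function createRedirectRecorder() {
  const hops = [];
  return {
    hops,
    handle(event) {
      // Only document loads are navigations; skip images, scripts, XHR, ...
      if (event.type !== 'Document') return;
      hops.push({
        url: event.request.url,
        // Status of the redirect response that led to this request;
        // null for the first request and for requests without a redirectResponse.
        redirectStatus: event.redirectResponse ? event.redirectResponse.status : null,
      });
    },
  };
}

// Made-up events shaped like CDP payloads (not real network output).
const recorder = createRedirectRecorder();
[
  { type: 'Document', request: { url: 'http://example.com/' } },
  { type: 'Image', request: { url: 'http://example.com/logo.png' } },
  { type: 'Document', request: { url: 'https://example.com/' }, redirectResponse: { status: 301 } },
].forEach(event => recorder.handle(event));

console.log(recorder.hops);
```

In the script above it could be wired up with `client.on('Network.requestWillBeSent', event => recorder.handle(event))`. Keeping URL and status in one object avoids having to guess which array entries are status codes.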
Doesn't work for me if the URLs have been shortened with bit.ly or https://www.shorturl.at/shortener.php
This worked for me for capturing redirects specific to the location of the browser, not all page assets.
// Collected navigation requests (this declaration was missing from the original snippet)
const redirects = { chain: [] };
// request.continue() below only works when request interception is enabled
await page.setRequestInterception(true);
// Request interception handler
page.on('request', request => {
  // Capture any request that is a navigation request attempting to load a new document
  // This will capture HTTP Status 301, 302, 303, 307, 308, HTML, and JavaScript redirects
  // Make sure the redirect is in the parent frame or we will see the navigation for other frames
  const frame = request.frame();
  if (request.isNavigationRequest() && frame && frame.parentFrame() === null) {
    redirects.chain.push({ url: request.url() });
  }
  // Continue to next request
  request.continue();
});
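For the status codes asked about in the question, the same filter can be applied to `response` events instead of `request` events. This is a sketch under assumptions: `recordNavigationResponse` and `fakeNavResponse` are hypothetical names, the helper uses only `response.request()`, `response.url()`, `response.status()`, `request.isNavigationRequest()`, `request.frame()` and `frame.parentFrame()`, and the fake objects replace a real page so the snippet runs on its own:

```javascript
// Hypothetical helper: keep only top-frame navigation responses.
function recordNavigationResponse(log, response) {
  const request = response.request();
  const frame = request.frame(); // can be null for requests not tied to a frame
  const isTopFrame = frame !== null && frame.parentFrame() === null;
  if (request.isNavigationRequest() && isTopFrame) {
    log.push({ url: response.url(), status: response.status() });
  }
  return log;
}

// Stand-in for a Puppeteer response (illustrative data only).
const fakeNavResponse = (url, status, { navigation, parentFrame }) => ({
  url: () => url,
  status: () => status,
  request: () => ({
    isNavigationRequest: () => navigation,
    frame: () => ({ parentFrame: () => parentFrame }),
  }),
});

const navigationLog = [];
[
  fakeNavResponse('http://example.com/', 302, { navigation: true, parentFrame: null }),
  fakeNavResponse('http://ads.example/frame', 200, { navigation: true, parentFrame: {} }),
  fakeNavResponse('http://example.com/a.png', 200, { navigation: false, parentFrame: null }),
  fakeNavResponse('https://example.com/', 200, { navigation: true, parentFrame: null }),
].forEach(response => recordNavigationResponse(navigationLog, response));

console.log(navigationLog);
```

With a real page this could be attached as `page.on('response', response => recordNavigationResponse(navigationLog, response))`. Since nothing is intercepted, no `request.continue()` call is needed for this variant.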
I'm trying to use the new request.frame() method to log all navigation/domain redirects, but it only seems to log JS redirects. Along with JS redirects I also need Meta refresh and PHP redirects.
Another issue with request.frame() is that it also logs non-navigation redirects that I do not want, such as doubleclick.net and image server links.
How can I use request.frame() to log all navigation/domain redirects?
If request.frame() will not work to log all navigation/domain redirects (JS, Meta, PHP), how else can I achieve this with Puppeteer?
Note: Ideally I'd also get the status code of each navigation redirect, i.e. 200, 301, 404, but if this isn't possible I can just curl each URL instead.
Thoughts: In the Network tab of Chrome DevTools, if I select Preserve log and then load a URL that redirects a few times, DevTools picks up all the client and server redirects along with the status codes. Could I access this info somehow, from the request headers maybe? I would only need the navigation/domain redirects and status codes. If I could access this info, I would need a way to differentiate between navigation redirects and all other requests.
v1.1.1
Code example:
Thanks :)