Is there a way to catch switching to download mode? #948

Closed
vsemozhetbyt opened this Issue Oct 3, 2017 · 9 comments

3 participants

vsemozhetbyt commented Oct 3, 2017

  • Puppeteer version: v0.12.0-alpha
  • Platform / OS version: Windows 7 x64
  • URLs (if applicable): any with non-openable content type

A straightforward attempt just throws a timeout error (the file is downloaded successfully, though):

const puppeteer = require('puppeteer');

(async function main() {
  try {
    const browser = await puppeteer.launch();
    const page = await browser.newPage();

    const response = await page.goto('https://nodejs.org/dist/v8.6.0/node-v8.6.0.tar.gz');

    console.log(response.status);

    await browser.close();
  } catch (err) {
    console.error(err);
  }
})();

Error: Navigation Timeout Exceeded: 30000ms exceeded
    at NavigatorWatcher.waitForNavigation (puppeteer\lib\NavigatorWatcher.js:73:20)
    at <anonymous>

ks07 commented Oct 12, 2017

Also affected by this - navigating to a URL with an unsupported content-type such as application/octet-stream always causes navigation to hit the timeout, regardless of how quickly the server responds to the request.

@vsemozhetbyt I hadn't noticed that the files were downloaded. Where are they stored?

vsemozhetbyt commented Oct 12, 2017

@ks07 If you do not use a preallocated user data dir with a download dir defined, files go to default locations, such as c:\Users\%username%\Downloads\ on Windows.

ks07 commented Nov 15, 2017

After upgrading to puppeteer 0.13.0 this problem has gotten worse. Previously, using 0.12.0, I found it was possible to listen to the response event on the page object and determine based on the response headers (i.e. content-type) whether the goto failure was due to an actual timeout or because the URL triggered a download. In 0.13.0 the response event is seemingly no longer triggered when visiting a download URL, so it's not possible to detect when this is happening any more.
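
The header check described above might look roughly like the sketch below. It is only an illustration: `looksLikeDownload` and the `DOWNLOAD_TYPES` list are hypothetical names, the list is a hand-picked heuristic rather than Chromium's actual decision logic, and `headers` is assumed to be a plain object with lower-cased header names (the shape of the headers value differs between Puppeteer versions).

```javascript
// Hypothetical heuristic, NOT Chromium's real logic: a few content types
// that a browser typically saves to disk instead of rendering.
const DOWNLOAD_TYPES = new Set([
  'application/octet-stream',
  'application/gzip',
  'application/x-gzip',
  'application/zip',
]);

// `headers` is assumed to be a plain object with lower-cased header names.
function looksLikeDownload(headers) {
  const disposition = (headers['content-disposition'] || '').toLowerCase();
  // An explicit "attachment" disposition asks the browser to save the body.
  if (disposition.startsWith('attachment')) return true;
  // Strip parameters such as "; charset=utf-8" before comparing.
  const mime = (headers['content-type'] || '').split(';')[0].trim().toLowerCase();
  return DOWNLOAD_TYPES.has(mime);
}
```

A `response` listener could call this with the headers of the navigation response and remember the result, so that a later `goto` timeout can be told apart from a real one. That only helps on versions where the `response` event still fires for such URLs.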

ks07 commented Nov 15, 2017

A slight adjustment to the example script posted by @vsemozhetbyt shows how the behaviour has changed:

const puppeteer = require('puppeteer');

(async function main() {
  try {
    const browser = await puppeteer.launch({args: ['--no-sandbox']});
    const page = await browser.newPage();

    page.on('response', (resp) => {
        console.log('Got a response.');
    });
    page.on('requestfailed', (req) => {
        console.log('Request failed.');
    });
    page.on('requestfinished', (req) => {
        console.log('Request finished.');
    });

    const response = await page.goto('https://nodejs.org/dist/v8.6.0/node-v8.6.0.tar.gz');

    console.log(response.status);

    await browser.close();
  } catch (err) {
    console.error(err);
  }
})();

With puppeteer@0.12.0:

$ node index.js
Got a response.
Got a response.
Request failed.
Error: Navigation Timeout Exceeded: 30000ms exceeded
    at NavigatorWatcher.waitForNavigation (/home/george/chromedltest/node_modules_12/puppeteer/lib/NavigatorWatcher.js:76:20)
    at <anonymous>

With puppeteer@0.13.0:

$ node index.js
Error: Navigation Timeout Exceeded: 30000ms exceeded
    at Promise.then (/home/george/chromedltest/node_modules_13/puppeteer/lib/NavigatorWatcher.js:69:21)
    at <anonymous>

This means there's no longer a way to detect when navigation has triggered a download vs a normal timeout. This is unfortunate, seeing as we now have a fix for #901.

ks07 commented Nov 16, 2017

I've tried playing with the setDownloadBehavior API mentioned in #299 (comment), but regardless of whether behavior is set to allow or deny, the navigation times out and the various response events aren't triggered.

As a hacky workaround, you could set behavior to allow and then check whether a new file has appeared in the specified download location.
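
A minimal sketch of the "check for a new file" half of that workaround, using only Node's `fs` module. The helper names `snapshotDir` and `newDownloads` are hypothetical, and the sketch assumes the download directory has already been set by some other means. It skips files ending in `.crdownload`, the suffix Chrome uses for downloads still in progress.

```javascript
const fs = require('fs');

// Record the file names present in the download directory before navigating.
function snapshotDir(dir) {
  return new Set(fs.readdirSync(dir));
}

// List files that appeared since the snapshot, ignoring Chrome's
// in-progress ".crdownload" temporaries. Returns a sorted array of names.
function newDownloads(dir, before) {
  return fs.readdirSync(dir)
    .filter((name) => !before.has(name) && !name.endsWith('.crdownload'))
    .sort();
}
```

Take the snapshot before calling `page.goto`, then call `newDownloads` after the navigation has timed out (or poll it on an interval). A non-empty result suggests the URL triggered a download rather than a genuine timeout.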

aslushnikov commented Jan 12, 2018

@vsemozhetbyt @ks07 can you please share your use cases? Do I understand correctly that this is different from #299?

vsemozhetbyt commented Jan 12, 2018

@aslushnikov I have a script that checks all the links in a doc set. I can use response.ok and response.status() for openable URLs, but I cannot check whether non-openable URLs are valid (not 404 or the like).
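
For a link checker like this, one possible stopgap outside Puppeteer is to probe non-openable URLs with a plain HEAD request from Node. The sketch below is not something proposed in this thread: `headStatus` is a hypothetical helper, it does not follow redirects, and it assumes the server answers HEAD requests sensibly, which not all servers do.

```javascript
const http = require('http');
const https = require('https');

// Hypothetical helper: resolve with the HTTP status code of a HEAD request.
// Redirects are NOT followed; a 3xx status is returned as-is.
function headStatus(url) {
  return new Promise((resolve, reject) => {
    const client = url.startsWith('https:') ? https : http;
    const req = client.request(url, { method: 'HEAD' }, (res) => {
      res.resume(); // drain the (empty) body so the socket is released
      resolve(res.statusCode);
    });
    req.on('error', reject);
    req.end();
  });
}
```

The resolved status can then be compared against 404 and similar codes without the browser ever starting a download.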

aslushnikov commented Jan 12, 2018

@vsemozhetbyt ah makes sense, thank you for sharing.

aslushnikov commented May 31, 2018

This behaves much better now. Chromium aborts navigation to a resource if it causes a download. With pptr 1.4.0, this kind of navigation results in a net::ERR_ABORTED error being thrown; with #299 in place we'll also get a download event.
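
With that behavior, a caller can tell an aborted navigation apart from a timeout by inspecting the error thrown by `page.goto`. The sketch below is a rough illustration: `classifyGotoError` is a hypothetical helper, and it assumes the abort error's message contains `ERR_ABORTED` and the timeout error's message contains the word "Timeout", as in the traces earlier in this thread. An abort can have causes other than a download, so treat the result as a hint.

```javascript
// Hypothetical helper: map an error thrown by page.goto() to a coarse label.
// Assumption: the message texts match the patterns checked below.
function classifyGotoError(err) {
  const message = String((err && err.message) || err);
  if (message.includes('ERR_ABORTED')) return 'aborted'; // possibly a download
  if (/timeout/i.test(message)) return 'timeout';
  return 'other';
}
```

Call it from the `catch` block around `page.goto`, for example `if (classifyGotoError(err) === 'aborted') { /* likely a download URL */ }`.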
