Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Some Domain APIs and Events never resolve/reject when connection to remote chrome instance fails. #304

Closed
sravani93 opened this issue Nov 14, 2017 · 8 comments

Comments

@sravani93
Copy link

Component Version
Operating system Debian GNU/Linux buster/sid
Node.js v8.4.0
Chrome/Chromium/... 61.0.3163.91
chrome-remote-interface 0.23.3

Is Chrome running in a container? NO
Chrome instance(Headless) is running on port 9222.

So the issue is that there are a few Domain APIs that never resolve or reject when connection to the remote chrome instance fails .

These are few Domain APIs that do reject with an error -

  • CSS.createStyleSheet()
  • Page.captureScreenshot() etc
    The above reject with an error . ("connect ECONNREFUSED 127.0.0.1:9222" , "read ECONNRESET" etc)

Where as The below just never resolve/reject and remain indefinitely in unresolved state.

  • Runtime.evaluate();
  • CSS.enable(); (and other ".enable()"s)
  • Emulation.setDeviceMetricsOverride()

Steps to reproduce the problem :

  • Create a chromeInstance using https://github.com/GoogleChrome/chrome-launcher
  • Open a tab and navigate to any URL.
  • use the kill() method of the chrome-launcher to kill the chrome instance. Note : This should be an asynchronous call . Dont await on the kill() method. We are trying to simulate the case when the remote chrome instance abruptly closes .
  • await on CSS.enable() .
const target = await CDP.New();
const client = await CDP({ target: target.id });
const { CSS  } = client;
chromeInstance.kill() //dont await on this.
await CSS.enable();
  • The above promise will remain in unresolved state indefinitely .

Note : The kill() should be asynchronous. If chrome is killed synchronously , CSS.enable() rejects with a "not opened" error message.

The real scenario :

  • We are opening multiple tabs in chrome and using various domains of the respective clients to perform certain activities.
  • We started noticing that when the load is heavy , a lot of the above mentioned promises never resolve/reject , causing our application to pause.
  • On checking the kernel logs with "dmesg -w" , we found a ton of these messages: "request_sock_TCP: Possible SYN flooding on port 9222. Sending cookies. Check SNMP counters."
  • We thus concluded that some of the APIs we are awaiting on are not resolving because the connection to the remote chrome instance is not getting established.
  • Note : This can be simulated in linux using "sudo iptables -A OUTPUT -p tcp --dport 9222 -j DROP" to drop connections to port 9222.
  • Currently , We have kept timeouts for these promises so that our application doesn't stop.
  • However , this looks like a shabby way of dealing with it.

Failure of connection establishment can happen in different ways -

  • The remote chrome instance has abruptly closed.
  • There is a TCP backlog buffer overflow .

The APIs must throw some kind of an error in this situation.

If not , everytime I use a new API , I should be careful to have a timeout .

@sravani93 sravani93 changed the title Some Domain APIs and Events never resolve/return when connection to remote chrome instance fails. Some Domain APIs and Events never resolve/reject when connection to remote chrome instance fails. Nov 14, 2017
@cyrus-and
Copy link
Owner

So the issue is that there are a few Domain APIs that never resolve or reject when connection to the remote chrome instance fails .

It's not about some particular methods, every method is affected but since there is a race condition you may not always experience it.

This problem is an instance of this, basically:

  1. Chrome is killed;
  2. the WebSocket is closed;
  3. the promise associated to the CDP method that relies on an incoming message is simply discarded by the JavaScript runtime.

Based on your code this is a standalone example:

const CDP = require('chrome-remote-interface');
const chromeLauncher = require('chrome-launcher');

async function test() {
    const chromeInstance = await chromeLauncher.launch({
        chromeFlags: ['--headless', '--disable-gpu', '--no-sandbox']
    });

    const port = chromeInstance.port;
    const client = await CDP({port});

    chromeInstance.kill();

    console.log('pre');
    await client.DOM.enable();
    console.log('post'); // never reached
}

test();

Now for what concerns the solution, you can register for the disconnect event:

client.on('disconnect', () => {
    console.log('disconnect');
});

@sravani93
Copy link
Author

Thanks (Y) @cyrus-and

@dinofx
Copy link

dinofx commented Feb 3, 2021

I'm confused why this isn't a valid bug. If node has had a chance to handle the websocket getting disconnected, then Promises that come back from the client will reject. But if Chrome dies and you call a method on the client before the event loop has had a chance to run, you end up with a Promise that never resolves.

It seems like a bad idea to return a promise that has a small (but non-zero) chance of never resolving.

@cyrus-and
Copy link
Owner

@dinofx this is not a bug given the current implementation. But that's certainly a doable thing, at least for promises only, but to be coherent with the callbacks, they should fail upon disconnection too. And especially for events (that we should handle too), for example:

await Page.loadEventFired();

But this would require a change in the API, because now they only accept the params object. Changing it to (err, params) would break so many thinks that's probably not worth it.

@lilacdev
Copy link

I'm confused why this isn't a valid bug. If node has had a chance to handle the websocket getting disconnected, then Promises that come back from the client will reject. But if Chrome dies and you call a method on the client before the event loop has had a chance to run, you end up with a Promise that never resolves.

It seems like a bad idea to return a promise that has a small (but non-zero) chance of never resolving.

I totally agree with that. During our tests, if a network issue occurs, disconnect event is never triggered, and code 'runs' forever. I think a timeout should be managed, and throw an error.

@cyrus-and
Copy link
Owner

@lilacdev isn't this fixed by 161cd17?

@lilacdev
Copy link

I was reading code and saw this part. As it should solve our issue but doesn't, I am investigating and trying to write a short test in order to find why we are facing this issue on our test stack.

@lilacdev
Copy link

lilacdev commented Aug 31, 2023

@lilacdev isn't this fixed by 161cd17?

@cyrus-and Yes it does. Awesome ! Our dep version was stuck on 0.32.2. Many thanks, and by the way, I think this patch solved a lot of issues. Have a nice day.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants