
Zombie Process problem. #1825

Closed
bahattincinic opened this issue Jan 17, 2018 · 54 comments · Fixed by #6011
Labels
chromium Issues with Puppeteer-Chromium

Comments

@bahattincinic

bahattincinic commented Jan 17, 2018

Hello,

We recently discussed this problem in issues #1823 and #1791.

Environment:

Use Case:

We are using Puppeteer on AWS Lambda. We take a screenshot of a given HTML template, upload it to S3, and use that image for future requests.
It handles over 100 million requests each month, so every invocation must be atomic and immutable. (AWS Lambda has disk and process limits.)

Example Code:

const browser = await puppeteer.launch({
  args: ['--disable-gpu', '--no-sandbox', '--single-process',
         '--disable-web-security', '--disable-dev-profile']
});
const page = await browser.newPage();
await page.goto('https://s3bucket.com/markup/a.html');
const response = await page.screenshot({ type: 'jpeg', quality: 95 });
await browser.close();

Problem

When we run the example code, we get a disk error from AWS Lambda.

Example /tmp folder:

2018-01-12T14:55:38.553Z    a6ef3454-f7a8-11e7-be0f-17f405d5a180    start stdout: total 226084
drwx------ 3 sbx_user1067 479 4096 Jan 12 14:55 .
drwxr-xr-x 21 root root 4096 Jan 12 10:53 ..
-rw------- 1 sbx_user1067 479 15126528 Jan 12 14:33 core.headless-chromi.129
-rw------- 1 sbx_user1067 479 15126528 Jan 12 14:15 core.headless-chromi.131
-rw------- 1 sbx_user1067 479 15126528 Jan 12 14:49 core.headless-chromi.135
-rw------- 1 sbx_user1067 479 15126528 Jan 12 14:52 core.headless-chromi.137
-rw------- 1 sbx_user1067 479 15126528 Jan 12 14:50 core.headless-chromi.138
-rw------- 1 sbx_user1067 479 15126528 Jan 12 14:51 core.headless-chromi.14
-rw------- 1 sbx_user1067 479 15126528 Jan 12 14:49 core.headless-chromi.15
-rw------- 1 sbx_user1067 479 15126528 Jan 12 14:36 core.headless-chromi.169
-rw------- 1 sbx_user1067 479 15126528 Jan 12 14:15 core.headless-chromi.174
-rw------- 1 sbx_user1067 479 15126528 Jan 12 14:52 core.headless-chromi.178
-rw------- 1 sbx_user1067 479 15126528 Jan 12 14:50 core.headless-chromi.180
drwx------ 3 sbx_user1067 479 4096 Jan 12 14:14 .pki

When we investigated these files, we found that they were core dumps. We removed them after each invocation completed.

When we monitored the process list, we saw that zombie Chrome processes kept accumulating, and we couldn't kill them. AWS Lambda has a maximum process limit (1024 processes), so we eventually hit it.

483 1 3.3 1.6 1226196 65408 ? Ssl 22:07 0:05 /var/lang/bin/node --max-old-space-size=870 --max-semi-space-size=54 --max-executable-size=109 --expose-gc /var/runtime/node_modules/awslambda/index.js
483 22 0.0 0.0 0 0 ? Z 22:07 0:00 [headless-chromi] <defunct>
483 73 0.0 0.0 0 0 ? Z 22:07 0:00 [headless-chromi] <defunct>
483 119 0.0 0.0 0 0 ? Z 22:07 0:00 [headless-chromi] <defunct>
483 166 0.0 0.0 0 0 ? Z 22:07 0:00 [headless-chromi] <defunct>
483 214 0.0 0.0 0 0 ? Z 22:07 0:00 [headless-chromi] <defunct>
483 262 0.0 0.0 0 0 ? Z 22:07 0:00 [headless-chromi] <defunct>
483 307 0.0 0.0 0 0 ? Z 22:07 0:00 [headless-chromi] <defunct>
483 353 0.0 0.0 0 0 ? Z 22:07 0:00 [headless-chromi] <defunct>
483 1915 0.0 0.0 0 0 ? Z 22:09 0:00 [sh] <defunct>

We couldn't use dumb-init on Lambda, because Lambda already has its own init system.

How did we fix it? (a very hacky method)

We used browser.disconnect() instead of browser.close() and managed the Chrome process manually, killing it ourselves.

Example Code:

const child_process = require('child_process');
const pid = browser.process().pid; // Chromium PID, captured before disconnecting

browser.on('disconnected', () => {
  console.log('sleeping 100ms'); // sleep to eliminate a race condition
  setTimeout(function () {
    console.log(`Browser Disconnected... Process Id: ${pid}`);
    child_process.exec(`kill -9 ${pid}`, (error, stdout, stderr) => {
      if (error) {
        console.log(`Process Kill Error: ${error}`);
      }
      console.log(`Process Kill Success. stdout: ${stdout} stderr: ${stderr}`);
    });
  }, 100);
});

At first we didn't add the delay; we killed the process immediately after the browser disconnected, and we got the following error:

Error: read ECONNRESET at exports._errnoException (util.js:1018:11) at TCP.onread (net.js:568:26)

This looks like a Puppeteer process-management problem to me. With the workaround above, we no longer receive any Puppeteer-related errors. How can we fix it properly?

Thanks.

@aslushnikov
Contributor

@bahattincinic thanks for filing this.

So if I understand correctly, your workaround with disconnect-and-kill works? That should hint at what's going on.

@bahattincinic
Author

Yes. We disconnected the browser and killed the process ourselves.

@paambaati

@bahattincinic @aslushnikov I've briefly touched upon this here; killing the Chromium process aggressively on complete/timeout/errors helped us greatly as well.
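For illustration, here's a minimal sketch of that aggressive-kill pattern. The helper name and timeout are my own, and it assumes browser.process() returns the spawned Chromium ChildProcess:

const puppeteer = require('puppeteer');

async function withBrowser(fn, timeoutMs = 30000) {
  const browser = await puppeteer.launch({ args: ['--no-sandbox'] });
  // Hard-kill on timeout instead of waiting for a graceful close.
  const timer = setTimeout(() => {
    if (browser.process() && !browser.process().killed) {
      browser.process().kill('SIGKILL');
    }
  }, timeoutMs);
  try {
    return await fn(browser);
  } finally {
    clearTimeout(timer);
    try {
      await browser.close();
    } catch (e) {
      // close() failed: make sure no Chromium survives anyway.
      if (browser.process() && !browser.process().killed) {
        browser.process().kill('SIGKILL');
      }
    }
  }
}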

@grantstephens

I found that doing await page.goto('about:blank') helps in terms of reducing CPU and memory usage; even when reusing tabs, resetting to about:blank between shots somehow seems to keep CPU and memory under control.
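
A minimal sketch of that pattern, reusing one long-lived page (names are illustrative):

const page = await browser.newPage();

async function shoot(url, path) {
  await page.goto(url);
  await page.screenshot({ path });
  // Park the tab on about:blank so the previous document's resources are released.
  await page.goto('about:blank');
}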

@leobudima

leobudima commented Mar 11, 2018

@bahattincinic - thanks, I've tried your method of disconnecting + killing the process, and while it does kill the "main" process returned by puppeteer.launch(), each run seems to leave another defunct zombie with a PID that is different from the killed one...

What's worse, when I run ps aux right after puppeteer.launch(), aside from the "main" process, there is already one that's defunct, right away, before running code or trying to kill anything.

I've tried sending a kill -15, hoping that would allow the main process to clean up its children, but neither -15 nor -9 makes any difference, so I'm still stuck with an ever-growing list of zombies and rising memory...

Do you have any advice on how you managed to keep it clean of those as well (if you had a similar experience)? I'm also running on Lambda, same args used, puppeteer 1.1.1. Thanks!

@bahattincinic
Author

bahattincinic commented Mar 12, 2018

@leobudima

We do the following to avoid zombie processes:

  • We used browser.close(); instead of killing the process.
  • We are using waitpid (https://www.npmjs.com/package/waitpid) (while (waitpid2.waitpid(-1, 0 | waitpid2.WNOHANG) == -1))
  • We delete the core dumps under /tmp after the process completes (rm -r /tmp/core.* || true)

If your project doesn't depend on AWS Lambda, you can use my example project: https://github.com/bahattincinic/puppeteer-docker-example

@leobudima

@bahattincinic - thanks a lot for providing details - waitpid is an interesting approach and I'll definitely try with cleaning /tmp, hopefully that helps! If I don't manage to make it run reliably on Lambda, I'm going to have to try with docker - thanks for linking the example!

@jdiamond

@bahattincinic How does stopping the parent process with waitpid help with reaping zombie processes?

@yoongkang

yoongkang commented May 16, 2018

Hi, I'm also having problems with this, using serverless-chrome with AWS Lambda.

In my case, it doesn't look like it has anything to do with the browser cleanup process; it seems to be caused by something that happens during Puppeteer launch.

Running ps alx immediately after browser launch gives me this:

F UID PID PPID PRI NI VSZ RSS WCHAN STAT TTY TIME COMMAND
4 487 1 0 20 0 1488300 271872 ep_pol Ssl ? 0:34 /var/lang/bin/node --expose-gc --max-semi-space-size=102 --max-old-space-size=1843 /var/runtime/node_modules/awslambda/index.js
1 487 13 1 20 0 0 0 - Z ? 0:00 [headless-chromi] <defunct>
1 487 59 1 20 0 0 0 - Z ? 0:00 [headless-chromi] <defunct>
1 487 107 1 20 0 0 0 - Z ? 0:00 [headless-chromi] <defunct>
1 487 153 1 20 0 0 0 - Z ? 0:00 [headless-chromi] <defunct>
1 487 203 1 20 0 0 0 - Z ? 0:00 [headless-chromi] <defunct>
0 487 246 1 20 0 1074400 67740 ep_pol Ssl ? 0:00 ./chrome/headless-chromium --disable-background-networking --disable-background-timer-throttling --disable-client-side-phishing-detection --disable-default-apps --disable-dev-shm-usage --disable-extensions --disable-hang-monitor --disable-popup-blocking --disable-prompt-on-repost --disable-sync --disable-translate --metrics-recording-only --no-first-run --safebrowsing-disable-auto-update --enable-automation --password-store=basic --use-mock-keychain --remote-debugging-port=0 --user-data-dir=/tmp/puppeteer_dev_profile-yQiw0t --headless --disable-gpu --hide-scrollbars --mute-audio --no-sandbox --disable-setuid-sandbox --disable-dev-shm-usage --single-process --disable-gpu --no-zygote --user-agent=REDACTED
1 487 248 246 20 0 0 0 - Z ? 0:00 [headless-chromi] <defunct>
0 487 280 1 20 0 115096 1588 - R ? 0:00 ps -alx

See process 248, which is already defunct at this point.

And then after the browser closes:

F UID PID PPID PRI NI VSZ RSS WCHAN STAT TTY TIME COMMAND
4 487 1 0 20 0 1486912 282588 ep_pol Ssl ? 0:41 /var/lang/bin/node --expose-gc --max-semi-space-size=102 --max-old-space-size=1843 /var/runtime/node_modules/awslambda/index.js
1 487 13 1 20 0 0 0 - Z ? 0:00 [headless-chromi] <defunct>
1 487 59 1 20 0 0 0 - Z ? 0:00 [headless-chromi] <defunct>
1 487 107 1 20 0 0 0 - Z ? 0:00 [headless-chromi] <defunct>
1 487 153 1 20 0 0 0 - Z ? 0:00 [headless-chromi] <defunct>
1 487 203 1 20 0 0 0 - Z ? 0:00 [headless-chromi] <defunct>
1 487 248 1 20 0 0 0 - Z ? 0:00 [headless-chromi] <defunct>
0 487 292 1 20 0 115092 1588 - R ? 0:00 ps -alx

Look at the process with PID 248, which now has PPID 1.

Is this even a Puppeteer bug?

@webyneter

webyneter commented Jun 16, 2018

I have puppeteer running in a Docker container both locally and in a Heroku dyno.
Over time, a growing number of Chrome processes remain Sleeping, both locally and on Heroku; the difference is that Heroku dynos, unlike my local containers, are subject to process/thread limits (in my case 256). After a while the limit is reached and the following error is raised:

2018-06-16T16:25:16.463094+00:00 app[web.1]: Error: spawn /app/node_modules/puppeteer/.local-chromium/linux-564778/chrome-linux/chrome EAGAIN
2018-06-16T16:25:16.463108+00:00 app[web.1]: at _errnoException (util.js:1022:11)
2018-06-16T16:25:16.463110+00:00 app[web.1]: at Process.ChildProcess._handle.onexit (internal/child_process.js:190:19)
2018-06-16T16:25:16.463111+00:00 app[web.1]: at onErrorNT (internal/child_process.js:372:16)
2018-06-16T16:25:16.463112+00:00 app[web.1]: at _combinedTickCallback (internal/process/next_tick.js:138:11)
2018-06-16T16:25:16.463114+00:00 app[web.1]: at process._tickDomainCallback (internal/process/next_tick.js:218:9)
2018-06-16T16:25:16.495669+00:00 app[web.1]: npm ERR! code ELIFECYCLE
2018-06-16T16:25:16.495980+00:00 app[web.1]: npm ERR! errno 1
2018-06-16T16:25:16.497099+00:00 app[web.1]: npm ERR! irs.gov-form_filling@1.0.0 start: `node app.js`
2018-06-16T16:25:16.497182+00:00 app[web.1]: npm ERR! Exit status 1
2018-06-16T16:25:16.497371+00:00 app[web.1]: npm ERR!
2018-06-16T16:25:16.497499+00:00 app[web.1]: npm ERR! Failed at the irs.gov-form_filling@1.0.0 start script.
2018-06-16T16:25:16.497621+00:00 app[web.1]: npm ERR! This is probably not a problem with npm. There is likely additional logging output above.
2018-06-16T16:25:16.502935+00:00 app[web.1]: 
2018-06-16T16:25:16.503086+00:00 app[web.1]: npm ERR! A complete log of this run can be found in:
2018-06-16T16:25:16.503166+00:00 app[web.1]: npm ERR!     /app/.npm/_logs/2018-06-16T16_25_16_498Z-debug.log

How should I handle that?

@Multiply

@webyneter We're using an alpine-based image. I just installed tini, and added "/sbin/tini", "--" in front of our normal entrypoint/command.

@webyneter

@Multiply you mean, you did

ENTRYPOINT ["/sbin/tini", "--"]

?

@webyneter

@Multiply got it, I'll try the Alpine-image Puppeteer guide with tini.

@bahattincinic
Author

@webyneter I explained my solution earlier, but you can also use my example implementation:

https://github.com/bahattincinic/puppeteer-docker-example

The Puppeteer documentation suggests Yelp's dumb-init:
https://github.com/GoogleChrome/puppeteer/blob/master/docs/troubleshooting.md#running-puppeteer-in-docker

You can also check out https://github.com/ebidel/try-puppeteer; the repo's creator is a Puppeteer core developer.

@bahattincinic
Author

@jdiamond did you try #1825 (comment)?

@webyneter

webyneter commented Jun 17, 2018

@bahattincinic thanks, I had already implemented mine based on your suggestion to employ Alpine and tini. Although I followed the official guide, including the Tips section where it says to pass --disable-dev-shm-usage, Heroku gave me another surprise: "Less than 64MB of free space in temporary directory for shared memory files". Not sure yet how to deal with it, or whether that's even possible from a user's end...

@webyneter

webyneter commented Jun 17, 2018

Found the "Increased size of /dev/shm" section in the official Common Runtime dev article.

@Multiply

You don't have to use Alpine to use tini or dumb-init; it's just what we've been using quite successfully so far.

@webyneter

@Multiply I always strive to keep my image size to a minimum, but in my specific case I've just found out I can't use Alpine because "The latest version of puppeteer is not supported by the latest version of chrome alpine supports. Sorry, it's just the way it is right now." whilst we use "puppeteer": "^1.3.0". Switching back to node:8.11-slim now.

@webyneter

webyneter commented Jun 17, 2018

I'm seeing a growing number of sleeping Chrome processes, which is a real blocker for me since Heroku considerably limits the number of processes on lower-tier plans.
@Multiply, in your comment above you mentioned running rm -r /tmp/core.* || true; however, there are no files like that in my Node container:

pptruser@21dcf0ed8fc3:/app$ ls -lAh /tmp/
total 36K
drwx------ 3 pptruser pptruser 4.0K Jun 17 09:09 puppeteer_dev_profile-46TBCn
drwx------ 3 pptruser pptruser 4.0K Jun 17 09:10 puppeteer_dev_profile-EW50RG
drwx------ 3 pptruser pptruser 4.0K Jun 17 09:10 puppeteer_dev_profile-KPAMX2
drwx------ 3 pptruser pptruser 4.0K Jun 17 09:10 puppeteer_dev_profile-LsBKzm
drwx------ 3 pptruser pptruser 4.0K Jun 17 09:09 puppeteer_dev_profile-aD98EK
drwx------ 3 pptruser pptruser 4.0K Jun 17 09:10 puppeteer_dev_profile-fqBvuu
drwx------ 3 pptruser pptruser 4.0K Jun 17 09:10 puppeteer_dev_profile-qo6kj3
drwx------ 3 pptruser pptruser 4.0K Jun 17 09:09 puppeteer_dev_profile-uVomwy
drwx------ 2 root     root     4.0K Jun 17 08:44 tmp.NvsynR0wqx

Besides, I haven't found any sample waitpid implementation in that example repo of yours.

@bryanlarsen

bryanlarsen commented Jul 4, 2018

Rather than using tini or dumb-init to reap zombies, you can pass the --init flag to docker, or pass the --docker-disable-shared-pid=false flag to Kubernetes' kubelet and make sure the kubelet is configured to use pause:3.1.
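
For the Docker case that is just, e.g. (image and command are placeholders):

docker run --init --rm my-puppeteer-image node app.js

With --init, Docker runs a tiny init process as PID 1 inside the container, which reaps re-parented zombie children.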

@bryanlarsen

More information I've discovered:

The zombies are created using a "double fork", so they have a parent PID of 1. So waitpid won't work unless your node process has a PID of 1. That's certainly possible in a container environment, though.

So @webyneter, you can either use an init or waitpid.
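
In Node terms, the caveat looks like this (a sketch; the actual reaping call is whatever binding you use, e.g. the waitpid2 addon mentioned above):

if (process.pid === 1) {
  // We are init: double-forked zombies re-parent to us, so waitpid can reap them.
} else {
  // Zombies re-parent to the real PID 1, so waitpid from here never sees them.
  console.warn(`Running as PID ${process.pid}; use an init (tini, dumb-init, docker --init) instead`);
}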

@ratkorle

ratkorle commented Oct 17, 2018

Hi guys. I have a Lambda function with Puppeteer. It runs perfectly and returns what I expect, but it never exits and I get a timeout. The only thing I don't do on the Puppeteer side is

await browser.close()

to prevent opening 100 browsers and pages.
Do you have any idea?

this is the log I am receiving:

14:07:03
START RequestId: 6df967f7-787a-5cda-9e10-33b63c469bd2 Version: $LATEST

14:07:03
2018-10-17T14:07:03.877Z 6df967f7-787a-5cda-9e10-33b63c469bd2 getting browser

14:07:03
2018-10-17T14:07:03.885Z 6df967f7-787a-5cda-9e10-33b63c469bd2 setup local chrome

14:07:05
2018-10-17T14:07:05.376Z 6df967f7-787a-5cda-9e10-33b63c469bd2 setup done

14:07:06
2018-10-17T14:07:06.149Z 6df967f7-787a-5cda-9e10-33b63c469bd2 Launch chrome: HeadlessChrome/67.0.3361.0

14:07:06
2018-10-17T14:07:06.149Z 6df967f7-787a-5cda-9e10-33b63c469bd2 opening new page

14:07:06
2018-10-17T14:07:06.967Z 6df967f7-787a-5cda-9e10-33b63c469bd2 page opened

14:07:07
2018-10-17T14:07:07.069Z 6df967f7-787a-5cda-9e10-33b63c469bd2 viewport set

14:07:07
2018-10-17T14:07:07.170Z 6df967f7-787a-5cda-9e10-33b63c469bd2 userAgent set

14:07:07
2018-10-17T14:07:07.171Z 6df967f7-787a-5cda-9e10-33b63c469bd2 getting action from commandArray

14:07:07
2018-10-17T14:07:07.171Z 6df967f7-787a-5cda-9e10-33b63c469bd2 case connect

14:07:07
2018-10-17T14:07:07.171Z 6df967f7-787a-5cda-9e10-33b63c469bd2 connecting to: https://www.google.com/

14:07:07
2018-10-17T14:07:07.541Z 6df967f7-787a-5cda-9e10-33b63c469bd2 succesfully connected to https://www.google.com/

14:07:08
2018-10-17T14:07:08.542Z 6df967f7-787a-5cda-9e10-33b63c469bd2 case connect passed

14:07:08
2018-10-17T14:07:08.542Z 6df967f7-787a-5cda-9e10-33b63c469bd2

14:07:08
2018-10-17T14:07:08.542Z 6df967f7-787a-5cda-9e10-33b63c469bd2 {}

14:07:08
2018-10-17T14:07:08.543Z 6df967f7-787a-5cda-9e10-33b63c469bd2 puppeteer done

14:07:08
2018-10-17T14:07:08.543Z 6df967f7-787a-5cda-9e10-33b63c469bd2 promise finished

14:12:03
END RequestId: 6df967f7-787a-5cda-9e10-33b63c469bd2

14:12:03
REPORT RequestId: 6df967f7-787a-5cda-9e10-33b63c469bd2 Duration: 300100.24 ms Billed Duration: 300000 ms Memory Size: 3008 MB Max Memory Used: 354 MB

14:12:03
2018-10-17T14:12:03.975Z 6df967f7-787a-5cda-9e10-33b63c469bd2 Task timed out after 300.10 seconds

exports.handler = function (event, context, callback) {
  const promises = [];
  const records = event["Records"];
  for (let record of records) {
    const message = JSON.parse(record.body);
    const promise = scrapper.parseEngine(message.commands, null, null, null);
    promises.push(promise);
  }
  Promise.all(promises).then((data) => {
    console.log('promise finished');
    callback(null, data);
  }).catch((err) => {
    console.log('error', err);
    callback(err);
  });
};

@natterstefan

natterstefan commented Oct 27, 2018

Hi @bryanlarsen, hi @bahattincinic,

thanks for your helpful comments. I'm pretty new to the Docker & Puppeteer topic, so please excuse me if I ask (again):

you can either use an init or waitpid.

What do you mean @bryanlarsen? Can I just use the solution proposed on puppeteer/troubleshooting? My current image looks similar to:

FROM node:8.11.1-alpine

# ... stuff

# It's a good idea to use dumb-init to help prevent zombie chrome processes.
ADD https://github.com/Yelp/dumb-init/releases/download/v1.2.0/dumb-init_1.2.0_amd64 /usr/local/bin/dumb-init
RUN chmod +x /usr/local/bin/dumb-init

# ... stuff
ENTRYPOINT ["dumb-init", "--"]

# start the app
CMD [ "node", "app/index.js"]

Or how can I (should I) use waitpid?

@bahattincinic I can't find waitpid in your example-repo (see comment #1825 (comment)). Am I missing something?

And do you run

We delete /tmp folder after process completed (rm -r /tmp/core.* || true)

from your Node app after await browser.close();? Do you use rimraf for this?

Thanks to both of you for your time and help!

@insanehong

insanehong commented Nov 13, 2018

I'm trying a process lookup and then killing the process. How about this?

const puppeteer = require('puppeteer');
const ps = require('ps-node-promise-es6');
const _ = require('lodash');

async function getWebPageHtml(targetUrl) {
  const browser = await puppeteer.launch();
  const browserPid = browser._process.pid; // private API; newer Puppeteer exposes browser.process().pid
  const page = await browser.newPage();

  try {
    const response = await page.goto(targetUrl);
  } catch (error) {
    throw new Error(error);
  } finally {
    await page.close();
    await browser.close();
    // Look up the browser PID; if anything is still there, SIGKILL it.
    const psLookup = await ps.lookup({ pid: browserPid });

    for (let proc of psLookup) {
      if (_.has(proc, 'pid')) {
        await ps.kill(proc.pid, 'SIGKILL');
      }
    }
  }
}

@yale8848

@insanehong I just tested your code, but it doesn't work well. It seems the browser PID isn't found.

@bahattincinic
Author

@insanehong Can you try the following code?

const waitpid2 = require('waitpid2');
const child_process = require('child_process');

const cleanEnv = function () {
  console.log('cleanEnv:waitpid');
  // Reap any zombie children left over from previous invocations.
  while (waitpid2.waitpid(-1, 0 | waitpid2.WNOHANG) == -1);
  // Remove leftover Chromium core dumps from /tmp.
  child_process.exec('rm -r /tmp/core.* || true');
};

const screenCapture = function () {
  // do something.
};

exports.handler = function (event, context, callback) {
  cleanEnv();
  screenCapture();
};

@marius080

I've overcome these issues by adding these flags for headless Chrome:

const chromeFlags = [
    '--headless',
    '--no-sandbox',
    "--disable-gpu",
    "--single-process",
    "--no-zygote"
]

I think the child processes are orphaned when the parent is killed, and that leads to the zombies. With this, I only get one process and it works pretty well.
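
For completeness, a sketch of how those flags would be passed (assuming the chromeFlags array above):

const browser = await puppeteer.launch({ args: chromeFlags });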

wmfgerrit pushed a commit to wikimedia/mediawiki-services-chromium-render that referenced this issue Jul 27, 2020
Attempts to prevent leaking zombie chromium processes by running
chromium with the --single-process and --no-zygote flags.

Flag descriptions:
https://kapeli.com/cheat_sheets/Chromium_Command_Line_Switches.docset/Contents/Resources/Documents/index

GitHub issue where this approach was reported to be successful:
puppeteer/puppeteer#1825

Chromium related docs (for background):
https://www.chromium.org/developers/design-documents/multi-process-architecture
https://chromium.googlesource.com/chromium/src.git/+/master/docs/linux/zygote.md

N.B. If the approach is agreed upon and this patch merged, the
change will also need to be made separately for production in
operations/deployment-charts.

Bug: T257679
Change-Id: I8420c4b9e967e098f279fc26e0128e6169a204c6
@zdm

zdm commented Sep 2, 2020

@marius080 Even with your settings, zombie processes are left after browser.close().

@Kikobeats
Contributor

What worked for me was destroying the PID and its associated subprocesses:
https://github.com/microlinkhq/browserless/blob/master/packages/browserless/src/driver.js#L66

@g00dnatur3

g00dnatur3 commented Jan 12, 2021

This is the code I created to fix it. I've tested it; no more dangling Chromes :)

const browser = await puppeteer.launch({ args: ['--no-sandbox'] });
const process = browser.process(); // Chromium ChildProcess (shadows the Node global in this scope)
const killBrowser = (retries = 5) => {
  if (retries === 0) {
    return; // exit condition
  }
  if (process && process.pid && process.kill && !process.killed) {
    setTimeout(() => {
      console.log(`BROWSER Process Id: ${process.pid}, KILLING IT! retries:`, retries);
      if (!process.kill('SIGKILL')) {
        // kill() returned false, so the signal wasn't delivered; retry
        retries--;
        killBrowser(retries);
      }
    }, 200);
  }
};

Just call killBrowser(); the retries are built in.

@zdm

zdm commented Jan 12, 2021

If you run it under Docker, you need to use the docker --init option.


@voxsoftware

const child_process = require('child_process');
const pid = browser.process().pid; // Chromium PID, captured before disconnecting

browser.on('disconnected', () => {
  console.log('sleeping 100ms'); // sleep to eliminate a race condition
  setTimeout(function () {
    console.log(`Browser Disconnected... Process Id: ${pid}`);
    child_process.exec(`kill -9 ${pid}`, (error, stdout, stderr) => {
      if (error) {
        console.log(`Process Kill Error: ${error}`);
      }
      console.log(`Process Kill Success. stdout: ${stdout} stderr: ${stderr}`);
    });
  }, 100);
});

Same problem on Manjaro using headless. I will test your workaround.

@ndbroadbent

ndbroadbent commented Feb 25, 2021

Hi everyone, just wanted to provide a quick warning about the --single-process flag. I have some integration tests that were broken after I started using this flag, because it broke the font rendering and kerning for the generated PDF. I found this Process Models page in the chromium docs:

Finally, for the purposes of comparison, Chromium supports a single process model that can be enabled using the --single-process command-line switch. In this model, both the browser and rendering engine are run within a single OS process.

The single process model provides a baseline for measuring any overhead that the multi-process architectures impose. It is not a safe or robust architecture, as any renderer crash will cause the loss of the entire browser process. It is designed for testing and development purposes, and it may contain bugs that are not present in the other architectures.

I can confirm that the single process model does cause a rendering bug (at least on Chromium 88.x), and that this rendering bug is not present when I remove the --single-process flag. So I would not recommend using this flag, since the docs say that it's only designed for testing and development purposes, and it shouldn't really be used in production.

@pitiboy

pitiboy commented Mar 8, 2021

I'm also using Puppeteer in Docker, and I had also tried
puppeteer.launch({ args: ['--no-sandbox', '--no-zygote'] });
but that did not help.

Eventually, I figured out that the init: true flag solves the orphaned-zombie-process problem. It can be used with docker-compose, according to the Docker documentation: https://docs.docker.com/compose/compose-file/compose-file-v3/#init
(@bryanlarsen and @zdm also mentioned the --init flag for docker, and I also gained inspiration from this great blog post to understand it more: https://blog.phusion.nl/2015/01/20/docker-and-the-pid-1-zombie-reaping-problem/ )
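
For reference, a minimal docker-compose sketch of that flag (service and image names are placeholders):

version: "3.7"
services:
  renderer:
    image: my-puppeteer-app
    init: true  # run an init as PID 1 so zombie children get reaped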

@eldoy

eldoy commented Aug 8, 2022

This works for me on Debian 10; it just kills the process group based on the browser PID:

const browser = await puppeteer.launch({ args: ['--no-sandbox'] })
const page = await browser.newPage()

try {
  await page.goto(url, { waitUntil: 'networkidle2', timeout: 10000 })
  await page.screenshot({ path })
  await page.close()
  await browser.close()
} catch (e) {
  console.error(e)
} finally {
  // Negative PID signals Chromium's whole process group, not just the main process.
  const pid = -browser.process().pid
  try {
    process.kill(pid, 'SIGKILL')
  } catch (e) {}
}

@devopsmash

Here is what happens while Puppeteer is working:

  1. Before sending the request - no Chrome processes yet:
STAT PID   PPID  COMMAND          COMMAND
S        1     0 dumb-init        /usr/local/bin/dumb-init -- node server.js
S        7     1 node             node server.js
S      135     0 sh               sh
  2. After sending a request to render the web page - Puppeteer spawns 7 Chromium processes:
STAT PID   PPID  COMMAND          COMMAND
S        1     0 dumb-init        /usr/local/bin/dumb-init -- node server.js
S        7     1 node             node server.js
S      135     0 sh               sh
S      630     7 chrome           /usr/lib/chromium/chrome --extra-plugin-dir=/usr/lib/nsbrowser/plugins --allow-pre-commit-input --disable-background-networking --enable-features=NetworkServiceInProces
S      634   630 chrome           /usr/lib/chromium/chrome --type=zygote --no-zygote-sandbox --no-sandbox --headless --headless
S      635   630 chrome           /usr/lib/chromium/chrome --type=zygote --no-sandbox --headless --headless
S      655   630 chrome           /usr/lib/chromium/chrome --type=utility --utility-sub-type=network.mojom.NetworkService --lang=en-US --service-sandbox-type=none --no-sandbox --disable-dev-shm-usage --
S      659   635 chrome           /usr/lib/chromium/chrome --type=renderer --headless --lang=en-US --no-sandbox --disable-dev-shm-usage --disable-background-timer-throttling --disable-breakpad --enable-
S      698   634 chrome           /usr/lib/chromium/chrome --type=gpu-process --no-sandbox --disable-dev-shm-usage --disable-breakpad --headless --ozone-platform=headless --use-angle=swiftshader-webgl -
R      710   635 chrome           /usr/lib/chromium/chrome --type=renderer --headless --lang=en-US --no-sandbox --disable-dev-shm-usage --disable-background-timer-throttling --disable-breakpad --enable-
  3. 5 seconds after sending the request - you can see 3 zombie processes:
S        1     0 dumb-init        /usr/local/bin/dumb-init -- node server.js
S        7     1 node             node server.js
S      135     0 sh               sh
S      630     7 chrome           /usr/lib/chromium/chrome --extra-plugin-dir=/usr/lib/nsbrowser/plugins --allow-pre-commit-input --disable-background-networking --enable-features=NetworkServiceInProces
S      634   630 chrome           /usr/lib/chromium/chrome --type=zygote --no-zygote-sandbox --no-sandbox --headless --headless
S      635   630 chrome           /usr/lib/chromium/chrome --type=zygote --no-sandbox --headless --headless
Z      655   630 chrome           [chrome]
Z      659   635 chrome           [chrome]
S      698   634 chrome           /usr/lib/chromium/chrome --type=gpu-process --no-sandbox --disable-dev-shm-usage --disable-breakpad --headless --ozone-platform=headless --use-angle=swiftshader-webgl -
Z      710   635 chrome           [chrome]
  4. 10 seconds after sending the request - everything is back to normal thanks to dumb-init:
STAT PID   PPID  COMMAND          COMMAND
S        1     0 dumb-init        /usr/local/bin/dumb-init -- node server.js
S        7     1 node             node server.js
S      135     0 sh               sh

My question is: why did processes 655, 659, and 710 become zombies? What in the flow is not closing them properly?
I'd prefer to avoid using a third-party tool like dumb-init to overcome this issue.

I'm using the Docker image node:16.14.2-alpine3.15 on AWS EKS 1.22
Command to list processes: ps -A -ostat,pid,ppid,comm,args

It seems that using the --single-process arg could solve this, because it would use only 1 process instead of 7, but due to the issues with --single-process mentioned above, I prefer not to use it.

@BoD

BoD commented Oct 14, 2022

Sorry for the noise, I just wanted to confirm in case it helps somebody: in my case, simply adding --init to the docker command did indeed work.

@heaven

heaven commented Nov 14, 2023

None of this helps when the browser crashes, which happens quite often since we upgraded to Node 18 and a newer Puppeteer.

The child process isn't properly detached from the parent, I guess, and thus stays a zombie until the parent exits.

@rajeshpal53

I have thoroughly reviewed the documentation and exhausted all available solutions in an attempt to resolve the zombie process issue. Despite my efforts, the problem persisted. I attempted to terminate process IDs, but within the pods, the zombie processes remained resilient. Devoting several consecutive days to diligently updating every package eventually proved successful. The issue was ultimately resolved by making key adjustments: switching the operating system from Node Alpine to Node Slim Linux and transitioning from Chromium to Chrome as the browser. The specific changes implemented to rectify the problem are outlined below.

If you are working with Puppeteer and encountering zombie process issues, consider employing the following Docker commands. These commands have proven effective in preventing the creation of zombie processes.

FROM node:18-slim
RUN apt-get update
RUN apt-get upgrade

RUN apt-get update && apt-get install curl gnupg -y \
  && curl --location --silent dl-ssl.google.com/linux/linux_sign... | apt-key add - \
  && sh -c 'echo "deb [arch=amd64] dl.google.com/linux/chrome/deb/ stable main" >> /etc/apt/sources.list.d/google.list' \
  && apt-get update \
  && apt-get install google-chrome-stable -y --no-install-recommends \
  && rm -rf /var/lib/apt/lists/*

RUN apt-get update && \
  apt-get upgrade && apt-get install -y vim

ADD ./puppetron.tar /usr/share/
WORKDIR /usr/share/puppetron

ENV PUPPETEER_SKIP_CHROMIUM_DOWNLOAD=true
ENV SERVICE_PATH=/usr/share/puppetron

CMD node main.js


Change the browser executable path to:
executablePath: '/usr/bin/google-chrome',
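
For example, a launch call matching that image might look like this (args are illustrative):

const browser = await puppeteer.launch({
  executablePath: '/usr/bin/google-chrome',
  args: ['--no-sandbox', '--disable-dev-shm-usage'],
});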

@v-dev-cl

v-dev-cl commented May 1, 2024

I've overcome these issues by adding the flags for chrome headless:

const chromeFlags = [
    '--headless',
    '--no-sandbox',
    "--disable-gpu",
    "--single-process",
    "--no-zygote"
]

I think the child processes are orphaned when the parent is killed and that leads to the zombies. With this, I only get one process and it works pretty well

adding "--headless" worked for me, i had a lot of chrome_crashpad that were created per each request even after closing without errors the page and browser.
