
Zombie Process problem. #1825

Closed
bahattincinic opened this issue Jan 17, 2018 · 54 comments · Fixed by #6011
Labels
chromium Issues with Puppeteer-Chromium

Comments

@bahattincinic

bahattincinic commented Jan 17, 2018

Hello,

We recently discussed this problem in issues #1823 and #1791.

Environment:

Use Case:

We are using Puppeteer on AWS Lambda. We take a screenshot of a given HTML template, upload it to S3, and use that image for future requests.
It handles over 100 million requests each month, so every invocation must be atomic and immutable. (AWS Lambda has disk and process limits.)

Example Code:

const browser = await puppeteer.launch({
  args: ['--disable-gpu', '--no-sandbox', '--single-process',
         '--disable-web-security', '--disable-dev-profile']
});
const page = await browser.newPage();
await page.goto('https://s3bucket.com/markup/a.html');
const response = await page.screenshot({ type: 'jpeg', quality: 95 });
await browser.close();

Problem

When we run the example code, we get a disk error from AWS Lambda.

Example /tmp folder:

2018-01-12T14:55:38.553Z    a6ef3454-f7a8-11e7-be0f-17f405d5a180    start stdout: total 226084
drwx------ 3 sbx_user1067 479 4096 Jan 12 14:55 .
drwxr-xr-x 21 root root 4096 Jan 12 10:53 ..
-rw------- 1 sbx_user1067 479 15126528 Jan 12 14:33 core.headless-chromi.129
-rw------- 1 sbx_user1067 479 15126528 Jan 12 14:15 core.headless-chromi.131
-rw------- 1 sbx_user1067 479 15126528 Jan 12 14:49 core.headless-chromi.135
-rw------- 1 sbx_user1067 479 15126528 Jan 12 14:52 core.headless-chromi.137
-rw------- 1 sbx_user1067 479 15126528 Jan 12 14:50 core.headless-chromi.138
-rw------- 1 sbx_user1067 479 15126528 Jan 12 14:51 core.headless-chromi.14
-rw------- 1 sbx_user1067 479 15126528 Jan 12 14:49 core.headless-chromi.15
-rw------- 1 sbx_user1067 479 15126528 Jan 12 14:36 core.headless-chromi.169
-rw------- 1 sbx_user1067 479 15126528 Jan 12 14:15 core.headless-chromi.174
-rw------- 1 sbx_user1067 479 15126528 Jan 12 14:52 core.headless-chromi.178
-rw------- 1 sbx_user1067 479 15126528 Jan 12 14:50 core.headless-chromi.180
drwx------ 3 sbx_user1067 479 4096 Jan 12 14:14 .pki

When we investigated these files, we found that they were core dumps. We removed them after each invocation completed.

When we monitored the process list, we saw that zombie Chrome processes kept accumulating, and we couldn't kill them. AWS Lambda has a maximum process limit (1024 processes), so we eventually hit it.

483 1 3.3 1.6 1226196 65408 ? Ssl 22:07 0:05 /var/lang/bin/node --max-old-space-size=870 --max-semi-space-size=54 --max-executable-size=109 --expose-gc /var/runtime/node_modules/awslambda/index.js
483 22 0.0 0.0 0 0 ? Z 22:07 0:00 [headless-chromi] <defunct>
483 73 0.0 0.0 0 0 ? Z 22:07 0:00 [headless-chromi] <defunct>
483 119 0.0 0.0 0 0 ? Z 22:07 0:00 [headless-chromi] <defunct>
483 166 0.0 0.0 0 0 ? Z 22:07 0:00 [headless-chromi] <defunct>
483 214 0.0 0.0 0 0 ? Z 22:07 0:00 [headless-chromi] <defunct>
483 262 0.0 0.0 0 0 ? Z 22:07 0:00 [headless-chromi] <defunct>
483 307 0.0 0.0 0 0 ? Z 22:07 0:00 [headless-chromi] <defunct>
483 353 0.0 0.0 0 0 ? Z 22:07 0:00 [headless-chromi] <defunct>
483 1915 0.0 0.0 0 0 ? Z 22:09 0:00 [sh] <defunct>

We couldn't use dumb-init on Lambda, because Lambda already has its own init system.

How did we fix it? (a very hacky method)

We used browser.disconnect() instead of browser.close() and managed the Chrome process manually, killing it ourselves.

Example Code:

const child_process = require('child_process');
const pid = browser.process().pid; // Chromium PID, captured before disconnecting

browser.on('disconnected', () => {
  console.log('sleeping 100ms'); // sleep to eliminate a race condition
  setTimeout(function () {
    console.log(`Browser Disconnected... Process Id: ${pid}`);
    child_process.exec(`kill -9 ${pid}`, (error, stdout, stderr) => {
      if (error) {
        console.log(`Process Kill Error: ${error}`);
      }
      console.log(`Process Kill Success. stdout: ${stdout} stderr: ${stderr}`);
    });
  }, 100);
});

At first we didn't add the delay; we killed the process immediately after the browser disconnected, and we got the following error:

Error: read ECONNRESET at exports._errnoException (util.js:1018:11) at TCP.onread (net.js:568:26)

This looks like a Puppeteer process-management problem to me. With the workaround above, we no longer receive any Puppeteer-related errors. How can we fix it properly?

Thanks.

@aslushnikov
Contributor

@bahattincinic thanks for filing this.

So if I understand correctly, your workaround with disconnect-and-kill works? That should hint at what's going on.

@bahattincinic
Author

Yes. We disconnected the browser and killed the process ourselves.

@paambaati

@bahattincinic @aslushnikov I've briefly touched upon this here; killing the Chromium process aggressively on complete/timeout/errors helped us greatly as well.
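For illustration, here's a minimal sketch of that aggressive-kill pattern. The helper name and timeout are my own, and it assumes browser.process() returns the spawned Chromium ChildProcess:

const puppeteer = require('puppeteer');

async function withBrowser(fn, timeoutMs = 30000) {
  const browser = await puppeteer.launch({ args: ['--no-sandbox'] });
  // Hard-kill on timeout instead of waiting for a graceful close.
  const timer = setTimeout(() => {
    if (browser.process() && !browser.process().killed) {
      browser.process().kill('SIGKILL');
    }
  }, timeoutMs);
  try {
    return await fn(browser);
  } finally {
    clearTimeout(timer);
    try {
      await browser.close();
    } catch (e) {
      // close() failed: make sure no Chromium survives anyway.
      if (browser.process() && !browser.process().killed) {
        browser.process().kill('SIGKILL');
      }
    }
  }
}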

@grantstephens

I found that doing await page.goto('about:blank') helps in terms of reducing CPU and memory usage; even when reusing tabs, resetting to about:blank between shots somehow seems to keep CPU and memory under control.
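
A minimal sketch of that pattern, reusing one long-lived page (names are illustrative):

const page = await browser.newPage();

async function shoot(url, path) {
  await page.goto(url);
  await page.screenshot({ path });
  // Park the tab on about:blank so the previous document's resources are released.
  await page.goto('about:blank');
}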

@leobudima

leobudima commented Mar 11, 2018

@bahattincinic - thanks, I've tried your method of disconnecting + killing the process, and while it does kill the "main" process returned by puppeteer.launch(), each run seems to leave another defunct zombie with a PID that is different from the killed one...

What's worse, when I run ps aux right after puppeteer.launch(), aside from the "main" process, there is already one that's defunct, right away, before running code or trying to kill anything.

I've tried sending a kill -15, hoping that would allow the main process to clean up its children, but neither -15 nor -9 makes any difference, so I'm still stuck with an ever-growing list of zombies and rising memory...

Do you have any advice on how you managed to keep it clean of those as well (if you had a similar experience)? I'm also running on Lambda, same args used, puppeteer 1.1.1. Thanks!

@bahattincinic
Author

bahattincinic commented Mar 12, 2018

@leobudima

We do the following to avoid zombie processes:

  • We used browser.close(); instead of killing the process.
  • We are using waitpid (https://www.npmjs.com/package/waitpid) (while (waitpid2.waitpid(-1, 0 | waitpid2.WNOHANG) == -1))
  • We delete the core dumps under /tmp after the process completes (rm -r /tmp/core.* || true)

If your project doesn't depend on AWS Lambda, you can use my example project: https://github.com/bahattincinic/puppeteer-docker-example

@leobudima

@bahattincinic - thanks a lot for providing details - waitpid is an interesting approach and I'll definitely try with cleaning /tmp, hopefully that helps! If I don't manage to make it run reliably on Lambda, I'm going to have to try with docker - thanks for linking the example!

@jdiamond

@bahattincinic How does stopping the parent process with waitpid help with reaping zombie processes?

@yoongkang

yoongkang commented May 16, 2018

Hi, I'm also having problems with this, using serverless-chrome with AWS Lambda.

In my case, it doesn't look like it has anything to do with the browser cleanup process; it seems to be caused by something that happens during Puppeteer launch.

Running ps alx immediately after browser launch gives me this:

F UID PID PPID PRI NI VSZ RSS WCHAN STAT TTY TIME COMMAND
4 487 1 0 20 0 1488300 271872 ep_pol Ssl ? 0:34 /var/lang/bin/node --expose-gc --max-semi-space-size=102 --max-old-space-size=1843 /var/runtime/node_modules/awslambda/index.js
1 487 13 1 20 0 0 0 - Z ? 0:00 [headless-chromi] <defunct>
1 487 59 1 20 0 0 0 - Z ? 0:00 [headless-chromi] <defunct>
1 487 107 1 20 0 0 0 - Z ? 0:00 [headless-chromi] <defunct>
1 487 153 1 20 0 0 0 - Z ? 0:00 [headless-chromi] <defunct>
1 487 203 1 20 0 0 0 - Z ? 0:00 [headless-chromi] <defunct>
0 487 246 1 20 0 1074400 67740 ep_pol Ssl ? 0:00 ./chrome/headless-chromium --disable-background-networking --disable-background-timer-throttling --disable-client-side-phishing-detection --disable-default-apps --disable-dev-shm-usage --disable-extensions --disable-hang-monitor --disable-popup-blocking --disable-prompt-on-repost --disable-sync --disable-translate --metrics-recording-only --no-first-run --safebrowsing-disable-auto-update --enable-automation --password-store=basic --use-mock-keychain --remote-debugging-port=0 --user-data-dir=/tmp/puppeteer_dev_profile-yQiw0t --headless --disable-gpu --hide-scrollbars --mute-audio --no-sandbox --disable-setuid-sandbox --disable-dev-shm-usage --single-process --disable-gpu --no-zygote --user-agent=REDACTED
1 487 248 246 20 0 0 0 - Z ? 0:00 [headless-chromi] <defunct>
0 487 280 1 20 0 115096 1588 - R ? 0:00 ps -alx

See process 248, which is already defunct at this point.

And then after the browser closes:

F UID PID PPID PRI NI VSZ RSS WCHAN STAT TTY TIME COMMAND
4 487 1 0 20 0 1486912 282588 ep_pol Ssl ? 0:41 /var/lang/bin/node --expose-gc --max-semi-space-size=102 --max-old-space-size=1843 /var/runtime/node_modules/awslambda/index.js
1 487 13 1 20 0 0 0 - Z ? 0:00 [headless-chromi] <defunct>
1 487 59 1 20 0 0 0 - Z ? 0:00 [headless-chromi] <defunct>
1 487 107 1 20 0 0 0 - Z ? 0:00 [headless-chromi] <defunct>
1 487 153 1 20 0 0 0 - Z ? 0:00 [headless-chromi] <defunct>
1 487 203 1 20 0 0 0 - Z ? 0:00 [headless-chromi] <defunct>
1 487 248 1 20 0 0 0 - Z ? 0:00 [headless-chromi] <defunct>
0 487 292 1 20 0 115092 1588 - R ? 0:00 ps -alx

Look at the process with PID 248, which now has PPID 1.

Is this even a Puppeteer bug?

@webyneter

webyneter commented Jun 16, 2018

I have puppeteer running in a Docker container both locally and in a Heroku dyno.
Over time, a growing number of Chrome processes remain Sleeping, both locally and on Heroku; the difference is that Heroku dynos, unlike my local containers, are subject to process/thread limits (in my case 256). After a while the limit is reached and the following error is raised:

2018-06-16T16:25:16.463094+00:00 app[web.1]: Error: spawn /app/node_modules/puppeteer/.local-chromium/linux-564778/chrome-linux/chrome EAGAIN
2018-06-16T16:25:16.463108+00:00 app[web.1]: at _errnoException (util.js:1022:11)
2018-06-16T16:25:16.463110+00:00 app[web.1]: at Process.ChildProcess._handle.onexit (internal/child_process.js:190:19)
2018-06-16T16:25:16.463111+00:00 app[web.1]: at onErrorNT (internal/child_process.js:372:16)
2018-06-16T16:25:16.463112+00:00 app[web.1]: at _combinedTickCallback (internal/process/next_tick.js:138:11)
2018-06-16T16:25:16.463114+00:00 app[web.1]: at process._tickDomainCallback (internal/process/next_tick.js:218:9)
2018-06-16T16:25:16.495669+00:00 app[web.1]: npm ERR! code ELIFECYCLE
2018-06-16T16:25:16.495980+00:00 app[web.1]: npm ERR! errno 1
2018-06-16T16:25:16.497099+00:00 app[web.1]: npm ERR! irs.gov-form_filling@1.0.0 start: `node app.js`
2018-06-16T16:25:16.497182+00:00 app[web.1]: npm ERR! Exit status 1
2018-06-16T16:25:16.497371+00:00 app[web.1]: npm ERR!
2018-06-16T16:25:16.497499+00:00 app[web.1]: npm ERR! Failed at the irs.gov-form_filling@1.0.0 start script.
2018-06-16T16:25:16.497621+00:00 app[web.1]: npm ERR! This is probably not a problem with npm. There is likely additional logging output above.
2018-06-16T16:25:16.502935+00:00 app[web.1]: 
2018-06-16T16:25:16.503086+00:00 app[web.1]: npm ERR! A complete log of this run can be found in:
2018-06-16T16:25:16.503166+00:00 app[web.1]: npm ERR!     /app/.npm/_logs/2018-06-16T16_25_16_498Z-debug.log

How should I handle that?

@Multiply

@webyneter We're using an alpine-based image. I just installed tini, and added "/sbin/tini", "--" in front of our normal entrypoint/command.

@webyneter

@Multiply you mean, you did

ENTRYPOINT ["/sbin/tini", "--"]

?

@webyneter

@Multiply got it, I'll try the Alpine-image Puppeteer guide with tini.

@bahattincinic
Author

@webyneter I explained my solution earlier, but you can also use my example implementation:

https://github.com/bahattincinic/puppeteer-docker-example

The Puppeteer documentation suggests Yelp's dumb-init:
https://github.com/GoogleChrome/puppeteer/blob/master/docs/troubleshooting.md#running-puppeteer-in-docker

You can also check out https://github.com/ebidel/try-puppeteer; the repo's creator is a Puppeteer core developer.

@bahattincinic
Author

@jdiamond did you try #1825 (comment)?

@webyneter

webyneter commented Jun 17, 2018

@bahattincinic thanks, I had already implemented mine based on your suggestion to employ Alpine and tini. Although I followed the official guide, including the Tips section where it says to pass --disable-dev-shm-usage, Heroku gave me another surprise: "Less than 64MB of free space in temporary directory for shared memory files". Not sure yet how to deal with it, or whether that's even possible from a user's end...

@webyneter

webyneter commented Jun 17, 2018

Found the "Increased size of /dev/shm" section in the official Common Runtime dev article.

@Multiply

You don't have to use Alpine to use tini or dumb-init; it's just what we've been using quite successfully so far.

@webyneter

@Multiply I always strive to keep my image size to a minimum, but in my specific case I've just found out I can't use Alpine because "The latest version of puppeteer is not supported by the latest version of chrome alpine supports. Sorry, it's just the way it is right now." whilst we use "puppeteer": "^1.3.0". Switching back to node:8.11-slim now.

@webyneter

webyneter commented Jun 17, 2018

I'm seeing a growing number of sleeping Chrome processes, which is a real blocker for me since Heroku considerably limits the number of processes on lower-tier plans.
@Multiply, in your comment above you mentioned running rm -r /tmp/core.* || true; however, there are no files like that in my Node container:

pptruser@21dcf0ed8fc3:/app$ ls -lAh /tmp/
total 36K
drwx------ 3 pptruser pptruser 4.0K Jun 17 09:09 puppeteer_dev_profile-46TBCn
drwx------ 3 pptruser pptruser 4.0K Jun 17 09:10 puppeteer_dev_profile-EW50RG
drwx------ 3 pptruser pptruser 4.0K Jun 17 09:10 puppeteer_dev_profile-KPAMX2
drwx------ 3 pptruser pptruser 4.0K Jun 17 09:10 puppeteer_dev_profile-LsBKzm
drwx------ 3 pptruser pptruser 4.0K Jun 17 09:09 puppeteer_dev_profile-aD98EK
drwx------ 3 pptruser pptruser 4.0K Jun 17 09:10 puppeteer_dev_profile-fqBvuu
drwx------ 3 pptruser pptruser 4.0K Jun 17 09:10 puppeteer_dev_profile-qo6kj3
drwx------ 3 pptruser pptruser 4.0K Jun 17 09:09 puppeteer_dev_profile-uVomwy
drwx------ 2 root     root     4.0K Jun 17 08:44 tmp.NvsynR0wqx

Besides, I haven't found any sample waitpid implementation in that example repo of yours.

@bryanlarsen

bryanlarsen commented Jul 4, 2018

Rather than using tini or dumb-init to reap zombies, you can pass the --init flag to docker, or pass the --docker-disable-shared-pid=false flag to Kubernetes' kubelet and make sure the kubelet is configured to use pause:3.1.
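
For the Docker case that is just, e.g. (image and command are placeholders):

docker run --init --rm my-puppeteer-image node app.js

With --init, Docker runs a tiny init process as PID 1 inside the container, which reaps re-parented zombie children.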

@bryanlarsen

More information I've discovered:

The zombies are created using a "double fork", so they have a parent PID of 1. So waitpid won't work unless your node process has a PID of 1. That's certainly possible in a container environment, though.

So @webyneter, you can either use an init or waitpid.
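
In Node terms, the caveat looks like this (a sketch; the actual reaping call is whatever binding you use, e.g. the waitpid2 addon mentioned above):

if (process.pid === 1) {
  // We are init: double-forked zombies re-parent to us, so waitpid can reap them.
} else {
  // Zombies re-parent to the real PID 1, so waitpid from here never sees them.
  console.warn(`Running as PID ${process.pid}; use an init (tini, dumb-init, docker --init) instead`);
}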

@ratkorle

ratkorle commented Oct 17, 2018

Hi guys. I have a Lambda function with Puppeteer. It runs perfectly and returns what I expect, but it never exits and I get a timeout. The only thing I don't do on the Puppeteer side is

await browser.close()

to prevent opening 100 browsers and pages.
Do you have any idea?

this is the log I am receiving:

14:07:03
START RequestId: 6df967f7-787a-5cda-9e10-33b63c469bd2 Version: $LATEST

14:07:03
2018-10-17T14:07:03.877Z 6df967f7-787a-5cda-9e10-33b63c469bd2 getting browser

14:07:03
2018-10-17T14:07:03.885Z 6df967f7-787a-5cda-9e10-33b63c469bd2 setup local chrome

14:07:05
2018-10-17T14:07:05.376Z 6df967f7-787a-5cda-9e10-33b63c469bd2 setup done

14:07:06
2018-10-17T14:07:06.149Z 6df967f7-787a-5cda-9e10-33b63c469bd2 Launch chrome: HeadlessChrome/67.0.3361.0

14:07:06
2018-10-17T14:07:06.149Z 6df967f7-787a-5cda-9e10-33b63c469bd2 opening new page

14:07:06
2018-10-17T14:07:06.967Z 6df967f7-787a-5cda-9e10-33b63c469bd2 page opened

14:07:07
2018-10-17T14:07:07.069Z 6df967f7-787a-5cda-9e10-33b63c469bd2 viewport set

14:07:07
2018-10-17T14:07:07.170Z 6df967f7-787a-5cda-9e10-33b63c469bd2 userAgent set

14:07:07
2018-10-17T14:07:07.171Z 6df967f7-787a-5cda-9e10-33b63c469bd2 getting action from commandArray

14:07:07
2018-10-17T14:07:07.171Z 6df967f7-787a-5cda-9e10-33b63c469bd2 case connect

14:07:07
2018-10-17T14:07:07.171Z 6df967f7-787a-5cda-9e10-33b63c469bd2 connecting to: https://www.google.com/

14:07:07
2018-10-17T14:07:07.541Z 6df967f7-787a-5cda-9e10-33b63c469bd2 succesfully connected to https://www.google.com/

14:07:08
2018-10-17T14:07:08.542Z 6df967f7-787a-5cda-9e10-33b63c469bd2 case connect passed

14:07:08
2018-10-17T14:07:08.542Z 6df967f7-787a-5cda-9e10-33b63c469bd2

14:07:08
2018-10-17T14:07:08.542Z 6df967f7-787a-5cda-9e10-33b63c469bd2 {}

14:07:08
2018-10-17T14:07:08.543Z 6df967f7-787a-5cda-9e10-33b63c469bd2 puppeteer done

14:07:08
2018-10-17T14:07:08.543Z 6df967f7-787a-5cda-9e10-33b63c469bd2 promise finished

14:12:03
END RequestId: 6df967f7-787a-5cda-9e10-33b63c469bd2

14:12:03
REPORT RequestId: 6df967f7-787a-5cda-9e10-33b63c469bd2 Duration: 300100.24 ms Billed Duration: 300000 ms Memory Size: 3008 MB Max Memory Used: 354 MB

14:12:03
2018-10-17T14:12:03.975Z 6df967f7-787a-5cda-9e10-33b63c469bd2 Task timed out after 300.10 seconds

exports.handler = function (event, context, callback) {
  const promises = [];
  const records = event["Records"];
  for (let record of records) {
    const message = JSON.parse(record.body);
    const promise = scrapper.parseEngine(message.commands, null, null, null);
    promises.push(promise);
  }
  Promise.all(promises).then((data) => {
    console.log('promise finished');
    callback(null, data);
  }).catch((err) => {
    console.log('error', err);
    callback(err);
  });
};

@natterstefan

natterstefan commented Oct 27, 2018

Hi @bryanlarsen, hi @bahattincinic,

thanks for your helpful comments. I'm pretty new to the Docker & Puppeteer topic, so please excuse me if I ask (again):

you can either use an init or waitpid.

What do you mean @bryanlarsen? Can I just use the solution proposed on puppeteer/troubleshooting? My current image looks similar to:

FROM node:8.11.1-alpine

# ... stuff

# It's a good idea to use dumb-init to help prevent zombie chrome processes.
ADD https://github.com/Yelp/dumb-init/releases/download/v1.2.0/dumb-init_1.2.0_amd64 /usr/local/bin/dumb-init
RUN chmod +x /usr/local/bin/dumb-init

# ... stuff
ENTRYPOINT ["dumb-init", "--"]

# start the app
CMD [ "node", "app/index.js"]

Or how can I (should I) use waitpid?

@bahattincinic I can't find waitpid in your example-repo (see comment #1825 (comment)). Am I missing something?

And do you run

We delete /tmp folder after process completed (rm -r /tmp/core.* || true)

from your Node app after await browser.close();? Do you use rimraf for this?

Thanks to both of you for your time and help!

@insanehong

insanehong commented Nov 13, 2018

I'm trying a process lookup and then killing the process. How about this?

const puppeteer = require('puppeteer');
const ps = require('ps-node-promise-es6');
const _ = require('lodash');

async function getWebPageHtml(targetUrl) {
  const browser = await puppeteer.launch();
  const browserPid = browser._process.pid; // private API; newer Puppeteer exposes browser.process().pid
  const page = await browser.newPage();

  try {
    const response = await page.goto(targetUrl);
  } catch (error) {
    throw new Error(error);
  } finally {
    await page.close();
    await browser.close();
    // Look up the browser PID; if anything is still there, SIGKILL it.
    const psLookup = await ps.lookup({ pid: browserPid });

    for (let proc of psLookup) {
      if (_.has(proc, 'pid')) {
        await ps.kill(proc.pid, 'SIGKILL');
      }
    }
  }
}

@yale8848

@insanehong I just tested your code, but it doesn't work well. It seems the browser PID isn't found.

@bahattincinic
Author

@insanehong Can you try the following code?

const waitpid2 = require('waitpid2');
const child_process = require('child_process');

const cleanEnv = function () {
  console.log('cleanEnv:waitpid');
  // Reap any zombie children left over from previous invocations.
  while (waitpid2.waitpid(-1, 0 | waitpid2.WNOHANG) == -1);
  // Remove leftover Chromium core dumps from /tmp.
  child_process.exec('rm -r /tmp/core.* || true');
};

const screenCapture = function () {
  // do something.
};

exports.handler = function (event, context, callback) {
  cleanEnv();
  screenCapture();
};

@marius080

I've overcome these issues by adding these flags for headless Chrome:

const chromeFlags = [
    '--headless',
    '--no-sandbox',
    "--disable-gpu",
    "--single-process",
    "--no-zygote"
]

I think the child processes are orphaned when the parent is killed, and that leads to the zombies. With this, I only get one process and it works pretty well.
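
For completeness, a sketch of how those flags would be passed (assuming the chromeFlags array above):

const browser = await puppeteer.launch({ args: chromeFlags });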

wmfgerrit pushed a commit to wikimedia/mediawiki-services-chromium-render that referenced this issue Jul 27, 2020
Attempts to prevent leaking zombie chromium processes by running
chromium with the --single-process and --no-zygote flags.

Flag descriptions:
https://kapeli.com/cheat_sheets/Chromium_Command_Line_Switches.docset/Contents/Resources/Documents/index

GitHub issue where this approach was reported to be successful:
puppeteer/puppeteer#1825

Chromium related docs (for background):
https://www.chromium.org/developers/design-documents/multi-process-architecture
https://chromium.googlesource.com/chromium/src.git/+/master/docs/linux/zygote.md

N.B. If the approach is agreed upon and this patch merged, the
change will also need to be made separately for production in
operations/deployment-charts.

Bug: T257679
Change-Id: I8420c4b9e967e098f279fc26e0128e6169a204c6
@zdm

zdm commented Sep 2, 2020

@marius080 Even with your settings, zombie processes are left after browser.close().

@Kikobeats
Contributor

What worked for me was destroying the PID and its associated subprocesses:
https://github.com/microlinkhq/browserless/blob/master/packages/browserless/src/driver.js#L66

@g00dnatur3

g00dnatur3 commented Jan 12, 2021

This is the code I created to fix it. I've tested it; no more dangling Chromes :)

const browser = await puppeteer.launch({ args: ['--no-sandbox'] });
const process = browser.process(); // Chromium ChildProcess (shadows the Node global in this scope)
const killBrowser = (retries = 5) => {
  if (retries === 0) {
    return; // exit condition
  }
  if (process && process.pid && process.kill && !process.killed) {
    setTimeout(() => {
      console.log(`BROWSER Process Id: ${process.pid}, KILLING IT! retries:`, retries);
      if (!process.kill('SIGKILL')) {
        // kill() returned false, so the signal wasn't delivered; retry
        retries--;
        killBrowser(retries);
      }
    }, 200);
  }
};

Just call killBrowser(); the retries are built in.

@zdm

zdm commented Jan 12, 2021

If you run it under Docker, you need to use the docker --init option.


@voxsoftware

const child_process = require('child_process');
const pid = browser.process().pid; // Chromium PID, captured before disconnecting

browser.on('disconnected', () => {
  console.log('sleeping 100ms'); // sleep to eliminate a race condition
  setTimeout(function () {
    console.log(`Browser Disconnected... Process Id: ${pid}`);
    child_process.exec(`kill -9 ${pid}`, (error, stdout, stderr) => {
      if (error) {
        console.log(`Process Kill Error: ${error}`);
      }
      console.log(`Process Kill Success. stdout: ${stdout} stderr: ${stderr}`);
    });
  }, 100);
});

Same problem on Manjaro using headless. I will test your workaround.

@ndbroadbent

ndbroadbent commented Feb 25, 2021

Hi everyone, just wanted to provide a quick warning about the --single-process flag. I have some integration tests that were broken after I started using this flag, because it broke the font rendering and kerning for the generated PDF. I found this Process Models page in the chromium docs:

Finally, for the purposes of comparison, Chromium supports a single process model that can be enabled using the --single-process command-line switch. In this model, both the browser and rendering engine are run within a single OS process.

The single process model provides a baseline for measuring any overhead that the multi-process architectures impose. It is not a safe or robust architecture, as any renderer crash will cause the loss of the entire browser process. It is designed for testing and development purposes, and it may contain bugs that are not present in the other architectures.

I can confirm that the single process model does cause a rendering bug (at least on Chromium 88.x), and that this rendering bug is not present when I remove the --single-process flag. So I would not recommend using this flag, since the docs say that it's only designed for testing and development purposes, and it shouldn't really be used in production.

@pitiboy

pitiboy commented Mar 8, 2021

I'm also using Puppeteer in Docker, and I had also tried
puppeteer.launch({ args: ['--no-sandbox', '--no-zygote'] });
but that did not help.

Eventually, I figured out that the init: true flag solves the orphaned-zombie-process problem. It can be used with docker-compose, according to the Docker documentation: https://docs.docker.com/compose/compose-file/compose-file-v3/#init
(@bryanlarsen and @zdm also mentioned the --init flag for docker, and I also gained inspiration from this great blog post to understand it more: https://blog.phusion.nl/2015/01/20/docker-and-the-pid-1-zombie-reaping-problem/ )
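
For reference, a minimal docker-compose sketch of that flag (service and image names are placeholders):

version: "3.7"
services:
  renderer:
    image: my-puppeteer-app
    init: true  # run an init as PID 1 so zombie children get reaped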

@eldoy

eldoy commented Aug 8, 2022

This works for me on Debian 10; it just kills the process group based on the browser PID:

const browser = await puppeteer.launch({ args: ['--no-sandbox'] })
const page = await browser.newPage()

try {
  await page.goto(url, { waitUntil: 'networkidle2', timeout: 10000 })
  await page.screenshot({ path })
  await page.close()
  await browser.close()
} catch (e) {
  console.error(e)
} finally {
  // Negative PID signals Chromium's whole process group, not just the main process.
  const pid = -browser.process().pid
  try {
    process.kill(pid, 'SIGKILL')
  } catch (e) {}
}

@devopsmash

Here is what happens while Puppeteer is working:

  1. Before sending the request - no Chrome processes yet:
STAT PID   PPID  COMMAND          COMMAND
S        1     0 dumb-init        /usr/local/bin/dumb-init -- node server.js
S        7     1 node             node server.js
S      135     0 sh               sh
  2. After sending a request to render the web page - Puppeteer spawns 7 Chromium processes:
STAT PID   PPID  COMMAND          COMMAND
S        1     0 dumb-init        /usr/local/bin/dumb-init -- node server.js
S        7     1 node             node server.js
S      135     0 sh               sh
S      630     7 chrome           /usr/lib/chromium/chrome --extra-plugin-dir=/usr/lib/nsbrowser/plugins --allow-pre-commit-input --disable-background-networking --enable-features=NetworkServiceInProces
S      634   630 chrome           /usr/lib/chromium/chrome --type=zygote --no-zygote-sandbox --no-sandbox --headless --headless
S      635   630 chrome           /usr/lib/chromium/chrome --type=zygote --no-sandbox --headless --headless
S      655   630 chrome           /usr/lib/chromium/chrome --type=utility --utility-sub-type=network.mojom.NetworkService --lang=en-US --service-sandbox-type=none --no-sandbox --disable-dev-shm-usage --
S      659   635 chrome           /usr/lib/chromium/chrome --type=renderer --headless --lang=en-US --no-sandbox --disable-dev-shm-usage --disable-background-timer-throttling --disable-breakpad --enable-
S      698   634 chrome           /usr/lib/chromium/chrome --type=gpu-process --no-sandbox --disable-dev-shm-usage --disable-breakpad --headless --ozone-platform=headless --use-angle=swiftshader-webgl -
R      710   635 chrome           /usr/lib/chromium/chrome --type=renderer --headless --lang=en-US --no-sandbox --disable-dev-shm-usage --disable-background-timer-throttling --disable-breakpad --enable-
  3. 5 seconds after sending the request - you can see 3 zombie processes:
S        1     0 dumb-init        /usr/local/bin/dumb-init -- node server.js
S        7     1 node             node server.js
S      135     0 sh               sh
S      630     7 chrome           /usr/lib/chromium/chrome --extra-plugin-dir=/usr/lib/nsbrowser/plugins --allow-pre-commit-input --disable-background-networking --enable-features=NetworkServiceInProces
S      634   630 chrome           /usr/lib/chromium/chrome --type=zygote --no-zygote-sandbox --no-sandbox --headless --headless
S      635   630 chrome           /usr/lib/chromium/chrome --type=zygote --no-sandbox --headless --headless
Z      655   630 chrome           [chrome]
Z      659   635 chrome           [chrome]
S      698   634 chrome           /usr/lib/chromium/chrome --type=gpu-process --no-sandbox --disable-dev-shm-usage --disable-breakpad --headless --ozone-platform=headless --use-angle=swiftshader-webgl -
Z      710   635 chrome           [chrome]
  4. 10 seconds after sending the request - everything is back to normal thanks to dumb-init:
STAT PID   PPID  COMMAND          COMMAND
S        1     0 dumb-init        /usr/local/bin/dumb-init -- node server.js
S        7     1 node             node server.js
S      135     0 sh               sh

My question is: why did processes 655, 659, and 710 become zombies? What in the flow is not closing them properly?
I'd prefer to avoid using a third-party tool like dumb-init to overcome this issue.

I'm using the Docker image node:16.14.2-alpine3.15 on AWS EKS 1.22
Command to list processes: ps -A -ostat,pid,ppid,comm,args

It seems that using the --single-process arg could solve this, because it would use only 1 process instead of 7, but due to the issues with --single-process mentioned above, I prefer not to use it.

@BoD

BoD commented Oct 14, 2022

Sorry for the noise, I just wanted to confirm in case it helps somebody: in my case, simply adding --init to the docker command did indeed work.

@heaven

heaven commented Nov 14, 2023

None of this helps when the browser crashes, which happens quite often since we upgraded to Node 18 and a newer Puppeteer.

The child process isn't properly detached from the parent, I guess, and thus stays a zombie until the parent exits.

@rajeshpal53

I have thoroughly reviewed the documentation and exhausted all available solutions in an attempt to resolve the zombie process issue. Despite my efforts, the problem persisted. I attempted to terminate process IDs, but within the pods, the zombie processes remained resilient. Devoting several consecutive days to diligently updating every package eventually proved successful. The issue was ultimately resolved by making key adjustments: switching the operating system from Node Alpine to Node Slim Linux and transitioning from Chromium to Chrome as the browser. The specific changes implemented to rectify the problem are outlined below.

If you are working with Puppeteer and encountering zombie process issues, consider employing the following Docker commands. These commands have proven effective in preventing the creation of zombie processes.

FROM node:18-slim
RUN apt-get update
RUN apt-get upgrade

RUN apt-get update && apt-get install curl gnupg -y \
  && curl --location --silent dl-ssl.google.com/linux/linux_sign... | apt-key add - \
  && sh -c 'echo "deb [arch=amd64] dl.google.com/linux/chrome/deb/ stable main" >> /etc/apt/sources.list.d/google.list' \
  && apt-get update \
  && apt-get install google-chrome-stable -y --no-install-recommends \
  && rm -rf /var/lib/apt/lists/*

RUN apt-get update && \
  apt-get upgrade && apt-get install -y vim

ADD ./puppetron.tar /usr/share/
WORKDIR /usr/share/puppetron

ENV PUPPETEER_SKIP_CHROMIUM_DOWNLOAD=true
ENV SERVICE_PATH=/usr/share/puppetron

CMD node main.js


Change the browser executable path to:
executablePath: '/usr/bin/google-chrome',
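
For example, a launch call matching that image might look like this (args are illustrative):

const browser = await puppeteer.launch({
  executablePath: '/usr/bin/google-chrome',
  args: ['--no-sandbox', '--disable-dev-shm-usage'],
});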

@v-dev-cl

v-dev-cl commented May 1, 2024

I've overcome these issues by adding the flags for chrome headless:

const chromeFlags = [
    '--headless',
    '--no-sandbox',
    "--disable-gpu",
    "--single-process",
    "--no-zygote"
]

I think the child processes are orphaned when the parent is killed and that leads to the zombies. With this, I only get one process and it works pretty well

adding "--headless" worked for me, i had a lot of chrome_crashpad that were created per each request even after closing without errors the page and browser.
