
Downtime when using reload and wait_ready #2478

Closed · bjrnt opened this issue Oct 28, 2016 · 9 comments

bjrnt commented Oct 28, 2016

Expected behaviour

When pm2 startOrReload ./ecosystem.json runs, pm2 starts new nodes then waits for them to come online, and then kills the old ones.

Actual behaviour

pm2 kills the old nodes right away, two new ones are started, but since there's a delay until they come online, there is downtime.

(Screenshot from Apache JMeter; the Error % column shows that the server was unreachable for some time.)

Steps to reproduce

app.js

const http = require('http');

const port = 3000;

const app = http.createServer((_, res) => {
  res.writeHead(200);
  res.end('Hello!');
});

console.log('Starting app...');

// Swallow SIGINT on purpose, so PM2 cannot stop this process gracefully.
process.on('SIGINT', () => {
  console.log("Just got SIGINTed, but I don't care");
});

// Simulate a slow start: only begin listening after 10 seconds.
setTimeout(() => {
  app.listen(port, () => {
    console.log(`Listening on ${port}`);
    if (process.send) {
      process.send('ready'); // tell PM2 this instance is ready (wait_ready)
    }
  });
}, 10000);

ecosystem.json

{
  "apps": [{
    "name": "app",
    "script": "app.js",
    "instances": 2,
    "exec_mode": "cluster",
    "wait_ready": true
  }]
}

Run pm2 startOrReload ./ecosystem.json twice.

Software versions used

OS         : OS X
node.js    : 5.7.0
PM2        : pm2@next (includes the `wait_ready` PR)

PM2 Log output

PM2 Process 0 in a stopped status, starting it
PM2 Stopping app:app id:0
PM2 Process 1 in a stopped status, starting it
PM2 Stopping app:app id:1
PM2 App name:app id:1 disconnected
PM2 App [app] with id [1] and pid [33386], exited with code [0] via signal [SIGINT]
PM2 Starting execution sequence in -cluster mode- for app name:app id:1
PM2 App name:app id:0 disconnected
PM2 App [app] with id [0] and pid [33382], exited with code [0] via signal [SIGINT]
PM2 Starting execution sequence in -cluster mode- for app name:app id:0
app-1 Starting app...
app-0 Starting app...
app-1 Listening on 3000
PM2 App name:app id:1 online
app-0 Listening on 3000
PM2 App name:app id:0 online

As you can see, both app id:1 and app id:0 are disconnected before the new nodes are online.

Also, even though it says the app exited via signal SIGINT, the SIGINT handler in app.js was never run.
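
For reference, a graceful-shutdown counterpart to the repro above might look like the sketch below (same server and port as app.js; it signals readiness once it is listening and, instead of ignoring SIGINT, closes the server and exits, which is what PM2's reload cycle generally relies on):

const http = require('http');

const port = 3000;

const server = http.createServer((_, res) => {
  res.writeHead(200);
  res.end('Hello!');
});

server.listen(port, () => {
  // Tell PM2 this instance is ready to receive traffic (wait_ready).
  if (process.send) {
    process.send('ready');
  }
});

// On SIGINT, stop accepting new connections, let in-flight requests finish,
// then exit so PM2 does not have to fall back to SIGKILL.
process.on('SIGINT', () => {
  server.close(() => process.exit(0));
});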

Unitech (Owner) commented Oct 31, 2016

Alright, the stop/SIGINT/wait_ready/reload functions have been refactored and are working much better.

Could you please try again:

$ npm install pm2@next -g
$ pm2 update

For a long startup (like in your example, > 10 s), make sure you increase the listen-timeout option:

$ pm2 start app.js --wait-ready --listen-timeout 15000

bjrnt (Author) commented Nov 1, 2016

Thanks for the quick response!

I didn't get it to work with the --listen-timeout option. To be clear, what I'm doing is running the following command twice, with enough time between the runs for the app to start: pm2 startOrReload ./ecosystem.config.js --watch-ready --listen-timeout 15000. This still produced some downtime during the reload.

However, I managed to get zero downtime reloads with the following command: PM2_GRACEFUL_LISTEN_TIMEOUT=15000 pm2 startOrReload ./ecosystem.config.js --wait-ready.

Log output produced by the second command:

2016-11-01 12:14:10: Starting execution sequence in -cluster mode- for app name:app id:0
2016-11-01 12:14:10: App name:app id:0 online
2016-11-01 12:14:10: Starting execution sequence in -cluster mode- for app name:app id:1
2016-11-01 12:14:10: App name:app id:1 online
2016-11-01 12:14:45: Starting execution sequence in -cluster mode- for app name:app id:0
2016-11-01 12:14:45: Starting execution sequence in -cluster mode- for app name:app id:1
2016-11-01 12:14:46: App name:app id:0 online
2016-11-01 12:14:46: App name:app id:1 online
2016-11-01 12:14:56: -reload- New worker listening
2016-11-01 12:14:56: Stopping app:app id:_old_1
2016-11-01 12:14:56: -reload- New worker listening
2016-11-01 12:14:56: Stopping app:app id:_old_0
2016-11-01 12:14:57: Process with pid 67482 still not killed, retrying...
2016-11-01 12:14:57: Process with pid 67481 still not killed, retrying...
2016-11-01 12:14:57: Process with pid 67482 still not killed, retrying...
2016-11-01 12:14:57: Process with pid 67481 still not killed, retrying...
2016-11-01 12:14:57: Process with pid 67482 still not killed, retrying...
2016-11-01 12:14:57: Process with pid 67481 still not killed, retrying...
2016-11-01 12:14:57: Process with pid 67482 still not killed, retrying...
2016-11-01 12:14:58: Process with pid 67481 still not killed, retrying...
2016-11-01 12:14:58: Process with pid 67482 still not killed, retrying...
2016-11-01 12:14:58: Process with pid 67481 still not killed, retrying...
2016-11-01 12:14:58: Process with pid 67482 still not killed, retrying...
2016-11-01 12:14:58: Process with pid 67481 still not killed, retrying...
2016-11-01 12:14:58: Process with pid 67482 still not killed, retrying...
2016-11-01 12:14:58: Process with pid 67481 still not killed, retrying...
2016-11-01 12:14:58: Process with pid 67482 still not killed, retrying...
2016-11-01 12:14:58: Process with pid 67481 still not killed, retrying...
2016-11-01 12:14:58: Process with pid 67482 still not killed, retrying...
2016-11-01 12:14:58: Process with pid 67481 still not killed, retrying...
2016-11-01 12:14:58: Process with pid 67482 still not killed, retrying...
2016-11-01 12:14:58: Process with pid 67481 still not killed, retrying...
2016-11-01 12:14:58: Process with pid 67482 still not killed, retrying...
2016-11-01 12:14:58: Process with pid 67481 still not killed, retrying...
2016-11-01 12:14:58: Process with pid 67482 still not killed, retrying...
2016-11-01 12:14:58: Process with pid 67482 still not killed, retrying...
2016-11-01 12:14:59: Process with pid 67482 still not killed, retrying...
2016-11-01 12:14:59: Process with pid 67482 still not killed, retrying...
2016-11-01 12:14:59: Process with pid 67482 still alive after 1600ms, sending it SIGKILL now...

Unitech (Owner) commented Nov 2, 2016

--listen-timeout overrides the PM2_GRACEFUL_TIMEOUT environment variable.

You should not pass extra options on the CLI when using a configuration file (pm2 start ecosystem.conf.js); declare listen_timeout and wait_ready in the configuration file instead.

module.exports = {
  apps : [{
      name      : "HTTP",
      script    : "http.js",
      listen_timeout: 10000,
      exec_mode : 'cluster',
      wait_ready : true
  }]
}

With JMeter and your test scenario, what did the interface display when you succeeded in gracefully reloading your app?

Thanks!

bjrnt (Author) commented Nov 3, 2016

I tried editing ecosystem.conf.js as you specified, but strangely that still produced downtime.

Here are the logs:

PM2 Process 0 in a stopped status, starting it
PM2 Stopping app:app id:0
PM2 Process 1 in a stopped status, starting it
PM2 Stopping app:app id:1
PM2 App name:app id:1 disconnected
PM2 App [app] with id [1] and pid [95376], exited with code [0] via signal [SIGINT]
PM2 App name:app id:0 disconnected
PM2 App [app] with id [0] and pid [95265], exited with code [0] via signal [SIGINT]
PM2 Starting execution sequence in -cluster mode- for app name:app id:1
PM2 Starting execution sequence in -cluster mode- for app name:app id:0
app-0 Starting app...
app-1 Starting app...
app-1 Listening on 3000
app-0 Listening on 3000
PM2 App name:app id:0 online
PM2 App name:app id:1 online

Specifically, what I'm looking at in JMeter is the Error % field. When the app was successfully reloaded, the value stayed at 0.00%.

Here's my full app:

ecosystem.conf.js

module.exports = {
  apps: [{
    name: 'app',
    script: 'app.js',
    instances: 2,
    exec_mode: 'cluster',
    wait_ready: true,
    listen_timeout: 15000
  }]
};

app.js

const http = require('http');

const port = 3000;

const app = http.createServer((_, res) => {
  res.writeHead(200);
  res.end('Hello!');
});

console.log('Starting app...');

// Simulate a slow start: only begin listening after 10 seconds.
setTimeout(() => {
  app.listen(port, () => {
    console.log(`Listening on ${port}`);
    if (process.send) {
      process.send('ready');
    }
  });
}, 10000);

Command: pm2 startOrReload ./ecosystem.config.js
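
For anyone reproducing this without JMeter, a rough client-side check along these lines (a sketch, not part of the original setup; the file name check.js is arbitrary) can approximate the Error % measurement by counting failed requests while the reload runs in another terminal:

// check.js — hit the app above on localhost:3000 every 50 ms and count
// requests that fail, e.g. while pm2 startOrReload runs elsewhere.
const http = require('http');

let ok = 0;
let failed = 0;

const timer = setInterval(() => {
  http.get('http://localhost:3000', (res) => {
    res.resume(); // drain the response body
    ok += 1;
  }).on('error', () => {
    failed += 1;
  });
}, 50);

// Stop after 60 seconds (long enough to cover one reload) and report.
setTimeout(() => {
  clearInterval(timer);
  console.log(`ok=${ok} failed=${failed}`);
}, 60000);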

vmarchaud (Contributor) commented:

@bjrnt Could you try with pm2 startOrGracefulReload ecosystem.config.js?

bjrnt (Author) commented Nov 4, 2016

@vmarchaud Getting the same results with graceful reloading.

Logs

PM2 Process 0 in a stopped status, starting it
PM2 Stopping app:app id:0
PM2 Process 1 in a stopped status, starting it
PM2 Stopping app:app id:1
PM2 App name:app id:0 disconnected
PM2 App [app] with id [0] and pid [8219], exited with code [0] via signal [SIGINT]
PM2 Starting execution sequence in -cluster mode- for app name:app id:0
PM2 App name:app id:1 disconnected
PM2 App [app] with id [1] and pid [8651], exited with code [0] via signal [SIGINT]
PM2 Starting execution sequence in -cluster mode- for app name:app id:1
app-0 Starting app...
app-1 Starting app...
app-0 Listening on 3000
PM2 App name:app id:0 online
app-1 Listening on 3000
PM2 App name:app id:1 online

Unitech (Owner) commented Nov 4, 2016

With reference to your app.js and ecosystem.conf.js, could you do:

$ pm2 kill
$ pm2 start ecosystem.conf.js
$ pm2 reload ecosystem.conf.js

And check, with a clean environment, whether it reloads without issue.

Unitech (Owner) commented Nov 4, 2016

I just tried the exact same example with wrk and it works (pm2@2.1.3).

Unitech (Owner) commented Nov 7, 2016

Patch available on pm2@2.1.4 (main):

$ npm install pm2 -g
$ pm2 update
