Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multiple issues with graceful reload #2874

Closed
jiripospisil opened this issue May 10, 2017 · 11 comments
Closed

Multiple issues with graceful reload #2874

jiripospisil opened this issue May 10, 2017 · 11 comments

Comments

@jiripospisil
Copy link

What's going wrong?

It seems the graceful reload command doesn't work properly anymore. Based on the logs, it seems to be confused about the state of the individual processes. It also doesn't seem to be "graceful", as in it doesn't seem to wait until the new process is ready to replace the old one.

How could we reproduce this issue?

'use strict';

const log = (msg) => console.log(`${new Date().toISOString()} - ${msg}`);
let workInterval = null;

log('Started');

setTimeout(() => {
    log('Ready');
    process.send('ready');
    workInterval = setInterval(() => log('Working'), 1000);
}, 5000);

process.once('SIGINT', () => {
    clearInterval(workInterval)
    log('Received SIGINT, exiting in 5 seconds');
    setTimeout(() => {
        log('Exiting...'); // This might not display because of EL delay, that's fine
        process.exit(0);
    }, 5000);
});
pm2 start --instances=2 --wait-ready --listen-timeout=15000 --kill-timeout=15000 test1.js

The script is supposed to log "Started" at the very beginning and then send "ready" (and print it) to the master process after 5 seconds. It then logs "Working" every 1s. When the process is being gracefulReloaded, it's supposed to log the message and exit in 5 seconds.

My understanding of gracefulReload is that for each existing process A (in sequence, not in parallel, basically a sequential reload) it should start a new process A1, wait for it to be "ready", then send SIGINT to A and wait for it to exit.

This is what I'm seeing instead:

...
0|test1    | 2017-05-10T15:05:36.110Z - Working...
1|test1    | 2017-05-10T15:05:36.154Z - Working...
0|test1    | 2017-05-10T15:05:37.115Z - Working...
1|test1    | 2017-05-10T15:05:37.158Z - Working...
0|test1    | 2017-05-10T15:05:38.120Z - Working...
1|test1    | 2017-05-10T15:05:38.163Z - Working...

<pm2 gracefulReload test1 at this point>

PM2        | Process 0 in a stopped status, starting it
PM2        | Stopping app:test1 id:0
PM2        | Process 1 in a stopped status, starting it
PM2        | Stopping app:test1 id:1
0|test1    | 2017-05-10T15:05:38.291Z - Received SIGINT, exiting in 5 seconds
1|test1    | 2017-05-10T15:05:38.294Z - Received SIGINT, exiting in 5 seconds
PM2        | pid=79002 msg=failed to kill - retrying in 100ms
PM2        | pid=79003 msg=failed to kill - retrying in 100ms
PM2        | pid=79002 msg=failed to kill - retrying in 100ms
PM2        | pid=79003 msg=failed to kill - retrying in 100ms
PM2        | pid=79002 msg=failed to kill - retrying in 100ms
PM2        | pid=79003 msg=failed to kill - retrying in 100ms
PM2        | pid=79002 msg=failed to kill - retrying in 100ms

<a lot of "retrying in 100ms" messages>

0|test1    | 2017-05-10T15:05:43.294Z - Exiting...
1|test1    | 2017-05-10T15:05:43.295Z - Exiting...
PM2        | App name:test1 id:0 disconnected
PM2        | App name:test1 id:1 disconnected
PM2        | App [test1] with id [0] and pid [79002], exited with code [0] via signal [SIGINT]
PM2        | App [test1] with id [1] and pid [79003], exited with code [0] via signal [SIGINT]
PM2        | pid=79002 msg=process killed
PM2        | Starting execution sequence in -cluster mode- for app name:test1 id:0
PM2        | pid=79003 msg=process killed
PM2        | Starting execution sequence in -cluster mode- for app name:test1 id:1
PM2        | App name:test1 id:0 online
PM2        | App name:test1 id:1 online
0|test1    | 2017-05-10T15:05:43.558Z - Started
1|test1    | 2017-05-10T15:05:43.559Z - Started
0|test1    | 2017-05-10T15:05:48.561Z - Ready
1|test1    | 2017-05-10T15:05:48.562Z - Ready
1|test1    | 2017-05-10T15:05:49.564Z - Working...
0|test1    | 2017-05-10T15:05:49.565Z - Working...

The log mentions that the processes were in "stopped" state but obviously that was not the case because they were logging at the time. It also sent the SIGINT signal to them immediately instead of waiting for the new ones to be ready and replacing them. I'm also a bit confused about the number of "msg=failed to kill" messages but that might a side effect of the above. Any ideas what's wrong? Thanks!

Supporting information

PM2 version: 2.4.6
Node version: v6.10.3
Windows? Mac? Linux? Mac 10.12.04
@jiripospisil
Copy link
Author

Looking around the source code, this seems to be reason for the incorrect behavior. The commit which introduced the change is a53fd17a70 and its commit message seems to suggest it's actually intended to behave this way. If so, is it correct to say that gracefulReload is not supported when the wait-ready functionality is used?

@vmarchaud
Copy link
Contributor

@jiripospisil Indeed thats not normal, we will issue a fix for it.

@vmarchaud vmarchaud self-assigned this May 11, 2017
@Unitech
Copy link
Owner

Unitech commented May 11, 2017

gracefulReload is a deprecated command use reload instead

@jiripospisil
Copy link
Author

Okay, after playing with it for a while it seems to do the right thing, I will use it instead.

One other thing I noticed is that PM2 allows to set the number of concurrent actions (I would like to set it to 1 to go really easy on the machine) but the way it's coded makes it impossible to actually set the user defined value because it will always be overwritten by the Node.js version check (at least in my case). I noticed there's #2690 so maybe you want to tackle that in one go.

@vmarchaud
Copy link
Contributor

@jiripospisil Yeah its planned to fix it, but feel free to make a PR for it :)

@KaiSchwarz-cnic
Copy link

KaiSchwarz-cnic commented Nov 2, 2017

gracefulReload is a deprecated command use reload instead

Does this also apply to startOrGracefulReload? So that using startOrReload is the correct way?

@KaiSchwarz-cnic
Copy link

@Unitech Can you please be so kind to provide us feedback on that topic? TIA!

@vmarchaud
Copy link
Contributor

You should always use reload, even with the startOrReload :)

@KaiSchwarz-cnic
Copy link

Thanks @vmarchaud, very appreciated.

@KaiSchwarz-cnic
Copy link

KaiSchwarz-cnic commented Nov 9, 2017

@vmarchaud we are unsure by the used wording what you exactly suggest. reload if I am right? or startOrReload? Thanks for clarifying this :)

@KaiSchwarz-cnic
Copy link

@vmarchaud please :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants