Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prevent application from crashing if child process crashes - Closes #3142 #3168

Conversation

lsilvs
Copy link
Contributor

@lsilvs lsilvs commented Mar 22, 2019

What was the problem?

Application was crashing if a child process exited

How did I fix it?

I've implemented a reload mechanism that reload child process when it crashes without impacting the rest of application

How to test it?

Run the application with httpApi and chain modules as child process:
NODE_ENV=test LISK_CHILD_PROCESS_MODULES=httpApi,chain node src/index.js -n devnet

Kill httpApi process (it should reload the module and keep running normally):
pgrep -f http_api | xargs kill -9

Kill chain process (it should reload the module and keep running normally):
pgrep -f chain | xargs kill -9

Review checklist

@lsilvs lsilvs marked this pull request as ready for review March 25, 2019 15:01
@@ -127,6 +130,7 @@ class Controller {
'lisk',
['ready'],
{
isLiskReady: () => this.liskReady,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why we don't need this action? We must not create actions preferably if the it can be sorted through events. And lisk:ready event is the right solution for it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need this when reloading the module

await this.bus.unregisterChannel(moduleAlias);

// Reload the module
this._loadChildProcessModule(alias, Klass, options);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once child process is exit, we must not load it again at least for this release. We can develop more dedicated reload strategy later.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was discussed and decided to reload the faulty module. It can be improved in next versions but it works pretty well as it is.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No its not the right solution, it can lead into infinite reloading and exhaust the system memory.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For current release its good if parent process exit wit proper logs if child process is crashed. Later we can develop a strategy, how to reload modules, or even have configurable options with module load retry etc. This topic needed to have more thought process and a refined solution.

Copy link
Contributor

@MaciejBaj MaciejBaj Mar 25, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can improve the solution later on, but the resilience should be there in 1.6.0.


channel.once('lisk:ready', () => {
if (isLiskReady) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not the right way to do it, lisk:ready event is triggered after load is called. So it must be listened as of previous implementation. I suspect you changed this because of module reload, if yes then as I told earlier, we should develop that strategy as an enhancement later. And leave that implication as it was before.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it was introduced to allow modules to be reloaded in case it crashes and I don't see the reason of reverting this implementation if it works.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If child process cause some memory leaks crash, then this infinite loading will exhaust the resources.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree and we plan to improve this already. This is the first implementation of child process and wont be the final one. Knowing that this implementation can be unstable, I've created a property in the config file to completely disable it if required.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this change is not related to stability. If there is an issue and any of code crash the whole application exited. That's exactly the default behaviour we have from start and we should keep it consistent in this release. In next release we develop resilience with a detailed and extensive approach.

framework/src/controller/controller.js Show resolved Hide resolved
framework/src/controller/bus.js Show resolved Hide resolved
framework/src/controller/channels/child_process_channel.js Outdated Show resolved Hide resolved
framework/src/controller/controller.js Show resolved Hide resolved
framework/src/controller/controller.js Outdated Show resolved Hide resolved
framework/src/modules/chain/index.js Outdated Show resolved Hide resolved
framework/src/modules/chain/index.js Outdated Show resolved Hide resolved
framework/src/modules/http_api/index.js Outdated Show resolved Hide resolved
framework/src/modules/http_api/index.js Outdated Show resolved Hide resolved
@MaciejBaj
Copy link
Contributor

Requested changes are addressed.

@MaciejBaj MaciejBaj merged commit a913216 into feature/child_process_module Mar 25, 2019
@MaciejBaj MaciejBaj deleted the 3142-prevent-application-crashing-with-child branch March 25, 2019 19:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants