
[Feature Request]: Disable keep-alive feature for containers, when not in maintenance mode #422

Open
systemofapwne opened this issue Jan 6, 2024 · 6 comments
Assignees
Labels
enhancement It's not a bug, but it's worth an enhancement.

Comments

@systemofapwne

Description / Beschreibung

When ioBroker crashes, the container keeps running. This is a side effect of the maintenance mode, as one can see on this line.
Because the container keeps running even though ioBroker has crashed, the docker daemon will not restart the container, even if a restart policy is configured.
The image should therefore either make sure that ioBroker is restarted automatically when it crashes, or let the container stop running in that case.
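For illustration, this is the kind of restart policy that can only take effect once the container process actually exits (the command is just an example, not my actual setup):

# example only: this restart policy can only act if the container process actually exits
docker run -d --name iobroker --restart unless-stopped buanet/iobroker:latest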


Background: I have two ioBroker instances: my main instance running on my server (including redis for states and objects) and a satellite. When the satellite loses its connection (restart of the main server or loss of network), it times out, ioBroker simply stops running and leaves behind an "empty" container.

Image version

v8.1.0

Docker logs / Docker Protokoll

2024-01-03 22:50:53.493  - warn: maxcul.0 (153) get state error: DB closed
2024-01-03 22:50:53.493  - warn: maxcul.0 (153) get state error: DB closed
2024-01-03 22:50:53.493  - warn: maxcul.0 (153) get state error: DB closed
2024-01-03 22:50:53.494  - warn: maxcul.0 (153) get state error: DB closed
2024-01-03 22:50:53.494  - warn: maxcul.0 (153) get state error: DB closed

...

    at Timeout._onTimeout (/opt/iobroker/node_modules/iobroker.maxcul/main.js:1930:42)
    at listOnTimeout (node:internal/timers:569:17)
    at process.processTimers (node:internal/timers:512:7)
2024-01-03 22:53:15.237  - error: maxcul.0 (153) Cannot read properties of null (reading 'getCredits')
2024-01-03 22:53:20.236  - error: maxcul.0 (153) uncaught exception: Cannot read properties of null (reading 'getCredits')
2024-01-03 22:53:20.237  - error: maxcul.0 (153) TypeError: Cannot read properties of null (reading 'getCredits')
    at Timeout._onTimeout (/opt/iobroker/node_modules/iobroker.maxcul/main.js:1930:42)
    at listOnTimeout (node:internal/timers:569:17)
    at process.processTimers (node:internal/timers:512:7)
2024-01-03 22:53:20.237  - error: maxcul.0 (153) Cannot read properties of null (reading 'getCredits')

...

2024-01-03 22:53:24.523  - error: socketio.1 (134) Objects database error: read ECONNRESET
2024-01-03 22:53:24.520  - info: host.nuc terminated. Could not reset alive status for instances
2024-01-03 22:53:24.539  - error: maxcul.0 (153) Objects database error: read ECONNRESET
2024-01-03 22:53:24.668  - warn: host.nuc get state error: Connection is closed.
2024-01-03 22:53:24.668  - warn: host.nuc get state error: Connection is closed.
2024-01-03 22:53:24.668  - error: host.nuc Cannot find view "system" for search "instance" : Connection is closed.
2024-01-03 22:53:24.671  - info: host.nuc instance system.adapter.wireless-mbus.0 terminated with code 156 (START_IMMEDIATELY_AFTER_STOP)
2024-01-03 22:53:24.671  - info: host.nuc All instances are stopped.
2024-01-03 22:53:24.679  - warn: host.nuc get state error: Connection is closed.
2024-01-03 22:53:24.681  - warn: host.nuc get state error: Connection is closed.
2024-01-03 22:53:24.681  - error: host.nuc Cannot find view "system" for search "instance" : Connection is closed.
2024-01-03 22:53:24.685  - warn: host.nuc get state error: Connection is closed.
2024-01-03 22:53:24.685  - warn: host.nuc get state error: Connection is closed.
2024-01-03 22:53:24.685  - info: host.nuc instance system.adapter.samsung_tizen.0 terminated with code 0 (NO_ERROR)
2024-01-03 22:53:24.685  - info: host.nuc All instances are stopped.
2024-01-03 22:53:24.685  - error: host.nuc Cannot find view "system" for search "instance" : Connection is closed.
2024-01-03 22:53:24.698  - info: host.nuc instance system.adapter.socketio.1 terminated with code 156 (START_IMMEDIATELY_AFTER_STOP)
2024-01-03 22:53:24.699  - info: host.nuc All instances are stopped.
2024-01-03 22:53:24.711  - warn: host.nuc get state error: Connection is closed.
2024-01-03 22:53:24.712  - warn: host.nuc get state error: Connection is closed.
2024-01-03 22:53:24.712  - error: host.nuc Cannot find view "system" for search "instance" : Connection is closed.
2024-01-03 22:53:24.712  - info: host.nuc instance system.adapter.maxcul.0 terminated with code 156 (START_IMMEDIATELY_AFTER_STOP)
2024-01-03 22:53:24.714  - info: host.nuc All instances are stopped.
2024-01-03 22:53:24.716  - warn: host.nuc get state error: Connection is closed.
2024-01-03 22:53:24.716  - warn: host.nuc get state error: Connection is closed.
2024-01-03 22:53:24.716  - error: host.nuc Cannot find view "system" for search "instance" : Connection is closed.
2024-01-03 22:53:24.718  - info: host.nuc instance system.adapter.javascript.1 terminated with code 0 (NO_ERROR)
2024-01-03 22:53:24.718  - info: host.nuc All instances are stopped.
2024-01-03 22:53:24.718  - warn: host.nuc get state error: Connection is closed.
@buanet
Owner

buanet commented Jan 6, 2024

Since this behavior is intentional, it is not a bug but a feature request.
However, you are right that it has a specific negative impact on ioBroker Slaves. Therefore, I will check if we can safely modify this behavior, at least for Slave Containers.
It might also be possible to link the "keep-alive function" of the container to the maintenance mode.
Thank you very much for your suggestion for improvement.

Regards,
André

@buanet buanet added the enhancement It's not a bug, but it's worth an enhancement. label Jan 6, 2024
@buanet buanet changed the title [Problem]: Container keeps running when ioBroker crashes [Feature Request]: Disable keep-alive feature for containers used as Slaves or make it dependent on the maintenance mode Jan 6, 2024
@systemofapwne
Author

The issue is not limited to satellite instances; any instance that crashes for whatever reason shares this fate.
So I propose, as you suggested, to rework the keep-alive functionality so that it works for all instances, not only for satellite ones.

@systemofapwne systemofapwne changed the title [Feature Request]: Disable keep-alive feature for containers used as Slaves or make it dependent on the maintenance mode [Feature Request]: Disable keep-alive feature for containers, when not in maintenance mode Jan 7, 2024
@buanet
Owner

buanet commented Jan 7, 2024

It is not as easy as it sounds.
Since using the maintenance script is not mandatory yet, people are still using the "hacky way" described in the docs (https://docs.buanet.de/iobroker-docker-image/docs/#iobroker-js-controller-core-updates) to update the js-controller.
Making the keep-alive dependent on the maintenance mode would therefore break a workflow a lot of users are familiar with.
Not sure if we should do this.

@systemofapwne
Author

systemofapwne commented Jan 8, 2024

Calling it "the hacky way" yourself makes it sound like an ideal candidate for deprecation, especially since it lives in this very container's documentation, which can easily be modified ;)

Anyhow, I propose two ideas:

Method 1: One way would be to make users aware of the maintenance mode via a shutdown message:

pkill -u iobroker   # user is about to use the old 'hacky' way
# entrypoint.sh then shows the message below...
ioBroker has stopped working. If you intended to perform maintenance such as upgrading the js-controller, use "maintenance on" before and "maintenance off" after your intended action

This can of course be worded better, including links to the docs. It would nudge people towards using the maintenance mode.
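A rough sketch of what the end of the startup script could do instead of idling forever (purely illustrative, naming and placement are up to you):

# illustrative sketch: wait until no processes of user iobroker are left, then print the hint and exit
while pgrep -u iobroker > /dev/null 2>&1; do
    sleep 10
done
echo "ioBroker has stopped working. If you intended to perform maintenance such as upgrading the js-controller,"
echo "use \"maintenance on\" before and \"maintenance off\" after your intended action."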

Method 2: If, on the other hand, you would like to keep things the way they are (for now), I propose an ENV variable (e.g. IOB_MAINTENANCE_MODE_MANDATORY):

  • When not set or set to false, the container behaves as it does right now.
  • When set, the keep-alive function is only active while in maintenance mode. If maintenance mode is not enabled and ioBroker stops working (e.g. due to a crash), the container simply stops (so docker can restart it, if configured to do so).

I think the second approach would satisfy both of us, even though it wouldn't fix the underlying issue.
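In the startup script this could boil down to something like the following (simplified sketch; the maintenance flag file is a placeholder for whatever marker the maintenance script actually sets):

# simplified sketch of the proposed env-controlled behaviour
if [ "$IOB_MAINTENANCE_MODE_MANDATORY" = "true" ] && [ ! -f /opt/.maintenance ]; then
    echo "ioBroker stopped and maintenance mode is not active, letting the container exit..."
    exit 0
else
    # current behaviour: keep the container alive
    gosu iobroker tail -f /dev/null
fi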

@buanet
Owner

buanet commented Jan 8, 2024

Calling it "the hacky way" yourself makes it sound like an ideal candidate for deprecation, especially since it lives in this very container's documentation, which can easily be modified ;)

Deprecation sounds good, but you are not the one answering the questions and issues of people who don't read the docs or changelogs...

make users aware of the maintenance mode via a shutdown message

pkill -u iobroker kills all processes running as user iobroker, so it might be quite a challenge to display a message in that situation...

I propose an ENV variable

No. I would not support the idea of another env for this. In the end the container has to stay administrable and supportable, and every new env makes it more complicated.
It would just be a repeat of the story where I added an env to deactivate the permission check at startup (because "it takes so long 😮") and then got issues a year later about permission errors when running/updating the container...

Your idea is recorded here as an enhancement. We will see.

Regards,
André

@systemofapwne
Author

Calling it "the hacky way" yourself makes it sound like an ideal candidate for deprecation, especially since it lives in this very container's documentation, which can easily be modified ;)

Deprecation sounds good, but you are not the one answering the questions and issues of people who don't read the docs or changelogs...

make users aware of the maintenance mode via a shutdown message

pkill -u iobroker kills all processes running as user iobroker, so it might be quite a challenge to display a message in that situation...

If you echo a message to /dev/pts/0, it appears on a user's screen if they entered the container in interactive terminal mode (docker exec -it <container> /bin/bash). One can even loop over multiple /dev/pts/* devices to send the message to all ttys that were spawned when entering the container.
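The loop could be as simple as this (illustrative only, not hardened):

# send the notice to every pseudo-terminal that was opened via docker exec -it
for tty in /dev/pts/[0-9]*; do
    [ -c "$tty" ] && echo "ioBroker has stopped working. Use \"maintenance on\" before planned maintenance." > "$tty"
done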
See my workaround Dockerfile below

I propose an ENV variable

No. I would not support the idea of another env for this. In the end the container has to stay administrable and supportable, and every new env makes it more complicated. It would just be a repeat of the story where I added an env to deactivate the permission check at startup (because "it takes so long 😮") and then got issues a year later about permission errors when running/updating the container...

I can understand that you want to keep complexity and workarounds as low as possible, which is a good thing IMHO.

Another method would be running ioBroker via s6 and making sure it is restarted after a crash. But I still have to come up with an idea for how to identify whether a user "crashed" it on purpose via pkill. Maybe by checking the reason for the crash:
if it crashed due to missing db connections (visible in the logs), just restart ioBroker. But this all sounds a bit hacky too.
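Very roughly, the supervision idea would look like this (not tied to s6; the foreground start command and the maintenance flag path are placeholders for whatever the image actually uses):

#!/bin/sh
# supervision sketch: restart ioBroker after an unexpected exit, but not during maintenance
while true; do
    gosu iobroker node /opt/iobroker/node_modules/iobroker.js-controller/controller.js
    if [ -f /opt/.maintenance ]; then
        # deliberate stop (maintenance mode): keep the container alive without restarting
        exec gosu iobroker tail -f /dev/null
    fi
    echo "ioBroker exited unexpectedly, restarting in 10s..."
    sleep 10
done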


In the meantime, I wrote my own Docker container to mitigate crashes, since for an unknown reason my satellite keeps crashing randomly in the same way (sometimes it runs for weeks, sometimes just a few days or only hours).

FROM buanet/iobroker:latest
# WORKAROUND to disable maintenance mode keep-alive (+ reminder for myself, that I disabled it)
RUN sed -i '/^gosu iobroker tail/d' /opt/scripts/iobroker_startup.sh && \
    echo 'if [ -c "/dev/pts/0" ]; then echo "ioBroker has stopped working. If you intended to perform maintenance such as upgrading the js-controller, use \"maintenance on\" before and \"maintenance off\" after your intended action" > /dev/pts/0; fi' >> /opt/scripts/iobroker_startup.sh

This basically modifies the iobroker_startup.sh script so that, instead of ending in gosu iobroker tail -f /dev/null, it ends with

if [ -c "/dev/pts/0" ]; then echo "ioBroker has stopped working. If you intended to perform maintenance such as upgrading the js-controller, use \"maintenance on\" before and \"maintenance off\" after your intended action" > /dev/pts/0; fi

@buanet buanet self-assigned this Apr 5, 2024