New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Healthcheck not restarting unhealthy module #6358
Comments
After doing more searching in this Git repo, I found #5746. That ticket talks about an |
@gri6507 - Yes, I'm afraid that policy is also not supported. Currently, our recommendation is that you use the docker health checks. Like the issue you linked suggested, you can post it as a suggestion and vote - https://aka.ms/iotedge-suggestion. I'll work with someone to make sure the documentation reflects this too. Sorry about that. |
As workaround, you can achieve the desired result without relying on IoT Edge runtime. For example for the health check below:
...use these createOptions:
If you use this approach, ensure |
@veyalla - thank you for that workaround idea. One thing that isn't clear is what is the HostPort 9070 or why |
9070 is just an example. The uber point is that only the module knows whether it is healthy or not. So it exposes a binary health signal on a local endpoint of its choosing (9070 in this example). Docker engine is configured to periodically call the health endpoint (per the Healthcheck parameters) and crashes the module if unhealthy status is returned. EdgeAgent will subsequently restart the module, and that hopefully unstucks the module from a "running but not working" state. |
@veyalla - I agree with your statement that
However, I disagree with your continuation
Docker does not stop the container. It simply marks it as unhealthy. If docker actually did stop the container, then I agree that EdgeAgent would have picked that up and restarted the Edge module. |
My expectation is that the module exits if the health endpoint indicates unhealthy, which is equivalent to crashing itself. Example from heath check above:
|
Hi @veyalla I am digging arount in the issues to find the right way for a working health check in iotedge v1.2. When i set the the restart policy, in the Manifest, like: modules.{moduleId}.restartPolicy = always, i get always the following result on the edgeHost, no matter what value i use: $ docker inspect --format '{{json .HostConfig.RestartPolicy}}' mycustommodule| jq
{
"Name": "",
"MaximumRetryCount": 0
}
# This is wrong for my opinion, the name Should have the value always, i dont know how to change this value, whenn i do it in the manifest, the container does not startup.
$ docker inspect --format '{{json .Config.Healthcheck}}' mycustommodule | jq
{
"Test": [
"CMD-SHELL",
"curl --fail http://localhost:80/health || exit 1"
],
"Interval": 30000000000,
"Timeout": 10000000000,
"Retries": 3
}
# the Healtcheck locks good to me and the calls are made by docker, i see the calls in the logs For information, the configured of the health is in the dockerfile: FROM mcr.microsoft.com/dotnet/aspnet:6.0-bullseye-slim AS base
WORKDIR /app
EXPOSE 80
HEALTHCHECK --interval=30s --timeout=10s --retries=3 CMD curl --fail http://localhost:80/health || exit 1
# Install prerequisites for healthcheck
RUN apt-get update && apt-get install -y \
curl && \
rm -rf /var/lib/apt/lists/*
... My Question is: What else i have to do for a working restart after my container goes to unhealthy state? |
I'm not sure what the expectation here is. The health-check mechanism is transparent to IoT Edge and controlled exclusively by the Docker engine. There will be nothing reflected in Edge Agent's twin. You should just confirm the container is indeed restarted by Docker when health endpoint reports unhealthy. |
that is also what i am expecting |
After found a few things, i think your solution doesn't work out of the box. Because moby-engine does not support the restart behavior on health probe, see moby/moby#22719 . The restart is triggered by orchestrators like docker swarm (docker-compose) or other tools. I Think it should be the job of the edgeAgent. Actually the RestartPolicyManager does only look at the state of the Container and not the Healt. See the remark in the code: iotedge/edge-agent/src/Microsoft.Azure.Devices.Edge.Agent.Core/RestartPolicyManager.cs Line 24 in 719bffd
|
Thanks for investigation. I believe you are correct, Docker Swarm-like restart-on-unhealthy behavior is not implemented in IoT Edge. For feature asks, please file or upvote an idea at https://aka.ms/iotedge-suggestion. However, I was able to test a workaround that achieved the core requirement. The createOptions snippet below will call the health command every minute (controlled by the
|
I created an issue to ask for the implementation of restart "on-unhealthy" here: https://feedback.azure.com/d365community/idea/32f1a962-70a0-ed11-a81b-6045bd79fc6e Please upvote this if you think it is useful. |
Expected Behavior
When a custom module's healthcheck fails, thus putting the module into "unhealthy" status, edgeAgent should restart the module
Current Behavior
edgeAgent does not restart the module
Steps to Reproduce
modulesContent.$edgeAgent.properties.desired
sudo iotedge list && sudo docker container ls
and observe thatiotedge
lists the module asrunning
anddocker
shows the state of the container asUp 10 seconds (starting)
--interval=15s --retries=2
settings. Observe thatiotedge
lists the module asrunning
anddocker
shows the state of the container asUp 1 minute (unhealthy)
edgeAgent
run, it does not restart the custom module. Furthermore,sudo docker logs -f edgeAgent | grep "scheduled to restart after"
does not show any attempts to restart the custom module.Context (Environment)
Output of
iotedge check
Click to show
Device Information
Runtime Versions
docker version
]: 20.10.8+azureAdditional Information
I have also tried a deployment with an explicit Healthcheck specification, but that also did not work
The text was updated successfully, but these errors were encountered: