fho changed the title from "[BUG] docker build process gets stuck when resolving deps and ignores SIGTERM" to "[BUG] docker build process gets stuck on terminate when resolving deps" on Jun 5, 2023
Description
This is related to #10634, but I now have more information about what is going wrong.
I have a large internal docker-compose project with a lot of dependencies.
I run `docker build` and then press ctrl+c.
The terminal becomes broken (as described here).
The `docker-build` plugin process does not terminate. It becomes a child of PID 1.
It does not react to any TERM signals. It seems that it will keep running forever and never terminate.
I compiled compose with debug symbols and the stacktraces show that the process always hangs while resolving dependencies, related to `nodeCh`.
I observed that it gets stuck in 2 different places, sometimes variant 1, sometimes variant 2.
Variant 1: stuck when receiving from `nodeCh`
No `graphTraversal.run` goroutines are running.
1 goroutine executes `graphTraversal.visit`; it hangs at `for node := range nodeCh {`.
It will hang there forever because there are no goroutines left that will send something to the channel.
My wild guess is:
Some `run` functions skipped sending something to `nodeCh` because of the check in compose/pkg/compose/dependencies.go (lines 168 to 171 in 7c3fe35).
It matches because cancelling the ctx cancelled the start of the children services.
No result for those services is sent to `nodeCh`.
The `expect` counter in `visit()` then cannot reach 0, `nodeCh` does not get closed, and `visit()` gets stuck in the `for node := range nodeCh {` loop.
In a run where it happened, `expect` had the value `99`.
Stacktraces of an occurrence:
Variant 2: stuck when sending to `nodeCh`
Multiple goroutines are executing `graphTraversal.run`; all of them got stuck when trying to send to `nodeCh`.
No goroutine is running that executes `graphTraversal.visit`.
Stacktraces:
Steps To Reproduce
Update, added on 5.6.23:
This is actually quite easy to reproduce and also happens with small docker compose projects:
compose.yml:
s0/Dockerfile:
Run `docker compose build`
Press ctrl+c after ~2-3 sec
I expect that docker compose terminates, at the latest after ~20 sec, when the sleep in the Dockerfiles has expired.
But it never terminates.
Compose Version
Docker Environment
Anything else?
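I can not reproduce it with Docker Compose version v2.17.3
DOCKER_BUILDKIT is enabled
It also gets stuck with `--parallel=1`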
My first idea for a fix was to check whether the context got cancelled when receiving from or sending to `nodeCh`.
This works; the issue does not happen anymore.
But this solution now seems to me like a workaround for a bug in another place.
A state in which only senders or only receivers for `nodeCh` are still running should not happen. Maybe this situation can also occur when the ctx is not cancelled.
You can find that change here: fho@677c5fb