[BUG] watch crashes when deleting file #11066

Closed
perosb opened this issue Oct 5, 2023 · 36 comments · Fixed by #11513

Comments

@perosb

perosb commented Oct 5, 2023

Description

When a file in the watched path is deleted (I'm unsure whether the file existed in the container, but it should have, since it had been synced), the watch command crashes and cannot be restarted.

develop:
  watch:
    - action: sync
      path: ${LOCAL_DEPLOY_PATH}\platform
      target: c:/inetpub/wwwroot/
Syncing cm after changes were detected:
  - C:\t\docker\deploy\platform\Web.config.xdt
☺�container 061ae4ad9ec878e7a259e45aa1d7b4bd0dc56468b05497fa18cd71dd5f1c0cbe encountered an error during hcs::System::CreateProcess: failure in a Windows system call: The system cannot find the file specified. (0x2)

Then when trying to restart it is locked:

> docker compose watch --no-up
cannot take exclusive lock for project "kermit": process with PID 20836 is still running

Killing the 20836 process still errors out the same.

Steps To Reproduce

It seems to be reproducible:

watching [C:\t\docker\deploy\platform]
Syncing cm after changes were detected:
  - C:\t\docker\deploy\platform\images.jpg
☻Rtar: Removing leading drive letter from member names
x inetpub/wwwroot/images.jpg☻☻
Syncing cm after changes were detected:
  - C:\t\docker\deploy\platform\images.jpg
☺�container 7b26f6516e4c0ed000d0d71b1f01250411af2ddab0b17cd3f7f3b391a4ee97a0 encountered an error during hcs::System::CreateProcess: failure in a Windows system call: The system cannot find the file specified. (0x2)

Compose Version

Docker Compose version v2.22.0

Docker Environment

Client:
 Version:    24.0.6
 Context:    default
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc.)
    Version:  v0.10.3
    Path:     C:\Users\Administrator\.docker\cli-plugins\docker-buildx.exe
  compose: Docker Compose (Docker Inc.)
    Version:  v2.22.0
    Path:     C:\ProgramData\Docker\cli-plugins\docker-compose.exe
  scout: Command line tool for Docker Scout (Docker Inc.)
    Version:  0.17.1
    Path:     C:\Users\Administrator\.docker\cli-plugins\docker-scout.exe

Server:
 Containers: 14
  Running: 0
  Paused: 0
  Stopped: 14
 Images: 861
 Server Version: 24.0.4
 Storage Driver: windowsfilter
  Windows:
 Logging Driver: json-file
 Plugins:
  Volume: local
  Network: ics internal l2bridge l2tunnel nat null overlay private transparent
  Log: awslogs etwlogs fluentd gcplogs gelf json-file local logentries splunk syslog
 Swarm: inactive
 Default Isolation: process
 Kernel Version: 10.0 20348 (20348.1.amd64fre.fe_release.210507-1500)
 Operating System: Microsoft Windows Server Version 21H2 (OS Build 20348.1970)
 OSType: windows
 Architecture: x86_64
 CPUs: 8
 Total Memory: 31.86GiB
 Name: kermit-dev
 ID: YP3Q:GBSN:NFJ3:QQMM:DB3Z:CD3V:7RBI:V445:473L:3WU3:VOP3:5DVB
 Docker Root Dir: C:\ProgramData\docker
 Debug Mode: false
 Username: kermit
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false
 Product License: Community Engine

Anything else?

The file is not removed from container.
No idea of how/where the lock is kept.
A restart of docker-engine removed the lock.

@perosb changed the title from "[BUG] watch crashes" to "[BUG] watch crashes when deleting file" on Oct 5, 2023
@pbering

pbering commented Oct 7, 2023

Same thing happens when copying many files into a synced folder like this:

    develop:
      watch:
        - action: sync
          path: ./web
          target: /inetpub/wwwroot

The only way to recover from "cannot take exclusive lock for project" is to restart the host; not even restarting Docker Desktop or dockerd helps.

@mrbiggred

@pbering and @perosb a possible workaround so you don't have to restart your host can be found in issue #11069:

#11069 (comment)

@ndeloof
Contributor

ndeloof commented Oct 30, 2023

The lock is managed by https://github.com/moby/moby/blob/master/pkg/pidfile/pidfile.go#L29
According to the "process with PID 20836 is still running" message, the compose process is still reported by the system as "alive". If you can reproduce this issue, could you please inspect the referenced process?
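
For readers unfamiliar with PID-file locks, here is a minimal Python sketch of the general pattern: read the recorded PID, refuse to start if that process is alive, otherwise record our own PID. It is not the moby `pkg/pidfile` code. The names `pid_is_alive` and `acquire_pidfile` are hypothetical, and the default liveness probe is POSIX-only.

```python
import os


def pid_is_alive(pid: int) -> bool:
    """POSIX-only probe. Do NOT use on Windows, where os.kill(pid, 0)
    terminates the target process instead of just testing for it."""
    try:
        os.kill(pid, 0)  # signal 0: existence check, nothing is delivered
    except ProcessLookupError:
        return False     # no such process
    except PermissionError:
        return True      # process exists but belongs to another user
    return True


def acquire_pidfile(path: str, is_alive=pid_is_alive) -> None:
    """Take the lock at `path`, or raise if the recorded PID is alive."""
    try:
        with open(path) as f:
            recorded = int(f.read().strip())
    except (FileNotFoundError, ValueError):
        recorded = None  # no lock file yet, or unreadable contents
    if recorded is not None and is_alive(recorded):
        raise RuntimeError(f"process with PID {recorded} is still running")
    with open(path, "w") as f:
        f.write(str(os.getpid()))  # a stale or missing lock is taken over
```

With this pattern the file is never deleted on exit, so the lock is only as reliable as the liveness probe: if the probe wrongly reports a dead PID as alive, every later start fails with the error shown in this issue.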

@pbering

pbering commented Oct 30, 2023

When the issue happens and I see that message, then there is no process with that PID.

@ndeloof
Contributor

ndeloof commented Oct 30, 2023

Which OS are you running on?

@perosb
Author

perosb commented Oct 30, 2023

Which OS are you running on?

Kernel Version: 10.0 20348 (20348.1.amd64fre.fe_release.210507-1500)
Operating System: Microsoft Windows Server Version 21H2 (OS Build 20348.1970)
OSType: windows
Architecture: x86_64

@AlexeyPlodenko

Same on

OS Name:                   Microsoft Windows 10 Pro
OS Version:                10.0.19045 N/A Build 19045

@mac-hel

mac-hel commented Dec 20, 2023

@pbering and @perosb a possible workaround so you don't have to restart your host can be found in issue #11069:

#11069 (comment)

Another workaround (Linux) is to stop the containers before exiting watch:

  1. Ctrl+Z to suspend watch
  2. docker-compose down to stop and remove the containers
  3. fg to bring the watch process back into the foreground
  4. Ctrl+C to exit watch

@wclr

wclr commented Jan 1, 2024

According to the "process with PID 20836 is still running" message, the compose process is still reported by the system as "alive". If you can reproduce this issue, could you please inspect the referenced process?

I've been implementing a small script that watches docker-compose.yaml and restarts compose watch when it changes (to pick up the current state), and I ran into this issue.

The question is: why does it report that the process with PID XXXX is still running when no process with this PID exists? I could not follow its logic. What does it check when compose watch starts? Does it check the process?

@ndeloof
Contributor

ndeloof commented Jan 1, 2024

@wclr process detection is implemented by https://github.com/moby/moby/blob/master/pkg/process/process_windows.go

@wclr

wclr commented Jan 2, 2024

@wclr process detection is implemented by https://github.com/moby/moby/blob/master/pkg/process/process_windows.go

Well, here I believe it checks the process. But the fact is that when launching docker compose watch, it reports that the process with PID XXXX is still running while there is no XXXX in the tasklist (for example, in my case this PID was killed by the aforementioned script that spawned docker compose watch). People above mention this too, so you probably need to review the logic behind this report and make sure this cannot happen.

@At4m4n

At4m4n commented Jan 10, 2024

When the issue happens and I see that message, then there is no process with that PID.

Same. There is no such process on the Windows host itself, nor in the container I watch (Ubuntu-based app image). It seems to exist somewhere within the "docker engine space".

UPD. Managed to resolve this by updating to the latest Docker version. The installer itself said there was an assistance service process running and suggested killing it. The watch command works again after the update, no host reboot needed.

@wclr

wclr commented Jan 12, 2024

UPD. Managed to resolve this by updating to the latest Docker version. The installer itself said there was an assistance service process running and suggested killing it. The watch command works again after the update, no host reboot needed.

It is not fixed: I was using the latest version when I ran into this. And you don't need to reboot; you can just delete %LOCALAPPDATA%/docker-compose.[YOUR_COMPOSE_PROJECT_NAME].pid.

I eventually ended up writing my own custom watch script to fully replace the docker compose watch functionality (for my case). It watches the relevant file changes in the project and, inside the container, copies from the attached host volume to Docker volumes to keep them in sync. The script also runs rsync at startup to make the initial sync with the host.

The problem described in this issue is not the only one in the current compose watch implementation. For example, it also ignores/skips some file changes if they are made in a batch, so a custom solution can solve all of this.
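
As an illustration of the kind of custom watcher described above, here is a minimal polling sketch in Python. It is not the commenter's actual script, which was not posted. The function names `snapshot` and `diff` are hypothetical, and it covers only change detection, not the copy into the container.

```python
import os


def snapshot(root: str) -> dict:
    """Map each file path under `root` to its (mtime_ns, size) signature."""
    state = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            try:
                st = os.stat(full)
            except FileNotFoundError:
                continue  # file vanished between listing and stat
            state[full] = (st.st_mtime_ns, st.st_size)
    return state


def diff(old: dict, new: dict):
    """Return (changed_or_added, deleted) path lists between two snapshots."""
    changed = sorted(p for p, sig in new.items() if old.get(p) != sig)
    deleted = sorted(p for p in old if p not in new)
    return changed, deleted
```

A loop would call `snapshot` periodically and act on the result of `diff`. Because each pass compares complete snapshots rather than individual events, a burst of changes made between two passes all shows up in the same diff.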

@AlexeyPlodenko

I delete the image and the container and then run docker-compose watch, all in one PowerShell script, so that the files are synced before the containers are started:

Powershell.exe -noexit -command "cd ../..;  docker rm --force backoffice-php; docker rmi $(docker images --format '{{.Repository}}:{{.Tag}}'|findstr 'backoffice-php'); docker-compose watch"

@torrinworx

I can confirm that I'm seeing this issue on Docker version 24.0.7. A temporary workaround is to delete the .pid file at:
C:\Users\<your-user-name>\AppData\Local\docker-compose\<your-project-name>.pid
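
The manual workaround can be scripted. The Python sketch below deletes a PID file only when the PID it records is not alive. The function name `remove_stale_pidfile` is hypothetical, the caller supplies both the file path and the liveness check, and the exact location of the file is an assumption, since commenters in this thread report slightly different paths.

```python
import os
from typing import Callable


def remove_stale_pidfile(path: str, is_alive: Callable[[int], bool]) -> bool:
    """Delete `path` if the PID it records is not alive.

    Returns True if the file was deleted, False otherwise.
    """
    try:
        with open(path) as f:
            pid = int(f.read().strip())
    except FileNotFoundError:
        return False   # nothing to clean up
    except ValueError:
        pid = None     # unreadable contents: treat the file as stale
    if pid is not None and is_alive(pid):
        return False   # a live watch session owns the lock; leave it alone
    os.remove(path)
    return True
```

On Windows, `path` might be built from the `LOCALAPPDATA` environment variable, a `docker-compose` subdirectory, and `<project>.pid`, as some comments here describe. Verify the location on your own machine before deleting anything.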

@princemaple

Encountered this on Desktop 4.27.1 (136059), Engine: 25.0.2, Compose: v2.24.3-desktop.1

@glours
Contributor

glours commented Feb 9, 2024

Hey @perosb
I'm not able to reproduce with the latest Docker Desktop release 4.27.2, can you give it a try?
If you still have the issue, can you give us a minimal & complete reproduction case?

For everyone else, if you don't have the same issue as the original one, please:

  • Check the latest version of Docker Desktop
  • If you can reproduce your issue, please open a new one with a full repro case

Thanks all 🙏

@Piglow19

Errors are still occurring (I've updated to the latest Docker Desktop version, 4.27.2):

$ docker compose watch
cannot take exclusive lock for project "": process with PID 43396 is still running

My OS:

OS Name: Microsoft Windows 11 Famille
OS Version: 10.0.22631 N/A build 22631

If necessary, I can open a repository.

@borgez

borgez commented Feb 10, 2024

For me, manually deleting the *.pid file in AppData\Local\docker-compose\ solves the problem, but it is very annoying.

docker version
Client:
 Cloud integration: v1.0.35+desktop.10
 Version:           25.0.3
 API version:       1.44
 Go version:        go1.21.6
 Git commit:        4debf41
 Built:             Tue Feb  6 21:13:02 2024
 OS/Arch:           windows/amd64
 Context:           default

Server: Docker Desktop 4.27.2 (137060)
 Engine:
  Version:          25.0.3
  API version:      1.44 (minimum version 1.24)
  Go version:       go1.21.6
  Git commit:       f417435e5f6216828dec57958c490c4f8bae4f98
  Built:            Wed Feb  7 00:39:16 2024
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.6.28
  GitCommit:        ae07eda36dd25f8a1b98dfbf587313b99c0190bb
 runc:
  Version:          1.1.12
  GitCommit:        v1.1.12-0-g51d5e94
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

@torrinworx

Hey @perosb I'm not able to reproduce with the latest Docker Desktop release 4.27.2, can you give it a try? If you still have the issue, can you give us a minimal & complete reproduction case?

For everyone else, if you don't have the same issue as the original one, please:

  • Check the latest version of Docker Desktop
  • If you can reproduce your issue, please open a new one with a full repro case

Thanks all 🙏

@glours I'm actually facing this issue with my repo here: https://github.com/torrinworx/Bitorch

To reproduce:

  1. $ docker compose -f .\dev.docker-compose.yml build --no-cache
  2. $ docker compose -f .\dev.docker-compose.yml watch
  3. Press CTRL+C, then delete/remove the running containers/compose stacks
  4. $ docker compose -f .\dev.docker-compose.yml watch

Result:

PS C:\Users\torri\Desktop\Repositories\Personal\Bitorch> docker compose -f .\dev.docker-compose.yml watch

cannot take exclusive lock for project "bitorch-development": process with PID 47156 is still running
PS C:\Users\torri\Desktop\Repositories\Personal\Bitorch> tasklist /fi "PID eq 47156" 
INFO: No tasks are running which match the specified criteria.
PS C:\Users\torri\Desktop\Repositories\Personal\Bitorch>

This is with Docker Desktop 4.27.2.

@FibreFoX

FibreFoX commented Feb 13, 2024

Maybe this is related: for me it looks like the %LOCALAPPDATA%\docker-compose\PROJECTNAME.pid file is not being removed properly. When exiting via CTRL + C, the exit code is 130 ($LastExitCode when using PowerShell); maybe that's the reason the watch command is not working as intended?

I always need to delete that file; no other problems so far (but I haven't played around with this new feature yet).

Running Docker Desktop 4.27.2 on Windows 10 Pro 22H2 using HyperV.

@ndeloof
Contributor

ndeloof commented Feb 13, 2024

This file is not expected to be removed after the command completes. Instead, when a compose command is executed, it checks whether the registered PID is alive (see https://github.com/moby/moby/blob/master/pkg/process/process_windows.go).

@glours
Contributor

glours commented Feb 15, 2024

@torrinworx 👋
I used your repository but wasn't able to reproduce it on my side.
To be honest, I don't know what is happening. Can you share a recording so I can check whether I'm missing a step?
Are you using WSL2 or Hyper-V as the Docker Desktop virtual machine?

@torrinworx

@torrinworx 👋 I used your repository but wasn't able to reproduce it on my side. To be honest, I don't know what is happening. Can you share a recording so I can check whether I'm missing a step? Are you using WSL2 or Hyper-V as the Docker Desktop virtual machine?

@glours I'm using WSL2 on Windows 11 Home 22H2 22621.3155. Here is a video with this issue happening with the Bitorch repository I linked above:

https://www.youtube.com/watch?v=PzrfWC825Rc

@glours
Contributor

glours commented Feb 15, 2024

@torrinworx thank you very much!
Can I ask you another question? I want to check whether you have an old version of Compose in your path that could take priority over the version embedded in Desktop.
Can you share the result of docker compose version, please?

@torrinworx

np!

Huh yeah it looks like it's still taking the old desktop version: Docker Compose version v2.24.5-desktop.1

Even though my docker desktop client is saying v4.27.2

@glours
Contributor

glours commented Feb 15, 2024

Unfortunately no 😞 , Compose v2.24.5-desktop.1 is the version shipped in Docker Desktop 4.27.2

@glours
Contributor

glours commented Feb 15, 2024

@torrinworx can you try something else please, instead of doing docker compose watch directly can you try the following steps:

  • docker compose -f .\dev.docker-compose.yml up -d
  • docker compose -f .\dev.docker-compose.yml watch
  • Then do the CTRL+C
  • docker compose -f .\dev.docker-compose.yml watch again

And a 2nd test:

  • docker compose -f .\dev.docker-compose.yml up -d
  • docker compose -f .\dev.docker-compose.yml watch
  • Then do the CTRL+C
  • Don't remove the containers in Docker Desktop
  • docker compose -f .\dev.docker-compose.yml watch again

@ndeloof
Contributor

ndeloof commented Feb 15, 2024

Can you please check the status of the process listed in the lock file ?

Get-Process -Id 146328

@torrinworx

@torrinworx can you try something else please, instead of doing docker compose watch directly can you try the following steps:

  • docker compose -f .\dev.docker-compose.yml up -d
  • docker compose -f .\dev.docker-compose.yml watch
  • Then do the CTRL+C
  • docker compose -f .\dev.docker-compose.yml watch again

And a 2nd test:

  • docker compose -f .\dev.docker-compose.yml up -d
  • docker compose -f .\dev.docker-compose.yml watch
  • Then do the CTRL+C
  • Don't remove the containers in Docker Desktop
  • docker compose -f .\dev.docker-compose.yml watch again

So both tests result in the same error:

PS C:\Users\torri\Desktop\Repositories\Personal\Bitorch> docker compose -f .\dev.docker-compose.yml watch
cannot take exclusive lock for project "bitorch-development": process with PID 165984 is still running
PS C:\Users\torri\Desktop\Repositories\Personal\Bitorch>

However when I delete the .pid file from the directory they both work just fine.

@ndeloof After I CTRL+C the watch command and delete the containers this is the result:

C:\Users\torri>tasklist /FI "PID eq 162632"
INFO: No tasks are running which match the specified criteria.

C:\Users\torri>

It only shows No tasks are running... after you CTRL+C the watch command. While watch is still running, it shows this, even when the containers are deleted:

C:\Users\torri>tasklist /FI "PID eq 162632"

Image Name                     PID Session Name        Session#    Mem Usage
========================= ======== ================ =========== ============
docker-compose.exe          162632 Console                    1     47,308 K

C:\Users\torri>

@ndeloof
Contributor

ndeloof commented Feb 15, 2024

Looking at the pidfile code that should detect that the process has exited, I wonder whether GetExitCodeProcess could return without an error, in which case the code would return true.
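
To make the hypothesis above concrete, here is a Python sketch of a Windows liveness check that looks at the exit code instead of only at whether the call succeeded. It is not the moby code and not necessarily the fix that was applied. The function names are hypothetical, and the `ctypes` wrapper runs only on Windows.

```python
STILL_ACTIVE = 259                          # exit code Windows reports for a running process
PROCESS_QUERY_LIMITED_INFORMATION = 0x1000  # minimal access right needed for the query


def alive_from_exit_code(call_ok: bool, exit_code: int) -> bool:
    """A successful query alone proves nothing; the code must be STILL_ACTIVE."""
    return call_ok and exit_code == STILL_ACTIVE


def pid_is_alive_windows(pid: int) -> bool:
    """Windows-only: query the process through kernel32 via ctypes."""
    import ctypes
    from ctypes import wintypes

    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    kernel32.OpenProcess.argtypes = [wintypes.DWORD, wintypes.BOOL, wintypes.DWORD]
    kernel32.OpenProcess.restype = wintypes.HANDLE
    kernel32.GetExitCodeProcess.argtypes = [wintypes.HANDLE, ctypes.POINTER(wintypes.DWORD)]
    kernel32.GetExitCodeProcess.restype = wintypes.BOOL
    kernel32.CloseHandle.argtypes = [wintypes.HANDLE]
    kernel32.CloseHandle.restype = wintypes.BOOL

    handle = kernel32.OpenProcess(PROCESS_QUERY_LIMITED_INFORMATION, False, pid)
    if not handle:
        return False  # simplification: any failure to open counts as "not running"
    try:
        code = wintypes.DWORD()
        ok = kernel32.GetExitCodeProcess(handle, ctypes.byref(code))
        return alive_from_exit_code(bool(ok), code.value)
    finally:
        kernel32.CloseHandle(handle)
```

One known limitation of this approach is that a process which genuinely exits with code 259 cannot be distinguished from a running one.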

@glours
Contributor

glours commented Feb 15, 2024

@ndeloof agree

ndeloof referenced this issue in moby/moby Feb 16, 2024
Using the implementation from pkg/pidfile for windows, as that implementation
looks to be handling more cases to check if a process is still alive (or to be
considered alive).

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
@glours
Contributor

glours commented Feb 16, 2024

@torrinworx can you try this specific version of Compose (choose the right binary for you) and install it with the name docker-compose(.exe) under your ~/.docker/cli-plugins directory?

@torrinworx

torrinworx commented Feb 18, 2024

@glours That worked! I was able to re-watch the compose file and build the containers without seeing the PID error.

@milas
Contributor

milas commented Mar 6, 2024

This fix has been included in Compose 2.24.7+. If you continue to have problems after upgrading, comment here and we can re-open the issue or create a new one as appropriate. Thanks for the report!

@julielerman

julielerman commented Jun 28, 2024

I will be documenting all of these experiments in a blog post but just wanted to leave it here in case someone gets stuck.

Some data for you @milas, as I'm getting this on macOS Monterey 12.5:
Docker Desktop Docker & verified Compose version matches: v2.27.1-desktop.1
Visual Studio Code 1.90.2
Docker extension v1.29.1

docker compose -f "docker-compose.yml" up -d --build --watch
docker compose down
(containers are gone)
docker compose watch

cannot take exclusive lock for project "xxx": process with PID 88950 is still running

I THINK it's the detached flag that is causing this... ??? (Sorry for the brain dump. I keep playing with it and understanding more and more, and maybe it's useful to you... maybe not LOL)

Note that --watch results in multiple terminal windows (as stated in the docs somewhere): the docker terminal and my zsh terminal window. After compose down, the docker terminal is still there with the last message from the up command. If I delete that terminal manually (using the trashcan icon), then it releases and I can run docker compose watch.

Update: I feel like I'm going a little crazy. After manually closing the docker terminal and then running up --watch again, there is no docker terminal, just my main zsh. I hate inconsistent outcomes. I've done this many times and this is a new behavior.

With docker compose watch, docker compose down properly releases and gets me back to "Terminal will be reused by tasks, press any key to close it." in the docker terminal. However, after pressing any key, I still have to Ctrl-C (no message suggests that) to truly release the docker terminal and go back to zsh.

Side question: I haven't been able to find any other guidance about using up --watch vs. watch. Does it exist somewhere?
