Windows docker volume lock file issue #385

Closed

awakecoding opened this issue Feb 6, 2020 · 11 comments

Comments

@awakecoding
Contributor

I am using "library/mongo:4.2-windowsservercore-1809" on Windows Server 2019 with a docker volume. When launching the container on an empty volume, it all works fine, but I am unable to restart my container without encountering a lock file issue.

This issue is similar to this older issue described here: StefanScherer/dockerfiles-windows#322

I am using a docker volume instead of a bind mount to avoid file system issues, but it looks like docker volumes are problematic as well. What should I be verifying? Being unable to use MongoDB in a Windows container is a serious release blocker for me.

Here is the docker log:
2020-02-06T00:19:53.651+0000 I CONTROL [main] Automatically disabling TLS 1.0, to force-enable TLS 1.0 specify --sslDisabledProtocols 'none'
2020-02-06T00:19:53.991+0000 I CONTROL [initandlisten] MongoDB starting : pid=5768 port=27017 dbpath=C:\data\db\ 64-bit host=224c0ea470d2
2020-02-06T00:19:53.991+0000 I CONTROL [initandlisten] targetMinOS: Windows 7/Windows Server 2008 R2
2020-02-06T00:19:53.991+0000 I CONTROL [initandlisten] db version v4.2.3
2020-02-06T00:19:53.991+0000 I CONTROL [initandlisten] git version: 6874650b362138df74be53d366bbefc321ea32d4
2020-02-06T00:19:53.991+0000 I CONTROL [initandlisten] allocator: tcmalloc
2020-02-06T00:19:53.991+0000 I CONTROL [initandlisten] modules: none
2020-02-06T00:19:53.991+0000 I CONTROL [initandlisten] build environment:
2020-02-06T00:19:53.991+0000 I CONTROL [initandlisten]     distmod: 2012plus
2020-02-06T00:19:53.991+0000 I CONTROL [initandlisten]     distarch: x86_64
2020-02-06T00:19:53.991+0000 I CONTROL [initandlisten]     target_arch: x86_64
2020-02-06T00:19:53.991+0000 I CONTROL [initandlisten] options: { net: { bindIp: "*" } }
2020-02-06T00:19:54.066+0000 I STORAGE [initandlisten] Detected data files in C:\data\db\ created by the 'wiredTiger' storage engine, so setting the active storage engine to 'wiredTiger'.
2020-02-06T00:19:54.067+0000 I STORAGE [initandlisten] wiredtiger_open config: create,cache_size=3583M,cache_overflow=(file_max=0M),session_max=33000,eviction=(threads_min=4,threads_max=4),config_base=false,statistics=(fast),log=(enabled=true,archive=true,path=journal,compressor=snappy),file_manager=(close_idle_time=100000,close_scan_interval=10,close_handle_minimum=250),statistics_log=(wait=0),verbose=[recovery_progress,checkpoint_progress],
2020-02-06T00:19:54.070+0000 E STORAGE [initandlisten] WiredTiger error (16) [1580948394:70320][5768:140703784850016], wiredtiger_open: __win_file_lock, 241: C:\data\db\\WiredTiger.lock: handle-lock: LockFile: The process cannot access the file because another process has locked a portion of the file. : Resource device Raw: [1580948394:70320][5768:140703784850016], wiredtiger_open: __win_file_lock, 241: C:\data\db\\WiredTiger.lock: handle-lock: LockFile: The process cannot access the file because another process has locked a portion of the file. : Resource device
2020-02-06T00:19:54.070+0000 E STORAGE [initandlisten] WiredTiger error (16) [1580948394:70320][5768:140703784850016], wiredtiger_open: __conn_single, 1674: WiredTiger database is already being managed by another process: Resource device Raw: [1580948394:70320][5768:140703784850016], wiredtiger_open: __conn_single, 1674: WiredTiger database is already being managed by another process: Resource device
2020-02-06T00:19:54.071+0000 E STORAGE [initandlisten] WiredTiger error (16) [1580948394:71316][5768:140703784850016], wiredtiger_open: __win_file_lock, 241: C:\data\db\\WiredTiger.lock: handle-lock: LockFile: The process cannot access the file because another process has locked a portion of the file. : Resource device Raw: [1580948394:71316][5768:140703784850016], wiredtiger_open: __win_file_lock, 241: C:\data\db\\WiredTiger.lock: handle-lock: LockFile: The process cannot access the file because another process has locked a portion of the file. : Resource device
2020-02-06T00:19:54.071+0000 E STORAGE [initandlisten] WiredTiger error (16) [1580948394:71316][5768:140703784850016], wiredtiger_open: __conn_single, 1674: WiredTiger database is already being managed by another process: Resource device Raw: [1580948394:71316][5768:140703784850016], wiredtiger_open: __conn_single, 1674: WiredTiger database is already being managed by another process: Resource device
2020-02-06T00:19:54.072+0000 E STORAGE [initandlisten] WiredTiger error (16) [1580948394:72314][5768:140703784850016], wiredtiger_open: __win_file_lock, 241: C:\data\db\\WiredTiger.lock: handle-lock: LockFile: The process cannot access the file because another process has locked a portion of the file. : Resource device Raw: [1580948394:72314][5768:140703784850016], wiredtiger_open: __win_file_lock, 241: C:\data\db\\WiredTiger.lock: handle-lock: LockFile: The process cannot access the file because another process has locked a portion of the file. : Resource device
2020-02-06T00:19:54.072+0000 E STORAGE [initandlisten] WiredTiger error (16) [1580948394:72314][5768:140703784850016], wiredtiger_open: __conn_single, 1674: WiredTiger database is already being managed by another process: Resource device Raw: [1580948394:72314][5768:140703784850016], wiredtiger_open: __conn_single, 1674: WiredTiger database is already being managed by another process: Resource device
2020-02-06T00:19:54.072+0000 W STORAGE [initandlisten] Failed to start up WiredTiger under any compatibility version.
2020-02-06T00:19:54.072+0000 F STORAGE [initandlisten] Reason: 16: Resource device
2020-02-06T00:19:54.073+0000 F - [initandlisten] Fatal Assertion 28595 at src\mongo\db\storage\wiredtiger\wiredtiger_kv_engine.cpp 789
2020-02-06T00:19:54.073+0000 F - [initandlisten] ***aborting after fassert() failure

And the corresponding docker run command:
docker run --name den-mongo -d --network=den-network -v den-mongodata:c:\data\db
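
For reference, the full sequence that triggers the error is roughly the following (the image tag is the one mentioned above; the exact commands are illustrative, not a verbatim copy of my script):

docker network create den-network   # if not already created
docker volume create den-mongodata
docker run --name den-mongo -d --network=den-network -v den-mongodata:c:\data\db mongo:4.2-windowsservercore-1809
docker stop den-mongo
docker rm den-mongo
# the second run on the same, now non-empty, volume fails with the WiredTiger.lock error above
docker run --name den-mongo -d --network=den-network -v den-mongodata:c:\data\db mongo:4.2-windowsservercore-1809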

@tianon
Member

tianon commented Feb 6, 2020

Is the initial container shutting down cleanly? (Do the MongoDB logs show that it was able to shut down and close out the database files successfully?)
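
(One quick way to check, for example, is to look at the tail of the stopped container's logs:

docker logs --tail 20 den-mongo

and verify they end with the normal MongoDB shutdown messages, e.g. "shutdown: removing fs lock" and "shutting down with code", rather than cutting off abruptly.)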

@awakecoding
Contributor Author

I shut down the container with a simple "docker stop". My current code does not pass "--rm" to the docker run command, so the container lingers on after it's been stopped (which makes it easier to run docker logs afterwards). To compensate for not using --rm, I run docker stop + docker rm before docker run for all of my containers, so maybe something in this sequence messes things up.

Could it be that Windows itself accesses the files in the docker volume, leaving them locked by another process when they shouldn't be?

I found a temporary workaround: locate the real path on disk where the docker volume is stored and delete the WiredTiger.lock file while the container is not running. If the container is running, Windows won't let me delete the file because another process is accessing it.

PS C:\Users\wayk-admin\den-test> $volume = $(docker volume inspect den-mongodata) | ConvertFrom-Json
PS C:\Users\wayk-admin\den-test> $volume.MountPoint
C:\ProgramData\docker\volumes\den-mongodata\_data
PS C:\Users\wayk-admin\den-test> $WiredTigerLock = Join-Path $volume.MountPoint 'WiredTiger.lock'
PS C:\Users\wayk-admin\den-test> Remove-Item $WiredTigerLock -Force

@awakecoding
Contributor Author

awakecoding commented Feb 6, 2020

I launched my container, then instead of calling docker rm on it, I stopped it and re-launched it with docker start (probably the best way to do it). I encountered the same issue, but since I didn't remove the container I could get a clean log of the previous shutdown. It all looks good to me, so I'm thinking it might be a timing issue related to docker and the fact that removing the lock file is one of the last operations performed during container shutdown:

2020-02-06T01:01:32.517+0000 I  CONTROL  [main] Automatically disabling TLS 1.0, to force-enable TLS 1.0 specify --sslDisabledProtocols 'none'
2020-02-06T01:01:32.603+0000 I  CONTROL  [initandlisten] MongoDB starting : pid=8480 port=27017 dbpath=C:\data\db\ 64-bit host=875be735f5ee
2020-02-06T01:01:32.603+0000 I  CONTROL  [initandlisten] targetMinOS: Windows 7/Windows Server 2008 R2
2020-02-06T01:01:32.603+0000 I  CONTROL  [initandlisten] db version v4.2.3
2020-02-06T01:01:32.603+0000 I  CONTROL  [initandlisten] git version: 6874650b362138df74be53d366bbefc321ea32d4
2020-02-06T01:01:32.603+0000 I  CONTROL  [initandlisten] allocator: tcmalloc
2020-02-06T01:01:32.603+0000 I  CONTROL  [initandlisten] modules: none
2020-02-06T01:01:32.603+0000 I  CONTROL  [initandlisten] build environment:
2020-02-06T01:01:32.603+0000 I  CONTROL  [initandlisten]     distmod: 2012plus
2020-02-06T01:01:32.603+0000 I  CONTROL  [initandlisten]     distarch: x86_64
2020-02-06T01:01:32.603+0000 I  CONTROL  [initandlisten]     target_arch: x86_64
2020-02-06T01:01:32.603+0000 I  CONTROL  [initandlisten] options: { net: { bindIp: "*" } }
2020-02-06T01:01:32.640+0000 I  STORAGE  [initandlisten] Detected data files in C:\data\db\ created by the 'wiredTiger' storage engine, so setting the active storage engine to 'wiredTiger'.
2020-02-06T01:01:32.641+0000 I  STORAGE  [initandlisten] wiredtiger_open config: create,cache_size=3583M,cache_overflow=(file_max=0M),session_max=33000,eviction=(threads_min=4,threads_max=4),config_base=false,statistics=(fast),log=(enabled=true,archive=true,path=journal,compressor=snappy),file_manager=(close_idle_time=100000,close_scan_interval=10,close_handle_minimum=250),statistics_log=(wait=0),verbose=[recovery_progress,checkpoint_progress],
2020-02-06T01:01:32.922+0000 I  STORAGE  [initandlisten] WiredTiger message [1580950892:922421][8480:140704011735648], txn-recover: Recovering log 2 through 3
2020-02-06T01:01:33.034+0000 I  STORAGE  [initandlisten] WiredTiger message [1580950893:34523][8480:140704011735648], txn-recover: Recovering log 3 through 3
2020-02-06T01:01:33.194+0000 I  STORAGE  [initandlisten] WiredTiger message [1580950893:194483][8480:140704011735648], txn-recover: Main recovery loop: starting at 2/6528 to 3/256
2020-02-06T01:01:33.477+0000 I  STORAGE  [initandlisten] WiredTiger message [1580950893:477432][8480:140704011735648], txn-recover: Recovering log 2 through 3
2020-02-06T01:01:33.702+0000 I  STORAGE  [initandlisten] WiredTiger message [1580950893:702420][8480:140704011735648], txn-recover: Recovering log 3 through 3
2020-02-06T01:01:33.813+0000 I  STORAGE  [initandlisten] WiredTiger message [1580950893:812425][8480:140704011735648], txn-recover: Set global recovery timestamp: (0, 0)
2020-02-06T01:01:34.124+0000 I  RECOVERY [initandlisten] WiredTiger recoveryTimestamp. Ts: Timestamp(0, 0)
2020-02-06T01:01:34.794+0000 I  STORAGE  [initandlisten] Timestamp monitor starting
2020-02-06T01:01:34.842+0000 I  CONTROL  [initandlisten]
2020-02-06T01:01:34.842+0000 I  CONTROL  [initandlisten] ** WARNING: Access control is not enabled for the database.
2020-02-06T01:01:34.842+0000 I  CONTROL  [initandlisten] **          Read and write access to data and configuration is unrestricted.
2020-02-06T01:01:34.842+0000 I  CONTROL  [initandlisten]
2020-02-06T01:01:34.911+0000 I  SHARDING [initandlisten] Marking collection local.system.replset as collection version: <unsharded>
2020-02-06T01:01:34.947+0000 I  STORAGE  [initandlisten] Flow Control is enabled on this deployment.
2020-02-06T01:01:34.948+0000 I  SHARDING [initandlisten] Marking collection admin.system.roles as collection version: <unsharded>
2020-02-06T01:01:34.948+0000 I  SHARDING [initandlisten] Marking collection admin.system.version as collection version: <unsharded>
2020-02-06T01:01:35.000+0000 I  SHARDING [initandlisten] Marking collection local.startup_log as collection version: <unsharded>
2020-02-06T01:01:35.572+0000 I  FTDC     [initandlisten] Initializing full-time diagnostic data capture with directory 'C:/data/db/diagnostic.data'
2020-02-06T01:01:35.577+0000 I  SHARDING [LogicalSessionCacheRefresh] Marking collection config.system.sessions as collection version: <unsharded>
2020-02-06T01:01:35.578+0000 I  SHARDING [LogicalSessionCacheReap] Marking collection config.transactions as collection version: <unsharded>
2020-02-06T01:01:35.579+0000 I  NETWORK  [listener] Listening on 0.0.0.0
2020-02-06T01:01:35.579+0000 I  NETWORK  [listener] waiting for connections on port 27017
2020-02-06T01:01:36.004+0000 I  SHARDING [ftdc] Marking collection local.oplog.rs as collection version: <unsharded>
2020-02-06T01:02:02.547+0000 I  CONTROL  [thread1] CTRL_SHUTDOWN_EVENT signal
2020-02-06T01:02:02.547+0000 I  CONTROL  [consoleTerminate] got CTRL_SHUTDOWN_EVENT, will terminate after current cmd ends
2020-02-06T01:02:02.548+0000 I  NETWORK  [consoleTerminate] shutdown: going to close listening sockets...
2020-02-06T01:02:02.587+0000 I  -        [consoleTerminate] Stopping further Flow Control ticket acquisitions.
2020-02-06T01:02:02.587+0000 I  CONTROL  [consoleTerminate] Shutting down free monitoring
2020-02-06T01:02:02.587+0000 I  FTDC     [consoleTerminate] Shutting down full-time diagnostic data capture
2020-02-06T01:02:02.595+0000 I  STORAGE  [consoleTerminate] Deregistering all the collections
2020-02-06T01:02:02.596+0000 I  STORAGE  [consoleTerminate] Timestamp monitor shutting down
2020-02-06T01:02:02.596+0000 I  STORAGE  [consoleTerminate] WiredTigerKVEngine shutting down
2020-02-06T01:02:02.623+0000 I  STORAGE  [consoleTerminate] Shutting down session sweeper thread
2020-02-06T01:02:02.623+0000 I  STORAGE  [consoleTerminate] Finished shutting down session sweeper thread
2020-02-06T01:02:02.623+0000 I  STORAGE  [consoleTerminate] Shutting down journal flusher thread
2020-02-06T01:02:02.688+0000 I  STORAGE  [consoleTerminate] Finished shutting down journal flusher thread
2020-02-06T01:02:02.688+0000 I  STORAGE  [consoleTerminate] Shutting down checkpoint thread
2020-02-06T01:02:02.688+0000 I  STORAGE  [consoleTerminate] Finished shutting down checkpoint thread
2020-02-06T01:02:02.906+0000 I  STORAGE  [consoleTerminate] shutdown: removing fs lock...
2020-02-06T01:02:02.906+0000 I  CONTROL  [consoleTerminate] now exiting
2020-02-06T01:02:02.906+0000 I  CONTROL  [consoleTerminate] shutting down with code:12

@awakecoding
Contributor Author

I just implemented the hack of deleting the WiredTiger.lock file in my PowerShell script. It's far from ideal, but at least it's no longer a release blocker for me. However, we should definitely get to the bottom of this. Since "shutdown: removing fs lock..." is apparently the last operation performed by the process, I strongly suspect it's a problem with file operations done right before docker terminates the container. I wouldn't be surprised if adding a one-second sleep there "solved" the problem.
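
For anyone else who hits this, the hack boils down to the same lock-file deletion shown earlier, run before the container is started again. Roughly like this (a sketch, not my actual script; the volume and container names are the ones used throughout this issue):

$volume = $(docker volume inspect den-mongodata) | ConvertFrom-Json
$WiredTigerLock = Join-Path $volume.MountPoint 'WiredTiger.lock'
if (Test-Path $WiredTigerLock) {
    # only safe to delete while the container is stopped
    Remove-Item $WiredTigerLock -Force
}
docker start den-mongo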

@tianon
Member

tianon commented Feb 7, 2020

I grabbed a Windows machine so I could test this out and I'm not actually able to reproduce -- I started MongoDB with a volume and did docker stop, then started a new container from the same volume and it started up without any issues. 😕

(I also played with db.shutdownServer(), but that test worked fine too, which is less surprising.)

@awakecoding
Contributor Author

I can easily reproduce the issue with my cmdlet (https://github.com/devolutions/waykden-ps). I will see if I can make a smaller reproduction test case. Otherwise, just starting and stopping with my cmdlet that wraps docker containers is sufficient, if you're willing to give it a shot. My latest version includes the workaround that deletes the lock file, but it prints a message when it does, so you'd see it.

@awakecoding
Contributor Author

I am quite sure I've encountered this issue in the past on various Windows environments without hitting it consistently. It may help to know that I can see it happen every time on a very specific Azure VM instance: a dev/test-sized Windows Server 2019 Core + Containers VM exactly like the one from this guide I wrote: https://github.com/Devolutions/WaykDen-ps/blob/master/docs/getting-started-azure.adoc

@awakecoding
Contributor Author

@tianon I have made a simple script to run a MongoDB Windows container with a docker volume, stop it, then check for the remaining WiredTiger.lock file and delete it if present. It happens every single time I run it, no effort required: https://gist.github.com/awakecoding/f24608fad3ed6e867df624cef62e28ce

Let me know if you can reproduce the issue with the script, but I'm guessing it won't happen since you already tried. If you're willing to spend some time on it, I would be willing to create a virtual machine in Azure on our company subscription and give you access to it. I'd make sure that the issue is easy to reproduce on the virtual machine I'd prepare for you. I'd create an identical Windows Server 2019 Core + Containers virtual machine as the one where it happens all the time.

@awakecoding
Contributor Author

Bingo, someone investigated this very same issue and got a smaller program to reproduce it: moby/moby#39088

It sounds like there is an issue with Windows containers and the APIs used for releasing the WiredTiger.lock file.

@mattewre

mattewre commented Oct 7, 2020

Is there a workaround for this? I can't restart the MongoDB container without manually deleting the .lock files.
I'm experiencing the same problem on a Windows Server 2019 host with a MongoDB Windows-based container.
Tested on a clean VM with either Docker CE or Azure IoT Edge (Moby).

@yosifkit
Member

yosifkit commented Oct 7, 2020

The only workarounds are to automate deleting the lock file or to use MongoDB on a Linux host.

Closing this in favor of the upstream issue in Windows Containers (microsoft/Windows-Containers#37)
