SQLite on Network Share #1886
Do you have a link for that with some evidence? I'd expect a Linux Docker container to behave largely the same. A Linux Docker container inside Windows Hyper-V is different, since that's essentially a network share unless you mount a data container or a MobyLinux mount instead of a Windows host mount/share.
It shouldn't, ever. Even the other synchronization modes aren't reliable over networks, and it's horrible for performance too.
Euh, no, rule 1 of the Fool Proof Handbook is to never add an option that the user must set to avoid breaking stuff. Either detect the edge-case and deal with it automatically, or throw a big fat warning saying it's unsupported. 😄 But as I said, need more info.... please
Updated the description a bit. To be clear, as the title of the issue says, I'm only discussing Docker for Windows, which uses CIFS/SMB to mount host paths. It mounts them with the "nobrl" option, which causes lock requests not to be sent to the server (docker/for-win#11). This is unique to Docker for Windows, though similar problems arise on Docker for OSX. If your solution is that network paths are not supported for the database files, then that's fine; it just means that anyone using Docker for Windows will have massive problems, and perhaps a startup warning that the appdata filesystem must be local would be nice. I agree that requiring the user to add an option for normal behavior is bad; the flip side of that rule is that anything you set automatically should be able to be overridden by the user. Go ahead and set it on OSX, but let the end user override it if they want. I don't think application code should know about every edge case; that's what configuration files or advanced command line options are for. At any rate, there are various complaints of Sonarr (and Radarr, and Plex) not working right/at all/being corrupted on Docker for Windows; CIFS is, I believe, the root cause.
That's my preferred solution for network shares, coz it's just inviting disaster regardless of sync mode.
In our experience you shouldn't. Yes, advanced users are quite capable of making those decisions. But Sonarr isn't intended for advanced users and any option is likely to be abused/misused (we have empirical evidence on that, and dozens of wasted support hours to drive the point home). So any (hidden/config-file only) option should be carefully considered, and avoided as much as possible. There usually is a better solution.
My understanding of the terminology is that Docker for Windows uses Hyper-V (Windows 10 only), while Docker Toolbox for Windows uses Virtualbox (Windows 7+). In this issue, I'm discussing Hyper-V with MobyLinuxVM; the host paths are from the parent Windows host. Using a config path inside the MobyLinuxVM is an option, but there is no easy way to tell Docker to do that, and afaik all Docker volume drivers on Windows will use CIFS as well. I'm a long-time backend services coder, so I don't think so well about normal user usability issues *grin*. That said, perhaps taking a progressive enhancement approach for things like WAL mode might be better: by default, use the most compatible journaling mode (DELETE, iirc); if you detect a supported filesystem, enable WAL mode. This would allow for usage on unknown filesystems without code changes. Still, I agree using SQLite on what is essentially a network filesystem is a bad idea; I just don't know of a better solution. The only other options I can think of involve rsync/unison with inotify, and that has its own problems. Really, though, this isn't a Sonarr problem; it's a Docker for Windows problem, which they made worse by disabling file locking.
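The progressive-enhancement idea above can be sketched with Python's stdlib `sqlite3` module. This is a hypothetical helper, not Sonarr's actual code or API: it requests WAL, then checks which journal mode SQLite actually granted and falls back to the more portable DELETE mode if WAL was refused.

```python
import sqlite3

def open_with_best_journal_mode(path):
    # Illustrative helper (not Sonarr's API): prefer WAL, but fall
    # back to DELETE journaling when the filesystem refuses WAL.
    conn = sqlite3.connect(path)
    # PRAGMA journal_mode=WAL returns the mode actually in effect;
    # on a filesystem that cannot support WAL's shared-memory file,
    # the request can be refused.
    mode = conn.execute("PRAGMA journal_mode=WAL;").fetchone()[0]
    if mode.lower() != "wal":
        mode = conn.execute("PRAGMA journal_mode=DELETE;").fetchone()[0]
    return conn, mode

conn, mode = open_with_best_journal_mode("/tmp/journal_probe.db")
conn.close()
```

Because `PRAGMA journal_mode` reports the mode that is actually in effect rather than the one requested, the check works without any filesystem-specific detection code.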
Hi, look at this config file. Maybe it's worth a shot: https://system.data.sqlite.org/index.html/artifact?ci=trunk&filename=System.Data.SQLite/Configurations/System.Data.SQLite.dll.config And just for information: If docker is running on top of a Linux system (virtualized in Hyper-V or not), path mapping works as expected and the database works as expected. I'm running a Linux VM inside Hyper-V that contains a docker environment containing Sonarr. The storage backend is LVM and the config and data paths are mapped into the container. Works. Have a look at creater_container():
And then in /srv/data/sonarr/config:
Cu
@Grimeton - of course it does. We're specifically discussing Docker for Windows, which uses MobyLinuxVM running on Hyper-V, with paths on the Windows host. In this (standard) configuration, any paths on the Windows host are mounted via SMB/CIFS.
@lokkju Yeah the question came up, so I clarified it. |
I have an armv7 docker swarm cluster, running a Sonarr container among a lot of other things. This cluster has a GlusterFS server on all the nodes, set up as a replica. I mount locally on all nodes using the GlusterFS FUSE filesystem to localhost. In short, I have a local filesystem on all the nodes with the same data. This works for everything but Sonarr, which corrupts the sqlite3 database on average once every two days. My workaround is to back up the database (.dump) every hour. If the database corrupts, a script automatically removes all the sqlite databases and restores a fresh one from the last working dump. It would be nice to have an option like "use WAL" in the configuration to get rid of this, or support for external relational databases (mysql, postgres, ...). I think external databases would be a lot of work, mainly because of version migrations, advanced selects, and so on, but an option to toggle WAL should be simple to add.
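The hourly dump-and-restore workaround described above can be sketched with Python's stdlib `sqlite3` module (the same idea as the sqlite3 CLI's `.dump` command). Function names and paths here are illustrative, not part of any existing tool:

```python
import sqlite3

def dump_database(db_path, dump_path):
    # Serialize the live database to a plain-SQL text file,
    # equivalent in spirit to the sqlite3 CLI's ".dump".
    src = sqlite3.connect(db_path)
    with open(dump_path, "w") as f:
        for line in src.iterdump():
            f.write(line + "\n")
    src.close()

def restore_database(dump_path, db_path):
    # Rebuild a fresh database file from the last known-good dump.
    dst = sqlite3.connect(db_path)
    with open(dump_path) as f:
        dst.executescript(f.read())
    dst.commit()
    dst.close()
```

Restoring to a brand-new file (rather than over the corrupted one) is the important part: the rebuilt database starts from a clean state with no leftover `-wal`/`-shm` sidecar files.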
Just some update: the latest way to use Docker on Windows is LCOW, which uses "linuxkit" running inside Hyper-V. It seems they now use 9p to share the volumes, which also results in a lot of errors and makes any container that uses SQLite in WAL mode unusable, or any locking operation for that matter. I got the answer in #1385 that this is a Microsoft problem. It's still crazy to me that after all these years Docker can't handle SQLite + WAL on Windows. It's a real shame, since LCOW works great otherwise and is a huge improvement over the old mode and Docker Toolbox.
This isn't a Docker/Windows/CIFS issue. I get the same behaviour on Docker Swarm on Ubuntu using NFS. Oddly, this worked fine with Kubernetes even though the NFS server was the same. |
As others have mentioned, there actually are valid scenarios in which you may have to mount configuration and the database from a network share. |
This isn't a Windows only issue. I get the same errors on Rancher 2.1 Kubernetes, using NFS Persistent Volume to a ReadyNAS NFSv4. My research shows it is a known issue with sqlite not playing nice with NFS's locking and the answer might be to allow nolock as an option. |
I'd like to chime in too with this problem. I use a Docker container for Sonarr. It only happens when I use NFS as the datastore. This would be great to get working, as others also want their persistent data stored on an NFS server. My NFS mount options are:
|
Oh well, it was a stab in the dark and thanks for eliminating that. |
I would love to have the priority bumped on this issue. Out of 30 containers, Lidarr, Radarr & Sonarr are the only applications I run that cannot use NFS for application data. :( For now I have just stored their application data to the VM instead of my NFS share. |
Can confirm - this is still an issue in the Sonarr v3 previews.
Just wanted to give my $.02. I have the same issue. Trying to run Sonarr in a Kubernetes cluster has been... painful. I ended up having a container first grab a copy of the data from the NFS share and put it on a local share. Then Sonarr starts, and another container does a copy back to the NFS share to have a somewhat reliable backup of it. It's gross. The db is going to get corrupted someday because the container is gonna crash in the middle of the transfer. And while it's in no way Sonarr's fault that SQLite is garbage over a network share... it would be really nice to have a fix, or be able to run against mysql/postgres/... And yeah, like mentioned before, the nolock option doesn't solve the issue.
Looks like it does this with sqlite on nfs for me too. nolock did not fix the issue. @Xaelias I will probably do the same thing as you. |
@markus101 @Taloth would it be possible to include a startup argument to disable WAL mode?
As linked in the comment above, I'm getting 'database disk image is malformed' errors. I have my persistent Docker storage mounted on a GlusterFS share. I tried using the local disk as suggested by @markus101 and I'm not getting the errors anymore, but I really want the safety and redundancy of the GlusterFS server I painstakingly set up. Can't we opt for a separate mariadb or postgres db instead of sqlite?
FYI to everyone saying " |
@fergalmoran I may have misread your proposition. Yeah we probably agree then. |
Update for the Windows/docker users out there. This is using the default hyper-v "Linux Container" backend + "shared drive" feature, not Lcow/WSL2. So maybe give it a try again and see if the sqlite db's don't corrupt anymore and maybe even network shares might work, not sure how smb over those new "shared drives" behaves. I did not test the latest inotify stuff, but if it works as well a lot of containers should now run correctly from windows bind mounts. 2.1.5.0 introduced this:
2.1.6.0 this:
All my quick tests that failed before now work correctly, while using a bind mount from my host NTFS drive. (You need to add your drive as a share via settings and then can directly use it, but make sure the folders exist.) Examples via PowerShell:
PS: I assume the same stack (gRPC, FUSE, and Hypervisor sockets) is utilized for their WSL2 backend, while the old experimental LCOW backend (Windows Containers) will not use it, since it has no "shared drives" option.
Just set up a homelab cluster based on Nomad and NFS as a shared data store. Quite discouraging, after hours of effort, to find this issue and realize I can't (at least with my skills) get Sonarr up and running. One more vote for some movement on a "real" solution for this, or a flag that can be set.
@natelandau If iSCSI is an option for you, using it as persistent volume will avoid this problem with SQlite and NFS. |
iscsi turned out to be a pita (k3s-io/k3s#1567), so I went to an (admittedly hacky) fallback: Sonarr/Sonarr#1886 (comment). Ugh.
So I came here from a thread on reddit about Bazarr. I have the following setup:
So the interesting part is that Sonarr, Radarr and Lidarr are all running fine on the virtual DSM, with the configuration stored on the NFS share. I installed Bazarr, and it immediately failed with a locking error which is obviously related to WAL. Moving Bazarr container onto the 'real' DSM, and storing the config in exactly the same place on the volume, just using the direct path rather than mounting that folder as an NFS share, works just fine. What's weird is that in theory, Sonarr, Radarr and Lidarr should all fail with the NFS share, if they have WAL enabled... Either way, another vote here for a DISABLE_SQLITE_WAL option for all of these containers. :) |
Came here looking for a solution for the same type of SQLite WAL database corruption issues on Gluster... Seems like the above workaround from @putty182 may be a good idea... Now just have to try to figure out how to translate the workaround to GlusterFS using docker swarm services? |
Today I moved from GlusterFS 3.x to 7.x; it seems that both Sonarr and Radarr are no longer corrupting their databases so far. Doing some testing at the moment to confirm.
I feel like SQLite's WAL mode might be unfairly attacked in this thread. The suggestion that Sonarr should introduce an option to disable it entirely seems like an unnecessarily blunt instrument. The SQLite page on WAL does indeed say that "WAL does not work over a network filesystem". I think it says this because of the way WAL is implemented: it uses shared memory, in this case through a memory-mapped file. Since sharing memory through a memory-mapped file isn't well supported by network filesystems, SQLite can't make the necessary correctness guarantees for separate hosts that are reading the SQLite database off a network share. However, if you can guarantee that no more than one machine is using the SQLite database, I don't think there's anything inherent to the way WAL mode works that should cause corruption. I imagine that for most deployments of Sonarr, having a single instance deployed is reasonable (I can't imagine many people are load-balancing Sonarr or have it set up in an HA configuration). I'm interested in reproducing the corruption and locking errors that we see in Sonarr, but I think I need to learn more about how Sonarr interacts with its SQLite database in order to do it. Specifically: does Sonarr read from the SQLite database concurrently from separate threads? Does it write to the SQLite database concurrently from separate threads? Does it use WAL in EXCLUSIVE locking mode?
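The shared-memory mechanism mentioned above is observable from the filesystem. A small sketch with Python's stdlib `sqlite3` (paths are illustrative): while a WAL-mode database has an open connection and a committed write, SQLite keeps two sidecar files next to it, the write-ahead log (`-wal`) and the shared-memory index (`-shm`). The `-shm` file is the memory-mapped coordination file that network filesystems generally cannot support correctly.

```python
import os
import sqlite3

db = "/tmp/wal_sidecar_demo.db"
conn = sqlite3.connect(db)
conn.execute("PRAGMA journal_mode=WAL;")
conn.execute("CREATE TABLE IF NOT EXISTS t(x);")
conn.execute("INSERT INTO t VALUES (1);")
conn.commit()
# With the connection still open, both sidecar files exist on disk.
has_wal = os.path.exists(db + "-wal")
has_shm = os.path.exists(db + "-shm")
conn.close()
```

On a healthy local filesystem both flags come back true; on a share that silently mishandles the memory-mapped `-shm` file, separate processes can disagree about the WAL index, which is one plausible path to the corruption reported in this thread.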
no plans to change this at this time; closing per markus |
IMO this is a big issue for some users @bakerboy448 & @markus101. I know having an external db may never be supported but maybe some future versions of sqlite may have some features to mitigate this. Anyways, would it make sense to document this on the FAQ that Sonarr's application data is not supported over NFS/network shares and link to this issue? |
Has anybody tried putting the DB on a GFS2 share? I have a 3-node Kubernetes cluster, and I fancy creating a "local" PV as a shared LVM thin partition formatted to GFS2 attached to the nodes. In theory, only one pod will access the DB, so it would not matter which node writes the SQLite file on the shared partition. This filesystem was specifically created for shared access, I wonder if SQLite would work on it fine. |
@immanuelfodor if running in a kubernetes cluster I suggest iSCSI or other block storage like rook-ceph, longhorn or openebs. |
Is iSCSI a solution that will allow the database on shared storage to work without these issues? |
Based on @onedr0p's suggestion, I've started to experiment with Piraeus (a wrapper of Linstor, which is a wrapper of DRBD) to provide high-speed NVMe storage for my cluster. It can also use iSCSI under the hood as the network block storage protocol. Find some of my questions about Piraeus usage here: piraeusdatastore/piraeus-operator#125 My experiment is not yet complete enough to share final conclusions regarding SQLite; I have had a busy week since then.
@2fst4u not using Kubernetes, but on Docker my issue went away once I switched from NFS to iSCSI.
We're starting to get off in the weeds, but yes, iSCSI will "solve" this issue because under the hood, it works with local copies of the files. It just uses network (async) to report these changes to the NAS. |
Not sure about that. I just tested it with Longhorn (which is a block level storage), and it still has the same "database is locked" problem. My guess is that it could be hardware/latency dependent. |
From my use case (and others before) it solved the issue. |
I used to use longhorn and I never had an issue. I'm back on rook-ceph because of other unrelated issues though. |
For anyone struggling with this, as a workaround you can run |
Just to add some information to this extensive thread: in my experience, using Ceph 15.2 and docker swarm to run Sonarr and various other containers, storing the SQLite databases and other configuration on a Ceph-backed filesystem (CephFS), mounted via the Ceph FUSE driver on each Docker host, works well and does not present any issues with locks. So it appears the advice echoed above, to use block-level storage systems (GlusterFS, rook-ceph, etc.) and mount via iSCSI or other methods, is probably the best way to avoid NFS file locks.
May break some features but the core loop is working a lot better with this for me without it choking on locked databases every 10 seconds. Still trying to figure out a way for Radarr that doesn't involve recompiling. |
Sonarr currently uses WAL mode for SQLite journaling. WAL mode has some advantages, but one major disadvantage is that it cannot safely be used over non-local filesystems (https://sqlite.org/wal.html); Docker for Windows and other virtualization systems that use CIFS-mounted host paths often fail with SQLite locking or corruption errors when the SQLite file lives on a host-shared path.
Providing an option to disable WAL mode (perhaps using standard DELETE mode) for transactions would be very useful for virtualizing Sonarr, or other cases where the config files and sqlite databases need to live on a SMB/CIFS/NFS path.
Could we have some sort of config file option or command line option that disables WAL mode journaling throughout the program?
This was already addressed for OSX in #167 - may as well just make it an advanced option, so we can set it when needed, and by default on OSX.
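A minimal sketch of what such an option would do under the hood, using Python's stdlib `sqlite3` (the path is illustrative, not Sonarr's actual database location, and this is not Sonarr's code):

```python
import sqlite3

# Switch an existing database from WAL back to rollback-journal
# (DELETE) mode. The setting is persistent in the database file,
# and leaving WAL mode also removes the -wal/-shm sidecar files
# that cause trouble on CIFS/NFS shares.
conn = sqlite3.connect("/tmp/sonarr_example.db")
new_mode = conn.execute("PRAGMA journal_mode=DELETE;").fetchone()[0]
conn.close()
```

Because the journal mode is stored in the database file itself, an advanced option would only need to issue this pragma once at startup; every later connection would inherit DELETE mode.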