[BUG] Web App Not Functioning - Database Locked #341

joe-eklund · 2022-07-24T01:30:15Z

Describe the bug
After the web app is running for a little while, I am getting a database locked error message in the logs and the web application is not loading. Spinning down the container and spinning it back up makes the app functional for another few day(s) until I decide to check it again and it's broken.

Expected behavior
The web app loads.

Log Files

Chrome Console:

main-es2015.9aa07eb67b5be2859df7.js:1 ERROR Error: Uncaught (in promise): x: {"headers":{"normalizedNames":{},"lazyUpdate":null},"status":500,"statusText":"OK","url":"https://scrutiny.domain.tld/api/summary","ok":false,"name":"HttpErrorResponse","message":"Http failure response for https://scrutiny.domain.tld/api/summary: 500 OK","error":{"success":false}}
    at T (polyfills-es2015.5483988857642b9a2626.js:1:59069)
    at T (polyfills-es2015.5483988857642b9a2626.js:1:58597)
    at polyfills-es2015.5483988857642b9a2626.js:1:59878
    at l.invokeTask (polyfills-es2015.5483988857642b9a2626.js:1:54115)
    at Object.onInvokeTask (main-es2015.9aa07eb67b5be2859df7.js:1:543551)
    at l.invokeTask (polyfills-es2015.5483988857642b9a2626.js:1:54036)
    at s.runTask (polyfills-es2015.5483988857642b9a2626.js:1:49524)
    at v (polyfills-es2015.5483988857642b9a2626.js:1:56109)
    at u.invokeTask [as invoke] (polyfills-es2015.5483988857642b9a2626.js:1:55194)
    at f (polyfills-es2015.5483988857642b9a2626.js:1:67448)

Scrutiny Web Logs

time="2022-07-24T01:18:21Z" level=info msg="<redacted> - d18585b0e63e [24/Jul/2022:01:18:21 +0000] \"GET /\" 302 27 \"https://domain.tld/\" \"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36\" (1ms)" clientIP=<redacted> hostname=d18585b0e63e latency=1 method=GET path=/ referer="https://domain.tld/" respLength=27 statusCode=302 userAgent="Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36"

time="2022-07-24T01:18:22Z" level=error msg="An error occurred while retrieving device summary Could not get device summary from DB: database is locked (5) (SQLITE_BUSY)"

time="2022-07-24T01:18:22Z" level=error msg="<redacted> - d18585b0e63e [24/Jul/2022:01:18:22 +0000] \"GET /api/summary\" 500 17 \"https://scrutiny.domain.tld/web/\" \"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36\" (1ms)" clientIP=<redacted> hostname=d18585b0e63e latency=1 method=GET path=/api/summary referer="https://scrutiny.domain.tld/web/" respLength=17 statusCode=500 userAgent="Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36"
2022/07/24 01:18:22 /go/src/github.com/analogj/scrutiny/webapp/backend/pkg/database/scrutiny_repository_device.go:33 database is locked (5) (SQLITE_BUSY)
[0.203ms] [rows:0] SELECT * FROM `devices`

Any ideas? Thanks.

The text was updated successfully, but these errors were encountered:

AnalogJ · 2022-07-24T01:40:12Z

Hey @joe-eklund
I'll need a bit more information before I can help you:

what version of scrutiny are you running?
are you deploying via docker (hub/spoke or omnibus)
arm vs amd64 arch
do you have anything else connected to that sqlite DB? (backups, other apps, etc)

joe-eklund · 2022-07-24T02:01:49Z

Hey @joe-eklund I'll need a bit more information before I can help you:

what version of scrutiny are you running?

Web app says master#c34ee85. scrutiny --version says scrutiny version 0.4.16.

are you deploying via docker (hub/spoke or omnibus)

Using Hub Spoke deployed through Docker.

arm vs amd64 arch

amd64. Using the following images:

ghcr.io/analogj/scrutiny:master-collector
ghcr.io/analogj/scrutiny:master-web
influxdb:2.2

do you have anything else connected to that sqlite DB? (backups, other apps, etc)

I do have hourly backups that are run through duplicacy. Duplicacy backs up my home folder (where the folder for Scrutiny lives) to a ZFS pool on the same host.

AnalogJ · 2022-07-24T03:35:48Z

From what I've read, this shouldn't be happening unless there's multiple connections to the SQLite DB.
To confirm, there's no other applications reading from that SQLite DB file, correct? No other versions/containers of scrutiny running on your server?

The really weird thing is that even if there was multiple processes locking up the DB file, it should be a transient error, eventually the other app should close its connection to the app.

AnalogJ · 2022-07-24T03:36:33Z

just some notes for myself:

joe-eklund · 2022-07-27T15:56:19Z

From what I've read, this shouldn't be happening unless there's multiple connections to the SQLite DB. To confirm, there's no other applications reading from that SQLite DB file, correct? No other versions/containers of scrutiny running on your server?

Sorry took me a few days to get back to you. On vacation. :)

The only other thing is I do have another scrutiny collector from another machine (so two total) that is reporting back to the single scrutiny web instance. But I thought that was reporting directly through the scrutiny web container, so it shouldn't really be another "connection" to the DB right?

The really weird thing is that even if there was multiple processes locking up the DB file, it should be a transient error, eventually the other app should close its connection to the app.

Yes I agree. I will note the web app has been working since I restarted it on Saturday (4 days ago), but it has become frozen twice before over the past couple weeks. I will continue monitoring it and checking the logs to see if something more useful comes up to help with reproduction.

hapylestat · 2022-07-28T23:34:15Z

can confirm the issue, using podman - everything are latest version. At start everything works, but giving the tool to work for several hours - make it in the same DB locked state. As i'm running containers, access to a file is guaranteed to be exclusive to only one application.

Configuration - 1 UI, 2 collectors. What more interesting, being unable to push metrics - collectors crashing as well.

2022/07/28 23:20:51 /home/runner/work/scrutiny/scrutiny/webapp/backend/pkg/database/scrutiny_repository_device.go:33 database is locked (5) (SQLITE_BUSY)
[1.065ms] [rows:0] SELECT * FROM `devices`
time="2022-07-28T23:20:51Z" level=error msg="An error occurred while retrieving device summary Could not get device summary from DB: database is locked (5) (SQLITE_BUSY)"

In theory, if application accessing the database from several different threads - it is same as it is accessed in a parallel. Given that UI reader and another thread writer might work in different threads, it could cause an issue when threads meats in write/read operation. It it is indeed the case, moving from Journal to WAL might help( i.e. PRAGMA journal_mode=WAL;)

When a transaction cannot lock the database, because it is already locked by another one, SQLite by default throws an error: database is locked. This behavior is usually not appropriate when concurrent access is needed, typically when multiple processes write to the same database. PRAGMA busy_timeout lets you set a timeout or a handler for these events. When setting a timeout, SQLite will try the transaction multiple times within this timeout. https://rsqlite.r-dbi.org/reference/sqlitesetbusyhandler retrying for 30000 milliseconds, 30seconds - this would be unreasonable for a distributed multi-tenant application, but should be fine for local usage. added mechanism for global settings (PRAGMA and DB level instructions). fixes #341

AnalogJ · 2022-08-03T05:44:13Z

quick update for everyone. I've been able to reproduce this error locally. Its definitely caused by concurrent requests to the db. I've fixed the issue via the following commit in the beta branch - a1b0108

The beta branch is still a bit unstable, so please don't switch to it, but you should see the fix in v0.5.0, which should be released in the next couple of days. Thanks for all your help debugging this!

AnalogJ · 2022-08-04T15:43:00Z

v0.5.0 has been released 🥳
If you wouldn't mind testing it out and verifing that the SQLITE_BUSY error is no longer occurring, that would be appreciated!

KF5JWC · 2022-08-05T00:51:39Z

Just tried it out a few times, no longer hitting this issue!

joe-eklund added the bug Something isn't working label Jul 24, 2022

AnalogJ added the waiting for response label Jul 24, 2022

AnalogJ mentioned this issue Aug 2, 2022

[BUG] ApiServerCommunicationError appears when SQLITE db is busy when serving concurrent clients (collectors) #347

Closed

AnalogJ mentioned this issue Aug 3, 2022

pre v0.5.0 #352

Merged

AnalogJ closed this as completed in #352 Aug 4, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[BUG] Web App Not Functioning - Database Locked #341

[BUG] Web App Not Functioning - Database Locked #341

joe-eklund commented Jul 24, 2022

AnalogJ commented Jul 24, 2022

joe-eklund commented Jul 24, 2022

AnalogJ commented Jul 24, 2022

AnalogJ commented Jul 24, 2022

joe-eklund commented Jul 27, 2022

hapylestat commented Jul 28, 2022 •

edited

AnalogJ commented Aug 3, 2022

AnalogJ commented Aug 4, 2022

KF5JWC commented Aug 5, 2022

[BUG] Web App Not Functioning - Database Locked #341

[BUG] Web App Not Functioning - Database Locked #341

Comments

joe-eklund commented Jul 24, 2022

AnalogJ commented Jul 24, 2022

joe-eklund commented Jul 24, 2022

AnalogJ commented Jul 24, 2022

AnalogJ commented Jul 24, 2022

joe-eklund commented Jul 27, 2022

hapylestat commented Jul 28, 2022 • edited

AnalogJ commented Aug 3, 2022

AnalogJ commented Aug 4, 2022

KF5JWC commented Aug 5, 2022

hapylestat commented Jul 28, 2022 •

edited