Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] Web App Not Functioning - Database Locked #341

Closed
joe-eklund opened this issue Jul 24, 2022 · 9 comments · Fixed by #352
Closed

[BUG] Web App Not Functioning - Database Locked #341

joe-eklund opened this issue Jul 24, 2022 · 9 comments · Fixed by #352
Labels
bug Something isn't working waiting for response

Comments

@joe-eklund
Copy link

Describe the bug
After the web app is running for a little while, I am getting a database locked error message in the logs and the web application is not loading. Spinning down the container and spinning it back up makes the app functional for another few day(s) until I decide to check it again and it's broken.

Expected behavior
The web app loads.

Log Files

Chrome Console:

main-es2015.9aa07eb67b5be2859df7.js:1 ERROR Error: Uncaught (in promise): x: {"headers":{"normalizedNames":{},"lazyUpdate":null},"status":500,"statusText":"OK","url":"https://scrutiny.domain.tld/api/summary","ok":false,"name":"HttpErrorResponse","message":"Http failure response for https://scrutiny.domain.tld/api/summary: 500 OK","error":{"success":false}}
    at T (polyfills-es2015.5483988857642b9a2626.js:1:59069)
    at T (polyfills-es2015.5483988857642b9a2626.js:1:58597)
    at polyfills-es2015.5483988857642b9a2626.js:1:59878
    at l.invokeTask (polyfills-es2015.5483988857642b9a2626.js:1:54115)
    at Object.onInvokeTask (main-es2015.9aa07eb67b5be2859df7.js:1:543551)
    at l.invokeTask (polyfills-es2015.5483988857642b9a2626.js:1:54036)
    at s.runTask (polyfills-es2015.5483988857642b9a2626.js:1:49524)
    at v (polyfills-es2015.5483988857642b9a2626.js:1:56109)
    at u.invokeTask [as invoke] (polyfills-es2015.5483988857642b9a2626.js:1:55194)
    at f (polyfills-es2015.5483988857642b9a2626.js:1:67448)

Scrutiny Web Logs

time="2022-07-24T01:18:21Z" level=info msg="<redacted> - d18585b0e63e [24/Jul/2022:01:18:21 +0000] \"GET /\" 302 27 \"https://domain.tld/\" \"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36\" (1ms)" clientIP=<redacted> hostname=d18585b0e63e latency=1 method=GET path=/ referer="https://domain.tld/" respLength=27 statusCode=302 userAgent="Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36"

time="2022-07-24T01:18:22Z" level=error msg="An error occurred while retrieving device summary Could not get device summary from DB: database is locked (5) (SQLITE_BUSY)"

time="2022-07-24T01:18:22Z" level=error msg="<redacted> - d18585b0e63e [24/Jul/2022:01:18:22 +0000] \"GET /api/summary\" 500 17 \"https://scrutiny.domain.tld/web/\" \"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36\" (1ms)" clientIP=<redacted> hostname=d18585b0e63e latency=1 method=GET path=/api/summary referer="https://scrutiny.domain.tld/web/" respLength=17 statusCode=500 userAgent="Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36"
2022/07/24 01:18:22 /go/src/github.com/analogj/scrutiny/webapp/backend/pkg/database/scrutiny_repository_device.go:33 database is locked (5) (SQLITE_BUSY)
[0.203ms] [rows:0] SELECT * FROM `devices`

Any ideas? Thanks.

@joe-eklund joe-eklund added the bug Something isn't working label Jul 24, 2022
@AnalogJ
Copy link
Owner

AnalogJ commented Jul 24, 2022

Hey @joe-eklund
I'll need a bit more information before I can help you:

  • what version of scrutiny are you running?
  • are you deploying via docker (hub/spoke or omnibus)
  • arm vs amd64 arch
  • do you have anything else connected to that sqlite DB? (backups, other apps, etc)

@joe-eklund
Copy link
Author

Hey @joe-eklund I'll need a bit more information before I can help you:

  • what version of scrutiny are you running?

Web app says master#c34ee85. scrutiny --version says scrutiny version 0.4.16.

  • are you deploying via docker (hub/spoke or omnibus)

Using Hub Spoke deployed through Docker.

  • arm vs amd64 arch

amd64. Using the following images:

  • ghcr.io/analogj/scrutiny:master-collector
  • ghcr.io/analogj/scrutiny:master-web
  • influxdb:2.2
  • do you have anything else connected to that sqlite DB? (backups, other apps, etc)

I do have hourly backups that are run through duplicacy. Duplicacy backs up my home folder (where the folder for Scrutiny lives) to a ZFS pool on the same host.

@AnalogJ
Copy link
Owner

AnalogJ commented Jul 24, 2022

From what I've read, this shouldn't be happening unless there's multiple connections to the SQLite DB.
To confirm, there's no other applications reading from that SQLite DB file, correct? No other versions/containers of scrutiny running on your server?

The really weird thing is that even if there was multiple processes locking up the DB file, it should be a transient error, eventually the other app should close its connection to the app.

@joe-eklund
Copy link
Author

From what I've read, this shouldn't be happening unless there's multiple connections to the SQLite DB. To confirm, there's no other applications reading from that SQLite DB file, correct? No other versions/containers of scrutiny running on your server?

Sorry took me a few days to get back to you. On vacation. :)

The only other thing is I do have another scrutiny collector from another machine (so two total) that is reporting back to the single scrutiny web instance. But I thought that was reporting directly through the scrutiny web container, so it shouldn't really be another "connection" to the DB right?

The really weird thing is that even if there was multiple processes locking up the DB file, it should be a transient error, eventually the other app should close its connection to the app.

Yes I agree. I will note the web app has been working since I restarted it on Saturday (4 days ago), but it has become frozen twice before over the past couple weeks. I will continue monitoring it and checking the logs to see if something more useful comes up to help with reproduction.

@hapylestat
Copy link

hapylestat commented Jul 28, 2022

can confirm the issue, using podman - everything are latest version. At start everything works, but giving the tool to work for several hours - make it in the same DB locked state. As i'm running containers, access to a file is guaranteed to be exclusive to only one application.

Configuration - 1 UI, 2 collectors. What more interesting, being unable to push metrics - collectors crashing as well.

2022/07/28 23:20:51 /home/runner/work/scrutiny/scrutiny/webapp/backend/pkg/database/scrutiny_repository_device.go:33 database is locked (5) (SQLITE_BUSY)
[1.065ms] [rows:0] SELECT * FROM `devices`
time="2022-07-28T23:20:51Z" level=error msg="An error occurred while retrieving device summary Could not get device summary from DB: database is locked (5) (SQLITE_BUSY)"

In theory, if application accessing the database from several different threads - it is same as it is accessed in a parallel. Given that UI reader and another thread writer might work in different threads, it could cause an issue when threads meats in write/read operation. It it is indeed the case, moving from Journal to WAL might help( i.e. PRAGMA journal_mode=WAL;)

AnalogJ added a commit that referenced this issue Aug 3, 2022
When a transaction cannot lock the database, because it is already locked by another one,
SQLite by default throws an error: database is locked. This behavior is usually not appropriate when
concurrent access is needed, typically when multiple processes write to the same database.
PRAGMA busy_timeout lets you set a timeout or a handler for these events. When setting a timeout,
SQLite will try the transaction multiple times within this timeout.
https://rsqlite.r-dbi.org/reference/sqlitesetbusyhandler

retrying for 30000 milliseconds, 30seconds - this would be unreasonable for a distributed multi-tenant application,
but should be fine for local usage.

added mechanism for global settings (PRAGMA and DB level instructions).

fixes #341
@AnalogJ AnalogJ mentioned this issue Aug 3, 2022
@AnalogJ
Copy link
Owner

AnalogJ commented Aug 3, 2022

quick update for everyone. I've been able to reproduce this error locally. Its definitely caused by concurrent requests to the db. I've fixed the issue via the following commit in the beta branch - a1b0108

The beta branch is still a bit unstable, so please don't switch to it, but you should see the fix in v0.5.0, which should be released in the next couple of days. Thanks for all your help debugging this!

@AnalogJ
Copy link
Owner

AnalogJ commented Aug 4, 2022

v0.5.0 has been released 🥳
If you wouldn't mind testing it out and verifing that the SQLITE_BUSY error is no longer occurring, that would be appreciated!

@KF5JWC
Copy link
Contributor

KF5JWC commented Aug 5, 2022

Just tried it out a few times, no longer hitting this issue!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working waiting for response
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants