unexpected SQLITE_INTERRUPT #18
The errors from those methods should be repeated by anything that is called on the returned Stmt after prepare. As those errors are properly annotated with source, it should be safe to return them. I'll make a change to do that. If you're seeing SQLITE_INTERRUPT, that almost certainly means the context associated with your connection has been cancelled.
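To make "repeated by anything that is called on the returned Stmt" concrete, here is a minimal sketch of a sticky-error pattern. It is a toy model, not the library's code: `toyStmt`, `prepErr`, and `prepare` are hypothetical names, and the error value only stands in for an annotated SQLITE_INTERRUPT.

```go
package main

import (
	"errors"
	"fmt"
)

// errInterrupted stands in for an annotated SQLITE_INTERRUPT error (assumption).
var errInterrupted = errors.New("Conn.Prepare(Reset): SQLITE_INTERRUPT")

// toyStmt is a hypothetical stand-in for a cached statement.
type toyStmt struct {
	prepErr error // error recorded while the statement was prepared for reuse
}

// prepare simulates reusing a cached statement: a failure during the
// implicit reset is stored on the statement instead of being dropped.
func prepare(resetErr error) *toyStmt {
	return &toyStmt{prepErr: resetErr}
}

// Step reports the stored error once, then behaves normally.
func (s *toyStmt) Step() (bool, error) {
	if s.prepErr != nil {
		err := s.prepErr
		s.prepErr = nil
		return false, err
	}
	return false, nil
}

func main() {
	stmt := prepare(errInterrupted)
	_, err := stmt.Step()
	fmt.Println("first Step:", err)
	_, err = stmt.Step()
	fmt.Println("second Step:", err)
}
```

With this shape the caller still sees the error, just one call later than where it occurred.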
Should not affect program operation but may be helpful in debugging. For #18.
I made sure it wasn't that! See https://github.com/fasterthanlime/chao and this long-ass twitter thread. tl;dr: I was seeing a lot of SQLITE_INTERRUPT before the context was cancelled. I'm now pretty sure what I'm seeing is a duplicate of #17.
Could you add a printf here and see if you're hitting this code path? Line 242 in 3f55bce
From what you're describing I don't think that's the problem, but after staring at it for a while I think that code path is problematic.
Ah I see from your twitter thread that the SQLITE_INTERRUPT is coming from the
One thought is these sqlite docs:
This makes it sound like the problem is that your connection has an in-progress Stmt somewhere that is keeping the underlying connection interrupted when you go to prep another statement. One way to test this: when you go to reset the context on the connection, do it by calling (I should do the same thing when someone calls
At least it did at some point! I've also seen SQLITE_INTERRUPT returned from _step(), from _reset(), from _clearbinding(), etc.
Yep, that + the ..but then I ran into other issues, like
Update: I've updated 🔥 https://github.com/fasterthanlime/chao so it:
What chao does
chao attempts to flirt with the amount of
The goal is to have part of the queries succeed and part of the queries be interrupted, hopefully at various stages:
It tries to use a small (but > 2) connection pool (so it's easy to investigate in
It runs several "rounds" of heavy querying, each separated by a few seconds of "recess". After all rounds are done, it runs a very simple query on each connection of the pool, with a 2-second timeout. If this query succeeds, the connection is deemed "healthy".
Without the workaround
Without the workaround, a lot of queries get interrupted, because we intentionally try to push the workload until we can't complete it within the deadline anymore:
At the final connection health check, I'm usually seeing "1 / 10 healthy conns":
With the workaround
The workaround is to just... not set a deadline, i.e. never cancel the connection's context. So it's not doing the same thing: it effectively robs us of the ability to cancel requests early (for my use case, that's ok). As a result, all the queries succeed (the only failure condition is us not waiting on the pool long enough):
At the end, all connections are healthy:
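For readers who don't want to open the repo, the round / deadline / health-check structure described above can be sketched roughly as follows. This is not chao's code: `fakeQuery`, `runRound`, and `healthy` are hypothetical names, the "query" is just a timer, and everything runs sequentially rather than across a connection pool.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// fakeQuery is a hypothetical stand-in for a real query: it "runs" for cost
// and gives up early if ctx is cancelled, as an interrupted statement would.
func fakeQuery(ctx context.Context, cost time.Duration) error {
	if err := ctx.Err(); err != nil {
		return err // deadline already passed before we started
	}
	select {
	case <-time.After(cost):
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

// runRound runs queries with the given costs under one shared deadline and
// reports how many completed and how many were interrupted.
func runRound(deadline time.Duration, costs ...time.Duration) (ok, interrupted int) {
	ctx, cancel := context.WithTimeout(context.Background(), deadline)
	defer cancel()
	for _, c := range costs {
		if err := fakeQuery(ctx, c); err != nil {
			interrupted++
		} else {
			ok++
		}
	}
	return ok, interrupted
}

// healthy runs one trivial query with its own timeout.
func healthy(timeout time.Duration) bool {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()
	return fakeQuery(ctx, time.Millisecond) == nil
}

func main() {
	for round := 1; round <= 2; round++ {
		ok, interrupted := runRound(100*time.Millisecond,
			time.Millisecond, 2*time.Second, time.Millisecond)
		fmt.Printf("round %d: %d ok, %d interrupted\n", round, ok, interrupted)
		time.Sleep(50 * time.Millisecond) // "recess" between rounds
	}
	fmt.Println("healthy:", healthy(2*time.Second))
}
```

In this toy version the health check always passes, because each check gets a fresh context. The bug in this issue is precisely that a real connection can stay interrupted after its context is replaced.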
What could cause SQLite to return SQLITE_INTERRUPT?
The main suspect (as far as I'm concerned) is this:
What does SQLite consider a "running statement" ?
What about the
func (stmt *Stmt) Step() (rowReturned bool, err error) {
	defer func() {
		stmt.lastHasRow = rowReturned
	}()
	if stmt.bindErr != nil {
		err = stmt.bindErr
		stmt.bindErr = nil
		return false, err
	}
	for {
		stmt.conn.count++
		if err := stmt.interrupted("Stmt.Step"); err != nil {
			return false, err
		}
		switch res := C.sqlite3_step(stmt.stmt); uint8(res) { // reduce to non-extended error code
		case C.SQLITE_LOCKED:
			if res := C.wait_for_unlock_notify(stmt.conn.conn, stmt.conn.unlockNote); res != C.SQLITE_OK {
				return false, stmt.conn.reserr("Stmt.Step(Wait)", stmt.query, res)
			}
			C.sqlite3_reset(stmt.stmt)
			// loop
		case C.SQLITE_ROW:
			return true, nil
		case C.SQLITE_DONE:
			return false, nil
		case C.SQLITE_BUSY, C.SQLITE_INTERRUPT, C.SQLITE_CONSTRAINT:
			// TODO: embed some of these errors into the stmt for zero-alloc errors?
			return false, stmt.conn.reserr("Stmt.Step", stmt.query, res)
		default:
			return false, stmt.conn.extreserr("Stmt.Step", stmt.query, res)
		}
	}
}
In the codepath where we get SQLITE_INTERRUPT from wait_for_unlock_notify:
- we return false for rowReturned
- lastHasRow is set to false
  - ...even if a previous Step() call did return a row
The same thing happens (I think) when stmt.interrupted returns a non-nil error.
What doesn't fix this?
Things I've tried:
- Resetting all cached statements (conn.stmts) on Pool.Put
- Resetting all cached statements (conn.stmts) on Pool.Get
- defer stmt.Reset() in exec
  - although I suspect this doesn't work because stmt.Reset() never calls sqlite3_reset if the conn is interrupted by doneCh
What might fix this?
As I'm writing this (I haven't tried it yet), I suspect something along the lines of:
defer func() {
	if rowReturned {
		stmt.lastHasRow = true
	}
}()
...might fix it. I'll try it now, but I did want to leave my findings somewhere.
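To see the difference between the two defers in isolation, here is a small self-contained sketch. `toyStmt`, `stepUnconditional`, and `stepConditional` are hypothetical names that only model the lastHasRow bookkeeping, not SQLite itself.

```go
package main

import (
	"errors"
	"fmt"
)

// errInterrupt stands in for SQLITE_INTERRUPT (assumption).
var errInterrupt = errors.New("SQLITE_INTERRUPT")

// toyStmt is a hypothetical stand-in for Stmt; only lastHasRow is modelled.
type toyStmt struct {
	lastHasRow bool
}

// stepUnconditional mirrors the existing defer: lastHasRow always takes the
// value of the named result, including on error paths.
func (s *toyStmt) stepUnconditional(row bool, fail error) (rowReturned bool, err error) {
	defer func() { s.lastHasRow = rowReturned }()
	if fail != nil {
		return false, fail
	}
	return row, nil
}

// stepConditional mirrors the proposed defer: lastHasRow is only ever raised.
func (s *toyStmt) stepConditional(row bool, fail error) (rowReturned bool, err error) {
	defer func() {
		if rowReturned {
			s.lastHasRow = true
		}
	}()
	if fail != nil {
		return false, fail
	}
	return row, nil
}

func main() {
	a := &toyStmt{}
	a.stepUnconditional(true, nil)           // a row was returned
	a.stepUnconditional(false, errInterrupt) // interrupted afterwards
	fmt.Println("unconditional defer:", a.lastHasRow)

	b := &toyStmt{}
	b.stepConditional(true, nil)
	b.stepConditional(false, errInterrupt)
	fmt.Println("conditional defer:", b.lastHasRow)
}
```

One caveat with the conditional variant as written: nothing ever sets lastHasRow back to false, not even when stepping finishes normally, so something else (presumably Reset) would have to clear it.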
How do I run chao myself?
- Make sure you have the latest crawshaw/sqlite in your $GOPATH
- Review the code to make sure I didn't sneak in any harmful stuff
  - Heck, run it in a VM
- Review the Makefile
  - It runs go get -v -x with gcflags to disable inlining and optimization
- Run make
- Run chao
You might want to run rr chao to capture an execution trace instead. mozilla's rr is Linux-only afaict, and your distro probably has outdated packages for it:
- https://github.com/mozilla/rr
- Note that I've seen rr chao get stuck for seemingly no reason
- You won't be able to dlv attach to an instance of chao already being recorded by rr
Then:
- Study vars.go
- Enable the workaround
- Run make and chao again
- All connections should be healthy at the end
If you want to debug:
- Run dlv replay ~/.local/share/rr/chao-XXX if you captured an execution trace (recommended)
or:
- Enable facilitateDebugging in vars.go
- Rebuild (make)
- chao will print a dlv command you can copy-paste, with the right PID
- Run the command (something like dlv attach 999)
- Weep whenever delve says it can't read part of the memory
I was able to debug with delve on both Windows 10 64-bit & Linux 64-bit, but I wasn't able to gather any surprising information.
According to the SQLite docs:
Any new SQL statements that are started after the sqlite3_interrupt() call and before the running statement count reaches zero are interrupted as if they had been running prior to the sqlite3_interrupt() call.
When we detect a closed interrupt and return false from Step without resetting a statement, we leave a statement open. This means future calls to SetInterrupt don't succeed, because of an outstanding sqlite statement.
To fix that condition, reset stmts when we interrupt Step without calling into cgo. To fix the more general condition, reset any open statements on call to SetInterrupt. Finally, make sure Reset always resets, as discovered by Amos Wenger.
For #18
For #20
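The documented behaviour quoted above can be modelled with a toy connection that keeps a running-statement count and an interrupt flag. This is a simplified sketch built only from that documentation excerpt, not SQLite's actual implementation; `toyConn`, `toyStmt`, and `demo` are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
)

// errInterrupt stands in for SQLITE_INTERRUPT (assumption).
var errInterrupt = errors.New("SQLITE_INTERRUPT")

// toyConn models only the two pieces of state the quoted docs talk about.
type toyConn struct {
	running     int  // statements that were stepped and not yet reset or halted
	interrupted bool // stays set until running drops back to zero
}

// interrupt is a no-op when nothing is running, per the SQLite docs.
func (c *toyConn) interrupt() {
	if c.running > 0 {
		c.interrupted = true
	}
}

type toyStmt struct {
	conn   *toyConn
	active bool
}

// step starts the statement if needed; while the connection is interrupted
// it halts the statement and fails, otherwise it pretends to return a row.
func (s *toyStmt) step() error {
	if !s.active {
		s.active = true
		s.conn.running++
	}
	if s.conn.interrupted {
		s.reset()
		return errInterrupt
	}
	return nil
}

// reset ends the statement; the interrupt clears once nothing is running.
func (s *toyStmt) reset() {
	if !s.active {
		return
	}
	s.active = false
	s.conn.running--
	if s.conn.running == 0 {
		s.conn.interrupted = false
	}
}

// demo steps a second statement before and after the dangling one is reset.
func demo() (before, after error) {
	conn := &toyConn{}
	a := &toyStmt{conn: conn}
	a.step()         // a returns a row and stays open
	conn.interrupt() // the context is cancelled; a is never stepped or reset

	b := &toyStmt{conn: conn}
	before = b.step() // interrupted: a is still counted as running
	a.reset()         // running count reaches zero, interrupt clears
	after = b.step()
	return before, after
}

func main() {
	before, after := demo()
	fmt.Println("before resetting a:", before)
	fmt.Println("after resetting a:", after)
}
```

In this model the connection only recovers once the dangling statement is reset, which is the situation the commit addresses by resetting open statements.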
Lines 267 and 268 here: stmt.Reset() and stmt.ClearBindings() can both return errors (notably, SQLITE_INTERRUPT), but they're never returned from conn.Prepare()
sqlite/sqlite.go
Lines 265 to 281 in 4e3e3e8
Is this on purpose? If so, can we add a comment that says it's on purpose and where the error will actually be handled?
(I'm currently trying to figure out why some of my connections always end up returning SQLITE_INTERRUPT, this might or might not be related)