-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix concurrent access for SQL storage. #1370
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code-wise, this looks good. I'm curious to play with this a little, and I didn't get to it yet.
Thanks for digging in, and providing some domain-knowledge where it's needed! 😃
@@ -380,13 +386,23 @@ func (c *conn) UpdateKeys(updater func(old storage.Keys) (storage.Keys, error)) | |||
firstUpdate := false | |||
// TODO(ericchiang): errors may cause a transaction be rolled back by the SQL | |||
// server. Test this, and consider adding a COUNT() command beforehand. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤔 Is that comment still accurate now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't remember. Ha
storage/sql/sql.go
Outdated
@@ -22,6 +22,10 @@ type flavor struct { | |||
// Optional function to create and finish a transaction. | |||
executeTx func(db *sql.DB, fn func(*sql.Tx) error) error |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This field could go away now, can't it?
@@ -140,10 +140,16 @@ func (c *conn) UpdateAuthRequest(id string, updater func(a storage.AuthRequest) | |||
return err | |||
} | |||
|
|||
err = c.flavor.lockForUpdate(tx, "auth_request", "id", r.ID) | |||
if err != nil { | |||
return fmt.Errorf("update auth request: %v", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💭 All these fmt.Errorf
make me nervous -- they're hard to deal with in call sites -- but this is the style this package is written in, so it's fine at add a few. 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can switch to github.com/pkg/errors in another PR :)
} | ||
return tx.Commit() | ||
lockForUpdate: func(tx *trans, table, column, value string) error { | ||
_, err := tx.Exec("SELECT 1 FROM "+table+" WHERE "+column+" = $1 FOR UPDATE NOWAIT;", value) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(For my own understanding,) NOWAIT
is used here because the intent of this PR is still to fail one transaction in the event of concurrent updates? i.e. same goal as before, just a different (less heavy-handed) method. And so the existing concurrency tests still pass with no changes necessary.
So this doesn't attempt to address #1341 (and associated PR #1342) as-is; that could be done in a separate PR and would also involve changes to the tests. I suppose that could either be done by removing the NOWAIT
or adding explicit retry logic, similar to #1342.
I'm happy to submit that PR after this is merged if no one else has plans to already, not trying to increase scope of your PR. Just making sure I understand. 🙂
Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably better to add an inline comment :)
This works, but the lockForUpdate API seems a bit risky. Ideally it could be baked into ExecTx or something, but nothing's coming to mind at the moment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes NOWAIT here is only because of tests. And yes, I think tests need to be redesigned a little bit, as changing the same auth_request concurrently wouldn't actually harm a login flow, and deadlock which current tests produce is not possible in real life situations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should probably just update/remove the tests then, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't come to idea how to test concurrency better yet. Maybe later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about the issue with interleaving connectors in the concurrent login flow mentioned in #1356 (comment) ?
Failing one of the transactions seems to be the only way to handle it in current implementation of handler/storage.
// | ||
// NOTE(ericchiang): For some reason using `SET SESSION CHARACTERISTICS AS TRANSACTION` at a | ||
// session level didn't work for some edge cases. Might be something worth exploring. | ||
executeTx: func(db *sql.DB, fn func(sqlTx *sql.Tx) error) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does anything use this anymore? can we nuke it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nuked
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd rather fix the tests to remove the requirement for that NOWAIT (if I understand the situation correctly). I'll try to come up with something. 🤔
storage/sql/crud.go
Outdated
return fmt.Errorf("get keys: %v", err) | ||
} | ||
|
||
old, err = getKeys(tx) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nit] Let's ditch line 389 and go with old, err :=
here, now that the code is rearranged.
Wait a moment, I've misunderstood something there, around #1370 (comment). So, what would be a better set of test cases? Which app-level conditions do we want to avoid, when it comes to concurrent actions? (To be explicit -- don't wait for me based on #1370 (review), I have nothing up my sleeves) |
Can we revive this PR? I'm happy to help. We have an issue with the MySQL storage backend where according to the old conformance tests (the ones currently in master) SERIALIZABLE isolation level is needed, which has quite some issues on different MySQL implementations (as in #1511). Percona XtraDB doesn't let you use it by default, AWS Aurora doesn't have it at all, on MariaDB transaction_isolation is called tx_isolation so it is a pain TBH. FYI: I have rebased this branch on the current master and copied the |
💯 I'd be very interested in getting this resolved. I think the main blocker here is that we have no test suite that makes the error appear -- our contributors, however, seem to have that. This would help evaluating the different approaches that have been put forward when this issue had been on the table last winter.... |
SERIALIZABLE isolation level is not viable solution, as it will slow down the whole Postgres. And again there must be retries as you will always receive errors like "could not serialize access...".
Previous pull request for retries like #1356 were wrong as it resulted in deadlock in Database (you open transaction inside another transaction in conformance tests).
In SQL world the better solution is to lock the row explicitly with SELECT FOR UPDATE.