
Conversation

kocolosk
Member

Opening a PR now to get comments. There were three open questions on the mailing list:

  1. How do we listen for new updates in feed=continuous?
  2. How hard do we try to preserve exactly-once semantics?
  3. What's the right way to handle a commit_unknown_result from FoundationDB?

I think we're close to consensus on the first two and have selected options in this RFC. I have an opinion about the third one as well but have left all the options I could think of in the RFC since it's quite fresh.

@kocolosk
Member Author

Another topic to keep in mind here: by having a single HTTP response include all the updates across an entire database, we're limiting the top-end write throughput at which this endpoint remains usable. Looking at some simple benchmarks, I would guess that somewhere between 10k and 50k writes/sec a single consumer of the _changes feed will no longer be able to keep up.

We've had other discussions about parallel access to the _changes feed in the current implementation of CouchDB. If those go forward, it would likely be good to figure out how to reuse that machinery in the FoundationDB world. I can imagine, for example, creating a few different buckets into which we'd drop change events using consistent hashing or the like. The sequences of the events in those buckets would still be global, so a consumer that downloaded all those parallel feeds could subsequently merge them to recover a totally ordered list of events.
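To make the bucketing idea concrete, here's a minimal sketch in Python; the bucket count, hashing scheme, and feed shape are all hypothetical, not part of the RFC:

```python
# Hypothetical sketch of bucketed _changes feeds. Events for the same
# doc always hash to the same bucket, and because sequence numbers are
# global, a k-way merge recovers the totally ordered feed.
import hashlib
import heapq

NUM_BUCKETS = 4  # assumed bucket count, purely illustrative

def bucket_for(doc_id: str) -> int:
    # Stable hash of the doc id picks the bucket, so per-document
    # ordering is preserved within a bucket.
    digest = hashlib.sha256(doc_id.encode("utf-8")).digest()
    return digest[0] % NUM_BUCKETS

def merge_feeds(feeds):
    # Each feed yields (global_seq, doc_id) pairs in ascending order;
    # heapq.merge interleaves them back into one ordered stream.
    return heapq.merge(*feeds, key=lambda event: event[0])
```

A consumer would open one feed per bucket in parallel and pass them all to merge_feeds to reconstruct the single totally ordered feed.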

Net - I've focused this RFC on reimplementing the existing API in a FoundationDB world, but there's good reason to try to evolve the API going forward.

@davisp
Member

davisp commented Apr 10, 2019

So I managed to fall down the rabbit hole of "Handling Unknown Commit Results". I'm gonna include some thoughts here but I actually think this belongs in the RFC for revision metadata handling as I'll explain in a bit.

So the first time I read this I kinda glossed over the section, thinking it was actually going to end up not being an issue when coupled with the revision handling, since we'd end up with conflicts elsewhere that would prevent the duplicate entry issues. However, that's not the whole story of what's actually going on at the fdb level here.

To hopefully clarify the situation here, there are, generally speaking, three different scenarios related to the application of a transaction:

  1. Transaction is applied 0 times
  2. Transaction is applied exactly 1 time
  3. Transaction is applied N>1 times

I was originally focused on the third situation. That case is mostly (though not entirely, it turns out!) handled by the way that we read from the ?REVISIONS key space. Any two writes that touch the same document will end up as conflicting transactions, and that leaves us in the happy land: concurrent clients will either apply non-conflicting updates, or all but one will end up detecting a CouchDB conflict and returning that to the client.
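A minimal sketch of why that works, using the FoundationDB Python bindings; the key layout and helper here are made up for illustration:

```python
# Because this transaction *reads* the revision key before writing it,
# FoundationDB adds the key to the transaction's read-conflict set. If
# a concurrent transaction commits a write to the same key first, this
# commit aborts and retries, at which point the new revision is visible
# and a normal CouchDB conflict can be returned to the client.
import fdb

fdb.api_version(610)
db = fdb.open()

@fdb.transactional
def update_doc(tr, doc_id, expected_rev, new_rev):
    current = tr[b"?REVISIONS/" + doc_id]
    if current.present() and bytes(current) != expected_rev:
        raise ValueError("conflict")  # surfaced as a 409 to the client
    tr[b"?REVISIONS/" + doc_id] = new_rev

update_doc(db, b"mydoc", b"1-abc", b"2-def")
```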

However, another aspect of this is distinguishing between situations 1 and 2 in the face of an unknown commit status. At that point we could just as well throw an error to the client, but that's unlikely to lead to happy fun times for our users, so attempting to resolve that issue is part of the motivation here.

The third option in the RFC, about creating transaction ids, makes this a lot easier to understand. Basically, with every transaction we write a randomized key to the database. At the start of the transaction (i.e., on any retry) we check for that key, and if it exists we just return successfully to the client. Options 1 and 2 are variations on storing this transaction id inside the ?REVISIONS subspace so that we can check whether our update was already applied. The flow charts are a bit hard to reason through given the shared vocabulary between old CouchDB concepts and new FoundationDB concepts.
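As I understand Option 3, the shape would be roughly the following; the ?TX_IDS key space and helper names are hypothetical:

```python
# Sketch of the transaction-id check. If an earlier attempt actually
# committed but reported commit_unknown_result, the id key is already
# present, so the retry reports success without re-applying the update.
import os
import fdb

fdb.api_version(610)
db = fdb.open()

@fdb.transactional
def idempotent_update(tr, txid, apply_update):
    if tr[b"?TX_IDS/" + txid].present():
        return "already_applied"
    apply_update(tr)               # the actual document update
    tr[b"?TX_IDS/" + txid] = b""   # marker for future retries
    return "applied"

# Generate the id once, outside the retry loop, so every retry of the
# same logical update checks the same key.
txid = os.urandom(16)
```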

So, that said: the duplicate entries in the changes feed that initially motivated this discussion should not be an issue, given how we access the ?REVISIONS subspace. However, there is in fact an issue around recreating a deleted document. Because recreations just take whatever state of the revision tree exists, if we had two clients that were racing it's theoretically possible for a "recreate deleted doc" transaction to be doubly applied if we don't prevent it on our end. That particular scenario would be something like:

  1. Client A recreates deleted doc, receives unknown commit status
  2. Client B deletes doc again, doesn't matter what status
  3. Client A retries transaction, finds newly deleted doc and recreates it

This is because a recreation does not actually go through normal MVCC since that would require clients to have looked up a possibly deleted document on every document creation. We just grab whatever is there. (Also related, initial doc creation does not have this issue because the absence of any revisions is treated as the precondition).

Of the three options in the RFC I think that Option 3 is probably the best path forward, as it's relatively straightforward both conceptually and implementation-wise. The one thing I'd tweak is the cleanup aspect: rather than having Erlang nodes periodically sweeping ets tables, I'd prefer something a bit more reliable to ensure we're not slowly accumulating garbage in the transaction id key space.

I'm not 100% sure on the best route forward on this; I've had two thoughts. The first is to pair a UUID with a timestamp and then have each request probabilistically add in a clear for that keyspace prior to some previous time (i.e., a 1 in 1,000 chance that any given request clears out any transaction ids older than an hour). Obviously time in a distributed system is not a thing, so I'm a bit concerned about how that'd mess up.
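Roughly, the first idea would look something like this; the timestamp-prefixed key layout and the sweep odds are assumptions on my part:

```python
# Probabilistic sweep of the transaction id keyspace. Keys are prefixed
# with a big-endian millisecond timestamp so they sort by creation time
# and a single range clear can drop everything older than an hour.
import os
import random
import struct
import time
import fdb

fdb.api_version(610)
db = fdb.open()

TX_PREFIX = b"?TX_IDS/"

def tx_key() -> bytes:
    now_ms = int(time.time() * 1000)
    return TX_PREFIX + struct.pack(">Q", now_ms) + os.urandom(16)

@fdb.transactional
def maybe_sweep(tr):
    # ~1 in 1,000 requests clears all ids older than one hour. Clock
    # skew between writers is exactly the concern raised above.
    if random.random() < 0.001:
        cutoff = int(time.time() * 1000) - 3600 * 1000
        del tr[TX_PREFIX : TX_PREFIX + struct.pack(">Q", cutoff)]
```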

A second approach to consider would be to include the hostname of the Erlang node and then just clean up based on the local node's time, which is less terrible. Or maybe some sort of monitoring of the pids in transactions might be enough to know what can be cleared? Though that sounds scarily like it'd turn into our new couch_server.

@kocolosk
Member Author

It seems like we really need to sit down and think hard about the recreated document scenario and whether the current approach of extending a branch is even correct. It has so many weird edge cases like the "validation function bypass on replication" one.

@kocolosk
Member Author

I've cleaned up the RFC to reflect our conclusion on handling of unknown commit results. I think this is ready for a final merge unless anyone has any objections.

@kocolosk kocolosk merged commit afb929c into master Oct 10, 2019
@kocolosk kocolosk deleted the rfc/003-changes-feed branch October 10, 2019 15:29