
Conversation

kocolosk
Member

Opening a PR now to get comments. There were three open questions on the mailing list:

  1. How do we listen for new updates in feed=continuous?
  2. How hard do we try to preserve exactly-once semantics?
  3. What's the right way to handle a commit_unknown_result from FoundationDB?

I think we're close to consensus on the first two and have selected options in this RFC. I have an opinion about the third one as well but have left all the options I could think of in the RFC since it's quite fresh.

@kocolosk
Member Author

Another topic to keep in mind here: by having a single HTTP response include all the updates across an entire database, we're limiting the top-end write throughput at which this endpoint remains usable. Looking at some simple benchmarks, I would guess that somewhere between 10k and 50k writes/sec a single consumer of the _changes feed will no longer be able to keep up.

We've had other discussions about parallel access to the _changes feed in the current implementation of CouchDB. If those go forward, it would likely be good to figure out how to reuse that machinery in the FoundationDB world. I can imagine, for example, creating a few different buckets into which we'd drop change events using consistent hashing or the like. The sequences of the events in those buckets would still be global, so a consumer that downloaded all those parallel feeds could subsequently merge them to recover a totally ordered list of events.
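To make the bucketing idea concrete, here's a minimal sketch in Python; the bucket count, hashing scheme, and feed shape are all hypothetical, not part of the RFC:

```python
# Hypothetical sketch of bucketed _changes feeds. Events for the same
# doc always hash to the same bucket, and because sequence numbers are
# global, a k-way merge recovers the totally ordered feed.
import hashlib
import heapq

NUM_BUCKETS = 4  # assumed bucket count, purely illustrative

def bucket_for(doc_id: str) -> int:
    # Stable hash of the doc id picks the bucket, so per-document
    # ordering is preserved within a bucket.
    digest = hashlib.sha256(doc_id.encode("utf-8")).digest()
    return digest[0] % NUM_BUCKETS

def merge_feeds(feeds):
    # Each feed yields (global_seq, doc_id) pairs in ascending order;
    # heapq.merge interleaves them back into one ordered stream.
    return heapq.merge(*feeds, key=lambda event: event[0])
```

A consumer would open one feed per bucket in parallel and pass them all to merge_feeds to reconstruct the single totally ordered feed.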

Net - I've focused this RFC on reimplementing the existing API in a FoundationDB world, but there's good reason to try to evolve the API going forward.

@davisp
Member

davisp commented Apr 10, 2019

So I managed to fall down the rabbit hole of "Handling Unknown Commit Results". I'm gonna include some thoughts here but I actually think this belongs in the RFC for revision metadata handling as I'll explain in a bit.

So the first time I read this I kinda glossed over the section, thinking it was actually going to end up not being an issue when coupled with the revision handling, since we'd end up with conflicts elsewhere that would prevent the duplicate entry issues. However, that's not the whole story of what's actually going on at the fdb level here.

To hopefully clarify the situation here, there are, generally speaking, three different scenarios related to the application of a transaction:

  1. Transaction is applied 0 times
  2. Transaction is applied exactly 1 time
  3. Transaction is applied N>1 times

I was originally focused on the third situation. That case is mostly (though not entirely, it turns out!) handled by the way that we read from the ?REVISIONS key space. Any two writes that touch the same document will end up as conflicting transactions, and that leaves us in the happy land: concurrent clients will either apply non-conflicting updates, or all but one will end up detecting a CouchDB conflict and returning that to the client.
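A minimal sketch of why that works, using the FoundationDB Python bindings; the key layout and helper here are made up for illustration:

```python
# Because this transaction *reads* the revision key before writing it,
# FoundationDB adds the key to the transaction's read-conflict set. If
# a concurrent transaction commits a write to the same key first, this
# commit aborts and retries, at which point the new revision is visible
# and a normal CouchDB conflict can be returned to the client.
import fdb

fdb.api_version(610)
db = fdb.open()

@fdb.transactional
def update_doc(tr, doc_id, expected_rev, new_rev):
    current = tr[b"?REVISIONS/" + doc_id]
    if current.present() and bytes(current) != expected_rev:
        raise ValueError("conflict")  # surfaced as a 409 to the client
    tr[b"?REVISIONS/" + doc_id] = new_rev

update_doc(db, b"mydoc", b"1-abc", b"2-def")
```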

However, another aspect of this is distinguishing between situations 1 and 2 in the face of an unknown commit status. At that point we could just as well throw an error to the client, but that's unlikely to lead to happy fun times for our users, so attempting to resolve that issue is part of the motivation here.

The third option in the RFC, about creating transaction ids, makes this a lot easier to understand. Basically, with every transaction we write a randomized key to the database. At the start of the transaction (i.e., on any retry) we check for that key, and if it exists we just return successfully to the client. Options 1 and 2 are variations on storing this transaction id inside the ?REVISIONS subspace so that we can check whether our update was already applied. The flow charts are a bit hard to reason through given the shared vocabulary between old CouchDB concepts and new FoundationDB concepts.
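As I understand Option 3, the shape would be roughly the following; the ?TX_IDS key space and helper names are hypothetical:

```python
# Sketch of the transaction-id check. If an earlier attempt actually
# committed but reported commit_unknown_result, the id key is already
# present, so the retry reports success without re-applying the update.
import os
import fdb

fdb.api_version(610)
db = fdb.open()

@fdb.transactional
def idempotent_update(tr, txid, apply_update):
    if tr[b"?TX_IDS/" + txid].present():
        return "already_applied"
    apply_update(tr)               # the actual document update
    tr[b"?TX_IDS/" + txid] = b""   # marker for future retries
    return "applied"

# Generate the id once, outside the retry loop, so every retry of the
# same logical update checks the same key.
txid = os.urandom(16)
```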

So, that said: the duplicate entries in the changes feed that initially motivated this discussion should not be an issue, given how we access the ?REVISIONS subspace. However, there is in fact an issue around recreating a deleted document. Because recreations just take whatever state of the revision tree exists, if we had two clients that were racing it's theoretically possible for a "recreate deleted doc" transaction to be doubly applied if we don't prevent it on our end. That particular scenario would be something like:

  1. Client A recreates deleted doc, receives unknown commit status
  2. Client B deletes doc again, doesn't matter what status
  3. Client A retries transaction, finds newly deleted doc and recreates it

This is because a recreation does not actually go through normal MVCC since that would require clients to have looked up a possibly deleted document on every document creation. We just grab whatever is there. (Also related, initial doc creation does not have this issue because the absence of any revisions is treated as the precondition).

Of the three options in the RFC I think that Option 3 is probably the best path forward, as it's relatively straightforward both conceptually and implementation-wise. The one thing I'd tweak is the cleanup aspect: rather than having Erlang nodes periodically sweeping ets tables, I'd prefer something a bit more reliable to ensure we're not slowly accumulating garbage in the transaction id key space.

I'm not 100% sure on the best route forward on this; I've had two thoughts. The first is to pair a UUID with a timestamp and then have each request probabilistically add in a clear for that keyspace prior to some previous time (i.e., a 1 in 1,000 chance that any given request clears out any transaction ids older than an hour). Obviously time in a distributed system is not a thing, so I'm a bit concerned about how that'd mess up.
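Roughly, the first idea would look something like this; the timestamp-prefixed key layout and the sweep odds are assumptions on my part:

```python
# Probabilistic sweep of the transaction id keyspace. Keys are prefixed
# with a big-endian millisecond timestamp so they sort by creation time
# and a single range clear can drop everything older than an hour.
import os
import random
import struct
import time
import fdb

fdb.api_version(610)
db = fdb.open()

TX_PREFIX = b"?TX_IDS/"

def tx_key() -> bytes:
    now_ms = int(time.time() * 1000)
    return TX_PREFIX + struct.pack(">Q", now_ms) + os.urandom(16)

@fdb.transactional
def maybe_sweep(tr):
    # ~1 in 1,000 requests clears all ids older than one hour. Clock
    # skew between writers is exactly the concern raised above.
    if random.random() < 0.001:
        cutoff = int(time.time() * 1000) - 3600 * 1000
        del tr[TX_PREFIX : TX_PREFIX + struct.pack(">Q", cutoff)]
```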

A second approach to consider would be to include the hostname of the Erlang node and then just clean up based on the local node's time, which is less terrible. Or maybe some sort of monitoring of the pids in transactions might be enough to know what can be cleared? Though that sounds scarily like it'd turn into our new couch_server.

@kocolosk
Member Author

It seems like we really need to sit down and think hard about the recreated document scenario and whether the current approach of extending a branch is even correct. It has so many weird edge cases like the "validation function bypass on replication" one.

@kocolosk
Member Author

I've cleaned up the RFC to reflect our conclusion on handling of unknown commit results. I think this is ready for a final merge unless anyone has any objections.

@kocolosk kocolosk merged commit afb929c into master Oct 10, 2019
@kocolosk kocolosk deleted the rfc/003-changes-feed branch October 10, 2019 15:29