KrzysFR commented Apr 25, 2018

Add initial support for VersionStamps in the newest API

  • Add VersionStamp struct
  • Add support to Tuple Encoding
  • Implement fdb_transaction_get_versionstamp
  • Implement VersionStampedKey and VersionStampedValue atomic mutations
  • Add unit tests
  • XML Comments
  • Samples / Tutorials

Open questions: see discussion at https://forums.foundationdb.org/t/implementing-versionstamps-in-bindings/250

  • Should the same struct support both 80-bit and 96-bit VersionStamps?
  • Should the same struct support both "Incomplete" and "Complete" VersionStamps?
  • How to expose a nicer API when using retry loops (WriteAsync(...), ReadWriteAsync(...))
  • What should be the textual representation of a Versionstamp (for ToString() and DebuggerDisplay). Currently it is "@VERSION-ORDER" / "@VERSION-ORDER#USER" and "@?" / "@?#USER" for incomplete stamps.

Example usage:

readonly struct VersionStamp
{
    readonly ulong TransactionVersion;
    readonly ushort TransactionOrder;
    readonly ushort UserVersion; // or 0 if 80-bit

    bool HasUserVersion { get; } // false: 80-bit, true: 96-bit
    bool IsIncomplete { get; }

    Slice ToSlice();
    void WriteTo(Slice dest);
    void WriteTo(ref SliceWriter writer);

    static VersionStamp Parse(Slice packed);
    static bool TryParse(Slice packed, out VersionStamp stamp);
}

VersionStamp stamp1 = VersionStamp.Incomplete(); // 80-bit, no user version
VersionStamp stamp2 = VersionStamp.Incomplete(42); // 96-bit, user version 42

Slice key1 = stamp1.ToSlice(); // => 10 bytes
Slice key2 = stamp2.ToSlice(); // => 12 bytes

Slice packedStamp = ....;
VersionStamp stamp3 = VersionStamp.Parse(packedStamp);
if (!VersionStamp.TryParse(packedStamp, out var stamp4))
{
    throw new Exception("....");
}

Some conventions:

  • incomplete VersionStamps always have the highest 8 bits of their Transaction Version set to 1. Deserializers use this to distinguish them from complete stamps. By default, all bits are set to 1.
  • transactions can create random tokens, but they must still set the highest 8 bits to 1 to respect the point above.
  • the UserVersion field of 80-bit stamps will always be 0, but should not be accessed (right now I'm not throwing, maybe it should?)
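
To make the two sizes concrete, here is a minimal sketch of the byte layout. It assumes the usual FoundationDB convention of a big-endian 8-byte transaction version followed by a big-endian 2-byte transaction order, with the optional 2-byte user version appended for the 96-bit variant. `VersionStampLayout` and `Pack` are hypothetical names used for illustration only, and plain `byte[]` stands in for `Slice`.

```csharp
using System;

// Hypothetical helper, not part of the binding: shows the assumed wire layout only.
public static class VersionStampLayout
{
    // 8 bytes of transaction version + 2 bytes of transaction order (big-endian),
    // optionally followed by 2 bytes of user version for the 96-bit variant.
    public static byte[] Pack(ulong transactionVersion, ushort transactionOrder, ushort? userVersion = null)
    {
        var buffer = new byte[userVersion.HasValue ? 12 : 10];

        // Most significant byte first, so that packed stamps sort in version order.
        for (int i = 0; i < 8; i++)
        {
            buffer[i] = (byte)(transactionVersion >> (56 - 8 * i));
        }

        buffer[8] = (byte)(transactionOrder >> 8);
        buffer[9] = (byte)transactionOrder;

        if (userVersion.HasValue)
        {
            buffer[10] = (byte)(userVersion.Value >> 8);
            buffer[11] = (byte)userVersion.Value;
        }
        return buffer;
    }
}
```

With this layout, an 80-bit stamp packs to 10 bytes and a 96-bit stamp to 12 bytes, which matches the sizes returned by ToSlice() in the example above.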

KrzysFR added 4 commits April 25, 2018 14:08
- Handle both 80-bit and 96-bit sizes
- Use internal flag to distinguish between both sizes, and incomplete/complete
- Support both 80-bits and 96-bits variants
- BUGBUG: cannot recognized complete/incomplete stamps yet when parsing.
KrzysFR added the api (Issues or changes related to the client API) and layer:tuple (Tuple Encoding Layer) labels Apr 25, 2018

KrzysFR commented Apr 25, 2018

Several issues with the way Versionstamps are implemented in other bindings:

  • Java and Python only expose the larger 96-bit Versionstamp. If a user version is not specified, then a value of 0 is assumed. So their Tuple Layers only handle 12-byte stamps with prefix 0x33.
  • They represent the incomplete stamp as all-FFs, though internally they seem to have a boolean to flag incomplete vs complete instances.
  • Tuple support is achieved with a custom method packWithStamp() that has to track the offset where the stamp is located. There are overloads that deal with subspace prefixes (the offset needs to be adjusted).

This makes it a bit difficult to add support for VersionStamps to the existing ecosystem of Key and Value encoders (via TypeSystem and the various dynamic and typed subspaces). If we go the same route, we would need to change everything to output the extra "offset" field that tracks potential stamps. This would also not be compatible with other non-tuple based encoding schemes (binary, protobuf, hand-rolled).

An idea would be to represent incomplete versionstamps using a custom byte sequence, which is recognized and used to look up the offset at the last minute, just before performing the SetVersionStampedKey/Value mutation (via "IndexOf(...)").

Pros:

  • Compatible with all existing encoding schemes, and all APIs are untouched.
  • Only VersionStamp-based atomic mutations need to look for the token, but other methods (Set, Get, GetKey, ...) could add a failsafe and throw if they see it in a regular key (most probably a bug).
  • Would work for both 80-bit and 96-bit stamps.

Cons:

  • The byte sequence could be used elsewhere in the key by chance. So all-FF or all-00 is probably not a good idea.
  • Would diverge from Java/Python binding API.

One way to prevent the placeholder sequence from conflicting with some other part of the key would be to use a random token per transaction, and expose the incomplete stamp factory methods on the ITransaction itself. All methods would throw if they see the token twice, and the token would change on the next retry. The probability that the next random token also conflicts within the same key would be low.

await db.WriteAsync((tr) =>
{
    tr.SetVersionStampedKey(
        location.Keys.Encode("Foo", tr.VersionStamp()), 
        Slice.FromString("Hello World")
    );
    tr.SetVersionStampedKey(
        location.Keys.Encode("Bar", tr.VersionStamp(42)),
        Slice.FromString("Hello World")
    );
}, ct);

The call tr.VersionStamp() would return an 80-bit stamp with a random token that would change for each transaction (but be constant during the transaction lifetime). The call tr.VersionStamp(42) would in the same way return a 96-bit stamp with user version 42.
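
The "look for the token at the last minute" step could be sketched roughly as follows. This is an illustration under assumptions, not the binding's code: `StampTokenLocator` and `FindStampOffset` are hypothetical names, plain `byte[]` stands in for `Slice`, and the search is a naive scan that throws when the token is missing or appears more than once.

```csharp
using System;

// Hypothetical helper: locates the per-transaction placeholder inside an encoded key.
public static class StampTokenLocator
{
    // Returns the offset of the single occurrence of 'token' in 'key'.
    // Throws if the token is absent, or if it appears more than once (ambiguous key).
    public static int FindStampOffset(byte[] key, byte[] token)
    {
        if (token.Length == 0) throw new ArgumentException("The token must not be empty.", nameof(token));

        int found = -1;
        for (int start = 0; start + token.Length <= key.Length; start++)
        {
            if (!Matches(key, start, token)) continue;
            if (found >= 0)
            {
                throw new InvalidOperationException("The stamp token appears more than once in the key.");
            }
            found = start;
        }

        if (found < 0)
        {
            throw new InvalidOperationException("The key does not contain the stamp token.");
        }
        return found;
    }

    // True if 'token' occurs in 'key' starting at offset 'start'.
    private static bool Matches(byte[] key, int start, byte[] token)
    {
        for (int i = 0; i < token.Length; i++)
        {
            if (key[start + i] != token[i]) return false;
        }
        return true;
    }
}
```

Only the versionstamped mutations would need to call something like this. The same scan could back the failsafe in Set/Get/GetKey, which would throw when the token is found rather than when it is missing.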

Pros:

  • The whole process of generating random tokens and checking them is hidden away from the normal user
  • Stamps produced by different concurrent transactions, or by multiple retries of the same transaction WILL be different.

Cons:

  • Need to have an instance of the transaction to create a stamp.
  • If incomplete stamps can be anything, Tuple Encoding cannot distinguish between a complete or incomplete stamp!

Possible solution for the last point: if we can ensure that a Transaction Version generated by the database CANNOT have its highest bit set to 1 (i.e. version numbers cannot reach 2^63), then we could use this bit as a marker. All random incomplete stamps would have this bit set, and all complete stamps would have this bit unset.
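
A minimal sketch of that marker bit, under the stated assumption that the database never produces a transaction version with bit 63 set. `IncompleteMarker`, `MakePlaceholder` and `IsIncomplete` are hypothetical names used for illustration only.

```csharp
using System;

// Hypothetical helper: uses the top bit of the transaction version as an "incomplete" flag.
public static class IncompleteMarker
{
    // Assumption: real transaction versions are always below 2^63, so this bit is free.
    private const ulong HighBit = 1UL << 63;

    // Builds a random placeholder version that is guaranteed to carry the marker bit.
    public static ulong MakePlaceholder(Random rng)
    {
        var bytes = new byte[8];
        rng.NextBytes(bytes);
        return BitConverter.ToUInt64(bytes, 0) | HighBit;
    }

    // A decoder can classify a stamp from the version alone, without any extra state.
    public static bool IsIncomplete(ulong transactionVersion)
    {
        return (transactionVersion & HighBit) != 0;
    }
}
```

The all-FF default placeholder also has this bit set, so it would be classified as incomplete by the same test.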

KrzysFR commented Apr 26, 2018

Another issue: the call to tr.GetVersionStampAsync() must be made before committing the transaction, but it will only complete after the transaction has committed. This creates a lot of problems with the current API.

(var result, var stampTask) = await db.ReadWriteAsync((tr) =>
{
    // read/set some keys
    tr.SetVersionStampedKey(location.Keys.Encode("Hello", VersionStamp.Incomplete()), Slice.FromString("World!"));

    // if we want to know the stamp, we have to start the task here
    var task = tr.GetVersionStampAsync();
    // but it won't complete until we commit ourselves!!
    // BUGBUG: calling 'task.Result' or 'await task' here would DEADLOCK!

    return (...., task); // <-- it is weird having to ship a Task<VersionStamp> as part of the result!
}, ct);

var stamp = await stampTask; // need an additional await after the fact! :(

At the moment, the only solution is to return that Task<VersionStamp> alongside the result, and let the caller of the retry loop await it and deal with it.

The core issue is that the layers that must know the actual stamp value used, have to execute code AFTER the transaction has committed. When composed with retry loops that control the lifetime of the transaction, it means that code inside the lambda must be able to schedule more code to execute outside the scope of the lambda!
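
The ordering constraint can be reproduced without a database. The sketch below is a toy model, not the binding: `FakeTransaction` is a hypothetical stand-in whose stamp task is backed by a TaskCompletionSource that is only completed by the commit, and a bare `ulong` stands in for `VersionStamp`.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical stand-in for a transaction, used only to illustrate the ordering problem.
public sealed class FakeTransaction
{
    private readonly TaskCompletionSource<ulong> _stamp =
        new TaskCompletionSource<ulong>(TaskCreationOptions.RunContinuationsAsynchronously);

    // Must be requested before the commit, but only completes once the commit succeeds.
    public Task<ulong> GetVersionStampAsync()
    {
        return _stamp.Task;
    }

    // The commit is what resolves the stamp task.
    public Task CommitAsync(ulong committedVersion)
    {
        _stamp.TrySetResult(committedVersion);
        return Task.CompletedTask;
    }
}

public static class StampOrderingDemo
{
    public static async Task<(bool pendingBeforeCommit, ulong stamp)> RunAsync()
    {
        var tr = new FakeTransaction();

        // Start the task inside the "lambda" part of the retry loop...
        Task<ulong> stampTask = tr.GetVersionStampAsync();
        bool pending = !stampTask.IsCompleted; // awaiting it here would never return

        // ...and only observe its result after the commit has happened.
        await tr.CommitAsync(1234);
        return (pending, await stampTask);
    }
}
```

In this model the task is still pending at the point where the handler would normally return, which is why the handler has no choice but to hand the task to code that runs after the commit.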

We cannot do much about this, because the low level binding API is designed like this.

We have three choices:

  1. don't do anything about it at the binding level, and let the user deal with it. May scare away users or produce horrible code.
  2. have some specialized overloads of WriteAsync/ReadWriteAsync that would return the resolved VersionStamp along the result?
  3. create a new pattern for retry loops that add another "onSuccess" handler that will be called after the transaction commits, and has access to transaction details such as the commit version and stamps generated?

Choice 2 does not solve the issue in all cases.

Choice 3 splits the code in two, and may also lead to a bad-practice pattern: Business Logic code or Layers that need to do this will need to have access to the database instance and call ReadWriteAsync themselves, which makes them not composable with others.

For example, if inside a single HTTP request to an MVC Controller, I need to do 2 or 3 operations (using different layers), and if at least one of them wants to handle the transaction lifetime itself, then they cannot share the same transaction. This will probably lead to multiple transactions called sequentially, which will 1) introduce more latency, and 2) break ACID guarantees if the second or third transaction fails.

KrzysFR added 2 commits April 26, 2018 20:31
…using a random token)

- Each transaction generate a random token (and on each retry).
- tr.CreateVersionStamp() can be used to get a stamp specific to this transaction

KrzysFR commented Apr 27, 2018

Message Queue Sample: [TODO: not complete]

public class FdbMessageBus
{
    // Keys are (queueId, stamp): messages of a queue are ordered by commit order
    public ITypedKeySubspace<string, VersionStamp> Subspace { get; }

    public FdbMessageBus(IKeySubspace folder)
    {
        this.Subspace = folder.UsingEncoder<string, VersionStamp>();
    }

    public void PostMessage(IFdbTransaction tr, string queueId, Slice message)
    {
        tr.SetVersionStampedKey(
            this.Subspace.Keys[queueId, tr.CreateVersionStamp(0)],
            message
        );
    }

    public void PostMessages(IFdbTransaction tr, string queueId, IEnumerable<Slice> messages)
    {
        // the user version keeps messages posted by the same transaction distinct and ordered
        int idx = 0;
        foreach (var msg in messages)
        {
            tr.SetVersionStampedKey(
                this.Subspace.Keys[queueId, tr.CreateVersionStamp(idx++)],
                msg
            );
        }
    }

    //TODO: consuming messages

}

KrzysFR commented Apr 27, 2018

The current state of the PR allows basic usage of VersionStamps:

  • VersionStamp struct can model 80-bit and 96-bit stamps
  • It is supported by the Tuple encoder natively
  • SetVersionStampedKey(..) and SetVersionStampedValue(...) are implemented
  • Transactions can generate incomplete stamps that are randomized on each retry, via one of the CreateVersionStamp(...) overloads
  • The actual stamp used by the transaction can be obtained via GetVersionStampAsync(..) but this is currently a bit ugly (the task must be obtained before calling commit, but must not be awaited until the commit has completed)

I'm going to address the last point in a future PR, at least we can start playing with versionstamps !

KrzysFR merged commit 06f8c42 into master Apr 27, 2018
KrzysFR deleted the dev/versionstamps branch October 25, 2018 19:36