Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Grain IDs are limited to strings, longs, and GUIDs, and stream IDs are limited to GUIDs #3049

Open
lwansbrough opened this Issue May 25, 2017 · 13 comments

Comments

Projects
None yet
7 participants
@lwansbrough
Copy link

lwansbrough commented May 25, 2017

This is a proposal for an update to the ID system that currently exists in Orleans (for both grains and streams). From what I understand the grain ID system works around several factors: a hash and a key which may be a string, long, or GUID, or a combination of these. The API around defining the ID for the grain, that is: GetGrain<T>(...) is rather limited, in that it only supports the types mentioned. While this may work in many cases, it comes with some considerable limitations. Some of the major limitations that I've discovered can be found here:

#1121
#1905

There have been proposals for one off patches for particular cases which have been applied, however I think it's worth considering a new approach to how grain IDs are generated. I'm not intimately familiar with the system, but I've taken a bit of a look - so forgive me if I'm ignorant to the complexity of the ID system.

It seems to me that instead of using several primitives stored in a database separately, used as a fallback to a hash, we could simply use a byte[] key instead.

A byte[] key allows us to embed the key type within the key itself, thus removing the complexity of managing separate types at various levels of the Orleans system.

The current recommendation for anyone looking to use a byte[] as the key is to use a GUID and fill the bytes yourself. While this works in many cases, the limitation of 16 bytes (the number of bytes in a GUID) has negative implications in practical applications.

An example of such a limitation is a system where you're receiving IDs from an outside source: If you wish to store the source ID inside a GUID, for instance, and you also want to store a vendor ID with the GUID, you can no longer use a GUID to contain both the source ID and the vendor ID. You could if the source ID and the vendor ID were both longs (8 bytes each), but again this can't be controlled.

The solution is fairly simple in Orleans, just use a composite key. But now you're unable to use implicit streams, because they require a single GUID.

A byte[] key would fit all of the requirements for Orleans grains: no external dependencies to encode/decode, shardable, unique as required, compatible with all databases, and is flexible enough to support all current grain ID types.

Here's how I propose it would be done:

  • Add a new API GetGrain<T>(byte[] idBytes)
  • Back all GetGrain<T>(...) APIs with a method like: GrainIdV2.ParseFrom(long), GrainIdV2.ParseFrom(Guid), etc.
  • The GrainIdV2.ParseFrom(GrainIdV2Types, byte[]) would be invoked by all the other ParseFrom methods

GrainIdV2.ParseFrom(Guid) looks like this:

GrainIdV2 ParseFrom(Guid id) {
    return ParseFrom(GrainIdV2Types.Guid, id.ToByteArray());
}

and then GrainIdV2.ParseFrom(GrainIdV2Types, byte[]) looks like this:

GrainIdV2 ParseFrom(GrainIdV2Types type, byte[] sourceIdBytes) {
    byte[] idBytes;
    switch (type) {
        case GrainIdV2Types.Guid:
            idBytes = new byte[17];
            idBytes[0] = (byte)type;
            idBytes[1 through 16] = sourceIdBytes;
            break;
        case GrainIdV2Types.Long:
            idBytes = new byte[9];
            idBytes[0] = (byte)type;
            idBytes[1 through 8] = sourceIdBytes;
            break;
        default:
        case GrainIdV2Types.UTF8String:
        case GrainIdV2Types.Custom:
            idBytes = new byte[2 + sourceIdBytes.Length];
            idBytes[0] = (byte)type;
            idBytes[1] = sourceIdBytes.Length; // implies a 255 byte limit on any type of ID
            idBytes[2 through N] = sourceIdBytes;
            break;
        }
    return new GrainIdV2(idBytes);
}

This allows us to maintain the current API (thus leaving the door open to provide backwards compatibility) while also providing a more robust solution for developers who wish to get a little more creative with their IDs. Importantly, it also allows us to define a single ID standard for both grains and streams, allow more flexibility in stream IDs as well, all the while minimizing the memory + storage impact of IDs.

@sergeybykov sergeybykov added this to the Triage milestone Jun 8, 2017

@sergeybykov

This comment has been minimized.

Copy link
Member

sergeybykov commented Jun 14, 2017

Not pushing back or defending the current implementation, I'd like to better understand where you are coming from.

You seem to bring two related but different issues.

  1. Current choice of grain key types limits creativity of developers.
  2. Stream IDs in general and implicit subscriptions in particular do not support all possible combinations of grain ID types.

With regards to 1, our answer has been, whether it is satisfactory or not, that a string key allows to encode anything that an app may need, just like a URI. Do you think that passing a byte[] instead of a application constructed string would have a better usability?

2 is an ugly limitation, a design mistake I admit we made. However, if the answer to the previous question is "no", and we fixed 2 somehow to allow all the ID types supported today for grains as stream IDs, do you think that would be sufficient?

@lwansbrough

This comment has been minimized.

Copy link
Author

lwansbrough commented Jun 14, 2017

@sergeybykov Yes, I suppose this is mostly about (2). (1) I suspect is an easier fix, though I'm intrigued by the current implementation (for instance, why use two longs instead of just a byte[]?)

I do think supporting all types of grain IDs in streams is important, whichever way it ends up being implemented. It's a limitation that is hard to decipher as an end user.

Conceptually, I suppose it doesn't matter whether I'm passing an encoded string of bytes or a raw byte array. I can't remember if the API allows string-only keys, but if it does then that would be sufficient yes. The problem of course is that the key extension stuff is a heavier implementation than the stream keys.

I suppose the point I'm trying to make is there may be an implementation which doesn't have to add the weight of the key extension implementation to the stream keys. That implementation - in my mind - is to back everything with a simple, single byte array. That way, whether its a grain ID or a stream ID, and whether the ID is a long or a GUID or a string, you'd always be using approximately the least amount of bytes possible. Hope that makes sense.

@sergeybykov

This comment has been minimized.

Copy link
Member

sergeybykov commented Jun 20, 2017

I can't remember if the API allows string-only keys, but if it does then that would be sufficient yes.

It does.

My concern about byte[] vs. string is that I think byte[] is significantly less developer friendly because it forces you to binary-serialize even trivial types. At the same time, strings are universally supported in .NET, e.g. with object.ToString(), string.Format(), etc., and developers are much more used to constructing strings than byte arrays.

@lwansbrough

This comment has been minimized.

Copy link
Author

lwansbrough commented Jun 20, 2017

@sergeybykov Sure. I think the main proposal I have is that the front end interface can stay the same, ie. GetGrain<T>(string) but the grain string ID instead of being backed by the KeyExt, it's backed by a byte[], as are the GUID, long, etc. That way your stream IDs can be backed by the same (byte[]) and you no longer have to keep the KeyExt, N0, N1 stuff.

@sergeybykov

This comment has been minimized.

Copy link
Member

sergeybykov commented Jun 20, 2017

So you are more concerned about the underlying implementation, correct?

@lwansbrough

This comment has been minimized.

Copy link
Author

lwansbrough commented Jun 21, 2017

@sergeybykov Yeah, mostly. Only in that it's the cause of limitations in Orleans (regarding Stream IDs, specifically.) The existing system has obviously had some thought put into it, but I think that thought was had pre-streams.

@sergeybykov

This comment has been minimized.

Copy link
Member

sergeybykov commented Jun 21, 2017

If we aligned stream IDs with grain IDs (#1121), would that be sufficient from your POV?

I'm trying to keep the questions of the APIs and the programming model separate from the implementation details. I see pros and cons for N0-N1 vs. byte[], but I think it's better to treat that question as an implementation detail, so that it is only concerned with performance and complexity of the code.

@lwansbrough

This comment has been minimized.

Copy link
Author

lwansbrough commented Jun 22, 2017

@sergeybykov Yeah totally, from a user perspective that is fine. Performance is obviously a concern, but at least in my case, some performance is better than no performance. :)

@shlomiw

This comment has been minimized.

Copy link
Contributor

shlomiw commented Jul 17, 2017

Same here. Having a situation where I need stream id to be an integer, so now I have to do kind of int -> Guid (and viceversa) mapping.

@sergeybykov sergeybykov modified the milestones: Triage, Backlog Sep 22, 2017

@jamescarter-le

This comment has been minimized.

Copy link
Contributor

jamescarter-le commented Nov 6, 2017

+1 for having StreamID's mirror GrainID possible types (string, long, Guid).

@luohuazhiyu

This comment has been minimized.

Copy link

luohuazhiyu commented Nov 21, 2017

+1 for StreamID should support the types (string, long, Guid),Or only support string.

@srberard

This comment has been minimized.

Copy link

srberard commented Nov 27, 2017

+1 on mirroring GrainID types.

@veikkoeeva

This comment has been minimized.

Copy link
Contributor

veikkoeeva commented Apr 28, 2018

Vote for mirroring grain IDs here too.

A bit of topic, what I was writing at #1121 (comment) is content-based hashing One way of seeing it is https://ipld.io/. This can be implemented on top of the proposed primitives and requires experimentation, so a subject of another issue. But I must say it'd blend nicely with virtual actos. :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.
You signed in with another tab or window. Reload to refresh your session. You signed out in another tab or window. Reload to refresh your session.