Allow Stream Id key to be String or CompositeKey, not only Guid #1121

centur · 2015-12-08T03:38:29Z

Given that Stream listeners are just generic grains - it looks strange that there is a limitation that Stream listener with implicit subscription can have only Guid key.
Permitting string or composite keys can allow developers to use streams in more natural way - e.g. use original object id as it's stream id, instead of sending it to some guid and passing original ID as part of the context or message. E.g. each grain can act as own events producer and all events can be easily identified by it's Id, without lookup.

This is long-lapsed follow-up of a discussion in Gitter.

gabikliot · 2015-12-08T17:46:35Z

Makes sense.

randa1 · 2015-12-09T18:21:33Z

+1 👍

jamescarter-le · 2016-01-16T17:49:00Z

+1

amamh · 2016-01-19T12:29:47Z

+1

zeus82 · 2016-01-21T00:58:21Z

+1 - Can we not replace StreamId with the UniqueKey class that's used by grains?

cvalusek · 2016-02-20T00:03:21Z

I have this functioning with the built-in providers as well as my own with some compatibility breaks and need a ruling from the team on how much to focus on compatibility versus code complexity.

I will need to add KeyExt to IStreamIdentity and IBatchContainer. I could add separate interfaces instead that inherit from these and use a safe cast to be more compatible. This makes it harder to implement a stream provider. It would, however, be fully binary compatible (with assembly binding redirects). I think using UniqueKey is your best bet in the long run, but that should cause some pretty heavy compatibility breaks.

A similar problem comes up for the arguments on IStreamProvider.GetStream, IQueueAdapter.QueueMessageBatchAsync, and IStreamQueueMapper.GetQueueForStream. I have opted to create separate methods there and only call them when the KeyExt is set. This could be taken further by creating separate interfaces, but I opted not to do it, because I think this is a more practical trade. It only would cause problems where a KeyExtension was being used with an ImplicitStreamSubscriber that expected the key extension to be lost.

cvalusek · 2016-02-23T18:02:57Z

After a moderate review of code using StreamId, I have verified the understated comment in the documentation.

Implementing a completely new Stream Provider might turn to be a rather complicated task, since you might need access to various internal runtime components, some of which may have internal access.

I highly doubt anyone has implemented their own stream provider at this point. This biggest blocking factor is the IPubSubImpl being internal. For this reason, I am going to move the change to IStreamIdentity into the second category I listed above and just make the change optimized for clean code instead of backwards compatibility. I think the statement about IBatchContainer stands, so I have added a separate interface there to allow stream providers to transition into supporting KeyExt.

jason-bragg · 2016-02-24T01:07:13Z

Initially I was of the opinion that the stream ID's should probably follow the same ID pattern as grains, making it more natural to map streams to grains, but upon reflection, I'm not terribly fond of the grainId pattern.

Guids have their problems, but they are very good for what they are, unique Id's. Opening this to a wider set of data types, while beneficial to stream users, adds complexity (and possibly performance issues) to stream infrastructure and stream provider implementations.

My concerns are:

Adding a KeyExt string to the system adds a fair degree of complexity with limited expressiveness.
The complexity and overhead (performance, memory) of supporting the KeyExt will be paid by all users (and provider developers) of streams, whether they need that flexibility or not.

Question: Is there a way to provide a high degree of flexibility, with low complexity?

Assuming stream infrastructure was made public for other stream provider developers..
What if the stream infrastructure (PubSub mainly) supported a stream id something like:

class StreamKey<TKey> where TID : struct
{
    public TKey Key { get; set; }
    public IEqualityComparer<TKey> { get; set; }
}

class StreamId<TKey> where TKey: struct
{
    public StreamKey<TKey> Id { get; set; }
    public string Namespace { get; set; }
    public string ProviderName { get; set; }
}

and the stream provider interface included the key type.

public interface IStreamProvider<TKey> where TKey: struct
{
    ...
    IAsyncStream<T> GetStream<T>(TKey streamId, string streamNamespace);
    ...
}

The existing stream providers would support only Guid keys, but could be updated to support a wider set, or even user defined set of types. Similar with future stream providers. Specialized stream providers that favored simple high performance key types like 'long' could do so, while more general stream providers like azure queue could support most any key type users wished to define.

Thoughts?

cvalusek · 2016-02-24T21:38:44Z

That is an interesting idea. Our interest in the ticket is coming from problems with the ImplicitStreamSubscriber when we use KeyExt on the grains. I think that going with a generic would make it even harder to make the implicit subscriptions compatible regardless of key format.

I think the benefits of using streams in Orleans are really coming with the integration to the grains.

I highly agree with opening up some of the streaming infrastructure at some point in the future. Not having at least the interfaces to the pub/sub implementation means that you really can't implement a full blown stream provider that can participate in pub/sub. However, the current constraints are allowing the more aggressive changes to the API while still keeping the system lightly extensible.

I have tried a couple different ways to solve the problem. The one I was favoring before your comment was to replace functions passing all the fields on IStreamIdentity with it instead. Then, I could quickly add KeyExt to IStreamIdentity. If we went that route, it might even be better to make StreamId public and just use it. I don't see a point to having the interface as things stand. That would be closer to what was done for QueueId.

Because of the way that all stream providers are probably funneling through PersistentStreamProvider, (aside from SMS which has a lot of similar code) I think it is highly unlikely we have a different implementation IStreamIdentity in existence. Having StreamId's equality comparison would tighten up some points in the code even more. This change leads to removing more code than it adds I think.

StreamId additionally has the ProviderName on it. This might be a reason not to use it, although I think the penalty for it is already being paid by most storage providers. I could look at adding StreamIdWithProvider which could differentiate between the scenarios where the ProviderName is needed. The main use I have seen for it is to make a static call out to GetProvider in some of the pub/sub code.

Maybe, also opening up a standard BatchContainer implementation would help cut out unnecessary code in stream provider implementations as well.

cvalusek · 2016-02-24T23:20:44Z

After some review of what things would be like if we got rid of IStreamIdentity and just used StreamId, there are some spots that feel wrong with respect to the ProviderName. I am going to play around with splitting StreamId into 2 classes that help split out the behaviour needing the ProviderName. There is a lot less code that uses that ProviderName. The largest issues are around spots that use it by proxy (like IRingIdentifier.GetUniformHashCode()).

Ideally, there would be a version that does not have ProviderName and can be put on IBatchContainer to clean up all the scenarios where the properties are being mapped and passed as arguments.

jason-bragg · 2016-02-25T00:36:43Z

I think that going with a generic would make it even harder to make the implicit subscriptions compatible regardless of key format.

Agreed, but the difficulty with identifying a stream to an implicitly subscribed grain is a separate problem which aggravated the stream identification issue. Making the decisions on the stream identification to accommodate an existing systems shortcomings rather than addressing them directly is, in my view, not a good idea. It's unfortunate that these issues are intertwined. :/

I think that going with a generic would make it even harder to make the implicit subscriptions compatible regardless of key format.

Harder, yes. My argument isn't based off of difficulty alone, it argues from relative value vs difficulty. The value provided by adding a KeyExt is very valuable to you because it is suffice for your need, but in a general sense, it's of relatively narrow value. The ability for stream providers to support various streamId, by making some of the infrastructure generic, has, in my view, far more utility, with only slightly more difficulty. This is just my assessment, I am fallible of course.

The main use I have seen for it (StreamId) is to make a static call out to GetProvider in some of the pub/sub code.

IPubSub interface uses it (StreamId) currently. If IPubSub is to be public it will not be able to use an internal type. I'm not yet clear of the significance of having an internal streamId type. Maybe @gabikliot or @sergeybykov can help.

A standard BatchContainer implementation would help cut out unnecessary code.

Possible, but batch containers are abstractions for queue specific data. We may be able to write one that fits most scenarios, but not I don't yet see clearly how we could write one that fit all scenarios. Until then, that abstraction will remain necessary.

gabikliot · 2016-02-25T05:48:42Z

There is no real fundamental reason to having different internal and public ids. Part of it is historic, part of it for efficiency: public id will usually have all kinds of restrictions, while the internal one can be highly optimized for perf. That is the reason for internal GrainId vs. public GrainReference, and also for SteamId vs. StreamIdentity. But it does not have to be that way. It should be possible to only have one, the public one, and implement it efficiently.

cvalusek · 2016-02-25T16:59:53Z

I'll try one more run through the change where I try with the generic TKey. You are going to need some kind of TKey => GrainId function for the implicit stream subscription code. I'll see if I can find a nice home for this.

marcinbudny · 2016-06-06T11:19:21Z

I can see this issue has been inactive for some time now. Do you plan to release this functionality any time soon?

sergeybykov · 2016-06-15T01:25:23Z

@marcinbudny Sorry about delay. It is unclear yet. We have a team that needs it, but can for now work around the limitation. Once we are done with a couple higher priority streaming workitems, we'll reassess.

cvalusek · 2016-06-15T02:07:54Z

We made some minor changes to get around it. I am still planning on helping with this after we get our setup production hardened.

lwansbrough · 2017-05-25T06:33:51Z

I'd like to see this implemented. At some point I suspect this will bite us. Currently we do not have control over the identifiers entering our system. So far it's been ints and longs, but eventually someone is going to want to use a GUID and our system will have to accommodate that. We read out the bytes of the ints and longs into a GUID. This wouldn't be an issue usually, but we're also using a few of the bytes of the GUID for metadata, so augmenting a GUID with metadata in a GUID would result in loss of information. Some would argue augmenting the GUID is a bad idea, but for us it's the most efficient way to acquire metadata during activation.

If anyone has a better solution then I'm all ears. But I know being able to use a composite key would be a perfect solution for us. Not sure what keys do internally, but it seems to me like they should just be byte arrays where the first byte is the length. That way you can support anything as a key up to 255 bytes. Keeps the API simple for devs too.

veikkoeeva · 2017-05-25T08:14:20Z

@lwansbrough You may have thought about this already, but I'll add to facilitate for larger discussion, one could encode whatever data to version 3 or 5 UUIDs and use those as GUIDs. The overall idea is good though, maybe define something like a struct OrleansId which had explicit for byte[] transformation.

Maybe one doesn't want to use UUIDs (GUIDs), so it looks to me that taking this into a more logical conclusion would be to use Naming Things with Hashes (RFC 6920) and Defining Well-Known Uniform Resource Identifiers (URIs) (RFC 5785), which would allow one for a uniform way to point to objects both in storage and in clusters and even from outside in. The resulting URIs (embedded in some payload) could be something like

"ni://myorganisation.com/sha-256;abcdfg-ABCDFG?hash=sha-256&param1=1&someparam=somevalue".

lwansbrough · 2017-05-25T15:30:20Z

@veikkoeeva The problem with using GUIDs to store data is that there's a 16 byte maximum. I'm having trouble understanding what prevents us from maintaining all the existing APIs but with a change where they're now all backed by byte[] instead of their own primitive types (strings, longs, and Guids).

Are you proposing that URI would be the grain ID? It's an interesting idea but that's ludicrously long and inefficient on memory. Besides, with a byte[] anyone would be welcome to implement that scheme as a string ID.

lwansbrough · 2017-05-25T15:34:06Z

@veikkoeeva Yeah the GrainId struct with a byte[] makes a lot of sense to me personally.

veikkoeeva · 2017-05-25T16:42:04Z

@lwansbrough I suppose it's just work and maybe backwards compatibility for custom tooling.

I'm not proposing that URI would be the grain ID. I'm outlining a thought the ID could be represented as a string amongst any other types (for convenience), but internally be byte[] hashed in certain way. Say it could be a v5 UUID, 160 bytes, or maybe a SHA-256. If it's that internally, what if one expands this point of view beyond Orleans?

One way could to make an explicit design decision and some facility code in Orleans that the data is stored with hashing the grain type and ID (and deployment ID). Then if Orleans had a default way of pointing to actors/data outside, it could resemble that way. Notably having a bit of cryptoagility to tell the hash used in case it'll change over time and then have the Orleans codegen to create a table that maps the type strings to hashed ones and the IDs can be translated dynamically (and cached).

This is what the ADO.NET storage provider does already, it hashes both the grain type and ID, uses them as the indexed storage ID and solves hash collisions by ordinary string comparison (like GetHashCode and Equals in Dictionary, say).

u-less · 2017-11-21T09:18:44Z

+1 for StreamID should support the types (string, long, Guid),Or only support string.

eisendle · 2017-11-21T11:25:46Z

+1 👍

srberard · 2018-01-06T16:50:20Z

+1

Has anyone started working on this?

sergeybykov · 2018-01-09T08:49:42Z

We decided to postpone this until after 2.0.0 and make it part of revisiting the streaming API altogether.

kutensky · 2018-04-14T17:22:20Z

Hi folks. I was thinking about implementation of this. And according to what we desire to have and what structure we have, i think the best implementation should be next one:
As @lwansbrough proposed, we should replace Guid to byte[] as a stream key. Public methods for accessing stream (such as IStreamProvider, IGrainFactory, etc.) should have 3 overloads: for Guid, for string and for T : struct. That will help us to support previous implemntation with Guid as a key, and provide an ability for customers to use string or any struct type.
Inside those public methods we will serialize Guid, string or struct to bytes array using Orlean's "ByteArrayBuilder".
Also we need to create few classes for StreamId. For example "StreamId" (for Guid), "StreamId where TKey: struct" for struct and StreamIdString for string.
@jason-bragg , what do you think about such implementation. Could I try to impement it in that way?

amccool · 2020-07-29T19:55:27Z

@sergeybykov do you have a new issue in 3.x for the use of string rather than guid keying on implicit steams?

JanEggers · 2020-07-30T05:17:51Z

have a look at #6660

sergeybykov · 2020-08-07T03:44:06Z

@amccool It's among the 4.0 "back to basics" things.

benjaminpetit · 2021-09-07T15:05:39Z

Will be included in the next major release

ElanHasson · 2021-09-08T11:59:40Z

Awesome! Thank you!

gabikliot added the enhancement label Dec 8, 2015

gabikliot added the help wanted label Jan 16, 2016

sergeybykov assigned jason-bragg Feb 23, 2016

sergeybykov mentioned this issue Mar 10, 2016

Question/Suggestion: Streams with string/long ids #1549

Closed

sergeybykov mentioned this issue Jun 28, 2016

ImplicitStreamSubscription doesn't activate a grain that implements IGrainWithIntegerKey #1882

Closed

sergeybykov added this to the Backlog milestone Nov 5, 2016

sergeybykov removed the help wanted label Nov 5, 2016

jamescarter-le mentioned this issue Nov 17, 2016

Implementing a new Stream Provider, documentation? #2421

Closed

lwansbrough mentioned this issue May 25, 2017

Grain IDs are limited to strings, longs, and GUIDs, and stream IDs are limited to GUIDs #3049

Closed

benjaminpetit added the P3 label Jul 26, 2018

veikkoeeva mentioned this issue Mar 15, 2019

Adding optional TLS/encryption to transport channel #828

Closed

sergeybykov modified the milestones: Backlog, 4.0.0 Aug 7, 2020

ReubenBond assigned benjaminpetit and unassigned jason-bragg Apr 6, 2021

ReubenBond removed the P3 label Sep 3, 2021

benjaminpetit closed this as completed Sep 7, 2021

dotnet locked as resolved and limited conversation to collaborators Oct 9, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Allow Stream Id key to be String or CompositeKey, not only Guid #1121

Allow Stream Id key to be String or CompositeKey, not only Guid #1121

centur commented Dec 8, 2015

gabikliot commented Dec 8, 2015

randa1 commented Dec 9, 2015

jamescarter-le commented Jan 16, 2016

amamh commented Jan 19, 2016

zeus82 commented Jan 21, 2016

cvalusek commented Feb 20, 2016

cvalusek commented Feb 23, 2016

jason-bragg commented Feb 24, 2016

cvalusek commented Feb 24, 2016

cvalusek commented Feb 24, 2016

jason-bragg commented Feb 25, 2016

gabikliot commented Feb 25, 2016

cvalusek commented Feb 25, 2016

marcinbudny commented Jun 6, 2016

sergeybykov commented Jun 15, 2016

cvalusek commented Jun 15, 2016

lwansbrough commented May 25, 2017 •

edited

veikkoeeva commented May 25, 2017 •

edited

lwansbrough commented May 25, 2017

lwansbrough commented May 25, 2017

veikkoeeva commented May 25, 2017

u-less commented Nov 21, 2017

eisendle commented Nov 21, 2017

srberard commented Jan 6, 2018

sergeybykov commented Jan 9, 2018

kutensky commented Apr 14, 2018

amccool commented Jul 29, 2020

JanEggers commented Jul 30, 2020

sergeybykov commented Aug 7, 2020

benjaminpetit commented Sep 7, 2021

ElanHasson commented Sep 8, 2021

Allow Stream Id key to be String or CompositeKey, not only Guid #1121

Allow Stream Id key to be String or CompositeKey, not only Guid #1121

Comments

centur commented Dec 8, 2015

gabikliot commented Dec 8, 2015

randa1 commented Dec 9, 2015

jamescarter-le commented Jan 16, 2016

amamh commented Jan 19, 2016

zeus82 commented Jan 21, 2016

cvalusek commented Feb 20, 2016

cvalusek commented Feb 23, 2016

jason-bragg commented Feb 24, 2016

cvalusek commented Feb 24, 2016

cvalusek commented Feb 24, 2016

jason-bragg commented Feb 25, 2016

gabikliot commented Feb 25, 2016

cvalusek commented Feb 25, 2016

marcinbudny commented Jun 6, 2016

sergeybykov commented Jun 15, 2016

cvalusek commented Jun 15, 2016

lwansbrough commented May 25, 2017 • edited

veikkoeeva commented May 25, 2017 • edited

lwansbrough commented May 25, 2017

lwansbrough commented May 25, 2017

veikkoeeva commented May 25, 2017

u-less commented Nov 21, 2017

eisendle commented Nov 21, 2017

srberard commented Jan 6, 2018

sergeybykov commented Jan 9, 2018

kutensky commented Apr 14, 2018

amccool commented Jul 29, 2020

JanEggers commented Jul 30, 2020

sergeybykov commented Aug 7, 2020

benjaminpetit commented Sep 7, 2021

ElanHasson commented Sep 8, 2021

lwansbrough commented May 25, 2017 •

edited

veikkoeeva commented May 25, 2017 •

edited