Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow Stream Id key to be String or CompositeKey, not only Guid #1121

Closed
centur opened this issue Dec 8, 2015 · 31 comments
Closed

Allow Stream Id key to be String or CompositeKey, not only Guid #1121

centur opened this issue Dec 8, 2015 · 31 comments
Assignees
Milestone

Comments

@centur
Copy link
Contributor

centur commented Dec 8, 2015

Given that Stream listeners are just generic grains - it looks strange that there is a limitation that Stream listener with implicit subscription can have only Guid key.
Permitting string or composite keys can allow developers to use streams in more natural way - e.g. use original object id as it's stream id, instead of sending it to some guid and passing original ID as part of the context or message. E.g. each grain can act as own events producer and all events can be easily identified by it's Id, without lookup.

This is long-lapsed follow-up of a discussion in Gitter.

@gabikliot
Copy link
Contributor

Makes sense.

@randa1
Copy link
Contributor

randa1 commented Dec 9, 2015

+1 👍

@jamescarter-le
Copy link
Contributor

+1

@amamh
Copy link

amamh commented Jan 19, 2016

+1

@zeus82
Copy link
Contributor

zeus82 commented Jan 21, 2016

+1 - Can we not replace StreamId with the UniqueKey class that's used by grains?

@cvalusek
Copy link

I have this functioning with the built-in providers as well as my own with some compatibility breaks and need a ruling from the team on how much to focus on compatibility versus code complexity.

I will need to add KeyExt to IStreamIdentity and IBatchContainer. I could add separate interfaces instead that inherit from these and use a safe cast to be more compatible. This makes it harder to implement a stream provider. It would, however, be fully binary compatible (with assembly binding redirects). I think using UniqueKey is your best bet in the long run, but that should cause some pretty heavy compatibility breaks.

A similar problem comes up for the arguments on IStreamProvider.GetStream, IQueueAdapter.QueueMessageBatchAsync, and IStreamQueueMapper.GetQueueForStream. I have opted to create separate methods there and only call them when the KeyExt is set. This could be taken further by creating separate interfaces, but I opted not to do it, because I think this is a more practical trade. It only would cause problems where a KeyExtension was being used with an ImplicitStreamSubscriber that expected the key extension to be lost.

@cvalusek
Copy link

After a moderate review of code using StreamId, I have verified the understated comment in the documentation.

Implementing a completely new Stream Provider might turn to be a rather complicated task, since you might need access to various internal runtime components, some of which may have internal access.

I highly doubt anyone has implemented their own stream provider at this point. This biggest blocking factor is the IPubSubImpl being internal. For this reason, I am going to move the change to IStreamIdentity into the second category I listed above and just make the change optimized for clean code instead of backwards compatibility. I think the statement about IBatchContainer stands, so I have added a separate interface there to allow stream providers to transition into supporting KeyExt.

@jason-bragg
Copy link
Contributor

Initially I was of the opinion that the stream ID's should probably follow the same ID pattern as grains, making it more natural to map streams to grains, but upon reflection, I'm not terribly fond of the grainId pattern.

Guids have their problems, but they are very good for what they are, unique Id's. Opening this to a wider set of data types, while beneficial to stream users, adds complexity (and possibly performance issues) to stream infrastructure and stream provider implementations.

My concerns are:

  • Adding a KeyExt string to the system adds a fair degree of complexity with limited expressiveness.
  • The complexity and overhead (performance, memory) of supporting the KeyExt will be paid by all users (and provider developers) of streams, whether they need that flexibility or not.

Question: Is there a way to provide a high degree of flexibility, with low complexity?

Assuming stream infrastructure was made public for other stream provider developers..
What if the stream infrastructure (PubSub mainly) supported a stream id something like:

class StreamKey<TKey> where TID : struct
{
    public TKey Key { get; set; }
    public IEqualityComparer<TKey> { get; set; }
}

class StreamId<TKey> where TKey: struct
{
    public StreamKey<TKey> Id { get; set; }
    public string Namespace { get; set; }
    public string ProviderName { get; set; }
}

and the stream provider interface included the key type.

public interface IStreamProvider<TKey> where TKey: struct
{
    ...
    IAsyncStream<T> GetStream<T>(TKey streamId, string streamNamespace);
    ...
}

The existing stream providers would support only Guid keys, but could be updated to support a wider set, or even user defined set of types. Similar with future stream providers. Specialized stream providers that favored simple high performance key types like 'long' could do so, while more general stream providers like azure queue could support most any key type users wished to define.

Thoughts?

@cvalusek
Copy link

That is an interesting idea. Our interest in the ticket is coming from problems with the ImplicitStreamSubscriber when we use KeyExt on the grains. I think that going with a generic would make it even harder to make the implicit subscriptions compatible regardless of key format.

I think the benefits of using streams in Orleans are really coming with the integration to the grains.

I highly agree with opening up some of the streaming infrastructure at some point in the future. Not having at least the interfaces to the pub/sub implementation means that you really can't implement a full blown stream provider that can participate in pub/sub. However, the current constraints are allowing the more aggressive changes to the API while still keeping the system lightly extensible.

I have tried a couple different ways to solve the problem. The one I was favoring before your comment was to replace functions passing all the fields on IStreamIdentity with it instead. Then, I could quickly add KeyExt to IStreamIdentity. If we went that route, it might even be better to make StreamId public and just use it. I don't see a point to having the interface as things stand. That would be closer to what was done for QueueId.

Because of the way that all stream providers are probably funneling through PersistentStreamProvider, (aside from SMS which has a lot of similar code) I think it is highly unlikely we have a different implementation IStreamIdentity in existence. Having StreamId's equality comparison would tighten up some points in the code even more. This change leads to removing more code than it adds I think.

StreamId additionally has the ProviderName on it. This might be a reason not to use it, although I think the penalty for it is already being paid by most storage providers. I could look at adding StreamIdWithProvider which could differentiate between the scenarios where the ProviderName is needed. The main use I have seen for it is to make a static call out to GetProvider in some of the pub/sub code.

Maybe, also opening up a standard BatchContainer implementation would help cut out unnecessary code in stream provider implementations as well.

@cvalusek
Copy link

After some review of what things would be like if we got rid of IStreamIdentity and just used StreamId, there are some spots that feel wrong with respect to the ProviderName. I am going to play around with splitting StreamId into 2 classes that help split out the behaviour needing the ProviderName. There is a lot less code that uses that ProviderName. The largest issues are around spots that use it by proxy (like IRingIdentifier.GetUniformHashCode()).

Ideally, there would be a version that does not have ProviderName and can be put on IBatchContainer to clean up all the scenarios where the properties are being mapped and passed as arguments.

@jason-bragg
Copy link
Contributor

I think that going with a generic would make it even harder to make the implicit subscriptions compatible regardless of key format.

Agreed, but the difficulty with identifying a stream to an implicitly subscribed grain is a separate problem which aggravated the stream identification issue. Making the decisions on the stream identification to accommodate an existing systems shortcomings rather than addressing them directly is, in my view, not a good idea. It's unfortunate that these issues are intertwined. :/

I think that going with a generic would make it even harder to make the implicit subscriptions compatible regardless of key format.

Harder, yes. My argument isn't based off of difficulty alone, it argues from relative value vs difficulty. The value provided by adding a KeyExt is very valuable to you because it is suffice for your need, but in a general sense, it's of relatively narrow value. The ability for stream providers to support various streamId, by making some of the infrastructure generic, has, in my view, far more utility, with only slightly more difficulty. This is just my assessment, I am fallible of course.

The main use I have seen for it (StreamId) is to make a static call out to GetProvider in some of the pub/sub code.

IPubSub interface uses it (StreamId) currently. If IPubSub is to be public it will not be able to use an internal type. I'm not yet clear of the significance of having an internal streamId type. Maybe @gabikliot or @sergeybykov can help.

A standard BatchContainer implementation would help cut out unnecessary code.

Possible, but batch containers are abstractions for queue specific data. We may be able to write one that fits most scenarios, but not I don't yet see clearly how we could write one that fit all scenarios. Until then, that abstraction will remain necessary.

@gabikliot
Copy link
Contributor

There is no real fundamental reason to having different internal and public ids. Part of it is historic, part of it for efficiency: public id will usually have all kinds of restrictions, while the internal one can be highly optimized for perf. That is the reason for internal GrainId vs. public GrainReference, and also for SteamId vs. StreamIdentity. But it does not have to be that way. It should be possible to only have one, the public one, and implement it efficiently.

@cvalusek
Copy link

I'll try one more run through the change where I try with the generic TKey. You are going to need some kind of TKey => GrainId function for the implicit stream subscription code. I'll see if I can find a nice home for this.

@marcinbudny
Copy link

I can see this issue has been inactive for some time now. Do you plan to release this functionality any time soon?

@sergeybykov
Copy link
Contributor

@marcinbudny Sorry about delay. It is unclear yet. We have a team that needs it, but can for now work around the limitation. Once we are done with a couple higher priority streaming workitems, we'll reassess.

@cvalusek
Copy link

We made some minor changes to get around it. I am still planning on helping with this after we get our setup production hardened.

@lwansbrough
Copy link

lwansbrough commented May 25, 2017

I'd like to see this implemented. At some point I suspect this will bite us. Currently we do not have control over the identifiers entering our system. So far it's been ints and longs, but eventually someone is going to want to use a GUID and our system will have to accommodate that. We read out the bytes of the ints and longs into a GUID. This wouldn't be an issue usually, but we're also using a few of the bytes of the GUID for metadata, so augmenting a GUID with metadata in a GUID would result in loss of information. Some would argue augmenting the GUID is a bad idea, but for us it's the most efficient way to acquire metadata during activation.

If anyone has a better solution then I'm all ears. But I know being able to use a composite key would be a perfect solution for us. Not sure what keys do internally, but it seems to me like they should just be byte arrays where the first byte is the length. That way you can support anything as a key up to 255 bytes. Keeps the API simple for devs too.

@veikkoeeva
Copy link
Contributor

veikkoeeva commented May 25, 2017

@lwansbrough You may have thought about this already, but I'll add to facilitate for larger discussion, one could encode whatever data to version 3 or 5 UUIDs and use those as GUIDs. The overall idea is good though, maybe define something like a struct OrleansId which had explicit for byte[] transformation.

Maybe one doesn't want to use UUIDs (GUIDs), so it looks to me that taking this into a more logical conclusion would be to use Naming Things with Hashes (RFC 6920) and Defining Well-Known Uniform Resource Identifiers (URIs) (RFC 5785), which would allow one for a uniform way to point to objects both in storage and in clusters and even from outside in. The resulting URIs (embedded in some payload) could be something like

"ni://myorganisation.com/sha-256;abcdfg-ABCDFG?hash=sha-256&param1=1&someparam=somevalue".

@lwansbrough
Copy link

@veikkoeeva The problem with using GUIDs to store data is that there's a 16 byte maximum. I'm having trouble understanding what prevents us from maintaining all the existing APIs but with a change where they're now all backed by byte[] instead of their own primitive types (strings, longs, and Guids).

Are you proposing that URI would be the grain ID? It's an interesting idea but that's ludicrously long and inefficient on memory. Besides, with a byte[] anyone would be welcome to implement that scheme as a string ID.

@lwansbrough
Copy link

@veikkoeeva Yeah the GrainId struct with a byte[] makes a lot of sense to me personally.

@veikkoeeva
Copy link
Contributor

@lwansbrough I suppose it's just work and maybe backwards compatibility for custom tooling.

I'm not proposing that URI would be the grain ID. I'm outlining a thought the ID could be represented as a string amongst any other types (for convenience), but internally be byte[] hashed in certain way. Say it could be a v5 UUID, 160 bytes, or maybe a SHA-256. If it's that internally, what if one expands this point of view beyond Orleans?

One way could to make an explicit design decision and some facility code in Orleans that the data is stored with hashing the grain type and ID (and deployment ID). Then if Orleans had a default way of pointing to actors/data outside, it could resemble that way. Notably having a bit of cryptoagility to tell the hash used in case it'll change over time and then have the Orleans codegen to create a table that maps the type strings to hashed ones and the IDs can be translated dynamically (and cached).

This is what the ADO.NET storage provider does already, it hashes both the grain type and ID, uses them as the indexed storage ID and solves hash collisions by ordinary string comparison (like GetHashCode and Equals in Dictionary, say).

@u-less
Copy link

u-less commented Nov 21, 2017

+1 for StreamID should support the types (string, long, Guid),Or only support string.

@eisendle
Copy link
Contributor

+1 👍

@srberard
Copy link

srberard commented Jan 6, 2018

+1

Has anyone started working on this?

@sergeybykov
Copy link
Contributor

We decided to postpone this until after 2.0.0 and make it part of revisiting the streaming API altogether.

@kutensky
Copy link
Contributor

Hi folks. I was thinking about implementation of this. And according to what we desire to have and what structure we have, i think the best implementation should be next one:
As @lwansbrough proposed, we should replace Guid to byte[] as a stream key. Public methods for accessing stream (such as IStreamProvider, IGrainFactory, etc.) should have 3 overloads: for Guid, for string and for T : struct. That will help us to support previous implemntation with Guid as a key, and provide an ability for customers to use string or any struct type.
Inside those public methods we will serialize Guid, string or struct to bytes array using Orlean's "ByteArrayBuilder".
Also we need to create few classes for StreamId. For example "StreamId" (for Guid), "StreamId where TKey: struct" for struct and StreamIdString for string.
@jason-bragg , what do you think about such implementation. Could I try to impement it in that way?

@amccool
Copy link
Contributor

amccool commented Jul 29, 2020

@sergeybykov do you have a new issue in 3.x for the use of string rather than guid keying on implicit steams?

@JanEggers
Copy link

have a look at #6660

@sergeybykov
Copy link
Contributor

@amccool It's among the 4.0 "back to basics" things.

@sergeybykov sergeybykov modified the milestones: Backlog, 4.0.0 Aug 7, 2020
@ReubenBond ReubenBond removed the P3 label Sep 3, 2021
@benjaminpetit
Copy link
Member

Will be included in the next major release

@ElanHasson
Copy link
Contributor

Awesome! Thank you!

@dotnet dotnet locked as resolved and limited conversation to collaborators Oct 9, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests