-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fixes #6675: Update IEventHubDataAdapter to support StreamId to partition mapping #6676
Conversation
Change looks good, but could break current users on upgrade. This change could, on upgrade, cause streams to be sent to different partition than they previously were. Ordered processing and recovery isn't preserved across partitions. That's why a stream's data is always kept in a single partition. If we are to make this change we need to provide some way for current users to preserve the current behavior. Maybe config switch? Maybe adding partition mapping to the IEventHubDataAdapter? Not sure what's the best way to preserver backwards compactivity. Open to suggestions. |
Thanks for looking at this @jason-bragg. In my rush to find a solution I forgot about the ordering issue with existing and new events being on different partitions and this is something that needs to be addressed to maintain the guarantees that are in place with the Event Hub stream provider. I originally thought about adding a callback option that would allow the user to define the partition key with a default implementation that would use the existing Guid only format. I then thought there isn’t many variations that make sense as you want all messages for a given stream to land in the same partition while maintaining even distribution, so the Guid and Namespace combination was ideal. Instead of being a callback maybe the option could be an enumeration with values for It’s annoying that the ordering issue with existing events is only a short-lived problem at the point of upgrade, and a simple opt-in option to the new format would just be stating that you had existing events and want to keep the old partition key format, even though the new format might be desirable for new streams that don’t have existing events. I can also work around the hot partition issue by generating a namespaced UUID that would consistently generate the Guid portion of the StreamId based on a combination what is currently being used as the Guid and Namespace (the TenantId and stream name in my case). It would mean that the Guid is no longer easily identifiable as the TenantId and it would break all my existing subscriptions. It would also mean that the original hot partition issue would still exist for anyone else that is using a similar StreamId format where there are several different Namespace values being combined for the same Guid. I will have a look at adding the explicit format option mentioned above that only has two choices available ( |
@alexmg, What you propose sounds viable. The suggestion of adding a streamId to partition key mapping to IEventHubData adapter abstraction was to allow you (or any user) to customize the behavior appropriate for their environment. It both addresses your needs (as I understand it) as well as provides a degree of future proofing for unforeseen use cases. As for addressing the transition, some customers have added a new provider along with marker message in stream to move streams from one hub to another. This has mostly been the case when a change is needed in partition count but could probably work for this as well. As I've not worked on such a system, I can't speak authoritatively on how this is done, or how difficult it is other than to say it should be possible. My understanding is that for one deployment cycle all events are generated on both providers, but a marker event is sent periodically on the deprecated provider. When the consumer gets the marker, it subscribes to the new provider and switches to processing events on the new provider when it has received the same event from both providers. I believe this needs a way to detect that the two events match, like some application specific ordering information. Once one deployment has processed events in this way it should be processing all active stream from the new provider, so the transition logic can be disabled/removed in the next deployment, since all idle streams should not encounter ordering or recovery problems. I'm not suggesting this, only surfacing the potential. |
I agree your suggestion regarding extending the I have added the methods below to the string GetPartitionKey(Guid streamGuid, string streamNamespace);
IStreamIdentity GetStreamIdentity(EventData queueMessage); It would have been nice to use the string GetPartitionKey(IStreamIdentity streamIdentity);
IStreamIdentity GetStreamIdentity(EventData queueMessage); If you prefer this signature and are not concerned about the extra allocation of a Thanks for the suggestions on managing changes to the Event Hub partitions. This is useful information for both this situation and when a partition count change is required. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lg2m
Nice work!
@benjaminpetit Any concerns? |
I think this change is good, but noting shortcomings of current pattern for future consideration. Some history - The data adapter was initially intended to adapt data from the queue to the cache and from the cache to the application. It didn't deal with how data was written to the queue or which queue it was written to. Over time the queue to cache translation was mostly generalized and the need to also handle how data was written to the queue was added. However, no ability was added to support mapping a message to a specific queue (originally the responsibility of the stream queue mapper). This change now adds the ability to determine which queue the data is written to, which is good, but this behavior is only available in IEventHubDataAdapter, where it's been added. This means that there is still a hole in the pattern as this behavior is still not fully supported across all persistent providers. Suggest revisiting data adapter design, with explicit focus on common needs across persistent stream providers and which of those needs the adapter should be responsible for. Separation of producer and consumer side needs should also be considered. |
Thanks for your assistance with this issue @jason-bragg. It is very much appreciated. The way the Orleans team approaches open source and engages with the community is top-notch. Would this be a candidate for the 3.2.3 patch release and is there an expected timeframe for that? |
I think it's a good addition, even if it's only available for the event hub adapter for now. We will try to include that in 3.3 |
That is great news. Thank you @benjaminpetit. |
Is this still on the cards for |
Look like we missed to backport this commit for 3.3.0 :) I think it's totally fine to include it in 3.3.0, especially since 4.0 will be a much bigger release |
… partition mapping (dotnet#6676) * Fixes dotnet#6675: Include stream Namespace in PartitionKey for Event Hub messages * Update IEventHubDataAdapter to support partition to StreamId mapping
That's great news! Thank you @benjaminpetit. |
When you have a large number of streams with the same stream Guid but different Namespace values it can result in a hot partition in the Event Hub because only the Guid is being used in the PartitionKey.
Summary of changes:
EventHubAdapterFactory.QueueMessageBatchAsync
to append the Namespace to the PartitionKey if a value is availableEvenetHubDataAdapter.GetStreamPosition
to cater for the Namespace potentially being included in the PartitionKey when extracting the stream Guid