Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixes #6675: Update IEventHubDataAdapter to support StreamId to partition mapping #6676

Merged
merged 3 commits into from
Aug 10, 2020

Conversation

alexmg
Copy link
Contributor

@alexmg alexmg commented Aug 5, 2020

When you have a large number of streams with the same stream Guid but different Namespace values it can result in a hot partition in the Event Hub because only the Guid is being used in the PartitionKey.

Summary of changes:

  • Update EventHubAdapterFactory.QueueMessageBatchAsync to append the Namespace to the PartitionKey if a value is available
  • Update EvenetHubDataAdapter.GetStreamPosition to cater for the Namespace potentially being included in the PartitionKey when extracting the stream Guid

@jason-bragg
Copy link
Contributor

Change looks good, but could break current users on upgrade. This change could, on upgrade, cause streams to be sent to different partition than they previously were. Ordered processing and recovery isn't preserved across partitions. That's why a stream's data is always kept in a single partition.

If we are to make this change we need to provide some way for current users to preserve the current behavior. Maybe config switch? Maybe adding partition mapping to the IEventHubDataAdapter? Not sure what's the best way to preserver backwards compactivity. Open to suggestions.

@alexmg
Copy link
Contributor Author

alexmg commented Aug 5, 2020

Thanks for looking at this @jason-bragg. In my rush to find a solution I forgot about the ordering issue with existing and new events being on different partitions and this is something that needs to be addressed to maintain the guarantees that are in place with the Event Hub stream provider.

I originally thought about adding a callback option that would allow the user to define the partition key with a default implementation that would use the existing Guid only format. I then thought there isn’t many variations that make sense as you want all messages for a given stream to land in the same partition while maintaining even distribution, so the Guid and Namespace combination was ideal. Instead of being a callback maybe the option could be an enumeration with values for Guid (default) or GuidAndNamespace to limit the choices and keep the potential formats well-known for the EventHubDataAdapter.

It’s annoying that the ordering issue with existing events is only a short-lived problem at the point of upgrade, and a simple opt-in option to the new format would just be stating that you had existing events and want to keep the old partition key format, even though the new format might be desirable for new streams that don’t have existing events.

I can also work around the hot partition issue by generating a namespaced UUID that would consistently generate the Guid portion of the StreamId based on a combination what is currently being used as the Guid and Namespace (the TenantId and stream name in my case). It would mean that the Guid is no longer easily identifiable as the TenantId and it would break all my existing subscriptions. It would also mean that the original hot partition issue would still exist for anyone else that is using a similar StreamId format where there are several different Namespace values being combined for the same Guid.

I will have a look at adding the explicit format option mentioned above that only has two choices available (Guid or GuidAndNamespace) with the default of Guid maintaining the existing behaviour. I suppose a callback style option could receive a Guid and Namespace and return the desired enumeration but again that would really only make sense for that short period of time where existing events are present during an upgrade and would probably be confusing after that.

@jason-bragg
Copy link
Contributor

jason-bragg commented Aug 5, 2020

@alexmg, What you propose sounds viable. The suggestion of adding a streamId to partition key mapping to IEventHubData adapter abstraction was to allow you (or any user) to customize the behavior appropriate for their environment. It both addresses your needs (as I understand it) as well as provides a degree of future proofing for unforeseen use cases.

As for addressing the transition, some customers have added a new provider along with marker message in stream to move streams from one hub to another. This has mostly been the case when a change is needed in partition count but could probably work for this as well. As I've not worked on such a system, I can't speak authoritatively on how this is done, or how difficult it is other than to say it should be possible. My understanding is that for one deployment cycle all events are generated on both providers, but a marker event is sent periodically on the deprecated provider. When the consumer gets the marker, it subscribes to the new provider and switches to processing events on the new provider when it has received the same event from both providers. I believe this needs a way to detect that the two events match, like some application specific ordering information. Once one deployment has processed events in this way it should be processing all active stream from the new provider, so the transition logic can be disabled/removed in the next deployment, since all idle streams should not encounter ordering or recovery problems. I'm not suggesting this, only surfacing the potential.

@alexmg
Copy link
Contributor Author

alexmg commented Aug 6, 2020

I agree your suggestion regarding extending the IEventHubDataAdapter is the cleanest and most flexible option. I was trying to avoid changing existing interfaces, but I assume most people using a custom IEventHubDataAdapter are deriving from EventHubDataAdapter, and with virtual methods containing default implementations it should not cause major issues.

I have added the methods below to the IEventHubDataAdapter interface and added the default implementations in EventHubDataAdapter that maintain the existing behaviour.

string GetPartitionKey(Guid streamGuid, string streamNamespace);

IStreamIdentity GetStreamIdentity(EventData queueMessage);

It would have been nice to use the IStreamIdentity for both directions as below, but that would have required allocating another StreamIdentity instance in the QueueMessageBatchAsync method because it receives the Guid and Namespace as separate arguments. I did not want to change the signature of the IQueueAdapter.QueueMessageBatchAsync to receive a IStreamIdentity because that would impact the many other implementations.

string GetPartitionKey(IStreamIdentity streamIdentity);

IStreamIdentity GetStreamIdentity(EventData queueMessage);

If you prefer this signature and are not concerned about the extra allocation of a StreamIdentity I can update the PR.

Thanks for the suggestions on managing changes to the Event Hub partitions. This is useful information for both this situation and when a partition count change is required.

@alexmg alexmg changed the title Fixes #6675: Include stream Namespace in PartitionKey for EH messages Fixes #6675: Update IEventHubDataAdapter to support StreamId to partition mapping Aug 6, 2020
Copy link
Contributor

@jason-bragg jason-bragg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lg2m
Nice work!

@jason-bragg
Copy link
Contributor

@benjaminpetit Any concerns?

@jason-bragg
Copy link
Contributor

jason-bragg commented Aug 6, 2020

I think this change is good, but noting shortcomings of current pattern for future consideration.

Some history - The data adapter was initially intended to adapt data from the queue to the cache and from the cache to the application. It didn't deal with how data was written to the queue or which queue it was written to. Over time the queue to cache translation was mostly generalized and the need to also handle how data was written to the queue was added. However, no ability was added to support mapping a message to a specific queue (originally the responsibility of the stream queue mapper).

This change now adds the ability to determine which queue the data is written to, which is good, but this behavior is only available in IEventHubDataAdapter, where it's been added. This means that there is still a hole in the pattern as this behavior is still not fully supported across all persistent providers. Suggest revisiting data adapter design, with explicit focus on common needs across persistent stream providers and which of those needs the adapter should be responsible for. Separation of producer and consumer side needs should also be considered.

@alexmg
Copy link
Contributor Author

alexmg commented Aug 7, 2020

Thanks for your assistance with this issue @jason-bragg. It is very much appreciated. The way the Orleans team approaches open source and engages with the community is top-notch.

Would this be a candidate for the 3.2.3 patch release and is there an expected timeframe for that?

@sergeybykov sergeybykov added this to the 3.3.0 milestone Aug 10, 2020
@benjaminpetit
Copy link
Member

I think it's a good addition, even if it's only available for the event hub adapter for now. We will try to include that in 3.3

@benjaminpetit benjaminpetit merged commit 2d3ef68 into dotnet:master Aug 10, 2020
@alexmg
Copy link
Contributor Author

alexmg commented Aug 11, 2020

That is great news. Thank you @benjaminpetit.

@alexmg alexmg deleted the PartitionKey branch August 11, 2020 11:10
@alexmg
Copy link
Contributor Author

alexmg commented Sep 4, 2020

Is this still on the cards for 3.3.0 or are you waiting for the new StreamId from #6660 to land in 4.0.0? The StreamId looks great and I'm keen to see this improvement in one form or another but hoping the wait isn't too long. Is there a current timeframe for a 4.0.0 release and would this still be considered an acceptable 3.3.0 change in the interim?

@benjaminpetit
Copy link
Member

Look like we missed to backport this commit for 3.3.0 :)

I think it's totally fine to include it in 3.3.0, especially since 4.0 will be a much bigger release

benjaminpetit pushed a commit to benjaminpetit/orleans that referenced this pull request Sep 4, 2020
… partition mapping (dotnet#6676)

* Fixes dotnet#6675: Include stream Namespace in PartitionKey for Event Hub messages

* Update IEventHubDataAdapter to support partition to StreamId mapping
sergeybykov pushed a commit that referenced this pull request Sep 4, 2020
…tion mapping (#6676) (#6731)

* Fixes #6675: Include stream Namespace in PartitionKey for Event Hub messages

* Update IEventHubDataAdapter to support partition to StreamId mapping

Co-authored-by: Alex Meyer-Gleaves <alex.meyergleaves@gmail.com>
@alexmg
Copy link
Contributor Author

alexmg commented Sep 5, 2020

That's great news! Thank you @benjaminpetit.

@github-actions github-actions bot locked and limited conversation to collaborators Dec 5, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants