
Serverless Change Feed Processing with Azure Functions: is it possible to scale out across multiple partitions? #34153

Closed
chad-chronotek opened this issue Jun 27, 2019 — with docs.microsoft.com · 4 comments

Comments


chad-chronotek commented Jun 27, 2019 — with docs.microsoft.com

I noticed the docs mention the ability to scale out change feed processing in parallel by partition, but it is not clear to me how this works with serverless Azure Functions. If I wire up the Change Feed trigger, will I end up with a parallel execution flow per partition? Ideally I would like to scale our solution by having an Azure Function trigger across all partitions in parallel while processing in sequence within each partition. So imagine two partitions, A and B, where the combined change feed stream looks like this: A1, B1, A2, B2, A3, B3. Partition A events could process at the same time as partition B events, but I want to ensure that the order within each partition is maintained: A1, A2, A3 and B1, B2, B3. Is this possible when using the Change Feed binding on an Azure Function?
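
For context, here is roughly how I would wire it up (a minimal sketch using the Python worker; the binding and resource names are placeholders, not my real setup):

```python
# __init__.py — minimal Cosmos DB change feed triggered function (sketch).
# Assumes an accompanying function.json that declares a "cosmosDBTrigger"
# input binding named "documents", with databaseName, collectionName, and
# leaseCollectionName pointing at placeholder resources.
import logging

import azure.functions as func


def main(documents: func.DocumentList) -> None:
    # Each invocation delivers one batch of changes; iterating the batch
    # in order is how I would hope to preserve per-partition ordering.
    for doc in documents:
        logging.info("Processing change for document id=%s", doc["id"])
```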



@MohitVerma-MSFT commented Jun 28, 2019

@chad-chronotek Thank you for your feedback! We will review and provide an update as appropriate.

@angoyal-msft (Contributor) commented Jul 1, 2019

@chad-chronotek I have assigned this issue to the content author, who can better help you with it.
@rimman Could you please help the user with their concern and update the document appropriately?

@ealsur (Contributor) commented Jul 3, 2019

Let's say you deploy your Cosmos DB Trigger and it starts with 1 instance. A single instance will try to consume all Partition Key Ranges in parallel, one thread per Partition Key Range (see https://docs.microsoft.com/en-us/azure/cosmos-db/change-feed-processor), so the ranges are effectively processed in parallel.
If more instances come online, the Trigger will start to distribute the PK Ranges across all existing instances with an equal-distribution algorithm.
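
Conceptually, the consumption model within one instance looks like this (a hypothetical simulation for illustration, not the Trigger's actual internals):

```python
# Hypothetical simulation: one worker per Partition Key Range, ranges
# consumed in parallel, changes within a range processed strictly in order.
from concurrent.futures import ThreadPoolExecutor

# Placeholder change feeds: each list is already ordered within its range.
feeds = {
    "A": ["A1", "A2", "A3"],
    "B": ["B1", "B2", "B3"],
}

def consume_range(range_id: str, changes: list) -> None:
    # A sequential loop preserves order within this Partition Key Range.
    for change in changes:
        print(f"range {range_id}: processed {change}")

# One thread per Partition Key Range; ranges run concurrently.
with ThreadPoolExecutor(max_workers=len(feeds)) as pool:
    for range_id, changes in feeds.items():
        pool.submit(consume_range, range_id, changes)
```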

So let's say you have PK Ranges A, B, C, and D, and you start with 1 instance. Instance 1 will process all 4 in parallel.
When instance 2 comes in, it will take over part of the PK Ranges, so instance 1 will have A and B, and instance 2 will have C and D.
If you have 4 instances, each instance will have 1 PK Range to process, effectively distributing the compute power.
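
A sketch of that equal-distribution behavior (illustrative only; the real balancing is lease-based inside the Trigger/Change Feed Processor, but the end result is an even spread like this):

```python
def distribute(pk_ranges, instance_count):
    # Split the PK Ranges into near-equal contiguous chunks, one chunk
    # per instance (illustrative stand-in for lease balancing).
    base, extra = divmod(len(pk_ranges), instance_count)
    assignments, start = {}, 0
    for instance in range(1, instance_count + 1):
        size = base + (1 if instance <= extra else 0)
        assignments[instance] = pk_ranges[start:start + size]
        start += size
    return assignments

print(distribute(["A", "B", "C", "D"], 1))  # {1: ['A', 'B', 'C', 'D']}
print(distribute(["A", "B", "C", "D"], 2))  # {1: ['A', 'B'], 2: ['C', 'D']}
print(distribute(["A", "B", "C", "D"], 4))  # one PK Range per instance
```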

The number of PK Ranges can grow dynamically as your storage and throughput needs grow, and the Trigger will dynamically adjust.

@SnehaGunda (Contributor) commented Jul 15, 2019

@chad-chronotek Hope Matias's response above helps. We will close this issue if you don't have further questions.

#please-close

@PRMerger9 closed this Jul 15, 2019
