
cosmosdb statestore partitioning broken under load #3716

Closed
javageek79 opened this issue Sep 27, 2021 · 16 comments
Assignees
Labels
area/runtime/actor kind/bug Something isn't working P1 pinned size/S 1 week of work triaged/resolved Indicates that this issue has been triaged
Milestone

Comments

@javageek79

In what area(s)?

/area runtime

/area operator

/area placement

/area docs

/area test-and-release

What version of Dapr?

1.4.0

1.0.x
edge: output of git describe --dirty

Expected Behavior

There should be only one DB entry, named actors||ACTOR_TYPE||metadata, and n buckets, where n is the number of partitions.

Actual Behavior

There are multiple parent entries in the DB named actors||ACTOR_TYPE; for each id value there are n partition entries in the DB, containing 200-400 reminder entries. The value property is set to null.
There is also a single actors||ACTOR_TYPE||metadata entry, also with n partitions and a properly initialized value field.

The table below shows the parent entries in the state store. For each entry there are also 50 partition-bucket entries, as mentioned above.

```
id                            partitionkey                          ts          date              value-attribute
actors||ACTOR_TYPE            8e7c8966-ac47-41bb-8f98-1e4b4633feb3  1632737080  27.09.2021 10:04  null
actors||ACTOR_TYPE||metadata  actors||ACTOR_TYPE||metadata          1632737080  27.09.2021 10:04  { "id": "8e7c8966-ac47-41bb-8f98-1e4b4633feb3", "actorRemindersMetadata": { "partitionCount": 50 } }
actors||ACTOR_TYPE            6c8f8b84-d4bd-405e-a349-57065e644d73  1632735901  27.09.2021 09:45  null
actors||ACTOR_TYPE            a7e27bd7-dc5b-45cc-8760-60d615d3b455  1632734569  27.09.2021 09:22  null
```

The screenshots below show the time frame in which the load test took place.
The load test started at 11:12:23am and created 2k actors, each registering 7 reminders.
2k reminders were supposed to execute at 11:45. After registering those reminders, all actors were deactivated before 11:45 and recreated with an execution time of 11:35.
The re-registration started at 11:19:47am.
actors||ACTOR_TYPE||metadata -> id == 8e7c8966-ac47-41bb-8f98-1e4b4633feb3 (screenshot)
actors||ACTOR_TYPE -> id == 8e7c8966-ac47-41bb-8f98-1e4b4633feb3 (screenshot)
actors||ACTOR_TYPE -> id == a7e27bd7-dc5b-45cc-8760-60d615d3b455 (screenshot)
actors||ACTOR_TYPE -> id == 6c8f8b84-d4bd-405e-a349-57065e644d73 (screenshot)

Steps to Reproduce the Problem

The problem was revealed only during our load test session; under normal load, the error did not occur.
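
For context, a CosmosDB actor state store of the kind discussed here is typically declared roughly like the sketch below. This is a minimal sketch only; the component name, URL, database, and collection are placeholders, not values taken from this issue:

```yaml
apiVersion: dapr.io/v1alpha1
kind: Component
metadata:
  name: statestore                     # placeholder component name
spec:
  type: state.azure.cosmosdb
  version: v1
  metadata:
    - name: url
      value: https://<account>.documents.azure.com:443/   # placeholder account URL
    - name: masterKey
      value: "<primary-key>"           # placeholder; use a secret store in practice
    - name: database
      value: dapr                      # placeholder database name
    - name: collection
      value: actors                    # placeholder collection name
    - name: actorStateStore
      value: "true"                    # marks this store as the actor state store
```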

Release Note

RELEASE NOTE:

@javageek79 javageek79 added the kind/bug Something isn't working label Sep 27, 2021
@artursouza
Member

This seems to be a hot partition problem in CosmosDB. I recommend going through this documentation first: https://docs.microsoft.com/en-us/azure/cosmos-db/partitioning-overview#physical-partitions

Then, see if your provisioned throughput is enough to sustain this load. The throughput is evenly split across partitions, so there might be a need to overprovision to compensate for a hot partition.

Another strategy to minimize the hot partition problem is to play with the partition count. For example, try using a prime number. In your scenario, the next prime number is 53.

@javageek79
Author

ok, thx. We will try the prime approach first.

@artursouza artursouza self-assigned this Sep 30, 2021
@artursouza
Member

The fact that you see key "actors||ACTOR_TYPE" with partitionKey "a7e27bd7-dc5b-45cc-8760-60d615d3b455" is odd, because the first is the old (unpartitioned) way data was saved while the second is the partitioned way - meaning that id was supposed to belong to a reminder partition, not to the global "old" record.

@artursouza
Member

I am looking into this issue.

@artursouza artursouza added this to the v1.5 milestone Sep 30, 2021
@artursouza artursouza added this to Backlog in Dapr Roadmap via automation Sep 30, 2021
@artursouza artursouza added P1 area/runtime/actor size/S 1 week of work triaged/resolved Indicates that this issue has been triaged labels Sep 30, 2021
@artursouza
Member

To clarify, any instance of id "actors||ACTOR_TYPE" should have a partitionKey of "actors||ACTOR_TYPE" as well.

@artursouza
Member

I reproduced a slightly different problem. The issue I found is that the "actors||ACTOR_TYPE" record is created even after the partitioning is set and none of the partition reminder records were used. I will find the root cause for this, which might lead to a fix for the issue you are seeing.

(screenshot)

@javageek79
Author

javageek79 commented Oct 4, 2021

Hi @artursouza, that sounds promising. It would be great to have a fix available soon.
The last load test ran with 1,009 partitions and caused ~75,000 entries in the DB, where only 1,010 were expected, due to this behavior ;-)

@artursouza
Member

@javageek79 I found the root cause for the behavior I was seeing before. Let me explain my test scenario:

test client ---> dapr sidecar A -----> dapr sidecar B ---> actor app

The issue I was facing happened because sidecar A was being invoked to register a reminder but was not configured with the Actor.TypeMetadata feature; only sidecar B had that configuration. Once I added this config to sidecar A as well, it worked as expected.

Can you confirm whether the same is going on in your test scenario? Please make sure that all sidecar instances have this preview feature enabled:

```yaml
spec:
  features:
    - name: Actor.TypeMetadata
      enabled: true
```
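
For completeness, the fragment above lives in a Dapr Configuration resource; a minimal sketch (the name featureconfig is just a placeholder, to be referenced from the sidecar annotation) would look roughly like this:

```yaml
apiVersion: dapr.io/v1alpha1
kind: Configuration
metadata:
  name: featureconfig                  # placeholder; referenced by the dapr.io/config annotation
spec:
  features:
    - name: Actor.TypeMetadata         # preview feature for partitioned actor reminders
      enabled: true
```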

@javageek79
Author

Hi @artursouza, the feature is enabled in a config.yml which is referenced only by sidecar B.
The connection between sidecar A and sidecar B is via Azure Service Bus, so they have no direct connection.
Does every Dapr-related service need to reference the config to make the feature work?
How can I verify that the feature was enabled in the sidecars?

@artursouza
Member

Do a `kubectl describe pod <pod>` to check the config being used in the `dapr.io/config` annotation. https://docs.dapr.io/operations/configuration/configuration-overview/
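
For reference, the annotation sits on the pod template of every Dapr-enabled deployment that should use the configuration; a minimal sketch (all names, ports, and the image are placeholders) looks roughly like this:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: actor-client                   # placeholder deployment name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: actor-client
  template:
    metadata:
      labels:
        app: actor-client
      annotations:
        dapr.io/enabled: "true"
        dapr.io/app-id: "actor-client"       # placeholder app id
        dapr.io/app-port: "8080"             # placeholder app port
        dapr.io/config: "featureconfig"      # must reference the Configuration with Actor.TypeMetadata
    spec:
      containers:
        - name: actor-client
          image: actor-client:latest         # placeholder image
```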

@javageek79
Author

In the one Service that uses the statestore or in all services that have sidecars?

@artursouza
Member

Start with the ones that use state store first.

@javageek79
Author

I've checked; only the ones using the statestore (1 service with 2-6 replicas) have the config annotation set.

@artursouza
Member

Moving this to 1.6. The action item is to remove the transaction logic and use batch writes instead, since a transaction is not really required. Also, we will change the partitionKey to be simply the key - which is a breaking change, so we will need to provide a script to migrate the data.

@artursouza artursouza modified the milestones: v1.5, v1.6 Nov 5, 2021
@artursouza
Member

artursouza commented Jan 7, 2022

We have changed how reminder partitions are migrated in 1.6: we use batch writes instead of a transaction, migrate only at startup (not on registering/unregistering reminders), and block multiple migrations running in parallel from the same process. This should reduce contention in reminder persistence.

In addition, we also fixed a bug that allowed one of multiple state stores configured for actors to be selected at random instead of failing the sidecar right away.

We also found an issue in how first-write-wins was applied when a record was written for the first time and the eTag was still nil: it ended up falling back to last-write-wins. This was also fixed.

Please observe this in the 1.6 release and confirm whether the issue continues.

@artursouza
Member

I am closing this issue as part of the release activities since we made changes to address this problem. Please, reopen if it can still be reproduced after 1.6 release.

@artursouza artursouza moved this from Backlog to Done in Dapr Roadmap Feb 1, 2022
@artursouza artursouza moved this from Done to Released in Dapr Roadmap Feb 16, 2022