
cosmosdb statestore partitioning broken under load #3716

Closed
javageek79 opened this issue Sep 27, 2021 · 16 comments
Assignees
Labels
area/runtime/actor kind/bug Something isn't working P1 pinned size/S 1 week of work triaged/resolved Indicates that this issue has been triaged
Milestone

Comments

@javageek79

In what area(s)?

/area runtime

/area operator

/area placement

/area docs

/area test-and-release

What version of Dapr?

1.4.0

1.0.x
edge: output of git describe --dirty

Expected Behavior

There should be only one DB entry, named actors||ACTOR_TYPE||metadata, and n buckets, where n is the number of partitions.

Actual Behavior

There are multiple parent entries in the DB named actors||ACTOR_TYPE; for each id value there are n partition entries in the DB, containing 200-400 reminder entries. The value property is set to null.
There is also a single actors||ACTOR_TYPE||metadata entry, also with n partitions and a properly initialized value field.

The table below shows the parent entries in the state store. For each entry there are also 50 partition-bucket entries, as mentioned above.

```
id                            partitionkey                          ts          date              value-attribute
actors||ACTOR_TYPE            8e7c8966-ac47-41bb-8f98-1e4b4633feb3  1632737080  27.09.2021 10:04  null
actors||ACTOR_TYPE||metadata  actors||ACTOR_TYPE||metadata          1632737080  27.09.2021 10:04  { "id": "8e7c8966-ac47-41bb-8f98-1e4b4633feb3", "actorRemindersMetadata": { "partitionCount": 50 } }
actors||ACTOR_TYPE            6c8f8b84-d4bd-405e-a349-57065e644d73  1632735901  27.09.2021 09:45  null
actors||ACTOR_TYPE            a7e27bd7-dc5b-45cc-8760-60d615d3b455  1632734569  27.09.2021 09:22  null
```

The screenshots below show the time frame in which the load test took place.
The load test started at 11:12:23am and created 2k actors, each registering 7 reminders.
2k reminders were supposed to execute at 11:45. After registering those reminders, all actors were deactivated before 11:45 and recreated with an execution time of 11:35.
The re-registration started at 11:19:47am.
actors||ACTOR_TYPE||metadata -> id == 8e7c8966-ac47-41bb-8f98-1e4b4633feb3 (screenshot)
actors||ACTOR_TYPE -> id == 8e7c8966-ac47-41bb-8f98-1e4b4633feb3 (screenshot)
actors||ACTOR_TYPE -> id == a7e27bd7-dc5b-45cc-8760-60d615d3b455 (screenshot)
actors||ACTOR_TYPE -> id == 6c8f8b84-d4bd-405e-a349-57065e644d73 (screenshot)

Steps to Reproduce the Problem

The problem was revealed only during our load test session; under normal load, the error did not occur.
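
For context, a CosmosDB actor state store of the kind discussed here is typically declared roughly like the sketch below. This is a minimal sketch only; the component name, URL, database, and collection are placeholders, not values taken from this issue:

```yaml
apiVersion: dapr.io/v1alpha1
kind: Component
metadata:
  name: statestore                     # placeholder component name
spec:
  type: state.azure.cosmosdb
  version: v1
  metadata:
    - name: url
      value: https://<account>.documents.azure.com:443/   # placeholder account URL
    - name: masterKey
      value: "<primary-key>"           # placeholder; use a secret store in practice
    - name: database
      value: dapr                      # placeholder database name
    - name: collection
      value: actors                    # placeholder collection name
    - name: actorStateStore
      value: "true"                    # marks this store as the actor state store
```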

Release Note

RELEASE NOTE:

@javageek79 javageek79 added the kind/bug Something isn't working label Sep 27, 2021
@artursouza
Member

This seems to be a hot partition problem in CosmosDB. I recommend going through this documentation first: https://docs.microsoft.com/en-us/azure/cosmos-db/partitioning-overview#physical-partitions

Then, see if your provisioned throughput is enough to sustain this load. The throughput is evenly split across partitions, so there might be a need to overprovision to compensate for a hot partition.

Another strategy to minimize the hot partition problem is to play with the partition count. For example, try using a prime number. In your scenario, the next prime number is 53.

@javageek79
Author

ok, thx. We will try the prime approach first.

@artursouza artursouza self-assigned this Sep 30, 2021
@artursouza
Member

The fact that you see key "actors||ACTOR_TYPE" with partitionKey "a7e27bd7-dc5b-45cc-8760-60d615d3b455" is odd, because the first is the old (unpartitioned) way data was saved while the second is the partitioned way - meaning that id was supposed to belong to a reminder partition, not to the global "old" record.

@artursouza
Member

I am looking into this issue.

@artursouza artursouza added this to the v1.5 milestone Sep 30, 2021
@artursouza artursouza added this to Backlog in Dapr Roadmap via automation Sep 30, 2021
@artursouza artursouza added P1 area/runtime/actor size/S 1 week of work triaged/resolved Indicates that this issue has been triaged labels Sep 30, 2021
@artursouza
Member

To clarify, any instance of id "actors||ACTOR_TYPE" should have a partitionKey of "actors||ACTOR_TYPE" as well.

@artursouza
Member

I reproduced a slightly different problem. The issue I found is that the "actors||ACTOR_TYPE" record is created even after the partitioning is set and none of the partition reminder records were used. I will find the root cause for this, which might lead to a fix for the issue you are seeing.

(screenshot)

@javageek79
Author

javageek79 commented Oct 4, 2021

Hi @artursouza, that sounds promising. It would be great to have a fix available soon.
The last load test ran with 1,009 partitions and caused ~75,000 entries in the DB, where only 1,010 were expected, due to this behavior ;-)

@artursouza
Member

@javageek79 I found the root cause for the behavior I was seeing before. Let me explain my test scenario:

test client ---> dapr sidecar A -----> dapr sidecar B ---> actor app

The issue I was facing happened because sidecar A was being invoked to register a reminder but was not configured with the Actor.TypeMetadata feature; only sidecar B had that configuration. Once I added this config to sidecar A as well, it worked as expected.

Can you confirm whether the same is going on in your test scenario? Please make sure that all sidecar instances have this preview feature enabled:

```yaml
spec:
  features:
    - name: Actor.TypeMetadata
      enabled: true
```
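
For completeness, the fragment above lives in a Dapr Configuration resource; a minimal sketch (the name featureconfig is just a placeholder, to be referenced from the sidecar annotation) would look roughly like this:

```yaml
apiVersion: dapr.io/v1alpha1
kind: Configuration
metadata:
  name: featureconfig                  # placeholder; referenced by the dapr.io/config annotation
spec:
  features:
    - name: Actor.TypeMetadata         # preview feature for partitioned actor reminders
      enabled: true
```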

@javageek79
Author

Hi @artursouza, the feature is enabled in a config.yml which is referenced only by sidecar B.
The connection between sidecar A and sidecar B is via Azure Service Bus, so they have no direct connection.
Does every Dapr-related service need to reference the config to make the feature work?
How can I verify that the feature was enabled in the sidecars?

@artursouza
Member

Do a `kubectl describe pod <pod>` to check the config being used in the `dapr.io/config` annotation. https://docs.dapr.io/operations/configuration/configuration-overview/
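
For reference, the annotation sits on the pod template of every Dapr-enabled deployment that should use the configuration; a minimal sketch (all names, ports, and the image are placeholders) looks roughly like this:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: actor-client                   # placeholder deployment name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: actor-client
  template:
    metadata:
      labels:
        app: actor-client
      annotations:
        dapr.io/enabled: "true"
        dapr.io/app-id: "actor-client"       # placeholder app id
        dapr.io/app-port: "8080"             # placeholder app port
        dapr.io/config: "featureconfig"      # must reference the Configuration with Actor.TypeMetadata
    spec:
      containers:
        - name: actor-client
          image: actor-client:latest         # placeholder image
```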

@javageek79
Author

In the one Service that uses the statestore or in all services that have sidecars?

@artursouza
Member

Start with the ones that use state store first.

@javageek79
Author

I've checked; only the ones using the statestore (1 service with 2-6 replicas) have the config annotation set.

@artursouza
Member

Moving this to 1.6. The action item is to remove the transaction logic and use batch writes instead, since a transaction is not really required. Also, we will change the partitionKey to be simply the key - which is a breaking change, so we will need to provide a script to migrate the data.

@artursouza artursouza modified the milestones: v1.5, v1.6 Nov 5, 2021
@artursouza
Member

artursouza commented Jan 7, 2022

We have changed how reminder partitions are migrated in 1.6: we use batch writes instead of a transaction, migrate only at startup (not on registering/unregistering reminders), and block multiple migrations running in parallel from the same process. This should reduce contention in reminder persistence.

In addition, we also fixed a bug that allowed one of multiple state stores configured for actors to be selected at random instead of failing the sidecar right away.

We also found an issue in how first-write-wins was applied when a record was written for the first time and the eTag was still nil: it ended up falling back to last-write-wins. This was also fixed.

Please observe this in the 1.6 release and confirm whether the issue continues.

@artursouza
Member

I am closing this issue as part of the release activities since we made changes to address this problem. Please, reopen if it can still be reproduced after 1.6 release.

@artursouza artursouza moved this from Backlog to Done in Dapr Roadmap Feb 1, 2022
@artursouza artursouza moved this from Done to Released in Dapr Roadmap Feb 16, 2022