I am currently running an ArangoDB cluster with the following specifications:
ArangoDB Version: 3.11.2
Deployment Mode: Cluster
Deployment Strategy: Kubernetes
Configuration: rook/ceph PVC provider
Infrastructure: own/kubeadm
Operating System: Ubuntu 20.04
Total RAM in your machine: 32 GB
Disks in use: SSD
Used Package: Docker - official Docker library
I have created a collection with a custom shardKey and expected data to be distributed among the cluster nodes. However, I noticed that all data is being inserted into the same node instead of being evenly distributed.
Reproduction Steps:
Create ArangoDB cluster with the specified settings.
Create a collection with a custom shardKey.
Insert data into the collection.
Expected Behavior:
I expect data to be evenly distributed among the cluster nodes based on the custom shardKey.
Observed Behavior:
All data is being inserted into the same node, despite the use of the custom shardKey.
Note that currently all the elements have the same shardKey value.
When I insert a new element with a different shardKey value, it starts being inserted on another leader, which does not seem correct to me.
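For context, ArangoDB's default sharding hashes the shard-key value to pick a shard, so every document carrying the same value lands on the same shard (and therefore the same leader), while a different value can hash to a different shard. Both observations above are consistent with that. A minimal pure-Python sketch of the idea (this is not ArangoDB's actual hash function; the shard count and key values are made up):

```python
# Illustrative hash sharding: documents with the same shard-key value
# always map to the same shard; a new value may map to a different one.
import hashlib

NUM_SHARDS = 3  # hypothetical numberOfShards

def shard_for(shard_key_value: str) -> int:
    """Deterministically map a shard-key value to a shard index."""
    digest = hashlib.md5(shard_key_value.encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

# 1,000 documents sharing one shard-key value all land on the same shard.
shards = {shard_for("tenant-A") for _ in range(1000)}
print(shards)  # a set containing a single shard index

# A document with a new shard-key value may (or may not) land elsewhere.
print(shard_for("tenant-B"))
```

The mapping is deterministic by design: routing a document only requires hashing its shard-key value, never a lookup of where earlier documents went.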
Additional Information:
I have checked the load balancing configuration and sharding settings, but I have not been able to resolve the issue. I would like to request assistance in understanding why data is not being properly distributed in my cluster environment, or whether there is something I have to configure in order to get the correct behavior.
Thank you for your attention and assistance.
Best regards,
Gianluca Valentini
hm, the distribution happens by the shard key you specified.
If all documents have the same value in that attribute, all documents will end up in the same shard, exactly as you specified it.
Hi @dothebart
Thank you for your response.
I understand that the distribution is based on the specified shard key. However, I have observed that even with a custom shard key, the data is not evenly distributed among the nodes. For example, if one shard key value groups 1,000,000 documents and another only 100, the data is not distributed equally among the nodes. Is this behavior to be expected? I appreciate your clarification on this matter.
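Hash sharding balances key *values*, not document counts: each distinct value maps to exactly one shard regardless of how many documents carry it, so a heavily repeated value produces a hot shard. A rough sketch of the resulting skew (hypothetical counts and hash function, not ArangoDB internals):

```python
# Skew demo: shard placement depends only on the shard-key value,
# so 1,000,000 documents under one value all pile onto one shard.
import hashlib
from collections import Counter

NUM_SHARDS = 3  # hypothetical numberOfShards

def shard_for(value: str) -> int:
    return int(hashlib.md5(value.encode()).hexdigest(), 16) % NUM_SHARDS

# One shard-key value carried by 1,000,000 docs, another by only 100.
doc_counts = {"key-hot": 1_000_000, "key-cold": 100}

per_shard = Counter()
for value, n in doc_counts.items():
    per_shard[shard_for(value)] += n  # all n docs land on a single shard

print(dict(per_shard))  # at least one shard holds >= 1,000,000 docs
```

Even distribution therefore requires a shard key with many distinct, roughly equally frequent values (a high-cardinality attribute), not just any custom attribute.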
Hi @dothebart
If I use _key as the shard key, which is the default, Arango requires it to be present in a unique index. That's fine.
However, if the document also requires another unique field, the database gives me an error: Error: 1470 - shard key '_key' must be present in unique index.
So, if I understand correctly, the shard key must be the only unique field in the document, is this correct?
In my scenario, if I shard accounts by _key, I also need userid to be unique, but that does not seem possible.
What am I missing?
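In an ArangoDB cluster, uniqueness is enforced per shard, so a cluster-wide unique index can only be guaranteed if its fields include the shard key(s); that is what error 1470 reflects. The rule can be sketched as follows (a simplified formulation of my own, not actual server code):

```python
def unique_index_allowed(shard_keys: list[str], index_fields: list[str]) -> bool:
    """A cluster-wide unique index must cover every shard key; otherwise
    two documents with equal index values could live on different shards,
    where per-shard uniqueness checks would never see the duplicate."""
    return all(k in index_fields for k in shard_keys)

# Collection sharded by _key (the default): a unique index on userid
# alone is rejected (cf. error 1470), because two documents with the
# same userid could end up on different shards.
print(unique_index_allowed(["_key"], ["userid"]))    # False

# Sharding by userid instead makes a unique index on userid possible:
# all documents with the same userid land on the same shard.
print(unique_index_allowed(["userid"], ["userid"]))  # True
```

So the practical way out for the accounts scenario would be to shard the collection by userid itself (or to include the shard key in the unique index's fields), rather than sharding by _key and asking for a separate unique index on userid.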
[screenshot: shard distribution for the collection, showing all data on the same node]