feat(sharding): Additional legacy shard algorithm that provides consistent shard on duplicate server URLs (#18118) #18713

Open
wants to merge 12 commits into
base: master

@Ezzahhh Ezzahhh commented Jun 18, 2024

Checklist:

  • Either (a) I've created an enhancement proposal and discussed it with the community, (b) this is a bug fix, or (c) this does not need to be in the release notes.
  • The title of the PR states what changed and the related issues number (used for the release note).
  • The title of the PR conforms to the Toolchain Guide
  • I've included "Closes [ISSUE #]" or "Fixes [ISSUE #]" in the description to automatically close the associated issue.
  • I've updated both the CLI and UI to expose my feature, or I plan to submit a second PR with them.
  • Does this PR require documentation updates?
  • I've updated documentation as required by this PR.
  • I have signed off all my commits as required by DCO
  • I have written unit and/or e2e tests for my change. PRs without these are unlikely to be merged.
  • My build is green (troubleshooting builds).
  • My new feature complies with the feature status guidelines.
  • I have added a brief description of why this PR is necessary and/or what this PR solves.
  • Optional. My organization is added to USERS.md.
  • Optional. For bug fixes, I've indicated what older releases this fix should be cherry-picked into (this may or may not happen depending on risk/complexity).

Closes #18118

There is an edge case in which cluster secrets are duplicated by server URL; that is, multiple cluster secrets point to the same physical cluster. This makes it easy to target tenants from an ApplicationSet (multiple deployments of the same Helm chart into different namespaces) within the same physical cluster.

Unfortunately, this causes problems when there is more than one controller replica, because the sharding algorithms assume that the cluster ID uniquely identifies each physical cluster. As a result, the same physical cluster can be sharded to more than one controller, causing sync issues and race conditions (e.g. Helm hooks breaking as two controllers race against each other).
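To make the failure mode concrete, here is a minimal sketch of how one might detect it. It is not code from this PR: `clusterSecret` and `splitServers` are hypothetical names, and the shard assignment is passed in as a function so that any ID-based algorithm can be plugged in.

```go
package main

import (
	"fmt"
	"sort"
)

// clusterSecret is a hypothetical, simplified stand-in for an Argo CD cluster secret.
type clusterSecret struct {
	ID     string // unique per secret
	Server string // API server URL; shared by secrets that target the same physical cluster
}

// splitServers returns, in sorted order, every server URL whose secrets are
// assigned to more than one shard by shardOf.
func splitServers(secrets []clusterSecret, shardOf func(clusterSecret) int) []string {
	shardsByServer := make(map[string]map[int]struct{})
	for _, s := range secrets {
		if shardsByServer[s.Server] == nil {
			shardsByServer[s.Server] = make(map[int]struct{})
		}
		shardsByServer[s.Server][shardOf(s)] = struct{}{}
	}
	var split []string
	for server, shards := range shardsByServer {
		if len(shards) > 1 {
			split = append(split, server)
		}
	}
	sort.Strings(split)
	return split
}

func main() {
	secrets := []clusterSecret{
		{ID: "tenant-a", Server: "https://prod.example.com"},
		{ID: "tenant-b", Server: "https://prod.example.com"},
		{ID: "staging", Server: "https://staging.example.com"},
	}
	// Stand-in for an ID-based assignment that happens to separate the two tenants.
	byID := map[string]int{"tenant-a": 0, "tenant-b": 1, "staging": 1}
	fmt.Println(splitServers(secrets, func(s clusterSecret) int { return byID[s.ID] }))
	// prints [https://prod.example.com]
}
```

Any server URL reported by such a check is a cluster that two controllers would reconcile at the same time.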

This PR adds a sharding algorithm that is a modification of the existing legacy distribution algorithm. It shards by the cluster's server URL instead of its ID, so every cluster secret with the same server URL consistently resolves to the same shard, which avoids the issue described above. The downside is that sharding on the server URL may distribute clusters less evenly than sharding on the ID.
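The idea can be sketched roughly as follows. This is an illustration rather than the PR's actual implementation: `shardByServer` is a hypothetical helper, and the FNV-1a hash is taken from the snippet quoted in the review comments.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"math"
)

// shardByServer is a hypothetical helper that maps a server URL to a shard in
// [0, replicas). It returns -1 when the replica count is unusable.
func shardByServer(server string, replicas int) int {
	if replicas <= 0 || replicas > math.MaxInt32 {
		return -1 // no valid controller replica to assign to
	}
	if server == "" {
		return 0 // nothing to hash; fall back to the first shard
	}
	h := fnv.New32a()
	_, _ = h.Write([]byte(server))
	// The bounds check above makes the conversion to uint32 lossless.
	return int(h.Sum32() % uint32(replicas))
}

func main() {
	const replicas = 3
	// Two secrets that point at the same physical cluster hash the same input,
	// so they always land on the same shard.
	for _, name := range []string{"tenant-a", "tenant-b"} {
		fmt.Printf("%s -> shard %d\n", name, shardByServer("https://prod.example.com", replicas))
	}
}
```

Because the hash input no longer varies per secret, the number of distinct inputs drops to the number of physical clusters, which is why the distribution may be less even.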

Please let me know if there is a better or preferred solution to this issue; this approach seemed the least likely to cause problems for existing users because it is opt-in. Our current workaround in production is to specify the shard explicitly for all cluster secrets that share a cluster, but this could become difficult to maintain and keep consistent over time.
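To illustrate the maintenance burden of that workaround, the following sketch checks that manually set shards agree across secrets sharing a server URL. The `pinnedSecret` type and `pinProblems` function are hypothetical and exist only for this example.

```go
package main

import "fmt"

// pinnedSecret is a hypothetical, simplified view of a cluster secret.
type pinnedSecret struct {
	Name   string
	Server string
	Shard  *int // nil means no shard was set manually
}

// pinProblems reports, in input order, every secret on a shared server URL
// that either has no manual shard or disagrees with the first shard seen for
// that server.
func pinProblems(secrets []pinnedSecret) []string {
	perServer := make(map[string]int)
	for _, s := range secrets {
		perServer[s.Server]++
	}
	firstPin := make(map[string]int)
	var problems []string
	for _, s := range secrets {
		if perServer[s.Server] < 2 {
			continue // server is not shared, so no manual shard is needed
		}
		if s.Shard == nil {
			problems = append(problems, fmt.Sprintf("%s: no shard set for shared server %s", s.Name, s.Server))
			continue
		}
		want, seen := firstPin[s.Server]
		if !seen {
			firstPin[s.Server] = *s.Shard
			continue
		}
		if *s.Shard != want {
			problems = append(problems, fmt.Sprintf("%s: shard %d differs from shard %d used for %s", s.Name, *s.Shard, want, s.Server))
		}
	}
	return problems
}

func main() {
	zero, one := 0, 1
	secrets := []pinnedSecret{
		{Name: "tenant-a", Server: "https://prod.example.com", Shard: &zero},
		{Name: "tenant-b", Server: "https://prod.example.com", Shard: &one},
		{Name: "tenant-c", Server: "https://prod.example.com"},
		{Name: "staging", Server: "https://staging.example.com"},
	}
	for _, p := range pinProblems(secrets) {
		fmt.Println(p)
	}
}
```

A check like this would have to run whenever a shared cluster secret is added or changed, which is the overhead the opt-in algorithm is meant to remove.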

…icate server URLs

Signed-off-by: Ezzahhh <38420555+Ezzahhh@users.noreply.github.com>

bunnyshell bot commented Jun 18, 2024

✅ Preview Environment deployed on Bunnyshell

Component Endpoints
argocd https://argocd-hn2ugq.bunnyenv.com/
argocd-ttyd https://argocd-web-cli-hn2ugq.bunnyenv.com/

See: Environment Details | Pipeline Logs

Available commands (reply to this comment):

  • 🔴 /bns:stop to stop the environment
  • 🚀 /bns:deploy to redeploy the environment
  • /bns:delete to remove the environment

} else {
    h := fnv.New32a()
    _, _ = h.Write([]byte(server))
    shard := int32(h.Sum32() % uint32(replicas))

Check failure

Code scanning / CodeQL

Incorrect conversion between integer types (High)

Incorrect conversion of an integer with architecture-dependent bit size from strconv.ParseInt to a lower bit size type uint32 without an upper bound check.
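One common way to address this class of alert is to bound the value before converting it. The sketch below assumes the replica count arrives as a string; `parseReplicas` is a hypothetical helper, not a function from this PR or from Argo CD.

```go
package main

import (
	"fmt"
	"strconv"
)

// parseReplicas is a hypothetical helper that parses a replica count and
// guarantees the result fits in a uint32 before converting.
func parseReplicas(s string) (uint32, error) {
	// bitSize 32 makes ParseInt reject anything outside the int32 range,
	// so the later conversion cannot truncate.
	n, err := strconv.ParseInt(s, 10, 32)
	if err != nil {
		return 0, fmt.Errorf("invalid replica count %q: %w", s, err)
	}
	if n <= 0 {
		return 0, fmt.Errorf("replica count must be positive, got %d", n)
	}
	return uint32(n), nil
}

func main() {
	for _, in := range []string{"3", "4294967297", "-1"} {
		n, err := parseReplicas(in)
		fmt.Println(in, n, err)
	}
}
```

With the count validated up front, an expression such as `h.Sum32() % replicas` needs no further conversion.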
Signed-off-by: Ezzahhh <38420555+Ezzahhh@users.noreply.github.com>
@Ezzahhh Ezzahhh marked this pull request as ready for review June 18, 2024 11:32
@Ezzahhh Ezzahhh requested a review from a team as a code owner June 18, 2024 11:32

codecov bot commented Jun 18, 2024

Codecov Report

Attention: Patch coverage is 54.54545% with 10 lines in your changes missing coverage. Please review.

Project coverage is 50.66%. Comparing base (64b76f2) to head (2fe7ca4).

| Files | Patch % | Lines |
|---|---|---|
| controller/sharding/sharding.go | 54.54% | 6 Missing and 4 partials ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##           master   #18713      +/-   ##
==========================================
+ Coverage   50.64%   50.66%   +0.01%     
==========================================
  Files         315      315              
  Lines       43359    43381      +22     
==========================================
+ Hits        21960    21978      +18     
- Misses      18893    18895       +2     
- Partials     2506     2508       +2     



Ezzahhh commented Jul 20, 2024

Sorry for the ping, @ishitasequeira; I'm not sure who to ask about this PR, and I'm happy to take any critique or guidance. Does this look like an acceptable addition to address this edge case? If not, is there a suggested approach other than the workaround described above (setting the shard manually on each shared cluster secret)?

Successfully merging this pull request may close these issues.

Cluster secrets with identical server URL should resolve to the same shard