Limit maximum number of edges #595

Closed
hardbyte opened this issue Dec 18, 2020 · 0 comments · Fixed by #605
hardbyte (Collaborator) commented Dec 18, 2020

I propose adding a feature where the service has a configured limit on the maximum number of edges. This would prevent the server from dying a slow and horrible death as it swaps itself to a halt.

It probably makes sense to have a globally set limit or limits.

As the solver runs fully in memory, the deployed high-memory worker's memory limit imposes an upper bound.
Say the max memory was 16 GiB, and estimating that each candidate pair takes 40 bytes (an 8 byte score plus four 8 byte indices into the datasets and records):

(16 GiB) / (40 bytes) ≈ 429 496 730 candidate pairs
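
For reference, a quick back-of-the-envelope check (the helper name here is mine, not project code):

```python
# Each candidate pair: 8 byte score plus four 8 byte indices, per the
# estimate above.
BYTES_PER_CANDIDATE_PAIR = 8 + 4 * 8  # = 40

def max_candidate_pairs(memory_budget_bytes: int) -> int:
    """Upper bound on candidate pairs that fit in the given memory budget."""
    return memory_budget_bytes // BYTES_PER_CANDIDATE_PAIR

print(max_candidate_pairs(16 * 1024**3))  # -> 429496729 (integer floor)
```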

In the case where a user is attempting to download raw similarity scores instead of using the solver, the limit could be much higher, but it still makes sense to let the system be configured to avoid creating arbitrarily large outputs.

I think it could also be useful to allow this limit to be overridden per run, to a value lower than the global limit.

How about these names and defaults?

  • SOLVER_MAX_CANDIDATE_PAIRS = 100M
  • SIMILARITY_SCORES_MAX_CANDIDATE_PAIRS = 500M (could instead limit the file size in GiB)

The backend task would discard the data and fail the run if these limits are exceeded, roughly as sketched below.
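
A minimal sketch of how the enforcement could look. The environment-variable names match the proposal above; the helper, the run-type handling, and the per-run override plumbing are all hypothetical:

```python
import os
from typing import Optional

# Proposed settings with the suggested defaults.
SOLVER_MAX_CANDIDATE_PAIRS = int(
    os.getenv("SOLVER_MAX_CANDIDATE_PAIRS", 100_000_000)
)
SIMILARITY_SCORES_MAX_CANDIDATE_PAIRS = int(
    os.getenv("SIMILARITY_SCORES_MAX_CANDIDATE_PAIRS", 500_000_000)
)

def check_candidate_pair_limit(
    num_pairs: int, run_type: str, run_limit: Optional[int] = None
) -> None:
    """Fail fast if a run produced more candidate pairs than allowed.

    `run_limit` is the optional per-run override; it can only tighten the
    global limit, never relax it.
    """
    global_limit = (
        SOLVER_MAX_CANDIDATE_PAIRS
        if run_type == "solver"
        else SIMILARITY_SCORES_MAX_CANDIDATE_PAIRS
    )
    effective = min(global_limit, run_limit) if run_limit is not None else global_limit
    if num_pairs > effective:
        # The backend task would discard the data and mark the run failed here.
        raise RuntimeError(
            f"Run produced {num_pairs} candidate pairs, "
            f"exceeding the limit of {effective}"
        )
```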
