I propose adding a feature where the service has a configured limit on the maximum number of edges. This would prevent the server from dying a slow and horrible death, swapping itself to a halt.
Probably makes sense to have a globally set limit or limits.
As the solver runs fully in memory, the deployed high-memory worker's memory limit imposes an upper bound.
Say the max memory were 16 GiB, and estimating each candidate pair at 40 bytes (an 8-byte score plus 4 × 8-byte indices into the datasets and records):

(16 GiB) / (40 bytes) ≈ 429,496,730
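As a sanity check, here is that arithmetic as a minimal sketch (the 16 GiB worker and 40-byte pair size are the assumptions above, not measured values):

```python
# Back-of-envelope bound on candidate pairs implied by worker memory.
# Assumed figures from the estimate above: 16 GiB worker, 40 bytes/pair.
MAX_MEMORY_BYTES = 16 * 1024**3      # 16 GiB high-memory worker
BYTES_PER_PAIR = 8 + 4 * 8           # 8 B score + four 8 B indices

max_pairs = MAX_MEMORY_BYTES // BYTES_PER_PAIR
print(f"{max_pairs:,}")              # 429,496,729 -> roughly 430M pairs
```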
In the case where a user is downloading raw similarity scores instead of using the solver, the limit could be much higher, but it still makes sense to configure the system to avoid producing arbitrarily large outputs.
I think it could also be useful to allow overriding this limit per run, as long as the per-run value stays below the global limit.
How about these names and defaults?
SOLVER_MAX_CANDIDATE_PAIRS = 100M
SIMILARITY_SCORES_MAX_CANDIDATE_PAIRS = 500M (could instead limit the file size in GiB)
The backend task would discard the data and cause the run to fail if these limits are exceeded.
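For concreteness, a minimal sketch of how the backend task might enforce this, assuming the setting names proposed above; `check_pair_limit` and `RunLimitExceeded` are hypothetical names for illustration, not part of the existing service:

```python
from typing import Optional

# Proposed global defaults (assumed values from this issue).
SOLVER_MAX_CANDIDATE_PAIRS = 100_000_000
SIMILARITY_SCORES_MAX_CANDIDATE_PAIRS = 500_000_000


class RunLimitExceeded(Exception):
    """Raised when a run produces more candidate pairs than allowed."""


def check_pair_limit(num_pairs: int, run_type: str,
                     per_run_limit: Optional[int] = None) -> None:
    """Fail the run so the caller can discard the data if over the limit."""
    global_limit = (SOLVER_MAX_CANDIDATE_PAIRS if run_type == "solver"
                    else SIMILARITY_SCORES_MAX_CANDIDATE_PAIRS)
    # A per-run override may only tighten the global limit, never raise it.
    limit = min(per_run_limit, global_limit) if per_run_limit else global_limit
    if num_pairs > limit:
        raise RunLimitExceeded(
            f"{num_pairs:,} candidate pairs exceeds the {run_type} "
            f"limit of {limit:,}")
```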