Limit the maximum number of candidate pairs #605

hardbyte · 2021-02-16T03:40:32Z

Adds two global settings to protect the service from running out of memory due to excessive numbers of candidate pairs being processed.
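As a rough illustration of what two such global settings could look like, here is a minimal sketch that reads them from environment variables. The setting names are taken from the diff excerpts in this PR. The helper `read_limit`, the default values, and the use of environment variables are assumptions made for the example, not the project's actual configuration code.

```python
import os
from typing import Mapping, Optional

# Illustrative defaults only; the project's real defaults are not shown in this PR.
_DEFAULT_SIMILARITY_LIMIT = 500_000_000
_DEFAULT_SOLVER_LIMIT = 100_000_000


def read_limit(name: str, default: int,
               env: Optional[Mapping[str, str]] = None) -> int:
    """Read a positive integer limit from the environment (hypothetical helper)."""
    env = os.environ if env is None else env
    raw = env.get(name)
    if raw is None or raw.strip() == "":
        # Setting not provided: fall back to the default.
        return default
    value = int(raw)
    if value <= 0:
        raise ValueError(f"{name} must be a positive integer, got {value}")
    return value


class Config:
    # Upper bound on candidate pairs accumulated while computing similarity scores.
    SIMILARITY_SCORES_MAX_CANDIDATE_PAIRS = read_limit(
        "SIMILARITY_SCORES_MAX_CANDIDATE_PAIRS", _DEFAULT_SIMILARITY_LIMIT)
    # Upper bound on candidate pairs handed to the solver.
    SOLVER_MAX_CANDIDATE_PAIRS = read_limit(
        "SOLVER_MAX_CANDIDATE_PAIRS", _DEFAULT_SOLVER_LIMIT)
```

Validating the values once at import time means a misconfigured limit fails at startup rather than in the middle of a run.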

Closes #595

wilko77

It would be nicer to not even start a run if it is likely to exceed the resources. However, I don't have a good idea on how to estimate the number of candidate pairs above the threshold...

What I am a bit concerned about with your solution is that the server quietly fails. What's gonna be the first thing a user will do when he sees that his run errored? Try again.
Would it be possible to have some sort of state-info field in the run table that we can include in the output of the run status endpoint?

backend/entityservice/cache/progress.py

wilko77 · 2021-02-16T23:04:20Z

backend/entityservice/tasks/comparing.py

+        if global_candidates_for_run is not None and global_candidates_for_run > Config.SIMILARITY_SCORES_MAX_CANDIDATE_PAIRS:
+            log.warning(f"This run has created more than the global limit of candidate pairs. Setting state to 'error'")
+            with DBConn() as conn:
+                update_run_mark_failure(conn, run_id)


So the user will never know why the run failed?
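One way to make the reason for the failure available is to have the limit check produce a message rather than only a boolean. The sketch below is not code from this PR: `candidate_limit_error` is a hypothetical helper that mirrors the `is not None` and `>` comparison in the diff above and returns a human-readable reason when the limit is exceeded.

```python
from typing import Optional


def candidate_limit_error(candidate_pairs: Optional[int], limit: int) -> Optional[str]:
    """Return a reason string if the limit is exceeded, otherwise None.

    `candidate_pairs` is None when no count has been cached yet; that case
    is treated as "nothing to check" rather than as a failure.
    """
    if candidate_pairs is None or candidate_pairs <= limit:
        return None
    return (f"Run produced {candidate_pairs} candidate pairs, "
            f"exceeding the global limit of {limit}.")
```

The returned string could be logged as it is today and, if the run table later gains a field for it, stored alongside the failed state.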

wilko77 · 2021-02-16T23:05:46Z

backend/entityservice/tasks/solver.py

+        if len(candidate_pairs[0]) > config.SOLVER_MAX_CANDIDATE_PAIRS:
+            log.warning(f"Attempting to solve with more than the global limit of candidate pairs.")
+            with DBConn() as conn:
+                update_run_mark_failure(conn, run_id)


Same here. Does the run table have the ability to store an error message, and could that be passed on to the user?

hardbyte · 2021-02-17T04:09:02Z

> It would be nicer to not even start a run if it is likely to exceed the resources. However, I don't have a good idea on how to estimate the number of candidate pairs above the threshold...

Yeah, that would be nice. A very simple additional protection that can be pre-computed is a limit on the number of comparisons.
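A minimal sketch of such a pre-computed check, assuming no blocking, so that every record in one dataset is compared with every record in each other dataset. The function names and the `max_comparisons` parameter are hypothetical. Since a candidate pair is a comparison whose score passes the threshold, the comparison count is an upper bound on the candidate count, not an estimate of it.

```python
from itertools import combinations
from typing import Sequence


def total_comparisons(dataset_sizes: Sequence[int]) -> int:
    """Number of pairwise comparisons across all pairs of datasets (no blocking)."""
    return sum(a * b for a, b in combinations(dataset_sizes, 2))


def run_is_too_large(dataset_sizes: Sequence[int], max_comparisons: int) -> bool:
    """True if the run would exceed the (hypothetical) comparison limit."""
    return total_comparisons(dataset_sizes) > max_comparisons


# Two parties with 1,000 and 2,000 records need 2,000,000 comparisons.
print(total_comparisons([1000, 2000]))
print(run_is_too_large([1000, 2000], max_comparisons=1_000_000))
```

Because it depends only on dataset sizes, this check could run before any comparison task is scheduled.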

> What I am a bit concerned about with your solution is that the server quietly fails. What's gonna be the first thing a user will do when he sees that his run errored? Try again. Would it be possible to have some sort of state-info field in the run table that we can include in the output of the run status endpoint?

It is certainly possible to add a new column to store error details in the run table - something like this in the alembic model:

err_msg = Column(String, nullable=True)

IMO that should be tackled separately though.
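To make the idea concrete, here is a self-contained sketch of how a nullable error column could flow through to a status response. It uses the standard library's `sqlite3` purely so that it runs without extra dependencies; the service itself uses alembic migrations against its own database. The table layout, `mark_failure`, `run_status`, and the `message` key are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, heavily simplified stand-in for the service's run table.
conn.execute("CREATE TABLE runs (run_id TEXT PRIMARY KEY, state TEXT NOT NULL)")
conn.execute("INSERT INTO runs VALUES ('run-1', 'running')")

# The schema change: a nullable text column, mirroring err_msg above.
conn.execute("ALTER TABLE runs ADD COLUMN err_msg TEXT")


def mark_failure(conn, run_id, message=None):
    """Set the run's state to 'error' and optionally record why."""
    conn.execute(
        "UPDATE runs SET state = 'error', err_msg = ? WHERE run_id = ?",
        (message, run_id),
    )


def run_status(conn, run_id):
    """Build a status payload, including the message only when one was stored."""
    state, err_msg = conn.execute(
        "SELECT state, err_msg FROM runs WHERE run_id = ?", (run_id,)
    ).fetchone()
    status = {"state": state}
    if err_msg is not None:
        status["message"] = err_msg
    return status


mark_failure(conn, "run-1", "Exceeded the global limit of candidate pairs.")
print(run_status(conn, "run-1"))
```

Keeping the column nullable means existing rows and runs that fail without a message need no special handling.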

hardbyte added 4 commits February 16, 2021 14:46

Cache the number of identified candidates along with the number of comparisons carried out.

66da30b

Update cache test

bf6ad5b

Add global limits on number of edges

a915557

Handle the case where there are no cached edges

30e3a3d

hardbyte requested a review from wilko77 February 16, 2021 03:40

wilko77 reviewed Feb 16, 2021

Rename argument to candidate_pairs in save_current_progress

b5d0363

hardbyte mentioned this pull request Feb 23, 2021

Store and expose error message for a run #617

Closed

hardbyte requested a review from wilko77 February 23, 2021 00:30

wilko77 approved these changes Feb 23, 2021

Merge branch 'develop' into feature-max-number-of-matches

defea8e

hardbyte merged commit 9502f9a into develop Feb 23, 2021

hardbyte deleted the feature-max-number-of-matches branch February 23, 2021 01:43

hardbyte added this to the Entity Service v1.14 milestone Feb 23, 2021
