Limit the maximum number of candidate pairs #605

Merged: 6 commits, Feb 23, 2021

hardbyte

Adds two global settings to protect the service from running out of memory due to excessive numbers of candidate pairs being processed.
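
A minimal sketch of the kind of guard these two settings imply. Only the setting names `SIMILARITY_SCORES_MAX_CANDIDATE_PAIRS` and `SOLVER_MAX_CANDIDATE_PAIRS` appear in this PR; the `Limits` class, the `exceeds_limit` helper, and the default values below are assumptions made for illustration, not the service's actual code:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Limits:
    # Hypothetical defaults; the real values are deployment configuration.
    similarity_scores_max_candidate_pairs: int = 500_000_000
    solver_max_candidate_pairs: int = 100_000_000


def exceeds_limit(candidate_count: Optional[int], limit: int) -> bool:
    """Return True when a known candidate count is strictly above the limit.

    A count of None (nothing cached yet) is treated as within limits,
    mirroring the `is not None` check in the reviewed diff.
    """
    return candidate_count is not None and candidate_count > limit


limits = Limits()
# A run is only flagged once its count is strictly greater than the limit.
run_should_fail = exceeds_limit(600_000_000, limits.similarity_scores_max_candidate_pairs)
```

Treating a missing count as "within limits" keeps a run with no cached edges from being failed by the check itself.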

Closes #595

wilko77 left a comment

It would be nicer to not even start a run if it is likely to exceed the resources. However, I don't have a good idea on how to estimate the number of candidate pairs above the threshold...

What I am a bit concerned about with your solution is that the server quietly fails. What's gonna be the first thing a user will do when he sees that his run errored? Try again.
Would it be possible to have some sort of state-info field in the run table that we can include in the output of the run status endpoint?

backend/entityservice/cache/progress.py
if global_candidates_for_run is not None and global_candidates_for_run > Config.SIMILARITY_SCORES_MAX_CANDIDATE_PAIRS:
    log.warning(f"This run has created more than the global limit of candidate pairs. Setting state to 'error'")
    with DBConn() as conn:
        update_run_mark_failure(conn, run_id)

So the user will never know why the run failed?

if len(candidate_pairs[0]) > config.SOLVER_MAX_CANDIDATE_PAIRS:
    log.warning(f"Attempting to solve with more than the global limit of candidate pairs.")
    with DBConn() as conn:
        update_run_mark_failure(conn, run_id)

Same here. Does the run table have the ability to store an error message, and could that be passed on to the user?

hardbyte (author)

> It would be nicer to not even start a run if it is likely to exceed the resources. However, I don't have a good idea on how to estimate the number of candidate pairs above the threshold...

Yeah, that would be nice. A very simple additional protection that can be pre-computed would be a limit on the number of comparisons.
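
As a rough sketch of that pre-computed protection: without blocking, every record in one dataset is compared against every record in each other dataset, so the dataset sizes alone give an upper bound on the number of comparisons. The function names and the `max_allowed` parameter are hypothetical, and blocking would reduce the real count below this bound:

```python
from itertools import combinations
from typing import Sequence


def max_comparisons(dataset_sizes: Sequence[int]) -> int:
    """Upper bound on pairwise comparisons across all pairs of datasets."""
    return sum(a * b for a, b in combinations(dataset_sizes, 2))


def should_reject_run(dataset_sizes: Sequence[int], max_allowed: int) -> bool:
    """Decide before any work starts whether a run is too large to attempt."""
    return max_comparisons(dataset_sizes) > max_allowed


# Two parties with 1,000 and 2,000 records: at most 2,000,000 comparisons.
two_party_bound = max_comparisons([1_000, 2_000])
```

This bounds comparisons rather than candidate pairs above the threshold, so it complements the post-hoc candidate limits instead of replacing them.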

> What I am a bit concerned about with your solution is that the server quietly fails. What's gonna be the first thing a user will do when he sees that his run errored? Try again. Would it be possible to have some sort of state-info field in the run table that we can include in the output of the run status endpoint?

It is certainly possible to add a new column to store error details in the run table - something like this in the alembic model:

err_msg = Column(String, nullable=True)

IMO that should be tackled separately, though.
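
For whoever picks up that follow-up, here is a minimal sketch of how a nullable `err_msg` column could flow through to a status response. It uses the standard library's `sqlite3` as a stand-in for the service's database, and the `runs` table layout, the helper names, and the `message` key are all assumptions for illustration; only `err_msg` and the `'error'` state come from this thread:

```python
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    # Hypothetical, simplified run table with a nullable error message.
    conn.execute(
        "CREATE TABLE runs (run_id TEXT PRIMARY KEY, state TEXT NOT NULL, err_msg TEXT)"
    )


def mark_run_failed(conn: sqlite3.Connection, run_id: str, err_msg: str) -> None:
    """Set the run state to 'error' and record why it failed."""
    conn.execute(
        "UPDATE runs SET state = 'error', err_msg = ? WHERE run_id = ?",
        (err_msg, run_id),
    )


def run_status(conn: sqlite3.Connection, run_id: str) -> dict:
    """Build a status payload, including the message only when one is stored."""
    row = conn.execute(
        "SELECT state, err_msg FROM runs WHERE run_id = ?", (run_id,)
    ).fetchone()
    if row is None:
        raise KeyError(run_id)
    state, err_msg = row
    status = {"state": state}
    if err_msg is not None:
        status["message"] = err_msg
    return status


conn = sqlite3.connect(":memory:")
create_schema(conn)
conn.execute("INSERT INTO runs (run_id, state) VALUES ('run-1', 'running')")
mark_run_failed(conn, "run-1", "Candidate pair limit exceeded")
```

With the message surfaced in the status output, a user who sees an errored run would learn that retrying with the same parameters is unlikely to help.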

hardbyte merged commit 9502f9a into develop on Feb 23, 2021
hardbyte deleted the feature-max-number-of-matches branch on February 23, 2021 01:43
hardbyte added this to the Entity Service v1.14 milestone on Feb 23, 2021
wilko77 added a commit that referenced this pull request on Feb 24, 2021:
* catching NoSuchBucket exception at cleanup (#576)

* fix NoSuchBucket error (#577)

* use the same helper function as the other tests

* more logging on server side

* we need read access to check if bucket exists

* creating bucket if it does not exist

* Bump marshmallow from 3.6.0 to 3.6.1 in /base

Bumps [marshmallow](https://github.com/marshmallow-code/marshmallow) from 3.6.0 to 3.6.1.
- [Release notes](https://github.com/marshmallow-code/marshmallow/releases)
- [Changelog](https://github.com/marshmallow-code/marshmallow/blob/dev/CHANGELOG.rst)
- [Commits](marshmallow-code/marshmallow@3.6.0...3.6.1)

Signed-off-by: dependabot-preview[bot] <support@dependabot.com>

* Bump pytest from 5.3.5 to 5.4.3 in /base

Bumps [pytest](https://github.com/pytest-dev/pytest) from 5.3.5 to 5.4.3.
- [Release notes](https://github.com/pytest-dev/pytest/releases)
- [Changelog](https://github.com/pytest-dev/pytest/blob/master/CHANGELOG.rst)
- [Commits](pytest-dev/pytest@5.3.5...5.4.3)

Signed-off-by: dependabot-preview[bot] <support@dependabot.com>

* Bump anonlink-client from 0.1.2 to 0.1.3 in /base

Bumps [anonlink-client](https://github.com/data61/anonlink-client) from 0.1.2 to 0.1.3.
- [Release notes](https://github.com/data61/anonlink-client/releases)
- [Changelog](https://github.com/data61/anonlink-client/blob/master/CHANGELOG.md)
- [Commits](https://github.com/data61/anonlink-client/commits)

Signed-off-by: dependabot-preview[bot] <support@dependabot.com>

* update python3 version (#582)

* update python3 version

* update anonlink-client, as old version was broken

* that overwrites the version from base for no good reason.

* Bump ijson from 3.0.4 to 3.1.1 in /base

Bumps [ijson](https://github.com/ICRAR/ijson) from 3.0.4 to 3.1.1.
- [Release notes](https://github.com/ICRAR/ijson/releases)
- [Changelog](https://github.com/ICRAR/ijson/blob/master/CHANGELOG.md)
- [Commits](ICRAR/ijson@v3.0.4...v3.1.1)

Signed-off-by: dependabot-preview[bot] <support@dependabot.com>

* Bump celery from 4.4.2 to 4.4.7 in /base

Bumps [celery](https://github.com/celery/celery) from 4.4.2 to 4.4.7.
- [Release notes](https://github.com/celery/celery/releases)
- [Changelog](https://github.com/celery/celery/blob/master/Changelog.rst)
- [Commits](celery/celery@4.4.2...v4.4.7)

Signed-off-by: dependabot-preview[bot] <support@dependabot.com>

* Fix case sensitivity in minio metadata

Field names in HTTP headers are case-insensitive, and some networks decide that means they can normalize them however they like.

Minio's stat.metadata is a dict of custom HTTP headers. This
small change ensures that queries will get the header regardless
of the case.

* Migrate off deprecated K8s dependencies (#596)

* Update helm minio chart by several major versions
* Migrate off deprecated redis-ha repository
* Provide a fallback UPLOAD_OBJECT_STORE_SERVER option as an ingress isn't required for minio to work.
* Documents upload object store configuration.
* Update azure pipelines
* Update base image deps
* Pin an older version of bitarray
* Update minio image used with docker compose
* Bump the chart version

* Update ingress to include path
Remove defaults from values file for ingress settings
Fixed two typos in the templates.

* Documents ingress configuration

* Updates base and Python dependencies (#601)

* Updates base alpine image
* Updates python requirements
* Use latest release of anonlink and minio
* Fix docker build script and benchmark image
* Adjusts to a new minio. Noticed that minio has a bug if the assume role duration is less than an hour.

* Expose similarities via object store (#594)

Sparse similarity results can be extremely large; this commit adds an option for callers to request the object store path of the similarity results instead of the results themselves.

* Adds a small test ensuring we can pull similarity scores via object store
* Build script now builds the test docker image
* Put common environment variables into a .env file for docker-compose
* Store credentials with environment variable names to avoid confusion and reduce duplication
* The init object store script now creates a readonly user
* Updates documentation on uploading and downloading via object store

* [minor] Update entity-service chart to use helm api v2 (#606)

* Initialize database via alembic

* Delete the raw SQL to create the database

* Update k8s deployment to use alembic

* Update queries to use run_id instead of run for run_results table

* Minio python API now requires a "DeleteObject"

I don't know why.

* Base wasn't building

* Bump psycopg2 from 2.8.4 to 2.8.6 in /base (#604)

Bumps [psycopg2](https://github.com/psycopg/psycopg2) from 2.8.4 to 2.8.6.
- [Release notes](https://github.com/psycopg/psycopg2/releases)
- [Changelog](https://github.com/psycopg/psycopg2/blob/master/NEWS)
- [Commits](https://github.com/psycopg/psycopg2/commits)

Signed-off-by: dependabot-preview[bot] <support@dependabot.com>

Co-authored-by: dependabot-preview[bot] <27856297+dependabot-preview[bot]@users.noreply.github.com>
Co-authored-by: wilko77 <wilko77@users.noreply.github.com>

* Bump alpine from 3.13.1 to 3.13.2 in /base

Bumps alpine from 3.13.1 to 3.13.2.

Signed-off-by: dependabot-preview[bot] <support@dependabot.com>

* Run migration jobs after upgrade as well as after install (#611)

* Connect to object store using TLS if configured to (#614)

* Update env var names in k8s init jobs (#612)

* Update env var name
* Update comment in deployment values
* Update environment variable used in alembic

* Bump iso8601 from 0.1.12 to 0.1.14 in /base

Bumps [iso8601](https://github.com/micktwomey/pyiso8601) from 0.1.12 to 0.1.14.
- [Release notes](https://github.com/micktwomey/pyiso8601/releases)
- [Commits](micktwomey/pyiso8601@0.1.12...0.1.14)

Signed-off-by: dependabot-preview[bot] <support@dependabot.com>

* bump python3 dependency

Alpine 3.13.2 now needs python3=3.8.7-r1

* Adds recommended k8s labels to deployments and services (#616)

* Limit the maximum number of candidate pairs (#605)

* Cache the number of identified candidates along with the number of comparisons carried out.
* Update cache test
* Add global limits on number of edges
* Handle the case where there are no cached edges

* Add a step in the integration test pipeline that validates a test result file exists and fails otherwise. (#618)

* Add optional pod annotations to init jobs (#619)

* Adds changelog/release notes for v1.14.0 (#620)

* Proposed changelog for v1.14.0
* Update azure-pipelines.yml to fix a name change in a previous PR...

Co-authored-by: wilko77 <wilko77@users.noreply.github.com>

* Bump pytest-xdist from 1.29.0 to 2.2.1 in /base

Bumps [pytest-xdist](https://github.com/pytest-dev/pytest-xdist) from 1.29.0 to 2.2.1.
- [Release notes](https://github.com/pytest-dev/pytest-xdist/releases)
- [Changelog](https://github.com/pytest-dev/pytest-xdist/blob/master/CHANGELOG.rst)
- [Commits](pytest-dev/pytest-xdist@v1.29.0...v2.2.1)

Signed-off-by: dependabot-preview[bot] <support@dependabot.com>

* bump version number

* more bumping...

Co-authored-by: dependabot-preview[bot] <27856297+dependabot-preview[bot]@users.noreply.github.com>
Co-authored-by: Brian Thorne <brian@hardbyte.nz>
Co-authored-by: Brian Thorne <brian@thorne.link>
Co-authored-by: Guillaume Smith <gusmith@users.noreply.github.com>

Linked issue: Limit maximum number of edges