Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Lock Module Implementation #1567

Merged
merged 40 commits into from May 18, 2021
Merged

Lock Module Implementation #1567

merged 40 commits into from May 18, 2021

Conversation

jackyzha0
Copy link
Collaborator

@jackyzha0 jackyzha0 commented Apr 5, 2021

Description

Closes #1540

Motivation and Context

See #1540

How Has This Been Tested?

  • local unit and integration tests

Types of changes

  • Breaking change (fix or feature that would cause existing functionality to change)
  • New feature and improvements (non-breaking change which adds/improves functionality)
  • Bug fix (non-breaking change which fixes an issue)
  • Code Refactoring (internal change which is not user facing)
  • Documentation
  • Test, CI, or build

Component(s) if applicable

  • BentoService (service definition, dependency management, API input/output adapters)
  • Model Artifact (model serialization, multi-framework support)
  • Model Server (mico-batching, dockerisation, logging, OpenAPI, instruments)
  • YataiService gRPC server (model registry, cloud deployment automation)
  • YataiService web server (nodejs HTTP server and web UI)
  • Internal (BentoML's own configuration, logging, utility, exception handling)
  • BentoML CLI

Checklist:

  • My code follows the bentoml code style, both ./dev/format.sh and
    ./dev/lint.sh script have passed
    (instructions).
  • My change reduces project test coverage and requires unit tests to be added
  • I have added unit tests covering my code change
  • My change requires a change to the documentation
  • I have updated the documentation accordingly

@pep8speaks
Copy link

pep8speaks commented Apr 5, 2021

Hello @jackyzha0, Thanks for updating this PR.

There are currently no PEP 8 issues detected in this PR. Cheers! 🍻

Comment last updated at 2021-05-17 18:44:37 UTC

@codecov
Copy link

codecov bot commented Apr 5, 2021

Codecov Report

Merging #1567 (2e55289) into master (0a5624a) will decrease coverage by 2.97%.
The diff coverage is 85.95%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1567      +/-   ##
==========================================
- Coverage   68.72%   65.75%   -2.98%     
==========================================
  Files         154      164      +10     
  Lines       10255    11554    +1299     
==========================================
+ Hits         7048     7597     +549     
- Misses       3207     3957     +750     
Impacted Files Coverage Δ
bentoml/yatai/db/stores/label.py 84.33% <ø> (-0.19%) ⬇️
bentoml/exceptions.py 79.62% <75.00%> (-0.38%) ⬇️
bentoml/yatai/yatai_service_impl.py 53.84% <78.94%> (+1.19%) ⬆️
bentoml/yatai/db/stores/lock.py 79.24% <79.24%> (ø)
bentoml/yatai/locking/lock.py 97.43% <97.43%> (ø)
bentoml/yatai/db/__init__.py 81.94% <100.00%> (+1.06%) ⬆️
bentoml/tracing/__init__.py 62.96% <0.00%> (-13.97%) ⬇️
bentoml/cli/__init__.py 92.30% <0.00%> (-4.25%) ⬇️
bentoml/marshal/marshal.py 73.33% <0.00%> (-1.22%) ⬇️
bentoml/cli/bento_service.py 78.90% <0.00%> (-1.10%) ⬇️
... and 56 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0a5624a...2e55289. Read the comment docs.

@@ -99,7 +102,8 @@ def create_all_or_upgrade_db(self):
inspector = inspect(self.engine)
tables = inspector.get_table_names()

if 'deployments' not in tables or 'bentos' not in tables:
table_names = ['deployments', 'bentos', 'labels', 'locks']
if not all([table_name in tables for table_name in table_names]):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be "only create table if none of the table exist", right?

if all([table_name in tables for table_name not in table_names]):


@staticmethod
def release(sess, resource_id):
lock = LockStore._find_lock(sess, resource_id)
Copy link
Member

@bojiang bojiang Apr 7, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't verify the owner of the lock. A worker may release the lock held by others.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking that we don't expose this acquire/release API to users and only lock using the context manager in yatai/locking/lock.py. That way, we don't need to store lock owner in the table.

@jackyzha0 jackyzha0 marked this pull request as ready for review May 3, 2021 21:03
@jackyzha0 jackyzha0 requested review from parano and bojiang May 3, 2021 21:28
@jackyzha0 jackyzha0 requested a review from yubozhao May 7, 2021 13:53
Copy link
Collaborator Author

@jackyzha0 jackyzha0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could be potential problems

Comment on lines +59 to +69
with db.create_session() as sess:
lock_objs = []
# acquire all locks in lock list
for (lock_identifier, lock_type) in locks:
lock_obj = LockStore.acquire(
sess, lock_type, lock_identifier, ttl_min
)
lock_objs.append(lock_obj)

# try to commit all locks to db
sess.commit()
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

session is created on line 59 and changes are finally commited on line 69. this may cause lock contention if another yataiservice tries to acquire a lock in the time that the for loop runs for


def run_delayed_thread(t1, t2):
t1.start()
time.sleep(1)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

anything less than 0.5s here causes some instability in the guarantee that t1 acquires lock before t2 here (related to concern above)

@yubozhao yubozhao merged commit cc8cf31 into bentoml:master May 18, 2021
aarnphm pushed a commit to aarnphm/BentoML that referenced this pull request Jul 29, 2022
* fix: makefile sphinx deps

* docs: added batch_serving docs and linked to it in quickstart

* refactor database operations into db class

* remove old db code

* migrate store related functionality to yatai/db/stores module

* makefile + sources typo

* refactor to static classes

* finish static refactor

* fix yatai_service_impl

* fix bad patch in test_usage_stats

* patch metadatastore property naming bug

* lint + format

* more lint + fmt

* lint + format (again)

* add base lock table and orm impl

* add locking module impl with retry

* wrap yatai service with Lock

* fix retry mechanism

* add proper lock release

* lint + fmt

* ttl implementation

* lint + fmt

* fix broken label test and added ability to pass sesison to lock fn

* multilock support

* lint + fmt (again)

* whitespace fix

* add better documentation

* line shortening and more docstrings

* base multithreaded test

* clean up lock test

* linter pass

* fix up tests

* unit test for expiring/lock

* lint + fmt
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Bundle/Deployment Locking Module
5 participants