
fix(utility): add lock for cached file between processes #1151

Merged

Conversation

Lee-000
Collaborator

@Lee-000 Lee-000 commented Dec 6, 2021

When a PyTorch DataLoader runs with multiple workers, the cache file is accessed by several processes at once. The file can then be written and read at the same time, so a reader may see empty content.

To fix this, write the cache file under a temporary name and rename it to the cache path once the write completes, and add a lock so the cache is not written repeatedly.
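
As a sketch of the approach described above, the write path might look like the following. This is a hedged illustration with hypothetical names (`write_cache`, `_cache_lock`), not the actual tensorbay/utility/common.py code:

```python
# Sketch only: the temp-file-plus-rename idea from the PR description.
# `write_cache` and `_cache_lock` are illustrative names, not tensorbay APIs.
import os
import tempfile
from multiprocessing import Lock

_cache_lock = Lock()  # inherited by forked DataLoader workers

def write_cache(cache_path: str, content: bytes) -> None:
    with _cache_lock:
        if os.path.exists(cache_path):
            return  # another worker already wrote the cache file
        # Write to a temp file in the same directory so the final rename
        # stays on one filesystem and os.replace() is atomic.
        fd, temp_path = tempfile.mkstemp(dir=os.path.dirname(cache_path) or ".")
        try:
            with os.fdopen(fd, "wb") as temp_file:
                temp_file.write(content)
            os.replace(temp_path, cache_path)
        except BaseException:
            os.remove(temp_path)
            raise
```

Because the rename is atomic, a concurrent reader sees either no cache file or a complete one, never a partially written file.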

@Lee-000 Lee-000 requested a review from linjiX as a code owner December 6, 2021 13:03
@coveralls

coveralls commented Dec 6, 2021

Pull Request Test Coverage Report for Build 1627169800

  • 40 of 41 (97.56%) changed or added relevant lines in 2 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.04%) to 85.888%

Changes Missing Coverage:
  • tensorbay/utility/common.py: 29 of 30 changed/added lines covered (96.67%)

Totals (Coverage Status):
  • Change from base Build 1626244239: +0.04%
  • Covered Lines: 7699
  • Relevant Lines: 8964

💛 - Coveralls

tensorbay/utility/common.py (3 outdated review threads, resolved)
```python
def wrapper(self: Any, *arg: Any, **kwargs: Any) -> None:
    key = getattr(self, attr_name) if attr_name else id(self)
    # https://github.com/PyCQA/pylint/issues/3313
    lock = process_locks.setdefault(key, manager.Lock())  # pylint: disable=no-member
```
Contributor


Getting an item from a Manager.dict() seems to be a copy operation, which would make the retrieved lock a copy of the lock stored in the dict. Doesn't that mean operations on the lock outside won't affect the one inside the dict?

Collaborator Author


Shared objects can be nested. manager.Lock() is itself a shared object, so changes made through it are synced via the dict. See:
https://docs.python.org/3.8/library/multiprocessing.html#multiprocessing.managers.SyncManager
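
A minimal demonstration of this point, assuming Python 3.6+ (where nesting of shared objects is supported): the lock fetched back out of the manager dict is another proxy to the same underlying lock, so acquiring through one proxy blocks the other.

```python
# Demo sketch: a Lock proxy nested in a Manager dict still refers to the
# single shared lock; it is not an independent copy.
from multiprocessing import Manager, Process

def try_lock(locks):
    with locks["cache"]:  # blocks until the parent releases the lock
        print("child acquired the shared lock")

if __name__ == "__main__":
    manager = Manager()
    locks = manager.dict()
    locks["cache"] = manager.Lock()  # shared Lock proxy nested in a shared dict

    fetched = locks["cache"]  # another proxy, same underlying lock
    fetched.acquire()
    child = Process(target=try_lock, args=(locks,))
    child.start()
    fetched.release()  # only now can the child enter its `with` block
    child.join()
```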

@Lee-000 Lee-000 force-pushed the T19988_add_multiprocess_lock_for_cache branch from 3d38af0 to 2490402 on December 7, 2021 08:00
@Lee-000 Lee-000 requested a review from linjiX December 7, 2021 08:01
@linjiX linjiX requested a review from AChenQ December 7, 2021 08:43
@Lee-000 Lee-000 force-pushed the T19988_add_multiprocess_lock_for_cache branch from 2490402 to b50bae0 on December 7, 2021 08:55
tensorbay/utility/common.py (outdated review thread, resolved)
tensorbay/utility/common.py (review thread, resolved)
@Lee-000 Lee-000 force-pushed the T19988_add_multiprocess_lock_for_cache branch 2 times, most recently from 1858cae to 3a07341 on December 8, 2021 07:13
@Lee-000 Lee-000 requested a review from linjiX December 8, 2021 07:14
@Lee-000 Lee-000 force-pushed the T19988_add_multiprocess_lock_for_cache branch 2 times, most recently from 1a95b3f to ba54946 on December 10, 2021 08:11
```python
def wrapper(func_self: Any, *arg: Any, **kwargs: Any) -> None:
    key = getattr(func_self, self._attr_name)
    lock_exists = False
    if key in self.process_locks:
```
Contributor


Is it possible for two processes to enter this if statement at the same time, have the check return false for both, and end up each creating its own lock?
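
One way to close this race, for reference, is to collapse the check and the insert into a single call, as the earlier revision's setdefault did: every method call on a manager proxy executes serially in the manager's server process, so one setdefault call cannot interleave with another. A sketch with hypothetical names:

```python
from multiprocessing import Manager

def get_lock(process_locks, manager, key):
    # Racy two-step version (both processes can miss, then both insert):
    #     if key not in process_locks:
    #         process_locks[key] = manager.Lock()
    #     return process_locks[key]
    # Single-call version: setdefault runs as one serialized operation in
    # the manager's server process, so only the first caller's lock is kept.
    return process_locks.setdefault(key, manager.Lock())

if __name__ == "__main__":
    manager = Manager()
    process_locks = manager.dict()
    lock = get_lock(process_locks, manager, "cache")
    with lock:
        pass
```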

@Lee-000 Lee-000 force-pushed the T19988_add_multiprocess_lock_for_cache branch from ba54946 to 8fb8259 on December 13, 2021 11:58
```python
# https://github.com/PyCQA/pylint/issues/3313
self.process_locks[key] = self._manager.Lock()  # pylint: disable=no-member
with _acquire(self.process_locks[key]) as success:
    self._lock.release()
```
Contributor


If something goes wrong before self._lock is released, could self._lock end up never being released?
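
The usual guard against this is to release the lock in a finally block so the error path releases it too. A sketch using the names from this diff, assuming the surrounding wrapper shown above (not the final code of this PR):

```python
# Sketch: release self._lock on both the normal and the exception path.
self._lock.acquire()
try:
    key = getattr(func_self, self._attr_name)
    lock_exists = key in self.process_locks
    if not lock_exists:
        self.process_locks[key] = self._manager.Lock()
finally:
    self._lock.release()  # runs even if the lookup or Lock() call raises
```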

```python
self._lock.release()
if success and not lock_exists:
    func(func_self, *arg, **kwargs)
    del self.process_locks[key]
```
Contributor


Is it possible that one process is deleting the lock from self.process_locks while, at the same time, another process is checking whether the key is in self.process_locks?
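
For reference, the delete itself can be made safe against a concurrent removal by replacing del with a single pop call that takes a default; like other proxy methods, pop executes as one serialized operation in the manager process. This only addresses a double delete; the broader check-then-act race still needs the outer lock. A sketch using the diff's names:

```python
# Instead of `del self.process_locks[key]`, which raises KeyError if another
# process already removed the entry:
self.process_locks.pop(key, None)  # safe even if the key is already gone
```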

@Lee-000 Lee-000 force-pushed the T19988_add_multiprocess_lock_for_cache branch from 8fb8259 to 12a54d7 on December 15, 2021 06:11
tensorbay/utility/common.py (outdated review thread, resolved)
@Lee-000 Lee-000 force-pushed the T19988_add_multiprocess_lock_for_cache branch from 12a54d7 to a7b26ef on December 15, 2021 09:35
@Lee-000 Lee-000 force-pushed the T19988_add_multiprocess_lock_for_cache branch from a7b26ef to 7797a2d on December 27, 2021 13:41
@Lee-000 Lee-000 force-pushed the T19988_add_multiprocess_lock_for_cache branch from 7797a2d to abb7b9c on December 28, 2021 12:09
@Lee-000 Lee-000 merged commit 8c6b192 into Graviti-AI:main Dec 28, 2021
@Lee-000 Lee-000 deleted the T19988_add_multiprocess_lock_for_cache branch December 28, 2021 12:09