Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BF: interprocess lock for checking/opening ssh socket connections #5466

Merged
merged 2 commits into from Mar 16, 2021

Conversation

yarikoptic
Copy link
Member

This should prevent race condition between testing if socket path
already exists and establishing it.

Closes: #5460

Related issues discovered while troubleshooting for this one are
#5465
#5464

This should prevent race condition between testing if socket path
already exists and establishing it.

Closes: datalad#5460

Related issues discovered while troubleshooting for this one are
datalad#5465
datalad#5464
@yarikoptic yarikoptic added this to the 0.14.1 milestone Mar 4, 2021
@yarikoptic
Copy link
Member Author

@bpoldrack @mih you might like to test this on your ssh use cases to see if it is not adding a notable slow down -- our benchmarking wouldn't test for this.

@yarikoptic
Copy link
Member Author

ha -- travis hits old #4101 (and again on maint). appveyor is just unhappy and this time against more than those Macs... even ubuntu freaks out.

Copy link
Contributor

@kyleam kyleam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense to me.

@@ -521,6 +521,12 @@ def open(self):
ConnectionOpenFailedError
When starting the SSH ControlMaster process failed.
"""
with fasteners.process_lock.InterProcessLock(self.ctrl_path.with_suffix('.lck')):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could datalad.support.locking.lock_if_check_fails be used here to avoid creating a lock in many cases?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh right

  1. we should just have a lock instance bound to the instance and reuse it (should shave off some cycles).
  2. Indeed we should be able to use lock_if_check_fails but it would loose that (reuse instance) so I might need to RF it to allow to reuse a lock provided from outside to take advantage of 1.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Decided against using lock_if_check_fails: it was coded up with use cases in mind where check would check for something you expect to not disappear (e.g. downloaded annex key for an archive) between you checking and then doing some next action... so for faster paced use cases I am afraid there might be a race condition that we would happily not bother establishing the socket while it disappears from us. So, to be safer than sorry decided to just always lock so we do not get returned upon file check until some process establishes it (although I think my logic above is flawed and not using it would not give us any protection but at least code would be easier to follow and I do not think it would really make a big difference in terms of performance)

I think there might still be benefit from reusing a lock instance so I have pushed a RF which does that while relying on a feature of fasteners.locked to have a list of locks instead of a single one. So the diff becomes shorter ;)

…" feature of fasteners

I believe multi-lock feature was there since forever or exactly
since when .locked was added in 0.7.0~3^2~8  so no boost in version for
fasteners is needed
@codecov
Copy link

codecov bot commented Mar 15, 2021

Codecov Report

Merging #5466 (c5c13fe) into maint (56dd5dc) will increase coverage by 0.57%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##            maint    #5466      +/-   ##
==========================================
+ Coverage   89.54%   90.11%   +0.57%     
==========================================
  Files         296      296              
  Lines       41790    41829      +39     
==========================================
+ Hits        37420    37696     +276     
+ Misses       4370     4133     -237     
Impacted Files Coverage Δ
datalad/support/sshconnector.py 87.81% <100.00%> (ø)
datalad/customremotes/tests/test_archives.py 89.26% <0.00%> (-0.61%) ⬇️
datalad/support/gitrepo.py 91.82% <0.00%> (-0.15%) ⬇️
datalad/interface/tests/test_ls_webui.py 100.00% <0.00%> (ø)
datalad/support/tests/test_gitrepo.py 99.89% <0.00%> (+0.10%) ⬆️
datalad/support/tests/test_annexrepo.py 97.43% <0.00%> (+0.14%) ⬆️
datalad/distribution/tests/test_install.py 100.00% <0.00%> (+0.20%) ⬆️
datalad/cmd.py 92.60% <0.00%> (+0.29%) ⬆️
datalad/distribution/tests/test_dataset.py 99.70% <0.00%> (+0.29%) ⬆️
datalad/distribution/publish.py 89.61% <0.00%> (+0.34%) ⬆️
... and 51 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 56dd5dc...c5c13fe. Read the comment docs.

self._lock = [
threading.Lock(),
fasteners.process_lock.InterProcessLock(self.ctrl_path.with_suffix('.lck'))
]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, that's fancy. Thanks for leaving the comment to point future confused readers in the right direction.

@yarikoptic yarikoptic merged commit 0e243d8 into datalad:maint Mar 16, 2021
@yarikoptic yarikoptic deleted the bf-ssh-locking branch April 29, 2021 21:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants