Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add locks around fsspec operations #83

Merged
merged 2 commits into from
Nov 23, 2022
Merged

Add locks around fsspec operations #83

merged 2 commits into from
Nov 23, 2022

Conversation

jwodder
Copy link
Member

@jwodder jwodder commented Nov 16, 2022

No description provided.

@jwodder jwodder added the patch Increment the patch version when merged label Nov 16, 2022
@yarikoptic
Copy link
Member

Sine works -- let's finalize/merge/release. Ideally: could you add a test which would operate in multiple threads over some limited set of files , so we would double test that all works good? (or it would need also fsspec/filesystem_spec#1111 to work?)

@codecov
Copy link

codecov bot commented Nov 21, 2022

Codecov Report

Base: 70.69% // Head: 71.06% // Increases project coverage by +0.36% 🎉

Coverage data is based on head (4992d5a) compared to base (24727b8).
Patch coverage: 66.66% of modified lines in pull request are covered.

Additional details and impacted files
@@            Coverage Diff             @@
##           master      #83      +/-   ##
==========================================
+ Coverage   70.69%   71.06%   +0.36%     
==========================================
  Files          11       11              
  Lines         761      788      +27     
==========================================
+ Hits          538      560      +22     
- Misses        223      228       +5     
Impacted Files Coverage Δ
datalad_fuse/fuse_.py 26.31% <0.00%> (-0.55%) ⬇️
datalad_fuse/tests/conftest.py 93.13% <100.00%> (+0.50%) ⬆️
datalad_fuse/tests/test_fuse.py 100.00% <100.00%> (ø)

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

@jwodder
Copy link
Member Author

jwodder commented Nov 21, 2022

@yarikoptic Test added and passes in at least one run.

@yarikoptic
Copy link
Member

and seems to fail on another :-/ but seems reminiscent of that issue addressed by atomic writing at fsspec level
=================================== FAILURES ===================================
251
_____________________________ test_parallel_access _____________________________
252

253
tmp_path = PosixPath('/tmp/pytest-of-runner/pytest-0/test_parallel_access0')
254
big_url_dataset = (Dataset('/tmp/pytest-of-runner/pytest-0/big_url_dataset0/ds'), {'APL.pdf': 'c65ccc2a97cdb6042641847112dc6e4d4d6e75fde...ff9197', 'libpython3.10-stdlib_3.10.4-3_i386.deb': 'e79c1416ec792b61ad9770f855bf6889e57be5f6511ea814d81ef5f9b1a3eec9'})
255

256
    def test_parallel_access(tmp_path, big_url_dataset):
257
        ds, data_files = big_url_dataset
258
        with fusing(ds.path, tmp_path) as mount:
259
            with ThreadPoolExecutor() as pool:
260
                futures = {
261
                    pool.submit(sha256_file, mount / path): dgst
262
                    for path, dgst in data_files.items()
263
                }
264
                for fut in as_completed(futures.keys()):
265
>                   assert fut.result() == futures[fut]
266

267
datalad_fuse/tests/test_fuse.py:247: 
268
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
269
/opt/hostedtoolcache/Python/3.6.15/x64/lib/python3.6/concurrent/futures/_base.py:425: in result
270
    return self.__get_result()
271
/opt/hostedtoolcache/Python/3.6.15/x64/lib/python3.6/concurrent/futures/_base.py:384: in __get_result
272
    raise self._exception
273
/opt/hostedtoolcache/Python/3.6.15/x64/lib/python3.6/concurrent/futures/thread.py:56: in run
274
    result = self.fn(*self.args, **self.kwargs)
275
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
276

277
path = PosixPath('/tmp/pytest-of-runner/pytest-0/test_parallel_access0/gameboy.pdf')
278

279
    def sha256_file(path):
280
        dgst = hashlib.sha256()
281
>       with open(path, "rb") as fp:
282
E       OSError: [Errno 34] Numerical result out of range: '/tmp/pytest-of-runner/pytest-0/test_parallel_access0/gameboy.pdf'
283

284
datalad_fuse/tests/test_fuse.py:252: OSError
285
---------------------------- Captured stderr setup -----------------------------
286
[INFO] Creating a new annex repo at /tmp/pytest-of-runner/pytest-0/big_url_dataset0/ds 
287
------------------------------ Captured log setup ------------------------------
288
INFO     datalad.core.local.create:create.py:494 Creating a new annex repo at /tmp/pytest-of-runner/pytest-0/big_url_dataset0/ds
289
----------------------------- Captured stdout call -----------------------------
290
fusefs(ok): /tmp/pytest-of-runner/pytest-0/test_parallel_access0
291
----------------------------- Captured stderr call -----------------------------
292
fuse: bad error value: 1
293
/home/runner/work/datalad-fuse/datalad-fuse/.tox/py3/lib/python3.6/site-packages/requests/__init__.py:104: RequestsDependencyWarning: urllib3 (1.26.12) or chardet (5.0.0)/charset_normalizer (2.0.12) doesn't match a supported version!
294
  RequestsDependencyWarning)
295
Traceback (most recent call last):
296
  File "/home/runner/work/datalad-fuse/datalad-fuse/.tox/py3/lib/python3.6/site-packages/fuse.py", line 734, in _wrapper
297
    return func(*args, **kwargs) or 0
298
  File "/home/runner/work/datalad-fuse/datalad-fuse/.tox/py3/lib/python3.6/site-packages/fuse.py", line 835, in open
299
    fi.flags)
300
  File "/home/runner/work/datalad-fuse/datalad-fuse/.tox/py3/lib/python3.6/site-packages/datalad_fuse/fuse_.py", line 75, in __call__
301
    return super(DataLadFUSE, self).__call__(op, self.root + path, *args)
302
  File "/home/runner/work/datalad-fuse/datalad-fuse/.tox/py3/lib/python3.6/site-packages/fuse.py", line 1076, in __call__
303
    return getattr(self, op)(*args)
304
  File "/home/runner/work/datalad-fuse/datalad-fuse/.tox/py3/lib/python3.6/site-packages/datalad_fuse/fuse_.py", line 206, in open
305
    fsspec_file = self._adapter.open(path)
306
  File "/home/runner/work/datalad-fuse/datalad-fuse/.tox/py3/lib/python3.6/site-packages/datalad_fuse/fsspec.py", line 243, in open
307
    return dsap.open(relpath, mode=mode, encoding=encoding, errors=errors)
308
  File "/home/runner/work/datalad-fuse/datalad-fuse/.tox/py3/lib/python3.6/site-packages/datalad_fuse/fsspec.py", line 183, in open
309
    raise IOError(f"Could not find a usable URL for {relpath}")
310
OSError: Could not find a usable URL for gameboy.pdf
311

312
During handling of the above exception, another exception occurred:
313

314
Traceback (most recent call last):
315
  File "/home/runner/work/datalad-fuse/datalad-fuse/.tox/py3/lib/python3.6/site-packages/fuse.py", line 737, in _wrapper
316
    if e.errno > 0:
317
TypeError: '>' not supported between instances of 'NoneType' and 'int'
318

319
During handling of the above exception, another exception occurred:
320

321
Traceback (most recent call last):
322
  File "_ctypes/callbacks.c", line 234, in 'calling callback function'
323
  File "/home/runner/work/datalad-fuse/datalad-fuse/.tox/py3/lib/python3.6/site-packages/fuse.py", line 756, in _wrapper
324
    self.__critical_exception = e
325
NameError: name 'self' is not defined
326
Warning:  Destroying fsspecs and collection of 0 fhs 

@jwodder
Copy link
Member Author

jwodder commented Nov 21, 2022

@yarikoptic The root error seems to be that one of the file URLs temporarily failed, and that triggered a bug in fusepy. It has nothing to do with fsspec's atomicity.

@@ -150,7 +150,8 @@ def getattr(self, path, fh=None):
# We should just fabricate stats from the key here or not even
# bother???!
lgr.debug("File not already open")
fsspec_file = self._adapter.open(path)
with self.rwlock:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we move locking down into _adapter and make it repository specific since we have caches per repository?! then outside code would be able to parallelize across repositories.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yarikoptic No, as we also need to lock whenever using a pseudo-filehandle returned from _adapter.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

then we might need a helper to return the appropriate (repository specific) lock for a given path to be used.

@yarikoptic
Copy link
Member

@yarikoptic The root error seems to be that one of the file URLs temporarily failed, and that triggered a bug in fusepy. It has nothing to do with fsspec's atomicity.

I've reran the failing test ... it passed, let's proceed. Filed dedicated #87 for a possible improvement

@jwodder jwodder marked this pull request as ready for review November 23, 2022 18:32
@yarikoptic yarikoptic merged commit 5421d3a into master Nov 23, 2022
@yarikoptic yarikoptic deleted the fuse-lock branch November 23, 2022 18:32
jwodder added a commit that referenced this pull request Dec 5, 2022
This reverts commit 5421d3a, reversing
changes made to 2ec7ad2.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
patch Increment the patch version when merged
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants