-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add locks around fsspec operations #83
Conversation
Sine works -- let's finalize/merge/release. Ideally: could you add a test which would operate in multiple threads over some limited set of files , so we would double test that all works good? (or it would need also fsspec/filesystem_spec#1111 to work?) |
Codecov ReportBase: 70.69% // Head: 71.06% // Increases project coverage by
Additional details and impacted files@@ Coverage Diff @@
## master #83 +/- ##
==========================================
+ Coverage 70.69% 71.06% +0.36%
==========================================
Files 11 11
Lines 761 788 +27
==========================================
+ Hits 538 560 +22
- Misses 223 228 +5
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. ☔ View full report at Codecov. |
@yarikoptic Test added and passes in at least one run. |
and seems to fail on another :-/ but seems reminiscent of that issue addressed by atomic writing at fsspec level=================================== FAILURES ===================================
251
_____________________________ test_parallel_access _____________________________
252
253
tmp_path = PosixPath('/tmp/pytest-of-runner/pytest-0/test_parallel_access0')
254
big_url_dataset = (Dataset('/tmp/pytest-of-runner/pytest-0/big_url_dataset0/ds'), {'APL.pdf': 'c65ccc2a97cdb6042641847112dc6e4d4d6e75fde...ff9197', 'libpython3.10-stdlib_3.10.4-3_i386.deb': 'e79c1416ec792b61ad9770f855bf6889e57be5f6511ea814d81ef5f9b1a3eec9'})
255
256
def test_parallel_access(tmp_path, big_url_dataset):
257
ds, data_files = big_url_dataset
258
with fusing(ds.path, tmp_path) as mount:
259
with ThreadPoolExecutor() as pool:
260
futures = {
261
pool.submit(sha256_file, mount / path): dgst
262
for path, dgst in data_files.items()
263
}
264
for fut in as_completed(futures.keys()):
265
> assert fut.result() == futures[fut]
266
267
datalad_fuse/tests/test_fuse.py:247:
268
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
269
/opt/hostedtoolcache/Python/3.6.15/x64/lib/python3.6/concurrent/futures/_base.py:425: in result
270
return self.__get_result()
271
/opt/hostedtoolcache/Python/3.6.15/x64/lib/python3.6/concurrent/futures/_base.py:384: in __get_result
272
raise self._exception
273
/opt/hostedtoolcache/Python/3.6.15/x64/lib/python3.6/concurrent/futures/thread.py:56: in run
274
result = self.fn(*self.args, **self.kwargs)
275
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
276
277
path = PosixPath('/tmp/pytest-of-runner/pytest-0/test_parallel_access0/gameboy.pdf')
278
279
def sha256_file(path):
280
dgst = hashlib.sha256()
281
> with open(path, "rb") as fp:
282
E OSError: [Errno 34] Numerical result out of range: '/tmp/pytest-of-runner/pytest-0/test_parallel_access0/gameboy.pdf'
283
284
datalad_fuse/tests/test_fuse.py:252: OSError
285
---------------------------- Captured stderr setup -----------------------------
286
[INFO] Creating a new annex repo at /tmp/pytest-of-runner/pytest-0/big_url_dataset0/ds
287
------------------------------ Captured log setup ------------------------------
288
INFO datalad.core.local.create:create.py:494 Creating a new annex repo at /tmp/pytest-of-runner/pytest-0/big_url_dataset0/ds
289
----------------------------- Captured stdout call -----------------------------
290
fusefs(ok): /tmp/pytest-of-runner/pytest-0/test_parallel_access0
291
----------------------------- Captured stderr call -----------------------------
292
fuse: bad error value: 1
293
/home/runner/work/datalad-fuse/datalad-fuse/.tox/py3/lib/python3.6/site-packages/requests/__init__.py:104: RequestsDependencyWarning: urllib3 (1.26.12) or chardet (5.0.0)/charset_normalizer (2.0.12) doesn't match a supported version!
294
RequestsDependencyWarning)
295
Traceback (most recent call last):
296
File "/home/runner/work/datalad-fuse/datalad-fuse/.tox/py3/lib/python3.6/site-packages/fuse.py", line 734, in _wrapper
297
return func(*args, **kwargs) or 0
298
File "/home/runner/work/datalad-fuse/datalad-fuse/.tox/py3/lib/python3.6/site-packages/fuse.py", line 835, in open
299
fi.flags)
300
File "/home/runner/work/datalad-fuse/datalad-fuse/.tox/py3/lib/python3.6/site-packages/datalad_fuse/fuse_.py", line 75, in __call__
301
return super(DataLadFUSE, self).__call__(op, self.root + path, *args)
302
File "/home/runner/work/datalad-fuse/datalad-fuse/.tox/py3/lib/python3.6/site-packages/fuse.py", line 1076, in __call__
303
return getattr(self, op)(*args)
304
File "/home/runner/work/datalad-fuse/datalad-fuse/.tox/py3/lib/python3.6/site-packages/datalad_fuse/fuse_.py", line 206, in open
305
fsspec_file = self._adapter.open(path)
306
File "/home/runner/work/datalad-fuse/datalad-fuse/.tox/py3/lib/python3.6/site-packages/datalad_fuse/fsspec.py", line 243, in open
307
return dsap.open(relpath, mode=mode, encoding=encoding, errors=errors)
308
File "/home/runner/work/datalad-fuse/datalad-fuse/.tox/py3/lib/python3.6/site-packages/datalad_fuse/fsspec.py", line 183, in open
309
raise IOError(f"Could not find a usable URL for {relpath}")
310
OSError: Could not find a usable URL for gameboy.pdf
311
312
During handling of the above exception, another exception occurred:
313
314
Traceback (most recent call last):
315
File "/home/runner/work/datalad-fuse/datalad-fuse/.tox/py3/lib/python3.6/site-packages/fuse.py", line 737, in _wrapper
316
if e.errno > 0:
317
TypeError: '>' not supported between instances of 'NoneType' and 'int'
318
319
During handling of the above exception, another exception occurred:
320
321
Traceback (most recent call last):
322
File "_ctypes/callbacks.c", line 234, in 'calling callback function'
323
File "/home/runner/work/datalad-fuse/datalad-fuse/.tox/py3/lib/python3.6/site-packages/fuse.py", line 756, in _wrapper
324
self.__critical_exception = e
325
NameError: name 'self' is not defined
326
Warning: Destroying fsspecs and collection of 0 fhs |
@yarikoptic The root error seems to be that one of the file URLs temporarily failed, and that triggered a bug in fusepy. It has nothing to do with fsspec's atomicity. |
@@ -150,7 +150,8 @@ def getattr(self, path, fh=None): | |||
# We should just fabricate stats from the key here or not even | |||
# bother???! | |||
lgr.debug("File not already open") | |||
fsspec_file = self._adapter.open(path) | |||
with self.rwlock: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we move locking down into _adapter
and make it repository specific since we have caches per repository?! then outside code would be able to parallelize across repositories.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yarikoptic No, as we also need to lock whenever using a pseudo-filehandle returned from _adapter
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
then we might need a helper to return the appropriate (repository specific) lock for a given path to be used.
I've reran the failing test ... it passed, let's proceed. Filed dedicated #87 for a possible improvement |
No description provided.