Fix stale locks (#56) and add __repr__/__str__ to public classes (#21) #63

Merged
rfrenchseti merged 8 commits into main from fix/issues-12-56-stale-locks-and-exceptions
Apr 10, 2026

Conversation

@rfrenchseti
Collaborator

@rfrenchseti rfrenchseti commented Apr 10, 2026

Summary

Fixes #56 (stale lock detection) and fixes #21 (__repr__/__str__ for public classes). Also fixes a pre-existing bug where negative lock_timeout (meaning "never time out") would immediately trigger TimeoutError.


Issue #56 — Stale lock detection in _retrieve_multi_locked

When a process holding a file lock crashes, the kernel automatically releases its advisory flock, but the lock file on disk remains. The previous wait_to_appear loop only tested lock_path.is_file() and would wait forever if the owner died without cleaning up its lock file.

Fix: Each iteration of the polling loop now attempts a non-blocking filelock.FileLock(lock_path, timeout=0).acquire(). If the acquire succeeds the lock file is stale — the current process steals ownership, groups stale items by source, and downloads them in parallel via retrieve_multi. If acquire fails (the original owner is still alive), the path goes back onto wait_to_appear for the next iteration.
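
The non-blocking probe described above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the PR's code: it calls fcntl.flock directly instead of going through filelock.FileLock, the helper name try_steal_lock is hypothetical, and it only runs on POSIX systems.

```python
import fcntl
import os


def try_steal_lock(lock_path):
    """Try to take an exclusive flock on lock_path without blocking.

    Returns an open file descriptor that now holds the lock if no live
    process holds it (the lock file was stale), or None if the lock is
    still held elsewhere. try_steal_lock is a hypothetical helper name.
    """
    fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        # Another open file description still holds the flock, so the
        # owner is alive and the lock is not stale.
        os.close(fd)
        return None
    return fd
```

A caller that gets a descriptor back owns the lock and is responsible for the download; a caller that gets None would put the path back on its wait list and try again on the next poll.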

Race-condition guarantee: fcntl.flock(LOCK_EX|LOCK_NB) is a single atomic kernel syscall. If two processes simultaneously detect a stale lock and both call acquire(timeout=0), exactly one wins and the other gets filelock._error.Timeout — there is no window where both believe they hold the lock. The winner downloads to a temp file and renames it into place atomically; the loser finds the completed file on its next poll cycle.
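
The "download to a temp file and rename into place" step can be sketched as follows. This is a minimal example, assuming the temp file lives in the destination directory; atomic_write is a hypothetical name and is not taken from the filecache source.

```python
import os
import tempfile


def atomic_write(dest_path, data):
    """Write data to dest_path so readers never observe a partial file."""
    dest_dir = os.path.dirname(os.path.abspath(dest_path))
    # Create the temp file next to the destination so the final rename
    # stays within one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=dest_dir, suffix='.tmp')
    try:
        with os.fdopen(fd, 'wb') as f:
            f.write(data)
        # os.replace swaps the name in a single step on POSIX systems.
        os.replace(tmp_path, dest_path)
    except BaseException:
        # Clean up the temp file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

Because the destination name only ever points at a complete file, a process polling for the file either sees nothing or sees the finished download.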

_retrieve_single (single-file locked download) was already correct — the filelock.FileLock(timeout=...) loop will eventually acquire the freed flock after a crash, with no code changes needed.

New tests (tests/test_file_cache.py):

  • test_stale_lock_single — creates a stale lock via raw fcntl.flock, verifies retrieval succeeds.
  • test_stale_lock_multi — background thread holds an flock; retrieve enters wait_to_appear; thread releases flock (simulating a crash without cleanup); verifies download completes. Uses a state-based barrier (polls download_counter) instead of time.sleep.
  • test_stale_lock_multi_pfx — same scenario exercised through the FCPath interface.
  • All three are skipped on Windows (no fcntl).

Bonus fix: The wait_to_appear timeout check now guards against negative lock_timeout (lock_timeout >= 0 and ...), which previously caused immediate TimeoutError instead of waiting indefinitely as documented.
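
The guard can be isolated into a small predicate to show why the lock_timeout >= 0 check matters. This is a sketch; has_timed_out and its now parameter are hypothetical and exist only to make the logic testable.

```python
import time


def has_timed_out(start_time, lock_timeout, now=None):
    """Return True only when a non-negative timeout has been exceeded.

    A negative lock_timeout means "never time out". Without the first
    clause, elapsed > lock_timeout would be true on the very first poll.
    """
    if now is None:
        now = time.time()
    return lock_timeout >= 0 and now - start_time > lock_timeout
```

For example, has_timed_out(0, -1, now=100) is False, while the unguarded comparison 100 - 0 > -1 would have been True.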


Issue #21: __repr__ and __str__ for all public classes

Adds or improves __repr__ and __str__ on every public (non-internal) class:

| Class | __repr__ | __str__ |
| --- | --- | --- |
| FileCache | Shows cache_name, anonymous, lock_timeout, nthreads | Cache directory path |
| FCPath | Path plus non-default options (filecache, anonymous, lock_timeout, nthreads, url_to_url, url_to_path) | The path string (unchanged) |
| FileCacheSource (base) | ClassName(scheme, remote, anonymous=...); inherited by all five subclasses | Source prefix URI |

New tests: test_filecache_repr_str (in test_file_cache.py), test_fcpath_repr_str (in test_file_cache_path.py), test_filecachesource_repr_str (in test_file_cache_source.py).
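
The "path plus non-default options" pattern used for FCPath.__repr__ can be sketched with a stand-in class. OptionPath below is hypothetical and far simpler than FCPath; it only shows how options left at None are omitted from the repr.

```python
class OptionPath:
    """Hypothetical stand-in: a path string plus optional settings."""

    _OPTION_NAMES = ('anonymous', 'lock_timeout', 'nthreads')

    def __init__(self, path, *, anonymous=None, lock_timeout=None,
                 nthreads=None):
        self._path = path
        self._anonymous = anonymous
        self._lock_timeout = lock_timeout
        self._nthreads = nthreads

    def __repr__(self):
        parts = [repr(self._path)]
        for name in self._OPTION_NAMES:
            value = getattr(self, '_' + name)
            # Only options that were explicitly set appear in the repr.
            if value is not None:
                parts.append(f'{name}={value!r}')
        return f'{type(self).__name__}({", ".join(parts)})'

    def __str__(self):
        return self._path
```

With this shape, two instances that differ only in an option print differently, while the common case of no options stays short.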

Test plan

  • All 406 tests pass (pytest tests/ -x -q)
  • Stale-lock tests verified on Linux (skipped on Windows)
  • __repr__/__str__ tested for all five source subclasses, FileCache, and FCPath
  • flake8 and mypy clean

🤖 Generated with Claude Code

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@coderabbitai

coderabbitai bot commented Apr 10, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Walkthrough

Added stale-lock detection and recovery during multi-file retrieval; changed upload-failure message composition; introduced repr/str for FileCache, FileCacheSource, and FCPath; added POSIX-only stale-lock and repr tests; removed an extra blank line in filecache/__init__.py.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Stale-lock recovery & locking (filecache/file_cache.py) | Extended _retrieve_multi_locked wait-list to include source/sub_path; added non-blocking lock acquire to detect/steal stale lock files and immediately re-run retrieval for stolen locks; adjusted polling/timeout handling. |
| Upload error behavior (filecache/file_cache.py) | Reworked _upload_multi exception_on_fail logic to raise FileNotFoundError with clearer, conditional message composition depending on files_failed and files_not_exist. |
| String representations & FCPath repr (filecache/file_cache.py, filecache/file_cache_source.py, filecache/file_cache_path.py, filecache/__init__.py) | Added FileCache.__repr__ and FileCache.__str__; added FileCacheSource.__repr__ and __str__; expanded FCPath.__repr__ to include conditional internal options (filecache, anonymous, lock_timeout, nthreads, url_to_url, url_to_path); removed an extra blank line in __init__.py. |
| Tests: stale-lock, FCPath and source repr/str (tests/test_file_cache.py, tests/test_file_cache_path.py, tests/test_file_cache_source.py) | Added POSIX-only stale-lock simulation tests using fcntl and background threads/events for single/multi retrieval and FCPath interface; added tests for FileCache, FCPath, and FileCacheSource repr/str; imported threading in tests. |

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 40.74%, which is below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly summarizes the two main changes: fixing stale locks (issue #56) and adding repr/str methods (issue #21) to public classes. |
| Description check | ✅ Passed | The pull request description is comprehensive and follows the template structure with fixed issues, detailed summaries of changes, and test plans. |


@codecov

codecov bot commented Apr 10, 2026

Codecov Report

❌ Patch coverage is 87.69231% with 8 lines in your changes missing coverage. Please review.
✅ Project coverage is 89.73%. Comparing base (3ecac5a) to head (160f360).
⚠️ Report is 1 commits behind head on main.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| filecache/file_cache.py | 82.97% | 5 Missing and 3 partials ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main      #63      +/-   ##
==========================================
+ Coverage   89.55%   89.73%   +0.17%     
==========================================
  Files           5        5              
  Lines        2413     2465      +52     
  Branches      507      518      +11     
==========================================
+ Hits         2161     2212      +51     
- Misses        149      150       +1     
  Partials      103      103              

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@filecache/file_cache_source.py`:
- Around line 1296-1301: The blanket except around blob.upload_from_filename
hides a FileNotFoundError and incorrectly converts it to UploadFailed; update
the handler in the upload call(s) (the blob.upload_from_filename call sites) so
that if the caught exception is a FileNotFoundError it is re-raised unchanged,
otherwise wrap/raise an UploadFailed with the existing message that includes
self._src_prefix_ and sub_path; apply the same change to the other identical
upload block at the later location mentioned (around the 1627-1636 block).

In `@filecache/file_cache.py`:
- Around line 1973-1981: The docstring for the public FileCache.upload() is out
of sync: the implementation can raise UploadFailed in addition to
FileNotFoundError; update the FileCache.upload() docstring to document that
FileNotFoundError is raised when local files are missing and UploadFailed is
raised when remote upload(s) fail (include that both can be raised together in
the error message logic that combines files_not_exist and files_failed), so
callers know to catch UploadFailed as well as FileNotFoundError.

In `@tests/test_file_cache.py`:
- Around line 1155-1159: Replace the brittle time.sleep(0.3) with a state-based
barrier: instead of sleeping, wait for an observable condition that proves
retrieve() has reached the stale-lock wait path (e.g. a threading.Event or flag
set when the code enters wait_to_appear, or by polling until retrieve_thread is
alive AND the other files have finished downloading), then call
flock_release.set(); specifically modify the test around
retrieve_thread/retrieve() and flock_release so the test waits on that
event/condition (or polls for “all other files finished while retrieve_thread is
still alive”) before releasing the flock to reliably exercise the stale-lock
branch.
📥 Commits

Reviewing files that changed from the base of the PR and between 3ecac5a and 958bcd3.

📒 Files selected for processing (5)
  • filecache/__init__.py
  • filecache/exceptions.py
  • filecache/file_cache.py
  • filecache/file_cache_source.py
  • tests/test_file_cache.py

Reverts the #12 upload-exception changes (removes UploadFailed /
FileCacheError, restores native exceptions). Implements #21: adds
__repr__ and __str__ to FileCache, enhances FCPath.__repr__ to include
non-default options (filecache, anonymous, lock_timeout, nthreads), and
adds __repr__/__str__ to FileCacheSource and all subclasses via the base
class.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@rfrenchseti rfrenchseti changed the title Fix stale locks and generalize upload exceptions (#56, #12) Fix stale locks (#56) and add __repr__/__str__ to public classes (#21) Apr 10, 2026

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

♻️ Duplicate comments (1)
tests/test_file_cache.py (1)

1155-1159: ⚠️ Potential issue | 🟠 Major

Replace the fixed sleep with a state-based barrier.

time.sleep(0.3) still doesn't prove the retrieve thread has entered the stale-lock wait path, so these tests can miss the regression or flake on slower/faster CI. Wait for an observable state before calling flock_release.set(), such as “all other files finished downloading while retrieve_thread is still alive”.

Suggested direction
-        # Give the retrieve thread time to enter the wait_to_appear loop.
-        time.sleep(0.3)
+        deadline = time.monotonic() + 5
+        while time.monotonic() < deadline:
+            if (fc.download_counter == len(EXPECTED_FILENAMES) - 1 and
+                    retrieve_thread.is_alive()):
+                break
+            time.sleep(0.01)
+        else:
+            pytest.fail('retrieve never reached the blocked/stale-lock state')

         # Signal the holder to "crash" -- releases flock, keeps the lock file.
         flock_release.set()

Also applies to: 1216-1218

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_file_cache.py` around lines 1155 - 1159, Replace the brittle
time.sleep(0.3) with a state-based barrier: wait (with timeout) until the
retrieve_thread is confirmed to be in the stale-lock wait path and the other
downloads have completed before calling flock_release.set(). For example, poll
or wait on a test Event/Condition (set by the code under test or by observing a
shared counter) that indicates "retrieve_thread is waiting" and that "other
files finished downloading" (use retrieve_thread.is_alive() plus a
downloaded-files counter or a retrieve_waiting_event) and only then call
flock_release.set(); keep a short timeout to fail fast if the state never
appears.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@filecache/file_cache_path.py`:
- Around line 541-551: FCPath.__repr__ currently omits the translator hooks, so
objects with different url_to_url or url_to_path print identically; update
FCPath.__repr__ to include these attributes when present by appending entries
like f'url_to_url={self._url_to_url!r}' and f'url_to_path={self._url_to_path!r}'
to the parts list (guarded by checks for None just like the other fields) so the
returned string reflects the translator hooks' presence and values.

In `@filecache/file_cache_source.py`:
- Around line 76-78: The __repr__ implementation in FileCacheSource uses
self._src_prefix but the class (and its subclasses) are constructed with
(scheme, remote, *, anonymous=...), so update FileCacheSource.__repr__ to emit a
constructor-like representation using self._scheme and self._remote (and keep
the anonymous flag from self._anonymous) instead of self._src_prefix; ensure the
string format matches other constructors by including scheme and remote as the
first two quoted args and anonymous=<bool> as the kwarg.

In `@filecache/file_cache.py`:
- Around line 421-425: The __repr__ currently prints self._cache_dir.name (which
includes the internal "_filecache_" prefix) causing misleading reconstructions;
update FileCache.__repr__ to display the user-provided cache name attribute
(e.g., self._cache_name) instead of self._cache_dir.name and explicitly show
None when no name was provided so repr round-trips correctly, or if
self._cache_name doesn't exist use the original user-facing name field rather
than the generated directory name.
- Around line 1981-1989: The code currently raises FileNotFoundError for both
missing local files and provider upload failures, losing distinction; update the
branch in file_cache.py so that missing local files still raise
FileNotFoundError (when files_not_exist and not files_failed) but when
files_failed is non-empty raise a distinct exception (e.g., UploadFailedError or
ProviderUploadError) with the exc_str and any underlying provider exception
info; if such a custom exception class doesn’t exist, add one (e.g., class
UploadFailedError(Exception)) and use it in the block that builds exc_str (and
include files_not_exist text only as part of the UploadFailedError message when
both exist).

📥 Commits

Reviewing files that changed from the base of the PR and between 958bcd3 and e954784.

📒 Files selected for processing (5)
  • filecache/__init__.py
  • filecache/file_cache.py
  • filecache/file_cache_path.py
  • filecache/file_cache_source.py
  • tests/test_file_cache.py
💤 Files with no reviewable changes (1)
  • filecache/__init__.py

rfrenchseti and others added 2 commits April 10, 2026 13:11
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- FileCache.__repr__: store _cache_name and show it directly so
  repr(FileCache(None)) prints None and repr(FileCache('global'))
  prints 'global', not the internal _filecache_-prefixed dir name
- FileCacheSource.__repr__: use (scheme, remote) to match the actual
  constructor signature instead of the derived src_prefix URI
- FCPath.__repr__: include url_to_url and url_to_path when non-None
  so two behaviourally different instances don't print identically
- test_stale_lock_multi/pfx: replace brittle time.sleep(0.3) with a
  state-based barrier that polls until download_counter reaches
  len(EXPECTED_FILENAMES)-1 and the retrieve thread is still alive,
  guaranteeing the thread is in wait_to_appear before the flock is released

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@rfrenchseti
Collaborator Author

Addressed the CodeRabbit review comments — details below.

Implemented:

  • FileCache.__repr__ — wrong cache name (minor): Now stores self._cache_name and uses it directly, so FileCache(None, ...) and FileCache("global", ...) round-trip correctly instead of showing the internal _filecache_-prefixed directory name.

  • FileCacheSource.__repr__ — constructor mismatch (minor): Changed from ClassName("gs://bucket", ...) to ClassName("gs", "bucket", ...) to match the actual two-argument constructor signature.

  • FCPath.__repr__ — missing translator hooks (trivial): url_to_url and url_to_path are now included in repr when non-None, so two behaviourally different FCPath instances can no longer print identically.

  • Stale-lock tests — fragile time.sleep(0.3) (major): Replaced with a state-based barrier that polls until fc.download_counter == len(EXPECTED_FILENAMES) - 1 and retrieve_thread.is_alive(). This guarantees the retrieve thread is blocked in wait_to_appear before the flock is released, rather than relying on a timing assumption.
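
The state-based barrier generalizes to a small polling helper. The sketch below is an assumption-laden illustration, not the test code from this PR: wait_until is a hypothetical name, and the predicate stands in for a check such as "download_counter reached the expected value and the retrieve thread is still alive".

```python
import time


def wait_until(predicate, timeout=5.0, interval=0.01):
    """Poll predicate() until it returns True or timeout seconds elapse.

    Returns True if the condition was observed, False on timeout, so the
    caller can fail the test with a clear message instead of hanging.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if predicate():
            return True
        time.sleep(interval)
    # One last check in case the state changed during the final sleep.
    return bool(predicate())
```

A test would call wait_until with its observable condition and assert on the result before releasing the flock, which replaces a guess about timing with a check on actual state.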

Not implemented (no longer relevant):

The remaining comments (about UploadFailed in docstrings, FileNotFoundError vs UploadFailed in _upload_multi, and the TOCTOU FileNotFoundError-preservation fix in GS/S3 upload methods) all refer to the UploadFailed exception infrastructure that was part of issue #12. Since this PR now addresses #21 (not #12), that code was reverted and those comments no longer apply to the current diff.

Stale-lock items are now grouped by source and downloaded in parallel
using source.retrieve_multi(nthreads=nthreads) instead of sequentially
calling source.retrieve() in a loop. Locks are released after the
parallel batch completes, matching the pattern used by the initial
multi-file download path.
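
The grouping-then-parallel-download idea from this commit message can be sketched as follows. The names group_by_source, fetch_grouped, and fetch_one are hypothetical; the real code calls source.retrieve_multi, which this sketch replaces with a thread pool so that it is self-contained.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def group_by_source(items):
    """Group (source, sub_path) pairs into {source: [sub_path, ...]}."""
    grouped = defaultdict(list)
    for source, sub_path in items:
        grouped[source].append(sub_path)
    return dict(grouped)


def fetch_grouped(items, fetch_one, nthreads=4):
    """Fetch every group's sub_paths in parallel, one batch per source.

    fetch_one(source, sub_path) is a caller-supplied stand-in for the
    real download. Returns {(source, sub_path): result}.
    """
    results = {}
    for source, sub_paths in group_by_source(items).items():
        with ThreadPoolExecutor(max_workers=nthreads) as pool:
            fetched = pool.map(lambda sp: fetch_one(source, sp), sub_paths)
            for sub_path, result in zip(sub_paths, fetched):
                results[(source, sub_path)] = result
    return results
```

In this sketch any locks would be released only after a source's whole batch returns, mirroring the ordering the commit message describes.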

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@filecache/file_cache.py`:
- Around line 1659-1674: The wait loop incorrectly treats negative lock_timeout
as a timeout because it unconditionally compares elapsed > lock_timeout; update
the loop logic (the section setting timed_out and the condition that checks
elapsed > lock_timeout) to only perform the elapsed > lock_timeout check when
lock_timeout is non-negative (e.g., lock_timeout >= 0), so negative values mean
“never time out”; apply the same change to the analogous check around lines
1743-1745 to ensure both wait loops honor negative lock_timeout.

In `@tests/test_file_cache.py`:
- Around line 1137-1139: The call to flock_held.wait() in the test's setup can
hang if the holder_thread dies before setting the event; change the unbounded
wait to use a timeout (e.g. flock_held.wait(timeout=...) matching the later
thread.join() timeouts) so the test fails fast instead of blocking indefinitely;
apply this same change for the second occurrence in test_stale_lock_multi_pfx()
and keep the timeout value consistent with the existing join() timeouts.
📥 Commits

Reviewing files that changed from the base of the PR and between fe5bf9d and c44d8d7.

📒 Files selected for processing (4)
  • filecache/file_cache.py
  • filecache/file_cache_path.py
  • filecache/file_cache_source.py
  • tests/test_file_cache.py

Comment on lines +1659 to 1674
# In each iteration we also check for stale locks: a stale lock exists when the
# lock file is present but the process that created it has died (the OS released
# the advisory flock, but did not delete the file). We detect a stale lock by
# attempting a non-blocking acquire -- if it succeeds, no live process holds the
# lock, so we steal it and initiate the download ourselves.
#
# Race-condition guarantee: multiple waiting processes may all notice the same
# stale lock and all attempt `lock.acquire(timeout=0)` simultaneously. Because
# the underlying flock(2) call is atomic, exactly one process wins the race; the
# others receive a Timeout and remain in the wait list. The winner downloads the
# file atomically (write to temp path + rename), so the losers will find the
# completed file on their next poll iteration.
#
# If the lock file disappears without the destination file appearing, that means
# the other process failed (or cleaned up after itself on error).
timed_out = False

⚠️ Potential issue | 🟠 Major

Honor negative lock_timeout in the wait loop.

The public contract says a negative timeout means “never time out,” but this loop still checks elapsed > lock_timeout. For any negative value, that becomes true on the first poll, so locked multi-file retrieves can raise TimeoutError immediately instead of waiting indefinitely.

Proposed fix
-            if time.time() - start_time > lock_timeout:
+            if lock_timeout >= 0 and time.time() - start_time > lock_timeout:
                 exc = TimeoutError(
                     'Timeout while waiting for another process to finish downloading')

Also applies to: 1743-1745

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@filecache/file_cache.py` around lines 1659 - 1674, The wait loop incorrectly
treats negative lock_timeout as a timeout because it unconditionally compares
elapsed > lock_timeout; update the loop logic (the section setting timed_out and
the condition that checks elapsed > lock_timeout) to only perform the elapsed >
lock_timeout check when lock_timeout is non-negative (e.g., lock_timeout >= 0),
so negative values mean “never time out”; apply the same change to the analogous
check around lines 1743-1745 to ensure both wait loops honor negative
lock_timeout.

rfrenchseti and others added 2 commits April 10, 2026 15:08
- test_fcpath_repr_str → tests/test_file_cache_path.py
- test_filecachesource_repr_str → tests/test_file_cache_source.py
- test_filecache_repr_str stays in tests/test_file_cache.py

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

♻️ Duplicate comments (1)
tests/test_file_cache.py (1)

1137-1139: ⚠️ Potential issue | 🟡 Minor

Bound the initial flock_held.wait() calls.

If holder_thread() fails before signaling the event, both tests hang the suite instead of failing diagnostically. Use the same kind of timeout you already apply to the later join() calls.

Suggested fix
-        flock_held.wait()  # make sure the flock is held before we start retrieve
+        assert flock_held.wait(timeout=5), 'holder thread never acquired flock'
@@
-        flock_held.wait()
+        assert flock_held.wait(timeout=5), 'holder thread never acquired flock'

Also applies to: 1207-1209
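
The bounded wait relies on threading.Event.wait(timeout) returning False when the event was not set in time. A minimal sketch, with start_and_wait as a hypothetical helper name:

```python
import threading


def start_and_wait(target, ready, timeout=5.0):
    """Start target in a daemon thread and wait for it to signal ready.

    Returns (thread, signalled). signalled is False if the thread never
    set the event within timeout, which lets a test fail fast instead of
    blocking forever on an unbounded wait().
    """
    thread = threading.Thread(target=target, daemon=True)
    thread.start()
    return thread, ready.wait(timeout=timeout)
```

A test can then write `thread, ok = start_and_wait(holder, flock_held)` followed by `assert ok, 'holder thread never acquired flock'`, where holder and flock_held are the test's own thread function and event.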

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_file_cache.py` around lines 1137 - 1139, The test can hang if
holder_thread() never signals flock_held; update the two places that call
flock_held.wait() (the ones after starting the Thread that runs holder_thread)
to use a bounded wait with a timeout (same pattern used for the later
t.join(timeout=...) calls), and assert/raise a clear failure if the wait returns
False so the test fails fast; locate the calls by looking for flock_held.wait()
immediately after creating/starting the Thread targeting holder_thread.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/test_file_cache_path.py`:
- Around line 1313-1331: Add assertions to test_fcpath_repr_str to exercise
FCPath.__repr__ branches that include translator hooks: construct FCPath
instances with the new url_to_url and url_to_path keyword fields (e.g.,
FCPath(..., url_to_url=some_callable, url_to_path=some_callable)) and verify
their repr includes 'url_to_url=' and 'url_to_path=' (and any identifying
callable/name), and also test when both are None they do not appear; reference
FCPath and its __repr__ behavior and the url_to_url / url_to_path attributes in
file_cache_path.py to locate where to add these extra assertions.

In `@tests/test_file_cache_source.py`:
- Around line 409-435: Add a direct test for the base class FileCacheSource to
cover its __repr__ and __str__ implementations: instantiate FileCacheSource with
appropriate constructor args (matching how subclasses are constructed in the
file, e.g., scheme and bucket and anonymous flag), assert
repr(FileCacheSource(...)) matches the expected "FileCacheSource('scheme',
'bucket', anonymous=...)" string and assert str(FileCacheSource(...)) returns
the expected URI string (e.g., "scheme://bucket" or "file://" for empty bucket);
reference the FileCacheSource class and its __repr__/__str__ methods when adding
the new assertion to the existing test_filecachesource_repr_str test.

📥 Commits

Reviewing files that changed from the base of the PR and between c44d8d7 and 0698009.

📒 Files selected for processing (3)
  • tests/test_file_cache.py
  • tests/test_file_cache_path.py
  • tests/test_file_cache_source.py

Comment on lines +409 to +435
def test_filecachesource_repr_str():
    src_gs = FileCacheSourceGS('gs', 'rms-filecache-tests', anonymous=True)
    r = repr(src_gs)
    assert r == "FileCacheSourceGS('gs', 'rms-filecache-tests', anonymous=True)"
    assert str(src_gs) == 'gs://rms-filecache-tests'

    src_s3 = FileCacheSourceS3('s3', 'rms-filecache-tests', anonymous=True)
    r = repr(src_s3)
    assert r == "FileCacheSourceS3('s3', 'rms-filecache-tests', anonymous=True)"
    assert str(src_s3) == 's3://rms-filecache-tests'

    src_http = FileCacheSourceHTTP('https', 'storage.googleapis.com',
                                   anonymous=False)
    r = repr(src_http)
    assert r == ("FileCacheSourceHTTP('https', 'storage.googleapis.com', "
                 "anonymous=False)")
    assert str(src_http) == 'https://storage.googleapis.com'

    src_file = FileCacheSourceFile('file', '', anonymous=False)
    r = repr(src_file)
    assert r == "FileCacheSourceFile('file', '', anonymous=False)"
    assert str(src_file) == 'file://'

    src_fake = FileCacheSourceFake('fake', 'fake-bucket', anonymous=False)
    r = repr(src_fake)
    assert r == "FileCacheSourceFake('fake', 'fake-bucket', anonymous=False)"
    assert str(src_fake) == 'fake://fake-bucket'

🧹 Nitpick | 🔵 Trivial

Add a direct FileCacheSource repr/str assertion.

This only covers subclasses, but the PR scope also includes the public base FileCacheSource. Since filecache/file_cache_source.py:75-80 defines __repr__ / __str__ there, one direct base-class case would keep that contract covered too.

Suggested test addition
 def test_filecachesource_repr_str():
+    base = FileCacheSource('custom', 'remote', anonymous=False)
+    assert repr(base) == "FileCacheSource('custom', 'remote', anonymous=False)"
+    assert str(base) == 'custom://remote'
+
     src_gs = FileCacheSourceGS('gs', 'rms-filecache-tests', anonymous=True)
     r = repr(src_gs)
     assert r == "FileCacheSourceGS('gs', 'rms-filecache-tests', anonymous=True)"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_file_cache_source.py` around lines 409 - 435, Add a direct test
for the base class FileCacheSource to cover its __repr__ and __str__
implementations: instantiate FileCacheSource with appropriate constructor args
(matching how subclasses are constructed in the file, e.g., scheme and bucket
and anonymous flag), assert repr(FileCacheSource(...)) matches the expected
"FileCacheSource('scheme', 'bucket', anonymous=...)" string and assert
str(FileCacheSource(...)) returns the expected URI string (e.g.,
"scheme://bucket" or "file://" for empty bucket); reference the FileCacheSource
class and its __repr__/__str__ methods when adding the new assertion to the
existing test_filecachesource_repr_str test.

- Fix pre-existing bug: negative lock_timeout (meaning "never time out")
  was immediately triggering TimeoutError because elapsed > negative is
  always true. Now guarded with lock_timeout >= 0.
- Bound flock_held.wait() with timeout=5 in stale lock tests so they
  fail fast instead of hanging if the holder thread dies.
- Add url_to_url/url_to_path coverage to test_fcpath_repr_str.
- Skip base FileCacheSource repr test (it's an ABC, can't instantiate).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@rfrenchseti
Collaborator Author

Addressed the second round of CodeRabbit review comments:

Implemented:

  • Negative lock_timeout bug (major): The wait_to_appear timeout check time.time() - start_time > lock_timeout is always true when lock_timeout is negative, causing immediate TimeoutError instead of "never time out" as documented. Fixed by guarding with lock_timeout >= 0 and .... This was a pre-existing bug, not introduced by this PR.

  • Bounded flock_held.wait() (minor): Added timeout=5 to the flock_held.wait() calls in both stale lock tests so they fail fast with an assertion error instead of hanging forever if the holder thread dies unexpectedly.

  • url_to_url/url_to_path test coverage (trivial): Added assertions to test_fcpath_repr_str that construct an FCPath with translator hooks and verify they appear in repr().

Not implemented:

  • Direct FileCacheSource base class repr test: FileCacheSource is an ABC with abstract methods — it cannot be instantiated directly. The five subclass tests already exercise the inherited __repr__/__str__.
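
The reasoning here (an abstract base cannot be instantiated, but its subclasses inherit and exercise its __repr__/__str__) can be shown with a small stand-in hierarchy. Source and FakeSource below are hypothetical classes written for this illustration; they are not the filecache classes.

```python
from abc import ABC, abstractmethod


class Source(ABC):
    """Hypothetical abstract base that defines repr/str once."""

    def __init__(self, scheme, remote, *, anonymous=False):
        self._scheme = scheme
        self._remote = remote
        self._anonymous = anonymous

    @abstractmethod
    def retrieve(self, sub_path):
        """Subclasses must implement the actual retrieval."""

    def __repr__(self):
        # type(self).__name__ makes each subclass print its own name.
        return (f'{type(self).__name__}({self._scheme!r}, '
                f'{self._remote!r}, anonymous={self._anonymous!r})')

    def __str__(self):
        return f'{self._scheme}://{self._remote}'


class FakeSource(Source):
    """Concrete subclass; inherits __repr__ and __str__ unchanged."""

    def retrieve(self, sub_path):
        return f'{self}/{sub_path}'
```

Calling Source('custom', 'remote') raises TypeError because retrieve is abstract, so testing through a concrete subclass is the only direct way to cover the inherited methods.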

@rfrenchseti rfrenchseti merged commit f3a705e into main Apr 10, 2026
15 of 16 checks passed
@rfrenchseti rfrenchseti deleted the fix/issues-12-56-stale-locks-and-exceptions branch April 10, 2026 22:28
Development

Successfully merging this pull request may close these issues.

  • Programs can leave behind locks that never go away
  • Add __str__ and __repr__ to all classes