Skip to content

Anvil/fix adversarial review findings 1776030818#5

Merged
binhex merged 23 commits into
mainfrom
anvil/fix-adversarial-review-findings-1776030818
Apr 14, 2026
Merged

Anvil/fix adversarial review findings 1776030818#5
binhex merged 23 commits into
mainfrom
anvil/fix-adversarial-review-findings-1776030818

Conversation

@binhex
Copy link
Copy Markdown
Owner

@binhex binhex commented Apr 14, 2026

No description provided.

binhex and others added 23 commits April 13, 2026 16:15
- Add _DISC_NUM_RE and _TRACK_NUM_RE regexes to filter.py
- Add _leaf_name() module-level helper (was a local _leaf in pick_best_directory)
- Add _track_num_from_filename() to parse track numbers from filenames
- Add _has_contiguous_track_numbers() to check per-directory track sequences:
  only fires when >=75% of files carry numeric prefixes; albums without
  consistent numbering are passed through unchanged
- Add check_candidate_integrity() as the public entry point:
  - Disc gap: requires discs to start at 1 and be contiguous; a lone CD2
    is rejected because CD1 is missing
  - Track gap: per-directory check; detects disc-track prefixes (04-14
    correctly extracts track 14, not disc 4)
- Integrate check in cli.py main candidate loop: skip candidate on failure
- Integrate check in cli.py preferred-user fast path: fall back to full
  search on failure
- Add 22 unit tests covering: consecutive tracks, track gaps, EP gaps,
  unnumbered albums, sparse numbering, disc-track format, consecutive
  discs, disc gaps, lone CD2, per-disc track gaps (the Oasis case)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Sort by upload speed / free-slot status is now unconditional.
Removed the CLI toggle and the sort_by_speed parameter from
build_candidates() — there is no scenario where not sorting
by speed is desirable.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Replace paired boolean flags with single override-only flags:
- --exclude-locked/--include-locked → --include-locked (excludes locked by default)
- --clear-completed/--no-clear-completed → --clear-completed
- --strip-tags/--no-strip-tags → --strip-tags

For --include-locked, the Click is_flag+flag_value+default combo does not
work as intended (Click ignores explicit default=True when is_flag=True).
Fixed by renaming the function param to include_locked and computing
exclude_locked = not include_locked in the function body.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Both directories existed as git-tracked .gitignore placeholders from the
original template. The app now writes all runtime files (logs, database)
to ~/.seakarr/ — keeping these dirs in the repo root was misleading.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Default changes from 30 minutes to 300 seconds (5 min). The inactivity check and absolute ceiling (3x) now operate in seconds throughout. Removes the /60 division in the monitor loop; variable names updated accordingly.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Track Soulseek users who pass speed-check downloads in a new
preferred_users SQLite table (success_count, failure_count, last_seen_at).

On each run, the top 5 most reliable users are loaded from the DB and
probed first per album via user_browse() before falling back to a full
network search. All existing quality/content filters still apply; only
queue-length and speed-check are skipped (data not available from browse).

Ranking uses net reliability score (success_count - failure_count) so
users that start failing sink below more reliable alternatives.

- database.py: PreferredUser model + get_preferred_users / promote_user /
  record_user_failure methods (atomic upsert for promote)
- cli.py: replace session-only preferred_user with DB-backed list; loop
  over preferred users per album; persist promotions and failures
- tests/unit/test_database.py: 12 new tests covering all three methods

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Previously _apply_result would move any succeeded_files to the organised
directory even when the overall download was cancelled (speed check,
absolute ceiling) or timed out. This left orphaned partial albums in the
completed folder.

Add an early return at the top of _apply_result for res.timed_out and
res.cancelled. All three call sites already handle those cases in the
code that follows the _apply_result call, so no further changes needed.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…nload-time option

The previous absolute download ceiling was silently computed as download_timeout * 3,
making the effective limit invisible and surprising. Replace with a dedicated
--max-download-time CLI option (default 1800s) that users can see and control directly.

- download_and_monitor() gains a max_download_time: int parameter
- cli.py adds --max-download-time (default 1800s, shown in --help)
- --download-timeout help text clarified to describe inactivity semantics
- All three download_and_monitor call sites updated
- No hidden multipliers remain

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Default changes from 1800s to 30min. All three download_and_monitor() call sites
now multiply the user-supplied value by 60 before passing to the function, which
continues to operate in seconds internally. Metavar and help text updated accordingly.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Default changed from 30000 to 30 (30 seconds)
- Metavar updated to <seconds>
- Help text updated to reflect seconds
- Both network_search() call sites in cli.py multiply by 1000
- search.py internals unchanged (still operates in milliseconds)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…- 1 files

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…lure

Previously the artist-only fallback (search by artist, filter by album name)
only ran when the initial artist+album search returned 0 responses.

This change also triggers the fallback when all download candidates are
exhausted (all failed, timed out, or speed-cancelled) — giving a second
chance via a broader search pool before giving up on an album.

Changes:
- tried_artist_fallback flag initialised after preferred-user phase
- Set True in the existing 0-results fallback to prevent duplicate search
- Filter+download block wrapped in while True: (at most 2 iterations)
- After total download failure, artist-only search is run and its results
  replace pseudo_responses; loop continues for a second filter+download pass
- API error during fallback search sets skip_recorded=True to suppress a
  misleading 'All candidates exhausted' DB entry
- downloaded initialised before while loop for robustness

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Add filter_responses_by_query() to filter.py: rejects search responses
  whose file paths lack expected artist/album tokens, and rejects
  'Various Artists' compilations for named-artist queries.
- Fix _VA_PAT to also match bare 'Various' directories (not just
  'Various Artists').
- Apply filter in cli.py primary search path; when filter removes all
  results, immediately trigger artist-only fallback rather than keeping
  mismatched originals (which would defeat the filter).
- Apply VA/query rejection in both artist-only fallback paths
  (search-phase and download-phase) via same soft-fallback pattern.
- Add 17 unit tests covering correct layouts, VA rejection, album/artist
  token checks, and edge cases (bare Various, empty inputs, short tokens).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- _album_dir_component now skips noise-only leaf dirs (e.g. 'FLAC', 'MP3')
  in addition to Disc N subdirectories, so format-subdir layouts are
  transparent to the density check.
- album_counter (Counter) restored for multiplicity-aware presence check;
  albums like 'Talk Talk' now require the token to appear twice in the dir.
- Presence check runs on raw path tokens; density check runs on noise-stripped
  tokens.  When all tokens are noise/year (e.g. album '1984'), presence passing
  is sufficient — no density check needed.
- has_artist now requires ALL artist tokens to appear in the directory
  (previously any() — one short token like 'the' could falsely corroborate).
- Add 3 new tests: format subdir transparency, year-titled album, repeated
  token multiplicity enforcement.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…sity bypass

- filter.py: add _strip_track_number, first_track_name_of, first_track_matches helpers
- filter.py: fix has_artist block — use combined artist+album density check (>= 0.5)
  instead of unconditional album_ok = True, preventing OST false-positive matches
- scanner.py: scan_library now returns 4-tuple (artist, album, count, first_track_name)
- cli.py: unpack 4-tuple in main loop; apply exact track-count (== not >=) and
  first-track fuzzy filter in both preferred-users and main candidate paths
- tests: 11 new unit tests for _strip_track_number, first_track_name_of,
  first_track_matches, and the OST density rejection fix

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…etches

- user_browse() accepts optional browse_cache dict keyed by username
- Only successful directory listings are cached (failures are not, to
  keep transient errors retryable within the same run)
- Filtering runs on every call against the cached data so different
  artist/album queries against the same user work correctly
- Added try/except around the filtering loop to prevent malformed
  directory entries crashing the entire CLI run
- cli.py initialises browse_cache once per run and passes it to both
  user_browse call sites (preferred-users path + search-user path)
- 5 unit tests cover: cache hit skips API, different users each fetched
  once, filtering still applied per-call, failures not cached (retried),
  and no-cache always calls API

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Previously 'Completed, Rejected' fell through the state machine
unhandled — the file sat in active indefinitely until the inactivity
timeout fired.  Now treated the same as Errored/Cancelled/TimedOut:
retried up to max_retries times, then abandoned.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Add front_since dict: tracks when each file first reaches position 0
  in the remote user's upload queue (via get_queue_position API call)
- Queue timeout now fires only after front_since is set — files still
  waiting at e.g. position 40 are never incorrectly cancelled
- Initializing-state fallback arms front_since if we miss the position-0
  poll between cycles (Queued → Initializing transition)
- Clear front_since on retry re-enqueue so the timer restarts cleanly
- Update queue timeout log message and max_queue_time docstring
- Also carry user tweaks: --search-timeout default 60s, --max-queue-time
  default 60s

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…lete paths

Three scenarios now handled:

1. Ctrl+C mid-download: KeyboardInterrupt handler in download_and_monitor()
   now returns a DownloadResult(interrupted=True) instead of re-raising, so
   _apply_result() gets control and can clean up slskd-completed files before
   re-raising KeyboardInterrupt for a clean process exit.

2. Partial failure (remote user goes offline mid-download): _apply_result()
   previously called organize_files() on the partial succeeded_files and then
   returned False, silently moving an incomplete album to the library with no
   DB record.  The not-success check is now moved BEFORE organize_files() so
   partial files are deleted from slskd_completed_path instead.

3. Timeout / cancel: cleanup is now called on these paths too (previously
   returned False with no filesystem cleanup).

Changes:
- download.py: add DownloadResult.interrupted field; return result on Ctrl+C
  instead of re-raising (slskd incomplete-dir cleanup still runs first)
- cli.py: add _delete_from_completed_path() module-level helper that removes
  files using the same safe_path_parts last-two-parts logic as _apply_result,
  then removes any empty parent dirs; restructure _apply_result with explicit
  priority: interrupted → cancelled/timed_out → partial → full success
- tests: test_completed_path_cleanup.py covers the new helper and the
  interrupted flag

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
F01 download.py: never fall back to pre-existing transfer ID in initial
discovery; only attach to genuinely new transfers to avoid monitoring
stale/concurrent sessions.

F02 download.py: clear seen_errored on failed-reattach cleanup path and
on late-discovery re-attach so retry budget and queue timers apply fresh
to the re-attached transfer.

F03 download.py: suppress inactivity timeout while any tracked transfer is
Queued/Initializing/Requested; previously queued files were incorrectly
cancelled before max_queue_time could govern them.

F04 cli.py: roll back partially-moved files on organize_files partial
failure to prevent mixed-source albums on the next candidate; make tag
stripping non-fatal (cosmetic step no longer blocks DB record).

F05 organizer.py: re-validate dest.parent after appending the sub-folder
in use_parent_prefix mode — a symlinked sub-dir could otherwise escape
completed_base.

F06 filter.py + scanner.py: add basename as deterministic tiebreaker in
first-track sort so equal track-number keys never depend on input order.

F07 filter.py: fix combined-artist density check to measure album density
against non-artist tokens only; previously artist tokens inflated the
ratio and admitted false positives like 'Seven Samurai' for 'Seven'.

F08 database.py: add artist_key/album_key columns (NFKC+casefold) to both
models; migration adds columns + backfills existing rows; is_downloaded and
is_skipped query by key column for Unicode-correct case-insensitive lookups.

F09 cli.py: apply filter_responses_by_query to preferred-user pseudo-
response before build_candidates so the preferred-user fast-path is
subject to the same album-density guard as the network search path.

F10 scanner.py: log tag-read failures at DEBUG instead of silently
discarding them; adds loguru module-level logger to scanner module.

F11 organizer.py: walk all ancestor directories up to completed_base for
empty-dir cleanup instead of only trying 2 levels deep.

F12 filter.py + scanner.py: extract shared _first_track_stem() helper
to eliminate duplicated first-track sorting logic.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
R2-F02b download.py: clear ever_started on late-discovery re-attach so
queue-front timer (front_since) is correctly armed for retry transfers
that had previously entered InProgress.

R2-F09b cli.py: apply filter_responses_by_query to --search-user pseudo-
response (same density guard as the preferred-user fast path); skip if
filter rejects all files from the explicit user.

R2-F03b cli.py: always assign raw_responses = q_filtered in artist-only
fallback paths (both primary-empty and secondary-rejected branches);
removing the 'if q_filtered' guard means a density-rejected fallback
correctly yields no candidates instead of keeping mismatched results.

R2-F04b cli.py: on DB write failure after successful organize, log at
error level and return True instead of False; files are already on disk
so continuing to try more candidates would mix tracks from two sources.

R2-F08b database.py: backfill migration now selects rows where either
artist_key OR album_key is NULL (previously only artist_key IS NULL),
preventing partially-migrated rows from escaping the backfill.

R2-F01 filter.py: VA-clash check now uses check_artist (last 3 path
components) instead of check_album (last 2), matching the artist-token
check scope and catching 'Various Artists' in disc-subdivided paths
(e.g. root/Various Artists/Best Of James/CD1/track.flac).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@binhex binhex merged commit 6b57ddc into main Apr 14, 2026
binhex added a commit that referenced this pull request Apr 19, 2026
…gs-1776030818

Anvil/fix adversarial review findings 1776030818
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant