Releases: apiad/fsgc
v0.5.0
[0.5.0] - 2026-06-09
Fixed
- Sweeper undercounted `freed_bytes` for `stale_dir` REVIEW deletions. The CLI passed `behavioral_paths: [m.path for m in matches]` into `Sweeper.sweep`, which then re-derived size with `path.stat().st_size if path.is_file() else 0`, an expression that is always zero for a directory. The rolled-up `BehavioralMatch.size_bytes` (computed by `Scanner._finalize_behavioral_matches`) was silently discarded, so the end-of-sweep "Moved to trash X" total and the JSONL `size_bytes` audit field both reported 0 for stale code projects. The sweeper now consumes the `matches` list directly and trusts the recorded size. New regression test in `tests/test_sweeper.py::test_sweeper_uses_recorded_size_for_stale_dir_review`.
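The undercount can be reproduced in isolation. The sketch below is not the project's code: `BehavioralMatch` here is a hypothetical stand-in with only the two fields named in the note, and `naive_size`, `recorded_size`, and `demo` are illustrative names. It only shows why the quoted expression reports zero for a directory while the recorded size does not.

```python
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass
class BehavioralMatch:
    """Hypothetical stand-in for the scanner's match record."""
    path: Path
    size_bytes: int  # rolled-up size recorded at scan time


def naive_size(path: Path) -> int:
    # The pre-fix expression quoted above: any directory yields 0.
    return path.stat().st_size if path.is_file() else 0


def recorded_size(match: BehavioralMatch) -> int:
    # The post-fix idea: trust the size already stored on the match.
    return match.size_bytes


def demo() -> tuple:
    # Build a throwaway "project" directory holding 1024 bytes of data.
    with tempfile.TemporaryDirectory() as tmp:
        project = Path(tmp) / "stale-project"
        project.mkdir()
        (project / "data.bin").write_bytes(b"x" * 1024)
        match = BehavioralMatch(path=project, size_bytes=1024)
        return naive_size(match.path), recorded_size(match)
```

Calling `demo()` returns the naive figure alongside the recorded one, which makes the gap between the two accounting paths visible.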
Internal
- `make lint` now runs `mypy src/`. AGENTS.md has long claimed "Strict mypy — annotate everything", but the makefile only ran `ruff`. The gate now matches the convention, and four pre-existing strict-mypy errors in `behavior.py`, `trail.py`, and `scanner.py` are fixed.
Behavioral abandonment heuristics (NEW)
- `behaviors.yaml` catalog ships four v1 rules that catch what signatures can't: Stale Code Project (180-day `.git/HEAD` mtime), Old Download (90-day file mtime under `**/Downloads/*`), Forgotten Archive (90-day file mtime, archive/installer extensions anywhere), Old Large ML Weights (180-day file mtime, weight extensions, ≥500 MB).
- REVIEW section in the proposal. Behavioral matches appear under a clearly-labelled `🔍 Review` header below the structural `🗑 Garbage` groups. Never auto-checked, distinct color, and a typed-`yes` gate fires before any REVIEW item is swept.
- JSONL journal gains `review: true`. Sweep entries are flagged so that `jq 'select(.review)' ~/.local/share/fsgc/sweep-log.jsonl` returns exactly the behavioral deletions.
- Trail cache integration. `stale_dir` matches (e.g. Stale Code Project) persist alongside the trail and restore on cache hit: once a stale repo is flagged it stays flagged across subsequent warm scans. `stale_file` matches inside cached subtrees are a documented limitation; `fsgc scan --no-cache` is the escape hatch.
- Gitlink tolerance. `.git` as a regular file (worktrees, submodules) no longer aborts the subtree walk; the rule simply skips.
- Detection cost. One extra `os.stat` per candidate directory for the git-head signal; zero extra syscalls on file rules (reuses the `stat` already done by `_get_entries`).
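As a rough illustration of the Stale Code Project signal and the gitlink tolerance described above, the following sketch performs a single `os.stat` on `.git/HEAD` and skips quietly when `.git` is a regular file. The function name `is_stale_repo`, the constants, and the `demo` helper are assumptions made for this example, not the scanner's actual API.

```python
import os
import tempfile
import time
from pathlib import Path
from typing import Optional

STALE_AFTER_DAYS = 180  # threshold quoted for the Stale Code Project rule
SECONDS_PER_DAY = 86400


def is_stale_repo(directory: Path, now: Optional[float] = None) -> bool:
    """True when directory/.git/HEAD was last modified at least 180 days ago."""
    now = time.time() if now is None else now
    try:
        st = os.stat(directory / ".git" / "HEAD")  # the one extra stat per candidate
    except (FileNotFoundError, NotADirectoryError):
        # No repository here, or .git is a gitlink file (worktree, submodule):
        # the rule skips instead of aborting the walk.
        return False
    return (now - st.st_mtime) / SECONDS_PER_DAY >= STALE_AFTER_DAYS


def demo() -> tuple:
    now = time.time()
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        stale, fresh, worktree = root / "stale", root / "fresh", root / "worktree"
        for repo in (stale, fresh):
            (repo / ".git").mkdir(parents=True)
            (repo / ".git" / "HEAD").write_text("ref: refs/heads/main\n")
        old = now - 200 * SECONDS_PER_DAY
        os.utime(stale / ".git" / "HEAD", (old, old))  # backdate HEAD by 200 days
        worktree.mkdir()
        (worktree / ".git").write_text("gitdir: /elsewhere/.git/worktrees/x\n")
        return tuple(is_stale_repo(d, now) for d in (stale, fresh, worktree))
```

`demo()` checks a backdated repository, a fresh one, and a worktree-style directory in turn; only the first is flagged.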
Verification (Behavioral heuristics)
- New `tests/test_behavior.py` (9 tests) covers rule loading + validation + shipped catalog.
- New `tests/test_scanner_behavioral.py` (8 tests) covers stale_dir + stale_file detection + cache roundtrip + gitlink tolerance.
- New `tests/test_review_flow.py` (4 tests) covers prompt gating + proposal rendering.
- Real-world acceptance on `~/` (cold cache, 30 s budget, VPS host with sparse user data): elapsed 33.9 s, structural=3 groups, review=0 groups, zero exceptions.
Added
- Sweeper module (`fsgc.sweeper`): Extracted the deletion path into a dedicated `Sweeper` class with structured `SweepResult` / `DeletionRecord` records, replacing the inline loop in `__main__.sweep()`. The CLI now formats results; the sweeper decides what to delete.
- Unsafe-root guard: Sweeper refuses to delete the filesystem root, the user's home directory, and a built-in list of system paths (`/usr`, `/etc`, `/var`, `/boot`, `/bin`, `/lib`, …) regardless of signature match.
- Symlink guard: Symlinks are never followed during sweep. The symlink itself is preserved and the target is untouched, even when the symlink's name matches a signature pattern.
- Sentinel re-verification at sweep time: Each node is re-stat'd before deletion to confirm at least one signature sentinel is still present, catching the race where a sentinel disappeared between scan and confirm.
- Trash-by-default deletion (`send2trash`): Confirmed sweeps now move directories to the system trash instead of unlinking them. Opt out with `--permanent` for the prior rmtree behavior. The confirmation prompt distinguishes "Move to Trash" from "PERMANENT Deletion" so the user knows which mode is active.
- JSONL sweep journal: Every record (trashed, deleted, dry-run, skipped, or errored) is appended as one JSON line to `~/.local/share/fsgc/sweep-log.jsonl` for audit + recovery. Disable with `--no-journal`.
- Test coverage for the deletion path: 18 tests in `tests/test_sweeper.py` covering dry-run/run, unsafe-root, symlinks, sentinel re-verification, missing paths, OSError tolerance, freed-bytes accounting, trash vs permanent modes, trash failure handling, and journal output (single + multi-invocation + every-outcome).
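The unsafe-root and symlink guards listed above might be sketched as a single pre-deletion check. Everything named here (`UNSAFE_ROOTS`, `skip_reason`, `demo`) is hypothetical, and the path set is only the subset spelled out in the note; the shipped list is elided there and is not reproduced.

```python
import tempfile
from pathlib import Path
from typing import Optional

# Illustrative subset only: the note elides the rest of the built-in list.
UNSAFE_ROOTS = {Path(p) for p in ("/", "/usr", "/etc", "/var", "/boot", "/bin", "/lib")}


def skip_reason(path: Path, home: Path) -> Optional[str]:
    """Return why a path must not be swept, or None when it may proceed."""
    if path.is_symlink():
        return "symlink"  # never follow or remove through a link
    resolved = path.resolve()
    if resolved == home.resolve():
        return "home directory"
    if resolved in UNSAFE_ROOTS:
        return "unsafe root"
    return None


def demo() -> tuple:
    with tempfile.TemporaryDirectory() as tmp:
        home = Path(tmp) / "home"  # pretend home directory for the example
        cache = home / "project" / "node_modules"
        cache.mkdir(parents=True)
        link = home / "linked_modules"
        link.symlink_to(cache, target_is_directory=True)
        return (
            skip_reason(Path("/"), home),
            skip_reason(home, home),
            skip_reason(link, home),
            skip_reason(cache, home),
        )
```

Checking the symlink first means a link whose name matches a signature is rejected before its target is ever resolved, which is the ordering this sketch assumes.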
Changed
- `aggregator.group_by_signature()` now includes the matched `Signature` in each group dict (used by the sweeper for sentinel re-verification).
- Sweep output is per-record (trashed / deleted / skipped / errored) with skip reasons surfaced to the user; reclaimed-bytes total now reflects bytes actually freed rather than scanned size.
Dependencies
- Added `send2trash >= 1.8` for cross-platform recoverable deletion.
Performance
- Parallel sweep: `Sweeper.max_concurrency` runs deletions on a `ThreadPoolExecutor` (default 1 for library use; CLI threads through `--workers`, default 8). `shutil.rmtree` and `send2trash` release the GIL during syscalls so a million-file `node_modules` no longer blocks the rest of the queue. Records are reassembled in submission order; the journal serializes via a mutex so no entries are lost under concurrency.
- Live progress bar: Sweeps now render a Rich `Progress` (spinner, bar, M/N items, bytes/s, elapsed) that updates per-record. The per-record chatter in the previous output was replaced by a post-sweep summary listing every error and skipped item, so failures stay visible without scrolling through the deletion log.
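The concurrency pattern described above (thread pool, results in submission order, journal writes behind a mutex) can be sketched with the standard library alone. `sweep_parallel`, the record shape, and the in-memory `journal_lines` list are assumptions for illustration; the real sweeper writes to a file and uses its own record types.

```python
import json
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def sweep_parallel(
    paths: List[str],
    delete: Callable[[str], None],
    journal_lines: List[str],
    max_concurrency: int = 8,
) -> List[Dict[str, str]]:
    """Delete paths concurrently and return one record per path, in input order."""
    journal_lock = threading.Lock()

    def worker(path: str) -> Dict[str, str]:
        try:
            delete(path)
            record = {"path": path, "outcome": "deleted"}
        except OSError as exc:
            record = {"path": path, "outcome": "error", "error": str(exc)}
        with journal_lock:  # one writer at a time, so no journal line is lost
            journal_lines.append(json.dumps(record))
        return record

    with ThreadPoolExecutor(max_workers=max_concurrency) as pool:
        # Executor.map yields results in submission order even when
        # workers finish out of order.
        return list(pool.map(worker, paths))


def demo() -> tuple:
    def fake_delete(path: str) -> None:
        if path == "b":
            raise OSError("permission denied")

    journal: List[str] = []
    records = sweep_parallel(["a", "b", "c"], fake_delete, journal, max_concurrency=3)
    return [r["outcome"] for r in records], len(journal)
```

Journal lines may land in completion order, while the returned records follow the order in which paths were submitted.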
Heuristics overhaul (BREAKING — no backcompat)
- Recovery-tier schema: `Signature.priority: float` removed; `Signature.recovery: Recovery` (enum: `trivial` / `local` / `network`) takes its place. The tier caps the score (1.0 / 0.7 / 0.4) and expresses how costly the directory is to restore: `trivial` regenerates automatically offline, `local` rebuilds from sources in the same tree, `network` requires re-downloading.
- Score formula rewritten: `score = age_factor × RECOVERY_CAP[recovery]`. Recency was 10% of the prior formula; it's now the multiplier. The dead `p_score = 1.0 * 0.6` constant was removed entirely. Old + trivial surfaces first; young or network-bound sinks to the bottom.
- Group sort by score, not raw size: `aggregator.group_by_signature` now sorts by `(avg_score, size)` descending so the user-facing proposal matches the recovery-tier ordering. Previously, large but actively-used `.venv` trees would appear above small but stale browser caches.
- Min-age check uses `max(atime, mtime)`: Linux mounts typically use `relatime` (the kernel default) or `noatime`, making `atime` unreliable. mtime tracks directory-content churn (entries added/removed), which is the right "still in use" signal.
- Dangerous signatures removed: Bare `**/bin` and `**/obj` (no sentinels in the YAML, despite docs claiming `.dll`/`.pdb`) are no longer shipped; they would have matched `~/Workspace/bin/`, `.venv/bin/`, `~/.local/bin/`. Re-add per-user via `~/.config/fsgc/signatures.yaml` if needed.
- `node_modules` sentinel fixed: Removed the bogus literal `"node_modules"` sentinel (a directory name, not a file). `package.json` remains the sole sentinel.
- `__pycache__` gains `min_age_days: 1` to avoid being swept mid-build.
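A worked sketch of the new scoring and ordering follows. The tier caps and the `max(atime, mtime)` choice come from the notes above; the shape of `age_factor` is not specified there, so the linear ramp and `SATURATION_DAYS` below are pure assumptions, as are the function names and the group dict keys.

```python
from enum import Enum
from typing import Dict, List


class Recovery(Enum):
    TRIVIAL = "trivial"
    LOCAL = "local"
    NETWORK = "network"


# Tier caps as stated in the notes.
RECOVERY_CAP = {Recovery.TRIVIAL: 1.0, Recovery.LOCAL: 0.7, Recovery.NETWORK: 0.4}

SATURATION_DAYS = 90.0  # assumption: age at which the factor reaches 1.0


def age_factor(atime: float, mtime: float, now: float) -> float:
    # Use the later of atime and mtime as the "last used" timestamp.
    last_used = max(atime, mtime)
    age_days = max(0.0, (now - last_used) / 86400)
    return min(age_days / SATURATION_DAYS, 1.0)  # assumed linear ramp


def score(recovery: Recovery, atime: float, mtime: float, now: float) -> float:
    return age_factor(atime, mtime, now) * RECOVERY_CAP[recovery]


def sort_groups(groups: List[Dict]) -> List[Dict]:
    # Highest average score first; size only breaks ties.
    return sorted(groups, key=lambda g: (g["avg_score"], g["size"]), reverse=True)
```

Under these assumptions a small, old, trivially recoverable cache outranks a large but recently touched `.venv`, which is the ordering the notes describe.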
Catalog expansion
- Per-profile browser caches (Linux): Chrome / Chromium / Brave / Microsoft Edge / Vivaldi each get `**/.config/<browser>/<Profile>/Cache` patterns (plus Chrome's `Code Cache`, `GPUCache`, `Service Worker/CacheStorage`, which is where the multi-GB actually lives, vs the often-empty `~/.cache/google-chrome`).
- Firefox `**/.cache/mozilla/firefox/*/cache2`.
- Electron desktop apps: Discord, Spotify, JetBrains, plus Cursor and VS Code `CachedData` on top of the existing VS Code / Slack rules.
- uv interpreters `**/.local/share/uv/python` (often multi-GB of downloaded CPython builds).
- System trash `**/.local/share/Trash/{files,info}`: emptying the trash is now part of the sweep proposal.
- Snap / Flatpak per-app caches `**/.cache/snap`, `**/.var/app/*/cache`.
- Generic build outputs now require strong sentinels: `**/build` requires `*.o` / `*.a` / `*.lib` / `CMakeCache.txt`; `**/dist` requires `*.whl` / `*.tar.gz` / `*.egg-info`.
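To illustrate the sentinel requirement for generic build outputs, here is a minimal check. The sentinel patterns are the ones listed above; the `SENTINELS` mapping, the `has_sentinel` name, and the choice to look only at direct children are assumptions, since the notes do not say how deep the real check goes.

```python
import tempfile
from fnmatch import fnmatch
from pathlib import Path

# Sentinel patterns taken from the catalog note; the dict layout is hypothetical.
SENTINELS = {
    "build": ["*.o", "*.a", "*.lib", "CMakeCache.txt"],
    "dist": ["*.whl", "*.tar.gz", "*.egg-info"],
}


def has_sentinel(directory: Path) -> bool:
    """True when a direct child of the directory matches one of its sentinels."""
    patterns = SENTINELS.get(directory.name, [])
    names = [entry.name for entry in directory.iterdir()]
    return any(fnmatch(name, pattern) for name in names for pattern in patterns)


def demo() -> tuple:
    with tempfile.TemporaryDirectory() as tmp:
        compiled = Path(tmp) / "cproject" / "build"
        docs_only = Path(tmp) / "website" / "build"
        compiled.mkdir(parents=True)
        docs_only.mkdir(parents=True)
        (compiled / "main.o").write_bytes(b"\x00")
        (docs_only / "README.md").write_text("hand-written docs\n")
        return has_sentinel(compiled), has_sentinel(docs_only)
```

A directory that merely happens to be called `build` is left alone in this sketch, while one holding an object file qualifies.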
Verification
- Total signatures: 52 (up from 32).
- All 60 tests pass; `test_engine.py` rewritten with 9 focused tests on the new formula (recovery cap, age scaling, min-age cutoff, atime-vs-mtime, tier ordering).
- Smoke test on `~/Workspace/repos/` surfaced 8.6 GB recoverable across 6 groups in a single scan; with the new sort, old + trivial caches sit above large-but-fresh `.venv` trees as intended.
Trail cache rewrite (BREAKING — no migration)
- All `.gctrail` files are gone. The previous implementation wrote a binary `.gctrail` per directory >100 MB, scattering hundreds of pollution files across `~`, including inside `.venv` trees that were themselves about to be deleted (self-defeating). Removed the format, the persist, and the read paths. 383 stale `.gctrail` files were deleted from the workspace at the same time.
- New `TrailStore` backed by `beaver-db` (SQLite). Single file at `~/.cache/fsgc/trails.db`, keyed by absolute path. Schema: `{scanned_at, fingerprint, total_size, entry_count, atime, mtime, file_evidence, top_children: [(name, score, size)]}`. 30-day TTL applied at write time so stale entries naturally age out.
- In-memory cache layer. BeaverDB's sync facade marshals every call onto a single background "Reactor" thread, causing SQLite lock contention under 8 concurrent workers. `TrailStore` now bulk-load...
v0.4.0
Added
- Plan for porting MCTS scanner engine to Rust (`plans/port-mcts-to-rust.md`).
Changed
- Implemented dynamic worker count using `DEFAULT_WORKERS` in `src/fsgc/__main__.py`.
- Optimized `os.scandir` fetching and added `import time` to `src/fsgc/scanner.py` as part of "Low-Hanging Fruit" optimizations.
Fixed
- Addressed an accidental syntax error in `src/fsgc/__main__.py`.
v0.3.0
[0.3.0] - 2026-03-18
Added
- Stochastic MCTS-based Scanner: New informed search strategy using Monte Carlo Tree Search (MCTS) to prioritize high-value garbage branches.
- Parallelization: Bounded worker pool using `asyncio.to_thread` for concurrent filesystem exploration.
- Incremental Metadata Propagation: Push-based upward metadata updates for $O(1)$ root snapshots and improved wide-tree performance.
- Documentation Suite: Comprehensive `docs/` directory with MkDocs/Material theme integration.
- CI/CD: Automated testing and PyPI publication workflows via GitHub Actions.
- Real-time Metrics: Scan speed indicator (MB/s) and summary statistics in the TUI.
- Graceful Interruption: Robust `Ctrl+C` handling in the scanning phase.
- Sentinel Verification: Content-based verification for garbage signatures (e.g., checking for `package.json` in `node_modules`).
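One common way to build a bounded worker pool on top of `asyncio.to_thread` is to gate each call with a semaphore. The sketch below takes that approach as an assumption; it is not the scanner's implementation, and `scan_sizes`, `dir_size`, and `demo` are illustrative names. It requires Python 3.9 or later for `asyncio.to_thread`.

```python
import asyncio
import os
import tempfile
from typing import Dict, List


def dir_size(path: str) -> int:
    """Sum the sizes of regular files directly inside path (blocking)."""
    total = 0
    with os.scandir(path) as entries:
        for entry in entries:
            if entry.is_file(follow_symlinks=False):
                total += entry.stat(follow_symlinks=False).st_size
    return total


async def scan_sizes(paths: List[str], max_workers: int = 8) -> Dict[str, int]:
    semaphore = asyncio.Semaphore(max_workers)  # caps concurrent threads

    async def bounded(path: str):
        async with semaphore:
            # Run the blocking directory walk off the event loop.
            return path, await asyncio.to_thread(dir_size, path)

    return dict(await asyncio.gather(*(bounded(p) for p in paths)))


def demo() -> List[int]:
    with tempfile.TemporaryDirectory() as tmp:
        small, large = os.path.join(tmp, "small"), os.path.join(tmp, "large")
        for path, size in ((small, 10), (large, 20)):
            os.mkdir(path)
            with open(os.path.join(path, "blob"), "wb") as fh:
                fh.write(b"x" * size)
        sizes = asyncio.run(scan_sizes([small, large], max_workers=2))
        return [sizes[small], sizes[large]]
```

The semaphore keeps at most `max_workers` directory reads in flight, so a very wide tree cannot flood the default thread pool.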
Changed
- Refactored `Scanner` to an async-first model.
- Optimized signature matching with name-based fast-paths and caching.
- Enhanced `GCTrail` binary schema to store top subdirectories for informed selection.
Fixed
- Performance bottlenecks in wide directory tree traversals.
- Quadratic complexity in MCTS node selection.
- Redundant signature matching across iterations.