FileOrganizer v8.2.0 - Phantom-category audit + cleanup

SysAdminDoc released this 29 Apr 00:39 · 311 commits to main since this release

A full project audit of the multi-source classification pipeline, plus the
research and documentation that power the next-phase Python app.

Headline numbers

  • 0 phantom category dirs remaining (was 13 on G:\ + 253 on I:\Organized)
  • 0 misplaced AE templates in non-AE categories (was 367 across 21 categories)
  • 3,796 collision dirs merged via dedup (was 1,229+ pairs)
  • 15,633 items migrated out of legacy I:\Organized phantom dirs in 24 seconds (previously estimated at 8+ hours with the robocopy /MOVE approach)
  • +622 GB freed on G:\ via Stock Footage - Abstract & VFX migration to I:\
  • 30,867 journaled moves in organize_moves.db

Source-code bug fixes (3 root causes of every phantom on disk)

  1. fix_stock_ae_items.py — keyword rule produced After Effects - Promo & Advertising (not in canonical taxonomy). Merged into After Effects - Product Promo.
  2. merge_stock.py — AE Organized fallback f"After Effects - {sub.name}" invented category names. Replaced with strict allowlist.
  3. review_resolver.py — SYSTEM_PROMPT had ~11 ground-truth rules pointing at non-existent categories (Photoshop - Print & Stationery, After Effects - Backgrounds, etc.). All rewritten + canonicalize() validator added.
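
Fixes 2 and 3 both amount to refusing any category name that is not in the canonical taxonomy. The following is a minimal sketch of that idea, not the project's actual code: the `CANONICAL` set is an illustrative subset, `ae_category_for` is a hypothetical helper, and routing rejects to `_Review` is an assumption. The real `canonicalize()` in review_resolver.py may have a different signature.

```python
from typing import Optional

# Illustrative subset only; the real canonical taxonomy lives in the project.
CANONICAL = {
    "After Effects - Product Promo",
    "Stock Footage - Abstract & VFX",
    "Web Template",
}


def canonicalize(name: str) -> Optional[str]:
    """Return the name if it is a canonical category, otherwise None."""
    cleaned = " ".join(name.split())  # collapse stray whitespace
    return cleaned if cleaned in CANONICAL else None


def ae_category_for(subfolder: str, fallback: str = "_Review") -> str:
    """Hypothetical replacement for the f"After Effects - {sub.name}" fallback.

    The candidate name is only used when the allowlist accepts it;
    anything else goes to the fallback (an assumed review bucket).
    """
    candidate = canonicalize(f"After Effects - {subfolder}")
    return candidate if candidate is not None else fallback
```

With this shape, an unknown subfolder name can no longer create a new directory on disk; the worst case is an item waiting in the review bucket.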

Performance: 60-1000x speedup on same-drive moves

robocopy /MOVE copies file bytes and then deletes the source, even when source and destination are on the same drive. Same-drive moves now use a per-child os.rename, which is a metadata-only rename on NTFS; cross-drive moves still use robocopy /256 /COPY:DAT. The change applies to both fix_phantom_categories.py and fix_duplicates.py.
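
The same-drive fast path can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in fix_phantom_categories.py: `same_drive` and `merge_move` are hypothetical names, collisions are simply skipped, and `shutil.move` stands in for the robocopy call on the cross-drive branch.

```python
import os
import shutil


def same_drive(src: str, dst: str) -> bool:
    """Compare drive letters; on POSIX both are empty, so this returns True."""
    src_drive = os.path.splitdrive(os.path.abspath(src))[0].lower()
    dst_drive = os.path.splitdrive(os.path.abspath(dst))[0].lower()
    return src_drive == dst_drive


def merge_move(src_dir: str, dst_dir: str) -> int:
    """Move each child of src_dir into dst_dir and return how many moved."""
    os.makedirs(dst_dir, exist_ok=True)
    fast = same_drive(src_dir, dst_dir)
    moved = 0
    for name in os.listdir(src_dir):
        src = os.path.join(src_dir, name)
        dst = os.path.join(dst_dir, name)
        if os.path.exists(dst):
            continue  # collision: leave it for a separate dedup pass
        if fast:
            os.rename(src, dst)  # rename within one volume, no byte copy
        else:
            shutil.move(src, dst)  # stand-in for the robocopy invocation
        moved += 1
    if not os.listdir(src_dir):
        os.rmdir(src_dir)  # remove the emptied source directory
    return moved
```

Renaming each child rather than the parent directory lets the move merge into a destination that already exists, which a single directory rename cannot do.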

New tooling

  • fix_phantom_categories.py — non-canonical category migration with auditable log.
  • fix_flagged_misclassifications.py — corrects 4 known thematic-folder mis-routings.
  • find_misclassified_tutorials.py — detects AE tutorial videos parked in stock dirs.
  • resolve_review_manual.py, resolve_unknown_vh.py — hand-curated _Review cleanup.
  • manual_ae_classifications.py — 134 items classified by hand, zero AI calls.

Robustness fixes

  • organize_run.robust_move now handles cross-drive single-file moves correctly (was creating dst as a directory and breaking on .zip/.rar files).
  • fix_duplicates.py writes its log incrementally every 50 merges (was end-of-run only, so a kill mid-pass lost the audit trail).
  • fix_stock_ae_items.py scan-dir list expanded from 5 to 18 categories (excludes Plugins, Cinematic FX, Color Grading, and Sound Effects, which legitimately contain AE files).
  • --no-ai flag on fix_stock_ae_items.py for keyword-only mode.
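
The incremental-log fix can be pictured with a small buffered writer. This is a sketch under assumptions: the class name, the JSON Lines format, and the record fields are all hypothetical, and only the flush interval of 50 comes from the notes above.

```python
import json


class IncrementalLog:
    """Append-only JSON Lines log that writes to disk every flush_every records."""

    def __init__(self, path: str, flush_every: int = 50) -> None:
        self.path = path
        self.flush_every = flush_every
        self.pending = []  # records not yet written to disk

    def record(self, entry: dict) -> None:
        """Buffer one record and flush once the batch is full."""
        self.pending.append(entry)
        if len(self.pending) >= self.flush_every:
            self.flush()

    def flush(self) -> None:
        """Append all buffered records to the log file."""
        if not self.pending:
            return
        with open(self.path, "a", encoding="utf-8") as fh:
            for entry in self.pending:
                fh.write(json.dumps(entry) + "\n")
        self.pending.clear()
```

A caller would invoke `record()` after each merge and `flush()` once at the end of the run, so a kill mid-pass loses at most the last partial batch.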

Taxonomy

  • CATEGORY_ALIASES in organize_run.py expanded by ~250 entries covering every observed phantom and the entire legacy I:\Organized hierarchy, plus a _web_template_collapse() helper that maps Web Template - <subcat> to Web Template.

Documentation

  • AUDIT_LESSONS.md — 15 architectural lessons distilled from the audit, intended as the design-rules checklist for the future Python app.
  • RESEARCH_IDEAS.md — 12 improvement areas drawn from similar projects (Eagle App, digiKam, Hydrus, MusicBrainz Picard, Sonarr/Radarr, Czkawka, fclones, organize-cli, Adobe Bridge), industry standards (XMP, IPTC, EXIF, ID3, JSON-LD), and a concrete next-quarter roadmap.

See CHANGELOG.md for the full list and CLAUDE.md for the gotchas index.