
refactor(core): overhaul file filtering, fix set -e traps, and add extraction summaries #1

Merged
JacksonFergusonDev merged 1 commit into main from refactor/overhaul-file-filtering
Apr 16, 2026

@JacksonFergusonDev
Owner

Description

This PR addresses several edge cases in the context extraction pipeline related to file classification, terminal feedback, and Bash strict mode constraints.

Previously, the `file --mime-encoding` check was too slow, misclassified empty files (like `__init__.py`) as binary, and let noisy text-based assets (like `.svg` files or minified CSS) leak into the LLM context. In addition, the `focal files` command gave no visibility into skipped or truncated files, and its file-counting logic hit a `set -e` trap.
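To make the `set -e` trap concrete, here is a minimal sketch, not taken from the PR diff. It runs both counting styles in child shells so the failure can be observed safely. The variable name `count` and the two status variables are hypothetical.

```shell
#!/usr/bin/env bash
# Each snippet runs in a child shell so an abort does not kill this script.

# Post-increment from zero: ((count++)) evaluates to the OLD value (0),
# so the arithmetic command returns status 1 and set -e aborts the shell.
if bash -c 'set -e; count=0; ((count++)); echo reached' >/dev/null 2>&1; then
  post_increment_status="survived"
else
  post_increment_status="aborted"
fi

# Arithmetic assignment: a plain variable assignment returns status 0,
# so set -e has nothing to react to.
if bash -c 'set -e; count=0; count=$((count + 1)); echo reached' >/dev/null 2>&1; then
  assignment_status="survived"
else
  assignment_status="aborted"
fi

echo "((count++)) from zero:   $post_increment_status"
echo "count=\$((count + 1)):    $assignment_status"
```

The post-increment only fails on the first iteration, when the old value is zero, which is why this kind of bug tends to surface only in a freshly initialized counter.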

Changes

  • Global Noise Filtering: Implemented a centralized `FOCAL_NOISE_EXTS` array in `lib/core.sh` (mirroring the Python backend). These extensions are compiled into a regex for the formatter and injected into `fd` flags so that noisy assets never appear in the fzf UI.
  • Empty File Preservation: Added a `[ ! -s "$file" ]` intercept in `format_file_for_llm` to ensure empty structural files (e.g., `__init__.py`) are captured rather than discarded as binaries.
  • Monolithic Text Failsafe: Replaced data-specific truncation with a global 1500-line limit for all readable text files to prevent context window blowouts from massive log files or database dumps.
  • Bash Strict Mode Fix: Replaced post-increment operators (`((var++))`) in `libexec/files` with safe arithmetic assignments (`var=$((var + 1))`). When `var` starts at zero, `((var++))` evaluates to 0 and returns a non-zero exit status, which terminates the script prematurely under `set -e`.
  • Extraction Summary: `focal files` now tallies the status of every processed file and prints a concise summary to stdout upon completion.
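The classification order described in the list above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from `lib/core.sh`: only `FOCAL_NOISE_EXTS`, the `[ ! -s "$file" ]` intercept, and the 1500-line limit come from the PR text. The array contents, the helper names `noise_regex` and `classify_file`, the variable `FOCAL_MAX_LINES`, and the `grep -I` binary heuristic are all hypothetical stand-ins.

```shell
#!/usr/bin/env bash
# Assumed contents; entries are regex fragments, so inner dots are escaped.
FOCAL_NOISE_EXTS=(svg png jpg 'min\.css' 'min\.js')
FOCAL_MAX_LINES=1500   # hypothetical name for the global line limit

# Join the extensions into one anchored regex: \.(svg|png|...)$
noise_regex() {
  local IFS='|'
  printf '\\.(%s)$' "${FOCAL_NOISE_EXTS[*]}"
}

# Print one status word: skipped, empty, truncated, or added.
classify_file() {
  local file=$1 regex line_count
  regex=$(noise_regex)
  if [[ $file =~ $regex ]]; then
    echo "skipped"        # noisy asset, filtered by extension
  elif [ ! -s "$file" ]; then
    echo "empty"          # zero bytes: keep it, do not treat it as binary
  elif ! grep -Iq '' "$file"; then
    echo "skipped"        # assumed heuristic: grep -I never matches binary data
  else
    line_count=$(( $(wc -l < "$file") ))
    if [ "$line_count" -gt "$FOCAL_MAX_LINES" ]; then
      echo "truncated"    # a real formatter would emit only the first 1500 lines
    else
      echo "added"
    fi
  fi
}

# Demo on throwaway files.
demo_dir=$(mktemp -d)
trap 'rm -rf "$demo_dir"' EXIT
: > "$demo_dir/__init__.py"
echo '<svg/>' > "$demo_dir/logo.svg"
echo 'print("hi")' > "$demo_dir/main.py"
seq 1 2000 > "$demo_dir/big.log"

for name in __init__.py logo.svg main.py big.log; do
  printf '%-12s %s\n' "$name" "$(classify_file "$demo_dir/$name")"
done
```

The empty-file check has to run before any content-based binary test, because a content probe on a zero-byte file finds nothing to match and would otherwise send the file down the binary path.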

…traction summaries

- Centralized noise/asset filtering in core.sh to explicitly exclude media and data binaries, hiding them from the fzf menu.
- Added a file size intercept to prevent empty files (__init__.py) from being incorrectly flagged and skipped as binary.
- Implemented a 1500-line global failsafe for text files to protect LLM context windows from monolithic files.
- Fixed a `set -euo pipefail` bug in libexec/files caused by post-incrementing `((var++))` from zero.
- Added a terminal summary report to `focal files` detailing exact counts of added, empty, truncated, and skipped files.
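A summary report of this kind could be tallied along the following lines. This is a hedged sketch only: the counter names, the `tally` helper, the sample statuses, and the output format are assumptions, since the PR text does not show the real report.

```shell
#!/usr/bin/env bash
# Hypothetical counters, one per status named in the commit message.
added=0 empty=0 truncated=0 skipped=0

# Increment the matching counter with an arithmetic assignment,
# which stays safe under set -e even when the counter is zero.
tally() {
  case $1 in
    added)     added=$((added + 1)) ;;
    empty)     empty=$((empty + 1)) ;;
    truncated) truncated=$((truncated + 1)) ;;
    skipped)   skipped=$((skipped + 1)) ;;
  esac
}

# Made-up statuses standing in for the per-file results.
for status in added added empty truncated skipped added; do
  tally "$status"
done

summary=$(printf 'Added: %d | Empty: %d | Truncated: %d | Skipped: %d' \
  "$added" "$empty" "$truncated" "$skipped")
echo "$summary"
```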
@JacksonFergusonDev JacksonFergusonDev merged commit f3431f6 into main Apr 16, 2026
4 checks passed
@JacksonFergusonDev JacksonFergusonDev deleted the refactor/overhaul-file-filtering branch April 16, 2026 22:53