feat: replace rglob with os.walk and per-language directory pruning#1539

Merged
KRRT7 merged 6 commits into main from faster-file-discovery on Feb 19, 2026

KRRT7 commented Feb 19, 2026

Summary

  • Replace Path.rglob in get_files_for_language() with a single os.walk pass that prunes excluded directories in-place, avoiding traversal of .venv, node_modules, .git, etc.
  • Add dir_excludes property to LanguageSupport protocol so each language declares its own exclusion patterns (exact names, prefix*, *suffix)
  • Python excludes ~22 patterns (caches, build dirs, VCS); JS/TS excludes 9 (bundler output, framework caches)
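
The pruning idea in the first bullet can be sketched as follows. This is a minimal illustration, not the code from this PR: `walk_files` and its parameters are hypothetical names, and only exact directory names are matched here.

```python
import os
from pathlib import Path


def walk_files(root, extensions, excluded_dirs):
    """Yield files under root whose name ends with one of the extensions.

    Hypothetical helper for illustration; not the PR's get_files_for_language().
    """
    suffixes = tuple(extensions)  # str.endswith accepts a tuple of suffixes
    for dirpath, dirnames, filenames in os.walk(root):
        # Slice assignment mutates the list that os.walk itself holds, so the
        # excluded directories are never descended into (top-down walk only).
        dirnames[:] = [d for d in dirnames if d not in excluded_dirs]
        for name in filenames:
            if name.endswith(suffixes):
                yield Path(dirpath) / name
```

Unlike a recursive glob followed by filtering, the walk never enters an excluded directory, so the cost of a large .venv or node_modules tree is not paid at all.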

Test plan

  • pytest tests/test_function_discovery.py — 14 passed
  • pytest tests/test_languages/test_function_discovery_integration.py — 14 passed
  • ruff check clean on all changed files

File discovery used Path.rglob per extension, traversing excluded dirs
(e.g. .venv, node_modules) before filtering. Switch to a single os.walk
pass with in-place dirs[:] pruning. Each language now declares its own
dir_excludes patterns (exact, prefix*, *suffix) on the LanguageSupport
protocol, parsed by parse_dir_excludes() at walk time.
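
A rough sketch of how the three pattern forms (exact, prefix*, *suffix) could be split and matched. The function names and the example patterns are assumptions made for illustration; the actual parse_dir_excludes() in this PR may have a different signature and return type.

```python
def split_exclude_patterns(patterns):
    """Split patterns into exact names, prefixes ("name*") and suffixes ("*name").

    Hypothetical helper; not the PR's parse_dir_excludes().
    """
    exact, prefixes, suffixes = set(), [], []
    for pattern in patterns:
        if pattern.endswith("*"):
            prefixes.append(pattern[:-1])  # "build-*" -> prefix "build-"
        elif pattern.startswith("*"):
            suffixes.append(pattern[1:])  # "*.egg-info" -> suffix ".egg-info"
        else:
            exact.add(pattern)
    return frozenset(exact), tuple(prefixes), tuple(suffixes)


def is_excluded(name, exact, prefixes, suffixes):
    """Return True if a directory name matches any exclusion pattern."""
    # startswith/endswith accept tuples; an empty tuple never matches.
    return name in exact or name.startswith(prefixes) or name.endswith(suffixes)
```

Splitting once up front keeps the per-directory check to a set lookup plus two tuple-based string tests, which matters when the check runs for every directory visited during the walk.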

KRRT7 commented Feb 19, 2026

codeflash should optimize this

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
KRRT7 and others added 2 commits February 18, 2026 19:52

claude bot commented Feb 19, 2026

PR Review Summary

Prek Checks

✅ All checks pass (ruff check and ruff format). No fixes needed.

Mypy

53 errors found in the 4 changed files, but all are pre-existing — none are introduced by this PR. The new code (parse_dir_excludes, dir_excludes properties, os.walk refactoring) is type-clean.

Code Review

The refactoring from rglob to os.walk with per-language directory pruning looks correct. One existing issue remains unfixed:

⚠️ Test bug (from previous review, still open): Three test files pass Language as the 2nd positional argument to get_files_for_language(), where it gets interpreted as ignore_paths instead of language:

  • tests/test_languages/test_javascript_e2e.py:85: get_files_for_language(js_project_dir, Language.JAVASCRIPT)
  • tests/test_languages/test_typescript_e2e.py:105: get_files_for_language(ts_project_dir, Language.TYPESCRIPT)
  • tests/test_languages/test_vitest_e2e.py:96: get_files_for_language(vitest_project_dir, Language.TYPESCRIPT)

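Because Language is declared as Language(str, Enum), each member is a str and therefore iterable. The call raises no error: it silently iterates over the characters of the enum value and creates nonsense Path objects from them. Fix: pass the enum with the language= keyword argument.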
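
The failure mode can be reproduced in isolation. In this sketch the enum value and the stub's signature are assumptions that mirror only the parameter order described above, not the real get_files_for_language().

```python
from enum import Enum
from pathlib import Path


class Language(str, Enum):
    # Stand-in for the project's enum; the value "javascript" is assumed.
    JAVASCRIPT = "javascript"


def get_files_for_language(root, ignore_paths=None, language=None):
    # Hypothetical stub: only the parameter order matches the review's description.
    ignored = [Path(p) for p in (ignore_paths or [])]
    return ignored, language


# Positional call: the enum member lands in ignore_paths and is iterated
# character by character, while language stays None.
ignored, lang = get_files_for_language(Path("proj"), Language.JAVASCRIPT)

# Keyword call: the enum member reaches the intended parameter.
ignored_kw, lang_kw = get_files_for_language(Path("proj"), language=Language.JAVASCRIPT)
```

Making language a keyword-only parameter in the real function would turn this silent mistake into a TypeError at the call site, though that is a design suggestion rather than something this PR does.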

Test Coverage

| File | Stmts (main) | Stmts (PR) | Cover (main) | Cover (PR) | Δ |
|---|---|---|---|---|---|
| codeflash/discovery/functions_to_optimize.py | 524 | 547 | 69% | 70% | +1% |
| codeflash/languages/base.py | 109 | 111 | 99% | 99% | |
| codeflash/languages/javascript/support.py | 951 | 954 | 74% | 74% | |
| codeflash/languages/python/support.py | 276 | 279 | 51% | 52% | +1% |
| Total | 1860 | 1891 | 71% | 71% | |

✅ No coverage regression. Slight improvement in functions_to_optimize.py (+1%) and python/support.py (+1%).

Note: 8 pre-existing test failures in tests/test_tracer.py (unrelated to this PR).


Last updated: 2026-02-19

Remove codeflash-specific tessl tiles, add new pypi tiles from
pyproject.toml, and run uv sync --upgrade to bump dependencies.
Replace skills gitignores with MCP server config for codex and gemini.
@KRRT7 KRRT7 merged commit 47f5887 into main Feb 19, 2026
26 of 28 checks passed
@KRRT7 KRRT7 deleted the faster-file-discovery branch February 19, 2026 01:39
KRRT7 added a commit that referenced this pull request Feb 19, 2026
feat: replace rglob with os.walk and per-language directory pruning