
fix(loader): match ignore patterns on repo-relative paths #13

Open
Bbowlby22 wants to merge 2 commits into HKUDS:main from Bbowlby22:omnilore-upstream-loader-ignore-fix


@Bbowlby22

Summary

  • fix directory ignore matching in RepositoryLoader.scan_files() to use repo-relative paths
  • keep existing .gitignore merge behavior (effective_ignore) intact
  • ensure directory-style patterns (for example output/, .venv/) are honored reliably

Root cause

should_ignore_path() evaluates gitwildmatch patterns against repo-relative paths, but the directory filter in scan_files() passed it absolute paths built from os.walk(). Ignored directories were therefore sometimes not matched, so they were walked and indexed anyway, which caused over-indexing in large repos.

Change

  • convert each walked directory to a normalized path relative to self.repo_path
  • test both rel_path and rel_path + "/" for directory-style patterns
  • keep file-level ignore checks unchanged, except for normalizing relative_path
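
The steps above can be sketched with the standard library alone. This is a minimal illustration, not the code in fastcode/loader.py: walk_with_pruning, the local normalize_path, and the is_ignored callable are hypothetical stand-ins (the real loader uses should_ignore_path() with gitwildmatch patterns).

```python
import os
from typing import Callable, Iterator


def normalize_path(path: str) -> str:
    """Hypothetical stand-in for the project's normalize_path: forward slashes only."""
    return path.replace(os.sep, "/")


def walk_with_pruning(
    repo_path: str, is_ignored: Callable[[str], bool]
) -> Iterator[str]:
    """Yield repo-relative file paths, skipping directories that is_ignored rejects.

    is_ignored is an assumed callable that takes a repo-relative path and
    returns True when that path should be skipped.
    """
    for root, dirs, filenames in os.walk(repo_path):
        kept = []
        for d in dirs:
            # Match on the path relative to the repo root, never the absolute path.
            rel_dir = normalize_path(
                os.path.relpath(os.path.join(root, d), repo_path)
            )
            # Try both spellings so patterns written as "output" or "output/" can match.
            if is_ignored(rel_dir) or is_ignored(rel_dir + "/"):
                continue
            kept.append(d)
        # Assigning in place is what tells os.walk not to descend into pruned dirs.
        dirs[:] = kept
        for name in filenames:
            yield normalize_path(
                os.path.relpath(os.path.join(root, name), repo_path)
            )
```

Pruning through dirs[:] only works because os.walk() runs top-down by default; rebinding dirs to a new list would leave the walk unchanged.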

Scope

  • loader-only bug fix (fastcode/loader.py)
  • no config-default changes

Validation

  • python -m py_compile fastcode/loader.py
  • behavior verified in OmniLore FastCode integration on large workspace indexing


Copilot AI left a comment


Pull request overview

Fixes ignore-directory matching in RepositoryLoader.scan_files() by ensuring gitwildmatch ignore patterns are evaluated against repo-relative (normalized) paths, improving correctness when skipping directories like output/ or .venv/.

Changes:

  • Convert walked directory paths to normalized repo-relative paths before applying ignore patterns.
  • Check both rel_path and rel_path + "/" to reliably match directory-style patterns.
  • Normalize relative_path for file-level ignore checks.
Comments suppressed due to low confidence (1)

fastcode/loader.py:276

  • relative_path is already normalized when assigned, but it’s normalized again when building the metadata dict ("relative_path": normalize_path(relative_path)). This second normalization is redundant; consider storing relative_path directly to avoid extra work in tight loops.
                relative_path = normalize_path(
                    os.path.relpath(file_path, self.repo_path)
                )

                # Check if should ignore
                if should_ignore_path(relative_path, effective_ignore):
                    continue
                
                # Check if supported extension
                if not is_supported_file(file_path, self.supported_extensions):
                    continue
                
                # Check file size
                try:
                    file_size = os.path.getsize(file_path)
                    if file_size > max_file_size_bytes:
                        self.logger.warning(
                            f"Skipping large file: {relative_path} "
                            f"({file_size / 1024 / 1024:.2f} MB)"
                        )
                        continue
                    
                    files.append({
                        "path": normalize_path(file_path),
                        "relative_path": normalize_path(relative_path),
                        "size": file_size,


Comment thread fastcode/loader.py
Comment on lines 232 to +247
          for root, dirs, filenames in os.walk(self.repo_path):
-             # Filter out ignored directories
-             dirs[:] = [d for d in dirs if not should_ignore_path(
-                 os.path.join(root, d), effective_ignore
-             )]
+             # Filter ignored directories using repo-relative paths so gitwildmatch
+             # patterns like "output/" or ".venv/" match consistently.
+             filtered_dirs = []
+             for d in dirs:
+                 abs_dir_path = os.path.join(root, d)
+                 rel_dir_path = normalize_path(
+                     os.path.relpath(abs_dir_path, self.repo_path)
+                 )
+                 rel_dir_with_trailing = f"{rel_dir_path}/"
+                 if should_ignore_path(
+                     rel_dir_path, effective_ignore
+                 ) or should_ignore_path(rel_dir_with_trailing, effective_ignore):
+                     continue
+                 filtered_dirs.append(d)
+             dirs[:] = filtered_dirs

Copilot AI Feb 27, 2026


should_ignore_path() recompiles a PathSpec on every call (see fastcode/utils.py). In this loop it’s now invoked multiple times per directory and once per file, which can be a significant CPU cost on large repos. Consider compiling the ignore spec once in scan_files() (e.g., build a PathSpec from effective_ignore) and then calling spec.match_file(...) for both directory and file checks.

@Bbowlby22
Author

Correction to previous comment (shell interpolation noise):

Added follow-up perf hardening requested by review:

  • precompile PathSpec once per scan_files() using merged ignore patterns (effective_ignore)
  • reuse matcher for directory and file checks
  • preserve repo-relative matching and trailing-slash directory semantics
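
The compile-once idea in the list above can be sketched as follows. This is not the code from the follow-up commit, which is not shown here. CompiledIgnore is a hypothetical name, and fnmatch is used only to keep the sketch dependency-free: it does not implement gitwildmatch semantics. With the pathspec library, the equivalent would be building the spec once with PathSpec.from_lines("gitwildmatch", patterns) and calling spec.match_file(path) for each check.

```python
import fnmatch
import re
from typing import Iterable, List, Pattern


class CompiledIgnore:
    """Hypothetical stand-in matcher: compile patterns once, answer many queries."""

    def __init__(self, patterns: Iterable[str]) -> None:
        # fnmatch.translate is NOT gitwildmatch; it only keeps this sketch stdlib-only.
        self._regexes: List[Pattern[str]] = [
            re.compile(fnmatch.translate(p)) for p in patterns
        ]

    def match(self, rel_path: str) -> bool:
        """Return True if any precompiled pattern matches the repo-relative path."""
        return any(r.match(rel_path) for r in self._regexes)

    def match_dir(self, rel_dir: str) -> bool:
        """Check a directory both without and with a trailing slash."""
        return self.match(rel_dir) or self.match(rel_dir + "/")


# Build the matcher once per scan, then reuse it for every directory and file.
ignore = CompiledIgnore(["output/", "*.log"])
print(ignore.match_dir("output"))
print(ignore.match("logs/app.log"))
```

The point of the design is that the cost of compiling patterns is paid once per scan instead of once per path, which matters when the walk visits many directories and files.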

Validation:

  • python -m py_compile fastcode/loader.py

New commit on this PR branch:

  • 860eed1 perf(loader): precompile ignore matcher during repository scan


2 participants