WIP: Add basic filter command as suggested in #834

spoorcc · 2025-10-18T22:43:31Z

Support for pre-commit hooks
Fixes #19

Description by Korbit AI

What change is being made?

Add a basic dfetch filter command that can list or pass through files to a command, integrate it into the CLI, and update supporting utilities, logging, and tooling configuration (pre-commit hooks, changelog, docs).

Why are these changes being made?

To provide a first-class file-filtering capability that can operate on manifest-scoped projects or stdin/args, and to wire it into the existing CLI and supporting utilities for robust usage and testing. This PR also updates tooling integration and documentation to reflect the new feature.


korbit-ai

Review by Korbit AI

Korbit automatically attempts to detect when you fix issues in new commits.

Issue
Incorrect exception type caught for argparse errors
Magic String Attribute Lookup
Command execution logged at DEBUG level
Naive string splitting for command parsing
Memory inefficient stdin processing
Inefficient O(n*m) project path lookup
Expensive path resolution per file
Mixed Responsibilities in Entry Point
Unclear list variable names
Non-descriptive tuple return type

Files scanned

File Path	Reviewed
dfetch/log.py	✅
dfetch/util/util.py	✅
dfetch/main.py	✅
dfetch/util/cmdline.py	✅
dfetch/commands/filter.py	✅


korbit-ai · 2025-10-18T22:45:42Z

dfetch/__main__.py

+    if args.verbose or not getattr(args.func, "SILENT", False):
+        logger.print_title()


Magic String Attribute Lookup

What is the issue?

The use of a magic string 'SILENT' as an attribute lookup makes the code's intent unclear without additional context.

Why this matters

Future maintainers will need to search for where SILENT is defined and understand its purpose. This creates cognitive overhead and potential maintenance issues.

Suggested change ∙ Feature Preview

# Define a constant at module level
SILENT_COMMAND_FLAG = 'SILENT'

# Use in the code
if args.verbose or not getattr(args.func, SILENT_COMMAND_FLAG, False):
    logger.print_title()
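
Another way to address the discoverability concern is to declare the flag on the command classes themselves, so a search for `SILENT` lands on a documented default. The sketch below is only an illustration: `Command`, `Filter`, and `should_print_title` are hypothetical names, not taken from the dfetch code base.

```python
class Command:
    """Hypothetical base class; commands print the title unless they opt out."""

    SILENT = False


class Filter(Command):
    """Hypothetical command whose output is meant to be piped, so it opts out."""

    SILENT = True


def should_print_title(verbose: bool, func: object) -> bool:
    # Objects without a SILENT attribute fall back to False (title is printed).
    return verbose or not getattr(func, "SILENT", False)
```

With this layout the `getattr` default only matters for callables that do not derive from the base class.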


korbit-ai · 2025-10-18T22:45:42Z

dfetch/util/cmdline.py

+    logger: logging.Logger, cmd: Union[str, list[str]]
+) -> "subprocess.CompletedProcess[Any]":
+    """Run a command and log the output, and raise if something goes wrong."""
+    logger.debug(f"Running {cmd}")


Command execution logged at DEBUG level

What is the issue?

The log message for running a command uses the DEBUG level, which might be too low for important command executions.

Why this matters

Using DEBUG level for command execution logs may result in these important events being missed in production environments where DEBUG logs are typically disabled.

Suggested change ∙ Feature Preview

Change the log level to INFO for command execution:

logger.info(f"Running command: {cmd}")


korbit-ai · 2025-10-18T22:45:42Z

dfetch/util/cmdline.py

+    if not isinstance(cmd, list):
+        cmd = cmd.split(" ")


Naive string splitting for command parsing

What is the issue?

String splitting on single space fails for commands with multiple consecutive spaces or complex arguments.

Why this matters

This naive splitting approach will create empty strings in the command list when there are multiple spaces, potentially causing subprocess execution failures or incorrect argument parsing.

Suggested change ∙ Feature Preview

Use shlex.split() instead of str.split(" ") to properly handle shell-like command parsing with quoted arguments and multiple spaces:

import shlex

if not isinstance(cmd, list):
    cmd = shlex.split(cmd)
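
A small comparison of the two approaches, as a sketch. The helper name `to_argv` and the sample command are made up for illustration.

```python
import shlex
from typing import Union


def to_argv(cmd: Union[str, list[str]]) -> list[str]:
    """Return an argv list, tokenising strings with shell-like rules."""
    if isinstance(cmd, list):
        return cmd
    return shlex.split(cmd)


sample = 'git  commit -m "fix typo"'

# Splitting on a single space keeps an empty token for the doubled space
# and breaks the quoted message into two arguments.
naive = sample.split(" ")

# shlex.split collapses whitespace and keeps the quoted message together.
parsed = to_argv(sample)
```

Note that `shlex.split` uses POSIX rules by default, so backslashes are treated as escape characters. That may matter if command strings can contain Windows-style paths.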


korbit-ai · 2025-10-18T22:45:42Z

dfetch/commands/filter.py

+        for project_path in project_paths:
+            try:
+                file.relative_to(project_path)
+                return project_path
+            except ValueError:
+                continue


Inefficient O(n*m) project path lookup

What is the issue?

The file-in-project check performs an O(m) linear search through all project paths for each file, resulting in O(n*m) complexity overall, where n is the number of files and m is the number of projects.

Why this matters

With many files and projects, the cost of this nested loop grows with the product of the two counts, which will slow down filtering operations as the number of projects grows.

Suggested change ∙ Feature Preview

Pre-sort project paths by depth (deepest first) and use early termination, or consider using a trie-based structure for path prefix matching to reduce average case complexity.
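
A simpler alternative to a trie, sketched here under the assumption that `project_paths` is a set of paths normalised the same way as the input files: walk up the file's ancestors and test set membership. The per-file cost then depends on path depth rather than on the number of projects. The function name `find_containing_project` is hypothetical.

```python
from pathlib import Path
from typing import Optional


def find_containing_project(file: Path, project_paths: set[Path]) -> Optional[Path]:
    """Return the deepest project path that contains `file`, or None."""
    # Check the path itself, then each ancestor from deepest to shallowest.
    for candidate in (file, *file.parents):
        if candidate in project_paths:
            return candidate
    return None
```

Unlike the loop in the diff, which returns whichever matching project the iteration reaches first, this version always returns the deepest match when projects are nested.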


korbit-ai · 2025-10-18T22:45:43Z

dfetch/commands/filter.py

+        block_outside: list[str] = []
+
+        for path_or_arg in input_list:
+            arg_abs_path = Path(pwd / path_or_arg.strip()).resolve()


Expensive path resolution per file

What is the issue?

Path resolution with resolve() is called for every input file, which involves expensive filesystem operations including symlink resolution and path canonicalization.

Why this matters

The resolve() method performs multiple filesystem syscalls per file, creating significant I/O overhead that scales linearly with the number of input files and can become a bottleneck for large file sets.

Suggested change ∙ Feature Preview

Cache resolved paths or use absolute path construction without full resolution when symlink handling isn't critical:

arg_abs_path = (pwd / path_or_arg.strip()).absolute()

Only call resolve() when necessary for symlink handling.
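
One caveat with the suggestion above: `Path.absolute()` does not collapse `..` segments, so a later `relative_to()` check against a project path could fail for inputs such as `../ext/a.c`. A possible middle ground, sketched below, is purely lexical normalisation with `os.path.normpath`. The helper name `lexical_abs_path` is hypothetical, and the sketch assumes `pwd` is already absolute.

```python
import os
from pathlib import Path


def lexical_abs_path(pwd: Path, path_or_arg: str) -> Path:
    """Join and normalise a path without touching the filesystem."""
    # normpath collapses "." and ".." textually; it does not follow symlinks,
    # so the result can differ from resolve() when symlinked directories are involved.
    return Path(os.path.normpath(pwd / path_or_arg.strip()))
```

Whether this trade-off is acceptable depends on whether fetched projects or input files can be reached through symlinks.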


korbit-ai · 2025-10-18T22:45:43Z

dfetch/commands/filter.py

+            help="Arguments to pass to the command",
+        )
+
+    def __call__(self, args: argparse.Namespace) -> None:


Mixed Responsibilities in Entry Point

What is the issue?

The __call__ method mixes configuration, business logic, and output handling in a single method.

Why this matters

This violates the Single Responsibility Principle and makes the code less maintainable and harder to test individual components.

Suggested change ∙ Feature Preview

Split the __call__ method into separate methods for configuration, filtering, and output handling:

def __call__(self, args: argparse.Namespace) -> None:
    self._configure_logging(args)
    filtered_args = self._process_filtering(args)
    self._handle_output(args, filtered_args)


korbit-ai · 2025-10-18T22:45:43Z

dfetch/commands/filter.py

+        block_inside: list[str] = []
+        block_outside: list[str] = []


Unclear list variable names

What is the issue?

The variable names 'block_inside' and 'block_outside' are not immediately clear about what they represent in the context of file filtering.

Why this matters

Unclear variable names force readers to trace through the code to understand their purpose, increasing cognitive load.

Suggested change ∙ Feature Preview

files_inside_projects: list[str] = []
files_outside_projects: list[str] = []


korbit-ai · 2025-10-18T22:45:43Z

dfetch/commands/filter.py

+    def _filter_files(
+        self, pwd: Path, topdir: Path, project_paths: set[Path], input_list: list[str]
+    ) -> tuple[list[str], list[str]]:


Non-descriptive tuple return type

What is the issue?

The return type annotation using tuple[list[str], list[str]] is not descriptive enough to understand what the two lists represent.

Why this matters

Generic tuple return types make it difficult to understand the meaning of each component without looking at the implementation.

Suggested change ∙ Feature Preview

from typing import NamedTuple

class FilterResult(NamedTuple):
    files_inside_projects: list[str]
    files_outside_projects: list[str]

def _filter_files(
    self, pwd: Path, topdir: Path, project_paths: set[Path], input_list: list[str]
) -> FilterResult:
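
A runnable sketch of how such a result type behaves for callers. `split_by_prefix` is a hypothetical stand-in for `_filter_files` that uses a simple string-prefix test instead of the real path logic; the point is only that a `NamedTuple` adds field names while still unpacking like a plain tuple, so existing call sites would keep working.

```python
from typing import NamedTuple


class FilterResult(NamedTuple):
    files_inside_projects: list[str]
    files_outside_projects: list[str]


def split_by_prefix(paths: list[str], prefix: str) -> FilterResult:
    """Hypothetical stand-in for _filter_files using a string-prefix test."""
    inside = [p for p in paths if p.startswith(prefix)]
    outside = [p for p in paths if not p.startswith(prefix)]
    return FilterResult(inside, outside)


result = split_by_prefix(["ext/a.c", "src/b.c"], "ext/")

# Named access documents intent at the call site.
print(result.files_inside_projects)

# Positional unpacking still works, as with tuple[list[str], list[str]].
inside, outside = result
```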


spoorcc marked this pull request as draft October 18, 2025 22:44

spoorcc added 3 commits October 18, 2025 22:45

Make it possible to have silent commands

fe3a0fe

Add basic filter command as suggested in

3546322

Support for pre-commit hooks Fixes #19

wip feature

b97688f

korbit-ai bot reviewed Oct 18, 2025

spoorcc force-pushed the spoorcc/issue19 branch from 2dbdf3c to b97688f Compare October 18, 2025 22:45

spoorcc temporarily deployed to testpypi October 18, 2025 22:47 — with GitHub Actions Inactive


2 participants
