Skip to content

Conversation

@spoorcc
Copy link
Contributor

@spoorcc spoorcc commented Oct 18, 2025

Support for pre-commit hooks
Fixes #19

Description by Korbit AI

What change is being made?

Add a basic dfetch filter command that can list or pass through files to a command, integrate it into the CLI, and update supporting utilities, logging, and tooling configuration (pre-commit hooks, changelog, docs).

Why are these changes being made?

To provide a first-class file-filtering capability that can operate on manifest-scoped projects or stdin/args, and to wire it into the existing CLI and supporting utilities for robust usage and testing. This PR also updates tooling integration and documentation to reflect the new feature.

Is this description stale? Ask me to generate a new description by commenting /korbit-generate-pr-description

@spoorcc spoorcc marked this pull request as draft October 18, 2025 22:44
Copy link

@korbit-ai korbit-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review by Korbit AI

Korbit automatically attempts to detect when you fix issues in new commits.
Category Issue Status
Error Handling Incorrect exception type caught for argparse errors ▹ view
Readability Magic String Attribute Lookup ▹ view
Logging Command execution logged at DEBUG level ▹ view
Performance Naive string splitting for command parsing ▹ view
Performance Memory inefficient stdin processing ▹ view
Performance Inefficient O(n*m) project path lookup ▹ view
Performance Expensive path resolution per file ▹ view
Design Mixed Responsibilities in Entry Point ▹ view
Readability Unclear list variable names ▹ view
Readability Non-descriptive tuple return type ▹ view
Files scanned
File Path Reviewed
dfetch/log.py
dfetch/util/util.py
dfetch/main.py
dfetch/util/cmdline.py
dfetch/commands/filter.py

Explore our documentation to understand the languages and file types we support and the files we ignore.

Check out our docs on how you can make Korbit work best for you and your team.

Loving Korbit!? Share us on LinkedIn Reddit and X

Comment on lines +71 to +72
if args.verbose or not getattr(args.func, "SILENT", False):
logger.print_title()
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Magic String Attribute Lookup category Readability

Tell me more
What is the issue?

The use of a magic string 'SILENT' as an attribute lookup makes the code's intent unclear without additional context.

Why this matters

Future maintainers will need to search for where SILENT is defined and understand its purpose. This creates cognitive overhead and potential maintenance issues.

Suggested change ∙ Feature Preview
# Define a constant at module level
SILENT_COMMAND_FLAG = 'SILENT'

# Use in the code
if args.verbose or not getattr(args.func, SILENT_COMMAND_FLAG, False):
    logger.print_title()
Provide feedback to improve future suggestions

Nice Catch Incorrect Not in Scope Not in coding standard Other

💬 Looking for more details? Reply to this comment to chat with Korbit.

logger: logging.Logger, cmd: Union[str, list[str]]
) -> "subprocess.CompletedProcess[Any]":
"""Run a command and log the output, and raise if something goes wrong."""
logger.debug(f"Running {cmd}")
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Command execution logged at DEBUG level category Logging

Tell me more
What is the issue?

The log message for running a command uses the DEBUG level, which might be too low for important command executions.

Why this matters

Using DEBUG level for command execution logs may result in these important events being missed in production environments where DEBUG logs are typically disabled.

Suggested change ∙ Feature Preview

Change the log level to INFO for command execution:

logger.info(f"Running command: {cmd}")
Provide feedback to improve future suggestions

Nice Catch Incorrect Not in Scope Not in coding standard Other

💬 Looking for more details? Reply to this comment to chat with Korbit.

Comment on lines +78 to +79
if not isinstance(cmd, list):
cmd = cmd.split(" ")
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Naive string splitting for command parsing category Performance

Tell me more
What is the issue?

String splitting on single space fails for commands with multiple consecutive spaces or complex arguments.

Why this matters

This naive splitting approach will create empty strings in the command list when there are multiple spaces, potentially causing subprocess execution failures or incorrect argument parsing.

Suggested change ∙ Feature Preview

Use shlex.split() instead of str.split(" ") to properly handle shell-like command parsing with quoted arguments and multiple spaces:

import shlex

if not isinstance(cmd, list):
    cmd = shlex.split(cmd)
Provide feedback to improve future suggestions

Nice Catch Incorrect Not in Scope Not in coding standard Other

💬 Looking for more details? Reply to this comment to chat with Korbit.

Comment on lines +136 to +141
for project_path in project_paths:
try:
file.relative_to(project_path)
return project_path
except ValueError:
continue
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inefficient O(n*m) project path lookup category Performance

Tell me more
What is the issue?

The file-in-project check performs O(n) linear search through all project paths for each file, resulting in O(n*m) complexity where n is files and m is projects.

Why this matters

With many files and projects, this nested loop creates quadratic time complexity that will significantly slow down filtering operations as the number of projects grows.

Suggested change ∙ Feature Preview

Pre-sort project paths by depth (deepest first) and use early termination, or consider using a trie-based structure for path prefix matching to reduce average case complexity.

Provide feedback to improve future suggestions

Nice Catch Incorrect Not in Scope Not in coding standard Other

💬 Looking for more details? Reply to this comment to chat with Korbit.

block_outside: list[str] = []

for path_or_arg in input_list:
arg_abs_path = Path(pwd / path_or_arg.strip()).resolve()
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Expensive path resolution per file category Performance

Tell me more
What is the issue?

Path resolution with resolve() is called for every input file, which involves expensive filesystem operations including symlink resolution and path canonicalization.

Why this matters

The resolve() method performs multiple filesystem syscalls per file, creating significant I/O overhead that scales linearly with the number of input files and can become a bottleneck for large file sets.

Suggested change ∙ Feature Preview

Cache resolved paths or use absolute path construction without full resolution when symlink handling isn't critical:

arg_abs_path = (pwd / path_or_arg.strip()).absolute()

Only call resolve() when necessary for symlink handling.

Provide feedback to improve future suggestions

Nice Catch Incorrect Not in Scope Not in coding standard Other

💬 Looking for more details? Reply to this comment to chat with Korbit.

help="Arguments to pass to the command",
)

def __call__(self, args: argparse.Namespace) -> None:
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mixed Responsibilities in Entry Point category Design

Tell me more
What is the issue?

The call method mixes configuration, business logic, and output handling in a single method.

Why this matters

This violates the Single Responsibility Principle and makes the code less maintainable and harder to test individual components.

Suggested change ∙ Feature Preview

Split the call method into separate methods for configuration, filtering, and output handling:

def __call__(self, args: argparse.Namespace) -> None:
    self._configure_logging(args)
    filtered_args = self._process_filtering(args)
    self._handle_output(args, filtered_args)
Provide feedback to improve future suggestions

Nice Catch Incorrect Not in Scope Not in coding standard Other

💬 Looking for more details? Reply to this comment to chat with Korbit.

Comment on lines +105 to +106
block_inside: list[str] = []
block_outside: list[str] = []
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unclear list variable names category Readability

Tell me more
What is the issue?

The variable names 'block_inside' and 'block_outside' are not immediately clear about what they represent in the context of file filtering.

Why this matters

Unclear variable names force readers to trace through the code to understand their purpose, increasing cognitive load.

Suggested change ∙ Feature Preview
files_inside_projects: list[str] = []
files_outside_projects: list[str] = []
Provide feedback to improve future suggestions

Nice Catch Incorrect Not in Scope Not in coding standard Other

💬 Looking for more details? Reply to this comment to chat with Korbit.

Comment on lines +101 to +103
def _filter_files(
self, pwd: Path, topdir: Path, project_paths: set[Path], input_list: list[str]
) -> tuple[list[str], list[str]]:
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Non-descriptive tuple return type category Readability

Tell me more
What is the issue?

The return type annotation using tuple[list[str], list[str]] is not descriptive enough to understand what the two lists represent.

Why this matters

Generic tuple return types make it difficult to understand the meaning of each component without looking at the implementation.

Suggested change ∙ Feature Preview
from typing import NamedTuple

class FilterResult(NamedTuple):
    files_inside_projects: list[str]
    files_outside_projects: list[str]

def _filter_files(
    self, pwd: Path, topdir: Path, project_paths: set[Path], input_list: list[str]
) -> FilterResult:
Provide feedback to improve future suggestions

Nice Catch Incorrect Not in Scope Not in coding standard Other

💬 Looking for more details? Reply to this comment to chat with Korbit.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Support for pre-commit hooks

2 participants