-
Notifications
You must be signed in to change notification settings - Fork 5
WIP: Add basic filter command as suggested in #834
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Support for pre-commit hooks Fixes #19
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Review by Korbit AI
Korbit automatically attempts to detect when you fix issues in new commits.
| Category | Issue | Status |
|---|---|---|
| Incorrect exception type caught for argparse errors ▹ view | ||
| Magic String Attribute Lookup ▹ view | ||
| Command execution logged at DEBUG level ▹ view | ||
| Naive string splitting for command parsing ▹ view | ||
| Memory inefficient stdin processing ▹ view | ||
| Inefficient O(n*m) project path lookup ▹ view | ||
| Expensive path resolution per file ▹ view | ||
| Mixed Responsibilities in Entry Point ▹ view | ||
| Unclear list variable names ▹ view | ||
| Non-descriptive tuple return type ▹ view |
Files scanned
| File Path | Reviewed |
|---|---|
| dfetch/log.py | ✅ |
| dfetch/util/util.py | ✅ |
| dfetch/main.py | ✅ |
| dfetch/util/cmdline.py | ✅ |
| dfetch/commands/filter.py | ✅ |
Explore our documentation to understand the languages and file types we support and the files we ignore.
Check out our docs on how you can make Korbit work best for you and your team.
| if args.verbose or not getattr(args.func, "SILENT", False): | ||
| logger.print_title() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Magic String Attribute Lookup 
Tell me more
What is the issue?
The use of a magic string 'SILENT' as an attribute lookup makes the code's intent unclear without additional context.
Why this matters
Future maintainers will need to search for where SILENT is defined and understand its purpose. This creates cognitive overhead and potential maintenance issues.
Suggested change ∙ Feature Preview
# Define a constant at module level
SILENT_COMMAND_FLAG = 'SILENT'
# Use in the code
if args.verbose or not getattr(args.func, SILENT_COMMAND_FLAG, False):
logger.print_title()Provide feedback to improve future suggestions
💬 Looking for more details? Reply to this comment to chat with Korbit.
| logger: logging.Logger, cmd: Union[str, list[str]] | ||
| ) -> "subprocess.CompletedProcess[Any]": | ||
| """Run a command and log the output, and raise if something goes wrong.""" | ||
| logger.debug(f"Running {cmd}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Command execution logged at DEBUG level 
Tell me more
What is the issue?
The log message for running a command uses the DEBUG level, which might be too low for important command executions.
Why this matters
Using DEBUG level for command execution logs may result in these important events being missed in production environments where DEBUG logs are typically disabled.
Suggested change ∙ Feature Preview
Change the log level to INFO for command execution:
logger.info(f"Running command: {cmd}")Provide feedback to improve future suggestions
💬 Looking for more details? Reply to this comment to chat with Korbit.
| if not isinstance(cmd, list): | ||
| cmd = cmd.split(" ") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Naive string splitting for command parsing 
Tell me more
What is the issue?
String splitting on single space fails for commands with multiple consecutive spaces or complex arguments.
Why this matters
This naive splitting approach will create empty strings in the command list when there are multiple spaces, potentially causing subprocess execution failures or incorrect argument parsing.
Suggested change ∙ Feature Preview
Use shlex.split() instead of str.split(" ") to properly handle shell-like command parsing with quoted arguments and multiple spaces:
import shlex
if not isinstance(cmd, list):
cmd = shlex.split(cmd)Provide feedback to improve future suggestions
💬 Looking for more details? Reply to this comment to chat with Korbit.
| for project_path in project_paths: | ||
| try: | ||
| file.relative_to(project_path) | ||
| return project_path | ||
| except ValueError: | ||
| continue |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Inefficient O(n*m) project path lookup 
Tell me more
What is the issue?
The file-in-project check performs O(n) linear search through all project paths for each file, resulting in O(n*m) complexity where n is files and m is projects.
Why this matters
With many files and projects, this nested loop creates quadratic time complexity that will significantly slow down filtering operations as the number of projects grows.
Suggested change ∙ Feature Preview
Pre-sort project paths by depth (deepest first) and use early termination, or consider using a trie-based structure for path prefix matching to reduce average case complexity.
Provide feedback to improve future suggestions
💬 Looking for more details? Reply to this comment to chat with Korbit.
| block_outside: list[str] = [] | ||
|
|
||
| for path_or_arg in input_list: | ||
| arg_abs_path = Path(pwd / path_or_arg.strip()).resolve() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Expensive path resolution per file 
Tell me more
What is the issue?
Path resolution with resolve() is called for every input file, which involves expensive filesystem operations including symlink resolution and path canonicalization.
Why this matters
The resolve() method performs multiple filesystem syscalls per file, creating significant I/O overhead that scales linearly with the number of input files and can become a bottleneck for large file sets.
Suggested change ∙ Feature Preview
Cache resolved paths or use absolute path construction without full resolution when symlink handling isn't critical:
arg_abs_path = (pwd / path_or_arg.strip()).absolute()Only call resolve() when necessary for symlink handling.
Provide feedback to improve future suggestions
💬 Looking for more details? Reply to this comment to chat with Korbit.
| help="Arguments to pass to the command", | ||
| ) | ||
|
|
||
| def __call__(self, args: argparse.Namespace) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mixed Responsibilities in Entry Point 
Tell me more
What is the issue?
The call method mixes configuration, business logic, and output handling in a single method.
Why this matters
This violates the Single Responsibility Principle and makes the code less maintainable and harder to test individual components.
Suggested change ∙ Feature Preview
Split the call method into separate methods for configuration, filtering, and output handling:
def __call__(self, args: argparse.Namespace) -> None:
self._configure_logging(args)
filtered_args = self._process_filtering(args)
self._handle_output(args, filtered_args)Provide feedback to improve future suggestions
💬 Looking for more details? Reply to this comment to chat with Korbit.
| block_inside: list[str] = [] | ||
| block_outside: list[str] = [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unclear list variable names 
Tell me more
What is the issue?
The variable names 'block_inside' and 'block_outside' are not immediately clear about what they represent in the context of file filtering.
Why this matters
Unclear variable names force readers to trace through the code to understand their purpose, increasing cognitive load.
Suggested change ∙ Feature Preview
files_inside_projects: list[str] = []
files_outside_projects: list[str] = []Provide feedback to improve future suggestions
💬 Looking for more details? Reply to this comment to chat with Korbit.
| def _filter_files( | ||
| self, pwd: Path, topdir: Path, project_paths: set[Path], input_list: list[str] | ||
| ) -> tuple[list[str], list[str]]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Non-descriptive tuple return type 
Tell me more
What is the issue?
The return type annotation using tuple[list[str], list[str]] is not descriptive enough to understand what the two lists represent.
Why this matters
Generic tuple return types make it difficult to understand the meaning of each component without looking at the implementation.
Suggested change ∙ Feature Preview
from typing import NamedTuple
class FilterResult(NamedTuple):
files_inside_projects: list[str]
files_outside_projects: list[str]
def _filter_files(
self, pwd: Path, topdir: Path, project_paths: set[Path], input_list: list[str]
) -> FilterResult:Provide feedback to improve future suggestions
💬 Looking for more details? Reply to this comment to chat with Korbit.
2dbdf3c to
b97688f
Compare
Support for pre-commit hooks
Fixes #19
Description by Korbit AI
What change is being made?
Add a basic dfetch filter command that can list or pass through files to a command, integrate it into the CLI, and update supporting utilities, logging, and tooling configuration (pre-commit hooks, changelog, docs).
Why are these changes being made?
To provide a first-class file-filtering capability that can operate on manifest-scoped projects or stdin/args, and to wire it into the existing CLI and supporting utilities for robust usage and testing. This PR also updates tooling integration and documentation to reflect the new feature.