117 accept a file with grep #124

Merged
noamteyssier merged 3 commits into development from 117-accept-a-file-with-grep
Nov 6, 2025

@noamteyssier
Collaborator

No description provided.

@gemini-code-assist

Summary of Changes

Hello @noamteyssier, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a significant enhancement to the bqtools grep command by enabling the specification of search patterns from external files. This feature greatly improves the tool's utility for complex filtering tasks involving numerous patterns, offering more flexibility and scalability. The changes involve new command-line arguments, robust internal pattern parsing, and updated documentation to guide users.

Highlights

  • File-based Pattern Input: The bqtools grep command now supports reading search patterns from external files using new --file, --sfile, and --xfile CLI options.
  • Flexible Pattern Specification: Users can provide patterns specifically for primary sequences (--sfile), extended sequences (--xfile), or a combination of both (--file), allowing for more organized and extensive pattern sets.
  • Forced OR Logic for File Patterns: When patterns are supplied via files, the matching logic automatically defaults to OR, meaning a record will match if any pattern in the file is found.
  • Enhanced Pattern Handling: The internal pattern processing logic has been updated to seamlessly integrate file-based patterns with existing CLI patterns, fuzzy matching, and pattern counting features.
  • Documentation Update: The README.md has been updated with clear examples demonstrating the usage of the new file-based pattern input features for bqtools grep.
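
The OR behavior described in the highlights can be illustrated with a minimal sketch. This is not the bqtools implementation: the function names `patterns_from_str`, `contains`, and `matches_any` are hypothetical, and plain byte-substring search stands in for the regex and fuzzy matchers the PR actually uses. It assumes one pattern per line and that blank lines are skipped.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Split file contents into one pattern per line, skipping blank lines.
/// (Hypothetical helper; the skipping of blank lines is an assumption.)
fn patterns_from_str(contents: &str) -> Vec<Vec<u8>> {
    contents
        .lines()
        .filter(|l| !l.is_empty())
        .map(|l| l.as_bytes().to_vec())
        .collect()
}

/// Read a pattern file from disk and split it into patterns.
#[allow(dead_code)]
fn read_patterns(path: &Path) -> io::Result<Vec<Vec<u8>>> {
    let contents = fs::read_to_string(path)?;
    Ok(patterns_from_str(&contents))
}

/// True if `needle` occurs anywhere in `haystack`.
/// The emptiness check comes first because `windows(0)` would panic.
fn contains(haystack: &[u8], needle: &[u8]) -> bool {
    !needle.is_empty() && haystack.windows(needle.len()).any(|w| w == needle)
}

/// OR logic: a sequence matches if at least one pattern is found.
fn matches_any(sequence: &[u8], patterns: &[Vec<u8>]) -> bool {
    patterns.iter().any(|p| contains(sequence, p))
}
```

With patterns `ACGT` and `TTTT`, a sequence containing only `ACGT` still counts as a match under this logic, whereas AND logic would reject it.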

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request adds a useful feature to grep, allowing patterns to be read from a file. The implementation involves significant refactoring of the argument parsing and pattern matching logic to accommodate both CLI and file-based patterns, which is handled well. My review focuses on improving robustness and maintainability. I've pointed out a potential panic from invalid CLI regex patterns, a case where invalid patterns from a file are silently ignored, and opportunities to reduce code duplication in the matcher implementations. Addressing these points will make the new functionality more reliable and the codebase easier to maintain.

Comment on lines +241 to +247
        for line in contents.lines() {
            if let Ok(regex) = regex::bytes::Regex::new(&line) {
                regexes.push(regex);
            } else {
                eprintln!("Invalid regex pattern: {}", line);
            }
        }

medium

Invalid regular expressions from a pattern file are currently skipped, with an error message printed to stderr. This might be missed by the user, leading them to believe all patterns were used when some were silently ignored. It would be more robust to fail and return an error, forcing the user to fix the invalid pattern.

Also, regex::bytes::Regex::new(&line) should be regex::bytes::Regex::new(line) as line is already a &str.

        for line in contents.lines().filter(|l| !l.is_empty()) {
            match regex::bytes::Regex::new(line) {
                Ok(regex) => regexes.push(regex),
                Err(e) => bail!("Invalid regex pattern in file: '{}'. Error: {}", line, e),
            }
        }
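
The fail-fast approach suggested above can be sketched without any external crates. This is an illustration only: `parse_patterns` is a hypothetical name, the compile step is passed in as a closure so the example stays self-contained, and reporting the 1-based line number is an addition not present in the review suggestion. In the PR's context the closure would presumably be `regex::bytes::Regex::new`.

```rust
use std::fmt::Display;

/// Compile every non-empty line with `compile`, stopping at the first failure.
/// Returns an error message that names the offending line instead of
/// silently skipping it.
fn parse_patterns<T, E, F>(contents: &str, compile: F) -> Result<Vec<T>, String>
where
    E: Display,
    F: Fn(&str) -> Result<T, E>,
{
    let mut compiled = Vec::new();
    for (idx, line) in contents.lines().enumerate() {
        // Blank lines are skipped, as in the suggested change.
        if line.is_empty() {
            continue;
        }
        match compile(line) {
            Ok(pattern) => compiled.push(pattern),
            Err(e) => {
                // idx is 0-based; report a 1-based line number.
                return Err(format!(
                    "invalid pattern on line {}: '{}': {}",
                    idx + 1,
                    line,
                    e
                ));
            }
        }
    }
    Ok(compiled)
}
```

Calling `parse_patterns(contents, |s| s.parse::<u32>())` on the input `"1\nx\n3\n"` returns an error that mentions line 2 rather than a two-element list, which is the behavior the review asks for.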

Comment on lines +50 to 82
        fn match_primary(
            &mut self,
            sequence: &[u8],
            matches: &mut MatchRanges,
            and_logic: bool,
        ) -> bool {
            if self.pat1.is_empty() {
                return true;
            }
-           self.pat1.iter().all(|pat| {
-               find_and_insert_matches(
-                   pat,
-                   sequence,
-                   matches,
-                   &mut self.searcher,
-                   self.k,
-                   self.inexact,
-               )
-           })
+           if and_logic {
+               self.pat1.iter().all(|pat| {
+                   find_and_insert_matches(
+                       pat,
+                       sequence,
+                       matches,
+                       &mut self.searcher,
+                       self.k,
+                       self.inexact,
+                   )
+               })
+           } else {
+               self.pat1.iter().any(|pat| {
+                   find_and_insert_matches(
+                       pat,
+                       sequence,
+                       matches,
+                       &mut self.searcher,
+                       self.k,
+                       self.inexact,
+                   )
+               })
+           }
        }

medium

There's significant code duplication between match_primary and match_secondary. The logic for iterating with all or any based on and_logic is identical. This could be extracted into a helper function to improve maintainability and reduce redundancy. The same pattern of duplication appears in match_either and also in regex_matcher.rs.
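
One possible shape for such a helper is sketched below. The name `match_patterns` and its signature are hypothetical and do not come from the PR. The per-pattern work (the call to `find_and_insert_matches` in the quoted code) is passed in as a closure, and the early return for an empty pattern list mirrors the quoted `match_primary`.

```rust
/// Apply `matches_one` to each pattern and combine the results with
/// AND (`all`) or OR (`any`) depending on `and_logic`.
/// An empty pattern list counts as a match, as in the quoted code.
fn match_patterns<P, F>(patterns: &[P], and_logic: bool, mut matches_one: F) -> bool
where
    F: FnMut(&P) -> bool,
{
    if patterns.is_empty() {
        return true;
    }
    if and_logic {
        // Stops at the first pattern that fails.
        patterns.iter().all(|p| matches_one(p))
    } else {
        // Stops at the first pattern that succeeds.
        patterns.iter().any(|p| matches_one(p))
    }
}
```

Each matcher method could then reduce to a single call that supplies its own pattern list and closure. Both `all` and `any` short-circuit, so the closure is not necessarily invoked for every pattern. That matches the quoted code, but it matters if the closure records match ranges as a side effect.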

@noamteyssier noamteyssier merged commit f7d8ee7 into development Nov 6, 2025
6 checks passed
@noamteyssier noamteyssier deleted the 117-accept-a-file-with-grep branch November 6, 2025 23:22