[RFE] block matching in ugrep #369
Comments
This works out-of-the-box with ugrep with lazy quantifiers and Given a
$ ug 'BEGIN.*\n(.*\n)*?.*END' FILE
No new features are necessary. But perhaps the syntax is a bit off-putting, although it is short and only requires placing the pattern To match a
$ ug 'BEGIN(.*\n)*?.*END' FILE
For example, to block-match C++
% ug '/\*(.*\n)*?.*\*+\/' file.cpp
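To see what the lazy quantifier in the block pattern does, here is a minimal sketch that runs the same `BEGIN(.*\n)*?.*END` pattern through Python's `re` module. This is only an illustration: Python uses a backtracking engine, not ugrep's matcher, and the helper name `find_blocks` and the sample text are made up for this example.

```python
import re

# Same pattern as in the ug command above. (.*\n)*? lazily consumes whole
# lines, so each match stops at the first line that contains END.
BLOCK = re.compile(r'BEGIN(.*\n)*?.*END')

def find_blocks(text):
    """Return every BEGIN...END span found in text (hypothetical helper)."""
    return [m.group(0) for m in BLOCK.finditer(text)]

sample = "a\nBEGIN one\nx\nEND one\nb\nBEGIN two END\nc\n"
for block in find_blocks(sample):
    print(repr(block))
```

With a greedy `(.*\n)*` instead, the first match would run on to the last line containing `END`, which is why the lazy form is needed for block matching.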
Thanks @genivia-inc. Just an additional question at this point. What if I wanted to invert the match and output only the blocks that don't match a specific pattern but do match the BEGIN and END criteria?
Option To include begin and end lines, it will be necessary to use option
$ ug -P -e '\A(.*\n)*?.*BEGIN' -e 'END.*\n(.*\n)*?.*BEGIN' -e 'END(.*\n)*\Z' FILE
where the third pattern doesn't need to use a lazy quantifier since it matches greedily to the end of the file. Because of this lazy/non-lazy "conflict" at a zero-width anchor
$ ug -P -e '\A(.*\n)*?.*BEGIN' -e 'END.*\n(.*\n)*?(.*BEGIN|\Z)' FILE
All of this assumes that
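As a rough sketch of how the two-pattern form carves out the text between blocks, the same expressions can be tried in Python's `re` module, joined with `|` since multiple `-e` patterns act as alternatives. This is a stand-in for `-P`, not ugrep itself, and the name `outside_blocks` and the sample input are assumptions made for the example.

```python
import re

# The two patterns from the second command above, combined as alternatives.
# Caveat: Python's \Z matches only at the very end of the string, whereas
# PCRE's \Z also matches just before a final newline.
OUTSIDE = re.compile(r'\A(.*\n)*?.*BEGIN|END.*\n(.*\n)*?(.*BEGIN|\Z)')

def outside_blocks(text):
    """Return the spans outside BEGIN...END blocks, markers included."""
    return [m.group(0) for m in OUTSIDE.finditer(text)]

sample = "head\nBEGIN\nin\nEND\nmid\nBEGIN\nin2\nEND\ntail\n"
for span in outside_blocks(sample):
    print(repr(span))
```

Each span includes the `BEGIN` or `END` marker that bounds it, while the lines strictly inside a block are never part of a match.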
I've updated the previous post, because anchors I'll take a look at the anchors with POSIX lazy quantifiers (without option
OK. I've made a few minor changes to permit the use of anchors such as Will release as update 5.1.1 soon.
Fixed. It appeared to be a minor problem with the DFA construction algorithm that was updated, causing anchors combined with lazy quantifiers to become too greedy. I have a set of unit tests, but they do not cover every combination like this one.
I'm adding this issue to the Discussions tab, so folks who have the same question can find an answer (that is not closed).
Is it possible to print a whole paragraph (text separated by empty lines) if paragraph contains some match? For example I get 3 matches with
(Added I heard that some old grep implementations had
Never heard of the Use
$ ug -o '.*PATTERN(.|\n)*?\n\n' file.txt
Thanks for the answer! I just remembered this
Well, starting with
Also, this
Right, I had mentioned option
$ ug -o '(\n.+)*PATTERN(.|\n)*?\n\n' file.txt
Other patterns are possible, but some may cause backtracking that impacts performance. Starting with an
To avoid the extra line without
$ ug '(\n.+)*.*PATTERN(.|\n)*?\n(?=\n)' file.txt
That's non-intuitive. Also, this approach (as in the example) misses the first and last paragraphs (because there could be no Just thinking: maybe it would be cool to have additional pattern like
It is important to consider the fact that ugrep, like other grep tools, is not a tool like awk that separates input into records and records into fields. Rather, ugrep is a (multi)line-oriented search tool. It does this efficiently in a streaming way, i.e. by not storing an entire file in memory. That is critical when searching recursively to not bog down a machine when files are large. In fact, ugrep has no limit on file size to search.
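For contrast, the record-oriented approach that the paragraph request implies can be sketched in a few lines of Python. This is not how ugrep works and not a ugrep feature; it is a hypothetical illustration (the function name `matching_paragraphs` is invented here) that treats blank lines as record separators and buffers only one paragraph at a time, so the first and last paragraphs are handled as well.

```python
import re

def matching_paragraphs(lines, pattern):
    """Yield each blank-line-separated paragraph that contains a match."""
    regex = re.compile(pattern)
    paragraph = []
    for line in lines:
        if line.strip():
            paragraph.append(line)
            continue
        # A blank line closes the current paragraph, if one is open.
        if paragraph and regex.search("".join(paragraph)):
            yield "".join(paragraph)
        paragraph = []
    # The last paragraph may not be followed by a blank line.
    if paragraph and regex.search("".join(paragraph)):
        yield "".join(paragraph)

sample = "first PATTERN\nline\n\nmiddle\n\nlast line\nPATTERN here\n"
for para in matching_paragraphs(sample.splitlines(keepends=True), "PATTERN"):
    print(para)
```

Because `lines` can be any iterable, passing an open file object keeps memory use bounded by the longest paragraph rather than by the file size.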
No, it is not. Let me explain why. It is how the default matching works, i.e. anything that matches on a line results in outputting that entire line (unless option For example, say the input is:
then matching You're probably thinking of
I meant "non-intuitive" that This is slightly different from ripgrep:
But when prepending
Judging from this, for ripgrep, it seems, that I am not criticizing your approach, though, it's just a curious observation.
Hello hello.
I've been wondering whether it would be feasible to implement a block-level or paragraph match mode for ugrep.
There are a few implementations in the wild doing just that, but they usually are dedicated tools.
By block-level I mean a mode that matches a block of contiguous lines delimited by two regex patterns, one matching the start line and the other the end line of the block.
This would make it possible, for instance, to return only the blocks that match a given pattern, or only those that do not.
Thanks in advance.