support searching across multiple lines #176

Closed
isobit opened this Issue Oct 13, 2016 · 95 comments

@isobit

isobit commented Oct 13, 2016

Say for example I'm trying to find instances of click that reside in a listeners block, like so:

listeners: {
    foo: ...
    click: ....
}

According to the Rust regex docs, I should be able to do: rg '(?s)listeners.+click', but this doesn't seem to work. Does ripgrep not support multiline regex?

@BurntSushi

Owner

BurntSushi commented Oct 13, 2016

Does ripgrep not support multiline regex?

Correct. Not even the s flag will help, because ripgrep explicitly instructs the regex automaton to never match \n. Like grep, ripgrep is a line oriented search tool.
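To make the point concrete, here is a minimal sketch using Python's `re` module as a stand-in (it is not ripgrep's regex engine, and ripgrep does not literally search one line at a time; splitting on lines is just the simplest way to imitate "a match never contains \n"). The sample text is the listeners block from the question.

```python
import re

text = "listeners: {\n    foo: ...\n    click: ....\n}\n"
pattern = re.compile(r"(?s)listeners.+click")

# Line-oriented: every line is its own haystack, so no single haystack
# ever contains both "listeners" and "click". The (?s) flag cannot help.
line_hits = [line for line in text.splitlines() if pattern.search(line)]

# Whole-buffer: the same regex sees the newlines and can match across them.
whole_hit = pattern.search(text)

print(line_hits)              # []
print(whole_hit is not None)  # True
```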

ripgrep can perform a search in two different ways. One of them reads a chunk of bytes at a time and searches it. The other memory maps the file and searches that all at once. The former has a number of advantages, including being faster when searching a large number of small files in parallel and being able to search streams in constant memory. The latter has the advantage of being faster for single files (sometimes) and much simpler to implement.

The former only works because search is line oriented. A multiline regex can technically match, say, 2GB of data, which is completely incompatible with searching small chunks at a time.

The latter could be made to work with multiline search, but memory maps can't search stdin for example. So a multiline search on stdin would have to block and read all of stdin into memory before searching. (There exists a way around even this, but it requires changing the regex engine to be capable of incremental search, which is an even bigger change, but theoretically possible.)
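As a rough illustration of the chunked strategy (a sketch in Python, not ripgrep's actual implementation; `search_chunks` and its parameters are made-up names), the reader below only ever holds one chunk plus a partial trailing line. It searches each complete line separately for simplicity, whereas a fast grep would search the whole chunk at once. The carry-over trick is only correct because a match cannot span a line break.

```python
import io
import re

def search_chunks(stream, pattern, chunk_size=8192):
    """Yield matching lines from a binary stream, one chunk at a time."""
    rx = re.compile(pattern)
    carry = b""  # partial line left over from the previous chunk
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        buf = carry + chunk
        cut = buf.rfind(b"\n")
        if cut == -1:
            carry = buf  # no complete line yet, keep reading
            continue
        complete, carry = buf[:cut], buf[cut + 1:]
        for line in complete.split(b"\n"):
            if rx.search(line):
                yield line
    if carry and rx.search(carry):
        yield carry  # final line without a trailing newline

data = b"alpha\nbeta click\ngamma\nclick end"
print(list(search_chunks(io.BytesIO(data), rb"click", chunk_size=4)))
```

A pattern such as `rb"click\ngamma"` never matches here, because no haystack handed to the regex contains a newline.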

Multiline searching therefore comes with significant implementation complexity, and IMO is a pretty niche use case. I can also imagine it having a pretty big impact on the printing code. This fact alone is a good reason why it may never be in ripgrep proper, but perhaps once #162 is done, others can take a crack at it.

This is a good example of a feature that The Silver Searcher has that ripgrep may either never have or won't have for a long time.

@BurntSushi BurntSushi closed this Oct 13, 2016

@isobit

isobit commented Oct 13, 2016

Gotcha, thanks for the explanation. I really like ripgrep as a tool, just was hoping to use it for this case too 😉 .

@BurntSushi

Owner

BurntSushi commented Oct 13, 2016

@joshglendenning Yeah, I admit, it would be nice, and if it were easy, I'd have no problems with it. While I do consider it niche, I have no doubts that it would be quite useful!

Once I split out most of the pieces of ripgrep to library form, perhaps there will be interest in building other tools for more niche use cases! I will keep this case in mind as I do that though.

@maxbrunsfeld

maxbrunsfeld commented Jan 9, 2017

This is a really cool tool, but I might suggest including this as a caveat in the README, alongside the comparisons to ag, since ag does support multi-line patterns.

@BurntSushi

Owner

BurntSushi commented Jan 10, 2017

@maxbrunsfeld I've been meaning to add an "anti pitch" section to the README like the one in my blog post. That's now done. Thanks for the reminder!

@BurntSushi

Owner

BurntSushi commented Mar 17, 2017

I'm going to re-open this, because it's one of the most highly requested features.

Nothing has changed about the problems I outlined above. However, multiline search needn't be the default. If we provide it as a flag, then we can do what we need to do to support multiline search only when that flag is provided. The critical thing that multiline search needs is a complete sequence of bytes in memory to search. Memory maps can provide this, but failing that, we would need to read the entire file into memory before starting a search.
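A sketch of that "complete sequence of bytes" requirement, written in Python purely for illustration (the helper names `load_haystack` and `multiline_search` are hypothetical, and this says nothing about how ripgrep would actually structure it): try a memory map first, and fall back to reading the whole file onto the heap when mapping fails.

```python
import mmap
import re

def load_haystack(path):
    """Return the file as one contiguous buffer: mmap if possible, else bytes."""
    with open(path, "rb") as f:
        try:
            return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        except (ValueError, OSError):
            # Empty files and some special files cannot be memory mapped.
            return f.read()

def multiline_search(path, pattern):
    """Run a bytes regex over the whole file and return the matched spans."""
    rx = re.compile(pattern)
    haystack = load_haystack(path)
    try:
        return [m.group(0) for m in rx.finditer(haystack)]
    finally:
        if isinstance(haystack, mmap.mmap):
            haystack.close()
```

Either way, memory use is proportional to the file size, which is the cost being discussed here.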

Other than using heap space proportional to the file being searched, the fundamental issue with this flag is when it's used in conjunction with searching stdin. Namely, ripgrep will need to block until EOF is read on stdin before a search can even start. Alternatively, multiline search simply wouldn't be allowed on stdin. The silver searcher will in fact do this silently when searching stdin:

/* TODO: this will only match single lines. multi-line regexes silently don't match */
void search_stream(FILE *stream, const char *path) {
    // ...
}

I don't like the "silent" idea, but stopping ripgrep with an error is certainly something I'd be open to. Neither seems like a good choice to me, but I don't think it should block this feature altogether.

N.B. This is a significant feature and it would have to be part of the libripgrep effort.

@BurntSushi

Owner

BurntSushi commented Mar 17, 2017

The other thing I forgot to mention is that multiline search will negate inner literal optimizations. Normal prefix and, in special cases, suffix, literal optimizations will still be performed as part of the regex engine. (I've long thought about making inner literal optimizations work on arbitrary strings, but it's hard.)
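For readers unfamiliar with the term, here is a simplified Python sketch of what an inner literal optimization looks like in line-oriented mode. It is an illustration only, not ripgrep's code: the function name is invented and the literal is picked by hand rather than extracted from the regex. The idea is to find the literal with a fast substring search, expand to the enclosing line, and run the full regex on just that line. That last step is only valid when a match cannot cross a \n.

```python
import re

def search_with_inner_literal(haystack, literal, pattern):
    """Return lines matching `pattern`, visiting only lines containing `literal`."""
    rx = re.compile(pattern)
    hits = []
    pos = 0
    while True:
        at = haystack.find(literal, pos)  # fast substring scan
        if at == -1:
            break
        # Expand the candidate position to the boundaries of its line.
        start = haystack.rfind("\n", 0, at) + 1
        end = haystack.find("\n", at)
        if end == -1:
            end = len(haystack)
        line = haystack[start:end]
        if rx.search(line):  # the regex runs on one line only
            hits.append(line)
        pos = end + 1
    return hits

text = "alpha\nbarfoo12\nfoo\nxfoo7 end\n"
print(search_with_inner_literal(text, "foo", r"\w+foo\d+"))
```

With multiline matching there is no cheap line boundary to expand to, which is why this shortcut stops applying.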

@BurntSushi BurntSushi added this to the libripgrep milestone Mar 17, 2017

@gulshan

gulshan commented Mar 17, 2017

A naive question/suggestion. Assuming single lines are being loaded for search now, can that be changed to n lines, n set to 10 or 20 or something like that? While a line gets in, another gets out of the load in FIFO fashion? This will not be technically correct for all cases, but may be enough for most cases.

@dakaraphi

dakaraphi commented Mar 17, 2017

How significant are the trade-offs to the user experience?
If doing multiline is more expensive, I'm fine with that as long as single line performance is not impacted.

Would you actually need a special flag to ripgrep? or can you reliably determine from the expression itself?

@BurntSushi

Owner

BurntSushi commented Mar 17, 2017

Great questions! Keep 'em coming.

Assuming single lines are being loaded for search now

They are not. If they were, ripgrep would be very slow. The reasons for this are a bit subtle, but basically, "it's faster to search a huge chunk than it is to break it into little pieces and then search each piece." "Huge chunk" in this case might be the size of some internal buffer, perhaps, 8KB.

If you're curious about how a fast grep tool works in more detail, check out this section in my blog post on ripgrep: http://blog.burntsushi.net/ripgrep/#anatomy-of-a-grep

can that be changed to n lines, n set to 10 or 20 or something like that? While a line gets in, another gets out of the load in FIFO fashion? This will not be technically correct for all cases, but may be enough for most cases.

If you have a regex like a\s+b, then it's not possible to determine the length of the match up front. You have three choices:

  1. You use a regex engine that supports incremental search. (This is somewhat at odds with performance if "incremental" means "byte at a time." So for something like this, you'd need an incremental engine that can process chunks at a time.) ripgrep's regex engine doesn't support this.
  2. You feed the regex engine every byte you got. (The Plan.)
  3. You arbitrarily cap the size of the match. This will invariably get things wrong and there's no way to escape.

I still actually strongly believe that multiline search is a very niche feature, but it is one that can be quite useful when the situation calls for it. (A text editor is perhaps one such situation, but ripgrep is first and foremost a command line tool where multiline search feels a lot less common.) Therefore, taking approach (3) doesn't seem worth it. In the common case, memory maps will work just fine and your OS will manage the memory for you. It's only the corner cases that are sub-optimal: when memory maps can't be used (e.g., on virtual files or stdin).
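Here is a small Python sketch of why approach (3) gets things wrong. It implements the suggested FIFO window of n lines (the function name and sample data are made up for illustration) and shows a\s+b being missed as soon as the match spans more lines than the window holds.

```python
import re
from collections import deque

def windowed_search(lines, pattern, n):
    """Search a sliding FIFO window holding at most n lines."""
    rx = re.compile(pattern)
    window = deque(maxlen=n)  # the oldest line drops out automatically
    found = set()
    for line in lines:
        window.append(line)
        m = rx.search("".join(window))
        if m:
            found.add(m.group(0))
    return found

lines = ["a\n", "\n", "\n", "\n", "b\n"]
print(windowed_search(lines, r"a\s+b", 3))  # set()
print(windowed_search(lines, r"a\s+b", 5))  # {'a\n\n\n\nb'}
```

No fixed n fixes this in general: for any cap, an input with one more blank line defeats it.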

How significant are the trade-offs to the user experience? If doing multiline is more expensive, I'm fine with that as long as single line performance is not impacted.

If --multiline is behind a flag, then I'm pretty confident that the standard UX of ripgrep won't be impacted. Including performance.

Would you actually need a special flag to ripgrep? or can you reliably determine from the expression itself?

A flag is 100% necessary. A regex like a\s+b shouldn't match across multiple lines by default, because that's what we've all come to expect from line oriented searchers. But it is totally plausible that you might want it to. That's when you'd pass a flag.

@dakaraphi

dakaraphi commented Mar 17, 2017

I still actually strongly believe that multiline search is a very niche feature, but it is one that can be quite useful when the situation calls for it.

I would agree use is actually niche, but desire to use is not.

  1. It is a bit non intuitive how to properly write a multiline expression. Especially if the engine doesn't support the . dotAll matching and even worse if you want to constrain to a range like next N lines.
  2. Due to 1, many use incomplete results although not always knowingly. Most coding languages can have line breaks almost anywhere.

I would say if you are searching for 2 terms and completeness is important then using multiline would often be your default. However, writing an expression to find termA followed by termB within 5 or fewer lines is likely not something that rolls off the fingertips of someone who occasionally uses regular expressions, although I think many would find it useful and would use such expressions if they were more intuitive to write.

@BurntSushi

Owner

BurntSushi commented Mar 17, 2017

@dakaraphi Good points. I'd like to use your comment to constrain this feature, namely, that multiline search is the ability to apply a regex whose matches may span an arbitrary number of lines.

With that said:

  1. It would be plausible to make . match \n by default if multiline mode is enabled.
  2. The use case of "where do A and B co-occur within N lines of each other" is definitely something I agree can be useful. It's possible to some extent to do this with a regex, e.g., A([^\n]*\n){0,5}[^\n]*B|B([^\n]*\n){0,5}[^\n]*A, but that is a little painful. Extending this to three terms would probably be horrifying.

I think (2) is something that's enabled by multiline search, although, today, you can do something similar with contexts: rg B -C5 | rg A -C5 for example works to some extent. Regardless, it might be wiser to categorize this into a separate feature whose UX can be more thoughtfully designed. Others have requested similarish things, as in #346 and #360. sift is a tool that has support for this kind of matching, so we may be able to crib ideas from them.
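The regex in (2) can be generated rather than typed by hand. A minimal sketch in Python, assuming literal search terms and using Python's `re` as a stand-in engine (the helper name `cooccur_pattern` is hypothetical, and it uses a non-capturing group where the regex above uses a capturing one):

```python
import re

def cooccur_pattern(a, b, n):
    """Regex for literal terms a and b at most n line breaks apart, either order."""
    a, b = re.escape(a), re.escape(b)
    # Up to n "rest of line + newline" hops, then the rest of the final line.
    gap = r"(?:[^\n]*\n){0,%d}[^\n]*" % n
    return f"{a}{gap}{b}|{b}{gap}{a}"

rx = re.compile(cooccur_pattern("A", "B", 5))
print(bool(rx.search("A\nx\ny\nB")))          # True
print(bool(rx.search("B here\nthen A")))      # True
print(bool(rx.search("A" + "\n" * 7 + "B")))  # False
```

Two terms need two alternatives; three terms would need six orderings, which is where this gets painful.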

With all that said, we must be careful not to get too far away from what ripgrep is supposed to be good at doing: searching lines. :-) I say this because there has to be a point at which "write code for your specialized search" becomes a valid thing to say. The key is figuring out where that point is.

@dakaraphi

dakaraphi commented Mar 17, 2017

multiline search is the ability to apply a regex whose matches may span an arbitrary number of lines

Just to make sure I understand the intention, could you state that as what you see ripgrep would not do that possibly other regex engines do when searching multiline?

@BurntSushi

Owner

BurntSushi commented Mar 17, 2017

@dakaraphi Sorry, the intention of me saying that was to push UX concerns like "how do I find co-occurring terms, A and B, within a fixed number of lines" out of multiline support. i.e., I don't think that particular UX should be addressed as part of standard multiline support, but should instead be considered as a separate feature (that may or may not happen). :-)

I don't think there's anything ripgrep would do differently in terms of UX with respect to the silver searcher, other than 1) not doing it by default and 2) probably not doing silent things.

@BurntSushi

Owner

BurntSushi commented Mar 17, 2017

Are there any other tools that support multiline search other than the silver searcher?

@dakaraphi

dakaraphi commented Mar 17, 2017

I'm not sure about command line tools. Prior to using VS Code I was using Brackets which supported multiline file search. I believe other editors like Sublime, Notepad++ etc also support multiline.

@dakaraphi

dakaraphi commented Mar 17, 2017

I don't think that particular UX should be addressed as part of standard multiline support, but should instead be considered as a separate feature (that may or may not happen). :-)

ok right. Yes I'm not sure if that really should be part of something like ripgrep or not. For example, I've been thinking about maybe writing some extension for VS Code like a regex helper that would take common patterns or templates, let you plug in the values for such use cases, and generate the regex.

@BurntSushi

Owner

BurntSushi commented Mar 17, 2017

@dakaraphi Great! I think we're on the same page now. :-) Thanks for poking!

paldepind added a commit to paldepind/ripgrep that referenced this issue Mar 23, 2017

Remove statement about never supporting multiline search
After [this comment](BurntSushi#176 (comment)) it seems like the statement about never supporting multiline search should be removed.

@dakaraphi dakaraphi referenced this issue Apr 29, 2017

Closed

Support multi-line search for Global search #13155

@rshpeley

rshpeley commented Apr 30, 2017

@dakaraphi directed me here from Microsoft/vscode #13155

It looks like one of the most common requests for searching across multiple lines is related to text editors. At the moment, my needs are very simple. If I can get a match across multiple files in a project for a multiline selection -- even if it's fully literal -- I could work with it. For most text editors, the menu option to search across multiple lines is separate from a simple search, and so a ripgrep flag, as @BurntSushi suggested, would naturally fit this use case.

I'm still making it through @BurntSushi's anatomy of a grep link, but it appears to me that a multiline search for text editors mostly requires a literal search with some multiple literals (white space, line endings) and therefore the search won't even make it to the regex engine for these cases.

Isn't the multiple line selection just a contiguous sequence of bytes (in the fully literal case) to be matched in a buffer? Or am I missing something related to optimisation here?

I'm sure people will come up with cases where a regex in a multiline search/replace would be mighty handy, but I think support for the simpler multiple literal multiline case would be a good start to give some text editors (such as vscode and atom) missing functionality.

btw, a most excellent ripgrep article @BurntSushi!

@priyadarshan

priyadarshan commented Apr 30, 2017

Multi-line searching would be a boon to many. See for example this use case.

@thijsvandien

thijsvandien commented Aug 8, 2018

Being the one who suggested -X, I am happy to accept -U as the outcome.

@BurntSushi

Owner

BurntSushi commented Aug 16, 2018

All righty, here is what I have for docs for multiline mode. Do folks mind giving them a quick skim to make sure I haven't missed anything? Are there any obvious unanswered questions that the docs could cover?

-U, --multiline
    Enable matching across multiple lines.

    When multiline mode is enabled, ripgrep will lift the restriction that a match
    cannot include a line terminator. For example, when multiline mode is not
    codepoint other than \n. Similarly, the regex \n is explicitly forbidden, and if
    you try to use it, ripgrep will return an error. However, when multiline and
    regexes like \n are permitted.

    An important caveat here is that multiline mode does not change the match
    semantics of .. Namely, in most regex matchers, a .  will by default match any
    character other than \n, and this is true in ripgrep as well. In order to make .
    match \n, you must enable the "dot all" flag inside the regex. For example, both
    (?s).  and (?s:.)  have the same semantics, where .  will match any character,
    including \n. Alternatively, the --multiline-dotall flag may be passed to make
    the "dot all" behavior the default. This flag only applies when mulitline search
    is enabled.

    There is no limit on the number of the lines that a single match can span.

    WARNING: Because of how the underlying regex engine works, multiline searches may
    be slower than normal line oriented searches, and they may also use more memory.
    In particular, when multiline mode is enabled, ripgrep requires that each file it
    searches appear as if it exists contiguously in memory (either by reading it on
    to the heap or memory mapping it). Things that cannot be memory mapped (such as
    stdin) will be consumed until EOF before searching can begin. In general, ripgrep
    will only do these things when necessary. That is, even if you use the
    --multiline flag but your regex cannot match over multiple lines, then ripgrep
    won’t consume unnecessary resources. Nevertheless, if you only care about matches
    spanning at most one line, then it is always better to disable multiline mode.

    This flag can be disabled with --no-multiline.

--multiline-dotall
    This flag causes .  to match new lines when multiline searching is enabled. This
    flag has no effect if multiline searching isn’t enabled.

    Normally, a .  will match any character except for newlines. While this behavior
    typically isn’t relevant for line oriented matching (since matches can span at
    most one line), this can be useful when searching with the -U/--multiline flag.
@roblourens

roblourens commented Aug 16, 2018

For example, when multiline mode is not
codepoint other than \n.

...

However, when multiline and
regexes like \n are permitted.

Are those sentences missing a word or something?

@BurntSushi

Owner

BurntSushi commented Aug 16, 2018

Weird. Looks like I botched the copy & paste! Let's try again. Here's -U/--multiline:

Enable matching across multiple lines.

When multiline mode is enabled, ripgrep will lift the restriction that a match
cannot include a line terminator. For example, when multiline mode is not
enabled (the default), then the regex '\\p{any}' will match any Unicode
codepoint other than '\\n'. Similarly, the regex '\\n' is explicitly forbidden,
and if you try to use it, ripgrep will return an error. However, when multiline
mode is enabled, '\\p{any}' will match any Unicode codepoint including '\\n'
and regexes like '\\n' are permitted.

An important caveat here is that multiline mode does not change the match
semantics of '.'. Namely, in most regex matchers, a '.' will by default match
any character other than '\\n', and this is true in ripgrep as well. In order
to make '.' match '\\n', you must enable the \"dot all\" flag inside the regex.
For example, both '(?s).' and '(?s:.)' have the same semantics, where '.' will
match any character, including '\\n'. Alternatively, the '--multiline-dotall'
flag may be passed to make the \"dot all\" behavior the default. This flag only
applies when mulitline search is enabled.

There is no limit on the number of the lines that a single match can span.

**WARNING**: Because of how the underlying regex engine works, multiline
searches may be slower than normal line oriented searches, and they may also
use more memory. In particular, when multiline mode is enabled, ripgrep
requires that each file it searches appear as if it exists contiguously in
memory (either by reading it on to the heap or by memory mapping it). Things
that cannot be memory mapped (such as stdin) will be consumed until EOF before
searching can begin. In general, ripgrep will only do these things when
necessary. That is, even if you use the --multiline flag but your regex cannot
match over multiple lines, then ripgrep won't consume unnecessary resources.
Nevertheless, if you only care about matches spanning at most one line, then it
is always better to disable multiline mode.

This flag can be disabled with --no-multiline.

And --multiline-dotall:

This flag causes '.' to match new lines when multiline searching is enabled.

This flag has no effect if multiline searching isn't enabled.

Normally, a '.' will match any character except for newlines. While this
behavior typically isn't relevant for line oriented matching (since matches
can span at most one line), this can be useful when searching with the
-U/--multiline flag.
@waldyrious

waldyrious commented Aug 16, 2018

As an occasional regex user, I think this is pretty clear. Some minor stylistic suggestions:

cannot include a line terminator. For example, when multiline mode is not

Instead of "For example", I'd use "In particular", "namely", "specifically", or some other equivalent expression.

However, when multiline
mode is enabled, '\p{any}' will match any Unicode codepoint including '\n'
and regexes like '\n' are permitted.

I'd use commas around "including '\n'", which IMO makes the sentence structure (and intended reading flow) slightly more explicit.

An important caveat here

I don't think the "here" is necessary. Or in other words, IMO it is not specific enough to be useful.

applies when mulitline search is enabled.

Typo: "mulitline" --> "multiline".

slower than normal line oriented searches

I'd hyphenate "line oriented searches" --> "line-oriented searches".

either by reading it on to the heap

"onto"?

or by memory mapping it). Things that cannot be memory mapped

Suggestion: "memory-mapping" and "memory-mapped".

even if you use the --multiline flag but your regex cannot
match over multiple lines, then ripgrep won't consume unnecessary resources.

I'm not sure this sentence's structure conveys the intended message clearly. Do you think you could rephrase it somehow? I think what's confusing me is the "even if" / "but" / "then" structure.

any character except for newlines.

"any character except newlines." -- simpler and has the same meaning.

line oriented matching

"line-oriented matching", as suggested for similar expressions above.

this can be useful when searching with the
-U/--multiline flag.

Just for completeness, I'd add a note at the end explicitly mentioning that multiline mode by default assumes a false dotall flag.

@BurntSushi

Owner

BurntSushi commented Aug 16, 2018

@waldyrious Awesome! I think I took all of your suggestions except for the first. Here's the paragraph containing the portion you requested to have rewritten. What do you think?

**WARNING**: Because of how the underlying regex engine works, multiline
searches may be slower than normal line-oriented searches, and they may also
use more memory. In particular, when multiline mode is enabled, ripgrep
requires that each file it searches appear as if it exists contiguously in
memory (either by reading it onto the heap or by memory-mapping it). Things
that cannot be memory-mapped (such as stdin) will be consumed until EOF before
searching can begin. In general, ripgrep will only do these things when
necessary. Specifically, if the --multiline flag is provided by the regex
cannot match over multiple lines, then ripgrep won't read each file into memory
before searching it. Nevertheless, if you only care about matches spanning at
most one line, then it is always better to disable multiline mode.
@thijsvandien

thijsvandien commented Aug 16, 2018

Sorry to bring this topic up again after it was closed. This is the first time I hear about --multiline-dotall. If we’re going to have that “stronger version”, wouldn’t it make sense to use a short switch that is available both in lower case (--multiline) and upper case (--multiline-dotall), like -z and -Z? That would be a good reason to have a different switch than most other tools, because none offer both options, as far as I’m aware.

@BurntSushi

Owner

BurntSushi commented Aug 16, 2018

@thijsvandien Nah. The --multiline-dotall flag is probably intended to be something that goes in your config file (or an alias) as something that's always set, if those are the semantics you prefer by default.

@thijsvandien

thijsvandien commented Aug 17, 2018

Ah, I see now that it is meant as a modifier rather than an alternative with different semantics.

@mateon1

mateon1 commented Aug 17, 2018

Some nits:

This flag causes '.' to match new lines

Should be newlines for consistency reasons

requires that each file it searches appear as if it exists contiguously in

appears, but perhaps this section could be worded differently.
Maybe: ... ripgrep requires that the searched file is laid out/mapped/allocated contiguously in memory
I'm unsure which wording is the best (I prefer laid out, but maybe that's not appropriate for documentation), but all three sound better to me than the existing version.

Specifically, if the --multiline flag is provided by the regex
cannot match over multiple lines

s/by/but/

@waldyrious

waldyrious commented Aug 17, 2018

@BurntSushi I'm glad you agree with the suggestions! The reworded sentence is indeed much clearer, after fixing the typo pointed out by @mateon1.

Here's the diff of that sentence, for future reference/convenience:

-That is, even if you use the --multiline flag but your regex cannot
-match over multiple lines, then ripgrep won't consume unnecessary resources.
+Specifically, if the --multiline flag is provided but the regex
+cannot match over multiple lines, then ripgrep won't read each file into memory
+before searching it.

Now that I re-read that, I'm not sure "cannot match" is the best choice of words, since it can imply either a neutral statement or an imperative enforcement. (Not sure I'm being clear myself; let me know if I should rephrase!)

I suppose you're referring to the case where the regex does not contain any patterns that would match newlines, or it contains . without the dotall flag being activated. Is that correct?

@BurntSushi

Owner

BurntSushi commented Aug 17, 2018

@mateon1 Thanks! I took your advice, and chose "laid out."

@waldyrious

I suppose you're referring to the case where the regex does not contain any patterns that would match newlines, or it contains . without the dotall flag being activated. Is that correct?

Yes. Whether dotall is enabled or not is mostly orthogonal; what matters is whether a \n exists in any of the possible matches of a regex. Enabling dotall and uttering . is one way to achieve that, but a literal \n, \s, \p{any} and so on also achieve that.
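A quick way to see this, sketched with Python's `re` as a stand-in (Python has no \p{any}, so it is left out, and the probe below only works for one-character patterns; it is not a general analysis of whether a regex can match \n):

```python
import re

candidates = [r".", r"(?s).", r"\n", r"\s", r"[^a]", r"\w+"]

# For each pattern, ask whether it can consume a lone newline.
can_match_newline = {p: re.fullmatch(p, "\n") is not None for p in candidates}

for pattern, ok in can_match_newline.items():
    print(f"{pattern!r:10} {ok}")
```

Only the plain `.` and `\w+` report False; dotall is just one of several ways to let a \n into a match.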

It is possible I should just remove this part of the docs. I'm not sure. I put it there as a way of saying that even if you enable multiline mode but don't make use of it, you generally won't pay (much) for it. But maybe that's not that important.

@waldyrious

waldyrious commented Aug 17, 2018

I think it wouldn't be a problem if it were removed, but it is useful information so I'd have a slight preference to keep it.

IMO changing that sentence to something like this:

"Specifically, if the --multiline flag is provided, but the regex does not contain patterns that would match \n characters, then ripgrep will automatically avoid reading each file into memory before searching it."

...would make it sufficiently unambiguous.

@BurntSushi

Owner

BurntSushi commented Aug 17, 2018

@waldyrious I like it. Much better. Thanks! :)

BurntSushi added a commit that referenced this issue Aug 19, 2018

changelog: massive update for libripgrep
This commit updates the CHANGELOG to reflect all the work done to make
libripgrep a reality.

* Closes #162 (libripgrep)
* Closes #176 (multiline search)
* Closes #188 (opt-in PCRE2 support)
* Closes #244 (JSON output)
* Closes #416 (Windows CRLF support)
* Closes #917 (trim prefix whitespace)
* Closes #993 (add --null-data flag)
* Closes #997 (--passthru works with --replace)

* Fixes #2 (memory maps and context handling work)
* Fixes #200 (ripgrep stops when pipe is closed)
* Fixes #389 (more intuitive `-w/--word-regexp`)
* Fixes #643 (detection of stdin on Windows is better)
* Fixes #441, Fixes #690, Fixes #980 (empty matching lines are weird)
* Fixes #764 (coalesce color escapes)
* Fixes #922 (memory maps failing is no big deal)
* Fixes #937 (color escapes no longer used for empty matches)
* Fixes #940 (--passthru does not impact exit status)
* Fixes #1013 (show runtime CPU features in --version output)

@myfairsyer

myfairsyer commented Aug 23, 2018

Will \n only match \n / 0x0A, or any common single line break (\r?\n), or, if you take the classic MacOS and BBC into account, ((\n\r?)|(\r\n?))?

(I do know that both styles exist among regex engines but couldn't tell which is which)

Sry if there is an answer to that somewhere.

@BurntSushi

Owner

BurntSushi commented Aug 23, 2018

\n only matches \n.

Current master has a --crlf option that causes $ to match \r\n line breaks in addition to \n.

I'm not aware of any regex engines that permit a literal \n to match \r\n. Some regex engines certainly allow for a looser definition of what "line terminator" actually means when necessary, e.g., when matching the ^ or $ anchors. If you know of a regex engine that permits a literal \n to match \r\n then I'd like to have a link to that so I can investigate!

@roblourens

roblourens commented Aug 23, 2018

VS Code matches \r\n on \n when ctrl+f searching in a single file, it's useful in an editor but I wouldn't use that as inspiration for ripgrep.

@BurntSushi

Owner

BurntSushi commented Aug 23, 2018

@roblourens

roblourens commented Aug 23, 2018

No, it's just something vscode does.

@myfairsyer

myfairsyer commented Aug 24, 2018

If you know of a regex engine that permits a literal \n to match \r\n then I'd like to have a link to that so I can investigate!

@BurntSushi Most probably I only encountered it in text editors like VSCode.

it's useful in an editor but I wouldn't use that as inspiration for ripgrep.

@roblourens Would you mind to elaborate?
And does that mean that VSCode will behave differently inside an editor and when searching across files?

@roblourens

roblourens commented Aug 24, 2018

Personally I don't prefer "magic" like that, but yeah I'll have to see whether we can rewrite \n to \r?\n so that search across files works the same as search inside files.
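For the curious, a naive sketch of that rewrite in Python (this is an assumption about what such a rewrite could look like, not VS Code's code; `loosen_newlines` is a made-up name). It walks the pattern one escape at a time so that an escaped backslash followed by n is left alone. It does not understand character classes, so a pattern like [\n] would be rewritten incorrectly.

```python
import re

def loosen_newlines(pattern):
    """Rewrite each \\n escape in a regex source string to \\r?\\n."""
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            nxt = pattern[i + 1]
            # Consume the whole escape so r"\\n" is not mistaken for a newline.
            out.append(r"\r?\n" if nxt == "n" else ch + nxt)
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)

print(loosen_newlines(r"foo\nbar"))  # foo\r?\nbar
print(re.search(r"foo\nbar", "foo\r\nbar") is None)                       # True
print(re.search(loosen_newlines(r"foo\nbar"), "foo\r\nbar") is not None)  # True
```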

@myfairsyer

myfairsyer commented Aug 24, 2018

it's useful in an editor but I wouldn't use that as inspiration for ripgrep.

@roblourens Would you mind to elaborate?

Personally I don't prefer "magic" like that

@roblourens
I was rather driving at the distinction between text editor and ripgrep.
I couldn't quite follow.
Is it b/c you consider ripgrep as a command line tool having a more advanced audience which demands more control and less magic than a graphical text editor?

@BurntSushi
I don't want to derail or hijack this thread for irrelevant discussions.
You said you'd like to know more and investigate and found VSCode's behavior interesting.
If you don't anymore, tell me.

@wmww

wmww commented Oct 31, 2018

Currently, if you try to make a multiline search without the -U/--multiline option, ripgrep errors with the literal '"\n"' is not allowed in a regex. Would it make sense to mention the existence of a multiline enabling option here?

@BurntSushi

Owner

BurntSushi commented Oct 31, 2018

@wmww That should already be done on master. See: #1055

Also, please file new issues for new requests.
