Explore using a native third-party search tool such as ripgrep or Silver Searcher #19983

Closed
dakaraphi opened this Issue Feb 5, 2017 · 50 comments

@dakaraphi

dakaraphi commented Feb 5, 2017

I am very impressed with the performance of the new parallel search; however, there is an opportunity to take search speed to the absolute limit by optionally allowing a user to configure ripgrep as the search provider.

Even with the new search speed, ripgrep is still an order of magnitude faster. Ripgrep is actually an order of magnitude faster than pretty much anything.

http://blog.burntsushi.net/ripgrep/

@roblourens

Member

roblourens commented Feb 5, 2017

I was going to look at The Silver Searcher this month; I had skipped ripgrep because it doesn't support multiline search. I know ripgrep can be faster though. Do you think there's enough difference between different search tools that it would be worth allowing people to hook up arbitrary ones to vscode?

@dakaraphi

dakaraphi commented Feb 6, 2017

Yes, I was just thinking that maybe there should be an API that allows extension authors to hook up whatever they want. However, I'm not sure there is anything that interesting beyond ripgrep and Silver Searcher. When I think of awesome search tools, those are what come to mind.

So, I just did some comparisons on my project. Searching for a particular term, I get these results:

  • ripgrep : 0.3 secs
  • ag : 0.8 secs
  • vscode : 6 secs. Note, this is actually a significant regression. I just created #19993
  • vscode(1.8.1): 2 secs.

If we can get back to 2 sec performance again, then there is less difference between vscode and ag, but ripgrep is still significantly faster. However, as you said, you do lose multiline searches. If vscode implements multiline searching, then I would be happy with doing multiline with vscode's builtin search and using ripgrep for most of my other searching as that would be most common.

@roblourens roblourens added this to the February 2017 milestone Feb 7, 2017

@roblourens roblourens changed the title from Offer option to configure ripgrep as vscode file search provider to Explore using a native third-party search tool such as ripgrep or Silver Searcher Feb 7, 2017

@roblourens

Member

roblourens commented Feb 7, 2017

Ripgrep would be perfect, but I really want multiline search support. The problem with Silver Searcher right now is that it can't handle ignoring ** patterns, which is problematic for supporting gitignore files and our ignore glob patterns. ggreer/the_silver_searcher#530

The ripgrep blog post you posted lists other tools, but I eliminate them on various other grounds, like lacking windows support or perf that apparently breaks down.

There are also some specialist tools I've found, like ICgrep or Hyperscan, that focus on advanced unicode or regex features.

Considering all this, we should either

  • use ripgrep, but never support multiline search
  • use ag, but get extra results from ag, and filter them with our glob magic afterwards
  • add an extension API, so people can use whatever tool they like

Still leaning towards ag even though working around the ** issue would be very annoying.
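
The second option above (take extra results from ag, then filter them afterwards) could be sketched roughly as follows. This is only an illustration under assumptions: `glob_to_regex` and `drop_ignored` are hypothetical helpers, the glob translation is simplified (only `*`, `**` and `?`), and it is not vscode's actual glob implementation.

```python
import re

def glob_to_regex(glob):
    """Translate a glob using `*`, `**` and `?` into an anchored regex (simplified)."""
    out = []
    i = 0
    while i < len(glob):
        if glob.startswith("**/", i):
            out.append("(?:.*/)?")   # zero or more whole directories
            i += 3
        elif glob.startswith("**", i):
            out.append(".*")         # anything, including '/'
            i += 2
        elif glob[i] == "*":
            out.append("[^/]*")      # anything within one path segment
            i += 1
        elif glob[i] == "?":
            out.append("[^/]")       # one character within a path segment
            i += 1
        else:
            out.append(re.escape(glob[i]))
            i += 1
    return re.compile("^" + "".join(out) + "$")

def drop_ignored(paths, ignore_globs):
    """Keep only the result paths that match none of the ignore globs."""
    patterns = [glob_to_regex(g) for g in ignore_globs]
    return [p for p in paths if not any(rx.match(p) for rx in patterns)]

# Hypothetical result paths, as if the search tool had ignored nothing.
hits = [
    "src/app.ts",
    "node_modules/lib/index.js",
    "out/src/app.js",
    "src/vendor/node_modules/x.js",
]
kept = drop_ignored(hits, ["**/node_modules/**", "out/**"])
print(kept)
```

The obvious cost is that the tool still walks and searches the ignored directories, so the filtering fixes correctness but not the wasted work.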

@dakaraphi

dakaraphi commented Feb 7, 2017

My thoughts are something like this in order of preference:

  1. use ripgrep. However, with these conditions
    1. We are able to restore the original search speed of 1.8.1
    2. There are plans to support multiline searching in the future for vscode builtin search
  2. use Silver Searcher: If both conditions above for ripgrep are not true, then for me it seems Silver Searcher would be best choice.
  3. Extension API: If for whatever reason we can't make a confident decision for 1 or 2
@roblourens

Member

roblourens commented Feb 7, 2017

For 1., if we were using ripgrep, it would replace vscode search entirely, so it would be much faster than 1.8.1, and there would be no multiline search.

@dakaraphi

dakaraphi commented Feb 7, 2017

Ahh, I didn't realize you were considering it as a replacement. So you would then bundle ripgrep or Silver Searcher as part of vscode?
If the internal search wouldn't be supported any longer, I suppose I would then prefer Silver Searcher.

@roblourens

Member

roblourens commented Feb 7, 2017

Yeah that's the idea, to have it drive the search viewlet behind the scenes. Possibly could also be involved in driving quick open.

@BurntSushi

BurntSushi commented Feb 19, 2017

(ripgrep author here.) What do you folks use multiline search for? I've long considered it something I'd be unlikely to add support for, but I've been known to bend if there's strong demand for it. Alternatively, maybe there's a compromise that can be reached.

Note that ** isn't the only thing that ag doesn't support in gitignore files. ripgrep's support for gitignore matching is pretty dang close to 100% and remains fast. e.g., If you have lots of gitignore files or a single giant one, then ag slows down quite a bit compared to ripgrep.

Are there other things you folks care about? What about Unicode support? Support for searching UTF-16 (planned, not actually available yet)?

@BurntSushi

BurntSushi commented Feb 19, 2017

I'm also in the process of moving a lot of code in ripgrep out into distinct Rust libraries, which would give you a lot more control over how search operates. But, you'd need to build out a C FFI for it, which wouldn't be especially hard, but it wouldn't be something someone could bang out in a day either.

@dakaraphi

dakaraphi commented Feb 19, 2017

@BurntSushi here are some sample use cases of multiline search.

The most common is simply that code statements don't always exist as single lines.
someFunctionCall( arg1, arg2 )
can be written like this:

someFunctionCall(arg1,
arg2)

  1. I often am interested in terms that appear near each other or in the same file. Questions like...
    1. Which classes make use of x
    2. Where do we query for type X using join with Y. Likely these terms will be near each other, but not on same line
  2. Where are empty try catch blocks where exceptions were not handled.
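
As a rough illustration of the empty try/catch case, here is a sketch in Python. The pattern is an assumption of mine, not something given in this thread, and it only covers the simplest brace style. It matches a `catch (...)` clause whose braces contain nothing but whitespace, which usually spans two lines and so cannot be found by a line-at-a-time search.

```python
import re

# Assumed pattern (not from the thread): `catch (...)` followed by braces
# that hold only whitespace, possibly including newlines.
EMPTY_CATCH = re.compile(r"catch\s*\([^)]*\)\s*\{\s*\}")

source = """try {
    risky();
} catch (e) {
}
try {
    other();
} catch (e) {
    log(e);
}
"""

def line_of(text, offset):
    """1-based line number of a character offset."""
    return text.count("\n", 0, offset) + 1

# Line numbers where an empty catch block starts.
empty_catch_lines = [line_of(source, m.start()) for m in EMPTY_CATCH.finditer(source)]
print(empty_catch_lines)

# The same pattern applied one line at a time finds nothing,
# because the opening and closing braces sit on different lines.
per_line_hits = [ln for ln in source.splitlines() if EMPTY_CATCH.search(ln)]
print(per_line_hits)
```
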
@BurntSushi

BurntSushi commented Feb 19, 2017

@dakaraphi If ripgrep asked you to use two distinct regexes, would that suffice? Or do you want to use one regex?

@dakaraphi

dakaraphi commented Feb 19, 2017

@BurntSushi If I follow what you imply, then that would only help answer if 2 different terms exist in the same file.
However, a sample regex might look like this where I want to find something that is near:
termA(.|\n){0,200}termB
or
termA(.*\n.*){0,3}termB

or, for example, searching for an XML tag with a given id:
<extension(.|\n)*?id="A"
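
To make the three patterns above concrete, here is a small sketch using Python's `re` module as a stand-in for a multiline-capable engine. The sample text and XML are made up for illustration. Nothing here reflects how vscode or ripgrep actually run searches.

```python
import re

# The three patterns quoted above, unchanged.
NEAR = re.compile(r"termA(.|\n){0,200}termB")
WITHIN_LINES = re.compile(r"termA(.*\n.*){0,3}termB")
TAG_WITH_ID = re.compile(r'<extension(.|\n)*?id="A"')

# Made-up inputs where the interesting parts sit on different lines.
text = "x = termA()\ny = 1\nz = termB()\n"
xml = '<extension\n    point="p"\n    id="A">\n</extension>\n'

near_match = NEAR.search(text)
lines_match = WITHIN_LINES.search(text)
tag_match = TAG_WITH_ID.search(xml)

# A line-at-a-time search never sees both terms together.
per_line_hits = [ln for ln in text.splitlines() if NEAR.search(ln)]

print(bool(near_match), bool(lines_match), bool(tag_match), per_line_hits)
```

All three match only because the engine is handed the whole buffer, which is exactly what a line-oriented searcher avoids doing for speed.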

@roblourens

Member

roblourens commented Feb 20, 2017

I'll do a writeup for this investigation next week, but for vscode's purposes, we're interested in multiline search, UTF-16 support, and also I like a search that returns results in sorted order by path, which ripgrep doesn't do right now.

@BurntSushi

BurntSushi commented Feb 20, 2017

I don't think any search tool with parallelism returns results in sorted order. ripgrep does have the --sort-files option which I think will do what you want, but it disables parallelism.
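
One way a caller could get path-sorted output while still searching in parallel is to buffer the results and sort them once every worker has finished. The sketch below assumes that approach. It is not how ripgrep's `--sort-files` works (that option disables parallelism, as noted above), and `search_file` and `parallel_search_sorted` are hypothetical names operating on in-memory text rather than real files.

```python
import re
from concurrent.futures import ThreadPoolExecutor, as_completed

def search_file(path, text, pattern):
    """Return (path, [(line_number, line), ...]) for lines matching pattern."""
    hits = [(i, line) for i, line in enumerate(text.splitlines(), 1)
            if pattern.search(line)]
    return path, hits

def parallel_search_sorted(files, pattern, workers=4):
    """Search {path: text} in parallel, then return the hits sorted by path."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(search_file, p, t, pattern) for p, t in files.items()]
        # Completion order is not deterministic, so it cannot be shown as-is.
        results = [f.result() for f in as_completed(futures)]
    # Sorting requires every result to be in hand, so nothing can be
    # displayed until the slowest file is done.
    return sorted((r for r in results if r[1]), key=lambda r: r[0])

files = {"b.txt": "foo\nbar", "a.txt": "bar\nfoo", "c.txt": "nothing"}
sorted_hits = parallel_search_sorted(files, re.compile("foo"))
print(sorted_hits)
```

The trade-off is latency to first result: the search itself stays parallel, but results cannot stream to the UI as they are found.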

@roblourens

Member

roblourens commented Feb 20, 2017

Silver Searcher does actually - I see why it could be a perf hit to order the results though.

@BurntSushi

BurntSushi commented Feb 20, 2017

@roblourens

Member

roblourens commented Feb 22, 2017

You're right - looking at it more closely, SS tends to be closer to being in order, and often is in order when I run it in my vscode workspace, but not always.

@roblourens roblourens modified the milestones: Backlog, February 2017 Feb 22, 2017

@roblourens

Member

roblourens commented Feb 24, 2017

And I'm glad to hear that UTF-16 support is on the roadmap. Any idea in what timeframe you'd expect to look at it?

@BurntSushi

BurntSushi commented Feb 24, 2017

Any idea in what timeframe you'd expect to look at it?

No, sorry. "Within the next year" is probably the best I can do. Hopefully sooner.

@dakaraphi

dakaraphi commented Feb 24, 2017

Given there isn't an ideal match around features you wish to provide out of the box and uncertainty around the timing of the availability of those features, does it make sense to reevaluate the option of simply providing an API that extension authors can use to integrate external search providers?

It would probably also be useful if the API provided the ability to invoke VS Code's builtin search as a fallback: if the extension detects a type of regular expression that might not be supported in the external tool, it can then pass it on to VS Code.

@roblourens

Member

roblourens commented Feb 24, 2017

@BurntSushi I could also fall back to VS Code's builtin search for UTF-16 files, but don't want to duplicate the file tree walking work that ripgrep does. An easy compromise would be if ripgrep prints a message each time it encounters a UTF-16 (or binary) file. I imagine this hidden behind an option but it would also be useful for CLI users who are missing matches because they don't realize a file is an unsupported encoding.
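
For illustration, the caller-side half of that fallback might look roughly like this, assuming UTF-16 files are recognized by their byte-order mark. `looks_utf16` and `partition_by_encoding` are hypothetical names, and UTF-16 files without a BOM would be missed by this check.

```python
import codecs

def looks_utf16(head):
    """True if the leading bytes carry a UTF-16 byte-order mark."""
    # The UTF-32 LE BOM (FF FE 00 00) begins with the UTF-16 LE BOM,
    # so rule UTF-32 out first.
    if head.startswith((codecs.BOM_UTF32_LE, codecs.BOM_UTF32_BE)):
        return False
    return head.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE))

def partition_by_encoding(files):
    """Split {path: bytes} into (paths for the native tool, paths for the fallback)."""
    native, fallback = [], []
    for path, data in sorted(files.items()):
        (fallback if looks_utf16(data[:4]) else native).append(path)
    return native, fallback

# Made-up in-memory files; Python's "utf-16" codec writes a BOM.
files = {
    "a.txt": "hello".encode("utf-8"),
    "b.txt": "hello".encode("utf-16"),
}
native, fallback = partition_by_encoding(files)
print(native, fallback)
```

Doing this in the editor would mean opening every file a second time, which is the duplicated tree-walking work mentioned above; that is why having the search tool report such files would be preferable.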

@dakaraphi Thought about it, but it seems like overkill. Creating an extension API is a lot of work and there are probably only a few search providers in the world that anyone would want to use. I want to focus on the out of box experience.

@dakaraphi

dakaraphi commented Feb 25, 2017

Using ripgrep as primary and VS Code builtin as fallback seems like a good solution.

I would actually be very happy with that compromise. Especially if VS Code could implement multiline search, then multiline regular expressions could just be passed on to VS Code. VS Code's builtin is fast enough that it wouldn't be a bad solution given that multiline searching will be less common.
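
A minimal sketch of how that routing might be decided, assuming the only signal is whether the pattern text mentions a newline. `mentions_newline` and `pick_engine` are hypothetical names, and the check is a heuristic: constructs such as `\s` or a negated character class can also match newlines and are not handled here.

```python
def mentions_newline(pattern):
    """True if the regex source contains an `\\n` escape or a raw newline."""
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            if pattern[i + 1] == "n":
                return True
            i += 2  # skip the escaped character, e.g. `\\` or `\.`
            continue
        if ch == "\n":
            return True
        i += 1
    return False

def pick_engine(pattern):
    """Send multiline-looking patterns to the builtin search, the rest to ripgrep."""
    return "builtin" if mentions_newline(pattern) else "ripgrep"

print(pick_engine(r"termA(.|\n){0,200}termB"))
print(pick_engine(r"foo.*bar"))
print(pick_engine(r"foo\\nbar"))  # escaped backslash followed by a plain `n`
```
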

@dakaraphi

dakaraphi commented Mar 17, 2017

Speed is awesome, but I will have to strongly disagree here. Not being able to do something at all is a fairly significant disadvantage. I certainly can understand there is no desire to support another engine that you have to implement yourself.

I would suggest actually considering a secondary external engine, such as Silver Searcher or Platinum Searcher, as a fallback. If Platinum Searcher is easier to integrate, it doesn't have to be the fastest, but feature completeness could then be provided without having to support your own implementation.

@BurntSushi

BurntSushi commented Mar 17, 2017

@dakaraphi I don't think the platinum searcher has any functionality that ripgrep doesn't have at this point. It doesn't appear to have multiline search and its regex engine is FSM based like ripgrep's. The only real choice available to you if you want multiline search and PCRE is the silver searcher.

@dakaraphi

dakaraphi commented Mar 17, 2017

@BurntSushi ahh ok thanks. Yes, the most important feature for me would be multiline. That's a big one. I do use it somewhat often. PCRE would be nice, but I could live without it.

@BurntSushi

BurntSushi commented Mar 17, 2017

@dakaraphi I've thought about multiline search for a long time. You folks aren't the only ones who have requested it. I re-opened the issue on ripgrep's tracker and left some thoughts: BurntSushi/ripgrep#176 (comment)

@dakaraphi

dakaraphi commented Mar 17, 2017

@roblourens @BurntSushi Thanks for making this happen and bringing to VS Code in such a short time! Truly is a pleasure to use.

@lnicola

lnicola commented Mar 17, 2017

Does any of this apply to searches in a file that's being edited? That is, the "Find" command, not "Find in Files".

@roblourens

Member

roblourens commented Mar 17, 2017

No

@dakaraphi

dakaraphi commented Mar 17, 2017

So I have given some additional thought to the feature gap around things like lookarounds.
Typically additional regex features are about further constraining the results in some way.

I think some of the feature gap would be mitigated if the search results could be easily sent to a new editor. Then you could further search the results using the more feature-rich in-editor regex engine. Potentially it would also be useful to have a way to send results directly to an editor and have a much higher result limit cap.

There is already a request for this for other use cases. So this would just be an additional benefit.
See #17920
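
The two-pass idea could look something like this sketch: a broad first pass collects candidate lines, and a second pass narrows them with a lookaround that the first engine lacks. Python's `re` stands in for the richer engine here, `refine` is a hypothetical helper, and the first-pass results are made up.

```python
import re

def refine(results, pattern):
    """Keep only the (path, line_number, text) results whose text matches pattern."""
    rx = re.compile(pattern)
    return [r for r in results if rx.search(r[2])]

# Made-up first-pass results for a plain search on `foo`.
first_pass = [
    ("a.ts", 3, "const fooBar = 1"),
    ("a.ts", 9, "const foo = 2"),
    ("b.ts", 1, "let fooBaz = 3"),
]

# Second pass with a negative lookahead: `foo` not followed by `Bar`.
refined = refine(first_pass, r"foo(?!Bar)")
print(refined)
```

This only works when the first pass is a superset of the wanted results, which is why a much higher result limit would matter for this workflow.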

roblourens added a commit that referenced this issue Mar 17, 2017

roblourens added a commit that referenced this issue Mar 18, 2017

roblourens added a commit that referenced this issue Mar 20, 2017

@octref

Member

octref commented Mar 22, 2017

Was looking for how I can use ag to search faster in VSCode and found this issue.
Tried it out and search takes milliseconds in my fairly large web project. Huge improvement on my workflow.
Thanks @roblourens and @BurntSushi!

@Ethan-VisualVocal

Ethan-VisualVocal commented Mar 22, 2017

@dakaraphi This is a feature of Sublime Text that I miss in VSCode -- ST just automatically dumps everything into a special, searchable "Find Results" tab that also doesn't auto-clear between searches.

(Relying on this feels a bit like a crutch, like maybe I could have gotten ideal results if I'd composed my original search filters + regex better, but I end up using it often anyway because I want to keep my brain on the original task at hand.)

@dakaraphi

dakaraphi commented Mar 22, 2017

@Ethan-VisualVocal having the ability to dump the results to a document tab opens up some very useful possibilities; however, I currently strongly prefer VSCode's default implementation for finding and navigating code. I find it much better at browsing the files from the results.
I just want the option of being able to capture the results in a document, as there are times when it is very useful, and not just as a potential workaround here for the feature gap in ripgrep's regular expression support.

@ThunderEX

ThunderEX commented Mar 23, 2017

Can we just use git grep to replace the original "find in files" feature?
git grep is just a built-in command of git. Compared to ripgrep:

  • has similarly powerful features to ripgrep
  • built into git, cross-platform of course, and no additional code/binary required unless the user doesn't have git
  • supports gitignore, by nature!
  • a little slower than ripgrep but still faster than the current search feature
  • mature, should support unicode but untested.
@jonathandturner

jonathandturner commented Mar 23, 2017

@ThunderEX - git grep doesn't work for non-git directories, unless it's some option I haven't seen.

@ThunderEX

ThunderEX commented Mar 24, 2017

@roblourens

Member

roblourens commented Mar 27, 2017

I didn't realize that git grep works on non-git dirs. But we're now shipping with ripgrep for the March release so I'm closing this issue.

@roblourens roblourens closed this Mar 27, 2017

@dakaraphi

dakaraphi commented Mar 27, 2017

git grep doesn't have multiline. However ripgrep is now investigating adding that feature.
That will be a greater win.
BurntSushi/ripgrep#176

@roblourens roblourens referenced this issue Mar 27, 2017

Closed

Test: ripgrep search #23319

4 of 4 tasks complete
@BurntSushi

BurntSushi commented Mar 28, 2017

I don't think it supports UTF-16 either.

@chrmarti chrmarti removed their assignment May 17, 2017

@vscodebot vscodebot bot locked and limited conversation to collaborators Nov 18, 2017
