Switch files collection to use the ignore crate #205
Signed-off-by: Alexandre Terrasa <alexandre.terrasa@shopify.com>
st0012 left a comment
Are there other libraries that have fewer transitive dependencies? 8 seems like a lot.
But if not, the time saved is worth it IMO.
We can reproduce something manually. See for example what Sorbet does: https://github.com/sorbet/sorbet/blob/master/common/common.cc#L289-L444
I also experimented with a combo of
Though this PR is merely suggesting that we can do better.
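To make the "reproduce something manually" option concrete, here is a minimal std-only sketch of a single-threaded listing. It is not what this PR or Sorbet does: the function name `list_ruby_files` is hypothetical, and it only skips dot-prefixed entries and filters on the `.rb` extension, with no gitignore handling and no parallelism.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical helper: collects `.rb` files under `root`, skipping hidden entries.
fn list_ruby_files(root: &Path) -> io::Result<Vec<PathBuf>> {
    let mut files = Vec::new();
    // Explicit stack instead of recursion, so deep trees do not grow the call stack.
    let mut pending = vec![root.to_path_buf()];

    while let Some(dir) = pending.pop() {
        for entry in fs::read_dir(&dir)? {
            let entry = entry?;
            let path = entry.path();

            // Skip hidden files and directories such as `.git`.
            if entry.file_name().to_string_lossy().starts_with('.') {
                continue;
            }

            // `DirEntry::file_type` does not follow symlinks, so symlinked
            // directories are not traversed in this sketch.
            let file_type = entry.file_type()?;
            if file_type.is_dir() {
                pending.push(path);
            } else if file_type.is_file() && path.extension().map_or(false, |ext| ext == "rb") {
                files.push(path);
            }
        }
    }

    // Sort so the result does not depend on directory iteration order.
    files.sort();
    Ok(files)
}
```

A manual version like this avoids the extra dependencies, but everything the ignore crate provides (gitignore rules, file type sets, a parallel walker) would have to be rebuilt on top of it.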
.threads(thread::available_parallelism().map(std::num::NonZero::get).unwrap_or(4)) // use all available threads
.types(types) // only index ruby files
.hidden(true) // ignore hidden files
.git_ignore(true) // skip files matched by .gitignore
We may want this one to be false. For example, Prism auto-generates the node.rb file. When working on Prism, you'd still like to get features for it.
The git_ignore option was a good way to get node_modules and similar stuff out of the way though.
IIRC, the current indexer has a config file, .index.yml. Should we bring that back now so we can configure all of this?
Yes, ideally configuration. I'm not sure if the answer is a file for the indexer or a configuration API, but something for sure.
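As a rough sketch of what the configuration API option could look like, the listing rules could live in a small struct instead of relying on .gitignore. Everything here is hypothetical (`ListingConfig`, `include_hidden`, `excluded_names`, `should_list`); nothing in this PR defines these names, and a file-based config such as .index.yml could just as well be parsed into the same struct.

```rust
use std::path::Path;

/// Hypothetical listing options; none of these names come from this PR.
#[derive(Debug, Clone)]
struct ListingConfig {
    /// When false, any path with a dot-prefixed component is skipped.
    include_hidden: bool,
    /// Path components to skip wherever they appear, e.g. `node_modules`.
    excluded_names: Vec<String>,
}

impl Default for ListingConfig {
    fn default() -> Self {
        Self {
            include_hidden: false,
            excluded_names: vec!["node_modules".to_string()],
        }
    }
}

impl ListingConfig {
    /// Returns true when `path` (relative to the workspace root) should be listed.
    fn should_list(&self, path: &Path) -> bool {
        path.components().all(|component| {
            let name = component.as_os_str().to_string_lossy();
            let name: &str = &name;
            // `.` and `..` are path navigation, not hidden entries.
            let hidden = name.starts_with('.') && name != "." && name != "..";
            let excluded = self.excluded_names.iter().any(|n| n.as_str() == name);
            (self.include_hidden || !hidden) && !excluded
        })
    }
}
```

With explicit exclusions like this, node_modules stays out of the way while a generated but gitignored file such as Prism's node.rb would still be listed.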
We may want to tune the generation of the synthetic corpus with deeper nesting of directories. I'm surprised that the ratio of time spent listing files is so much smaller in the corpus vs shop/world.
As I'm working to bring back #205, it just makes more sense to have the listing and indexing concepts separate. Signed-off-by: Alexandre Terrasa <alexandre.terrasa@shopify.com>
Proof of concept showing how we could reduce the time spent listing files using the ignore crate. I'm sure we can do even better, but this already gets us to a better state.
Comparison on shop/world:
Comparison on corpus/huge: