Skip to content

Switch files collection to use the ignore crate#205

Closed
Morriar wants to merge 1 commit into
mainfrom
at-faster-glob
Closed

Switch files collection to use the ignore crate#205
Morriar wants to merge 1 commit into
mainfrom
at-faster-glob

Conversation

@Morriar
Copy link
Copy Markdown
Contributor

@Morriar Morriar commented Oct 15, 2025

Proof of concept showing how we could reduce the time spent listing files using the ignore. I'm sure we can do even better but this already gets us to a better state.

Comparison on shop/world:

Timing breakdown    before              after

Initialization      0.003s (  0.0%)    0.002s (  0.0%)
Listing             5.587s ( 51.5%)    1.755s ( 23.1%)
Indexing            4.970s ( 45.8%)    5.568s ( 73.3%)
Querying            0.293s (  2.7%)    0.272s (  3.6%)
Database            0.000s (  0.0%)    0.000s (  0.0%)
Cleanup             0.000s (  0.0%)    0.000s (  0.0%)
Total:             10.853s             7.597s         

Comparison on corpus/huge:

Timing breakdown    before              after

Initialization      0.004s (  0.0%)    0.002s (  0.0%)
Listing             0.167s (  0.9%)    0.099s (  0.6%)
Indexing           11.354s ( 64.1%)   10.541s ( 62.3%)
Querying            6.184s ( 34.9%)    6.287s ( 37.1%)
Database            0.000s (  0.0%)    0.000s (  0.0%)
Cleanup             0.000s (  0.0%)    0.000s (  0.0%)
Total:             17.709s            16.928s

Signed-off-by: Alexandre Terrasa <alexandre.terrasa@shopify.com>
@Morriar Morriar self-assigned this Oct 15, 2025
@Morriar Morriar requested a review from a team as a code owner October 15, 2025 17:49
@Morriar Morriar added the enhancement New feature or request label Oct 15, 2025
Copy link
Copy Markdown
Member

@st0012 st0012 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there other libraries that has less transient dependencies? 8 seems a lot
But if not, the time saved is IMO worth it as well

@Morriar
Copy link
Copy Markdown
Contributor Author

Morriar commented Oct 15, 2025

Is there other libraries that has less transient dependencies?

We can reproduce something manually. See for example what Sorbet does: https://github.com/sorbet/sorbet/blob/master/common/common.cc#L289-L444

I also experimented with a combo of walkdir and globset but I think ignore comes with a nice API and quite a lot of features (including Windows support?).

Though this PR is merely suggesting that we can do better.

.threads(thread::available_parallelism().map(std::num::NonZero::get).unwrap_or(4)) // use all available threads
.types(types) // only index ruby files
.hidden(true) // ignore hidden files
.git_ignore(true) // ignore gitignore files
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may want this one to be false. For example, Prism auto generates the node.rb file. When working on Prism, you'd still like to get features for it.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The git_ignore was a good way to get node_modules and stuffs out of the way though.

iirc in the current indexer we have a config file .index.yml should we bring that back now so we can configure all of this?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, ideally configuration. I'm not sure if the answer is a file for the indexer or a configuration API, but something for sure.

@vinistock
Copy link
Copy Markdown
Member

We may want to tune the generation of the synthetic corpus with deeper nesting of directories. I'm surprised that the ratio of time spent listing files is so much smaller in the corpus vs shop/world.

Morriar added a commit that referenced this pull request Dec 9, 2025
As I'm working to bring back #205, it just
makes more sense to have the listing and indexing concepts separate.

Signed-off-by: Alexandre Terrasa <alexandre.terrasa@shopify.com>
Morriar added a commit that referenced this pull request Dec 10, 2025
As I'm working to bring back #205, it just
makes more sense to have the listing and indexing concepts separate.

Signed-off-by: Alexandre Terrasa <alexandre.terrasa@shopify.com>
@Morriar Morriar closed this in #394 Jan 6, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants