Skip to content

Conversation

jdconrad
Copy link
Contributor

Description

Based on the discussion from this email thread we could add a set of classes to compile timings for different pieces of a query or multiple queries. This could be used to better debug changes in performance moving forward.

Solution

This change adds a multitude of new classes to help profile query timings. These classes have all been added to the Lucene sandbox including a simple extension of a IndexSearcher with the name ProfileIndexSearcher that includes a QueryProfiler. The QueryProfiler includes the total time spent in each of the following categories along with the number of times visited:

  • create weight
  • build scorer
  • next doc
  • advance
  • score
  • match

Tests

Tests have been added to ensure that a simple query generate values within a ProfileIndexSearcher.

Checklist

Please review the following and check all that apply:

  • I have reviewed the guidelines for How to Contribute and my code conforms to the standards described there to the best of my ability.
  • I have created a Jira issue and added the issue ID to my pull request title.
  • I have given Lucene maintainers access to contribute to my PR branch. (optional but recommended)
  • I have developed this patch against the main branch.
  • I have run ./gradlew check.
  • I have added tests for my changes.

/**
* A simple extension of {@link IndexSearcher} to add a {@link QueryProfiler} that can be set to
* test query timings.
*/
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add an example of how it may be used in the javadocs?

Copy link
Member

@jtibshirani jtibshirani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a good direction to me, just added a few suggestions.

super(reader);
}

public void setProfiler(QueryProfiler profiler) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we could add QueryProfiler as a constructor parameter and ensure it's always non-null. That'd simplify this class a bit, and make it clear that it's always meant to be used for profiling.

We could even take this further and remove the QueryProfiler class, folding its logic into ProfileIndexSearcher. It doesn't seem like a helpful abstraction on its own?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this idea a lot as fewer classes in this case is ideal. My question here is do want to have to create a new IndexSearcher for each query?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have a strong intuition here -- maybe @jpountz has an opinion?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it's totally fine. IndexReaders are costly to create, but IndexSearchers are very cheap.

MatcherAssert.assertThat(rewriteTime, greaterThan(0L));
}

public void testCollector() throws IOException {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It'd be nice to have one test showing how the collector would actually be used in a search, through something like IndexSearcher#search(Query query, Collector results).

@jdconrad
Copy link
Contributor Author

@jpountz @jtibshirani Thank you for the feedback! I will address these soon.

@jdconrad
Copy link
Contributor Author

Just a note that these comments are still valid and not really outdated. I changed packages and renamed some of the files so it's no longer referencing the correct path.

Copy link
Contributor

@jpountz jpountz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left some minor comments around visibility but otherwise this looks good to me!

@jtibshirani
Copy link
Member

@jdconrad asked that I finish off this work as he'll be away for several weeks. I pushed a few changes:

  • Adjust class visibility and other small clean-ups
  • Fold QueryProfiler into QueryProfilerIndexSearcher
  • Rename profiler collectors and create dedicated test

I'll plan to merge soon unless there are more comments.

@jtibshirani jtibshirani merged commit 40f66a4 into apache:main Jun 9, 2021
mikemccand pushed a commit to mikemccand/lucene that referenced this pull request Sep 3, 2021
This change adds new IndexSearcher and Collector implementations to profile
search execution and break down the timings. The breakdown includes the total
time spent in each of the following categories along with the number of times
visited: create weight, build scorer, next doc, advance, score, match.

Co-authored-by: Julie Tibshirani <julietibs@gmail.com>
javanna added a commit to javanna/elasticsearch that referenced this pull request Apr 25, 2023
The profile functionality relies on two custom collector implementations
that wrap a given collector to monitor its execution time and expose
the profile results tree. The functionality has been contributed to
Lucene with apache/lucene#144 hence we can start
migrating to relying on the functionalities that Lucene offers for it.

We keep our own collector which extends ProfilerCollector from Lucene
and exposes profile results that extends ProfilerCollectorResult which
can be serialized over the wire as well as to xcontent.
javanna added a commit to elastic/elasticsearch that referenced this pull request Apr 25, 2023
The profile functionality relies on two custom collector implementations
that wrap a given collector to monitor its execution time and expose
the profile results tree. The functionality has been contributed to
Lucene with apache/lucene#144 hence we can start
migrating to relying on the functionalities that Lucene offers for it.

We keep our own collector which extends ProfilerCollector from Lucene
and exposes profile results that extends ProfilerCollectorResult which
can be serialized over the wire as well as to xcontent.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants