
Allow extensions of IndexSearcher to provide custom SliceExecutor and slices computation #12347

sohami opened this issue Jun 5, 2023 · 15 comments


sohami commented Jun 5, 2023

Description

For concurrent segment search, Lucene uses the slices method to compute the number of work units that can be processed concurrently.

a) It calculates slices in the constructor of IndexSearcher, with default thresholds for document count and segment count.
b) It provides an implementation of SliceExecutor (i.e. QueueSizeBasedExecutor), chosen based on the executor type, which applies backpressure in concurrent execution based on a limiting factor of 1.5 times the maxPoolSize of the passed-in threadpool.
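The limiting-factor behaviour described in (b) can be sketched roughly as follows. This is a plain java.util.concurrent stand-in, not Lucene's actual QueueSizeBasedExecutor code, and the class and method names here are illustrative only:

```java
import java.util.concurrent.ThreadPoolExecutor;

// Rough stand-in for the backpressure check in Lucene's
// QueueSizeBasedExecutor (the real implementation differs in detail):
// once the executor's queue grows past 1.5x maxPoolSize, tasks run on
// the caller thread instead of being submitted to the pool.
public class BackpressureSketch {
    static final double LIMITING_FACTOR = 1.5;

    static boolean shouldRunOnCaller(ThreadPoolExecutor executor) {
        return executor.getQueue().size() >= executor.getMaximumPoolSize() * LIMITING_FACTOR;
    }

    static void execute(ThreadPoolExecutor executor, Runnable task) {
        if (shouldRunOnCaller(executor)) {
            task.run(); // backpressure: the caller pays the cost
        } else {
            executor.execute(task);
        }
    }
}
```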

In OpenSearch, there is a search threadpool which serves search requests to all the Lucene indices (or OpenSearch shards) assigned to a node. Each node can receive requests for some or all of the indices on that node.
I am exploring a mechanism to dynamically control the max slice count for each Lucene index search request. For example, search requests to some indices on a node could get at most 4 slices each, and requests to other indices at most 2 slices each. The threadpool shared to execute these slices would then not need any limiting factor: in this model, the top-level search threadpool limits the number of active search requests, which in turn limits the number of work units in the SliceExecutor threadpool.

For this, a derived implementation of IndexSearcher could take an input value in its constructor to control the slice count computation. Even though the slices method is protected, it gets called from the constructor of the base IndexSearcher class, which prevents the derived class from using the passed-in input.
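The constructor-ordering problem can be shown with a minimal plain-Java example (hypothetical class names, no Lucene types): the overridden method runs during the base constructor, before the subclass constructor has assigned its fields.

```java
// Minimal illustration of why an overridable method invoked from the base
// constructor cannot use constructor arguments of the subclass: the
// override runs while the subclass field still holds its default value.
class BaseSearcher {
    protected int[] slices;

    BaseSearcher() {
        slices = computeSlices(); // runs before any subclass constructor body
    }

    protected int[] computeSlices() {
        return new int[] {1, 1};
    }
}

class CustomSearcher extends BaseSearcher {
    private final int maxSlices;

    CustomSearcher(int maxSlices) {
        super(); // computeSlices() has already been called at this point
        this.maxSlices = maxSlices;
    }

    @Override
    protected int[] computeSlices() {
        // maxSlices is still 0 here, so the intended cap is silently lost
        return new int[Math.max(1, maxSlices)];
    }
}
```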

To achieve this, I am making a change along the lines suggested in the discussion thread on the dev mailing list, and would like to get some feedback.


jpountz commented Jun 9, 2023

I wonder if we should make SliceExecutor configurable, or if we should instead fix it. Having both the Executor and the SliceExecutor be user-configurable feels like too much to me: we should either make only the executor configurable, or, if we think that this abstraction is not sophisticated enough, make a more sophisticated abstraction configurable, whose concrete implementations would likely wrap an executor.

SliceExecutor is based on the assumption that it is sometimes a good idea to run in the current thread instead of delegating to the executor. I have become increasingly doubtful about this, because it makes it very hard to reason about the total number of threads that may run searches: the threadpool that calls IndexSearcher#search would have some of its threads running searches and others joining tasks that are running in the executor threadpool. Should we simplify this and always run searches in the executor threadpool? The other threadpool (the one that calls IndexSearcher#search) would then only join tasks running in the executor and run cheap operations like TopDocs#merge to combine the results of the various slices.
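The model suggested here, where slice execution is always offloaded to the executor and the caller thread only joins and merges, could look roughly like this. This is a sketch with stand-in types (integers instead of per-slice results), not Lucene code:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

// Sketch: every slice is searched on the executor thread pool; the
// calling thread only joins the futures and performs the cheap merge
// step (the analogue of TopDocs#merge).
public class OffloadAllSlices {
    static int searchSlice(int sliceId) {
        return sliceId * 10; // stand-in for per-slice collection work
    }

    static int searchAll(ExecutorService executor, int numSlices) {
        List<Future<Integer>> futures = new ArrayList<>();
        for (int i = 0; i < numSlices; i++) {
            final int slice = i;
            futures.add(executor.submit(() -> searchSlice(slice))); // all work offloaded
        }
        int merged = 0;
        for (Future<Integer> f : futures) {
            try {
                merged += f.get(); // caller thread: join + cheap merge only
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        }
        return merged;
    }
}
```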


javanna commented Jun 9, 2023

I agree that we should make either the executor or the slice executor configurable.

On fixing the slice executor without making it configurable, my gut feeling is that it will be difficult to guarantee a default behaviour that is good for all consumers.

I agree that executing slices on either the caller thread or the executor makes it difficult to reason about what happens where and to appropriately size thread pools. To have a clear distinction between a coordinator thread pool and a collection thread pool, though, sequential execution would also have to be offloaded to the executor, which is somewhat controversial. I am a bit nervous about making this the default, hence the thought of exposing a SliceExecutor abstraction that is configurable and can make these decisions. Could it be an idea to move some portions of IndexSearcher#search(Query, CollectorManager) to the SliceExecutor itself, for it to deal with task creation and execution as a whole?


sohami commented Jun 9, 2023

Thanks @jpountz and @javanna for the discussion. I also agree that we should have either the executor or the SliceExecutor. However, given that there are already constructors accepting an executor, adding a constructor that only accepts a SliceExecutor would break consumers using IndexSearcher(IndexReader r, Executor executor) and passing null for the executor. Thinking out loud, I feel that usage is also incorrect, as those consumers should just use IndexSearcher(IndexReader r) instead. I am not sure whether this would be acceptable, considering the behavior can still remain the same, which means it would still be backward compatible?

I agree that executing slices on either the caller thread or the executor makes it difficult to reason about what happens where and to appropriately size thread pools. To have a clear distinction between a coordinator thread pool and a collection thread pool, though, sequential execution would also have to be offloaded to the executor, which is somewhat controversial. I am a bit nervous about making this the default, hence the thought of exposing a SliceExecutor abstraction that is configurable and can make these decisions.

I agree that exposing SliceExecutor gives extensions the opportunity to customize it based on their use case, while the default implementation can remain as is.

Could it be an idea to move some portions of IndexSearcher#search(Query, CollectorManager) to the SliceExecutor itself, for it to deal with task creation and execution as a whole?

I am not sure about this: currently SliceExecutor doesn't know anything about the details of the tasks passed to it and only controls how the tasks are executed. I think task creation should remain in IndexSearcher, otherwise we may end up making SliceExecutor aware of internals like Collector or CollectorManager.


sohami commented Jun 12, 2023

@jpountz and @javanna It seems there is a plan for a feature freeze by this Friday (06/16) for the next release. I was really hoping we could get this into the 9.x release, and would appreciate your help on this issue and on how to make progress on it.


javanna commented Jun 13, 2023

heya @sohami, while I suggested making SliceExecutor pluggable, I also understand the implications that Adrien brought up. In particular, configuring slices does not feel like the right reason to make the slice executor pluggable at this stage; we may still do it in the future. In order to unblock you, I'll make a counter proposal: would it work to provide the slices as a constructor argument instead, for cases where the default slicing does not satisfy your requirements? Assuming that you have access to the IndexReaderContext before creating your index searcher extension, you could create the slices externally and provide them.


sohami commented Jun 13, 2023

@javanna I am looking to provide both custom slice computation and a custom SliceExecutor. Thinking along similar lines to taking the slices in the constructor, could we provide a constructor which takes both a slice provider and a SliceExecutor, like IndexSearcher(IndexReader reader, Function<List<LeafReaderContext>, LeafSlice[]> sliceProvider, SliceExecutor sliceExecutor)? This way we would provide customization for both, and keep the slice computation separate from the SliceExecutor as today.


javanna commented Jun 13, 2023

My suggestion would be to take a LeafSlice[] as a constructor argument rather than a SliceProvider or a function. Could you share what you'd want to customize around the slice execution? I've also been looking at it, and there are a number of things that I'd want to adjust. I am open to making things pluggable, but I think there should be a good reason compared to changing the default behaviour.


sohami commented Jun 13, 2023

Could you share what you'd want to customize around the slice execution? I've also been looking at it, and there are a number of things that I'd want to adjust. I am open to making things pluggable, but I think there should be a good reason compared to changing the default behaviour.

In the default SliceExecutor (i.e. QueueSizeBasedExecutor), backpressure is applied in concurrent execution based on a limiting factor of 1.5 times the maxPoolSize of the passed-in threadpool. If the queue size goes beyond that, it ends up executing all the tasks on the caller thread.

In OpenSearch, there is a search threadpool which serves search requests to all the Lucene indices (or OpenSearch shards) assigned to a node. Each node can receive requests for some or all of the indices on that node.
With custom slice computation to control the max slices per request/index, the limiting factor in SliceExecutor will not be needed. For example, if all the indices on a node are limited to 2 or 4 slices each, then the threadpool shared to execute these slices does not need any limiting factor. In this model, the top-level search threadpool limits the number of active search requests on the node, which in turn limits the number of work units in the SliceExecutor threadpool and keeps the behavior uniform across requests.
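A hypothetical external slice computation with a per-request cap might look like the following sketch, where plain strings and lists stand in for Lucene's LeafReaderContext and LeafSlice:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: partition segments into at most maxSlices groups
// via round-robin assignment, so a request can be capped at e.g. 2 or 4
// slices regardless of segment count. Strings stand in for Lucene's
// LeafReaderContext; the inner lists stand in for LeafSlice.
public class CappedSlices {
    static List<List<String>> computeSlices(List<String> segments, int maxSlices) {
        int numSlices = Math.min(maxSlices, Math.max(1, segments.size()));
        List<List<String>> slices = new ArrayList<>();
        for (int i = 0; i < numSlices; i++) {
            slices.add(new ArrayList<>());
        }
        for (int i = 0; i < segments.size(); i++) {
            slices.get(i % numSlices).add(segments.get(i)); // round-robin
        }
        return slices;
    }
}
```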


javanna commented Jun 14, 2023

heya @sohami thanks a lot for sharing more context.

With custom slice computation to control the max slices per request/index, the limiting factor in SliceExecutor will not be needed.

Good point, agreed. Also, QueueSizeBasedExecutor is quite opinionated and non-configurable, and it gets applied based on an instanceof check on the provided executor, which is not fantastic.

Another thought on my end: executing sometimes on the caller thread, and sometimes on the executor makes things hard to reason about: how do you size the two thread pools if you can't easily tell what load they are subjected to?

Instead of making the slice executor configurable then, I would consider removing it entirely, and forcing collection to always happen on the separate thread pool. I think we'll need to figure out how to handle rejections from the executor thread pool: today the collection happens on the caller thread whenever there's a rejection, which I don't think is a behaviour we want to keep. We could also leave this to the executor implementation that is provided.

I believe that the QueueSizeBasedExecutor was contributed by OpenSearch: would the approach suggested above be feasible for you folks? I am thinking it would simplify things and provide a better user experience for Lucene users.


sohami commented Jun 14, 2023

@javanna Thanks for your input.

Another thought on my end: executing sometimes on the caller thread, and sometimes on the executor makes things hard to reason about: how do you size the two thread pools if you can't easily tell what load they are subjected to?

Instead of making the slice executor configurable then, I would consider removing it entirely, and forcing collection to always happen on the separate thread pool. I think we'll need to figure out how to handle rejections from the executor thread pool: today the collection happens on the caller thread whenever there's a rejection, which I don't think is a behaviour we want to keep. We could also leave this to the executor implementation that is provided.

As you mentioned earlier (and I agree), it is hard to find a default that works best for all usages. So providing a way to customize it gives users the flexibility to adapt it to their use cases. That way we can see which custom mechanisms work well across users and then change the default later as needed. I would also like to try removing just the limiting factor but keeping the mechanism to execute the last slice on the caller thread, so a SliceExecutor-type interface will still be useful.
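The variant described here, dropping the queue-size limiting factor while still executing the last slice on the caller thread, could be sketched as follows (an assumed shape, not Lucene's actual SliceExecutor):

```java
import java.util.List;
import java.util.concurrent.Executor;

// Sketch: no queue-size based backpressure; every slice but the last is
// submitted to the executor, and the last slice runs on the caller thread
// so that thread does useful work instead of only waiting.
public class LastSliceOnCaller {
    static void invokeAll(Executor executor, List<Runnable> slices) {
        for (int i = 0; i < slices.size(); i++) {
            if (i == slices.size() - 1) {
                slices.get(i).run(); // last slice: caller thread
            } else {
                executor.execute(slices.get(i)); // other slices: executor pool
            }
        }
    }
}
```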

I think for now we can split the issue into 2 parts. We can make the change for the 1st one now and follow up with the 2nd one. Thoughts?

  1. Take the LeafSlice[] in constructor to allow for custom slice computation.
  2. Discuss different options to customize SliceExecutor, or whether we want to replace it with some other interface.


javanna commented Jun 14, 2023

Take the LeafSlice[] in constructor to allow for custom slice computation.

Sounds good, I'll happily review that change.

Discuss different options to customize SliceExecutor, or whether we want to replace it with some other interface.

Ok to discussing it. I do think that making things pluggable is a change that's difficult to revert in terms of backwards compatibility, and I think we should put some effort into changing the current behaviour before we add new public abstractions.


atris commented Jun 14, 2023

Take the LeafSlice[] in constructor to allow for custom slice computation.

Sounds good, I'll happily review that change.

Discuss different options to customize SliceExecutor, or whether we want to replace it with some other interface.

Ok to discussing it. I do think that making things pluggable is a change that's difficult to revert in terms of backwards compatibility, and I think we should put some effort into changing the current behaviour before we add new public abstractions.

Strong -1 to replacing the interface. I think it has worked well for many users for a while, and replacing it would break backward compatibility to serve a specific use case.

I am just catching up on this thread -- why does the current SliceExecutor not work for extension in this case?


sohami commented Jun 14, 2023

@atris To summarize, there are 2 separate functionalities I am looking to add:

  1. Custom slice computation which the extension can provide. For this we can add a constructor to IndexSearcher which takes a LeafSlice array from the extension. I think there is probably no concern with this.

  2. A mechanism to provide a custom SliceExecutor implementation, or to deprecate it in favor of some other mechanism. I would ideally like to provide a way for extensions to supply a custom implementation. The default implementation takes into account a limiting factor to apply backpressure, which will not be needed in all cases (as shared above); removing it would also simplify reasoning about which slices get executed on which thread pool. So keeping the existing default while giving the flexibility to customize it is what I think will be helpful here. This is still being discussed, and it would be great to hear your feedback as well.


sohami commented Jun 14, 2023

@javanna @atris I have created a PR (#12374) for item 1 above for now.


javanna commented Jun 15, 2023

Strong -1 to replacing the interface. I think it has worked well for many users for a while, and replacing it would break backward compatibility to serve a specific use case.

@atris which interface do you mean? It's a package-private class, so there is no breaking change associated with that. I am talking about improving the default behaviour around concurrent execution of slices, as opposed to making the execution code a public interface. Flexibility can be nice, but it requires consumers to implement their own strategy, while I am hoping that we can come up with a default behaviour that all our users benefit from out of the box.
