Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Facets#getSpecificValues (bulk) and bulk path -> ordinal lookup for taxonomy faceting #12180

Closed
gsmiller opened this issue Mar 2, 2023 · 11 comments

Comments

@gsmiller
Copy link
Contributor

gsmiller commented Mar 2, 2023

Description

LUCENE-9476 added the ability to do bulk ordinal -> path lookups, but we have no bulk lookup in the other direction. I think we can find some efficiency gains if we add bulk lookup for paths, which we could then put behind a Facets#getSpecificValues API for retrieving facet values for multiple paths in bulk. There's some nuance down in the depths of MultiTerms, so maybe there's nothing to be gained here, but I think there ought to be (I'll try to look in more detail later, but wanted to open an issue to track the idea in case I forget). A few thoughts:

  1. We can seek the terms dictionaries in a single direction if we sort the paths before looking them up.
  2. We should be able to reuse postings.
@gsmiller gsmiller changed the title Add bulk path -> ordinal lookup for taxonomy faceting Add Facets#getSpecificValues (bulk) and bulk path -> ordinal lookup for taxonomy faceting Mar 3, 2023
epotyom added a commit to epotyom/lucene that referenced this issue Nov 6, 2023
epotyom added a commit to epotyom/lucene that referenced this issue Nov 7, 2023
epotyom added a commit to epotyom/lucene that referenced this issue Nov 8, 2023
@uschindler
Copy link
Contributor

Hi, the commit causes test failures like this from time to time:

org.apache.lucene.facet.taxonomy.directory.TestDirectoryTaxonomyReader > testGetPathAndOrdinalsRandomMultithreading FAILED
    com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=276, name=Thread-152, state=RUNNABLE, group=TGRP-TestDirectoryTaxonomyReader]

        Caused by:
        java.lang.IllegalArgumentException: bound must be positive
            at __randomizedtesting.SeedInfo.seed([EB36AAC7E6947189]:0)
            at java.base/java.util.Random.nextInt(Random.java:322)
            at randomizedtesting.runner@2.8.1/com.carrotsearch.randomizedtesting.Xoroshiro128PlusRandom.nextInt(Xoroshiro128PlusRandom.java:73)
            at randomizedtesting.runner@2.8.1/com.carrotsearch.randomizedtesting.AssertingRandom.nextInt(AssertingRandom.java:87)
            at org.apache.lucene.facet.taxonomy.directory.TestDirectoryTaxonomyReader.assertGettingPaths(TestDirectoryTaxonomyReader.java:519)
            at org.apache.lucene.facet.taxonomy.directory.TestDirectoryTaxonomyReader$4.run(TestDirectoryTaxonomyReader.java:721)

And hundreds of more threads following. Looks like sometimes the ordinals array is initialized by zero length. The followup random then fail because the upper bound passed to random#nextInt(limit) must be limit>=1.

@uschindler uschindler linked a pull request Nov 9, 2023 that will close this issue
@uschindler uschindler added this to the 9.9.0 milestone Nov 9, 2023
@mikemccand
Copy link
Member

OK fixed @uschindler -- sorry!

@mikemccand
Copy link
Member

And thanks @epotyom!

@gsmiller
Copy link
Contributor Author

gsmiller commented Nov 9, 2023

Thanks @epotyom! Should we consider a follow up PR that leverages this new bulk lookup by adding something like Facets#getSpecificValues that gets facet values for multiple paths at once? What do you think?

@epotyom
Copy link
Contributor

epotyom commented Nov 9, 2023

@gsmiller yes, I'll be working on that now as well as adding benchmark task for getSpecificValues, as was discussed with Mike in #12769 (review) .

@gsmiller
Copy link
Contributor Author

gsmiller commented Nov 9, 2023

@epotyom got it, thanks! Didn't see that earlier conversation.

@ChrisHegarty
Copy link
Contributor

Hi,

I'd like to ask a clarifying question as part of the 9.9.0 release manager duties, since this currently-unresolved issue is associated with Milestone 9.9.0.

It's clear that progress has been made on this issue, with the addition of the new TaxonomyReader::getBulkOrdinals method. Is more work planned, or should this issue be closed? It's fine either way, I just want to ensure that we are clear about the targeted release, if any.

@mikemccand
Copy link
Member

Thanks @ChrisHegarty -- this one is done.

epotyom added a commit to epotyom/lucene that referenced this issue Nov 30, 2023
epotyom added a commit to epotyom/lucene that referenced this issue Nov 30, 2023
@epotyom
Copy link
Contributor

epotyom commented Nov 30, 2023

@mikemccand , @ChrisHegarty ,

Sorry for confusion, but this issue is not fully done yet. This issue includes:

  1. [done] Add TaxonomyReader#getBulkOrdinals method (Add TaxonomyReader#getBulkOrdinals method (#12180) #12769)
  2. [in review now] Add Facets#getBulkSpecificValues method which uses getBulkOrdinals to get multiple specific values at once. (Add Facets#getBulkSpecificValues method #12862)
  3. [not started yet] Adding performance test task for the new APIs

@mikemccand
Copy link
Member

Sorry for confusion, but this issue is not fully done yet

Ahh, thanks @epotyom. Since this issue was partially merged and released in 9.9.0, can you open a new issue for the followon tasks?

[not started yet] Adding performance test task for the new APIs

This would be great to have, but should not block step 2 -- we are free to introduce new APIs without fully knowing their performance characteristics yet.

epotyom added a commit to epotyom/lucene that referenced this issue Dec 8, 2023
@epotyom
Copy link
Contributor

epotyom commented Dec 11, 2023

@mikemccand ,

can you open a new issue for the followon tasks?

Opened #12919

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants