Highlighting the actual state observed in LUCENE-9328 #1625

Closed
wants to merge 2 commits into from

mkhludnev
Member

Hi, @ctargett. I'm not sure whether it's valid to strike this word through, or whether it's better to just drop it?

@@ -24,7 +24,7 @@ The standard way that Solr builds the index is with an _inverted index_. This st

For other features that we now commonly associate with search, such as sorting, faceting, and highlighting, this approach is not very efficient. The faceting engine, for example, must look up each term that appears in each document that will make up the result set and pull the document IDs in order to build the facet list. In Solr, this is maintained in memory, and can be slow to load (depending on the number of documents, terms, etc.).

-In Lucene 4.0, a new approach was introduced. DocValue fields are now column-oriented fields with a document-to-value mapping built at index time. This approach promises to relieve some of the memory requirements of the fieldCache and make lookups for faceting, sorting, and grouping much faster.
+In Lucene 4.0, a new approach was introduced. DocValue fields are now column-oriented fields with a document-to-value mapping built at index time. This approach promises to relieve some of the memory requirements of the fieldCache and make lookups for faceting, sorting much faster.
Contributor

If you're going to take a word out of the list and leave only two items, you should remove the comma and add "and" instead: "...make lookups for faceting and sorting much faster."

@ctargett
Contributor

I commented with a specific recommendation on the change, but in general I wonder if it's worth it. LUCENE-9328 is marked as an Improvement, which would imply to me that we never should have expected grouping to be faster with docValues. If that's the case, then the docs have been incorrect and this change makes sense.

However, if grouping was correctly documented as a specific thing that should be faster but isn't today because of a regression, then the Jira should be a Bug and this change makes less sense, because we don't document every single bug in Solr or Lucene in the Ref Guide. We'd have hundreds of tiny edits as things break and get fixed, and we'd miss a lot of them. There are obviously differences of degree here. If SSL totally broke for an entire release, we'd probably want to document that, but grouping being slower for a release or two (if that's the case; I didn't study the Jiras, so maybe it's been longer) is a less pressing edit IMO.

A related point is that if this has been wrong all along but is now going to be supported in the upcoming release (not sure the timing of LUCENE-9328), making this change now means you'll have to add the word back in another commit before release and it wouldn't be worth removing it now.

@mkhludnev
Member Author

it wouldn't be worth removing it now.

OK, that's fair. Thanks, @ctargett.
