Add inter segment tests for text search and fix bug for Lucene query parser creation #5226

Merged: siddharthteotia merged 2 commits into apache:master on Apr 9, 2020

siddharthteotia
Contributor

Add inter segment tests for text search and fix a bug in Lucene query parser creation: the parser should be instantiated per query (search expression). Also added text search tests for the SQL path.

The parser is JavaCC based, so it has to be created per query.

Contributor

Jackie-Jiang left a comment

LGTM otherwise

@@ -133,7 +131,8 @@ public MutableRoaringBitmap getDocIds(Object value) {
     MutableRoaringBitmap docIds = new MutableRoaringBitmap();
     Collector docIDCollector = new LuceneDocIdCollector(docIds, _docIdTranslator);
     try {
-      Query query = _queryParser.parse(searchQuery);
+      QueryParser parser = new QueryParser(_column, new StandardAnalyzer());
Contributor

Is there any way to reuse some components here? Will this cause too much garbage?
Please also add some comments so that future developers know that it cannot be reused.

Contributor Author

The parser is JavaCC based, so it is stateful, and we have to instantiate it per query (reusing a single instance was the bug). The analyzer, on the other hand, seems stateless, so I reverted that part to instantiate it just once.
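
For future readers, here is a minimal sketch of that pattern that runs without Lucene on the classpath. Everything in it is a hypothetical stand-in: `SketchAnalyzer`, `SketchParser`, and `SketchTextIndexReader` are not Lucene or Pinot classes. The sketch only assumes that the parser mutates its own fields while parsing and that the analyzer holds no per-query state.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Sketch only: every class here is a hypothetical stand-in, not a Lucene or Pinot class. */
final class PerQueryParserSketch {

  /** Stand-in for a stateless analyzer: one shared instance is enough. */
  static final class SketchAnalyzer {
    String normalize(String token) {
      return token.toLowerCase(Locale.ROOT);
    }
  }

  /**
   * Stand-in for a generated parser: parse() mutates instance fields,
   * so one instance must not be shared by concurrent queries.
   */
  static final class SketchParser {
    private final String _column;
    private final SketchAnalyzer _analyzer;
    private String[] _tokens; // mutable parse state
    private int _position;    // mutable parse state

    SketchParser(String column, SketchAnalyzer analyzer) {
      _column = column;
      _analyzer = analyzer;
    }

    List<String> parse(String searchQuery) {
      _tokens = searchQuery.trim().split("\\s+");
      _position = 0;
      List<String> terms = new ArrayList<>();
      while (_position < _tokens.length) {
        terms.add(_column + ":" + _analyzer.normalize(_tokens[_position]));
        _position++;
      }
      return terms;
    }
  }

  /** Stand-in for the text index reader that serves search expressions. */
  static final class SketchTextIndexReader {
    private final String _column;
    // Created once and reused, on the assumption that it is stateless.
    private final SketchAnalyzer _analyzer = new SketchAnalyzer();

    SketchTextIndexReader(String column) {
      _column = column;
    }

    List<String> getTerms(String searchQuery) {
      // The parser is stateful, so build a new one per search expression
      // instead of keeping it in a field.
      SketchParser parser = new SketchParser(_column, _analyzer);
      return parser.parse(searchQuery);
    }
  }

  public static void main(String[] args) {
    SketchTextIndexReader reader = new SketchTextIndexReader("skills");
    System.out.println(reader.getTerms("Machine Learning"));
    System.out.println(reader.getTerms("Java"));
  }
}
```

The per-call allocation is the trade-off raised in the review: the parser object is short-lived garbage, while the analyzer is kept as a field.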

Contributor Author

siddharthteotia commented Apr 9, 2020

Ran the benchmark again locally and compared the numbers with the previous run (from when the doc id cache optimizations were implemented in #5199). No difference.

siddharthteotia merged commit 62a3e54 into apache:master on Apr 9, 2020