New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
OAK-9783: useInExcerpt support #578
Conversation
…-oak into OAK-9783 * 'trunk' of https://github.com/AngelaFabregues/jackrabbit-oak: OAK-9779 : PermissionConstants.PERMISSION_PROPERTY_NAMES does not list rep:isAllow OAK-9686: replace Elastic RHLC with Elastic Java Client (apache#568) (Patch submitted by Angela Fabregues + Fabrizio Fortinio)
…-oak into OAK-9783 * 'trunk' of https://github.com/AngelaFabregues/jackrabbit-oak: OAK-9778 : Improve exception message with OakAccessControl0013 regarding ACE duplicates
This is still a draft. Better merge it once all Elastic excerpt tests are passed. |
…-oak into OAK-9783 * 'trunk' of https://github.com/AngelaFabregues/jackrabbit-oak: (27 commits) releng: use newest maven-fluido-skin 1.11.0 (apache#591) OAK-9798 : Inconsistent handling of supported permissions in CompositePermissionProviderOr ElasticResultRowAsyncIterator should not limit result set to 10k results (apache#588) OAK-9795 : Best practices: explicitly discourage ac setup for anonymous OAK-9793 : AbstractRestrictionProvider: validation to respect aggregation for unsupported paths OAK-9791 : Missing check for restriction node being present OAK-301: Document Oak - improved examples and formatting OAK-9757 : Override DocumentStore.getNodeNameLimit in wrapper classes Revert "OAK-9757 : used MongoStatus.isVersion api to check node name limit" OAK-9757 : used MongoStatus.isVersion api to check node name limit OAK-9757 : removed un-necessary junit OAK-9757 : move isNodeNameLong check to addNode api of commitbuilder OAK-9757 : fixed issue where size limit would come wrong for version less than 4.2.0 OAK-9757 : fixed mongo util errors OAK-9757 : removed mongoversion class to use serverversion and changed documentstore api name OAK-9757 : moved the node name length check to documentnodestate while creating/adding new node OAK-9757 : created new api in documentstore to get size limit for node name OAK-9757 : increased node name limit for mongo 4.2 version OAK-9757: increased node name limit for mongo 4.2 version OAK-9757 : removed un-necessary junit ...
…-oak into OAK-9783 * 'trunk' of https://github.com/AngelaFabregues/jackrabbit-oak: OAK-9798 : fix import in CompositePermissionProviderOrTest. accidential use of sun.reflect.generics.reflectiveObjects.NotImplementedException
…to keep it coherence with the rest of properties in the class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I reviewed it partially. I think we should simplify the PR (see my comment ElasticDocument). I will complete the review after discussing that point.
oak-search/src/test/java/org/apache/jackrabbit/oak/plugins/index/ExcerptTest.java
Outdated
Show resolved
Hide resolved
oak-search/src/test/java/org/apache/jackrabbit/oak/plugins/index/ExcerptTest.java
Outdated
Show resolved
Hide resolved
oak-search/src/test/java/org/apache/jackrabbit/oak/plugins/index/ExcerptTest.java
Outdated
Show resolved
Hide resolved
oak-search/src/test/java/org/apache/jackrabbit/oak/plugins/index/ExcerptTest.java
Outdated
Show resolved
Hide resolved
oak-search/src/test/java/org/apache/jackrabbit/oak/plugins/index/ExcerptTest.java
Outdated
Show resolved
Hide resolved
...tic/src/main/java/org/apache/jackrabbit/oak/plugins/index/elastic/index/ElasticDocument.java
Outdated
Show resolved
Hide resolved
...tic/src/main/java/org/apache/jackrabbit/oak/plugins/index/elastic/index/ElasticDocument.java
Outdated
Show resolved
Hide resolved
...g/apache/jackrabbit/oak/plugins/index/elastic/query/async/ElasticResultRowAsyncIterator.java
Outdated
Show resolved
Hide resolved
...g/apache/jackrabbit/oak/plugins/index/elastic/query/async/ElasticResultRowAsyncIterator.java
Outdated
Show resolved
Hide resolved
...lucene/src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/LuceneDocumentMaker.java
Outdated
Show resolved
Hide resolved
…Definiton to avoid deprecation steps
…ex/ExcerptTest.java Co-authored-by: Fabrizio Fortino <fabrizio.fortino@gmail.com>
…gins/index/elastic/query/async/ElasticResultRowAsyncIterator.java Co-authored-by: Fabrizio Fortino <fabrizio.fortino@gmail.com>
…gins/index/elastic/query/async/ElasticResultRowAsyncIterator.java Co-authored-by: Fabrizio Fortino <fabrizio.fortino@gmail.com>
…bit-oak into OAK-9783 * 'OAK-9783' of https://github.com/AngelaFabregues/jackrabbit-oak: Update oak-search-elastic/src/main/java/org/apache/jackrabbit/oak/plugins/index/elastic/query/async/ElasticResultRowAsyncIterator.java Update oak-search-elastic/src/main/java/org/apache/jackrabbit/oak/plugins/index/elastic/query/async/ElasticResultRowAsyncIterator.java Update oak-search/src/test/java/org/apache/jackrabbit/oak/plugins/index/ExcerptTest.java
…Definiton to avoid deprecation steps
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A general query here - in case of lucene - we use useInExcerpt property to enable/disable if a value of a given property would be used in excerpts (since this would increase the index size).
But I couldn't find dependency on this property in this PR - so is it that elastic supports excerpts by default ? And we just need to handle for query time ?
@nit0906 for Elastic, excerpts (aka highlight) get extracted from the source. We have decided to always store the complete source. Disabling it could save some space but it would have some consequences. See |
* OAK-9783 - the ExcerptTest is portable * OAK-9783 - the ExcerptTest is moved to oak-search * OAK-9783 - Excerpt Test ported to Elastic * OAK-9783 - enable the storing of fields in elasticsearch * OAK-9783 - enable access to useInExcerpt in property definition * OAK-9783 - enable the check for excerpts while making the document * OAK-9783 - adding to store in elasticsearch source the properties to useInExcerpts * OAK-9783 - excerpt implementation with a single excerpt value per excerpted field * OAK-9783 - assertEventually for excerpts * OAK-9783 - excerpt tests following full text index common test code. * OAK-9783 - re-formating the code * OAK-9783 - skipping relative excerpt test * OAK-9783 - limiting excerpt fragments to 1 to match the lucene excerpt configuration * OAK-9783 - fix duplicate ingestion of fields in an elastic document. * OAK-9783 - using useInExcerpt instead of stored in PropertyDefiniton to keep it coherence with the rest of properties in the class. * OAK-9783 - using addProperty instead of adding addToStoreInSource. * OAK-9783 - reverting using useInExcerpt instead of stored in PropertyDefiniton to avoid deprecation steps * OAK-9783 - Removing redundant null init before try-catch * Update oak-search/src/test/java/org/apache/jackrabbit/oak/plugins/index/ExcerptTest.java Co-authored-by: Fabrizio Fortino <fabrizio.fortino@gmail.com> * Update oak-search-elastic/src/main/java/org/apache/jackrabbit/oak/plugins/index/elastic/query/async/ElasticResultRowAsyncIterator.java Co-authored-by: Fabrizio Fortino <fabrizio.fortino@gmail.com> * Update oak-search-elastic/src/main/java/org/apache/jackrabbit/oak/plugins/index/elastic/query/async/ElasticResultRowAsyncIterator.java Co-authored-by: Fabrizio Fortino <fabrizio.fortino@gmail.com> * OAK-9783 - Avoid full java package imports, better class by class. * OAK-9783 - Avoid full java package imports, better class by class. * OAK-9783 - Check useInExcerpts as part of the indexProperty() * OAK-9783 - reverting using useInExcerpt instead of stored in PropertyDefiniton to avoid deprecation steps * excerpt: simplified logic, removing support for regexp * excerpt: simplified logic, removing support for regexp * excerpt: simplified logic, removing support for regexp * excerpt: simplified logic, removing support for regexp * excerpt: simplified logic, removing support for regexp * excerpt: improved query logic * doc: useInExcerpt usage * minor improvement in ElasticRequestHandler.highlight Co-authored-by: AngelaFabregues <angela.fabregues@netcentric.biz> Co-authored-by: Fabrizio Fortino <fabrizio.fortino@gmail.com>
Making the ExcerptTest portable from oak-search to the LuceneExcerptTest in oak-lucene.
Porting it to oak-search-elastic.
Enabling the ingestion of fields to use in excerpts into elastic.
Implementing the excerpts:
jackrabbit-oak/oak-search/src/main/java/org/apache/jackrabbit/oak/plugins/index/search/spi/editor/FulltextDocumentMaker.java
Line 147 in a0d1d8c
jackrabbit-oak/oak-lucene/src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/LucenePropertyIndex.java
Line 711 in a87a663
jackrabbit-oak/oak-lucene/src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/LucenePropertyIndex.java
Line 684 in a87a663
Whenever we merge this PR, the fields to be used in excerpts that weren't indexed already will be indexed now using the default dynamic mapping strategy (keyword + text). Consider optimising the mapping property definition before merging.