Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OAK-9783: useInExcerpt support #578

Merged
merged 44 commits into from Jun 30, 2022
Merged

Conversation

AngelaFabregues
Copy link
Contributor

@AngelaFabregues AngelaFabregues commented May 24, 2022

Making the ExcerptTest portable from oak-search to the LuceneExcerptTest in oak-lucene.
Porting it to oak-search-elastic.
Enabling the ingestion of fields to use in excerpts into elastic.
Implementing the excerpts:

Whenever we merge this PR, the fields to be used in excerpts that weren't indexed already will be indexed now using the default dynamic mapping strategy (keyword + text). Consider optimising the mapping property definition before merging.

AngelaFabregues and others added 6 commits May 24, 2022 13:15
…-oak into OAK-9783

* 'trunk' of https://github.com/AngelaFabregues/jackrabbit-oak:
  OAK-9779 : PermissionConstants.PERMISSION_PROPERTY_NAMES does not list rep:isAllow
  OAK-9686: replace Elastic RHLC with Elastic Java Client (apache#568) (Patch submitted by Angela Fabregues + Fabrizio Fortinio)
…-oak into OAK-9783

* 'trunk' of https://github.com/AngelaFabregues/jackrabbit-oak:
  OAK-9778 : Improve exception message with OakAccessControl0013 regarding ACE duplicates
@AngelaFabregues AngelaFabregues marked this pull request as draft May 31, 2022 11:38
@AngelaFabregues
Copy link
Contributor Author

This is still a draft. Better merge it once all Elastic excerpt tests are passed.

…-oak into OAK-9783

* 'trunk' of https://github.com/AngelaFabregues/jackrabbit-oak: (27 commits)
  releng: use newest maven-fluido-skin 1.11.0 (apache#591)
  OAK-9798 : Inconsistent handling of supported permissions in CompositePermissionProviderOr
  ElasticResultRowAsyncIterator should not limit result set to 10k results (apache#588)
  OAK-9795 : Best practices: explicitly discourage ac setup for anonymous
  OAK-9793 : AbstractRestrictionProvider: validation to respect aggregation for unsupported paths
  OAK-9791 : Missing check for restriction node being present
  OAK-301: Document Oak - improved examples and formatting
  OAK-9757 : Override DocumentStore.getNodeNameLimit in wrapper classes
  Revert "OAK-9757 : used MongoStatus.isVersion api to check node name limit"
  OAK-9757 : used MongoStatus.isVersion api to check node name limit
  OAK-9757 : removed un-necessary junit
  OAK-9757 : move isNodeNameLong check to addNode api of commitbuilder
  OAK-9757 : fixed issue where size limit would come wrong for version less than 4.2.0
  OAK-9757 : fixed mongo util errors
  OAK-9757 : removed mongoversion class to use serverversion and changed documentstore api name
  OAK-9757 : moved the node name length check to documentnodestate while creating/adding new node
  OAK-9757 : created new api in documentstore to get size limit for node name
  OAK-9757 : increased node name limit for mongo 4.2 version
  OAK-9757: increased node name limit for mongo 4.2 version
  OAK-9757 : removed un-necessary junit
  ...
…-oak into OAK-9783

* 'trunk' of https://github.com/AngelaFabregues/jackrabbit-oak:
  OAK-9798 : fix import in CompositePermissionProviderOrTest. accidential use of sun.reflect.generics.reflectiveObjects.NotImplementedException
@AngelaFabregues AngelaFabregues marked this pull request as ready for review June 16, 2022 13:26
Copy link
Contributor

@fabriziofortino fabriziofortino left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reviewed it partially. I think we should simplify the PR (see my comment ElasticDocument). I will complete the review after discussing that point.

angelafabreguesv and others added 4 commits June 20, 2022 14:34
…ex/ExcerptTest.java

Co-authored-by: Fabrizio Fortino <fabrizio.fortino@gmail.com>
…gins/index/elastic/query/async/ElasticResultRowAsyncIterator.java

Co-authored-by: Fabrizio Fortino <fabrizio.fortino@gmail.com>
AngelaFabregues and others added 16 commits June 20, 2022 15:08
…gins/index/elastic/query/async/ElasticResultRowAsyncIterator.java

Co-authored-by: Fabrizio Fortino <fabrizio.fortino@gmail.com>
…bit-oak into OAK-9783

* 'OAK-9783' of https://github.com/AngelaFabregues/jackrabbit-oak:
  Update oak-search-elastic/src/main/java/org/apache/jackrabbit/oak/plugins/index/elastic/query/async/ElasticResultRowAsyncIterator.java
  Update oak-search-elastic/src/main/java/org/apache/jackrabbit/oak/plugins/index/elastic/query/async/ElasticResultRowAsyncIterator.java
  Update oak-search/src/test/java/org/apache/jackrabbit/oak/plugins/index/ExcerptTest.java
@fabriziofortino fabriziofortino changed the title Oak 9783 OAK-9783: useInExcerpt support Jun 23, 2022
Copy link
Contributor

@nit0906 nit0906 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A general query here - in case of lucene - we use useInExcerpt property to enable/disable if a value of a given property would be used in excerpts (since this would increase the index size).

But I couldn't find dependency on this property in this PR - so is it that elastic supports excerpts by default ? And we just need to handle for query time ?

@fabriziofortino
Copy link
Contributor

@nit0906 for Elastic, excerpts (aka highlight) get extracted from the source. We have decided to always store the complete source. Disabling it could save some space but it would have some consequences. See Think before disabling the _source field in https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-source-field.html

@fabriziofortino fabriziofortino merged commit e5deed5 into apache:trunk Jun 30, 2022
@fabriziofortino fabriziofortino deleted the OAK-9783 branch June 30, 2022 17:43
rishabhdaim pushed a commit to rishabhdaim/jackrabbit-oak that referenced this pull request Jul 4, 2022
* OAK-9783 - the ExcerptTest is portable

* OAK-9783 - the ExcerptTest is moved to oak-search

* OAK-9783 - Excerpt Test ported to Elastic

* OAK-9783 - enable the storing of fields in elasticsearch

* OAK-9783 - enable access to useInExcerpt in property definition

* OAK-9783 - enable the check for excerpts while making the document

* OAK-9783 - adding to store in elasticsearch source the properties to useInExcerpts

* OAK-9783 - excerpt implementation with a single excerpt value per excerpted field

* OAK-9783 - assertEventually for excerpts

* OAK-9783 - excerpt tests following full text index common test code.

* OAK-9783 - re-formating the code

* OAK-9783 - skipping relative excerpt test

* OAK-9783 - limiting excerpt fragments to 1 to match the lucene excerpt configuration

* OAK-9783 - fix duplicate ingestion of fields in an elastic document.

* OAK-9783 - using useInExcerpt instead of stored in PropertyDefiniton to keep it coherence with the rest of properties in the class.

* OAK-9783 - using addProperty instead of adding addToStoreInSource.

* OAK-9783 - reverting using useInExcerpt instead of stored in PropertyDefiniton to avoid deprecation steps

* OAK-9783 - Removing redundant null init before try-catch

* Update oak-search/src/test/java/org/apache/jackrabbit/oak/plugins/index/ExcerptTest.java

Co-authored-by: Fabrizio Fortino <fabrizio.fortino@gmail.com>

* Update oak-search-elastic/src/main/java/org/apache/jackrabbit/oak/plugins/index/elastic/query/async/ElasticResultRowAsyncIterator.java

Co-authored-by: Fabrizio Fortino <fabrizio.fortino@gmail.com>

* Update oak-search-elastic/src/main/java/org/apache/jackrabbit/oak/plugins/index/elastic/query/async/ElasticResultRowAsyncIterator.java

Co-authored-by: Fabrizio Fortino <fabrizio.fortino@gmail.com>

* OAK-9783 - Avoid full java package imports, better class by class.

* OAK-9783 - Avoid full java package imports, better class by class.

* OAK-9783 - Check useInExcerpts as part of the indexProperty()

* OAK-9783 - reverting using useInExcerpt instead of stored in PropertyDefiniton to avoid deprecation steps

* excerpt: simplified logic, removing support for regexp

* excerpt: simplified logic, removing support for regexp

* excerpt: simplified logic, removing support for regexp

* excerpt: simplified logic, removing support for regexp

* excerpt: simplified logic, removing support for regexp

* excerpt: improved query logic

* doc: useInExcerpt usage

* minor improvement in ElasticRequestHandler.highlight

Co-authored-by: AngelaFabregues <angela.fabregues@netcentric.biz>
Co-authored-by: Fabrizio Fortino <fabrizio.fortino@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants