WIP:SOLR-13129 #549
Closed
19 commits
17972e4
SOLR-13129: added new nested page in Solr ref-guide
8702a33
SOLR-13129: improve nested docs
3058396
SOLR-13129: improve nested queries
7194813
add nested.adoc
d528113
add nested docs to ref-guide index
3340e5b
SOLR-13129: improve nested links on index homepage
9e57dd9
SOLR-13129: change legacy mode to root only and fix index
4c32091
SOLR-13129: fix for PR review
db78d70
SOLR-13129: fix for PR review
dbfb39b
SOLR-13129: fix for PR review
5b61c39
SOLR-13129: add space between bullets in index
b1f8ab1
SOLR-13129: remove redundant indexing nested json
9308248
SOLR-13129: change index links for nested docs
62837af
SOLR-13129: add nested faceting and update schema with example config…
15ef21b
SOLR-13129: PR review changes, and more detailed explanation for quer…
4426018
SOLR-13129: fix nested docs links
9fe2418
SOLR-13129: split nested-documents page into separate indexing and se…
b7a20a7
SOLR-13129: added a brief discussion about how updates to parent-chil…
713fdb7
SOLR-13129: link indexing and searching nested docs
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,152 @@ | ||
| = Indexing Nested Child Documents | ||
| // Licensed to the Apache Software Foundation (ASF) under one | ||
| // or more contributor license agreements. See the NOTICE file | ||
| // distributed with this work for additional information | ||
| // regarding copyright ownership. The ASF licenses this file | ||
| // to you under the Apache License, Version 2.0 (the | ||
| // "License"); you may not use this file except in compliance | ||
| // with the License. You may obtain a copy of the License at | ||
| // | ||
| // http://www.apache.org/licenses/LICENSE-2.0 | ||
| // | ||
| // Unless required by applicable law or agreed to in writing, | ||
| // software distributed under the License is distributed on an | ||
| // "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY | ||
| // KIND, either express or implied. See the License for the | ||
| // specific language governing permissions and limitations | ||
| // under the License. | ||
|
|
||
| Solr supports indexing nested documents for creating stronger bonds and relationships between documents, | ||
| to be used for updates and <<searching-nested-documents.adoc#searching-nested-documents,Searching Nested Documents>>. + | ||
| Nested documents in Solr can be used to bind a blog post parent document and comments as child documents | ||
| -- or products as parent documents and sizes, colors, or other variations as child documents. + | ||
| The parent with all of its children is referred to as a "block", which explains some of the nomenclature of related features. | ||
| At query time, the <<other-parsers.adoc#block-join-query-parsers,Block Join Query Parsers>> can search these relationships, | ||
| and the `<<transforming-result-documents.adoc#child-childdoctransformerfactory,[child]>>` Document Transformer can attach child documents to the result documents. | ||
| In terms of performance, indexing the relationships between documents usually yields much faster queries than an equivalent "query time join", | ||
| since the relationships are already stored in the index and do not need to be computed. | ||
| However, nested documents are less flexible than query time joins, as they impose rules that some applications may not be able to accept. | ||
| Nested documents may be indexed via either the XML or JSON data syntax, and are also supported by <<using-solrj.adoc#using-solrj,SolrJ>> with javabin. | ||
|
|
||
| [NOTE] | ||
| ==== | ||
| A big limitation is that the whole block of parent-children documents must be updated or deleted together, not separately. | ||
| In other words, even if a single child document or the parent document is changed, the whole block of parent-child documents must be indexed together. | ||
| _Solr does not enforce this rule_; if it's violated, you may get sporadic query failures or incorrect results. | ||
| ==== | ||
|
|
||
| == Schema Configuration | ||
|
|
||
| * The schema must include an indexed field named `\_root_`. Its value is populated automatically with the ID of the top document in the nested hierarchy, and is the same for all documents in the block, regardless of the inheritance depth. + | ||
| `<field name="\_root_" type="string" indexed="true" stored="false" docValues="false" />` | ||
| * `\_nest_path_` is used to store the path of the document in the hierarchy. This field is optional. + | ||
| `<fieldType name="\_nest_path_" class="solr.NestPathField" /> | ||
| <field name="\_nest_path_" type="_nest_path_" />` | ||
| * `\_nest_parent_` is used to store the `ID` of the parent in the previous level. This field is optional. + | ||
| `<field name="\_nest_parent_" type="string" indexed="true" stored="true"/>` | ||
| * Nested documents are very much documents in their own right even if certain nested documents hold different information from the parent. | ||
| Therefore: | ||
| ** the schema must be able to represent the fields of any document | ||
| ** it may be infeasible to use `required` | ||
| ** even child documents need a unique `ID` | ||
| * If you associate a child document as a field (e.g., comment), that field need not be defined in the schema, and probably | ||
| shouldn't be as it would be confusing. There is no child document field type. | ||
|
|
||
| == Rudimentary Root-only schemas | ||
|
|
||
| * These schemas do not contain any other nested related fields apart from `\_root_`. + | ||
| In this mode, relationship types (field names) between parents and their children are not saved. + | ||
| In this case, the <<searching-nested-documents.adoc#child-doc-transformer,[child]>> transformer returns all children under the `\_childDocuments_` field. | ||
| * Typically you should have a field that differentiates a root doc from any nested children. However, this isn't strictly necessary, so long as it's possible to write a query that selects only root documents. Such a query is needed for the <<other-parsers.adoc#block-join-query-parsers,block join query parsers>> and the <<searching-nested-documents.adoc#child-doc-transformer,[child]>> doc transformer to function. | ||
|
|
||
| === XML Examples | ||
|
|
||
| For example, here are two documents and their child documents. | ||
| They illustrate two styles of adding child documents: the first is associated via a labelled field (`content` in this example), which is preferred, | ||
| and the second is done in the classic way, now referred to as an "anonymous" or "unlabelled" child document. | ||
| The field label is available to the update request processor (URP) chain in Solr but is ultimately discarded. | ||
| Solr 8 will save the relationship. | ||
|
|
||
| [source,xml] | ||
| ---- | ||
| <add> | ||
| <doc> | ||
| <field name="ID">1</field> | ||
| <field name="title">Solr adds block join support</field> | ||
| <field name="content_type">parentDocument</field> | ||
| <field name="content"> | ||
| <doc> | ||
| <field name="ID">2</field> | ||
| <field name="comments">SolrCloud supports it too!</field> | ||
| </doc> | ||
| </field> | ||
| </doc> | ||
| <doc> | ||
| <field name="ID">3</field> | ||
| <field name="title">New Lucene and Solr release is out</field> | ||
| <field name="content_type">parentDocument</field> | ||
| <doc> | ||
| <field name="ID">4</field> | ||
| <field name="comments">Lots of new features</field> | ||
| </doc> | ||
| </doc> | ||
| </add> | ||
| ---- | ||
|
|
||
| In this example, we have indexed the parent documents with the field `content_type`, which has the value "parentDocument". | ||
| We could have also used a boolean field, such as `isParent`, with a value of "true", or any other similar approach. | ||
|
|
||
| === JSON Examples | ||
|
|
||
| This example is similar to the XML example above, except that the labelled relationship uses the `comments` key and holds two child documents. | ||
| Again, the field-labelled relationship is preferred. | ||
| The labelled relationship here is an array of child documents; a single child document would not need the array brackets. | ||
| For the anonymous relationship, note the special `\_childDocuments_` key, whose contents must be an array of child documents. | ||
|
|
||
| [source,json] | ||
| ---- | ||
| [ | ||
| { | ||
| "ID": "1", | ||
| "title": "Solr adds block join support", | ||
| "content_type": "parentDocument", | ||
| "comments": [{ | ||
| "ID": "2", | ||
| "content": "SolrCloud supports it too!" | ||
| }, | ||
| { | ||
| "ID": "3", | ||
| "content": "New filter syntax" | ||
| } | ||
| ] | ||
| }, | ||
| { | ||
| "ID": "4", | ||
| "title": "New Lucene and Solr release is out", | ||
| "content_type": "parentDocument", | ||
| "_childDocuments_": [ | ||
| { | ||
| "ID": "5", | ||
| "comments": "Lots of new features" | ||
| } | ||
| ] | ||
| } | ||
| ] | ||
| ---- | ||
|
|
||
| .Root-Only Mode | ||
| [NOTE] | ||
| In Root-only schemas, these two documents will result in the same docs being indexed (Root-only schemas do not honor nested relationships). | ||
| When queried, child docs will be appended under the `\_childDocuments_` key. | ||
|
|
||
| == Updating Nested Documents | ||
|
|
||
| Currently Solr supports updating whole hierarchies using atomic updates. Documents should be updated by the Root (top) | ||
| document's ID, and the update should contain all of its children. This is needed because Solr first deletes the old hierarchy | ||
| using the update term `\_root_:id`. If some child documents are omitted from the update command, | ||
| those documents will be deleted from the index. | ||
|
|
||
| .Updating By a Child Document's ID | ||
| [NOTE] | ||
| An update by ID to a child document will index a new document with the same ID as the one in the nested hierarchy, | ||
| yet the new document will not be indexed as a child, but rather as a new document outside of the nested hierarchy. | ||
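The update rules described in the "Updating Nested Documents" section can be sketched with a toy in-memory model. This is only an illustration under stated assumptions, not Solr code: the index is a plain Python list, and `flatten_block`, `update_block`, `ids`, and `CHILD_KEY` are hypothetical names invented for this sketch. It models two behaviours from the text: re-sending a block drops any children missing from the new block, and an update sent with a child's ID lands outside the hierarchy.

```python
from typing import List

CHILD_KEY = "comments"  # hypothetical label for the child relationship


def flatten_block(root: dict) -> List[dict]:
    """Return the children and the root as flat docs sharing one _root_ value."""
    children = root.get(CHILD_KEY, [])
    parent = {k: v for k, v in root.items() if k != CHILD_KEY}
    docs = [dict(child, _root_=root["ID"]) for child in children]
    docs.append(dict(parent, _root_=root["ID"]))
    return docs


def update_block(index: List[dict], root: dict) -> List[dict]:
    """Drop every doc whose _root_ equals the incoming ID, then add the new block."""
    survivors = [doc for doc in index if doc["_root_"] != root["ID"]]
    return survivors + flatten_block(root)


def ids(index: List[dict]) -> List[str]:
    """Sorted document IDs, for easy inspection."""
    return sorted(doc["ID"] for doc in index)


index = update_block([], {
    "ID": "1",
    "title": "Solr adds block join support",
    "comments": [
        {"ID": "2", "content": "SolrCloud supports it too!"},
        {"ID": "3", "content": "New filter syntax"},
    ],
})

# Re-send the block without child 3: the omitted child disappears.
index = update_block(index, {
    "ID": "1",
    "title": "Solr adds block join support",
    "comments": [{"ID": "2", "content": "SolrCloud supports it too!"}],
})
print(ids(index))  # ['1', '2']

# Update by the child's ID: a second doc with ID 2 appears as its own root.
index = update_block(index, {"ID": "2", "content": "edited"})
print(ids(index))  # ['1', '2', '2']
```

In this model the old child with ID 2 survives the last call because its `_root_` value is still `1`, which is why the text recommends always updating by the root document's ID.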
Could we add to this page a brief discussion about how updates to parent-child blocks are handled? Specifically, if I send in an updated block that should effectively delete some child docs because while they are in the index, they are not in the new block, what happens? If I should do a separate delete, we should say that clearly; if the "missing" children will be automatically deleted, we should say that.
Added another sub-heading Updating Nested Documents