Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SOLR-16183: XML Loader: support indexing single nested child document #1448

Merged
merged 9 commits into from Mar 14, 2023

Conversation

vinayakphegde
Copy link
Contributor

https://issues.apache.org/jira/browse/SOLR-16183

Description

The XML Loader always treats labelled nested children as a "multi-valued pseudo-field", whereas the JSON Loader can also index a single child document as well as an array of 1..n child documents.

The XML Loader should have a corresponding syntax for a single labelled nested child.

Solution

support a name attribute on a not-in-a- nested to achieve single labelled nested child.

Tests

created a test called testXMLSingleLabeledNestedChild() in AddBlockUpdateTest to test the single labelled nested child.

Checklist

Please review the following and check all that apply:

  • I have reviewed the guidelines for How to Contribute and my code conforms to the standards described there to the best of my ability.
  • I have created a Jira issue and added the issue ID to my pull request title.
  • I have given Solr maintainers access to contribute to my PR branch. (optional but recommended)
  • I have developed this patch against the main branch.
  • I have run ./gradlew check.
  • I have added tests for my changes.
  • I have added documentation for the Reference Guide

Copy link
Member

@mkhludnev mkhludnev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, I really like it!

@@ -332,6 +333,11 @@ public SolrInputDocument readDoc(XMLStreamReader parser) throws XMLStreamExcepti
} else {
log.debug(message);
}
} else if (NAME.equals(attrName)) {
Copy link
Member

@mkhludnev mkhludnev Mar 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It also allows to specify this attribute for <doc it other context, but ignore it then. I can't say it's a blocker; it slightly changes behavior, and may create wrong expectation for users.. In doubts.. .

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, you are right
we can pass an extra argument called isSingleLabeledChildDoc to the readDoc function
if isSingleLabeledChildDoc is false and still we have the name attribute, then we can log the warning like we are doing now.
what do you think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

if (NAME.equals(attrName)) {
isSingleLabeledChildDoc = true;
doc.addField(attrVal, readDoc(parser));
break;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

another unexpected attribute may hide after the valid name, or perhaps (??) it allows to repeat name

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but we are not concerned about other attributes, right? before this, we were skipping all the attributes
does it make sense to repeat the name attribute? is there any use case for that?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point. You are right.

}
}
if (isSingleLabeledChildDoc) {
isSingleLabeledChildDoc = false;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can't we reduce the scope for this var declaring it at line 426 ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

true will do that.

@noblepaul
Copy link
Contributor

just out of curiosity, do you really use XML for indexing docs?

@vinayakphegde
Copy link
Contributor Author

just out of curiosity, do you really use XML for indexing docs?

I never used it. not sure about other

@vinayakphegde
Copy link
Contributor Author

can I make it a normal PR now?

@mkhludnev
Copy link
Member

can I make it a normal PR now?

Why not? Looks good to me

@vinayakphegde vinayakphegde marked this pull request as ready for review March 13, 2023 14:46
@vinayakphegde
Copy link
Contributor Author

Hey, @mkhludnev Solr Tests check is failing here. How do I re-trigger that? because all the tests were passed locally.
and do I have to add something to CHANGES.txt?

// flag to prevent spaces after doc from being added
isLabeledChildDoc = true;
if (!doc.containsKey(name)) {
doc.setField(name, Lists.newArrayList());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is the slight loose in logic, but I think it's ok.

}
}

StringBuilder text = new StringBuilder();
String name = null;
String currentFieldName = null;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couldn't resist from make it more descriptive.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure :)

@@ -34,6 +34,20 @@ Other Changes
Previously, the modules would come transitively.
(David Smiley)

================== 9.3.0 ==================
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ain't sure. Why so?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @mkhludnev I don't understand the question here
can you elaborate a bit?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I expected === 9.3.0 === header occurrence in main. @vinayakphegde it should be fine.

@epugh
Copy link
Contributor

epugh commented Mar 14, 2023

I love that we're getting these small differences sorted out!

// flag to prevent spaces after doc from being added
isLabeledChildDoc = true;
if (!doc.containsKey(name)) {
doc.setField(name, Lists.newArrayList());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's caused too much hard, turn back to wrapping.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you want me to revert the changes that you added now?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hold on. I'm on it.

@mkhludnev mkhludnev merged commit 6bd9e2a into apache:main Mar 14, 2023
1 of 2 checks passed
@mkhludnev
Copy link
Member

Merged. Thanks @vinayakphegde .
Would you mind to create cherry-pick PR into branch_9x?

@vinayakphegde
Copy link
Contributor Author

Merged. Thanks @vinayakphegde . Would you mind to create cherry-pick PR into branch_9x?

sure

mkhludnev added a commit to mkhludnev/solr that referenced this pull request Mar 15, 2023
…apache#1448)

* support indexing single nested child document

---------

Co-authored-by: Mikhail Khludnev <mkhl@apache.org>
mkhludnev added a commit that referenced this pull request Mar 15, 2023
…#1448) (#1462)

* support indexing single nested child document

---------

Co-authored-by: vinayak hegde <vinayakph123@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants