Solr index processor doesn't parse the attributes under the otherEntity element of an EML object #1361

Closed
taojing2002 opened this issue Jun 19, 2019 · 6 comments

@taojing2002
Contributor

Eric from GRIL reported that the attributes of some EML objects can't be indexed. It turns out that those attributes are under the otherEntity element, while our processor only parses the attributes under the dataTable element.

@taojing2002
Contributor Author

Here is the current XPath for attributeName:

//dataTable/attributeList/attribute/attributeName/text()

Chris suggests it could be:

//attributeList/attribute/attributeName/text()
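
To illustrate the difference between the two paths, here is a minimal sketch using Python's standard-library ElementTree. The sample document is a simplified, hypothetical EML fragment (no namespaces, made-up attribute names), and the paths are written in ElementTree's limited XPath dialect (leading `.`, no `text()`), so this is not how the index processor itself evaluates them.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified EML fragment: one attribute under
# dataTable and one under otherEntity.
SAMPLE_EML = """\
<dataset>
  <dataTable>
    <attributeList>
      <attribute><attributeName>depth</attributeName></attribute>
    </attributeList>
  </dataTable>
  <otherEntity>
    <attributeList>
      <attribute><attributeName>salinity</attributeName></attribute>
    </attributeList>
  </otherEntity>
</dataset>
"""


def attribute_names(root, path):
    """Return the text of every element matched by an ElementTree path."""
    return [el.text for el in root.findall(path)]


root = ET.fromstring(SAMPLE_EML)

# Current path: only attributes nested under dataTable are matched.
current = attribute_names(root, ".//dataTable/attributeList/attribute/attributeName")

# Suggested path: any attributeList matches, whatever its parent entity is.
proposed = attribute_names(root, ".//attributeList/attribute/attributeName")

print(current)   # ['depth']
print(proposed)  # ['depth', 'salinity']
```

Note that the broader path also matches an attributeList under any other entity type, not just dataTable and otherEntity.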

We currently have this XPath for attributeUnit:

//dataTable//standardUnit/text() | //dataTable//customUnit/text()

I propose changing it to:

//attributeList/attribute//standardUnit/text() | //attributeList/attribute//customUnit/text()
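
A rough sketch of the unit paths, again with standard-library ElementTree and a hypothetical simplified fragment. ElementTree has no `|` union operator, so the helper below (an illustrative name, not part of the indexer) runs each branch separately and concatenates the results. A real XPath union returns nodes in document order, so ordering can differ from this emulation.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: a standardUnit under dataTable and a customUnit
# under otherEntity, both nested below attributeList/attribute.
SAMPLE_EML = """\
<dataset>
  <dataTable>
    <attributeList>
      <attribute>
        <attributeName>depth</attributeName>
        <measurementScale><ratio><unit>
          <standardUnit>meter</standardUnit>
        </unit></ratio></measurementScale>
      </attribute>
    </attributeList>
  </dataTable>
  <otherEntity>
    <attributeList>
      <attribute>
        <attributeName>salinity</attributeName>
        <measurementScale><ratio><unit>
          <customUnit>psu</customUnit>
        </unit></ratio></measurementScale>
      </attribute>
    </attributeList>
  </otherEntity>
</dataset>
"""


def union_text(root, paths):
    """Emulate an XPath union by evaluating each path and concatenating."""
    values = []
    for path in paths:
        values.extend(el.text for el in root.findall(path))
    return values


root = ET.fromstring(SAMPLE_EML)

current_units = union_text(
    root, [".//dataTable//standardUnit", ".//dataTable//customUnit"]
)
proposed_units = union_text(
    root,
    [
        ".//attributeList/attribute//standardUnit",
        ".//attributeList/attribute//customUnit",
    ],
)

print(current_units)   # ['meter']
print(proposed_units)  # ['meter', 'psu']
```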

@taojing2002
Contributor Author

The full set of fields I think should be modified is:

eml.attributeName //dataTable/attributeList/attribute/attributeName/text()
eml.attributeLabel //dataTable/attributeList/attribute/attributeLabel/text()
eml.attributeDescription //dataTable/attributeList/attribute/attributeDefinition/text()
eml.attributeUnit //dataTable//standardUnit/text() | //dataTable//customUnit/text()
eml.attributeTextRoot //dataTable/attributeList/attribute
eml.attributeName.noDupe //dataTable/attributeList/attribute/attributeName/text()
eml.attributeLabel.noDupe //dataTable/attributeList/attribute/attributeLabel/text()
eml.attributeDescription.noDupe //dataTable/attributeList/attribute/attributeDefinition/text()
eml.attributeUnit.noDupe //dataTable//standardUnit/text() | //dataTable//customUnit/text()

The new values I propose are:

eml.attributeName //attributeList/attribute/attributeName/text()
eml.attributeLabel //attributeList/attribute/attributeLabel/text()
eml.attributeDescription //attributeList/attribute/attributeDefinition/text()
eml.attributeUnit //attributeList/attribute//standardUnit/text() | //attributeList/attribute//customUnit/text()
eml.attributeTextRoot //attributeList/attribute
eml.attributeName.noDupe //attributeList/attribute/attributeName/text()
eml.attributeLabel.noDupe //attributeList/attribute/attributeLabel/text()
eml.attributeDescription.noDupe //attributeList/attribute/attributeDefinition/text()
eml.attributeUnit.noDupe //attributeList/attribute//standardUnit/text() | //attributeList/attribute//customUnit/text()

Please review the change (particularly eml.attributeTextRoot, eml.attributeUnit, and eml.attributeUnit.noDupe).
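
One possible way to review the change is to run the current and proposed paths side by side on a small document. The sketch below does that for the name, label, description, and unit fields. It assumes a hypothetical simplified EML fragment, uses ElementTree's XPath subset in place of the indexer's own evaluation, and represents each `|` union as a list of paths. `FIELD_PATHS`, `extract`, and `report` are illustrative names only. The .noDupe fields and eml.attributeTextRoot are left out because their handling by the processor is not described in this thread.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment with one attribute per entity type.
SAMPLE_EML = """\
<dataset>
  <dataTable>
    <attributeList>
      <attribute>
        <attributeName>depth</attributeName>
        <attributeLabel>Depth</attributeLabel>
        <attributeDefinition>Water depth</attributeDefinition>
        <measurementScale><ratio><unit>
          <standardUnit>meter</standardUnit>
        </unit></ratio></measurementScale>
      </attribute>
    </attributeList>
  </dataTable>
  <otherEntity>
    <attributeList>
      <attribute>
        <attributeName>salinity</attributeName>
        <attributeLabel>Salinity</attributeLabel>
        <attributeDefinition>Practical salinity</attributeDefinition>
        <measurementScale><ratio><unit>
          <customUnit>psu</customUnit>
        </unit></ratio></measurementScale>
      </attribute>
    </attributeList>
  </otherEntity>
</dataset>
"""

OLD = ".//dataTable/attributeList/attribute/"
NEW = ".//attributeList/attribute/"

# field -> (current paths, proposed paths); a list stands in for a union.
FIELD_PATHS = {
    "eml.attributeName": ([OLD + "attributeName"], [NEW + "attributeName"]),
    "eml.attributeLabel": ([OLD + "attributeLabel"], [NEW + "attributeLabel"]),
    "eml.attributeDescription": (
        [OLD + "attributeDefinition"],
        [NEW + "attributeDefinition"],
    ),
    "eml.attributeUnit": (
        [".//dataTable//standardUnit", ".//dataTable//customUnit"],
        [NEW + "/standardUnit", NEW + "/customUnit"],
    ),
}


def extract(root, paths):
    """Collect the text of all elements matched by any of the given paths."""
    values = []
    for path in paths:
        values.extend(el.text for el in root.findall(path))
    return values


root = ET.fromstring(SAMPLE_EML)
report = {
    field: (extract(root, old), extract(root, new))
    for field, (old, new) in FIELD_PATHS.items()
}

for field, (before, after) in report.items():
    print(f"{field}: {before} -> {after}")
```

On this sample, each proposed path returns the dataTable value plus the otherEntity value that the current path misses.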

The full list of fields can be found at:
https://repository.dataone.org/software/cicore/trunk/cn-buildout/dataone-cn-index/usr/share/dataone-cn-index/debian/index-generation-context/application-context-eml-base.xml

@csjx
Member

csjx commented Sep 10, 2019

Hi @taojing2002 - This looks correct to me. Thanks for evaluating the XPaths.

@taojing2002
Contributor Author

@csjx Thanks for reviewing.

@datadavev

Looks correct to me as well.

@taojing2002
Contributor Author

@datadavev thanks!
