Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Empty setName elements are regarded errors #15

Closed
cessda-bitbucket-importer opened this issue Oct 15, 2021 · 10 comments
Closed

Empty setName elements are regarded errors #15

cessda-bitbucket-importer opened this issue Oct 15, 2021 · 10 comments
Assignees
Labels
bug Something isn't working major
Milestone

Comments

@cessda-bitbucket-importer
Copy link
Contributor

Original report on BitBucket by Toni Sissala (GitHub: toni-sissala).


#13

The https://validator.oaipmh.com/ considers emtpy setName elements in ListSets response as errors. This is not clearly stated in the OAI-PMH specifications at http://www.openarchives.org/OAI/2.0/openarchivesprotocol.htm

There are two issues regarding the Aggregator: language and source -sets. The language-set comes from Kuha2 and could be fixed in kuha_oai_pmh_repo_handler.

The language sets

<set>
<setSpec>language:de</setSpec>
<setName/>
</set>

Can use programmatically generated setName in following way:

<set>
<setSpec>language:de</setSpec>
<setName>Language de</setName>
</set>

Or can be fully discarded in the Aggregator since they were not required by the specs. How should I proceed?

The source sets

<set>
<setSpec>source:127.0.0.1</setSpec>
<setName/>
</set>

Are specific to the aggregator and are configurable via yaml file. Example yaml file https://github.com/cessda/cessda.cdc.aggregator.oai-pmh-repo-handler/blob/main/sources_default.yaml

The source setSpec in the above example comes automatically from the source OAI-envelope (see https://github.com/cessda/cessda.cdc.aggregator.oai-pmh-repo-handler/blob/main/cdcagg_oai/metadataformats.py#lines-248). This is a fallback for sources which do not have a set description in the yaml file. The fallback functionality may not be desirable by CESSDA,. at least it was not in the specs. In either case, the yaml file must support specifying a setName for each set.

I will remove the fallback functionality for sources and develop functionality which will read setName information from the YAML file.

@cessda-bitbucket-importer
Copy link
Contributor Author

Original comment by John Shepherdson (GitHub: john-shepherdson).


I am not aware of any use case for the language sets, but @schildwaechter may be.

@cessda-bitbucket-importer
Copy link
Contributor Author

Original comment by Carsten Thiel (GitHub: schildwaechter).


No current use for the language sets, but if it’s already there, I prefer the workaround of specifying a setName as I can imagine interest in this.

One might consider putting the full text language in English, i.e. “German” in the example, but not sure it’s a good idea and/or worth the effort.

On the sources, I agree with dropping fallback as that would be implicit.

@cessda-bitbucket-importer
Copy link
Contributor Author

Original comment by Toni Sissala (GitHub: toni-sissala).


I created a PR addressing the issues described here. The PR is at [link to pull request removed](link to pull request removed)

I also removed the default mapping profile and changed the default value for configuring it’s path to None. This means that in order to activate source-sets, you need to create a mapping file and configure it using --oai-set-sources-path or env-var CDCAGG_OPRH_OS_SOURCES_PATH.

The described approach means that mapping of aggregator sources to sets and configuring a mapping file path is decoupled from application source code. Same approach is used for the configurable OAI set definitions (arbitrary OAI-sets) which was also one of the specified requirements for the aggregator.

There is an example mapping file at https://github.com/cessda/cessda.cdc.aggregator.oai-pmh-repo-handler/blob/9cfc2d2be9007f5849211b1328846d68790149bb/sources_set.yaml.example

I can of course help in creating the mapping file and configuring it, but I think it should be decoupled from the application source code and default configs.

Any thoughts? @schildwaechter

@cessda-bitbucket-importer
Copy link
Contributor Author

Original comment by John Shepherdson (GitHub: john-shepherdson).


@toni-sissala We are working to extract the (main) configuration from the code for the other components in the CDC pipeline. The first step is to identify those configuration settings (uses, permitted values, current locations). See https://docs.google.com/document/d/1_hBg590ARtruDfx31UOdHZ36zlNrOfRBoc4eq93maTI/edit

I would be grateful if you could add any relevant information for the aggregator.

@cessda-bitbucket-importer
Copy link
Contributor Author

Original comment by Toni Sissala (GitHub: toni-sissala).


@john-shepherdson I will add information regarding the aggregator. I don’t have access rights to the document. Can you grant my google account email address removed access?

@cessda-bitbucket-importer
Copy link
Contributor Author

Original comment by John Shepherdson (GitHub: john-shepherdson).


Access granted

@cessda-bitbucket-importer
Copy link
Contributor Author

Original comment by Toni Sissala (GitHub: toni-sissala).


Thanks, John!

@cessda-bitbucket-importer
Copy link
Contributor Author

Original comment by Toni Sissala (GitHub: toni-sissala).


@john-shepherdson I marked parameters that are common in aggregator and other components. The aggregator has mandatory config parameters that must be set, but they are not common with other components' parameters listed in the document, so I did not add those to the list. I will provide more details regarding mandatory and useful configuration parameters in the documentation of the aggregator.

I hope I understood correctly.

@cessda-bitbucket-importer
Copy link
Contributor Author

Original comment by Toni Sissala (GitHub: toni-sissala).


The PR is accepted and merged, and discussion here is silent. I’m resolving this issue.

@cessda-bitbucket-importer
Copy link
Contributor Author

Original comment by John Shepherdson (GitHub: john-shepherdson).


No further action required

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working major
Projects
None yet
Development

No branches or pull requests

2 participants