OAI-PMH Filegetter only works for DC #505

Open
bondjimbond opened this issue Jul 2, 2019 · 11 comments

Comments

@bondjimbond
Collaborator

I want to extract objects from a repository using the MODS metadataPrefix, but I'm finding that I can't get the files.

It turns out that src/filegetters/OaipmhXpath.php only works when the metadataPrefix is DC:

        // Parse out the dc:identifier whose value starts with 'http'.
        $dom = new \DOMDocument;
        $xml = file_get_contents($raw_metadata_path);
        $dom->loadXML($xml);
        $xpath = new \DOMXPath($dom);
        $xpath->registerNamespace('oai_dc', 'http://www.openarchives.org/OAI/2.0/oai_dc/');
        $xpath->registerNamespace('dc', 'http://purl.org/dc/elements/1.1/');
        $download_url_elements = $xpath->query($this->xpathExpression);

We need to either make this file work for multiple metadataPrefix choices, or have a separate fileGetter for MODS.
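As a rough sketch of the first option, the namespace registrations could be driven by a map of prefixes instead of being hard-coded for Dublin Core. The `findDownloadUrls()` helper and the `$namespaces` array below are hypothetical names used only for illustration and are not part of MIK; the sample record is trimmed down from the one returned by the repository.

```php
<?php
// Hypothetical helper (not MIK code): register every prefix the configured
// XPath expression might use, then return the text of the matching nodes.
function findDownloadUrls(string $xml, string $expression, array $namespaces): array
{
    $dom = new DOMDocument();
    $dom->loadXML($xml);
    $xpath = new DOMXPath($dom);
    foreach ($namespaces as $prefix => $uri) {
        $xpath->registerNamespace($prefix, $uri);
    }
    $urls = [];
    foreach ($xpath->query($expression) as $node) {
        $urls[] = trim($node->nodeValue);
    }
    return $urls;
}

// Assumed prefix map: the two DC namespaces from the existing filegetter,
// plus the MODS namespace declared in the harvested record.
$namespaces = [
    'oai_dc' => 'http://www.openarchives.org/OAI/2.0/oai_dc/',
    'dc' => 'http://purl.org/dc/elements/1.1/',
    'mods' => 'http://www.loc.gov/mods/v3',
];

$modsRecord = <<<XML
<record xmlns="http://www.openarchives.org/OAI/2.0/">
  <metadata>
    <mods:mods xmlns:mods="http://www.loc.gov/mods/v3">
      <mods:identifier type="uri">http://hdl.handle.net/11205/98</mods:identifier>
      <mods:objectIdentifierValue>http://mruir.mtroyal.ca/xmlui/bitstream/11205/98/1/Human+Rights+Software.pdf</mods:objectIdentifierValue>
    </mods:mods>
  </metadata>
</record>
XML;

$urls = findDownloadUrls($modsRecord, '//mods:objectIdentifierValue', $namespaces);
print_r($urls);
```

With a map like this, the same filegetter could serve both prefixes, and the XPath expression in the .ini file would decide which element is read.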

@mjordan
Collaborator

mjordan commented Jul 2, 2019

@bondjimbond there is https://github.com/MarcusBarnes/mik/blob/master/src/metadataparsers/mods/OaiToMods.php. If you use

[METADATA_PARSER]
class = mods\OaiToMods

in conjunction with

[FETCHER]
metadata_prefix = mods

(or whatever the correct metadataPrefix value is) what happens?

@bondjimbond
Collaborator Author

[2019-07-02 18:47:56] ErrorException.ERROR: ErrorException {"message":"DOMXPath::query(): Undefined namespace prefix","code":{"record_key":"oai%3Amruir.mtroyal.ca%3A11205%2F98","raw_metadata_path":"/Volumes/Arca/tmp/oaitest_temp/oai%3Amruir.mtroyal.ca%3A11205%2F98.metadata","dom":"[object] (DOMDocument: {})","xml":"<record xmlns=\"http://www.openarchives.org/OAI/2.0/\">\n            <header>\n                <identifier>oai:mruir.mtroyal.ca:11205/98</identifier>\n                <datestamp>2015-06-08T16:02:09Z</datestamp>\n                <setSpec>com_11205_20</setSpec>\n                <setSpec>com_11205_12</setSpec>\n                <setSpec>col_11205_43</setSpec>\n            </header>\n            <metadata><mods:mods xmlns:mods=\"http://www.loc.gov/mods/v3\" xmlns:doc=\"http://www.lyncode.com/xoai\" xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\" xsi:schemaLocation=\"http://www.loc.gov/mods/v3 http://www.loc.gov/standards/mods/v3/mods-3-1.xsd\">\n<mods:name>\n<mods:role>\n<mods:roleTerm type=\"text\">author</mods:roleTerm>\n</mods:role>\n<mods:namePart>Hayman, Richard</mods:namePart>\n</mods:name>\n<mods:extension>\n<mods:dateAccessioned encoding=\"iso8601\">2014-02-13T20:44:02Z</mods:dateAccessioned>\n</mods:extension>\n<mods:extension>\n<mods:dateAvailable encoding=\"iso8601\"/>\n</mods:extension>\n<mods:originInfo>\n<mods:dateIssued encoding=\"iso8601\">2009</mods:dateIssued>\n</mods:originInfo>\n<mods:identifier type=\"citation\">Hayman, R. (2009). Human rights software: Information support solutions for social justice. Information for Social Change, 29, 44-67.</mods:identifier>\n<mods:identifier type=\"issn\">1756-901X</mods:identifier>\n<mods:identifier type=\"uri\">http://hdl.handle.net/11205/98</mods:identifier>\n<mods:abstract>Human rights centres and non-governmental organizations (NGOs) have crucial information support needs, many of which can be met by the existing and ongoing development of information technology software applications. 
For communication and Internet use, the psiphon program allows for secure and anonymous information exchange and distribution, including firewall circumvention. For data collection, organization, encryption, and storage, Martus software can be deployed to help protect sensitive information and identities. Based on documented projects and websites, the following research examines these emancipatory tools to determine: the technologies in use, emergent, and under development; their possible usage in the critical arenas under discussion; and, the greater effects of these technologies as they relate to social justice and information access in the global information society. The purpose is to raise awareness within human rights communities and information centres about the existence and availability of these tools, so that these groups may find appropriate and accessible solutions that match their information support needs. Further, it is hoped that the information presented here will generate open, intercultural, and international discussions of human rights policy development, strategic planning, and implementation.</mods:abstract>\n<mods:language>\n<mods:languageTerm authority=\"rfc3066\">en</mods:languageTerm>\n</mods:language>\n<mods:accessCondition type=\"useAndReproduction\">Attribution-NonCommercial-NoDerivs 2.5 Canada</mods:accessCondition>\n<mods:subject>\n<mods:topic>Human rights</mods:topic>\n</mods:subject>\n<mods:subject>\n<mods:topic>Social justice</mods:topic>\n</mods:subject>\n<mods:subject>\n<mods:topic>Librarianship</mods:topic>\n</mods:subject>\n<mods:titleInfo>\n<mods:title>Human Rights Software: Information Support Solutions For Social Justice</mods:title>\n</mods:titleInfo>\n<mods:genre>Article</mods:genre>\n<mods:objectIdentifierValue>http://mruir.mtroyal.ca/xmlui/bitstream/11205/98/1/Human+Rights+Software.pdf</mods:objectIdentifierValue>\n</mods:mods>\n</metadata>\n        </record>","xpath":"[object] (DOMXPath: 
{})"},"severity":2,"file":"/Users/brandon/sfuvault/mik/src/filegetters/OaipmhXpath.php","line":61} []
[2019-07-02 18:47:56] ErrorException.ERROR: ErrorException {"message":"problem writing package","record_key":"oai%3Amruir.mtroyal.ca%3A11205%2F98","details":"[object] (mik\\exceptions\\MikErrorException(code: 0):  at /Users/brandon/sfuvault/mik/mik:105)"} []

And if I leave the METADATA_PARSER section at dc\OaiToDc it's the same...

[2019-07-02 18:47:56] ErrorException.ERROR: ErrorException {"message":"DOMXPath::query(): Undefined namespace prefix","code":{"record_key":"oai%3Amruir.mtroyal.ca%3A11205%2F98","raw_metadata_path":"/Volumes/Arca/tmp/oaitest_temp/oai%3Amruir.mtroyal.ca%3A11205%2F98.metadata","dom":"[object] (DOMDocument: {})","xml":"<record xmlns=\"http://www.openarchives.org/OAI/2.0/\">\n            <header>\n                <identifier>oai:mruir.mtroyal.ca:11205/98</identifier>\n                <datestamp>2015-06-08T16:02:09Z</datestamp>\n                <setSpec>com_11205_20</setSpec>\n                <setSpec>com_11205_12</setSpec>\n                <setSpec>col_11205_43</setSpec>\n            </header>\n            <metadata><mods:mods xmlns:mods=\"http://www.loc.gov/mods/v3\" xmlns:doc=\"http://www.lyncode.com/xoai\" xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\" xsi:schemaLocation=\"http://www.loc.gov/mods/v3 http://www.loc.gov/standards/mods/v3/mods-3-1.xsd\">\n<mods:name>\n<mods:role>\n<mods:roleTerm type=\"text\">author</mods:roleTerm>\n</mods:role>\n<mods:namePart>Hayman, Richard</mods:namePart>\n</mods:name>\n<mods:extension>\n<mods:dateAccessioned encoding=\"iso8601\">2014-02-13T20:44:02Z</mods:dateAccessioned>\n</mods:extension>\n<mods:extension>\n<mods:dateAvailable encoding=\"iso8601\"/>\n</mods:extension>\n<mods:originInfo>\n<mods:dateIssued encoding=\"iso8601\">2009</mods:dateIssued>\n</mods:originInfo>\n<mods:identifier type=\"citation\">Hayman, R. (2009). Human rights software: Information support solutions for social justice. Information for Social Change, 29, 44-67.</mods:identifier>\n<mods:identifier type=\"issn\">1756-901X</mods:identifier>\n<mods:identifier type=\"uri\">http://hdl.handle.net/11205/98</mods:identifier>\n<mods:abstract>Human rights centres and non-governmental organizations (NGOs) have crucial information support needs, many of which can be met by the existing and ongoing development of information technology software applications. 
For communication and Internet use, the psiphon program allows for secure and anonymous information exchange and distribution, including firewall circumvention. For data collection, organization, encryption, and storage, Martus software can be deployed to help protect sensitive information and identities. Based on documented projects and websites, the following research examines these emancipatory tools to determine: the technologies in use, emergent, and under development; their possible usage in the critical arenas under discussion; and, the greater effects of these technologies as they relate to social justice and information access in the global information society. The purpose is to raise awareness within human rights communities and information centres about the existence and availability of these tools, so that these groups may find appropriate and accessible solutions that match their information support needs. Further, it is hoped that the information presented here will generate open, intercultural, and international discussions of human rights policy development, strategic planning, and implementation.</mods:abstract>\n<mods:language>\n<mods:languageTerm authority=\"rfc3066\">en</mods:languageTerm>\n</mods:language>\n<mods:accessCondition type=\"useAndReproduction\">Attribution-NonCommercial-NoDerivs 2.5 Canada</mods:accessCondition>\n<mods:subject>\n<mods:topic>Human rights</mods:topic>\n</mods:subject>\n<mods:subject>\n<mods:topic>Social justice</mods:topic>\n</mods:subject>\n<mods:subject>\n<mods:topic>Librarianship</mods:topic>\n</mods:subject>\n<mods:titleInfo>\n<mods:title>Human Rights Software: Information Support Solutions For Social Justice</mods:title>\n</mods:titleInfo>\n<mods:genre>Article</mods:genre>\n<mods:objectIdentifierValue>http://mruir.mtroyal.ca/xmlui/bitstream/11205/98/1/Human+Rights+Software.pdf</mods:objectIdentifierValue>\n</mods:mods>\n</metadata>\n        </record>","xpath":"[object] (DOMXPath: 
{})"},"severity":2,"file":"/Users/brandon/sfuvault/mik/src/filegetters/OaipmhXpath.php","line":61} []
[2019-07-02 18:47:56] ErrorException.ERROR: ErrorException {"message":"problem writing package","record_key":"oai%3Amruir.mtroyal.ca%3A11205%2F98","details":"[object] (mik\\exceptions\\MikErrorException(code: 0):  at /Users/brandon/sfuvault/mik/mik:105)"} []

@bondjimbond
Collaborator Author

@mjordan Really this is about the FileGetter and not the MetadataParser, isn't it? The problem is that I use an XPath to find the link to download, but XPath can't recognize it because the FileGetter defines a Dublin Core namespace and not a MODS namespace.
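A minimal reproduction of what I think is happening, outside of MIK. The record is a cut-down stand-in for the one in the log, with the `mods` prefix declared below the root element as it is there; the URL in it is a placeholder. The `@` is only there to silence the warning in this standalone script.

```php
<?php
$xml = '<record xmlns="http://www.openarchives.org/OAI/2.0/"><metadata>'
    . '<mods:mods xmlns:mods="http://www.loc.gov/mods/v3">'
    . '<mods:objectIdentifierValue>http://example.org/file.pdf</mods:objectIdentifierValue>'
    . '</mods:mods></metadata></record>';

$dom = new DOMDocument();
$dom->loadXML($xml);
$xpath = new DOMXPath($dom);

// Only Dublin Core is registered, as in the current filegetter.
$xpath->registerNamespace('dc', 'http://purl.org/dc/elements/1.1/');

// The mods prefix is unknown to the XPath context, so query() warns
// "Undefined namespace prefix" and returns false.
$before = @$xpath->query('//mods:objectIdentifierValue');
var_dump($before);

// Once the MODS namespace is registered, the same expression matches.
$xpath->registerNamespace('mods', 'http://www.loc.gov/mods/v3');
$after = $xpath->query('//mods:objectIdentifierValue');
echo $after->item(0)->nodeValue, PHP_EOL;
```

In MIK the warning shows up as an ErrorException in the log rather than a `false` return, but the cause looks the same.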

@bondjimbond
Collaborator Author

I added a new filegetter to #504 to address this. It no longer reports "Undefined namespace prefix"; now it reports "No content file found in oai-pmh record".

@mjordan
Collaborator

mjordan commented Jul 4, 2019

@bondjimbond Since you're requesting MODS over OAI, is it safe to assume that your source repository is Islandora? If so, then yes, I think we should just be grabbing the MODS datastream as a file and not get tangled up in metadata parsers. In that case, we can just fetch the MODS datastream using the (working?) DC metadata parser and then throw away the resulting DC XML files.

I was sure that we already had the ability to fetch any datastream we wanted using the https://github.com/MarcusBarnes/mik/wiki/Toolchain:-OAI-PMH-for-Islandora-repositories toolchain, but I need to confirm that. If not, it won't be difficult to make that happen.

@bondjimbond
Collaborator Author

@mjordan Nope, it's actually a DSpace repository. They've got decent MODS, though, so it's nice to be able to pull that down and tweak it instead of extracting DC and then trying to reverse engineer roleTerms etc.

@mjordan
Collaborator

mjordan commented Jul 5, 2019

Does DSpace's MODS have a predictable URL where you can download it (as per my last comment) or do you need to get it via OAI as metadata?

@bondjimbond
Collaborator Author

You need to get it via OAI, unfortunately. The filename is made of some mix of parts of the title and some seemingly arbitrary numbers.

@mjordan
Collaborator

mjordan commented Jul 5, 2019

Can you send me the OAI endpoint via email?

@bondjimbond
Collaborator Author

bondjimbond commented Jul 5, 2019

The .ini file (which includes the endpoint) is attached to #502

@bondjimbond
Collaborator Author

Also, here's an example file link: http://mruir.mtroyal.ca/xmlui/bitstream/11205/98/1/Human+Rights+Software.pdf

I think the 11205/98/1 is the handle, but the filename is not really predictable.
