Issue 464 #465

Open
wants to merge 3 commits into
base: master
Conversation

mjordan
Collaborator

@mjordan mjordan commented Apr 16, 2018

GitHub issue: #464

What does this Pull Request do?

Adds a fetcher manipulator that restricts the objects harvested via OAI to those whose OBJ (or other designated datastream) has one of the specified MIME types.
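The allow-list check at the heart of this kind of filter can be illustrated in a few lines of shell. This is only a sketch of the idea, not the code in src/fetchermanipulators/OaipmhIslandoraByMimetype.php (which is PHP); the function name `mime_allowed` and the comma-separated list format are assumptions made for the example.

```shell
# Hypothetical helper, for illustration only.
# mime_allowed MIME LIST: succeed if MIME appears in the comma-separated LIST.
mime_allowed() {
  # Wrap both sides in commas so only whole entries match
  # (e.g. "image/jp" must not match "image/jpeg").
  case ",$2," in
    *",$1,"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example (not run here):
#   mime_allowed image/jpeg "image/jpeg,image/png" && echo "keep record"
```

A record whose datastream MIME type fails this check would be dropped from the harvest; one that passes would be kept.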

What's new?

A new class file, src/fetchermanipulators/OaipmhIslandoraByMimetype.php, and some minor cleanup on testing for an HTTP 200 in src/filegetters/OaipmhIslandoraObj.php.

How should this be tested?

There are no PHPUnit tests for this fetcher manipulator.

To test, use the attached .ini file.

First, run MIK with the fetcher manipulator configured to only harvest objects with the MIME type image/jpeg. This will harvest all 73 objects in the collection:

 ./mik -c issue-464.ini 
Commencing MIK.
Filtering 73 records through the OaipmhIslandoraByMimetype fetcher manipulator.
====================================================================================================> 100%
Creating 73 Islandora ingest packages. Please be patient.
====================================================================================================> 100%
Done. Output packages are in /tmp/oaitest_output. Log is at /tmp/oaitest_output/mik.log
Completed in 0.27316334644953 minutes.

Your output directory should contain .xml and .jpeg files for all 73 objects.
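One quick way to confirm the counts is to tally files by extension. The sketch below assumes the packages are written somewhere under the output directory named in the run log (/tmp/oaitest_output); `count_by_ext` is a hypothetical helper name, not part of MIK.

```shell
# Hypothetical helper, not part of MIK.
# count_by_ext DIR EXT: print how many regular files under DIR end in .EXT
count_by_ext() {
  # tr strips the padding that some wc implementations add around the number
  find "$1" -type f -name "*.$2" | wc -l | tr -d ' '
}

# Example (not run here):
#   count_by_ext /tmp/oaitest_output xml
#   count_by_ext /tmp/oaitest_output jpeg
```

After the first run, both commands in the example should report 73 if the harvest behaved as described above.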

Then, uncomment the fetcher manipulator entry in the .ini file with the image/png MIME type and comment out the other entry. Then rerun MIK, making sure that you delete your output and temp directories first:

 ./mik -c issue-464.ini 
Commencing MIK.
Filtering 73 records through the OaipmhIslandoraByMimetype fetcher manipulator.
====================================================================================================> 100%
Creating 0 Islandora ingest packages. Please be patient.
Done. Output packages are in /tmp/oaitest_output. Log is at /tmp/oaitest_output/mik.log
Completed in 0.10025108655294 minutes.

Your output directory should contain no .xml or .jpeg files, since none of the objects in the harvested collection has the image/png MIME type.
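The second run depends on the output and temp directories having been deleted first. A minimal sketch of that cleanup step follows; `remove_dirs` is a hypothetical helper, and the paths in the usage comment are taken from the run log and the configuration dump quoted in this thread, so adjust them to match your own .ini file.

```shell
# Hypothetical helper, not part of MIK.
# remove_dirs DIR...: recursively delete each named directory.
remove_dirs() {
  for dir in "$@"; do
    # Refuse obviously dangerous arguments such as an empty string or "/"
    case "$dir" in
      ''|/) echo "refusing to remove '$dir'" >&2; return 1 ;;
    esac
    rm -rf -- "$dir" || return 1
  done
}

# Example (not run here; paths assumed from the logs in this thread):
#   remove_dirs /tmp/oaitest_output /tmp/oaitest_temp
#   ./mik -c issue-464.ini
```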

Additional Notes

Wiki entry for this new manipulator is at https://github.com/MarcusBarnes/mik/wiki/Fetcher-manipulator:-OaipmhIslandoraByMimetype. We should link to this wiki entry in the "Manipulators" section of https://github.com/MarcusBarnes/mik/wiki/Toolchain:-OAI-PMH-for-Islandora-repositories.

Interested parties

@MarcusBarnes @bondjimbond

issue-464.ini.txt

@bondjimbond
Collaborator

I'll hopefully get to testing this tomorrow. Looks like a promising feature.

@bondjimbond
Collaborator

Ran the first leg of the test (ini file unchanged), and got problem records for every object. Retrieved XML but no jpeg.

@mjordan
Collaborator Author

mjordan commented May 3, 2018

What's in your mik.log?

@mjordan
Collaborator Author

mjordan commented May 3, 2018

And manipulator.log
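To pull only the errors out of either log, something like the following may help. It is a sketch that assumes the line format seen in the log excerpts in this thread (a bracketed timestamp, a channel ending in `.ERROR:`, and a JSON payload with a `"message"` key); `log_errors` is a hypothetical helper name.

```shell
# Hypothetical helper, not part of MIK.
# log_errors FILE: print the timestamp and "message" value of each ERROR entry.
log_errors() {
  grep -F '.ERROR:' "$1" |
    sed -E 's/^\[([^]]*)\].*"message":"([^"]*)".*$/\1  \2/'
}

# Example (not run here):
#   log_errors /tmp/oaitest_output/mik.log
#   log_errors /tmp/oaitest_output/manipulator.log
```

Lines that do not match the assumed format are passed through unchanged by sed, so nothing is silently dropped.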

@bondjimbond
Collaborator

bondjimbond commented May 3, 2018

[2018-05-03 12:49:02] ErrorException.ERROR: ErrorException {"message":"Undefined index: datastream_ids","code":{"settings":{"CONFIG":{"config_id":"oai-test","last_updated_on":"2017-02-21","last_update_by":"bw"},"SYSTEM":{"date_default_timezone":"America/Vancouver","verify_ca":"0"},"FETCHER":{"class":"Oaipmh","oai_endpoint":"https://nwcc.arcabc.ca/oai2/","set_spec":"nwcc_freda2","metadata_prefix":"oai_dc","temp_directory":"/tmp/oaitest_temp"},"METADATA_PARSER":{"class":"dc\\OaiToDc"},"FILE_GETTER":{"class":"OaipmhIslandoraObj","temp_directory":"/tmp/oaitest_temp"},"WRITER":{"class":"Oaipmh","output_directory":"/tmp/oaitest_output","postwritehooks":["/usr/bin/php extras/scripts/postwritehooks/oai_dc_to_mods.php"]},"MANIPULATORS":{"fetchermanipulators":["OaiMissingFileSet"]},"LOGGING":{"path_to_log":"/tmp/oaitest_output/mik.log","path_to_manipulator_log":"/tmp/oaitest_output/manipulator.log"}}},"severity":8,"file":"/Users/Brandon/mik/src/filegetters/OaipmhIslandoraObj.php","line":41} []
[2018-05-03 12:49:02] ErrorException.ERROR: ErrorException {"message":"problem instantiating fileGetterClass","details":"[object] (mik\exceptions\MikErrorException(code: 0): at /Users/Brandon/mik/mik:105)"} []
[2018-05-03 12:49:06] ErrorException.ERROR: ErrorException {"message":"Undefined variable: filtered_file_list","code":{"file_list":["/tmp/oaitest_output/mik.log"],"filetered_file_list":[],"pattern":"/tmp/oaitest_output/*","file_path":"/tmp/oaitest_output/mik.log"},"severity":8,"file":"/Users/Brandon/mik/src/fetchermanipulators/OaiMissingFileSet.php","line":131} []
[2018-05-03 12:51:23] config.INFO: MIK Configuration {"config_id":"oai-test"} []
[2018-05-03 12:51:23] config.INFO: MIK Configuration {"last_updated_on":"2017-02-21"} []
[2018-05-03 12:51:23] config.INFO: MIK Configuration {"last_update_by":"bw"} []
[2018-05-03 12:51:23] Info.INFO: MIK started running May 3, 2018, 5:51 am [] []

@mjordan
Collaborator Author

mjordan commented May 3, 2018

OK, thanks, "Undefined index: datastream_ids" should make it easy to fix, but I'm wondering why it worked for me. Will take a look this evening.

@bondjimbond
Collaborator

Sorry, I gave you the wrong log output!

[2018-05-03 20:04:22] ErrorException.ERROR: ErrorException {"message":"problem writing package","record_key":"oai%3Adigital.lib.sfu.ca%3Ahiv_1","details":"[object] (GuzzleHttp\Exception\RequestException(code: 0): No system CA bundle could be found in any of the the common system locations.\nPHP versions earlier than 5.6 are not properly configured to use the system's\nCA bundle by default. In order to verify peer certificates, you will need to\nsupply the path on disk to a certificate bundle to the 'verify' request\noption: http://docs.guzzlephp.org/en/latest/clients.html#verify. If you do not\nneed a specific certificate bundle, then Mozilla provides a commonly used CA\nbundle which can be downloaded here (provided by the maintainer of cURL):\nhttps://raw.githubusercontent.com/bagder/ca-bundle/master/ca-bundle.crt. Once\nyou have a CA bundle available on disk, you can set the 'openssl.cafile' PHP\nini setting to point to the path to the file, allowing you to omit the 'verify'\nrequest option. See http://curl.haxx.se/docs/sslcerts.html for more\ninformation. at /Users/Brandon/mik/vendor/guzzlehttp/guzzle/src/Exception/RequestException.php:52, RuntimeException(code: 0): No system CA bundle could be found in any of the the common system locations.\nPHP versions earlier than 5.6 are not properly configured to use the system's\nCA bundle by default. In order to verify peer certificates, you will need to\nsupply the path on disk to a certificate bundle to the 'verify' request\noption: http://docs.guzzlephp.org/en/latest/clients.html#verify. If you do not\nneed a specific certificate bundle, then Mozilla provides a commonly used CA\nbundle which can be downloaded here (provided by the maintainer of cURL):\nhttps://raw.githubusercontent.com/bagder/ca-bundle/master/ca-bundle.crt. Once\nyou have a CA bundle available on disk, you can set the 'openssl.cafile' PHP\nini setting to point to the path to the file, allowing you to omit the 'verify'\nrequest option. 
See http://curl.haxx.se/docs/sslcerts.html for more\ninformation. at /Users/Brandon/mik/vendor/guzzlehttp/guzzle/src/functions.php:199)"} []

@mjordan
Collaborator Author

mjordan commented May 3, 2018

Can you add verify_ca = false to your .ini file's [SYSTEM] section and try again?
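Before rerunning, it can be worth confirming that the setting really sits inside the [SYSTEM] section of the file MIK is reading. The sketch below is a small, simplified .ini lookup written with awk; `ini_get` is a hypothetical helper, and it does not strip quotes or handle array-style keys.

```shell
# Hypothetical helper, not part of MIK.
# ini_get FILE SECTION KEY: print the value of KEY inside [SECTION], if present.
ini_get() {
  awk -v section="$2" -v key="$3" '
    /^[ \t]*\[/ {
      # Section header such as [SYSTEM]: remember whether it is the one we want
      name = $0
      gsub(/^[ \t]*\[|\][ \t]*$/, "", name)
      in_section = (name == section)
      next
    }
    in_section {
      line = $0
      sub(/^[ \t]+/, "", line)
      if (line ~ /^[;#]/) next          # skip comment lines
      eq = index(line, "=")
      if (eq == 0) next                 # not a key = value line
      k = substr(line, 1, eq - 1)
      v = substr(line, eq + 1)
      gsub(/^[ \t]+|[ \t]+$/, "", k)
      gsub(/^[ \t]+|[ \t]+$/, "", v)
      if (k == key) print v
    }
  ' "$1"
}

# Example (not run here):
#   ini_get issue-464.ini SYSTEM verify_ca
```

An empty result would suggest the key is missing or was placed under a different section.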

@bondjimbond
Collaborator

Set it to false, still problems.

[2018-05-03 20:30:40] ErrorException.ERROR: ErrorException {"message":"problem writing package","record_key":"oai%3Adigital.lib.sfu.ca%3Ahiv_1","details":"[object] (GuzzleHttp\Exception\RequestException(code: 0): No system CA bundle could be found in any of the the common system locations.\nPHP versions earlier than 5.6 are not properly configured to use the system's\nCA bundle by default. In order to verify peer certificates, you will need to\nsupply the path on disk to a certificate bundle to the 'verify' request\noption: http://docs.guzzlephp.org/en/latest/clients.html#verify. If you do not\nneed a specific certificate bundle, then Mozilla provides a commonly used CA\nbundle which can be downloaded here (provided by the maintainer of cURL):\nhttps://raw.githubusercontent.com/bagder/ca-bundle/master/ca-bundle.crt. Once\nyou have a CA bundle available on disk, you can set the 'openssl.cafile' PHP\nini setting to point to the path to the file, allowing you to omit the 'verify'\nrequest option. See http://curl.haxx.se/docs/sslcerts.html for more\ninformation. at /Users/Brandon/mik/vendor/guzzlehttp/guzzle/src/Exception/RequestException.php:52, RuntimeException(code: 0): No system CA bundle could be found in any of the the common system locations.\nPHP versions earlier than 5.6 are not properly configured to use the system's\nCA bundle by default. In order to verify peer certificates, you will need to\nsupply the path on disk to a certificate bundle to the 'verify' request\noption: http://docs.guzzlephp.org/en/latest/clients.html#verify. If you do not\nneed a specific certificate bundle, then Mozilla provides a commonly used CA\nbundle which can be downloaded here (provided by the maintainer of cURL):\nhttps://raw.githubusercontent.com/bagder/ca-bundle/master/ca-bundle.crt. Once\nyou have a CA bundle available on disk, you can set the 'openssl.cafile' PHP\nini setting to point to the path to the file, allowing you to omit the 'verify'\nrequest option. 
See http://curl.haxx.se/docs/sslcerts.html for more\ninformation. at /Users/Brandon/mik/vendor/guzzlehttp/guzzle/src/functions.php:199)"} []

@mjordan
Collaborator Author

mjordan commented May 3, 2018

As far as I know, that problem is specific to Macs, so I'm afraid I can't be of much help troubleshooting it. See https://github.com/MarcusBarnes/mik/wiki/Cookbook:-Running-MIK-on-Mac-OS-X, which is based on the official Guzzle documentation at http://docs.guzzlephp.org/en/stable/request-options.html#verify. @MarcusBarnes any suggestions?
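Since the error says no CA bundle was found, one diagnostic step is to see whether a bundle exists anywhere on the machine. The sketch below just reports the first readable, non-empty file from a list of candidates; `first_readable` is a hypothetical helper, and the candidate paths in the usage comment are assumptions about where a bundle might live on a Mac, not paths confirmed in this thread.

```shell
# Hypothetical helper, not part of MIK or Guzzle.
# first_readable PATH...: print the first argument that is a readable, non-empty file.
first_readable() {
  for path in "$@"; do
    if [ -r "$path" ] && [ -s "$path" ]; then
      printf '%s\n' "$path"
      return 0
    fi
  done
  return 1
}

# Example (not run here; candidate paths are assumptions):
#   first_readable /etc/ssl/cert.pem /usr/local/etc/openssl/cert.pem
```

If a bundle turns up, the Guzzle message quoted in the logs suggests pointing PHP's openssl.cafile ini setting at it.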

@bondjimbond
Collaborator

Another error -- I tried using the regular CSV Single File toolchain with this branch (by accident), got the following:

Fatal error: An iterator cannot be used with foreach by reference in /Users/Brandon/mik/src/fetchers/Csv.php on line 93

On the master branch it's fine.

@mjordan
Collaborator Author

mjordan commented May 8, 2018

Branches are out of sync. I'll need to cut a new one from the most recent master. Won't get a chance to do that until after noon my time.

@mjordan
Collaborator Author

mjordan commented May 8, 2018

If you got far enough to find that glitch, it sounds like the certificate stuff is no longer a problem. Is that the case?

@bondjimbond
Collaborator

@mjordan No, the certificate problem still occurs with the OAI toolchain. (Just ran it again to confirm.) The CSV error is an additional problem with this branch.

@mjordan mjordan mentioned this pull request Nov 11, 2018
@mjordan mjordan mentioned this pull request Mar 13, 2020

2 participants