Add batch export feature for ProQuest JSON #1097
JPrevost
left a comment
Some of my comments may not make sense. I wanted to share my initial thoughts/questions before we meet later on though. Sorry in advance for any that are gibberish or nonsense :)
JPrevost
left a comment
I've finished the code review portion of this work. I'm not entirely clear how to easily see the results of this work without doing a fair bit of manual tweaking of data.
I'll try to take a look at that later this afternoon but if I can't create a meaningful test set for myself I may defer the "does this work as promised" portion of the check to stakeholders after we merge it as they'll (hopefully) better understand how to get records into a useful state in staging...I hope.
JPrevost
left a comment
Updating to approved. I feel like with the tests and code review I understand this as well as I am going to. Let's just extra communicate with stakeholders on the need to get the staging site to have useful data for them to verify these changes before we move it to prod.
Why these changes are being introduced:
Processors and thesis admins need to be able to export a list of thesis handles that have opted in to ProQuest. An additional export is needed for doctoral theses that have opted out of ProQuest (or have not responded). No additional filters are required for these exports.
Relevant ticket(s):
* https://mitlibraries.atlassian.net/browse/ETD-588
* https://mitlibraries.atlassian.net/browse/ETD-603
How this addresses that need:
This adds a view that allows processors to review theses that are ready to be exported to ProQuest. Two types of theses meet this criterion:
* Published theses satisfying an advanced degree, with all authors having consented to ProQuest export. ProQuest will fully harvest these theses (metadata and content).
* Published theses satisfying a doctoral degree, without all authors having consented to ProQuest export. ProQuest will partially harvest these theses (metadata only).
Once a processor has reviewed the list of theses to be exported, they can initiate the export. This will kick off a background job that will produce a JSON file that lists each thesis' handle and distinguishes between full and partial exports. Once the export is ready, the job sends an email to the ETD admin list with the JSON attached.
Finally, all exported theses will be associated with a ProquestExportBatch, and the JSON file will be attached to that batch as an ActiveStorage attachment.
Side effects of this change:
* ProquestExportJob currently loops through theses to update them using the `each` method. This is an expensive operation. A more efficient option is `update_all`, which updates attributes without instantiating the objects. However, this would not trigger callbacks, so we would need to find another way to get paper_trail to update, such as this hack: paper-trail-gem/paper_trail#456 (comment)
* This adds a has_many/belongs_to relationship between the ProquestExportBatch and Thesis models. This may not be necessary since we are storing the JSON exports in ActiveStorage, but it could be useful to look up the date a thesis was exported and find other theses from the same export.
* The proquest_exported field has been added to the thesis admin dashboard.
* The _proquest_status_empty partial is moved to views/shared as it is now used by the export preview and the export report.
* Some unrelated tests in the Report, Thesis, and User models are updated due to fixture changes. Where applicable, I tried to update these tests to use less brittle logic so they won't fail again under similar circumstances.
Why these changes are being introduced:
Processors and thesis admins need to be able to export a list of thesis handles that have opted in to ProQuest. An additional export is needed for doctoral theses that have opted out of ProQuest (or have not responded). No additional filters are required for these exports.
Relevant ticket(s):
* https://mitlibraries.atlassian.net/browse/ETD-588
* https://mitlibraries.atlassian.net/browse/ETD-603
How this addresses that need:
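This adds a view that allows processors to review theses that are ready to be exported to ProQuest. Two types of theses meet this criterion:
* Published theses satisfying an advanced degree, with all authors having consented to ProQuest export. ProQuest will fully harvest these theses (metadata and content).
* Published theses satisfying a doctoral degree, without all authors having consented to ProQuest export. ProQuest will partially harvest these theses (metadata only).
Once a processor has reviewed the list of theses to be exported, they can initiate the export. This will kick off a background job that will produce a JSON file that lists each thesis' handle and distinguishes between full and partial exports. Once the export is ready, the job sends an email to the ETD admin list with the JSON attached.
Finally, all exported theses will be associated with a ProquestExportBatch, and the JSON file will be attached to that batch as an ActiveStorage attachment.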
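For readers who want a concrete picture of the full/partial distinction, here is a minimal plain-Ruby sketch. It is not the code in this PR: `ThesisRecord`, its boolean fields, the `export_type` and `build_export_json` helpers, and the JSON key names are all hypothetical stand-ins, and the real job may shape its output differently.

```ruby
require "json"

# Hypothetical stand-in for the Thesis model; the real attributes may differ.
ThesisRecord = Struct.new(
  :handle, :published, :advanced_degree, :doctoral, :all_authors_consented,
  keyword_init: true
)

# Returns "full" when a published advanced-degree thesis has consent from
# every author, "partial" when a published doctoral thesis lacks full
# consent, and nil when the thesis is not exportable.
def export_type(thesis)
  return nil unless thesis.published
  return "full" if thesis.advanced_degree && thesis.all_authors_consented
  return "partial" if thesis.doctoral && !thesis.all_authors_consented

  nil
end

# Builds a JSON document listing each exportable handle with its export type.
# The "theses", "handle", and "export_type" keys are assumed names.
def build_export_json(theses)
  entries = theses.filter_map do |thesis|
    type = export_type(thesis)
    next if type.nil?

    { "handle" => thesis.handle, "export_type" => type }
  end
  JSON.pretty_generate("theses" => entries)
end
```

Keeping the classification in one small method means the preview view and the background job could share the same rule, though whether the PR does this is not stated here.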
Side effects of this change:
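* ProquestExportJob currently loops through theses to update them using the `each` method. This is an expensive operation. A more efficient option is `update_all`, which updates attributes without instantiating the objects. However, this would not trigger callbacks, so we would need to find another way to get paper_trail to update, such as this hack: "is there a way get papertrail to work when my app uses update_attributes_without_callbacks", paper-trail-gem/paper_trail#456 (comment)
* This adds a has_many/belongs_to relationship between the ProquestExportBatch and Thesis models. This may not be necessary since we are storing the JSON exports in ActiveStorage, but it could be useful to look up the date a thesis was exported and find other theses from the same export.
* The proquest_exported field has been added to the thesis admin dashboard.
* The _proquest_status_empty partial is moved to views/shared as it is now used by the export preview and the export report.
* Some unrelated tests in the Report, Thesis, and User models are updated because of a new authors fixture.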
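To illustrate the `each` versus `update_all` trade-off noted in the side effects, here is a plain-Ruby simulation. It does not use ActiveRecord or paper_trail: `FakeThesis`, `update_each`, and `bulk_update` are hypothetical names, and treating `proquest_exported` as a boolean is an assumption. It only shows why a write that bypasses callbacks leaves no version history behind.

```ruby
# Plain-Ruby stand-in for a model with a version-recording callback.
# None of these names come from the application.
class FakeThesis
  attr_reader :proquest_exported, :versions

  def initialize
    @proquest_exported = false
    @versions = []
  end

  # Per-record update: stores the previous state first, the way an update
  # that runs callbacks would let an auditing gem record a version.
  def update(proquest_exported:)
    @versions << { proquest_exported: @proquest_exported }
    @proquest_exported = proquest_exported
  end

  # Direct write that skips the version step, standing in for a bulk
  # SQL UPDATE that never instantiates or calls back into the model.
  def write_without_callbacks(proquest_exported:)
    @proquest_exported = proquest_exported
  end
end

# Mirrors the current approach: one callback-triggering update per thesis.
def update_each(theses)
  theses.each { |thesis| thesis.update(proquest_exported: true) }
end

# Mirrors the bulk approach: the attribute changes, but no versions exist.
def bulk_update(theses)
  theses.each { |thesis| thesis.write_without_callbacks(proquest_exported: true) }
end
```

In this simulation both paths end with every thesis marked as exported, but only `update_each` leaves an entry in `versions`, which is the gap the paper_trail workaround would have to close.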
Developer
our guide and
all issues introduced by these changes have been resolved or opened as new
issues (link to those issues in the Pull Request details above)
Code Reviewer
(not just this pull request message)
Requires database migrations?
YES
Includes new or updated dependencies?
NO