Skip to content

Add batch export feature for ProQuest JSON#1097

Merged
jazairi merged 1 commit intomainfrom
etd-588-proquest-export
Jan 11, 2023
Merged

Add batch export feature for ProQuest JSON#1097
jazairi merged 1 commit intomainfrom
etd-588-proquest-export

Conversation

@jazairi
Copy link
Copy Markdown
Contributor

@jazairi jazairi commented Dec 23, 2022

Why these changes are being introduced:

Processors and thesis admins need to be able to export a list of thesis handles that have opted in to ProQuest. An additional export is needed for doctoral theses that have opted out of ProQuest (or have not responded). No additional filters are required for these exports.

Relevant ticket(s):

https://mitlibraries.atlassian.net/browse/ETD-588
https://mitlibraries.atlassian.net/browse/ETD-603

How this addresses that need:

This adds a view that allows processors to review theses that are ready to be exported to ProQuest. Two types of theses meet this criterion:

  • Published theses satisfying an advanced degree, with all authors having consented to ProQuest export. ProQuest will fully harvest these theses (metadata and content).
  • Published theses satisfying a doctoral degree, without all authors having consented to ProQuest export. ProQuest wil partially harvest these theses (metadata only).

Once a processor has reviewwd the list of theses to be exported, they can initiate the export. This will kick off a background job that will produce a JSON file that lists each thesis' handle and distinguishes between full and partial exports. Once the export is ready, the job send an email to the ETD admin list with the JSON attached.

Finally, all exported theses will be associated with a ProquestExportBatch, and the JSON file will be attached to the same as an ActiveStorage attachment.

Side effects of this change:

  • ProquestExportJob currently loops through theses to update them using the
    each method. This is an expensive operation. A more efficient option is
    update_all, which updates attributes without instantiating the objects.
    However, this would not trigger callbacks, so we would need to find another
    way to get paper_trail to update, such as this hack:
    is there a way get papertrail to work when my app uses update_attributes_without_callbacks paper-trail-gem/paper_trail#456 (comment)
  • This adds a has_many/belongs_to relationship between the ProquestExportBatch
    and Thesis models. This may not be necessary since we are storing the JSON
    exports in ActiveStorage, but it could be useful to look up the date a thesis
    was exported and find other theses from the same export.
  • The proquest_exported field has been added to the thesis admin dashboard.
  • The _proquest_status_empty partial is moved to views/shared as it is now
    used by the export preview and the export report.
  • Some unrelated tests in the Report model is updated due to the addition
    of a new authors fixture.

Developer

  • All new ENV is documented in README
  • All new ENV has been added to Heroku Pipeline, Staging and Prod
  • ANDI or Wave has been run in accordance to
    our guide and
    all issues introduced by these changes have been resolved or opened as new
    issues (link to those issues in the Pull Request details above)
  • Stakeholder approval has been confirmed (or is not needed)

Code Reviewer

  • The commit message is clear and follows our guidelines
    (not just this pull request message)
  • There are appropriate tests covering any new functionality
  • The documentation has been updated or is unnecessary
  • The changes have been verified
  • New dependencies are appropriate or there were no changes

Requires database migrations?

YES

Includes new or updated dependencies?

NO

@mitlib mitlib temporarily deployed to thesis-submit-pr-1097 December 23, 2022 22:50 Inactive
@jazairi jazairi force-pushed the etd-588-proquest-export branch from 8ee9bd6 to 7522e0d Compare January 3, 2023 14:28
@coveralls
Copy link
Copy Markdown

coveralls commented Jan 3, 2023

Coverage Status

Coverage: 97.59% (+0.07%) from 97.519% when pulling 4cf2a8a on etd-588-proquest-export into 9aab610 on main.

@jazairi jazairi force-pushed the etd-588-proquest-export branch from cc51b7d to 8546949 Compare January 4, 2023 14:03
@mitlib mitlib temporarily deployed to thesis-submit-pr-1097 January 4, 2023 14:11 Inactive
@JPrevost JPrevost self-assigned this Jan 4, 2023
Copy link
Copy Markdown
Member

@JPrevost JPrevost left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some of my comments may not make sense. I wanted to share my initial thoughts/questions before we meet later on though. Sorry in advance for any that are gibberish or nonsense :)

Comment thread test/controllers/thesis_controller_test.rb
Comment thread app/controllers/thesis_controller.rb Outdated
Comment thread app/jobs/proquest_export_job.rb
Comment thread app/models/proquest_export_batch.rb Outdated
Comment thread app/views/thesis/_proquest_export_thesis.html.erb
Comment thread test/controllers/thesis_controller_test.rb
Comment thread test/models/proquest_export_batch_test.rb Outdated
Comment thread test/models/thesis_test.rb Outdated
Comment thread test/models/thesis_test.rb Outdated
Comment thread test/models/thesis_test.rb Outdated
@mitlib mitlib temporarily deployed to thesis-submit-pr-1097 January 9, 2023 17:21 Inactive
@jazairi jazairi temporarily deployed to thesis-submit-pr-1097 January 10, 2023 17:17 Inactive
@jazairi jazairi requested a review from JPrevost January 10, 2023 17:22
Copy link
Copy Markdown
Member

@JPrevost JPrevost left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've finished the code review portion of this work. I'm not entirely clear how to easily see the results of this work without doing a fair bit of manual tweaking of data.

I'll try to take a look at that later this afternoon but if I can't create a meaningful test set for myself I may defer the "does this work as promised" portion of the check to stakeholders after we merge it as they'll (hopefully) better understand how to get records into a useful state in staging...I hope.

Comment thread test/models/proquest_export_batch_test.rb
Copy link
Copy Markdown
Member

@JPrevost JPrevost left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updating to approved. I feel like with the tests and code review I understand this as well as I am going to. Let's just extra communicate with stakeholders on the need to get the staging site to have useful data for them to verify these changes before we move it to prod.

Why these changes are being introduced:

Processors and thesis admins need to be able to export a list of
thesis handles that have opted in to ProQuest. An additional export
is needed for doctoral theses that have opted out of ProQuest (or
have not responded). No additional filters are required for these
exports.

Relevant ticket(s):

* https://mitlibraries.atlassian.net/browse/ETD-588
* https://mitlibraries.atlassian.net/browse/ETD-603

How this addresses that need:

This adds a view that allows processors to review theses that are
ready to be exported to ProQuest. Two types of theses meet this
criterion:

* Published theses satisfying an advanced degree, with all authors
having consented to ProQuest export. ProQuest will fully harvest
these theses (metadata and content).
* Published theses satisfying a doctoral degree, without all authors
having consented to ProQuest export. ProQuest wil partially harvest
these theses (metadata only).

Once a processor has reviewwd the list of theses to be exported, they
can initiate the export. This will kick off a background job that will
produce a JSON file that lists each thesis' handle and distinguishes
between full and partial exports. Once the export is ready, the job
send an email to the ETD admin list with the JSON attached.

Finally, all exported theses will be associated with a ProquestExportBatch,
and the JSON file will be attached to the same as an ActiveStorage
attachment.

Side effects of this change:

* ProquestExportJob currently loops through theses to update them using the
`each` method. This is an expensive operation. A more efficient option is
update_all, which updates attributes without instantiating the objects.
However, this would not trigger callbacks, so we would need to find another
way to get paper_trail to update, such as this hack:
paper-trail-gem/paper_trail#456 (comment)
* This adds a has_many/belongs_to relationship between the ProquestExportBatch
and Thesis models. This may not be necessary since we are storing the JSON
exports in ActiveStorage, but it could be useful to look up the date a thesis
was exported and find other theses from the same export.
* The proquest_exported field has been added to the thesis admin dashboard.
* The _proquest_status_empty partial is moved to views/shared as it is now
used by the export preview and the export report.
* Some unrelated tests in the Report, Thesis, and User models are updated due
to fixture changes. Where applicable, I tried to update these tests to
use less brittle logic so they won't fail again under similar
circumstances.
@jazairi jazairi force-pushed the etd-588-proquest-export branch from 608d2e8 to 4cf2a8a Compare January 11, 2023 14:30
@jazairi jazairi temporarily deployed to thesis-submit-pr-1097 January 11, 2023 14:31 Inactive
@jazairi jazairi merged commit b24a359 into main Jan 11, 2023
@jazairi jazairi deleted the etd-588-proquest-export branch January 11, 2023 14:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants