Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add final set of DspaceMets field methods #193

Merged
merged 2 commits into from
Jun 13, 2024

Conversation

ehanson8
Copy link
Contributor

Purpose and background context

Finish refactoring DspaceMets transform to use field methods. Just under our suggested limit 😬

How can a reviewer manually see the effects of these changes?

Run the following command to see that the DspaceMets transform still transforms a source file:

pipenv run transform -i tests/fixtures/dspace/dspace_mets_records.xml -o output/dspace-mets-transformed-records.json -s dspace

Includes new or updated dependencies?

NO

Changes expectations for external applications?

NO

What are the relevant tickets?

Developer

  • All new ENV is documented in README
  • All new ENV has been added to staging and production environments
  • All related Jira tickets are linked in commit message(s)
  • Stakeholder approval has been confirmed (or is not needed)

Code Reviewer(s)

  • The commit message is clear and follows our guidelines (not just this PR message)
  • There are appropriate tests covering any new functionality
  • The provided documentation is sufficient for understanding any new functionality introduced
  • Any manual tests have been performed and verified
  • New dependencies are appropriate or there were no changes

Why these changes are being introduced:
* Finish refactoring DspaceMets to use field methods

How this addresses that need:
* Add field methods and associated private methods for contributors, dates, edition, file_formats, format, identifiers, languages, links, numbering, publishers, related_items, rights, subjects, and summary
* Add unit tests for new field methods
* Update create_dspace_mets_source_record_stub function to use different inserts

Side effects of this change:
* None

Relevant ticket(s):
* https://mitlibraries.atlassian.net/browse/TIMX-286
Comment on lines 158 to 159
# Only publication date is mapped from DSpace, other relevant date field (dc.
# coverage.temporal) is not mapped to the OAI-PMH METS output.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are several of these comments that were copied over, not sure how useful they really are and they feel like clutter to me but wanted to preserve them for your review. There's also this ticket which could render this entire transform unnecessary should we decide to go with DspaceDim for the DSpace@MIT instance

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ignoring for the moment if we keep DSpaceMETS, I think they'd make nice docstrings vs standalone comments. To me, this is a big benefit of the field methods: a place for this kind of documentation. Here's an example from GeoHarvester. The rub is that our linting requires a first line title (short, ending in period) and then can have freeform text. My two cents is that a fairly generic title like "Field method: " is okay, then can launch into commentary.

I think it's nice to keep them for now, but thanks for sharing the ticket, that it may be moot eventually.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call, I shifted them over!

Copy link
Contributor

@ghukill ghukill left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it looks great, approved.

Now that we've had a decent amount of these PRs coming through, I feel like I'm having some secondary level thoughts about transforms, possible helper methods that could be applied across lots of sources and transforms, BS4 vs Xpath/lxml, etc., but feeling like those might be discussions for future tickets or PRs.

What I am seeing is logic faithfully transferred from imperative code to field methods, keeping the complexity of change low, which will keeps confidence high we're not fundamentally changing the transform outputs, just the architecture of the code. Which, I think is exactly the scope of this work at this time.

Comment on lines 158 to 159
# Only publication date is mapped from DSpace, other relevant date field (dc.
# coverage.temporal) is not mapped to the OAI-PMH METS output.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ignoring for the moment if we keep DSpaceMETS, I think they'd make nice docstrings vs standalone comments. To me, this is a big benefit of the field methods: a place for this kind of documentation. Here's an example from GeoHarvester. The rub is that our linting requires a first line title (short, ending in period) and then can have freeform text. My two cents is that a fairly generic title like "Field method: " is okay, then can launch into commentary.

I think it's nice to keep them for now, but thanks for sharing the ticket, that it may be moot eventually.

# Only maps formats with attribute use="ORIGINAL" because other formats such as
# USE="TEXT" are used internally by DSpace and not made publicly available.
file_formats = []
for file_group in source_record.find_all("fileGrp", USE="ORIGINAL"):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does USE need to be uppercase? Not a big deal, just surprising to see it that way.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nevermind! Keeping comment for historical reviewing purposes, but I see now that the attribute USE is uppercase in the DSpace METS.

@@ -4,13 +4,15 @@
from transmogrifier.sources.xml.dspace_mets import DspaceMets


def create_dspace_mets_source_record_stub(xml_insert: str = "") -> BeautifulSoup:
def create_dspace_mets_source_record_stub(
dmdsec_insert: str = "", filesec_insert: str = ""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a nice adjustment of the stub to accomodate more sections. This feels like a perfect way to incrementally adjust complexity of the stub as needed, and not before.

@ehanson8 ehanson8 force-pushed the TIMX-286-dspace-mets-fmr-returns branch from 4adb9a9 to 8b76a84 Compare June 12, 2024 14:46
@ehanson8 ehanson8 merged commit 97b7a19 into main Jun 13, 2024
5 checks passed
@ehanson8 ehanson8 deleted the TIMX-286-dspace-mets-fmr-returns branch June 13, 2024 17:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants