Timx 287 ead field method refactor 2 #187

Merged
@jonavellecuerdo merged 1 commit into main from TIMX-287-ead-field-method-refactor-2 on Jun 3, 2024

@jonavellecuerdo commented May 31, 2024

Purpose and background context

Field method refactor for transform class Ead (Part 2).

  • Add field methods and corresponding unit tests for contents, contributors, dates,
    and identifiers.
  • Add a private helper method for get_dates

How can a reviewer manually see the effects of these changes?

  1. Run make test and verify all unit tests are passing.
  2. Run the CLI command:
    pipenv run transform -i tests/fixtures/ead/ead_record_all_fields.xml -o output/ead-transformed-records.json -s aspace
    
    Output:
    2024-05-31 13:20:14,820 INFO transmogrifier.cli.main(): Logger 'root' configured with level=INFO
    2024-05-31 13:20:14,820 INFO transmogrifier.cli.main(): No Sentry DSN found, exceptions will not be sent to Sentry
    2024-05-31 13:20:14,820 INFO transmogrifier.cli.main(): Running transform for source aspace
    2024-05-31 13:20:14,837 WARNING transmogrifier.sources.transformer.get_valid_title(): Record repositories/2/resources/1 has multiple titles. Using the first title from the following titles found: ['Charles J. Connick Stained Glass Foundation Collection', 'Title 2', 'Title 3']
    2024-05-31 13:20:14,853 INFO transmogrifier.cli.main(): Completed transform, total records processed: 1, transformed records: 1, skipped records: 0, deleted records: 0
    2024-05-31 13:20:14,853 INFO transmogrifier.cli.main(): Total time to complete transform: 0:00:00.033002
    

Includes new or updated dependencies?

NO

Changes expectations for external applications?

NO

What are the relevant tickets?

Developer

  • All new ENV is documented in README
  • All new ENV has been added to staging and production environments
  • All related Jira tickets are linked in commit message(s)
  • Stakeholder approval has been confirmed (or is not needed)

Code Reviewer(s)

  • The commit message is clear and follows our guidelines (not just this PR message)
  • There are appropriate tests covering any new functionality
  • The provided documentation is sufficient for understanding any new functionality introduced
  • Any manual tests have been performed and verified
  • New dependencies are appropriate or there were no changes

@jonavellecuerdo jonavellecuerdo self-assigned this May 31, 2024
Comment on lines -589 to -597
def test_ead_transform_with_multiple_unitid_gets_valid_ids():
    ead_xml_records = Ead.parse_source_file(
        "tests/fixtures/ead/ead_record_attribute_and_subfield_variations.xml"
    )
    output_record = next(Ead("aspace", ead_xml_records))
    for identifier in output_record.identifiers:
        assert identifier.value != "unitid-that-should-not-be-identifier"


Contributor Author

@ehanson8 Related to #185 (comment). I removed the test and in its place are the tests for the new get_identifiers field method. 🤓

@jonavellecuerdo jonavellecuerdo marked this pull request as ready for review May 31, 2024 17:21
@ghukill left a comment

Overall, looks great! I'm requesting the dreaded changes just for consistency in the positional/keyword usage. It's ultimately not that big of a deal, but while we're on the topic, seems worth mentioning.
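One way to keep positional/keyword usage consistent is to let the signature enforce it: in Python, every parameter after a bare `*` must be passed by keyword. A minimal sketch, assuming a hypothetical `create_string_from_parts` helper (this is not the actual signature of the `create_string_*` methods in transmogrifier):

```python
def create_string_from_parts(parts: list[str], *, separator: str = " ") -> str:
    """Join non-empty, whitespace-stripped parts with the given separator.

    Hypothetical helper: every parameter after the bare `*` is keyword-only.
    """
    return separator.join(part.strip() for part in parts if part.strip())


# Keyword usage works as expected.
result = create_string_from_parts(["  Best ", "Ever"], separator=" ")

# Passing `separator` positionally raises TypeError.
try:
    create_string_from_parts(["Best", "Ever"], " ")
except TypeError:
    positional_rejected = True
else:
    positional_rejected = False
```

With this style, a call site that drifts back to positional arguments fails loudly instead of silently diverging from the rest of the codebase.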

transmogrifier/sources/xml/ead.py (outdated, resolved)
transmogrifier/sources/xml/ead.py (outdated, resolved)
Comment on lines +441 to +446
source_record_id = cls.get_source_record_id(source_record)
dates.extend(
cls._parse_date_elements(collection_description_did, source_record_id)
)
Contributor

I know we're not addressing this now, but just want to continue to note that pulling the source_record_id, just so it can be logged for bad date parsing, feels awkward.

Apologies if an issue or note exists elsewhere, but just in case, created this one: #188.

Contributor Author

Agreed and thanks for creating the issue!

transmogrifier/sources/xml/ead.py (resolved)
Comment on lines +773 to +789
def test_get_contributors_success():
source_record = create_ead_source_record_stub(
metadata_insert=(
"""
<origination label="Creator">
<persname>
Author, Best E.
<part>( <emph> Best <emph>Ever</emph> </emph> )</part>
</persname>
</origination>
"""
),
parent_element="did",
)
assert Ead.get_contributors(source_record) == [
timdex.Contributor(value="Author, Best E. ( Best Ever )", kind="Creator")
]
Contributor

Just a random example, but really digging these tests.
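The expected value in test_get_contributors_success, "Author, Best E. ( Best Ever )", comes from concatenating all text beneath `<persname>` (including the nested `<part>` and `<emph>` children) and collapsing runs of whitespace. A standard-library sketch of that normalization, assuming `xml.etree.ElementTree` rather than whatever parser transmogrifier actually uses, with plain dicts standing in for `timdex.Contributor`:

```python
import xml.etree.ElementTree as ET

# Stub mirroring the <origination> snippet in test_get_contributors_success.
ORIGINATION_XML = """
<origination label="Creator">
  <persname>
    Author, Best E.
    <part>( <emph> Best <emph>Ever</emph> </emph> )</part>
  </persname>
</origination>
"""


def normalized_text(element: ET.Element) -> str:
    """Concatenate all text (and child tails) beneath `element`, collapsing whitespace."""
    return " ".join("".join(element.itertext()).split())


origination = ET.fromstring(ORIGINATION_XML.strip())
# One contributor per child name element; the origination label becomes the kind.
contributors = [
    {"value": normalized_text(name), "kind": origination.get("label")}
    for name in origination
]
print(contributors)
# [{'value': 'Author, Best E. ( Best Ever )', 'kind': 'Creator'}]
```

`itertext()` yields each descendant's text and tail in document order, which is why the spaces inside `( <emph> Best ...` survive as single separators after the split/join.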

@ghukill left a comment

👍

@ehanson8 left a comment

Looks great but a few suggestions!

tests/sources/xml/test_ead.py (resolved)
]


def test_get_dates_success():
Contributor

I would add a non-range date value as well, so we're testing everything.

Contributor Author

Change has been made in commit 95b6b47.

transmogrifier/sources/xml/ead.py (outdated, resolved)
Comment on lines 472 to 492
# get valid date ranges or dates
if "/" in date_value:
gte_date, lte_date = date_value.split("/")
if gte_date != lte_date:
if validate_date_range(
gte_date,
lte_date,
source_record_id,
):
date_instance.range = timdex.DateRange(
gte=gte_date,
lte=lte_date,
)
else:
# get valid date (if dates in ranges are the same)
date_string = gte_date
if validate_date(
date_string,
source_record_id,
):
date_instance.value = date_string
Contributor

This should be broken out as a separate method to improve readability; Datacite has a _parse_date_range method that would be very similar.

Contributor Author

Please review the changes in the latest commit!
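Following the reviewer's suggestion, the range-handling branch in the get_dates snippet can be pulled into its own helper that builds and returns a date object. A minimal sketch, assuming plain dataclasses in place of `timdex.Date`/`timdex.DateRange` and a simple regex check as a hypothetical stand-in for `validate_date` and `validate_date_range` (the real validators also take `source_record_id` for logging):

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical stand-in for validate_date: accepts YYYY, YYYY-MM or YYYY-MM-DD.
DATE_PATTERN = re.compile(r"\d{4}(-\d{2}(-\d{2})?)?")


@dataclass
class DateRange:
    gte: Optional[str] = None
    lte: Optional[str] = None


@dataclass
class Date:
    value: Optional[str] = None
    range: Optional[DateRange] = None


def is_valid_date(date_string: str) -> bool:
    return DATE_PATTERN.fullmatch(date_string) is not None


def is_valid_date_range(gte: str, lte: str) -> bool:
    # For these ISO-style formats, plain string comparison follows chronological order.
    return is_valid_date(gte) and is_valid_date(lte) and gte <= lte


def parse_date_range(date_string: str) -> Date:
    """Parse 'gte/lte' into a new Date; identical endpoints collapse to one value."""
    date_instance = Date()
    gte_date, lte_date = date_string.split("/")
    if gte_date != lte_date:
        if is_valid_date_range(gte_date, lte_date):
            date_instance.range = DateRange(gte=gte_date, lte=lte_date)
    elif is_valid_date(gte_date):
        # Same start and end: store as a single date value.
        date_instance.value = gte_date
    return date_instance


print(parse_date_range("1905/1990"))  # Date(value=None, range=DateRange(gte='1905', lte='1990'))
print(parse_date_range("1905/1905"))  # Date(value='1905', range=None)
```

An invalid or reversed range leaves both fields empty, mirroring the original branch, which only assigns when validation passes.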

@jonavellecuerdo jonavellecuerdo force-pushed the TIMX-287-ead-field-method-refactor-2 branch from 29e6d6e to 95b6b47 Compare June 3, 2024 13:54
@ghukill left a comment

Looking good! Approved.

- elif validate_date(date_value, source_record_id):
-     date_instance.value = date_value
+ if "/" in date_string:
+     date_instance = cls._parse_date_range(date_string, source_record_id)
Contributor

I think this is a nice change. I almost commented in the previous PR that this field method was a bit unwieldy, but I was struggling a bit with how much of that refactoring -- particularly around dates -- to suggest in this first pass of just getting to a field method approach.

Either way, much easier to scan and understand now.

Contributor

Agreed, great work!

@ehanson8 left a comment

Thanks for the updates!

Comment on lines +488 to +490
def _parse_date_range(cls, date_string: str, source_record_id: str) -> timdex.Date:
date_instance = timdex.Date()
gte_date, lte_date = date_string.split("/")
@ehanson8 commented Jun 3, 2024

This is a slightly different approach from Datacite: it instantiates a separate date instance in the method rather than passing in the existing one from get_dates. But as Graham said, we'll certainly be evaluating, and hopefully centralizing, a lot of the date parsing functionality out of the source transforms. We can decide on the preferred approach then!

Contributor Author

Sounds good!
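The two calling conventions discussed in this thread differ mainly in who owns the date object. A minimal sketch contrasting them, using a flattened, hypothetical `Date` dataclass rather than `timdex.Date`, hypothetical function names, and no validation:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Date:
    kind: Optional[str] = None
    value: Optional[str] = None
    gte: Optional[str] = None  # flattened range fields, for brevity
    lte: Optional[str] = None


# Style 1 (as in this PR): the helper creates and returns its own instance;
# any other attributes are set by the caller on the returned object.
def parse_range_returning(date_string: str) -> Date:
    gte, lte = date_string.split("/")
    return Date(gte=gte, lte=lte)


# Style 2 (as described for Datacite): the caller passes in its existing
# instance and the helper fills in the range fields in place.
def parse_range_in_place(date_string: str, date_instance: Date) -> Date:
    date_instance.gte, date_instance.lte = date_string.split("/")
    return date_instance


returned = parse_range_returning("1905/1990")
returned.kind = "creation"  # hypothetical kind value

existing = Date(kind="creation")
filled = parse_range_in_place("1905/1990", existing)
```

Both paths end with equivalent objects; the returning style keeps the helper free of side effects, while the in-place style lets get_dates set shared attributes once, before dispatching.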

* Add field methods and corresponding unit tests:
  contents, contributors, dates, and identifiers
* Update generate_name_identifier_url
  * Clarify that it is a private method associated with 'get_contributors'
* Update syntax for 'create_string_*' and 'create_list_*' methods
  to use keyword args
@jonavellecuerdo jonavellecuerdo force-pushed the TIMX-287-ead-field-method-refactor-2 branch from 95b6b47 to 9a736b9 Compare June 3, 2024 14:51
@jonavellecuerdo jonavellecuerdo merged commit d0e002d into main Jun 3, 2024
5 checks passed
@jonavellecuerdo jonavellecuerdo deleted the TIMX-287-ead-field-method-refactor-2 branch June 3, 2024 14:57