Add new Springshare sources Libguides and Research Databases #99

Merged
merged 7 commits into from
Jul 28, 2023

Conversation

@ghukill ghukill commented Jul 26, 2023

What does this PR do?

This PR adds two new sources that transmogrifier can handle: libguides and researchdatabases.

The overall approach is this:

  • a generic OAI Dublin Core transformer OaiDc
  • a Springshare OAI transformer SpringshareOaiDc that extends OaiDc
  • both new sources use the SpringshareOaiDc transformer, as defined in the source configurations
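
As a rough illustration of the approach listed above, the sketch below shows one way a source configuration could map both source keys to the same transformer class. The `SOURCES` layout, the `transformer` key, and the `get_transformer` helper are assumptions made for this example and are not taken from the PR's code; only the class names `OaiDc` and `SpringshareOaiDc` and the source names come from the description.

```python
class OaiDc:
    """Generic OAI Dublin Core transformer (simplified stand-in)."""

    def __init__(self, source: str, source_name: str) -> None:
        self.source = source
        self.source_name = source_name


class SpringshareOaiDc(OaiDc):
    """Springshare-specific transformer that extends OaiDc."""


# Hypothetical configuration layout: both sources point at the same class.
SOURCES = {
    "libguides": {
        "name": "Libguides",
        "transformer": SpringshareOaiDc,
    },
    "researchdatabases": {
        "name": "Research Databases",
        "transformer": SpringshareOaiDc,
    },
}


def get_transformer(source: str) -> OaiDc:
    """Look up the configured transformer class and instantiate it."""
    config = SOURCES[source]
    return config["transformer"](source, config["name"])


libguides = get_transformer("libguides")
print(type(libguides).__name__, libguides.source_name)
```

Keeping the class choice in configuration means a source could later be pointed at a more specific subclass without touching the lookup code.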

This PR also updates dependencies.

How can a reviewer manually see the effects of these changes?

Example of libguides transformation:

pipenv run transform -s libguides -i "tests/fixtures/oai_dc/springshare/libguides/libguides_record_all_fields.xml" -o "output/libguides.json"
[
{
  "source": "Libguides",
  "source_link": "https://libguides.mit.edu/materials",
  "timdex_record_id": "libguides:materials",
  "title": "Materials Science & Engineering",
  "citation": "Materials Science & Engineering. Libguides. https://libguides.mit.edu/materials",
  "content_type": [
    "libguides"
  ],
  "contributors": [
    {
      "value": "Ye Li",
      "kind": "Creator"
    }
  ],
  "dates": [
    {
      "value": "2008-06-19T17:55:27"
    }
  ],
  "format": "electronic resource",
  "identifiers": [
    {
      "value": "oai:libguides.com:guides/175846",
      "kind": "OAI-PMH"
    }
  ],
  "links": [
    {
      "url": "https://libguides.mit.edu/materials",
      "kind": "Libguide URL",
      "text": "Libguide URL"
    }
  ],
  "publication_information": [
    "MIT Libraries"
  ],
  "subjects": [
    {
      "value": [
        "Engineering",
        "Science"
      ],
      "kind": "Subject scheme not provided"
    }
  ],
  "summary": [
    "Useful databases and other research tips for materials science."
  ]
}
]

Example of researchdatabases transformation:

pipenv run transform -s researchdatabases -i "tests/fixtures/oai_dc/springshare/research_databases/research_databases_record_all_fields.xml" -o "output/researchdatabases.json"
[
{
  "source": "Research Databases",
  "source_link": "https://libguides.mit.edu/llba",
  "timdex_record_id": "researchdatabases:llba",
  "title": "Linguistics and Language Behavior Abstracts (LLBA)",
  "citation": "Linguistics and Language Behavior Abstracts (LLBA). Research Databases. https://libguides.mit.edu/llba",
  "content_type": [
    "researchdatabases"
  ],
  "dates": [
    {
      "value": "2022-01-28T22:15:37"
    }
  ],
  "format": "electronic resource",
  "identifiers": [
    {
      "value": "oai:libguides.com:az/65257807",
      "kind": "OAI-PMH"
    }
  ],
  "links": [
    {
      "url": "https://libguides.mit.edu/llba",
      "kind": "Research Database URL",
      "text": "Research Database URL"
    }
  ],
  "subjects": [
    {
      "value": [
        "Humanities"
      ],
      "kind": "Subject scheme not provided"
    }
  ],
  "summary": [
    "The most comprehensive index to articles in Linguistics and Language\n          Development and use."
  ]
}
]
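
In both sample outputs above, `timdex_record_id` looks like the source key joined to the last path segment of `source_link`, and `citation` looks like the title, source name, and link joined by periods. The sketch below only reproduces that visible relationship; the function names are hypothetical, and the PR's actual logic may differ (for example, it may fall back to a shared default citation generator).

```python
from urllib.parse import urlparse


def record_id_from_link(source: str, source_link: str) -> str:
    """Join the source key and the last URL path segment (assumed pattern)."""
    slug = urlparse(source_link).path.strip("/").split("/")[-1]
    return f"{source}:{slug}"


def simple_citation(title: str, source_name: str, source_link: str) -> str:
    """Build a 'Title. Source. URL' string matching the sample output shape."""
    return f"{title}. {source_name}. {source_link}"


print(record_id_from_link("libguides", "https://libguides.mit.edu/materials"))
print(
    simple_citation(
        "Materials Science & Engineering",
        "Libguides",
        "https://libguides.mit.edu/materials",
    )
)
```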

What are the relevant tickets?

Developer

  • All new ENV is documented in README
  • Stakeholder approval has been confirmed (or is not needed)

Code Reviewer

  • The commit message is clear and follows our guidelines
    (not just this pull request message)
  • There are appropriate tests covering any new functionality
  • The documentation has been updated or is unnecessary
  • The changes have been verified
  • New dependencies are appropriate or there were no changes

Includes new or updated dependencies?

YES

Why these changes are being introduced:
Adding two new TIMDEX sources, Libguides and Research Databases, which need to be
transformed into TIMDEX records.

How this addresses that need:
Libguides and Research Databases (the latter also called the AZ List) are both Springshare products.
They come from the same OAI-PMH feed and therefore share a very similar DC schema.

For this reason, a parent "SpringshareOaiDc" transformer was created that extracts
most, if not all, of the optional TIMDEX fields from the source XML.

Then, two transformers -- Libguides and ResearchDatabases -- extend this class.  At
this time, they do not modify the parent class's behavior at all, but are kept
distinct in case they need slight modifications over time.

The following source names were decided on:
- libguides: Libguides
- researchdatabases: Research Databases, aka the AZ List

An ADR has been included that captures the discussions for naming these sources.

Side effects of this change:
* None

Relevant ticket(s):
* https://mitlibraries.atlassian.net/browse/TIMX-227

Additional maintenance:
* Updated dependencies

Why these changes are being introduced:
After some discussion, opted to follow the pattern already used by sources like zenodo for extending transformer classes.

How this addresses that need:
Instead of a dedicated method for extending the optional fields, this follows an established pattern:
get_optional_fields calls smaller per-field methods (e.g. get_dates()) that subclasses can override.
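
A minimal sketch of that override pattern follows. It uses a plain dict in place of the parsed source XML, and the method bodies are invented for illustration; only the names `OaiDc`, `SpringshareOaiDc`, `get_optional_fields`, `get_dates()`, and `get_links()` appear in this PR.

```python
class OaiDc:
    """Generic transformer: get_optional_fields delegates to small hooks."""

    def get_optional_fields(self, record: dict) -> dict:
        fields = {}
        dates = self.get_dates(record)
        if dates:
            fields["dates"] = dates
        links = self.get_links(record)
        if links:
            fields["links"] = links
        return fields

    def get_dates(self, record: dict) -> list:
        # Assumed default behavior: wrap each raw date value.
        return [{"value": value} for value in record.get("date", [])]

    def get_links(self, record: dict) -> list:
        # Assumed default behavior: the generic class emits no links.
        return []


class SpringshareOaiDc(OaiDc):
    """Overrides a single hook and inherits everything else unchanged."""

    def get_links(self, record: dict) -> list:
        return [
            {"url": url, "kind": "Libguide URL", "text": "Libguide URL"}
            for url in record.get("identifier", [])
        ]


record = {
    "date": ["2008-06-19T17:55:27"],
    "identifier": ["https://libguides.mit.edu/materials"],
}
print(OaiDc().get_optional_fields(record))
print(SpringshareOaiDc().get_optional_fields(record))
```

The subclass never touches `get_optional_fields` itself, so the order and handling of fields stay defined in one place.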

Side effects of this change:
* None

Relevant ticket(s):
* None

Why these changes are being introduced:
* Adding these new data sources prompted some discussion and decision about source names that would benefit from being recorded

How this addresses that need:
* Establishes an ADR structure in this project and records the source name discussions as its first entry

Side effects of this change:
* None

Relevant ticket(s):
* None

@ehanson8 ehanson8 left a comment

This looks good, happy to chat about any of these suggestions!

tests/springshare/test_libguides.py (outdated)
tests/springshare/test_springshare_oai.py (outdated)
transmogrifier/config.py (outdated)
transmogrifier/sources/oaidc.py (outdated)
transmogrifier/sources/oaidc.py (outdated)
transmogrifier/sources/oaidc.py (outdated)
transmogrifier/sources/springshare.py
transmogrifier/sources/springshare.py (outdated)
transmogrifier/sources/springshare.py (outdated)
docs/adrs/0001-springshare-source-naming.md
ghukill added a commit that referenced this pull request Jul 27, 2023
Why these changes are being introduced:
* Updates stemming from code review for PR #99

How this addresses that need:
* updated test naming
* additional tests for OaiDc get_dates() and get_links() hooks
* fallback on default citation generator
* ensure usage of str(<BS4_element>.string) for memory concerns

Side effects of this change:
* None

Relevant ticket(s):
* https://mitlibraries.atlassian.net/browse/TIMX-227
@ghukill ghukill force-pushed the TIMX-227-springshare-sources branch from a87c413 to 982f569 Compare July 28, 2023 12:29
ghukill commented Jul 28, 2023

@ehanson8 - whenever you have time, for your review, changes have been pushed to address the comments above.

@ghukill ghukill requested a review from ehanson8 July 28, 2023 12:43

@ehanson8 ehanson8 left a comment

A lot of comments but this is minor stuff

tests/springshare/test_springshare.py (outdated)
tests/springshare/test_springshare.py (outdated)
tests/springshare/test_springshare.py (outdated)
tests/springshare/test_springshare.py (outdated)
tests/springshare/test_springshare.py (outdated)
tests/springshare/test_springshare.py (outdated)
tests/springshare/test_research_databases.py (outdated)
tests/test_oai_dc.py (outdated)
transmogrifier/sources/springshare.py
transmogrifier/sources/springshare.py (outdated)
* removing docstrings in unit tests in favor of more meaningful test names
* refactor libguides and researchdatabases tests into springshare
* make error and warning logging message patterns from transform consistent with other transforms
@ghukill ghukill requested a review from ehanson8 July 28, 2023 15:13

@ehanson8 ehanson8 left a comment

Almost there, 3 comments!

tests/test_springshare.py (outdated)
tests/test_springshare.py (outdated)
tests/test_springshare.py (outdated)
- DRYed up fixture prefixes
- normalized variable names
@ghukill ghukill requested a review from ehanson8 July 28, 2023 17:26

@ehanson8 ehanson8 left a comment

Good to go, great work!

@ghukill ghukill merged commit 25649ec into main Jul 28, 2023
4 checks passed
@ghukill ghukill deleted the TIMX-227-springshare-sources branch August 18, 2023 12:32