This repository has been archived by the owner on Jan 13, 2022. It is now read-only.

Retrieve new Smithsonian unit codes #465

Merged
mathemancer merged 9 commits into master from smithsonian_unit_code_check on Aug 4, 2020

Conversation

ChariniNana
Contributor

Fixes

Fixes #451 by @ChariniNana

Description

This implementation helps keep the Smithsonian unit codes in the SMITHSONIAN_SUB_PROVIDERS dictionary of the provider_details.py file up to date. The dictionary holds all known unit codes associated with Smithsonian images and is used to look up the corresponding sub-provider values. If the unit codes change at the Smithsonian API level and we are unaware of the change, issues would arise when we attempt to retrieve Smithsonian sub-provider values. We have therefore implemented a workflow that can be run frequently to check for changes to unit codes at the Smithsonian API level, so that the SMITHSONIAN_SUB_PROVIDERS dictionary can be updated manually to reflect them.

Technical details

The latest unit codes maintained at the Smithsonian API level for images can be retrieved by calling the following endpoint: https://api.si.edu/openaccess/api/v1.0/terms/unit_code?q=online_media_type:Images&api_key=REDACTED
We retrieve the latest unit codes by calling this endpoint, and any unit code that does not currently appear in the SMITHSONIAN_SUB_PROVIDERS dictionary is stored in a table called smithsonian_new_unit_codes. The logic lives in the smithsonian_unit_codes.py program. It can be executed by triggering the check_new_smithsonian_unit_codes_workflow via the Airflow UI, after which the smithsonian_new_unit_codes table is updated with the latest unit codes that need to be added to the SMITHSONIAN_SUB_PROVIDERS dictionary. If there are no new unit codes, the smithsonian_new_unit_codes table is empty. Please note that a person who maintains the CC repo is expected to make the actual update to the SMITHSONIAN_SUB_PROVIDERS dictionary.
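
The comparison step described above can be sketched roughly as follows. This is a minimal illustration, not the code in smithsonian_unit_codes.py: the function name find_new_unit_codes, the EXAMPLE_SUB_PROVIDERS data, and the assumption that the dictionary maps each sub-provider name to a set of unit codes are all hypothetical. The API call and the database write are left out so the sketch stays self-contained.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical stand-in for SMITHSONIAN_SUB_PROVIDERS. The assumed shape
# (sub-provider name -> set of unit codes) may differ from the real dictionary.
EXAMPLE_SUB_PROVIDERS: Dict[str, Set[str]] = {
    "example_museum_a": {"AAA", "AAB"},
    "example_museum_b": {"BBB"},
}


def find_new_unit_codes(
    latest_codes: Iterable[str],
    sub_providers: Dict[str, Set[str]],
) -> List[str]:
    """Return the unit codes in latest_codes that no sub-provider lists."""
    # Flatten every known unit code into a single set for fast lookups.
    known_codes: Set[str] = set().union(*sub_providers.values())
    # Deduplicate the incoming codes and keep only the unknown ones.
    return sorted(set(latest_codes) - known_codes)


if __name__ == "__main__":
    # "CCC" is the only code not present in the example dictionary.
    print(find_new_unit_codes(["AAA", "BBB", "CCC"], EXAMPLE_SUB_PROVIDERS))
```

In the workflow described here, the returned codes would be the rows written to the smithsonian_new_unit_codes table, and an empty list would correspond to an empty table.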

Tests

  1. The test_smithsonian_unit_codes.py test suite checks that the new unit code retrieval logic works correctly.
  2. The test_check_new_smithsonian_unit_codes_workflow.py test suite verifies that the corresponding workflow DAG is loaded properly.

Checklist

  • My pull request has a descriptive title (not a vague title like Update index.md).
  • My pull request targets the default branch of the repository (main or master).
  • My commit messages follow best practices.
  • My code follows the established code style of the repository.
  • I added tests for the changes I made (if applicable).
  • I added or updated documentation (if applicable).
  • I tried running the project locally and verified that there are no
    visible errors.

Developer Certificate of Origin

Developer Certificate of Origin
Version 1.1

Copyright (C) 2004, 2006 The Linux Foundation and its contributors.
1 Letterman Drive
Suite D4700
San Francisco, CA, 94129

Everyone is permitted to copy and distribute verbatim copies of this
license document, but changing it is not allowed.


Developer's Certificate of Origin 1.1

By making a contribution to this project, I certify that:

(a) The contribution was created in whole or in part by me and I
    have the right to submit it under the open source license
    indicated in the file; or

(b) The contribution is based upon previous work that, to the best
    of my knowledge, is covered under an appropriate open source
    license and I have the right under that license to submit that
    work with modifications, whether created in whole or in part
    by me, under the same open source license (unless I am
    permitted to submit under a different license), as indicated
    in the file; or

(c) The contribution was provided directly to me by some other
    person who certified (a), (b) or (c) and I have not modified
    it.

(d) I understand and agree that this project and the contribution
    are public and that a record of the contribution (including all
    personal information I submit with it, including my sign-off) is
    maintained indefinitely and may be redistributed consistent with
    this project or the open source license(s) involved.

@kgodey kgodey added this to In Progress in Active Sprint Jul 16, 2020
@ChariniNana ChariniNana marked this pull request as ready for review July 19, 2020 21:52
@ChariniNana ChariniNana requested review from a team, kss682 and mathemancer and removed request for a team July 19, 2020 21:52
Contributor

@mathemancer mathemancer left a comment

The main change I'd request is to try to make sure not to duplicate any constants from other parts of the code base. Over time, this will make things easier to maintain. See my specific comments for details. Otherwise, please make sure that the function raises an exception if human intervention is required, since it's unlikely I'll remember to always check that table.
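
A rough sketch of the "raise an exception if human intervention is required" behavior, under stated assumptions: the exception class NewUnitCodesError and the function alert_new_unit_codes are hypothetical names for illustration and are not taken from the merged code.

```python
from typing import Iterable


class NewUnitCodesError(Exception):
    """Hypothetical error signalling that unknown unit codes need attention."""


def alert_new_unit_codes(new_codes: Iterable[str]) -> None:
    """Raise if any new unit codes were found; do nothing otherwise."""
    codes = sorted(set(new_codes))
    if codes:
        # Failing loudly makes the task fail visibly instead of relying on
        # someone remembering to inspect the table of new unit codes.
        raise NewUnitCodesError(
            "New Smithsonian unit codes need to be added manually: "
            + ", ".join(codes)
        )
```

Raising from the task would make the run show up as failed, so the need for a manual dictionary update is hard to miss.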

Contributor

@mathemancer mathemancer left a comment

Great work, thank you!

@mathemancer mathemancer merged commit d45848c into master Aug 4, 2020
Active Sprint automation moved this from In Progress to Done Aug 4, 2020
@mathemancer mathemancer deleted the smithsonian_unit_code_check branch August 4, 2020 09:14
@TimidRobot TimidRobot removed this from Done in Active Sprint Jan 12, 2022
Development

Successfully merging this pull request may close these issues.

[Infrastructure] Create a workflow for alerting about newly added categories (unit codes) in Smithsonian
2 participants