This repository has been archived by the owner on Jan 13, 2022. It is now read-only.

Retrieve sub providers within Smithsonian #455

Merged
merged 6 commits into from Jul 29, 2020


ChariniNana
Contributor

Fixes

Fixes #454 by @ChariniNana, Related to #392, Related to #451

Description

This addresses the requirement of retrieving all sub-providers within Smithsonian. The requirement has two aspects:

  1. Retrieve sub-providers at the API level while pulling data from the Smithsonian API.
  2. Update the existing Smithsonian-related information in the database to reflect the sub-provider information.

Technical details

The 'unit_code' field of the Smithsonian API response uniquely identifies the sub-provider. We maintain a mapping from each sub-provider name to its 'unit_code' value(s), which we use to retrieve the sub-provider. The 'unit_code' value is also stored as metadata in the image store.
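
A minimal sketch of such a mapping, and of the reverse index a lookup would use, in Python. The sub-provider names and unit codes below are illustrative placeholders, not the PR's actual values, and `SUB_PROVIDERS` and `build_unit_code_index` are hypothetical names.

```python
from typing import Dict, Set

# Hypothetical mapping: sub-provider name -> the unit_code values it covers.
# Names and codes are placeholders, not the PR's real table.
SUB_PROVIDERS: Dict[str, Set[str]] = {
    "smithsonian_museum_a": {"UNIT_A1", "UNIT_A2"},
    "smithsonian_museum_b": {"UNIT_B"},
}


def build_unit_code_index(sub_providers: Dict[str, Set[str]]) -> Dict[str, str]:
    """Invert the mapping so each unit_code points to exactly one sub-provider."""
    index: Dict[str, str] = {}
    for sub_provider, unit_codes in sub_providers.items():
        for code in unit_codes:
            if code in index:
                # A code shared by two sub-providers would make the lookup ambiguous.
                raise ValueError(
                    f"unit_code {code!r} maps to both {index[code]!r} and {sub_provider!r}"
                )
            index[code] = sub_provider
    return index


UNIT_CODE_INDEX = build_unit_code_index(SUB_PROVIDERS)
```

Inverting once up front makes each per-image lookup a single dictionary access, and rejecting a code claimed by two sub-providers keeps the "unique sub-provider" expectation explicit.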

Since every image must be categorised under a unique sub-provider, we expect the 'unit_code' value of each image to correspond to some sub-provider in our mapping. If we encounter an unknown 'unit_code', we raise an error and terminate execution. Because the set of 'unit_code' values used by Smithsonian can change over time, we need a mechanism for regularly checking whether our known set of unit codes is up to date. With such a mechanism, we could update the unit code to sub-provider mapping before running the Smithsonian sub-provider retrieval, and so avoid raising errors. This is tracked in a separate ticket, #451.
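
The update-check mechanism itself belongs to #451, so the following is only a hedged sketch of the pre-flight comparison it could perform. It assumes the current list of unit codes has already been obtained from Smithsonian by some means not specified here; `find_unmapped_unit_codes` and `check_mapping_is_current` are hypothetical names.

```python
from typing import Dict, Iterable, Set


def find_unmapped_unit_codes(
    fetched_codes: Iterable[str], unit_code_index: Dict[str, str]
) -> Set[str]:
    """Return unit codes seen upstream that the local mapping does not cover."""
    return set(fetched_codes) - set(unit_code_index)


def check_mapping_is_current(
    fetched_codes: Iterable[str], unit_code_index: Dict[str, str]
) -> None:
    """Fail before any processing starts, instead of partway through a run."""
    unmapped = find_unmapped_unit_codes(fetched_codes, unit_code_index)
    if unmapped:
        raise ValueError(f"Unmapped Smithsonian unit codes: {sorted(unmapped)}")
```

For example, with a mapping that covers only `UNIT_A1` and `UNIT_B`, `find_unmapped_unit_codes(["UNIT_A1", "UNIT_C"], index)` returns `{"UNIT_C"}`, signalling that the mapping should be updated before the retrieval runs.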

  1. At the API script level, when an image is processed, we look up the sub-provider corresponding to its 'unit_code' value and set the source field in the image store to that sub-provider. If the 'unit_code' is unknown, we raise an error.
  2. At the DB level, we first run a select query to retrieve the foreign identifier and 'unit_code' value of every Smithsonian image whose source value has not yet been updated. We then process the output row by row: if the 'unit_code' value is known, we set that row's source to the corresponding sub-provider in the DB; if it is unknown, we raise an error.
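
Both steps share the same lookup. The sketch below illustrates them in Python, with the standard `sqlite3` module standing in for the project's database. The `image` table layout, the JSON-encoded `meta_data` column, the assumption that not-yet-updated rows still have `source = 'smithsonian'`, and all function names are hypothetical.

```python
import json
import sqlite3
from typing import Dict, Optional

PROVIDER = "smithsonian"


class UnknownUnitCodeError(Exception):
    """Raised when a unit_code has no entry in the sub-provider mapping."""


def source_for_unit_code(unit_code: Optional[str], unit_code_index: Dict[str, str]) -> str:
    """Shared lookup: return the sub-provider for a unit_code, or fail loudly."""
    if unit_code not in unit_code_index:
        raise UnknownUnitCodeError(f"Unknown Smithsonian unit_code: {unit_code!r}")
    return unit_code_index[unit_code]


def build_image_record(meta_data: Dict[str, str], unit_code_index: Dict[str, str]) -> Dict[str, object]:
    """API level: set the record's source from the unit_code kept in its metadata."""
    return {
        "provider": PROVIDER,
        "source": source_for_unit_code(meta_data.get("unit_code"), unit_code_index),
        "meta_data": meta_data,
    }


def update_smithsonian_sub_providers(conn: sqlite3.Connection, unit_code_index: Dict[str, str]) -> int:
    """DB level: rewrite source for Smithsonian rows that still hold the generic value."""
    # Select the foreign identifier and metadata of every row not yet updated.
    rows = conn.execute(
        "SELECT foreign_identifier, meta_data FROM image WHERE provider = ? AND source = ?",
        (PROVIDER, PROVIDER),
    ).fetchall()
    updated = 0
    try:
        # Handle one row at a time; an unknown code stops the whole run.
        for foreign_identifier, meta_data in rows:
            unit_code = json.loads(meta_data).get("unit_code")
            conn.execute(
                "UPDATE image SET source = ? WHERE provider = ? AND foreign_identifier = ?",
                (source_for_unit_code(unit_code, unit_code_index), PROVIDER, foreign_identifier),
            )
            updated += 1
    except UnknownUnitCodeError:
        # Discard this run's partial updates before propagating the error.
        conn.rollback()
        raise
    conn.commit()
    return updated
```

Rolling back on an unknown code means a failed run leaves no partial updates behind, so it can simply be rerun once the mapping is fixed; the loop follows the row-by-row description above rather than a single set-based UPDATE.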

The workflow smithsonian_sub_provider_update_workflow allows the DB update for Smithsonian sub-provider retrieval to be triggered.

Tests

  1. API script level sub-provider retrieval: the function test_process_image_data_with_sub_provider in the test_smithsonian test suite checks whether the source is set correctly when a sub-provider from our mapping is encountered.
  2. DB level sub-provider update: the function test_update_smithsonian_sub_providers in test_sql checks that the image table is updated successfully.
  3. Workflow for the DB sub-provider update: the function test_smithsonian_dag_loads_with_no_errors in the test_sub_provider_update_workflow test suite checks that the workflow loads without errors.

Checklist

  • My pull request has a descriptive title (not a vague title like Update index.md).
  • My pull request targets the master branch of the repository.
  • My commit messages follow best practices.
  • My code follows the established code style of the repository.
  • I added tests for the changes I made (if applicable).
  • I added or updated documentation (if applicable).
  • I tried running the project locally and verified that there are no
    visible errors.

Developer Certificate of Origin

Developer Certificate of Origin
Version 1.1

Copyright (C) 2004, 2006 The Linux Foundation and its contributors.
1 Letterman Drive
Suite D4700
San Francisco, CA, 94129

Everyone is permitted to copy and distribute verbatim copies of this
license document, but changing it is not allowed.


Developer's Certificate of Origin 1.1

By making a contribution to this project, I certify that:

(a) The contribution was created in whole or in part by me and I
    have the right to submit it under the open source license
    indicated in the file; or

(b) The contribution is based upon previous work that, to the best
    of my knowledge, is covered under an appropriate open source
    license and I have the right under that license to submit that
    work with modifications, whether created in whole or in part
    by me, under the same open source license (unless I am
    permitted to submit under a different license), as indicated
    in the file; or

(c) The contribution was provided directly to me by some other
    person who certified (a), (b) or (c) and I have not modified
    it.

(d) I understand and agree that this project and the contribution
    are public and that a record of the contribution (including all
    personal information I submit with it, including my sign-off) is
    maintained indefinitely and may be redistributed consistent with
    this project or the open source license(s) involved.

ChariniNana requested review from a team, kss682 and mathemancer and removed request for a team July 7, 2020 23:31
kgodey added this to In Progress in Active Sprint Jul 7, 2020
Contributor

mathemancer left a comment


Okay, I think this looks good!

mathemancer merged commit 4d33aac into master Jul 29, 2020
Active Sprint automation moved this from In Progress to Done Jul 29, 2020
kgodey deleted the smithsonian_sub_providers branch July 29, 2020 13:51
TimidRobot removed this from Done in Active Sprint Jan 12, 2022
Labels
None yet
Development

Successfully merging this pull request may close these issues.

[Feature] Retrieve sub providers within Smithsonian
2 participants