This repository has been archived by the owner on Jan 13, 2022. It is now read-only.

[Infrastructure] Fix Smithsonian NMNH related discrepancies #470

Closed
ChariniNana opened this issue Jul 26, 2020 · 1 comment · Fixed by #474

@ChariniNana
Contributor

ChariniNana commented Jul 26, 2020

Current Situation

As explained in ticket #397, the NMNH (National Museum of Natural History) images from the Smithsonian API have discrepancies in certain fields. The creator field is missing for all NMNH records, and the description is missing for 99.6% of them.

Suggested Improvement

Initial research suggests that the creator field could be populated from the freetext -> name -> Collector field in the JSON response. Further discussion is needed to determine whether this is the appropriate field. For some NMNH images, the description can be taken from the freetext -> notes -> Notes field.
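A minimal sketch of the lookup described above, for discussion only. It assumes that each record exposes `freetext` under a top-level `content` key and that each `freetext` category (`name`, `notes`) is a list of objects with `label` and `content` keys. That layout, the helper name `get_freetext_value`, and the sample record are assumptions for illustration, not confirmed details of the Smithsonian API or of the existing provider script.

```python
from typing import Optional


def get_freetext_value(row: dict, category: str, label: str) -> Optional[str]:
    """Return the first non-empty value under freetext -> category -> label.

    Assumed layout (hypothetical):
    row["content"]["freetext"][category] is a list of
    {"label": ..., "content": ...} dictionaries.
    """
    freetext = (row.get("content") or {}).get("freetext") or {}
    entries = freetext.get(category) or []
    for entry in entries:
        if isinstance(entry, dict) and entry.get("label") == label:
            value = (entry.get("content") or "").strip()
            if value:
                return value
    return None


# Hypothetical NMNH-style record, invented for illustration only.
sample_row = {
    "content": {
        "freetext": {
            "name": [{"label": "Collector", "content": "J. Doe"}],
            "notes": [{"label": "Notes", "content": " Collected near a river. "}],
        }
    }
}

creator = get_freetext_value(sample_row, "name", "Collector")
description = get_freetext_value(sample_row, "notes", "Notes")
```

Returning `None` when the label is absent keeps the "missing" case explicit, so records without a Collector or Notes entry can still be counted as incomplete.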

Benefit

Making this change would improve the completeness of NMNH-related data in Smithsonian. More than 95% of the Smithsonian data comes from NMNH, so it is important to improve its completeness as much as possible.

Additional context

This issue is related to #397

@ChariniNana
Contributor Author

After this implementation, the counts of missing metadata are as follows.

The numbers and percentages of missing creators:

                         Sub provider | No Creator | Total Images | Missing Percentage
si_national_museum_of_natural_history |     111221 |      3325259 | 3.3447319441884074
               si_american_art_museum |         22 |        11561 | 0.19029495718363462
                  si_anacostia_museum |        322 |          571 |   56.3922942206655
                         si_libraries |          0 |           55 |                0.0
              si_cooper_hewitt_museum |      35980 |        65686 |  54.77575130164723
                           si_gardens |        669 |          689 |  97.09724238026125
                     si_postal_museum |       2900 |         2951 |  98.27177228058285
           si_american_history_museum |        215 |         2266 |  9.488084730803177
   si_african_american_history_museum |       3260 |         7526 | 43.316502790326865
                  si_portrait_gallery |       7084 |        11981 |  59.12695100575912
              si_freer_gallery_of_art |       2931 |         3877 |   75.5996904823317
              si_air_and_space_museum |        237 |         2501 |  9.476209516193522
                si_african_art_museum |          3 |          136 | 2.2058823529411766
            si_american_indian_museum |        168 |          248 |  67.74193548387096
                  si_hirshhorn_museum |          1 |          423 | 0.2364066193853428

The numbers and percentages of missing descriptions in the meta data field:

                         Sub provider | No Description | Total Images | Missing Percentage
si_national_museum_of_natural_history |        3224038 |      3325259 |  96.95599651034702
               si_american_art_museum |          11561 |        11561 |              100.0
                  si_anacostia_museum |            501 |          571 |  87.74080560420315
                         si_libraries |              0 |           55 |                0.0
              si_cooper_hewitt_museum |           4169 |        65686 |  6.346862345096367
                           si_gardens |              0 |          689 |                0.0
                     si_postal_museum |              2 |         2951 | 0.06777363605557438
           si_american_history_museum |            999 |         2266 |   44.0864960282436
   si_african_american_history_museum |              0 |         7526 |                0.0
                  si_portrait_gallery |          11981 |        11981 |              100.0
              si_freer_gallery_of_art |           3877 |         3877 |              100.0
              si_air_and_space_museum |            319 |         2501 | 12.754898040783686
                si_african_art_museum |              1 |          136 | 0.7352941176470589
            si_american_indian_museum |            248 |          248 |              100.0
                  si_hirshhorn_museum |            423 |          423 |              100.0
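
The Missing Percentage column in both tables is consistent with the missing count divided by the total image count, times 100. A small sketch of that arithmetic follows; the function name is hypothetical, and treating an empty sub provider as 0.0 is an assumption rather than something taken from the query that produced the tables.

```python
def missing_percentage(missing: int, total: int) -> float:
    """Percentage of records that lack a given field."""
    if total == 0:
        # Assumption: report 0.0 for an empty sub provider instead of failing.
        return 0.0
    return missing / total * 100


# NMNH rows from the two tables above.
nmnh_missing_creator = round(missing_percentage(111221, 3325259), 2)      # 3.34
nmnh_missing_description = round(missing_percentage(3224038, 3325259), 2)  # 96.96
```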

@kgodey kgodey moved this from Pending Review to Internships 2020 in Backlog Jul 31, 2020
@kgodey kgodey removed this from Internships 2020 in Backlog Aug 17, 2020
@kgodey kgodey added this to Ready for Development in Active Sprint via automation Aug 17, 2020
@kgodey kgodey moved this from Ready for Development to Done in Active Sprint Aug 17, 2020
@TimidRobot TimidRobot removed this from Done in Active Sprint Jan 12, 2022