Backfill image dimensions data #1485

Open
1 task
stacimc opened this issue Aug 2, 2022 · 1 comment
Labels
  • 💻 aspect: code (Concerns the software code in the repository)
  • 🛠 goal: fix (Bug fix)
  • 🟨 priority: medium (Not blocking but should be addressed soon)
  • 🧱 stack: catalog (Related to the catalog and Airflow DAGs)
stacimc (Contributor) commented Aug 2, 2022

Problem

Depends on #1486

Once we've added image dimensions detection for the providers that don't currently support them, we'll need to backfill the data for previously ingested records. The providers to backfill are:

  • NYPL
  • Smithsonian
  • Walters Art Museum
  • Finnish Museums
  • Europeana*
  • Metropolitan*

* Since Metropolitan and Europeana are dated DAGs, we could potentially rely on their reingestion workflow to backfill the data over time (related: #1501).
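
As a rough illustration of what a per-provider backfill could look like, here is a minimal sketch. It is not the catalog's actual implementation: the `image` table layout, the column names (`provider`, `foreign_identifier`, `width`, `height`), and the `lookup_dimensions` callable are all assumptions made for the example, and it uses `sqlite3` only so the snippet is self-contained.

```python
import sqlite3


def backfill_dimensions(conn, provider, lookup_dimensions, batch_size=1000):
    """Fill in width/height for one provider's rows where either is NULL.

    Hypothetical helper: `lookup_dimensions(foreign_identifier)` is assumed
    to return a (width, height) tuple, or None when the size is unknown.
    Returns the number of rows updated.
    """
    updated = 0
    last_id = ""  # keyset cursor; assumes identifiers are non-empty text
    while True:
        # Page through candidates by identifier so that rows whose
        # dimensions cannot be resolved are not fetched again and again.
        rows = conn.execute(
            "SELECT foreign_identifier FROM image "
            "WHERE provider = ? AND (width IS NULL OR height IS NULL) "
            "AND foreign_identifier > ? "
            "ORDER BY foreign_identifier LIMIT ?",
            (provider, last_id, batch_size),
        ).fetchall()
        if not rows:
            break
        for (foreign_id,) in rows:
            dims = lookup_dimensions(foreign_id)
            if dims is None:
                continue  # leave the row untouched; nothing to backfill
            conn.execute(
                "UPDATE image SET width = ?, height = ? "
                "WHERE provider = ? AND foreign_identifier = ?",
                (dims[0], dims[1], provider, foreign_id),
            )
            updated += 1
        conn.commit()  # commit once per batch
        last_id = rows[-1][0]
    return updated
```

Rows that already have both dimensions are never selected, so the function can be re-run safely after the provider scripts start emitting dimensions for new records.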

Implementation

  • 🙋 I would be interested in implementing this feature.
stacimc added the 🟨 priority: medium, 🛠 goal: fix, and 💻 aspect: code labels on Aug 2, 2022
obulat (Contributor) commented Aug 23, 2022

File size and file type can be backfilled together with the image dimensions data. Here is the current status of file type and file size support for each provider:

| Provider | File type in the script | File size in the script | Backfill for file type | Backfill for file size |
| --- | --- | --- | --- | --- |
| Smithsonian | needs to be added | needs to be added | - | - |
| Raw Pixel | needs to be added | needs to be added | - | - |
| Finnish Museums | needs to be added | needs to be added | - | - |
| NYPL | added in WordPress/openverse-catalog#630 | needs to be added | not run yet | - |
| Phylopic | added in WordPress/openverse-catalog#547 | needs to be added | not run yet | - |
| Metropolitan Museum of Art | added in WordPress/openverse-catalog#568 | needs to be added | not run yet | - |
| Cleveland Museum of Art | added in WordPress/openverse-catalog#537 | added in WordPress/openverse-catalog#537 | not run yet | not run yet |
| Museums Victoria | added in WordPress/openverse-catalog#600 | needs to be added | not run yet | - |
| SMK | added in WordPress/openverse-catalog#542 | added in WordPress/openverse-catalog#542 | not run yet | not run yet |
| Science Museum | added in WordPress/openverse-catalog#576 | needs to be added | not run yet | - |
| Walters Art Museum | cannot fix due to #1637 | - | - | - |
| Brooklyn Museum | cannot fix due to #1638 | - | - | - |
| Europeana | needs to be added in fixing #1727 | - | - | - |

Projects: Openverse (Status: 📋 Backlog)
Development: No branches or pull requests
2 participants