Audit tags field for images #1557

Closed
obulat opened this issue May 20, 2022 · 3 comments
Labels
💻 aspect: code Concerns the software code in the repository ✨ goal: improvement Improvement to an existing user-facing feature 🟨 priority: medium Not blocking but should be addressed soon 🧱 stack: catalog Related to the catalog and Airflow DAGs
Comments

@obulat
Contributor

obulat commented May 20, 2022

Problem

Currently, we have a tag cleanup process running in the ingestion server as well as in the catalog.

Description

We should audit the tags in the database and run a one-time cleanup if necessary. The things we should check for:

  • tags in the denylist
  • tags with an accuracy lower than 90 (machine-generated tags that are probably incorrect)
  • duplicate tags
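
The three checks above could be sketched roughly as follows for a single record. This is only an illustration under assumptions: it treats a record's tags as a list of dicts with a `name` key and an optional `accuracy` key, uses a made-up `DENYLIST`, and takes the threshold of 90 from the list above as a parameter (if accuracy is stored as a fraction, something like `0.9` would be passed instead). The function name `audit_tags` is hypothetical and does not exist in the codebase.

```python
from collections import Counter

# Hypothetical denylist for illustration only; the real one lives in the
# existing cleanup code and is not reproduced here.
DENYLIST = {"example-banned-tag", "another-banned-tag"}


def audit_tags(tags, denylist=DENYLIST, min_accuracy=90):
    """Count denylisted, low-accuracy, and duplicate tags in one tag list.

    Assumes each tag is a dict with a "name" and an optional "accuracy".
    """
    report = Counter()
    seen = set()
    for tag in tags:
        # Normalize the name so "Cat" and "cat " count as the same tag.
        name = tag.get("name", "").strip().lower()
        accuracy = tag.get("accuracy")

        if name in denylist:
            report["denylisted"] += 1
        # Tags without an accuracy are assumed not to be machine-generated.
        if accuracy is not None and accuracy < min_accuracy:
            report["low_accuracy"] += 1
        if name in seen:
            report["duplicate"] += 1
        seen.add(name)
    return report


if __name__ == "__main__":
    sample = [
        {"name": "Cat"},
        {"name": "cat"},
        {"name": "example-banned-tag"},
        {"name": "dog", "accuracy": 50},
        {"name": "tree", "accuracy": 95},
    ]
    print(dict(audit_tags(sample)))
```

Summing these per-record counters over the whole table would show whether a one-time cleanup is worth running for each category.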

We should also review the tags cleanup process in the API and add anything from it that's missing in the catalog.

@obulat obulat added 🟨 priority: medium Not blocking but should be addressed soon ✨ goal: improvement Improvement to an existing user-facing feature 💻 aspect: code Concerns the software code in the repository data normalization labels May 20, 2022
@krysal
Member

krysal commented Aug 22, 2022

Related to #1566.

@obulat obulat added the 🧱 stack: catalog Related to the catalog and Airflow DAGs label Feb 24, 2023
@obulat obulat transferred this issue from WordPress/openverse-catalog Apr 17, 2023
@dhruvkb dhruvkb added this to the Data normalization milestone Dec 2, 2023
@krysal
Member

krysal commented Feb 28, 2024

> We should also review the tags cleanup process in the API and add anything from it that's missing in the catalog.

@obulat I don't see any tags cleanup/management in the API besides what is done at ingestion. Have you identified any other treatments that should be considered for one-time cleaning? If not, I think we can close this issue and open a new one specifically for the cleanup.

@obulat
Contributor Author

obulat commented Mar 11, 2024

I think we can close this issue as the problem has been described better in other issues.

@obulat obulat closed this as completed Mar 11, 2024
Projects
Archived in project
Openverse
  
Backlog
Development

No branches or pull requests

3 participants