Phylopic ingestion may fail if build changes during processing
#3820
Labels
💻 aspect: code
Concerns the software code in the repository
🛠 goal: fix
Bug fix
help wanted
Open to participation from the community
🟨 priority: medium
Not blocking but should be addressed soon
🧱 stack: catalog
Related to the catalog and Airflow DAGs
🔧 tech: airflow
Involves Apache Airflow
🐍 tech: python
Involves Python
Airflow log link
https://airflow.openverse.engineering/log?execution_date=2024-02-11T00%3A00%3A00%2B00%3A00&task_id=ingest_data.pull_image_data&dag_id=phylopic_workflow&map_index=-1
Description
When performing the Phylopic ingestion, the first step we take is to get the build_param: openverse/catalog/dags/providers/provider_api_scripts/phylopic.py, line 45 in 614720f.
I assume this value relates to some index or API version that is being referenced. Recently, we had ingestion fail due to this parameter:
When visiting the above URL, a 410 is indeed returned with the following body:
I suspect this means the build changed while we were processing, since earlier requests for the same build ran fine. In these cases, we probably just want to kick the Phylopic DAG off again from the start. I think it would be a mistake to pick up the new build number and continue from the same page, as the data may be entirely different.
We have a few options here:
Personally, I'm partial to the latter as it will mean less intervention from operators and the DAG will be able to complete as expected. @WordPress/openverse-catalog, any other thoughts/opinions?
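To make the "start again from the beginning" idea concrete, here is a minimal sketch of what an in-process restart could look like. It is not the actual Phylopic provider script: `BuildGoneError`, `ingest_with_restart`, `get_build`, and `get_page` are hypothetical names, and the sketch assumes a 410 response is surfaced as an exception by whatever code makes the request.

```python
class BuildGoneError(Exception):
    """Hypothetical error raised when the API answers 410 Gone for a build."""


def ingest_with_restart(get_build, get_page, max_restarts=3):
    """Collect every page for one build, restarting from page 0 if the build expires.

    get_build() -> int: returns the current build number (assumed helper).
    get_page(build, page) -> list | None: returns one page of records, None when
    there are no more pages, and raises BuildGoneError on a 410 response.
    """
    for _attempt in range(max_restarts + 1):
        build = get_build()  # always re-read the build at the start of a run
        records = []         # partial results from an old build are dropped
        page = 0
        try:
            while True:
                batch = get_page(build, page)
                if batch is None:
                    return build, records
                records.extend(batch)
                page += 1
        except BuildGoneError:
            # The build changed mid-run. Do not resume from the same page:
            # the paging of the new build may not line up with the old one.
            continue
    raise RuntimeError(f"Build changed more than {max_restarts} times; giving up")
```

The restart cap is there so that a persistently failing API cannot loop forever. In Airflow the same effect might instead come from failing the task and re-triggering the DAG, which is a different trade-off from handling it inside the ingestion loop.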
Reproduction
You should be able to replicate this by clicking one of the links above, or triggering the DAG locally with {"build": 307} as part of the additional_query_params.
DAG status
The DAG looks like it's chugging along successfully on a re-run so no need to change the status.