Increase Wikimedia request timeout #4003
Labels
💻 aspect: code
Concerns the software code in the repository
🛠 goal: fix
Bug fix
🟨 priority: medium
Not blocking but should be addressed soon
🧱 stack: catalog
Related to the catalog and Airflow DAGs
🔧 tech: airflow
Involves Apache Airflow
Airflow log link
https://airflow.openverse.engineering/log?execution_date=2024-03-24T00%3A00%3A00%2B00%3A00&task_id=ingest_data_day_shift_456.pull_mixed_data_day_shift_456&dag_id=wikimedia_reingestion_workflow&map_index=-1
(one of many examples)
Description
The Wikimedia API is often slow to reply. In many cases the query for a given parameter takes longer than 60 seconds (our default request timeout) to complete, and the workflow fails as a result. This matters most while the reingestion workflows are running, because numerous days can fail within a single reingestion run. A recent run had over a dozen errors, all of them timeouts of this kind.
For Wikimedia only, we should consider increasing the request timeout to at least 120s to see if this reduces the number of timeout failures. This can be done by passing the `timeout` argument from `requests` into the `get_response_json` call of the `ProviderDataIngester`. We already override the timeout here, so it just needs to be bumped up to 120s:
openverse/catalog/dags/providers/provider_api_scripts/wikimedia_commons.py
Line 241 in 6636dcf
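The proposed change could look roughly like the sketch below. It is not the actual Openverse code: `SketchIngester`, `SketchWikimediaIngester`, `fetch_with_requests`, the `fetch` hook, and the endpoint URL are all assumptions made for illustration. Only the `timeout` keyword of `requests.get` and the 60s/120s values come from the description above.

```python
DEFAULT_TIMEOUT = 60      # seconds; the default request timeout cited above
WIKIMEDIA_TIMEOUT = 120   # seconds; the value proposed for Wikimedia only


def fetch_with_requests(url, params, timeout):
    """Fetch JSON with `requests`, passing `timeout` through unchanged."""
    import requests  # imported lazily so the classes below work without it

    response = requests.get(url, params=params, timeout=timeout)
    response.raise_for_status()
    return response.json()


class SketchIngester:
    """Hypothetical stand-in for ProviderDataIngester (not the real class)."""

    endpoint = None

    def __init__(self, fetch=fetch_with_requests):
        # `fetch` is an illustrative hook so the request can be swapped out.
        self.fetch = fetch

    def get_response_json(self, query_params, timeout=DEFAULT_TIMEOUT):
        return self.fetch(self.endpoint, query_params, timeout)


class SketchWikimediaIngester(SketchIngester):
    # Assumed endpoint, used here only for illustration.
    endpoint = "https://commons.wikimedia.org/w/api.php"

    def get_response_json(self, query_params, timeout=WIKIMEDIA_TIMEOUT):
        # Only the provider-specific default changes; callers may still override it.
        return super().get_response_json(query_params, timeout=timeout)
```

Note that in `requests` a single numeric `timeout` bounds the connection attempt and each wait for data from the server, not the total duration of the response, so 120s is a per-wait limit rather than a cap on the whole request.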
Reproduction
DAG status
Unchanged
Related issues
#1269