You can use get_file_list to retrieve a list of available files from a storage path, using an Airflow connection. Because the list reflects the files currently present in your storage system, you can use it to generate tasks dynamically.
The supported filesystems are those described in file_location.
Warning
Fetching a large number of files with this method can overload XCom. It can also create a large number of parallel tasks when the result is passed to the expand method of dynamic task mapping.
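One possible way to limit the fan-out described in the warning is to group the listed files into batches, so that each mapped task handles several files rather than one. The following is a minimal plain-Python sketch of that idea and is not part of the Astro SDK: the helper name batch_files, the batch size, and the bucket path are all hypothetical. In a DAG, the grouping would need to run inside a task, because the file list is only known at run time.

```python
from typing import List, Sequence


def batch_files(paths: Sequence[str], batch_size: int) -> List[List[str]]:
    """Split a file list into consecutive batches of at most batch_size items.

    Hypothetical helper for illustration; it is not an Astro SDK function.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [list(paths[i:i + batch_size]) for i in range(0, len(paths), batch_size)]


# Hypothetical listing standing in for the result of a file-list lookup.
listed = [f"gs://example-bucket/data/file_{i}.csv" for i in range(10)]

# Ten files in batches of four give three groups, so three mapped tasks
# instead of ten.
batches = batch_files(listed, batch_size=4)
print(len(batches))  # 3
```

Batching reduces the number of mapped tasks only. It does not reduce the size of the original file list stored in XCom, so a narrower storage path is still the more direct way to keep the list small.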
The following example retrieves a file list from a GCS bucket and uses expand to dynamically generate tasks that upload each listed file to a BigQuery table.
../../../../example_dags/example_dynamic_task_template.py
- Dynamic task mapping - Apache Airflow <authoring-and-scheduling/dynamic-task-mapping>
- Dynamic tasks - Astronomer