Description
Apache Airflow version
Other Airflow 2 version (please specify below)
What happened
This affects Airflow 2.7.2. It appears that version 10.9.0 of apache-airflow-providers-google fails to list objects in GCS.
Example to recreate:
pipenv --python 3.8
pipenv shell
pip install apache-airflow==2.7.2 apache-airflow-providers-google==10.9.0
export AIRFLOW_CONN_GOOGLE_CLOUD_DEFAULT='google-cloud-platform://'
Then create the following Python test file:
from airflow.providers.google.cloud.hooks.gcs import GCSHook

result = GCSHook().list(
    bucket_name='a-test-bucket',
    prefix="a/test/prefix",
    delimiter='.csv'
)
result = list(result)
print(result)
The output of this is:
[]
In a different pipenv environment, this works when using Airflow 2.7.1 and the 10.7.0 version of the provider:
pipenv --python 3.8
pipenv shell
pip install apache-airflow==2.7.1 apache-airflow-providers-google==10.7.0
export AIRFLOW_CONN_GOOGLE_CLOUD_DEFAULT='google-cloud-platform://'
Use the same Python test file as above. The output of this is a list of files, as expected.
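For context on what "a list of files" means here, below is a rough, network-free sketch of how a delimiter-based listing is expected to behave, assuming the documented GCS semantics: names that contain the delimiter after the prefix are rolled up into "prefixes" (truncated just after the delimiter) instead of being returned as items. The function `simulate_listing` and the object names are hypothetical and only illustrate the behaviour; they are not part of the provider or the client library.

```python
from typing import Iterable, List, Set, Tuple


def simulate_listing(
    names: Iterable[str], prefix: str, delimiter: str
) -> Tuple[List[str], Set[str]]:
    """Hypothetical stand-in for a delimiter-based object listing (no network calls)."""
    items: List[str] = []
    prefixes: Set[str] = set()
    for name in names:
        if not name.startswith(prefix):
            continue  # outside the requested prefix
        rest = name[len(prefix):]
        cut = rest.find(delimiter) if delimiter else -1
        if cut != -1:
            # Rolled up: keep the name up to and including the delimiter.
            prefixes.add(prefix + rest[: cut + len(delimiter)])
        else:
            items.append(name)
    return items, prefixes


# Made-up object names, purely for illustration.
items, prefixes = simulate_listing(
    [
        "a/test/prefix/one.csv",
        "a/test/prefix/two.csv",
        "a/test/prefix/readme.txt",
        "other/x.csv",
    ],
    prefix="a/test/prefix",
    delimiter=".csv",
)
print(items)             # ['a/test/prefix/readme.txt']
print(sorted(prefixes))  # ['a/test/prefix/one.csv', 'a/test/prefix/two.csv']
```

With `delimiter='.csv'`, the `.csv` names end up in the prefixes rather than in the items, so a caller that never sees the prefixes would get few or no results.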
This appears to be the commit that may have broken things.
The hooks/gcs.py file can be patched in the following way, which appears to force the lazy loading to kick in:
print("Forcing loading....")
all_blobs = list(blobs)
for blob in all_blobs:
    print(blob.name)
if blobs.prefixes:
    ids.extend(blobs.prefixes)
else:
    ids.extend(blob.name for blob in all_blobs)
page_token = blobs.next_page_token
if page_token is None:
    # empty next page token
    break
Example patch file:
+++ gcs.py 2023-10-12 11:34:00.774206013 +0000
@@ -829,12 +829,19 @@
     versions=versions,
 )
+print("Forcing loading....")
+all_blobs = list(blobs)
+
+for blob in all_blobs:
+    print(blob.name)
+
 if blobs.prefixes:
     ids.extend(blobs.prefixes)
 else:
-    ids.extend(blob.name for blob in blobs)
+    ids.extend(blob.name for blob in all_blobs)
 page_token = blobs.next_page_token
+
 if page_token is None:
     # empty next page token
     break
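To illustrate why forcing the iteration seems to help, here is a minimal, self-contained sketch. It assumes (not verified against the client library source here) that the iterator returned by the listing call only fills in `prefixes` while its pages are being consumed. `FakeBlobIterator` and both helper functions are hypothetical stand-ins written for this illustration. They use plain strings instead of blob objects and sort the prefixes only to keep the output deterministic.

```python
from typing import Iterator, List, Set, Tuple


class FakeBlobIterator:
    """Hypothetical stand-in for a lazy listing iterator; not the real client class."""

    def __init__(self, pages: List[Tuple[List[str], Set[str]]]) -> None:
        # Each page is a (blob_names, prefixes) pair, as a paged API might return.
        self._pages = pages
        self.prefixes: Set[str] = set()

    def __iter__(self) -> Iterator[str]:
        for names, page_prefixes in self._pages:
            # Assumption: prefixes only become visible once a page is fetched.
            self.prefixes.update(page_prefixes)
            yield from names


def collect_ids_check_first(blobs: FakeBlobIterator) -> List[str]:
    """Checks prefixes before iterating (the ordering suspected of being broken)."""
    if blobs.prefixes:
        return sorted(blobs.prefixes)
    return [name for name in blobs]


def collect_ids_iterate_first(blobs: FakeBlobIterator) -> List[str]:
    """Consumes the iterator first, then checks prefixes (the ordering in the patch)."""
    all_names = list(blobs)
    if blobs.prefixes:
        return sorted(blobs.prefixes)
    return all_names


# One page with no plain items and two rolled-up ".csv" names (made-up data).
pages = [([], {"a/test/prefix/one.csv", "a/test/prefix/two.csv"})]
broken = collect_ids_check_first(FakeBlobIterator(pages))
fixed = collect_ids_iterate_first(FakeBlobIterator(pages))
print(broken)  # []
print(fixed)   # ['a/test/prefix/one.csv', 'a/test/prefix/two.csv']
```

Under that assumption, checking `prefixes` before the first page is fetched always sees an empty set, which would match the empty list reported above.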
What you think should happen instead
The provider should be able to list files in GCS.
How to reproduce
Please see above for the steps to reproduce.
Operating System
n/a
Versions of Apache Airflow Providers
Version 10.9.0 of the Google provider (apache-airflow-providers-google).
Deployment
Other 3rd-party Helm chart
Deployment details
No response
Anything else
No response
Are you willing to submit PR?
- Yes I am willing to submit a PR!
Code of Conduct
- I agree to follow this project's Code of Conduct