[Bug]: Python GCSFileSystem.delete does not recursively delete #27605

Open
1 of 15 tasks
timblakely opened this issue Jul 21, 2023 · 12 comments

Comments

@timblakely

timblakely commented Jul 21, 2023

What happened?

In the Python SDK, GCSFileSystem.delete suggests that directories will be deleted recursively, but that doesn't appear to be the case.

For example, I have a bucket blakely_dev containing the following paths:

gs://blakely_dev/_staging/iteration/1/result
gs://blakely_dev/_staging/iteration/1/output-00000-of-00002
gs://blakely_dev/_staging/iteration/1/output-00001-of-00002

If I pass gs://blakely_dev/_staging/ to .delete(), a wildcard is appended because the path ends with a /. Even so, the subsequent .match() call within .delete() matches neither the subdirectories nor the result or output-0000.* files.
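
A minimal sketch of what seems to be going on, assuming the appended `*` is translated so that it does not cross a `/` (while `**` does). The `translate_pattern` and `match` functions below are simplified stand-ins written for illustration, not the Beam implementation, and they run purely in memory against the three paths above:

```python
import re

def translate_pattern(pattern: str) -> str:
    """Simplified glob-to-regex translation (illustrative assumption):
    '*' matches within one path segment, '**' matches across '/'."""
    out = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            out.append(".*")
            i += 2
        elif pattern[i] == "*":
            out.append("[^/]*")
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return "".join(out) + r"\Z"

OBJECTS = [
    "gs://blakely_dev/_staging/iteration/1/result",
    "gs://blakely_dev/_staging/iteration/1/output-00000-of-00002",
    "gs://blakely_dev/_staging/iteration/1/output-00001-of-00002",
]

def match(pattern, names=OBJECTS):
    """Return the names that fully match the translated pattern."""
    regex = re.compile(translate_pattern(pattern))
    return [name for name in names if regex.match(name)]

# What delete() effectively asks for when given "gs://blakely_dev/_staging/".
single_star = match("gs://blakely_dev/_staging/*")
# Shown for contrast only: a pattern that is allowed to cross '/'.
double_star = match("gs://blakely_dev/_staging/**")
# The leaf "directory" case, which works because nothing is nested below it.
leaf = match("gs://blakely_dev/_staging/iteration/1/*")
```

Under that assumption the single-star pattern matches nothing, because every object has more `/` characters after `_staging/`, while the leaf pattern matches all three objects.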

Issue Priority

Priority: 3 (minor)

Issue Components

  • Component: Python SDK
  • Component: Java SDK
  • Component: Go SDK
  • Component: Typescript SDK
  • Component: IO connector
  • Component: Beam examples
  • Component: Beam playground
  • Component: Beam katas
  • Component: Website
  • Component: Spark Runner
  • Component: Flink Runner
  • Component: Samza Runner
  • Component: Twister2 Runner
  • Component: Hazelcast Jet Runner
  • Component: Google Cloud Dataflow Runner
@tvalentyn
Contributor

Thanks for reporting! What happens if you delete gs://blakely_dev/_staging/iteration/1/ ?
Note that in GCS there is no concept of directories. There are buckets and objects; / is just a character in the object name.

@tvalentyn
Contributor

https://stackoverflow.com/questions/52789714/google-cloud-storage-how-to-delete-a-folder-recursively-in-python has some examples of how to fetch objects starting with a particular prefix. This might be easier once #25676 is fixed.
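
For reference, a minimal sketch of the prefix-based approach, assuming a bucket object from the google-cloud-storage client (`list_blobs(prefix=...)` and `blob.delete()` are that client's methods). The helper name `delete_prefix` is hypothetical and this is a workaround outside Beam, not a fix for GCSFileSystem.delete:

```python
def delete_prefix(bucket, prefix):
    """Delete every object whose name starts with `prefix`.

    `bucket` is expected to behave like google.cloud.storage.Bucket.
    Returns the names of the deleted objects.
    """
    # No delimiter is passed, so the listing is flat: objects in "nested
    # directories" under the prefix are included as well.
    blobs = list(bucket.list_blobs(prefix=prefix))
    deleted = []
    for blob in blobs:
        blob.delete()
        deleted.append(blob.name)
    return deleted

# Usage against a real bucket (needs google-cloud-storage and credentials):
#   from google.cloud import storage
#   bucket = storage.Client().bucket("blakely_dev")
#   delete_prefix(bucket, "_staging/")
```

Note that the prefix is the object-name prefix inside the bucket, without the gs://bucket/ part.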

@tvalentyn
Contributor

cc: @BjornPrime

@BjornPrime
Contributor

.take-issue

@timblakely
Author

> Thanks for reporting! What happens if you delete gs://blakely_dev/_staging/iteration/1/ ? Note that in GCS there is no concept of directories. there are buckets and objects. / is just a symbol in the object name.

Yup, I'm aware :) That does remove all the objects, but it doesn't work "recursively".

FYI, match() seems to behave slightly differently from the GCS Python client's bucket.list_blobs(). That method takes a prefix and a delimiter; if the prefix ends with the delimiter, it returns both the delimiter-separated "directories" and the files with that prefix. If no delimiter is passed, it matches all files with the prefix, which seems to be what match() intends to do (at least going by the docstring :).
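
To make the prefix/delimiter distinction concrete, here is a rough in-memory imitation of those listing semantics as I understand them. The `list_objects` function is a hypothetical stand-in, not the GCS client; it returns the matched object names plus the rolled-up "directory" prefixes:

```python
def list_objects(names, prefix="", delimiter=None):
    """Imitate prefix/delimiter listing over a plain list of object names.

    Returns (items, prefixes): items are names listed directly, prefixes are
    the rolled-up "subdirectories" when a delimiter is given.
    """
    items, prefixes = [], set()
    for name in names:
        if not name.startswith(prefix):
            continue
        rest = name[len(prefix):]
        if delimiter and delimiter in rest:
            # Roll everything below the next delimiter up into one prefix.
            prefixes.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
        else:
            items.append(name)
    return items, sorted(prefixes)

NAMES = [
    "_staging/iteration/1/result",
    "_staging/iteration/1/output-00000-of-00002",
    "_staging/iteration/1/output-00001-of-00002",
]

# No delimiter: a flat listing of everything under the prefix.
flat_items, flat_prefixes = list_objects(NAMES, prefix="_staging/")
# With a delimiter: only the next "directory" level is reported.
dir_items, dir_prefixes = list_objects(NAMES, prefix="_staging/", delimiter="/")
```

The flat listing returns all three objects, which is the behavior a recursive delete would need; the delimited listing returns no objects and only the `_staging/iteration/` prefix.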

@tsafacjo

@AnandInguva Is there any update on this issue?

@liferoad
Collaborator

cc @shunping

@tsafacjo

tsafacjo commented Aug 30, 2024

Can I pick this up?

@liferoad
Collaborator

@tsafacjo

tsafacjo commented Sep 6, 2024

Thanks!

@tsafacjo

tsafacjo commented Sep 6, 2024

@liferoad
Collaborator

liferoad commented Sep 7, 2024

@AnandInguva What is the problem with your PR?


5 participants