Implement vacuum command #97

Closed
xianwill opened this issue Mar 1, 2021 · 5 comments
Labels
binding/rust (Issues for the Rust crate), enhancement (New feature or request), good first issue (Good for newcomers)

Comments

@xianwill
Collaborator

xianwill commented Mar 1, 2021

delta-rs should have a "vacuum table" utility analogous to the one provided by the open source Spark Delta Lake implementation. This utility is useful for cleaning up old files that are no longer referenced by the delta log (e.g. files rewritten by merge statements, the optimize command, etc.).

See the VacuumCommand in the open source implementation for reference.

@houqp added the binding/rust, enhancement, and good first issue labels on Mar 1, 2021
@rtyler
Member

rtyler commented May 17, 2021

@fvaleye I think this is actually done, right? I'm not clear on what work we have left to do.

@fvaleye
Collaborator

fvaleye commented May 17, 2021

@fvaleye I think this is actually done, right? I'm not clear on what work we have left to do.

Yes, it is already implemented! Hmm, we still need to improve the test suite: #227

@MironAtHome

How about fixing the documentation?
It would probably be worth updating it from this:
image
to this?
image

@mrk-its
Contributor

mrk-its commented Jul 3, 2022

@rtyler @fvaleye It looks like there are still two serious issues with the vacuum implementation:

  • vacuum lists all files in the dataset using StorageBackend.list_objs. The problem is that this function returns all files (including those in subdirectories) on the s3 and gcs backends (although I'm not sure about gcs), while on the file and azure backends it lists only first-level files (without recursing into subdirectories).
  • vacuum completely ignores files that are not referenced by the delta log (that is, files included in neither the DeltaTableState.files() nor the DeltaTableState.all_tombstones() list).
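
To make the two points above concrete, here is a minimal sketch of how they could be addressed on a local file system. It is not the delta-rs implementation: `list_files_recursive` and `files_to_vacuum` are hypothetical helper names, the set and map arguments merely stand in for what DeltaTableState.files() and DeltaTableState.all_tombstones() would provide, and skipping the `_delta_log` directory and treating every untracked file as removable are simplifying assumptions.

```rust
use std::collections::{HashMap, HashSet};
use std::ffi::OsStr;
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical helper: recursively list every file under `root` as a
/// `/`-separated path relative to `root`, skipping `_delta_log` directories.
fn list_files_recursive(root: &Path) -> io::Result<Vec<String>> {
    fn walk(dir: &Path, root: &Path, out: &mut Vec<String>) -> io::Result<()> {
        for entry in fs::read_dir(dir)? {
            let path = entry?.path();
            if path.is_dir() {
                // Assumption: the transaction log itself is never vacuumed.
                if path.file_name() == Some(OsStr::new("_delta_log")) {
                    continue;
                }
                walk(&path, root, out)?;
            } else if let Ok(rel) = path.strip_prefix(root) {
                let parts: Vec<String> = rel
                    .components()
                    .map(|c| c.as_os_str().to_string_lossy().into_owned())
                    .collect();
                out.push(parts.join("/"));
            }
        }
        Ok(())
    }

    let mut out = Vec::new();
    walk(root, root, &mut out)?;
    out.sort(); // deterministic order, independent of directory iteration
    Ok(out)
}

/// Hypothetical helper: pick the files that a vacuum run would delete.
///
/// * `active`: paths still referenced by the current table state.
/// * `tombstones`: removed path -> deletion timestamp in milliseconds.
///
/// A file is selected when it is not active and is either an expired
/// tombstone or not tracked by the log at all.
fn files_to_vacuum(
    all_files: &[String],
    active: &HashSet<String>,
    tombstones: &HashMap<String, i64>,
    now_ms: i64,
    retention_ms: i64,
) -> Vec<String> {
    all_files
        .iter()
        .filter(|f| !active.contains(*f))
        .filter(|f| match tombstones.get(*f) {
            // Tombstoned file: delete only once the retention period has passed.
            Some(deleted_at) => now_ms - deleted_at > retention_ms,
            // Untracked file: selected unconditionally in this sketch. A real
            // implementation would probably also check its modification time
            // so that files from in-flight writes are not removed.
            None => true,
        })
        .cloned()
        .collect()
}
```

Feeding the output of `list_files_recursive` into `files_to_vacuum` gives the same candidate set whether or not the data files are nested in partition directories, which is the behavior the backends disagree on.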

@wjones127
Collaborator

Resolved by #669.
