Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Should zstd delete incomplete archives? #4007

Open
Mitmischer opened this issue Mar 31, 2024 · 5 comments
Open

Should zstd delete incomplete archives? #4007

Mitmischer opened this issue Mar 31, 2024 · 5 comments
Assignees

Comments

@Mitmischer
Copy link

Is your feature request related to a problem? Please describe.
The last issue related to #4005 and maybe the most debatable.
When the disk fills up during compression, the resulting archive will be corrupt. It's kept by zstd and not deleted.
That gives a false sense of security and may evoke some annoying post-processing to find and remove corrupt archives.
To me, corrupt archives have no benefit as long as the source file still exists (which must be checked prior to archive deletion).

Describe the solution you'd like
zstd should remove corrupt archives it produces.
Make it an instant operation to check that the archive is corrupt (maybe by altering a critical checksum/flag in the first bytes of the archive). Currently, zstd will notice the corruption as a premature end (which is good), but is has to read the whole archive for that. That's unnecessary.

Describe alternatives you've considered
Keep the current behavior, that is however one probable cause for issues #4005 and #4006.

Additional context
None.

@terrelln
Copy link
Contributor

Agreed, this would be a desirable behavior

@Cyan4973
Copy link
Contributor

Cyan4973 commented Apr 1, 2024

We already have some code which is supposed to remove the destination file when an error happens :
https://github.com/facebook/zstd/blob/v1.5.6/programs/fileio.c#L1887

Why is it not active in the reported scenario is unclear. To be investigated.

@Cyan4973 Cyan4973 self-assigned this Apr 1, 2024
@Cyan4973
Copy link
Contributor

Cyan4973 commented Apr 1, 2024

Confirmed :
zstd correctly detects that the device is full, and issues a relevant error message:
error 70 : Write error : cannot write block : No space left on device
but then, the resulting incomplete *.zst file remains present on the device, so it's not removed automatically.
To be investigated.

@Cyan4973
Copy link
Contributor

Cyan4973 commented Apr 1, 2024

The error is triggered in the recent-ish fileio_asyncio.c :
https://github.com/facebook/zstd/blob/v1.5.6/programs/fileio_asyncio.c#L49

Essentially, the program terminates execution immediately.
As a consequence, it doesn't get a chance to reach the previously mentioned code
which would erase the destination file.

The expected cleaner behavior is that the code must fail, and return with an error to the main fileio.c unit, where destination cleaning can happen.
But due to the multithreaded nature of the code within asyncio.c, I don't know yet how doable that is.

Another alternative could be to trap program termination and implement some kind of cleaning there, but I'm even more concerned about increasing bug-producing surface and how ugly code could become.

@Cyan4973
Copy link
Contributor

Cyan4973 commented Apr 1, 2024

Indeed, it's not straightforward to propagate an error signal throughout the asyncio multi-threaded layer.
No "simple" solution I'm afraid.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants