[Unconfirmed] Data corruption after recent upgrade #39

Closed
mappu opened this issue Sep 3, 2018 · 4 comments
Labels
blocker Something that needs to be fixed before next release

Comments

@mappu
mappu commented Sep 3, 2018

Hi,

Our application stores and loads compressed data on disk using this library. We recently upgraded our application from 0727e17 (tag v1.3.0) to aebefd9 (tag v1.3.4).

After this library upgrade, there were some complaints from users. After generating some JSON data, compressing it, storing it, then later loading it and decompressing it, the data could no longer be parsed (e.g. json.Unmarshal: invalid character '\x00' in string literal).

The issue occurred on (at least) both Windows and macOS, with both Go 1.9.7 and 1.10.3.

I assume it was caused by memory corruption in CGO data buffers.

Reverting this library back to tag v1.3.0 seems to have completely resolved the issue going forward.

We're not yet able to reproduce the issue, and not all our users were affected (perhaps it's related to OS memory pressure?), but, just a heads-up that this library upgrade was implicated. Once we have an internal reproducer we may be able to bisect it (or hopefully, blame something else and disregard this entire issue).
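
For anyone trying to build a reproducer, the round trip described above (generate JSON, compress, decompress, parse) could be stress-tested along the lines of the sketch below. This is only an illustration under stated assumptions: the `codec` type, `gzipCodec`, and `roundTrip` are hypothetical names, and the standard library's `compress/gzip` stands in for the library under test purely so the sketch runs on its own. To exercise the real library, its compress and decompress functions would be plugged into `codec` instead.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"encoding/json"
	"fmt"
	"strings"
	"sync"
)

// codec is a hypothetical pair of functions standing in for the
// compression library under test.
type codec struct {
	compress   func([]byte) ([]byte, error)
	decompress func([]byte) ([]byte, error)
}

// gzipCodec is a stand-in built on the standard library so that this
// sketch is runnable as-is. It is NOT the library discussed in this issue.
func gzipCodec() codec {
	return codec{
		compress: func(src []byte) ([]byte, error) {
			var buf bytes.Buffer
			w := gzip.NewWriter(&buf)
			if _, err := w.Write(src); err != nil {
				return nil, err
			}
			if err := w.Close(); err != nil {
				return nil, err
			}
			return buf.Bytes(), nil
		},
		decompress: func(src []byte) ([]byte, error) {
			r, err := gzip.NewReader(bytes.NewReader(src))
			if err != nil {
				return nil, err
			}
			defer r.Close()
			var out bytes.Buffer
			if _, err := out.ReadFrom(r); err != nil {
				return nil, err
			}
			return out.Bytes(), nil
		},
	}
}

// roundTrip runs the compress/decompress cycle from several goroutines
// and returns the first mismatch or JSON parse failure it sees, or nil.
func roundTrip(c codec, workers, iterations int) error {
	var wg sync.WaitGroup
	errs := make(chan error, workers) // each worker sends at most one error
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			for i := 0; i < iterations; i++ {
				// Vary the payload size a little between iterations.
				doc := map[string]interface{}{
					"worker": id,
					"iter":   i,
					"data":   strings.Repeat("x", 1024*(i%16+1)),
				}
				payload, err := json.Marshal(doc)
				if err != nil {
					errs <- err
					return
				}
				packed, err := c.compress(payload)
				if err != nil {
					errs <- err
					return
				}
				unpacked, err := c.decompress(packed)
				if err != nil {
					errs <- err
					return
				}
				if !bytes.Equal(payload, unpacked) {
					errs <- fmt.Errorf("worker %d iter %d: round trip mismatch", id, i)
					return
				}
				var parsed map[string]interface{}
				if err := json.Unmarshal(unpacked, &parsed); err != nil {
					errs <- fmt.Errorf("worker %d iter %d: %v", id, i, err)
					return
				}
			}
		}(w)
	}
	wg.Wait()
	close(errs)
	return <-errs // nil when no worker reported an error
}

func main() {
	if err := roundTrip(gzipCodec(), 4, 200); err != nil {
		fmt.Println("corruption detected:", err)
		return
	}
	fmt.Println("no corruption detected")
}
```

A clean run does not prove anything about the library, since the corruption reported here is sporadic. The worker count, iteration count, and payload shape are arbitrary choices and would likely need tuning against real data.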

@x4m
Contributor

x4m commented Sep 5, 2018

In WAL-G we have observed some related data corruption, and we are currently hunting it too.
We have an unstable repro, but it requires a lot of setup: PostgreSQL, PITR, S3, etc.
I'll try testing 1.3.0, but the failure is very sporadic; it kind of depends on the phase of the moon and the weather on Mars.

@Viq111
Collaborator

Viq111 commented Sep 5, 2018

Thanks for reporting!
I will try to add a fuzzer to see if we can uncover a bug.
In the meantime, if you have any hints about the size of the payload, the parallelism, or the type of data (or even better, a reproducible payload), that would be great.

@mappu
Author

mappu commented Sep 5, 2018

facebook/zstd#1300 may be related (don't know).

@x4m
Contributor

x4m commented Sep 6, 2018

In my case I cannot reproduce the problem with fewer than 2 MAXGOROUTINES, and the compressed size does not exceed a few Gb. I still suspect that something might be broken in WAL-G, but lz4 and lzma do not fail on similar tests (this is not proof, just a hint...).
v1.3.0 does not work for us either.
Maybe @Tinsane can add some more details.
