
Data corruption when not closing on Windows #476

Closed
zippoxer opened this issue May 6, 2018 · 15 comments
Labels
status/needs-attention This issue needs more eyes on it, more investigation might be required before accepting/rejecting it

Comments

@zippoxer

zippoxer commented May 6, 2018

#470 describes and fixes a lock file issue on Windows. The author also describes a data corruption problem on Windows. I can reproduce this problem as well. Quite easily.

On my 64-bit Windows 10 machine, to simulate a crash (opening Badger and not closing it), I ran the following program three times:

package main

import (
	"fmt"

	"github.com/dgraph-io/badger"
)

func main() {
	opts := badger.DefaultOptions
	opts.Dir = "crashtest"
	opts.ValueDir = "crashtest"
	_, err := badger.Open(opts)
	if err != nil {
		fmt.Println(err)
	}
}

The first time I ran it, it terminated gracefully. On the second run, Badger said there was a lock file, so I removed it and ran the program again. On the third run, Badger told me the store was corrupted:

Unable to replay value log: "crashtest\\000000.vlog": Data corruption detected.
Value log truncate required to run DB. This would result in data loss.

This shouldn't happen. I'm using the default options, and Badger's writes should be crash-safe.

@manishrjain
Contributor

That error triggers if there is a need to truncate the file. I can't reproduce this on Linux. Given no writes, this should produce a value log of zero length; can you confirm?

@zippoxer
Author

zippoxer commented May 6, 2018

The value log is actually 2,147,483,648 bytes long. Even if I write before crashing, the value log stays the same size.

@manishrjain
Contributor

I think on Windows we have to create a 2 GB file upfront to make things work. That's why it is 2 GB.

Unfortunately, I don't have a way to test Badger on Windows. Could use help here from the community.
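(For anyone wanting to verify this locally, here is a minimal stdlib-only sketch that checks the size of the leftover value log; the `crashtest/000000.vlog` path comes from the error message above, and the 2 GB figure reported is exactly 1 << 31 bytes:)

```go
package main

import (
	"fmt"
	"os"
)

func main() {
	// Size Badger is reported to preallocate for the value log on Windows.
	const twoGB = int64(1) << 31 // 2,147,483,648 bytes

	// Path taken from the error message in this issue.
	fi, err := os.Stat("crashtest/000000.vlog")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("%s: %d bytes (preallocated to 2 GB? %v)\n",
		fi.Name(), fi.Size(), fi.Size() == twoGB)
}
```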

@manishrjain manishrjain added the status/needs-attention This issue needs more eyes on it, more investigation might be required before accepting/rejecting it label May 9, 2018
@djdv
Contributor

djdv commented May 9, 2018

@manishrjain
I'm free to run tests on Windows machines if you need them. Ping me anytime.

@manishrjain
Contributor

Hey @djdv , could you see if you can replicate this bug, and suggest/test solutions?

@djdv
Contributor

djdv commented May 9, 2018

The case above is the same for me; on the third run I get

Unable to replay value log: "crashtest\000000.vlog": Data corruption detected. Value log truncate required to run DB. This would result in data loss.

with a 2,147,483,648 byte vlog.

Unfortunately, I may not be much help on actually debugging the issue at the moment, but can certainly run tests.

@djdv
Contributor

djdv commented May 9, 2018

It seems as though something has to be done to handle dynamically sized memory-mapped files on Windows, probably by handling shrinking, growing, and migration manually.
I would assume that when the vlog changes size, a new mapping would have to be created, the old data moved over, references to it updated, and the old backing file removed.

Handling this without impacting performance seems like it could be complex.
Maybe there is a better solution, but that is what came to mind right now. Take into account that I'm not familiar with Badger's internals yet.

@manishrjain
Contributor

manishrjain commented May 10, 2018

We're already doing all that for Windows.

func Mmap(fd *os.File, write bool, size int64) ([]byte, error) {

I remember that on Windows, we need to expand the file to its maximum size upfront.

if fi.Size() < size {

So, I think what's happening here is that this file, which has been expanded beyond its written data, gets left behind when Badger crashes on Windows. And when replaying the value log, Badger determines that it needs to truncate the file to bring it back to its valid written data. Now, that truncation was changed recently to not happen automatically, because of this issue: #434 (comment)

This is what is confusing users. On Linux, a truncation error means there might be data loss. But on Windows, that's just how it works: you need to truncate because we have to overallocate upfront due to the nature of how file mmap works.

So, it's not really corruption. On Windows, you must pass the Truncate option. In fact, this could be tested by writing a few key-value pairs, crashing the instance, and seeing if any of those ever get lost. I bet they wouldn't.

I'm inclined to close this issue unless someone can prove an actual data loss.

Summary: Set the Truncate option to true on Windows; it is needed to make Badger work there.

@manishrjain
Contributor

Closing the issue. Feel free to reopen if there's an actual data loss.

@schomatis
Contributor

@manishrjain How, then, can the user tell whether it's truncating because of how Windows works or because there was actual data loss that left the DB inconsistent?

@manishrjain
Contributor

Hard to tell, honestly. The chance of actually losing confirmed data with sync writes on is almost nil. It can only happen if the hard drive has issues and is flipping bits.

@robinchesterman

I have a similar issue running dgraph on a Windows VM. Anytime the VM is restarted (i.e. dgraph is forced to stop unexpectedly), I am unable to restart dgraph server. Lock files exist (which can be deleted), but then I get:

Error while creating badger KV WAL store error: Value log truncate required to run DB. This might result in data loss.

I can't speak to any data loss; what I can't do is get dgraph to restart at all.

@manishrjain
Contributor

Hmm... we'll have to set the truncate option in Dgraph. I'll raise a PR.

@manishrjain
Contributor

I've submitted a change; it will be part of the v1.0.7 release:
dgraph-io/dgraph@6648f14

@AJAviJain

Where do I set the truncate option? I am getting the error "Unable to find log file. Please retry.", so I think it might be the same issue. Server and zero start, but some queries in the Ratel browser give this error. (Using Windows.)

luca-moser added a commit to iotaledger/iota.go that referenced this issue Apr 25, 2019
According to dgraph-io/badger#476 (comment), Truncate needs to be set to true under Windows to make BadgerDB work without spitting out corruption errors.

Signed-off-by: Luca Moser <moser.luca@gmail.com>
Dictor added a commit to Dictor/Every-Logger that referenced this issue Feb 14, 2020

6 participants