Data corruption when using NBD mode #184

@Nikratio

As far as I can tell, there is something that corrupts data when running s3backer in NBD mode.

I am running ZFS on top of the NBD devices, and I'm getting frequent errors that I think can only be explained by data corruption. For example, when running examples/create_zpool.py and then trying to import the fresh zpool, I have now several times gotten errors that vdevs had the wrong GUID, or just general "I/O errors".

When replacing NBD's s3backer backend with the file backend, the problems all seem to go away.

The errors are exactly the sort I would expect from eventual consistency (I think s3backer's protection mechanisms do not help when s3backer is restarted), but Amazon is pretty clear about offering strong consistency for everything: https://aws.amazon.com/s3/consistency/

Therefore, the only explanation I have is that something is not working right in the NBD-s3backer read or write path.

I was thinking about creating a simple unit test that writes data through the NBD interface (but not using the NBD server itself), reads it back, and confirms that the contents are correct. However, I strongly suspect that the problem is not that straightforward and it is something about the sequence of operations that ZFS performs.
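
As a rough illustration of the write/read-back idea, here is a minimal black-box sketch in Python. It is not the in-process test described above: it works against any path that behaves like a block device or a preallocated file, and it does not call into s3backer code. The function names `write_pattern` and `find_mismatches` are hypothetical, and `random.Random.randbytes` assumes Python 3.9 or newer. Blocks are written in shuffled order so that the check does not depend on a sequential access pattern.

```python
import os
import random


def _expected_blocks(block_size, num_blocks, seed):
    # Regenerate the same pseudo-random blocks from the seed, so the
    # checker does not need to keep the written data around.
    rng = random.Random(seed)
    blocks = [rng.randbytes(block_size) for _ in range(num_blocks)]
    return rng, blocks


def write_pattern(path, block_size=4096, num_blocks=64, seed=0):
    """Write seeded pseudo-random blocks to `path` in shuffled order."""
    rng, blocks = _expected_blocks(block_size, num_blocks, seed)
    order = list(range(num_blocks))
    rng.shuffle(order)
    fd = os.open(path, os.O_RDWR)
    try:
        for i in order:
            written = os.pwrite(fd, blocks[i], i * block_size)
            if written != block_size:
                raise OSError(f"short write on block {i}")
        os.fsync(fd)  # push the data out of the page cache
    finally:
        os.close(fd)


def find_mismatches(path, block_size=4096, num_blocks=64, seed=0):
    """Return the indices of blocks whose contents differ from the pattern."""
    _, blocks = _expected_blocks(block_size, num_blocks, seed)
    fd = os.open(path, os.O_RDONLY)
    try:
        return [
            i for i in range(num_blocks)
            if os.pread(fd, block_size, i * block_size) != blocks[i]
        ]
    finally:
        os.close(fd)
```

Pointed at a device such as /dev/nbd0, one would call `write_pattern` and then `find_mismatches` with the same arguments and expect an empty list. One caveat: a read-back right after the write may be served from the kernel page cache rather than from the backend, so a meaningful run probably needs the caches dropped or the device reconnected in between.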

Is s3backer executing NBD requests synchronously, or are they deferred to background threads? Could there be an ordering issue between TRIM and WRITE requests, or something like that?
