As far as I can tell, there is something that corrupts data when running s3backer in NBD mode.
I am running ZFS on top of the NBD devices, and I'm getting frequent errors that I think can only be explained by data corruption. For example, when running examples/create_zpool.py and then trying to import the fresh zpool, I have several times gotten errors saying that vdevs had the wrong GUID, or just generic "I/O errors".
When replacing NBD's s3backer backend with the file backend, the problems all seem to go away.
The errors are exactly the sort I would expect from eventual consistency (I think s3backer's protection mechanisms do not help when s3backer is restarted), but Amazon is pretty clear about offering strong consistency for everything: https://aws.amazon.com/s3/consistency/
Therefore, the only explanation I have is that something is not working right in the NBD-s3backer read or write path.
I was thinking about creating a simple unit test that writes data through the NBD interface (but not using the NBD server itself), reads it back, and confirms that the contents are correct. However, I strongly suspect that the problem is not that straightforward and it is something about the sequence of operations that ZFS performs.
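To make the idea concrete, here is a rough sketch of such a write/read-back check. It is a simplification and not what I described above: it goes through an ordinary file descriptor (a regular file, or presumably a device such as /dev/nbd0) rather than calling s3backer's NBD code directly. The function name `verify_roundtrip` and its parameters are made up for illustration. Note that on a real block device the read-back may be served from the page cache, so the caches would have to be dropped (or the device reconnected) between the write and read phases for the check to say anything about the backend.

```python
import os
import random


def verify_roundtrip(path, block_size=4096, num_blocks=64, seed=0):
    """Write seeded pseudo-random blocks in shuffled order, read them back,
    and return the indices of all blocks whose contents differ.

    Hypothetical helper: `path` must already exist and be at least
    block_size * num_blocks bytes long.
    """
    rng = random.Random(seed)
    # One distinct pseudo-random payload per block, reproducible from the seed.
    expected = [
        rng.getrandbits(8 * block_size).to_bytes(block_size, "little")
        for _ in range(num_blocks)
    ]
    # Write in a shuffled order so the test is not purely sequential.
    order = list(range(num_blocks))
    rng.shuffle(order)

    fd = os.open(path, os.O_RDWR)
    try:
        for i in order:
            written = os.pwrite(fd, expected[i], i * block_size)
            if written != block_size:
                raise IOError(f"short write on block {i}: {written} bytes")
        os.fsync(fd)  # ask the kernel to flush the writes before reading back
        return [
            i
            for i in range(num_blocks)
            if os.pread(fd, block_size, i * block_size) != expected[i]
        ]
    finally:
        os.close(fd)
```

An empty return value means every block read back intact. Running the same function against the file backend and against the s3backer backend would at least show whether plain writes and reads are affected, or only more complex sequences.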
Is s3backer executing NBD requests synchronously, or are they deferred to background threads? Could there be an ordering issue between TRIM and WRITE requests, or something like that?