Corruption of persistent database file caused by sudden loss of power #189

Closed
thanhvtruong opened this Issue Jun 20, 2016 · 3 comments

thanhvtruong commented Jun 20, 2016

It looks like there is a brief window while the persistent database file is being saved during which a sudden power loss will corrupt it.

After thousands of sudden power-loss tests on our system, we noticed the following:

  • Mosquitto starts up with "invalid argument" and "database" read errors.
  • mosquitto.db and mosquitto.db.new both exist in /var/lib/mosquitto.
  • Both mosquitto.db and mosquitto.db.new have the same inode (which suggests that a rename has occurred).

We had a similar problem with another application that we built. Reading the Linux documentation, we found that flushing or closing a file is not enough to write its contents to disk; an fsync needs to be performed to confirm that the contents are written to disk.

You can see this in the man page (man close, under "NOTES", second paragraph) on Fedora 23:

"A successful close does not guarantee that the data has been successfully saved to disk, as the kernel defers writes. It is not common for a filesystem to flush the buffers when the stream is closed. If you need to be sure that the data is physically stored, use fsync(2). (It will depend on the disk hardware at this point.)"
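To illustrate the point from the man page, here is a minimal sketch of a write that calls fsync() before close(). The helper name write_durably and its signature are made up for illustration; this is not Mosquitto's code, just the general POSIX pattern.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not from Mosquitto): write len bytes to path and
 * ask the kernel to push them to stable storage before returning.
 * Returns 0 on success, -1 on error. */
int write_durably(const char *path, const void *buf, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;   /* interrupted before any data was written: retry */
            close(fd);
            return -1;
        }
        p += n;             /* handle short writes by advancing the cursor */
        len -= (size_t)n;
    }

    /* close() alone may leave the data in the kernel's buffers;
     * fsync() blocks until the kernel reports it as written out. */
    if (fsync(fd) != 0) {
        close(fd);
        return -1;
    }
    return close(fd);
}
```

As the quoted note says, what happens after fsync() returns still depends on the disk hardware (for example, a drive with a volatile write cache).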

@ralight ralight added this to the Fixes-next milestone Jun 26, 2016

ralight added a commit that referenced this issue Jun 26, 2016

[189] Call fsync after persisting data.
To ensure it is correctly written. Closes #189.

Thanks to thanhvtruong.

Bug: #189

ralight commented Jun 26, 2016

Thanks for the report, I've added code to call fsync() on the file and on its directory. Could you please confirm whether this fixes the problem for you?
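For readers unfamiliar with the pattern, the following is a rough sketch of what "fsync() on the file and on its directory" can look like around a rename. The function commit_file and its parameters are hypothetical and are not taken from the actual commit; the sketch assumes the new contents have already been written to a temporary file.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper (not the actual Mosquitto change): make tmp_path
 * durable, rename it over final_path, then fsync the containing
 * directory so the rename itself is recorded on disk.
 * Returns 0 on success, -1 on error. */
int commit_file(const char *tmp_path, const char *final_path,
                const char *dir_path)
{
    /* 1. Flush the temporary file's contents to disk. */
    int fd = open(tmp_path, O_WRONLY);
    if (fd < 0)
        return -1;
    if (fsync(fd) != 0) {
        close(fd);
        return -1;
    }
    if (close(fd) != 0)
        return -1;

    /* 2. Replace the old file; rename() is atomic with respect to
     *    other processes on POSIX filesystems. */
    if (rename(tmp_path, final_path) != 0)
        return -1;

    /* 3. Flush the directory entry change. Some filesystems may not
     *    support fsync() on a directory and report an error here. */
    int dfd = open(dir_path, O_RDONLY);
    if (dfd < 0)
        return -1;
    int rc = fsync(dfd);
    close(dfd);
    return rc == 0 ? 0 : -1;
}
```

With the file names from the report, a call might look like commit_file("/var/lib/mosquitto/mosquitto.db.new", "/var/lib/mosquitto/mosquitto.db", "/var/lib/mosquitto").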

kcallin commented Jun 27, 2016

Recommend calling fflush before fsync to ensure that the application's buffers are completely flushed to the kernel's buffers before those are flushed to disk. It's awfully hard to reproduce this, but thanhvtruong and I have started a long-term test series to mechanically verify the fix.

I do not believe the directory sync is required; the rename logic should work as-is.

I opened a pull request for these changes and will update as the long-term tests progress.
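The fflush-before-fsync ordering matters when the file is written through stdio, because fsync() only sees data that has already reached the kernel. A minimal sketch, assuming a FILE * stream opened for writing (the name flush_to_disk is made up for illustration and is not from the pull request):

```c
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: push a stdio stream's data all the way to disk.
 * Returns 0 on success, -1 on error. */
int flush_to_disk(FILE *fp)
{
    /* Step 1: move data from the stdio (user-space) buffer
     * into the kernel's buffers. */
    if (fflush(fp) != 0)
        return -1;

    /* Step 2: ask the kernel to write its buffers for this file
     * to the storage device. */
    if (fsync(fileno(fp)) != 0)
        return -1;

    return 0;
}
```

Calling fsync() without the preceding fflush() could return successfully while part of the data is still sitting in the process's own buffer.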

kcallin added a commit to kcallin/mosquitto that referenced this issue Jul 6, 2016

[189] Mosquitto database corrupted on power-loss.
Mosquitto database writes are not atomic and if power is lost during
a write the file will be permanently lost.  This commit makes writes as
atomic as possible.

Signed-off-by: Keegan Callin <kc@kcallin.net>
Bug: eclipse#189

ralight added a commit that referenced this issue Aug 16, 2016

[189] Mosquitto database corrupted on power-loss. (#206)
Mosquitto database writes are not atomic and if power is lost during
a write the file will be permanently lost.  This commit makes writes as
atomic as possible.

Signed-off-by: Keegan Callin <kc@kcallin.net>
Bug: #189

ralight commented Aug 16, 2016

Thanks very much for your work on this, I'm closing this now based on your pull request.

@ralight ralight closed this Aug 16, 2016
