should we make the device dead on a failed endio? (of async io) #35

Closed
akiradeveloper opened this Issue Mar 6, 2014 · 3 comments

Comments

Owner

akiradeveloper commented Mar 6, 2014

At present, migration is async within a segment, and the error handling counts the number of failed writes and redoes them if there are any. However, in case of an I/O failure, the whole system should block up to avoid any further damage (or, theoretically, cause none at all).

One option is to set wb->dead to true when an error is found in endio. But the defect of this strategy is that it doesn't stop the subsequent async writes that have already been submitted, so those writes may still damage the system. Since knowing about an I/O failure in advance is theoretically impossible, this is really an intrinsic problem.
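As a rough illustration of that idea (a minimal sketch, not dm-writeboost code; struct wb_device, wb_endio and wb_submit_write are made-up stand-ins, and only the dead flag mirrors the field mentioned above):

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

struct wb_device {
	atomic_bool dead;	/* set once any async write fails */
};

/* completion callback for an async write */
static void wb_endio(struct wb_device *wb, int error)
{
	if (error)
		atomic_store(&wb->dead, true);	/* block up from now on */
}

/* submission path: refuses new writes once dead, but writes that were
 * already submitted before the flag flipped cannot be recalled -- the
 * intrinsic problem described above */
static int wb_submit_write(struct wb_device *wb)
{
	if (atomic_load(&wb->dead))
		return -EIO;
	/* ... queue the async write here ... */
	return 0;
}
```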

So, how to solve this?

torn5 commented Jul 16, 2014

Hello Akira,
I think a good way is this:

  • Allow reads to the device if possible (see below how)
  • Fail all subsequent writes to the device (= do not receive any more data)
  • Try not to lose the data you have received, so: commit memory to persistent cache if possible, or to the backend otherwise.

If you ever receive a read error: pass it upstream as it is. Do not disable anything.

------- Read path:
Try to read from the cache. Uncommitted cache data is in RAM, so this should never fail AFAIU.

On a cache miss: try to read from the backend. On an error (in general, not all sectors of the backend will be bad), pass the error upstream but do not disable the device further. This is similar to the behaviour of an HDD with some unreadable sectors. The filesystem will probably go read-only by itself and you will receive no more writes.
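A minimal sketch of this read dispatch; ram_buffer_read, cache_device_read and backend_read are hypothetical helpers, declared but not implemented here, and none of this is dm-writeboost's actual API:

```c
#include <errno.h>
#include <stdbool.h>

struct read_req { unsigned long sector; void *buf; };

/* hypothetical lookups; the first two return true on a hit */
bool ram_buffer_read(struct read_req *req);
bool cache_device_read(struct read_req *req);
int backend_read(struct read_req *req);	/* 0 on success, -EIO on failure */

static int handle_read(struct read_req *req)
{
	if (ram_buffer_read(req))	/* uncommitted data still in RAM */
		return 0;
	if (cache_device_read(req))	/* committed to the cache device */
		return 0;
	/* cache miss: a backend error is passed upstream unchanged,
	 * and a read never puts the device into write_error_mode */
	return backend_read(req);
}
```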

------- Write path:
Always return error. Do not accept any more data.

Let's call this mode of operation "write_error_mode". Dm-writeboost should not leave this mode by itself.
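One way to express this is a small mode enum consulted on every write; a sketch with made-up names, not the actual dm-writeboost code:

```c
#include <errno.h>

enum wb_mode {
	WB_MODE_WRITEBACK,	/* normal: writes go through the cache */
	WB_MODE_WRITETHROUGH,	/* writes bypass the cache, go straight to the backend */
	WB_MODE_WRITE_ERROR,	/* reject all writes until the user intervenes */
};

struct dev_state {
	enum wb_mode mode;
};

static int handle_write(struct dev_state *s)
{
	if (s->mode == WB_MODE_WRITE_ERROR)
		return -EIO;	/* always return error, accept no more data */
	/* ... send the write to the cache or the backend depending on mode ... */
	return 0;
}
```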

You should implement a writethrough mode (via DM message). To reach writethrough, the cached writes are first flushed (like your drop_caches) and then writethrough mode really begins. In this mode writes go directly to the backend. This is also very useful 1) for decommissioning the cache and 2) for snapshotting the backend device more precisely.

Let's now suppose that the user has repaired the cache device or the backend device (e.g. fixed the RAID) so that now operation could continue.

From the write_error_mode, the user might want to go to either writethrough or writeback (the normal mode), by sending a DM message.

In both cases, dm-writeboost should attempt to flush the cache to the backend, and if this gives no error, switch to the requested mode.

If the error was a write error on the cache and the cache is still faulty, then with writeback it will fall back into write_error_mode shortly. In that case, only writethrough will be possible.
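Putting the transitions together, a self-contained sketch of how a message handler might drive them; flush_cache_to_backend and request_mode are hypothetical names and this is not the actual dm-writeboost message interface:

```c
#include <errno.h>
#include <string.h>

enum wb_mode { WB_MODE_WRITEBACK, WB_MODE_WRITETHROUGH, WB_MODE_WRITE_ERROR };

struct dev_state {
	enum wb_mode mode;
};

/* hypothetical: write all dirty data to the backend; 0 on success */
int flush_cache_to_backend(struct dev_state *s);

/* requested is "writeback" or "writethrough", e.g. carried by a DM message */
static int request_mode(struct dev_state *s, const char *requested)
{
	/* leaving write_error_mode (or entering writethrough) starts with
	 * a flush; if the flush fails, stay in the current mode */
	if (flush_cache_to_backend(s))
		return -EIO;

	if (!strcmp(requested, "writethrough"))
		s->mode = WB_MODE_WRITETHROUGH;
	else if (!strcmp(requested, "writeback"))
		s->mode = WB_MODE_WRITEBACK;
	else
		return -EINVAL;
	return 0;
}
```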

------- Strategy not to lose data right now:
dm-writeboost has just received a write failure either from the cache device or the backend device, and is going into write_error_mode. The cache is dirty.

Try to safeguard the data you received if at least one (cache or backend) is still alive (i.e. support single failure).

If the cache device has returned a write error: quickly commit everything to the backend (you still have it in RAM AFAIU, so this should be no problem). This should give no error. Then you can switch to writethrough mode rather than write_error_mode.

If the backend device has returned a write error: quickly write everything from RAM to the cache, then switch to write_error_mode and wait for user intervention. The user will fix the RAID on the backend device and then switch to writeback or writethrough.
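A sketch of that single-failure handling; flush_ram_to_backend and flush_ram_to_cache are hypothetical helpers, and the returned strings just name the modes described above:

```c
#include <stdbool.h>

/* hypothetical helpers: flush everything still held in RAM; 0 on success */
int flush_ram_to_backend(void);
int flush_ram_to_cache(void);

/* cache_failed is true when the cache device returned the write error,
 * false when the backend did */
static const char *handle_write_failure(bool cache_failed)
{
	if (cache_failed) {
		/* cache is gone: persist RAM data straight to the backend,
		 * then keep serving writes in writethrough mode */
		if (flush_ram_to_backend() == 0)
			return "writethrough";
		return "write_error_mode";	/* both sides failed */
	}
	/* backend is gone: park RAM data on the cache device and wait
	 * for the user to repair the backend (e.g. rebuild the RAID) */
	flush_ram_to_cache();
	return "write_error_mode";
}
```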

Keep up the good work
T.

Owner

akiradeveloper commented Sep 29, 2014

Hi,

Thanks for comments.

To see Writeboost's strategy on failure, look at the IO() macro in dm-writeboost.h. Many I/O submissions across Writeboost's code are wrapped by this macro.

This macro means "once the cache device fails an I/O, stop the device", which I call block-up. It is like switching all devices to null devices such as /dev/null, which ignore I/Os. After block-up, all side effects to the underlying devices are ignored.
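The real definition lives in dm-writeboost.h; purely as an illustration of the block-up idea (this is not the actual IO() macro), such a wrapper could look roughly like this:

```c
#include <stdatomic.h>
#include <stdbool.h>

struct wb_device {
	atomic_bool blockup;	/* set once an I/O to the cache device fails */
};

/* Skip the wrapped submission entirely once the device is blocked up,
 * so all further side effects to the underlying devices are ignored;
 * a failure of the wrapped statement marks the device blocked up. */
#define IO(wb, io_stmt)						\
	({							\
		int __r = 0;					\
		if (!atomic_load(&(wb)->blockup)) {		\
			__r = (io_stmt);			\
			if (__r)				\
				atomic_store(&(wb)->blockup, true); \
		}						\
		__r;						\
	})
```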

I really disagree with the concept of write-through in Writeboost, because the fact that all side effects pass through the cache device simplifies Writeboost's design. To be honest, I don't understand what would happen if it switched to write-through at runtime.

Reads don't cause any side effects, so reading from the backing device (typically an HDD) continues after block-up. Since Writeboost is a write-only cache, reads don't hit the cache very often and are served from the backing device in many cases, so reading from the backing device should keep going if possible.

Changing the behavior depending on whether the failure was a write error or a read error is also something I disagree with. I don't trust a device once it has failed; otherwise the upstream may receive wrong data. Since device failure happens very seldom, this strategy, which may look overly cautious, is acceptable.

Owner

akiradeveloper commented May 11, 2015

off topic
