Better error recovery in device mapper #3470
Conversation
Signed-off-by: Maksym Pavlenko <makpav@amazon.com>
Build succeeded.
activeErr = p.activateDevice(ctx, info)
if activeErr != nil {
	return activeErr
}
Hi,
What if a crash happens in between these steps and the rollback code doesn't get a chance to run?
This possibility also applies to other places that change the device or metadata, right?
What if a crash happens in between these steps and the rollback code doesn't get a chance to run?

Each device mapper operation is wrapped in a state transition function (have a look at 1 and 2). If we fail to activate a device and a crash happens before the defer runs, you'll end up with something like this in your metadata database (meaning that the device didn't finish the transition):
DevName=X ID=9 State=Activating Error="failed to activate"
Essentially, each operation on a devmapper device is recorded (https://github.com/containerd/containerd/blob/master/snapshots/devmapper/device_info.go#L34), so you can get a clear picture of the device state and last operation at any time. For example, CreateThinDevice uses the following states:
Unknown <-- preallocated metadata info and devID
Creating <-- createDevice
Created
Activating <-- activateDevice
Activated
Hi,
thanks for your detailed explanation ;-) Yes, I know that in the crash case the metadata DB correctly reflects the device state before the crash, such as Creating, Activating, etc.
Actually, my question was: in that case, the rollback code has no chance to mark the device as faulty, right?
If so, after the machine comes up and containerd starts again, we still get the "object already exists" error from AddDevice() - 878a320#diff-4132e94f99d36d21b09c446fb20687faL100.
If you crash right in between create and rollback, then yes, the device won't be marked as Faulty. In this case, what behavior would you expect to see?
If a crash happens, containerd must restart, so devmapper will be reloaded. Could we take that chance to scan the device list and mark as Faulty every device whose state is not a final state (Deactivated or Activated)? Or, I feel it's safe to clean up such a device from the metadata and thin pool completely.
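The recovery pass suggested here could be sketched like so. The types and the `markStaleDevicesFaulty` helper are illustrative, not containerd's actual code; the idea is just the startup sweep over the metadata store.

```go
package main

import "fmt"

type DeviceState string

const (
	Created     DeviceState = "Created"
	Activating  DeviceState = "Activating"
	Activated   DeviceState = "Activated"
	Deactivated DeviceState = "Deactivated"
	Faulty      DeviceState = "Faulty"
)

type DeviceInfo struct {
	Name  string
	State DeviceState
}

// markStaleDevicesFaulty walks the metadata records at startup and marks
// every device that is not in a terminal state (Activated/Deactivated) as
// Faulty, so a later create can proceed with a fresh device ID.
func markStaleDevicesFaulty(devices []*DeviceInfo) int {
	marked := 0
	for _, d := range devices {
		switch d.State {
		case Activated, Deactivated:
			// Terminal states: the device survived its last transition.
		default:
			d.State = Faulty
			marked++
		}
	}
	return marked
}

func main() {
	devs := []*DeviceInfo{
		{Name: "snap-1", State: Activated},
		{Name: "snap-2", State: Activating}, // interrupted by a crash
		{Name: "snap-3", State: Created},    // never activated
	}
	fmt.Println(markStaleDevicesFaulty(devs), devs[1].State, devs[2].State)
}
```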
If you crash right in between create and rollback, then yes, the device won't be marked as Faulty. In this case, what behavior would you expect to see?
Hi,
I just came up with this idea. I'd like to know what you think, thanks.
diff --git a/snapshots/devmapper/pool_device.go b/snapshots/devmapper/pool_device.go
index 22e78c1..4eaab6e 100644
--- a/snapshots/devmapper/pool_device.go
+++ b/snapshots/devmapper/pool_device.go
@@ -150,6 +150,16 @@ func (p *PoolDevice) CreateThinDevice(ctx context.Context, deviceName string, vi
retErr = multierror.Append(retErr, p.metadata.MarkFaulty(ctx, info.Name))
return
}
+
+ // If ErrAlreadyExists is returned, the device name is already in the
+ // metadata store, but the upper layer is not aware of it. This may occur
+ // when a crash happens and there was no chance to mark the device faulty.
+ // So mark it here; the upper layer will retry with the same device name.
+ if metaErr == ErrAlreadyExists {
+ log.G(ctx).Warnf("conflicting thin device %s, mark it faulty", info.Name)
+ retErr = multierror.Append(retErr, p.metadata.MarkFaulty(ctx, info.Name))
+ return
+ }
}()
// Save initial device metadata and allocate new device ID from store
@@ -241,6 +251,12 @@ func (p *PoolDevice) CreateSnapshotDevice(ctx context.Context, deviceName string
retErr = multierror.Append(retErr, p.metadata.MarkFaulty(ctx, snapInfo.Name))
return
}
+
+ if metaErr == ErrAlreadyExists {
+ log.G(ctx).Warnf("conflicting snapshot %s, mark it faulty", snapInfo.Name)
+ retErr = multierror.Append(retErr, p.metadata.MarkFaulty(ctx, snapInfo.Name))
+ return
+ }
}()
I've addressed this use case in #3489. In case of crash, the snapshotter will mark devices as faulty after restart. Can you check if it works for you?
I've addressed this use case in #3489. In case of crash, the snapshotter will mark devices as faulty after restart. Can you check if it works for you?
Great, thanks!
Hmm, devmapper: deferred remove can break consistency
This makes sense. devmapper needs more precise control over when to do unmounts, and certainly that should happen before remove. Though I think we don't need
Signed-off-by: Maksym Pavlenko <makpav@amazon.com>
Build succeeded.
Codecov Report

@@            Coverage Diff             @@
##           master    #3470      +/-   ##
==========================================
- Coverage   44.24%   44.22%   -0.03%
==========================================
  Files         124      124
  Lines       13732    13780      +48
==========================================
+ Hits         6076     6094      +18
- Misses       6725     6747      +22
- Partials      931      939       +8

Continue to review full report at Codecov.
Yes, thanks.
Do you plan to add some documentation as part of this PR for what a user can do to remediate the faulty device ID and to mark it as ok once it has been remediated? If not, is that planned for a follow-up PR?
Once it's correctly marked as faulty, I think it's safe to just delete those devices that the upper layer is not aware of and that contain user data.
@samuelkarp Yes, I plan to update the README in a follow-up PR.
@renzhengeek SGTM. At init time we can check for devices that are not in a final state and mark them faulty. However, I'd prefer to play it safe and leave any device deletion to the user in order to prevent unexpected situations. Sometimes it's possible to restore a device instead of cleaning it up.
LGTM |
LGTM
Should any of this cleanup/err handling get taken back into
@estesp no, this is devmapper-specific err handling.
This PR attempts to mitigate some cases described in #3436
It introduces faulty state for devices and devmapper device ID.
A device that fails to create/activate and then also fails to roll back (delete) will be marked as Faulty in the metadata store (instead of cleaning up the metadata and ending up in an inconsistent state). The snapshotter will try to recreate the device with a different devID on a subsequent call. This lets it move on and prevents the snapshotter from getting stuck entirely on the first faulty ID.
A faulty device ID means that the snapshotter could not handle it (neither create nor roll back) and it needs manual handling. It will be marked in the metadata store and won't be reused by the snapshotter until addressed and manually marked as ok.
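The retry-with-a-fresh-ID behavior described here can be sketched as below. The `idPool` type and its methods are hypothetical helpers for illustration; the real ID allocation lives in the devmapper snapshotter's metadata store.

```go
package main

import "fmt"

// idPool hands out devmapper device IDs, skipping ones quarantined as faulty.
type idPool struct {
	next   uint32
	faulty map[uint32]bool
}

// acquire returns the next device ID that is not marked faulty.
func (p *idPool) acquire() uint32 {
	for {
		p.next++
		if !p.faulty[p.next] {
			return p.next
		}
	}
}

// markFaulty quarantines an ID until an operator clears it manually.
func (p *idPool) markFaulty(id uint32) { p.faulty[id] = true }

func main() {
	pool := &idPool{faulty: map[uint32]bool{}}
	id := pool.acquire() // first ID handed out
	// Pretend the next ID was left faulty by a crashed create/rollback.
	pool.markFaulty(id + 1)
	// A subsequent create simply skips the faulty ID instead of
	// retrying it forever.
	fmt.Println(id, pool.acquire()) // the faulty ID is skipped
}
```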
@renzhengeek @jiazhiguang Could you please give it a try in your environment?