raid: add assemble option #1826

Closed
wants to merge 2 commits into from

Conversation

@lehmanju commented Mar 4, 2024

Adds an option to assemble RAID devices before creating new ones. This allows for persistent RAID configurations without wiping existing data.

I haven't tested this yet, but my main goal is to provide an option to reassemble existing RAIDs when reprovisioning. Looking forward to feedback.

@lehmanju (Author)

Corresponding issue #579

@prestist (Collaborator)

@lehmanju thank you for working on this!

Firstly, have you gotten a chance to test this?

@prestist (Collaborator) left a comment

Again, thank you for working on this. It looks sane from my understanding; it's missing some translation bits and a few nits here and there. Let me know.

@@ -168,6 +168,8 @@ root:
        desc: the number of spares (if applicable) in the array.
      - name: options
        desc: any additional options to be passed to mdadm.
+     - name: assemble
+       desc: try to assemble raid array from the list of devices before creating it. Defaults to false.
@prestist (Collaborator)

Hmm, wdyt "Attempts to assemble the RAID array from the list of devices before creating it. Defaults to false."

"assembling %q", md.Name,
); err != nil {
s.Logger.Info("mdadm assemble failed: %v", err)
} else {
@prestist (Collaborator)

I think we might be able to simplify some of this nesting wdyt?
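For illustration, one way to flatten this would be to pull the assemble attempt into a small helper and return early. This is a sketch only; tryAssemble, its arguments, and the exact mdadm invocation are hypothetical, not code from this PR:

// tryAssemble attempts to assemble an existing array from its member devices
// and reports whether the md device actually appeared afterwards.
func (s stage) tryAssemble(md types.Raid, devName string, devs []string) bool {
	if _, err := s.Logger.LogCmd(
		exec.Command(distro.MdadmCmd(), append([]string{"--assemble", devName}, devs...)...),
		"assembling %q", md.Name,
	); err != nil {
		s.Logger.Info("mdadm assemble failed: %v", err)
		return false
	}
	if err := s.waitOnDevices([]string{devName}, "raids"); err != nil {
		s.Logger.Info("assembled array %q did not appear: %v", md.Name, err)
		return false
	}
	return true
}

The caller in createRaids would then reduce to a single early continue, e.g. if md.Assemble != nil && *md.Assemble && s.tryAssemble(md, devName, devs) { continue }.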

}

if _, err := s.Logger.LogCmd(
exec.Command(distro.MdadmCmd(), args...),
@prestist (Collaborator)

it might help readability if we make this command a var

e.g. cmd := exec.Command(distro.MdadmCmd(), args...)
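Applied to the call above, that might read roughly like this (a sketch; the log message and error handling are placeholders, not the actual code in this file):

cmd := exec.Command(distro.MdadmCmd(), args...)
if _, err := s.Logger.LogCmd(cmd, "creating %q", md.Name); err != nil {
	return fmt.Errorf("mdadm failed: %v", err)
}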

@prestist (Collaborator)

not super opinionated on this, just a passing thought.

@@ -74,6 +74,9 @@ func (s stage) createRaids(config types.Config) error {
if err := s.waitOnDevices([]string{devName}, "raids"); err != nil {
s.Logger.Info("mdadm assemble failed: %v", err)
} else {
if err := s.waitOnDevices([]string{devName}, "raids"); err != nil {
@prestist (Collaborator)

Could we add a comment for this, or explain it a bit in the commit message?
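Something along these lines might do it; the wording is only a guess at the intent and would need to be confirmed by the author:

// Assembly was attempted above; wait for the md device node to show up so we
// can tell whether it actually succeeded. If it never appears, log the failure
// and fall back to creating the array from scratch.
if err := s.waitOnDevices([]string{devName}, "raids"); err != nil {
	s.Logger.Info("mdadm assemble failed: %v", err)
}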

@@ -236,6 +236,9 @@
          "items": {
            "type": "string"
          }
        },
+       "assemble": {
@prestist (Collaborator) commented Mar 26, 2024
Since you added a struct field, you need to add a translation for it in translations; this is why CI is failing.

Name string `json:"name"`
Options []RaidOption `json:"options,omitempty"`
Spares *int `json:"spares,omitempty"`
Assemble *bool `json:"assemble,omitempty"`
@prestist (Collaborator)

Since you added a struct field, you need to add a translation for it in translations; this is why CI is failing.

@lehmanju (Author)

I haven't had time to test this yet. I was also looking for a rough opinion on the behavior first, because "assemble" would just try to assemble any previous RAIDs, even if they do not match the current (possibly changed) configuration.

@prestist (Collaborator) commented Apr 4, 2024

@lehmanju it might make sense to bring this up in the FCOS meeting; we have one weekly at 16:30 UTC.

@jlebon (Member) commented Apr 5, 2024

This is doing assembly, but not checking that the assembled RAID matches the specification, right?

I think also the semantics should match all the other objects with reuse semantics: a field named e.g. wipeRaid which defaults to false (allows reuse, errors if it doesn't match spec) but if set to true always wipes it first.

@lehmanju (Author) commented Apr 6, 2024

> This is doing assembly, but not checking that the assembled RAID matches the specification, right?

Correct.

The problem I recall is that additional options for RAID creation are passed directly to mdadm. Ideally, it wouldn't be possible to pass arbitrary options; instead there would be separate fields for them. That would make it easier to check whether the specification matches. But even then it might only be possible after assembling, since RAID properties are not visible up front.

I wanted to implement a minimal, non-breaking change, hence a simple assemble option.

@jlebon (Member) commented Apr 12, 2024

> > This is doing assembly, but not checking that the assembled RAID matches the specification, right?
>
> Correct.
>
> The problem I recall is that additional options for RAID creation are passed directly to mdadm. Ideally, it wouldn't be possible to pass arbitrary options; instead there would be separate fields for them. That would make it easier to check whether the specification matches. But even then it might only be possible after assembling, since RAID properties are not visible up front.
>
> I wanted to implement a minimal, non-breaking change, hence a simple assemble option.

You're right that with the options key it becomes impossible to verify everything. I think that's OK though. Compare to the luks implementation; we only check the bits we do spec out before considering it reusable. We should document the limitations of our reuse logic.

For raid, we could check level, devices, and spares.

Note that mdadm --examine can be used to analyze a RAID member without activating the device. That'll give info on the RAID level. The number of devices and spares can be deduced too, but it's a bit awkward. It seems fine too to activate it to make analyzing easier. (And then disassemble it if it doesn't match.)
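For illustration, a post-assembly check could parse mdadm --detail output for the activated array. This is a rough sketch: raidMatches and its parameters are hypothetical, and mdadm's field labels may vary between versions:

// raidMatches reports whether the assembled array at devName matches the
// expected level, raid-device count, and spare count derived from the spec.
func raidMatches(devName, wantLevel string, wantDevices, wantSpares int) (bool, error) {
	out, err := exec.Command(distro.MdadmCmd(), "--detail", devName).CombinedOutput()
	if err != nil {
		return false, fmt.Errorf("mdadm --detail %s: %v", devName, err)
	}
	want := map[string]string{
		"Raid Level":    wantLevel,
		"Raid Devices":  strconv.Itoa(wantDevices),
		"Spare Devices": strconv.Itoa(wantSpares),
	}
	for _, line := range strings.Split(string(out), "\n") {
		key, val, ok := strings.Cut(line, ":")
		if !ok {
			continue
		}
		if expected, found := want[strings.TrimSpace(key)]; found && strings.TrimSpace(val) != expected {
			return false, nil
		}
	}
	return true, nil
}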

> I think also the semantics should match all the other objects with reuse semantics: a field named e.g. wipeRaid which defaults to false (allows reuse, errors if it doesn't match spec) but if set to true always wipes it first.

One thing I realized is that the problem with defaulting to false is that it's a behavioural change. We would have to default to true to keep the same behaviour, which is OK though a bit awkward being that all the other wipe* options default to false.

@lehmanju (Author)

I just checked the current behavior: all RAIDs are automatically assembled when found. No additional configuration is required; FCOS does this by default.

Therefore, adding the option wipeRaid is probably the best solution. If set to false, we check whether the existing array matches and do nothing further. If set to true, we create a new RAID as before.
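Sketched out, that control flow might look like this. It is entirely hypothetical: the wipeRaid field and the existingRaidMatchesSpec/createRaid helpers are not part of this PR:

for _, md := range config.Storage.Raid {
	wipe := md.WipeRaid != nil && *md.WipeRaid // hypothetical field
	if !wipe {
		if ok, err := s.existingRaidMatchesSpec(md); err == nil && ok {
			continue // reuse the already-assembled array untouched
		} else if err == nil && !ok {
			return fmt.Errorf("existing RAID %q does not match the configuration", md.Name)
		}
		// err != nil: nothing usable was found, so fall through and create it
	}
	// wipeRaid=true, or no reusable array: create it as the stage does today.
	if err := s.createRaid(md); err != nil {
		return err
	}
}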

@jlebon (Member) commented Apr 15, 2024

> I just checked the current behavior: all RAIDs are automatically assembled when found. No additional configuration is required; FCOS does this by default.

Was this in the real root or in the initramfs? AFAIK, the mdraid udev rules in the initramfs will purposely not auto-assemble any array except those specified by rd.md.uuid.

@lehmanju (Author) commented May 3, 2024

It was the real root; in the initramfs the RAID is not assembled.

As it turns out, I sadly can't find the time to continue working on this. I'll have to close this PR.

@jlebon (Member) commented May 3, 2024

@lehmanju That's fair. Nonetheless, thanks for the discussion. I think it will help anyone who has cycles to pick this up again in the future.
