identity: fail on missing section sooner #1965

Merged

Chris-Peterson444
Contributor

Moves the logic for checking whether a user-data or identity section is provided (on server) in the autoinstall config to the load_autoinstall_data function. Without this change, the exception thrown in apply_autoinstall_config won't halt the installer until the postinstall steps (LP: 2060547).

The change is ready but I am keeping this as a draft PR until I can test this on Desktop.

@Chris-Peterson444
Contributor Author

This doesn't quite work for desktop because app.base_model.source.current isn't set correctly until the controllers are started, which only happens after loading the autoinstall data. I think the quickest (and hackiest) way to solve this is to inspect the sources information in /cdrom/casper/install-sources.yaml from the identity controller itself.

@ogayot
Member

ogayot commented Apr 10, 2024

I've only tested in dry-run mode but I'm not seeing an obvious difference between the previous and the new behavior. What is it that we are trying to achieve? Exit the subiquity server process before partitioning starts?

@Chris-Peterson444
Contributor Author

Yes. On server, if you omit both the identity section and the user-data section, then an error will be emitted when the identity controller loads the autoinstall data, but the installation will proceed until the postinstall stage, where it hangs. We want to make sure this error halts the installer immediately.

@dbungert
Collaborator

To elaborate on what Chris said, the failure mode here really has to be seen to be believed.

So you supply a trivial cloud-config like so to live-server:

#cloud-config
autoinstall:
  version: 1

An exception is seen in the logs complaining about the lack of user-data or identity, but the entire install section continues, including partitioning of disks and copying to the target system.

THEN, at this point, the install hangs, as the post-install models are not configured, and there isn't even a visual indication of why, other than that exception shown well before partitioning even started.

My goal here was to move that check earlier, since the time between the problem report and when the install stops is measured in how long it takes for partitioning + rsync to happen.
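
As a rough illustration of the intent described above (not the actual subiquity code), here is a minimal sketch of a check that raises while the autoinstall data is being loaded, so nothing after it runs. The names `AutoinstallValidationError` and `run_install` and the stage list are hypothetical, and the sketch ignores the server/desktop distinction and interactive sections.

```python
class AutoinstallValidationError(Exception):
    """Hypothetical error type used only in this sketch."""


def load_autoinstall_data(autoinstall: dict) -> None:
    """Validate at load time so a bad config halts before any disk work."""
    if "identity" not in autoinstall and "user-data" not in autoinstall:
        raise AutoinstallValidationError(
            "neither 'identity' nor 'user-data' was provided"
        )


def run_install(autoinstall: dict) -> list[str]:
    """Toy pipeline that returns the stages that ran to completion."""
    stages = ["load"]
    load_autoinstall_data(autoinstall)  # raises here for the trivial config
    stages += ["partition", "copy", "postinstall"]
    return stages


# Parsed form of the trivial config shown earlier: only "version" is set.
trivial = {"version": 1}
try:
    run_install(trivial)
    halted_early = False
except AutoinstallValidationError:
    halted_early = True
```

With the check at load time, the toy pipeline never reaches the partition or copy stages for the trivial config.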

@Chris-Peterson444
Contributor Author

Chris-Peterson444 commented Apr 12, 2024

Yeah, sorry for not being more explicit. So the way I see it this could have regressed the following conditions:

  • Server autoinstall:
    • Includes only Identity section: OK
    • Includes only user-data section: OK
    • Identity section is explicitly interactive: OK
    • Identity section is implicitly interactive (i.e. * ): OK
    • reset-partition-only is true: OK
    • none of the above scenarios and missing both identity and user-data sections: Error
  • Desktop autoinstall:
    • Provides neither identity nor user-data: OK

I was able to confirm all of these cases pass in VM tests. I did find a separate bug in the reset-partition-only handling but I'll file a separate bug about that. Edit: reset-partition bug LP: #2061042
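
The matrix above can be written down as a small table-driven sketch. This is only an illustration of the expected outcomes, not the code in the PR: the function `identity_check`, its parameters, and the way interactive sections and reset-partition-only are passed in are all assumptions made for the example.

```python
from typing import Iterable


def identity_check(
    variant: str,
    sections: Iterable[str],
    interactive_sections: Iterable[str] = (),
    reset_partition_only: bool = False,
) -> str:
    """Return "OK" or "Error" for one row of the matrix (sketch only)."""
    present = set(sections)
    interactive = set(interactive_sections)
    if variant != "server":
        return "OK"  # Desktop is expected to pass without either section
    if present & {"identity", "user-data"}:
        return "OK"
    if interactive & {"identity", "*"}:
        return "OK"  # explicitly or implicitly interactive
    if reset_partition_only:
        return "OK"
    return "Error"


# (variant, sections, interactive sections, reset-partition-only) -> expected
MATRIX = [
    (("server", ["identity"], [], False), "OK"),
    (("server", ["user-data"], [], False), "OK"),
    (("server", [], ["identity"], False), "OK"),
    (("server", [], ["*"], False), "OK"),
    (("server", [], [], True), "OK"),
    (("server", [], [], False), "Error"),
    (("desktop", [], [], False), "OK"),
]

results = [identity_check(*args) for args, _ in MATRIX]
```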

@Chris-Peterson444 Chris-Peterson444 marked this pull request as ready for review April 12, 2024 00:17
Collaborator

@dbungert dbungert left a comment

Well that was a bit more involved than I would have guessed. Nice work!

Conceptually this is fine, my major feedback is to just relocate some parts to a place where we might remove the duplication, or at least make it more greppable. I would be quite surprised to find some of this in identity.py.

subiquity/server/controllers/identity.py (3 outdated review threads, resolved)
if (
source_config is not None
and (id := source_config.get("id")) is not None
and "server" not in id
Collaborator

I'm wary of checking the string set on the id, it's semi free form.
What's the motivation for this section? It looks like we can drop it and get the same outcome? If I'm missing something let me know.

Contributor Author

In subiquity/server/controllers/source.py, load_autoinstall_data gets the source id with a .get("id") on the autoinstall data and does an exact match against the IDs in the catalog via get_matching_source in subiquity/models/source.py, so I felt it wasn't that bad a comparison to just check that "server" wasn't in the name.

In the autoinstall case if the id the user provides doesn't match any sources in the catalog then we get a KeyError anyways so I wasn't super concerned with weird outcomes if the user provides something unrealistic.

Contributor Author

*but this section matches the source controller/model logic of letting autoinstall inform us of the source id
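
To make the two positions in this thread concrete, here is a minimal sketch contrasting an exact-match catalog lookup with the substring heuristic. The catalog contents, the ids, and both function names are hypothetical stand-ins chosen for illustration; this is not the real get_matching_source.

```python
# Hypothetical catalog mapping source id -> variant. Real ids would come
# from the install sources catalog; these are made up for the sketch.
CATALOG = {
    "example-server": "server",
    "example-desktop": "desktop",
    "observer-desktop": "desktop",  # contrived id that contains "server"
}


def find_source_exact(source_id: str) -> str:
    """Exact-match lookup: an id that is not in the catalog raises KeyError."""
    return CATALOG[source_id]


def is_desktop_by_substring(source_id: str) -> bool:
    """The heuristic under review: any id without "server" counts as desktop."""
    return "server" not in source_id


# The heuristic and the catalog agree on the ordinary ids...
agree = is_desktop_by_substring("example-desktop") == (
    find_source_exact("example-desktop") == "desktop"
)
# ...but disagree on a free-form id that happens to contain "server".
misclassified = is_desktop_by_substring("observer-desktop") is False
```

The exact lookup rejects unknown ids outright, while the substring test returns an answer for any string, which is the source of the reviewer's concern about a semi free-form id.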

@Chris-Peterson444
Contributor Author

Chris-Peterson444 commented Apr 12, 2024

Well that was a bit more involved than I would have guessed. Nice work!

Right? And thanks!

Another thing I had in mind about keeping the logic in the identity controller: we probably want to remove this logic at some point in the future and put it back into the load_autoinstall function when we can just throw from there and expect an immediate halt to the overall install. Edit: re-reading your comment now I see what you're saying. I'll move the bits around tomorrow!

@Chris-Peterson444 Chris-Peterson444 force-pushed the autoinstall-bug-lp-2060547 branch 2 times, most recently from d29991b to 1f11807 Compare April 12, 2024 18:05
@Chris-Peterson444
Contributor Author

Moved to separate utility functions owned by the respective controllers, as requested. There is still the open question about the source controller logic.

subiquity/server/controllers/filesystem.py (1 outdated review thread, resolved)

# Check the sources available the same way the source controller does
# but return a bool for if the variant is desktop
path = "/cdrom/casper/install-sources.yaml"
Collaborator

let's move the existing magic path and reference that common definition, please, or maybe this is obsolete given the previous comment

Contributor Author

Yeah obsolete but I still think it's worth moving the magic var definition somewhere global

source_config = self.app.autoinstall_config.get(self.autoinstall_key)
if (
source_config is not None
and (id := source_config.get("id")) is not None
Collaborator

I think I'm vetoing this approach, as it relies on behavior in a string that is theoretically an arbitrary value.
What I want instead is for the SourceController.start logic to move earlier in the sequence - out of start entirely actually - so that we can answer this question before SourceController.start() has been called. Then we should be able to just get the info needed. What do you think?

Contributor Author

When I originally looked at this bug I think I had reservations about moving the logic in SourceController.start to somewhere sooner, but looking at it again now I don't see why we can't. We can probably move that logic to SourceController.load_autoinstall_data. I'll give it a shot.
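
A rough sketch of the idea being discussed: resolve the current source while loading autoinstall data, so a controller loaded later can read the variant without waiting for start(). Every class, attribute, and catalog entry below is a simplified, hypothetical stand-in and does not reflect the real subiquity classes; in particular the "variant" and "default" keys are assumptions about the catalog format.

```python
from typing import Optional


class SourceModel:
    def __init__(self, catalog: list[dict]) -> None:
        self.catalog = catalog
        self.current: Optional[dict] = None  # set during load, not start()


class SourceController:
    def __init__(self, model: SourceModel) -> None:
        self.model = model

    def load_autoinstall_data(self, data: Optional[dict]) -> None:
        """Pick the source by exact id, or the default entry if no id given."""
        source_id = (data or {}).get("id")
        for entry in self.model.catalog:
            if source_id is None and entry.get("default"):
                self.model.current = entry
                return
            if entry["id"] == source_id:
                self.model.current = entry
                return
        raise KeyError(source_id)


class IdentityController:
    def __init__(self, source_model: SourceModel) -> None:
        self.source_model = source_model

    def load_autoinstall_data(self, data: Optional[dict], autoinstall: dict) -> None:
        """Runs after the source controller, so the variant is already known."""
        variant = self.source_model.current["variant"]
        if variant == "server" and data is None and "user-data" not in autoinstall:
            raise ValueError("neither identity nor user-data provided")


# Hypothetical catalog entries for the sketch.
CATALOG = [
    {"id": "example-server", "variant": "server", "default": True},
    {"id": "example-desktop", "variant": "desktop"},
]


def load_all(autoinstall: dict) -> str:
    """Load both controllers in order and return the resolved variant."""
    model = SourceModel(CATALOG)
    SourceController(model).load_autoinstall_data(autoinstall.get("source"))
    IdentityController(model).load_autoinstall_data(
        autoinstall.get("identity"), autoinstall
    )
    return model.current["variant"]
```

The sketch depends on the source controller loading its data before the identity controller does, which is the ordering assumption the review comments rely on.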

Moves the logic for checking if user-data or identity section is
provided (on server) in the autoinstall config to the
load_autoinstall_data function. Without this change, the exception
thrown in apply_autoinstall_config won't halt the installer until
the postinstall steps (LP: 2060547).
Collaborator

@dbungert dbungert left a comment

I have suggested a few tweaks but this seems fine. What further work do you imagine for this draft PR? What sort of VM testing has been done at this point?

subiquity/server/controllers/filesystem.py (3 outdated review threads, resolved)
The identity controller shouldn't own the logic for determining
if the filesystem controller will only install the reset partition.
Creates a utility function that can be called by the identity
controller to determine if only installing the reset partition.

The current check for Desktop in the identity controller doesn't
work because the variant isn't set until SourceController.start
is called. Move this logic to earlier in the sequence so that the
variant information can be referenced by later-loaded controllers.
@Chris-Peterson444
Contributor Author

Thanks for the suggestions! Currently I am running the following VM tests before I mark this ready:

  • Server autoinstall:
    • Includes only Identity section, successful autoinstall
    • Includes only user-data section, successful autoinstall
    • Identity section is explicitly interactive, successful (interactive) autoinstall
    • Identity section is implicitly interactive (i.e. * ), successful (interactive) autoinstall
    • reset-partition-only is true, successful autoinstall
    • none of the above scenarios and missing both identity and user-data sections, installer halts and displays error
  • Desktop autoinstall:
    • Provides neither identity nor user-data, successful autoinstall and does user creation on first boot

@Chris-Peterson444 Chris-Peterson444 marked this pull request as ready for review May 22, 2024 21:45
@Chris-Peterson444
Contributor Author

The above tests all pass

@Chris-Peterson444 Chris-Peterson444 merged commit c5131fc into canonical:main May 22, 2024
10 checks passed