Add a Drogue IoT MQTT connector #904

Closed
ctron wants to merge 13 commits

ctron commented Dec 1, 2022

This PR adds an MQTT based connector for Drogue IoT.

The goal is to provide remote control and monitoring capabilities for an IoT cloud side backend, like Drogue IoT.

The main changes (aside from adding the code to perform this) are:

  • Add the capability for the update and ostree actors to broadcast their current state
  • Allow setting an "update trigger", which defines the logic used to identify a new update (currently either Cincinnati or Drogue)
  • Keep the existing "update strategy" as this still makes sense, deciding when the update should be applied
  • The state change from "no update" to "update requested" is now defined by the update trigger. For Cincinnati this works as before; for Drogue this just waits until a remote command triggers the change.

If updates are disabled, it behaves as before. If the Drogue agent is configured as "readonly", it only reports the state, and updates are still triggered through Cincinnati.
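As a rough illustration of the behaviour described above, the trigger selection and the "no update" to "update requested" transition could be sketched like this. It is a minimal sketch, not code from the PR: `TriggerKind`, `Settings`, `State`, `Event`, `active_trigger` and `step` are all hypothetical names, and the settings fields are assumptions rather than Zincati's real configuration schema.

```rust
/// Which mechanism decides that a new update exists (illustrative names).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TriggerKind {
    Cincinnati,
    Drogue,
}

/// Hypothetical settings; not Zincati's actual configuration.
#[derive(Debug, Clone, Copy)]
pub struct Settings {
    pub updates_enabled: bool,
    pub drogue_configured: bool,
    pub drogue_readonly: bool,
}

/// Picks the active trigger, or `None` when updates are disabled.
pub fn active_trigger(s: Settings) -> Option<TriggerKind> {
    if !s.updates_enabled {
        return None;
    }
    if s.drogue_configured && !s.drogue_readonly {
        Some(TriggerKind::Drogue)
    } else {
        // No Drogue agent, or a read-only one that only reports state.
        Some(TriggerKind::Cincinnati)
    }
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum State {
    NoUpdate,
    UpdateRequested { target: String },
}

/// Something observed since the last tick.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Event {
    CincinnatiFoundRelease(String),
    RemoteCommand(String),
}

/// Only the event that matches the active trigger may leave `NoUpdate`.
pub fn step(state: State, trigger: Option<TriggerKind>, event: Event) -> State {
    if state != State::NoUpdate {
        return state;
    }
    match (trigger, event) {
        (Some(TriggerKind::Cincinnati), Event::CincinnatiFoundRelease(target))
        | (Some(TriggerKind::Drogue), Event::RemoteCommand(target)) => {
            State::UpdateRequested { target }
        }
        _ => State::NoUpdate,
    }
}
```

In this sketch a read-only Drogue agent falls back to the Cincinnati trigger, so a remote command alone never moves the state machine forward.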

Everything is gated by a feature flag (drogue), which is disabled by default.

I also know this PR contains a few changes in the makefile and documentation which helped me test. They are not considered for merging.

lucab (Contributor) commented Dec 8, 2022

Thanks for the PR, interesting experiment!

I do agree that there are several small quality-of-life improvements in here that could be split out and easily merged.

On the new core logic, I think this design does not exactly align with the FCOS release engineering flow. Notably, FCOS updates are pull-based / level-triggered, and do not individually target specific nodes or releases. This PR instead tries to sidestep the update graph and make the central update server aware of individual nodes (overall scheduling updates in a push-based / event-triggered way).

While I don't know the whole context for this feature implementation (and I'm not directly working on Zincati anymore) I think that patching Zincati this way is probably not the best way to go.
I would instead suggest looking into some other possible approaches:

  • moving this MQTT logic to a container listening on localhost and pointing Zincati to it. A local container can implement whatever custom logic is required, and then expose local Cincinnati and Fleetlock endpoints. This kind of containerized logic is overall well-aligned with the goals and typical usages of FCOS.
  • avoiding Zincati altogether. If you are not using any of the FCOS releng features (update graph, windowed rollouts, etc.) then you most likely just don't need Zincati. Simply disable it and then directly drive rpm-ostree from a fully custom updater.

One thing that I acknowledge is that Zincati is currently lacking a primitive to externally trigger a tick / refresh the state-machine.
This would be really valuable for event-based flows like yours, where there is external knowledge that an update is very likely already available and Zincati should quickly try to progress toward an UpdateAvailable state. This is a new primitive that should likely be exposed through a DBus method.
Right now a dirty workaround is to always speed up the refresh timings through #219, but it wasn't really meant to handle cases like this so it is quite expensive in this context.
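To make the missing primitive concrete, here is a minimal sketch of a tick loop that wakes up either when the regular refresh interval elapses or when something external asks for an immediate refresh. It uses a plain `std::sync::mpsc` channel as a stand-in for the DBus method suggested above; `Wake` and `wait_for_tick` are hypothetical names and nothing here is Zincati's actual API.

```rust
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::time::Duration;

/// Why the state machine is being ticked.
#[derive(Debug, PartialEq, Eq)]
pub enum Wake {
    /// The regular refresh interval elapsed (pull-based / level-triggered).
    Timer,
    /// Something outside the agent asked for an immediate refresh.
    External,
    /// All senders are gone; the caller should shut down.
    Closed,
}

/// Blocks until the next tick is due or an external nudge arrives.
/// The channel is an assumption standing in for a DBus method call.
pub fn wait_for_tick(nudges: &Receiver<()>, interval: Duration) -> Wake {
    match nudges.recv_timeout(interval) {
        Ok(()) => Wake::External,
        Err(RecvTimeoutError::Timeout) => Wake::Timer,
        Err(RecvTimeoutError::Disconnected) => Wake::Closed,
    }
}
```

With something like this, the refresh interval can stay long (and cheap) while event-based flows still get a quick reaction, instead of shortening the timings for everyone.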

ctron (Author) commented Dec 12, 2022

The use case comes from a space where one might want to have multiple images, for different devices/gateways. So not all devices receive the same image.

The PR actually has the following changes:

  • Expose information via MQTT. This allows monitoring the state of OSTree and the updates in a more "realtime" fashion. If read-only mode is set, the updates follow the normal flow through Cincinnati.
  • Allow triggering an update through the MQTT channel. (I will explain this below).
  • Add the MQTT base code (this could be extracted into a dedicated crate, which would then add an external dependency: pros & cons).
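The monitoring part boils down to an actor broadcasting its state to whoever is interested, with an MQTT publisher being one such subscriber. A minimal sketch of that idea, using only standard-library channels, might look as follows. `AgentState` and `StateBroadcaster` are hypothetical names, the variants are illustrative, and the real PR uses Actix actors rather than this hand-rolled structure.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};

/// Illustrative states; not the full Zincati state machine.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum AgentState {
    Idle,
    UpdateAvailable(String),
}

/// Keeps the current state and forwards every change to subscribers.
pub struct StateBroadcaster {
    current: AgentState,
    subscribers: Vec<Sender<AgentState>>,
}

impl StateBroadcaster {
    pub fn new(initial: AgentState) -> Self {
        Self { current: initial, subscribers: Vec::new() }
    }

    /// Registers a subscriber; it immediately receives the current state.
    pub fn subscribe(&mut self) -> Receiver<AgentState> {
        let (tx, rx) = channel();
        // Cannot fail here: the receiving end is still alive.
        let _ = tx.send(self.current.clone());
        self.subscribers.push(tx);
        rx
    }

    /// Records a new state and forwards it, dropping subscribers that went away.
    /// Setting the same state again is a no-op.
    pub fn set_state(&mut self, next: AgentState) {
        if next == self.current {
            return;
        }
        self.current = next;
        let current = &self.current;
        self.subscribers.retain(|tx| tx.send(current.clone()).is_ok());
    }
}
```

A subscriber that publishes to MQTT would simply loop over its receiver and send each state it sees; in read-only mode that would be the only thing the Drogue side does.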

Initially I had the same feeling: it doesn't quite fit. I still set out to add this functionality in order to avoid "not invented here" and to leverage the code already in place. From a technology perspective (Rust, Actix) it was a good fit.

During that process it turned out that the change actually isn't that big. Aside from the core MQTT stuff and some internal plumbing, there are two main changes (as mentioned above): adding some monitoring functionality over MQTT and choosing a different trigger for an update.

If you take a closer look at the change for triggering an update, it isn't that much of a difference IMHO. With Cincinnati, you have a client which polls over HTTP to figure out the new target state. With Drogue/MQTT, you do the same, just with MQTT, which reverses the command direction (not pulling, but pushing).

And all the other stuff is still active. Including the fleet lock logic (which also might make sense in combination with Drogue/MQTT).

Pulling this out of Zincati is definitely possible, but would replicate around 80% of the code, if not more. True, one could extract this into a sidecar container, mimicking Cincinnati (fleetlock is still used as before). But that would mean creating an artificial upgrade graph, just for triggering an update.

I think a cleaner approach would be to make the update trigger a trait too. One implementation is Cincinnati, but there can be others too.
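A minimal sketch of what such a trait could look like, assuming a tick-driven state machine. `UpdateTrigger`, `Target`, `PullTrigger`, `PushTrigger` and `tick` are hypothetical names, the closure stands in for an HTTP request to Cincinnati, and the channel stands in for commands arriving over MQTT; none of this is taken from the PR itself.

```rust
use std::sync::mpsc::Receiver;

/// An update target; here just an opaque version or image reference.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Target(pub String);

/// Decides whether a new update exists. When to apply it remains the
/// job of the existing update strategy.
pub trait UpdateTrigger {
    /// Called on every tick; returns a target once one is known.
    fn check(&mut self, current: &str) -> Option<Target>;
}

/// Pull-based: asks an upstream source on each tick.
/// The closure is a placeholder for an HTTP request to Cincinnati.
pub struct PullTrigger<F> {
    pub fetch_next: F,
}

impl<F: FnMut(&str) -> Option<String>> UpdateTrigger for PullTrigger<F> {
    fn check(&mut self, current: &str) -> Option<Target> {
        (self.fetch_next)(current).map(Target)
    }
}

/// Push-based: targets arrive on a channel (a placeholder for MQTT commands).
pub struct PushTrigger {
    pub commands: Receiver<String>,
}

impl UpdateTrigger for PushTrigger {
    fn check(&mut self, current: &str) -> Option<Target> {
        // Keep only the most recent pending command; older ones are superseded.
        let latest = self.commands.try_iter().last()?;
        if latest.as_str() == current {
            None
        } else {
            Some(Target(latest))
        }
    }
}

/// The state machine only sees the trait, not the transport behind it.
pub fn tick(trigger: &mut dyn UpdateTrigger, current: &str) -> Option<Target> {
    trigger.check(current)
}
```

From the caller's point of view both implementations answer the same question on each tick, which is the sense in which pulling and pushing are not that different here.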

cgwalters (Member) commented

First, thanks so much for this pull request! There's a lot of neat stuff going on in Drogue (I'm a big Rust fan too).

I have a lot of thoughts and there's a lot going on related to things that touch on this topic.

First, there's a giant shift we have going on to use containers for updates (https://fedoraproject.org/wiki/Changes/OstreeNativeContainerStable) that touches on the update graph bits (coreos/fedora-coreos-tracker#1263).

As part of this, it's becoming more emphasized to support injecting custom privileged code that runs directly as part of the host. Today, one could write a privileged container that orchestrates FCOS updates in a custom way (and disables zincati). In fact, we effectively do that in OpenShift because it's the machine config operator there that does updates (and handles draining nodes).

But with layering one can now directly inject custom update agents into the host and hence there's no point in time in which one has an OS without an agent.

This also touches on the RHEL for Edge flow which always involves a custom OS build and hence there's an opportunity to inject custom agents there too.

Ultimately I think I'd like to pare down the basic functionality of zincati and move it into rpm-ostree (and into bootc). What specific APIs we support there is up for discussion, but what I'm thinking right now is that we basically support polling a remote container image and that's it. More complex logic requires a custom agent/driver, which could be a container or an external binary.

cgwalters (Member) commented

Per discussion, closing for now, but without prejudice. This is just something that can be done external to this project.
