
Decouple workload, configuration management from charm.py #9

Closed
wants to merge 7 commits

Conversation

sed-i (Contributor) commented May 25, 2023

In this PR, a few parts of operating grafana agent have been decoupled from charm.py.
Something similar is expected to follow in a future PR for operating nginx.

  • Split out "WorkloadManager" and "Config".
  • Set the stage for the config to be lazily evaluated, in an attempt to avoid code-ordering issues.
  • New "compound status" handling approach:
    • Charm components report back via a callback
    • Each component, and then the charm, is responsible for calculating the "total" status for itself

Apart from that, most of the code was copy-pasted from grafana agent (k8s operator).
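Below is a minimal sketch of the charm-side wiring for this approach. The combine() helper, the component name, and the callback name are illustrative assumptions, not the exact code in this PR:

from typing import Dict

from ops.charm import CharmBase
from ops.model import ActiveStatus, StatusBase


def combine(statuses) -> StatusBase:
    """Pick the most severe status; a naive stand-in for real combination logic."""
    order = {"blocked": 0, "waiting": 1, "maintenance": 2, "active": 3}
    return min(statuses, key=lambda s: order.get(s.name, 4), default=ActiveStatus())


class MimirCoordinatorCharm(CharmBase):
    def __init__(self, *args):
        super().__init__(*args)
        self._component_statuses: Dict[str, StatusBase] = {}

    def _on_component_status_changed(self, component: str, status: StatusBase):
        # Each component reports back its own "total" via this callback;
        # the charm computes the total of totals and sets it on the unit.
        self._component_statuses[component] = status
        self.unit.status = combine(self._component_statuses.values())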

Comment on lines +18 to +23
class Status:
    """Helping with centralized status setting."""

    def __init__(self, callback: Callable[[StatusBase], None] = lambda _: None):
        self._config: StatusBase = UnknownStatus()
        self._callback = callback


It seems to me like this status class should be defined globally in charm.py, because the status will need to be combined with any status from Mimir itself.

Contributor Author:

It for sure would need to be combined, but the idea I'm proposing here is:

  • Each component has its own status and calculates its own total status
  • Each component reports back its total status
  • The "total of totals" is calculated in charm code

This seems more flexible than introducing additional assumptions about the structure/hierarchy of one global status object. Wdyt?
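For concreteness, here is a sketch of the component side under this proposal. The Status class in this diff tracks only a config status; the second "service" aspect below is added purely for illustration:

from typing import Callable

from ops.model import ActiveStatus, StatusBase, UnknownStatus


class Status:
    """Track per-aspect statuses and report this component's total upstream."""

    def __init__(self, callback: Callable[[StatusBase], None] = lambda _: None):
        self._config: StatusBase = UnknownStatus()
        self._service: StatusBase = UnknownStatus()  # illustrative second aspect
        self._callback = callback

    @property
    def config(self) -> StatusBase:
        return self._config

    @config.setter
    def config(self, value: StatusBase):
        self._config = value
        # Recalculate this component's total and report it back to the charm.
        self._callback(self._combined())

    def _combined(self) -> StatusBase:
        # Naive combination: the first non-active aspect wins.
        for status in (self._config, self._service):
            if not isinstance(status, ActiveStatus):
                return status
        return ActiveStatus()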

Contributor:

I'm thinking we should wait for compound status to land in ops; inventing our own here seems like reinventing the wheel.

Comment on lines +63 to +80
super().__init__(charm, f"{self.__class__.__name__}-{container_name}")

# Property to facilitate centralized status update
self.status = Status(callback=status_changed_callback)  # pyright: ignore

self._unit = charm.unit

self._service_name = self._container_name = container_name
self._container = charm.unit.get_container(container_name)

self._render_config = config_getter

# turn the container name to a valid Python identifier
snake_case_container_name = self._container_name.replace("-", "_")
charm.framework.observe(
    getattr(charm.on, "{}_pebble_ready".format(snake_case_container_name)),
    self._on_pebble_ready,
)


Since we call super, I think we should be able to use self.framework and self.unit rather than charm.framework and charm.unit.

Contributor Author:

You're right, but this seems more explicit in a good way?

Comment on lines +108 to +109
if version := self.version:
    self._unit.set_workload_version(version)
@dstathis, May 30, 2023:

issue (blocking): I'm not sure the version we display should be the grafana-agent version. This is a Mimir charm after all. Not sure what version we should display but I don't think it should be this.

Contributor Author:

Good point!
I was focusing on the "workload manager" aspect and overlooked this.

  • If we aim to have the same "workload manager" for gagent here and elsewhere, then we would probably need a means to selectively set/not set the workload version (sketched below).
  • The coordinator runs gagent and nginx. How awful would agent: x.y.z; nginx: a.b.c be?
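As a rough sketch of the first bullet: the set_workload_version flag and the stub version property below are hypothetical, not part of this PR:

class WorkloadManager:
    """Fragment of a shared manager; only the version-stamping bits are shown."""

    def __init__(self, charm, *, container_name: str, set_workload_version: bool = True):
        self._unit = charm.unit
        self._container = charm.unit.get_container(container_name)
        # Hypothetical opt-out: a coordinator running several workloads would
        # pass False and decide on a combined version string itself.
        self._set_workload_version = set_workload_version

    @property
    def version(self) -> str:
        return "0.0.0"  # placeholder; the real code queries the workload binary

    def _on_pebble_ready(self, event):
        if self._set_workload_version and (version := self.version):
            self._unit.set_workload_version(version)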

Contributor Author:

TODO:

  • log.info the workload version
  • do not set the workload version here; let charm code access the .version prop directly
  • in this case, charm code should set n/a (or multi/various?)

Comment on lines +173 to +183
if config == old_config:
    # Nothing changed, possibly new installation. Move on.
    self.status.config = ActiveStatus()
    return

try:
    self.write_file(self.CONFIG_PATH, yaml.dump(config))
except APIError as e:
    logger.warning(str(e))
    self.status.config = WaitingStatus(str(e))
    return


question: Should the write_file be in an else block? Seems like it shouldn't be needed if the config file did not change.

question: Should we restart or reload grafana-agent after writing the new config file?

Contributor Author:

  1. If write_file goes in the else block, what would we have in the try block? Wouldn't it be equivalent to just having the write_file in the try block in the first place?
  2. At first I had a restart in this same function (like we do elsewhere), but that made the function not reusable on pebble-ready, because there is no service yet. I think it makes sense to separate the two concerns: update config and restart. We could come up with a third name that does both (sketched below).
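A sketch of that separation: update_config and restart mirror the intent described above, while reconcile stands in for the hypothetical "third name":

class WorkloadManager:
    """Fragment: config updates and restarts as separate concerns."""

    def update_config(self) -> None:
        """Render and push the config; safe to call on pebble-ready, before the service exists."""
        ...

    def restart(self) -> None:
        """(Re)start the workload service; requires the service to be defined."""
        ...

    def reconcile(self) -> None:
        """Hypothetical combined entry point for handlers where both are wanted."""
        self.update_config()
        self.restart()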

Comment on lines +180 to +183
except APIError as e:
    logger.warning(str(e))
    self.status.config = WaitingStatus(str(e))
    return
@dstathis, May 30, 2023:

issue (blocking): I do not think waiting status makes sense. We are not waiting on a related app. In fact, this error is not resolvable at all, so we should surface the exception in order to go into error state.

Contributor Author:

This is copied from gagent.

Seems like:

  • It should be WaitingStatus if we're before pebble-ready ("auto resolvable")
  • BlockedStatus otherwise

So I agree, all in all this should be BlockedStatus.
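One way to express that split, assuming the manager attributes from this diff (self._container, self.status, self.write_file); using can_connect() as the "before pebble-ready" check is an assumption:

import logging

import yaml
from ops.model import BlockedStatus, WaitingStatus
from ops.pebble import APIError

logger = logging.getLogger(__name__)


def _update_config(self, config: dict) -> None:
    try:
        self.write_file(self.CONFIG_PATH, yaml.dump(config))
    except APIError as e:
        logger.warning(str(e))
        if not self._container.can_connect():
            # Before pebble-ready: likely auto-resolvable, keep waiting.
            self.status.config = WaitingStatus(str(e))
        else:
            # After pebble-ready: not auto-resolvable, surface as blocked.
            self.status.config = BlockedStatus(str(e))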

Comment on lines +162 to +163
config = self._render_config() # TODO: Must not be None
assert config is not None
Contributor:

Are we capturing possible AssertionError exception? 🤔
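For reference, a sketch of replacing the assert with an explicit check; the error message is illustrative:

config = self._render_config()
if config is None:
    # Fail loudly with a descriptive error instead of an assert, which is
    # stripped when Python runs with -O.
    raise RuntimeError("config_getter returned None; cannot render the workload config")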

Comment on lines +76 to +84
config_getter=lambda: Config(
    topology=JujuTopology.from_charm(self),
    scrape_configs=None,  # FIXME generate from memberlist
    remote_write=self.remote_write_consumer.endpoints,
    loki_endpoints=self.loki_consumer.loki_endpoints,
    insecure_skip_verify=True,
    http_listen_port=3500,
    grpc_listen_port=3600,
).build(),  # TODO figure out what to do about potential code ordering problem
Contributor:

I would build the Config() object beforehand, so this section is easier to understand.

Member:

We cannot, unfortunately, as it must be evaluated as a lambda to not miss out on the changes happening as part of the event hook execution.

@sed-i: What we could do, however, is to move the Config object instantiation to a separate function/method, and pass that invocation as the lambda expression.
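A sketch of that suggestion, reusing names from this diff; the method name _render_config is illustrative:

class MimirCoordinatorCharm(CharmBase):
    def __init__(self, *args):
        super().__init__(*args)
        self.workload = WorkloadManager(
            self,
            container_name="agent",
            config_getter=self._render_config,  # bound method, still evaluated lazily
        )

    def _render_config(self) -> dict:
        # Runs at call time, inside the event hook, so it sees fresh relation data.
        return Config(
            topology=JujuTopology.from_charm(self),
            scrape_configs=None,  # FIXME generate from memberlist
            remote_write=self.remote_write_consumer.endpoints,
            loki_endpoints=self.loki_consumer.loki_endpoints,
            insecure_skip_verify=True,
            http_listen_port=3500,
            grpc_listen_port=3600,
        ).build()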

Comment on lines +28 to +29
for endpoint in self.remote_write + self.loki_endpoints:
    endpoint["tls_config"] = {"insecure_skip_verify": insecure_skip_verify}
Member:

This does not look like it necessarily belongs in the init function.
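For instance, the mutation could move into build() and copy rather than mutate the caller's endpoint dicts; a sketch, with hypothetical output keys:

class Config:
    def __init__(self, *, remote_write, loki_endpoints, insecure_skip_verify: bool):
        self.remote_write = remote_write
        self.loki_endpoints = loki_endpoints
        self.insecure_skip_verify = insecure_skip_verify

    def build(self) -> dict:
        tls = {"insecure_skip_verify": self.insecure_skip_verify}
        # Copy rather than mutate the endpoint dicts passed to __init__.
        return {
            "remote_write": [{**ep, "tls_config": tls} for ep in self.remote_write],
            "loki_endpoints": [{**ep, "tls_config": tls} for ep in self.loki_endpoints],
        }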


@PietroPasotti (Contributor) left a comment:

RE the different Config approach, I don't think it's such a different approach. The implementation is different, sure, but the idea is the same. Also AM's config has somewhat different design goals:

  • storedstate-backed hash instead of recalculation from filesystem (we could factor that out, of course).
  • the config building is somewhat more complicated, not just one big dict.

We should see if we can generalize the two?
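As one direction for generalizing, the storedstate-backed hash could be factored into a small helper; a sketch with illustrative names:

import hashlib

from ops.framework import Object, StoredState


class ConfigHashTracker(Object):
    """Detect config changes via a StoredState hash instead of re-reading the filesystem."""

    _stored = StoredState()

    def __init__(self, charm, key: str):
        super().__init__(charm, key)
        self._stored.set_default(config_hash="")

    def changed(self, rendered: str) -> bool:
        new_hash = hashlib.sha256(rendered.encode()).hexdigest()
        if new_hash == self._stored.config_hash:
            return False
        self._stored.config_hash = new_hash
        return True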


def config(self, value: StatusBase):
    self._config = value
    # When status is updated, it is likely desirable to have some kind of side effect.
    self._side_effect()
Contributor:

[nit]: _side_effect --> _update_status

    logger.debug("Status updated to: %s", self._combined())
    self._callback(self._combined())

def _combined(self) -> StatusBase:
Contributor:

[nit]: either _combine() or @property\ndef _combined(self)...

Returns:
    A string equal to the agent version
"""
if not self.is_ready:
Contributor:

is_ready is not a property (but IMHO it should be), so this will always be trivially true.
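A sketch of the suggested fix, assuming the manager holds self._container as elsewhere in this diff:

@property
def is_ready(self) -> bool:
    # As a property, `if not self.is_ready` checks pebble connectivity instead
    # of the truthiness of a bound method (which is always True).
    return self._container.can_connect()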

try:
    self.write_file(self.CONFIG_PATH, yaml.dump(config))
except APIError as e:
    logger.warning(str(e))
Contributor:

If this fails, how does the charm know? And how does it retry?

@sed-i (Contributor Author) commented Nov 28, 2023:

Outdated. Closing.

@sed-i closed this Nov 28, 2023