
RelationDataContent.__setitem__ should dynamically dispatch to a file if it's too long #801

Closed
rbarry82 opened this issue Jul 15, 2022 · 6 comments · Fixed by #805

Comments

@rbarry82
Contributor

We've already seen this with Grafana dashboards, which routinely overflow the maximum argument length enforced on subprocess calls. It was also observed that relating Prometheus to a very large number of targets could overflow the limit and cause a strange-looking OSError on a RelationChangedEvent.

Ultimately, this is because relation_set shells out via subprocess to invoke relation-set ....
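For context, the failure mode is easy to reproduce outside of OF. On Linux, a single argv entry longer than MAX_ARG_STRLEN (128 KiB on 4 KiB-page systems) makes the exec fail with E2BIG, which Python surfaces as an OSError (a minimal sketch, not OF code):

```python
import subprocess

# A single argv entry over ~128 KiB exceeds MAX_ARG_STRLEN on Linux,
# so the exec itself fails before the child ever runs.
huge_value = "x" * 300_000

try:
    subprocess.run(["echo", huge_value], capture_output=True)
except OSError as e:
    # e.g. [Errno 7] Argument list too long: 'echo'
    print(e)
```

This is the same error a charm author sees bubbling out of a RelationChangedEvent, with nothing in the traceback pointing at the size of the data bag.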

We already split long log messages, and relation-set takes a --file parameter which reads in YAML, allowing the limit to be bypassed. If OF determines that the length of the relation data is anywhere near the limit, we could defer to something like:

with tempfile.NamedTemporaryFile("w", suffix=".yaml") as f:
    yaml.safe_dump({key: value}, f)
    f.flush()
    self._backend.relation_set(..., data_file=f.name)

An optarg could be added to relation_set so that, if present, the data is loaded from a file instead of passed on the command line. This seems easy enough to add, avoids requiring charm authors to think carefully about the size/length of their data bags (and potentially destructure them to keep each value small, since relation data maps back to a map[string]string on the backend), and yields the desired behavior.
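A minimal sketch of the dispatch logic being proposed, assuming a hypothetical `run` callable standing in for OF's backend subprocess wrapper, and a conservative threshold well under MAX_ARG_STRLEN:

```python
import os
import tempfile

import yaml  # PyYAML

# Conservative threshold; Linux caps a single argv entry at
# MAX_ARG_STRLEN (128 KiB), so switch to a file well before that.
ARG_THRESHOLD = 100 * 1024


def relation_set(relation_id, key, value, run):
    """Set relation data, falling back to --file for large values.

    ``run`` is a hypothetical callable that invokes the hook tool,
    e.g. ``run(["relation-set", "-r", relation_id, ...])``.
    """
    args = ["relation-set", "-r", str(relation_id)]
    if len(key) + len(value) > ARG_THRESHOLD:
        with tempfile.NamedTemporaryFile(
            "w", suffix=".yaml", delete=False
        ) as f:
            yaml.safe_dump({key: value}, f)
        try:
            run(args + ["--file", f.name])
        finally:
            os.unlink(f.name)
    else:
        run(args + [f"{key}={value}"])
```

The point is that the caller never sees the difference: small values go on the command line as before, large ones are serialized to YAML and passed via --file.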

@rwcarlsen
Contributor

This seems like a good idea. It's surprising to me that charms have managed to run into this though. Crazy users. Maybe we could also log a warning: "Why so much relation data?" :P

@sed-i
Contributor

sed-i commented Jul 15, 2022

relation > charm

@rbarry82
Contributor Author

> This seems like a good idea. It's surprising to me that charms have managed to run into this though. Crazy users. Maybe we could also log a warning: "Why so much relation data?" :P

I would say that it's because some of the "nuts and bolts" get masked a bit.

Grafana dashboards are huge, so no surprise there; the only surprise is that OF didn't already detect the length and spill it out to a file, because the first OSError: ... I saw from a RelationEvent was a real head-scratcher.

But outside of that, this is broadly either not intuitive or not exposed. Charm authors using OF are guided towards using dicts (well, [Foo]Mapping, but that's not important) when interacting with relations and relation data. Hence, it seems natural enough to use dict-like or otherwise "normal" data structures.

In this case, it's Prometheus scrape targets. Normally, there wouldn't be that many on one charm, but the point of intersection here is a proxy/bridge between the "old" reactive/LMA charms and the COS observability charms.

So it's structured like:

app-data:
    scrape-jobs:
        - ...
        - ...
        - ...
        - ...
unit-data:
    scrape-metadata:
        - ip_address_and_other_unit_specific_stuff

The proxy/bridge sits in one model, and forms a bit of a "funnel", so it's a N:1 <-> M relation, where N is reactive charms which "speak Prometheus" over the reactive LMA relation, and M is the cos prometheus interfaces.

Instead of a single charm providing, let's say, 4 scrape jobs, there are potentially as many as there are /metrics endpoints in any given "proxied (reactive)" model. The exception which ultimately occurred was more of a "straw that broke the camel's back" than a single misbehaving client.

That said, as mentioned, we've seen that a single Grafana dashboard can push over this limit. We may as well do the same sort of detection/splitting for state-set while we're at it, because almost exactly the same kind of "??? why am I seeing OSError: Argument list too long when I add something to a StoredState object, or when some custom event is emitted with an HTML template attached?" confusion will come up there.

Where foo-set ... has a --file argument which can cleanly avoid this (I haven't checked state-set, but I would imagine it does), OF handling the finer details of "this is potentially too long -- serialize it to a temporary YAML file and pass that instead" is a good way to stick with the principle of least surprise.
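For reference, the relevant kernel limits can be queried at runtime, which is one way OF could pick a threshold instead of hard-coding one (Linux behavior assumed; the per-argument cap MAX_ARG_STRLEN is not exposed via sysconf, so it has to be derived from the page size):

```python
import os

# Total bytes allowed for argv + environ on exec, per POSIX.
arg_max = os.sysconf("SC_ARG_MAX")
print(arg_max)  # commonly 2 MiB or more on Linux

# Linux additionally caps each individual argument/env string at
# MAX_ARG_STRLEN = 32 * PAGE_SIZE (131072 bytes with 4 KiB pages).
max_arg_strlen = 32 * os.sysconf("SC_PAGESIZE")
print(max_arg_strlen)
```

Anything near either number is a candidate for the --file path.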

@rwcarlsen
Contributor

Agree - we should make sure to handle this consistently with all the hook tools. It does appear that state-set calls do currently use the --file arg FWIW.

@rbarry82
Contributor Author

Want me to draft a PR?

@rwcarlsen
Contributor

I'm not going to stop you ;-)
