
RelationDataContent.__setitem__ should dynamically dispatch to a file if it's too long #801

Closed
rbarry82 opened this issue Jul 15, 2022 · 6 comments · Fixed by #805

Comments

@rbarry82
Contributor

We've already seen this with Grafana dashboards, which routinely overflow the maximum argument length enforced on subprocess calls. It was also observed that relating Prometheus to a very large number of targets could overflow the limit and cause a strange-looking OSError on a RelationChangedEvent.

Ultimately, this is because relation_set shells out via subprocess to invoke relation-set ....
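For context, the failure mode is easy to reproduce outside of OF. On Linux, a single argv entry longer than MAX_ARG_STRLEN (128 KiB on 4 KiB-page systems) makes the exec fail with E2BIG, which Python surfaces as an OSError (a minimal sketch, not OF code):

```python
import subprocess

# A single argv entry over ~128 KiB exceeds MAX_ARG_STRLEN on Linux,
# so the exec itself fails before the child ever runs.
huge_value = "x" * 300_000

try:
    subprocess.run(["echo", huge_value], capture_output=True)
except OSError as e:
    # e.g. [Errno 7] Argument list too long: 'echo'
    print(e)
```

This is the same error a charm author sees bubbling out of a RelationChangedEvent, with nothing in the traceback pointing at the size of the data bag.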

We already split long log messages, and relation-set takes a --file parameter which reads in YAML, allowing the limit to be bypassed. If OF determines that the length of the relation data is anywhere near the limit, we could defer to something like:

with tempfile.NamedTemporaryFile("w", suffix=".yaml") as f:
    yaml.safe_dump({key: value}, f)
    f.flush()
    self._backend.relation_set(..., data_file=f.name)

An optarg could be added to relation_set so that, if present, the data is loaded from a file instead of passed on the command line. This seems easy enough to add, avoids requiring charm authors to think carefully about the size/length of their data bags (and potentially destructure them to keep each value small, since relation data maps back to a map[string]string on the backend), and yields the desired behavior.
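A minimal sketch of the dispatch logic being proposed, assuming a hypothetical `run` callable standing in for OF's backend subprocess wrapper, and a conservative threshold well under MAX_ARG_STRLEN:

```python
import os
import tempfile

import yaml  # PyYAML

# Conservative threshold; Linux caps a single argv entry at
# MAX_ARG_STRLEN (128 KiB), so switch to a file well before that.
ARG_THRESHOLD = 100 * 1024


def relation_set(relation_id, key, value, run):
    """Set relation data, falling back to --file for large values.

    ``run`` is a hypothetical callable that invokes the hook tool,
    e.g. ``run(["relation-set", "-r", relation_id, ...])``.
    """
    args = ["relation-set", "-r", str(relation_id)]
    if len(key) + len(value) > ARG_THRESHOLD:
        with tempfile.NamedTemporaryFile(
            "w", suffix=".yaml", delete=False
        ) as f:
            yaml.safe_dump({key: value}, f)
        try:
            run(args + ["--file", f.name])
        finally:
            os.unlink(f.name)
    else:
        run(args + [f"{key}={value}"])
```

The point is that the caller never sees the difference: small values go on the command line as before, large ones are serialized to YAML and passed via --file.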

@rwcarlsen
Contributor

This seems like a good idea. It's surprising to me that charms have managed to run into this though. Crazy users. Maybe we could also log a warning: "Why so much relation data?" :P

@sed-i
Contributor

sed-i commented Jul 15, 2022

relation > charm

@rbarry82
Contributor Author

> This seems like a good idea. It's surprising to me that charms have managed to run into this though. Crazy users. Maybe we could also log a warning: "Why so much relation data?" :P

I would say that it's because some of the "nuts and bolts" get masked a bit.

Grafana dashboards are huge, so no surprise there; the only surprise is that OF didn't already detect the length and spill it out to a file, because the first OSError: ... I saw from a RelationEvent was a real head-scratcher.

But outside of that, this is broadly either not intuitive or not exposed. Charm authors using OF are guided towards using dicts (well, [Foo]Mapping, but that's not important) when interacting with relations and relation data. Hence, it seems natural enough to use dict-like or otherwise "normal" data structures.

In this case, it's Prometheus scrape targets. Normally, there wouldn't be that many on one charm, but the point of intersection here is a proxy/bridge between the "old" reactive/LMA charms and the COS observability charms.

So it's structured like:

app-data:
    scrape-jobs:
        - ...
        - ...
        - ...
        - ...
unit-data:
    scrape-metadata:
        - ip_address_and_other_unit_specific_stuff

The proxy/bridge sits in one model, and forms a bit of a "funnel", so it's a N:1 <-> M relation, where N is reactive charms which "speak Prometheus" over the reactive LMA relation, and M is the cos prometheus interfaces.

Instead of a single charm providing, let's say, 4 scrape jobs, there are potentially as many as there are /metrics endpoints in any given "proxied (reactive)" model. The exception which ultimately occurred was more of a "straw that broke the camel's back" than a single misbehaving client.

That said, as mentioned, we've seen that a single Grafana dashboard can push over this limit. We may as well do the same sort of detection/splitting for state-set while we're at it, because almost exactly the same kind of "??? why am I seeing OSError: Argument list too long when I add something to a StoredState object, or when some custom event is emitted with an HTML template attached?" confusion will come up there.

Where foo-set ... has a --file argument which can cleanly avoid this (I haven't checked state-set, but I would imagine it does), OF handling the finer details of "this is potentially too long -- serialize it to a temporary YAML file and pass that instead" is a good way to stick with the principle of least surprise.
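For reference, the relevant kernel limits can be queried at runtime, which is one way OF could pick a threshold instead of hard-coding one (Linux behavior assumed; the per-argument cap MAX_ARG_STRLEN is not exposed via sysconf, so it has to be derived from the page size):

```python
import os

# Total bytes allowed for argv + environ on exec, per POSIX.
arg_max = os.sysconf("SC_ARG_MAX")
print(arg_max)  # commonly 2 MiB or more on Linux

# Linux additionally caps each individual argument/env string at
# MAX_ARG_STRLEN = 32 * PAGE_SIZE (131072 bytes with 4 KiB pages).
max_arg_strlen = 32 * os.sysconf("SC_PAGESIZE")
print(max_arg_strlen)
```

Anything near either number is a candidate for the --file path.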

@rwcarlsen
Contributor

Agree - we should make sure to handle this consistently with all the hook tools. It does appear that state-set calls do currently use the --file arg FWIW.

@rbarry82
Contributor Author

Want me to draft a PR?

@rwcarlsen
Contributor

I'm not going to stop you ;-)
