Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

(cos-agent) Use peer relation data for communicating all principals' data #160

Merged
merged 18 commits into from
Apr 4, 2023

Conversation

sed-i
Copy link
Contributor

@sed-i sed-i commented Mar 29, 2023

Issue

Subordinate charms only see one unit over the subordinated relation - the principal unit. I.e. subordinate units cannot iterate over relation data from other units.
Combined with a leader guard, this means only one principal unit can get its files across - the one that is related to the subordinate leader.

Solution

  • Provider: copy data from principal to peer unit data
  • Requirer: refactor _fetch_data_from_relation
  • Manual e2e test:
    • CMR: prom (metrics, alerts)
    • CMR: loki (logs, alerts)
    • CMR: grafana (dashboards)
  • Unit tests

Fixes #158.

Context

Testing Instructions

  1. Check out Add o11y integration zookeeper-operator#56 and copy over the cos-agent lib from this PR.
  2. Pack zookeeper.
  3. Deploy first-zk (two units), second-zk (one unit is enough) and gagent from this PR.
  4. Relate both Zks to gagent over cos-agent.
  5. juju show-unit ...
  6. Good luck :)

Eventually we'd like to test something like this:

graph LR

subgraph user-model in a lxd controller

subgraph first-zk
first-zk/0
first-zk/1
end

subgraph second-zk
second-zk/0
end

subgraph agent
agent/0
agent/1
agent/2
end

first-zk ---|cos-agent| agent
first-zk/0 -.-|same VM| agent/0

second-zk/0 -.-|"same VM"| agent/2
second-zk ---|"  cos-agent &nbsp "| agent


end

agent ---|CMR - send-remote-write| prom
agent ---|CMR - send-remote-write| prom2

subgraph cos-model in a K8s controller
subgraph prom
prom/0
prom/1
end

subgraph prom2
prom2/0
end

end
Loading

Release Notes

Use peer relation data for communicating all principals' data.

@sed-i
Copy link
Contributor Author

sed-i commented Mar 30, 2023

Hey @PietroPasotti
I refactored this a bit after realizing that we need to forward only some of the data from the principal to the subordinate leader.

graph LR

zk/0 --->|"
    metrics_alert_rules
    log_alert_rules
    dashboards
    metrics_scrape_jobs
    log_slots
"| gagent/i --->|"
    metrics_alert_rules
    log_alert_rules
    dashboards
"| gagent/leader* --->|"
    metrics_alert_rules
    log_alert_rules
    dashboards
"| cos
Loading
  • I introduced two dataclasses. The main issue with them is that runtime json parsing errors give zero insight into the problem. I hope pydantic is capable informing why parsing a json string into a dataclass (model) failed.
  • I was incrementally fixing bugs as moved along manual testing. Bug fixing isn't done but hopefully good enough for you to pick up.

Copy link
Member

@simskij simskij left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Manually tested at 1c7d99ad10e344ae187f55f6473df5e71c2554a4, which worked great.

@sed-i sed-i changed the title (cos-agent) Use peer relation data for communicating all principals' data (WIP) (cos-agent) Use peer relation data for communicating all principals' data Apr 3, 2023
pyproject.toml Outdated Show resolved Hide resolved
@sed-i sed-i merged commit 1b42811 into machine_charm Apr 4, 2023
@sed-i sed-i deleted the feature/use_peer_data branch April 4, 2023 17:58
sed-i added a commit that referenced this pull request Apr 14, 2023
* Cleanup etc. (#90)
* Fix metadata and snap usage (#91)
* Class split (#92)
* Make static_configs into a list (#96)
* add option for tls insecure skip verify (#93)
* Add dashboards handler (#97)
* Scenario tests and machine charm rename (#99)
* spelling errors (#100)
* Clear machine metadata relations (#98)
* Add tests for update-status (#101)
* Type fixes (#102)
* small fixes and scenarios (#103)
* Machine charm lib (#95)
* Feature/fix prop (#111)
* LINT. Fix the grafana data iteration (#112)
* plug extraction (#105)
* Sub relation priority (#114)
* Fix a possible JSON loading exception (#119)
* purged grafana dashboard requirer (#117)
* Fix labels (#120)
* Use cosl for Rules and JujuTopology (#121)
* Fix position files (#116)
* top level documentation for cos_machine lib  (#122)
* Add juju-info itest (#123)
* Snap plug logs (#124)
* Loki shared logs monitoring (#125)
* Rename cos-machine to cos-agent (#129)
* wrong scope for grafana-dashboards-provider relation (#134)
* FIX Grafana dashboards files are stored in a bad format (#139)
* move on-refresh config to unit databag (#142)
* Set BlockedStatus per relation (#141)
* Be looser with finding snap plug mounts (#145)
* Use an actual uuid4, alerts() is now a property everywhere (#151)
* Replace DS_PROMETHEUS with prometheusds (#148)
* Cleanup (#152)
* use better locations for the positions files (#155)
* added typing for metrics endpoint dict (#149)
* Fix fstab parser (#159)
* (cos-agent) Use peer relation data for communicating all principals' data (#160)
* add support for the grafana cloud integrator charm (#150)

Co-authored-by: Dylan Stephano-Shachter <dylan.stephano-shachter@canonical.com>
Co-authored-by: awnn <adrian@webyda.se>
Co-authored-by: Ryan Barry <ryan.barry@canonical.com>
Co-authored-by: PietroPasotti <pietro.pasotti@canonical.com>
Co-authored-by: Jose C. Massón <939888+Abuelodelanada@users.noreply.github.com>
Co-authored-by: sed-i <82407168+sed-i@users.noreply.github.com>
Co-authored-by: Simon Aronsson <simme@arcticbit.se>
Co-authored-by: Luca Bello <36242061+lucabello@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants