cephadm: using ip instead of short hostname for prometheus urls #49836

Merged
merged 1 commit into from Feb 28, 2023

Conversation

@rkachach rkachach commented Jan 23, 2023

Tested with both IPv4 and IPv6 addresses. The following are examples of the generated links after the change:

http://[fe80::5054:ff:fef8:982e]:9095/graph?g0.expr=up%7Bjob%3D%22ceph%22%7D+%3D%3D+0&g0.tab=1
http://192.168.100.101:9095/graph?g0.expr=up%7Bjob%3D%22ceph%22%7D+%3D%3D+0&g0.tab=1
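
The link format above can be reproduced with a short sketch. This is not the code from the PR: `build_prometheus_graph_url` and its parameters are hypothetical names chosen for illustration, and only the standard-library `ipaddress` and `urllib.parse` modules are assumed.

```python
import ipaddress
from urllib.parse import urlencode


def build_prometheus_graph_url(ip: str, port: int, expr: str) -> str:
    """Hypothetical helper: build a Prometheus graph link from an IP literal."""
    # IPv6 literals must be wrapped in brackets inside a URL (RFC 3986).
    host = f'[{ip}]' if ipaddress.ip_address(ip).version == 6 else ip
    # urlencode() percent-encodes the expression and turns spaces into '+'.
    query = urlencode({'g0.expr': expr, 'g0.tab': 1})
    return f'http://{host}:{port}/graph?{query}'


ipv4_url = build_prometheus_graph_url('192.168.100.101', 9095, 'up{job="ceph"} == 0')
ipv6_url = build_prometheus_graph_url('fe80::5054:ff:fef8:982e', 9095, 'up{job="ceph"} == 0')
```

Passing a short hostname instead of an IP literal raises `ValueError` in this sketch, so the name has to be resolved before the URL is built.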

Fixes: https://tracker.ceph.com/issues/58548

Signed-off-by: Redouane Kachach <rkachach@redhat.com>


@rkachach rkachach requested a review from a team as a code owner January 23, 2023 12:25
@epuertat epuertat left a comment

If we find that we need a hostname-to-IP lookup, I think that we should provide that as a mgr API call so that all consumers get a consistent view of hosts-IPs (the idea of maintaining a registry of IPs-names has already been around for a while).

@rkachach

jenkins test make check

@epuertat epuertat left a comment

Maybe adding some tests... (integration-like, since most of the pain here can come from the network config). Mocking doesn't seem possible here since getaddrinfo() is implemented in C...
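
On the mocking concern, a sketch under stated assumptions rather than a claim about this PR's code: `unittest.mock.patch` replaces the `getaddrinfo` attribute on the `socket` module, so a caller that looks it up as `socket.getaddrinfo(...)` at call time sees the replacement regardless of how the underlying function is implemented. `resolve_first_ip` is a hypothetical helper, and the canned resolver answer is made up for illustration.

```python
import socket
from unittest import mock


def resolve_first_ip(hostname: str) -> str:
    """Hypothetical helper: return the first address reported by getaddrinfo()."""
    infos = socket.getaddrinfo(hostname, None, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return infos[0][4][0]


# Canned answer in the shape getaddrinfo() returns (illustrative values only).
fake_result = [
    (socket.AF_INET, socket.SOCK_STREAM, socket.IPPROTO_TCP, '', ('192.168.100.101', 0)),
]

with mock.patch('socket.getaddrinfo', return_value=fake_result) as fake_getaddrinfo:
    resolved = resolve_first_ip('host1')
    call_count = fake_getaddrinfo.call_count
```

A test like this only checks the code around the lookup. It says nothing about how a real network configuration resolves names, which is the part the integration-style tests would cover.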

@rkachach

Maybe adding some tests... (integration-like, since most of the pain here can come from the network config). Mocking doesn't seem possible here since getaddrinfo() is implemented in C...

I'm not sure whether we currently have integration tests for the monitoring stack. For this case we would need to fire an alert and then check that the generated Prometheus link uses an IP ...

@epuertat

Thanks for addressing my comments @rkachach !

Maybe adding some tests... (integration-like, since most of the pain here can come from the network config). Mocking doesn't seem possible here since getaddrinfo() is implemented in C...

I'm not sure whether we currently have integration tests for the monitoring stack. For this case we would need to fire an alert and then check that the generated Prometheus link uses an IP ...

With pure functions I find it super helpful to use doctests, as they serve as both documentation and unit tests. Although these functions are unfortunately not 'pure' (they depend heavily on the internal state of the network stack), I still think it would be useful to document (in the docstring) some sample inputs and outputs obtained in a real or testing environment. That helps a lot when debugging future issues or refactoring the code.
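
As an illustration of the doctest idea, here is a minimal sketch. `wrap_ipv6_for_url` is a hypothetical pure helper, not necessarily a function touched by this PR; the examples in its docstring act as both documentation and tests.

```python
import doctest
import ipaddress


def wrap_ipv6_for_url(address: str) -> str:
    """Wrap an IPv6 literal in brackets so it can be embedded in a URL.

    >>> wrap_ipv6_for_url('192.168.100.101')
    '192.168.100.101'
    >>> wrap_ipv6_for_url('fe80::5054:ff:fef8:982e')
    '[fe80::5054:ff:fef8:982e]'
    >>> wrap_ipv6_for_url('host1.example.com')
    'host1.example.com'
    """
    try:
        if ipaddress.ip_address(address).version == 6:
            return f'[{address}]'
    except ValueError:
        pass  # not an IP literal (e.g. a hostname): return it unchanged
    return address


# Run the docstring examples programmatically and collect the result.
parser = doctest.DocTestParser()
test = parser.get_doctest(
    wrap_ipv6_for_url.__doc__, {'wrap_ipv6_for_url': wrap_ipv6_for_url},
    'wrap_ipv6_for_url', None, 0)
results = doctest.DocTestRunner(verbose=False).run(test)
```

In day-to-day use the same examples can be run with `python -m doctest <file>`. For functions whose output depends on the environment, a sample can still be documented and marked with `# doctest: +SKIP` so that it is not executed.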

@rkachach

Thanks for addressing my comments @rkachach !

Maybe adding some tests... (integration-like, since most of the pain here can come from the network config). Mocking doesn't seem possible here since getaddrinfo() is implemented in C...

I'm not sure whether we currently have integration tests for the monitoring stack. For this case we would need to fire an alert and then check that the generated Prometheus link uses an IP ...

With pure functions I find it super helpful to use doctests, as they serve as both documentation and unit tests. Although these functions are unfortunately not 'pure' (they depend heavily on the internal state of the network stack), I still think it would be useful to document (in the docstring) some sample inputs and outputs obtained in a real or testing environment. That helps a lot when debugging future issues or refactoring the code.

Very interesting lib indeed, especially as it keeps code, docs and UTs in the same file. Thanks :)

BTW: I checked and we don't have integration tests specific to monitoring. We will be introducing some new tests, but that will be done in a separate PR.


@epuertat epuertat left a comment

👍🏼 Thanks @rkachach

@adk3798

adk3798 commented Feb 17, 2023

https://pulpito.ceph.com/adking-2023-02-13_21:57:03-orch:cephadm-wip-adk-testing-2023-02-13-1359-distro-default-smithi/

11 failures, one dead job

  • 8 of the failed jobs were in new tests being added by other PRs in the run, which obviously aren't quite working yet (test_monitoring_stack_basic, rotate-ssh-keys, test_iscsi_container). Outside of iscsi- and monitoring-stack-specific changes and the PRs that introduced the tests, this shouldn't block merging.
  • 2 instances of https://tracker.ceph.com/issues/58535
  • 1 instance of https://tracker.ceph.com/issues/49287
  • 1 dead job was an error re-imaging machines, infra issue

@adk3798

adk3798 commented Feb 27, 2023

https://pulpito.ceph.com/adking-2023-02-21_05:38:18-orch:cephadm-wip-adk-testing-2023-02-20-1650-distro-default-smithi/

failed/dead job reruns: https://pulpito.ceph.com/adking-2023-02-24_17:44:54-orch:cephadm-wip-adk-testing-2023-02-20-1650-distro-default-smithi/

After reruns, 3 failures and 1 dead job

  • the dead job was a failure pulling a podman package:
Failed to download packages: podman-docker-3:4.3.1-2.module_el8.8.0+1254+78119b6e.noarch:
  Cannot download, all mirrors were already tried without success

Another instance of this test passed in the original run, so I didn't bother with another rerun.

  • 2 failures were the staggered upgrade test issue tracked by https://tracker.ceph.com/issues/58535
  • the last failure was in the currently flaky test_nfs task test. It passed when I tried an interactive rerun for debugging.

Overall, nothing to block merging. I will note that the initial version of the basic monitoring stack test passed.
