
mgr/cephadm: introduce centralized logging in Dashboard using Loki and Promtail #44751

Merged: 9 commits merged into ceph:master from the introduce-logging-containers branch on Mar 16, 2022


@avanthakkar (Contributor) commented Jan 24, 2022

The goal of this PR is to introduce centralized logging by deploying two new monitoring containers (Loki and Promtail) through the orchestrator.

Fixes: https://tracker.ceph.com/issues/50491

Screenshots: 2022-01-31 16-10-18 and 2022-01-31 16-10-37

Recording (daemon logs): Screencast.from.03-02-2022.05.36.18.PM.mp4

Steps to configure:

  1. Enable logging to files with these CLI commands:
    ceph config set global log_to_file true
    ceph config set global mon_cluster_log_to_file true
  2. Log in to Grafana (port 3000).
  3. Go to the Explore section and select the Loki data source from the dropdown (top left).
  4. Filter by log labels (e.g. filename, job).
  5. Optionally, enable Live mode to stream the latest logs in real time.
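The Explore steps above go through Grafana, but Loki can also be queried directly over its HTTP API. What follows is a minimal Python sketch under some assumptions. It assumes Loki listens on its default port 3100 and is reachable from where the script runs. The host name, the label values (job, filename) and the helper names are hypothetical, so check which labels your Promtail configuration actually attaches. The sketch only builds the step 1 commands and a query URL; it sends nothing.

```python
from urllib.parse import urlencode

# Step 1 from the list above, as argv lists suitable for subprocess.run().
LOG_TO_FILE_COMMANDS = [
    ["ceph", "config", "set", "global", "log_to_file", "true"],
    ["ceph", "config", "set", "global", "mon_cluster_log_to_file", "true"],
]


def logql_selector(labels: dict[str, str]) -> str:
    """Build a LogQL stream selector such as {filename="...", job="..."}."""
    parts = []
    for name, value in sorted(labels.items()):
        # LogQL label values are double-quoted strings; escape \ and ".
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        parts.append(f'{name}="{escaped}"')
    return "{" + ", ".join(parts) + "}"


def loki_query_range_url(host: str, labels: dict[str, str],
                         limit: int = 100, port: int = 3100) -> str:
    """URL for Loki's /loki/api/v1/query_range endpoint (hypothetical helper)."""
    params = {"query": logql_selector(labels), "limit": limit}
    return f"http://{host}:{port}/loki/api/v1/query_range?{urlencode(params)}"


if __name__ == "__main__":
    url = loki_query_range_url(
        "ceph-node-1",                             # hypothetical Loki host
        {"job": "Cluster Logs",                    # assumed label value
         "filename": "/var/log/ceph/ceph.log"},    # hypothetical log path
    )
    print(url)
```

Fetching the URL (for example with urllib.request.urlopen) would return Loki's JSON response. Without explicit start and end parameters, Loki applies its own default time range.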

Co-authored-by: Avan Thakkar athakkar@redhat.com
Co-authored-by: Aashish Sharma aasharma@redhat.com


@avanthakkar added this to In progress in Dashboard via automation Jan 24, 2022
@avanthakkar force-pushed the introduce-logging-containers branch 4 times, most recently from 30b0b99 to 0266d7e, January 31, 2022 10:46
@avanthakkar marked this pull request as ready for review January 31, 2022 10:47
@avanthakkar requested a review from a team as a code owner January 31, 2022 10:47
@aaSharma14 changed the title mgr/cephadm: introduce loki and promtail containers to mgr/cephadm: introduce centralized logging in Dashboard using Loki and Promtail Jan 31, 2022
@avanthakkar force-pushed the introduce-logging-containers branch 2 times, most recently from 8e3907d to 59063e6, February 1, 2022 10:34
@aaSharma14 moved this from In progress to Review in progress in Dashboard Feb 9, 2022
@epuertat (Member) commented Feb 9, 2022

jenkins test make check

Signed-off-by: Avan Thakkar <athakkar@redhat.com>
@avanthakkar force-pushed the introduce-logging-containers branch 2 times, most recently from 0cee300 to bb65adf, February 10, 2022 11:01
Signed-off-by: Avan Thakkar <athakkar@redhat.com>
Co-authored-by: Aashish Sharma <aasharma@redhat.com>
Signed-off-by: Aashish Sharma <aasharma@redhat.com>
@aaSharma14 (Contributor):

jenkins test windows

@aaSharma14 (Contributor):

@avanthakkar @aaSharma14 Kudos for this PR! I have some questions:

  1. Does it support collecting logs from all container daemons that are part of the Ceph cluster?
  • Yes, from all daemons that write log files, e.g. mgr, mon, osd.
  2. The monitoring stack images are in the quay.io registry. Since Loki and Promtail will also be part of the monitoring stack, what is the reason to keep them in docker.io? Can we keep all monitoring stack images in a single registry?
  • Makes sense; I am in the process of adding these images to quay.io.
  3. Can we also support collecting these logs in the existing Logs section of the Ceph Dashboard?
  • We are currently facing some issues with Grafana iframes in the Dashboard, so we are deferring this step for now; it will be integrated into the Dashboard in the near future.

@avanthakkar (Contributor, Author):

jenkins test windows

@umangachapagain left a comment:

There are typos in the 5th and 6th commit messages: "grps" -> "grpc" and "liniting" -> "linting".

@aaSharma14 (Contributor):

jenkins test windows

@rkachach (Contributor) commented Mar 4, 2022

@adk3798, please correct me if I'm wrong, but looking at the changes, it doesn't seem that this PR adds the new daemons to the services that are upgraded (as part of ceph orch upgrade). Is this OK?

@rkachach (Contributor) left a comment:

@avanthakkar, you need to add the new services to the list in ceph/src/pybind/mgr/cephadm/utils.py; otherwise they will not be redeployed during an upgrade (you will end up with an upgraded cluster that still runs the old daemons).

MONITORING_STACK_TYPES = ['node-exporter', 'prometheus', 'alertmanager', 'grafana']

It would also be good to test an upgrade that starts from a cluster containing the new daemons and moves to a version where they are also present. That would make sure nothing is broken and that the new services are redeployed correctly after the upgrade.
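The following is a hypothetical Python sketch that illustrates the reviewer's point. It is not cephadm's actual upgrade code. It shows why a daemon type that is missing from MONITORING_STACK_TYPES gets left behind. It assumes the new daemon types are named 'loki' and 'promtail'. The helper names and the OLD_/NEW_ list names are made up for this example.

```python
# Hypothetical illustration only; cephadm's real upgrade loop is more involved.
OLD_MONITORING_STACK_TYPES = ['node-exporter', 'prometheus', 'alertmanager', 'grafana']
# Assumed daemon type names for the two new services.
NEW_MONITORING_STACK_TYPES = OLD_MONITORING_STACK_TYPES + ['loki', 'promtail']


def upgrade_order(deployed: list[str], stack_types: list[str]) -> list[str]:
    """Deployed monitoring daemon types the upgrade would redeploy, in list order."""
    present = set(deployed)
    return [t for t in stack_types if t in present]


def left_on_old_image(deployed: list[str], stack_types: list[str]) -> list[str]:
    """Deployed daemon types the upgrade never visits (keeps deployed order)."""
    known = set(stack_types)
    return [t for t in deployed if t not in known]


if __name__ == "__main__":
    deployed = ['node-exporter', 'prometheus', 'grafana', 'loki', 'promtail']
    print(left_on_old_image(deployed, OLD_MONITORING_STACK_TYPES))  # ['loki', 'promtail']
    print(left_on_old_image(deployed, NEW_MONITORING_STACK_TYPES))  # []
```

In this model, appending the two types to the list is what brings them into the redeploy pass. The upgrade test suggested above is what would confirm this behavior on a real cluster.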

Signed-off-by: Aashish Sharma <aasharma@redhat.com>
@aaSharma14 (Contributor):


Thanks @rkachach, done.

Dashboard automation moved this from Reviewer approved to Review in progress Mar 10, 2022
@epuertat (Member):

jenkins test make check

Dashboard automation moved this from Review in progress to Reviewer approved Mar 16, 2022
@epuertat merged commit b8ba5d0 into ceph:master Mar 16, 2022
12 checks passed
Dashboard automation moved this from Reviewer approved to Done Mar 16, 2022
@epuertat deleted the introduce-logging-containers branch March 16, 2022 13:06
s0nea added a commit to s0nea/ceph that referenced this pull request Mar 21, 2022
Fixes: https://tracker.ceph.com/issues/54502
Signed-off-by: Tatjana Dehler <tdehler@suse.com>
(cherry picked from commit 4f14993)

Conflicts:
	src/pybind/mgr/cephadm/services/monitoring.py
Fixed conflict because ceph#44751 has
not been backported to pacific (yet).
s0nea added a commit to s0nea/ceph that referenced this pull request Mar 24, 2022
Fixes: https://tracker.ceph.com/issues/54502
Signed-off-by: Tatjana Dehler <tdehler@suse.com>
(cherry picked from commit 4f14993)

Conflicts:
	src/pybind/mgr/cephadm/services/monitoring.py
Fixed conflict because ceph#44751 has not been backported to
pacific (yet).
s0nea added a commit to s0nea/ceph that referenced this pull request Mar 28, 2022
Fixes: https://tracker.ceph.com/issues/54502
Signed-off-by: Tatjana Dehler <tdehler@suse.com>
(cherry picked from commit 4f14993)

Conflicts:
	src/pybind/mgr/cephadm/services/monitoring.py
Fixed conflict because ceph#44751 has not been
backported to quincy (yet).
mchangir pushed a commit to mchangir/ceph that referenced this pull request Apr 13, 2022
Fixes: https://tracker.ceph.com/issues/54502
Signed-off-by: Tatjana Dehler <tdehler@suse.com>
(cherry picked from commit 4f14993)

Conflicts:
	src/pybind/mgr/cephadm/services/monitoring.py
Fixed conflict because ceph#44751 has not been backported to
pacific (yet).
@adk3798 mentioned this pull request Apr 27, 2022
14 tasks
Projects: Dashboard (Done; archived in project)
8 participants