Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mgr/cephadm: add ability to write custom logrotate configs to all hosts #57410

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

adk3798
Copy link
Contributor

@adk3798 adk3798 commented May 10, 2024

For either the cephadm.log or the cluster logs. Additionally, this tries to integrate the feature into the cephadm task in order to rotate logs during teuthology runs

Fixes: https://tracker.ceph.com/issues/65827

Contribution Guidelines

  • To sign and title your commits, please refer to Submitting Patches to Ceph.

  • If you are submitting a fix for a stable branch (e.g. "quincy"), please refer to Submitting Patches to Ceph - Backports for the proper workflow.

  • When filling out the below checklist, you may click boxes directly in the GitHub web UI. When entering or editing the entire PR message in the GitHub web UI editor, you may also select a checklist item by adding an x between the brackets: [x]. Spaces and capitalization matter when checking off items this way.

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)
Show available Jenkins commands
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test dashboard cephadm
  • jenkins test api
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox
  • jenkins test windows
  • jenkins test rook e2e

Copy link
Contributor

@phlogistonjohn phlogistonjohn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A bunch of random little things, no major complaints about the general approach. Poke me if it's not clear what I think should be changed vs what are random suggestions.

src/cephadm/cephadm.py Outdated Show resolved Hide resolved
src/cephadm/cephadmlib/logging.py Outdated Show resolved Hide resolved
src/cephadm/cephadmlib/logging.py Outdated Show resolved Hide resolved
src/cephadm/cephadmlib/logging.py Outdated Show resolved Hide resolved
src/pybind/mgr/cephadm/inventory.py Show resolved Hide resolved
cephadm.log or the cluster logs to all hosts in the cluster. It's
recommended to take the existing logrotate file and then add fields
to it. The cluster log in particular makes use of certain template
fields to generate the name of the directory in which to rotate the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's list the one's we're willing to support.

fields to generate the name of the directory in which to rotate the
logs and for the postrotate actions

In order to have cephadm write out the custom logrotate config
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

out again

When the command to write the custom logrotate file is run, it only schedules
the logrotate configs to be written. If cephadm is busy with other operations
such as deploying daemons or an upgrade, there could be a notable delay in the
logrotate configs being written out.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't mind it so much here.


.. note::

Cephadm will only attempt to write the logrotate config out to each host once
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But I'd lose this one.

.. note::

Cephadm will only attempt to write the logrotate config out to each host once
and will not repeatedly check the config in each host to make sure it's "correct".
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and it does not monitor the file on the host for changes or reconcile differences from the centrally configured content., or something along those lines?

Copy link
Member

@batrick batrick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What drives the frequency of invocations of logrotate? ceph.py does it manually every 15m? I actually think 15m is too infrequent. Some workloads can easily produce >>10G of logs in 15 minutes.

qa/tasks/cephadm.py Outdated Show resolved Hide resolved
qa/tasks/cephadm.py Outdated Show resolved Hide resolved
'echo', teuth_cluster_logrotate_config, run.Raw('>'), '/root/logrotate.conf'
])
_shell(ctx, cluster_name, bootstrap_remote,
['ceph', 'orch', 'write-custom-logrotate', 'cluster', '-i', '/mnt/logrotate.conf'],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a divergence in configuration behavior with ceph.py in that this task is not using

https://github.com/ceph/ceph/blob/69bd270cf53ee6fe98ee2c0e86624bd85040149f/qa/tasks/logrotate.conf

I don't think, in practice, anyone bothers to configure that in any significant way so a static config is okay with me.

qa/tasks/cephadm.py Outdated Show resolved Hide resolved
@adk3798
Copy link
Contributor Author

adk3798 commented May 20, 2024

What drives the frequency of invocations of logrotate? ceph.py does it manually every 15m? I actually think 15m is too infrequent. Some workloads can easily produce >>10G of logs in 15 minutes.

We're making use of the Rotater class so whatever we do should match what ceph.py did in terms of frequency. I actually thought it would be every 30 seconds though because of the self.stop_event.wait(timeout=30) in the invoke_logrotate function of the class. Is the 15 minute cycle something you've seen from experience?

@adk3798 adk3798 force-pushed the cephadm-custom-logrotate-config branch from 2f52eba to cd5abb8 Compare May 20, 2024 18:02
@adk3798
Copy link
Contributor Author

adk3798 commented May 20, 2024

Addressed review comments with the exception of the documentation ones. Will get back to those after I see some success when testing this.

@adk3798 adk3798 requested a review from batrick May 20, 2024 18:06
@batrick
Copy link
Member

batrick commented May 21, 2024

What drives the frequency of invocations of logrotate? ceph.py does it manually every 15m? I actually think 15m is too infrequent. Some workloads can easily produce >>10G of logs in 15 minutes.

We're making use of the Rotater class so whatever we do should match what ceph.py did in terms of frequency. I actually thought it would be every 30 seconds though because of the self.stop_event.wait(timeout=30) in the invoke_logrotate function of the class. Is the 15 minute cycle something you've seen from experience?

My memory failed me. It is 30 seconds as you found.

Copy link
Member

@batrick batrick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! Would you like cephfs to run this through the fs suite or is there more to be done?

Edit: fs:workload is sufficient for testing I think.

@adk3798
Copy link
Contributor Author

adk3798 commented May 21, 2024

Thank you! Would you like cephfs to run this through the fs suite or is there more to be done?

Edit: fs:workload is sufficient for testing I think.

I haven't quite got this working yet. Did a run through the orch suite and hit

2024-05-21T15:42:56.770 DEBUG:teuthology.orchestra.run.smithi120:> set -ex
2024-05-21T15:42:56.770 DEBUG:teuthology.orchestra.run.smithi120:> dd of=/root/logrotate.conf
2024-05-21T15:42:56.775 DEBUG:teuthology.orchestra.run:got remote process result: 1

I'll do an fs:workload run once I get it to pass the orch suite

@adk3798 adk3798 force-pushed the cephadm-custom-logrotate-config branch from cd5abb8 to a38f6ed Compare May 21, 2024 18:08
bootstrap_remote.write_file(
path='/etc/ceph/logrotate.conf',
data=teuth_cluster_logrotate_config)
_shell(ctx, cluster_name, bootstrap_remote,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

two remaining issues here

  1. The upgrade tests will be bootstrapping with a version that can't handle the write-custom-logrotate commands. We need a fall back that manually writes the files out for that case
  2. Because we're relying on cephadm to write these logrotate files out in the background, there is a chance the Rotater class tries to rotate the logs before cephadm has gotten a chance to write the logrotate config (keep in mind we're currently starting the Rotater going before the other hosts have even been added to the cluster). We'll probably need to make the Rotater be able to handle some failures caused by the logrotate config not being present.

Signed-off-by: Adam King <adking@redhat.com>
Making use of the new command in the cephadm binary to
do this on a single host

Signed-off-by: Adam King <adking@redhat.com>
The default logrotate conf cephadm generates has no size
setting, which makes it not work for teuthology

This tries to make use of the Rotater class from the
ceph task. In order to do that, it moves the class
to the top level of the file

Fixes: https://tracker.ceph.com/issues/65827

Signed-off-by: Adam King <adking@redhat.com>
Signed-off-by: Adam King <adking@redhat.com>
@adk3798 adk3798 force-pushed the cephadm-custom-logrotate-config branch from a38f6ed to bbfef55 Compare May 23, 2024 14:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants