This repository has been archived by the owner on Jan 11, 2023. It is now read-only.

Persistent journald logs #1956

Merged
merged 1 commit into from Dec 29, 2017

feiskyer
Member

What this PR does / why we need it:

Kubelet logs are lost after node reboot.

  • In /etc/systemd/journald.conf, Storage is left at its default (#Storage=auto), which persists logs only if /var/log/journal exists, so journal logs are not kept across reboots.
  • The kubelet container is also destroyed on reboot, so no kubelet container logs remain under /var/log/containers either.
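
To confirm which storage mode a node is actually configured for, a small helper along these lines may be useful. It is a minimal sketch, not part of this PR: the function name journal_storage_mode is hypothetical, it reads a single file only (drop-ins under journald.conf.d are ignored), and it assumes that the last uncommented Storage= line in a file is the one that takes effect.

```shell
# Print the Storage= value set in a journald.conf-style file, or "auto"
# (journald's documented default) when the key is absent or commented out.
# Hypothetical helper; reads one file only and ignores drop-in directories.
journal_storage_mode() {
  local conf_file="$1"
  local value
  # Lines starting with '#' do not match, so "#Storage=auto" is skipped.
  # tail keeps the last active assignment, assumed to override earlier ones.
  value="$(sed -n 's/^[[:space:]]*Storage=[[:space:]]*//p' "$conf_file" 2>/dev/null | tail -n 1)"
  echo "${value:-auto}"
}
```

On a node this would be called as journal_storage_mode /etc/systemd/journald.conf; a result of "auto" together with a missing /var/log/journal directory matches the log loss described above.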

Which issue this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close that issue when PR gets merged): fixes #

Kubelet logs are lost after node reboot.

Special notes for your reviewer:

Release note:

@jackfrancis
Member

Hi @feiskyer, we want to make sure we have some kind of log rotation/retention enforcement for this. Is that on by default?

@feiskyer
Member Author

@jackfrancis No, rotation is not on by default. Related settings are (see here for reference):

#SystemMaxUse=
#SystemKeepFree=
#SystemMaxFileSize=
#SystemMaxFiles=100
#RuntimeMaxUse=
#RuntimeKeepFree=
#RuntimeMaxFileSize=
#RuntimeMaxFiles=100

What values would you suggest for these settings?

@andyzhangx
Contributor

What about kubelet logs on agent nodes?

@feiskyer
Member Author

@jackfrancis Updated with rotation settings.

@andyzhangx This is for both master and agent nodes.

@@ -295,6 +295,10 @@ function extractKubectl(){
function ensureJournal(){
systemctl daemon-reload
systemctlEnableAndCheck systemd-journald.service
echo "Storage=persistent" >> /etc/systemd/journald.conf
echo "SystemMaxFileSize=10M" >> /etc/systemd/journald.conf
Member

@feiskyer are these the default values for SystemMaxFileSize, SystemMaxFiles, and MaxFileSec? Here is a representative example of the current /etc/systemd/journald.conf config on a deployed cluster:

[Journal]
#Storage=auto
#Compress=yes
#Seal=yes
#SplitMode=uid
#SyncIntervalSec=5m
#RateLimitInterval=30s
#RateLimitBurst=1000
#SystemMaxUse=
#SystemKeepFree=
#SystemMaxFileSize=
#SystemMaxFiles=100
#RuntimeMaxUse=
#RuntimeKeepFree=
#RuntimeMaxFileSize=
#RuntimeMaxFiles=100
#MaxRetentionSec=
#MaxFileSec=1month
#ForwardToSyslog=yes
#ForwardToKMsg=no
#ForwardToConsole=no
#ForwardToWall=yes
#TTYPath=/dev/console
#MaxLevelStore=debug
#MaxLevelSyslog=debug
#MaxLevelKMsg=notice
#MaxLevelConsole=info
#MaxLevelWall=emerg

@slack @brendanburns @khenidak any thoughts on what retention/rotation settings we should deliver as static config with a change to persisting journald logs? I'm O.K. with making this static config btw, but open to thoughts that we should do the work to make these low-level knobs user-configurable.

Contributor

Spent some time trying to understand the rat's nest of options...

I'd focus on high-water mark for disk usage as that is most likely to impact customers:

  • SystemMaxUse == 1GB
  • RuntimeMaxUse == 1GB

If unset, both MaxUse values would be 10% of the disk. Paired with acs-engine defaults that'd land us at 3GB each.

Most of the other params default to okay-ish values:

  • Max file size == 1/8th MaxUse
  • Max file sec == 1 month

With this change we should probably turn off ForwardToSyslog.
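
The numbers in this comment can be reproduced with a little arithmetic. The sketch below assumes a 30 GiB OS disk (inferred from the "3GB each" figure, not stated in this thread) and follows journald.conf(5), which documents the MaxUse default as 10% of the file system, capped at 4G, and the max file size default as one eighth of MaxUse. The function names are hypothetical.

```shell
# Default SystemMaxUse per journald.conf(5): 10% of the file system size,
# capped at 4 GiB. Input and output are in bytes.
default_max_use_bytes() {
  local fs_bytes="$1"
  local cap=$((4 * 1024 * 1024 * 1024))
  local ten_percent=$((fs_bytes / 10))
  if [ "$ten_percent" -gt "$cap" ]; then
    echo "$cap"
  else
    echo "$ten_percent"
  fi
}

# Default SystemMaxFileSize: one eighth of the MaxUse value (bytes).
default_max_file_size_bytes() {
  echo $(( $1 / 8 ))
}

# Assumed 30 GiB disk: 10% is 3 GiB, matching the estimate above.
default_max_use_bytes $((30 * 1024 * 1024 * 1024))
# With the proposed 1 GiB limit, each journal file defaults to 128 MiB.
default_max_file_size_bytes $((1024 * 1024 * 1024))
```

Note that RuntimeMaxUse is computed against the file system backing /run/log/journal rather than the OS disk, so its unset default may differ from the figure for SystemMaxUse.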

Member

Good suggestions @slack, thanks!

@feiskyer: let's set the following config options:

SystemMaxUse=1G
RuntimeMaxUse=1G
ForwardToSyslog=no

(@feiskyer, kindly verify the gigabyte/gibibyte unit character is G. According to my cursory research it is: http://man7.org/linux/man-pages/man5/journald.conf.5.html)

I'll revisit this after the holiday break, thanks again @feiskyer!
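
One thing to watch with the `echo ... >> /etc/systemd/journald.conf` approach in the diff is that re-running the provisioning script appends the same lines again. A possible alternative, shown here only as a sketch, is a helper that removes any active assignment of a key before appending the new value. The name set_journald_option is hypothetical and is not part of this PR; like the diff, it relies on [Journal] being the only section in the file, so appended lines land inside it.

```shell
# Set KEY=VALUE in a journald.conf-style file without accumulating duplicates:
# drop any existing uncommented assignment of KEY, then append the new one.
# Hypothetical helper, not the function used in this PR.
set_journald_option() {
  local conf_file="$1"
  local key="$2"
  local value="$3"
  local tmp
  tmp="$(mktemp)" || return 1
  # Keep every line except active assignments of this key; commented
  # defaults such as "#SystemMaxUse=" are preserved. grep exits non-zero
  # when it selects no lines, which is not an error here.
  grep -v "^[[:space:]]*${key}=" "$conf_file" > "$tmp" || true
  echo "${key}=${value}" >> "$tmp"
  # Write back through the original file to keep its owner and mode.
  cat "$tmp" > "$conf_file" && rm -f "$tmp"
}
```

With this helper, the settings discussed here would be applied as set_journald_option /etc/systemd/journald.conf SystemMaxUse 1G (and likewise for Storage, RuntimeMaxUse, and ForwardToSyslog), followed by a restart of systemd-journald so the new values take effect.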

Member Author

@jackfrancis thanks, updated.

Member

jackfrancis left a comment

LGTM

jackfrancis merged commit 9c9507c into Azure:master on Dec 29, 2017
feiskyer deleted the persistent-journald branch on December 29, 2017 01:59
@mpalumbo7
Contributor

mpalumbo7 commented Apr 19, 2018

@feiskyer I'm trying to view logs about ssh activity in OMS, but ForwardToSyslog=no is preventing the OMS agent running on the host (not running as a container) from sending syslog, and I don't see any journald logs in OMS. I confirmed removing this setting made ssh logs appear via Log Analytics.

What was the reasoning behind setting ForwardToSyslog=no? Will removing it affect Kubernetes? If not, should I open a PR to remove it?

@feiskyer
Member Author

@ewok2030 See discussion here. The change doesn't affect Kubernetes; the only concern was disk usage.

Why does the OMS agent depend on ForwardToSyslog?

@mpalumbo7
Contributor

Ah, thanks, I didn't see the discussion.

We're looking for ssh logs for security monitoring purposes. From what I can see, the only way to get those into Log Analytics is via the auth facility of syslog: https://docs.microsoft.com/en-us/azure/log-analytics/log-analytics-data-sources-syslog#configure-syslog-on-linux-agent

I'm not a Linux admin, so I might be missing something obvious. I don't see anything about the OMS agent supporting journald.
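
For clusters already deployed with ForwardToSyslog=no, one possible workaround is a journald drop-in that turns forwarding back on without editing the main file. This is a sketch under assumptions: the function name and the drop-in file name are hypothetical, and it relies on journald reading *.conf files from /etc/systemd/journald.conf.d with precedence over /etc/systemd/journald.conf.

```shell
# Write a journald drop-in that re-enables syslog forwarding.
# The drop-in directory is a parameter so the function can be tried on a
# scratch directory first; on a real node it would be
# /etc/systemd/journald.conf.d.
write_forward_to_syslog_dropin() {
  local dropin_dir="$1"
  mkdir -p "$dropin_dir" || return 1
  # The file name is hypothetical; any *.conf file in the directory is read.
  cat > "$dropin_dir/10-forward-to-syslog.conf" <<'EOF'
[Journal]
ForwardToSyslog=yes
EOF
}
```

After writing the drop-in on a node, systemd-journald needs a restart before syslog (and therefore the OMS agent's syslog source) starts receiving entries again. Forwarding also means the same messages are written twice, which is the disk-usage concern raised earlier in this thread.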

@feiskyer
Member Author

@slack @jackfrancis What do you think of enabling ForwardToSyslog?

@jackfrancis
Member

@feiskyer @ewok2030 #2757


5 participants