Logging

Summary

Refine the guide for logging for developing Chaos Mesh.

Motivation

Logging in each component in Chaos Mesh is bad. Related issue: chaos-mesh/chaos-mesh#2149. Because of the mess on logging, the debugging and profiling of the components are difficult.

This proposal aims to improve the logging observability of the components and the development experience on printing logs.

After passed this proposal, I would write a developing guide for logging in the repo of Chaos Mesh.

Requirements

structured logging
leveled logging
runtime level configure
could logging in everywhere (every line of code)

Detailed design

We are going to use logr and zap (with zapr as the shim between them) to construct the logging facilities.

logr is a library that provides a common interface for logging, in other words, logging facade. It is designed to be used by different logging frameworks(likes the SLF4J API in Java), with using logr, we could switch from one logging backend to another without many changes.

zap is a high performance logging framework. zapr is the logr implementation using zap.

Global Logger

Here would exists a global logger at pkg/log, which marked as deprecated and only for the compatibility with the old code. Global Logger could be accessed by log.L().

The initial value of the global logger is logr.DiscardLogger, which means no log would be printed anywhere. The global logger should be replaced ONLY ONCE when the application starts.

Logger as dependency

The suggested way to use the logger is to use it as a dependency, and make dependency explicit.

GOOD Example:

type SomeStruct struct {
  state string
  otherField string
  logger logr.Logger
}

func NewSomeStruct(state string, otherField string, logger logr.Logger) *SomeStruct {
  return &SomeStruct{
    state: state,
    otherField: otherField,
    logger: logger,
  }
}

func (it *SomeStruct) DoSomething()  {
  it.logger.Info("Doing something")
}

MUST NOT use the implicitly logger in the constructor.

BAD example:

func NewSomeStruct(state string, otherField string) *SomeStruct {
  return &SomeStruct{
    state: state,
    otherField: otherField,
    logger: somewhere.DefaultLogger(),
  }
}

And logger should be a parameter of the function, not a global variable.

GOOD example:

func doSomething(logger logr.Logger) {
  logger.Info("Doing something")
}

BAD Example:

var logger = somewhere.DefaultLogger()

func doSomething()  {
  logger.Info("Doing something")
}

As you could see, it is actually a closure with an extern/global state.

Logging in Chaos Mesh Dashboard and gin

Chaos Mesh Dashboard use gin as thew web framework, and there are some logging printed when requests coming in.

There are two "gin middleware" for logging, called Logger() and Recovery(), included in the gin.Default() function.

So we should replace these 2 middlewares with other logger middlewares, and thanks to community, there are already some libraries for this: https://github.com/alron/ginlogr and https://github.com/gin-contrib/zap. We could choose one of them.

In other codes of Chaos Mesh Dashboard, we should use logger as same as other components.

Logging in cli tools

For cli tools, stdout and stderr are very important for user experience, so you could not print many logs to stdout as a service application. But it does not mean that you should not print anything with logger, you could still use logging to show the progress, or as the profiling tools.

Practices

Should I use the global logger

NO. At least, avoid using it in the new code.

The only reason to use the global logger is you have nowhere to access an instance of logger, which means you might write some code that relies on the global state, like some "registration pattern", or func init(), or some extremely simple function like func min(a, b int) int.

Please avoid using the global/package level variable/state when you coding, and once a "simple method" need a logger to print some message, it probably has been not "simple" anymore, please refactor it.

Using logger as function parameter is UGLY! Is there another way

YES. Consider to refactor the one simple function into a struct with method, and introduce the logger as the dependency of the struct.

Or use the "Functional Options", see: https://github.com/uber-go/guide/blob/master/style.md#functional-options.

If you are facing making a choice between making API clean and writing fewer codes, please choose the former one.

Should I log in library codes

YES. When you write codes as library or write codes under pkg/<some-package> which are exported, logging is not only help the development, but also help the users to understand the logic of the library. And logging is also a important part of observability of the library.

And using a logging facade (like logr) gives us flexibility without burdening library users with a certain logging framework.

Only use "high level" V(n).Info() n > 0 in the library codes, do not use Error(), V(0).Info() and Info(). Because V(0).Info() (or not defined a level) and Error() would most probably print something to stdout, which would might break the design of cli tools.

Logger name

The name of the logger should not be defined by itself, also should not be defined in the constructor. It should be defined in the caller for the constructor.

GOOD Example:

func main(){
  val logger = initializeApplicationRootLogger()
  NewWebModule(logger.WithName("web")).StartServe()
}

BAD Example:

func NewWebModule(logger logr.Logger) *WebModule {
  return &WebModule{
    logger: logger.WithName("web"),
  }
}

The name of the logger should follow these rules:

use kebab-case for the name
follow pattern: <application-name>.<component-name>.<component-name>

For zapr(logr adaptor for zap), the name of logger would be prefixed with its parent logger, which means:

rootLogger := <somewhere>
chaosDaemonLogger := rootLogger.WithName("chaos-daemon")
daemonServerLogger := chaosDaemonLogger.WithName("daemon-server")

The name of chaosDaemonLogger is chaos-daemon, and the name of daemonServerLogger is chaos-daemon.daemon-server.

Which level should I use

TL;DR, for most codes please follow these suggestions:

use logger.Error() for logging errors in ERROR level
use logger.Info() (or logger.V(0).Info()) for logging in INFO or WARN level
use logger.V(1).Info() for logging in DEBUG level
use logger.V(5).Info() for logging in TRACE level.

About the detailed meaning of each V level, we do not restrict the number of level for the logger, here is a list for the level we recommend to use:

Error() - Error should be used to indicated unexpected error, for example, unexpected errors returned by subroutine function calls.
Info() - Info should be used as a human journal or diary. And it could be used to log expected error as warning. Info() hash multiple levels:
- V(0) - This is the default level, would ALWAYS be visible to users.
  - CLI argument handling, print args before the program starts
  - Application configuration
  - Expected errors as warning message
  - Journal for application's major behavior
    - service endpoint: resolve requests
    - long-time batch-job: job start, job complete
  - Acquirement on system resources: listen on port, persistent in file, etc.
- V(1) - Default log level for debug.
  - Expected error that repeat frequently, relates to conditions would be corrected, like StatusReasonConflict when reconciling
  - Component configuration and initialize
- V(2) - Useful state changes and conditional branch.
  - State which is useful for debugging changed
  - Choose a branch in a conditional statement
- V(3) - Extended information about changes
  - Logging with state updates like err := updateState(); err == nil
  - Logging error with more context before wrap and return it
- V(4) - Debug level verbosity, any other behavior that changes the state
- V(5) - Default log level for trace.
  - Progressing in a for-loop
  - Omitted things within "best effort" pattern
  - Use default value with not configured value
- V(6) - Communication with other components
  - Logging in handler before processing/resolving request
  - RPC call with other components/services
- V(7) - Detailed information about communication
  - Detailed payload and status of request/response within RPC

The number of level still limited by the backend logging framework, for example, you could only create 128 level in zap.

Relations between error and logging

You could resolve an error by log it with Error(), it usually means this program could not handle this error properly, and let the user know. But if your codes could handle this error, including you "throwing" this error to the upper level, you should not log it with Error().

You could also use debug level Info() to log error, with more detailed context and behaviors. It is not conflict with the former rule.

Drawbacks

Alternatives

Why zap? And why not other logger framework as logr backend

There are still many other logging frameworks supported by logr: https://github.com/go-logr/logr#implementations-non-exhaustive. I select zap only because I am familiar with it, and I think the API of zap is easy to use and has lots of configure options for features.

Please comment on it if you have any suggestions.

Unresolved questions

logr just release its v1.x.x stable version, and brings BEARKING CHANGES with its v0.x.x: it changes the logr.Logger from interface to struct, see go-logr/logr#42.

Some library we used in Chaos Mesh, like controller-runtime, has upgrade to logr in v1.x.x in master branch, but not has released the stable version. The latest version for controller-runtime is 0.10.3, and 0.11.0-beta.0 has been released at 2020-11-10, and it still need several months to be stable.

So we have to use logr v0.4.0 for a while, and several months later, we will upgrade to logr v1.x.x.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

2021-12-09-logging.md

2021-12-09-logging.md

Logging

Summary

Motivation

Requirements

Detailed design

Global Logger

Logger as dependency

Logging in Chaos Mesh Dashboard and gin

Logging in cli tools

Practices

Should I use the global logger

Using logger as function parameter is UGLY! Is there another way

Should I log in library codes

Logger name

Which level should I use

Relations between error and logging

Drawbacks

Alternatives

Why zap? And why not other logger framework as logr backend

Unresolved questions

Files

2021-12-09-logging.md

Latest commit

History

2021-12-09-logging.md

File metadata and controls

Logging

Summary

Motivation

Requirements

Detailed design

Global Logger

Logger as dependency

Logging in Chaos Mesh Dashboard and gin

Logging in cli tools

Practices

Should I use the global logger

Using logger as function parameter is UGLY! Is there another way

Should I log in library codes

Logger name

Which level should I use

Relations between error and logging

Drawbacks

Alternatives

Why zap? And why not other logger framework as logr backend

Unresolved questions