Logging refactor #842

aaron-blox · 2023-02-08T15:36:30Z

Background

SSV project had one type of logger. A refactor was started but not completed.

Assumptions/Goals

We want to optimize for logging performance
We want to optimize for understanding the lifecycle of an event though logging
We don’t want un-logged debug logs to use resources when they will not be logged anyway

Existing problems

1. Log quality/content needs to be better

Message and field names are not standardized/uniform

Potential Solutions

Wrappers for field names, specific for zap fields
Declare/enforce that logs use field names which are defined as constants
Add contextual information to logs earlier so they don’t need to get added to each log line. Fixing problem (3.) will naturally improve this.

2. Too many logs

For example, an operator has a problem and enables debug logs. This results in too many logs to be useful

Potential Solutions

Tracing could potentially solve this issue. In the meantime until tracing is implemented, we can just dealing with it with a hacky solution in the meantime, like commenting out logs/wrapping logs in if globalconfig.VerboseDebugLogEnabled
Specify logging levels by component. We maybe already have this to some extent in runners, not everywhere. Specify components by regex in config. Change config struct to a more key:value design.
Additional log level to support production debugging vs development debugging
1. Criticism: why in principle is a production issue separate from an issue in development such that it requires different logs to solve?
2. We will skip doing this for now, a better solution is 6. anyway
Global variable which is used to gate additional debug logs. 6. Is better than this don’t do this
Devs use debugger locally & in staging. Debugging server can be easily started locally. Debugging server can be started in staging and can be connected to with some amount of work/configuration.
Write a zap middleware which operates on a global bool similar to 4. to selectively log logs with a specific magic value like zap.Bool(“verboseDebugLog”, true)

3. Logging is not request-scoped for global shared variables

When instances are shared between different go-routines and use the same logger, for the logger to log request specific values, values are typically added into each log execution of the logger in a redundant manner. A global or instance level logger should not be used for the same reason.

For example: The database

Potential Solutions

A logger with contextual fields already loaded should be passed in to the instance methods, for structs/instances which are shared between multiple message processing routines.

4. Methods return errors which are debug logged

Errors are encountered and are “handled” by debug logging the error.

Examples:

ev, err := contractAbi.EventByID(vLog.Topics[0])

if err != nil { // unknown event -> ignored

ec.logger.Debug("could not read event by ID",

zap.String("eventId", vLog.Topics[0].Hex()),

zap.Uint64("block", vLog.BlockNumber),

zap.String("txHash", vLog.TxHash.Hex()),

zap.Error(err),

)

return "", nil

}

if isCloseError(err) {

logger.Debug("failed to read message as the connection was closed", zap.Error(err))

return

}

if err != nil {

dvs.logger.Debug("could not read subnets", zap.String("enr", e.Node.String()))

return

}

Potential Solutions

Wrap these errors in a specific error type to handle them in specific ways, in a central place, higher up the stack.
Log these errors at a trace level, which we don’t have today.

5. Heavy/Large logs

Logs are too large in size/length. This causes performance issues in the system, performance issues with the logging service (kibana), and makes reading logs difficult.

Potential Solutions

Zap middleware to truncate long logs. We maybe don’t want to truncate these.
Zap middleware to change log level

6. Some log field values are expensive functions

Example: log.Debug(“msg”, zap.String(“field”, parsing.ExpensiveComputation(functionInput))

In this case, if debug logging is not enabled, parsing.ExpensiveComputation(functionInput) is still executed anyway.

Potential Solutions

Create a custom zap field type for lazy evaluation of log values, wrapped in an anonymous function:

log.Debug(“msg”, logginglib.LazyLoad(“field”, func() { parsing.ExpensiveComputation(functionInput) } )

7. Log search is difficult when trying to find one single “trace”

It would be nice to search for one ID that is logged for the entire flow of a certain message process

Fox example, in the blox project, a globally unique key is constructed and added to logs like:

Potential Solutions

Generate a unique deterministic key which can be added to a loggers field like in the blox project.
Mark critical event logs with an emoji which is easily identifiable in logs

Overall solution

Commit to these changes

Pass request specific logger through program via context.Context
Identify which fields/types of data are most useful to query
Create documentation about how to write good logs. Have some process/mechanism to verify this in pull requests
Get rid of global / struct level loggers whos individual instance is used by multiple go-routines/message processings
Add contextual information to loggers which are instantiated when a new request/message enters the system (like a ctx.Context)
Add zap middleware to change the log level of too long logs
Define constants for log field names, or write functions which wrap values and return a full zap field.
Change logger level config to a key:value structure, regex-path:level. Logger names/subNames should be defined as constants.
Standardize logger naming so regex can be used to map components to different log levels
Devs use debugger locally & in staging. Debugging server can be easily started locally. Debugging server can be started in staging and can be connected to with some amount of work/configuration.
Custom zap field type for lazy evaluation of log values. Need one field per type, or create a type switch in the implementation.
Identify where errors are being logged and not handled. Determine if that is good practice for the few cases where it happens.
Create documentation &/or tooling to help devs use debugger locally and in staging easier.
Rethink which logs we need when we are debugging. Look though current logs and validate if current is good (delete bad/add good)
Identify a way to create a composite key to set in logs which is unique per message/process, like in the blox project.

Won’t do

Refactor project to use a different logger that gets us another log level

Implementation Phases

Code changes which have no dependencies

Pass request specific logger through program via context.Context
Get rid of global / struct level loggers whos individual instance is used by multiple go-routines/message processings
Add zap middleware to change the log level of too long logs
Custom zap field type for lazy evaluation of log values. Need one field per type, or create a type switch in the implementation.

Path based log level configuration

Standardize logger naming so regex can be used to map components to different log levels. Right now it seems filepath based, is that really useful? Why not use package name? Current path used / it should use .
Change logger level config to a key:value structure, regex-path:level. Logger names/subNames should be defined as constants.

Standardize

Identify which fields/types of data are most useful to query
Identify a way to create a composite key to set in logs which is unique per message/process, like in the blox project.
Mark critical event logs with an emoji which is easily identifiable in logs.
Define functions for log fields which wrap values and return a zap field type
Clean up existing logging “message” field.
Create documentation &/or tooling to help devs use debugger locally and in staging easier

Clean up unhandled errors

Identify where errors are being handled as being logged as debug logs. Determine the cases where the error occurs and what the intent of logging the error is.
Create a new error type, or types, for these kinds of errors

Documentation (do last)

Create documentation about how to write good logs. Have some process/mechanism to verify this in pull requests. Maybe just use github PR template.
Create documentation &/or tooling to help devs use debugger locally and in staging easier. Give examples like conditional breakpoint & how to connect to remove debugging server. Docs for VSCode and goland

Review and improve existing logs (do last)

Rethink which logs we need when we are debugging. Look though current logs and validate if current is good (delete bad/add good)

y0sher · 2023-03-15T08:21:49Z

Hi @aaron-blox , I have some experience with the zap logger.
I want to add some additional points to the "Too many logs" issue that might come in handy,

Use zap's With(k,v) for adding "fixed" fields and Named(string) for names, for example the node ID should be attached to probably almost every log line for a specific operator.
This is done by initializing a logger in main and using With to add the ID when we have it ready (load from file/cfg/generate), then passing the logger down to any other component, of course this approach can be used within every component, for example adding the P2P ID on the P2P constructor to the logger using With and adding p2p using Named.
resulting in a logger that logs something like this:
TIME LEVEL node.p2p messagehere { "operator_id": "", "p2p_id": "" } for every log from this component.
Using config or GRPC to toggle log levels for a specific component. This wasn't too straight forward with zap but I've done it before, this way if a developer debugs a specific component and wants verbose logs he can just turn on the DEBUG level for this specific component instead of flooding the test/prod env with debug logs from all the components.

Also I agree that relaying only on logs is an issue, tracing and metrics might give much more reliable feedback, but tracing is hard adding to an existing big project.

let me know what you think, I can help/give examples.

aaron-blox added the enhancement New feature or request label Feb 8, 2023

aaron-blox self-assigned this Feb 8, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Logging refactor #842

Logging refactor #842

aaron-blox commented Feb 8, 2023

y0sher commented Mar 15, 2023 •

edited

Logging refactor #842

Logging refactor #842

Comments

aaron-blox commented Feb 8, 2023

Background

Assumptions/Goals

Existing problems

1. Log quality/content needs to be better

Potential Solutions

2. Too many logs

Potential Solutions

3. Logging is not request-scoped for global shared variables

Potential Solutions

4. Methods return errors which are debug logged

Potential Solutions

5. Heavy/Large logs

Potential Solutions

6. Some log field values are expensive functions

Potential Solutions

7. Log search is difficult when trying to find one single “trace”

Potential Solutions

Overall solution

Commit to these changes

Won’t do

Implementation Phases

Code changes which have no dependencies

Path based log level configuration

Standardize

Clean up unhandled errors

Documentation (do last)

Review and improve existing logs (do last)

y0sher commented Mar 15, 2023 • edited

y0sher commented Mar 15, 2023 •

edited