Background
The SSV project had one type of logger. A refactor was started but not completed.
Assumptions/Goals
We want to optimize for logging performance
We want to optimize for understanding the lifecycle of an event through logging
We don’t want disabled debug logs to consume resources when they will never be emitted anyway
Existing problems
1. Log quality/content needs to be better
Message and field names are not standardized/uniform
Potential Solutions
Wrappers for field names, specific for zap fields
Declare/enforce that logs use field names which are defined as constants
Add contextual information to logs earlier so they don’t need to get added to each log line. Fixing problem (3.) will naturally improve this.
2. Too many logs
For example, an operator has a problem and enables debug logs. This results in too many logs to be useful
Potential Solutions
1. Tracing could potentially solve this issue. Until tracing is implemented, we can deal with it via a stopgap, like commenting out logs or wrapping them in if globalconfig.VerboseDebugLogEnabled
2. Specify logging levels by component. We may already have this to some extent in runners, but not everywhere. Specify components by regex in config. Change the config struct to a more key:value design.
3. Additional log level to support production debugging vs. development debugging
Criticism: why, in principle, is a production issue different from a development issue such that it requires different logs to solve?
We will skip this for now; solution 6 is better anyway
4. Global variable used to gate additional debug logs. Solution 6 is better than this; don't do this
5. Devs use a debugger locally and in staging. A debugging server can easily be started locally; in staging it can be started and connected to with some amount of work/configuration.
6. Write zap middleware that operates on a global bool (similar to solution 4) to selectively emit logs tagged with a specific magic field like zap.Bool("verboseDebugLog", true)
3. Logging is not request-scoped for global shared variables
When instances are shared between different goroutines and use the same logger, request-specific values typically have to be redundantly added to every individual log call. For the same reason, a global or instance-level logger should not be used.
For example: The database
Potential Solutions
For structs/instances that are shared between multiple message-processing routines, a logger with contextual fields already loaded should be passed in to the instance methods.
4. Methods return errors which are debug logged
Errors are encountered and are “handled” by debug logging the error.
Examples:
ev, err := contractAbi.EventByID(vLog.Topics[0])
if err != nil { // unknown event -> ignored
	ec.logger.Debug("could not read event by ID",
		zap.String("eventId", vLog.Topics[0].Hex()),
		zap.Uint64("block", vLog.BlockNumber),
		zap.String("txHash", vLog.TxHash.Hex()),
		zap.Error(err),
	)
	return "", nil
}

if isCloseError(err) {
	logger.Debug("failed to read message as the connection was closed", zap.Error(err))
	return
}

if err != nil {
	dvs.logger.Debug("could not read subnets", zap.String("enr", e.Node.String()))
	return
}
Potential Solutions
Wrap these errors in a specific error type to handle them in specific ways, in a central place, higher up the stack.
Log these errors at a trace level, which we don’t have today.
5. Heavy/Large logs
Logs are too large in size/length. This causes performance issues in the system, performance issues with the logging service (kibana), and makes reading logs difficult.
Potential Solutions
Zap middleware to truncate long logs, though we may not want to truncate some of these.
6. Some log field values are expensive functions
Example: log.Debug("msg", zap.String("field", parsing.ExpensiveComputation(functionInput)))
In this case, if debug logging is not enabled, parsing.ExpensiveComputation(functionInput) is still executed anyway.
Potential Solutions
Lazy evaluation, e.g.: log.Debug("msg", logginglib.LazyLoad("field", func() string { return parsing.ExpensiveComputation(functionInput) }))
7. Log search is difficult when trying to find one single “trace”
It would be nice to search for one ID that is logged for the entire flow of a certain message process
For example, in the blox project, a globally unique key is constructed and added to logs like:
<taskType>-e<epoch>-s<slot>-v<validatorIndex>
Potential Solutions
Generate a unique, deterministic key which can be added to a logger's fields, like in the blox project.
Mark critical event logs with an emoji which is easily identifiable in logs
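A sketch of such a composite key, following the blox-style shape quoted above (the function name and inputs are assumptions):

```go
package main

import "fmt"

// TraceKey builds a deterministic per-message identifier in the shape
// <taskType>-e<epoch>-s<slot>-v<validatorIndex>. Attached once to a
// logger's fields, it lets a single search return every log line for
// one message's full flow.
func TraceKey(taskType string, epoch, slot, validatorIndex uint64) string {
	return fmt.Sprintf("%s-e%d-s%d-v%d", taskType, epoch, slot, validatorIndex)
}

func main() {
	fmt.Println(TraceKey("attest", 12, 384, 7)) // attest-e12-s384-v7
}
```

Because the key is derived from the message itself rather than generated randomly, every component that sees the same duty computes the same key independently.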
Overall solution
Commit to these changes
Pass request specific logger through program via context.Context
Identify which fields/types of data are most useful to query
Create documentation about how to write good logs. Have some process/mechanism to verify this in pull requests
Get rid of global/struct-level loggers whose individual instance is used by multiple goroutines/message-processing routines
Add contextual information to loggers which are instantiated when a new request/message enters the system (like a context.Context)
Add zap middleware to change the log level of overly long logs
Define constants for log field names, or write functions which wrap values and return a full zap field.
Change logger level config to a key:value structure, regex-path:level. Logger names/subNames should be defined as constants.
Standardize logger naming so regex can be used to map components to different log levels
Devs use debugger locally & in staging. Debugging server can be easily started locally. Debugging server can be started in staging and can be connected to with some amount of work/configuration.
Custom zap field type for lazy evaluation of log values. Need one field per type, or create a type switch in the implementation.
Identify where errors are being logged and not handled. Determine if that is good practice for the few cases where it happens.
Create documentation &/or tooling to help devs use debugger locally and in staging easier.
Rethink which logs we need when we are debugging. Look through current logs and validate whether each is good (delete bad/add good)
Identify a way to create a composite key to set in logs which is unique per message/process, like in the blox project.
Won’t do
Refactor project to use a different logger that gets us another log level
Implementation Phases
Code changes which have no dependencies
Pass request specific logger through program via context.Context
Get rid of global/struct-level loggers whose individual instance is used by multiple goroutines/message-processing routines
Add zap middleware to change the log level of overly long logs
Custom zap field type for lazy evaluation of log values. Need one field per type, or create a type switch in the implementation.
Path based log level configuration
Standardize logger naming so regex can be used to map components to different log levels. Right now it seems file-path based; is that really useful? Why not use the package name? The path currently uses "/" as a separator; it should use ".".
Change logger level config to a key:value structure, regex-path:level. Logger names/subNames should be defined as constants.
Standardize
Identify which fields/types of data are most useful to query
Identify a way to create a composite key to set in logs which is unique per message/process, like in the blox project.
Mark critical event logs with an emoji which is easily identifiable in logs.
Define functions for log fields which wrap values and return a zap field type
Clean up existing logging “message” field.
Create documentation &/or tooling to help devs use debugger locally and in staging easier
Clean up unhandled errors
Identify where errors are "handled" merely by being logged as debug logs. Determine the cases where each error occurs and what the intent of logging it is.
Create a new error type, or types, for these kinds of errors
Documentation (do last)
Create documentation about how to write good logs. Have some process/mechanism to verify this in pull requests. Maybe just use a GitHub PR template.
Create documentation &/or tooling to help devs use the debugger locally and in staging more easily. Give examples like conditional breakpoints and how to connect to a remote debugging server. Docs for VSCode and GoLand.
Review and improve existing logs (do last)
Rethink which logs we need when we are debugging. Look through current logs and validate whether each is good (delete bad/add good)
Hi @aaron-blox, I have some experience with the zap logger.
I want to add some additional points to the "Too many logs" issue that might come in handy:
Use zap's With(k, v) for adding "fixed" fields and Named(string) for names; for example, the node ID should probably be attached to almost every log line for a specific operator.
This is done by initializing a logger in main and using With to add the ID once we have it ready (loaded from file/cfg/generated), then passing the logger down to every other component. Of course, this approach can be used within every component; for example, adding the P2P ID to the logger in the P2P constructor using With, and adding "p2p" using Named.
This results in a logger that logs something like TIME LEVEL node.p2p message here { "operator_id": "", "p2p_id": "" } for every log from this component.
Use config or gRPC to toggle log levels for a specific component. This wasn't too straightforward with zap, but I've done it before. This way, if a developer is debugging a specific component and wants verbose logs, they can just turn on the DEBUG level for that component instead of flooding the test/prod env with debug logs from all the components.
Also, I agree that relying only on logs is an issue; tracing and metrics might give much more reliable feedback, but tracing is hard to add to an existing big project.
Let me know what you think; I can help/give examples.