Collate high-volume logs for improved throughput #3662

Merged
bbrks merged 4 commits into master from collate_logs on Jun 28, 2018

Conversation

@bbrks
Member

commented Jun 27, 2018

This PR is intended to improve throughput when we have extremely high volumes of logging. The idea is to batch up logs into a buffer before collating them and pushing to the log output in a single action.

There is also a timeout mechanism: if the buffer has not filled after a short time (1ms), we flush it to the output anyway. This avoids logs sitting in a buffer for a long time.

I've had to add handling for panic recovery and OS interrupts in ServerMain. Each now waits briefly before exiting, because without that short delay we'd miss the last of any logs still sitting in the buffer.

Based on developer testing, we'd expect to see double the write throughput of the 2.1 default config while still logging everything, although this is still slightly short of the 2.0 default (HTTP-only) logging.

[Screenshot: 2018-06-27 at 13:14:35]

@bbrks bbrks requested a review from adamcfraser Jun 27, 2018

@bbrks bbrks added the review label Jun 27, 2018

@bbrks bbrks force-pushed the collate_logs branch from 33c3a72 to 07c0f7a Jun 27, 2018

logger.collateBuffer = make(chan string, *config.CollationBufferSize)

// Start up a single worker to consume messages from the buffer
go func() {

@adamcfraser

adamcfraser Jun 27, 2018

Contributor

Is there a way to reuse the worker code between console and file loggers?

logBuffer = append(logBuffer, l)
if len(logBuffer) >= *config.CollationBufferSize {
	logger.logger.Print(strings.Join(logBuffer, "\n"))
	logBuffer = []string{}

@adamcfraser

adamcfraser Jun 27, 2018

Contributor

Is there a more GC-efficient way of resetting the logBuffer?
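One common Go idiom that fits this question is to truncate the slice with `logBuffer[:0]`, which keeps the existing backing array instead of allocating a new one on every flush. The sketch below assumes a small hypothetical wrapper type, `lineBuffer`, which is not part of the PR. Reuse is safe here because `strings.Join` copies the lines into a new string before the slice is truncated.

```go
package main

import (
	"fmt"
	"strings"
)

// lineBuffer is a hypothetical fixed-capacity batch of log lines.
type lineBuffer struct {
	lines []string
}

// newLineBuffer allocates the backing array once, with room for size lines.
func newLineBuffer(size int) *lineBuffer {
	return &lineBuffer{lines: make([]string, 0, size)}
}

// add appends line to the buffer. When the buffer reaches its capacity it
// returns the joined batch and true, and resets the length to zero while
// keeping the same backing array.
func (b *lineBuffer) add(line string) (batch string, full bool) {
	b.lines = append(b.lines, line)
	if len(b.lines) < cap(b.lines) {
		return "", false
	}
	batch = strings.Join(b.lines, "\n")
	b.lines = b.lines[:0] // reuse the allocation instead of []string{}
	return batch, true
}

func main() {
	buf := newLineBuffer(3)
	for i := 1; i <= 7; i++ {
		if batch, full := buf.add(fmt.Sprintf("log line %d", i)); full {
			fmt.Println(batch)
		}
	}
}
```

One trade-off: the truncated slice still references the old strings until they are overwritten, so they stay reachable slightly longer than with a fresh slice.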


// Take the LoggerCollateFlushTimeout and multiply it to give plenty of time to allow
// the log collation buffers to be flushed to outputs before exiting Sync Gateway.
logCollationBufferFlushDelay = 100 * base.LoggerCollateFlushTimeout

@adamcfraser

adamcfraser Jun 27, 2018

Contributor

Maybe move this constant to be defined beside LoggerCollateFlushTimeout to ensure any future modifications are done in step?

Switching to a fixed time (like 1s) might be more appropriate.

// Delay any panics to allow log collation buffers to flush to the outputs.
if r := recover(); r != nil {
	base.Errorf(base.KeyAll, "Handling panic: %v", r)
	time.Sleep(logCollationBufferFlushDelay)

@adamcfraser

adamcfraser Jun 27, 2018

Contributor

Could switch this to a call like FlushLogs() that lives with the rest of the logging code, to keep the functionality clear. If it makes sense to keep it here, move the comment down.

bbrks added some commits Jun 25, 2018

Add initial log collation code
Use channel and worker for collating logs

Remove unused collate method

Add os.Interrupt handling

Move log flushing delay into ServerMain

Add config option to tweak logging collate buffer size

Improve constant usage across packages

Only set log collation buffer size for verbose logs

Fix collation buffer size init when non-info/debug

Drop default collate buffer size down to 10

@bbrks bbrks force-pushed the collate_logs branch from af1a8ea to 39c313f Jun 27, 2018

bbrks added some commits Jun 27, 2018

@bbrks bbrks merged commit eff3137 into master Jun 28, 2018

2 checks passed

continuous-integration/drone the build was successful
continuous-integration/jenkins/pr-head This commit looks good

@bbrks bbrks deleted the collate_logs branch Sep 19, 2018
