New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clean up replicator logs #846

Merged
merged 2 commits into from Sep 28, 2017

Conversation

Projects
None yet
3 participants
@nickva
Contributor

nickva commented Sep 28, 2017

Previously replicator was unnesessarily verbose during crashes. This commit
reduces the vorbosity and make the error messages more helpful.

Most of the replication failures happen in the startup phase when both target
and source are opened. That's a good place to handle common errors, and there
were a few already handled (db not found, lack of authorization). This commit
adds another other common one - inability to resolve endpoint host names. This
covers cases were there user mistypes the host name or there is a DNS issue.

Also during the startup phase, if an error occurs a stacktrace was logged in
addition to the whole state of the #rep{} record. Most of the rep record and
the stack are not that useful compared to how much noise it generates. So
instead, log only a few relevant fields from #rep{} and only the top 2 stack
frames. Combined with dns lookup failure this change results in almost a 4x
(2KB vs 500B) reduction in log noise while providing better debugging
information.

One last source of excessive log noise the dumping of the full replicator job
state during crashes. This included both the #rep and the #rep_state records.
Those have a lot of redundant information, and since they are dumped as tuples,
it was hard to use and find the values of each individual field. In this case
format_status/2 was improved to dump only a selected set of field along with
their names. This results in another 3x reduction in log noise.

A separate commit replaces the silly multi-line replication start log message with
a shorter and more informative one.

nickva added some commits Sep 28, 2017

Clean up replicator logs
Previously replicator was unnesessarily verbose during crashes. This commit
reduces the vorbosity and make the error messages more helpful.

Most of the replication failures happen in the startup phase when both target
and source are opened. That's a good place to handle common errors, and there
were a few already handled (db not found, lack of authorization). This commit
adds another other common one - inability to resolve endpoint host names. This
covers cases were there user mistypes the host name or there is a DNS issue.

Also during the startup phase, if an error occurs a stacktrace was logged in
addition to the whole state of the #rep{} record. Most of the rep record and
the stack are not that useful compared to how much noise it generates. So
instead, log only a few relevant fields from #rep{} and only the top 2 stack
frames. Combined with dns lookup failure this change results in almost a 4x
(2KB vs 500B) reduction in log noise while providing better debugging
information.

One last source of excessive log noise the dumping of the full replicator job
state during crashes. This included both the #rep and the #rep_state records.
Those have a lot of redundnat information, and since they are dumped as tuples,
it was hard to use and find the values of each individual field. In this case
`format_status/2` was improved to dump only a selected set of field along with
their names. This results in another 3x reduction in log noise.
Replace replication start multi-line log statement
Use a shorter and more informative single line string:

```
Starting replication f9a503bf456a4779fd07901a6dbdb501+continuous+create_target (http://adm:*****@127.0.0.1:15984/a/ -> http://adm:*****@127.0.0.1:15984/bar/) from doc _replicator:my_rep2 worker_procesess:4 worker_batch_size:500 session_id:b4df2a53e33fb6441d82a584a8888f85
```

For replication from _replicate endpoint, doc info is skipped and it is clearly
indicated a `_replicate` replication:

```
Starting replication aa0aa3244d7886842189980108178651+continuous+create_target (http://adm:*****@localhost:15984/a/ -> http://adm:*****@localhost:15984/t/) from _replicate endpoint worker_procesess:4 worker_batch_size:500 session_id:6fee11dafc3d8efa6497c67ecadac35d
```

Also remove redundant `starting new replication...` log.
@davisp

This comment has been minimized.

Show comment
Hide comment
@davisp

davisp Sep 28, 2017

Member

+1

Member

davisp commented Sep 28, 2017

+1

@jaydoane

This comment has been minimized.

Show comment
Hide comment
@jaydoane

jaydoane Sep 28, 2017

Contributor

LGTM

  couch_replicator_scheduler_job: scheduler_job_format_status_test...ok
Contributor

jaydoane commented Sep 28, 2017

LGTM

  couch_replicator_scheduler_job: scheduler_job_format_status_test...ok

@nickva nickva merged commit b756b77 into apache:master Sep 28, 2017

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details

@nickva nickva deleted the cloudant:better-logs-for-replicator branch Sep 28, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment