Update Zipkin connection retry #200

nickjjzhao · 2021-06-14T15:19:55Z

Change Description

On networks like EOS Mainnet, 9 connection attempts is too few to ride out intermittent failures such as a Zipkin upgrade.

For production use, the connection logic needs to work more like this:

1. Retry every 30 seconds.
2. Report health, i.e. whether it is connected or not.
3. Process SIGHUP or similar to force a re-connect.
4. If telemetry-url is a DNS name that points to multiple A or AAAA records, nodeos should try all the addresses returned before giving up.

Notes:

Existing code supports item 4 above.
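Items 1 to 3 above can be sketched as a small decision helper. This is a minimal illustration under stated assumptions, not the code in this PR: the class `reconnect_policy` and all of its member names are hypothetical, and the actual change drives retries from a timer inside zipkin.cpp.

```cpp
#include <cassert>
#include <chrono>
#include <optional>

// Hypothetical helper (not part of fc): decides when a reconnect attempt is due.
class reconnect_policy {
public:
   using clock      = std::chrono::steady_clock;
   using time_point = clock::time_point;

   explicit reconnect_policy(std::chrono::microseconds retry_interval)
      : interval_(retry_interval) {
      assert(retry_interval.count() > 0);
   }

   // Item 2: health reporting, i.e. whether the last attempt left us connected.
   bool connected() const { return connected_; }

   // Record the outcome of an attempt made at `now`; clears any pending forced request.
   void on_attempt(time_point now, bool success) {
      last_attempt_ = now;
      connected_    = success;
      force_        = false;
   }

   // A failed send or a closed socket marks the link as down.
   void on_disconnect() { connected_ = false; }

   // Item 3: SIGHUP or similar asks for an attempt without waiting for the interval.
   void request_reconnect() { force_ = true; }

   // Item 1: attempt when forced, or when disconnected and the interval has elapsed.
   bool should_attempt(time_point now) const {
      if (force_) return true;
      if (connected_) return false;
      if (!last_attempt_) return true;   // never tried yet
      return now - *last_attempt_ >= interval_;
   }

private:
   std::chrono::microseconds interval_;
   std::optional<time_point> last_attempt_;
   bool                      connected_ = false;
   bool                      force_     = false;
};
```

A caller would construct it with `std::chrono::seconds(30)`, poll `should_attempt()` from its timer callback, and report `connected()` through whatever health endpoint it exposes. Keeping the decision separate from the socket code makes the retry behaviour testable without a running Zipkin instance.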

Change Type

Select ONE:

Documentation

Stability bug fix

Other

Other - special case

Testing Changes

Select ANY that apply:

New Tests

Existing Tests

Test Framework

CI System

Other

Documentation Additions

Documentation Additions

The method handle_sighup() defined in zipkin handles the SIGHUP signal. It is not called directly from the original SIGHUP signal handler but from other handlers. For example, handle_sighup() of net_plugin, one of the mandatory plugins, can forward SIGHUP by calling zipkin's handle_sighup().
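The forwarding described above might look roughly like the sketch below. Everything here is an assumption made for illustration: `zipkin_like`, `plugin_like`, and `consume_reconnect_request()` are hypothetical names, and the use of a lock-free `std::atomic<bool>` is one possible way to keep the handler signal safe, not necessarily what the merged code does.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for the zipkin appender; names are illustrative only.
class zipkin_like {
public:
   // May be reached from a signal-handling path, so it only touches a lock-free atomic.
   void handle_sighup() noexcept {
      reconnect_requested_.store(true, std::memory_order_relaxed);
   }

   // Called from the appender's own thread; returns true at most once per request.
   bool consume_reconnect_request() noexcept {
      return reconnect_requested_.exchange(false, std::memory_order_relaxed);
   }

private:
   static_assert(std::atomic<bool>::is_always_lock_free,
                 "a lock-free flag is needed for use from a signal handler");
   std::atomic<bool> reconnect_requested_{false};
};

// Hypothetical stand-in for a mandatory plugin such as net_plugin.
class plugin_like {
public:
   explicit plugin_like(zipkin_like& z) : zipkin_(z) {}

   // The plugin's own SIGHUP handler forwards the signal to the appender.
   void handle_sighup() { zipkin_.handle_sighup(); }

private:
   zipkin_like& zipkin_;
};
```

Repeated signals collapse into a single pending request, so the appender thread reconnects once per batch of SIGHUPs rather than once per signal.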

Add a new option:
telemetry-retry-interval-us: optional; specifies the retry interval, in microseconds, for connecting to zipkin. The default value is 30000000 (30 seconds).
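As a rough illustration of how the option and its default could be turned into a duration, consider the sketch below. It assumes options arrive as a plain string map, which is a simplification chosen to keep the example self-contained; the helper `telemetry_retry_interval` and the constant name are hypothetical, and nodeos uses its own option-parsing machinery.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <map>
#include <string>

// Default taken from the option description: 30000000 us, i.e. 30 seconds.
constexpr std::int64_t default_retry_interval_us = 30000000;

// Hypothetical helper: reads telemetry-retry-interval-us from a string map,
// falling back to the default when the option is absent.
inline std::chrono::microseconds
telemetry_retry_interval(const std::map<std::string, std::string>& options) {
   const auto it = options.find("telemetry-retry-interval-us");
   if (it == options.end())
      return std::chrono::microseconds(default_retry_interval_us);
   // std::stoull throws std::invalid_argument or std::out_of_range on bad input.
   const unsigned long long value = std::stoull(it->second);
   assert(value > 0);
   return std::chrono::microseconds(static_cast<std::int64_t>(value));
}
```

Returning a `std::chrono::microseconds` rather than a raw integer lets callers compare the interval against other durations without tracking the unit by hand.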

nickjjzhao added 2 commits June 14, 2021 10:02

Add a log message when zipkin is connected

dc7df62

Retry connecting to zipkin every 30 seconds

b23635c

nickjjzhao marked this pull request as draft June 14, 2021 15:28

Process SIGHUP to enable zipkin_appender

e3c3be7

nickjjzhao marked this pull request as ready for review June 16, 2021 14:35

heifner suggested changes Jun 16, 2021

src/log/zipkin.cpp (outdated, resolved)

include/fc/log/zipkin.hpp (outdated, resolved)

nickjjzhao added 3 commits June 16, 2021 15:00

Move sleep(30) into the calling thread of post

66bb5f2

Make handling SIGHUP signal code thread safe

5b493ac

Update the method handling SIGHUP signal

2394a54

heifner reviewed Jun 17, 2021

include/fc/log/zipkin.hpp (outdated, resolved)

heifner reviewed Jun 17, 2021

include/fc/log/zipkin.hpp (outdated, resolved)

heifner reviewed Jun 17, 2021

src/log/zipkin.cpp (outdated, resolved)

nickjjzhao added 4 commits June 17, 2021 16:54

Use a timer instead of sleep to avoid main thread being blocked

e4b4253

Update the handler of asynchronous wait on the timer

218e261

Make method handle_sighup() signal safe

c47902c

Add scope operator to an atomic var

6e3ff70

heifner suggested changes Jun 24, 2021

src/log/zipkin.cpp (resolved)

include/fc/log/zipkin.hpp (outdated, resolved)

src/log/zipkin.cpp (outdated, resolved)

Make var timer_expired thread safe

3545c92

heifner reviewed Jun 24, 2021

include/fc/log/zipkin.hpp (outdated, resolved)

nickjjzhao added 2 commits June 24, 2021 15:56

Remove unneeded comment

4e99264

Redefine a var as atomic_flag so it is signal safe, thread atomic and lock free

36ecb2b

heifner suggested changes Jun 28, 2021

src/log/zipkin.cpp (outdated, resolved)

src/log/zipkin.cpp (outdated, resolved)

nickjjzhao added 2 commits June 28, 2021 14:12

Revert last commit that can cause a race condition

d6912ce

Add a new option retry_interval_us

1d15fda

nickjjzhao mentioned this pull request Jun 28, 2021

Update Zipkin connection retry - develop EOSIO/eos#10432

Merged

10 tasks

heifner suggested changes Jun 29, 2021

include/fc/log/zipkin.hpp (outdated, resolved)

src/log/zipkin.cpp (outdated, resolved)

src/log/zipkin.cpp (outdated, resolved)

src/log/zipkin.cpp (outdated, resolved)

Fix new comments

d740af3

heifner approved these changes Jul 6, 2021

heifner merged commit 1b3e514 into eosio Jul 6, 2021

heifner deleted the jjz-epe933-zipkin branch July 6, 2021 14:19

nickjjzhao mentioned this pull request Jul 9, 2021

Update Zipkin connection retry - 2.1.x EOSIO/eos#10486

Merged

10 tasks
