@nickjjzhao nickjjzhao commented Jun 14, 2021

Change Description

EPE-933 [GitHub:issue:10255]

On networks like EOS Mainnet, 9 connection attempts is too few to be useful for intermittent failures such as a zipkin upgrade.

For production use, it needs to work more like this:

  1. Retry every 30 seconds.
  2. Report health status indicating whether it is connected.
  3. Handle SIGHUP or a similar signal to force a reconnect.
  4. If telemetry-url is a DNS name that resolves to multiple A or AAAA records, nodeos should try all of the returned addresses before giving up.

Notes:

  • Existing code supports item 4 above.
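
The behavior requested in items 1 and 4 can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `connect_result`, `connect_with_retry`, the `try_connect` callback, and the `max_rounds` parameter are all hypothetical names introduced here, and the real connection logic lives in the zipkin code in nodeos.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <string>
#include <thread>
#include <vector>

// Hypothetical helper, not nodeos code. Outcome of a retry session.
struct connect_result {
   bool        connected = false;
   std::string address;     // address that accepted the connection, if any
   unsigned    rounds = 0;  // number of rounds attempted
};

// Tries every resolved address once per round and sleeps `retry_interval`
// between rounds, until one address connects or `max_rounds` rounds have
// been attempted. A `max_rounds` of 0 means "retry forever".
inline connect_result connect_with_retry(
      const std::vector<std::string>& addresses,
      const std::function<bool(const std::string&)>& try_connect,
      std::chrono::microseconds retry_interval,
      unsigned max_rounds) {
   assert(!addresses.empty());
   connect_result result;
   while (max_rounds == 0 || result.rounds < max_rounds) {
      ++result.rounds;
      // Item 4: try all returned addresses before giving up on this round.
      for (const auto& addr : addresses) {
         if (try_connect(addr)) {
            result.connected = true;
            result.address   = addr;
            return result;
         }
      }
      // Item 1: wait a fixed interval before the next round, if there is one.
      if (max_rounds == 0 || result.rounds < max_rounds)
         std::this_thread::sleep_for(retry_interval);
   }
   return result;
}
```

Passing the connect step as a callback keeps the sketch independent of any particular HTTP or socket library; a caller would supply a function that attempts one connection to one address.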

Change Type

Select ONE:

  • Documentation
  • Stability bug fix
  • Other
  • Other - special case

Testing Changes

Select ANY that apply:

  • New Tests
  • Existing Tests
  • Test Framework
  • CI System
  • Other

Documentation Additions

  • Documentation Additions

The handle_sighup() method defined in zipkin handles the SIGHUP signal. It is not called directly from the original SIGHUP signal handler but from other handlers. For example, handle_sighup() of net_plugin, one of the mandatory plugins, can forward SIGHUP by calling zipkin's handle_sighup().
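
The forwarding pattern described here might look something like the sketch below. The class names `zipkin_sketch` and `net_plugin_sketch`, the atomic flag, and `consume_reconnect_request()` are assumptions made for illustration; the actual zipkin and net_plugin classes in nodeos have their own members and signatures.

```cpp
#include <atomic>

// Hypothetical stand-in for the zipkin component, for illustration only.
class zipkin_sketch {
public:
   // Called from a plugin's SIGHUP handling, not from the OS signal handler.
   void handle_sighup() { reconnect_requested_.store(true); }

   // Polled by the sender loop: returns true once per request, then clears it.
   bool consume_reconnect_request() { return reconnect_requested_.exchange(false); }

private:
   std::atomic<bool> reconnect_requested_{false};
};

// Hypothetical stand-in for a mandatory plugin such as net_plugin.
class net_plugin_sketch {
public:
   explicit net_plugin_sketch(zipkin_sketch& z) : zipkin_(z) {}

   // The plugin's own SIGHUP handling, which forwards the signal to zipkin.
   void handle_sighup() {
      // ... plugin-specific SIGHUP work would go here ...
      zipkin_.handle_sighup();
   }

private:
   zipkin_sketch& zipkin_;
};
```

In this sketch the forwarded call only records a request, and the connection loop acts on it later, so the work done while handling the signal stays small.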

Add a new option:
telemetry-retry-interval-us: optional parameter that specifies the retry interval, in microseconds, for connecting to zipkin. The default value is 30000000 (30 seconds).
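
As a quick check of the units, the default of 30000000 microseconds corresponds to the 30 second retry requested in the issue. The helper below is a hypothetical sketch: the function name, the constant name, and the use of `std::uint32_t` for the option type are assumptions, not taken from the nodeos source.

```cpp
#include <chrono>
#include <cstdint>
#include <optional>

// Default taken from the option description above: 30000000 us = 30 s.
constexpr std::uint32_t default_telemetry_retry_interval_us = 30000000;

// Hypothetical helper: returns the configured interval, or the default
// when the optional parameter was not supplied.
inline std::chrono::microseconds telemetry_retry_interval(
      std::optional<std::uint32_t> configured_us) {
   return std::chrono::microseconds(
      configured_us.value_or(default_telemetry_retry_interval_us));
}
```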

@nickjjzhao nickjjzhao marked this pull request as draft June 14, 2021 15:28
@nickjjzhao nickjjzhao marked this pull request as ready for review June 16, 2021 14:35
@heifner heifner merged commit 1b3e514 into eosio Jul 6, 2021
@heifner heifner deleted the jjz-epe933-zipkin branch July 6, 2021 14:19