This repository was archived by the owner on Aug 2, 2022. It is now read-only.

@nickjjzhao nickjjzhao commented Jul 9, 2021

Change Description

EPE-933 [GitHub:issue:10255]:

On networks like EOS Mainnet, 9 connection attempts are too few to be useful for intermittent failures such as a zipkin upgrade. For production use, the connection logic needs to work more like this:

  1. Retry every 30 seconds.
  2. Report health status, i.e. whether the connection is up or not.
  3. Process SIGHUP or a similar signal to force a reconnect.
  4. If telemetry-url is a DNS name that resolves to multiple A or AAAA records, nodeos should try all of the returned addresses before giving up.
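
The requirements above can be sketched roughly as follows. This is only an illustration, not the code in this PR: `telemetry_connection`, `try_connect`, `connect_once` and `connect_with_retry` are hypothetical names, and the actual DNS lookup and socket connect are assumed to happen behind the `try_connect` callback.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <string>
#include <thread>
#include <vector>

// Hypothetical sketch; none of these names come from nodeos.
struct telemetry_connection {
    // Every address the telemetry-url host resolved to (A and AAAA records).
    std::vector<std::string> addresses;
    // Assumed callback: attempts one connection, returns true on success.
    std::function<bool(const std::string&)> try_connect;
    // Fixed pause between rounds: 30000000 us = 30 seconds.
    std::chrono::microseconds retry_interval{30000000};
    // Health state: which address we are connected to, if any.
    bool connected = false;
    std::string connected_to;

    bool is_connected() const { return connected; }

    // One round: try every resolved address before declaring the round failed.
    bool connect_once() {
        assert(try_connect != nullptr);
        connected = false;
        connected_to.clear();
        for (const auto& addr : addresses) {
            if (try_connect(addr)) {
                connected = true;
                connected_to = addr;
                return true;
            }
        }
        return false;
    }

    // Repeat rounds at a fixed interval; max_rounds == 0 means retry forever.
    bool connect_with_retry(unsigned max_rounds = 0) {
        for (unsigned round = 1;; ++round) {
            if (connect_once()) return true;
            if (max_rounds != 0 && round >= max_rounds) return false;
            std::this_thread::sleep_for(retry_interval);
        }
    }
};
```

In this sketch, `is_connected()` stands in for the health report, and a forced reconnect would simply call `connect_once()` again.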

Notes:

Change Type

Select ONE:

  • Documentation
  • Stability bug fix
  • Other
  • Other - special case

Testing Changes

Select ANY that apply:

  • New Tests
  • Existing Tests
  • Test Framework
  • CI System
  • Other

Documentation Additions

  • Documentation Additions

The method handle_sighup() defined in zipkin handles the SIGHUP signal. It is not called directly from the original SIGHUP signal handler but from other handlers. For example, handle_sighup() of net_plugin, one of the mandatory plugins, can forward SIGHUP by calling zipkin's handle_sighup().
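
A minimal sketch of the forwarding described above, using stand-in classes. `zipkin_tracer`, `net_plugin_stub`, `consume_reconnect_request` and `sighup_count` are hypothetical names for illustration only; only the `handle_sighup()` method name comes from this PR, and the real signatures may differ.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for the zipkin component.
class zipkin_tracer {
public:
    // Invoked by a plugin when SIGHUP arrives: request an immediate reconnect.
    void handle_sighup() { reconnect_requested_.store(true); }

    // Polled by an assumed connection loop; returns true once per request.
    bool consume_reconnect_request() { return reconnect_requested_.exchange(false); }

private:
    std::atomic<bool> reconnect_requested_{false};
};

// Hypothetical stand-in for a mandatory plugin such as net_plugin.
class net_plugin_stub {
public:
    explicit net_plugin_stub(zipkin_tracer* tracer) : tracer_(tracer) {
        assert(tracer_ != nullptr);
    }

    // The application's SIGHUP dispatch reaches the plugin first,
    // and the plugin forwards the signal to zipkin.
    void handle_sighup() {
        ++sighup_count_;           // the plugin's own SIGHUP work would go here
        tracer_->handle_sighup();  // forward to zipkin
    }

    int sighup_count() const { return sighup_count_; }

private:
    zipkin_tracer* tracer_;
    int sighup_count_ = 0;
};
```

Routing the signal through a plugin that is always loaded means zipkin does not need its own registration with the signal handler.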

Add a new option:
telemetry-retry-interval-us (optional): the retry interval, in microseconds, for connecting to zipkin. The default is 30000000 (30 seconds).
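
To make the unit and default concrete, here is a small sketch of how such an option value might be turned into a duration. The helper `retry_interval_from` and the plain `std::map` standing in for the parsed options are assumptions for illustration; nodeos uses its own option parsing, which is not shown here.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>

// Default from the option description: 30000000 microseconds (30 seconds).
const long long default_retry_interval_us = 30000000;

// Hypothetical helper: reads telemetry-retry-interval-us from already-parsed
// key/value options and falls back to the default when the option is absent.
std::chrono::microseconds retry_interval_from(
        const std::map<std::string, std::string>& options) {
    long long us = default_retry_interval_us;
    auto it = options.find("telemetry-retry-interval-us");
    if (it != options.end()) {
        std::size_t used = 0;
        us = std::stoll(it->second, &used);  // throws on non-numeric input
        if (used != it->second.size() || us <= 0) {
            throw std::invalid_argument(
                "telemetry-retry-interval-us must be a positive integer");
        }
    }
    assert(us > 0);  // guaranteed by the default and the check above
    return std::chrono::microseconds(us);
}
```

With no option set, the helper yields a duration equal to `std::chrono::seconds(30)`; a value of 5000000 would give 5 seconds.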

@nickjjzhao nickjjzhao marked this pull request as draft July 9, 2021 04:08
@nickjjzhao nickjjzhao marked this pull request as ready for review July 9, 2021 13:46
@nickjjzhao nickjjzhao merged commit a5b0a8d into release/2.1.x Jul 9, 2021
@nickjjzhao nickjjzhao deleted the jjz-epe933-zipkin-2.1.x branch July 9, 2021 16:42