Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

eth1 endpoints validation #3869

Merged
merged 8 commits into from
Apr 21, 2021
Merged

Conversation

tbenr
Copy link
Contributor

@tbenr tbenr commented Apr 18, 2021

PR Description

Implements Eth1 Provider validation, completing #3832

TODO

  • implement tests in Web3jEth1MonitorableProviderTest

Documentation

  • I thought about documentation and added the documentation label to this PR if updates are required.

Changelog

  • I thought about adding a changelog entry, and added one if I deemed necessary.

Copy link
Contributor

@ajsutton ajsutton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is looking good. Mostly just need to think about thread safety in terms of the structure. We'll also need to test it out a bit and make sure that we don't wind up logging too many errors when nodes are down or in the wrong state - eth1 can very easily fill up the logs with errors but it's not actually that big a problem if the eth1 endpoint is unavailable.

.exceptionallyCompose(
err -> {
LOG.warn("Retrying with next eth1 endpoint", err);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This WARN and the error below will wind up being excessively noisy when a provider is unavailable. We probably need to keep them at debug level until we address #3855 which can take a more wholistic approach to it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should not be the case since I'm skipping non-working endpoints until the next validation check.

@tbenr
Copy link
Contributor Author

tbenr commented Apr 19, 2021

@ajsutton another thing to improve is the startup phase. I was thinking to put a silent on in FallbackAwareEth1Provider::run retry until the Monitor completed a validation cycle.

I'll fix broken tests too :)

@ajsutton
Copy link
Contributor

@ajsutton another thing to improve is the startup phase. I was thinking to put a silent on in FallbackAwareEth1Provider::run retry until the Monitor completed a validation cycle.

I'll fix broken tests too :)

We got the build running for you automatically now. :). btw if you're running acceptance tests locally - they use the docker image, so run ./gradelw distDocker to "deploy" any changes to production code. You can then just run the test itself in IntelliJ which makes life easy.

@ajsutton
Copy link
Contributor

@ajsutton another thing to improve is the startup phase. I was thinking to put a silent on in FallbackAwareEth1Provider::run retry until the Monitor completed a validation cycle.

Forgot to put the response to this bit in. I think that makes a lot of sense - startup is certainly a bit of a special case. It would be nice to be able to handle it cleanly where that first validation cycle returns a SafeFuture so we can just wait for it to complete before sending requests to that node rather than having to retry, but its not a big deal.

@ajsutton
Copy link
Contributor

btw, my guess for the failing tests is that chain ID we're expecting for the Eth1 node doesn't match the actual chain ID so we now ignore the endpoint entirely whereas previously we just logged a warning.

20:30:36.046 ERROR - PLEASE CHECK YOUR ETH1 NODE (endpoint [besunode5:8545 [1]])| Wrong Eth1 chain id (expected=5, actual=2018)

The simplest solution is probably just to set the chain ID in the Besu genesis file: https://github.com/ConsenSys/teku/blob/74ab474b984de7ffe338e87a6e8ab0d7874a7429/acceptance-tests/src/testFixtures/resources/besu/depositContractGenesis.json#L3

@tbenr
Copy link
Contributor Author

tbenr commented Apr 20, 2021

@ajsutton another thing to improve is the startup phase. I was thinking to put a silent on in FallbackAwareEth1Provider::run retry until the Monitor completed a validation cycle.

I'll fix broken tests too :)

We got the build running for you automatically now. :). btw if you're running acceptance tests locally - they use the docker image, so run ./gradelw distDocker to "deploy" any changes to production code. You can then just run the test itself in IntelliJ which makes life easy.

Was already doing that but that time i forgot it😁

@tbenr
Copy link
Contributor Author

tbenr commented Apr 20, 2021

@ajsutton another thing to improve is the startup phase. I was thinking to put a silent on in FallbackAwareEth1Provider::run retry until the Monitor completed a validation cycle.

Forgot to put the response to this bit in. I think that makes a lot of sense - startup is certainly a bit of a special case. It would be nice to be able to handle it cleanly where that first validation cycle returns a SafeFuture so we can just wait for it to complete before sending requests to that node rather than having to retry, but its not a big deal.

I already did it (not exactly as you said, which is far better) But i want to improve it by notifig at first success or at the end. So we can start as soon as a valid endpoint is found (letting timeouts go their way)

@tbenr
Copy link
Contributor Author

tbenr commented Apr 20, 2021

@ajsutton I implemented everything i wanted to. I'm overall satisfied!

Copy link
Contributor

@ajsutton ajsutton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is really looking good. I've left a bunch of comments but they're mostly small. I'm going to spend some time doing some manual testing locally as well and pay attention to what logs come out etc but this is great work.

Comment on lines 22 to 23
success,
failed
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: by convention enum names are generally all upper case.

Comment on lines 77 to 80
}
}
// should never occur
return true;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I'd make this a default case that throws an exception so we get a very loud error if a new enum variant is added for some reason.

Suggested change
}
}
// should never occur
return true;
default:
throw new IllegalStateException("Unknown result type: " + lastValidationResult);
}
default:
throw new IllegalStateException("Unknown result type: " + lastCallResult);
}

I'm not entirely sure I have that suggested change right but it should show the idea. :)

Comment on lines 66 to 77
String hostname;
try {
String tmp = Splitter.on("://").splitToList(endpoint).get(1);
hostname = Splitter.on("/").splitToList(tmp).get(0);
} catch (Exception e) {
hostname = "unknown";
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably shouldn't reinvent URL parsing here. I'd suggest something like:

Suggested change
String hostname;
try {
String tmp = Splitter.on("://").splitToList(endpoint).get(1);
hostname = Splitter.on("/").splitToList(tmp).get(0);
} catch (Exception e) {
hostname = "unknown";
}
String hostname;
try {
final URI uri = new URI(endpoint);
if (uri.getPort() != - 1) {
hostname = uri.getHost() + ":" + uri.getPort();
} else {
hostname = uri.getHost();
}
} catch (URISyntaxException e) {
hostname = "unknown";
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah. don't know why I did that :)

this.initialValidationCompleted = new SafeFuture<>();
}

public class ValidEth1ProviderIterator {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Typically we put internal classes at the bottom of the file.

if (chainId.intValueExact() != Constants.DEPOSIT_CHAIN_ID) {
STATUS_LOG.eth1DepositChainIdMismatch(
Constants.DEPOSIT_CHAIN_ID, chainId.intValueExact(), this.id);
throw new RuntimeException("Wrong Chainid");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this throw an exception or just return false?
And should it update the last validation result?

Copy link
Contributor Author

@tbenr tbenr Apr 21, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well, the idea of throwing there was to delegate everything to the handleComposed

Comment on lines 49 to 76
// let's prepare a parallel validation stream
Stream<SafeFuture<Boolean>> validationStream =
eth1ProviderSelector
.getProviders()
.parallelStream()
.filter(MonitorableProvider::needsToBeValidated)
.map(MonitorableProvider::validate);

if (eth1ProviderSelector.isInitialValidationCompleted()) {
// if we already notified a completion, just execute all validations.
validationStream.forEach(isValidFuture -> isValidFuture.always(() -> {}));
} else {
// otherwise let's notify a validation completion as soon as we have a valid endpoint or in
// any case at the end of all validations.
SafeFuture.allOf(
validationStream
.map(
isValidFuture ->
isValidFuture.thenApply(
(isValid) -> {
if (isValid) {
eth1ProviderSelector.notifyValidationCompletion();
}
return null;
}))
.toArray(SafeFuture[]::new))
.always(eth1ProviderSelector::notifyValidationCompletion);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that it's safe to complete a future multiple times, it's ok if we call eth1ProviderSelector::notifyValidationCompletion multiple times as well. So I think I'd remove this if and just always use the else case.

We can also simplify a little by using SafeFuture.thenPeek.

And given that all the requests are made async anyway, I would just use a normal .stream() rather than making it parallel - involving more threads won't help and may actually be slower due to the overhead of moving across threads.

    SafeFuture.allOf(
            eth1ProviderSelector.getProviders().stream()
                .filter(MonitorableProvider::needsToBeValidated)
                .map(MonitorableProvider::validate)
                .map(
                    isValidFuture ->
                        isValidFuture.thenPeek(
                            isValid -> {
                              if (isValid) {
                                eth1ProviderSelector.notifyValidationCompletion();
                              }
                            }))
                .toArray(SafeFuture[]::new))
        .alwaysRun(eth1ProviderSelector::notifyValidationCompletion)
        .finish(error -> LOG.error("Unexpected error while validating eth1 endpoints", error));

The .alwaysRun(...).finish() at the end just ensures that if there's an exception thrown we do wind up logging it with some context about what was happening. We could potentially use .reportExceptions() but it doesn't provide as much useful context and tends to be a lot noisier.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I think I'd remove this if and just always use the else case.

since we go in the else branch only once, I was on the side of saving an array of SafeFutures and some additional useless lambdas execution, sacrificing code neatness

And given that all the requests are made async anyway, I would just use a normal .stream()

yeah very good point.

The .alwaysRun(...).finish() at the end just ensures that if there's an exception thrown we do wind up logging it with some context about what was happening. We could potentially use .reportExceptions() but it doesn't provide as much useful context and tends to be a lot noisier.

👍 👍

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah it's a tiny bit wasteful to notify validation complete multiple times but given we run this so rarely and the cost of those calls is so low I don't think you could notice the difference either way. The benefit of clearer code is definitely worth it in this case (as it usually is - generally the simplest code is also the fastest code).

Comment on lines 227 to 229
validating.set(false);
return futureReturn;
});
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For safety, we should set validating to false in an .alwaysRun block. That way we guarantee validating gets set back to false even if an unexpected exception gets thrown in this handling code.

Suggested change
validating.set(false);
return futureReturn;
});
return futureReturn;
})
.alwaysRun(() -> validating.set(false));

.alwaysRun here is like a finally block in a try/catch.

@Override
public SafeFuture<Boolean> validate() {
if (validating.compareAndSet(false, true)) {
LOG.info("Validating endpoint {} ...", this.id);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a routine thing to have happen so probably just log at debug level.

updateLastValidation(Result.failed);
futureReturn.complete(Boolean.FALSE);
} else {
LOG.info("Endpoint {} is VALID", this.id);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is also routine so can just be debug level.

@@ -162,4 +190,46 @@ public Web3jEth1Provider(final Web3j web3j, final AsyncRunner asyncRunner) {
return (List<EthLog.LogResult<?>>) (List) logs;
});
}

@Override
public SafeFuture<Boolean> validate() {
Copy link
Contributor

@ajsutton ajsutton Apr 21, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Spent a bit of time working out how to pull all the advice below together and this is what I come up with to replace this whole method:

  @Override
  public SafeFuture<Boolean> validate() {
    if (validating.compareAndSet(false, true)) {
      LOG.debug("Validating endpoint {} ...", this.id);
      return validateChainId()
          .thenCompose(
              result -> {
                if (result == Result.failed) {
                  return SafeFuture.completedFuture(result);
                } else {
                  return validateSyncing();
                }
              })
          .thenApply(
              result -> {
                updateLastValidation(result);
                return result == Result.success;
              })
          .exceptionally(
              error -> {
                LOG.warn(
                    "Endpoint {} is INVALID | {}",
                    this.id,
                    Throwables.getRootCause(error).getMessage());
                updateLastValidation(Result.failed);
                return false;
              })
          .alwaysRun(() -> validating.set(false));
    } else {
      LOG.debug("Already validating");
      return SafeFuture.completedFuture(isValid());
    }
  }

  private SafeFuture<Result> validateChainId() {
    return getChainId()
        .thenApply(
            chainId -> {
              if (chainId.intValueExact() != Constants.DEPOSIT_CHAIN_ID) {
                STATUS_LOG.eth1DepositChainIdMismatch(
                    Constants.DEPOSIT_CHAIN_ID, chainId.intValueExact(), this.id);
                return Result.failed;
              }
              return Result.success;
            });
  }

  private SafeFuture<Result> validateSyncing() {
    return ethSyncing()
        .thenApply(
            syncing -> {
              if (syncing) {
                LOG.warn("Endpoint {} is INVALID | Still syncing", this.id);
                updateLastValidation(Result.failed);
                return Result.failed;
              } else {
                LOG.debug("Endpoint {} is VALID", this.id);
                updateLastValidation(Result.success);
                return Result.success;
              }
            });
  }

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point on splitting things up..

@tbenr
Copy link
Contributor Author

tbenr commented Apr 21, 2021

Thanks @ajsutton for the valuable comments!
Going to study and merge them during the day

Copy link
Contributor

@ajsutton ajsutton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. This is really excellent work. Thanks so much for contributing this.

@ajsutton ajsutton merged commit a2fdfff into Consensys:master Apr 21, 2021
@tbenr tbenr deleted the validate_eth1_endpoints branch April 22, 2021 07:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants