eth1 endpoints validation #3869
I think this is looking good. Mostly just need to think about thread safety in terms of the structure. We'll also need to test it out a bit and make sure that we don't wind up logging too many errors when nodes are down or in the wrong state - eth1 can very easily fill up the logs with errors but it's not actually that big a problem if the eth1 endpoint is unavailable.
pow/src/main/java/tech/pegasys/teku/pow/AbstractMonitorableEth1Provider.java
.exceptionallyCompose(
    err -> {
      LOG.warn("Retrying with next eth1 endpoint", err);
This WARN and the error below will wind up being excessively noisy when a provider is unavailable. We probably need to keep them at debug level until we address #3855, which can take a more holistic approach to it.
this should not be the case since I'm skipping non-working endpoints until the next validation check.
services/powchain/src/main/java/tech/pegasys/teku/services/powchain/Eth1ProviderMonitor.java
@ajsutton another thing to improve is the startup phase. I was thinking to put a silent on in I'll fix broken tests too :)
a4bcd22 to 9ec358a
We got the build running for you automatically now. :). btw if you're running acceptance tests locally - they use the docker image, so run
Forgot to put the response to this bit in. I think that makes a lot of sense - startup is certainly a bit of a special case. It would be nice to be able to handle it cleanly where that first validation cycle returns a
btw, my guess for the failing tests is that the chain ID we're expecting for the Eth1 node doesn't match the actual chain ID, so we now ignore the endpoint entirely whereas previously we just logged a warning.
The simplest solution is probably just to set the chain ID in the Besu genesis file: https://github.com/ConsenSys/teku/blob/74ab474b984de7ffe338e87a6e8ab0d7874a7429/acceptance-tests/src/testFixtures/resources/besu/depositContractGenesis.json#L3
Was already doing that but that time I forgot it😁
I already did it (not exactly as you said, which is far better). But I want to improve it by notifying at first success or at the end. So we can start as soon as a valid endpoint is found (letting timeouts go their way)
00f56d8 to e3cf210
@ajsutton I implemented everything I wanted to. I'm overall satisfied!
This is really looking good. I've left a bunch of comments but they're mostly small. I'm going to spend some time doing some manual testing locally as well and pay attention to what logs come out etc but this is great work.
success,
failed
nit: by convention enum names are generally all upper case.
  }
}
// should never occur
return true;
nit: I'd make this a default case that throws an exception so we get a very loud error if a new enum variant is added for some reason.
-   }
- }
- // should never occur
- return true;
+     default:
+       throw new IllegalStateException("Unknown result type: " + lastValidationResult);
+   }
+   default:
+     throw new IllegalStateException("Unknown result type: " + lastCallResult);
+ }
I'm not entirely sure I have that suggested change right but it should show the idea. :)
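To make the two nits above concrete, here is a minimal standalone sketch, not code from this PR. The class `ResultSwitchSketch` and the helper `needsValidation` are hypothetical names chosen for illustration, and the mapping from result to boolean is an assumption. It only shows upper-case enum constants combined with a `default` branch that fails loudly.

```java
final class ResultSwitchSketch {
  // Upper-case constants, following the usual Java enum naming convention.
  enum Result {
    SUCCESS,
    FAILED
  }

  // Hypothetical helper: a failed endpoint is assumed to need re-validation.
  static boolean needsValidation(final Result lastValidationResult) {
    switch (lastValidationResult) {
      case SUCCESS:
        return false;
      case FAILED:
        return true;
      default:
        // Reached only if a new constant is added without updating this switch.
        throw new IllegalStateException("Unknown result type: " + lastValidationResult);
    }
  }

  public static void main(final String[] args) {
    System.out.println(needsValidation(Result.FAILED));
  }
}
```

With the `default` branch in place, the compiler no longer needs a trailing `return true;` after the switch, so the "should never occur" path cannot silently return a value.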
String hostname;
try {
  String tmp = Splitter.on("://").splitToList(endpoint).get(1);
  hostname = Splitter.on("/").splitToList(tmp).get(0);
} catch (Exception e) {
  hostname = "unknown";
}
We probably shouldn't reinvent URL parsing here. I'd suggest something like:
- String hostname;
- try {
-   String tmp = Splitter.on("://").splitToList(endpoint).get(1);
-   hostname = Splitter.on("/").splitToList(tmp).get(0);
- } catch (Exception e) {
-   hostname = "unknown";
- }
+ String hostname;
+ try {
+   final URI uri = new URI(endpoint);
+   if (uri.getPort() != -1) {
+     hostname = uri.getHost() + ":" + uri.getPort();
+   } else {
+     hostname = uri.getHost();
+   }
+ } catch (URISyntaxException e) {
+   hostname = "unknown";
+ }
yeah. don't know why I did that :)
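For anyone who wants to try the `java.net.URI` approach in isolation, here is a small self-contained sketch. `EndpointLabelSketch` and `hostnameOf` are hypothetical names, and the extra `null` host check is an addition that is not part of the suggestion above: `URI.getHost()` returns `null` for strings that parse but carry no authority, such as a bare `localhost`.

```java
import java.net.URI;
import java.net.URISyntaxException;

final class EndpointLabelSketch {
  // Returns "host" or "host:port" for logging, or "unknown" if it cannot be determined.
  static String hostnameOf(final String endpoint) {
    try {
      final URI uri = new URI(endpoint);
      if (uri.getHost() == null) {
        // Parsed as a URI but has no host component (assumed to map to "unknown").
        return "unknown";
      }
      if (uri.getPort() != -1) {
        return uri.getHost() + ":" + uri.getPort();
      }
      return uri.getHost();
    } catch (URISyntaxException e) {
      return "unknown";
    }
  }

  public static void main(final String[] args) {
    System.out.println(hostnameOf("http://localhost:8545/some/path"));
  }
}
```

Unlike the string-splitting version, this drops the path and query, which also keeps credentials embedded in the path (for example an API key segment) out of the log label.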
    this.initialValidationCompleted = new SafeFuture<>();
  }

  public class ValidEth1ProviderIterator {
nit: Typically we put internal classes at the bottom of the file.
if (chainId.intValueExact() != Constants.DEPOSIT_CHAIN_ID) {
  STATUS_LOG.eth1DepositChainIdMismatch(
      Constants.DEPOSIT_CHAIN_ID, chainId.intValueExact(), this.id);
  throw new RuntimeException("Wrong Chainid");
Should this throw an exception or just return false?
And should it update the last validation result?
well, the idea of throwing there was to delegate everything to the handleComposed
// let's prepare a parallel validation stream
Stream<SafeFuture<Boolean>> validationStream =
    eth1ProviderSelector
        .getProviders()
        .parallelStream()
        .filter(MonitorableProvider::needsToBeValidated)
        .map(MonitorableProvider::validate);

if (eth1ProviderSelector.isInitialValidationCompleted()) {
  // if we already notified a completion, just execute all validations.
  validationStream.forEach(isValidFuture -> isValidFuture.always(() -> {}));
} else {
  // otherwise let's notify a validation completion as soon as we have a valid endpoint or in
  // any case at the end of all validations.
  SafeFuture.allOf(
          validationStream
              .map(
                  isValidFuture ->
                      isValidFuture.thenApply(
                          (isValid) -> {
                            if (isValid) {
                              eth1ProviderSelector.notifyValidationCompletion();
                            }
                            return null;
                          }))
              .toArray(SafeFuture[]::new))
      .always(eth1ProviderSelector::notifyValidationCompletion);
}
Given that it's safe to complete a future multiple times, it's ok if we call eth1ProviderSelector::notifyValidationCompletion multiple times as well. So I think I'd remove this if and just always use the else case.
We can also simplify a little by using SafeFuture.thenPeek.
And given that all the requests are made async anyway, I would just use a normal .stream() rather than making it parallel - involving more threads won't help and may actually be slower due to the overhead of moving across threads.
SafeFuture.allOf(
        eth1ProviderSelector.getProviders().stream()
            .filter(MonitorableProvider::needsToBeValidated)
            .map(MonitorableProvider::validate)
            .map(
                isValidFuture ->
                    isValidFuture.thenPeek(
                        isValid -> {
                          if (isValid) {
                            eth1ProviderSelector.notifyValidationCompletion();
                          }
                        }))
            .toArray(SafeFuture[]::new))
    .alwaysRun(eth1ProviderSelector::notifyValidationCompletion)
    .finish(error -> LOG.error("Unexpected error while validating eth1 endpoints", error));
The .alwaysRun(...).finish() at the end just ensures that if there's an exception thrown we do wind up logging it with some context about what was happening. We could potentially use .reportExceptions() but it doesn't provide as much useful context and tends to be a lot noisier.
> So I think I'd remove this if and just always use the else case.
since we go in the else branch only once, I was on the side of saving an array of SafeFutures and some additional useless lambdas execution, sacrificing code neatness
> And given that all the requests are made async anyway, I would just use a normal .stream()
yeah very good point.
> The .alwaysRun(...).finish() at the end just ensures that if there's an exception thrown we do wind up logging it with some context about what was happening. We could potentially use .reportExceptions() but it doesn't provide as much useful context and tends to be a lot noisier.
👍 👍
yeah it's a tiny bit wasteful to notify validation complete multiple times but given we run this so rarely and the cost of those calls is so low I don't think you could notice the difference either way. The benefit of clearer code is definitely worth it in this case (as it usually is - generally the simplest code is also the fastest code).
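The "notify on the first valid endpoint, or at the very end" pattern discussed in this thread can be sketched with plain `java.util.concurrent.CompletableFuture`. This is an illustration under assumptions, not Teku code: Teku's `SafeFuture`, `thenPeek` and `alwaysRun` are replaced here by the JDK's `thenAccept` and `whenComplete`, and `InitialValidationSketch` and `validateAll` are hypothetical names.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;

final class InitialValidationSketch {
  final CompletableFuture<Void> initialValidationCompleted = new CompletableFuture<>();

  // Safe to call repeatedly: complete() does nothing once the future is already done.
  void notifyValidationCompletion() {
    initialValidationCompleted.complete(null);
  }

  CompletableFuture<Void> validateAll(final List<CompletableFuture<Boolean>> validations) {
    final CompletableFuture<?>[] watched =
        validations.stream()
            .map(
                validation ->
                    validation.thenAccept(
                        isValid -> {
                          if (isValid) {
                            // First valid endpoint: let startup proceed immediately.
                            notifyValidationCompletion();
                          }
                        }))
            .toArray(CompletableFuture[]::new);
    // Runs once every validation has finished, whether it succeeded or failed.
    return CompletableFuture.allOf(watched)
        .whenComplete((ignored, error) -> notifyValidationCompletion());
  }

  public static void main(final String[] args) {
    final InitialValidationSketch sketch = new InitialValidationSketch();
    final CompletableFuture<Boolean> slow = new CompletableFuture<>();
    final CompletableFuture<Boolean> fast = new CompletableFuture<>();
    sketch.validateAll(List.of(slow, fast));
    fast.complete(true);
    System.out.println(sketch.initialValidationCompleted.isDone());
  }
}
```

Because the notification is idempotent, there is no need for a separate branch once the initial validation has completed, which is the simplification argued for above.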
      validating.set(false);
      return futureReturn;
    });
For safety, we should set validating to false in an .alwaysRun block. That way we guarantee validating gets set back to false even if an unexpected exception gets thrown in this handling code.
-       validating.set(false);
-       return futureReturn;
-     });
+       return futureReturn;
+     })
+     .alwaysRun(() -> validating.set(false));
.alwaysRun here is like a finally block in a try/catch.
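A rough JDK-only analogue of that guarantee, for readers without the Teku codebase at hand: `CompletableFuture.whenComplete` runs on both the success and the failure path, much as `.alwaysRun` is described here. `ValidatingFlagSketch` is a hypothetical class, and the `null` check only simulates handling code that throws unexpectedly.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicBoolean;

final class ValidatingFlagSketch {
  private final AtomicBoolean validating = new AtomicBoolean(false);

  boolean isValidating() {
    return validating.get();
  }

  CompletableFuture<Boolean> validate(final CompletableFuture<Boolean> check) {
    if (!validating.compareAndSet(false, true)) {
      // Another validation is already in flight.
      return CompletableFuture.completedFuture(false);
    }
    return check
        .thenApply(
            valid -> {
              if (valid == null) {
                // Simulates an unexpected failure inside the handling code.
                throw new IllegalStateException("Unexpected null result");
              }
              return valid;
            })
        // Like a finally block: runs whether the stages above succeeded or threw.
        .whenComplete((result, error) -> validating.set(false));
  }

  public static void main(final String[] args) {
    final ValidatingFlagSketch sketch = new ValidatingFlagSketch();
    final CompletableFuture<Boolean> check = new CompletableFuture<>();
    sketch.validate(check);
    check.complete(null);
    System.out.println(sketch.isValidating());
  }
}
```

If the flag were reset inside the `thenApply` lambda instead, the exception would skip the reset and every later call would take the "already validating" branch.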
@Override
public SafeFuture<Boolean> validate() {
  if (validating.compareAndSet(false, true)) {
    LOG.info("Validating endpoint {} ...", this.id);
This is a routine thing to have happen so probably just log at debug level.
  updateLastValidation(Result.failed);
  futureReturn.complete(Boolean.FALSE);
} else {
  LOG.info("Endpoint {} is VALID", this.id);
This is also routine so can just be debug level.
@@ -162,4 +190,46 @@ public Web3jEth1Provider(final Web3j web3j, final AsyncRunner asyncRunner) {
          return (List<EthLog.LogResult<?>>) (List) logs;
        });
  }

  @Override
  public SafeFuture<Boolean> validate() {
Spent a bit of time working out how to pull all the advice below together and this is what I came up with to replace this whole method:
@Override
public SafeFuture<Boolean> validate() {
  if (validating.compareAndSet(false, true)) {
    LOG.debug("Validating endpoint {} ...", this.id);
    return validateChainId()
        .thenCompose(
            result -> {
              if (result == Result.failed) {
                return SafeFuture.completedFuture(result);
              } else {
                return validateSyncing();
              }
            })
        .thenApply(
            result -> {
              updateLastValidation(result);
              return result == Result.success;
            })
        .exceptionally(
            error -> {
              LOG.warn(
                  "Endpoint {} is INVALID | {}",
                  this.id,
                  Throwables.getRootCause(error).getMessage());
              updateLastValidation(Result.failed);
              return false;
            })
        .alwaysRun(() -> validating.set(false));
  } else {
    LOG.debug("Already validating");
    return SafeFuture.completedFuture(isValid());
  }
}

private SafeFuture<Result> validateChainId() {
  return getChainId()
      .thenApply(
          chainId -> {
            if (chainId.intValueExact() != Constants.DEPOSIT_CHAIN_ID) {
              STATUS_LOG.eth1DepositChainIdMismatch(
                  Constants.DEPOSIT_CHAIN_ID, chainId.intValueExact(), this.id);
              return Result.failed;
            }
            return Result.success;
          });
}

private SafeFuture<Result> validateSyncing() {
  return ethSyncing()
      .thenApply(
          syncing -> {
            if (syncing) {
              LOG.warn("Endpoint {} is INVALID | Still syncing", this.id);
              updateLastValidation(Result.failed);
              return Result.failed;
            } else {
              LOG.debug("Endpoint {} is VALID", this.id);
              updateLastValidation(Result.success);
              return Result.success;
            }
          });
}
good point on splitting things up..
Thanks @ajsutton for the valuable comments!
updated chainid for acceptance tests
988fae5 to 54f6540
LGTM. This is really excellent work. Thanks so much for contributing this.
PR Description
Implements Eth1 Provider validation, completing #3832
TODO
Web3jEth1MonitorableProviderTest