[Narwhal] improve resilience of nodes growing and following DAG #8288
Comments
I assume we'll want to do that only for the nodes that haven't voted for the underlying header, right?
To confirm that the child certificate digests match, won't we need to have processed the child certificates beforehand? I'm trying to see how this differs from the current approach.
Why is that the case? This should be done against all peers, for created certificates that have not been ack'ed, with a maximum size bound (e.g. 100), so that the recipient does not have to rely on the higher-latency fetch process.
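The bounded re-broadcast described above could be sketched roughly as follows. This is only an illustration, not the Narwhal implementation: `UnackedTracker`, `PeerId`, `CertId` and all method names are hypothetical, and the sketch assumes that once the bound is exceeded the oldest un-ack'ed certificate is dropped and left to the fetch process.

```rust
use std::collections::{HashMap, VecDeque};

// Hypothetical identifiers for illustration; the real Narwhal types differ.
type PeerId = u32;
type CertId = u64;

/// Tracks, per peer, the created certificates that the peer has not ack'ed.
struct UnackedTracker {
    max_per_peer: usize,
    pending: HashMap<PeerId, VecDeque<CertId>>,
}

impl UnackedTracker {
    fn new(peers: &[PeerId], max_per_peer: usize) -> Self {
        let pending = peers.iter().map(|p| (*p, VecDeque::new())).collect();
        Self { max_per_peer, pending }
    }

    /// Record a newly created certificate as pending for every peer.
    /// Assumption: when the bound is exceeded, the oldest entry is dropped
    /// and the peer has to obtain it through the fetch path instead.
    fn on_created(&mut self, cert: CertId) {
        for queue in self.pending.values_mut() {
            queue.push_back(cert);
            if queue.len() > self.max_per_peer {
                queue.pop_front();
            }
        }
    }

    /// Stop re-sending a certificate to a peer once that peer has ack'ed it.
    fn on_ack(&mut self, peer: PeerId, cert: CertId) {
        if let Some(queue) = self.pending.get_mut(&peer) {
            queue.retain(|c| *c != cert);
        }
    }

    /// Certificates that should be (re)sent to the given peer, oldest first.
    fn to_resend(&self, peer: PeerId) -> Vec<CertId> {
        self.pending
            .get(&peer)
            .map(|q| q.iter().copied().collect())
            .unwrap_or_default()
    }
}
```

Keeping one queue per peer means a slow peer only affects its own backlog, and the bound (100 in the comment above) caps the memory held for any single peer.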
The difference is that the previous child certificates do not have to be verified if their digests are included transitively from verified certificates. The caveat is that a fetch response served from this data may now contain certificates with unverified signatures, so the recipient of the fetch response needs to be able to handle that.
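A minimal sketch of the transitive acceptance idea, under stated assumptions: `Cert`, `Digest` and `accept_transitively` are hypothetical names, and the `digest` field is assumed to have already been recomputed from the certificate contents by the caller (a digest claimed by the sender must not be trusted). A fetched certificate is accepted without signature verification only if its digest is reachable through parent links from a certificate whose signature was verified.

```rust
use std::collections::{HashMap, HashSet};

// Hypothetical digest type; a real digest would be a cryptographic hash.
type Digest = u64;

// Hypothetical, simplified certificate. Assumption: `digest` was recomputed
// locally from the certificate contents before this struct was built.
struct Cert {
    digest: Digest,
    parents: Vec<Digest>,
}

/// Returns the digests of fetched certificates that are referenced,
/// directly or transitively, by the parents of signature-verified tips.
fn accept_transitively(verified_tips: &[Cert], fetched: &[Cert]) -> HashSet<Digest> {
    let by_digest: HashMap<Digest, &Cert> =
        fetched.iter().map(|c| (c.digest, c)).collect();
    let mut accepted = HashSet::new();
    // Start from the parent digests committed to by the verified certificates.
    let mut stack: Vec<Digest> = verified_tips
        .iter()
        .flat_map(|c| c.parents.iter().copied())
        .collect();
    while let Some(d) = stack.pop() {
        if let Some(cert) = by_digest.get(&d) {
            // `insert` returns false if the digest was already accepted,
            // so each certificate is expanded at most once.
            if accepted.insert(d) {
                stack.extend(cert.parents.iter().copied());
            }
        }
    }
    accepted
}
```

Certificates in the fetch response that are not reachable this way would still need full signature verification before being accepted.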
…#8270)

Currently much of the `Certificate` processing logic lives in `Core`, which has the following drawbacks:
- Processing is single threaded, even though much of `Certificate` processing can be parallelized, e.g. verifications, checking parents, etc.
- It is hard to evolve, e.g. if we want to look up suspended certificates while checking the existence of parents, a new channel needs to be added.

This PR moves the `Certificate` processing logic into `Synchronizer`, which is now responsible for synchronizing the DAG, not just the missing parent certificates. In the future, additional data structures will be added to `Synchronizer` to keep track of suspended certificates.

The majority of the changes are the required plumbing. The most significant change is that certificate broadcast is re-implemented.

#8288
## Description

Narwhal used to have `HeaderWaiter` that suspends Headers and Certificates, and commits a certificate when it has no missing parent. The active requests for parent certificates were a bit problematic, and the idea of suspending received certificates is useful.

As Narwhal decreases its round intervals, more nodes may start to receive certificates out of their causal order. It is useful to temporarily buffer the certificates to smooth out disruptions from network or slower nodes.

## Test Plan

- unit tests
- deployed to private testnet

#8288
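The buffering of out-of-order certificates described in this PR can be illustrated with a small sketch. It is not the code from the PR: `SuspendBuffer`, `Cert`, `Digest` and `try_accept` are hypothetical names, and verification, persistence and garbage collection are left out. A certificate with missing parents is suspended, and it is released once all of its parents have been accepted.

```rust
use std::collections::{HashMap, HashSet};

// Hypothetical, simplified types for illustration only.
type Digest = u64;

struct Cert {
    digest: Digest,
    parents: Vec<Digest>,
}

#[derive(Default)]
struct SuspendBuffer {
    accepted: HashSet<Digest>,
    // Missing parent digest -> digests of certificates waiting on it.
    waiting: HashMap<Digest, Vec<Digest>>,
    // Suspended certificate digest -> (certificate, number of missing parents).
    suspended: HashMap<Digest, (Cert, usize)>,
}

impl SuspendBuffer {
    /// Tries to accept a certificate. Returns the digests accepted as a
    /// result (the certificate itself plus any unblocked descendants), with
    /// every certificate listed after all of its parents.
    fn try_accept(&mut self, cert: Cert) -> Vec<Digest> {
        if self.accepted.contains(&cert.digest) || self.suspended.contains_key(&cert.digest) {
            return vec![];
        }
        let missing: Vec<Digest> = cert
            .parents
            .iter()
            .copied()
            .filter(|p| !self.accepted.contains(p))
            .collect();
        if !missing.is_empty() {
            // Suspend until every missing parent has been accepted.
            for p in &missing {
                self.waiting.entry(*p).or_default().push(cert.digest);
            }
            self.suspended.insert(cert.digest, (cert, missing.len()));
            return vec![];
        }
        let mut out = Vec::new();
        self.accepted.insert(cert.digest);
        let mut queue = vec![cert.digest];
        while let Some(d) = queue.pop() {
            out.push(d);
            // Wake up certificates that were waiting on `d`.
            for child in self.waiting.remove(&d).unwrap_or_default() {
                let ready = match self.suspended.get_mut(&child) {
                    Some((_, n)) => {
                        *n -= 1;
                        *n == 0
                    }
                    None => false,
                };
                if ready {
                    self.suspended.remove(&child);
                    self.accepted.insert(child);
                    queue.push(child);
                }
            }
        }
        out
    }
}
```

Counting missing parents per suspended certificate avoids rescanning the whole buffer each time a parent arrives.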
Catch up performance will be tracked separately. Currently the bottleneck to catch up performance is payload fetching.
Currently, the limiting factor in speeding up Narwhal round generation (and hence reducing e2e Narwhal latency) is the resilience of nodes in growing and following the DAG. It is observed that once more Narwhal rounds are generated per second, at some limit nodes start to fall behind in following the DAG, resulting in higher e2e latencies. We plan to make the following improvements to help nodes that receive certificates out of order, or miss one or more of them, keep up with the growth of the DAG:
- Propagating recently created certificates
- Moving `Certificate` handling from `Core` into `Synchronizer`, which allows concurrent processing of certificates and access to metadata.
- Improving certificates catchup throughput