Speeds up synchronisation of the blocks for the fuel-core-sync service #1916

Merged
merged 28 commits into from
Jun 4, 2024

@xgreenx xgreenx commented May 29, 2024

The change speeds up synchronisation and removes noisy logs when a P2P request fails.

The synchronization speed improvement is done by:

  • Making all requests asynchronous. Before, the first request to fetch the block header was synchronous and had to be awaited because of `and_then`.
  • Separating block fetching from block execution. Before, we fetched X blocks and waited for their execution before requesting the next X blocks. Fetching is now a separate parallel task that uses the `batch_sender` variable for backpressure. The task fetches blocks in parallel and sends them into a channel with buffer `1`. The executor's task receives each batch from the channel and executes it, which allows the request for a new batch to be made in parallel.

If the execution speed is faster than fetching, we request more than one batch in parallel, up to the buffer limit. The tests demonstrate this.
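The separation of fetching from execution can be sketched with a bounded channel. This is a minimal illustration only, not the code from this PR: it uses standard-library threads and `std::sync::mpsc::sync_channel` as a stand-in for the asynchronous tasks in `fuel-core-sync`, and the names `Batch`, `fetch_batch`, and `sync_blocks` are hypothetical. Only `batch_sender` and the buffer size of `1` come from the description above.

```rust
use std::sync::mpsc::sync_channel;
use std::thread;

/// Hypothetical stand-in for a batch of fetched blocks; only heights are tracked.
type Batch = Vec<u32>;

/// Hypothetical fetch step: pretends to download `size` blocks starting at `start`.
fn fetch_batch(start: u32, size: u32) -> Batch {
    (start..start + size).collect()
}

/// Runs the fetcher and the executor as separate tasks connected by a bounded
/// channel, and returns the heights in the order they were executed.
fn sync_blocks(total: u32, batch_size: u32) -> Vec<u32> {
    assert!(batch_size > 0, "batch_size must be positive");

    // Capacity 1: at most one fetched batch waits in the buffer.
    let (batch_sender, batch_receiver) = sync_channel::<Batch>(1);

    // Fetcher task: keeps requesting batches while the executor is busy.
    let fetcher = thread::spawn(move || {
        let mut next = 0;
        while next < total {
            let size = batch_size.min(total - next);
            let batch = fetch_batch(next, size);
            next += size;
            // `send` blocks while the buffer is full; this is the backpressure.
            if batch_sender.send(batch).is_err() {
                break; // the executor side was dropped
            }
        }
        // `batch_sender` is dropped here, which closes the channel.
    });

    // Executor task: receives batches in order until the channel is closed.
    let mut executed = Vec::new();
    for batch in batch_receiver {
        executed.extend(batch);
    }
    fetcher.join().expect("fetcher thread panicked");
    executed
}
```

Calling `sync_blocks(10, 4)` fetches three batches and executes the heights `0` through `9` in order. Because `send` blocks once one batch is buffered, the fetcher can run ahead of the executor only by a bounded amount, which is the role the description assigns to `batch_sender`.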

The change also better handles cases where a P2P request fails without a known `PeerId`.

Without this change, syncing 2M blocks took me 4 hours. With this change, it takes 20 minutes.

Checklist

  • New behavior is reflected in tests

Before requesting review

  • I have reviewed the code myself

xgreenx and others added 9 commits May 29, 2024 22:50
…It also fixes the issue of not reconnecting to the reserved nodes when a reserved node is restarted and gets a new IP.

- The change moves the reconnection handling into the `PeerReport` behavior. It removes ping-ponging of reserved peers between the primary behavior and the `PeerReport` behavior and encapsulates the logic inside `PeerReport`. It also replaces the timer with a queue of reconnections, which reduces noise in the logs (before, there were many more spurious errors).
- Added logs for cases when the dial fails. They are very helpful for debugging connection issues.
- Simplified initialization of the `ConnectionTracker` and `FuelAuthenticated`. This allows reuse of libp2p's built-in connection builder.
- Removed the usage of Mplex since it doesn't have a backpressure mechanism. Now we use Yamux by default. This is a breaking change since nodes with the old codebase can't connect to new nodes.
- Propagated `max_concurrent_streams` for yamux.
@xgreenx xgreenx requested review from bvrooman, MitchTurner and a team May 29, 2024 22:48
@xgreenx xgreenx self-assigned this May 29, 2024
if height < *range.start() {
    // The committed height is less than the start of the processing range,
    // it is a lag between committing and processing.
    None
This removes the strange logs where the committed height appears to go down. It also removes duplicate logs.
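The effect of this check can be illustrated in isolation. The sketch below is an assumption-laden example, not the function from the PR: the name `progress_in_range`, the `u32` height type, and the `Some(height)` branch are hypothetical, and only the `height < *range.start()` guard returning `None` comes from the reviewed snippet.

```rust
use std::ops::RangeInclusive;

/// Hypothetical helper: decides whether a committed height should be reported
/// as progress for the range that is currently being processed.
fn progress_in_range(height: u32, range: &RangeInclusive<u32>) -> Option<u32> {
    if height < *range.start() {
        // The committed height lags behind the processing range, so it is
        // skipped instead of being logged as if the height went backwards.
        None
    } else {
        // Assumed behavior for illustration: report the height as-is.
        Some(height)
    }
}
```

With a processing range of `10..=20`, a late commit at height `5` yields `None` and produces no log entry, while a commit at height `12` yields `Some(12)`.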

Base automatically changed from feature/fixed-p2p-reconnection-issue to master June 4, 2024 19:12
@xgreenx xgreenx merged commit 5536f5e into master Jun 4, 2024
32 checks passed
@xgreenx xgreenx deleted the feature/speed-up-synchronization branch June 4, 2024 19:55
@xgreenx xgreenx mentioned this pull request Jun 6, 2024
xgreenx added a commit that referenced this pull request Jun 7, 2024
## Version v0.28.0

### Changed
- [#1934](#1934): Updated
benchmark for the `aloc` opcode to be `DependentCost`. Updated
`vm_initialization` benchmark to exclude growing of memory (it is handled
by VM reuse).
- [#1916](#1916): Speed up
synchronisation of the blocks for the `fuel-core-sync` service.
- [#1888](#1888):
optimization: Reuse VM memory across executions.

#### Breaking

- [#1934](#1934): Changed
`GasCosts` endpoint to return `DependentCost` for the `aloc` opcode via
`alocDependentCost`.
- [#1934](#1934): Updated
default gas costs for the local testnet configuration. All opcodes
became cheaper.
- [#1924](#1924):
`dry_run_opt` has a new `gas_price: Option<u64>` argument.
- [#1888](#1888): Upgraded
`fuel-vm` to `0.51.0`. See
[release](https://github.com/FuelLabs/fuel-vm/releases/tag/v0.51.0) for
more information.

### Added
- [#1939](#1939): Added API
functions to open a RocksDB in different modes.
- [#1929](#1929): Added
support of customization of the state transition version in the
`ChainConfig`.

### Removed
- [#1913](#1913): Removed dead
code from the project.

### Fixed
- [#1921](#1921): Fixed
unstable `gossipsub_broadcast_tx_with_accept` test.
- [#1915](#1915): Fixed
reconnection issue in the dev cluster with AWS cluster.
- [#1914](#1914): Fixed
halting of the node during synchronization in PoA service.

## What's Changed
* Removed dead code by @xgreenx in
#1913
* Added backward and forward compatibility integration tests for
forkless upgrades by @xgreenx in
#1895
* Fixed halting of the node in rare conditions by @xgreenx in
#1914
* Weekly `cargo update` by @github-actions in
#1928
* Fixed logging of the WASM executor by @xgreenx in
#1930
* Added support of customization of the state transition version in the
`ChainConfig` by @xgreenx in
#1929
* Document wasm toolchain installation, add rust-toolchain.toml by
@Dentosal in #1932
* Add optional `gas_price` argument to `dry_run_opt` by @hal3e in
#1924
* Reuse VM memory across executions by @Dentosal in
#1888
* Fixed reconnection issue in the dev cluster with AWS cluster by
@xgreenx in #1915
* Speeds up synchronisation of the blocks for the `fuel-core-sync`
service by @xgreenx in #1916
* Fixed unstable `gossipsub_broadcast_tx_with_accept` test by @xgreenx
in #1921
* Added API functions to open a RocksDB in different modes by @xgreenx
in #1939
* Use `DependentCost` for `aloc` opcode by @xgreenx in
#1934

## New Contributors
* @hal3e made their first contribution in
#1924

**Full Changelog**:
v0.27.0...v0.28.0