Refactor Chain docs to add performance and troubleshooting #2264
Merged
16 commits

- `abec6ac` Refactor Chain docs to add performance and troubleshooting (henridevieux)
- `b06e6f5` refactor(server): Add redirect to Connecting to Base page (wbnns)
- `e68bc46` refactor(docs): Name of Connecting page (wbnns)
- `37b1e03` refactor(sidebar): Alphabetize (wbnns)
- `feeb629` refactor(docs): Drop external link artifact for Flashblocks (wbnns)
- `e457cd2` Drop external link to Flashblocks from sidebar (wbnns)
- `07b5664` refactor(node-performance): Add additional content (wbnns)
- `505cb5f` Add additional content to running a base node doc (wbnns)
- `7e10a1f` Add additional content to Snapshots doc (wbnns)
- `cd23765` Add content for troubleshooting page (wbnns)
- `94c3e98` Fix link to node snapshots (wbnns)
- `45ec269` Fix link to the Connecting to Base page (wbnns)
- `3b7fe2f` Fix warning boxes on running a node page (wbnns)
- `dc474a2` Fix additional and callouts (wbnns)
- `194d094` Merge branch 'master' into henri-devieux/chain-docs-refactor (wbnns)
- `057fad5` fix(docs): Link to snapshots on node performance page (wbnns)
@@ -0,0 +1,103 @@

---
title: Node Performance
slug: /node-performance
description: Guides on how to improve syncing performance of Base node clients
---

import { Button } from 'vocs/components'

# Node Performance

This guide provides recommendations for hardware, client software, and configuration settings to optimize the performance of your Base node.
## Hardware

Running a performant Base node requires adequate hardware. We recommend the following minimum specifications:

1. A modern multi-core CPU with good single-core performance.
2. At least 32 GB of RAM (64 GB recommended).
3. A locally attached NVMe SSD. RAID 0 configurations can improve performance.
4. Sufficient storage capacity, calculated as (2 × [current chain size](https://base.org/stats) + [snapshot size](https://basechaindata.vercel.app) + 20% buffer). This accounts for chain data growth and the space needed to restore a snapshot.
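The storage rule in item 4 can be turned into a quick calculation. This is a minimal sketch, not an official sizing tool: the function name `required_storage_gb` is hypothetical, the example sizes are placeholders rather than real Base figures, and it assumes the 20% buffer applies to the whole sum.

```bash
# Sketch: estimate disk capacity for a Base node from the sizing rule above.
# Assumptions: both inputs are whole gigabytes you look up yourself, and the
# 20% buffer is applied to the sum (2 * chain size + snapshot size).
required_storage_gb() {
  local chain_gb=$1 snapshot_gb=$2
  local base=$(( 2 * chain_gb + snapshot_gb ))
  # Multiply by 1.2 using integer math, rounding up to the next whole GB.
  echo $(( (base * 120 + 99) / 100 ))
}

# Placeholder sizes, not real chain data: 1000 GB chain, 500 GB snapshot.
required_storage_gb 1000 500   # prints 3000
```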
:::info

If you use Amazon Elastic Block Store (EBS), io2 Block Express volumes are recommended to ensure sufficient disk read speeds and prevent latency issues during the initial sync. However, **locally attached NVMe SSDs are strongly recommended over networked storage for optimal performance.**

:::
### Production Hardware Examples

The following are the hardware specifications used for Base production nodes:

- **Geth Full Node:**
  - Instance: AWS `i4i.12xlarge`
  - Storage: RAID 0 of all local NVMe drives (`/dev/nvme*`)
  - Filesystem: ext4
- **Reth Archive Node:**
  - Instance: AWS `i4ie.6xlarge`
  - Storage: RAID 0 of all local NVMe drives (`/dev/nvme*`)
  - Filesystem: ext4
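The RAID 0 + ext4 layout above is not spelled out as commands in this guide. One common way to build it on Linux is `mdadm`; the sketch below only *prints* the commands so you can review them before running anything. The array name `/dev/md0`, the `/data` mount point, and the device names are assumptions: list your drives with `lsblk` first and never include the root volume (on many AWS instances the EBS root disk also appears as `/dev/nvme0n1`).

```bash
# Sketch: print (do not run) commands that stripe local NVMe drives into a
# RAID 0 array and format it as ext4. /dev/md0 and /data are hypothetical
# choices; pass only instance-store devices, never the root volume.
raid0_plan() {
  if [ "$#" -lt 2 ]; then
    echo "usage: raid0_plan DEVICE DEVICE [DEVICE...]" >&2
    return 1
  fi
  echo "mdadm --create /dev/md0 --level=0 --raid-devices=$# $*"
  echo "mkfs.ext4 /dev/md0"
  echo "mkdir -p /data && mount /dev/md0 /data"
}

raid0_plan /dev/nvme1n1 /dev/nvme2n1
```

Printing instead of executing is a deliberate choice here: creating the array and formatting it destroys any data on the listed devices.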
## Initial Sync

Using a recent [snapshot](./node-snapshots.mdx) can significantly reduce the time required for the initial node synchronization process.
## Client Software

The [Base Node](https://github.com/base/node) repository contains the current stable configurations and instructions for running different client implementations.
### Supported Clients

Reth is currently the most performant client for running Base nodes, and future optimizations will primarily focus on Reth. You can read more about the migration to Reth on the [Base blog](https://blog.base.dev/scaling-base-with-reth).

| Type    | Supported Clients                                                                                        |
| :------ | :------------------------------------------------------------------------------------------------------- |
| Full    | [Reth](https://github.com/base/node/tree/main/reth), [Geth](https://github.com/base/node/tree/main/geth) |
| Archive | [Reth](https://github.com/base/node/tree/main/reth)                                                      |
### Geth Performance Tuning

#### Geth Cache Settings

For Geth nodes, tuning the cache allocation via environment variables can improve performance. These settings are used in the standard Docker configuration:

```bash
# .env.mainnet / .env.sepolia
GETH_CACHE="20480" # Total memory allowance for internal caching (MB) (default: 1024)
GETH_CACHE_DATABASE="20" # Percentage of cache memory allowance for database IO (default: 75)
GETH_CACHE_GC="12" # Percentage of cache memory allowance for garbage collection (default: 25)
GETH_CACHE_SNAPSHOT="24" # Percentage of cache memory allowance for snapshot caching (default: 10)
GETH_CACHE_TRIE="44" # Percentage of cache memory allowance for trie caching (default: 25)
```
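To see what the percentages above mean in practice, the sketch below splits the `GETH_CACHE` allowance into megabyte budgets. It is illustrative only: `geth_cache_split` is a hypothetical helper, and the 100% check is just a guard against typos when editing the example values (which happen to sum to 100%), not a documented Geth requirement.

```bash
# Sketch: show how the GETH_CACHE_* percentages divide the GETH_CACHE
# allowance (MB). Defaults mirror the example .env values above; integer
# division rounds each share down.
GETH_CACHE="${GETH_CACHE:-20480}"
GETH_CACHE_DATABASE="${GETH_CACHE_DATABASE:-20}"
GETH_CACHE_GC="${GETH_CACHE_GC:-12}"
GETH_CACHE_SNAPSHOT="${GETH_CACHE_SNAPSHOT:-24}"
GETH_CACHE_TRIE="${GETH_CACHE_TRIE:-44}"

geth_cache_split() {
  local name var pct total=0
  for name in DATABASE GC SNAPSHOT TRIE; do
    var="GETH_CACHE_${name}"
    pct=${!var}                      # bash indirect expansion: value of $var
    total=$(( total + pct ))
    echo "${name}: $(( GETH_CACHE * pct / 100 )) MB"
  done
  # Typo guard: the example shares sum to exactly 100%.
  if [ "$total" -ne 100 ]; then
    echo "warning: shares sum to ${total}%, not 100%" >&2
    return 1
  fi
}

geth_cache_split
```

With the example values this prints `DATABASE: 4096 MB`, `GC: 2457 MB`, `SNAPSHOT: 4915 MB`, and `TRIE: 9011 MB`.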
#### Geth LevelDB Tuning

For teams running Geth with LevelDB, the following patch allows setting LevelDB initialization parameters via environment variables:

[https://github.com/0x00101010/goleveldb/commit/55ef34](https://github.com/0x00101010/goleveldb/commit/55ef3429673fb70d389d052a15a4423e13d8b43c)

This patch can be applied using a `replace` directive in `go.mod` when building `op-geth`. Here's how to modify the Dockerfile:

```docker
RUN git clone $REPO --branch $VERSION --single-branch . && \
    git switch -c branch-$VERSION $COMMIT && \
    bash -c '[ "$(git rev-parse HEAD)" = "$COMMIT" ]'

RUN echo '' >> go.mod && \ // [!code ++]
    echo 'replace github.com/syndtr/goleveldb => github.com/0x00101010/goleveldb v1.0.4-param-customization' >> go.mod && \ // [!code ++]
    go mod tidy // [!code ++]

# Continue building op-geth
COPY op-geth/ ./
RUN go run build/ci.go install -static ./cmd/geth
```
Recommended LevelDB environment variable values with this patch:

```bash
# Recommended LevelDB Settings
LDB_BLOCK_SIZE="524288" # 512 KiB block size (matches common RAID 0 chunk sizes)
LDB_COMPACTION_TABLE_SIZE="8388608" # 8 MiB compaction table size (default: 2 MiB)
LDB_COMPACTION_TOTAL_SIZE="41943040" # 40 MiB total compaction size (default: 8 MiB)
LDB_DEBUG_OPTIONS="1" # Emit LevelDB debug logs
```
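The byte values above are easier to audit when derived from their units. This sketch computes them in bash and prints literal `.env` lines, since Docker Compose env files take literal values rather than evaluating shell arithmetic. The `KiB`/`MiB` helper variables are illustrative and not part of the patch.

```bash
# Sketch: derive the recommended LevelDB byte values from KiB/MiB, then
# print them as literal .env lines. KiB and MiB are local helper variables.
KiB=1024
MiB=$(( 1024 * KiB ))

LDB_BLOCK_SIZE=$(( 512 * KiB ))             # 512 KiB
LDB_COMPACTION_TABLE_SIZE=$(( 8 * MiB ))    # 8 MiB
LDB_COMPACTION_TOTAL_SIZE=$(( 40 * MiB ))   # 40 MiB
LDB_DEBUG_OPTIONS=1

for var in LDB_BLOCK_SIZE LDB_COMPACTION_TABLE_SIZE LDB_COMPACTION_TOTAL_SIZE LDB_DEBUG_OPTIONS; do
  printf '%s="%s"\n' "$var" "${!var}"   # ${!var}: value of the named variable
done
```

The printed values match the recommended settings above (`524288`, `8388608`, `41943040`, `1`).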
@@ -0,0 +1,75 @@

---
title: Node Snapshots
slug: /node-snapshots
description: Information on how to restore a Base node from a snapshot
---

# Snapshots

Using a snapshot significantly reduces the initial time required to sync a Base node. Snapshots are updated regularly.
:::warning

Geth Archive Nodes are no longer supported via snapshots due to performance limitations. For Archive functionality, please use Reth.

:::

If you're a prospective or current Base node operator, you can restore from a snapshot to speed up your initial sync. Follow the steps below carefully.
### Restoring from Snapshot

These steps assume you are in the cloned `node` directory (the one containing `docker-compose.yml`).
1. **Prepare Data Directory**:

   - **Before running Docker for the first time**, create the data directory on your host machine that will be mapped into the Docker container. This directory must match the `volumes` mapping in the `docker-compose.yml` file for the client you intend to use.
     - For Geth: `mkdir ./geth-data`
     - For Reth: `mkdir ./reth-data`
   - If you have previously run the node and have an existing data directory, **stop the node** (`docker compose down`), remove the _contents_ of the existing directory (e.g., `rm -rf ./geth-data/*`), and proceed.
2. **Download Snapshot**: Choose the appropriate snapshot for your network and client from the table below, then run its download command from the `node` directory (or use a similar tool in place of `wget`).

   | Network           | Client | Snapshot Type | Download Command                                                                                                      |
   | ----------------- | ------ | ------------- | --------------------------------------------------------------------------------------------------------------------- |
   | Testnet (Sepolia) | Geth   | Full          | `wget https://sepolia-full-snapshots.base.org/$(curl https://sepolia-full-snapshots.base.org/latest)`                 |
   | Testnet (Sepolia) | Reth   | Archive       | `wget https://sepolia-reth-archive-snapshots.base.org/$(curl https://sepolia-reth-archive-snapshots.base.org/latest)` |
   | Mainnet           | Geth   | Full          | `wget https://mainnet-full-snapshots.base.org/$(curl https://mainnet-full-snapshots.base.org/latest)`                 |
   | Mainnet           | Reth   | Archive       | `wget https://mainnet-reth-archive-snapshots.base.org/$(curl https://mainnet-reth-archive-snapshots.base.org/latest)` |
   :::info

   Ensure you have enough free disk space to download the snapshot archive (`.tar.gz` file) _and_ extract its contents. The extracted data will be significantly larger than the archive.

   :::
3. **Extract Snapshot**: Untar the downloaded snapshot archive. Replace `<snapshot-filename.tar.gz>` with the actual downloaded filename.

   ```bash
   tar -xzvf <snapshot-filename.tar.gz>
   ```
4. **Move Data**: The extraction process will likely create a directory (e.g., `geth` or similar; check the output of the `tar` command).

   - Move the _contents_ of this extracted directory into the data directory you created in Step 1.
   - Example (if the archive extracted to a `geth` folder):
     ```bash
     # For Geth
     mv ./geth/* ./geth-data/
     rm -rf ./geth # Clean up empty extracted folder
     ```
   - Example (if the archive extracted to a `reth` folder; **verify the actual folder name**):
     ```bash
     # For Reth
     mv ./reth/* ./reth-data/
     rm -rf ./reth # Clean up empty extracted folder
     ```
   - The goal is to have the chain data (directories like `chaindata`, `nodes`, `segments`, etc.) directly inside `./geth-data` or `./reth-data`, not within an extra subfolder.
5. **Start the Node**: Now that the snapshot data is in place, start the node using the appropriate command (see the [Running a Base Node](/chain/run-a-base-node#setting-up-and-running-the-node) guide):

   ```bash
   # Example for Mainnet Geth
   docker compose up --build -d
   ```
6. **Verify and Clean Up**: Monitor the node logs (`docker compose logs -f <service_name>`) or use the [sync monitoring](/chain/run-a-base-node#monitoring-sync-progress) command to ensure the node starts syncing from the snapshot's block height. Once confirmed, you can safely delete the downloaded snapshot archive (`.tar.gz` file) to free up disk space.
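Steps 1 through 4 above can be sketched as a small script for one network/client pair. This is an unofficial sketch: `snapshot_base_url` and `restore_snapshot` are hypothetical names, and it assumes you run it from the cloned `node` directory with the node stopped, that `curl`, `wget`, and `tar` are installed, that `latest` returns the archive filename (as the download commands above imply), and that the archive has a single top-level folder.

```bash
# Sketch of steps 1-4 for one network/client pair (hypothetical helpers).
snapshot_base_url() {  # snapshot_base_url NETWORK CLIENT
  case "$1/$2" in
    sepolia/geth) echo "https://sepolia-full-snapshots.base.org" ;;
    sepolia/reth) echo "https://sepolia-reth-archive-snapshots.base.org" ;;
    mainnet/geth) echo "https://mainnet-full-snapshots.base.org" ;;
    mainnet/reth) echo "https://mainnet-reth-archive-snapshots.base.org" ;;
    *) echo "unsupported network/client: $1/$2" >&2; return 1 ;;
  esac
}

restore_snapshot() {  # restore_snapshot NETWORK CLIENT
  local network=$1 client=$2 base latest archive top
  local data_dir="./${client}-data"

  # Step 1: refuse to mix a snapshot into existing chain data.
  mkdir -p "$data_dir"
  if [ -n "$(ls -A "$data_dir")" ]; then
    echo "$data_dir is not empty; stop the node and clear it first" >&2
    return 1
  fi

  # Step 2: resolve the newest archive name, then download it.
  base=$(snapshot_base_url "$network" "$client") || return 1
  latest=$(curl -fsSL "${base}/latest") || return 1
  archive=$(basename "$latest")
  wget -O "$archive" "${base}/${latest}" || return 1

  # Step 3: note the archive's top-level folder, then extract.
  top=$(tar -tzf "$archive" | head -n 1 | sed 's|^\./||' | cut -d/ -f1)
  if [ -z "$top" ] || [ "$top" = "." ]; then
    echo "could not determine the archive's top-level folder" >&2
    return 1
  fi
  tar -xzf "$archive" || return 1

  # Step 4: move the folder's contents so chain data sits directly in $data_dir.
  mv "./${top}"/* "$data_dir"/ && rmdir "./${top}"
}
```

For example, `restore_snapshot mainnet reth`. The script leaves the archive in place; per step 6, delete it only after confirming the node is syncing.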
`apps/base-docs/docs/pages/chain/node-troubleshooting.mdx`: 109 changes (109 additions, 0 deletions)
@@ -0,0 +1,109 @@

---
title: Node Troubleshooting
slug: /node-troubleshooting
description: A guide to diagnosing and resolving common issues when running a Base node.
author: hndvx
---

# Node Troubleshooting

This guide covers common issues encountered when setting up and running a Base node using the official [Base Node Docker setup](https://github.com/base/node) and provides steps to diagnose and resolve them.
## General Troubleshooting Steps

Before diving into specific issues, here are some general steps that often help:

1. **Check Container Logs**: This is usually the most informative step. Use `docker compose logs -f <service_name>` to view the real-time logs for a specific container, and look for errors, warnings, or repeated messages:
   - L2 Client (Geth): `docker compose logs -f op-geth`
   - L2 Client (Reth): `docker compose logs -f op-reth`
   - Rollup Node: `docker compose logs -f op-node`
2. **Check Container Status**: Ensure the relevant Docker containers are running: `docker compose ps`. If a container is restarting frequently or has exited, check its logs.
3. **Check Resource Usage**: Monitor your server's CPU, RAM, disk I/O, and network usage. Performance issues are often linked to insufficient resources. Tools like `htop`, `iostat`, and `iftop` can be helpful.
4. **Verify RPC Endpoints**: Use `curl` to check if the L2 client's RPC endpoint is responding (see [Running a Base Node > Verify Node is Running](./run-a-base-node#verify-node-is-running)). Also, verify your L1 endpoints are correct and accessible from the node server.
5. **Check L1 Node**: Ensure your configured L1 node (Execution and Consensus) is fully synced, healthy, and accessible. Issues with the L1 node will prevent the L2 node from syncing correctly.
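Checks 2 and 4 above can be combined into a quick health check. This is a minimal sketch, not part of the Base Node repository: `rpc_result`, `hex_to_dec`, and `node_health` are hypothetical helpers, and it assumes the default L2 HTTP RPC port `8545` on localhost, `curl` and `jq` installed, and that you run it from the cloned `node` directory.

```bash
# Sketch of checks 2 and 4 above (hypothetical helpers).
rpc_result() {  # rpc_result URL METHOD -> prints the JSON-RPC "result" field
  curl -fsS -X POST -H 'Content-Type: application/json' \
    --data "{\"jsonrpc\":\"2.0\",\"id\":1,\"method\":\"$2\",\"params\":[]}" \
    "$1" | jq -r '.result'
}

hex_to_dec() { printf '%d\n' "$1"; }  # e.g. 0x1b4 -> 436

node_health() {
  docker compose ps                     # are the containers up or restarting?
  local head
  head=$(rpc_result http://localhost:8545 eth_blockNumber)
  if [ -z "$head" ] || [ "$head" = "null" ]; then
    echo "L2 RPC on localhost:8545 did not return a block number" >&2
    return 1
  fi
  echo "L2 head block: $(hex_to_dec "$head")"
}
```

Running `node_health` a few times in a row should show the head block increasing on a healthy, syncing node.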
## Common Issues and Solutions

### Setup & Configuration Issues

- **Issue**: Docker command fails (`docker compose up ...`).
  - **Check**: Are Docker and Docker Compose installed, and is the Docker daemon running?
  - **Check**: Are you in the correct directory (the cloned `node` directory containing `docker-compose.yml`)?
  - **Check**: Are there syntax errors in the command (e.g., a misspelled `NETWORK_ENV` or `CLIENT`)?
- **Issue**: Container fails to start, and logs show errors related to `.env` files or environment variables.
  - **Check**: Did you correctly configure the L1 endpoints (`OP_NODE_L1_ETH_RPC`, `OP_NODE_L1_BEACON`) in the correct `.env` file (`.env.mainnet` or `.env.sepolia`)?
  - **Check**: Is the `OP_NODE_L1_BEACON_ARCHIVER` endpoint set if required by your configuration or L1 node?
  - **Check**: Is `OP_NODE_L1_RPC_KIND` set correctly for your L1 provider?
  - **Check**: (Reth) Are `RETH_CHAIN` and `RETH_SEQUENCER_HTTP` correctly set in the `.env` file?
- **Issue**: Errors related to the JWT secret or authentication between `op-node` and the L2 client.
  - **Check**: Ensure you haven't manually modified `OP_NODE_L2_ENGINE_AUTH` (the path to the JWT secret file) unless you know what you're doing. The `docker-compose` setup usually handles this automatically.
- **Issue**: Permission errors related to data volumes (`./geth-data`, `./reth-data`).
  - **Check**: Ensure the user running `docker compose` has write permissions to the directory where the `node` repository was cloned. Docker needs to be able to write to `./geth-data` or `./reth-data`. Running Docker commands with `sudo` can sometimes cause permission issues later; try running as a non-root user added to the `docker` group.
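Before starting the containers, you can check that the L1 variables listed above are present and non-empty. This is a sketch under stated assumptions: `check_env_file` is a hypothetical helper, it only checks `OP_NODE_L1_ETH_RPC`, `OP_NODE_L1_BEACON`, and `OP_NODE_L1_RPC_KIND` (the archiver and Reth-only variables are not always required), and it cannot tell whether a value is *correct*, only that one is set.

```bash
# Sketch: report required variables that are missing or empty in an env file.
check_env_file() {  # check_env_file PATH
  local file=$1 var missing=0
  [ -r "$file" ] || { echo "cannot read $file" >&2; return 1; }
  for var in OP_NODE_L1_ETH_RPC OP_NODE_L1_BEACON OP_NODE_L1_RPC_KIND; do
    # Accept VAR=value or VAR="value" (optionally prefixed by export),
    # with at least one non-quote, non-space character in the value.
    if ! grep -Eq "^[[:space:]]*(export[[:space:]]+)?${var}=\"?[^\"[:space:]]+" "$file"; then
      echo "missing or empty: $var" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `check_env_file .env.mainnet` prints nothing and succeeds when all three variables have values.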
### Syncing Problems

- **Issue**: Node doesn't start syncing or appears stuck (block height not increasing).
  - **Check**: `op-node` logs. Look for errors connecting to L1 endpoints or the L2 client.
  - **Check**: L2 client (`op-geth`/`op-reth`) logs. Look for errors connecting to `op-node` via the Engine API (port `8551`) or for P2P issues.
  - **Check**: L1 node health and sync status. Is the L1 node accessible and fully synced?
  - **Check**: System time. Ensure the server's clock is accurately synchronized (use `ntp` or `chrony`). Significant time drift can cause P2P issues.
- **Issue**: Syncing is extremely slow.
  - **Check**: Hardware specifications. Are you meeting the recommended specs (especially RAM and **NVMe SSD**) outlined in the [Node Performance](/chain/node-performance) guide? Disk I/O is often the bottleneck.
  - **Check**: L1 node performance. Is your L1 RPC endpoint responsive? A slow L1 node will slow down L2 sync.
  - **Check**: Network connection quality and bandwidth.
  - **Check**: `op-node` and L2 client logs for any performance warnings or errors.
- **Issue**: `optimism_syncStatus` (port `7545` on `op-node`) shows a large time difference or errors.
  - **Action**: Check the logs for both `op-node` and the L2 client (`op-geth`/`op-reth`) around the time the status was checked to identify the root cause (e.g., L1 connection issues, L2 client issues).
- **Issue**: `Error: nonce has already been used` when trying to send transactions.
  - **Cause**: The node is not yet fully synced to the head of the chain.
  - **Action**: Wait for the node to fully sync. Monitor progress using `optimism_syncStatus` or logs.
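The "time difference" reported via `optimism_syncStatus` can be estimated by comparing the unsafe L2 head timestamp with the local clock. This is a minimal sketch: `unsafe_head_timestamp` and `sync_lag_minutes` are hypothetical helpers, and it assumes `op-node` on `localhost:7545`, `curl` and `jq` installed, and an accurately synchronized system clock (see the system time check above).

```bash
# Sketch: estimate how many minutes the node is behind (hypothetical helpers).
unsafe_head_timestamp() {
  curl -fsS -X POST -H 'Content-Type: application/json' \
    --data '{"jsonrpc":"2.0","id":1,"method":"optimism_syncStatus","params":[]}' \
    http://localhost:7545 | jq -r '.result.unsafe_l2.timestamp'
}

sync_lag_minutes() {  # sync_lag_minutes HEAD_TIMESTAMP [NOW]
  local head_ts=$1 now=${2:-$(date +%s)}
  echo $(( (now - head_ts) / 60 ))   # whole minutes, rounded down
}
```

Usage: `sync_lag_minutes "$(unsafe_head_timestamp)"`. A value that stays flat or grows over several minutes points to the stuck-sync checks above.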
### Performance Issues

- **Issue**: High CPU, RAM, or Disk I/O usage.
  - **Check**: Hardware specifications against recommendations in [Node Performance](/chain/node-performance). Upgrade if necessary. Local NVMe SSDs are critical.
  - **Check**: (Geth) Review Geth cache settings and LevelDB tuning options mentioned in [Node Performance](/chain/node-performance#geth-performance-tuning) and [Advanced Configuration](/chain/run-a-base-node#geth-configuration-via-environment-variables).
  - **Check**: Review client logs for specific errors or bottlenecks.
  - **Action**: Consider using Reth if running Geth, as it's generally more performant for Base.
### Snapshot Restoration Problems

Refer to the [Snapshots](/chain/node-snapshots) guide for the correct procedure.

- **Issue**: `wget` command fails or snapshot download is corrupted.
  - **Check**: Network connectivity.
  - **Check**: Available disk space.
  - **Action**: Retry the download. Verify the download URL is correct.
- **Issue**: `tar` extraction fails.
  - **Check**: Downloaded file integrity (is it corrupted?).
  - **Check**: Available disk space (extraction requires much more space than the download).
  - **Check**: `tar` command syntax.
- **Issue**: Node fails to start after restoring snapshot; logs show database errors or missing files.
  - **Check**: Did you stop the node (`docker compose down`) _before_ modifying the data directory?
  - **Check**: Did you remove the _contents_ of the old data directory (`./geth-data/*` or `./reth-data/*`) before extracting/moving the snapshot data?
  - **Check**: Was the snapshot data moved correctly? The chain data needs to be directly inside `./geth-data` or `./reth-data`, not in a nested subfolder (e.g., `./geth-data/geth/...`). Verify the folder structure.
- **Issue**: Ran out of disk space during download or extraction.
  - **Action**: Free up disk space or provision a larger volume. Remember the storage formula: `(2 * chain_size + snapshot_size + 20% buffer)`.
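To check the "downloaded file integrity" item above before extracting, you can test the archive with standard tools. This is a sketch: `verify_snapshot_archive` is a hypothetical helper; `gzip -t` verifies the compressed stream and its checksum, and `tar -tzf` additionally confirms the tar structure can be listed. Both read the whole archive, so expect this to take a while on a full-size snapshot.

```bash
# Sketch: check a downloaded .tar.gz snapshot before extracting it.
verify_snapshot_archive() {  # verify_snapshot_archive PATH
  local archive=$1
  if ! gzip -t "$archive" 2>/dev/null; then
    echo "gzip integrity check failed: $archive" >&2
    return 1
  fi
  if ! tar -tzf "$archive" >/dev/null 2>&1; then
    echo "tar listing failed: $archive" >&2
    return 1
  fi
  echo "ok: $archive"
}
```

If the check fails, delete the file and download it again rather than trying to extract a partial archive.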
### Networking / Connectivity Issues

- **Issue**: RPC/WS connection refused (e.g., `curl` to `localhost:8545` fails).
  - **Check**: Is the L2 client container (`op-geth`/`op-reth`) running (`docker compose ps`)?
  - **Check**: Are you using the correct port (`8545` for HTTP, `8546` for WS by default)?
  - **Check**: L2 client logs. Did it fail to start the RPC server?
  - **Check**: Are the `--http.addr` and `--ws.addr` flags set to `0.0.0.0` in the client config/entrypoint to allow external connections (within the Docker network)?
- **Issue**: Node has low peer count.
  - **Check**: P2P port (default `30303`) accessibility. Is it blocked by a firewall on the host or network?
  - **Check**: Node logs for P2P errors.
  - **Action**: If behind NAT, configure the `--nat=extip:<your-ip>` flag via `ADDITIONAL_ARGS` in the `.env` file (see [Advanced Configuration](/chain/run-a-base-node#improving-peer-connectivity)).
- **Issue**: Port conflicts reported in logs or `docker compose up` fails.
  - **Check**: Are other services running on the host using the default ports (`8545`, `8546`, `8551`, `6060`, `7545`, `30303`)? Use `sudo lsof -i -P -n | grep LISTEN` or `sudo netstat -tulpn | grep LISTEN`.
  - **Action**: Stop the conflicting service or change the ports used by the Base node containers by modifying the `ports` section in `docker-compose.yml` and updating the relevant environment variables (`$RPC_PORT`, `$WS_PORT`, etc.) in the `.env` file if necessary.
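If `lsof` or `netstat` is not installed, a rough alternative for the port-conflict check above uses bash's built-in `/dev/tcp` redirection. This is a sketch with clear limits: `port_in_use` and `check_default_ports` are hypothetical helpers, they only detect TCP listeners reachable on `127.0.0.1` (not UDP discovery on `30303`, and not services bound only to another address), and they should be run *before* starting the node, since the node's own ports will otherwise show as in use.

```bash
# Sketch: detect TCP listeners on the default Base node ports via /dev/tcp.
port_in_use() {  # port_in_use PORT -> exit 0 if something accepts connections
  timeout 1 bash -c "exec 3<>/dev/tcp/127.0.0.1/$1" 2>/dev/null
}

check_default_ports() {
  local port
  for port in 8545 8546 8551 6060 7545 30303; do
    if port_in_use "$port"; then
      echo "port $port: in use"
    else
      echo "port $port: free"
    fi
  done
}
```

Any port reported as "in use" is worth investigating with the `lsof`/`netstat` commands above before changing the `ports` mappings.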
## Getting Further Help

If you've followed this guide and are still encountering issues, seek help from the community:

- **Discord**: Join the [Base Discord](https://discord.gg/buildonbase) and post in the `🛠|node-operators` channel, providing details about your setup, the issue, and relevant logs.
- **GitHub**: Check the [Base Node repository issues](https://github.com/base/node/issues) or open a new one if you suspect a bug.
i don't see this being used, is this necessary?