Confirmed on quantus-miner v3.0.1 with an NVIDIA RTX 50-series GPU environment on Ubuntu Linux.
Environment
- quantus-miner: v3.0.1
- quantus-node: v0.6.3
- quantus-cli: v1.3.3
- GPU generation: NVIDIA RTX 50-series (Blackwell)
- Driver: NVIDIA 590.48.01
- Backend: Vulkan
- OS: Ubuntu 24.04 LTS
- Node state: fully synced, peers=12, isSyncing=false
- Miner mode: external miner via QUIC to 127.0.0.1:9833
What I observed
- The miner connects successfully to the node.
- Mining jobs are received and dispatched to GPU workers.
- The GPU is detected correctly.
- However, actual GPU performance is extremely low compared with expectations.
- Prometheus metrics only show:
  - miner_active_jobs 1
  - miner_cpu_workers 0
  - miner_effective_cpus 12
  - miner_gpu_devices
  - miner_workers
- Metrics such as miner_gpu_hash_rate, miner_hash_rate, and miner_hashes_total do not appear.
Node side looks healthy
- chain_height increases normally
- isSyncing=false
- peers=12
- block_time_ema remains in a healthy range
Relevant miner logs
- QUIC connection established to 127.0.0.1:9833
- Connected to node
- Waiting for mining jobs
- Received job
- Job dispatched to GPU workers
- Worker thread assigned to GPU device(s)
- GPU dispatch config shows:
  - 3276 workgroups × 256 threads
Benchmark result
Running the built-in GPU benchmark shows unexpectedly low throughput for an RTX 50-series environment.
Important output:
- Max hardware workgroups: 65535
- Optimal workgroups: 3276
- Throughput is far below the expected GPU mining range mentioned in the official guide
Why this seems related to Issue #52
Issue #52 explains that RTX 50-series cards may fall through to the generic fallback dispatch path because there is no explicit "rtx 50" match. My benchmark and runtime behavior are consistent with that description.
The reported values:
- Max hardware workgroups: 65535
- Optimal workgroups: 3276
match the generic fallback calculation (integer division, so 3276.75 truncates to 3276):
65535 / 20 = 3276
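The arithmetic above can be checked with a minimal sketch. The function and constant names are hypothetical; only the divisor of 20 and the reported values come from the benchmark output and Issue #52, and the actual quantus-miner code may be structured differently.

```rust
/// Hypothetical constant: the divisor implied by 65535 / 20 = 3276.
const FALLBACK_DIVISOR: u32 = 20;

/// Hypothetical helper mirroring the generic fallback calculation.
/// Integer division truncates 3276.75 down to 3276.
fn fallback_workgroups(max_hardware_workgroups: u32) -> u32 {
    max_hardware_workgroups / FALLBACK_DIVISOR
}

/// Total GPU threads launched per dispatch for a given workgroup
/// count and workgroup size.
fn threads_per_dispatch(workgroups: u32, threads_per_workgroup: u32) -> u64 {
    u64::from(workgroups) * u64::from(threads_per_workgroup)
}
```

With the values from my logs, `fallback_workgroups(65535)` yields 3276, and `threads_per_dispatch(3276, 256)` yields 838,656 threads per dispatch.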
Expected behavior
According to the official mining guide, GPU mining is expected to be roughly 500-1000 MH/s. The observed performance in this RTX 50-series environment is dramatically below that range.
Impact
- Mining appears connected and active, but effective GPU throughput is severely reduced.
- This can be hard to diagnose because the miner still receives jobs and stays connected.
- Prometheus metrics do not clearly expose effective GPU hashrate in this situation.
Suggested fix
Please add an explicit "rtx 50" branch before the existing "rtx 40" branch, or replace the current substring matching with a more future-proof GPU generation parser.
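As a rough illustration of the second option, here is a sketch of a generation parser. It is not taken from the quantus-miner source: the function name `rtx_generation` is hypothetical, and it assumes the device name reported by the driver contains "RTX" followed by a four-digit model number (for example "NVIDIA GeForce RTX 5090").

```rust
/// Hypothetical helper: extract the GeForce RTX generation (20, 30, 40, 50, ...)
/// from a device name, instead of matching fixed substrings like "rtx 40".
/// Returns None when no "RTX <4-digit model>" pattern is found.
fn rtx_generation(device_name: &str) -> Option<u32> {
    let lower = device_name.to_lowercase();
    // Locate "rtx" and look at whatever follows it.
    let idx = lower.find("rtx")?;
    let rest = lower[idx + 3..].trim_start();
    // Collect the leading digits of the model number, e.g. "5090".
    let digits: String = rest.chars().take_while(|c| c.is_ascii_digit()).collect();
    if digits.len() != 4 {
        return None;
    }
    let model: u32 = digits.parse().ok()?;
    // 5090 -> 50, 4070 -> 40, 3060 -> 30.
    Some(model / 100)
}
```

A dispatch table keyed on the returned number could then treat any generation newer than the last tuned one like the newest known profile rather than dropping to the generic fallback. One known limitation of this sketch: workstation names such as "RTX 6000 Ada" would be parsed as generation 60, so a real implementation would need extra handling for those.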
Possible additional improvements
- Emit a warning when an unknown RTX generation falls back to the generic dispatch path
- Export clearer Prometheus metrics for actual GPU hashrate and total hashes
- Add a log/metric when jobs are dispatched but effective GPU throughput is unusually low
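To make the metrics and low-throughput suggestions concrete, here is a minimal standard-library-only sketch. The metric names `miner_hashes_total` and `miner_gpu_hash_rate` are the ones I expected to see; the function names, the threshold parameter, and the hand-written Prometheus text output are assumptions for illustration, not the miner's actual metrics code.

```rust
use std::time::Duration;

/// Hypothetical helper: average hash rate in hashes per second.
/// Returns 0.0 for a zero-length interval to avoid dividing by zero.
fn hash_rate(hashes: u64, elapsed: Duration) -> f64 {
    let secs = elapsed.as_secs_f64();
    if secs > 0.0 {
        hashes as f64 / secs
    } else {
        0.0
    }
}

/// Hypothetical check: true when the measured rate is below a
/// caller-supplied floor, e.g. 500e6 for the 500 MH/s lower bound
/// quoted in the mining guide.
fn is_below_expected(rate_hps: f64, min_expected_hps: f64) -> bool {
    rate_hps < min_expected_hps
}

/// Render the two metrics in Prometheus text exposition format.
fn render_metrics(hashes_total: u64, rate_hps: f64) -> String {
    format!(
        "# TYPE miner_hashes_total counter\n\
         miner_hashes_total {}\n\
         # TYPE miner_gpu_hash_rate gauge\n\
         miner_gpu_hash_rate {}\n",
        hashes_total, rate_hps
    )
}
```

When jobs are active and `is_below_expected` returns true, the miner could log a warning. That would make the situation described above visible without running the benchmark by hand.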
If helpful, I can provide additional benchmark/log excerpts without disclosing exact hardware model details.