Implement pure OpenCL batch hashing. #78

Merged · 1 commit · Jan 20, 2021

Conversation

@porcuquine (Collaborator) commented on Jan 16, 2021:

This PR implements GPU batch hashing in pure OpenCL, implemented in the new proteus module. (Proteus and Triton are both moons of Neptune, hence the naming.)

This work is intended to introduce no change of behavior when the gpu feature flag is provided. If instead the opencl feature flag is provided, a new BatcherType, BatcherType::OpenCL, can be used instead of BatcherType::GPU.
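
For illustration, selecting the new batcher from Rust might look roughly like the sketch below. Only BatcherType::GPU and BatcherType::OpenCL come from this PR; the module paths, constructor signature, and type parameters (Fr, ColumnArity, TreeArity) are assumptions made for the example, with the numeric arguments taken from the gbench runs shown further down.

use neptune::batch_hasher::BatcherType;
use neptune::column_tree_builder::ColumnTreeBuilder;

// Sketch only: the constructor arguments mirror the gbench settings below
// (leaf count, max column batch size, max tree batch size); the exact
// signature and type parameters are assumed, not quoted from the crate.
let mut builder = ColumnTreeBuilder::<Fr, ColumnArity, TreeArity>::new(
    Some(BatcherType::OpenCL), // Some(BatcherType::GPU) under the gpu feature
    134_217_728, // leaves
    400_000,     // max column batch size
    700_000,     // max tree batch size
)?;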

This implementation provides the following benefits when compared to the extant neptune-triton GPU implementation:

  • Better performance (almost 2x, see below).
  • Fewer external dependencies (removes dependence on elaborate Futhark code-generation and toolchain).
  • Much less total code.
  • Much lower GPU memory usage.
  • No known problem with multiple batch hashers being used at once (versus an outstanding bug in the current neptune-triton code path).

Once the opencl feature has been tested and stabilized, it should be made the default, for all of these reasons.

Historical context: although replacing neptune-triton is an obvious next step now, its replacement benefits from the design that went into the Rust interface to neptune-triton, from the development of rust-gpu-tools and cl-ff-gen (neither of which existed at the time of the initial GPU implementation), and from the significant learning that went into neptune-triton's development. Although the current result is simpler, the path to it was not obvious from the outset.

Speedup is ~2x on column tree building. See the gbench output below, using the same RTX 2080 Ti for both methods.

gbench with gpu feature:

RUST_LOG=info cargo run --release --features gpu,blst --no-default-features
    Finished release [optimized] target(s) in 0.07s
     Running `/home/porcuquine/dev/neptune/target/release/gbench`
[2021-01-16T00:07:05Z INFO  gbench] KiB: 4194304
[2021-01-16T00:07:05Z INFO  gbench] leaves: 134217728
[2021-01-16T00:07:05Z INFO  gbench] max column batch size: 400000
[2021-01-16T00:07:05Z INFO  gbench] max tree batch size: 700000
[2021-01-16T00:07:05Z INFO  gbench] GPU[Selector: BatcherType::GPU] --> Run 0
[2021-01-16T00:07:05Z INFO  gbench] GPU[Selector: BatcherType::GPU]: Creating ColumnTreeBuilder
[2021-01-16T00:07:05Z INFO  neptune::triton::cl] getting default futhark context
[2021-01-16T00:07:05Z INFO  neptune::triton::cl] getting context for ~Index(0)
[2021-01-16T00:07:06Z INFO  neptune::triton::cl] device: Device { brand: Nvidia, name: "GeForce RTX 2080 Ti", memory: 11551440896, bus_id: Some(33), platform: Platform(PlatformId(0x7f1f000b5590)), device: Device(DeviceId(0x7f1f000b5ae0)) }
[2021-01-16T00:07:11Z INFO  gbench] GPU[Selector: BatcherType::GPU]: ColumnTreeBuilder created
[2021-01-16T00:07:11Z INFO  gbench] GPU[Selector: BatcherType::GPU]: Using effective batch size 400000 to build columns
[2021-01-16T00:07:11Z INFO  gbench] GPU[Selector: BatcherType::GPU]: adding column batches
[2021-01-16T00:07:11Z INFO  gbench] GPU[Selector: BatcherType::GPU]: start commitment
...............................................................................................................................................................................................................................................................................................................................................
[2021-01-16T00:09:15Z INFO  gbench] GPU[Selector: BatcherType::GPU]: adding final column batch and building tree
[2021-01-16T00:09:31Z INFO  gbench] GPU[Selector: BatcherType::GPU]: end commitment
[2021-01-16T00:09:31Z INFO  gbench] GPU[Selector: BatcherType::GPU]: commitment time: 139.632183641s

gbench with opencl feature:

RUST_LOG=info cargo run --release --features opencl,blst --no-default-features
    Finished release [optimized] target(s) in 0.06s
     Running `/home/porcuquine/dev/neptune/target/release/gbench`
[2021-01-16T00:19:55Z INFO  gbench] KiB: 4194304
[2021-01-16T00:19:55Z INFO  gbench] leaves: 134217728
[2021-01-16T00:19:55Z INFO  gbench] max column batch size: 400000
[2021-01-16T00:19:55Z INFO  gbench] max tree batch size: 700000
[2021-01-16T00:19:55Z INFO  gbench] GPU[Selector: BatcherType::OpenCL] --> Run 0
[2021-01-16T00:19:55Z INFO  gbench] GPU[Selector: BatcherType::OpenCL]: Creating ColumnTreeBuilder
[2021-01-16T00:19:56Z INFO  neptune::proteus::gpu] device: Device { brand: Nvidia, name: "GeForce RTX 2080 Ti", memory: 11551440896, bus_id: Some(33), platform: Platform(PlatformId(0x7f7d240b5510)), device: Device(DeviceId(0x7f7d240b5a60)) }
[2021-01-16T00:19:58Z INFO  gbench] GPU[Selector: BatcherType::OpenCL]: ColumnTreeBuilder created
[2021-01-16T00:19:58Z INFO  gbench] GPU[Selector: BatcherType::OpenCL]: Using effective batch size 400000 to build columns
[2021-01-16T00:19:58Z INFO  gbench] GPU[Selector: BatcherType::OpenCL]: adding column batches
[2021-01-16T00:19:58Z INFO  gbench] GPU[Selector: BatcherType::OpenCL]: start commitment
...............................................................................................................................................................................................................................................................................................................................................
[2021-01-16T00:21:04Z INFO  gbench] GPU[Selector: BatcherType::OpenCL]: adding final column batch and building tree
[2021-01-16T00:21:14Z INFO  gbench] GPU[Selector: BatcherType::OpenCL]: end commitment
[2021-01-16T00:21:14Z INFO  gbench] GPU[Selector: BatcherType::OpenCL]: commitment time: 75.21889048s
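
For reference, 139.632s / 75.219s ≈ 1.86x on the 2080 Ti, consistent with the ~2x figure above.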

@dignifiedquire (Contributor) commented:

Benchmarks on an RTX 3090:

  • GPU (current): 97.686103731s
  • OpenCL (this PR): 52.741545444s

@porcuquine force-pushed the feat/opencl branch 2 times, most recently from f39156c to 678b26c on January 19, 2021 at 17:48.
@dignifiedquire (Contributor) left a comment:

two small notes, nice work

CHANGELOG.md Outdated
@@ -7,6 +7,9 @@ and this project adheres to [Semantic Versioning](https://book.async.rs/overview

## Unreleased

## 2.4.1 - 2021-1-15
@dignifiedquire (Contributor):
shouldn’t this go under unreleased technically?

@porcuquine (Collaborator, Author):
I'll change it. I was originally hoping to just release this version immediately after.

GPU(GPUBatchHasher<A>),
#[cfg(not(feature = "gpu"))]
@dignifiedquire (Contributor):
shouldn’t this be not any gpu or opencl?

@porcuquine (Collaborator, Author):
Hmmm... I think that whole variant can be removed now. My latest revision does that and seems to build fine on macOS.
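
For context, a minimal sketch of the guard the question refers to, extended to cover both GPU backends as suggested (in the final revision the guarded variant was removed entirely, so no such attribute remains):

// Hypothetical form of the suggested guard; the item it gates would only
// exist when neither GPU backend feature is enabled.
#[cfg(not(any(feature = "gpu", feature = "opencl")))]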
