
ARROW-1392: [C++] Add GPU IO interfaces for CUDA #985

Closed
wants to merge 6 commits

Conversation


wesm commented Aug 22, 2017

This makes it easy to write from host to device and read from device to host. We also need a zero-copy device reader for IPC purposes (where we don't want to move any data to the host); we can do that in a subsequent patch.
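
For context, a minimal sketch of the intended usage, assuming the CudaBufferWriter / CudaBufferReader classes added in this patch; the allocation helper, header path, and exact method signatures are illustrative, not the final API:

// Illustrative only: AllocateCudaBuffer and the Read/Write signatures are
// assumptions for the sake of the example, not the final API.
#include <cstdint>
#include <memory>
#include <vector>

#include "arrow/gpu/cuda_memory.h"  // assumed header from this patch
#include "arrow/status.h"

arrow::Status RoundTrip() {
  std::vector<uint8_t> host_data(1024, 42);

  // Allocate a device buffer (allocation helper assumed for illustration).
  std::shared_ptr<arrow::gpu::CudaBuffer> device_buffer;
  ARROW_RETURN_NOT_OK(
      arrow::gpu::AllocateCudaBuffer(host_data.size(), &device_buffer));

  // Host -> device through the new writer interface.
  arrow::gpu::CudaBufferWriter writer(device_buffer);
  ARROW_RETURN_NOT_OK(writer.Write(host_data.data(), host_data.size()));
  ARROW_RETURN_NOT_OK(writer.Close());

  // Device -> host through the new reader interface.
  arrow::gpu::CudaBufferReader reader(device_buffer);
  std::vector<uint8_t> roundtripped(host_data.size());
  int64_t bytes_read = 0;
  ARROW_RETURN_NOT_OK(
      reader.Read(roundtripped.size(), &bytes_read, roundtripped.data()));
  return arrow::Status::OK();
}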

Change-Id: I3ada86d70d3bf93d1e5dd02a17bbaa5f2faebc34
Change-Id: I205b2861797c57756a38f4c695937d354d497b53

wesm commented Aug 22, 2017

I'm going to start a quick cuda-benchmark since I'm curious if buffering writes affects performance at all


wesm commented Aug 22, 2017

cc @seibert @sklam


wesm commented Aug 22, 2017

Write buffering is important when individual writes are small; this will be important to enable when writing to GPU memory on IPC hot paths.

In this benchmark, a total of 128MB is written to a GPU buffer using chunk sizes varying from 256 bytes to 64K. In the Buffered case, writes are staged through an 8MB host buffer created with cudaMallocHost. A sketch of this staging approach follows the results below.

$ ./release/cuda-benchmark 
Run on (8 X 4200 MHz CPU s)
2017-08-22 15:33:53
Benchmark                                                  Time           CPU Iterations
----------------------------------------------------------------------------------------
BM_Writer_Buffered/256/min_time:1.000/real_time     24026339 ns   24027197 ns         58   5.20262GB/s
BM_Writer_Buffered/4k/min_time:1.000/real_time      23202300 ns   23202817 ns         60    5.3874GB/s
BM_Writer_Buffered/64k/min_time:1.000/real_time     23167510 ns   23168351 ns         60   5.39549GB/s
BM_Writer_Unbuffered/256/min_time:1.000/real_time 1691322141 ns 1691377175 ns          1   75.6804MB/s
BM_Writer_Unbuffered/4k/min_time:1.000/real_time   125177161 ns  125181662 ns         11   1022.55MB/s
BM_Writer_Unbuffered/64k/min_time:1.000/real_time   26685025 ns   26685828 ns         54   4.68428GB/s
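
To make the buffered path concrete, here is a standalone sketch of the staging idea using the plain CUDA runtime API rather than the Arrow classes in this PR: small writes accumulate in a pinned host buffer and are flushed to the device in large cudaMemcpy calls.

// Standalone illustration of the buffering strategy benchmarked above,
// using raw CUDA runtime calls (not the Arrow API in this PR).
#include <cuda_runtime.h>
#include <cstdint>
#include <cstring>

class BufferedDeviceWriter {
 public:
  BufferedDeviceWriter(uint8_t* device_dst, size_t stage_size)
      : device_dst_(device_dst), stage_size_(stage_size) {
    // Pinned host memory keeps host->device copies fast.
    cudaMallocHost(reinterpret_cast<void**>(&stage_), stage_size_);
  }
  ~BufferedDeviceWriter() {
    Flush();
    cudaFreeHost(stage_);
  }

  // Assumes nbytes <= stage_size_, as in the benchmark's small chunked writes.
  void Write(const void* data, size_t nbytes) {
    if (filled_ + nbytes > stage_size_) Flush();
    std::memcpy(stage_ + filled_, data, nbytes);
    filled_ += nbytes;
  }

  // One large copy per staging-buffer's worth of data instead of one per write.
  void Flush() {
    if (filled_ == 0) return;
    cudaMemcpy(device_dst_ + written_, stage_, filled_, cudaMemcpyHostToDevice);
    written_ += filled_;
    filled_ = 0;
  }

 private:
  uint8_t* device_dst_;
  uint8_t* stage_ = nullptr;
  size_t stage_size_;
  size_t filled_ = 0;
  size_t written_ = 0;
};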

Change-Id: Idd283920e38682a85f42621cc5123ad08837cd28
Change-Id: I7acd582f4f4ea3dba28ecd4ddb2d19c0101d2409

wesm commented Aug 22, 2017

I still need to add reader tests. On contemplation I think this should be zero-copy and return device pointers; that would be much more useful at the moment.


wesm commented Aug 23, 2017

cc @m1mc, I can copy you on GPU-related PRs if you want to help review.

…it tests

Change-Id: I84b55e37a26989f4e99dd9d4fa5ae7ed5a4ad484

wesm commented Aug 23, 2017

OK, I will merge this on a green build. These interfaces will see hardening when used on the IPC code paths. The idea is that arrow::gpu::CudaBufferReader can be used for zero-copy IPC reads, and the reconstructed record batches will contain device pointers internally. The user can then unbox these pointers into whatever data structure they want.
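
As a hedged sketch of what that unboxing could look like (the zero-copy Read overload returning a Buffer and the header path are assumptions for illustration, not the final API):

// Illustrative only: assumes a zero-copy Read overload that returns a Buffer
// wrapping device memory, as described above. Exact signatures may differ.
#include <cstdint>
#include <memory>

#include "arrow/buffer.h"
#include "arrow/gpu/cuda_memory.h"  // assumed header
#include "arrow/status.h"

arrow::Status UnboxDevicePointer(
    const std::shared_ptr<arrow::gpu::CudaBuffer>& device_buffer) {
  arrow::gpu::CudaBufferReader reader(device_buffer);

  // Zero-copy: no bytes move to the host; `slice` wraps device memory.
  std::shared_ptr<arrow::Buffer> slice;
  ARROW_RETURN_NOT_OK(reader.Read(/*nbytes=*/64, &slice));

  // "Unboxing": data() is a device pointer, suitable for cudaMemcpy, kernel
  // launches, or handing to another GPU framework.
  const uint8_t* device_ptr = slice->data();
  (void)device_ptr;
  return arrow::Status::OK();
}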

Change-Id: Id0c05025d2c5a72c0461a63b91d2f2bbb53f030e

wesm commented Aug 23, 2017

@xhochy I added const to the Tell method on files. This probably should have been done in the first place. The impact on downstream users is nil, but if any third parties have created their own implementations they will have to add const. This seems unlikely, so I don't think we need to go through a deprecation cycle for this.
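
A minimal standalone illustration (not the Arrow classes themselves) of why third-party implementations need the matching const:

// Standalone illustration, not the Arrow interfaces: adding const to a
// virtual method means every override must also add const, or it no longer
// matches the base signature.
#include <cstdint>

struct FileInterface {
  virtual ~FileInterface() = default;
  // After this change, Tell is const-qualified on the interface.
  virtual int64_t Tell() const = 0;
};

struct ThirdPartyFile : FileInterface {
  // Without `const` here, `override` would fail to compile against the
  // const-qualified base declaration.
  int64_t Tell() const override { return position_; }
  int64_t position_ = 0;
};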


wesm commented Aug 23, 2017

+1. One of the Travis CI entries had an unrelated transient failure.

asfgit closed this in 2c3a5f4 Aug 23, 2017
wesm deleted the ARROW-1392 branch August 23, 2017 13:42