
ARROW-1392: [C++] Add GPU IO interfaces for CUDA #985

Closed
wants to merge 6 commits

Conversation


wesm commented Aug 22, 2017

This makes it easy to write from host to device and read from device to host. We also need a zero-copy device reader for IPC purposes (where we don't want to move any data to the host); we can do that in a subsequent patch.
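
For context, a minimal sketch of the intended usage, assuming the CudaBufferWriter / CudaBufferReader classes added in this patch; the allocation helper, header path, and exact method signatures are illustrative, not the final API:

// Illustrative only: AllocateCudaBuffer and the Read/Write signatures are
// assumptions for the sake of the example, not the final API.
#include <cstdint>
#include <memory>
#include <vector>

#include "arrow/gpu/cuda_memory.h"  // assumed header from this patch
#include "arrow/status.h"

arrow::Status RoundTrip() {
  std::vector<uint8_t> host_data(1024, 42);

  // Allocate a device buffer (allocation helper assumed for illustration).
  std::shared_ptr<arrow::gpu::CudaBuffer> device_buffer;
  ARROW_RETURN_NOT_OK(
      arrow::gpu::AllocateCudaBuffer(host_data.size(), &device_buffer));

  // Host -> device through the new writer interface.
  arrow::gpu::CudaBufferWriter writer(device_buffer);
  ARROW_RETURN_NOT_OK(writer.Write(host_data.data(), host_data.size()));
  ARROW_RETURN_NOT_OK(writer.Close());

  // Device -> host through the new reader interface.
  arrow::gpu::CudaBufferReader reader(device_buffer);
  std::vector<uint8_t> roundtripped(host_data.size());
  int64_t bytes_read = 0;
  ARROW_RETURN_NOT_OK(
      reader.Read(roundtripped.size(), &bytes_read, roundtripped.data()));
  return arrow::Status::OK();
}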

Change-Id: I3ada86d70d3bf93d1e5dd02a17bbaa5f2faebc34
Change-Id: I205b2861797c57756a38f4c695937d354d497b53

wesm commented Aug 22, 2017

I'm going to start a quick cuda-benchmark since I'm curious if buffering writes affects performance at all


wesm commented Aug 22, 2017

cc @seibert @sklam


wesm commented Aug 22, 2017

Write buffering is important when individual writes are small; this will be important to enable when writing to GPU memory on IPC hot paths.

In this benchmark, a total of 128MB is written to a GPU buffer using chunk sizes varying from 256 bytes to 64K. In the Buffered case, writes are staged through an 8MB host buffer created with cudaMallocHost. A sketch of this staging approach follows the results below.

$ ./release/cuda-benchmark 
Run on (8 X 4200 MHz CPU s)
2017-08-22 15:33:53
Benchmark                                                  Time           CPU Iterations
----------------------------------------------------------------------------------------
BM_Writer_Buffered/256/min_time:1.000/real_time     24026339 ns   24027197 ns         58   5.20262GB/s
BM_Writer_Buffered/4k/min_time:1.000/real_time      23202300 ns   23202817 ns         60    5.3874GB/s
BM_Writer_Buffered/64k/min_time:1.000/real_time     23167510 ns   23168351 ns         60   5.39549GB/s
BM_Writer_Unbuffered/256/min_time:1.000/real_time 1691322141 ns 1691377175 ns          1   75.6804MB/s
BM_Writer_Unbuffered/4k/min_time:1.000/real_time   125177161 ns  125181662 ns         11   1022.55MB/s
BM_Writer_Unbuffered/64k/min_time:1.000/real_time   26685025 ns   26685828 ns         54   4.68428GB/s
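
To make the buffered path concrete, here is a standalone sketch of the staging idea using the plain CUDA runtime API rather than the Arrow classes in this PR: small writes accumulate in a pinned host buffer and are flushed to the device in large cudaMemcpy calls.

// Standalone illustration of the buffering strategy benchmarked above,
// using raw CUDA runtime calls (not the Arrow API in this PR).
#include <cuda_runtime.h>
#include <cstdint>
#include <cstring>

class BufferedDeviceWriter {
 public:
  BufferedDeviceWriter(uint8_t* device_dst, size_t stage_size)
      : device_dst_(device_dst), stage_size_(stage_size) {
    // Pinned host memory keeps host->device copies fast.
    cudaMallocHost(reinterpret_cast<void**>(&stage_), stage_size_);
  }
  ~BufferedDeviceWriter() {
    Flush();
    cudaFreeHost(stage_);
  }

  // Assumes nbytes <= stage_size_, as in the benchmark's small chunked writes.
  void Write(const void* data, size_t nbytes) {
    if (filled_ + nbytes > stage_size_) Flush();
    std::memcpy(stage_ + filled_, data, nbytes);
    filled_ += nbytes;
  }

  // One large copy per staging-buffer's worth of data instead of one per write.
  void Flush() {
    if (filled_ == 0) return;
    cudaMemcpy(device_dst_ + written_, stage_, filled_, cudaMemcpyHostToDevice);
    written_ += filled_;
    filled_ = 0;
  }

 private:
  uint8_t* device_dst_;
  uint8_t* stage_ = nullptr;
  size_t stage_size_;
  size_t filled_ = 0;
  size_t written_ = 0;
};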

Change-Id: Idd283920e38682a85f42621cc5123ad08837cd28
Change-Id: I7acd582f4f4ea3dba28ecd4ddb2d19c0101d2409

wesm commented Aug 22, 2017

I still need to add reader tests. On contemplation I think this should be zero-copy and return device pointers; that would be much more useful at the moment.


wesm commented Aug 23, 2017

cc @m1mc, I can copy you on GPU-related PRs if you want to help review.

…it tests

Change-Id: I84b55e37a26989f4e99dd9d4fa5ae7ed5a4ad484

wesm commented Aug 23, 2017

OK, I will merge this on a green build. These interfaces will see hardening when used on the IPC code paths. The idea is that arrow::gpu::CudaBufferReader can be used for zero-copy IPC reads, and the reconstructed record batches will contain device pointers internally. The user can then unbox these pointers into whatever data structure they want.
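
As a hedged sketch of what that unboxing could look like (the zero-copy Read overload returning a Buffer and the header path are assumptions for illustration, not the final API):

// Illustrative only: assumes a zero-copy Read overload that returns a Buffer
// wrapping device memory, as described above. Exact signatures may differ.
#include <cstdint>
#include <memory>

#include "arrow/buffer.h"
#include "arrow/gpu/cuda_memory.h"  // assumed header
#include "arrow/status.h"

arrow::Status UnboxDevicePointer(
    const std::shared_ptr<arrow::gpu::CudaBuffer>& device_buffer) {
  arrow::gpu::CudaBufferReader reader(device_buffer);

  // Zero-copy: no bytes move to the host; `slice` wraps device memory.
  std::shared_ptr<arrow::Buffer> slice;
  ARROW_RETURN_NOT_OK(reader.Read(/*nbytes=*/64, &slice));

  // "Unboxing": data() is a device pointer, suitable for cudaMemcpy, kernel
  // launches, or handing to another GPU framework.
  const uint8_t* device_ptr = slice->data();
  (void)device_ptr;
  return arrow::Status::OK();
}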

Change-Id: Id0c05025d2c5a72c0461a63b91d2f2bbb53f030e

wesm commented Aug 23, 2017

@xhochy I added const to the Tell method on files. This probably should have been done in the first place. The impact on downstream users is nil, but if any third parties have created their own implementations they will have to add const. This seems unlikely, so I don't think we need to go through a deprecation cycle for this.
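
A minimal standalone illustration (not the Arrow classes themselves) of why third-party implementations need the matching const:

// Standalone illustration, not the Arrow interfaces: adding const to a
// virtual method means every override must also add const, or it no longer
// matches the base signature.
#include <cstdint>

struct FileInterface {
  virtual ~FileInterface() = default;
  // After this change, Tell is const-qualified on the interface.
  virtual int64_t Tell() const = 0;
};

struct ThirdPartyFile : FileInterface {
  // Without `const` here, `override` would fail to compile against the
  // const-qualified base declaration.
  int64_t Tell() const override { return position_; }
  int64_t position_ = 0;
};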


wesm commented Aug 23, 2017

+1. One of the Travis CI entries had an unrelated transient failure.

asfgit closed this in 2c3a5f4 Aug 23, 2017
wesm deleted the ARROW-1392 branch August 23, 2017 13:42