-
Notifications
You must be signed in to change notification settings - Fork 3.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ARROW-1392: [C++] Add GPU IO interfaces for CUDA #985
Conversation
Change-Id: I3ada86d70d3bf93d1e5dd02a17bbaa5f2faebc34
Change-Id: I205b2861797c57756a38f4c695937d354d497b53
I'm going to start a quick |
Write buffering is important when the size of individual writes are small; this will be important to enable when writing to GPU memory on IPC hot paths. In this benchmark, a total of 128MB is written to a GPU buffer using a chunk size varying from 256 bytes to 64K. In the Buffered case, an 8MB host buffer created with $ ./release/cuda-benchmark
Run on (8 X 4200 MHz CPU s)
2017-08-22 15:33:53
Benchmark Time CPU Iterations
----------------------------------------------------------------------------------------
BM_Writer_Buffered/256/min_time:1.000/real_time 24026339 ns 24027197 ns 58 5.20262GB/s
BM_Writer_Buffered/4k/min_time:1.000/real_time 23202300 ns 23202817 ns 60 5.3874GB/s
BM_Writer_Buffered/64k/min_time:1.000/real_time 23167510 ns 23168351 ns 60 5.39549GB/s
BM_Writer_Unbuffered/256/min_time:1.000/real_time 1691322141 ns 1691377175 ns 1 75.6804MB/s
BM_Writer_Unbuffered/4k/min_time:1.000/real_time 125177161 ns 125181662 ns 11 1022.55MB/s
BM_Writer_Unbuffered/64k/min_time:1.000/real_time 26685025 ns 26685828 ns 54 4.68428GB/s |
Change-Id: Idd283920e38682a85f42621cc5123ad08837cd28
Change-Id: I7acd582f4f4ea3dba28ecd4ddb2d19c0101d2409
I still need to add reader tests. On contemplation I think this should be zero-copy and return device pointers; that would be much more useful at the moment. |
cc @m1mc, I can copy you on GPU related PRs if you want to help review |
…it tests Change-Id: I84b55e37a26989f4e99dd9d4fa5ae7ed5a4ad484
OK, I will merge this on a green build. These interfaces will see hardening when used on the IPC code paths. The idea is that |
Change-Id: Id0c05025d2c5a72c0461a63b91d2f2bbb53f030e
@xhochy I added const to the |
+1. One of the Travis CI entries had an unrelated transient failure |
This makes it easy to write from host to device and read from device to host. We also need a zero-copy device reader for IPC purposes (where we don't want to move any data to the host), can do that in a subsequent patch.