Transform for multicore CPUs #626
"}\n";

m_count = detail::iterator_range_size(first, last);
#ifndef __APPLE__
Hmm, I wonder if it would be better to use run-time dispatching (something like `device.platform().name() == "Apple"`) rather than compile-time dispatching. It is probably unlikely, but something compiled on an Apple host may be executed on a non-Apple compute device, or vice versa.
Either way, could you add a comment explaining why we have this special case for Apple?
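To make the two dispatch styles under discussion concrete, here is a minimal sketch. The helper names `cpu_copy_enabled_at_compile_time` and `cpu_copy_enabled_at_run_time` are hypothetical and not part of Boost.Compute; the run-time variant assumes the caller passes in the string obtained from `device.platform().name()`.

```cpp
#include <string>

// Compile-time dispatch, in the spirit of the #ifndef __APPLE__ guard in the
// diff: the decision depends on the host the code was compiled on, not on the
// OpenCL platform that eventually runs the kernel.
// (Hypothetical helper, for illustration only.)
inline bool cpu_copy_enabled_at_compile_time()
{
#ifndef __APPLE__
    return true;
#else
    // Special case: the Apple OpenCL compiler for the CPU cannot compile
    // the kernel, so the old code path is kept there.
    return false;
#endif
}

// Run-time dispatch: the decision depends on the name of the platform that
// owns the device. The argument is assumed to come from
// device.platform().name(). (Hypothetical helper, for illustration only.)
inline bool cpu_copy_enabled_at_run_time(const std::string& platform_name)
{
    return platform_name != "Apple";
}
```

The run-time variant costs one string comparison per call, but it follows the device actually in use rather than the build host.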
> `device.platform().name() == "Apple"`

I can do this. I was just worried about using that kind of condition in code; `__APPLE__` seemed more reliable.
> Either way, could you add a comment explaining why we have this special case for Apple?

Sure. The Apple OpenCL platform (at least its compiler for the CPU) has some kind of bug that makes some kernels impossible to compile. I've figured out that the bug shows up when two conditions hold: (1) there is a loop whose condition compares two variables (e.g. `index < last_index`; with `index < 10000` everything is fine), and (2) inside that loop a constant or the result of a function is written into a buffer (if you copy a value from one buffer to another, e.g. `buf1[idx] = buf2[idx];`, the bug does not appear). That's why some tests work and some don't.
Examples: `_buf0[i]=42;`, `_buf0[i]=ret42();`.
The same bug is a problem for vexcl: ddemidov/vexcl#92.
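As an illustration of the two conditions described above, the following sketch builds two OpenCL kernel sources as C++ strings. Both kernels are reconstructed from the description only and are not the kernels generated by `copy_on_device_cpu()`; the function names, kernel names and parameters are hypothetical.

```cpp
#include <string>

// Pattern reported to trigger the compiler bug: the loop condition compares
// two variables, and the loop body stores a constant into a buffer.
// (Hypothetical kernel, reconstructed from the description.)
inline std::string triggering_kernel_source()
{
    return
        "__kernel void fill(__global int* _buf0,\n"
        "                   const uint first,\n"
        "                   const uint last_index)\n"
        "{\n"
        "    for(uint index = first; index < last_index; index++){\n"
        "        _buf0[index] = 42;\n"
        "    }\n"
        "}\n";
}

// Pattern reported to compile fine: same loop, but the body copies a value
// from one buffer to another instead of storing a constant.
// (Hypothetical kernel, reconstructed from the description.)
inline std::string non_triggering_kernel_source()
{
    return
        "__kernel void copy(__global int* buf1,\n"
        "                   __global const int* buf2,\n"
        "                   const uint first,\n"
        "                   const uint last_index)\n"
        "{\n"
        "    for(uint idx = first; idx < last_index; idx++){\n"
        "        buf1[idx] = buf2[idx];\n"
        "    }\n"
        "}\n";
}
```

Whether these exact kernels reproduce the failure on a given Apple OpenCL version is untested; they only mirror the conditions listed in the comment.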
I wonder if there is a way to mock
f58352f to 2ce959a
Yet another bug on Apple OpenCL Platform.
2ce959a to a10e7d3
This improves `transform()` (`copy_on_device()`) performance for multicore CPUs. This update is turned off for the Apple OpenCL platform because its compiler for the CPU does not work correctly and cannot compile the kernel in `copy_on_device_cpu()`; see https://travis-ci.org/boostorg/compute/jobs/142560405. In other words, on Apple we keep the old performance.