Some test failures with A770 #302
Comments
It looks at the thread count, but since that is generally set to 1, in #304 I've changed it to look at the number of CPU threads instead.
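For readers unfamiliar with the distinction, here is a minimal sketch of the two quantities this comment most likely refers to. This is an assumption: the test runner's code is not shown in this thread. `Threads.nthreads()` reports the threads the Julia session was started with (1 unless `-t`/`JULIA_NUM_THREADS` is set), while `Sys.CPU_THREADS` reports the logical CPU threads of the machine.

```julia
# Assumption: these are the two values the comment contrasts; the actual
# test-runner code is not shown in this thread.
julia_threads = Threads.nthreads()  # threads of this Julia session (defaults to 1)
cpu_threads = Sys.CPU_THREADS       # logical CPU threads available on the machine

println("Julia threads: ", julia_threads)
println("CPU threads:   ", cpu_threads)
```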
Small update: I tried the latest code from the master branch. I can confirm that the test suite now uses all the CPU threads, and that the same tests still fail.
I can confirm the failures. One was only introduced recently, in the test from JuliaGPU/GPUArrays.jl#459. They should be reduced to MWEs we can debug further, but I don't have the time for that right now.
Isolated the |
MWE for the other issue:
Reduced to:

```julia
using oneAPI

# complete reduction values by a group, using local memory for communication
@inline function partialsum_group(val::T, neutral) where {T}
    items = get_local_size(0)
    item = get_local_id(0)

    # local mem for a complete reduction
    shared = oneLocalArray(T, (1024,))
    @inbounds shared[item] = val

    # perform a reduction
    d = 1
    while d < items
        barrier()
        index = 2 * d * (item-1) + 1
        @inbounds if index <= items
            other_val = if index + d <= items
                shared[index+d]
            else
                neutral
            end
            shared[index] = shared[index] + other_val
        end
        d *= 2
    end

    # load the final value on the first item
    if item == 1
        val = @inbounds shared[item]
    end

    return val
end

# partial reduction of the input vector, using a grid-stride loop
function partialsum(elements, reduced, input)
    localIdx = get_local_id(0)
    localDim = get_local_size(0)
    groupIdx = get_group_id(0)
    groupDim = get_num_groups(0)

    @inbounds begin
        # load the neutral value
        #
        # for this MWE, the value is always 0, but hard-coding it makes the bug occur
        # less often.
        neutral = unsafe_load(reduced, groupIdx)
        val = neutral + neutral

        # reduce serially across chunks of the input vector.
        #
        # for this MWE, we only execute the loop body once (as we allocate exactly
        # items * groups elements), but removing the loop makes the bug happen less often.
        ireduce = localIdx + (groupIdx - 1) * localDim
        while ireduce <= elements
            val = val + unsafe_load(input, ireduce)
            ireduce += localDim * groupDim
        end

        # reduce all values within the group
        val = partialsum_group(val, neutral)

        # write back to memory
        if localIdx == 1
            unsafe_store!(reduced, val, groupIdx)
        end
    end

    return
end

function reduced(items=800, groups=100)
    elements = Int32(items * groups)
    input = oneAPI.ones(Int32, elements)
    reduced = oneAPI.zeros(Int32, groups)
    @oneapi items=items groups=groups partialsum(elements, pointer(reduced), pointer(input))
    reduced
end

function main()
    for x in 1:10
        x = reduced()
        println(x)
        all(isequal(800), Array(x)) || return false
    end
    return true
end
```

Filed upstream as intel/compute-runtime#636
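To make the expected result concrete, the following CPU-only sketch replays the index arithmetic of `partialsum_group` sequentially for a single work-group. The helper name `group_sum_reference` is hypothetical and not part of oneAPI.jl. It assumes each round of the `while` loop is separated by a barrier; within one round the slots written (`index`) and the slots read (`index + d`) never coincide, so running the items one after another gives the same result a correct GPU execution should produce.

```julia
# Hypothetical CPU reference, not part of oneAPI.jl: simulates one work-group
# of the tree reduction in partialsum_group using a plain Vector as the
# "shared" local memory.
function group_sum_reference(vals::Vector{T}, neutral::T) where {T}
    items = length(vals)
    shared = copy(vals)
    d = 1
    while d < items
        # on the GPU, barrier() separates these rounds
        for item in 1:items
            index = 2 * d * (item - 1) + 1
            if index <= items
                # fall back to the neutral element past the end of the group
                other_val = index + d <= items ? shared[index + d] : neutral
                shared[index] = shared[index] + other_val
            end
        end
        d *= 2
    end
    return shared[1]
end

# one group of 800 items, each contributing 1, as in the MWE above
result = group_sum_reference(ones(Int32, 800), Int32(0))
println(result)
```

With 800 ones as input, this returns 800, the value `main()` checks for in every group. Any other value in the GPU output therefore points at the device-side execution rather than at the indexing scheme.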
These should be fixed.
Running the oneAPI.jl release v1.0.2 with Julia 1.8.5 on Fedora 37 server (kernel 6.2.8), I get a couple of test failures when running
pkg> test oneAPI
like in the README. The CPU is an Intel Xeon E5-2695 v3 with 128 GB of RAM and the GPU is an Intel Arc A770 16 GB. Also, can the test suite use more than 1 core?