Not sure if this is a feature request or a bug report. It is unclear to me which operations on DataFrames, if any, are thread-safe. For example, I would have expected the code below to be thread-safe, since each thread operates on a different part of memory, yet it explodes in a cloud of corruption:
using DataFrames
#WARNING: DO NOT RUN THIS
function tsmwecorrupt(N=100_000)
    df = DataFrame(rand(N,100))
    df.grpcol = (i->i%50).(1:N)
    Threads.@threads for sdf ∈ groupby(df, :grpcol)
        sdf.x3 .= -1.
    end
    println(sum(df.x3))
end
tsmwecorrupt()
OUTPUT:
(many pages of garbage)

On the other hand, this code seems fine:
function tsmwe(N=100_000)
    df = DataFrame(rand(N,100))
    df.grpcol = (i->i%50).(1:N)
    Threads.@threads for r ∈ eachrow(df)
        r.x3 = -1.
    end
    println(sum(df.x3))
end
tsmwe()
OUTPUT:
-100000.0

and so does this:
function tsmwe2(N=100_000)
    df = DataFrame(rand(N,100))
    df.grpcol = (i->i%50).(1:N)
    Threads.@threads for sdf ∈ collect(groupby(df, :grpcol))
        sdf.x3 .= -1.
    end
    println(sum(df.x3))
end
tsmwe2()
OUTPUT:
-100000.0

If this is not a bug, then perhaps this issue can serve as a feature request for thread safety under a wider variety of use cases.
EDIT: Also see 1896
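
Until it is clear which operations are thread-safe, one workaround is to keep DataFrames objects out of the threaded loop entirely: pull out the column as a plain vector, build the per-group row indices on the main thread, and let the threads touch only ordinary arrays. The following is a minimal sketch, not a statement about what the library guarantees. It assumes DataFrames 1.x, where the matrix constructor requires the `:auto` argument that the snippets above predate, and the names `tsmwe_plain`, `rows` and `parts` are made up for illustration.

```julia
using DataFrames

# Hypothetical workaround: threads only ever write to a plain Vector{Float64}.
function tsmwe_plain(N=100_000)
    df = DataFrame(rand(N, 100), :auto)   # :auto is needed on DataFrames 1.x (assumption)
    df.grpcol = (i -> i % 50).(1:N)

    x3 = df.x3                            # the column itself, not a copy

    # Build the row indices of each group serially, before any threading.
    rows = Dict{Int, Vector{Int}}()
    for (i, g) in enumerate(df.grpcol)
        push!(get!(() -> Int[], rows, g), i)
    end
    parts = collect(values(rows))         # indexable, as Threads.@threads requires

    # Each thread writes to a disjoint set of indices of the same vector.
    Threads.@threads for idx in parts
        x3[idx] .= -1.0
    end

    return sum(df.x3)
end

println(tsmwe_plain())
```

This gives up the convenience of `groupby`, but it makes the claim that each thread operates on different memory explicit, because the only shared object in the loop is a `Vector{Float64}` written at non-overlapping indices.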