-
Notifications
You must be signed in to change notification settings - Fork 194
Closed
Description
The sample! function returns incorrectly biased samples when the destination array is the same as the array being sampled.
There are sampling methods that can work correctly in-place such as the Fischer–Yates shuffle and StatsBase implements this algorithm for sampling as StatsBase.fisher_yates_sample.
The Julia implementation, however, does the wrong thing when the destination array is the same as the source array:
julia> using StatsBase
julia> let counts = zeros(5)
for _ in 1:10^5
let a = [1, 2, 3, 4, 5]
for samp in StatsBase.fisher_yates_sample!(a, a)
counts[samp] += 1
end
end
end
counts
end
5-element Array{Float64,1}:
19909.0
66047.0
104055.0
138017.0
171972.0This seems to be the case for sample! more generally, regardless of whether sampling is done with or without replacement.
Without replacement:
julia> let counts = zeros(5)
for _ in 1:10^5
let a = [1, 2, 3, 4, 5]
# the destination is a copy of a
for samp in StatsBase.sample!(copy(a), a, replace=false)
counts[samp] += 1
end
end
end
counts
end
5-element Array{Float64,1}:
100000.0
100000.0
100000.0
100000.0
100000.0
julia> let counts = zeros(5)
for _ in 1:10^5
let a = [1, 2, 3, 4, 5]
# the destination is a
for samp in StatsBase.sample!(a, a, replace=false)
counts[samp] += 1
end
end
end
counts
end
5-element Array{Float64,1}:
20058.0
65809.0
104801.0
137268.0
172064.0With replacement:
julia> let counts = zeros(5)
for _ in 1:10^5
let a = [1, 2, 3, 4, 5]
# the destination is copy of a
for samp in StatsBase.sample!(copy(a), a, replace=true)
counts[samp] += 1
end
end
end
counts
end
5-element Array{Float64,1}:
99871.0
100068.0
100124.0
100303.0
99634.0
julia> let counts = zeros(5)
for _ in 1:10^5
let a = [1, 2, 3, 4, 5]
# the destination is a
for samp in StatsBase.sample!(a, a, replace=true)
counts[samp] += 1
end
end
end
counts
end
5-element Array{Float64,1}:
41455.0
76065.0
104968.0
128636.0
148876.0With ordering and replacement:
julia> let counts = zeros(5)
for _ in 1:10^5
let a = [1, 2, 3, 4, 5]
# the destination is copy of a
for samp in StatsBase.sample!(copy(a), a, ordered=true, replace=true)
counts[samp] += 1
end
end
end
counts
end
5-element Array{Float64,1}:
99974.0
99499.0
99764.0
100527.0
100236.0
julia> let counts = zeros(5)
for _ in 1:10^5
let a = [1, 2, 3, 4, 5]
# the destination is a
for samp in StatsBase.sample!(a, a, ordered=true, replace=true)
counts[samp] += 1
end
end
end
counts
end
5-element Array{Float64,1}:
41697.0
76248.0
104670.0
128734.0
148651.0using Julia 1.5.3 and StatsBase v0.33.2.
mahiki
Metadata
Metadata
Assignees
Labels
No labels