Histogram dropping values when dealing with signed zero #766

Closed
mtanneau opened this issue Feb 18, 2022 · 2 comments · Fixed by #768

@mtanneau

I encountered this issue through UnicodePlots (JuliaPlots/UnicodePlots.jl#229), but I believe the root cause lies in StatsBase.

It appears that histogram drops some entries when the left end of the data is a negative zero, -0.0.
From the original issue:

julia> histogram([0.0, 1.0])
              ┌                                        ┐ 
   [0.0, 0.5) ┤█████████████████████████████████████  1  
   [0.5, 1.0) ┤  0                                       
   [1.0, 1.5) ┤█████████████████████████████████████  1  
              └                                        ┘ 
                               Frequency

julia> histogram([-0.0, 1.0])
              ┌                                        ┐ 
   [0.0, 0.5) ┤  0                                         # where did that zero go?
   [0.5, 1.0) ┤  0                                       
   [1.0, 1.5) ┤█████████████████████████████████████  1  
              └                                        ┘ 
                               Frequency

julia> histogram([0.0, -0.0])
              ┌                                        ┐ 
   [0.0, 1.0) ┤█████████████████████████████████████  1    # only one value?
              └                                        ┘ 
                               Frequency

On the one hand, I can understand that -0.0 would technically fall in a different bin from +0.0 because of floating-point ordering.
However, it was surprising to see those entries dropped from the total count.
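
For readers unfamiliar with the ordering in question, here is a small sketch using only Base Julia. It shows that the two zeros compare equal under `==` and `<`, while the total order `isless` places -0.0 strictly before 0.0. The variable names are illustrative only.

```julia
neg, pos = -0.0, 0.0

equal_ieee  = neg == pos          # IEEE comparison: the zeros are equal
less_ieee   = neg < pos           # IEEE comparison: neither is smaller
less_total  = isless(neg, pos)    # total order: -0.0 sorts before 0.0
equal_total = isequal(neg, pos)   # total order: the zeros are distinct

println((equal_ieee, less_ieee, less_total, equal_total))
# prints (true, false, true, false)
```

Any code that places values with `isless` rather than `<` can therefore see -0.0 as lying below an edge at 0.0.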

@t-bltg

t-bltg commented Feb 19, 2022

I think this will be more representative in StatsBase:

julia> using StatsBase
julia> fit(Histogram, [-0.0, 1.0])
Histogram{Int64, 1, Tuple{StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}}}}
edges:
  0.0:0.5:1.5
weights: [0, 0, 1]
closed: left
isdensity: false

julia> fit(Histogram, [-0.0, 1.0], closed=:right)
Histogram{Int64, 1, Tuple{StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}}}}
edges:
  -0.5:0.5:1.0
weights: [1, 0, 1]
closed: right
isdensity: false
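
One plausible mechanism for the `closed=:left` result, sketched with Base Julia only: if the bin index is found by a sorted search over the edges using the default `isless` ordering, -0.0 lands before the first edge and gets index 0, which is outside every bin. This is an assumption about how the lookup works; the thread does not show the StatsBase internals.

```julia
# Same edges as the closed=:left fit above, written out as a vector.
edges = [0.0, 0.5, 1.0, 1.5]

# Default ordering is isless, so -0.0 sorts before the edge at 0.0.
idx_default = searchsortedlast(edges, -0.0)

# With IEEE `<`, -0.0 is not less than 0.0, so it falls in the first bin.
idx_ieee = searchsortedlast(edges, -0.0; lt = <)

println((idx_default, idx_ieee))
# prints (0, 1)
```

An index of 0 would explain why the value is not counted rather than being assigned to a neighbouring bin.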

@nalimilan
Member

Good catch. Even beyond the bug, it seems that treating both zeros exactly the same makes more sense for histograms. See #768.
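
As a user-side illustration of treating both zeros the same, the sketch below normalizes -0.0 to +0.0 before the data would be passed to a histogram. `normalize_zero` is a hypothetical helper, not part of StatsBase, and this is not necessarily what #768 does.

```julia
# Hypothetical helper (not a StatsBase function): map any zero to +0.0
# and leave every other value unchanged.
normalize_zero(x::AbstractFloat) = iszero(x) ? zero(x) : x

data = [-0.0, 1.0]
cleaned = normalize_zero.(data)

println(signbit.(data))      # Bool[1, 0]: the first entry is a negative zero
println(signbit.(cleaned))   # Bool[0, 0]: the sign bit has been cleared
```

On affected versions, the cleaned vector could then be passed to `fit(Histogram, cleaned)` as a workaround.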
