Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] Scatter plots silently plot wrong data when NaNs are present #3258

Closed
niclasmattsson opened this issue Jan 29, 2021 · 8 comments · Fixed by #3320
Closed

[BUG] Scatter plots silently plot wrong data when NaNs are present #3258

niclasmattsson opened this issue Jan 29, 2021 · 8 comments · Fixed by #3320
Labels

Comments

@niclasmattsson
Copy link

Details

When you plot an XY scatter plot with additional dimensions of data in markers or colors, any NaNs in the XY data will shift the markers and colors and cause Plots to display incorrect data.

I hope this gets immediate attention because I think this is a super serious problem. I very nearly submitted a paper with completely messed up results because of this.

Demo (correct): note that the third circle is large and green.

julia> scatter([1,2,3], [3,2,1], markersize=[30,10,30], c=[1,2,3], xlims=(0,4), ylims=(0,4), legend=false)

newplot (2)

Now change a coordinate for circle 2 to NaN and note what happens with circle 3.

julia> scatter([1,2,3], [3,NaN,1], markersize=[30,10,30], c=[1,2,3], xlims=(0,4), ylims=(0,4), legend=false)

newplot (3)

Backends

This bug occurs on ( insert x below )

Backend yes no untested
gr (default) X
pyplot X
plotly X
plotlyjs X
pgfplotsx X
inspectdr X

Versions

Plots.jl version: 1.10.2
Backend version (]st -m):
Output of versioninfo():

julia> versioninfo()
Julia Version 1.6.0-beta1.0
Commit b84990e1ac (2021-01-08 12:42 UTC)
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: Intel(R) Core(TM) i5-5200U CPU @ 2.20GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-11.0.0 (ORCJIT, broadwell)
Environment:
  JULIA_EDITOR = "C:\Program Files\Microsoft VS Code\Code.exe"
  JULIA_NUM_THREADS = 4
  JULIA_PKG_DEVDIR = C:/Stuff
@daschw
Copy link
Member

daschw commented Jan 29, 2021

I can see how this is unexpected and that is really unfortunate. However, I am afraid that this is the expected behavior, because NaNs are used in Plots to seperate series segments as illustrated in reference image 49. Changing this would also break the behavior described in http://docs.juliaplots.org/latest/input_data/#Unconnected-Data-within-same-groups.

@niclasmattsson
Copy link
Author

Yes, I've used that segment functionality and found it very convenient. But now that this bug/feature bit me really hard I think something needs to change. I had no lines in my figure so segment breaks weren't even on the radar in my case. Further, NaNs naturally arise in scientific computations and since they propagate correctly without errors (and plot invisibly as expected), many times people don't bother detecting and handling them. Having Plots silently shift other dimensions when NaNs are present is just too dangerous behavior for the main visualization tool in a computations-oriented language. So let me begin brainstorming how to resolve this:

  • Could Plots check the lengths of all the data dimensions and throw an error or warning when they don't match? In my example above there are unused elements in the marker and color vectors.
  • Similarly, maybe Plots could require that when NaNs are used as segment breaks, there must be NaNs in that position in all data dimensions?
  • Maybe Plots could come up with some other marker to indicate segment breaks?

@RodolfoFigueroa
Copy link

RodolfoFigueroa commented Jan 29, 2021

I have to agree. The segment functionality is very handy, but having Plots shift parameter arrays without so much as a warning is extremely dangerous. I'm currently combing through my old code to see if this affected any of my plots, because I had no idea this "feature" existed until now.

@mkborregaard
Copy link
Member

I think @niclasmattsson first suggestion:

Could Plots check the lengths of all the data dimensions and throw an error or warning when they don't match? In my example above there are unused elements in the marker and color vectors.

sounds like a good solution.

@daschw
Copy link
Member

daschw commented Feb 1, 2021

If, I'm not mistaken, then we would not be able to allow automatic cycling anymore, like in

scatter(rand(6), color=[:red, :blue, :green], marker=[:square, :circle])

I'm not sure how widely these things are used and maybe it would be better to be a little bit more restrictive regarding the input in Plots and I am open to discuss this, but a change like this would be really breaking and could only happen in Plots 2.0

@niclasmattsson
Copy link
Author

How about semiautomatic cycling then? :)

scatter(rand(6), color=[:red, :blue, :green], marker=[:square, :circle], cycle=true)

That keyword argument flag suggests a fourth idea for my brainstorm list: maybe interpreting NaNs as segment breaks must be enabled by specifically adding nanbreaks=true?

I understand that any breaking change has to wait for the next major release. In any case, thanks for listening and taking this seriously.

@mkborregaard
Copy link
Member

Relevant issues
#2980
#1325
#1151 (mysteriously closed)

@yha
Copy link
Member

yha commented Feb 24, 2021

I've just found this issue report, after writing a fix (I'll send the PR soon, after adding some tests). I've been bitten by this several times recently.
I think the current behavior is bad enough to warrant fixing at the price of a breaking change. NaNs can appear in data "at random" so it should not affect how indexes in data correspond to indexes in attributes. It's seems to more much more likely to silently produce wrong plots than to be used intentionally.
Also, this behavior seems to be somewhat new: I see that it was introduced in #2940 (31 August 2020), and before that there was no consistency among backends. According to the "before" example there, the pyplot backend was already doing the right thing (at least for the markershape and color attributes) before that PR. So I think this should be considered a regression to be fixed rather than intended behavior.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants