[BUG] Scatter plots silently plot wrong data when NaNs are present #3258

niclasmattsson · 2021-01-29T13:19:00Z

Details

When you plot an XY scatter plot with additional dimensions of data in markers or colors, any NaNs in the XY data will shift the markers and colors and cause Plots to display incorrect data.

I hope this gets immediate attention because I think this is a super serious problem. I very nearly submitted a paper with completely messed up results because of this.

Demo (correct): note that the third circle is large and green.

julia> scatter([1,2,3], [3,2,1], markersize=[30,10,30], c=[1,2,3], xlims=(0,4), ylims=(0,4), legend=false)

Now change a coordinate for circle 2 to NaN and note what happens with circle 3.

julia> scatter([1,2,3], [3,NaN,1], markersize=[30,10,30], c=[1,2,3], xlims=(0,4), ylims=(0,4), legend=false)

Backends

This bug occurs on ( insert x below )

Backend        yes   untested
gr (default)   X
pyplot               X
plotly         X
plotlyjs             X
pgfplotsx            X
inspectdr            X

Versions

Plots.jl version: 1.10.2
Backend version (]st -m):
Output of versioninfo():

julia> versioninfo()
Julia Version 1.6.0-beta1.0
Commit b84990e1ac (2021-01-08 12:42 UTC)
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: Intel(R) Core(TM) i5-5200U CPU @ 2.20GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-11.0.0 (ORCJIT, broadwell)
Environment:
  JULIA_EDITOR = "C:\Program Files\Microsoft VS Code\Code.exe"
  JULIA_NUM_THREADS = 4
  JULIA_PKG_DEVDIR = C:/Stuff

daschw · 2021-01-29T15:27:54Z

I can see how this is unexpected, and that is really unfortunate. However, I am afraid that this is the expected behavior, because NaNs are used in Plots to separate series segments, as illustrated in reference image 49. Changing this would also break the behavior described in http://docs.juliaplots.org/latest/input_data/#Unconnected-Data-within-same-groups.

niclasmattsson · 2021-01-29T19:18:20Z

Yes, I've used that segment functionality and found it very convenient. But now that this bug/feature bit me really hard, I think something needs to change. I had no lines in my figure, so segment breaks weren't even on the radar in my case. Further, NaNs naturally arise in scientific computations, and since they propagate correctly without errors (and plot invisibly as expected), many times people don't bother detecting and handling them. Having Plots silently shift other dimensions when NaNs are present is just too dangerous a behavior for the main visualization tool in a computations-oriented language. So let me begin brainstorming how to resolve this:

Could Plots check the lengths of all the data dimensions and throw an error or warning when they don't match? In my example above there are unused elements in the marker and color vectors.
Similarly, maybe Plots could require that when NaNs are used as segment breaks, there must be NaNs in that position in all data dimensions?
Maybe Plots could come up with some other marker to indicate segment breaks?
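Until something along these lines exists, one user-side workaround is to drop NaN points together with the matching entries of every per-point attribute, so that the indices stay aligned. A minimal sketch in plain Julia; `drop_nan_points` is a hypothetical helper, not part of Plots, and it assumes every attribute vector has the same length as the coordinates (no cycling):

```julia
# Hypothetical helper (not a Plots.jl function): remove points whose x or y
# is NaN, along with the corresponding entries of each attribute vector.
# Assumes attribute vectors have the same length as x and y.
function drop_nan_points(x, y; attrs...)
    keep = .!(isnan.(x) .| isnan.(y))          # true where both coordinates are valid
    filtered = Dict{Symbol,Any}()
    for (name, v) in attrs
        # Vectors are filtered per point; scalars are passed through unchanged.
        filtered[name] = v isa AbstractVector ? v[keep] : v
    end
    return x[keep], y[keep], filtered
end

xs, ys, kw = drop_nan_points([1, 2, 3], [3, NaN, 1];
                             markersize=[30, 10, 30], c=[1, 2, 3])
# xs == [1, 3], ys == [3.0, 1.0]
# kw[:markersize] == [30, 30], kw[:c] == [1, 3]
```

The filtered values can then be passed on, e.g. scatter(xs, ys; kw..., xlims=(0,4), ylims=(0,4), legend=false), so the third circle keeps its own size and color.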

RodolfoFigueroa · 2021-01-29T19:31:57Z

I have to agree. The segment functionality is very handy, but having Plots shift parameter arrays without so much as a warning is extremely dangerous. I'm currently combing through my old code to see if this affected any of my plots, because I had no idea this "feature" existed until now.

mkborregaard · 2021-02-01T08:25:59Z

I think @niclasmattsson's first suggestion:

Could Plots check the lengths of all the data dimensions and throw an error or warning when they don't match? In my example above there are unused elements in the marker and color vectors.

sounds like a good solution.

daschw · 2021-02-01T19:42:19Z

If I'm not mistaken, we would then no longer be able to allow automatic cycling, as in

scatter(rand(6), color=[:red, :blue, :green], marker=[:square, :circle])

I'm not sure how widely these things are used, and maybe it would be better to be a little more restrictive about the input in Plots. I am open to discussing this, but a change like this would be really breaking and could only happen in Plots 2.0.
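For readers unfamiliar with cycling, it can be pictured as index wrap-around over the shorter attribute vectors. A minimal sketch in plain Julia, assuming that cycling simply reuses attribute values in order; `cycled` is a hypothetical helper for illustration, not a Plots function:

```julia
colors  = [:red, :blue, :green]
markers = [:square, :circle]

# Hypothetical helper: attribute for point i, wrapping around when the
# attribute vector is shorter than the data.
cycled(attr, i) = attr[mod1(i, length(attr))]

# Attribute pairs that six data points would receive under this assumption.
assigned = [(cycled(colors, i), cycled(markers, i)) for i in 1:6]
# assigned[4] == (:red, :circle)
```

A strict length check would reject this call, since neither attribute vector has six elements.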

niclasmattsson · 2021-02-01T22:31:58Z

How about semiautomatic cycling then? :)

scatter(rand(6), color=[:red, :blue, :green], marker=[:square, :circle], cycle=true)

That keyword argument flag suggests a fourth idea for my brainstorm list: maybe interpreting NaNs as segment breaks should have to be enabled explicitly by adding nanbreaks=true?

I understand that any breaking change has to wait for the next major release. In any case, thanks for listening and taking this seriously.

mkborregaard · 2021-02-02T10:16:59Z

Relevant issues
#2980
#1325
#1151 (mysteriously closed)

yha · 2021-02-24T21:17:08Z

I've just found this issue report, after writing a fix (I'll send the PR soon, after adding some tests). I've been bitten by this several times recently.
I think the current behavior is bad enough to warrant fixing at the price of a breaking change. NaNs can appear in data "at random", so they should not affect how indexes in data correspond to indexes in attributes. The current behavior seems much more likely to silently produce wrong plots than to be used intentionally.
Also, this behavior seems to be somewhat new: I see that it was introduced in #2940 (31 August 2020), and before that there was no consistency among backends. According to the "before" example there, the pyplot backend was already doing the right thing (at least for the markershape and color attributes) before that PR. So I think this should be considered a regression to be fixed rather than intended behavior.

niclasmattsson added the bug label Jan 29, 2021

yha mentioned this issue Feb 25, 2021

Fix for "segmented" attributes with NaNs #3320

Merged

yha closed this as completed in #3320 Mar 2, 2021
