Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Explicitly warn that CDSView is unsupported on line glyphs #9388

Closed
mcsoini opened this issue Nov 8, 2019 · 8 comments · Fixed by #9559
Closed

Explicitly warn that CDSView is unsupported on line glyphs #9388

mcsoini opened this issue Nov 8, 2019 · 8 comments · Fixed by #9559

Comments

@mcsoini
Copy link

mcsoini commented Nov 8, 2019

Bokeh: 1.3.4
Python: 3.6.8
OS: GNU/Linux x86_64
Browser: Firefox 70.0.1/Chromium 78.0.3904.70

Figures containing line glyphs generated from the same ColumnDataSource with different CDSViews share the same initial y axis range (however, they pan and zoom independently).
This is not the case for other plot types like circles, areas, bars (independent y range adapted to data ranges for each figure separately).

Expected behavior

  • independent axes ranges
  • consistency between glyphs
from bokeh.layouts import row
from bokeh.plotting import figure, show, output_file
from bokeh.sampledata.autompg import autompg
from bokeh.models import CDSView, GroupFilter
from bokeh.models.sources import ColumnDataSource

autompg = autompg.assign(efficient=(autompg.mpg > 20).astype(str))
autompg = autompg.groupby(['yr', 'efficient']).mpg.mean().reset_index()
autompg = autompg.sort_values(['efficient', 'yr'])

Example 1: Line plots and views --> produces unexpected result

list_p = []
source=ColumnDataSource(autompg)
for eff in ['True', 'False']:
    filter_ = GroupFilter(column_name='efficient', group=eff)
    view = CDSView(source=source, filters=[filter_])

    list_p.append(figure(title=eff))
    list_p[-1].line(x='yr', y='mpg', source=source, view=view)

output_file("shared_y_views.html")
show(row(list_p))

image

Example 2: Circle plots and views --> produces expected result

list_p = []
source=ColumnDataSource(autompg)
for eff in ['True', 'False']:
    filter_ = GroupFilter(column_name='efficient', group=eff)
    view = CDSView(source=source, filters=[filter_])

    list_p.append(figure(title=eff))
    list_p[-1].circle(x='yr', y='mpg', source=source, view=view)

output_file("shared_y_views_circle.html")
show(row(list_p))

image

Example 3: Line plots and separate sources --> produces expected result

# 
list_p = []
for eff in ['True', 'False']:
    source_slct = ColumnDataSource(autompg.query('efficient == "%s"'%eff))
    
    list_p.append(figure(title=eff))
    list_p[-1].line(x='yr', y='mpg', source=source_slct)

output_file("shared_y_sources.html")
show(row(list_p))

image

@bryevdv
Copy link
Member

bryevdv commented Nov 9, 2019

I'm not actually sure that CDSView together with lines can be considered supported at present. AFAIK the behavior is not well-defined or ideal in the presence of a connected topology. Ideally subsetting line data would result in multiple disjoint polylines anywhere the view leaves contiguous groups of points, split wherever points were excluded in between the groups. It definitely doesn't do that now, and if it did you'd also be left with the question of what to do with isolated points. So to the extent it does anything now, I think it just draws a single polyline through all remaining points, whether they were adjacent originally or not. I would regard this as accidental and not explicitly intentional, as that is (IMO) a questionable thing to do from a visualization perspective.

FWIW in the particular example above, I think line is not a good choice since this is not time series data (model years are discrete and categorical, really) and lines imply intermediate values that simply do not exist. Using views as they are now, only compounds that problem. If views behaved ideally on lines as I described I don't think any lines woudl get drawn at all (all the points would be isolated). A Scatter, Dot Plot, or Bar Graph would all be improvements here (that would also function with views).

If you really want lines in this case I am inclined to say that it's better to extract the subsets you want (in Python) and call line separately with those concrete subsets, rather than use views.

I'm fairly inclined to punt on this, or if any action is taken at all, to make Bokeh warn or error when views are used with lines, stating clearly that this combination is unsupported.

cc @bokeh/dev for thoughts

@p-himik
Copy link
Contributor

p-himik commented Nov 10, 2019

FWIW, I would definitely prefer an error or a warning here.

Also it may be worth pointing out that if a user wants to change views/filters via some UI controls in order to toggle some lines' visibility, it would probably make sense to use multi_line instead of line.

@mcsoini
Copy link
Author

mcsoini commented Nov 11, 2019

I understand that CDSView combined with lines has some (conceptual) issues, as discussed in issue #7070. In fact, this is why I need the autompg.sort_values(['efficient', 'yr']) step in my first example.
If the x-axis values were disconnected in the input CDS, only a single line segment would be drawn. As a user, I'm ok with how this behaves now (after figuring it out), it's really just the y-range I'm struggling with.

Concerning the appropriateness of line plots, they provide the greatest visual clarity for exploratory data analysis in my use case.

@p-himik I see that multi_lines is indeed able to produce the plots I'm looking for, using views, with independent y-ranges. However, the structure of the CDS needs to be different. This would inhibit having common CDS and views (in a parent class) used for different plot types (in various child classes).

For example, a CDS like

{'value0': [4, 6, 7, 0.8, 0.2, 0.3],
 'value1': [4, 5, 6, 0.7, 0.8, 0.4],
 'x': [0, 1, 2, 0, 1, 2],
 'subplot': ['left', 'left', 'left', 'right', 'right', 'right']}

can be used for figure.line, figure.varea_stack, etc. In contrast, figure.multi_lines would require something like

{'value': [[4, 6, 7], [0.8, 0.2, 0.3], [4, 5, 6], [0.7, 0.8, 0.4]],
 'x': [[0, 1, 2], [0, 1, 2], [0, 1, 2], [0, 1, 2]],
 'color': ['blue', 'blue', 'red', 'red'],
 'subplot': ['left', 'right', 'left', 'right']}

TL;DR I'm really just looking for a way to have flexible plot types with the same underlying CDS structure and identical behavior, e.g. in terms of y-ranges. Otherwise, the second-best approach seem to be independent CDS's per subplot without views.

@bryevdv
Copy link
Member

bryevdv commented Nov 11, 2019

As a user, I'm ok with how this behaves now (after figuring it out)

Right in this particular case it happens to do something ok-ish (or at least close to what you want). But I think that is a coincidence and not true in general, which means leaving things in a quasi-supported state only invites support issues. I'd rather be unambiguous (with a warning/error) that this combination is simply not supported.

Given that, I agree that:

the best approach seem to be independent CDS's per subplot without views.

@bryevdv bryevdv changed the title [BUG] Unexpected shared y range for line glyphs from same source with views Explicitly warn that CDSView is unsupported on line glyphs Nov 11, 2019
@bryevdv bryevdv added this to the 2.0 milestone Nov 11, 2019
@matwilso
Copy link

matwilso commented Jun 8, 2022

This should really raise an exception or something. And the error message didn't even show when I was using multi_line

I got this warning when running with fig.line,

ERROR:bokeh.core.validation.check:E-1024 (CDSVIEW_FILTERS_WITH_CONNECTED): CDSView filters are not compatible with glyphs with connected topology such as Line or Patch: GlyphRenderer(id=1038, glyph=Line(id='1036', ...), ...)

But this did not show up with fig.multi_line, bokeh version '2.2.2', so I spent a few hours trying to figure out why it wasn't showing in my plot, to finally find this issue.

There should also be a big warning message in the docs that states this limitation: https://docs.bokeh.org/en/latest/docs/user_guide/data.html#filtering-data. Already raised here, I guess: #7070

@bryevdv
Copy link
Member

bryevdv commented Jun 9, 2022

@matwilso I would actually expect a CDS view to work with multi-line, though perhaps not in the way you are wanting? A CDS view filters "top" level indices, so for a multiline it would filter out entire sub-lines, since each top-level index in the data for a multi-line corresponds to an entire line.

If you are saying that a CDS view does not filter sub-lines of a multi-line, that is a separate concern, please open a new issue with MRE.

@matwilso
Copy link

matwilso commented Jun 9, 2022

It's alright, I was just frustrated yesterday. I found a different way of doing what I needed.

I had a dataframe with indices, where if you index into it, you will get a list of lists like the multi-line api expects. I would expect that cdsview just works in a similar way that it does for other plot types and its surprising that it doesn't I guess idk. Was my first encounter with using cdsview. Anyway, just expressing a signal of something I stumbled over

@bryevdv
Copy link
Member

bryevdv commented Jun 9, 2022

@matwilso TBH I still don't understand whether you have run in to an actual bug or not. These discussions are always more focused with a complete Minimal Reproducible Example so there does not have to be speculation. A CDS view should work with multi-line, by acting on entire individual sub-lines. But it's not clear if:

  • That's what you saw, but not what you wanted, or
  • That's not what you saw

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants