Feature Request: Error Bar Plots #2352

Closed
MaxNoe opened this Issue May 29, 2015 · 48 comments

@MaxNoe

MaxNoe commented May 29, 2015

For scientists, one hugely important feature would be plots with error bars and lower/upper limits like these:

http://matplotlib.org/examples/statistics/errorbar_limits.html

Also see this Stack Overflow question:
http://stackoverflow.com/questions/29166353/how-do-you-add-error-bars-to-bokeh-plots-in-python

@damianavila damianavila added this to the short-term milestone May 30, 2015

@MaxNoe

MaxNoe commented Jul 24, 2015

Any news on this?

@michaelaye

michaelaye commented Sep 1, 2015

👍

@bsipocz

Contributor

bsipocz commented Sep 24, 2015

👍

And thanks to @MaxNoe for the SO answer, that will do until it gets built in.

@bsipocz

Contributor

bsipocz commented Sep 24, 2015

Now looking into it more closely, the SO solution is not working with interactive plots. Back to square one.

@MaxNoe

MaxNoe commented Sep 24, 2015

What's the problem?

@bsipocz

Contributor

bsipocz commented Sep 24, 2015

errorbar(pp, x='hjd', y='flux', xerr='fluxerr', source=s2)

source.callback = CustomJS(args=dict(s2=s2), code="""
        var inds = cb_obj.get('selected')['1d'].indices;
        var d1 = cb_obj.get('data');

        s2.get('data')['hjd'] = d1['hjd'][inds[0]]
        s2.get('data')['flux'] = d1['flux'][inds[0]]
        s2.get('data')['fluxerr'] = d1['fluxerr'][inds[0]]

        cb_obj.trigger('change');
        s2.trigger('change');
    """)

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-38-524c4b97c72f> in <module>()
     30 #    pp.line(hjd[i], flux[i])
     31 
---> 32 errorbar(pp, x='hjd', y='flux', xerr='fluxerr', source=s2)
     33 
     34 source.callback = CustomJS(args=dict(s2=s2), code="""

<ipython-input-29-ce33a5d4f725> in errorbar(fig, x, y, xerr, yerr, color, source, point_kwargs, error_kwargs)
      7       x_err_y = []
      8       for px, py, err in zip(x, y, xerr):
----> 9           x_err_x.append((px - err, px + err))
     10           x_err_y.append((py, py))
     11       fig.multi_line(x_err_x, x_err_y, color=color, source=source, **error_kwargs)

TypeError: unsupported operand type(s) for -: 'str' and 'str'

@MaxNoe

MaxNoe commented Sep 24, 2015

You need to pass the data itself as something iterable (lists or arrays), not the column names as strings.
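The TypeError in the traceback arises because zip() iterates over the characters of the column-name strings. A minimal sketch of resolving column names before computing the bar endpoints, assuming a plain dict stands in for the data source; `resolve_column` and `x_error_segments` are hypothetical helpers, not part of Bokeh:

```python
def resolve_column(values, data):
    """Return data[values] if `values` is a column name, else `values` itself.

    Hypothetical helper for illustration; `data` is a plain dict of lists.
    """
    if isinstance(values, str):
        return data[values]
    return values


def x_error_segments(x, y, xerr, data=None):
    """Build the xs/ys lists for horizontal error bars, one segment per point."""
    data = data or {}
    x, y, xerr = (resolve_column(v, data) for v in (x, y, xerr))
    xs = [(px - err, px + err) for px, err in zip(x, xerr)]
    ys = [(py, py) for py in y]
    return xs, ys


data = {"hjd": [1.0, 2.0], "flux": [10.0, 12.0], "fluxerr": [0.5, 0.25]}
xs, ys = x_error_segments("hjd", "flux", "fluxerr", data)
```

The resulting `xs` and `ys` have the list-of-sequences shape that the errorbar helper from the Stack Overflow answer passes to `fig.multi_line`.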

@jacopo-chevallard

jacopo-chevallard commented May 24, 2016

+1

@andyfaff

andyfaff commented Jul 20, 2016

+1

@StevenCHowell

Member

StevenCHowell commented Oct 4, 2016

+1

@Apfelkuchen

Apfelkuchen commented Oct 13, 2016

+1

@kippvs

kippvs commented Dec 5, 2016

+1

@StevenCHowell

Member

StevenCHowell commented Dec 14, 2016

@bokeh/dev, has there been consideration as to plotting errorbars? Would this be best in plotting.figure.errorbar, i.e., as a glyph similar to circle?

This method produces a nice result by wrapping existing code:

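def errorbar(fig, x, y, xerr=None, yerr=None, color='red',
             point_kwargs=None, error_kwargs=None):
    # Use None as the default to avoid sharing mutable dicts between calls.
    point_kwargs = point_kwargs or {}
    error_kwargs = error_kwargs or {}

    # Draw the data points themselves.
    fig.circle(x, y, color=color, **point_kwargs)

    if xerr is not None:
        # Horizontal bars: one two-point line per data point.
        x_err_x = []
        x_err_y = []
        for px, py, err in zip(x, y, xerr):
            x_err_x.append((px - err, px + err))
            x_err_y.append((py, py))
        fig.multi_line(x_err_x, x_err_y, color=color, **error_kwargs)

    if yerr is not None:
        # Vertical bars: one two-point line per data point.
        y_err_x = []
        y_err_y = []
        for px, py, err in zip(x, y, yerr):
            y_err_x.append((px, px))
            y_err_y.append((py - err, py + err))
        fig.multi_line(y_err_x, y_err_y, color=color, **error_kwargs)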

I am thinking of implementing this, though it may take me a few weeks to find the time to complete it. Please comment with any thoughts or recommendations.

@ArtyomKaltovich

ArtyomKaltovich commented Feb 3, 2017

I think there should be two variants: the user passes the errors as a parameter, or they are calculated automatically by the code.

@bryevdv

Member

bryevdv commented Feb 4, 2017

So I think there are a couple of levels here. Given +/- error values, there are a variety of ways to indicate them visually:

  • lines (stems) drawn between the +/- endpoints
  • markers or segments (whiskers) drawn at the +/- endpoints
  • both of the above
  • a filled area between the +/- sides

I think people will want all these ways, so any solution should cater to all of them.

Given that, the first thing is that I think there is a missing useful Annotation, which is

  • FilledArea

Given two series and a dimension (horizontal or vertical), it will draw a filled area between the two series. This differs from PolyAnnotation in that it specifically fills between two series; it's not an arbitrary polygon. It could perhaps be a subclass of PolyAnnotation to re-use implementation, though. This is useful on its own and could easily be a standalone PR.
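To make the "fill between two series" idea concrete, here is a rough sketch of how the outline of such an area could be computed. `band_polygon` and its `dimension` values are hypothetical names for illustration, not the eventual annotation's API:

```python
def band_polygon(base, lower, upper, dimension="vertical"):
    """Return (xs, ys) vertices of the closed region between two series.

    With dimension="vertical" the band extends along y and `base` holds the
    x values; with dimension="horizontal" the roles of x and y are swapped.
    """
    if not (len(base) == len(lower) == len(upper)):
        raise ValueError("base, lower and upper must have the same length")
    # Walk forward along the lower series, then back along the upper series.
    along = list(base) + list(reversed(base))
    across = list(lower) + list(reversed(upper))
    if dimension == "vertical":
        return along, across
    if dimension == "horizontal":
        return across, along
    raise ValueError("dimension must be 'vertical' or 'horizontal'")


xs, ys = band_polygon([0, 1, 2], [1, 2, 1], [3, 4, 3])
```

The vertex lists describe a single polygon, so they could be drawn with any polygon primitive; a dedicated annotation would only need the two series and the dimension.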

It might also make sense to have Stem and Whisker as new annotations too. Then we could have a higher level function bokeh.plotting.errorbars or something that takes the upper and lower error, some description of the kind of visual desired, and whatever other configuration is needed and constructs the appropriate lower level annotations to add to a plot.

This is all very rough at the moment, though. Thoughts?

@bryevdv

Member

bryevdv commented Feb 4, 2017

I think we'd want new Annotations, btw, so that we could send upper and lower as actual data to BokehJS. If we do that, it opens up great possibilities with the new data spec transforms:

FilledArea(upper={'field': 'y', 'transform': ADD_FIELD('upper_error')}, ...
           source=source)

Then (pending one open issue) if the y data or 'upper_error' columns were changed, everything would adjust automatically.
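ADD_FIELD above is illustrative rather than an existing transform. As a point of comparison, the same bounds can be derived in Python whenever the data changes. A sketch with hypothetical names (`with_error_bounds` and the column names are assumptions):

```python
def with_error_bounds(data, value="y", error="y_error"):
    """Return a copy of `data` with `<value>_lower` and `<value>_upper` added.

    `data` is a plain dict of equal-length lists, standing in for the
    columns of a data source.
    """
    out = dict(data)
    out[value + "_lower"] = [v - e for v, e in zip(data[value], data[error])]
    out[value + "_upper"] = [v + e for v, e in zip(data[value], data[error])]
    return out


columns = with_error_bounds({"x": [1, 2], "y": [5.0, 7.0], "y_error": [0.5, 1.0]})
```

The drawback compared with a browser-side transform is that the derived columns have to be recomputed and resent on every update.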

@bryevdv bryevdv modified the milestones: 0.12.6, short-term Feb 4, 2017

@bryevdv

Member

bryevdv commented Feb 4, 2017

Lastly, I don't think at this point we should be automatically computing error bars. That is something for higher level facilities that are close to the original input data for an analysis to generate.

@ArtyomKaltovich

ArtyomKaltovich commented Feb 4, 2017

@bryevdv
About error computation: I think calculating the standard deviation is enough. It would also be possible to pass a function for computing the error as a parameter. Maybe this is even better: the user can pass either a list of numbers (the error values) or a function that calculates the errors.

@bryevdv

Member

bryevdv commented Feb 4, 2017

@ArtyomKaltovich respectfully I still maintain my earlier statement. At this point, looking towards a 1.0 release, we are actively looking for ways to shrink the library, reduce dependencies, and present a smaller testing surface, in order to make Bokeh a rock-solid, more easily maintainable foundation for other tools to build upon at higher levels.

@ArtyomKaltovich

ArtyomKaltovich commented Feb 4, 2017

Yeah, and then create this :)

So I think there are a couple of levels here. Given +/- error values, there are a variety of ways to indicate them visually:

lines (stems) drawn between the +/- endpoints
markers or segments (whiskers) drawn at the +/- endpoints
both of the above
a filled area between the +/- sides

Anyway, creating plots like this http://seaborn.pydata.org/_images/seaborn-barplot-1.png is an important feature, and many scientists use it a lot.
So maybe it is better to add an error parameter to the bar() and circle() methods of figure, instead of creating separate methods.

@jbednar

Contributor

jbednar commented Feb 5, 2017

@StevenCHowell

Member

StevenCHowell commented Feb 5, 2017

I don't think at this point we should be automatically computing error bars. That is something for higher level facilities that are close to the original input data for an analysis to generate.

there are lots of ways of defining error bars, and just supporting one (e.g. stddev) is not going to be very useful

I completely agree. There is too much misunderstanding about error estimates, particularly the difference between standard deviation and standard error of the mean.

Automating the error calculation separates the user from the computation process, making it more likely they will not accurately represent the steps taken when presenting the data. This would effectively move responsibility from the user to the plotting package, requiring them to look up, either in the docs or the code, what they are actually representing.

It is a simple thing for users to provide whatever error they want to show.

@michaelaye

michaelaye commented Feb 5, 2017

I second giving priority to the user being able to pass in the errors. That is much more important than any automatic calculation that just prevents the user from carefully thinking about what the errors actually are.

@MaxNoe

MaxNoe commented Feb 6, 2017

I also think bokeh should only do the visualisation part, not the statistics part.

@bryevdv

Member

bryevdv commented Mar 22, 2017

I'd like to get to this in 0.12.6. To summarize the current plan as I understand it, there would be three new annotations:

  • FilledArea
  • Stem
  • Whisker
@MaxNoe

MaxNoe commented Mar 22, 2017

Arrows would also be great to mark upper and lower limits

@bryevdv

Member

bryevdv commented Mar 22, 2017

Arrows already exist

@StevenCHowell

Member

StevenCHowell commented Mar 23, 2017

Just to be clear, are these examples what you are referring to?

  • FilledArea
    [image: filled]
  • Stem
    [image: stem]
  • Whisker
    [image: whisker]
@StevenCHowell

Member

StevenCHowell commented Mar 23, 2017

Some considerations to keep in mind for the Whisker option: I have seen some plotting tools (Matplotlib or Matlab) make the whisker width much larger when plotted on a logarithmic axis. This can obscure the data. A simple solution is to switch to the Stem method of showing errorbars. A more useful default would be that the whisker width consistently depends on the number of pixels in the corresponding axis range, i.e., the same width on log/linear and the width does get larger/smaller when zooming in/out. Also, having control over the width of the whiskers (maybe as a multiplicative scale factor to the default value) could be useful.
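A small numeric sketch of the log-axis effect described here, assuming a simple linear or log10 mapping from data values to pixels. `to_screen` and `cap_width_px` are hypothetical helpers, not how any particular library maps coordinates:

```python
import math


def to_screen(value, start, end, pixels, log=False):
    """Map a data value to a pixel offset along an axis spanning [start, end]."""
    if log:
        value, start, end = math.log10(value), math.log10(start), math.log10(end)
    return (value - start) / (end - start) * pixels


def cap_width_px(x, half_width, start, end, pixels, log=False):
    """Screen width of a cap drawn from x - half_width to x + half_width."""
    left = to_screen(x - half_width, start, end, pixels, log)
    right = to_screen(x + half_width, start, end, pixels, log)
    return right - left


# The same cap (1.0 data units wide) at three positions on a 600 px axis from 1 to 1000.
positions = (2, 20, 200)
linear = [cap_width_px(x, 0.5, 1, 1000, 600) for x in positions]
logarithmic = [cap_width_px(x, 0.5, 1, 1000, 600, log=True) for x in positions]
```

On the linear axis all three caps have the same screen width, while on the log axis the cap near the low end is far wider than the others. Specifying the cap width in screen units rather than data units sidesteps the problem.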

@StevenCHowell

Member

StevenCHowell commented Mar 23, 2017

In addition to being able to add errorbars to lines and scatter plots, it would be worthwhile to add them to the top (or potentially the bottom?) of vertical bar charts (or the right/left for horizontal bar charts). This should be able to reuse most of the same code, just center the errorbars at the edge of the glyph.
[image: bar chart error]

@MaxNoe

MaxNoe commented Mar 23, 2017

Is it possible to achieve this with the current arrow annotation?
(Mixed arrows / whiskers / stem)

[image]

@StevenCHowell

Member

StevenCHowell commented Mar 23, 2017

Another thing to keep in mind, related to log plots, is that when changing to a logarithmic axis, the errorbars should be rescaled to the logarithmic reference frame accordingly. Mathematically, we scale from the y frame to the y' frame as

y' = log(y) = ln(y) / ln(10)

Similarly, the error in y, we will use dy, is also transformed

dy'/dy = d/dy log(y) = d/dy ln(y) / ln(10)
dy'/dy = (1/y) / 2.303 = 0.434 / y
dy' = 0.434 dy / y

So in the logarithmic frame, the error to draw is dy' = 0.434 dy / y rather than dy.

Here is a bit more in depth explanation.
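The first-order result above can be checked numerically against the exact distances in log space. This is a sketch with hypothetical function names; note that the first-order formula is symmetric, whereas transforming the endpoints y - dy and y + dy directly gives asymmetric bars:

```python
import math

LOG10_E = 1 / math.log(10)  # approximately 0.434


def log10_error_first_order(y, dy):
    """Symmetric first-order error on log10(y): dy' = dy / (y ln 10)."""
    return LOG10_E * dy / y


def log10_error_exact(y, dy):
    """Asymmetric (lower, upper) distances in log10 space for y - dy and y + dy."""
    if y - dy <= 0:
        raise ValueError("lower bound must be positive on a log axis")
    lower = math.log10(y) - math.log10(y - dy)
    upper = math.log10(y + dy) - math.log10(y)
    return lower, upper


approx = log10_error_first_order(100.0, 10.0)
lower, upper = log10_error_exact(100.0, 10.0)
```

For a 10% error the approximation falls between the exact lower and upper distances. The gap widens as dy / y grows, and the lower bar is undefined once dy reaches y.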

@bryevdv

Member

bryevdv commented Mar 23, 2017

@MaxNoe Absolutely, and with the CustomJSTransform in 0.12.5 you can compute the end points from the data points in the browser.

@StevenCHowell that's roughly the breakdown, though for Whisker I would imagine just the whisker part, not the stem as well.

The log considerations are important, thanks for bringing them up. It may be necessary to just get basic functionality added first and add refinements later.

@StevenCHowell

Member

StevenCHowell commented Mar 23, 2017

for Whisker I would imagine just the whisker part, not the stem as well.

Personally, I have not seen this used before (though I did find someone who made the stems dashed). @bryevdv, have you seen whiskers w/o stems much? I wonder if the error limits could get confused as another data set or other points (though if they are close enough together in x it would start to look like dashed line resembling the FilledArea option).

FWIW, the combined whisker and stem representation was the default in Matplotlib 1.5 (2.0 dropped the whiskers) and also Matlab. It may surprise/confuse users more familiar with the combined representation to see just whiskers, or to have to use both Whisker and Stem to get what seems a pretty standard representation.

@StevenCHowell

Member

StevenCHowell commented Mar 23, 2017

Personally, I think that for most datasets the stem representation is less cluttered and sufficient. It seems others share this preference as Matplotlib changed their default.

@StevenCHowell

Member

StevenCHowell commented Mar 23, 2017

To achieve

Mixed arrows / whiskers / stem

what would the API be like?

Would there be a unified errorbar method

p = bokeh.plotting.figure()
p.errorbar(x, y, xerr=xerr, yerr=yerr, xerr_style=x_style_array, yerr_style=y_style_array)

or would it use boolean arrays with a separate method for each errorbar type?

# define a mask to select how to display the errors
whisker_mask = np.array([0, 0, 1, 0, 1, 0, 0, 0, 1, 0], dtype=bool)
arrow_mask = ~whisker_mask

p1 = bokeh.plotting.figure()
p1.Whiskers(x1[whisker_mask], y1[whisker_mask], yerr=yerr[whisker_mask])
p1.Arrow(x2[arrow_mask], y2[arrow_mask], yerr=yerr[arrow_mask])

p2 = bokeh.plotting.figure()
p2.line(x3, y3, y_whiskers=yerr[whisker_mask], y_arrows=yerr[arrow_mask])
@StevenCHowell

Member

StevenCHowell commented Mar 23, 2017

Note that to get the one directional arrows, @MaxNoe linked to, Matplotlib uses uplims and lolims kwargs.
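A per-point boolean flag, in the spirit of Matplotlib's uplims keyword, is one way to decide which points get ordinary error bars and which get a one-directional arrow. A minimal sketch; `split_by_limit` is a hypothetical helper and does not mirror Matplotlib's implementation:

```python
def split_by_limit(x, y, yerr, uplims):
    """Partition points into ordinary measurements and upper limits.

    `uplims` is a sequence of booleans, one per point; True marks an
    upper limit that would be drawn with an arrow instead of a bar.
    """
    rows = list(zip(x, y, yerr, uplims))
    measured = [(px, py, e) for px, py, e, lim in rows if not lim]
    limits = [(px, py, e) for px, py, e, lim in rows if lim]
    return measured, limits


measured, limits = split_by_limit(
    [1, 2, 3], [5.0, 4.0, 6.0], [0.5, 0.4, 0.6], [False, True, False]
)
```

Each group could then be handed to a different low-level annotation, which keeps the flag handling in the higher-level function.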

@bryevdv

Member

bryevdv commented Mar 23, 2017

Personally, I have not seen this used before (though I did find someone who made the stems dashed). @bryevdv, have you seen whiskers w/o stems much?

This concerns the set of low level models that is needed to support all the use cases. Higher level APIs could assemble them in other ways or in combination

@bryevdv

Member

bryevdv commented Apr 17, 2017

relevant comment from another issue #5349 (comment)

since I am currently drawing points/errorbars as two separate sets of glyphs, the hiding only half works, but I am guessing this functionality will work with the upcoming FilledArea/Whisker/Stem classes?

@StevenCHowell

Member

StevenCHowell commented Apr 18, 2017

I have a colleague who just came to me trying to draw whisker errorbars. Though somewhat of a hack, FWIW I modified the MultiLine example to help him out. I changed the definition of xpts and ypts using

h = 2.0/3
w = 1.0/3
xpts = np.array([0, w, w/2, w/2, 0, w])
ypts = np.array([0, 0, 0, h, h, h])

to produce this:

[image]

It's a small step to customizing each one to his data.
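The xpts/ypts pattern above traces a bottom cap, a stem, and a top cap as one polyline. A sketch of generating that shape per data point; `whisker_path` and `whisker_paths` are hypothetical helpers, and the symmetric error is an assumption:

```python
def whisker_path(x, lower, upper, cap_width):
    """Return (xs, ys) for one "I"-shaped polyline: bottom cap, stem, top cap."""
    half = cap_width / 2
    xs = [x - half, x + half, x, x, x - half, x + half]
    ys = [lower, lower, lower, upper, upper, upper]
    return xs, ys


def whisker_paths(x, y, yerr, cap_width):
    """Build list-of-lists coordinates, one polyline per data point."""
    paths = [whisker_path(px, py - e, py + e, cap_width)
             for px, py, e in zip(x, y, yerr)]
    return [p[0] for p in paths], [p[1] for p in paths]


xs, ys = whisker_paths([1, 2], [5.0, 7.0], [1.0, 0.5], cap_width=0.2)
```

The two lists have the shape expected by `figure.multi_line(xs, ys)`. Because the cap width is in data units, it will scale with zoom and distort on log axes.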

@canavandl canavandl referenced this issue Apr 20, 2017

Merged

FilledArea Annotation #6177

1 of 2 tasks complete

@canavandl canavandl self-assigned this May 16, 2017

@canavandl canavandl referenced this issue Jun 4, 2017

Merged

Whisker annotation #6381

3 of 3 tasks complete

@bryevdv bryevdv closed this in #6381 Jun 5, 2017

@bryevdv

Member

bryevdv commented Jun 5, 2017

The two merged PRs add support for error band and whisker-style error bar annotations. I think it would still be good to add a higher level errorbar function, but let's leave that for a separate issue/PR.

@bsipocz

Contributor

bsipocz commented Jun 5, 2017

@bryevdv @canavandl - Brilliant! I can't wait to try this out and get rid of my errorbar workarounds. Thanks.

@portermahoney

portermahoney commented Jul 26, 2017

Whisker annotation works great for scatter plots. Are there plans to get it working with vbar?

@bryevdv

Member

bryevdv commented Jul 26, 2017

It should work currently. Are you saying you've tried and it doesn't?

@portermahoney

portermahoney commented Jul 26, 2017

Sorry, my mistake. It does work! Here's an example of scatter and vbar, in case it helps anyone else. Thanks!

from bokeh.models import ColumnDataSource, Whisker
from bokeh.plotting import figure

x = [1, 2, 3, 4, 5]
y = [6, 7, 2, 4, 5]

# Scatter plot with whiskers
p = figure(title='test', x_axis_label='x', y_axis_label='y')
p.circle(x, y, size=10, color="red", legend='Temp.', alpha=0.5)
base = x
upper = [7, 7.1, 3, 4.5, 5.5]
lower = [2, 2, 2, 2, 2]
errors = ColumnDataSource(data=dict(base=base, lower=lower, upper=upper))
p.add_layout(Whisker(source=errors, base='base', upper='upper',
                     lower='lower'))

# Bar chart reusing the same whisker data
bar_test = figure(title='test')
bar_test.vbar(x=x, width=0.5, top=y)
bar_test.add_layout(Whisker(source=errors, base='base',
                            upper='upper', lower='lower'))
@bryevdv

Member

bryevdv commented Jul 26, 2017

Great! :)

@varchasgopalaswamy

varchasgopalaswamy commented Aug 30, 2017

Since interactive legends (on-click hide policy) work on a per-glyph basis, trying to hide scatter plots with whiskers on them doesn't work as expected - the scatter glyphs are hidden, but the whiskers remain. Is there a way to create a glyph that combines both the marker and the whiskers into a single super-glyph?

@mattpap

Contributor

mattpap commented Aug 31, 2017

Is there a way to create a glyph that combines both the marker and the whiskers into a single super-glyph?

No. There's an open issue for that (#177). However, this wouldn't work anyway, because whiskers are annotations, and those are governed by a separate set of rules. If whiskers were glyphs, this would work right now without compound glyphs; you would just need to add the whisker glyph to the legend's renderers set.

There are two things that could happen here. Either we can add a special case for annotations in legend (relax the type constraint on LegendItem.renderers), or we could rethink what is an annotation and what is not (another example is arrows, which make sense in both setups, and not having arrow glyphs prohibits certain kinds of plots).