Add support for legend_field with geo data #9398

Open
raholler opened this issue Nov 11, 2019 · 6 comments

@raholler

raholler commented Nov 11, 2019

ALL software version info (bokeh, python, notebook, OS, browser, any other relevant packages)

  • bokeh version 1.4.0
  • python 3.7
  • Windows 10

Description of expected behavior and the observed behavior

I want to add a legend to my plot of geodata. Specifically, I plot point data colored according to a categorical variable in my data set, and I convert my GeoPandas data to a GeoJSONDataSource accordingly. Everything works well except creating the legend.

When I follow this example: https://docs.bokeh.org/en/latest/docs/user_guide/annotations.html#legends
I get the following error:

Column to be grouped does not exist in glyph data source.

This happens even though I pass the source to the glyph method:

p1.circle(
    "x",
    "y",
    source=geosource,
    fill_color={"field": "share_prot", "transform": color_mapper},
    line_color="black",
    line_alpha=0.5,
    line_width=0.3,
    alpha=0.6,
    #size=2,
    legend_group="share_prot"
)

Complete, minimal, self-contained example code that reproduces the issue

from bokeh.io import output_file, show
from bokeh.models import GeoJSONDataSource
from bokeh.plotting import figure
from bokeh.sampledata.sample_geojson import geojson
import json

data = json.loads(geojson)
for i in range(len(data['features'])):
    data['features'][i]['properties']['Color'] = ['blue', 'red'][i%2]

geo_source = GeoJSONDataSource(geojson=json.dumps(data))
p = figure(background_fill_color="lightgrey")
p.circle(x='x', y='y', size=15, color='Color', alpha=0.7, source=geo_source, legend_group='Color')

show(p)

Stack traceback and/or browser JavaScript console output

<ipython-input-133-afa857e8c002> in <module>
     12 color_mapper = CategoricalColorMapper(factors=share_prot.unique(), palette=palette)
     13 p = figure(background_fill_color="lightgrey")
---> 14 p.circle(x='x', y='y', size=15, color='Color', alpha=0.7, source=geo_source, legend_group='Color')
     15 
     16 show(p)

fakesource in circle(self, x, y, **kwargs)

~\Anaconda3\envs\adv\lib\site-packages\bokeh\plotting\helpers.py in func(self, **kwargs)
    930 
    931         if legend_kwarg:
--> 932             _update_legend(self, legend_kwarg, glyph_renderer)
    933 
    934         self.renderers.append(glyph_renderer)

~\Anaconda3\envs\adv\lib\site-packages\bokeh\plotting\helpers.py in _update_legend(plot, legend_kwarg, glyph_renderer)
    487     kwarg, value = list(legend_kwarg.items())[0]
    488 
--> 489     _LEGEND_KWARG_HANDLERS[kwarg](value, legend, glyph_renderer)
    490 
    491 

~\Anaconda3\envs\adv\lib\site-packages\bokeh\plotting\helpers.py in _handle_legend_group(label, legend, glyph_renderer)
    454         raise ValueError("Cannot use 'legend_group' on a glyph without a data source already configured")
    455     if not (hasattr(source, 'column_names') and label in source.column_names):
--> 456         raise ValueError("Column to be grouped does not exist in glyph data source")
    457 
    458     column = source.data[label]

ValueError: Column to be grouped does not exist in glyph data source
@bryevdv
Member

bryevdv commented Nov 12, 2019

@raholler It would be a fair amount of effort to make legend_group work with geo data, because legend_group does its grouping on the Python side, and GeoJSON data source columns are not actually realized until the data reaches the browser.
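To illustrate the distinction, here is a minimal pure-Python sketch of what Python-side grouping involves. It is not Bokeh's implementation: `geojson_property_values` and `group_legend_entries` are hypothetical helpers, and the input is assumed to be a GeoJSON FeatureCollection. The point is that a GeoJSONDataSource only carries a GeoJSON string in Python, so Python-side grouping would first have to parse the property out of every feature before it could build one legend entry per distinct value.

```python
import json


def geojson_property_values(geojson, name):
    """Hypothetical helper: read one property from every feature of a FeatureCollection.

    On the Python side a GeoJSON source holds only this JSON string, so the
    "column" has to be parsed out before it can be grouped.
    """
    features = json.loads(geojson)["features"]
    # Features may have "properties": null, so fall back to an empty dict.
    return [(feature.get("properties") or {}).get(name) for feature in features]


def group_legend_entries(values):
    """Return one (label, first_index) pair per distinct value, sorted by label.

    This mirrors the idea behind legend_group: one legend item per distinct
    value, each pointing at the first row where that value occurs.
    """
    first_seen = {}
    for index, value in enumerate(values):
        first_seen.setdefault(value, index)
    return sorted((str(value), index) for value, index in first_seen.items())


# Usage: four points alternating between two colors, as in the example above.
geojson = json.dumps({
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "geometry": {"type": "Point", "coordinates": [i, i]},
            "properties": {"Color": ["blue", "red"][i % 2]},
        }
        for i in range(4)
    ],
})
print(group_legend_entries(geojson_property_values(geojson, "Color")))
```

This sketch ignores geometry-derived columns, nested properties, and non-FeatureCollection input, which is where a real implementation would need more care.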

However, a very minor change allows legend_field, which does grouping in the browser, to function:

[Screenshot: the resulting plot with a legend generated via legend_field]

I am going to mark this issue as a feature request to add support for legend_field, and also to raise a more informative error when someone tries to use legend_group with a geo source, stating explicitly that this combination is not supported (but that legend_field is).
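A minimal sketch of what such a check could look like, assuming it sits next to the existing legend_group validation shown in the traceback above. `FakeColumnDataSource`, `FakeGeoJSONDataSource`, and `check_legend_group` are hypothetical stand-ins so the snippet runs without Bokeh; a real patch would presumably test against Bokeh's own `GeoJSONDataSource` class instead.

```python
from dataclasses import dataclass, field


@dataclass
class FakeColumnDataSource:
    """Hypothetical stand-in for a ColumnDataSource: columns live in Python."""
    data: dict = field(default_factory=dict)

    @property
    def column_names(self):
        return list(self.data)


@dataclass
class FakeGeoJSONDataSource:
    """Hypothetical stand-in for a GeoJSONDataSource: columns exist only in the browser."""
    geojson: str = ""


def check_legend_group(label, source):
    """Validate a legend_group request, rejecting geo sources with an explicit hint."""
    if source is None:
        raise ValueError("Cannot use 'legend_group' on a glyph without a data source already configured")
    # Check the geo case first, so users get a targeted message instead of
    # the generic "column does not exist" error from the issue report.
    if isinstance(source, FakeGeoJSONDataSource):
        raise ValueError(
            "'legend_group' is not supported with GeoJSONDataSource; "
            "use 'legend_field' instead, which groups in the browser"
        )
    if label not in getattr(source, "column_names", []):
        raise ValueError("Column to be grouped does not exist in glyph data source")


# Usage: a plain source passes silently; a geo source gets the explicit hint.
check_legend_group("Color", FakeColumnDataSource({"Color": ["blue", "red"]}))
try:
    check_legend_group("Color", FakeGeoJSONDataSource(geojson="{}"))
except ValueError as err:
    print(err)
```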

Would you like to work on this PR, with some guidance?

@bryevdv bryevdv changed the title [BUG] Automatic Grouping (Python) does not work for geodata Add support for legend_field with geo data Nov 12, 2019
@bryevdv
Member

bryevdv commented Nov 12, 2019

For reference, the change I made to generate the above plot with legend_field was:

diff --git a/bokeh/models/annotations.py b/bokeh/models/annotations.py
index 421fd0bd1..e48bc4e8e 100644
--- a/bokeh/models/annotations.py
+++ b/bokeh/models/annotations.py
@@ -154,14 +154,14 @@ class LegendItem(Model):
             if len({r.data_source for r in self.renderers}) != 1:
                 return str(self)

-    @error(BAD_COLUMN_NAME)
-    def _check_field_label_on_data_source(self):
-        if self.label and 'field' in self.label:
-            if len(self.renderers) < 1:
-                return str(self)
-            source = self.renderers[0].data_source
-            if self.label.get('field') not in source.column_names:
-                return str(self)
+    # @error(BAD_COLUMN_NAME)
+    # def _check_field_label_on_data_source(self):
+    #     if self.label and 'field' in self.label:
+    #         if len(self.renderers) < 1:
+    #             return str(self)
+    #         source = self.renderers[0].data_source
+    #         if self.label.get('field') not in source.column_names:
+    #             return str(self)

A real solution would not comment out the validation check, but either:

  • make it skip when the source is a GeoJSONDataSource, or
  • have it more carefully inspect the GeoJSON feature properties directly
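For the second option, a hedged sketch of how the check might inspect the GeoJSON properties directly. It assumes the GeoJSON is a FeatureCollection and that the browser-side source exposes each feature property as a column, plus geometry-derived columns such as `x`/`y` and `xs`/`ys`; both assumptions should be verified against Bokeh itself. `geojson_column_names` and `field_label_is_valid` are hypothetical names, not Bokeh API.

```python
import json

# Columns that a GeoJSON source is ASSUMED to derive from feature geometry in
# the browser (x/y for points, xs/ys for lines and polygons). Verify this set
# against the Bokeh version in use.
GEOMETRY_COLUMNS = frozenset({"x", "y", "xs", "ys"})


def geojson_column_names(geojson):
    """Hypothetical helper: column names a GeoJSON source should expose once realized."""
    names = set(GEOMETRY_COLUMNS)
    for feature in json.loads(geojson).get("features", []):
        # Every feature property becomes a column; null properties are skipped.
        names.update((feature.get("properties") or {}).keys())
    return names


def field_label_is_valid(field_name, column_names, geojson=None):
    """Accept a legend field if it is a known column or, for geo sources, a feature property."""
    if field_name in column_names:
        return True
    if geojson is not None:
        return field_name in geojson_column_names(geojson)
    return False
```

Parsing the whole GeoJSON string during validation may be slow for large files, which is one argument for the simpler first option of skipping the check for geo sources.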

@bryevdv bryevdv added this to the short-term milestone Nov 12, 2019
@raholler
Author

I would generally be happy to work on it with some guidance, but I am very busy until December 13th. If after that works for you, I can do it (or at least try; I am not very experienced in working on packages).

More generally, though, I think the treatment of geodata could be rethought. Most people who work with geodata in Python use GeoPandas, so a more direct link to GeoPandas might be more useful (and easier) than going through GeoJSON. I saw a related issue but cannot find it anymore.

@bryevdv
Member

bryevdv commented Nov 13, 2019

@raholler There's no hurry, so I'm happy to work with you on this whenever you are able to look at it.

As for GeoPandas, I think that would be fantastic, but it is also an orthogonal concern. It would be appropriate to open a new issue to start a discussion about that.

@meenurajapandian

I came across this while looking for a way to add a legend to a heatmap that uses a linear color mapper and a GeoJSONDataSource. However, legend_field does not work for me either.

The figure is basically this:
p.patches('xs', 'ys', fill_color={'field': 'some_field', 'transform': mapper}, source=geo_src, legend_field='some_field')

some_field is a property of each patch, and geo_src is a GeoJSONDataSource. The colors are mapped as expected, but I am not able to add a legend. Is there a workaround to get the legend?

@hmanuel1

hmanuel1 commented Apr 17, 2020

I ran into the same issue when building a custom legend for GeoPandas data. The approach below works reasonably well for me, as long as you don't expect x_range or y_range to change. I am still trying to figure out how to prevent the "fake quads" from rendering, so that they do not affect x_range or y_range when the content of a figure is replaced. It would be nice if manual legends could be implemented without having to specify an actual coordinate pair in the plot.


from bokeh.io import show
from bokeh.models import LogColorMapper, Legend
from bokeh.palettes import Viridis6 as palette
from bokeh.plotting import figure
from bokeh.sampledata.unemployment import data as unemployment
from bokeh.sampledata.us_counties import data as counties

palette = tuple(reversed(palette))

counties = {
    code: county for code, county in counties.items() if county["state"] == "tx"
}

county_xs = [county["lons"] for county in counties.values()]
county_ys = [county["lats"] for county in counties.values()]

county_names = [county['name'] for county in counties.values()]
county_rates = [unemployment[county_id] for county_id in counties]
color_mapper = LogColorMapper(palette=palette)

data=dict(
    x=county_xs,
    y=county_ys,
    name=county_names,
    rate=county_rates,
)

TOOLS = "pan,wheel_zoom,reset,hover,save"

p = figure(
    title="Texas Unemployment, 2009", tools=TOOLS,
    x_axis_location=None, y_axis_location=None,
    tooltips=[
        ("Name", "@name"), ("Unemployment rate", "@rate%"), ("(Long, Lat)", "($x, $y)")
    ])
p.grid.grid_line_color = None
p.hover.point_policy = "follow_mouse"

p.patches('x', 'y', source=data,
          fill_color={'field': 'rate', 'transform': color_mapper},
          fill_alpha=0.7, line_color="white", line_width=0.5)

"""
 custom geo legend that works with this example
 with geopandas dataframe see the next commented block
"""
xq, yq = data['x'][0][0], data['y'][0][0]

"""
 for geopandas dataframe you can do the same by selecting a coord of a valid polygon
 the following line will take the coordinate pair xq and yq from a
 geopandas dataframe (gdf) polygon...
(xq, yq) = list(gdf['geometry'].values[0].envelope.centroid.coords)[0]
"""

legend_names = []
for i in range(len(palette)):
    legend_names.append(f"Legend Item {i}")

items = []
for i in reversed(range(len(palette))):
    items += [(legend_names[i], [p.quad(top=yq, bottom=yq, left=xq,
              right=xq, fill_color=palette[i])])]

p.add_layout(Legend(items=items, location='bottom_left',
             title="Unemployment Rate:"))

show(p)
