New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Patches with Holes #2321
Comments
I already shared some thoughts on this with Sarah over the mail, i'll put the gist of it here as well. First of all, i don't think geographic patches/polygons are fundamentally different from non-geographic ones. So the same interface could be used for both, and i would avoid any 'geo' in the function names since it suggests something special, if all it does is plot coordinates on a plane. In my opinion the only reason to implement a new method/glyph is to preserve the API of the current one. Using an existing specification like GeoJSON seems like a good idea, improving interoperability with other systems and software compared to a Bokeh-specific syntax. Apart from GeoJSON, Well-Known-Text could also be considered. OGR (GDAL) geometry objects for example have both an ExportToJson() and Wkt() method, the same goes for Shapely etc.. One benefit over GeoJSON that i know of is the possibility of curved geometries, but i don't know if that's something Bokeh should support. Alternatively some low-level mapping of the HTML5 canvas drawing method's could also be considered, using statements like LineTo, MoveTo, ArcTo etc. Its not the most user-friendly, but very flexible. Its also very similar to how Matplotlib's Path's work (http://matplotlib.org/users/path_tutorial.html). Other formats like SVG, postscript use it as well. Here is an example how in Matplotlib you can plot like that, and converting from WKT. You basically get, for each polygon, a long array of vertices and codes, like:
http://nbviewer.ipython.org/gist/RutgerK/e6d6dd99701b444427ff |
A few quick thoughts: Smooth paths is a long-standing open feature issue: #183 I would definitely like to support them in some way. It should not actually be that hard, the question is what is the right way to transport the data since it does not really fit the ColumnDataSource model. Regarding richer representation for polygons, I think it is exactly the same situation. So now is maybe a good time to hash out the details of a way forward. Here are the options I see:
I'm not sure I have a strong preference either way. Thoughts? Bryan |
Here are some R packages for GIS and interactive JS mapping. Perhaps they will yield some useful ideas: https://github.com/dkahle/ggmap EDIT: Also would be great to have geopandas integration |
Summary:
Sorry for the long string of thoughts below, I needed to get it out in front of me to feel sure about an opinion, and figured it didn't hurt to post it. Also, if you have the patience to read it all, please do let me know if my understanding is wrong. Thought 1 - What is a patch & what is a patches? Right now we have Patch & Patches. In Patch, every row is a point:
This is useful when you're building, for example, area plots and your data looks like this (your "x" is the year and there are 3 sets of "y" for each energy column):
In Patches every row is a set of points:
And is useful when you have a data values associated with each shape, like a map, and you may color or label each patch with the associated value :
Both Patch & Patches support the idea that you can have a NaN as a data point and this will close a path and start a new one. In the case of a Patch, a point is associated with a data row, and the whole column generates a shape. THE POINT: I don't know quite how to say it, but I can't shake the feeling that Patch is a visualization tool - a way of more clearly seeing a series of points. Whereas a Patches has intrinsic meaning, the shape is a property of the data point. Thought 2 - lens of point, path, polygon, multipolygon ggplot2 has point, path, polygon A polygon is a closed path. Point
Path bokeh: if the data is column-wise - Line
bokeh: if the data is the row-wise - MultiLine
Polygon bokeh: if the data is the column-wise - Patch
bokeh: if the data is the row-wise - Patches
There are two further extensions:
Multi-polygon We currently handle this as follows: if the data is the column-wise - Patch
and if the data is the row-wise - Patches
What's interesting is that Bokeh copes with some pretty messy data, it will render:
even though I'm not sure the above data makes any sense. To my taste, I would rather have the row-wise specification be a list of lists:
That way you could easily do some checks and make sure every x had a corresponding y. Polygon with hole in it I'm not sure the column-wise data would ever need (could?) support the Polygon with a hole in it. I just can't imagine a table of data where that makes sense.... In the case of row-wise data, I can imagine it making sense as the shape is what's associated with the row, so it's valid to have a hole in it. Something like - using NAN to seperate:
OR - using more deeply nested structure
THE POINT: After all of that, is Patches what we want it to be? After all the above, this feels like it's exclusively about Patches (or a new glyph type) not Patch @bryevdv mentioned #183 (which reads "Right now you can specify vectors of linear, quadratic, or bezier curves independently. It would be good to be able to render single paths that have all those components together in one path.") Thinking about this as paths & polygons for now, so as not to get too attached to existing things, I see the following: For paths:
For polygons:
Obviously I'm just scribbling this down but it covers my use cases for a polygon with a hole and the "smooth path" issue (at least at a high level) Having written all this down, shoe-horning this into a ColumnDataSource doesn't seem like the right thing to do. I think to achieve this ticket, we can implement:
and still have room to implement the items from #183. I think to do this we may need to implement a Point object. As Linear/Quadratic/Bezier are different kinds of points. But we can start by implementing Linear Point. One final random tangent. In the case of a map, at least, it's normal to have data / properties associated with an entity. It would be nice to be able to link a ColumnDataSource to a SpatialDataSource in a way that can avoid having to handle the large SpatialDataSource (in particular avoid |
I should add that I can imagine how this all links together when the data is row-wise. But when the data is column-wise I cannot intuit the solution for the complex shapes. Is it fair to imagine implementing this in a more row-wise fashion, and then maybe there's a subset that can be passed as column-wise but that's really a wrapper around row-wise? |
related tickets #185 |
@AlanKavanagh is interested in this ticket. |
@bryevdv @birdsarah, I'm going to take a crack at this. I'll write up my intended approach as soon as I have one. |
Ok so does it make sense to have a patch that overlaps with itself? I would expect that to make a hole: import numpy as np
from bokeh.models import ColumnDataSource, DataRange1d, Plot
from bokeh.models.glyphs import Patch
from bokeh.io import show
x = [0, 3, 3, 0, np.nan, 1, 2, 2, 1, np.nan, 2.5, 3.5, 3.5, 2.5]
y = [0, 0, 3, 3, np.nan, 1, 1, 2, 2, np.nan, 2.5, 2.5, 3.5, 3.5]
source = ColumnDataSource(dict(x=x, y=y))
xdr = DataRange1d()
ydr = DataRange1d()
plot = Plot(
title=None, x_range=xdr, y_range=ydr, plot_width=300, plot_height=300)
glyph = Patch(x="x", y="y", fill_alpha=.25)
plot.add_glyph(source, glyph)
show(plot) |
That approach will break backward compatibility in at least some cases that I know of. It could be a valid decision to break backward compatibility but it would have to be discussed. It's also worth thinking about how this will work for the I've thought about this a bit over the years and would be happy to meet to discuss. |
There's a lot of reasons I would not support this approach, apart from compatibility breakage, it just makes too much implicit. What happens when there is partial overlap? What happens when three (four? N?) patches overlap? What if there are partially overlapping patches completely contained inside another patch? All of these cases seem very hard to figure out, especially the partial-overlap cases, if you have to start computing intersections of arbitrary polygons and dealing with points that aren't part of the original data. Or even if partial overlap doesn't count as a hoIe you still have to then detect it. What happens to otherwise internal patches with single point overlap? Or just a border? I have no idea how to explain all this to users in short succinct way. A motivating use case to keep in mind is:
If the hole is defined by some rule about overlaps in a multipatch, I don't really have any idea how to do that. It's buried somewhere implicit and anonymous in the column data and I can't get ahold of it without actually analyzing everything in the column somehow. To me that points to making holes something explicit and tangible and easily accessible in the data. This also hews close to the canvas operations, since if the hole is explicitly defined, all we have to do is "draw it backwards" and voila, there is the hole. Part of the hard problem here is that
i.e. just allowing "one level" of holes (a given patch can have multiple holes, but no counter-enclaves or counter-counter-enclaves) and just storing the data for the holes in their own columns. That will make drawing simpler, hit testing simpler, explaining simpler. No nan's used to signal anything. This would support contour plots, pretty much 100% as is I think. It's probably some sort of compromise for GIS cases, but I'm interested in a practical view on just how much of one. It seems if necessary it would always be possible to draw a "multi-patch" as multiple "hole patches" that are tagged with some identifier, that could be used to group them for color mapping or hovering or database lookups in click callbacks, or whatever. I think the only true obstacle would be real, actual counter-enclaves and I personally am absolutely willing to throw in the towel on that. (But if there are arguments otherwise, let's hear them) |
If we really want to extend this to include anti-holes inside holes, then one way could be to use
Every time the renderer encounters a |
I hadn't even thought about anti-holes but having separate Your proposal for signaling anti-holes seems very sensible to me but as you say I'd start with the simple case first and tackle that when it's needed at a later stage. |
Ok so clearly I haven't thought about this for very long and don't understand the current state or the expected behavior very well. I was thinking about things more from the perspective of someone drawing a polygon but it makes sense that the internal storage would be more explicit than I suggested above. I like the suggestion of adding a new HoleyPatch that Bryan laid out above. I will start with no nans and no anti-holes. |
I haven't thought about this in a very very long time. However, I had a fully working PR here: #3399 I not longer think that my reasons for closing that PR stand. Having very quickly reviewed last night, it was a non-breaking implementation. Other than that I'm afraid I don't have anything intelligent to say. |
I guess my thoughts are the same now as then: There are certainly downsides to the new proposal as well: notably, to add a hole in any patch, you must construct a full-length
so that
cc @jsignell indexing might be worth considering |
I suppose we could consider adding an indexed |
Using a dictionary would not allow it to take advantage of binary and base64 encoded transfers right since those are only implemented for a CDS (unless that has changed). Could |
All of that machinery is actually encapsulated in the
but that idea would need to be tested... There may also be issues around having integer keys (we can't convert numpy scalars automatically, e.g., and if they are changed on the JS side they might come back as strings...) |
Ok I will try to get the indexed holes working as suggested. |
This is commonly needed in Geographic Information Systems (GIS), there may be other uses too.
Briefly mentioned this to @pzwang and @bryevdv and it was thought to be a good idea.
It's worth noting the GeoJSON spec for this exact problem:
Polygon
http://geojson.org/geojson-spec.html#id4
Coordinates of a Polygon are an array of LinearRing coordinate arrays. The first element in the array represents the exterior ring. Any subsequent elements represent interior rings (or holes).
MultiPolygon
http://geojson.org/geojson-spec.html#id7
Coordinates of a MultiPolygon are an array of Polygon coordinate arrays:
I'd be delighted to try and implement, but need some direction in terms of implementation.
Some thoughts:
I had initially imagined modifying Patch. However, Patch currently supports MultiPolygon's by separating the list of xs or ys with a NaN (https://github.com/bokeh/bokeh/blob/master/bokeh/models/glyphs.py#L604). If we were trying to follow the GeoJSON logic, and maintain easy backward compatibility if seems like we'd need some other type of separator to signify when it was an inner ring, and then the NaN would signify the start of a new Patch.....this feels like it's getting unwieldy.
Another thing to consider is whether GIS is a special case, and it would be more appropriate to accept a list of [lon, lat] coordinates instead of splitting out xs and ys. This would take us away from the ggplot style geom_polygon and much more towards geojson. (Sidenote: the geojson spec allows for a height as well (x, y, z))
Finally, this article http://journal.r-project.org/archive/2012-2/RJournal_2012-2_Murrell2.pdf seems like it might be relevant - it discusses two different types of "holes."
My straw man offer of an implementation is:
The text was updated successfully, but these errors were encountered: