Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Patches with Holes #2321

Closed
birdsarah opened this issue May 20, 2015 · 20 comments · Fixed by #8340
Closed

Patches with Holes #2321

birdsarah opened this issue May 20, 2015 · 20 comments · Fixed by #8340

Comments

@birdsarah
Copy link
Member

This is commonly needed in Geographic Information Systems (GIS), there may be other uses too.

Briefly mentioned this to @pzwang and @bryevdv and it was thought to be a good idea.

It's worth noting the GeoJSON spec for this exact problem:

Polygon

http://geojson.org/geojson-spec.html#id4

Coordinates of a Polygon are an array of LinearRing coordinate arrays. The first element in the array represents the exterior ring. Any subsequent elements represent interior rings (or holes).

No holes:

{ "type": "Polygon",
    "coordinates": [
      [ [100.0, 0.0], [101.0, 0.0], [101.0, 1.0], [100.0, 1.0], [100.0, 0.0] ]
      ]
   }

With holes:

{ "type": "Polygon",
    "coordinates": [
      [ [100.0, 0.0], [101.0, 0.0], [101.0, 1.0], [100.0, 1.0], [100.0, 0.0] ],
      [ [100.2, 0.2], [100.8, 0.2], [100.8, 0.8], [100.2, 0.8], [100.2, 0.2] ]
      ]
   }

MultiPolygon

http://geojson.org/geojson-spec.html#id7

Coordinates of a MultiPolygon are an array of Polygon coordinate arrays:

{ "type": "MultiPolygon",
    "coordinates": [
      [[[102.0, 2.0], [103.0, 2.0], [103.0, 3.0], [102.0, 3.0], [102.0, 2.0]]],
      [[[100.0, 0.0], [101.0, 0.0], [101.0, 1.0], [100.0, 1.0], [100.0, 0.0]],
       [[100.2, 0.2], [100.8, 0.2], [100.8, 0.8], [100.2, 0.8], [100.2, 0.2]]]
   ]
}

I'd be delighted to try and implement, but need some direction in terms of implementation.

Some thoughts:

I had initially imagined modifying Patch. However, Patch currently supports MultiPolygon's by separating the list of xs or ys with a NaN (https://github.com/bokeh/bokeh/blob/master/bokeh/models/glyphs.py#L604). If we were trying to follow the GeoJSON logic, and maintain easy backward compatibility if seems like we'd need some other type of separator to signify when it was an inner ring, and then the NaN would signify the start of a new Patch.....this feels like it's getting unwieldy.

Another thing to consider is whether GIS is a special case, and it would be more appropriate to accept a list of [lon, lat] coordinates instead of splitting out xs and ys. This would take us away from the ggplot style geom_polygon and much more towards geojson. (Sidenote: the geojson spec allows for a height as well (x, y, z))

Finally, this article http://journal.r-project.org/archive/2012-2/RJournal_2012-2_Murrell2.pdf seems like it might be relevant - it discusses two different types of "holes."

My straw man offer of an implementation is:

  • implement Patch/Patches to accept a new separator to signify a hole
  • implement GeoPolygon & GeoMultiPolygon Glyphs which accept input that looks much more like GeoJSON but use Patch/Patches under the hood. (This could pave the way for implementing the other the GeoJSON types Point/MultiPoint and Line/MultiLine)
@RutgerK
Copy link

RutgerK commented May 26, 2015

I already shared some thoughts on this with Sarah over the mail, i'll put the gist of it here as well.

First of all, i don't think geographic patches/polygons are fundamentally different from non-geographic ones. So the same interface could be used for both, and i would avoid any 'geo' in the function names since it suggests something special, if all it does is plot coordinates on a plane. In my opinion the only reason to implement a new method/glyph is to preserve the API of the current one.

Using an existing specification like GeoJSON seems like a good idea, improving interoperability with other systems and software compared to a Bokeh-specific syntax. Apart from GeoJSON, Well-Known-Text could also be considered. OGR (GDAL) geometry objects for example have both an ExportToJson() and Wkt() method, the same goes for Shapely etc..
http://en.wikipedia.org/wiki/Well-known_text#Geometric_objects

One benefit over GeoJSON that i know of is the possibility of curved geometries, but i don't know if that's something Bokeh should support.

Alternatively some low-level mapping of the HTML5 canvas drawing method's could also be considered, using statements like LineTo, MoveTo, ArcTo etc. Its not the most user-friendly, but very flexible. Its also very similar to how Matplotlib's Path's work (http://matplotlib.org/users/path_tutorial.html). Other formats like SVG, postscript use it as well. Here is an example how in Matplotlib you can plot like that, and converting from WKT. You basically get, for each polygon, a long array of vertices and codes, like:

(x0,y0), MoveTO # start of a ring
(x1,y1), LineTO
(x2,y2), LineTO
(x3,y3), LineTO
(x4,y4), MoveTo # next ring
(x5,y5), LineTO

http://nbviewer.ipython.org/gist/RutgerK/e6d6dd99701b444427ff

@bryevdv
Copy link
Member

bryevdv commented May 26, 2015

A few quick thoughts:

Smooth paths is a long-standing open feature issue: #183

I would definitely like to support them in some way. It should not actually be that hard, the question is what is the right way to transport the data since it does not really fit the ColumnDataSource model.

Regarding richer representation for polygons, I think it is exactly the same situation. So now is maybe a good time to hash out the details of a way forward. Here are the options I see:

  1. Encapsulate paths, rich polygons, etc, in their own objects, and let these objects be individual items in columns in existing ColumnDataSources.

  2. Create a new type of DataSource for these glyphs, that directly holds this more specialized data. I would imagine the new DataSource to really just be a namespace/dict that names path or polygon or geoJSON or whatever objects for glyphs to use.

I'm not sure I have a strong preference either way. Thoughts?

Bryan

@datnamer
Copy link

Here are some R packages for GIS and interactive JS mapping. Perhaps they will yield some useful ideas:

https://github.com/dkahle/ggmap
https://rstudio.github.io/leaflet/
https://ropensci.org/tutorials/lawn_tutorial.html

EDIT: Also would be great to have geopandas integration

@birdsarah
Copy link
Member Author

Summary:

Sorry for the long string of thoughts below, I needed to get it out in front of me to feel sure about an opinion, and figured it didn't hurt to post it. Also, if you have the patience to read it all, please do let me know if my understanding is wrong.


Thought 1 - What is a patch & what is a patches?

Right now we have Patch & Patches.

In Patch, every row is a point:

x y
1 1
2 2
3 3
1 1

This is useful when you're building, for example, area plots and your data looks like this (your "x" is the year and there are 3 sets of "y" for each energy column):

year oil gas solar
2010 1 3 0
2011 2 2 1
2012 3 2 1
2013 1 3 1

In Patches every row is a set of points:

xs ys
[1, 2, 3, 1] [1, 2, 3, 1]
[2, 4, 6, 2] [2, 4, 6, 2]
[3, 6, 9, 3] [ 3, 6, 9, 3]
[1, -1, -2, 0] [1, 4, 0, 2]

And is useful when you have a data values associated with each shape, like a map, and you may color or label each patch with the associated value :

xs ys value
[1, 2, 3, 1] [1, 2, 3, 1] 10%
[2, 4, 6, 2] [2, 4, 6, 2] 20%
[3, 6, 9, 3] [ 3, 6, 9, 3] 10%
[1, -1, -2, 0] [1, 4, 0, 2] 15%

Both Patch & Patches support the idea that you can have a NaN as a data point and this will close a path and start a new one.

In the case of a Patch, a point is associated with a data row, and the whole column generates a shape.
In the case of a Patches, a shape is associated with a data row, and the whole column generates a picture.

THE POINT: I don't know quite how to say it, but I can't shake the feeling that Patch is a visualization tool - a way of more clearly seeing a series of points. Whereas a Patches has intrinsic meaning, the shape is a property of the data point.


Thought 2 - lens of point, path, polygon, multipolygon

ggplot2 has point, path, polygon

A polygon is a closed path.

Point
ggplot2: Point
bokeh: Marker

x y
1 1
2 2
3 3
1 1

Path
ggplot2: Path

bokeh: if the data is column-wise - Line

x y
1 1
2 2
3 3
1 1

bokeh: if the data is the row-wise - MultiLine

xs ys
[1, 2, 3, 1] [1, 2, 3, 1]
[2, 4, 6, 2] [2, 4, 6, 2]
[3, 6, 9, 3] [ 3, 6, 9, 3]
[1, -1, -2, 0] [1, 4, 0, 2]

Polygon
ggplot2: Polygon

bokeh: if the data is the column-wise - Patch

x y
1 1
2 2
3 3
1 1

bokeh: if the data is the row-wise - Patches

xs ys
[1, 2, 3, 1] [1, 2, 3, 1]
[2, 4, 6, 2] [2, 4, 6, 2]
[3, 6, 9, 3] [ 3, 6, 9, 3]
[1, -1, -2, 0] [1, 4, 0, 2]

There are two further extensions:

  • Multi polygon, the notion of distinct but associated polygons.
  • The ability for a Polygon to have a hole in it.

Multi-polygon

We currently handle this as follows:

if the data is the column-wise - Patch

x y
1 1
2 2
3 NaN
1 1
2 2

and if the data is the row-wise - Patches

xs ys
[1, 2, 3, 1, NAN, 1, 2, 3, 1] [1, 2, 3, 1, NAN, 1, 2, 3, 1]
[2, 4, NAN, 2, 2, 4, 6, 2] [2, 4, 6, 2, 2, 4, NAN, 2]
[3, 6, 9, 3, 3, 6, 9, 3] [3, 6, 9, 3, 3, 6, 9, 3]
[1, -1, -2, 0, 1, -1, -2, 0] [1, 4, 0, 2, 1, -1, -2, 0]

What's interesting is that Bokeh copes with some pretty messy data, it will render:

xs ys
[1, 2, 3, 1, NAN, 1, 2, 3, 1] [1, 2, 3, 1, NAN, 1, 2, 3, 1]
[2, 4,] [2, 4, 6, 2, 2, 4, NAN, 2]
[3, 6, 9, 3, 3, 6, 9, 3] [ 3, 6, 9]
[1, -1, -2, 0, 1, -1, -2, 0] [1, 4, 0, NAN, NAN, NAN]

even though I'm not sure the above data makes any sense.

To my taste, I would rather have the row-wise specification be a list of lists:

xs ys
[[1, 2, 3, 1], [1, 2, 3, 1]] [[3, 6, 9, 3], [3, 6, 9, 3]]
[[1, 2, 2, 1], [1, 2, 5, 1]] [[1, 1, 3, 1], [1, 2, 3, 0]]
[[3, 6, 9, 3], [3, 6, 9, 3]] [[1, 2, 3, 1], [1, 2, 3, 1]]

That way you could easily do some checks and make sure every x had a corresponding y.

Polygon with hole in it

I'm not sure the column-wise data would ever need (could?) support the Polygon with a hole in it. I just can't imagine a table of data where that makes sense....

In the case of row-wise data, I can imagine it making sense as the shape is what's associated with the row, so it's valid to have a hole in it. Something like - using NAN to seperate:

xs ys
[[1, 2, 3, 1, NAN, 0.5, 1, 1.5, 0.5], [1, 2, 3, 1]] [[3, 6, 9, 3, NAN, 1.5, 4, 1.5, 1.5], [3, 6, 9, 3]]
[[1, 2, 2, 1], [1, 2, 5, 1]] [[1, 1, 3, 1], [1, 2, 3, 0]]

OR - using more deeply nested structure

xs ys
[ [ [1, 2, 3, 1], [0.5, 1, 1.5, 0.5] ], [ [1, 2, 3, 1] ] ] [ [ [3, 6, 9, 3], [1.5, 4, 1.5, 1.5] ], [ [3, 6, 9, 3] ] ]
[ [ [1, 2, 2, 1] ] , [ [1, 2, 5, 1] ] ] [ [ [1, 1, 3, 1] ] , [ [1, 2, 3, 0] ] ]

THE POINT: After all of that, is Patches what we want it to be?


After all the above, this feels like it's exclusively about Patches (or a new glyph type) not Patch


@bryevdv mentioned #183 (which reads "Right now you can specify vectors of linear, quadratic, or bezier curves independently. It would be good to be able to render single paths that have all those components together in one path.")

Thinking about this as paths & polygons for now, so as not to get too attached to existing things, I see the following:

For paths:

  • path(points=[<list_of_points>], type="linear/quadratic/bezier", line_props={})
  • complex_path(paths=[<list_of_paths>]) - this joins paths together but has one start & one end, having the line_props on an individual path would allow you to construct a path with changing line props
  • multi_path(paths=[]) - this has multiple starts & ends

For polygons:

  • polygon(points=[<list_of_points>], path_type="linear/quadratic/bezier", mask=True/False, line_props={}, fill_props={}) - closed version of path, here we can specify whether it's a mask (as per svg or canvas masks)
  • complex_polygon(polygons=[<list_of_polygons>]) - lets you build up from additive & destructive - in the case of an area with a hole, this would be an additive path, followed be a desctructive path)
  • multi_polygon(polygons=[<list_of_polygons_or_complex_polygons>]) - i'm not sure this is necessary or what you could achieve that would be different from complex_polygon

Obviously I'm just scribbling this down but it covers my use cases for a polygon with a hole and the "smooth path" issue (at least at a high level)

Having written all this down, shoe-horning this into a ColumnDataSource doesn't seem like the right thing to do.

I think to achieve this ticket, we can implement:

  • Path, Polygon, MultiPolygon

and still have room to implement the items from #183. I think to do this we may need to implement a Point object. As Linear/Quadratic/Bezier are different kinds of points. But we can start by implementing Linear Point.


One final random tangent. In the case of a map, at least, it's normal to have data / properties associated with an entity.
One thing that I noticed while building the Water map is that the spatial data wasn't changing even when the properties were
e.g. a value over time for a given place.

It would be nice to be able to link a ColumnDataSource to a SpatialDataSource in a way that can avoid having to handle the large SpatialDataSource (in particular avoid
having to send it to a server and back).

@birdsarah
Copy link
Member Author

I should add that I can imagine how this all links together when the data is row-wise. But when the data is column-wise I cannot intuit the solution for the complex shapes.

Is it fair to imagine implementing this in a more row-wise fashion, and then maybe there's a subset that can be passed as column-wise but that's really a wrapper around row-wise?

@birdsarah
Copy link
Member Author

related tickets #185

@birdsarah birdsarah added this to the short-term milestone Jul 7, 2015
@birdsarah
Copy link
Member Author

@AlanKavanagh is interested in this ticket.

@jsignell
Copy link
Contributor

jsignell commented Oct 9, 2018

@bryevdv @birdsarah, I'm going to take a crack at this. I'll write up my intended approach as soon as I have one.

@jsignell
Copy link
Contributor

jsignell commented Oct 9, 2018

Ok so does it make sense to have a patch that overlaps with itself? I would expect that to make a hole:

screen shot 2018-10-09 at 4 54 04 pm

import numpy as np

from bokeh.models import ColumnDataSource, DataRange1d, Plot
from bokeh.models.glyphs import Patch
from bokeh.io import show

x = [0, 3, 3, 0, np.nan, 1, 2, 2, 1, np.nan, 2.5, 3.5, 3.5, 2.5]
y = [0, 0, 3, 3, np.nan, 1, 1, 2, 2, np.nan, 2.5, 2.5, 3.5, 3.5]
source = ColumnDataSource(dict(x=x, y=y))

xdr = DataRange1d()
ydr = DataRange1d()

plot = Plot(
    title=None, x_range=xdr, y_range=ydr, plot_width=300, plot_height=300)

glyph = Patch(x="x", y="y", fill_alpha=.25)
plot.add_glyph(source, glyph)
show(plot)

@philippjfr
Copy link
Contributor

That approach will break backward compatibility in at least some cases that I know of. It could be a valid decision to break backward compatibility but it would have to be discussed.

It's also worth thinking about how this will work for the patches glyph.

I've thought about this a bit over the years and would be happy to meet to discuss.

@bryevdv
Copy link
Member

bryevdv commented Oct 9, 2018

There's a lot of reasons I would not support this approach, apart from compatibility breakage, it just makes too much implicit. What happens when there is partial overlap? What happens when three (four? N?) patches overlap? What if there are partially overlapping patches completely contained inside another patch? All of these cases seem very hard to figure out, especially the partial-overlap cases, if you have to start computing intersections of arbitrary polygons and dealing with points that aren't part of the original data. Or even if partial overlap doesn't count as a hoIe you still have to then detect it. What happens to otherwise internal patches with single point overlap? Or just a border? I have no idea how to explain all this to users in short succinct way.

A motivating use case to keep in mind is:

I have a patch, it has a hole in the center. I want to update just the hole. How do I do that?

If the hole is defined by some rule about overlaps in a multipatch, I don't really have any idea how to do that. It's buried somewhere implicit and anonymous in the column data and I can't get ahold of it without actually analyzing everything in the column somehow.

To me that points to making holes something explicit and tangible and easily accessible in the data. This also hews close to the canvas operations, since if the hole is explicitly defined, all we have to do is "draw it backwards" and voila, there is the hole.

Part of the hard problem here is that Patches allows "multi-patches" split by nans. That was probably not a good choice to begin with on my part, but it's where we are today. If it would simplify things I would consider just making an entirely new glyph for "Patches (but not multi-patches) with Holes" For that we could even consider something like:

plot.holey_patches(
    xs=[ [p1], [p2], ... ], 
    ys=[ [p1], [p2], ... ], 
    hole_xs=[ [[p1h1], [p1h2], ...], [[p2h1]], ... ], 
    hole_ys=[ [[p1h1], [p1h2], ...], [[p2h1]], ... ]
)

i.e. just allowing "one level" of holes (a given patch can have multiple holes, but no counter-enclaves or counter-counter-enclaves) and just storing the data for the holes in their own columns. That will make drawing simpler, hit testing simpler, explaining simpler. No nan's used to signal anything.

This would support contour plots, pretty much 100% as is I think. It's probably some sort of compromise for GIS cases, but I'm interested in a practical view on just how much of one. It seems if necessary it would always be possible to draw a "multi-patch" as multiple "hole patches" that are tagged with some identifier, that could be used to group them for color mapping or hovering or database lookups in click callbacks, or whatever. I think the only true obstacle would be real, actual counter-enclaves and I personally am absolutely willing to throw in the towel on that. (But if there are arguments otherwise, let's hear them)

@bryevdv
Copy link
Member

bryevdv commented Oct 9, 2018

If we really want to extend this to include anti-holes inside holes, then one way could be to use nans to signal divisions, but only in hole_xs and hole_ys, e.g.:

                             |
                             |
            <-- holes --->   v   <----- anti-holes ------>
hole_xs=[ [ [p1h1], [p1h2], nan, [p1ah1], [p1ah2], [p1ah3], ... ] ... ], 
          <=================== for patch 1 =====================>

Every time the renderer encounters a nan it just switches direction, so that is easy enough to implement. Hit testing becomes a bit more involved. It is explainable, if tedious to construct. That said, we could start with the version above, and always add this later without breaking compatibility.

@philippjfr
Copy link
Contributor

I hadn't even thought about anti-holes but having separate hole_xs and hole_ys columns is also pretty much where I ended up as one of the few viable approaches. The problem with nans in patches is indeed problematic but since this is a new feature I don't think it would be too problematic to say that if you want holes then you shouldn't also use multi-patches split by nans.

Your proposal for signaling anti-holes seems very sensible to me but as you say I'd start with the simple case first and tackle that when it's needed at a later stage.

@jsignell
Copy link
Contributor

Ok so clearly I haven't thought about this for very long and don't understand the current state or the expected behavior very well. I was thinking about things more from the perspective of someone drawing a polygon but it makes sense that the internal storage would be more explicit than I suggested above. I like the suggestion of adding a new HoleyPatch that Bryan laid out above. I will start with no nans and no anti-holes.

@birdsarah
Copy link
Member Author

I haven't thought about this in a very very long time.

However, I had a fully working PR here: #3399

I not longer think that my reasons for closing that PR stand. Having very quickly reviewed last night, it was a non-breaking implementation.

Other than that I'm afraid I don't have anything intelligent to say.

@bryevdv
Copy link
Member

bryevdv commented Oct 10, 2018

I guess my thoughts are the same now as then: Patches is already almost overly complicated, and the format is already tedious and difficult to use. I'd rather dump the new features in a new glyph that sheds some of that complexity, and keeps the format two levels of nesting throughout. It's also not clear to me if the old PR supports multiple holes per patch, every example and comment there mentions only a single hole.

There are certainly downsides to the new proposal as well: notably, to add a hole in any patch, you must construct a full-length hole_xs. If we wanted to work around that we could instead make holes indexed:

hole_xs = Dict(Int, List(List(Float))))

so that

hole_xs = { 10: [ [...], ...] } # x coords for holes for patch 10

cc @jsignell indexing might be worth considering

@bryevdv
Copy link
Member

bryevdv commented Oct 10, 2018

I suppose we could consider adding an indexed hole_xs to the existing Patches glyph? If not present it is just ignored. If present, the implementation simply draws those holes in reverse direction, after the top level patch (which might have nans/be a multi patch) is drawn.

@philippjfr
Copy link
Contributor

Using a dictionary would not allow it to take advantage of binary and base64 encoded transfers right since those are only implemented for a CDS (unless that has changed). Could figure.patches instead accept a holes dict of the kind you describe and expand that into columns appropriately?

@bryevdv
Copy link
Member

bryevdv commented Oct 10, 2018

All of that machinery is actually encapsulated in the ColumnData property type, and it's associated customized property descriptor. So, then this might just work out of the box:

hole_xs = ColumnData(Int, Seq(Any))
hole_ys = ColumnData(Int, Seq(Any))

but that idea would need to be tested... There may also be issues around having integer keys (we can't convert numpy scalars automatically, e.g., and if they are changed on the JS side they might come back as strings...)

@jsignell
Copy link
Contributor

Ok I will try to get the indexed holes working as suggested.

@bryevdv bryevdv modified the milestones: short-term, 1.0 Oct 18, 2018
@bryevdv bryevdv changed the title Support a "Patch" with a hole in it / GIS support Patches with Holes Oct 24, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
No open projects
Development

Successfully merging a pull request may close this issue.

7 participants