Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Failing voronoi example with the new 0.2.2 release #69

Closed
jorisvandenbossche opened this issue Jan 7, 2019 · 12 comments
Closed

Failing voronoi example with the new 0.2.2 release #69

jorisvandenbossche opened this issue Jan 7, 2019 · 12 comments

Comments

@jorisvandenbossche
Copy link

The geoplot release seems to have broken the geopandas examples (the voronoi one). I am getting the following error on our readthedocs build:

Unexpected failing examples:
/home/docs/checkouts/readthedocs.org/user_builds/geopandas/checkouts/latest/examples/plotting_with_geoplot.py failed leaving traceback:
Traceback (most recent call last):
  File "/home/docs/checkouts/readthedocs.org/user_builds/geopandas/checkouts/latest/examples/plotting_with_geoplot.py", line 80, in <module>
    linewidth=0)
  File "/home/docs/checkouts/readthedocs.org/user_builds/geopandas/conda/latest/lib/python3.6/site-packages/geoplot/geoplot.py", line 2133, in voronoi
    geoms = _build_voronoi_polygons(df)
  File "/home/docs/checkouts/readthedocs.org/user_builds/geopandas/conda/latest/lib/python3.6/site-packages/geoplot/geoplot.py", line 2687, in _build_voronoi_polygons
    ls = np.vstack([np.asarray(infinite_segments), np.asarray(finite_segments)])
  File "/home/docs/checkouts/readthedocs.org/user_builds/geopandas/conda/latest/lib/python3.6/site-packages/numpy/core/shape_base.py", line 234, in vstack
    return _nx.concatenate([atleast_2d(_m) for _m in tup], 0)
ValueError: all the input arrays must have same number of dimensions
@jorisvandenbossche
Copy link
Author

The example is

ax = geoplot.voronoi(
    injurious_collisions.sample(1000),
    hue='NUMBER OF PERSONS INJURED', cmap='Reds', scheme='fisher_jenks',
    clip=boroughs.geometry,
    linewidth=0)
geoplot.polyplot(boroughs, ax=ax)

from https://geopandas.readthedocs.io/en/latest/gallery/plotting_with_geoplot.html

@ResidentMario
Copy link
Owner

I assume the build is using 0.2.3 and not 0.2.2? 0.2.2 was a briefly-lived release that I had to patch over when I realized the patch that I thought fixed the __pysal_choro thing had a fencepost error.

This looks like an unrelated failure...that codebase hasn't changed in the intervening time, so perhaps there was an update in numpy behavior in the meantime (what version is the build on?). I will look into it.

@jorisvandenbossche
Copy link
Author

Yeah, didn't yet look into detail. It's just that the readthedocs build passed few days ago, and then failed with the above error the day before yesterday, and after checking that there was a new release of geoplot in that timeframe, I assumed that might have been the cause.

@ResidentMario
Copy link
Owner

@jorisvandenbossche What version of numpy is the ReadTheDocs build using? This code succeeds with geopandas==0.4.0, geoplot==0.2.3, and numpy==1.15.4, so there appears to be something wrong with a specific (probably older?) version of numpy.

@jorisvandenbossche
Copy link
Author

The log output of readthedocs is a bit annoying, in the sense you cannot see the exact environment that is created (truncated log), so I am not fully sure.

@jorisvandenbossche
Copy link
Author

I tried to recreate a similar local environment using the same environment file as we are using on readthedocs (which installed numpy 1.15.4), but I cannot also not reproduce it ..

I triggered a re-build on readthedocs, let's see if the issue is persistent.

@jorisvandenbossche
Copy link
Author

OK, now the build succeeded .. Sorry, not sure what was going wrong before, but so ignore this!

@ResidentMario
Copy link
Owner

ResidentMario commented Jan 13, 2019

It's theoretically possible there is a non-deterministic error in the algorithm (I remember debugging this exact error message in certain cases while I was working out the procedure). If someone else reports this or if it comes up again I'll have to take a closer look.

@jorisvandenbossche
Copy link
Author

OK, figured out the reason why I only sometimes saw the failure and couldn't consistently reproduce it: it is because the example in the geopandas gallery is doing a sample, and thus each time uses a different subset of the data. And for some subsets, it triggers the error.

And with the following random state, it can be reproduced on my laptop:

import geopandas 
injurious_collisions = geopandas.read_file( 
    "https://github.com/ResidentMario/geoplot-data/raw/master/nyc-injurious-collisions.geojson")
import geoplot              
geoplot.geoplot._build_voronoi_polygons(injurious_collisions.sample(1000, random_state=8))

resulting in

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-23-19b662ca3c99> in <module>
----> 1 geoplot.geoplot._build_voronoi_polygons(injurious_collisions.sample(1000, random_state=8))

~/miniconda3/envs/latest/lib/python3.7/site-packages/geoplot/geoplot.py in _build_voronoi_polygons(df)
   2706                     ls_sorted.append(l1)
   2707 
-> 2708                 ls_sorted.append([m.tolist() for m in matches if m.tolist() not in ls_sorted][0])
   2709 
   2710             # Build and return the final polygon.

IndexError: list index out of range

@jorisvandenbossche
Copy link
Author

And depending on the random state, you can also have a different error:

In [24]: for i in range(100): 
    ...:     try: 
    ...:         geoplot.geoplot._build_voronoi_polygons(injurious_collisions.sample(1000, random_state=i)) 
    ...:     except Exception as e: 
    ...:         print(i, e) 
    ...:                                                                                                                                                                                                            
8 list index out of range
11 list index out of range
19 list index out of range
20 list index out of range
21 all the input arrays must have same number of dimensions
25 list index out of range
30 list index out of range
33 list index out of range
37 all the input arrays must have same number of dimensions
63 list index out of range
79 list index out of range
83 list index out of range
84 list index out of range
94 list index out of range
95 all the input arrays must have same number of dimensions
97 list index out of range
99 list index out of range

jorisvandenbossche added a commit to geopandas/geopandas that referenced this issue Mar 6, 2019
…rors (#933)

See ResidentMario/geoplot#69: depending on how the data are subsampled, we stumble into a potential bug in geoplot. So fixing the random state with a working one to prevent this.
jorisvandenbossche added a commit to geopandas/geopandas that referenced this issue Mar 6, 2019
…rors (#933)

See ResidentMario/geoplot#69: depending on how the data are subsampled, we stumble into a potential bug in geoplot. So fixing the random state with a working one to prevent this.

(cherry picked from commit 6dd50b8)
@ResidentMario
Copy link
Owner

Will look at this on the weekend.

The short term fix is to just set a seed for the docs build that is known to work...which, really, I should have always been doing. I see you've already done that. Sorry about that.

The long term fix is to figure out what the conditions for this failure are. That's going to be more involved because it has to do with the way the voronoi polygon construction algorithm works in certain edge cases. I was aware that the algorithm had some issues, but I thought it was constrained to edge cases involving very small samples...clearly if it's failing this much on 1000-point samples that's simply not true. 😦

@ResidentMario
Copy link
Owner

Fixed in 0.2.4. It's not clear to me why the algorithm constructs these empty geometries in certain rare cases, but I've patched the issue by skipping them.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants