Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Errors related to using pygeos #10

Closed
robyngit opened this issue Nov 9, 2022 · 12 comments
Closed

Errors related to using pygeos #10

robyngit opened this issue Nov 9, 2022 · 12 comments
Assignees
Labels
bug Something isn't working

Comments

@robyngit
Copy link
Member

robyngit commented Nov 9, 2022

Pygeos was added as a requirement to help resolve #3. However, it appears that when GeoPandas uses pygeos under-the-hood, it causes all sorts of unexpected behaviour in other parts of the workflow that were not initially detected!

During rasterization, I'm getting the error: GEOSException: IllegalArgumentException: Argument must be Polygonal or LinearRing. This originates from the overlay operation (when we "slice" the GDF with the grid lines). (I checked, and all inputs have geom_type Polygon)

During 3D tile creation, we are seeing the error: IndexError: index 2 is out of bounds for axis 0 with size 2 from py3dtiles. This may be because the output from MultiPolygon([polygon]).wkb is different when we are using pygeos than when we are not. This output is passed to TriangleSoup.from_wkb_multipolygon, where the error is originating.

Both of the above errors are resolved by uninstalling pygeos.

Additionally, I'm seeing a new error during staging that I did not see before changes made in #3, but I'm not sure yet if it's related to pygeos, or perhaps the newer version of Shapely that we're now using. Kastan was able to get through staging without these errors, so it might be unrelated.

During staging, I'm getting an error concatenating two GDFs (with pd.concat, happens when we merge new polygons into an existing tile). The error implies that the two GDFs differ in CRS, but this is not the case: Cannot determine common CRS for concatenation inputs, got ['WGS 84 (CRS84)', 'WGS 84 (CRS84)']. Use to_crs() to transform geometries to the same CRS before merging.

(See PermafrostDiscoveryGateway/viz-staging#11 for the "Cannot determine common CRS for concatenation inputs" error during staging, which is not related to pygeos)

@robyngit robyngit added the bug Something isn't working label Nov 9, 2022
@julietcohen
Copy link
Collaborator

I created a new conda virtual environment (with python=3.9) and only installed the following packages from github:

  • pdgstaging with pip install git+https://github.com/PermafrostDiscoveryGateway/viz-staging.git
  • pdgraster with pip install git+https://github.com/PermafrostDiscoveryGateway/viz-raster.git
  • fork of py3dtiles (repo here) with pip install git+https://github.com/PermafrostDiscoveryGateway/py3dtiles.git
  • viz-3dtiles with pip install git+https://github.com/PermafrostDiscoveryGateway/viz-3dtiles.git

Then I got the staging error Robyn recorded above, so I ran pip uninstall pygeos and got the confirmation Successfully uninstalled pygeos-0.13

Restarted kernel, cleared all outputs in notebook, and got the same error during staging.

Closed remote connection to datateam server, quit VS code, and reopened connection and got the same error during staging. Used pip list --local to check that pygeos was not listed, and it was not.

julietcohen added a commit that referenced this issue Nov 19, 2022
… fix error documented in issue #10 that may be related to pygeos
@julietcohen
Copy link
Collaborator

Update:

After meeting with Robyn to discuss potential sources of error for the staging step, she identified the latest version of geopandas (0.12) which is imported with the pdgstaging as the source of the error. One of their bug fixes for this version is:

Combining GeoSeries/GeoDataFrames with pandas.concat will no longer silently override CRS information if not all inputs have the same CRS (#2056).

While the CRS's of both GeoDataFrames that we combine are the same, perhaps the CRS metadata or something else minor about the CRS is different, because they are flagged by geopandas 0.12 as being inconsistent. Uninstalling pygeos, then also uninstalling geopandas 0.12 and installing version 0.11.1 fixes the CRS error reported above! The files are staged successfully in parallel.

But a different staging error occurs with the same approach with Ingmar's data sample:

image

Both data samples are geopackages. This is something to investigate moving forward.

@julietcohen
Copy link
Collaborator

Reproducing staging error versus reproducing successful staging with different data inputs

Robyn found a way to fix to the CRS mismatch error mentioned in the first comment of this issue. See her solution (also linked in first comment) includes changing the TMS to WGS1984Quad instead of using WorldCRS84Quad.

However, I was able to run the staging step on a very small data subset (2 files that overlap spatially, provided by Robyn) without changing the TMS. Additionally, I reproduced the unique error reported above (IndexError: index 0 is out of bounds for axis 0 with size 0) with the exact same config and python environment, but with different input files. Both these workflows were run in parallel with parsl and process geopackages.

  • The error reported above that does not mention CRSs (IndexError: index 0 is out of bounds for axis 0 with size 0) can be found in this notebook. Staging summary for this file is error_staging_summary.csv.
  • The successful staging with the "old" TMS WorldCRS84Quad can be found in this notebook. Staging summary for this file is staging_summary.csv.
  • links were made with commit ID's so should persist even as documents are updated

The config and list of installed packages for these workflows is included within these notebooks just below the import chunk, as well as the following:

{
    "simplify_tolerance": 0.0001,
    "tms_id": "WorldCRS84Quad",
    "z_range": [0, 11],
    "statistics": [
        {
            "name": "polygon_count",
            "weight_by": "count",
            "property": "centroids_per_pixel",
            "aggregation_method": "sum",
            "resampling_method": "sum",
            "val_range": [0, null],
            "nodata_val": 0,
            "nodata_color": "#ffffff00",
            "palette": ["#d9c43f", "#d93fce"]
        },
        {
            "name": "coverage",
            "weight_by": "area",
            "property": "area_per_pixel_area",
            "aggregation_method": "sum",
            "resampling_method": "average",
            "val_range": [0, 1],
            "nodata_val": 0,
            "nodata_color": "#ffffff00",
            "palette": ["#d9c43f", "#d93fce"]
        }
    ],
    "deduplicate_at": ["staging"],
    "deduplicate_method": "neighbor",
    "deduplicate_keep_rules": [["staging_filename", "larger"]],
    "deduplicate_overlap_tolerance": 0.1,
    "deduplicate_overlap_both": false,
    "deduplicate_centroid_tolerance": null
  }
Package              Version
-------------------- -----------
addict               2.4.0
affine               2.3.1
asttokens            2.1.0
attrs                22.1.0
backcall             0.2.0
bcrypt               4.0.1
certifi              2022.9.24
cffi                 1.15.1
charset-normalizer   2.1.1
click                8.1.3
click-plugins        1.1.1
cligj                0.7.2
coloraide            0.18.1
colormaps            0.3
comm                 0.1.0
ConfigArgParse       1.5.3
contourpy            1.0.6
cryptography         38.0.3
cycler               0.11.0
Cython               0.29.32
dash                 2.7.0
dash-core-components 2.0.0
dash-html-components 2.0.0
dash-table           5.0.0
debugpy              1.6.3
decorator            5.1.1
dill                 0.3.6
entrypoints          0.4
executing            1.2.0
fastjsonschema       2.16.2
filelock             3.8.0
Fiona                1.8.22
Flask                2.2.2
fonttools            4.38.0
geopandas            0.11.1
globus-sdk           3.14.0
idna                 3.4
importlib-metadata   5.0.0
ipykernel            6.18.0
ipython              8.6.0
ipywidgets           8.0.2
itsdangerous         2.1.2
jedi                 0.18.1
Jinja2               3.1.2
joblib               1.2.0
jsonschema           4.17.0
jupyter_client       7.4.7
jupyter_core         5.0.0
jupyterlab-widgets   3.0.3
kiwisolver           1.4.4
laspy                2.3.0
llvmlite             0.39.1
lz4                  4.0.2
MarkupSafe           2.1.1
matplotlib           3.6.2
matplotlib-inline    0.1.6
morecantile          3.2.1
munch                2.5.0
nbformat             5.5.0
nest-asyncio         1.5.6
numba                0.56.4
numpy                1.22.4
open3d               0.16.0
packaging            21.3
pandas               1.5.1
paramiko             2.12.0
parsl                2022.11.14
parso                0.8.3
pdgpy3dtiles         0.0.1
pdgraster            0.1.0
pdgstaging           0.1.0
pexpect              4.8.0
pickleshare          0.7.5
Pillow               9.3.0
pip                  22.2.2
platformdirs         2.5.4
plotly               5.11.0
prompt-toolkit       3.0.33
psutil               5.9.4
psycopg2-binary      2.9.5
ptyprocess           0.7.0
pure-eval            0.2.2
pycparser            2.21
pydantic             1.10.2
pygeos               0.13
Pygments             2.13.0
PyJWT                2.6.0
PyNaCl               1.5.0
pyparsing            3.0.9
pyproj               3.4.0
pyquaternion         0.9.9
pyrsistent           0.19.2
python-dateutil      2.8.2
pytz                 2022.6
PyYAML               6.0
pyzmq                24.0.1
rasterio             1.3.4
requests             2.28.1
Rtree                0.9.7
scikit-learn         1.1.3
scipy                1.9.3
setproctitle         1.3.2
setuptools           65.5.0
Shapely              1.8.5.post1
six                  1.16.0
snuggs               1.4.7
stack-data           0.6.1
tblib                1.7.0
tenacity             8.1.0
threadpoolctl        3.1.0
tornado              6.2
tqdm                 4.64.1
traitlets            5.5.0
triangle             20220202
typeguard            2.13.3
typing_extensions    4.4.0
urllib3              1.26.12
viz-3dtiles          0.0.1
wcwidth              0.2.5
Werkzeug             2.2.2
wheel                0.37.1
widgetsnbextension   4.0.3
zipp                 3.10.0

@robyngit
Copy link
Member Author

@julietcohen Although you also found an IndexError, it looks like this one is being caused by something other than the pygeos library. I was able to reproduce the error thanks for your very helpful notebook! Since it occurred whether or not pygeos was installed, I moved this to its own issue in the viz-staging repo (PermafrostDiscoveryGateway/viz-staging#12)

@julietcohen
Copy link
Collaborator

@robyngit Thank you for making a new issue for this. I've been troubleshooting with no resolution so far, but I'll document what I've tried in that new issue. Based on your suggestion, I created a new environment to install my locally cloned repo viz-staging, and I'm editing TileStager.py

@robyngit
Copy link
Member Author

robyngit commented Nov 22, 2022

Rasterization GEOSException error

The GEOSException: IllegalArgumentException: Argument must be Polygonal or LinearRing error with overlay has been reported in geopandas/geopandas#2551. Upgrading GEOS from 3.11.0 to 3.11.1 on my Mac did not resolve the issue as the first commenter suggested, but switching to the pre-release Shapely version 2.0b2 does resolve the issue.

Shapely 2+ merges PyGEOS and Shapely into a single library. Geopandas encourages switching from PyGEOS to Shapely 2 since PyGEOS will no longer be supported in future versions. For this reason, I think that requiring the new version of Shapely is a good solution to this part of the issue.

3D Tiles IndexError

Using Shapely 2 does not resolve the issue in generating 3D tiles.

Here is a minimal example where we can reproduce the error:

this example causes the same error because we haven't added a third (z) dimension to the polygon
from shapely.geometry import box, MultiPolygon
from py3dtiles import TriangleSoup
multipolygon = MultiPolygon([box(0, 0, 1, 1)])
ts = TriangleSoup.from_wkb_multipolygon(multipolygon.wkb)
# IndexError: index 2 is out of bounds for axis 0 with size 2

@robyngit
Copy link
Member Author

Update on the 3D Tiles error

reproducing the error:

  1. Install py3dtiles, shapely < 2, and pygeos:
pip install git+https://github.com/PermafrostDiscoveryGateway/py3dtiles geopandas==0.12.1 pygeos==0.13 shapely==1.8.5.post1
  1. Run this test script, which creates a 3D box (with z-coordinate 10), re-projects it to the Cesium EPSG, and then "triangulates" the polygon as we do when generating our 3D tiles:
# test.py
from shapely.geometry import box, Polygon, MultiPolygon
from py3dtiles import TriangleSoup
import geopandas as gpd

# print(gpd.show_versions())

polygon = box(0, 0, 1, 1)
g = gpd.GeoDataFrame(geometry = [polygon], crs=4326)
g['geometry'] = g['geometry'].apply(lambda p: Polygon([(x, y, 10) for x, y in p.exterior.coords]))
g = g.to_crs(epsg=4978)
print(g) # <- ⛔️ POLYGON Z without pygeos; POLYGON with pygeos
polygon=Polygon(g["geometry"][0])
multipolygon = MultiPolygon([polygon])
multipolygon_wkb = multipolygon.wkb
ts = TriangleSoup.from_wkb_multipolygon(multipolygon_wkb)

print('🎉 SUCCESS!')
  1. See the output, where the reprojected Geodataframe contains a POLYGON rather than the expected POLYGON Z (so it has 2 dimensions rather than three), and for that reason, the conversion fails:
                                            geometry
0  POLYGON ((6377165.57884 111313.83924, 6376200....
Traceback (most recent call last):
 ....
packages/py3dtiles/wkb_utils.py", line 210, in triangulate
    (curr_edge[1] - next_edge[1]) * (next_edge[2] + curr_edge[2]),
IndexError: index 2 is out of bounds for axis 0 with size 2
  1. uninstall pygeos and run the test script again again.
pip uninstall pygeos
python test.py
  1. See that now we do get a POLYGON Z and the transformation happens successfully:
                                            geometry
0  POLYGON Z ((6377175.57732 111314.01376 0.00000...
🎉 SUCCESS!

Notes

@robyngit
Copy link
Member Author

From the docs:

The use of PyGEOS is experimental! Although it is passing all tests, there might still be issues and not all functions of GeoPandas will already benefit from speedups (one known issue: the to_crs coordinate transformations lose the z coordinate)

The issue geopandas/geopandas#1345 confirms that this problem persists with Shapely >= 2.0, but it seems the developers are working on a fix: geopandas/geopandas#2641

@robyngit robyngit self-assigned this Nov 23, 2022
robyngit added a commit to PermafrostDiscoveryGateway/viz-3dtiles that referenced this issue Nov 23, 2022
robyngit added a commit to PermafrostDiscoveryGateway/viz-3dtiles that referenced this issue Nov 23, 2022
robyngit added a commit to PermafrostDiscoveryGateway/viz-staging that referenced this issue Nov 23, 2022
robyngit added a commit to PermafrostDiscoveryGateway/viz-raster that referenced this issue Nov 23, 2022
@robyngit
Copy link
Member Author

robyngit commented Nov 23, 2022

For viz_3dtiles, found a work around that transforms the geometries themselves using shapely, rather than converting the CRS with geopandas, where we are somehow losing the z-coordinate. I've tested this method with a small sample and everything looks good.

In the other PDG packages, I removed the requirement for pygeos, and upgraded version requirements to geopandas >= 0.12 and shapely shapely >= 2.0b2.

Changes in all packages are in branches named feature-shapely2-support

Left TODO:

  • Test all changes with a larger data sample (e.g. part of IWP)
  • Measure time difference in creating 3D tiles (is it slower than before?)

@robyngit
Copy link
Member Author

Updates to viz_3dtiles and Shapely 2+ & GeoPandas 0.12+ requirements are merged into the main branches now. The official shapely 2.0 release is expected to come in 1-2 weeks, so the timing is good to have our packages ready for those changes.

robyngit added a commit to PermafrostDiscoveryGateway/viz-3dtiles that referenced this issue Dec 12, 2022
- Revert some changes from commit eb92102, which was a workaround for the fix released in geopandas 0.12.2
- Change requirements to officialy shapely 2 release, and geopandas 0.12.2

Relates to PermafrostDiscoveryGateway/viz-workflow#10
robyngit added a commit to PermafrostDiscoveryGateway/viz-staging that referenced this issue Dec 12, 2022
robyngit added a commit to PermafrostDiscoveryGateway/viz-raster that referenced this issue Dec 12, 2022
@robyngit
Copy link
Member Author

Although this issue is closed, I wanted to note that geopandas just released a fix for the workaround I described above, and shapely officially released version 2. There are PRs in all three viz-* packages to reflect these changes, including reverting the workaround to make use of the geopandas fix.

@julietcohen
Copy link
Collaborator

Those 3 PRs have now been merged. Nice job keeping up with geopandas and shapely.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants