Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug fixes, geojson2coco improvements, restrict_by_aoi, and nodata filling #331

Merged
merged 28 commits into from
Apr 3, 2020
Merged

Bug fixes, geojson2coco improvements, restrict_by_aoi, and nodata filling #331

merged 28 commits into from
Apr 3, 2020

Conversation

rbavery
Copy link
Contributor

@rbavery rbavery commented Feb 3, 2020

Description

This PR addresses the following issues:

#327 by allowing an option to set areas outside an aoi_boundary to the src image's nodata value when tiling.

#328 by adding a method to the RasterTiler class that fills tiles with the mean of the src image. If this is run after restrict_by_aoi, all nodata values outside of the aoi are filled, along with other nodata values in the image.

geojson2coco would fail if the geodataframe input had any MultiPolygon or GeometryCollection types. I added an optional argument to ignore these when converting the geodataframe to COCO format

There was a bug in the raster_tiler's use of the split_geom function, where the aoi boundary was not intersected with the src_image boundary prior to tiling, causing many nodata areas outside the src_img boundary to be tiled.

EDIT: Here's a list of all changes, from the CHANGELOG:

Changed

  • 20200401, rbavery: new tiler method fill_all_nodata to fill nodata with "mean" or custom value ([FEATURE]: Handling nodata values generally, should we toss, fill, set to 0? #328)
  • 20200401, rbavery: option to ignore MultiPolygon and GeometryCollection types in geojson2coco since these cannot be converted to COCO.
  • 20200401, rbavery: new function solaris.vector.mask.geojsons_to_masks_and_fill_nodata, which rasterized vector labels according to raster tile extents. Fills nodata areas in raster tile and corresponding rasterized label raster.
  • 20200401, rbavery: new test/example of tiling and creating instance masks with nodata values filled in tile outputs

Fixed

Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected - these changes will not be merged until major releases!)

Checklist:

  • My PR has a descriptive title
  • My code follows PEP8
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new errors
  • I have added tests that prove my fix is effective or that my feature works
  • My PR passes Travis CI tests
  • My PR does not reduce coverage in Codecov

If your PR does not fulfill all of the requirements in the checklist above, that's OK! Just prepend [WIP] to the PR title until they are all satisfied. If you need help, @-mention a maintainer and/or add the Status: Help Needed label.

@nrweir
Copy link
Contributor

nrweir commented Feb 11, 2020

Hi @rbavery,

Thanks for working on all of this stuff.

Just as a note, the CosmiQ team is currently working with counsel to ensure that solaris doesn't conflict with this new US export control guideline. I won't be able to merge PRs until we get the all-clear (hopefully in the next week or so). I'll post another comment here once that's clarified on our end and we can resume open source work.

@nrweir
Copy link
Contributor

nrweir commented Feb 11, 2020

Another thought: since geopandas will be switching to pyproj.CRS objects for tracking CRSs in the 0.7.0 release (see here), I'm thinking these may end up being a better object to track CRS with through solaris rather than the rasterio ones. This is a pain given all the hard work you just put into implementing the rasterio ones though. Thoughts?

@rbavery
Copy link
Contributor Author

rbavery commented Feb 12, 2020

Oof (wrt to that interim rule). No problem though.

I saw that update from geopandas, it's good news overall. I think it makes sense to for vectors to internally use pyproj.CRS. But rasterio uses it's own rasterio.crs.CRS class. I'm not sure if we can consistently use the same CRS object for both rasters and vectors at all points in the code. And it's unclear to me if reprojecting geopandas gdf's using a rasterio.crs.CRS will still work, so I've put out a question on twitter. But in any case where it makes sense to revamp the CRS handling again I can probably get to it.

@nrweir
Copy link
Contributor

nrweir commented Feb 12, 2020

Yeah, that makes sense. As far as I can tell (from trying myself), geopandas gdfs can't handle rasterio CRS objects.

I'm about to start some work that will require me to get a bunch of stuff resolved (including CRSs in vector objects), and word on the street is that we'll be able to start pushing code up here again very soon. I'll create an issue for vector CRSs and assign myself to it if I start working on it.

@rbavery
Copy link
Contributor Author

rbavery commented Feb 12, 2020

Sounds good. Joris, one of the geopandas maintainers, says that geopandas gdfs should be able to handle the rasterio crs class since it has a to_wkt method. https://twitter.com/jorisvdbossche/status/1227492555811504129

@nrweir
Copy link
Contributor

nrweir commented Mar 5, 2020

Hey @rbavery. I spent a bunch of time digging into what works well and what doesn't for interconverting between CRSs, and I think I've decided that passing around pyproj CRS objects is going to work better. Check out this gist for an example. I'm going to start working towards passing everything around as pyproj objects.

@rbavery
Copy link
Contributor Author

rbavery commented Mar 6, 2020

@nrweir Got it sounds good. I read the rasterio groups thread, it make sense to standardize on the pyproj class. I'll have time next week to work on wrapping up this PR with some tests. Would you prefer I wait and rebase off of your pyproj CRS updates? In any case, I'll remove this change I made in this PR since I think the missing CRS issue I was having might be fixed by your update.

... I wasn't able to save a geopandas geodataframe with a CRS that didn't have an EPSG code, which caused it to default to 4326. I added an optional argument to override the CRS in the geojson_to_px function with the CRS of the src image.

@nrweir
Copy link
Contributor

nrweir commented Mar 6, 2020

@rbavery that'd be great (rebasing off the new version). I'm going to do a 0.2.2 release today.

…uts, test for instance masking, handling necessary use of rasterio crs in _check_crs
@codecov
Copy link

codecov bot commented Apr 2, 2020

Codecov Report

Merging #331 into dev will decrease coverage by 2.61%.
The diff coverage is 80.88%.

Impacted file tree graph

@@            Coverage Diff             @@
##              dev     #331      +/-   ##
==========================================
- Coverage   67.17%   64.56%   -2.62%     
==========================================
  Files          72       72              
  Lines        5204     5328     +124     
==========================================
- Hits         3496     3440      -56     
- Misses       1708     1888     +180     
Impacted Files Coverage Δ
setup.py 0.00% <0.00%> (ø)
solaris/raster/image.py 53.00% <ø> (+0.05%) ⬆️
tests/test_nets/test_datagen.py 100.00% <ø> (ø)
solaris/vector/polygon.py 79.35% <45.45%> (-3.76%) ⬇️
solaris/eval/challenges.py 84.74% <50.00%> (-2.98%) ⬇️
solaris/data/coco.py 73.70% <62.50%> (-0.69%) ⬇️
solaris/tile/raster_tile.py 67.96% <65.62%> (-0.83%) ⬇️
solaris/tile/vector_tile.py 77.66% <75.00%> (-0.55%) ⬇️
solaris/utils/core.py 76.62% <88.88%> (+3.06%) ⬆️
solaris/utils/geo.py 70.19% <89.47%> (+3.53%) ⬆️
... and 15 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 94ac29c...b64a097. Read the comment docs.

@rbavery
Copy link
Contributor Author

rbavery commented Apr 2, 2020

Hey @nrweir, thanks for the patience, I think this PR is finally ready for review! Instead of converting pyproj.CRs to rasterio.crs.CRS using epsg codes I followed @snowman2's advice and used this code snippet he suggested:

import rasterio
if LooseVersion(rasterio.__gdal_version__) >= LooseVersion("3.0.0"):
    rio_crs = rasterio.crs.CRS.from_wkt(pyproj_crs.to_wkt())
else:
    rio_crs = rasterio.crs.CRS.from_wkt(pyproj_crs.to_wkt("WKT1_GDAL"))

When I ran your gist with this if/else, it worked to properly convert where it failed before.

@nrweir nrweir mentioned this pull request Apr 2, 2020
12 tasks
@nrweir
Copy link
Contributor

nrweir commented Apr 2, 2020

@rbavery thanks, this looks awesome - I'll review today.

Copy link
Contributor

@nrweir nrweir left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great additional functionality and cleanup of some of the existing problems. A few minor changes requested.

solaris/eval/base.py Outdated Show resolved Hide resolved
solaris/tile/raster_tile.py Outdated Show resolved Hide resolved
solaris/data/coco.py Show resolved Hide resolved
solaris/tile/raster_tile.py Show resolved Hide resolved
solaris/tile/vector_tile.py Outdated Show resolved Hide resolved
solaris/utils/core.py Outdated Show resolved Hide resolved
solaris/utils/geo.py Show resolved Hide resolved
solaris/vector/mask.py Show resolved Hide resolved
solaris/vector/mask.py Show resolved Hide resolved
@rbavery
Copy link
Contributor Author

rbavery commented Apr 2, 2020

Can you hold off on merging @nrweir ? I found that their is still an issue when using geojson_to_px_gdf, where if the geojsons were saved with a CRS with no EPSG code, the function fails to load any projection information. I need to add the override_crs option back in. I thought this would be solved by the other CRS updates we made, but it looks like geojsons may only be able to store EPSG code CRS information?

The vector_tiler maintains this CRS info right up until the gdfs are saved, so I think it is an issue with the geojson format.

@rbavery
Copy link
Contributor Author

rbavery commented Apr 2, 2020

Ok, I think this should be good to go, if the way I addressed the reviews makes sense.

Copy link
Contributor

@nrweir nrweir left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One more small change to the assert statement in sol.tile.vector_tile and we're good to go! Thanks for all of your hard work.

@@ -103,6 +103,7 @@ def tile(self, src, tile_bounds, tile_bounds_crs=None, geom_type='Polygon',
output_ext))
self.tile_paths.append(dest_path)
if len(tile_gdf) > 0:
assert tile_gdf.crs is not None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for adding this. Can we add an expression for what the error will tell users? e.g.,

if len(tile_gdf) > 0:
    assert tile_gdf.crs is not None, "No CRS was found. A CRS must be present to tile the object."

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not raise an exception?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was debating that approach too. I guess I see the exception as probably being more transparent, so that might be a better approach @rbavery.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was actually a mistake to leave that in, thanks for catching it! I was using the assert to debug whether a tile_gdf has a crs before it is saved (it did) and forgot to delete that line. I think it's still a good idea to check if src gdf's supplied to the tiler have a crs attribute that is not None, so I've moved this sort of check to before the tile_generator is run.

solaris/utils/geo.py Show resolved Hide resolved
Copy link
Contributor

@nrweir nrweir left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good to go!

@nrweir nrweir merged commit 6e1ef25 into CosmiQ:dev Apr 3, 2020
@nrweir nrweir changed the title [WIP] Bug fixes, geojson2coco improvements, restrict_by_aoi, and nodata filling Bug fixes, geojson2coco improvements, restrict_by_aoi, and nodata filling Apr 3, 2020
@rbavery rbavery deleted the fix-split-geom branch April 3, 2020 19:16
nrweir added a commit that referenced this pull request Apr 20, 2020
* Add files via upload

* implementing new CRS management with pyproj

* loosening version restrictions

* removing WIP GeoTIFFDataset added by accident

* implementing new CRS management with pyproj

* Update environment-gpu.yml

* loosening version restrictions

* removing WIP GeoTIFFDataset added by accident

* pinning to dev version of shapely

* debugging syntax

* debugging requirements and setup.py - diffs btwn conda and pypi

* pinning shapely to 1.7.1dev

* adding dependency link

* finally got shapely installed from gh

* why is this still not working

* fixing pip install flags to accommodate git install

* trying another way to get shapely...

* fixing pyyaml reqt

* hotfix split_multi_geoms()

* hotfix split_multi_geoms()

* fixing split_multi_geometries and adding test

* debugged split_multi_geometries and wrote new tests

* adding dependency_link for shapely in setup.py

* fixing version regex to include characters, e.g. "0.2.2dev0"

* [WIP] Bug fixes, geojson2coco improvements, restrict_by_aoi, and nodata filling (#331)

* added nodata threshold option for raster tiler for cases where large portions of image have nodata

* fixed issue where the aoi boundary was not intersected with src_img bounds prior to generating tile bounds list and added warnign about multipoly

* reverse sign on nodata threshold check and add a docstring to tile function

* impements restrict_to_aoi in rastertiler

* fixed naming of raster tiles in case restrict_to_aoi is used

* original tests pass for raster tiler

* added remove multipolygon option to geojson2coco, they cause it to fail

* fill_no_data func for tiler with options for mean, arbitrary value

* added remove multipolygon function for geojson2coco

* option to override crs in geojson_to_px_gdf to fix for epsg being used for wkt projections when gdfs are saved as geojson and fix for missing crs info in vector tiler

* add override options for coco

* bug fixes for tiler when fill no data is used, nodata_percentage is calculated correctly now

* added comment on not cropping in tiler

* return fill values from raster tiler

* bug fix for fill_all_nodata and instance mask should return empty arr with same shape as reference_im

* added filtering for geometry collections to remove_multipolygon function and updated description for geojson2coco

* bug fix for case where fill value is not the mean

* new function for nodata aware instance mask conversion with tiler inputs, test for instance masking, handling necessary use of rasterio crs in _check_crs

* Fixing torchvision pin in environment.yml

* fixed test

* update CHANGELOG, environments for shapely dev for np support, removed coco crs override, update tqdm

* small updates, tests pass

* review updates: tqdm import switch, explode multipoly option, _check_crs change

* fixed get_overlapping_subset to use new return_Rasterio flag

* added override crs option back in to geojson_to_coco

* add chceck for src gdf to have a crs attribute

Co-authored-by: Ubuntu <ryan@cropmask-mgpu-18-v2.03lau3lhyktubpzetswi1ohm1h.xx.internal.cloudapp.net>
Co-authored-by: Nick Weir <nrweir@users.noreply.github.com>

* Pinning GDAL version to resolve compatibility issue with Fiona

* Fix stage of val_datagen in Trainer (#345)

* Fix stage of val_datagen in Trainer

* updating changelog and solaris version

Co-authored-by: jshermeyer <jss5102@gmail.com>
Co-authored-by: Ryan Avery <ravery@ucsb.edu>
Co-authored-by: Ubuntu <ryan@cropmask-mgpu-18-v2.03lau3lhyktubpzetswi1ohm1h.xx.internal.cloudapp.net>
Co-authored-by: Daniel Hogan <6313241+dphogan@users.noreply.github.com>
Co-authored-by: Kaizaburo Chubachi <kaizaburo_chubachi@shino.ecei.tohoku.ac.jp>
@dphogan dphogan mentioned this pull request Jul 16, 2020
12 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants