-
Notifications
You must be signed in to change notification settings - Fork 112
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bug fixes, geojson2coco improvements, restrict_by_aoi, and nodata filling #331
Conversation
…portions of image have nodata
…ounds prior to generating tile bounds list and added warnign about multipoly
…d for wkt projections when gdfs are saved as geojson and fix for missing crs info in vector tiler
…alculated correctly now
… with same shape as reference_im
…ion and updated description for geojson2coco
Hi @rbavery, Thanks for working on all of this stuff. Just as a note, the CosmiQ team is currently working with counsel to ensure that |
Another thought: since |
Oof (wrt to that interim rule). No problem though. I saw that update from geopandas, it's good news overall. I think it makes sense to for vectors to internally use |
Yeah, that makes sense. As far as I can tell (from trying myself), geopandas gdfs can't handle rasterio I'm about to start some work that will require me to get a bunch of stuff resolved (including CRSs in vector objects), and word on the street is that we'll be able to start pushing code up here again very soon. I'll create an issue for vector CRSs and assign myself to it if I start working on it. |
Sounds good. Joris, one of the geopandas maintainers, says that geopandas gdfs should be able to handle the rasterio crs class since it has a |
Hey @rbavery. I spent a bunch of time digging into what works well and what doesn't for interconverting between CRSs, and I think I've decided that passing around pyproj CRS objects is going to work better. Check out this gist for an example. I'm going to start working towards passing everything around as pyproj objects. |
@nrweir Got it sounds good. I read the rasterio groups thread, it make sense to standardize on the pyproj class. I'll have time next week to work on wrapping up this PR with some tests. Would you prefer I wait and rebase off of your pyproj CRS updates? In any case, I'll remove this change I made in this PR since I think the missing CRS issue I was having might be fixed by your update.
|
@rbavery that'd be great (rebasing off the new version). I'm going to do a 0.2.2 release today. |
…uts, test for instance masking, handling necessary use of rasterio crs in _check_crs
Codecov Report
@@ Coverage Diff @@
## dev #331 +/- ##
==========================================
- Coverage 67.17% 64.56% -2.62%
==========================================
Files 72 72
Lines 5204 5328 +124
==========================================
- Hits 3496 3440 -56
- Misses 1708 1888 +180
Continue to review full report at Codecov.
|
Hey @nrweir, thanks for the patience, I think this PR is finally ready for review! Instead of converting pyproj.CRs to rasterio.crs.CRS using epsg codes I followed @snowman2's advice and used this code snippet he suggested: import rasterio
if LooseVersion(rasterio.__gdal_version__) >= LooseVersion("3.0.0"):
rio_crs = rasterio.crs.CRS.from_wkt(pyproj_crs.to_wkt())
else:
rio_crs = rasterio.crs.CRS.from_wkt(pyproj_crs.to_wkt("WKT1_GDAL")) When I ran your gist with this if/else, it worked to properly convert where it failed before. |
@rbavery thanks, this looks awesome - I'll review today. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great additional functionality and cleanup of some of the existing problems. A few minor changes requested.
Can you hold off on merging @nrweir ? I found that their is still an issue when using The vector_tiler maintains this CRS info right up until the gdfs are saved, so I think it is an issue with the geojson format. |
Ok, I think this should be good to go, if the way I addressed the reviews makes sense. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One more small change to the assert
statement in sol.tile.vector_tile
and we're good to go! Thanks for all of your hard work.
solaris/tile/vector_tile.py
Outdated
@@ -103,6 +103,7 @@ def tile(self, src, tile_bounds, tile_bounds_crs=None, geom_type='Polygon', | |||
output_ext)) | |||
self.tile_paths.append(dest_path) | |||
if len(tile_gdf) > 0: | |||
assert tile_gdf.crs is not None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for adding this. Can we add an expression for what the error will tell users? e.g.,
if len(tile_gdf) > 0:
assert tile_gdf.crs is not None, "No CRS was found. A CRS must be present to tile the object."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not raise an exception?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was debating that approach too. I guess I see the exception as probably being more transparent, so that might be a better approach @rbavery.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was actually a mistake to leave that in, thanks for catching it! I was using the assert to debug whether a tile_gdf has a crs before it is saved (it did) and forgot to delete that line. I think it's still a good idea to check if src gdf's supplied to the tiler have a crs attribute that is not None, so I've moved this sort of check to before the tile_generator is run.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good to go!
* Add files via upload * implementing new CRS management with pyproj * loosening version restrictions * removing WIP GeoTIFFDataset added by accident * implementing new CRS management with pyproj * Update environment-gpu.yml * loosening version restrictions * removing WIP GeoTIFFDataset added by accident * pinning to dev version of shapely * debugging syntax * debugging requirements and setup.py - diffs btwn conda and pypi * pinning shapely to 1.7.1dev * adding dependency link * finally got shapely installed from gh * why is this still not working * fixing pip install flags to accommodate git install * trying another way to get shapely... * fixing pyyaml reqt * hotfix split_multi_geoms() * hotfix split_multi_geoms() * fixing split_multi_geometries and adding test * debugged split_multi_geometries and wrote new tests * adding dependency_link for shapely in setup.py * fixing version regex to include characters, e.g. "0.2.2dev0" * [WIP] Bug fixes, geojson2coco improvements, restrict_by_aoi, and nodata filling (#331) * added nodata threshold option for raster tiler for cases where large portions of image have nodata * fixed issue where the aoi boundary was not intersected with src_img bounds prior to generating tile bounds list and added warnign about multipoly * reverse sign on nodata threshold check and add a docstring to tile function * impements restrict_to_aoi in rastertiler * fixed naming of raster tiles in case restrict_to_aoi is used * original tests pass for raster tiler * added remove multipolygon option to geojson2coco, they cause it to fail * fill_no_data func for tiler with options for mean, arbitrary value * added remove multipolygon function for geojson2coco * option to override crs in geojson_to_px_gdf to fix for epsg being used for wkt projections when gdfs are saved as geojson and fix for missing crs info in vector tiler * add override options for coco * bug fixes for tiler when fill no data is used, nodata_percentage is calculated correctly now * added comment on not cropping in tiler * return fill values from raster tiler * bug fix for fill_all_nodata and instance mask should return empty arr with same shape as reference_im * added filtering for geometry collections to remove_multipolygon function and updated description for geojson2coco * bug fix for case where fill value is not the mean * new function for nodata aware instance mask conversion with tiler inputs, test for instance masking, handling necessary use of rasterio crs in _check_crs * Fixing torchvision pin in environment.yml * fixed test * update CHANGELOG, environments for shapely dev for np support, removed coco crs override, update tqdm * small updates, tests pass * review updates: tqdm import switch, explode multipoly option, _check_crs change * fixed get_overlapping_subset to use new return_Rasterio flag * added override crs option back in to geojson_to_coco * add chceck for src gdf to have a crs attribute Co-authored-by: Ubuntu <ryan@cropmask-mgpu-18-v2.03lau3lhyktubpzetswi1ohm1h.xx.internal.cloudapp.net> Co-authored-by: Nick Weir <nrweir@users.noreply.github.com> * Pinning GDAL version to resolve compatibility issue with Fiona * Fix stage of val_datagen in Trainer (#345) * Fix stage of val_datagen in Trainer * updating changelog and solaris version Co-authored-by: jshermeyer <jss5102@gmail.com> Co-authored-by: Ryan Avery <ravery@ucsb.edu> Co-authored-by: Ubuntu <ryan@cropmask-mgpu-18-v2.03lau3lhyktubpzetswi1ohm1h.xx.internal.cloudapp.net> Co-authored-by: Daniel Hogan <6313241+dphogan@users.noreply.github.com> Co-authored-by: Kaizaburo Chubachi <kaizaburo_chubachi@shino.ecei.tohoku.ac.jp>
Description
This PR addresses the following issues:
#327 by allowing an option to set areas outside an aoi_boundary to the src image's nodata value when tiling.
#328 by adding a method to the
RasterTiler
class that fills tiles with the mean of the src image. If this is run after restrict_by_aoi, all nodata values outside of the aoi are filled, along with other nodata values in the image.geojson2coco
would fail if the geodataframe input had any MultiPolygon or GeometryCollection types. I added an optional argument to ignore these when converting the geodataframe to COCO formatThere was a bug in the raster_tiler's use of the split_geom function, where the aoi boundary was not intersected with the src_image boundary prior to tiling, causing many nodata areas outside the src_img boundary to be tiled.
EDIT: Here's a list of all changes, from the CHANGELOG:
Changed
fill_all_nodata
to fill nodata with"mean"
or custom value ([FEATURE]: Handling nodata values generally, should we toss, fill, set to 0? #328)geojson2coco
since these cannot be converted to COCO.solaris.vector.mask.geojsons_to_masks_and_fill_nodata
, which rasterized vector labels according to raster tile extents. Fills nodata areas in raster tile and corresponding rasterized label raster.Fixed
tqdm
fromtqdm.auto
) #335)_check_crs
to convertpyproj.CRS
or pass throughrasterio.crs.CRS
class when rasterio crs object is required (for example, reprojecting in the tilers)Please delete options that are not relevant.
Checklist:
If your PR does not fulfill all of the requirements in the checklist above, that's OK! Just prepend [WIP] to the PR title until they are all satisfied. If you need help, @-mention a maintainer and/or add the Status: Help Needed label.