-
Notifications
You must be signed in to change notification settings - Fork 32
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Errors in subset_nhdplus()
#376
Comments
Hi Lindsay -- these are icky catchments that are causing the geometry validation stuff to throw warnings. (https://github.com/DOI-USGS/nhdplusTools/blob/main/R/subset_nhdplus.R#L498) Have you been able figure out why your pipeline is bombing? The first issue is due to a degenerate polygon being converted into a line -- I'll see if I can deal with that issue soon. The second issue is odd -- I'm guessing some geometry validation is trying to clean up a multipolygon into a single polygon ring and failing. Will have to see if I can clean that up. One thing to try is to turn off S2 geometry: Will post back here when I've had a chance to take a look. |
I'm still not completely sure I understand why some things that were warnings when I ran outside the pipeline caused the pipeline build to error. I ended up adding tryCatch({
sf::sf_use_s2(FALSE)
subset_nhdplus(comids = as.integer(comids),
output_file = out_file,
nhdplus_data = "download",
flowline_only = FALSE,
return_data = FALSE,
overwrite = TRUE)
}, error = function(e) {
if(grepl('st_cast for MULTIGEOMETRYCOLLECTION not supported', e$message) |
grepl('nrow(x) == length(value) is not TRUE', e$message, fixed=T) |
grepl('Loop 0 is not valid: Edge 11 crosses edge 13', e$message) |
grepl('arguments have different crs', e$message)) {
warning(sprintf('Caught error but could continue ... %s', e$message))
return(out_file)
} else {
stop(e$message)
}
}, warning = function(w) {
warning(sprintf('Caught error but could continue ... %s', w$message))
}) This has allowed me to progress through downloading 500 sets of ~1235 COMIDs at a time. Only one of my groups is failing, which I have yet to figure out. It works when I break the 1235 COMIDs into three different sets - none of them fail. But I didn't think I was hitting any kind of cap since all other 499 groups of that same number of COMIDs worked. I'm attempting to pinpoint what the root cause is. The error it fails with is |
Seems like network gremlins. If you can isolate it to one example that we con reproduce, I can try to chaise it down. Otherwise, I'd just try again and hope, unfortunately. |
OK, I've dug back into this again. I do not think it is simply network gremlins as I have tried quite a few times. There seems to be a slight difference when I use A reprex with just one of those failing COMIDs showing that it works for flowlines but not catchments using either function: library(nhdplusTools) # I have v1.0.0
example_comid <- 6078795
get_nhdplus(comid = example_comid, realization = "flowline")
subset_nhdplus(comids = example_comid, flowline_only = TRUE, nhdplus_data = "download")
get_nhdplus(comid = example_comid, realization = "catchment")
subset_nhdplus(comids = example_comid, flowline_only = FALSE, nhdplus_data = "download") Console output showing warnings about no catchment info from > example_comid <- 6078795
> get_nhdplus(comid = example_comid, realization = "flowline")
Spherical geometry (s2) switched off
Spherical geometry (s2) switched on
Simple feature collection with 1 feature and 137 fields
Geometry type: LINESTRING
Dimension: XY
Bounding box: xmin: -71.84389 ymin: 42.44597 xmax: -71.84381 ymax: 42.44614
Geodetic CRS: WGS 84
# A tibble: 1 × 138
comid fdate resolution gnis_id gnis_name lengthkm reachcode
<int> <dttm> <chr> <chr> <chr> <dbl> <chr>
1 6078795 2008-07-28 23:00:00 Medium 610437 Babcock Brook 0.02 01070004…
# ℹ 131 more variables: flowdir <chr>, wbareacomi <int>, ftype <chr>,
# fcode <int>, shape_length <dbl>, streamleve <int>, streamorde <int>,
# streamcalc <int>, fromnode <dbl>, tonode <dbl>, hydroseq <dbl>,
# levelpathi <dbl>, pathlength <dbl>, terminalpa <dbl>, arbolatesu <dbl>,
# divergence <int>, startflag <int>, terminalfl <int>, dnlevel <int>,
# uplevelpat <dbl>, uphydroseq <dbl>, dnlevelpat <dbl>, dnminorhyd <int>,
# dndraincou <int>, dnhydroseq <dbl>, frommeas <dbl>, tomeas <dbl>, …
# ℹ Use `colnames()` to see all variable names
> subset_nhdplus(comids = example_comid, flowline_only = TRUE, nhdplus_data = "download")
All intersections performed in latitude/longitude.
Reading NHDFlowline_Network
Spherical geometry (s2) switched off
Spherical geometry (s2) switched on
Writing NHDFlowline_Network
$NHDFlowline_Network
Simple feature collection with 1 feature and 137 fields
Geometry type: LINESTRING
Dimension: XY
Bounding box: xmin: -71.84389 ymin: 42.44597 xmax: -71.84381 ymax: 42.44614
Geodetic CRS: NAD83
# A tibble: 1 × 138
comid fdate resolution gnis_id gnis_name lengthkm reachcode
* <int> <dttm> <chr> <chr> <chr> <dbl> <chr>
1 6078795 2008-07-28 23:00:00 Medium 610437 Babcock Brook 0.02 01070004…
# ℹ 131 more variables: flowdir <chr>, wbareacomi <int>, ftype <chr>,
# fcode <int>, shape_length <dbl>, streamleve <int>, streamorde <int>,
# streamcalc <int>, fromnode <dbl>, tonode <dbl>, hydroseq <dbl>,
# levelpathi <dbl>, pathlength <dbl>, terminalpa <dbl>, arbolatesu <dbl>,
# divergence <int>, startflag <int>, terminalfl <int>, dnlevel <int>,
# uplevelpat <dbl>, uphydroseq <dbl>, dnlevelpat <dbl>, dnminorhyd <int>,
# dndraincou <int>, dnhydroseq <dbl>, frommeas <dbl>, tomeas <dbl>, …
# ℹ Use `colnames()` to see all variable names
> get_nhdplus(comid = example_comid, realization = "catchment")
Spherical geometry (s2) switched off
Spherical geometry (s2) switched on
list()
Warning message:
No catchment features found
> subset_nhdplus(comids = example_comid, flowline_only = FALSE, nhdplus_data = "download")
All intersections performed in latitude/longitude.
Reading NHDFlowline_Network
Spherical geometry (s2) switched off
Spherical geometry (s2) switched on
Writing NHDFlowline_Network
Reading CatchmentSP
Spherical geometry (s2) switched off
Spherical geometry (s2) switched on
$NHDFlowline_Network
Simple feature collection with 1 feature and 137 fields
Geometry type: LINESTRING
Dimension: XY
Bounding box: xmin: -71.84389 ymin: 42.44597 xmax: -71.84381 ymax: 42.44614
Geodetic CRS: NAD83
# A tibble: 1 × 138
comid fdate resolution gnis_id gnis_name lengthkm reachcode
* <int> <dttm> <chr> <chr> <chr> <dbl> <chr>
1 6078795 2008-07-28 23:00:00 Medium 610437 Babcock Brook 0.02 01070004…
# ℹ 131 more variables: flowdir <chr>, wbareacomi <int>, ftype <chr>,
# fcode <int>, shape_length <dbl>, streamleve <int>, streamorde <int>,
# streamcalc <int>, fromnode <dbl>, tonode <dbl>, hydroseq <dbl>,
# levelpathi <dbl>, pathlength <dbl>, terminalpa <dbl>, arbolatesu <dbl>,
# divergence <int>, startflag <int>, terminalfl <int>, dnlevel <int>,
# uplevelpat <dbl>, uphydroseq <dbl>, dnlevelpat <dbl>, dnminorhyd <int>,
# dndraincou <int>, dnhydroseq <dbl>, frommeas <dbl>, tomeas <dbl>, …
# ℹ Use `colnames()` to see all variable names
Warning messages:
1: No catchment features found
2: In value[[3L]](cond) : error getting catchment from nhdplus_data
Error in UseMethod("st_bbox"): no applicable method for 'st_bbox' applied to an object of class "NULL" |
The flowline 6078795 has a drainage area of 0 -- so it is just not going to have a catchment. What behavior would you expect here where you are requesting one flowline that doesn't have a polygon associated with it? Happy to make the change if there's a better way to handle it. |
Ahhh OK, so I didn't realize there wasn't any drainage area associated with that flowline. Everything makes sense now! How is that even possible? I assume that is the case with all the other failing ones. I found these COMIDs by gathering all upstream COMIDs for the reaches where I have ~100 gages in order to include their entire upstream watershed. I just assumed every flowline had a catchment, even if super tiny. I am trying to think about the behavior that would have made this more clear from the function output. I think that |
Unfortunately there are actually a lot of very short flowlines that don't have a catchment. Basically, since the DEM they used was 30m resolution, there are flowlines that receive less than half a grid cell of flow and just don't get any catchment area. I'll leave this in my inbox for follow up and pop open a debugger for these cases. I could try to return a 0 row frame or something, but that can cause other issues. Will see what seems to make more sense. This is an edge case that should be handled more elegantly atleast. |
Excellent! Along these same lines, I was adding the following command to remove any COMIDS whose drainage area returns 0. There is one site that has thrown an actual error as if it doesn't have any catchment characteristics at all.
It has a catchment area of 0.0027 km2, though:
|
One more edge case. This COMID seems to have a drainage area of 0, so no catchment polygon that I can pull BUT it does have catchment attributes, including the NLCD ones. Should it not have these attributes?
|
So I'm chatting with Mike W. about why the second one is returning values -- it doesn't make sense that a connector in a waterbody would be returning those NLCD values. |
OK -- Mike confirmed that there was a fill in option for very short connector flowlines. So this flowline, which just connects from the edge of a waterbody to its centerline just got the value from its upstream neighbor. |
I am running
subset_nhdplus()
for a bunch of COMIDs. I started callingsubset_nhdplus()
separately for each COMID so that I can more easily pinpoint the errors I was seeing. One strange thing to me was that some of these appear to say "warning" but caused my pipeline that was using them to stop. Each COMID still return the data I need, so I have implemented sometryCatch()
calls to get past them. I thought I would still log the issues I was seeing here in case they are obvious or easy fixes on your end.The text was updated successfully, but these errors were encountered: