Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Errors in subset_nhdplus() #376

Open
lindsayplatt opened this issue Jan 27, 2024 · 11 comments
Open

Errors in subset_nhdplus() #376

lindsayplatt opened this issue Jan 27, 2024 · 11 comments

Comments

@lindsayplatt
Copy link

I am running subset_nhdplus() for a bunch of COMIDs. I started calling subset_nhdplus() separately for each COMID so that I can more easily pinpoint the errors I was seeing. One strange thing to me was that some of these appear to say "warning" but caused my pipeline that was using them to stop. Each COMID still return the data I need, so I have implemented some tryCatch() calls to get past them. I thought I would still log the issues I was seeing here in case they are obvious or easy fixes on your end.

library(nhdplusTools)
library(sf)

# A tiny wrapper I wrote around the function
get_catchment <- function(comid) {
  subset_nhdplus(comids = as.integer(comid),
                 output_file = tempfile(fileext = '.gpkg'),
                 nhdplus_data = "download", 
                 flowline_only = FALSE,
                 return_data = TRUE, 
                 overwrite = TRUE)
}

# st_cast for MULTIGEOMETRYCOLLECTION not supported
catchment1 <- get_catchment('1897280')

# nrow(x) == length(value) is not TRUE
catchment2 <- get_catchment('1889676')
catchment3 <- get_catchment('1889664')

# Loop 0 is not valid: Edge 11 crosses edge 13
catchment4 <- get_catchment('21972746')

# All catchment shapes are still returned ...
par(mfrow=c(2,2), mar=c(1,1,1,1))
plot(st_geometry(catchment1$CatchmentSP), col='blue', main=catchment1$NHDFlowline_Network$comid)
plot(st_geometry(catchment2$CatchmentSP), col='purple', main=catchment2$NHDFlowline_Network$comid)
plot(st_geometry(catchment3$CatchmentSP), col='forestgreen', main=catchment3$NHDFlowline_Network$comid)
plot(st_geometry(catchment4$CatchmentSP), col='salmon', main=catchment4$NHDFlowline_Network$comid)

image

@dblodgett-usgs
Copy link
Collaborator

Hi Lindsay -- these are icky catchments that are causing the geometry validation stuff to throw warnings. (https://github.com/DOI-USGS/nhdplusTools/blob/main/R/subset_nhdplus.R#L498) Have you been able figure out why your pipeline is bombing?

The first issue is due to a degenerate polygon being converted into a line -- I'll see if I can deal with that issue soon.

The second issue is odd -- I'm guessing some geometry validation is trying to clean up a multipolygon into a single polygon ring and failing. Will have to see if I can clean that up.

One thing to try is to turn off S2 geometry: sf::sf_use_s2(FALSE)

Will post back here when I've had a chance to take a look.

@lindsayplatt
Copy link
Author

lindsayplatt commented Feb 5, 2024

I'm still not completely sure I understand why some things that were warnings when I ran outside the pipeline caused the pipeline build to error. I ended up adding tryCatch() to my pipeline function for known errors that seemed to still produce a geopackage file output.

  tryCatch({
    sf::sf_use_s2(FALSE)
    subset_nhdplus(comids = as.integer(comids),
                   output_file = out_file,
                   nhdplus_data = "download", 
                   flowline_only = FALSE,
                   return_data = FALSE, 
                   overwrite = TRUE)
  }, error = function(e) {
    if(grepl('st_cast for MULTIGEOMETRYCOLLECTION not supported', e$message) |
       grepl('nrow(x) == length(value) is not TRUE', e$message, fixed=T) |
       grepl('Loop 0 is not valid: Edge 11 crosses edge 13', e$message) |
       grepl('arguments have different crs', e$message)) {
      warning(sprintf('Caught error but could continue ... %s', e$message))
      return(out_file)
    } else {
      stop(e$message)
    }
  }, warning = function(w) {
    warning(sprintf('Caught error but could continue ... %s', w$message))
  })

This has allowed me to progress through downloading 500 sets of ~1235 COMIDs at a time. Only one of my groups is failing, which I have yet to figure out. It works when I break the 1235 COMIDs into three different sets - none of them fail. But I didn't think I was hitting any kind of cap since all other 499 groups of that same number of COMIDs worked. I'm attempting to pinpoint what the root cause is.

The error it fails with is Error in curl::curl_fetch_memory(url, handle = handle): transfer closed with outstanding read data remaining. Unfortunately, I can't recreate this outside of my pipeline, so I am wondering if this is an error that sounds familiar?

@dblodgett-usgs
Copy link
Collaborator

Seems like network gremlins. If you can isolate it to one example that we con reproduce, I can try to chaise it down. Otherwise, I'd just try again and hope, unfortunately.

@lindsayplatt
Copy link
Author

OK, I've dug back into this again. I do not think it is simply network gremlins as I have tried quite a few times. There seems to be a slight difference when I use subset_nhdplus() compared to get_nhdplus() in how it handles empty catchments. I have about 7000k+ COMIDs that appear to be doing something similar where they have a flowline, but don't return a catchment shape. Do you know what might be the reason behind this?

A reprex with just one of those failing COMIDs showing that it works for flowlines but not catchments using either function:

library(nhdplusTools) # I have v1.0.0

example_comid <- 6078795

get_nhdplus(comid = example_comid, realization = "flowline")
subset_nhdplus(comids = example_comid, flowline_only = TRUE, nhdplus_data = "download")

get_nhdplus(comid = example_comid, realization = "catchment")
subset_nhdplus(comids = example_comid, flowline_only = FALSE, nhdplus_data = "download")

Console output showing warnings about no catchment info from get_nhdplus() and errors from subset_nhdplus():

> example_comid <- 6078795
> get_nhdplus(comid = example_comid, realization = "flowline")
Spherical geometry (s2) switched off
Spherical geometry (s2) switched on
Simple feature collection with 1 feature and 137 fields
Geometry type: LINESTRING
Dimension:     XY
Bounding box:  xmin: -71.84389 ymin: 42.44597 xmax: -71.84381 ymax: 42.44614
Geodetic CRS:  WGS 84
# A tibble: 1 × 138
    comid fdate               resolution gnis_id gnis_name     lengthkm reachcode
    <int> <dttm>              <chr>      <chr>   <chr>            <dbl> <chr>    
1 6078795 2008-07-28 23:00:00 Medium     610437  Babcock Brook     0.02 01070004# ℹ 131 more variables: flowdir <chr>, wbareacomi <int>, ftype <chr>,
#   fcode <int>, shape_length <dbl>, streamleve <int>, streamorde <int>,
#   streamcalc <int>, fromnode <dbl>, tonode <dbl>, hydroseq <dbl>,
#   levelpathi <dbl>, pathlength <dbl>, terminalpa <dbl>, arbolatesu <dbl>,
#   divergence <int>, startflag <int>, terminalfl <int>, dnlevel <int>,
#   uplevelpat <dbl>, uphydroseq <dbl>, dnlevelpat <dbl>, dnminorhyd <int>,
#   dndraincou <int>, dnhydroseq <dbl>, frommeas <dbl>, tomeas <dbl>, …
# ℹ Use `colnames()` to see all variable names
> subset_nhdplus(comids = example_comid, flowline_only = TRUE, nhdplus_data = "download")
All intersections performed in latitude/longitude.
Reading NHDFlowline_Network
Spherical geometry (s2) switched off
Spherical geometry (s2) switched on
Writing NHDFlowline_Network
$NHDFlowline_Network
Simple feature collection with 1 feature and 137 fields
Geometry type: LINESTRING
Dimension:     XY
Bounding box:  xmin: -71.84389 ymin: 42.44597 xmax: -71.84381 ymax: 42.44614
Geodetic CRS:  NAD83
# A tibble: 1 × 138
    comid fdate               resolution gnis_id gnis_name     lengthkm reachcode
*   <int> <dttm>              <chr>      <chr>   <chr>            <dbl> <chr>    
1 6078795 2008-07-28 23:00:00 Medium     610437  Babcock Brook     0.02 01070004# ℹ 131 more variables: flowdir <chr>, wbareacomi <int>, ftype <chr>,
#   fcode <int>, shape_length <dbl>, streamleve <int>, streamorde <int>,
#   streamcalc <int>, fromnode <dbl>, tonode <dbl>, hydroseq <dbl>,
#   levelpathi <dbl>, pathlength <dbl>, terminalpa <dbl>, arbolatesu <dbl>,
#   divergence <int>, startflag <int>, terminalfl <int>, dnlevel <int>,
#   uplevelpat <dbl>, uphydroseq <dbl>, dnlevelpat <dbl>, dnminorhyd <int>,
#   dndraincou <int>, dnhydroseq <dbl>, frommeas <dbl>, tomeas <dbl>, …
# ℹ Use `colnames()` to see all variable names

> get_nhdplus(comid = example_comid, realization = "catchment")
Spherical geometry (s2) switched off
Spherical geometry (s2) switched on
list()
Warning message:
No catchment features found 
> subset_nhdplus(comids = example_comid, flowline_only = FALSE, nhdplus_data = "download")
All intersections performed in latitude/longitude.
Reading NHDFlowline_Network
Spherical geometry (s2) switched off
Spherical geometry (s2) switched on
Writing NHDFlowline_Network
Reading CatchmentSP
Spherical geometry (s2) switched off
Spherical geometry (s2) switched on
$NHDFlowline_Network
Simple feature collection with 1 feature and 137 fields
Geometry type: LINESTRING
Dimension:     XY
Bounding box:  xmin: -71.84389 ymin: 42.44597 xmax: -71.84381 ymax: 42.44614
Geodetic CRS:  NAD83
# A tibble: 1 × 138
    comid fdate               resolution gnis_id gnis_name     lengthkm reachcode
*   <int> <dttm>              <chr>      <chr>   <chr>            <dbl> <chr>    
1 6078795 2008-07-28 23:00:00 Medium     610437  Babcock Brook     0.02 01070004# ℹ 131 more variables: flowdir <chr>, wbareacomi <int>, ftype <chr>,
#   fcode <int>, shape_length <dbl>, streamleve <int>, streamorde <int>,
#   streamcalc <int>, fromnode <dbl>, tonode <dbl>, hydroseq <dbl>,
#   levelpathi <dbl>, pathlength <dbl>, terminalpa <dbl>, arbolatesu <dbl>,
#   divergence <int>, startflag <int>, terminalfl <int>, dnlevel <int>,
#   uplevelpat <dbl>, uphydroseq <dbl>, dnlevelpat <dbl>, dnminorhyd <int>,
#   dndraincou <int>, dnhydroseq <dbl>, frommeas <dbl>, tomeas <dbl>, …
# ℹ Use `colnames()` to see all variable names

Warning messages:
1: No catchment features found 
2: In value[[3L]](cond) : error getting catchment from nhdplus_data
 Error in UseMethod("st_bbox"): no applicable method for 'st_bbox' applied to an object of class "NULL"

@dblodgett-usgs
Copy link
Collaborator

The flowline 6078795 has a drainage area of 0 -- so it is just not going to have a catchment.

What behavior would you expect here where you are requesting one flowline that doesn't have a polygon associated with it? Happy to make the change if there's a better way to handle it.

@lindsayplatt
Copy link
Author

Ahhh OK, so I didn't realize there wasn't any drainage area associated with that flowline. Everything makes sense now! How is that even possible?

I assume that is the case with all the other failing ones. I found these COMIDs by gathering all upstream COMIDs for the reaches where I have ~100 gages in order to include their entire upstream watershed. I just assumed every flowline had a catchment, even if super tiny.

I am trying to think about the behavior that would have made this more clear from the function output. I think that subset_nhdplus() erroring is certainly surprising. I would hope it could handle the NULL output and print a warning instead. Also, when I run get_nhdarea(id = 6078795) it returns NULL and maybe 0 would be more the behavior that would help? I will code in a step in my workflow that removes any of these upstream COMIDs with no drainage area.

@dblodgett-usgs
Copy link
Collaborator

Unfortunately there are actually a lot of very short flowlines that don't have a catchment. Basically, since the DEM they used was 30m resolution, there are flowlines that receive less than half a grid cell of flow and just don't get any catchment area.

I'll leave this in my inbox for follow up and pop open a debugger for these cases. I could try to return a 0 row frame or something, but that can cause other issues. Will see what seems to make more sense. This is an edge case that should be handled more elegantly atleast.

@lindsayplatt
Copy link
Author

Excellent! Along these same lines, I was adding the following command to remove any COMIDS whose drainage area returns 0. There is one site that has thrown an actual error as if it doesn't have any catchment characteristics at all.

get_catchment_characteristics(varname = 'CAT_BASIN_AREA', ids = 12168057) 
NULL
Warning message:
In get_catchment_characteristics(varname = "CAT_BASIN_AREA", ids = 12168057) :
  Issue getting characteristics. Error was: 
replacement has 1 row, data has 0Issue getting characteristics. Error was: 
`$<-.data.frame`(x, name, value)

It has a catchment area of 0.0027 km2, though:

get_nhdplus(comid = 12168057, realization = "flowline") %>% select(areasqkm)
Spherical geometry (s2) switched off
Spherical geometry (s2) switched on
Simple feature collection with 1 feature and 1 field
Geometry type: LINESTRING
Dimension:     XY
Bounding box:  xmin: -87.89588 ymin: 43.0247 xmax: -87.89581 ymax: 43.02528
Geodetic CRS:  WGS 84
# A tibble: 1 × 2
  areasqkm                                geometry
     <dbl>                        <LINESTRING [°]>
1   0.0027 (-87.89581 43.0247, -87.89588 43.02528)

@lindsayplatt
Copy link
Author

One more edge case. This COMID seems to have a drainage area of 0, so no catchment polygon that I can pull BUT it does have catchment attributes, including the NLCD ones. Should it not have these attributes?

library(nhdplusTools)
example_comid <- 4672393

# Returns a flowline and `areasqkm` says 0
get_nhdplus(comid = example_comid, realization = "flowline")

# Nothing returned for catchment
get_nhdplus(comid = example_comid, realization = "catchment")

# BUT confusingly, I can pull attributes that are dependent on having a nonzero drainage basin 
# such as these NLCD land use classes for forest types deciduous (41), evergreen (42), and mixed (43)
nhdplusTools::nhdplusTools_data_dir(tools::R_user_dir("nhdplusTools"))
get_catchment_characteristics(varname = sprintf('CAT_NLCD19_%s', 41:43), ids = example_comid)

   characteristic_id   comid characteristic_value percent_nodata
1:     CAT_NLCD19_41 4672393                45.08              0
2:     CAT_NLCD19_42 4672393                 0.19              0
3:     CAT_NLCD19_43 4672393                 2.01              0

@dblodgett-usgs
Copy link
Collaborator

So 12168057 is a coastline. Coastal flowlines do have catchments, but they must not be part of the accumulated variables.

I'm chatting with Mike W. about why the second one is returning values -- it doesn't make sense that a connector in a waterbody would be returning those NLCD values.

@dblodgett-usgs
Copy link
Collaborator

OK -- Mike confirmed that there was a fill in option for very short connector flowlines. So this flowline, which just connects from the edge of a waterbody to its centerline just got the value from its upstream neighbor.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants