
Implement GDAL driver detection via {vapour} #16

Merged: 2 commits merged into main on Mar 2, 2024


@brownag (Owner) commented Mar 2, 2024

This PR improves the handling of file path sources so that the necessary GDAL drivers are detected automatically. In most cases it can distinguish between raster and vector sources.

For consistency with prior behavior, the GDAL CSV driver is, for now, ignored as a possible vector source (CSV files are used for non-spatial attributes only), and the GPKG driver is used only as a vector source.

  • It appears that {terra} cannot properly read and write vector data from CSV anyway.
  • It is questionable whether GPKG (or any other multilayer source) should be allowed in the current scheme. If there is more than one spatial layer in the source, the current 1:1 mapping of source to new table becomes ambiguous unless one can either select a specific layer or transfer all layers.

Several other drivers, which previously did not work because their extensions were not in the hard-coded list, can serve as both vector and raster sources:

```r
drv <- vapour::vapour_all_drivers()
drv$driver[drv$vector & drv$raster]
#> [1] "FITS"        "PCIDSK"      "netCDF"      "PDS4"        "VICAR"       "JP2OpenJPEG"
#> [7] "PDF"         "MBTiles"     "BAG"         "OGCAPI"      "GPKG"        "OpenFileGDB"
#> [13] "CAD"         "PLSCENES"    "NGW"         "HTTP"
```

Some of the above may need specific handling, and the resulting set of decisions should be documented in the Details section of the documentation. Users can always read the source into R with the correct format before passing it to gpkg_write(); these changes only affect driver detection for file path sources.
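A minimal base R sketch of the classification rules described in this PR, written so it runs without GDAL. The function `classify_source()` and its `"attributes"` and `"ambiguous"` labels are hypothetical, not part of the package. The driver table is a small stand-in with the same `driver`, `vector`, and `raster` fields used from `vapour::vapour_all_drivers()` above. Drivers that support both vector and raster (other than GPKG) are flagged rather than guessed, since their handling is still open here.

```r
# Hypothetical helper: decide how a source should be used, given its GDAL
# driver short name and a driver capability table with logical `vector` and
# `raster` fields (the shape used from vapour::vapour_all_drivers()).
classify_source <- function(driver, drivers) {
  # Special cases described in this PR
  if (identical(driver, "CSV")) return("attributes")  # non-spatial attributes only
  if (identical(driver, "GPKG")) return("vector")     # GeoPackage read as vector only

  i <- match(driver, drivers$driver)
  if (is.na(i)) stop("unknown GDAL driver: ", driver)

  is_vec <- isTRUE(drivers$vector[i])
  is_ras <- isTRUE(drivers$raster[i])
  if (is_vec && is_ras) return("ambiguous")  # e.g. netCDF, PDF, MBTiles
  if (is_vec) return("vector")
  if (is_ras) return("raster")
  stop("driver supports neither vector nor raster: ", driver)
}

# Small stand-in for the real driver table so the example runs without GDAL
drivers <- data.frame(
  driver = c("GTiff", "ESRI Shapefile", "CSV", "GPKG", "netCDF"),
  vector = c(FALSE, TRUE, TRUE, TRUE, TRUE),
  raster = c(TRUE, FALSE, FALSE, TRUE, TRUE)
)

# One label per driver, named by driver
vapply(drivers$driver, classify_source, character(1), drivers = drivers)
```

In practice the capability table would come from `vapour::vapour_all_drivers()`, and the driver name from whatever detection step is applied to the file path; that wiring is an assumption of this sketch.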

@brownag (Owner, Author) commented Mar 2, 2024

Example of {terra} behavior when reading and writing CSV vector sources. I think this could potentially be "fixed" in {terra}, but CSV is really not a good storage medium for vector data, so I do not think I should suggest it. It is probably best to treat CSV files as "attributes only" for the purposes of gpkg_write().

  • There is currently no way to specify the geometry column in the terra::vect(<character>) method, so the source is read as attributes only, with 0 geometries.
  • The geometry column can only be specified via geom after the CSV has been read into a data.frame, using the terra::vect(<data.frame>) method.
  • Writing a SpatVector to CSV with writeVector() fails because the file type cannot be guessed from the filename.

```r
library(terra)
#> terra 1.7.73

x <- vect(system.file("ex", "lux.shp", package="terra"))

write.csv(as.data.frame(x), "test.csv", row.names = FALSE)
a <- vect("test.csv")
a
#>  class       : SpatVector
#>  geometry    : none
#>  dimensions  : 0, 6  (geometries, attributes)
#>  extent      : 0, 0, 0, 0  (xmin, xmax, ymin, ymax)
#>  source      : test.csv
#>  coord. ref. :
#>  names       :  ID_1 NAME_1  ID_2 NAME_2  AREA   POP
#>  type        : <chr>  <chr> <chr>  <chr> <chr> <chr>

write.csv(as.data.frame(x, geom = "WKT"), "test.csv", row.names = FALSE)
b <- vect("test.csv")
b
#>  class       : SpatVector
#>  geometry    : none
#>  dimensions  : 0, 7  (geometries, attributes)
#>  extent      : 0, 0, 0, 0  (xmin, xmax, ymin, ymax)
#>  source      : test.csv
#>  coord. ref. :
#>  names       :  ID_1 NAME_1  ID_2 NAME_2  AREA   POP geometry
#>  type        : <chr>  <chr> <chr>  <chr> <chr> <chr>    <chr>

d <- vect(read.csv("test.csv"), geom = "geometry")
d
#>  class       : SpatVector
#>  geometry    : polygons
#>  dimensions  : 12, 6  (geometries, attributes)
#>  extent      : 5.74414, 6.528252, 49.44781, 50.18162  (xmin, xmax, ymin, ymax)
#>  coord. ref. :
#>  names       :  ID_1   NAME_1  ID_2   NAME_2  AREA   POP
#>  type        : <int>    <chr> <int>    <chr> <int> <int>
#>  values      :     1 Diekirch     1 Clervaux   312 18081
#>                    1 Diekirch     2 Diekirch   218 32543
#>                    1 Diekirch     3  Redange   259 18664

writeVector(x, "test2.csv")
#> Error: [writeVector] cannot guess filetype from filename
```

@brownag (Owner, Author) commented Mar 2, 2024

I added documentation about CSV and GPKG sources, and some general guidance for handling multilayer sources (i.e. read the specific layers you want into R before attempting to write them to a new GeoPackage). I am not inclined to add any additional handling for sources other than these.
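One way to enforce the single-layer guidance above is sketched below under stated assumptions. `resolve_layer()` is a hypothetical helper, not part of the package, and `layers` is assumed to be a character vector of layer names such as `vapour::vapour_layer_names()` would return for a file path. It refuses to guess when a source has more than one layer, which keeps the 1:1 mapping of source to new table unambiguous.

```r
# Hypothetical helper: choose exactly one layer from a (possibly multilayer)
# source, erroring instead of guessing when the choice is ambiguous.
resolve_layer <- function(layers, layer = NULL) {
  if (length(layers) == 0) stop("source has no layers")

  # An explicitly requested layer must exist in the source
  if (!is.null(layer)) {
    if (!layer %in% layers) stop("layer not found: ", layer)
    return(layer)
  }

  # More than one layer and no selection: the 1:1 mapping is ambiguous
  if (length(layers) > 1) {
    stop("source has ", length(layers), " layers (",
         paste(layers, collapse = ", "),
         "); read the layer you want into R first, or specify `layer`")
  }
  layers
}

resolve_layer("lux")                             # single layer, used as-is
resolve_layer(c("roads", "parcels"), "parcels")  # explicit selection
try(resolve_layer(c("roads", "parcels")))        # ambiguous: signals an error
```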

In the future I may decide either to disallow GPKG in input file path sources, or to provide a custom method for GPKG only that allows transfer of multiple tables, using the list item name or file basename as a prefix for the new table name(s); that will be done in a separate PR.
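The table naming for a possible future GPKG transfer could look like the following sketch. `prefixed_table_names()` and the `_` separator are hypothetical choices, not a committed design: the prefix is the list item name when one is given, otherwise the file basename without its extension.

```r
# Hypothetical helper: build new table names as "<prefix>_<layer>", where the
# prefix is the list item name if supplied, else the file basename sans extension.
prefixed_table_names <- function(path, layers, item_name = NULL) {
  prefix <- if (!is.null(item_name) && nzchar(item_name)) {
    item_name
  } else {
    tools::file_path_sans_ext(basename(path))
  }
  paste0(prefix, "_", layers)
}

prefixed_table_names("data/source.gpkg", c("roads", "parcels"))
# returns c("source_roads", "source_parcels")
prefixed_table_names("data/source.gpkg", c("roads", "parcels"), item_name = "city")
# returns c("city_roads", "city_parcels")
```

Using the list item name first lets a caller disambiguate two sources that share a basename, while the basename fallback keeps unnamed inputs working.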

@brownag brownag marked this pull request as ready for review March 2, 2024 17:46
@brownag brownag linked an issue Mar 2, 2024 that may be closed by this pull request
@brownag brownag merged commit 338febb into main Mar 2, 2024
8 checks passed
Development

Successfully merging this pull request may close these issues.

.gpkg_process_sources() should be more robust to various OGR sources
1 participant