Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

crs structure is going to change #49

Closed
edzer opened this issue Feb 4, 2020 · 16 comments
Closed

crs structure is going to change #49

edzer opened this issue Feb 4, 2020 · 16 comments

Comments

@edzer
Copy link

edzer commented Feb 4, 2020

This is just a heads up: since sfheaders does not depend on sf, it is not checked in my rev dep checks. Following drastic PROJ and GDAL changes, upcoming sf (branch SetFromUserInput in github) will have a different structure for crs objects; see r-spatial/sf#1244 and r-spatial/sf#1225 . It looks like sfheaders hard-codes them to be as they used to be.

@dcooley
Copy link
Owner

dcooley commented Feb 5, 2020

Thanks @edzer - I'll make the changes ready for the next release.

dcooley added a commit that referenced this issue Feb 6, 2020
dcooley added a commit that referenced this issue Feb 6, 2020
@dcooley
Copy link
Owner

dcooley commented Feb 6, 2020

From r-spatial/sf#1225

In short, in this branch crs objects have two character fields:

input: the string with which the object was initialised by the user, or the string that characterizes the CRS briefly when an object was read through GDAL; it is meant to be the human-readable version of the CRS (as opposed to WKT).
wkt: the wkt (or wkt2 in GDAL3/PROJ6) description of the CRS, which is meant to be communicate the entire CRS to and from GDAL

I have changed the crs object to be two Rcpp::Strings

Rcpp::String crs_input,
Rcpp::String crs_wkt,

// attribute::crs
Rcpp::List crs = Rcpp::List::create(
  Rcpp::Named("input") = crs_input,
  Rcpp::Named("wkt") = crs_wkt
);
df <- data.frame(x = 1, y = 2)
sf <- sfheaders::sf_point(df)
a <- attributes( sf$geometry )
expect_true( is.character( a$crs$input ) )
expect_true( is.character( a$crs$wkt ) )

Will test when new sf is released.

@edzer
Copy link
Author

edzer commented Feb 6, 2020

Since this package does not depend in any way on sf, (how) are you going to make sure that it aligns with the right version?

@Robinlovelace
Copy link

Robinlovelace commented Feb 8, 2020

The tests should handle this ideally, I would imagine. FYI I have a minimal reprex that shows how the current version of sfheaders works but produces non-identical results. One could add basic tests to check if objects returned by both pkgs are identical (I think Suggested pkgs can be used in tests):

library(sfheaders)
x <- matrix( c(1:4), ncol = 2 )
xsfc <- sfc_linestring( x )
sf::st_crs(xsfc)
#> Coordinate Reference System: NA
xsfc <- sfc_linestring( x , crs = 4326)
#> Error in sfc_linestring(x, crs = 4326): unused argument (crs = 4326)

# with sf
xsfc2 <- sf::st_sfc(sf::st_linestring(x))
identical(xsfc, xsfc2)
#> [1] FALSE

# crss
attributes(xsfc)
#> $n_empty
#> [1] 0
#> 
#> $crs
#> Coordinate Reference System: NA
#> 
#> $class
#> [1] "sfc_LINESTRING" "sfc"           
#> 
#> $precision
#> [1] 0
#> 
#> $bbox
#> xmin ymin xmax ymax 
#>    1    3    2    4 
#> 
#> $z_range
#> zmin zmax 
#>   NA   NA 
#> 
#> $m_range
#> mmin mmax 
#>   NA   NA
attr(xsfc, which = "crs") = 4326
sf::st_crs(xsfc)
#> [1] 4326
xsfc2 <- sf::st_sfc(sf::st_linestring(x), crs = 4326)
sf::st_crs(xsfc2)
#> Coordinate Reference System:
#>   EPSG: 4326 
#>   proj4string: "+proj=longlat +datum=WGS84 +no_defs"

identical(xsfc, xsfc2)
#> [1] FALSE

Created on 2020-02-08 by the reprex package (v0.3.0)

Great to see efforts to align sf and sfheaders, things could get very confusing if they diverged! As long as the current version of the package more-or-less tracks the current version of sf it should be OK I would imagine. Thanks for updating sf to support upstream changes Edzer, will keep R up-to-date or maybe even give it a head-start compared with other languages and thanks for sfheaders David!

@edzer
Copy link
Author

edzer commented Feb 26, 2020

This aligning could happen but has to happened here.

Currently, sf is not in sfheaders's Suggests: field. This means that sfheaders cannot have checks where its output is compared to that of sf, and also that when sf changes, when I check changes on reverse dependencies using tools::check_packages_in_dir such tests are not run. I ran into this case by accident.

AFAICT, the goal of sfheaders is to not have sf in its Depends: or Imports: fields, but not to remain under the radar when sf changes something because of upstream (OSGEO) changes.

@Robinlovelace
Copy link

May be 'none of my business' but, having just read the comment above, I'd advocate adding sf as a Suggests.

@SymbolixAU
Copy link
Contributor

Since this package does not depend in any way on sf, (how) are you going to make sure that it aligns with the right version?

@edzer I don't have any specific mechanisms in place other than following your output and keeping on top of it.

the goal of sfheaders is to not have sf in its Depends: or Imports: fields, but not to remain under the radar when sf changes something because of upstream (OSGEO) changes.

yes exactly right.

Perhaps the solution is to 'suggest' sf and include tests against object structures (thanks for the suggestion @Robinlovelace ).

SymbolixAU pushed a commit that referenced this issue Mar 24, 2020
dcooley added a commit that referenced this issue Mar 28, 2020
@mdsumner
Copy link
Contributor

These structures should stay as they were. Just add slots, that's how you maintain compatibility, and friends. sp built enormous good will, despite limitations

@mpadge
Copy link

mpadge commented Apr 12, 2020

In response to @edzer's comments

Since this package does not depend in any way on sf, (how) are you going to make sure that it aligns with the right version?

and

AFAICT, the goal of sfheaders is ... not to remain under the radar when sf changes something because of upstream (OSGEO) changes.

Open source communities fragment, diverge, and function best in the absence of attempts to enforce centralised control. The sf package is not the gatekeeper for the open standard OGC SF format, regardless of the central role it plays in the R spatial world. Alternative implementations should be welcomed, not spurned, and it is up to the developers themselves to ensure ongoing compliance with external factors (OSGEO, whatever). Again, i do not view it as necessary or helpful to have one package act as gatekeeper. An insightful comparison:

ap <- available.packages ()
ap <- ap [grep ("^gg", ap [, 1]), ]
dep <- ap [, grep ("Depends", colnames (ap))]
sug <- ap [, grep ("Suggests", colnames (ap))]

ggdep <- grepl ("ggplot", dep)
ggsug <- grepl ("ggplot", sug)

message ("Depends: ", length (which (ggdep)), " packages, or ",
         signif (100 * length (which (ggdep)) / nrow (ap), 3), "%")
#> Depends: 46 packages, or 38.7%
message ("Suggests: ", length (which (ggsug)), " packages, or ",
         signif (100 * length (which (ggsug)) / nrow (ap), 3), "%")
#> Suggests: 5 packages, or 4.2%

n_neither <- nrow (ap) - length (which (ggdep)) - length (which (ggsug))
message ("Neither Depends nor Suggests = ",
         n_neither, ", or ", signif (100 * n_neither / nrow (ap), 3), "%")
#> Neither Depends nor Suggests = 68, or 57.1%

Created on 2020-04-12 by the reprex package (v0.3.0)

The majority of gg_* packages neither Depend nor Suggest the "original" package itself, and each of them must then be individually responsible for ensuring compliance (or not, as the case may be).


Edit: I forgot Imports there, which is 62 of those final 68%, but that still leaves > 5% neither Depending, Importing, or Suggesting.

@edzer
Copy link
Author

edzer commented Apr 12, 2020

Mark, fair enough, but this is not about gate keeping representations, this is about user expectations. I'm entirely OK if any package goes its own way representing simple features (or whatever spatial data structures) the way they want, it is just that if users want to plot stuff with ggplot2::geom_sf, or write stuff with sf::st_write, that the data should be in a format sf expects. Creating objects of class sf creates this expectation. Users will expect sf objects to work with package sf. If not, why call it sf, why not give it a different name and provide an st_as_sf() method? Package spatstat (another one from down under) has never used sp or sf, and probably never will; I worked hard with @rubak to get conversion methods in place, both ways.

Think of it like this: suppose your package creates its own grouped tibbles and give them class c("group_df", "tbl_df", "tbl", "data.frame"), and lets the grouping variable start at value 0 because package tibble did this two years ago. Now tibble moved on having it start at 1 (or vice versa, whatever), but you don't follow this. Would you keep on claiming that your package does users a service by creating objects of that class? And how do you envision users would find out where the problems originate when dplyr doesn't work anymore for them?

Now you and @mdsumner seem to have strong opinions about this that apparently differ from mine, but I would be interested to hear what @dcooley thinks about what users of sfheaders should expect from sf objects created by sfheaders. I came here to help, not keep gates.

@edzer
Copy link
Author

edzer commented Apr 12, 2020

An example of a diverging simple feature implementation done non-disruptively by the great @paleolimbot : https://twitter.com/paleolimbot/status/1249142366776315904

@paleolimbot
Copy link

@edzer - I might frame that :)

@dcooley - you could also set up a GH action that checks development sf in addition to CRAN sf...given the goal of this package, it seems like a reasonable step to ensure compatibility.

@dcooley
Copy link
Owner

dcooley commented Apr 13, 2020

Thanks for the input everyone. I think for the R side of sfheaders I should follow the output of sf, as that's what I designed it to do. So if sf only allows character slots for a particular attribute, I need to follow suit. However, if and where possible I do play a bit looser with some of these restrictions. So if I can still keep the slots for the old structure, without it breaking anything in the latest sf version I will try and accommodate it.


Going further, 90% of the code in this sfheaders library is all about reshaping R structures into other R structures, through Rcpp. And I've abstracted this away from the specific sf-creation code. So I see no reason why this general code can't again be split into a more general package whose only job is to reshape data.frames and matrices into required geometrical representations.

Then anyone can use this general reshaping code, put their own R-class on top of it and use it in their own library / structures.

If @mdsumner @edzer @Robinlovelace @paleolimbot @mpadge (or other interested party reading this thread) have any thoughts on this I'd bee keen to hear it.


You could also set up a GH action that checks development sf in addition to CRAN sf...given the goal of this package, it seems like a reasonable step to ensure compatibility.

@paleolimbot - do you know how to set this up?

@paleolimbot
Copy link

I think you can stick

remotes::install_github("r-spatial/sf")

Before or after this (I forget which is more likely to work): https://github.com/dcooley/sfheaders/blob/master/.github/workflows/R-CMD-check.yaml#L68 ...in a copy of your R-CMD-check.yaml file.

If you're going return an object of class sf or sfc_*, you should probably expect_identical() your output with that of sf...they're sf's objects.

If there is an in-memory structure that's faster to create (from a data frame or otherwise), you could give it another class ("sfheaders"?) and implement st_as_sfc()/st_as_sf() using sf's constructors. You can dynamically register those methods to avoid requiring sf in Imports (see here: https://github.com/paleolimbot/geovctrs/blob/master/R/zzz.R#L21 ).

@mdsumner
Copy link
Contributor

This as a general tool used by narrower scope packages is exactly how things should work in my opinion. This is a core piece that sf should import ... It's redundant and insulting to need to say so, and no one should feel that pressure. Abstraction from format is an obvious requirement. Not everything is a simple feature and those are only sometimes useful, especially when only one mode is available. What else is there to say ...

Thanks @dcooley amazing bit of kit!

The wkt thing is a problem, no one is ignoring it but this is software we can have flexibility not just one size fits all if we want to. Diverse approaches that want to generate a commonly used format at lightning speed from a wide range of inputs is a massive win win for everyone in my view. It's incredible to me to enforce a cm precision crs string at all times, with the transformation library always present - but then not use them for map plotting and binary ops, but I'm happy for diverse approaches as long as they don't limit what others can do. (Sorry, totally redundant and obvious again ...)

@dcooley
Copy link
Owner

dcooley commented May 13, 2020

Closing this issue because I've made the changes, and need to get a patch on CRAN for a bug I found. But have moved the "test against sf objects" to a new issue because it's still worthwhile, but I don't have the time to do it before release.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants