Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proj_create_crs_to_crs(): need for area of use ? #559

Closed
rouault opened this issue Aug 23, 2017 · 12 comments
Closed

proj_create_crs_to_crs(): need for area of use ? #559

rouault opened this issue Aug 23, 2017 · 12 comments

Comments

@rouault
Copy link
Member

rouault commented Aug 23, 2017

Jumping from QGIS discussion qgis/QGIS-Enhancement-Proposals#100 to here, I'm wondering if proj_create_crs_to_crs() shouldn't have a parameter to express the area of use, so that we can select a better transform from the EPSG database when we will consult it. Or perhaps this will be a extented proj_create_crs_to_crs_ex() version that will do that ?

@kbevers
Copy link
Member

kbevers commented Aug 23, 2017

I can see the value in such functionality but I am not sure how feasible it is to implement in practice. For instance, transformations from ITRF20xx to ETRS89 are defined differently in Scandinavia than in, say, Germany. As far as I am aware this sort of level of detail is not available in the EPSG-dataset. Of course it might be in the future.

This example is maybe a bit esoteric - are there better and more plausible scenarios where an area of interest is needed for the correct transformation?

How would you change the function prototype in practice?

PJ *proj_create_crs_to_crs (PJ_CONTEXT *ctx, const char *srid_from, const char *srid_to, double lat_min, double_latmax, double lon_min, double lon_max)

perhaps? Of course there's the issue of what the bounding box coordinates refer to, but I guess it is okay as long as they are in roughly the desired area.

@rouault
Copy link
Member Author

rouault commented Aug 23, 2017

As far as I am aware this sort of level of detail is not available in the EPSG-dataset

It is. The EPSG database proposes different transformations for the same (source_crs, target_crs) tuple, based on the area of use. Here for Pulkovo 42 to WGS84, you can see different transformation per country (typically different towgs84 parameters, see https://www.epsg-registry.org/export.htm?wkt=urn:ogc:def:coordinateOperation:EPSG::1645 and https://www.epsg-registry.org/export.htm?wkt=urn:ogc:def:coordinateOperation:EPSG::15496 )

$ ogrinfo pg:dbname=epsg -sql "select coord_op_code, coord_op_name, area_of_use_code, a.area_name, a.area_south_bound_lat, a.area_north_bound_lat, a.area_west_bound_lon, a.area_east_bound_lon from epsg_coordoperation c join epsg_area a on c.area_of_use_code = a.area_code where c.source_crs_code = 4179 and c.target_crs_code = 4326" -q

Layer name: sql_statement
OGRFeature(sql_statement):0
  coord_op_code (Integer) = 1645
  coord_op_name (String) = Pulkovo 1942(58) to WGS 84 (1)
  area_of_use_code (Integer) = 3293
  area_name (String) = Poland - onshore
  area_south_bound_lat (Real) = 49
  area_north_bound_lat (Real) = 54.89
  area_west_bound_lon (Real) = 14.14
  area_east_bound_lon (Real) = 24.15

OGRFeature(sql_statement):1
  coord_op_code (Integer) = 15496
  coord_op_name (String) = Pulkovo 1942(58) to WGS 84 (18)
  area_of_use_code (Integer) = 1197
  area_name (String) = Romania
  area_south_bound_lat (Real) = 43.44
  area_north_bound_lat (Real) = 48.27
  area_west_bound_lon (Real) = 20.26
  area_east_bound_lon (Real) = 31.41

OGRFeature(sql_statement):2
  coord_op_code (Integer) = 15497
  coord_op_name (String) = Pulkovo 1942(58) to WGS 84 (9)
  area_of_use_code (Integer) = 1197
  area_name (String) = Romania
  area_south_bound_lat (Real) = 43.44
  area_north_bound_lat (Real) = 48.27
  area_west_bound_lon (Real) = 20.26
  area_east_bound_lon (Real) = 31.41

OGRFeature(sql_statement):3
  coord_op_code (Integer) = 15995
  coord_op_name (String) = Pulkovo 1942(58) to WGS 84 (19)
  area_of_use_code (Integer) = 1197
  area_name (String) = Romania
  area_south_bound_lat (Real) = 43.44
  area_north_bound_lat (Real) = 48.27
  area_west_bound_lon (Real) = 20.26
  area_east_bound_lon (Real) = 31.41

OGRFeature(sql_statement):4
  coord_op_code (Integer) = 15997
  coord_op_name (String) = Pulkovo 1942(58) to WGS 84 (4)
  area_of_use_code (Integer) = 3293
  area_name (String) = Poland - onshore
  area_south_bound_lat (Real) = 49
  area_north_bound_lat (Real) = 54.89
  area_west_bound_lon (Real) = 14.14
  area_east_bound_lon (Real) = 24.15

OGRFeature(sql_statement):5
  coord_op_code (Integer) = 15999
  coord_op_name (String) = Pulkovo 1942(58) to WGS 84 (8)
  area_of_use_code (Integer) = 3212
  area_name (String) = Albania - onshore
  area_south_bound_lat (Real) = 39.64
  area_north_bound_lat (Real) = 42.67
  area_west_bound_lon (Real) = 19.22
  area_east_bound_lon (Real) = 21.06

PJ *proj_create_crs_to_crs (PJ_CONTEXT *ctx, const char *srid_from, const char *srid_to, double lat_min, double_latmax, double lon_min, double lon_max)

Would make sense, but we would need to make the BBOX optional, so perhaps an _ex() version would do for that purpose. Anyway this is probably premature as we don't have the needed infrastructure to make use of it yet

@rouault
Copy link
Member Author

rouault commented Aug 23, 2017

Interstingly in the above example there are 3 transformations for the Romania area of use :

$ ogrinfo  pg:dbname=epsg -sql "select * from epsg_coordoperation where coord_op_code in (15496, 15497, 15995)" -al -q

Layer name: sql_statement
OGRFeature(sql_statement):0
  coord_op_code (Integer) = 15496
  coord_op_name (String) = Pulkovo 1942(58) to WGS 84 (18)
  coord_op_type (String) = transformation
  source_crs_code (Integer) = 4179
  target_crs_code (Integer) = 4326
  coord_tfm_version (String) = Shell-Rom
  coord_op_variant (Integer) = 18
  area_of_use_code (Integer) = 1197
  coord_op_scope (String) = Oil exploration
  coord_op_accuracy (Real) = 10
  coord_op_method_code (Integer) = 9603
  uom_code_source_coord_diff (Integer) = (null)
  uom_code_target_coord_diff (Integer) = (null)
  remarks (String) = 
  information_source (String) = Shell SIEP
  data_source (String) = OGP
  revision_date (Date) = 2003/11/28
  change_id (String) = 
  show_operation (Integer) = 1
  deprecated (Integer) = 0

OGRFeature(sql_statement):1
  coord_op_code (Integer) = 15497
  coord_op_name (String) = Pulkovo 1942(58) to WGS 84 (9)
  coord_op_type (String) = transformation
  source_crs_code (Integer) = 4179
  target_crs_code (Integer) = 4326
  coord_tfm_version (String) = NIMA-Rom
  coord_op_variant (Integer) = 9
  area_of_use_code (Integer) = 1197
  coord_op_scope (String) = For military purposes. Accuracy 3m, 5m and 3m in X, Y and Z axes.
  coord_op_accuracy (Real) = 7
  coord_op_method_code (Integer) = 9603
  uom_code_source_coord_diff (Integer) = (null)
  uom_code_target_coord_diff (Integer) = (null)
  remarks (String) = Derived at 4 stations.
  information_source (String) = U.S. National Imagery and Mapping Agency TR8350.2 revision of October 1997; http://earth-info.nga.mil/GandG/tr8350/tr8350_2.html
  data_source (String) = OGP
  revision_date (Date) = 2014/11/19
  change_id (String) = 2014.058
  show_operation (Integer) = 1
  deprecated (Integer) = 0

OGRFeature(sql_statement):2
  coord_op_code (Integer) = 15995
  coord_op_name (String) = Pulkovo 1942(58) to WGS 84 (19)
  coord_op_type (String) = transformation
  source_crs_code (Integer) = 4179
  target_crs_code (Integer) = 4326
  coord_tfm_version (String) = OGP-Rom
  coord_op_variant (Integer) = 19
  area_of_use_code (Integer) = 1197
  coord_op_scope (String) = Accuracy of 1.5 to 3 metres horizontal, 3 to 5m vertical.
  coord_op_accuracy (Real) = 3
  coord_op_method_code (Integer) = 9607
  uom_code_source_coord_diff (Integer) = (null)
  uom_code_target_coord_diff (Integer) = (null)
  remarks (String) = Parameter values taken from Pulkovo 1942(58) to ETRS89 (4) (code 15994) assuming that ETRS89 is equivalent to WGS 84 within the accuracy of the transformation.
  information_source (String) = OGP
  data_source (String) = OGP
  revision_date (Date) = 2008/09/24
  change_id (String) = 
  show_operation (Integer) = 1
  deprecated (Integer) = 0

EPSG:15995 is the one with the best advertized accuracy (coord_op_accuracy (Real) = 3). The current https://trac.osgeo.org/geotiff/browser/trunk/libgeotiff/csv/build_pcs.py script that parses the EPSG database to build the GDAL .csv files and ultimately proj epsg file takes into account the 'deprecated' column too (it currently ignores the coord_op_accuracy, since its logic is to select a transform with the largest area of use (as it must pick one and only one transform for a source,dest tuple), to minimize the average error...)

@kbevers
Copy link
Member

kbevers commented Aug 23, 2017

it is.

Okay, then I think we should definitely add a bounding box option.

Would make sense, but we would need to make the BBOX optional, so perhaps an _ex() version would do for that purpose. Anyway this is probably premature as we don't have the needed infrastructure to make use of it yet

Could also be something like

typedef struct   {LP lowerleft; LP upperright; } PJ_BBOX;
PJ *proj_create_crs_to_crs (PJ_CONTEXT *ctx, const char *srid_from, const char *srid_to, PJ_BBOX area) {
...
if (area) {
    /* look for the best option */
} else {
    /* use the most likely transformation available */
}
...

This way it is only one extra argument to the function call. Pass NULL if you don't need it. In this first implementation it should obviously be ignored either way.
I prefer having just the one function over two almost identical functions. I can live with the bounding box being non-functional for now. It does however send a signal that we will be working towards better usage of the EPSG database in the future.

@rouault
Copy link
Member Author

rouault commented Aug 23, 2017

A PJ_BBOX* pointer then.

I guess that proj_pj_info() would return in its definition the choice that has been made ? To avoid the black box effect...

@kbevers
Copy link
Member

kbevers commented Aug 23, 2017

A PJ_BBOX* pointer then.

Yes, of course. Typing faster than I can think...

I guess that proj_pj_info() would return in its definition the choice that has been made ? To avoid the black box effect...

Good point. Got a suggestion on how to do it? Could be as simple as storing the EPSG coord_op_code as a variable in the PJ object, and just passing that along with proj_pj_info(), but how useful that is in practice I don't know. You'd have to look up the database yourself to see what the difference is.

What if the transformation is based on a different source than EPSG? Ideally the infrastructure is generic so other sources can be used in the future (who knows how long the EPSG is around?).

@rouault
Copy link
Member Author

rouault commented Aug 23, 2017

Got a suggestion on how to do it?

Was expecting to get the full expanded definition/pipeline in the "PJ_PROJ_INFO.definition" member "+proj=longlat +ellps=... +towgs84=.... +to +proj=longlat +datum=WGS84"

who knows how long the EPSG is around?

Probably at least as long the last drop of oil would have been extracted from this planet ;-)

@kbevers
Copy link
Member

kbevers commented Aug 23, 2017

Was expecting to get the full expanded definition/pipeline in the "PJ_PROJ_INFO.definition" member "+proj=longlat +ellps=... +towgs84=.... +to +proj=longlat +datum=WGS84"

Of course the actual definition will be there, yes. That should happen automatically already. It may however not be apparent to users that a specific option out of several has been used. I thought you were talking about being even more transparent about it.

Perhaps the transformation bounding box could be added to the PJ object if it is used to decide the transformation parameters? Will also let users know in which area they can expect the given transformation to be valid.

@busstoptaktik
Copy link
Member

Note: we have already (in projects.h )

struct PJ_REGION_S;
typedef struct PJ_REGION_S  PJ_Region;

struct PJ_REGION_S {
    double ll_long;        /* lower left corner coordinates (radians) */
    double ll_lat;
    double ur_long;        /* upper right corner coordinates (radians) */
    double ur_lat;
};

It is, however, used only a handful of places, and only in projects.h and pj_gridcatalog.c, so we probably could be forgiven for changing the typedef to typedef struct PJ_REGION_S PJ_BBOX

Also, I think the BBOX should be included in the PJ, rather than returned as a separate entity.

And finally, we should have an idiomatic way of specifying "I don't care", since this is part of the idea of having this function: Letting the user get away with caring as little as possible, by trying to make sure that he/she will shoot her/himself only in the toe, not in the foot.

@busstoptaktik
Copy link
Member

Also note that EPSG allows region of validity to be specified with words (as in ROMANIA, SVENDBORG, GREENLAND). We could provide a (small) number of predefined globals (EUROPE, EURASIA, NORTH_AMERICA, SOUTH_AMERICA) somewhat simplifying this, by basically defining bboxes for the major plates.

In general, however, aside from making sure we keep our options open, I think we should try to gain some experience from real world usage before fleshing out the details

@kbevers
Copy link
Member

kbevers commented Aug 23, 2017

It is, however, used only a handful of places, and only in projects.h and pj_gridcatalog.c, so we probably could be forgiven for changing the typedef to typedef struct PJ_REGION_S PJ_BBOX

I was thinking about that one, but couldn't remember the name of the top of my head. Since this is proj.h territory we can do whatever we want. I don't particularly like the way PJ_Region is defined. I'd prefer if we had a PJ_LONLAT that's equivalent to LP but with the members lon and lat instead of lam and phi (I always get lambda and phi mixed up), and then make a typedef struct { PJ_LONLAT ll; PJ_LONLAT ur; } PJ_BBOX.

And finally, we should have an idiomatic way of specifying "I don't care", since this is part of the idea of having this function: Letting the user get away with caring as little as possible, by trying to make sure that he/she will shoot her/himself only in the toe, not in the foot.

Isn't that exactly what you do when you set the BBOX-pointer to NULL?

In general, however, aside from making sure we keep our options open, I think we should try to gain some experience from real world usage before fleshing out the details

Yeah, I think the named regions can wait for now. But I think adding the bounding box to proj_crs_to_crs is the right thing to do. Although it does put some pressure on us to actually implement that stuff later on :-)

@kbevers kbevers closed this as completed Aug 23, 2017
@kbevers kbevers reopened this Aug 23, 2017
@rouault
Copy link
Member Author

rouault commented Aug 23, 2017

In general, however, aside from making sure we keep our options open, I think we should try to gain some experience from real world usage before fleshing out the details

Yeah, perhaps I'm overengineering things... The current complaints (that translate in GDAL / libgeotiff / proj tickets) of users are related to the fact that for projected CRS the current logic in the libgeotiff build_pcs.py script to select the TOWGS84 parameter uses the TOWGS84 parameter of the underlying GEOGCRS (for example Pulkovo 42 that apply to all Eastern europe, so the TOWGS84 selected is the one with the largest area of use) whereas the area of use of the projected CRS might be just for Poland, so they get parameters valid for the wole Eastern Europe, but not the more precise one that apply for Poland. So in practice a TOWGS84 parameter per CRS would do for them and they wouldn't need to explicitly specify an area of use since the one that comes with the CRS is probably good enough (at least while they work with projected CRS)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants