Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

API: SubdatasetInfo #8155

Merged
merged 32 commits into from
Oct 12, 2023
Merged

Conversation

elpaso
Copy link
Collaborator

@elpaso elpaso commented Jul 28, 2023

Fix #7261

Initial GDALSubdatasetInfo API.

@elpaso
Copy link
Collaborator Author

elpaso commented Jul 28, 2023

@rouault I'd like a general review of the API before I get through all the formats, for now the only implemented one is GPKG.

I will also probably need a list of the drivers that support subdatasets.

@coveralls
Copy link
Collaborator

coveralls commented Jul 28, 2023

Coverage Status

coverage: 67.607% (-0.003%) from 67.61% when pulling e94dbb4 on elpaso:subdatasetinfo-api-func-pointers into 7f18ca5 on OSGeo:master.

gcore/gdal.h Outdated Show resolved Hide resolved
gcore/gdal.h Outdated Show resolved Hide resolved
gcore/gdal.h Outdated Show resolved Hide resolved
gcore/gdal.h Outdated Show resolved Hide resolved
gcore/gdal_priv.h Show resolved Hide resolved
ogr/ogrsf_frmts/gpkg/ogrgeopackagedriver.cpp Outdated Show resolved Hide resolved
swig/include/SubdatasetInfo.i Outdated Show resolved Hide resolved
swig/include/SubdatasetInfo.i Outdated Show resolved Hide resolved
swig/include/SubdatasetInfo.i Outdated Show resolved Hide resolved
swig/include/SubdatasetInfo.i Outdated Show resolved Hide resolved
gcore/gdalsubdatasetinfo.h Outdated Show resolved Hide resolved
gcore/gdalsubdatasetinfo.cpp Outdated Show resolved Hide resolved
gcore/gdal.h Outdated Show resolved Hide resolved
@elpaso
Copy link
Collaborator Author

elpaso commented Sep 4, 2023

@rouault I agree there is a little confusion in the terminology, here is my proposal to clarify and avoid ambiguity:

The complete file name (e.g. GPKG:/path_to.gpkg:layer_name:var_name) will be named "File name" and the methods and variables will be renamed accordingly.

The parsed components will be named:

  • driver prefix e.g. GPKG -> "Prefix component"
  • file path: e.g. /path_to.gpkg -> "Path component"
  • subdataset: e.g. layer_name -> "Subdataset component"

So, the methods will become:

GetFileName() -> returns the original complete file name e.g. GPKG:/path_to.gpkg:layer_name:var_name
GetPathComponent() -> returns the path component e.g. /path_to.gpkg
GetPrefixComponent() -> returns the prefix component e.g. GPKG
GetSubdatasetComponent() -> returns the subdataset component e.g. layer_name
ModifyPathComponent( newPathComponent) -> replaces the path component

Or, the shorter versions without Component:
GetFileName()
GetPath()
GetPrefix()
GetSubdataset()
ModifyPath(newPath)

@rouault
Copy link
Member

rouault commented Sep 4, 2023

GetSubdataset()

or maybe GetSuffix() ?

@elpaso
Copy link
Collaborator Author

elpaso commented Sep 4, 2023

GetSubdataset()

or maybe GetSuffix() ?

I don't follow: isn't subdataset more appropriate to describe it? Suffix is really generic and if we have variables it might not even be the last component in the descriptor.

@rouault
Copy link
Member

rouault commented Sep 4, 2023

I don't follow: isn't subdataset more appropriate to describe it? Suffix is really generic and if we have variables it might not even be the last component in the descriptor.

ok, let's keep GetSubdataset(). "There are only two hard things in Computer Science: cache invalidation and naming things. " :-)

@elpaso elpaso force-pushed the subdatasetinfo-api-func-pointers branch from e57c11c to df1227d Compare September 5, 2023 10:54
frmts/gtiff/geotiff.cpp Outdated Show resolved Hide resolved
frmts/gtiff/geotiff.cpp Outdated Show resolved Hide resolved

// GDALSubdatasetInfo interface
private:
void parseFileName() override
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what do you think about the opportunity of using this in the GTiffDataset::OpenDir() method to avoid duplication of parsing ? (this could apply to other drivers as well)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll have a look, I prefer a method that does not have to open the file though, ideally this API should just manipulate strings without I/O overhead.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer a method that does not have to open the file though, ideally this API should just manipulate strings without I/O overhead.

yes, totally agreed. I mean to update the existing Open() methods to use the new parseFileName()

{
return;
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a more esoteric supported variant: GTIFF_DIR:off:<dir_offset>:filename
If we reused this class for GTiffDataset::OpenDir(), we'd have to support it

frmts/postgisraster/postgisrasterdataset.cpp Outdated Show resolved Hide resolved
frmts/wms/wmsdriver.cpp Outdated Show resolved Hide resolved
ogr/ogrsf_frmts/gpkg/ogrgeopackagedriver.cpp Outdated Show resolved Hide resolved
frmts/gtiff/geotiff.cpp Outdated Show resolved Hide resolved
frmts/gtiff/geotiff.cpp Outdated Show resolved Hide resolved
ogr/ogrsf_frmts/gpkg/ogrgeopackagedriver.cpp Outdated Show resolved Hide resolved
ogr/ogrsf_frmts/gpkg/ogrgeopackagedriver.cpp Outdated Show resolved Hide resolved
@sgillies
Copy link
Contributor

sgillies commented Sep 6, 2023

@rouault @elpaso how did we arrive at using : as a delimiter between dataset and subdataset name? Does this have to be extended to drivers where it doesn't exist already? I think : is a poor choice and leads to people putting quotes around dataset URLs within a GDAL filename.

@elpaso
Copy link
Collaborator Author

elpaso commented Sep 6, 2023

@rouault @elpaso how did we arrive at using : as a delimiter between dataset and subdataset name? Does this have to be extended to drivers where it doesn't exist already? I think : is a poor choice and leads to people putting quotes around dataset URLs within a GDAL filename.

I don't know where the colon separator comes from, but I totally agree that it's a poor choice, the QGIS counterpart QgsDatasourceUri is a little bit better engineered, for most datasources it uses URL encoding for the KVP and it allows to encode the opening options.

@rouault
Copy link
Member

rouault commented Sep 6, 2023

@rouault @elpaso how did we arrive at using : as a delimiter between dataset and subdataset name? Does this have to be extended to drivers where it doesn't exist already? I think : is a poor choice and leads to people putting quotes around dataset URLs within a GDAL filename.

@sgillies Before my involvement with the project. I can see that this syntax was already used in the initial version of the HDF4 driver back in 2002: 7c56e20797c17d2bcb3f173d652ec1fdf590d558
While this is indeed not ideal, changing the subdataset syntax in existing drivers would have backward compatibility issues. I suspect a number of people have VRT files referencing the old syntax, or some code manually composing them.
The new API itself in this PR makes no assumption on the separator, so I don't think it is a step backwards, and it should be hopefully compatible with better subsdataset syntax.
That said the replacement logic in ModifyPathComponent() is fully string based currently. if at some point a driver would URI-encode the filename, that wouldn't work. But if that occured, the method could be made virtual and implemented by the driver that would deal with the URI encoding/decoding.

@elpaso I believe to keep this PR manageable we should target the most popular file-based drivers that use subsdataset (others might follow in subsequent PR): netCDF, HDF4, HDF5, GPKG . Anything else ?
I'd be interested however to see frmts/vrt/vrtsources.cpp updated to remove the NETCDF releated hacks in favor of the new API to demonstrate its usefulness (hum I see that to remove all of them we should also tackle NITF, PDF, RASTERLITE and TILEDB)

@elpaso
Copy link
Collaborator Author

elpaso commented Sep 6, 2023

@rouault @elpaso how did we arrive at using : as a delimiter between dataset and subdataset name? Does this have to be extended to drivers where it doesn't exist already? I think : is a poor choice and leads to people putting quotes around dataset URLs within a GDAL filename.

@sgillies Before my involvement with the project. I can see that this syntax was already used in the initial version of the HDF4 driver back in 2002: 7c56e20797c17d2bcb3f173d652ec1fdf590d558 While this is indeed not ideal, changing the subdataset syntax in existing drivers would have backward compatibility issues. I suspect a number of people have VRT files referencing the old syntax, or some code manually composing them. The new API itself in this PR makes no assumption on the separator, so I don't think it is a step backwards, and it should be hopefully compatible with better subsdataset syntax. That said the replacement logic in ModifyPathComponent() is fully string based currently. if at some point a driver would URI-encode the filename, that wouldn't work. But if that occured, the method could be made virtual and implemented by the driver that would deal with the URI encoding/decoding.

Yeah.

@elpaso I believe to keep this PR manageable we should target the most popular file-based drivers that use subsdataset (others might follow in subsequent PR): netCDF, HDF4, HDF5, GPKG . Anything else ? I'd be interested however to see frmts/vrt/vrtsources.cpp updated to remove the NETCDF releated hacks in favor of the new API to demonstrate its usefulness (hum I see that to remove all of them we should also tackle NITF, PDF, RASTERLITE and TILEDB)

I have actually already implemented WMS and PostGIS raster because I wanted to test the API with a DB and a webservice. I will proceed according to your priority list.

@elpaso elpaso force-pushed the subdatasetinfo-api-func-pointers branch from acab4ab to 2481e97 Compare September 11, 2023 09:15
@elpaso
Copy link
Collaborator Author

elpaso commented Sep 14, 2023

@rouault here is another iteration. I have removed the double quotes from the logic by storing both the original unmodified path (used only internally to modify the path) and I am storing isQuoted to determine if the new path should be quoted (if it is already quoted it is not quoted twice).

When returning the path it is always unquoted and unescaped.

I have added a few new test cases.

@elpaso elpaso marked this pull request as ready for review September 14, 2023 16:22
@rouault
Copy link
Member

rouault commented Sep 15, 2023

@sgillies Do you think we can merge this PR or do you want the RFC you mention in https://lists.osgeo.org/pipermail/gdal-dev/2023-September/057641.html to be a pre-condition for it ?

@rouault rouault added this to the 3.8.0 milestone Oct 11, 2023
@rouault
Copy link
Member

rouault commented Oct 11, 2023

@elpaso Can you rebase on top of latest master and fix conflict in autotest/gdrivers/netcdf.py?

@elpaso elpaso force-pushed the subdatasetinfo-api-func-pointers branch from 2cd9a69 to 1f63b55 Compare October 11, 2023 12:29
@elpaso
Copy link
Collaborator Author

elpaso commented Oct 11, 2023

@elpaso Can you rebase on top of latest master and fix conflict in autotest/gdrivers/netcdf.py?

done

@rouault rouault merged commit e6baccb into OSGeo:master Oct 12, 2023
27 of 29 checks passed
bradh pushed a commit to bradh/gdal that referenced this pull request Oct 22, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add API for GDAL users to be able to extract information / transform subdataset names
4 participants