Add API for GDAL users to be able to extract information / transform subdataset names #7261
Comments
Would this include getting the SDS names in the first place? (I'd meant to pursue that; I'd like to be able to name them in "vrt://...?sds=var1", for example, and it's a bit laborious atm.) |
What do you mean exactly? |
To generate the set of "NETCDF:file.nc:var1" etc. from "file.nc"; last I looked, it's quite involved. |
involved as in iterating over |
Ah, well, that aspect is in the "works" category for me. The Python bindings typically have a GetSubDatasets() helper that rearranges things as an array of (name, description) pairs:
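As a rough illustration of that rearrangement, here is a minimal C++ sketch that does not call GDAL at all. It assumes the metadata arrives as plain "KEY=VALUE" strings using the SUBDATASET_<n>_NAME / SUBDATASET_<n>_DESC key layout; the helper name PairSubdatasets is hypothetical and is not part of any GDAL binding.

```cpp
#include <cctype>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

using SubdatasetPair = std::pair<std::string, std::string>;  // (name, description)

// Hypothetical helper (not a GDAL function): groups "SUBDATASET_<n>_NAME=..."
// and "SUBDATASET_<n>_DESC=..." entries into pairs ordered by <n>.
std::vector<SubdatasetPair> PairSubdatasets(const std::vector<std::string> &metadata)
{
    const std::string prefix = "SUBDATASET_";
    std::map<int, SubdatasetPair> byIndex;
    for (const std::string &entry : metadata)
    {
        const std::size_t eq = entry.find('=');
        if (eq == std::string::npos || entry.compare(0, prefix.size(), prefix) != 0)
            continue;  // not a subdataset entry
        const std::string key = entry.substr(0, eq);
        const std::size_t us = key.rfind('_');
        if (us == std::string::npos || us <= prefix.size())
            continue;  // no "<n>_" part between the prefix and the field name
        const std::string digits = key.substr(prefix.size(), us - prefix.size());
        bool allDigits = digits.size() <= 9;  // keeps std::stoi within int range
        for (char c : digits)
            allDigits = allDigits && std::isdigit(static_cast<unsigned char>(c));
        if (!allDigits)
            continue;
        const int index = std::stoi(digits);
        const std::string field = key.substr(us + 1);
        const std::string value = entry.substr(eq + 1);
        if (field == "NAME")
            byIndex[index].first = value;
        else if (field == "DESC")
            byIndex[index].second = value;
    }
    std::vector<SubdatasetPair> pairs;
    for (const auto &item : byIndex)
        pairs.push_back(item.second);
    return pairs;
}
```

Entries that do not follow the assumed key layout are skipped rather than reported, which is a simplification chosen for this sketch.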
|
We'd also probably need a GDALIsSubdatasetSyntax() to check if a string is a subdataset syntax. I presume those functions should actually iterate over drivers and call function pointers at the driver level (similarly to Identify(), Open() etc) |
@rouault I started sketching the API. If I get this right, the goal is for the first function to determine whether the file name is possibly a subdataset, and for the second function to strip the dataset information and return the file path without it. Is it correct that, for performance reasons, both functions should not open the dataset but only examine the file name string?
I have a few questions about the input and the implementation of the methods. You mentioned that we should loop through all the registered drivers that support subdatasets, so I guess that we must handle the case when we don't know the driver in advance (because the file name is not in the form
The question is whether it makes sense to loop through all drivers (when the driver is not known in advance) and ask each driver to determine if the file name is a subdataset without actually opening the dataset: how can the driver know if it can handle that file without actually trying to open it? I guess in a few cases it could examine the file extension, but I am worried that this won't work in all situations (for example in the case of API URLs). Is there logic I can borrow? Maybe from OpenEx?
This is the top-level function I am working on:
    bool CPL_STDCALL GDALIsSubdatasetSyntax(const char *pszFileName)
    {
        // Iterate over all registered drivers
        GDALDriverManager *poDM = GetGDALDriverManager();
        const int nDriverCount = poDM->GetDriverCount();
        for (int iDriver = 0; iDriver < nDriverCount; ++iDriver)
        {
            GDALDriver *poDriver = poDM->GetDriver(iDriver);
            char **papszMD = GDALGetMetadata(poDriver, nullptr);
            // Skip drivers that do not support subdatasets or lack the callback
            if (!CPLFetchBool(papszMD, GDAL_DMD_SUBDATASETS, false) ||
                !poDriver->pfnIsDatasetSyntax)
            {
                continue;
            }
            // Ask the driver if this is a subdataset descriptor
            if (poDriver->pfnIsDatasetSyntax(pszFileName))
            {
                return true;
            }
        }
        return false;
    }
|
yes ... ideally ... I don't have in mind situations where we'd need to open it, but perhaps I'm missing something
Your looping logic looks good to me. As we might need 3 functionalities (is this a subdataset name, get the filename if there is one, get a new subdataset URI with this filename instead), I'm wondering if we shouldn't have a single pfnGetSubdatasetInfo function pointer that would return an object with GetFilename and ModifyFilename methods. Potentially it could also return driver-specific info as key/value pairs, similarly to QGIS decodeUri()/encodeUri(). |
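To make the "single object with GetFilename and ModifyFilename" idea concrete, here is a minimal, GDAL-free sketch. Everything in it is an assumption for illustration: the struct layout, the free function GetSubdatasetInfo, and the simplifying rule that a name looks like PREFIX:"file":subdataset or PREFIX:file:subdataset. It is not the API that was eventually implemented.

```cpp
#include <cstddef>
#include <string>

// Hypothetical result object; the field names are illustrative only.
struct SubdatasetInfo
{
    std::string prefix;      // e.g. "NETCDF"
    std::string filename;    // file part, without surrounding quotes
    std::string subdataset;  // e.g. "var1"
    bool quoted = false;     // whether the file part was quoted in the input

    const std::string &GetFilename() const { return filename; }

    // Rebuilds the subdataset name around a different file name.
    std::string ModifyFilename(const std::string &newFilename) const
    {
        const std::string q = quoted ? "\"" : "";
        return prefix + ":" + q + newFilename + q + ":" + subdataset;
    }
};

// Returns true and fills `info` if `name` matches the assumed syntax.
// Only the string is examined; nothing is opened.
bool GetSubdatasetInfo(const std::string &name, SubdatasetInfo &info)
{
    const std::size_t first = name.find(':');
    if (first == std::string::npos || first == 0)
        return false;
    SubdatasetInfo parsed;
    parsed.prefix = name.substr(0, first);
    std::size_t fileStart = first + 1;
    if (fileStart < name.size() && name[fileStart] == '"')
    {
        // Quoted file part: it ends at the closing quote, followed by ':'
        parsed.quoted = true;
        ++fileStart;
        const std::size_t closing = name.find('"', fileStart);
        if (closing == std::string::npos || closing + 1 >= name.size() ||
            name[closing + 1] != ':')
            return false;
        parsed.filename = name.substr(fileStart, closing - fileStart);
        parsed.subdataset = name.substr(closing + 2);
    }
    else
    {
        // Unquoted file part: assume the last ':' separates the subdataset
        const std::size_t last = name.rfind(':');
        if (last == first)
            return false;  // only one ':' in the whole string
        parsed.filename = name.substr(fileStart, last - fileStart);
        parsed.subdataset = name.substr(last + 1);
    }
    if (parsed.filename.empty() || parsed.subdataset.empty())
        return false;
    info = parsed;
    return true;
}
```

This generic parser would misread strings such as a URL with a port number, which is one reason the thread leans towards letting each driver decide whether a string matches its own syntax.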
@rouault so I can assume that we always have a file name that starts with |
There might be several prefixes handled by the same driver, e.g. the SENTINEL2 one. So the general logic should not make any assumption regarding this. |
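One way to picture the "several prefixes per driver" point is a lookup in which each driver owns a list of prefixes instead of a single one. The sketch below is GDAL-free; the DriverPrefixes struct, the FindDriverByPrefix function and the prefix lists in the usage note are illustrative assumptions, not data read from the real drivers.

```cpp
#include <string>
#include <vector>

// Hypothetical table entry: one driver, any number of subdataset prefixes.
struct DriverPrefixes
{
    std::string driverName;
    std::vector<std::string> prefixes;
};

// True if `text` begins with `prefix` (case-sensitive, for simplicity).
static bool StartsWith(const std::string &text, const std::string &prefix)
{
    return text.size() >= prefix.size() &&
           text.compare(0, prefix.size(), prefix) == 0;
}

// Returns the name of the first driver owning a prefix that matches `name`,
// or an empty string if no driver claims it.
std::string FindDriverByPrefix(const std::vector<DriverPrefixes> &table,
                               const std::string &name)
{
    for (const DriverPrefixes &driver : table)
    {
        for (const std::string &prefix : driver.prefixes)
        {
            if (StartsWith(name, prefix))
                return driver.driverName;
        }
    }
    return std::string();
}
```

With a table holding, say, {"netCDF", {"NETCDF:"}} and {"SENTINEL2", {"SENTINEL2_L1B:", "SENTINEL2_L1C:", "SENTINEL2_L2A:"}}, a name starting with "SENTINEL2_L2A:" resolves to the SENTINEL2 entry without the caller ever assuming a one-to-one mapping between driver name and prefix.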
@rouault I tried the function pointer approach but got lost casting fp to void* (which is forbidden). Before I lose more time on that road, I think we should define the API a little better: as far as I understand, there are two different required client use cases:
Now, I understand that many methods in GDAL are implemented with function pointers (I'm not very familiar with that pattern because we don't use it much in QGIS or in other C++ projects I've been working on, like underpass). I tried anyway to define an opaque handle for the function pointer, but I got stuck with the above-mentioned error.
So I was thinking of a different approach: make SubdatasetInfo an abstract interface (with pure virtual methods) which must be implemented by the drivers that offer this functionality. When
Do you see any problem with this approach, especially for the
If we stick to use case 1 and the separate-methods API I've sketched in #7261 (comment), I think I know how to proceed, but that's different from what you asked in #7261 (comment). |
Why did you need to cast fp to void*? And why do you actually need a file pointer in the API? It might help if you showed your draft and the actual error you got.
I would hope we can have a single API for both use cases. I would say that even if we know the driver, it is probably not that much of an issue to loop over the drivers to find it again. I don't anticipate those subdataset-related methods to be in particularly hot performance code paths.
I agree this is a bit of an odd pattern. My understanding is that this was done to easily test whether a driver implements an interface or not, without actually calling the method. So, for example, if pfnCreateCopy is defined, then you can advertise a GDAL_DCAP_CREATECOPY=YES driver metadata item. As a consequence, most drivers don't need to actually subclass GDALDriver. They just instantiate a new object and set function pointers.
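The pattern described above can be sketched without GDAL as follows. SketchDriver is a hypothetical stand-in for a driver object, not the real GDALDriver class, and the callback signatures are invented for the example. The only point shown is that a capability can be derived from whether a function pointer is set, without calling it.

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for a driver object: behaviour is plugged in by
// assigning function pointers rather than by subclassing.
struct SketchDriver
{
    std::string name;
    std::map<std::string, std::string> metadata;
    // Invented signatures; a null pointer means "not implemented".
    int (*pfnCreateCopy)(const char *dst, const char *src) = nullptr;
    bool (*pfnIsSubdatasetSyntax)(const char *name) = nullptr;
};

// Derives capability metadata from which callbacks are set,
// without invoking any of them.
void AdvertiseCapabilities(SketchDriver &driver)
{
    if (driver.pfnCreateCopy != nullptr)
        driver.metadata["DCAP_CREATECOPY"] = "YES";
}

// Trivial callback used only to show the wiring.
int ExampleCreateCopy(const char *, const char *)
{
    return 0;
}
```

A driver is then "implemented" by creating a SketchDriver, assigning the callbacks it supports and leaving the rest null, which mirrors the remark that most drivers do not need their own subclass.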
Having SubdatasetInfo a class with pure virtual methods sounds OK to me. It shouldn't cause any issue for the C interface and SWIG bindings. Pretty similar to GDALDataset, GDALRasterBand, OGRLayer and so on. |
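A minimal sketch of how an abstract class with pure virtual methods can sit behind an opaque handle, which is roughly what a C interface and bindings would consume. All names here (SketchSubdatasetInfo, ToySubdatasetInfo, the Sketch* functions) are hypothetical, and plain malloc/free stand in for whatever allocation helpers a real API would use.

```cpp
#include <cstdlib>
#include <cstring>
#include <string>
#include <utility>

// Hypothetical abstract interface that a driver would implement.
class SketchSubdatasetInfo
{
  public:
    virtual ~SketchSubdatasetInfo() = default;
    virtual std::string GetFilename() const = 0;
    virtual std::string ModifyFilename(const std::string &newFilename) const = 0;
};

// Toy driver-side implementation for an unquoted "PREFIX:file:subdataset" name.
class ToySubdatasetInfo final : public SketchSubdatasetInfo
{
  public:
    ToySubdatasetInfo(std::string prefix, std::string filename, std::string subdataset)
        : m_prefix(std::move(prefix)), m_filename(std::move(filename)),
          m_subdataset(std::move(subdataset))
    {
    }
    std::string GetFilename() const override { return m_filename; }
    std::string ModifyFilename(const std::string &newFilename) const override
    {
        return m_prefix + ":" + newFilename + ":" + m_subdataset;
    }

  private:
    std::string m_prefix, m_filename, m_subdataset;
};

// C-style layer: callers only see an opaque handle and char* strings.
// (A real C API would also declare these functions extern "C".)
typedef void *SketchSubdatasetInfoH;

// Copies a std::string into a malloc'ed buffer; the caller releases it with free().
static char *DuplicateString(const std::string &s)
{
    char *copy = static_cast<char *>(std::malloc(s.size() + 1));
    if (copy != nullptr)
        std::memcpy(copy, s.c_str(), s.size() + 1);
    return copy;
}

SketchSubdatasetInfoH SketchCreateToyInfo(const char *prefix, const char *filename,
                                          const char *subdataset)
{
    // Convert to the base pointer before erasing the type, so that the
    // casts back from void* below are well defined.
    SketchSubdatasetInfo *info = new ToySubdatasetInfo(prefix, filename, subdataset);
    return info;
}

char *SketchGetFilename(SketchSubdatasetInfoH h)
{
    return DuplicateString(static_cast<SketchSubdatasetInfo *>(h)->GetFilename());
}

char *SketchModifyFilename(SketchSubdatasetInfoH h, const char *newFilename)
{
    return DuplicateString(
        static_cast<SketchSubdatasetInfo *>(h)->ModifyFilename(newFilename));
}

void SketchDestroy(SketchSubdatasetInfoH h)
{
    delete static_cast<SketchSubdatasetInfo *>(h);
}
```

The virtual destructor lets SketchDestroy delete through the base pointer regardless of which driver created the object.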
I meant function pointer. |
@rouault Here is my broken and unfinished initial attempt https://github.com/elpaso/gdal/tree/subdatasetinfo-api-func-pointers |
ah ok
and making the C++ class actually a |
@rouault would it be possible as part of this work to deprecate and stop promoting the use of subdataset names with extra quotes in them? For example, |
I wouldn't want to put too much on @elpaso's shoulders, as the amount of changes might quickly get out of control. We also have backward compatibility concerns, with people potentially forging subdataset names based on their knowledge of the current implementation.
The very aim of this ticket is to make the subdataset name structure opaque to QGIS and give it functions to manipulate such names easily without having to know anything about netCDF, HDF5 and the like. |
Cf https://github.com/qgis/QGIS/pull/51901/files#r1109095423 for the context
Perhaps GDALGetFilenameFromSubdatasetName(), GDALModifyFilenameInSubdatasetName(), etc.