pdf vector driver cannot read current geopdfs #9870

Closed
Off-Tracker opened this issue May 7, 2024 · 10 comments · Fixed by #9874
@Off-Tracker

Off-Tracker commented May 7, 2024

What is the bug?

GDAL programs using the PDF vector driver cannot read current GeoPDF files created in Esri ArcGIS 12.9.3.32739. This affects, for example, all current maps from NSW SIX: 9641-3S Pottsville 4th Edn CollarOn_2022 (at https://portal.spatial.nsw.gov.au/portal/apps/webappviewer/index.html?id=06e3c2e0de1e4efda863854048c613c6) fails to open.

Steps to reproduce the issue

Try to open, as vector, any GDA2020 map from https://portal.spatial.nsw.gov.au/portal/apps/webappviewer/index.html?id=06e3c2e0de1e4efda863854048c613c6

For example, running ogrinfo on the file
gives: ogrinfo failed - unable to open

Versions and provenance

Win 10 64 bit ver 22H2, GDAL 3.8.5, released 2024/04/02 (under OSGeo4W)

Additional context

These files open as usual, with full display of map frames, layers and georeferencing, in Acrobat Reader DC (v. 2021.****).

@jratike80
Collaborator

jratike80 commented May 7, 2024

ogrinfo 9031-4S+SIX+BROTHERS.pdf --debug on
PDF: This is a raster-only PDF dataset, but it has been opened in vector-only mode

I wonder if the error message should be printed even without debug, but it seems to be a raster map. Use GDAL raster tools instead of vector ones.

gdal_translate 9031-4S+SIX+BROTHERS.pdf 9031-4S+SIX+BROTHERS.tif
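
Since the message only appears with --debug on, a small script can flag it when checking many files. This is a minimal sketch, assuming the debug output has already been captured into a string; the function name classify_pdf_debug is made up for illustration, and the only thing taken from GDAL is the message text quoted above.

```python
# Message text copied from the debug output shown in this thread.
RASTER_ONLY = (
    "This is a raster-only PDF dataset, "
    "but it has been opened in vector-only mode"
)


def classify_pdf_debug(debug_text: str) -> str:
    """Return 'raster-only' if the PDF driver reported the dataset as
    raster-only in the captured debug text, otherwise 'unknown'.

    Hypothetical helper: it only inspects text, it does not call GDAL.
    """
    for line in debug_text.splitlines():
        if line.startswith("PDF:") and RASTER_ONLY in line:
            return "raster-only"
    return "unknown"


# Sample lines in the same shape as the debug output in this thread.
sample = "\n".join(
    [
        "PDF: Found UserUnit in Page --> DPI = 72",
        "PDF: " + RASTER_ONLY,
        "ogrinfo failed - unable to open 'POTTSVILLE.pdf'.",
    ]
)

print(classify_pdf_debug(sample))
```

Note that "raster-only" here only reflects what the driver concluded, which may not match what the file actually contains.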

image

The map does have a few layers but I do not know if GDAL is supposed to know how to deal with them individually

image

@Off-Tracker
Author

Thanks, but I don't think these are raster pdfs (though they have some raster content, possibly small overviews).
They are vector files with a lot of layers:
Capture

@jratike80
Collaborator

jratike80 commented May 7, 2024

Maybe we are downloading different maps? This looks like a raster map to GDAL: https://portal.spatial.nsw.gov.au/download/NSWTopographicMaps/DTDB_GeoReferenced_Raster_CollarOn_161070/2022/25k/9641-3S+POTTSVILLE.pdf

But you are right, Acrobat Reader does find vectors.

This finds something, but not much. Inspired by https://gdal.org/drivers/vector/pdf.html#vector-support

ogrinfo 9641-3S+POTTSVILLE.pdf --config OGR_PDF_READ_NON_STRUCTURED NO -al

@Off-Tracker
Author

Off-Tracker commented May 7, 2024

Sorry, I do not know much about your systems. I followed your example with --debug and got:

C:\OSGeo4W>ogrinfo C:\temp\POTTSVILLE.pdf --debug on pdf
GDAL: Auto register C:\OSGeo4W\apps\gdal\lib\gdalplugins\gdal_ECW_JP2ECW.dll using GDALRegister_ECW_JP2ECW.
GDAL: Auto register C:\OSGeo4W\apps\gdal\lib\gdalplugins\gdal_GEOR.dll using GDALRegister_GEOR.
GDAL: Auto register C:\OSGeo4W\apps\gdal\lib\gdalplugins\gdal_HDF5.dll using GDALRegister_HDF5.
GDAL: Auto register C:\OSGeo4W\apps\gdal\lib\gdalplugins\gdal_MrSID.dll using GDALRegister_MrSID.
GDAL: Auto register C:\OSGeo4W\apps\gdal\lib\gdalplugins\ogr_MSSQLSpatial.dll using RegisterOGRMSSQLSpatial.
GDAL: Auto register C:\OSGeo4W\apps\gdal\lib\gdalplugins\ogr_OCI.dll using RegisterOGROCI.
GDAL: Auto register C:\OSGeo4W\apps\gdal\lib\gdalplugins\ogr_SOSI.dll using RegisterOGRSOSI.
PDF: Found UserUnit in Page --> DPI = 72
PDF: Adobe ISO32000 style Geospatial PDF perhaps ?
PDF: VP length = 2
PDF: Subtype = GEO
PDF: Name = NSW_TopoMap_Layers
PDF: Subtype = GEO
PDF: Name = Locator Map
PDF: Largest BBox in VP array is element 0
PDF: Subtype = GEO
PDF: Bounds[0] = 0.000000
PDF: Bounds[1] = 1.000000
PDF: Bounds[2] = 0.000000
PDF: Bounds[3] = 0.000000
PDF: Bounds[4] = 1.000000
PDF: Bounds[5] = 0.000000
PDF: Bounds[6] = 1.000000
PDF: Bounds[7] = 1.000000
PDF: GPTS[0] = -28.374890000000004164
PDF: GPTS[1] = 153.499949999999984129
PDF: GPTS[2] = -28.500109999999999388
PDF: GPTS[3] = 153.499869999999987158
PDF: GPTS[4] = -28.500109999999999388
PDF: GPTS[5] = 153.625200000000006639
PDF: GPTS[6] = -28.374890000000004164
PDF: GPTS[7] = 153.625130000000012842
PDF: LPTS[0] = 0.000000
PDF: LPTS[1] = 1.000000
PDF: LPTS[2] = 0.000000
PDF: LPTS[3] = 0.000000
PDF: LPTS[4] = 1.000000
PDF: LPTS[5] = 0.000000
PDF: LPTS[6] = 1.000000
PDF: LPTS[7] = 1.000000
PDF: GCS.Type = PROJCS
PDF: GCS.WKT = PROJCS["GDA2020_MGA_Zone_56",GEOGCS["GDA2020",DATUM["GDA2020",SPHEROID["GRS_1980",6378137.0,298.257222101]],PRIMEM["Greenwich",0.0],UNIT["Degree",0.0174532925199433]],PROJECTION["Transverse_Mercator"],PARAMETER["False_Easting",500000.0],PARAMETER["False_Northing",10000000.0],PARAMETER["Central_Meridian",153.0],PARAMETER["Scale_Factor",0.9996],PARAMETER["Latitude_Of_Origin",0.0],UNIT["Meter",1.0]]
PDF: This is a raster-only PDF dataset, but it has been opened in vector-only mode
PDF: Found UserUnit in Page --> DPI = 72
PDF: Adobe ISO32000 style Geospatial PDF perhaps ?
PDF: VP length = 2
PDF: Subtype = GEO
PDF: Name = NSW_TopoMap_Layers
PDF: Subtype = GEO
PDF: Name = Locator Map
PDF: Largest BBox in VP array is element 0
PDF: Subtype = GEO
PDF: Bounds[0] = 0.000000
PDF: Bounds[1] = 1.000000
PDF: Bounds[2] = 0.000000
PDF: Bounds[3] = 0.000000
PDF: Bounds[4] = 1.000000
PDF: Bounds[5] = 0.000000
PDF: Bounds[6] = 1.000000
PDF: Bounds[7] = 1.000000
PDF: GPTS[0] = -28.374890000000004164
PDF: GPTS[1] = 153.499949999999984129
PDF: GPTS[2] = -28.500109999999999388
PDF: GPTS[3] = 153.499869999999987158
PDF: GPTS[4] = -28.500109999999999388
PDF: GPTS[5] = 153.625200000000006639
PDF: GPTS[6] = -28.374890000000004164
PDF: GPTS[7] = 153.625130000000012842
PDF: LPTS[0] = 0.000000
PDF: LPTS[1] = 1.000000
PDF: LPTS[2] = 0.000000
PDF: LPTS[3] = 0.000000
PDF: LPTS[4] = 1.000000
PDF: LPTS[5] = 0.000000
PDF: LPTS[6] = 1.000000
PDF: LPTS[7] = 1.000000
PDF: GCS.Type = PROJCS
PDF: GCS.WKT = PROJCS["GDA2020_MGA_Zone_56",GEOGCS["GDA2020",DATUM["GDA2020",SPHEROID["GRS_1980",6378137.0,298.257222101]],PRIMEM["Greenwich",0.0],UNIT["Degree",0.0174532925199433]],PROJECTION["Transverse_Mercator"],PARAMETER["False_Easting",500000.0],PARAMETER["False_Northing",10000000.0],PARAMETER["Central_Meridian",153.0],PARAMETER["Scale_Factor",0.9996],PARAMETER["Latitude_Of_Origin",0.0],UNIT["Meter",1.0]]
PDF: This is a raster-only PDF dataset, but it has been opened in vector-only mode
ogrinfo failed - unable to open 'C:\temp\POTTSVILLE.pdf'.

So could GDAL think it is raster-only because it cannot handle all of the vector content?
That was the bug I reported.
From what I know (still learning), lots of vector GeoPDF files have some raster content.
The NSW GeoPDF files are very high resolution (look in Acrobat Reader), but gdalinfo is only seeing a low-res raster (note the small size):

C:\OSGeo4W>gdalinfo C:\temp\POTTSVILLE.pdf
Driver: PDF/Geospatial PDF
Files: C:\temp\POTTSVILLE.pdf
Size is 1984, 1701
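
One possible reading of that size, offered as an assumption rather than something confirmed in this thread: the reported dimensions may be the whole page rendered at the 72 DPI shown in the debug log, not an embedded low-res image. The arithmetic below is a rough sketch of that idea; the helper names are made up, and the 300 DPI figure is only an example value (GDAL's PDF raster driver documents a GDAL_PDF_DPI configuration option for changing the rendering resolution).

```python
MM_PER_INCH = 25.4


def page_size_mm(width_px, height_px, dpi):
    """Physical page size implied by a pixel size at a given DPI."""
    return (width_px / dpi * MM_PER_INCH, height_px / dpi * MM_PER_INCH)


def raster_size_at(width_px, height_px, dpi_from, dpi_to):
    """Pixel size the same page would have at a different DPI."""
    return (
        round(width_px * dpi_to / dpi_from),
        round(height_px * dpi_to / dpi_from),
    )


# Values from the gdalinfo output and the "DPI = 72" debug line above.
width_mm, height_mm = page_size_mm(1984, 1701, 72)
print(f"page is roughly {width_mm:.0f} mm x {height_mm:.0f} mm")

# Example only: the same page rendered at 300 DPI (assumed value).
print(raster_size_at(1984, 1701, 72, 300))
```

Under this assumption, 1984 x 1701 pixels corresponds to a sheet of about 700 mm by 600 mm, which is plausible for a printed topographic map.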

@Off-Tracker
Author

Yes, I previously tried --config OGR_PDF_READ_NON_STRUCTURED NO -al -so and noted it gave some, but little, information.
I took that as confirmation that the file is indeed vector, but that the GDAL driver struggles with it.
I have tried to find the version of the PDF spec that the help file refers to, but without much luck in working out what structured (vs ordered?) PDF content means. As I said, Adobe handles it fine and shows all of the extensive vector layers.

@Off-Tracker
Author

Here is a screen-grab from Reader DC at 6400%. No pixelation. This is a vector geopdf.
Captur2

@jratike80 jratike80 reopened this May 7, 2024
@jratike80
Collaborator

I found this in the documentation: https://gdal.org/drivers/raster/pdf.html#layers-metadata-domain

gdalinfo 9641-3S+POTTSVILLE.pdf -mdd layers lists 160 layers.

If I understand correctly, GDAL can mostly only render the vector PDF into raster. There are some options which affect the rendering: https://gdal.org/drivers/raster/pdf.html#raster-pdf.
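
To get an overview of a long layer list, the gdalinfo text can be summarised with a few lines of Python. This is only a sketch under assumptions: it expects metadata lines of the form LAYER_nn_NAME=..., with nested layers written as Parent.Child, and the sample text below is invented for illustration (it is not the real output for this map). The function names are hypothetical.

```python
import re
from collections import Counter

# Assumed line shape in the LAYERS metadata domain: LAYER_00_NAME=Some.Name
LAYER_LINE = re.compile(r"^\s*LAYER_(\d+)_NAME=(.+)$")


def parse_layer_names(gdalinfo_text):
    """Collect layer names from captured `gdalinfo -mdd layers` text."""
    names = []
    for line in gdalinfo_text.splitlines():
        match = LAYER_LINE.match(line)
        if match:
            names.append(match.group(2).strip())
    return names


def count_by_top_level(names):
    """Count layers per top-level group (the part before the first dot)."""
    return Counter(name.split(".", 1)[0] for name in names)


# Invented sample; the sub-layer names are hypothetical.
sample = """Metadata (LAYERS):
  LAYER_00_NAME=NSW_TopoMap_Layers
  LAYER_01_NAME=NSW_TopoMap_Layers.Roads
  LAYER_02_NAME=NSW_TopoMap_Layers.Contours
  LAYER_03_NAME=Locator Map
"""

names = parse_layer_names(sample)
print(len(names), "layers")
for group, count in count_by_top_level(names).items():
    print(group, count)
```

Grouping by the first name component is just one way to approximate the map-frame level of the layer tree.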

@rouault rouault self-assigned this May 7, 2024
rouault added a commit to rouault/gdal that referenced this issue May 7, 2024
…rse OGCs in Resources.XObject.Resources.Properties as generated by ArcGIS 12.9 (fixes OSGeo#9870)
@rouault
Member

rouault commented May 7, 2024

Three different issues had to be fixed to be able to read that PDF (including a fundamental 12-year-old bug where the order of matrix multiplication was wrong!). All are now fixed per #9874
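
For readers wondering why the multiplication order matters: PDF transformation matrices are six numbers (a b c d e f) applied to row vectors, and combining two of them in the wrong order puts geometry in the wrong place. The sketch below is a generic illustration of that effect, not GDAL's code and not the actual fix in #9874; the function names are made up.

```python
def multiply(m1, m2):
    """Combine two PDF-style matrices (a, b, c, d, e, f) so that m1 is
    applied first and m2 second (row-vector convention)."""
    a1, b1, c1, d1, e1, f1 = m1
    a2, b2, c2, d2, e2, f2 = m2
    return (
        a1 * a2 + b1 * c2,
        a1 * b2 + b1 * d2,
        c1 * a2 + d1 * c2,
        c1 * b2 + d1 * d2,
        e1 * a2 + f1 * c2 + e2,
        e1 * b2 + f1 * d2 + f2,
    )


def apply(m, point):
    """Transform a point (x, y) with matrix m."""
    a, b, c, d, e, f = m
    x, y = point
    return (a * x + c * y + e, b * x + d * y + f)


scale = (2, 0, 0, 2, 0, 0)        # scale by 2 in x and y
translate = (1, 0, 0, 1, 10, 0)   # shift by 10 in x

# Scale first, then translate.
print(apply(multiply(scale, translate), (1, 1)))

# Translate first, then scale: the shift gets scaled too.
print(apply(multiply(translate, scale), (1, 1)))
```

The two results differ in x, so a driver that concatenates nested transforms in the wrong order would misplace any geometry drawn under more than one transform.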

@Off-Tracker
Author

Thank you for all the consideration. If I understand correctly, ogr vector programs aim to do much more than convert vector to raster (if only they can open the vector files correctly). There were multiple problems blocking this for some modern vector geopdf files, all of which should be corrected in the latest release (thanks to all the work by rouault).

@Off-Tracker
Author

Off-Tracker commented May 9, 2024

Hopefully the revision process will get the GeoPDF vector driver to report on map frames, which are central to the GeoPDF spec / best practice.
https://portal.ogc.org/files/?artifact_id=40537
At present these frames seem to be ignored, with the reports going straight to lower layers in the structure.
The TerraGo toolbar (free), installed in Acrobat Reader DC, does a nice job, if it helps to have a comparison.
Capture3

rouault added a commit that referenced this issue May 9, 2024
…rse OGCs in Resources.XObject.Resources.Properties as generated by ArcGIS 12.9 (fixes #9870)