Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Error in gdal.VectorTranslate for explodecollections=True + some weird python logging behaviour #8523

Closed
theroggy opened this issue Oct 6, 2023 · 10 comments · Fixed by #8539
Assignees

Comments

@theroggy
Copy link
Contributor

theroggy commented Oct 6, 2023

Expected behavior and actual behavior.

This issue raises several things that could be improved:

  1. When gdal.VectorTranslate is used on a gpkg file containing multipolygons with explodeCollections=True, the default behaviour is to preserve the fid's. However, when explodecollections is being applied, normally is will be impossible to preserve fid's because duplicate fid's will appear. So, an error is thrown.
  2. The Exception thrown doesn't contain the error needed to understand what is happening.
  3. There is some strange interference happening between using gdal.UseExceptions() and enabling debug logging (e.g. by setting os.environ["CPL_DEBUG"] = "ON"). More details below.

Steps to reproduce the problem.

Code snippet

import os
from osgeo import gdal

# There is some weird behaviour here:
#   - If only gdal.UseExceptions() is used, an exception is thrown, but not an
#     interesting one:
#   - If gdal.UseExceptions() + os.environ["CPL_DEBUG"] = "ON", the debug logging stops
#     being written too soon and no exception is thrown.
#   - If only os.environ["CPL_DEBUG"] = "ON", the debug logging contains all necessary
#     information, but obviously no exception is thrown.

gdal.UseExceptions()
# os.environ["CPL_DEBUG"] = "ON"

src = "https://github.com/geofileops/geofileops/raw/main/tests/data/polygon-parcel.gpkg"
dst = "/vsimem/output.gpkg"
dst = "C:/Temp/output.gpkg"
options = gdal.VectorTranslateOptions(explodeCollections=True)
ds_output = gdal.VectorTranslate(srcDS=src, destNameOrDestDS=dst, options=options)

When gdal.UseExceptions() is called and "CPL_DEBUG" is not set

Like this, an exception is thrown, but it doesn't contain the error needed to understand the problem.

The error thrown is RuntimeError: sqlite3_exec(CREATE TRIGGER "trigger_delete_feature_count_parcels" AFTER DELETE ON "parcels" BEGIN UPDATE gpkg_ogr_contents SET feature_count = feature_count - 1 WHERE lower(table_name) = lower('parcels'); END;) failed: no such table: main.parcels

The useful error to be thrown would be ERROR 1: failed to execute insert : UNIQUE constraint failed: parcels.fid

Traceback (most recent call last):
  File "C:\Users\Gebruiker\Documents\GitHub\pysnippets\pysnippets\bug_reproducing\gdal_error_explode.py", line 7, in <module>
    ds_output = gdal.VectorTranslate(srcDS=src, destNameOrDestDS=dst, options=options)
  File "C:\Tools\mambaforge\envs\pysnippets\lib\site-packages\osgeo\gdal.py", line 1381, in VectorTranslate
    return wrapper_GDALVectorTranslateDestName(destNameOrDestDS, srcDS, opts, callback, callback_data)
  File "C:\Tools\mambaforge\envs\pysnippets\lib\site-packages\osgeo\gdal.py", line 5438, in wrapper_GDALVectorTranslateDestName
    return _gdal.wrapper_GDALVectorTranslateDestName(*args)
RuntimeError: sqlite3_exec(CREATE TRIGGER "trigger_delete_feature_count_parcels" AFTER DELETE ON "parcels" BEGIN UPDATE gpkg_ogr_contents SET feature_count = feature_count - 1 WHERE lower(table_name) = lower('parcels'); END;) failed: no such table: main.parcels

When gdal.UseExceptions() is NOT called but "CPL_DEBUG" is set to "ON"

Like this, the logging is OK, but no exception thrown (logically).

...
ERROR 1: failed to execute insert : UNIQUE constraint failed: parcels.fid
ERROR 1: Unable to write feature 9 from layer parcels.
ERROR 1: Terminating translation prematurely after failed
translation of layer parcels (use -skipfailures to skip errors)
GPKG: 9 rows inserted into rtree_parcels_geom
GPKG: Creating insert/delete feature_count triggers
ERROR 1: sqlite3_exec(CREATE TRIGGER "trigger_insert_feature_count_parcels" AFTER INSERT ON "parcels" BEGIN UPDATE gpkg_ogr_contents SET feature_count = feature_count + 1 WHERE lower(table_name) = lower('parcels'); END;) failed: no such table: main.parcels
ERROR 1: sqlite3_exec(CREATE TRIGGER "trigger_delete_feature_count_parcels" AFTER DELETE ON "parcels" BEGIN UPDATE gpkg_ogr_contents SET feature_count = feature_count - 1 WHERE lower(table_name) = lower('parcels'); END;) failed: no such table: main.parcels
...

When gdal.UseExceptions() is called AND "CPL_DEBUG" is set to "ON"

With this combo, the logging stops being written before the interesting stuff happend and no exception is thrown.

...
GDALVectorTranslate: Using FID=fid and -preserve_fid
GPKG: ResetStatement(SELECT m."fid", m."geom", m."OIDN", m."UIDN", m."index", m."HFDTLT", m."LBLHFDTLT", m."GEWASGROEP", m."PM", m."LBLPM", m."LENGTE", m."OPPERVL", m."DATUM" FROM "parcels" m)
Warning 1: Non-conformant content for record 1 in column DATUM, 2020-05-01T00:00:00Z, successfully parsed
GPKG: 9 rows inserted into rtree_parcels_geom
GPKG: Creating insert/delete feature_count triggers
GDAL: GDALClose(C:/Temp/output.gpkg, this=000001EFB7BAE050)
GDAL: GDALClose(https://github.com/geofileops/geofileops/raw/main/tests/data/polygon-parcel.gpkg, this=000001EFB75DB2D0)
GDAL: In GDALDestroy - unloading GDAL shared library.

Operating system

Windows 10

GDAL version and provenance

Version 3.7.2 installed via conda-forge.

@theroggy theroggy changed the title Error in gdal.VectorTranslate for explodecollections=True + some weird behaviour Error in gdal.VectorTranslate for explodecollections=True + some weird python logging behaviour Oct 9, 2023
@elpaso elpaso self-assigned this Oct 11, 2023
@elpaso
Copy link
Collaborator

elpaso commented Oct 11, 2023

I can reproduce the issue(s), I have a question about the expected behavior though: the insert is failing because of the unique constraint violation and this is expected but an (unexpected?) consequence is that the output table is not created either, which leads to the second error when the dataset cache is flushed and the driver attempts to create the triggers on the non-existent table.

The interaction between CPL_DEBUG and the exceptions is IMO a separate issue.

@rouault
Copy link
Member

rouault commented Oct 11, 2023

I have a question about the expected behavior though

we should disable the fid preservation when explodecollections is specified (ie opt automatically for unsetFid mode)

@elpaso
Copy link
Collaborator

elpaso commented Oct 11, 2023

I have a question about the expected behavior though

we should disable the fid preservation when explodecollections is specified (ie opt automatically for unsetFid mode)

Agreed, should we issue a warning in this case? Or just a debug msg?

@rouault
Copy link
Member

rouault commented Oct 11, 2023

We should probably error out if both -preserve_fid and -explode_collections are specified. And if -explode_collections is specified, but neither -preserved_fid nor -unsetFid, a CPLDebug() will be sufficient

@jratike80
Copy link
Collaborator

jratike80 commented Oct 11, 2023

From user perspective there should be one more option -keep_fid_as_copy. I mean that if -explodecollections is forcing -unsetFid then the original fid column is dropped from the output and user can't connect the new features with the source ones. A workaround to use SQL select fid as native_fid, *... is tricky.

When it comes to SQL I believe that it is impossible to predict if the query will make duplicate fids. User should also in that case get an understandable error message. However, I cannot discover any good example about such query.

@rouault
Copy link
Member

rouault commented Oct 11, 2023

From user perspective there should be one more option -keep_fid_as_copy

I'd argue we have enough options already related to FID and ogr2ogr code is already super tricky, so I'm -0 on adding a new one for that case

I mean that if -explodecollections is forcing -unsetFid then the original fid column is dropped from the output and user can't connect the new features with the source ones. A workaround to use SQL select fid as native_fid, *... is tricky.

I would be perfectly fine with the workaround. explodeCollections + preserving fid is very advanced usage. I'd suggest to just add a note in the doc of -explodeCollections in ogr2ogr doc page to use the above type of SQL if FID preservation in some form is desired.

@jratike80
Copy link
Collaborator

After a second thought I agree. Note in the documentation is enough. By a quick search from gis.exchange I found one question that clearly deals with fids which were duplicated and user did not understand why.

@theroggy
Copy link
Contributor Author

theroggy commented Oct 11, 2023

Sounds all great! Thanks for looking into this.

The interaction between CPL_DEBUG and the exceptions is IMO a separate issue.

I think there are indeed 3 +- seperate problems, but because it is a single case to reproduce them all, I opted to put them together in one issue.

@theroggy
Copy link
Contributor Author

theroggy commented Oct 12, 2023

Oops, now the testcase doesn't work anymore for the other 2 issues?

Especially the 3rd is quite annoying and is probably a general issue also in other situations, but the second would have been great to understand why this was happening as well, because probably also applicable in other situations.

@elpaso
Copy link
Collaborator

elpaso commented Oct 12, 2023

@theroggy would you please open a new ticket for the other issues (you can link to this one for the details)?

Having "atomic" tickets simplifies developers work.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants