Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

distinguish between a column in WKB and a column in native 'geometry' format? #299

Closed
cboettig opened this issue Apr 9, 2024 · 1 comment

Comments

@cboettig
Copy link

cboettig commented Apr 9, 2024

Sorry if this is a silly question, but what is the best way to tell if a geometry column in the native geom representation or the WKB blob?

  • I read a given dataset in from geoparquet serialization. It has a geometry column in WKB, but because it is parquet, it read by the standard parquet parser, and typed as a blob, not a wkb_blob. To use spatial functions, I will first need to use st_GeomFromWKB(geometry), and if I want to write this back out as geoparquet (or allow other gdal-based parsers like R's sf package to read it) I will then need to turn it back from geometry using st_AsWKB(). Notably, I can't call st_AsWKB() if it is already in WKB, because I will get function not found error. If I don't operate on the geometry column, I can leave it as it came in and write it back out of course.

  • If I read the same dataset in as flatgeobuf say (or any other vector format supported by gdal, e.g. via ogr2ogr conversion), it will be parsed by duckdb so I will already have a duckdb native geometry, and I operate on it directly with st_ functions, but must use st_AsWKB() before writing it out as parquet or passing it other gdal tools like sf. (even if I don't operate on the geometry column, I must st_asWKB() before I can pass it to something else).

Did I get that right? As such, in writing a workflow or toolchain that is designed to operate on either possible input format, I'm a bit lost as to how best to decide when need to do conversion to a native format and when I already have one. Naively, it feels like the simplest solution would be if the coercions didn't fail if the input column was already in the desired format, e.g. if st_AsWKB() was happy doing a no-op when given something that was already WKB, and vice versa?

Apologies if I'm thinking about this all wrong, advice appreciated!

@Maxxen
Copy link
Member

Maxxen commented Apr 9, 2024

Did I get that right?

Yes. We've been talking about adding something like a generic try() operator that would enable similar behavior to TRY_CAST but for non-cast conversion functions. I guess for now you could also roll your own with something like

SELECT 
  CASE 
    WHEN typeof(geom_or_blob) = 'GEOMETRY' 
    THEN ST_AsWKB(geom_or_blob) 
    ELSE geom_or_blob 
  END AS wkb
FROM t1;

Additionally, ST_Read takes an additional keep_wkb = true parameter that will make it output geometries as WKB_BLOB type, which is just an alias over BLOB that are implicitly castable to BLOB as well as GEOMETRY. You can pass these to COPY (FORMAT GDAL) too.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants