Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for creating WKB/WKT attributes #1912

Merged
merged 11 commits into from
May 16, 2024
Merged

Conversation

jp-dark
Copy link
Contributor

@jp-dark jp-dark commented Mar 5, 2024

Features:

  • Add WKB/WKT as possible attribute types

Fixes:

  • Fix getting fill value for BLOB, WKB, and WKT types

@jp-dark jp-dark requested a review from perrygeo March 5, 2024 15:49
tiledb/datatypes.py Outdated Show resolved Hide resolved
Co-authored-by: Matthew Perry <perrygeo@users.noreply.github.com>
@perrygeo
Copy link
Contributor

perrygeo commented Mar 18, 2024

I've been testing locally and this works great. This unblocks a whole bunch of testing I need to do! Thanks @jp-dark

I'm using the code below to create a geometry array compatible with GDAL/OGR metadata conventions. This is probably too application-specific for this repo's test suite but I'll leave it here for reference.

import os

import numpy as np
from shapely import box

import tiledb

path = "/tmp/geom_array"
schema = tiledb.ArraySchema(
    domain=tiledb.Domain(
        # Globe in Longitude, Latitude coordinates, 1 degree square tiles
        tiledb.Dim(name="_X", domain=(-180.0, 180.0), tile=1.0, dtype=np.float64),
        tiledb.Dim(name="_Y", domain=(-90.0, 90.0), tile=1.0, dtype=np.float64),
    ),
    attrs=[
        tiledb.Attr(name="fid", dtype=np.int64),
        # GDAL currently only supports blob
        # tiledb.Attr(name="wkb_geometry", dtype="blob"),
        tiledb.Attr(name="wkb_geometry", dtype="wkb"),
    ],
    allows_duplicates=True,
    cell_order="row-major",
    tile_order="row-major",
    capacity=100_000,
    sparse=True,
)

geoms = [
    box(2, 2, 4, 4),
    box(1, 1, 3, 3),
    box(0, 0, 2, 2),
]
wkbs = [g.wkb for g in geoms]
bboxes = [g.bounds for g in geoms]
fids = np.array([i for i, _ in enumerate(geoms)])
n = len(geoms)

# use the centroid of the bounding box as dimensions
xs = [(b[0] + b[2]) / 2 for b in bboxes]
ys = [(b[1] + b[3]) / 2 for b in bboxes]

# use half the max dimensions for padding
max_width = max(b[2] - b[0] for b in bboxes)
max_height = max(b[3] - b[1] for b in bboxes)
padx = max_width / 2
pady = max_height / 2

tiledb.Array.create(path, schema)

# Write OGR Metadata
with tiledb.open(path, "w") as arr:
    # arr.meta["CRS"] = CRS
    arr.meta["GeometryType"] = "Polygon"
    arr.meta["GEOMETRY_ATTRIBUTE_NAME"] = "wkb_geometry"
    arr.meta["LAYER_EXTENT_MINX"] = -180.0
    arr.meta["LAYER_EXTENT_MINY"] = -90.0
    arr.meta["LAYER_EXTENT_MAXX"] = 180.0
    arr.meta["LAYER_EXTENT_MAXY"] = 90.0
    arr.meta["PAD_X"] = padx
    arr.meta["PAD_Y"] = pady
    arr.meta["FEATURE_COUNT"] = n

# Write attributes themselves
with tiledb.open(path, mode="w") as arr:
    data = {
        "fid": fids,
        "wkb_geometry": np.array(
            wkbs,
            dtype="O",
        ),
    }
    arr[xs, ys] = data

Using the blob type results in a valid geometry array according to ogrinfo. Using the wkb type does not, yet.

$ ogrinfo -al /tmp/geom_array
INFO: Open of `/tmp/geom_array'
      using driver `TileDB' successful.
Layer name: geom_array
Geometry: Polygon
Feature Count: 3
Extent: (-180.000000, -90.000000) - (180.000000, 90.000000)
fid: Integer64 (0.0) NOT NULL
wkb_geometry: String (0.0) NOT NULL
ERROR 1: std::get: wrong index for variant

I'll start looking into the GDAL code to see what it will take to support reading/writing the GEOM_WKB type instead of BLOB - It should be doable in a backwards compatible way.

@jp-dark jp-dark changed the title Add support for creating WKB/WKT attributes [WIP] Add support for creating WKB/WKT attributes Mar 25, 2024
@jp-dark
Copy link
Contributor Author

jp-dark commented Mar 25, 2024

Note: This PR requires an update to how WKT is handled (should support strings -> binary blob) and tests for round tripping data for both WKT and WKB.

tiledb/attribute.py Show resolved Hide resolved
Copy link
Member

@ihnorton ihnorton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This LGTM, but I'd like to get a ✅ from @jp-dark (time-permitting) before we merge. In particular, is there anything we should re-test here, and do you know if we need to build ogrinfo from source to be able to repro that?

@jp-dark
Copy link
Contributor Author

jp-dark commented Apr 25, 2024

In particular, is there anything we should re-test here, and do you know if we need to build ogrinfo from source to be able to repro that?

Those test were all @perrygeo. I haven't been directly involved in any of the GDAL support or testing. Maybe @rouault can answer about the orginfo changes?

@jp-dark jp-dark requested a review from ihnorton April 25, 2024 14:50
@rouault
Copy link

rouault commented Apr 25, 2024

Maybe @rouault can answer about the orginfo changes?

not sure what the question is :-) I've a queued GDAL pull request in OSGeo/gdal#9726 that will use GEOM_WKB when creating TileDB vector datasets with TileDB >= 2.21. This is independent from TileDB-Py support for GEOM_WKB. I guess that you're unit tests that read/write GEOM_WKB attributes should be more than enough to ensure interoperability with datasets created by GDAL. Or just try the reproducer of #1956

It is not planned that the GDAL TileDB driver will handle GEOM_WKT for now, as this is a rather inefficient way of storing geometries.

But I agree with @jp-dark that on the Python side, I would expect a GEOM_WKT to be dealt as a Python string rather than a blob type

@kounelisagis kounelisagis requested a review from rouault May 15, 2024 10:02
Copy link
Contributor Author

@jp-dark jp-dark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@jp-dark jp-dark changed the title [WIP] Add support for creating WKB/WKT attributes Add support for creating WKB/WKT attributes May 15, 2024
@jp-dark jp-dark marked this pull request as ready for review May 15, 2024 17:29
@jp-dark jp-dark requested review from rouault and removed request for perrygeo and rouault May 15, 2024 17:30
@kounelisagis kounelisagis merged commit 3abea77 into dev May 16, 2024
34 checks passed
@kounelisagis kounelisagis deleted the jpd/wkt-wkb-support branch May 16, 2024 21:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants