Skip to content

allixender/igeo7_duckdb

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

4 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

DuckDB IGEO7/Z7 Extension

A DuckDB loadable extension providing native scalar functions for the IGEO7 (Z7) hexagonal discrete global grid system. Wraps the C++ Z7 library for high-performance bit-level index manipulation and neighbour traversal directly inside DuckDB queries.

Background (wrenoud/Z7/ and IGEO7)

This is a work-in-progress to explore the possibility of neighbor traversal in the Z7 domain. Z7 is an indexing system for aperature 7 hexagonal discrete global grid systems, as coined in Kmoch et al. (2025). It is based on the Generalized Balanced Ternary (GBT) numeral system described in Lucas, Gibson (1982), van Roessel (1988), and Wikipedia (2025).

Rotation Pattern

Lucas and Gibson (1982) gave examples of GBT with exclusively counter-clockwise (CCW) rotation. The disadvantage of this approach is that the orientation of the hexagons at the different hierarchy levels quickly diverge from each other. White et al. (1992) proposed an alternating rotation direction (CW, CCW, CW, ...) to maintain an alignment of hexagon orientations across hierarchy levels. Kmoch et al. (2025) adopted this alternating rotation pattern for Z7 assigning CW to odd resolutions and CCW to even resolutions.

  1. Kmoch, A., Sahr, K., Chan, W. T., & Uuemaa, E. (2025). IGEO7: A new hierarchically indexed hexagonal equal-area discrete global grid system. AGILE: GIScience Series, 6(32). https://doi.org/10.5194/agile-giss-6-32-2025
  2. Lucas, D., & Gibson, L. (1982). Automated Analysis of imagery. Air Force Office of Scientific Research. https://apps.dtic.mil/sti/tr/pdf/ADA125710.pdf. (No. AFOSRTR830055).
  3. Sahr, K. (2011). Hexagonal discrete global grid systems for geospatial computing. Archives of Photogrammetry Cartography and Remote Sensing, 22, 363–376.
  4. van Roessel, J. W. (1988). Conversion of Cartesian coordinates from and to Generalized Balanced Ternary addresses. Photogrammetric Engineering & Remote Sensing, 54(11), 1565–1570.
  5. White, D., Kimerling, J. A., & Overton, S. W. (1992). Cartographic and Geometric Components of a Global Sampling Design for Environmental Monitoring. Cartography and Geographic Information Systems, 19(1), 5-22. https://doi.org/10.1559/152304092783786636
  6. Wikipedia Contributors. (2025, December 16). Generalized balanced ternary. Wikipedia; Wikimedia Foundation.

Index format

Z7 packs every cell into a single UBIGINT:

bits [63:60]  base cell  (4 bits, values 0–11)
bits [59:57]  digit 1    (3 bits, values 0–6; 7 = padding / unused slot)
bits [56:54]  digit 2
...
bits  [2:0]   digit 20

The compact string form (igeo7_to_string) is a 2-digit zero-padded base cell followed by the digit characters, stopping before the first 7: 0800432 → base 08, digits 0,0,4,3,2, resolution 5.

The verbose debug form (igeo7_decode_str, SQL macro) always shows all 20 digit slots separated by dots: 08-0.0.4.3.2.7.7.7.7.7.7.7.7.7.7.7.7.7.7.7.


C++ extension functions

Function Signature Description
igeo7_from_string (VARCHAR) → UBIGINT Parse Z7 string to packed index
igeo7_to_string (UBIGINT) → VARCHAR Compact Z7 string (stops before first 7)
igeo7_encode (base, d1…d20) → UBIGINT Pack base cell + 20 digit slots; accepts UTINYINT or INTEGER literals
igeo7_get_resolution (UBIGINT) → INTEGER Resolution (0–20) of an index
igeo7_get_base_cell (UBIGINT) → UTINYINT Base cell ID (0–11)
igeo7_get_digit (UBIGINT, INTEGER) → UTINYINT Extract the i-th digit (1–20)
igeo7_parent (UBIGINT) → UBIGINT Parent index (one level up)
igeo7_parent_at (UBIGINT, INTEGER) → UBIGINT Ancestor at specific resolution
igeo7_get_neighbours (UBIGINT) → UBIGINT[] All 6 neighbours (invalid neighbours = UINT64_MAX)
igeo7_get_neighbour (UBIGINT, INTEGER) → UBIGINT Single neighbour by direction (1–6)
igeo7_first_non_zero (UBIGINT) → INTEGER Position of first non-zero digit slot
igeo7_is_valid (UBIGINT) → BOOLEAN FALSE when index equals the invalid sentinel (UINT64_MAX)

SQL companion macros (sql/igeo7_macros.sql)

Source this file after loading the extension. These macros add convenience operations that compose or extend the C++ functions above.

Macro Signature Description
igeo7_decode_str (UBIGINT) → VARCHAR Verbose base-d1.d2…d20 form showing all 20 digit slots
igeo7_encode_at_resolution (base, res, d1…d20) → UBIGINT Encode then truncate to resolution (igeo7_encode + igeo7_parent_at)
igeo7_string_parent (VARCHAR) → VARCHAR Drop the last character of a compact Z7 string
igeo7_string_local_pos (VARCHAR) → VARCHAR Last character of a compact Z7 string (local position digit)
igeo7_string_is_center (VARCHAR) → BOOLEAN True when the last digit is '0' (center of parent)

Prerequisites

  • CMake ≥ 3.14
  • C++17 compiler (GCC ≥ 9, Clang ≥ 10, MSVC ≥ 19.20)
  • DuckDB v1.5.1 source tree (headers + built library + platform file)

Building

Step 1: Get DuckDB

Clone and build DuckDB at the matching tag. The build produces the headers, shared library, and duckdb_platform_out file needed for the extension metadata footer.

git clone --branch v1.5.1 https://github.com/duckdb/duckdb.git
cd duckdb
BUILD_EXTENSIONS="" make release
cd ..

Step 2: Build the extension

Always pass an absolute path to -DDUCKDB_DIR. A relative path is resolved against the CMake source directory (not the build directory) and can silently pick up the wrong DuckDB installation.

If DuckDB was cloned inside duckdb_igeo7/ (the layout in this repo):

cd duckdb_igeo7
mkdir build && cd build

cmake .. -DDUCKDB_DIR=$(pwd)/../duckdb \
         -DCMAKE_BUILD_TYPE=Release

make -j$(nproc)      # Linux / macOS: nproc or sysctl -n hw.logicalcpu

For a DuckDB checkout elsewhere, use its absolute path directly:

cmake .. -DDUCKDB_DIR=/absolute/path/to/duckdb \
         -DCMAKE_BUILD_TYPE=Release

This produces build/igeo7.duckdb_extension with the required binary metadata footer automatically appended post-build.

Build the standalone Z7 test (no DuckDB required)

cmake .. -DBUILD_TESTS=ON -DDUCKDB_DIR=$(pwd)/../duckdb
make igeo7_test
./igeo7_test

Loading

DuckDB v1.5.1 only loads files ending in .duckdb_extension. Because this extension is not signed by the DuckDB team, pass -unsigned when starting the CLI (or run SET allow_unsigned_extensions = true; inside an existing session).

# One-liner from the duckdb_igeo7/ directory
duckdb/build/release/duckdb -unsigned \
  -c "LOAD 'build/igeo7.duckdb_extension'; SELECT igeo7_to_string(igeo7_from_string('0800432'));"

Inside a session or SQL script:

LOAD '/absolute/path/to/build/igeo7.duckdb_extension';

Loading the SQL companion macros

Pipe the LOAD statement and the macro file together:

cat <(echo "LOAD 'build/igeo7.duckdb_extension';") sql/igeo7_macros.sql \
  | duckdb/build/release/duckdb -unsigned

Or in the interactive shell, source them with .read after loading the extension:

.read sql/igeo7_macros.sql

Usage examples

String ↔ integer round-trip

SELECT igeo7_to_string(igeo7_from_string('0800432'));
-- '0800432'

SELECT igeo7_get_resolution(igeo7_from_string('0800432'));
-- 5

SELECT igeo7_get_base_cell(igeo7_from_string('0800432'));
-- 8

Inspect a cell by its packed integer ID

igeo7_to_string produces the compact form; igeo7_decode_str (SQL macro) produces the verbose dot-separated form showing all 20 slots.

-- compact form
SELECT
    32023330408103935::UBIGINT    AS cell_id,
    igeo7_get_base_cell(cell_id)  AS base,
    igeo7_get_resolution(cell_id) AS res,
    igeo7_to_string(cell_id)      AS compact;

-- ┌───────────────────┬──────┬─────┬─────────────────┐
-- │      cell_id      │ base │ res │     compact     │
-- │      uint64       │ uint8│ int │     varchar     │
-- ├───────────────────┼──────┼─────┼─────────────────┤
-- │ 32023330408103935 │    0 │  13 │ 000161612062413 │
-- └───────────────────┴──────┴─────┴─────────────────┘

-- verbose form (requires sql/igeo7_macros.sql to be sourced first)
SELECT igeo7_decode_str(32023330408103935::UBIGINT);
-- '0-0.1.6.1.6.1.2.0.6.2.4.1.3.7.7.7.7.7.7.7'

Encode a cell from its components

igeo7_encode is the inverse of igeo7_get_base_cell + igeo7_get_digit. Integer literals are accepted without explicit casts. Use 7 for every padding slot beyond the target resolution.

-- Reconstruct '0800432': base=8, resolution=5, digits 0,0,4,3,2
SELECT igeo7_to_string(
    igeo7_encode(8, 0,0,4,3,2, 7,7,7,7,7, 7,7,7,7,7, 7,7,7,7,7)
);
-- '0800432'

-- Must exactly equal igeo7_from_string for the same cell
SELECT igeo7_encode(8, 0,0,4,3,2, 7,7,7,7,7, 7,7,7,7,7, 7,7,7,7,7)
     = igeo7_from_string('0800432') AS equal;
-- true

-- Known cell round-trip: base=0, resolution=13
SELECT (igeo7_encode(0, 0,1,6,1,6,1,2,0,6,2,4,1,3, 7,7,7,7,7,7,7)
        = 32023330408103935::UBIGINT)::BOOLEAN AS same;
-- true

Extract individual digits

SELECT
    igeo7_get_digit(igeo7_from_string('0800432'), 1) AS d1,  -- 0
    igeo7_get_digit(igeo7_from_string('0800432'), 2) AS d2,  -- 0
    igeo7_get_digit(igeo7_from_string('0800432'), 3) AS d3,  -- 4
    igeo7_get_digit(igeo7_from_string('0800432'), 4) AS d4,  -- 3
    igeo7_get_digit(igeo7_from_string('0800432'), 5) AS d5;  -- 2

Parent / ancestor traversal

-- One level up
SELECT igeo7_to_string(igeo7_parent(igeo7_from_string('0800432')));
-- '080043'

-- Ancestor at resolution 3 (keeps digits 1–3, fills the rest with 7)
SELECT igeo7_to_string(igeo7_parent_at(igeo7_from_string('0800432'), 3));
-- '08004'

Neighbours

-- All 6 neighbours as compact strings
SELECT igeo7_to_string(unnest(igeo7_get_neighbours(igeo7_from_string('0800432'))));
-- 0800433, 0800651, 0800064, 0800436, 0800430, 0800655

-- Single neighbour by direction (1–6); invalid neighbours = UINT64_MAX
SELECT igeo7_to_string(igeo7_get_neighbour(igeo7_from_string('0800432'), 1));
-- '0800433'

-- Pentagon check: one direction is always invalid at resolution 2
SELECT
    generate_series                                                            AS dir,
    igeo7_to_string(igeo7_get_neighbour(igeo7_from_string('0800'), generate_series::INTEGER)) AS neighbour,
    igeo7_is_valid(igeo7_get_neighbour(igeo7_from_string('0800'), generate_series::INTEGER))  AS valid
FROM generate_series(1, 6);

Resolution hierarchy walk (recursive CTE)

WITH RECURSIVE hierarchy AS (
    SELECT igeo7_from_string('0800432') AS cell, 0 AS step
    UNION ALL
    SELECT igeo7_parent(cell), step + 1
    FROM   hierarchy
    WHERE  igeo7_get_resolution(cell) > 0
)
SELECT step, igeo7_to_string(cell) AS readable
FROM hierarchy;
-- step 0 → '0800432'
-- step 1 → '080043'
-- step 2 → '08004'
-- step 3 → '0800'   (resolution 2)
-- ...
-- step 5 → '08'     (resolution 0)

Validate before processing

SELECT cell_id, igeo7_to_string(cell_id) AS readable
FROM   my_table
WHERE  igeo7_is_valid(cell_id);

Parquet: decode in one pass

LOAD '/absolute/path/to/build/igeo7.duckdb_extension';

SELECT
    cell_id,
    igeo7_get_base_cell(cell_id)  AS base,
    igeo7_get_resolution(cell_id) AS res,
    igeo7_to_string(cell_id)      AS readable
FROM read_parquet('data/*.parquet')
WHERE igeo7_is_valid(cell_id)
LIMIT 5;

Parquet: partition by parent resolution

COPY (
    SELECT *, igeo7_parent_at(cell_id, 3) AS parent_3
    FROM   read_parquet('data/*.parquet')
    WHERE  igeo7_is_valid(cell_id)
)
TO 'output/' (FORMAT PARQUET, PARTITION_BY (parent_3));

Stencil / kernel query (neighbourhood average)

WITH cells AS (
    SELECT cell_id, value
    FROM   read_parquet('data/*.parquet')
    WHERE  igeo7_is_valid(cell_id)
),
expanded AS (
    SELECT cell_id, value,
           unnest(igeo7_get_neighbours(cell_id)) AS neighbour_id
    FROM   cells
)
SELECT
    e.cell_id,
    igeo7_to_string(e.cell_id) AS center_readable,
    e.value                    AS center_value,
    AVG(c2.value)              AS neighbour_avg
FROM expanded e
JOIN cells c2
  ON  c2.cell_id = e.neighbour_id
 AND  igeo7_is_valid(e.neighbour_id)
GROUP BY e.cell_id, e.value;

Project structure

duckdb_igeo7/
├── CMakeLists.txt                  # Build system; appends metadata footer post-build
├── README.md                       # This file
├── src/
│   ├── igeo7_extension.cpp         # Extension entry point and all C++ functions
│   └── z7/                         # Z7 C++ core library (upstream, unmodified)
│       ├── library.h
│       ├── library.cpp
│       ├── BitFieldProxy.h
│       └── util.h
├── sql/
│   └── igeo7_macros.sql            # Companion SQL macros (source after LOAD)
└── test/
    ├── test_z7.cpp                 # Standalone C++ smoke test (no DuckDB needed)
    └── test_extension.sql          # SQL smoke tests for all extension functions

Troubleshooting

Unsigned extension rejected on LOAD DuckDB v1.5.1 rejects extensions without a cryptographic signature by default. Start the CLI with -unsigned or run SET allow_unsigned_extensions = true; before loading.

ABI / entrypoint mismatch The extension must be built and run against the same DuckDB version. This extension targets v1.5.1 and uses the ExtensionLoader API and extern "C" DUCKDB_CPP_EXTENSION_ENTRY entry point introduced when ExtensionUtil was removed in PR #17772.

"Metadata invalid" on LOAD or INSTALL DuckDB v1.5.1 validates a binary metadata footer at the end of every .duckdb_extension file. This footer is appended automatically by the CMake post-build step when -DDUCKDB_DIR=... is provided. If DUCKDB_DIR is omitted the footer is skipped and loading will fail with a metadata error.

CMake finds the wrong DuckDB (e.g. a Homebrew install) Always pass an absolute path to -DDUCKDB_DIR. A relative path like ../duckdb is resolved from the CMake source directory, not the build directory, and can silently point elsewhere. Use $(pwd)/../duckdb to force shell expansion before CMake sees it.

Hex literal errors in SQL DuckDB does not support 0x hex literals. Use decimal equivalents (e.g., 15 instead of 0x0F).


Performance notes

Bit-manipulation functions (igeo7_get_resolution, igeo7_get_digit, igeo7_parent_at, igeo7_encode, etc.) are essentially free — a handful of shifts and masks per row, fully vectorised inside DuckDB's execution engine.

igeo7_get_neighbours performs GBT table lookups and carry propagation for each of 6 directions across up to 20 digit slots, all in native C++. Expect tens of millions of neighbour computations per second on a single core.

SQL macros (igeo7_decode_str, string helpers) expand to DuckDB SQL expressions at plan time. igeo7_decode_str in particular emits 20 shift/mask operations and a multi-argument CONCAT per row. For bulk inspection of millions of cells, prefer igeo7_to_string (C++) and reserve igeo7_decode_str for small result sets or debugging.


License

The Z7 core library files retain their original SPDX headers (see individual source files). The DuckDB extension wrapper (src/igeo7_extension.cpp, CMakeLists.txt, sql/) is currently AGPL licensed (due to DGGRID reference, could be revisited for later versions)

About

An experimental IGEO7 / Z7 extension for Duckdb

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors