Table.coordinate_system and Table.interval_type are silently ignored by spatial-predicate transpilation #88

@conradbzura

Description

The Table schema configuration class (src/giql/table.py) accepts coordinate_system ("0based" | "1based") and interval_type ("half_open" | "closed") keyword arguments, and the schema-mapping documentation (docs/transpilation/schema-mapping.rst:166-172) advertises them as the way to describe tables that store intervals in a non-default convention (e.g. VCF-style 1-based closed). In practice the transpiler silently ignores these settings for almost every operator, so the generated SQL is off-by-one on start and uses the wrong inequality on end whenever a table's storage convention differs from the default 0-based half-open.

Reproduction:

from giql import transpile
from giql.table import Table

sql = transpile(
    "SELECT * FROM variants WHERE interval INTERSECTS 'chr1:100-200'",
    tables=[Table("variants", coordinate_system="1based", interval_type="closed")],
)
print(sql)

The generated SQL uses start < 200 AND end > 100 (half-open math) even though the table is declared as 1-based closed. A variant at start=200, end=200 (a real overlap in 1-based closed) is therefore excluded by start < 200, and a variant at start=100, end=100 (also a real overlap) is excluded by end > 100, because both comparisons are strict.
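
The boundary behavior can be checked outside the transpiler with a few lines of plain Python. This is only an illustrative sketch: the three helpers are hypothetical and not part of giql, and it assumes the stored row and the query range are both meant as 1-based closed.

```python
def closed_overlap(start, end, q_start, q_end):
    """Overlap test for two closed intervals [start, end] and [q_start, q_end]."""
    return start <= q_end and end >= q_start


def half_open_overlap(start, end, q_start, q_end):
    """Overlap test for two half-open intervals [start, end) and [q_start, q_end)."""
    return start < q_end and end > q_start


def one_based_closed_to_zero_based_half_open(start, end):
    """Hypothetical helper: 1-based closed [start, end] -> 0-based half-open."""
    return start - 1, end


# Assumption: both tuples are 1-based closed (start, end) pairs.
row = (200, 200)    # single-base variant at position 200
query = (100, 200)  # the queried range

# Ground truth in the declared convention: the variant overlaps the range.
print(closed_overlap(*row, *query))  # True

# Strict half-open comparisons applied to the raw stored values miss it.
print(half_open_overlap(*row, *query))  # False

# Converting both sides to 0-based half-open first restores agreement.
converted_row = one_based_closed_to_zero_based_half_open(*row)
converted_query = one_based_closed_to_zero_based_half_open(*query)
print(half_open_overlap(*converted_row, *converted_query))  # True
```

The last call shows the property a fix needs to preserve: once both sides are in the same convention, the half-open comparison gives the same answer as the closed one.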

Affected operators: INTERSECTS, CONTAINS, WITHIN — both range-literal form (_generate_range_predicate) and column-join form (_generate_column_join). DISTANCE and NEAREST honor interval_type but still ignore coordinate_system.

Expected behavior

A Table("variants", coordinate_system="1based", interval_type="closed") configuration should produce SQL that returns the same logical result set as the equivalent 0-based-half-open table for every spatial operator. Either the transpiler honors the declared convention by adjusting literal-side normalization and inequality operators, or — as a stopgap — Table.__post_init__ raises NotImplementedError for any non-default value so that users do not get silently incorrect results.
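
A minimal sketch of the stopgap option, for discussion only. The `Table` class here is a hypothetical stand-in that carries just the fields named in this issue; the real class in src/giql/table.py has more to it, and the error messages are made up.

```python
from dataclasses import dataclass

# The only convention the spatial predicates currently handle correctly.
_DEFAULT_CONVENTION = ("0based", "half_open")


@dataclass
class Table:
    """Hypothetical stand-in for giql.table.Table (fields from this issue only)."""

    name: str
    coordinate_system: str = "0based"
    interval_type: str = "half_open"

    def __post_init__(self):
        # Reject values outside the documented choices.
        if self.coordinate_system not in ("0based", "1based"):
            raise ValueError(f"invalid coordinate_system: {self.coordinate_system!r}")
        if self.interval_type not in ("half_open", "closed"):
            raise ValueError(f"invalid interval_type: {self.interval_type!r}")
        # Stopgap: fail loudly instead of generating silently wrong SQL.
        if (self.coordinate_system, self.interval_type) != _DEFAULT_CONVENTION:
            raise NotImplementedError(
                "non-default coordinate_system/interval_type is not yet honored "
                "by spatial predicates"
            )


Table("variants")  # default convention: accepted

try:
    Table("variants", coordinate_system="1based", interval_type="closed")
except NotImplementedError as exc:
    print(f"rejected: {exc}")
```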

Root cause

  • coordinate_system is completely unused. Defined and validated at src/giql/table.py:79,84, but grep -rn coordinate_system src/ crates/ finds zero references in any code-generation path. The only non-table.py reference is a docstring in src/giql/mcp/server.py:509.
  • interval_type is honored only for DISTANCE and NEAREST gap arithmetic. src/giql/generators/base.py:208 (NEAREST) and :343 (DISTANCE) add + 1 to the gap formula when interval_type == "closed". Tests in tests/generators/test_base.py cover those two paths.
  • interval_type is ignored for all spatial predicates. _generate_range_predicate (src/giql/generators/base.py:487-545) and _generate_column_join (:547-588) emit hard-coded half-open comparison operators (<, >, <=, >=) without ever reading self._current_table.interval_type. The user-supplied range literal is normalized to 0-based half-open at base.py:480 via RangeParser.parse(...).to_zero_based_half_open(), but the table-side columns are then compared against those normalized values regardless of how the table actually stores its intervals.

The relevant code in _generate_range_predicate:

# src/giql/generators/base.py:513-543
if op_type == "intersects":
    return (
        f"({chrom_col} = '{chrom}' "
        f"AND {start_col} < {end} "
        f"AND {end_col} > {start})"
    )
elif op_type == "contains":
    ...
elif op_type == "within":
    ...

No branch in this helper consults self._current_table.coordinate_system or self._current_table.interval_type.
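
One possible shape for a convention-aware INTERSECTS branch is sketched below. It is not the actual fix: `column_offsets` and `intersects_predicate` are hypothetical names, and the sketch assumes the literal bounds have already been normalized to 0-based half-open, as described above. Instead of wrapping the table columns in arithmetic, it shifts the literal bounds so the column side of each comparison stays bare.

```python
def column_offsets(coordinate_system, interval_type):
    """Return (d_start, d_end) such that stored_start + d_start and
    stored_end + d_end describe the interval as 0-based half-open."""
    shift = -1 if coordinate_system == "1based" else 0
    d_start = shift
    # A closed end is inclusive, so the exclusive end is one past it.
    d_end = shift + (1 if interval_type == "closed" else 0)
    return d_start, d_end


def intersects_predicate(
    chrom_col, start_col, end_col, chrom, start, end,
    coordinate_system="0based", interval_type="half_open",
):
    """Build an INTERSECTS predicate; start/end are 0-based half-open bounds."""
    d_start, d_end = column_offsets(coordinate_system, interval_type)
    # (start_col + d_start) < end      <=>  start_col < end - d_start
    # (end_col + d_end)     > start    <=>  end_col > start - d_end
    return (
        f"({chrom_col} = '{chrom}' "
        f"AND {start_col} < {end - d_start} "
        f"AND {end_col} > {start - d_end})"
    )


# Literal bounds 99 and 200 stand for an already-normalized range.
print(intersects_predicate("chrom", "start", "end", "chr1", 99, 200))
print(intersects_predicate(
    "chrom", "start", "end", "chr1", 99, 200,
    coordinate_system="1based", interval_type="closed",
))
```

For the default convention the offsets are both zero, so the output matches the existing half-open predicate. For a 1-based closed table the start comparison becomes `start < 201`, which admits a variant stored at start=200, end=200.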

Acceptance criteria

  • A Table(..., coordinate_system="1based", interval_type="closed") configuration produces SQL that returns the same logical result set as the equivalent 0-based-half-open table for INTERSECTS / CONTAINS / WITHIN, in both range-literal and column-join forms.
  • New tests (in tests/generators/test_base.py or a sibling) assert that the generated SQL changes when coordinate_system / interval_type change, for each spatial operator.
  • Existing DISTANCE/NEAREST tests continue to pass.

Labels: bug
