Table.coordinate_system and Table.interval_type are silently ignored by spatial-predicate transpilation — Closes #88 #90

Merged: conradbzura merged 2 commits into main from 88-honor-table-coordinate-system-and-interval-type on Apr 27, 2026

@conradbzura
Collaborator

Summary

Make INTERSECTS, CONTAINS, and WITHIN honor Table.coordinate_system and Table.interval_type so tables that store intervals in non-default conventions (e.g. VCF-style 1-based closed) produce correct SQL. Previously the table side of every spatial comparison was hard-coded for 0-based half-open columns, so any non-default Table configuration silently produced off-by-one results in both range-literal and column-join forms; SpatialSetPredicate (INTERSECTS ANY/ALL) inherited the same bug because it routes through the same helper.

The fix introduces three private helpers in BaseGIQLGenerator that wrap raw start/end column expressions to yield their canonical 0-based half-open equivalents, then wires those helpers into _generate_range_predicate and _generate_column_join. The literal side already arrives in canonical form via RangeParser.to_zero_based_half_open, so once both sides are canonicalized the existing comparison operators (<, >, <=, >=) are correct without per-operator branching.
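
To make the off-by-one concrete, here is a minimal sketch using plain Python integers rather than generated SQL. The `intersects` function and the sample coordinates are illustrative assumptions, not code from this PR; the sketch relies only on the standard overlap test for half-open intervals.

```python
def intersects(start_a: int, end_a: int, start_b: int, end_b: int) -> bool:
    """Overlap test that is valid only for 0-based half-open intervals."""
    return start_a < end_b and end_a > start_b


# A row stored 1-based closed as [101, 200]; its canonical form is [100, 200).
row_start, row_end = 101, 200

# A query already in canonical form, [50, 101), whose last base is 0-based 100.
query_start, query_end = 50, 101

# Treating the raw columns as if they were canonical misses the shared base.
naive = intersects(row_start, row_end, query_start, query_end)

# Canonicalizing the table side first (start - 1, end unchanged) finds it.
fixed = intersects(row_start - 1, row_end, query_start, query_end)

print(naive, fixed)  # prints: False True
```

The operator pair is the same in both calls; only the table-side operands change, which is why no per-operator branching is needed.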

Conversion table the helpers implement:

| coordinate_system | interval_type | start_canonical | end_canonical |
| --- | --- | --- | --- |
| 0based | half_open | start_col | end_col |
| 0based | closed | start_col | (end_col + 1) |
| 1based | half_open | (start_col - 1) | (end_col - 1) |
| 1based | closed | (start_col - 1) | end_col |
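
As a quick sanity check of the table, the sketch below applies the same arithmetic to integers. The function name `to_canonical` and the sample interval are assumptions made for illustration; the PR's helpers wrap SQL expressions rather than numbers.

```python
def to_canonical(
    start: int, end: int, coordinate_system: str, interval_type: str
) -> tuple[int, int]:
    """Convert stored endpoints to 0-based half-open, following the table above."""
    if coordinate_system not in ("0based", "1based"):
        raise ValueError(f"unknown coordinate_system: {coordinate_system!r}")
    if interval_type not in ("half_open", "closed"):
        raise ValueError(f"unknown interval_type: {interval_type!r}")

    if coordinate_system == "1based":
        start -= 1                      # 1-based start shifts down by one
        if interval_type == "half_open":
            end -= 1                    # exclusive 1-based end shifts too
    elif interval_type == "closed":
        end += 1                        # inclusive 0-based end becomes exclusive
    return start, end


# The same 100 bases (0-based positions 100..199) stored four different ways.
stored = {
    ("0based", "half_open"): (100, 200),
    ("0based", "closed"): (100, 199),
    ("1based", "half_open"): (101, 201),
    ("1based", "closed"): (101, 200),
}

canonical = {key: to_canonical(*value, *key) for key, value in stored.items()}
```

Every entry in `canonical` comes out as `(100, 200)`, which is the property the comparison operators depend on.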

DISTANCE and NEAREST still ignore coordinate_system and conflate interval_type with bedtools-compatibility math; that is tracked separately in #89 and out of scope here.

Closes #88

Proposed changes

src/giql/generators/base.py

Add three helpers near _get_column_refs:

  • _resolve_table(column_ref, table_name=None) — resolve the Table config that backs a column reference, mirroring the alias-resolution logic of _get_column_refs so each side of a column-join can be canonicalized independently.
  • _canonical_start(raw_start, table) — staticmethod; wrap a raw start column expression to yield canonical 0-based half-open start. Returns (start_col - 1) for 1-based tables, the raw expression otherwise.
  • _canonical_end(raw_end, table) — staticmethod; wrap a raw end column expression to yield canonical 0-based half-open end. Returns (end_col + 1) for 0-based-closed, (end_col - 1) for 1-based-half-open, raw expression for 0-based-half-open and 1-based-closed.
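
The two wrapping helpers described above might look roughly like the following. This is a sketch under assumptions, not the merged code: `TableConfig` is a hypothetical stand-in for the project's `Table`, the functions are written as module-level functions rather than staticmethods, and column expressions are treated as plain strings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableConfig:
    """Hypothetical stand-in for Table; only the two relevant fields."""

    coordinate_system: str = "0based"
    interval_type: str = "half_open"


def canonical_start(raw_start: str, table: TableConfig) -> str:
    """Wrap a start expression so it yields a 0-based start."""
    if table.coordinate_system == "1based":
        return f"({raw_start} - 1)"
    return raw_start


def canonical_end(raw_end: str, table: TableConfig) -> str:
    """Wrap an end expression so it yields a 0-based exclusive end."""
    key = (table.coordinate_system, table.interval_type)
    if key == ("0based", "closed"):
        return f"({raw_end} + 1)"
    if key == ("1based", "half_open"):
        return f"({raw_end} - 1)"
    # 0-based half-open and 1-based closed ends are already canonical.
    return raw_end


vcf = TableConfig(coordinate_system="1based", interval_type="closed")
wrapped = (canonical_start("v.start", vcf), canonical_end("v.end", vcf))
```

Here `wrapped` is `("(v.start - 1)", "v.end")`, and a default `TableConfig()` leaves both expressions untouched.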

Modify two existing methods:

  • _generate_range_predicate — call _resolve_table(column_ref, self._current_table), pass the result through _canonical_start / _canonical_end, then run the unchanged comparison-operator branches.
  • _generate_column_join — same pattern, but resolve a Table for each side independently to support cross-convention joins.
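
Resolving a Table for each side independently can be pictured with the sketch below. Everything in it is hypothetical: the `TABLES` and `ALIASES` registries, the dictionary-based configs, and the `alias.column` reference format are assumptions for illustration, and the real `_resolve_table` mirrors whatever alias-resolution logic `_get_column_refs` already uses.

```python
from typing import Optional

# Hypothetical registries; the real generator tracks tables and aliases itself.
TABLES = {
    "peaks": {"coordinate_system": "0based", "interval_type": "half_open"},
    "variants": {"coordinate_system": "1based", "interval_type": "closed"},
}
ALIASES = {"p": "peaks", "v": "variants"}


def resolve_table(column_ref: str, table_name: Optional[str] = None) -> Optional[dict]:
    """Return the config behind a column reference such as "v.interval"."""
    if "." in column_ref:
        # A qualified reference wins over the fallback table name.
        qualifier = column_ref.split(".", 1)[0]
        table_name = ALIASES.get(qualifier, qualifier)
    if table_name is None:
        return None
    return TABLES.get(table_name)


# Each side of a column-join resolves to its own convention.
left = resolve_table("p.interval")
right = resolve_table("v.interval")

# An unqualified reference falls back to the supplied table name.
unqualified = resolve_table("interval", "variants")
```

With the two configs in hand, each side's start and end expressions can be wrapped according to its own convention before the shared comparison operators are emitted.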

tests/generators/test_base.py

Add four fixtures and eleven tests asserting the canonicalization behavior for every (coordinate_system, interval_type) combination across all three predicates and both forms (range-literal and column-join), plus the SpatialSetPredicate flow-through. Tests follow the BDD naming pattern test_<method>_should_<outcome>_when_<condition> and the AAA structure with Given/When/Then docstrings.
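
For readers unfamiliar with the naming and docstring conventions mentioned above, a test in that style could look like this. The `wrap_start` function is a hypothetical stand-in so the example runs on its own; the real tests exercise the generator and assert on the emitted SQL.

```python
def wrap_start(raw_start: str, coordinate_system: str) -> str:
    """Hypothetical stand-in for the start helper under test."""
    return f"({raw_start} - 1)" if coordinate_system == "1based" else raw_start


def test_wrap_start_should_subtract_one_when_table_is_one_based():
    """
    Given a start column from a 1-based table
    When the start expression is canonicalized
    Then the expression is wrapped as (start - 1)
    """
    # Arrange
    raw_start = "start"

    # Act
    result = wrap_start(raw_start, "1based")

    # Assert
    assert result == "(start - 1)"
```

Saved in a file whose name starts with `test_`, this function is collected by pytest without further setup.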

Test cases

| # | Test Suite | Given | When | Then | Coverage Target |
| --- | --- | --- | --- | --- | --- |
| 1 | TestBaseGIQLGenerator | A 1-based-closed table | INTERSECTS is called with a literal range | The table-side start is wrapped as (start - 1) and end stays raw | INTERSECTS literal under 1-based-closed |
| 2 | TestBaseGIQLGenerator | A 1-based-closed table | CONTAINS is called with a literal range | The table-side start is wrapped as (start - 1) and end stays raw | CONTAINS literal under 1-based-closed |
| 3 | TestBaseGIQLGenerator | A 1-based-closed table | WITHIN is called with a literal range | The table-side start is wrapped as (start - 1) and end stays raw | WITHIN literal under 1-based-closed |
| 4 | TestBaseGIQLGenerator | A 0-based-half-open table joined against a 1-based-closed table | INTERSECTS is called between columns from each table | Each side is canonicalized independently: the 1-based-closed side gets (start - 1), the default side stays raw | INTERSECTS column-join under mixed conventions |
| 5 | TestBaseGIQLGenerator | A table declared with explicit defaults coordinate_system="0based" and interval_type="half_open" | INTERSECTS is called with a literal range | Raw start/end columns are emitted with no arithmetic wrappers | Regression guard for default Table |
| 6 | TestBaseGIQLGenerator | A 0-based-closed table | INTERSECTS is called with a literal range | The table-side end is wrapped as (end + 1) and start stays raw | INTERSECTS literal under 0-based-closed |
| 7 | TestBaseGIQLGenerator | A 1-based-half-open table | INTERSECTS is called with a literal range | Both table-side endpoints are wrapped by subtracting 1 | INTERSECTS literal under 1-based-half-open |
| 8 | TestBaseGIQLGenerator | A 0-based-half-open table joined against a 1-based-closed table | CONTAINS is called between columns from each table | Each side is canonicalized independently while comparison operators stay <= / >= | CONTAINS column-join under mixed conventions |
| 9 | TestBaseGIQLGenerator | A 0-based-half-open table joined against a 1-based-closed table | WITHIN is called between columns from each table | Each side is canonicalized independently while comparison operators stay >= / <= | WITHIN column-join under mixed conventions |
| 10 | TestBaseGIQLGenerator | A 1-based-closed table | CONTAINS is called with a point query chr1:1500 | The point-query branch fires and the table-side start is wrapped as (start - 1) | CONTAINS point-query under 1-based-closed |
| 11 | TestBaseGIQLGenerator | A 1-based-closed table | INTERSECTS ANY is called with multiple literal ranges | Each OR-disjunct is canonicalized independently with (start - 1) on the table side | SpatialSetPredicate flow-through under 1-based-closed |

@conradbzura conradbzura self-assigned this Apr 25, 2026
@conradbzura conradbzura force-pushed the 88-honor-table-coordinate-system-and-interval-type branch from dde13b6 to 444919e Compare April 26, 2026 01:40
@conradbzura conradbzura marked this pull request as ready for review April 27, 2026 14:10
…cates

INTERSECTS, CONTAINS, and WITHIN ignored the table-storage convention
declared via Table(coordinate_system=..., interval_type=...) and emitted
SQL hard-coded for 0-based half-open columns. Tables backed by other
conventions (e.g. VCF-style 1-based closed) silently received off-by-one
results in both range-literal and column-join forms; SpatialSetPredicate
inherited the same bug because it routes through the same helper.

Introduce two staticmethod helpers, _canonical_start and _canonical_end,
that wrap a raw start/end column expression to yield its canonical
0-based half-open value, plus _resolve_table to look up the Table backing
a column reference. Wire the helpers into _generate_range_predicate and
_generate_column_join so the table side is canonicalized before the
existing comparison operators run. The literal side already arrives in
canonical form via RangeParser.to_zero_based_half_open, so no per-operator
branching is needed.

DISTANCE and NEAREST still ignore coordinate_system and conflate
interval_type with bedtools-compatibility math; that is tracked as a
separate follow-up and out of scope here.
…val combinations

Add eleven tests asserting the new canonicalization behavior for
INTERSECTS, CONTAINS, WITHIN, and SpatialSetPredicate:

- All three predicates against a 1-based-closed table in literal form
- Column-join form for INTERSECTS, CONTAINS, and WITHIN with one
  default-convention table joined against a 1-based-closed table
- INTERSECTS literal against the rare 0-based-closed and
  1-based-half-open conventions to cover the remaining two cells of
  the (coordinate_system, interval_type) matrix
- INTERSECTS literal against an explicitly-default Table to guard
  against accidental wrapping when defaults are passed in
- CONTAINS point-query branch against a 1-based-closed table
- INTERSECTS ANY against a 1-based-closed table to verify the
  fix flows through SpatialSetPredicate transitively

Tests follow the BDD naming pattern test_<method>_should_<outcome>_when_<condition>
and the AAA structure with Given/When/Then docstrings.
@conradbzura conradbzura force-pushed the 88-honor-table-coordinate-system-and-interval-type branch from 444919e to 1179636 Compare April 27, 2026 16:44
@conradbzura conradbzura merged commit 20e54fe into main Apr 27, 2026
3 checks passed