Skip to content

Release v0.1 (third time's the charm...)#12

Merged
conradbzura merged 51 commits into
masterfrom
release
Dec 8, 2025
Merged

Release v0.1 (third time's the charm...)#12
conradbzura merged 51 commits into
masterfrom
release

Conversation

@conradbzura
Copy link
Copy Markdown
Collaborator

No description provided.

conradbzura and others added 30 commits December 8, 2025 11:56
…vior of "stranded" parameter when intervals are on opposite strands
Co-authored-by: Nezar Abdennur <nabdennur@gmail.com>
Co-authored-by: Nezar Abdennur <nabdennur@gmail.com>
Co-authored-by: Nezar Abdennur <nabdennur@gmail.com>
conradbzura and others added 19 commits December 8, 2025 11:56
Co-authored-by: Nezar Abdennur <nabdennur@gmail.com>
Co-authored-by: Nezar Abdennur <nabdennur@gmail.com>
* Initialize project

* Implement binary predicate operators and genomic interval notation parser

* Rename OVERLAPS to INTERSECTS to align with bedtools and eliminate conflict with existing OVERLAPS keyword

* Implement expected schema and dialect parser

* Implement GIQL to SQL transpiler

* Implement GIQL query engine

* Make all imports absolute

* Fix bug preventing GIQL operators from being used alongside standard SQL operators and in CTEs

* Add bedtools intersect-like recipes to readme

* Add documentation

* Implement CLUSTER and MERGE UDFs

* Quote customizable field names to avoid collisions with protected SQL keywords

* Fully implement custom schema support

* Implement "giql intersect" CLI command mirroring "bedtools intersect" API and functionality

* Replace GIQLEngine.query() with new a execute() method that returns a PEP 249-like cursor

* Add transpile() method to GIQLEngine class and update docs

* Update README to include installation instructions

* Fix bug in MERGE and CLUSTER queries that incorporate CTEs; Add demo notebook with various usage examples

* Drafted GIQL's governing project constitution

* Created feature specification for "DISTANCE" operator

* Implement DISTANCE UDF for caluculating the distance(s) between intervals

* Update demo.ipynb with DISTANCE examples

* Update DISTANCE operator to remove "signed" parameter and change behavior of "stranded" parameter when intervals are on opposite strands

* Create feature specification for "NEAREST" operator

* Implement NEAREST operator

* Fix formatting

* Implement integration test suite evaluating GIQL behavior against bedtools (#5)

* Update core principle I

Co-authored-by: Nezar Abdennur <nabdennur@gmail.com>

* Update rules for core principle I

Co-authored-by: Nezar Abdennur <nabdennur@gmail.com>

* Update core principle II

Co-authored-by: Nezar Abdennur <nabdennur@gmail.com>

* Update core principle III

Co-authored-by: Nezar Abdennur <nabdennur@gmail.com>

* Update rationale for core principle III

Co-authored-by: Nezar Abdennur <nabdennur@gmail.com>

* Clean up speckit artifacts we don't want to track

* Refactor documentation

* Changed default genomic column name from "position" to "interval"

* Update readme

* Implement CICD

* Update project authors and maintainers

* Implement build hook for dynamic version resolution from git tag

* Update readme

* Add hypothesis to dev dependencies

* Add pybedtools to dev dependencies

* Update test workflow to use pixi

* Update test workflow to install GIQL in editable mode

* Lower code coverage target for now

---------

Co-authored-by: Nezar Abdennur <nabdennur@gmail.com>
@conradbzura conradbzura merged commit 2848078 into master Dec 8, 2025
4 checks passed
@conradbzura conradbzura deleted the release branch December 8, 2025 19:17
conradbzura added a commit that referenced this pull request May 20, 2026
…logical PBT, harden edge cases

Acts on the deferred items from the 10-agent re-review of PR #93.

Source changes (src/giql/transformer.py):

- Reject 'SELECT a.* AS x' rather than silently dropping the user
  alias. Most engines reject star-with-alias outright; the dialect
  now raises ValueError with a message pointing at the two valid
  alternatives (drop the alias or list columns explicitly). Closes
  the previously-deferred R4 finding.
- Defensive subquery gate at transform_to_sql: if any of GROUP BY /
  HAVING / ORDER BY embeds a subquery, fall back to the binned plan.
  The modifier rewriter is not scope-aware, so this prevents a future
  relaxation from accidentally rewriting columns inside a nested
  scope.
- Document the traversal-order assumption on allocate() and the
  lexical-scope assumption on _rewrite_modifier_refs (R4 follow-ups).

Test tightening (tests/test_duckdb_iejoin.py):

- ADD-R-006: test that 'SELECT a.* AS x' raises ValueError naming
  the unsupported star-with-alias shape.
- R6 #19: relax 'assert type(...) is ValueError' to isinstance, so
  the test doesn't forbid future ValueError subclasses with richer
  error info.
- R6 #20: replace the brittle inner_aliases == {__giql_p0, __giql_p1}
  assertion with a behavior-level dedup check (a."score" must be
  allocated to exactly one inner alias regardless of its index).
- R6 #21: tighten the multi-key ORDER BY DESC-attachment regex to
  pin DESC positionally to the second term.
- R6 #22: anchor the BETWEEN / IS NULL inline assertions to the SET
  VARIABLE half of the SQL (the per-chromosome template) and assert
  the predicate does NOT appear in the outer SELECT wrapper.
- R6 #17: rephrase three test docstrings to avoid naming private
  helpers (_strip_intersects, _can_inline_extras) — describe the
  behavior in user-visible terms instead.
- R7 #9: pivot the near-duplicate cross-side ON test to a genuinely
  new shape — cross-side predicate in WHERE rather than ON,
  exercising the WHERE-extras extraction path rather than the
  JOIN-ON extras path.
- R7 #10: tighten the kitchen-sink test's HAVING from the no-op
  'COUNT(*) > 0' to 'SUM(a.score) > 40' (now actually filters one
  group out), add SUM(a.score) to the SELECT list so multiple
  aggregates coexist, and document the row-by-row truth table in
  the docstring.
- R7 #11: add a Python-native expected list to the coord-system
  parametric test so a regression where both plans share the same
  bug can't silently pass.
- R8 #12: add an executing AVG-equivalence example test
  (test_query_should_match_binned_plan_avg_aggregate_when_dialect_is_duckdb)
  that compares binned vs dialect AVG values with pytest.approx —
  fills the gap the aggregate PBT intentionally leaves.
- R8 #13: fix the 'is_null' PBT tautology. The score strategy now
  draws st.one_of(st.none(), st.integers(0, 50)) so WHERE a.score
  IS NULL / IS NOT NULL are no longer tautologically true/false.
- R8 #14, #15, #16: full-composite PBT now uses min_size=0 (covers
  empty-input + Tier 1 stack), max_examples=60 (was 30 — the
  largest input space deserves more sampling), and limit=
  st.integers(min_value=0) (LIMIT 0 edge case now exercised).
  Skip executemany when the parameter list is empty (DuckDB rejects
  it).

Tests: 110 dialect tests pass (was 108). Full suite: 1184 passed,
1 skipped, 90 xfailed (all xfails pre-existing in unrelated
modules).

Refs #93.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant