Sync release with master#22
Closed
conradbzura wants to merge 2 commits into
Closed
Conversation
* Implement genomic interval SQL operators * Initialize project * Implement binary predicate operators and genomic interval notation parser * Rename OVERLAPS to INTERSECTS to align with bedtools and eliminate conflict with existing OVERLAPS keyword * Implement expected schema and dialect parser * Implement GIQL to SQL transpiler * Implement GIQL query engine * Make all imports absolute * Fix bug preventing GIQL operators from being used alongside standard SQL operators and in CTEs * Add bedtools intersect-like recipes to readme * Add documentation * Implement CLUSTER and MERGE UDFs * Quote customizable field names to avoid collisions with protected SQL keywords * Fully implement custom schema support * Implement "giql intersect" CLI command mirroring "bedtools intersect" API and functionality * Replace GIQLEngine.query() with new a execute() method that returns a PEP 249-like cursor * Add transpile() method to GIQLEngine class and update docs * Update README to include installation instructions * Fix bug in MERGE and CLUSTER queries that incorporate CTEs; Add demo notebook with various usage examples * Drafted GIQL's governing project constitution * Created feature specification for "DISTANCE" operator * Implement DISTANCE UDF for caluculating the distance(s) between intervals * Update demo.ipynb with DISTANCE examples * Update DISTANCE operator to remove "signed" parameter and change behavior of "stranded" parameter when intervals are on opposite strands * Create feature specification for "NEAREST" operator * Implement NEAREST operator * Fix formatting * Implement integration test suite evaluating GIQL behavior against bedtools (#5) * Update core principle I Co-authored-by: Nezar Abdennur <nabdennur@gmail.com> * Update rules for core principle I Co-authored-by: Nezar Abdennur <nabdennur@gmail.com> * Update core principle II Co-authored-by: Nezar Abdennur <nabdennur@gmail.com> * Update core principle III Co-authored-by: Nezar Abdennur <nabdennur@gmail.com> * Update rationale for core principle III Co-authored-by: Nezar Abdennur <nabdennur@gmail.com> * Clean up speckit artifacts we don't want to track * Refactor documentation * Changed default genomic column name from "position" to "interval" * Update readme * Implement CICD * Update project authors and maintainers * Implement build hook for dynamic version resolution from git tag * Update readme * Add hypothesis to dev dependencies * Add pybedtools to dev dependencies * Update test workflow to use pixi * Update test workflow to install GIQL in editable mode * Lower code coverage target for now * Describe tags of head branch when tagging new version in publish-release workflow * Separate bump and tag version jobs in publish-release workflow --------- Co-authored-by: Nezar Abdennur <nabdennur@gmail.com> * Treat undefined release cycle as stable --------- Co-authored-by: Nezar Abdennur <nabdennur@gmail.com>
conradbzura
added a commit
that referenced
this pull request
May 20, 2026
…logical PBT, harden edge cases Acts on the deferred items from the 10-agent re-review of PR #93. Source changes (src/giql/transformer.py): - Reject 'SELECT a.* AS x' rather than silently dropping the user alias. Most engines reject star-with-alias outright; the dialect now raises ValueError with a message pointing at the two valid alternatives (drop the alias or list columns explicitly). Closes the previously-deferred R4 finding. - Defensive subquery gate at transform_to_sql: if any of GROUP BY / HAVING / ORDER BY embeds a subquery, fall back to the binned plan. The modifier rewriter is not scope-aware, so this prevents a future relaxation from accidentally rewriting columns inside a nested scope. - Document the traversal-order assumption on allocate() and the lexical-scope assumption on _rewrite_modifier_refs (R4 follow-ups). Test tightening (tests/test_duckdb_iejoin.py): - ADD-R-006: test that 'SELECT a.* AS x' raises ValueError naming the unsupported star-with-alias shape. - R6 #19: relax 'assert type(...) is ValueError' to isinstance, so the test doesn't forbid future ValueError subclasses with richer error info. - R6 #20: replace the brittle inner_aliases == {__giql_p0, __giql_p1} assertion with a behavior-level dedup check (a."score" must be allocated to exactly one inner alias regardless of its index). - R6 #21: tighten the multi-key ORDER BY DESC-attachment regex to pin DESC positionally to the second term. - R6 #22: anchor the BETWEEN / IS NULL inline assertions to the SET VARIABLE half of the SQL (the per-chromosome template) and assert the predicate does NOT appear in the outer SELECT wrapper. - R6 #17: rephrase three test docstrings to avoid naming private helpers (_strip_intersects, _can_inline_extras) — describe the behavior in user-visible terms instead. - R7 #9: pivot the near-duplicate cross-side ON test to a genuinely new shape — cross-side predicate in WHERE rather than ON, exercising the WHERE-extras extraction path rather than the JOIN-ON extras path. - R7 #10: tighten the kitchen-sink test's HAVING from the no-op 'COUNT(*) > 0' to 'SUM(a.score) > 40' (now actually filters one group out), add SUM(a.score) to the SELECT list so multiple aggregates coexist, and document the row-by-row truth table in the docstring. - R7 #11: add a Python-native expected list to the coord-system parametric test so a regression where both plans share the same bug can't silently pass. - R8 #12: add an executing AVG-equivalence example test (test_query_should_match_binned_plan_avg_aggregate_when_dialect_is_duckdb) that compares binned vs dialect AVG values with pytest.approx — fills the gap the aggregate PBT intentionally leaves. - R8 #13: fix the 'is_null' PBT tautology. The score strategy now draws st.one_of(st.none(), st.integers(0, 50)) so WHERE a.score IS NULL / IS NOT NULL are no longer tautologically true/false. - R8 #14, #15, #16: full-composite PBT now uses min_size=0 (covers empty-input + Tier 1 stack), max_examples=60 (was 30 — the largest input space deserves more sampling), and limit= st.integers(min_value=0) (LIMIT 0 edge case now exercised). Skip executemany when the parameter list is empty (DuckDB rejects it). Tests: 110 dialect tests pass (was 108). Full suite: 1184 passed, 1 skipped, 90 xfailed (all xfails pre-existing in unrelated modules). Refs #93.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Auto-generated by the sync branches workflow.