Implement UNION / UNION ALL #281
Open
mlell wants to merge 4 commits into
The function pre-filtered the input string accepting only day, month, or year, while the function accepts more inputs. Change the regex to split only number from word.
Previously the grammar rule `select` owned ORDER BY, LIMIT and PIVOT BY
directly, and the parser returned a bare `ast.Select`. This conflated the
data extraction (defining columns, a source, filters and grouping) with
result-set modifiers (ORDER BY, LIMIT and PIVOT BY) that act on whatever comes
out of that expression. Splitting this is a preparation for a future UNION
chain.
The new `ast.Query` node separates these concerns. The grammar rule
`query::Query` wraps one (as of now) SELECT body and claims ORDER BY,
LIMIT, and PIVOT BY for itself; `ast.Select` is now a pure table expression
with no sorting or paging fields. `parse()` always returns `ast.Query`, even
for the simplest `SELECT *`.
**ast.py**: `Select` loses `order_by`, `limit`, and `pivot_by`; new `Query`
node carries those fields and wraps a list of `Select` nodes.
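As a rough illustration of this split, the two nodes could look like the dataclasses below. This is only a sketch: the field names `selects`, `targets`, `from_clause`, `where_clause` and `group_by` are assumptions for the example and are not taken from the actual `ast.py`.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Select:
    # Pure table expression: columns, source, filters, grouping.
    # Deliberately no order_by, limit, or pivot_by fields.
    targets: Any
    from_clause: Any = None   # hypothetical field name
    where_clause: Any = None  # hypothetical field name
    group_by: Any = None      # hypothetical field name
    distinct: bool = False


@dataclass
class Query:
    # Wrapper that owns the result-set modifiers.
    selects: list[Select]     # hypothetical field name
    order_by: Any = None
    limit: Optional[int] = None
    pivot_by: Any = None


# Even the simplest statement is represented as a Query wrapping a Select.
q = Query(selects=[Select(targets="*")], limit=10)
```

With this shape, anything that sorts or pages only ever looks at the `Query` wrapper, never at the `Select` body.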
**bql.ebnf**: `select` rule no longer contains ORDER BY / LIMIT / PIVOT BY;
a new `query::Query` rule wraps `select` and owns those clauses. Rename
`subselect` to `subquery`, reflecting the change of top level `select` ->
`query`. This delegates to `query` so parenthesised sub-queries may carry
their own result-set modifiers. Updated `any` and `all` rules to avoid double
parentheses when used with subselects. The `expression` rule requires
`subquery` (instead of formerly `select`) to avoid ambiguities like
`SELECT SELECT x FROM y WHERE z`.
**query_compile.py**: Rename `EvalQuery` to `EvalSelect`. The dataclass holds
the compiled SELECT body (table, targets, where, group_indexes, having_index,
distinct). A new `EvalQuery` now wraps `EvalSelect` and owns `order_spec` and
`limit`. The `EvalQuery` properties `columns` and `c_targets` are retained and
forwarded from the nested SELECT. In the future, this will only be
possible for single-SELECT queries (not, e.g., UNION chains).
**compiler.py**: New `_query` dispatch handler is extracted from `_select`.
`_select` compiles the inner SELECT body until GROUP BY. `_query` then
compiles ORDER BY, performs the aggregate coverage check, and finally compiles
LIMIT and PIVOT BY. In the function `_compile_from`, the subquery detection
is updated from `ast.Select` to `ast.Query`. A new check rejects
`SELECT DISTINCT ... ORDER BY <col>` when `<col>` is not in the SELECT list,
since this would produce non-deterministic results. This avoids handling
DISTINCT on Query level.
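The DISTINCT check can be pictured roughly as follows. This is a minimal sketch, assuming columns are compared by name and using `ValueError` as a stand-in for whatever error the compiler actually raises; the function name `check_distinct_order_by` is hypothetical.

```python
def check_distinct_order_by(distinct, select_columns, order_by_columns):
    """Reject ORDER BY columns that are not in the SELECT list of a DISTINCT query."""
    if not distinct:
        return  # Without DISTINCT, ordering by a hidden column is unambiguous.
    missing = [col for col in order_by_columns if col not in select_columns]
    if missing:
        # Several full rows may collapse into one distinct row, so the value
        # of a hidden sort column would be ambiguous.
        raise ValueError(
            "ORDER BY columns must appear in the SELECT DISTINCT list: "
            + ", ".join(missing)
        )


# Accepted: the sort column is part of the SELECT list.
check_distinct_order_by(True, ["account", "year"], ["year"])
```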
**query_execute.py**: New `execute_query()` wraps `execute_select()`, which
changes the control flow:
Before:
execute_select(query)
├── Compute result_types (visible columns only)
├── Compute result_indexes (visible column indices)
├── Execute query (non-aggregated or aggregated path)
├── ORDER BY (on full rows)
├── Extract visible columns into result tuples
├── DISTINCT (on extracted rows)
├── LIMIT
└── Return (result_types, rows)
After:
execute_query(query) ← New entry point
├── query.select() ← Delegates to EvalQuery.select()
│ └── execute_select(query) ← Returns ALL columns + visibility mask
│ ├── Compute result_types (ALL columns)
│ ├── Compute visible_mask
│ ├── Execute query (non-aggregated or aggregated)
│ ├── DISTINCT (on visible columns, but keeps full rows)
│ └── Return (result_types, rows, visible_mask)
│
├── ORDER BY (on full rows)
├── Extract visible columns
├── LIMIT
└── Return (result_types, rows)
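The "After" flow can be sketched on plain tuples as below. This is a simplified illustration, not the actual `query_execute.py` code: it leaves out `result_types` and aggregation, and the parameters `order_index` and `distinct` are assumptions made for the example.

```python
from itertools import compress


def execute_select(rows, visible_mask, distinct=False):
    """Return full rows plus the visibility mask."""
    if distinct:
        seen = set()
        unique = []
        for row in rows:
            # DISTINCT compares visible columns only, but keeps the full row
            # so that a later ORDER BY can still use hidden columns.
            key = tuple(compress(row, visible_mask))
            if key not in seen:
                seen.add(key)
                unique.append(row)
        rows = unique
    return rows, visible_mask


def execute_query(rows, visible_mask, order_index=None, limit=None, distinct=False):
    """Apply ORDER BY, visible-column extraction, and LIMIT, in that order."""
    rows, mask = execute_select(rows, visible_mask, distinct)
    if order_index is not None:
        rows = sorted(rows, key=lambda row: row[order_index])  # ORDER BY on full rows
    rows = [tuple(compress(row, mask)) for row in rows]        # extract visible columns
    if limit is not None:
        rows = rows[:limit]                                    # LIMIT
    return rows


# Sort by the hidden second column, then show only the first column.
data = [("b", 2), ("a", 1), ("c", 3)]
result = execute_query(data, visible_mask=[True, False], order_index=1, limit=2)
```

Here `result` is `[("a",), ("b",)]`: the rows are sorted on the hidden column before it is dropped.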
**transform_journal / transform_balances**: These template-based desugaring
functions now return `ast.Query` wrapping the constructed `ast.Select`, so
ORDER BY from the BALANCES template reaches the `_query` handler through the
normal path.
**Tests**: Updated to expect `ast.Query` from parser, access `query.select`
for inner fields, and construct `EvalQuery(select=EvalSelect(...), ...)`.
Type coercion for numeric operands will be needed in multiple contexts: Binary operators (existing) and UNION type compatibility checking (upcoming). Currently, the coercion logic is duplicated in _binaryop. Extracting it into a reusable helper enables both contexts to apply consistent type coercion rules, particularly the int→Decimal promotion that avoids information loss.
Changes:
- Add _try_coerce_operand(operand, target_type) helper method to Compiler. Returns the coerced operand, or None if coercion is not possible. Encapsulates: type equality check, int→Decimal promotion, function lookup.
- Refactor _binaryop to use _try_coerce_operand.
- Add unit tests for _try_coerce_operand.
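The contract of the helper can be illustrated with a standalone sketch. The `Operand` class here is a hypothetical stand-in for a compiled expression node, and the function-lookup step mentioned above is omitted; only the type equality check and the int→Decimal promotion are shown.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Callable, Optional


@dataclass
class Operand:
    # Hypothetical stand-in for a compiled expression with a result type.
    dtype: type
    evaluate: Callable[[], object]


def try_coerce_operand(operand: Operand, target_type: type) -> Optional[Operand]:
    """Return an operand of target_type, or None if coercion is not possible."""
    if operand.dtype is target_type:
        return operand  # Types already match, nothing to do.
    if operand.dtype is int and target_type is Decimal:
        inner = operand.evaluate
        # Converting int to Decimal is exact, so no information is lost.
        return Operand(Decimal, lambda: Decimal(inner()))
    return None  # The caller decides how to report the mismatch.


coerced = try_coerce_operand(Operand(int, lambda: 3), Decimal)
rejected = try_coerce_operand(Operand(str, lambda: "x"), Decimal)
```

Returning None instead of raising lets each caller (binary operators now, UNION type checking later) produce its own context-specific error.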
BQL previously supported only single SELECT statements. This change introduces UNION and UNION ALL set operators so that multiple SELECT operands can be combined into one result set, with optional ORDER BY, LIMIT, and PIVOT BY applied to the combined output.
Grammar (bql.ebnf): extend the query rule to accept a chain of SELECT operands separated by UNION or UNION ALL tokens; the resulting AST carries a parallel set_operators list (length = number of operands − 1).
AST (ast.py): add the set_operators field to the Query node and document its semantics.
Compiler (compiler.py): compile each operand independently against the original table context, validate that all operands have the same column count and compatible types, and auto-coerce int/Decimal mismatches to Decimal using the existing _try_coerce_operand helper. Make `_query()` a top-level dispatcher for two flows:
* Simple query with only one SELECT: `_compile_single_select_query`
* Query that joins multiple SELECTs using UNION: `_compile_union_query`
Runtime (query_compile.py): introduce EvalUnion, a new dataclass that accumulates rows across sub-queries and applies deduplication on UNION boundaries while preserving insertion order. EvalUnion has the same interface as EvalSelect (returns result_types, rows, visible_mask), and is wrapped by EvalQuery which handles ORDER BY, LIMIT, and visible column extraction uniformly for both single SELECTs and UNIONs.
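The row accumulation can be sketched as follows. This is an illustration under assumptions, not the `EvalUnion` implementation: operators are represented as the strings `"UNION"` and `"UNION ALL"`, the chain is evaluated left to right, and a UNION is taken to deduplicate everything accumulated so far. The function name `combine_union` is hypothetical.

```python
def combine_union(operand_rows, set_operators):
    """Combine per-operand row lists according to a parallel operator list."""
    if len(set_operators) != len(operand_rows) - 1:
        raise ValueError("expected one operator between each pair of operands")
    result = list(operand_rows[0])
    for operator, rows in zip(set_operators, operand_rows[1:]):
        result.extend(rows)
        if operator == "UNION":
            # dict.fromkeys drops duplicates and keeps first-seen order.
            result = list(dict.fromkeys(result))
    return result


a = [(1,), (2,)]
b = [(2,), (3,)]
c = [(3,)]

dedup_then_append = combine_union([a, b, c], ["UNION", "UNION ALL"])
append_then_dedup = combine_union([a, b, c], ["UNION ALL", "UNION"])
```

In this sketch `dedup_then_append` is `[(1,), (2,), (3,), (3,)]` and `append_then_dedup` is `[(1,), (2,), (3,)]`, which shows why the position of each operator in the chain matters.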
This PR allows joining SELECT queries with UNION ALL (concatenate) or UNION (concatenate and deduplicate).
This is the first part of reworking #265: it first provides a UNION, on top of which GROUP BY GROUPING SETS / GROUP BY ROLLUP, etc. can be implemented, as suggested in the PR.
There are two commits that only refactor, without changing functionality, to prepare for the third commit that adds the UNION clause. The refactoring mainly introduces "Query" as a new top-level AST entity which wraps either a single SELECT or a UNION of multiple SELECTs.
`EvalQuery` is renamed to `EvalSelect` for clearer terminology as it relates to `ast.Select`. A new `EvalQuery` node is introduced which takes the responsibility for ORDER BY, LIMIT, and PIVOT BY from `EvalSelect` (the old `EvalQuery`). The reason is that the last ORDER BY, LIMIT, etc. of a UNION apply to the end result of the union. To apply those operators to a single SELECT operand inside a UNION, use subqueries like `(SELECT ... ORDER BY ... ) UNION ...`.