Skip to content

Add variable field support in aggregations#18

Merged
Atlas07 merged 15 commits intoahrefs:masterfrom
Atlas07:field-var-aggregations
Mar 26, 2026
Merged

Add variable field support in aggregations#18
Atlas07 merged 15 commits intoahrefs:masterfrom
Atlas07:field-var-aggregations

Conversation

@Atlas07
Copy link
Copy Markdown
Contributor

@Atlas07 Atlas07 commented Mar 21, 2026

Summary

Addresses #3

  • Adds $(var:field:<type>) syntax for parameterizing the "field" value in aggregations (e.g., $(group_by:field:string))
  • The variable becomes a string input parameter (the ES field name passed at runtime)
  • The type annotation (string, int, float, int64) determines the output bucket key type
  • Allows a single query template to be reused across multiple ES fields instead of duplicating files

Changes

  • tjson.ml: Extended variable parser to recognize $(var:field:<type>) alongside existing $(var:list)
  • common.ml: Added Field_var constructor to value type, parse_field_type helper, and typeof_value handling
  • aggregations.ml: Wired Field_var through aggregation analysis and constraint resolution
  • query.ml: Handle Field_num (Field_var _) in constraint validation

Test Plan

  • New regression test field_var_agg with terms aggregation using $(group_by:field:string)
  • All 23 tests pass (22 existing + 1 new)
  • Smoke tested with exact syntax from the issue

Atlas07 and others added 6 commits March 21, 2026 14:17
Support $(var:field:<type>) syntax alongside existing $(var:list).
No behavioral changes yet — field_type is parsed but unused.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Field_var carries a Tjson.var and a simple_type parsed from the
field type annotation. typeof_value returns Simple typ directly,
bypassing the mapping lookup.

Also adds parse_field_type helper to convert user-facing type
strings ("string", "int", "float", "int64") to simple_type.

Note: the value type definition was moved after pp_simple_type
because [@@deriving show] on Field_var needs the printer in scope.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Parse $(var:field:<type>) in aggregation field position
- Register field variables as string input parameters
- Skip numeric/date field validation for dynamic fields

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Tests $(group_by:field:string) syntax: verifies string input parameter
and string bucket key output type.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add field variable section under Variables with usage example
and supported type annotations.
Use descriptive names like _typ, _name, _metric, _keys etc.
instead of bare _ in pattern matches added by this PR.
Comment thread common.ml Outdated
Comment on lines +162 to +167
let parse_field_type = function
| "string" -> String
| "int" -> Int
| "float" -> Double
| "int64" -> Int64
| s -> fail "unsupported field type annotation %S, expected: string, int, float, int64" s
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not simple_of_es_type above

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved in 8e044e4

Comment thread tjson.ml Outdated
match String.nsplit s ":" with
| [name] -> name, false, None
| [name; "list"] -> name, true, None
| [name; "field"; typ] -> name, false, Some typ
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think better syntax is just name:type? maybe it will be useful in more contexts than dynamic field and overall there is no semantic importance to it being field, it is currently a field type only by the virtue of place where it is used

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved in 45a09a6

Comment thread query.ml Outdated
| Bool | Json as t -> eprintfn "W: field %S expected to be numeric, but has type %s" f (show_simple_type t)
end
| Field_num (Script _) -> ()
| Field_num (Field_var (_v, _typ)) -> () (* cannot validate dynamic field type *)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but we know the type and can validate it same as Field f above?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved in 5ae15fc

Comment thread aggregations.ml Outdated
Comment on lines +284 to +306
let field_var_cstrs = match agg with
| Dynamic _dv -> []
| Static agg ->
let vs = match agg with
| Simple_metric (_metric, v) -> [v]
| Value_count v | Cardinality v
| Histogram v | Range v -> [v]
| Range_keyed (v, _keys) -> [v]
| Terms { term; _ } | Significant_terms { term; _ } | Significant_text { term; _ } -> [term]
| Date_histogram { on; _ } | Date_range { on; _ } -> [on]
| Multi_terms { terms; _ } -> terms
| Weighted_avg { value; weight } -> [value.value; weight.value]
| Filter _q -> []
| Filters _fs -> []
| Filters_dynamic _dv -> []
| Top_hits _th -> []
| Nested _path -> []
| Reverse_nested _rpath -> []
| Bucket_sort _bs -> []
| Cumulative_sum _cp -> []
in
List.concat_map field_var_constraints vs
in
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this can be done in the above match where all the constraints are computed already

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved in 05db4fd

Atlas07 and others added 4 commits March 24, 2026 01:11
Type annotations now use ES type names (keyword, long, double)
instead of custom names (string, int, float).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The :field: segment was redundant — the type annotation is useful
regardless of context, and the usage site already determines semantics.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The annotated type is known at codegen time, so we can warn when
a non-numeric type is used in a numeric context — same as Field.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Eliminates redundant second match on aggregation type. Each arm now
handles its own field_var_constraints inline, matching the existing
pattern for on_int_var and Field_num/Field_date constraints.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@Atlas07 Atlas07 requested a review from rr0gi March 26, 2026 12:35
Copy link
Copy Markdown
Contributor

@rr0gi rr0gi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please make fixes and merge

Comment thread aggregations.ml Outdated
let dummy_expunge = Dict ["dummy_expunge", Simple Json]

let field_var_constraints = function
| Field_var (v, _typ) -> [On_var (v, Eq_type String)]
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think this warrants a comment that String here is for the variable value itself, not for the aggregation result, to avoid confusion

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved in 465917e

Comment thread README.md Outdated

The type annotation determines the output bucket key type. The variable itself becomes a `string` input parameter — the caller passes the ES field name at runtime (e.g., `"keyword_en"`).

Supported type annotations: any ES type accepted by `simple_of_es_type` (`keyword`, `text`, `long`, `double`, `float`, `boolean`, `date`, `int64`, `ip`, `murmur3`).
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is external documentation, they don't know what is simple_of_es_type.

Suggested change
Supported type annotations: any ES type accepted by `simple_of_es_type` (`keyword`, `text`, `long`, `double`, `float`, `boolean`, `date`, `int64`, `ip`, `murmur3`).
Type can be any ES type - `keyword`, `text`, `long`, `double`, `float`, `boolean`, `date`, `int64`, `ip`, `murmur3`.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved in 6c14bde

Comment thread tjson.ml Outdated
let debug_dump = false

type var = { optional : bool; list : bool; name : string } [@@deriving show]
type var = { optional : bool; list : bool; name : string; field_type : string option } [@@deriving show]
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
type var = { optional : bool; list : bool; name : string; field_type : string option } [@@deriving show]
type var = { optional : bool; list : bool; name : string; type_ : string option } [@@deriving show]

but at this point will be cleaner to get rid of list bool and rename type_ to hint and introduce Common.var parsed from generic Tjson.var, not in this PR

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved in e20ff5b

Comment thread tjson.ml
let opt_suffix = if optional then "?" else "" in
let name_s = name ^ opt_suffix in
match field_type with
| Some ft -> sprintf "$(%s:%s)" name_s ft
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

natural qn is what if it is both list and has type, guess can assert for now to be clear

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved in af75b0f

Atlas07 and others added 5 commits March 26, 2026 14:51
String is the type of the variable value (field name passed at
runtime), not the aggregation result type.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
External users don't know what simple_of_es_type is.
List supported types directly.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The field is no longer field-specific after the syntax simplification.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The parser branches already prevent this, but the explicit check
makes the invariant visible.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Tests $(bucket_field:long) in a histogram aggregation, exercising
the Field_num constraint validation path and float bucket key output.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@Atlas07 Atlas07 merged commit 357d5ac into ahrefs:master Mar 26, 2026
rr0gi added a commit that referenced this pull request Mar 27, 2026
Author: Vitalii Shapoval <vitalii_shapoval@ukr.net>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
rr0gi pushed a commit that referenced this pull request Mar 27, 2026
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants