Conversation
Port NumberFormat from Storyline (INTEGER, FLOAT, PERCENT, CURRENCY types with K/M notation and dynamic precision) and integrate it throughout the query pipeline: - Float-like SQL columns (FLOAT, DOUBLE, DECIMAL, NUMERIC) no longer generate dimensions during auto-ingestion — they get measures only, following the Storyline pattern where floats are quantities, not categorical values. - Int-like dimensions and measures get NumberFormat inferred at ingestion time (INTEGER for int columns, FLOAT for float columns). - Query engine propagates format through FieldMetadata, with aggregation- aware inference (count→INTEGER, avg→FLOAT, sum/min/max→inherited). - SlayerResponse.to_markdown() applies number formatting; MCP server reuses it instead of custom markdown rendering. - API server exposes format in FieldMetadataResponse. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ev-1227-carry-over-formatting-implementation-incl-propagation-from
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (2)
✅ Files skipped from review due to trivial changes (1)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughWalkthroughAdds end-to-end numeric formatting: new NumberFormat model and formatter, format metadata on core models and enriched entities, float-detection in ingestion, aggregation-aware format inference in the query engine, and propagation/rendering of formats in API/MCP responses and tests. Changes
Sequence Diagram(s)sequenceDiagram
participant Ingestion as Ingestion Engine
participant Models as Core Models
participant Enrichment as Enrichment Pipeline
participant QueryEngine as Query Engine
participant Response as Response Builder
Ingestion->>Models: Detect columns + is_float, attach NumberFormat hints
Models->>Enrichment: Supply Dimensions/Measures with format metadata
Enrichment->>QueryEngine: Provide enriched model (formats & source_measure_name)
QueryEngine->>QueryEngine: Infer aggregated format for measures (aggregation rules)
QueryEngine->>Response: Build SlayerResponse with FieldMetadata.format
Response->>Response: _format_value() applies format_number per column
Response->>Client: Render Markdown/CSV/JSON output
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 1 | ❌ 2❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (1 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 8
🧹 Nitpick comments (2)
slayer/mcp/server.py (1)
771-772: Keep the full format spec in MCP metadata.
format=currencydropsprecisionandsymbol, so MCP no longer exposes the same metadata that the API now returns. Serializing the wholeNumberFormatpayload, or at leasttype,precision, andsymbol, would keep the propagation consistent.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@slayer/mcp/server.py` around lines 771 - 772, The current MCP metadata only appends fm.format.type (format=currency) which drops precision and symbol; update the code that builds metadata (the block using fm.format and parts.append) to serialize the full NumberFormat payload (or at minimum include fm.format.type, fm.format.precision, and fm.format.symbol) so MCP exposes the same format spec as the API—locate where fm.format is checked and replace the single-value append with a serialized representation containing type, precision, and symbol.slayer/core/format.py (1)
125-126: Use keyword args for_format_with_notation()consistently.These new call sites still pass
valuepositionally. Making that explicit would match the repo-wide Python call-style rule. As per coding guidelines, "Use keyword arguments for functions with more than 1 parameter."Also applies to: 140-147, 152-153
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@slayer/core/format.py` around lines 125 - 126, Calls to _format_with_notation pass the main value positionally (e.g., _format_with_notation(abs_value, default_precision=3, explicit_precision=precision, max_precision=2)); update these to use keyword arguments for clarity and consistency (e.g., value=abs_value, default_precision=3, explicit_precision=precision, max_precision=2). Locate every call to _format_with_notation in this file (including the other sites around the formatted_value/suffix/calc_precision assignments) and change the first positional argument to value=..., keeping the other named args unchanged so all parameters are passed as keywords.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@slayer/core/format.py`:
- Around line 33-36: The precision field currently allows negative integers
which later cause format_number() to raise when building a format spec; update
the model validation for the precision field (the Optional[int] Field named
precision in slayer/core/format.py) to reject negatives—either add a constraint
like ge=0 to the Field(...) declaration or add a Pydantic `@validator` (e.g.,
validate_precision) that raises a ValueError for precision < 0—so invalid
negative precision values are rejected at parse time before format_number() is
called.
- Around line 109-115: The numeric-type checks currently use isinstance(value,
numbers.Real) and isinstance(value, (int, float)) which exclude decimal.Decimal;
update both checks (in slayer/core/format.py around the value numeric/NaN guard
and in slayer/engine/query_engine.py where the (int, float) guard is used) to
also accept decimal.Decimal (e.g., isinstance(value, (numbers.Real,
decimal.Decimal)) or isinstance(value, (int, float, decimal.Decimal))) and
import decimal where needed so Decimal-backed NUMERIC/DECIMAL DB columns are
formatted by the existing numeric logic.
In `@slayer/engine/ingestion.py`:
- Around line 493-500: The code currently special-cases only "_count" when
building measures (in the numeric_columns and non_numeric_columns loops) so a
real column named "count" will be emitted as measure "count" which violates the
reserved-name rule; update both loops that append Measure(...) to map col_name
== "count" to measure_name "count_value" (and keep the existing col_name ==
"_count" -> "count_col" behavior), i.e., compute measure_name by checking
col_name == "_count" first (or both) and col_name == "count" and then use that
measure_name when calling measures.append(Measure(...)) (the same change applies
in the numeric loop that uses _FLOAT_FORMAT/_INT_FORMAT and in the non-numeric
loop).
- Around line 52-60: The code treats all NUMERIC/DECIMAL as floats and misnames
physical count columns; remove NUMERIC/DECIMAL from the blanket float sets
(_FLOAT_LIKE_SA_TYPES and _FLOAT_LIKE_INFO_SCHEMA_TYPES) and instead determine
is_float in _columns_to_model by checking the column's scale/precision: when the
dialect exposes numeric_scale/decimal_scale use that to treat scale==0 as
integer-like; in the INFORMATION_SCHEMA fallback query fetch numeric_scale (or
DECIMAL_SCALE) and use it similarly rather than hard-coding is_float=True after
stripping precision; finally update the measure naming logic where measure_name
is assigned (the block that currently maps names like "_count" → "count_col") to
also detect a physical column named exactly "count" and set measure_name =
"count_value". Ensure references to _FLOAT_LIKE_SA_TYPES,
_FLOAT_LIKE_INFO_SCHEMA_TYPES, _columns_to_model, and the measure_name
assignment are updated.
In `@slayer/engine/query_engine.py`:
- Around line 199-204: The code only adds FieldMetadata for enriched.expressions
and enriched.transforms when e.label/t.label exists, causing unlabeled derived
fields to lack the FLOAT format; update the logic in the loop over
enriched.expressions and enriched.transforms to always set
meta[e.alias]/meta[t.alias] = FieldMetadata(...,
format=NumberFormat(type=NumberFormatType.FLOAT)) and only include label if
e.label/t.label is truthy (e.g., label=e.label if present), and make the same
change in the other occurrence that mirrors these lines so all derived fields
get FLOAT formatting regardless of having a label.
- Around line 128-131: _infer_aggregated_format currently re-looks up
model.get_measure(measure_name) using the canonicalized aggregated Measure.name,
which loses the original source measure's format for aggregations
(sum/min/max/first/last); instead, pass the original source Measure or its
source_name through the enrichment path (from _query_as_model into where
aggregated Measures are created) and use that to resolve formats rather than
re-querying by the aggregated name. Update _query_as_model to persist the source
Measure or source_measure_name on the virtual Measure object and change
_infer_aggregated_format (and the other similar places that call
model.get_measure(measure_name)) to read the format from that threaded source
Measure/object (or its name) so currency/percent/integer formats are preserved
for inherited aggregations.
- Around line 195-205: The execute() metadata population currently handles
enriched.measures, enriched.expressions, and enriched.transforms but omits
enriched.cross_model_measures, and _query_as_model() forces fmt=None for
cross-model measures; update execute() to iterate enriched.cross_model_measures
and populate meta[t.alias] (or appropriate alias) with FieldMetadata using the
format resolved via _infer_aggregated_format(model=model,
measure_name=<cross_model_measure_name>, aggregation=<aggregation>) similar to
how enriched.measures are handled, and modify _query_as_model() to stop
hard-coding fmt=None for cross-model measures so it uses the inferred format;
ensure you reference enriched.cross_model_measures, _infer_aggregated_format,
execute(), _query_as_model(), and FieldMetadata when making the changes.
- Around line 188-191: The code currently only calls model.get_dimension(d.name)
and thus misses formats for dimensions coming from joins; change the lookup to
resolve dotted/joined dimension paths instead of assuming the source model by:
if d.name contains a dot, split the path and walk/resolve the referenced
model(s) to find the actual Dimension object (or call an existing resolver like
model.resolve_dimension or a helper you add), then read dim_obj.format and use
that when constructing FieldMetadata; update both the block using get_dimension
(the snippet with meta[d.alias] = FieldMetadata(...)) and the analogous logic at
the other location (around lines 440-448) so joined/multi‑hop dimensions supply
their correct format.
---
Nitpick comments:
In `@slayer/core/format.py`:
- Around line 125-126: Calls to _format_with_notation pass the main value
positionally (e.g., _format_with_notation(abs_value, default_precision=3,
explicit_precision=precision, max_precision=2)); update these to use keyword
arguments for clarity and consistency (e.g., value=abs_value,
default_precision=3, explicit_precision=precision, max_precision=2). Locate
every call to _format_with_notation in this file (including the other sites
around the formatted_value/suffix/calc_precision assignments) and change the
first positional argument to value=..., keeping the other named args unchanged
so all parameters are passed as keywords.
In `@slayer/mcp/server.py`:
- Around line 771-772: The current MCP metadata only appends fm.format.type
(format=currency) which drops precision and symbol; update the code that builds
metadata (the block using fm.format and parts.append) to serialize the full
NumberFormat payload (or at minimum include fm.format.type, fm.format.precision,
and fm.format.symbol) so MCP exposes the same format spec as the API—locate
where fm.format is checked and replace the single-value append with a serialized
representation containing type, precision, and symbol.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 87ef1dd2-77a1-48dc-b78f-32997dfdee0c
📒 Files selected for processing (9)
slayer/api/server.pyslayer/core/format.pyslayer/core/models.pyslayer/engine/ingestion.pyslayer/engine/query_engine.pyslayer/mcp/server.pytests/integration/test_integration_duckdb.pytests/integration/test_integration_postgres.pytests/test_format.py
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
|
@CodeRabbit review this! |
|
✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (1)
slayer/engine/ingestion.py (1)
282-295:⚠️ Potential issue | 🟠 MajorDon't treat
DECIMAL(precision)as “unknown scale”.
_parse_info_schema_is_float()currently returnsTruefor any parenthesized exact numeric that lacks a comma, soDECIMAL(10)/NUMERIC(10)still get classified as float-like. PostgreSQL documentsNUMERIC(precision)as scale0, and MySQL documentsDECIMAL(M)asDECIMAL(M,0), so this will still mis-infer integer-like exact numerics when the fallback path sees that spelling. The companiontest_no_scale_in_parenswill need to flip with this. (postgresql.org)🛠️ Proposed fix
def _parse_info_schema_is_float(data_type_str: str) -> bool: """Determine if a NUMERIC/DECIMAL info-schema type string is float-like. @@ if "(" in data_type_str and "," in data_type_str: try: scale_str = data_type_str.split(",")[-1].rstrip(")").strip() return int(scale_str) > 0 except (ValueError, IndexError): return True # Can't parse scale, default to float + if "(" in data_type_str and ")" in data_type_str: + return False # DECIMAL(10) / NUMERIC(10) -> scale 0 return True # No precision/scale info, default to float🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@slayer/engine/ingestion.py` around lines 282 - 295, _parse_info_schema_is_float currently treats any parenthesized numeric type without a comma (e.g., "DECIMAL(10)") as float-like; change it to treat parenthesized single-argument forms as scale=0 (integer-like). In _parse_info_schema_is_float, when "(" in data_type_str but no ",", parse the value inside the parentheses (e.g., between "(" and ")"); if it parses as an integer (precision), return False (scale==0 -> integer-like); if parsing fails, also return False (conservative integer-like assumption for DECIMAL(N)/NUMERIC(N)). Keep existing logic for the comma case (parse scale after comma) unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@slayer/core/format.py`:
- Around line 115-119: The code currently special-cases NaN in format_number but
still lets Infinity/-Infinity reach _format_with_notation; update format_number
to also guard infinite values by checking math.isinf(value) for floats and
value.is_infinite() for decimal.Decimal and immediately return str(value) when
true so _format_with_notation never receives non-finite numbers.
---
Duplicate comments:
In `@slayer/engine/ingestion.py`:
- Around line 282-295: _parse_info_schema_is_float currently treats any
parenthesized numeric type without a comma (e.g., "DECIMAL(10)") as float-like;
change it to treat parenthesized single-argument forms as scale=0
(integer-like). In _parse_info_schema_is_float, when "(" in data_type_str but no
",", parse the value inside the parentheses (e.g., between "(" and ")"); if it
parses as an integer (precision), return False (scale==0 -> integer-like); if
parsing fails, also return False (conservative integer-like assumption for
DECIMAL(N)/NUMERIC(N)). Keep existing logic for the comma case (parse scale
after comma) unchanged.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 99698980-cd80-4d8e-8493-d0986f6d5ead
📒 Files selected for processing (9)
slayer/core/format.pyslayer/engine/enriched.pyslayer/engine/enrichment.pyslayer/engine/ingestion.pyslayer/engine/query_engine.pyslayer/mcp/server.pytests/test_format.pytests/test_format_propagation.pytests/test_ingestion.py
✅ Files skipped from review due to trivial changes (1)
- tests/test_format.py
Co-Authored-By: Egor Kraev <egor.kraev@gmail.com>
80fa27d to
73be519
Compare
…tation-incl-propagation-from
There was a problem hiding this comment.
🧹 Nitpick comments (3)
tests/test_format_propagation.py (1)
42-78: Consider adding test coverage forweighted_avgandmedianaggregations.The
_infer_aggregated_formatfunction handlesweighted_avgandmedian(returning FLOAT), but these are not covered by tests. While the existing tests validate the pattern, explicit coverage would catch regressions.💡 Additional test cases
def test_weighted_avg_returns_float(self, model): fmt = _infer_aggregated_format(model=model, measure_name="revenue", aggregation="weighted_avg") assert fmt.type == NumberFormatType.FLOAT def test_median_returns_float(self, model): fmt = _infer_aggregated_format(model=model, measure_name="revenue", aggregation="median") assert fmt.type == NumberFormatType.FLOAT🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/test_format_propagation.py` around lines 42 - 78, Add unit tests to cover the weighted_avg and median aggregation branches in _infer_aggregated_format: create two tests named test_weighted_avg_returns_float and test_median_returns_float that call _infer_aggregated_format(model=model, measure_name="revenue", aggregation="weighted_avg") and _infer_aggregated_format(model=model, measure_name="revenue", aggregation="median") respectively, and assert that the returned fmt.type equals NumberFormatType.FLOAT; keep the same testing style as the neighboring tests (use the provided model fixture and simple assert).slayer/mcp/server.py (1)
244-250: Fallback model construction is overly complex and may hide errors.The inline fallback
SlayerModel(name="", data_source="")is used only to access.hidden, but this creates an empty model just to default tohidden=False. Consider simplifying:♻️ Suggested simplification
available = sorted( - [ - n - for n in storage.list_models() - if not (storage.get_model(n) or SlayerModel(name="", data_source="")).hidden - ] + n for n in storage.list_models() + if not getattr(storage.get_model(n), "hidden", False) )🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@slayer/mcp/server.py` around lines 244 - 250, The list comprehension building available is creating a temporary SlayerModel(name="", data_source="") solely to read .hidden; replace that with an explicit check: call storage.get_model(n) once, assign it to a local (e.g., model = storage.get_model(n)), treat None as not hidden (hidden = False) and use model.hidden otherwise, then filter by not hidden; update the code that builds available (which uses storage.list_models() and storage.get_model()) to avoid constructing a fallback SlayerModel and to avoid calling get_model twice.slayer/core/format.py (1)
124-139: Minor: Redundant fallback for currency symbol.Line 125 uses
symbol or "$"but themodel_validatoralready setssymbol = "$"for CURRENCY types when None. Theor "$"is defensive but unnecessary.♻️ Optional simplification
if format_type == NumberFormatType.CURRENCY: - currency_symbol = symbol or "$" + currency_symbol = symbol # Validator guarantees non-None for CURRENCY is_negative = value < 0🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@slayer/core/format.py` around lines 124 - 139, The currency branch redundantly falls back to "$" with `currency_symbol = symbol or "$"` even though the model_validator guarantees symbol is set for CURRENCY; remove the defensive `or "$"` so `currency_symbol = symbol` is used instead, updating the code inside the format handling where `format_type == NumberFormatType.CURRENCY` (refer to the variables `symbol`, `currency_symbol`, `is_negative`, `formatted_value`, and the `_format_with_notation` call) to avoid the unnecessary fallback.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@slayer/core/format.py`:
- Around line 124-139: The currency branch redundantly falls back to "$" with
`currency_symbol = symbol or "$"` even though the model_validator guarantees
symbol is set for CURRENCY; remove the defensive `or "$"` so `currency_symbol =
symbol` is used instead, updating the code inside the format handling where
`format_type == NumberFormatType.CURRENCY` (refer to the variables `symbol`,
`currency_symbol`, `is_negative`, `formatted_value`, and the
`_format_with_notation` call) to avoid the unnecessary fallback.
In `@slayer/mcp/server.py`:
- Around line 244-250: The list comprehension building available is creating a
temporary SlayerModel(name="", data_source="") solely to read .hidden; replace
that with an explicit check: call storage.get_model(n) once, assign it to a
local (e.g., model = storage.get_model(n)), treat None as not hidden (hidden =
False) and use model.hidden otherwise, then filter by not hidden; update the
code that builds available (which uses storage.list_models() and
storage.get_model()) to avoid constructing a fallback SlayerModel and to avoid
calling get_model twice.
In `@tests/test_format_propagation.py`:
- Around line 42-78: Add unit tests to cover the weighted_avg and median
aggregation branches in _infer_aggregated_format: create two tests named
test_weighted_avg_returns_float and test_median_returns_float that call
_infer_aggregated_format(model=model, measure_name="revenue",
aggregation="weighted_avg") and _infer_aggregated_format(model=model,
measure_name="revenue", aggregation="median") respectively, and assert that the
returned fmt.type equals NumberFormatType.FLOAT; keep the same testing style as
the neighboring tests (use the provided model fixture and simple assert).
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 293c3ffe-c0d9-4c70-85ce-ad78fe940586
📒 Files selected for processing (9)
slayer/core/format.pyslayer/engine/enriched.pyslayer/engine/enrichment.pyslayer/engine/ingestion.pyslayer/engine/query_engine.pyslayer/mcp/server.pytests/test_format.pytests/test_format_propagation.pytests/test_ingestion.py
✅ Files skipped from review due to trivial changes (1)
- tests/test_format.py
🚧 Files skipped from review as they are similar to previous changes (5)
- slayer/engine/enrichment.py
- tests/test_ingestion.py
- slayer/engine/enriched.py
- slayer/engine/query_engine.py
- slayer/engine/ingestion.py
Summary by CodeRabbit
New Features
Behavior Changes
Tests