Add Byte Array specialization #134

Merged
Waguramu merged 14 commits into v0.6.3 from byte-array
Feb 13, 2026

Waguramu (Contributor) commented Feb 4, 2026

Note

Medium Risk
Touches core parsing/tokenization, value/operator dispatch, and model serialization/JSON encoding, so regressions could affect query evaluation and persistence. Changes are well-scoped and covered by new unit tests.

Overview
Adds first-class bytes support to simfil end-to-end: a new ByteArray scalar type, b"..."/b'...' literals (including \xNN escapes), typeof/# support, as bytes casting, and comparison semantics against other bytes and numeric values.

Extends the model layer to store bytes in ModelPool (new column + serialization) and introduces a tagged JSON representation ({"_bytes":true,"hex":...}) with round-trip parsing/printing; also refactors typed node resolution to an ADL-based Model::resolve<T> API (replacing the old resolveObject/resolveArray helpers). Documentation and tests are updated to cover the new bytes behavior.
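
To make the round trip of the tagged JSON form more concrete, here is a minimal sketch of how the "hex" field could be decoded back into raw bytes. The helper names `hexDigitValue` and `bytesFromHex` are hypothetical and not taken from this PR; the merged parser may differ in its error handling and accepted casing.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical helper: numeric value of one hex digit, or -1 if invalid.
inline int hexDigitValue(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Hypothetical helper: decode hex text into raw bytes.
// Returns nullopt for odd-length input or any non-hex character.
inline std::optional<std::string> bytesFromHex(std::string_view hex)
{
    if (hex.size() % 2 != 0)
        return std::nullopt;

    std::string out;
    out.reserve(hex.size() / 2);
    for (std::size_t i = 0; i < hex.size(); i += 2) {
        const int hi = hexDigitValue(hex[i]);
        const int lo = hexDigitValue(hex[i + 1]);
        if (hi < 0 || lo < 0)
            return std::nullopt;
        // Two hex digits form one byte: high nibble first.
        out.push_back(static_cast<char>((hi << 4) | lo));
    }
    return out;
}
```

Under these assumptions, `bytesFromHex("c0ffee")` yields the three bytes 0xC0, 0xFF, 0xEE, while malformed input such as `"abc"` is rejected.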

Written by Cursor Bugbot for commit eb25799. This will update automatically on new commits.

github-actions bot commented Feb 4, 2026

Test Results

 1 files  ±0   1 suites  ±0   6m 47s ⏱️ -4s
88 tests +2  88 ✅ +2  0 💤 ±0  0 ❌ ±0 
93 runs  +2  93 ✅ +2  0 💤 ±0  0 ❌ ±0 

Results for commit eb25799. ± Comparison against base commit c50a0f3.

♻️ This comment has been updated with latest results.

        if (static_cast<unsigned char>(bytes[i]) != 0)
            return std::nullopt;
    }
}

Big-endian decode logic truncates wrong bytes for overflow

High Severity

The decodeBigEndianI64 function incorrectly handles byte arrays longer than 8 bytes. It checks if trailing bytes (indices 8+) are zero and then uses the leading 8 bytes. In big-endian, trailing bytes are the least significant, so this logic rejects valid values that fit in 64 bits (like 256 in a 10-byte array) while accepting values that overflow (like 2^72 truncated to its high bytes). The check for overflow needs to verify the leading excess bytes are zero (or proper sign extension), not the trailing ones.

    strings_(std::move(strings))
{
    columns_.stringData_.reserve(detail::ColumnPageSize*4);
    columns_.byteArrayData_.reserve(detail::ColumnPageSize*4);
Contributor Author

Reuse stringData column instead.

clear_and_shrink(columns.strings_);
clear_and_shrink(columns.stringData_);
clear_and_shrink(columns.byteArrays_);
clear_and_shrink(columns.byteArrayData_);
Contributor Author

Remove.

auto idx = n.addr().index();
if (auto err = checkBounds(impl_->columns_.byteArrays_))
    return tl::unexpected<Error>(*err);
auto& val = impl_->columns_.byteArrays_[idx];
Contributor Author

Access stringData column

ModelNode::Ptr ModelPool::newValue(simfil::ByteArray const& value)
{
    impl_->columns_.byteArrays_.emplace_back(Impl::StringRange{
        (uint32_t)impl_->columns_.byteArrayData_.size(),
Contributor Author

Also stringData column

@Waguramu Waguramu self-assigned this Feb 4, 2026

    return l == r.toDisplayString();
}

auto operator()(const ByteArray& l, int64_t r) const
Collaborator

I do not think this should be allowed:

  • We do not allow comparing strings to numbers without casting
  • toDisplayString() changes format depending on content and should not be used for comparison at all

(Same for double.)

    return l < r.toDisplayString();
}

auto operator()(const ByteArray& l, int64_t r) const
Collaborator

See previous comment about ByteArray to number conversion.


auto operator()(const ByteArray& v) const -> std::string
{
    return v.toDisplayString();
Collaborator

IMO, this should always use toHex().


auto operator()(const ByteArray&) const -> std::string_view
{
    static auto n = "string"sv;
Collaborator

Why is the type "string"? Since ByteArray behaves differently (hex format, ...), it should either get its own type name or be handled exactly like a string.

johannes-wolf (Collaborator) commented Feb 4, 2026

Do we really need to make the ByteArray accessible via simfil queries?

@Waguramu Waguramu changed the base branch from main to v0.6.3 February 4, 2026 13:58
Waguramu (Contributor Author) commented Feb 4, 2026

Do we really need to make the ByteArray accessible via simfil queries?

The relevant issue:

NDS.Live specifies the RoadLocationId attribute in the RoadCharacteristicsLayer as an array of bytes that stores a numeric value. In the HERE implementation, a long is encoded in the array of bytes; please see attached.

There are basically two use cases for decoding a byte array to a numeric value:
   - Attribute inspection - the numeric representation should be visible as well
   - Search - the numeric value should be usable as an input value

Proposal:
- Introduce byte array as a specialization of the simfil string.
- Byte array value returns big endian decimal if the value fits into 8 bytes.
- Byte array value returns hexadecimal if the value doesn't fit into 8 bytes.
- In the Livesource ensure that all zserio fields of type byte array are converted to simfil byte arrays.

Could you take a look? Maybe you've got a great idea @johannes-wolf
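
For illustration only, the display rule from the proposal could look roughly like the sketch below. The function name `displayBytes` is hypothetical, the decimal form assumes an unsigned interpretation, and lowercase hex is an arbitrary choice; none of this is the merged behavior.

```cpp
#include <cstdint>
#include <string>
#include <string_view>

// Hypothetical sketch of the proposed rule: decimal for up to 8 bytes,
// hexadecimal otherwise.
inline std::string displayBytes(std::string_view bytes)
{
    if (bytes.size() <= 8) {
        // Decode as a big-endian integer (assumption: unsigned).
        std::uint64_t value = 0;
        for (char c : bytes)
            value = (value << 8) | static_cast<unsigned char>(c);
        return std::to_string(value);
    }

    // Too long for 64 bits: fall back to two hex digits per byte.
    static constexpr char digits[] = "0123456789abcdef";
    std::string hex;
    hex.reserve(bytes.size() * 2);
    for (char c : bytes) {
        const auto b = static_cast<unsigned char>(c);
        hex.push_back(digits[b >> 4]);
        hex.push_back(digits[b & 0x0F]);
    }
    return hex;
}
```

Under these assumptions, the two bytes 0x01 0x00 display as `256`, while nine 0xFF bytes display as eighteen `f` characters. The output format therefore depends on the length of the content.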

@johannes-wolf (Collaborator)

Can we use the native simfil int & string values instead? Livesource could do the conversion to either an int or a string depending on the content/length.

josephbirkner (Collaborator) commented Feb 9, 2026

TODO:

  • Add a Bytes value type which is returned by ValueType4CType<ByteArray>
  • Add a possibility to declare a byte array through the language (b"c0ffee")
  • Be consistent with cross-type operator logic
    • typeof b"ff" -> "bytes"
    • b"89899" as string -> "89899"
    • "A normal string" as bytes -> b"41206E6F726D616C20737472696E67"
    • b"89899" > 5 -> true
    • b"89899" > "normal-string" -> false (incompatible comparison)
  • JSON conversion as tagged object (similar to _multimap): {'_bytes': true, 'number': <number-conv-if-possible>, 'data': 'data-as-base64'}
  • Move toDisplayString() to the erdblick inspection converter.
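
The string-to-bytes example in the list above can be checked with a small sketch. The helper name `toHexUpper` is hypothetical; it only reproduces the hex form shown in the TODO item and is not the cast implementation from this PR.

```cpp
#include <string>
#include <string_view>

// Hypothetical helper: every input byte becomes two uppercase hex digits,
// matching the form used in the TODO example.
inline std::string toHexUpper(std::string_view bytes)
{
    static constexpr char digits[] = "0123456789ABCDEF";
    std::string out;
    out.reserve(bytes.size() * 2);
    for (char c : bytes) {
        const auto b = static_cast<unsigned char>(c);
        out.push_back(digits[b >> 4]);    // high nibble
        out.push_back(digits[b & 0x0F]);  // low nibble
    }
    return out;
}

// toHexUpper("A normal string") == "41206E6F726D616C20737472696E67"
```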

josephbirkner and others added 4 commits February 10, 2026 07:20
…>. Add a possibility to declare a byte array through the language. Fix consistency with cross-type operator logic. Add JSON conversion as tagged object (similar to _multimap). Remove toDisplayString().
…g. Fix ByteArray display format not round-trippable with literal syntax (parse b string literal as hex bytes).

@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 2 potential issues.

@josephbirkner josephbirkner dismissed johannes-wolf’s stale review February 12, 2026 16:57

PR comments are addressed :)

@github-actions

Package               Line Rate           Branch Rate
include.simfil        24%                 10%
include.simfil.model  89%                 57%
src                   74%                 46%
src.model             74%                 43%
Summary               41% (6174 / 14923)  25% (4030 / 15857)

@johannes-wolf johannes-wolf self-requested a review February 13, 2026 11:34

@johannes-wolf johannes-wolf (Collaborator) left a comment

Looking good!

@Waguramu Waguramu merged commit 6976c70 into v0.6.3 Feb 13, 2026
7 checks passed
@Waguramu Waguramu deleted the byte-array branch February 13, 2026 14:14