
Endianness #104

Closed · 5 tasks done

jklmnn opened this issue Dec 2, 2019 · 16 comments · Fixed by #910
jklmnn (Member) commented Dec 2, 2019

Currently the parsing is done only in big endian (network byte order). It should also be possible to specify host byte order (or more explicitly little endian).
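For illustration (my own sketch, not project code), the same raw bytes decode to different values depending on which byte order is assumed:

```python
# Hypothetical illustration: a 16-bit field read from the same two bytes
# yields different values under the two byte orders.
data = b"\x00\x04"

big = int.from_bytes(data, "big")        # network byte order
little = int.from_bytes(data, "little")  # host order on x86, for example

print(big, little)  # 4 1024
```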

Work items:

@treiher treiher added this to To do in RecordFlux Future via automation Dec 4, 2019
senier (Member) commented Feb 17, 2020

@KOLANICH Thanks for the pointers! This is quite a lengthy discussion and a lot of information to digest. As our DSL is influenced heavily by Ada (and the SPARK subset of Ada is also our target language), we will probably try to stay syntactically close to what the Ada community is doing. Here is a discussion on how they handle the endianness issue.

@treiher treiher added the v0.4.1 label Jul 14, 2020
@treiher treiher added the v0.5.0 label Jul 9, 2021
@treiher treiher removed this from To do in RecordFlux Future Jul 26, 2021
@treiher treiher added this to To do in RecordFlux 0.6 via automation Jul 26, 2021
@senier senier moved this from To do to Design in RecordFlux 0.6 Nov 22, 2021
kanigsson (Collaborator) commented Nov 23, 2021

Here is a first design.

Language design

A message type definition can have a Byte_Order aspect (written with Byte_Order => ...) with two possible values:

  • High_Order_First (default)
  • Low_Order_First

This corresponds to the Ada/GNAT aspect/attribute.

Implementation changes

  • We would add a non-generic parameter Byte_Order to RFLX.Generic_Types.Extract and Insert, which specifies the endianness.
  • This info would be passed on to the internal Extract_U64 and Insert_U64 via a non-generic parameter
  • The code of these two subprograms needs to be adapted.

This last part looks quite hard as the code is already intricate. But the change is very local and can be easily unit-tested.
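To sketch the proposed interface in Python (my own hedged model, not the actual generated Ada; the names ByteOrder, extract, and insert are placeholders), passing the byte order as an ordinary non-generic parameter could look like this for byte-aligned fields:

```python
from enum import Enum

class ByteOrder(Enum):
    HIGH_ORDER_FIRST = "big"     # network byte order, the default
    LOW_ORDER_FIRST = "little"

def extract(data: bytes, byte_order: ByteOrder) -> int:
    # Byte-aligned sketch of Extract: the endianness is a plain
    # run-time parameter chosen per call.
    return int.from_bytes(data, byte_order.value)

def insert(value: int, size: int, byte_order: ByteOrder) -> bytes:
    # Corresponding sketch of Insert for a field of `size` bytes.
    return value.to_bytes(size, byte_order.value)
```

For example, extract(b"\x00\x04", ByteOrder.HIGH_ORDER_FIRST) and extract(b"\x04\x00", ByteOrder.LOW_ORDER_FIRST) both yield 4.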

kanigsson (Collaborator)

P.S. In our discussion with @senier I was surprised that we wanted to support little endian even if the field is not byte-aligned. But having looked at the code now I understand things better. It's not obvious how we would benefit from such a restriction even if we introduced it.

treiher (Collaborator) commented Nov 24, 2021

> It's not obvious how we would benefit from such a restriction even if we introduced it.

I agree, we should not introduce any restriction if there is no benefit.

> Here is a first design.
>
> Language design
>
> A scalar type definition can have an aspect with Scalar_Storage_Order => with two possible values:

I think the aspect should be called Byte_Order and the aspect should be associated with a message. The aspect will affect the interpretation of all scalar fields of a message. In the future, we could consider adding a Byte_Order aspect for message fields, so that the behavior can be overwritten for single fields, but we do not need that for now.

> Implementation changes
>
>   • I would add a generic parameter Endianness (or Byte_Order?) to RFLX.Generic_Types.Extract and Insert, which specifies the endianness.
>   • Clients of the generic interface would provide the endianness of the type at instantiation time
>   • This info would be passed on to the internal Extract_U64 and Insert_U64 via a non-generic parameter
>   • The code of these two subprograms needs to be adapted.

I think the parameter should be called Byte_Order (to be consistent and differentiate it from the bit ordering). What is the advantage of using a generic parameter instead of a non-generic parameter for the generic Insert/Extract?

> This last part looks quite hard as the code is already intricate. But the change is very local and can be easily unit-tested.

I agree. It would make sense to get some samples from real-world messages to validate the implementation. SPDM would be a sensible option.

kanigsson (Collaborator) commented Nov 24, 2021

> What is the advantage of using a generic parameter instead of a non-generic parameter for the generic Insert/Extract?

The advantage would be that the Byte_Order is specified by the client at one point only, so we can avoid the mistake where different byte orderings would be used throughout the application.
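The trade-off can be sketched in Python (my own analogy, not RecordFlux code): a generic parameter is like fixing the byte order once at "instantiation", so all later uses are guaranteed to agree, whereas a non-generic parameter is chosen anew at every call:

```python
from functools import partial

def extract(data: bytes, byte_order: str) -> int:
    # Non-generic style: byte_order is picked at each call site,
    # so different call sites could disagree.
    return int.from_bytes(data, byte_order)

# Generic-parameter analogue: the byte order is bound once, and every
# later use of extract_le necessarily uses the same ordering.
extract_le = partial(extract, byte_order="little")
```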

I have updated the design to reflect your comments.

treiher (Collaborator) commented Nov 25, 2021

> The advantage would be that the Byte_Order is specified by the client at one point only, so we can avoid the mistake where different byte orderings would be used throughout the application.

Extract/Insert is only used in generated code. For messages, their use is limited to Get_Field_Value and Set_Field_Value. So I'm not really convinced that a generic parameter is beneficial. For just supporting byte ordering for a complete message, both variants would be simple to generate correctly (setting the same byte ordering for all instantiations or all calls makes little difference). But when we add support for different endianness for different message fields in the future, a non-generic parameter would simplify the code generation (as each field is represented by a call).

kanigsson (Collaborator)

I'm OK with a non-generic parameter. I'm updating the design.

@senier senier moved this from Design to Implementation in RecordFlux 0.6 Nov 29, 2021
kanigsson (Collaborator)

Summary from our live discussion. The Extract / Insert functions are quite intricate and a large change/rewrite should be avoided if possible. So if small tweaks are enough to support endianness, that would be preferable. Alex suggested as a first step to implement a separate function to support little endian. This could suggest the necessary modifications to the original functions. If the functions turn out to be very different, the final design could even consist of using separate functions for little and big endianness.

kanigsson (Collaborator)

Coming back to this point. Is it really worthwhile to have a single Extract function which does both little and big endian? Same question for Insert obviously. I think both for proof and execution it will be simpler/faster to have separate versions for little and big endian.

kanigsson (Collaborator)

Checking in here before I go any further, to avoid losing time working in a wrong direction. I have written a version of Extract for little endian, U64_Extract_Little, copy-pasted below.

   function U64_Extract_Little
      (Data       : Bytes;
       Off        : Offset;
       Value_Size : Positive) return U64 with
     Pre =>
       (Value_Size in 1 .. U64'Size
        and then Long_Integer ((Natural (Off) + Value_Size - 1) / Byte'Size) < Data'Length
        and then (Natural (Off) + Value_Size - 1) / Byte'Size <= Natural'Size
        and then (Byte'Size - Natural (Off) mod Byte'Size) < Long_Integer'Size - 1),
     Post =>
       (if Value_Size < U64'Size then U64_Extract_Little'Result < 2**Value_Size)
   is
      LSE_Index : constant Index := Index (Natural (Data'Last) - Natural (Off) / Byte'Size);
      MSE_Index : constant Index := Index (Natural (Data'Last) - (Natural (Off) + Value_Size - 1) / Byte'Size);

      LSE_Offset : constant Natural := Natural (Natural (Off) mod Byte'Size);
      LSE_Size   : constant Natural := Byte'Size - LSE_Offset;

      MSE_Size   : constant Natural := (LSE_Offset + Value_Size + Byte'Size - 1) mod Byte'Size + 1;
      Result : U64 := 0;
      Bits : Natural with Ghost;

      --  Data is assumed to contain Amount bits of information. We append
      --  those bits to the current value of Result. We also update the Bits
      --  ghost variable.
      procedure Shift_Add (Data : U64; Amount : Natural)
      with Pre  => Bits < U64'Size and then Amount < U64'Size and then Result < 2 ** Bits and then U64'Size - Amount >= Bits and then Data < 2 ** Amount,
           Post => (if Bits < U64'Size then Result < 2 ** Bits) and then Bits = Bits'Old + Amount;

      procedure Shift_Add (Data : U64; Amount : Natural) is
      begin
         Result := Shift_Left (Result, Amount) + Data;
         Bits := Bits + Amount;
      end Shift_Add;

      pragma Warnings (Off, "postcondition does not check the outcome of calling ""Lemma_Bound_And""");
      procedure Lemma_Bound_And (X : U64; Bound : Natural)
        with Pre => Bound < U64'Size,
             Post => (X and (2 ** Bound - 1)) < 2 ** Bound
      is
      begin
         null;
      end Lemma_Bound_And;
      pragma Warnings (On, "postcondition does not check the outcome of calling ""Lemma_Bound_And""");

   begin
      --  This function simply iterates over all data bytes that contain
      --  relevant data, from most significant to least significant, and adds
      --  them up in Result, shifting the Result before the addition as needed
      --  (see helper function Shift_Add).

      --  The ghost variable Bits stores the number of bits that are contained
      --  in Result. This is useful to bound the current value of Result by
      --  2 ** Bits. At the end of the function, Bits = Value_Size.

      --  We start with the most significant byte, which is LSE_Index in little
      --  endian. We need to take into account the case where this is the
      --  last/only byte.
      declare
         Tmp : U64 := Byte'Pos (Data (LSE_Index));
      begin
         if LSE_Index = MSE_Index then
            Tmp := Tmp and (2 ** MSE_Size - 1);
         end if;
         Tmp := Shift_Right (Tmp, LSE_Offset);
         Result := Result + Tmp;
      end;

      --  If it was the only byte, we are done.

      if LSE_Index = MSE_Index then
         return Result;
      end if;

      --  The Bits variable is not needed for the previous early return, so we
      --  update it only here. Also, setting it to LSE_Size is only correct if
      --  we did not return early, so this is the best place to update it.
      Bits := LSE_Size;

      --  We now iterate over the "inner bytes" excluding the two extreme bytes.
      for I in reverse MSE_Index + 1 .. LSE_Index - 1 loop
         pragma Loop_Invariant
           (Bits = Natural (LSE_Index - I) * Byte'Size - LSE_Offset
            and then Result < 2 ** Bits);
         Shift_Add (Byte'Pos (Data (I)), Byte'Size);
      end loop;

      --  We now add the relevant bits from the last byte (MSE_Index in little
      --  endian). Some helper assertions and a lemma are needed.
      pragma Assert (MSE_Size < U64'Size);
      Lemma_Bound_And (Byte'Pos (Data (MSE_Index)), MSE_Size);
      Shift_Add (Byte'Pos (Data (MSE_Index)) and (2 ** MSE_Size - 1), MSE_Size);
      pragma Assert (Bits = Value_Size);
      pragma Assert (if Bits < U64'Size then Result < 2 ** Bits);
      return Result;
   end U64_Extract_Little;

Some comments:

  • The structure of Extract_Little is different from Extract, as I was unable to sufficiently understand Extract to adapt it to little endian. The structure is now similar to Insert; that is, it just proceeds byte by byte through the input data.
  • I think it would be easy to make a big endian version of it (that is, rewrite Extract in the style of my Extract_Little). However, it would be cumbersome to write a single function that does both.
  • The code above is fully proved; however, Colibri is needed for one check (the postcondition).
  • The code doesn't use the various helper functions of RFLX.Arithmetic; in particular, it uses the regular Shift_Left and Shift_Right.
  • Going more into detail, the function defines the local variables LSE_Index and MSE_Index according to the picture in the comments, while the existing Extract and Insert use another definition and then use translation functions to map the indices back to their original meaning.
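To cross-check the index and size computations, here is a direct Python port of the routine above (my own sketch for experimentation; it mirrors the Ada logic but drops the proof artifacts):

```python
def u64_extract_little(data: bytes, off: int, value_size: int) -> int:
    # Mirrors U64_Extract_Little: off counts bits from the LSB of the
    # last byte; in little endian, lower indices hold less significant bytes.
    lse_index = len(data) - 1 - off // 8
    mse_index = len(data) - 1 - (off + value_size - 1) // 8
    lse_offset = off % 8
    mse_size = (lse_offset + value_size + 7) % 8 + 1

    # Start with the most significant byte (LSE_Index in little endian),
    # handling the case where it is the only byte.
    tmp = data[lse_index]
    if lse_index == mse_index:
        tmp &= 2**mse_size - 1
    result = tmp >> lse_offset
    if lse_index == mse_index:
        return result

    # Inner bytes, excluding the two extreme ones.
    for i in range(lse_index - 1, mse_index, -1):
        result = (result << 8) + data[i]

    # Append the relevant low bits of the last byte (MSE_Index).
    return (result << mse_size) + (data[mse_index] & (2**mse_size - 1))
```

For byte-aligned fields this agrees with int.from_bytes(data, "little"), e.g. u64_extract_little(b"\x34\x12", 0, 16) == 0x1234.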

Going forward, my next steps would be:

  • Proceed to the Insert function. I think here I can adapt the existing Insert as the structure is quite simple. Again I would write a second Insert_Little function instead of a single function with an extra parameter as in the initial design.
  • (Optional) rewrite Extract in the style of Extract_Little for conformity
  • (Optional) adapt existing code to avoid dependency on RFLX.Arithmetic and remove that unit.

Any opinions on this before I go any further?

treiher (Collaborator) commented Dec 8, 2021

> Is it really worthwhile to have a single Extract function which does both little and big endian? Same question for Insert obviously. I think both for proof and execution it will be simpler/faster to have separate versions for little and big endian.

I think it is fine to have separate functions for LE and BE. Nevertheless, both functions should not be completely different, but use the same approach.

>   • The code above is fully proved; however, Colibri is needed for one check (the postcondition).

I suppose that will not be a viable solution at the moment, as our CI pipeline uses Community 2021 which does not include Colibri. I don't think we can easily switch to another version. Not testing it in the CI pipeline is also not an option.

>   • The code doesn't use the various helper functions of RFLX.Arithmetic; in particular, it uses the regular Shift_Left and Shift_Right.

As far as I remember, we did not use the built-in shift functions, because they did not have the necessary contracts to prove anything about their results. Isn't this the case in Community 2021 anymore?

> Going forward, my next steps would be:
>
>   • Proceed to the Insert function. I think here I can adapt the existing Insert as the structure is quite simple. Again I would write a second Insert_Little function instead of a single function with an extra parameter as in the initial design.
>   • (Optional) rewrite Extract in the style of Extract_Little for conformity
>   • (Optional) adapt existing code to avoid dependency on RFLX.Arithmetic and remove that unit.

That makes sense. Ideally, all the Insert and Extract functions use a similar approach, so we should rewrite existing functions if necessary. Getting rid of RFLX.Arithmetic by using built-in functions would definitely be an advantage. If it is possible, we should do it.

kanigsson (Collaborator)

> I suppose that will not be a viable solution at the moment, as our CI pipeline uses Community 2021 which does not include Colibri. I don't think we can easily switch to another version. Not testing it in the CI pipeline is also not an option.

I was using Community 2021 for my experiments. It contains Colibri, but it was not advertised as it was still experimental. But I don't think there were any known unsoundness issues. If that's not acceptable, I can also try to eliminate Colibri from the required provers.

> As far as I remember, we did not use the built-in shift functions, because they did not have the necessary contracts to prove anything about their results. Isn't this the case in Community 2021 anymore?

The built-in Shift functions have had the relevant contracts for quite a while I think, and definitely in CE 2021.

treiher (Collaborator) commented Dec 9, 2021

> I was using Community 2021 for my experiments. It contains Colibri, but it was not advertised as it was still experimental. But I don't think there were any known unsoundness issues. If that's not acceptable, I can also try to eliminate Colibri from the required provers.

I see. If you don't see an increased probability of soundness issues by enabling Colibri, I think it is fine to use it.

kanigsson added a commit to AdaCore/RecordFlux-parser that referenced this issue Jan 11, 2022
kanigsson added a commit to AdaCore/RecordFlux-parser that referenced this issue Jan 12, 2022
kanigsson added a commit that referenced this issue Jan 13, 2022
Use the correct Insert/Extract functions depending on the Byte_Order
specification of the message.
Adding a test.
This requires Recordflux-parser 0.10.0
kanigsson added a commit that referenced this issue Jan 13, 2022
kanigsson added a commit that referenced this issue Jan 14, 2022
treiher (Collaborator) commented Jan 17, 2022

Here are some tests for the changes on PyRFLX:

diff --git a/tests/data/fixtures/pyrflx.py b/tests/data/fixtures/pyrflx.py
index 840cbee23..b80b68bd0 100644
--- a/tests/data/fixtures/pyrflx.py
+++ b/tests/data/fixtures/pyrflx.py
@@ -27,6 +27,7 @@ def fixture_pyrflx() -> pyrflx.PyRFLX:
             f"{SPEC_DIR}/message_type_size_condition.rflx",
             f"{SPEC_DIR}/always_valid_aspect.rflx",
             f"{SPEC_DIR}/parameterized.rflx",
+            f"{SPEC_DIR}/endianness.rflx",
         ],
         skip_model_verification=True,
     )
@@ -225,3 +226,8 @@ def fixture_always_valid_aspect_value(
 @pytest.fixture(name="parameterized_package", scope="session")
 def fixture_parameterized_package(pyrflx_: pyrflx.PyRFLX) -> pyrflx.Package:
     return pyrflx_.package("Parameterized")
+
+
+@pytest.fixture(name="endianness_package", scope="session")
+def fixture_endianness_package(pyrflx_: pyrflx.PyRFLX) -> pyrflx.Package:
+    return pyrflx_.package("Endianness")
diff --git a/tests/data/specs/endianness.rflx b/tests/data/specs/endianness.rflx
new file mode 100644
index 000000000..1ad326d0c
--- /dev/null
+++ b/tests/data/specs/endianness.rflx
@@ -0,0 +1,31 @@
+package Endianness is
+
+   type Tag is (None => 1, Data => 2) with Size => 16;
+   type Length is range 0 .. 2 ** 14 - 1 with Size => 16;
+
+   type Message is
+      message
+         Tag : Tag
+            then null
+               if Tag = None
+            then Length
+               if Tag = Data;
+         Length : Length;
+         Payload : Opaque
+            with Size => Length * 8;
+      end message;
+
+   type Message_LE is
+      message
+         Tag : Tag
+            then null
+               if Tag = None
+            then Length
+               if Tag = Data;
+         Length : Length;
+         Payload : Opaque
+            with Size => Length * 8;
+      end message
+         with Byte_Order => Low_Order_First;
+
+end Endianness;
diff --git a/tests/unit/pyrflx_test.py b/tests/unit/pyrflx_test.py
index 3d6575f8a..efdd12917 100644
--- a/tests/unit/pyrflx_test.py
+++ b/tests/unit/pyrflx_test.py
@@ -1439,3 +1439,67 @@ def test_json_serialization() -> None:
     opaque_value = OpaqueValue(Opaque())
     opaque_value.assign(b"RecordFlux")
     assert opaque_value.as_json() == b"RecordFlux"
+
+
+def test_message_endianness_parse_be(endianness_package: Package) -> None:
+    message = endianness_package.new_message("Message")
+
+    message.parse(b"\x00\x01")
+
+    assert message.valid_message
+    assert message.get("Tag") == "None"
+
+    message.parse(b"\x00\x02\x00\x04\x01\x02\x03\x04")
+
+    assert message.valid_message
+    assert message.get("Tag") == "Data"
+    assert message.get("Length") == 4
+    assert message.get("Payload") == b"\x01\x02\x03\x04"
+
+
+def test_message_endianness_parse_le(endianness_package: Package) -> None:
+    message_le = endianness_package.new_message("Message_LE")
+
+    message_le.parse(b"\x01\x00")
+
+    assert message_le.valid_message
+    assert message_le.get("Tag") == "None"
+
+    message_le.parse(b"\x02\x00\x04\x00\x01\x02\x03\x04")
+
+    assert message_le.valid_message
+    assert message_le.get("Tag") == "Data"
+    assert message_le.get("Length") == 4
+    assert message_le.get("Payload") == b"\x01\x02\x03\x04"
+
+
+def test_message_endianness_set_be(endianness_package: Package) -> None:
+    message = endianness_package.new_message("Message")
+
+    message.set("Tag", "None")
+
+    assert message.valid_message
+    assert message.bytestring == b"\x00\x01"
+
+    message.set("Tag", "Data")
+    message.set("Length", 4)
+    message.set("Payload", b"\x01\x02\x03\x04")
+
+    assert message.valid_message
+    assert message.bytestring == b"\x00\x02\x00\x04\x01\x02\x03\x04"
+
+
+def test_message_endianness_set_le(endianness_package: Package) -> None:
+    message_le = endianness_package.new_message("Message_LE")
+
+    message_le.set("Tag", "None")
+
+    assert message_le.valid_message
+    assert message_le.bytestring == b"\x01\x00"
+
+    message_le.set("Tag", "Data")
+    message_le.set("Length", 4)
+    message_le.set("Payload", b"\x01\x02\x03\x04")
+
+    assert message_le.valid_message
+    assert message_le.bytestring == b"\x02\x00\x04\x00\x01\x02\x03\x04"
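The byte strings in the tests above can be double-checked with Python's struct module (a quick sanity check of the vectors, not part of the patch): the Tag/Length pair (2, 4) packs differently per endianness, while the opaque Payload is byte-identical in both:

```python
import struct

# Big-endian (network order) Tag=2, Length=4, as in Message
be = struct.pack(">HH", 2, 4) + b"\x01\x02\x03\x04"

# Little-endian Tag=2, Length=4, as in Message_LE
le = struct.pack("<HH", 2, 4) + b"\x01\x02\x03\x04"

print(be)  # b'\x00\x02\x00\x04\x01\x02\x03\x04'
print(le)  # b'\x02\x00\x04\x00\x01\x02\x03\x04'
```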

kanigsson (Collaborator)

Thanks, that helps a lot. I was able to modify pyrflx to pass your tests, though I'm not confident in my patch ... I will put it on review when the other patches are through.

kanigsson added a commit that referenced this issue Jan 18, 2022
kanigsson added a commit that referenced this issue Jan 18, 2022
@kanigsson kanigsson moved this from Implementation to Review in RecordFlux 0.6 Jan 18, 2022
@treiher treiher linked a pull request Jan 19, 2022 that will close this issue
RecordFlux 0.6 automation moved this from Review to Done Jan 19, 2022
adacore-bot pushed a commit that referenced this issue May 15, 2023
Also, bump version to 0.10.0

For #104