
Endianness #104

Closed · 5 tasks done

jklmnn opened this issue Dec 2, 2019 · 16 comments · Fixed by #910
jklmnn (Member) commented Dec 2, 2019

Currently the parsing is done only in big endian (network byte order). It should also be possible to specify host byte order (or more explicitly little endian).
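For illustration (my own sketch, not project code), the same raw bytes decode to different values depending on which byte order is assumed:

```python
# Hypothetical illustration: a 16-bit field read from the same two bytes
# yields different values under the two byte orders.
data = b"\x00\x04"

big = int.from_bytes(data, "big")        # network byte order
little = int.from_bytes(data, "little")  # host order on x86, for example

print(big, little)  # 4 1024
```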

Work items:

@treiher treiher added this to To do in RecordFlux Future via automation Dec 4, 2019
senier (Member) commented Feb 17, 2020

@KOLANICH Thanks for the pointers! This is quite a lengthy discussion and a lot of information to digest. As our DSL is influenced heavily by Ada (and the SPARK subset of Ada is also our target language), we will probably try to stay syntactically close to what the Ada community is doing. Here is a discussion on how they handle the endianness issue.

@treiher treiher added the v0.4.1 label Jul 14, 2020
@treiher treiher added the v0.5.0 label Jul 9, 2021
@treiher treiher removed this from To do in RecordFlux Future Jul 26, 2021
@treiher treiher added this to To do in RecordFlux 0.6 via automation Jul 26, 2021
@senier senier moved this from To do to Design in RecordFlux 0.6 Nov 22, 2021
kanigsson (Collaborator) commented Nov 23, 2021

Here is a first design.

Language design

A message type definition can have a Byte_Order aspect (written with Byte_Order => ...) with two possible values:

  • High_Order_First (default)
  • Low_Order_First

This corresponds to the Ada/GNAT aspect/attribute.

Implementation changes

  • We would add a non-generic parameter Byte_Order to RFLX.Generic_Types.Extract and Insert, which specifies the endianness.
  • This info would be passed on to the internal Extract_U64 and Insert_U64 via a non-generic parameter
  • The code of these two subprograms needs to be adapted.

This last part looks quite hard as the code is already intricate. But the change is very local and can be easily unit-tested.
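To sketch the proposed interface in Python (my own hedged model, not the actual generated Ada; the names ByteOrder, extract, and insert are placeholders), passing the byte order as an ordinary non-generic parameter could look like this for byte-aligned fields:

```python
from enum import Enum

class ByteOrder(Enum):
    HIGH_ORDER_FIRST = "big"     # network byte order, the default
    LOW_ORDER_FIRST = "little"

def extract(data: bytes, byte_order: ByteOrder) -> int:
    # Byte-aligned sketch of Extract: the endianness is a plain
    # run-time parameter chosen per call.
    return int.from_bytes(data, byte_order.value)

def insert(value: int, size: int, byte_order: ByteOrder) -> bytes:
    # Corresponding sketch of Insert for a field of `size` bytes.
    return value.to_bytes(size, byte_order.value)
```

For example, extract(b"\x00\x04", ByteOrder.HIGH_ORDER_FIRST) and extract(b"\x04\x00", ByteOrder.LOW_ORDER_FIRST) both yield 4.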

kanigsson (Collaborator)

P.S. In our discussion with @senier I was surprised that we wanted to support little endian even if the field is not byte-aligned. But having looked at the code now I understand things better. It's not obvious how we would benefit from such a restriction even if we introduced it.

treiher (Collaborator) commented Nov 24, 2021

> It's not obvious how we would benefit from such a restriction even if we introduced it.

I agree, we should not introduce any restriction if there is no benefit.

> Here is a first design.
>
> Language design
>
> A scalar type definition can have an aspect with Scalar_Storage_Order => with two possible values:

I think the aspect should be called Byte_Order and the aspect should be associated with a message. The aspect will affect the interpretation of all scalar fields of a message. In the future, we could consider adding a Byte_Order aspect for message fields, so that the behavior can be overwritten for single fields, but we do not need that for now.

> Implementation changes
>
>   • I would add a generic parameter Endianness (or Byte_Order?) to RFLX.Generic_Types.Extract and Insert, which specifies the endianness.
>   • Clients of the generic interface would provide the endianness of the type at instantiation time
>   • This info would be passed on to the internal Extract_U64 and Insert_U64 via a non-generic parameter
>   • The code of these two subprograms needs to be adapted.

I think the parameter should be called Byte_Order (to be consistent and differentiate it from the bit ordering). What is the advantage of using a generic parameter instead of a non-generic parameter for the generic Insert/Extract?

> This last part looks quite hard as the code is already intricate. But the change is very local and can be easily unit-tested.

I agree. It would make sense to get some samples from real-world messages to validate the implementation. SPDM would be a sensible option.

kanigsson (Collaborator) commented Nov 24, 2021

> What is the advantage of using a generic parameter instead of a non-generic parameter for the generic Insert/Extract?

The advantage would be that the Byte_Order is specified by the client at one point only, so we can avoid the mistake where different byte orderings would be used throughout the application.
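The trade-off can be sketched in Python (my own analogy, not RecordFlux code): a generic parameter is like fixing the byte order once at "instantiation", so all later uses are guaranteed to agree, whereas a non-generic parameter is chosen anew at every call:

```python
from functools import partial

def extract(data: bytes, byte_order: str) -> int:
    # Non-generic style: byte_order is picked at each call site,
    # so different call sites could disagree.
    return int.from_bytes(data, byte_order)

# Generic-parameter analogue: the byte order is bound once, and every
# later use of extract_le necessarily uses the same ordering.
extract_le = partial(extract, byte_order="little")
```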

I have updated the design to reflect your comments.

treiher (Collaborator) commented Nov 25, 2021

> The advantage would be that the Byte_Order is specified by the client at one point only, so we can avoid the mistake where different byte orderings would be used throughout the application.

Extract/Insert is only used in generated code. For messages, their use is limited to Get_Field_Value and Set_Field_Value. So I'm not really convinced that a generic parameter is beneficial. For just supporting byte ordering for a complete message, both variants would be simple to generate correctly (setting the same byte ordering for all instantiations or all calls makes little difference). But when we add support for different endianness for different message fields in the future, a non-generic parameter would simplify the code generation (as each field is represented by a call).

kanigsson (Collaborator)

I'm OK with a non-generic parameter. I'm updating the design.

@senier senier moved this from Design to Implementation in RecordFlux 0.6 Nov 29, 2021
kanigsson (Collaborator)

Summary from our live discussion. The Extract / Insert functions are quite intricate and a large change/rewrite should be avoided if possible. So if small tweaks are enough to support endianness, that would be preferable. Alex suggested as a first step to implement a separate function to support little endian. This could suggest the necessary modifications to the original functions. If the functions turn out to be very different, the final design could even consist of using separate functions for little and big endianness.

kanigsson (Collaborator)

Coming back to this point. Is it really worthwhile to have a single Extract function which does both little and big endian? Same question for Insert obviously. I think both for proof and execution it will be simpler/faster to have separate versions for little and big endian.

kanigsson (Collaborator)

Checking in here before I go any further, to avoid losing time working in a wrong direction. I have written a version of Extract for little endian, U64_Extract_Little, copy-pasted below.

   function U64_Extract_Little
      (Data       : Bytes;
       Off        : Offset;
       Value_Size : Positive) return U64 with
     Pre =>
       (Value_Size in 1 .. U64'Size
        and then Long_Integer ((Natural (Off) + Value_Size - 1) / Byte'Size) < Data'Length
        and then (Natural (Off) + Value_Size - 1) / Byte'Size <= Natural'Size
        and then (Byte'Size - Natural (Off) mod Byte'Size) < Long_Integer'Size - 1),
     Post =>
       (if Value_Size < U64'Size then U64_Extract_Little'Result < 2**Value_Size)
   is
      LSE_Index : constant Index := Index (Natural (Data'Last) - Natural (Off) / Byte'Size);
      MSE_Index : constant Index := Index (Natural (Data'Last) - (Natural (Off) + Value_Size - 1) / Byte'Size);

      LSE_Offset : constant Natural := Natural (Natural (Off) mod Byte'Size);
      LSE_Size   : constant Natural := Byte'Size - LSE_Offset;

      MSE_Size   : constant Natural := (LSE_Offset + Value_Size + Byte'Size - 1) mod Byte'Size + 1;
      Result : U64 := 0;
      Bits : Natural with Ghost;

      --  Data is assumed to contain Amount bits of information. We append
      --  those bits to the current value of Result. We also update the Bits
      --  ghost variable.
      procedure Shift_Add (Data : U64; Amount : Natural)
      with Pre  => Bits < U64'Size and then Amount < U64'Size and then Result < 2 ** Bits and then U64'Size - Amount >= Bits and then Data < 2 ** Amount,
           Post => (if Bits < U64'Size then Result < 2 ** Bits) and then Bits = Bits'Old + Amount;

      procedure Shift_Add (Data : U64; Amount : Natural) is
      begin
         Result := Shift_Left (Result, Amount) + Data;
         Bits := Bits + Amount;
      end Shift_Add;

      pragma Warnings (Off, "postcondition does not check the outcome of calling ""Lemma_Bound_And""");
      procedure Lemma_Bound_And (X : U64; Bound : Natural)
        with Pre => Bound < U64'Size,
             Post => (X and (2 ** Bound - 1)) < 2 ** Bound
      is
      begin
         null;
      end Lemma_Bound_And;
      pragma Warnings (On, "postcondition does not check the outcome of calling ""Lemma_Bound_And""");

   begin
      --  This function simply iterates over all data bytes that contain
      --  relevant data, from most significant to least significant, and adds
      --  them up in Result, shifting the Result before the addition as needed
      --  (see helper function Shift_Add).

      --  The ghost variable Bits stores the number of bits that are contained
      --  in Result. This is useful to bound the current value of Result by
      --  2 ** Bits. At the end of the function, Bits = Value_Size.

      --  We start with the most significant byte, which is LSE_Index in little
      --  endian. We need to take into account the case where this is the
      --  last/only byte.
      declare
         Tmp : U64 := Byte'Pos (Data (LSE_Index));
      begin
         if LSE_Index = MSE_Index then
            Tmp := Tmp and (2 ** MSE_Size - 1);
         end if;
         Tmp := Shift_Right (Tmp, LSE_Offset);
         Result := Result + Tmp;
      end;

      --  If it was the only byte, we are done.

      if LSE_Index = MSE_Index then
         return Result;
      end if;

      --  The Bits variable is not needed for the previous early return, so we
      --  update it only here. Also, setting it to LSE_Size is only correct if
      --  we did not return early, so this is the best place to update it.
      Bits := LSE_Size;

      --  We now iterate over the "inner bytes" excluding the two extreme bytes.
      for I in reverse MSE_Index + 1 .. LSE_Index - 1 loop
         pragma Loop_Invariant
           (Bits = Natural (LSE_Index - I) * Byte'Size - LSE_Offset
            and then Result < 2 ** Bits);
         Shift_Add (Byte'Pos (Data (I)), Byte'Size);
      end loop;

      --  We now add the relevant bits from the last byte (MSE_Index in little
      --  endian). Some helper assertions and a lemma are needed.
      pragma Assert (MSE_Size < U64'Size);
      Lemma_Bound_And (Byte'Pos (Data (MSE_Index)), MSE_Size);
      Shift_Add (Byte'Pos (Data (MSE_Index)) and (2 ** MSE_Size - 1), MSE_Size);
      pragma Assert (Bits = Value_Size);
      pragma Assert (if Bits < U64'Size then Result < 2 ** Bits);
      return Result;
   end U64_Extract_Little;

Some comments:

  • The structure of Extract_Little is different from Extract, as I was unable to sufficiently understand Extract to adapt it to little endian. The structure is now similar to Insert; that is, it just proceeds byte by byte through the input data.
  • I think it would be easy to make a big endian version of it (that is, rewrite Extract in the style of my Extract_Little). However, it would be cumbersome to write a single function that does both.
  • The code above is fully proved; however, Colibri is needed for one check (the postcondition).
  • The code doesn't use the various helper functions of RFLX.Arithmetic; in particular, it uses the regular Shift_Left and Shift_Right.
  • Going more into detail, the function defines the local variables LSE_Index and MSE_Index according to the picture in the comments, while the existing Extract and Insert use another definition and then use translation functions to map the indices back to their original meaning.
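To cross-check the index and size computations, here is a direct Python port of the routine above (my own sketch for experimentation; it mirrors the Ada logic but drops the proof artifacts):

```python
def u64_extract_little(data: bytes, off: int, value_size: int) -> int:
    # Mirrors U64_Extract_Little: off counts bits from the LSB of the
    # last byte; in little endian, lower indices hold less significant bytes.
    lse_index = len(data) - 1 - off // 8
    mse_index = len(data) - 1 - (off + value_size - 1) // 8
    lse_offset = off % 8
    mse_size = (lse_offset + value_size + 7) % 8 + 1

    # Start with the most significant byte (LSE_Index in little endian),
    # handling the case where it is the only byte.
    tmp = data[lse_index]
    if lse_index == mse_index:
        tmp &= 2**mse_size - 1
    result = tmp >> lse_offset
    if lse_index == mse_index:
        return result

    # Inner bytes, excluding the two extreme ones.
    for i in range(lse_index - 1, mse_index, -1):
        result = (result << 8) + data[i]

    # Append the relevant low bits of the last byte (MSE_Index).
    return (result << mse_size) + (data[mse_index] & (2**mse_size - 1))
```

For byte-aligned fields this agrees with int.from_bytes(data, "little"), e.g. u64_extract_little(b"\x34\x12", 0, 16) == 0x1234.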

Going forward, my next steps would be:

  • Proceed to the Insert function. I think here I can adapt the existing Insert as the structure is quite simple. Again I would write a second Insert_Little function instead of a single function with an extra parameter as in the initial design.
  • (Optional) rewrite Extract in the style of Extract_Little for conformity
  • (Optional) adapt existing code to avoid dependency on RFLX.Arithmetic and remove that unit.

Any opinions on this before I go any further?

treiher (Collaborator) commented Dec 8, 2021

> Is it really worthwhile to have a single Extract function which does both little and big endian? Same question for Insert obviously. I think both for proof and execution it will be simpler/faster to have separate versions for little and big endian.

I think it is fine to have separate functions for LE and BE. Nevertheless, both functions should not be completely different, but use the same approach.

>   • The code above is fully proved; however, Colibri is needed for one check (the postcondition).

I suppose that will not be a viable solution at the moment, as our CI pipeline uses Community 2021 which does not include Colibri. I don't think we can easily switch to another version. Not testing it in the CI pipeline is also not an option.

>   • The code doesn't use the various helper functions of RFLX.Arithmetic; in particular, it uses the regular Shift_Left and Shift_Right.

As far as I remember, we did not use the built-in shift functions, because they did not have the necessary contracts to prove anything about their results. Isn't this the case in Community 2021 anymore?

> Going forward, my next steps would be:
>
>   • Proceed to the Insert function. I think here I can adapt the existing Insert as the structure is quite simple. Again I would write a second Insert_Little function instead of a single function with an extra parameter as in the initial design.
>   • (Optional) rewrite Extract in the style of Extract_Little for conformity
>   • (Optional) adapt existing code to avoid dependency on RFLX.Arithmetic and remove that unit.

That makes sense. Ideally, all the Insert and Extract functions use a similar approach, so we should rewrite existing functions if necessary. Getting rid of RFLX.Arithmetic by using built-in functions would definitely be an advantage. If it is possible, we should do it.

kanigsson (Collaborator)

> I suppose that will not be a viable solution at the moment, as our CI pipeline uses Community 2021 which does not include Colibri. I don't think we can easily switch to another version. Not testing it in the CI pipeline is also not an option.

I was using Community 2021 for my experiments. It contains Colibri, but it was not advertised as it was still experimental. But I don't think there were any known unsoundness issues. If that's not acceptable, I can also try to eliminate Colibri from the required provers.

> As far as I remember, we did not use the built-in shift functions, because they did not have the necessary contracts to prove anything about their results. Isn't this the case in Community 2021 anymore?

The built-in Shift functions have had the relevant contracts for quite a while I think, and definitely in CE 2021.

treiher (Collaborator) commented Dec 9, 2021

> I was using Community 2021 for my experiments. It contains Colibri, but it was not advertised as it was still experimental. But I don't think there were any known unsoundness issues. If that's not acceptable, I can also try to eliminate Colibri from the required provers.

I see. If you don't see an increased probability of soundness issues by enabling Colibri, I think it is fine to use it.

kanigsson added a commit to AdaCore/RecordFlux-parser that referenced this issue Jan 11, 2022
kanigsson added a commit to AdaCore/RecordFlux-parser that referenced this issue Jan 12, 2022
kanigsson added a commit that referenced this issue Jan 13, 2022
Use the correct Insert/Extract functions depending on the Byte_Order
specification of the message.
Adding a test.
This requires Recordflux-parser 0.10.0
kanigsson added a commit that referenced this issue Jan 13, 2022
kanigsson added a commit that referenced this issue Jan 14, 2022
treiher (Collaborator) commented Jan 17, 2022

Here are some tests for the changes on PyRFLX:

diff --git a/tests/data/fixtures/pyrflx.py b/tests/data/fixtures/pyrflx.py
index 840cbee23..b80b68bd0 100644
--- a/tests/data/fixtures/pyrflx.py
+++ b/tests/data/fixtures/pyrflx.py
@@ -27,6 +27,7 @@ def fixture_pyrflx() -> pyrflx.PyRFLX:
             f"{SPEC_DIR}/message_type_size_condition.rflx",
             f"{SPEC_DIR}/always_valid_aspect.rflx",
             f"{SPEC_DIR}/parameterized.rflx",
+            f"{SPEC_DIR}/endianness.rflx",
         ],
         skip_model_verification=True,
     )
@@ -225,3 +226,8 @@ def fixture_always_valid_aspect_value(
 @pytest.fixture(name="parameterized_package", scope="session")
 def fixture_parameterized_package(pyrflx_: pyrflx.PyRFLX) -> pyrflx.Package:
     return pyrflx_.package("Parameterized")
+
+
+@pytest.fixture(name="endianness_package", scope="session")
+def fixture_endianness_package(pyrflx_: pyrflx.PyRFLX) -> pyrflx.Package:
+    return pyrflx_.package("Endianness")
diff --git a/tests/data/specs/endianness.rflx b/tests/data/specs/endianness.rflx
new file mode 100644
index 000000000..1ad326d0c
--- /dev/null
+++ b/tests/data/specs/endianness.rflx
@@ -0,0 +1,31 @@
+package Endianness is
+
+   type Tag is (None => 1, Data => 2) with Size => 16;
+   type Length is range 0 .. 2 ** 14 - 1 with Size => 16;
+
+   type Message is
+      message
+         Tag : Tag
+            then null
+               if Tag = None
+            then Length
+               if Tag = Data;
+         Length : Length;
+         Payload : Opaque
+            with Size => Length * 8;
+      end message;
+
+   type Message_LE is
+      message
+         Tag : Tag
+            then null
+               if Tag = None
+            then Length
+               if Tag = Data;
+         Length : Length;
+         Payload : Opaque
+            with Size => Length * 8;
+      end message
+         with Byte_Order => Low_Order_First;
+
+end Endianness;
diff --git a/tests/unit/pyrflx_test.py b/tests/unit/pyrflx_test.py
index 3d6575f8a..efdd12917 100644
--- a/tests/unit/pyrflx_test.py
+++ b/tests/unit/pyrflx_test.py
@@ -1439,3 +1439,67 @@ def test_json_serialization() -> None:
     opaque_value = OpaqueValue(Opaque())
     opaque_value.assign(b"RecordFlux")
     assert opaque_value.as_json() == b"RecordFlux"
+
+
+def test_message_endianness_parse_be(endianness_package: Package) -> None:
+    message = endianness_package.new_message("Message")
+
+    message.parse(b"\x00\x01")
+
+    assert message.valid_message
+    assert message.get("Tag") == "None"
+
+    message.parse(b"\x00\x02\x00\x04\x01\x02\x03\x04")
+
+    assert message.valid_message
+    assert message.get("Tag") == "Data"
+    assert message.get("Length") == 4
+    assert message.get("Payload") == b"\x01\x02\x03\x04"
+
+
+def test_message_endianness_parse_le(endianness_package: Package) -> None:
+    message_le = endianness_package.new_message("Message_LE")
+
+    message_le.parse(b"\x01\x00")
+
+    assert message_le.valid_message
+    assert message_le.get("Tag") == "None"
+
+    message_le.parse(b"\x02\x00\x04\x00\x01\x02\x03\x04")
+
+    assert message_le.valid_message
+    assert message_le.get("Tag") == "Data"
+    assert message_le.get("Length") == 4
+    assert message_le.get("Payload") == b"\x01\x02\x03\x04"
+
+
+def test_message_endianness_set_be(endianness_package: Package) -> None:
+    message = endianness_package.new_message("Message")
+
+    message.set("Tag", "None")
+
+    assert message.valid_message
+    assert message.bytestring == b"\x00\x01"
+
+    message.set("Tag", "Data")
+    message.set("Length", 4)
+    message.set("Payload", b"\x01\x02\x03\x04")
+
+    assert message.valid_message
+    assert message.bytestring == b"\x00\x02\x00\x04\x01\x02\x03\x04"
+
+
+def test_message_endianness_set_le(endianness_package: Package) -> None:
+    message_le = endianness_package.new_message("Message_LE")
+
+    message_le.set("Tag", "None")
+
+    assert message_le.valid_message
+    assert message_le.bytestring == b"\x01\x00"
+
+    message_le.set("Tag", "Data")
+    message_le.set("Length", 4)
+    message_le.set("Payload", b"\x01\x02\x03\x04")
+
+    assert message_le.valid_message
+    assert message_le.bytestring == b"\x02\x00\x04\x00\x01\x02\x03\x04"
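The byte strings in the tests above can be double-checked with Python's struct module (a quick sanity check of the vectors, not part of the patch): the Tag/Length pair (2, 4) packs differently per endianness, while the opaque Payload is byte-identical in both:

```python
import struct

# Big-endian (network order) Tag=2, Length=4, as in Message
be = struct.pack(">HH", 2, 4) + b"\x01\x02\x03\x04"

# Little-endian Tag=2, Length=4, as in Message_LE
le = struct.pack("<HH", 2, 4) + b"\x01\x02\x03\x04"

print(be)  # b'\x00\x02\x00\x04\x01\x02\x03\x04'
print(le)  # b'\x02\x00\x04\x00\x01\x02\x03\x04'
```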

kanigsson (Collaborator)

Thanks, that helps a lot. I was able to modify pyrflx to pass your tests, though I'm not confident in my patch ... I will put it on review when the other patches are through.

kanigsson added a commit that referenced this issue Jan 18, 2022
kanigsson added a commit that referenced this issue Jan 18, 2022
@kanigsson kanigsson moved this from Implementation to Review in RecordFlux 0.6 Jan 18, 2022
@treiher treiher linked a pull request Jan 19, 2022 that will close this issue
RecordFlux 0.6 automation moved this from Review to Done Jan 19, 2022
adacore-bot pushed a commit that referenced this issue May 15, 2023
Also, bump version to 0.10.0

For #104