Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Overflow for 64 bit modular types #182

Closed
senier opened this issue Mar 18, 2020 · 13 comments · Fixed by #294
Closed

Overflow for 64 bit modular types #182

senier opened this issue Mar 18, 2020 · 13 comments · Fixed by #294
Assignees
Labels
bug generator Related to generator package (SPARK code generation)

Comments

@senier
Copy link
Member

senier commented Mar 18, 2020

The following simple message

package Protocol is
   type U64 is mod 2**64;
   type Msg is
      message
         Field1 : U64;
      end message;
end Protocol;

yields an overflow in the field extraction code:

Compile
   [Ada]          rflx-protocol-msg.ads
rflx-protocol-msg.ads:13:01: warning: in instantiation at rflx-generic_types.adb:93
rflx-protocol-msg.ads:13:01: warning: in instantiation at rflx-protocol-generic_msg.adb:187
rflx-protocol-msg.ads:13:01: warning: mod with zero divisor
rflx-protocol-msg.ads:13:01: warning: "Constraint_Error" will be raised at run time
rflx-protocol-msg.ads:13:01: instantiation error at rflx-generic_types.adb:172
rflx-protocol-msg.ads:13:01: instantiation error at rflx-protocol-generic_msg.adb:295
rflx-protocol-msg.ads:13:01: value not in range of type "Standard.Long_Integer"
rflx-protocol-msg.ads:13:01: "Constraint_Error" would have been raised at run time
gprbuild: *** compilation phase failed
@senier senier created this issue from a note in RecordFlux 0.4 (To do) Mar 18, 2020
@senier
Copy link
Member Author

senier commented Apr 1, 2020

Here are the hacks I'm currently adding manually to the generated code. In the Extract function I changed the final return statement to to prevent 2**64 from overflowing:

      return (if Value'Size < 64                                                                                        
              then Value'Val (Result mod (2**Value'Size))                                                               
              else Value'Val (Result));

In Insert I applied the following changes:

-            Element_Value : constant Long_Integer := (Value'Pos (Val) mod 2**Value'Size);
+            Element_Value : constant Long_Integer := (if Value'Size < 32 then Value'Pos (Val) mod 2**Value'Size else Value'Pos (Val));

This only happens to work for me as my type is 32 bit - we need a solution for 64 bit types, too. Most likely, the type of Element_Value must be changed to mod 64. To ease verification we could try using a non-overflowing modular type.

@senier
Copy link
Member Author

senier commented Apr 16, 2020

One more thought on intermediate values: If we define a 64bit type unconditionally, we'll get in trouble on non-64-bit platforms. We have to come up with a solution that uses types with the maximum system word size (Long_Integer'Size? System.Address'Size?) and rejects specifications exceeding that size.

@jklmnn @treiher Ideas how we could test less-than-64-bit support in the CI? It's probably a good idea to open another ticket for that.

@jklmnn
Copy link
Member

jklmnn commented Apr 16, 2020

What would happen on a non-64-bit platform if we defined an unconditional 64 bit type? From what I can tell it would either fail to compile or the compiler would insert software support (at the expense of performance).
To me it doesn't make a difference if the specification is exceeded or the compiler fails to compile the generated code for that particular platform. In neither case the protocol can be used on that platform. But in the first case we would have to add platform specific elements to the specification.
So I'm not sure if it makes sense to test for this case at all.

@senier
Copy link
Member Author

senier commented Apr 16, 2020

It think I didn't make my point very clear and there are multiple issues to consider here:

  1. Supporting 64-bit types in RecordFlux specifications

    I agree that if you have 64-bit types in your spec you either need a compiler emulation for the generated type or it fails when compiling. Having platform-specific specifications is undesirable.

  2. Intermediate values

    This was my original point: If we unconditionally chose a 64-bit type internally in the routines extracting fields from bytes, this may fail on 32-bit platforms even though no greater-than-32bit type is used in the specification. Or an inefficient emulation is used even in cases where its would not be required. We need to store intermediate values such that the native bit size is not exceeded (but maybe with an option to use an available emulation).

  3. Testing

    Notwithstanding the above points, we should test at least one non-64bit target if we claim to support those. In practice this means some 32-bit target, as even the smallest microcontrollers we target are that size. What could be a good CI-setup here?

@jklmnn
Copy link
Member

jklmnn commented Apr 16, 2020

So can I assume that 2. is already a problem but it only is visible when we have a 64bit type because it would use a greater than 64bit type internally?

@senier
Copy link
Member Author

senier commented Apr 16, 2020

Yes, 64bit types are a problem, already. Hence this ticket. We should just make sure that the fix is not limited to 64-bit platforms and does not create undesired inefficiencies (such as always using a 64-bit emulation).

@jklmnn
Copy link
Member

jklmnn commented Apr 16, 2020

But if we fix the 64bit problem in a generic way (so that we don't need any integers larger than the original data type) wouldn't that implicitly fix the problem for 32 bit platforms?

@senier
Copy link
Member Author

senier commented Apr 16, 2020

Hopefully. I'd still argue that we need to test this to be sure.

@jklmnn
Copy link
Member

jklmnn commented Apr 16, 2020

Okay, I think now I understood the problem correctly. So in theory if we try a protocol with a 32bit value on a 32bit only architecture with the current implementation it should already fail?

About the performance impact of 64bit values on 32bit platforms. I compiled the following code:

procedure Long                                                                                                             
is                                                                                                                         
   type Long is mod 2 ** 64;                                                                                               
   L1 : Long := Long'Last;                                                                                                 
   L2 : Long;                                                                                                              
begin                                                                                                                      
   L2 := L1 + 42;                                                                                                          
end Long;

On x86_64 it compiled with gcc -g -c long.adb into the following bytecode:

long.o:     file format elf64-x86-64


Disassembly of section .text:

0000000000000000 <_ada_long>:
_ada_long():
/tmp/Ada/long.adb:2

procedure Long
   0:   55                      push   %rbp
   1:   48 89 e5                mov    %rsp,%rbp
/tmp/Ada/long.adb:5
is
   type Long is mod 2 ** 64;
   L1 : Long := Long'Last;
   4:   48 c7 45 f8 ff ff ff    movq   $0xffffffffffffffff,-0x8(%rbp)
   b:   ff 
/tmp/Ada/long.adb:8
   L2 : Long;
begin
   L2 := L1 + 42;
   c:   48 c7 45 f0 29 00 00    movq   $0x29,-0x10(%rbp)
  13:   00 
/tmp/Ada/long.adb:9
end Long;
  14:   5d                      pop    %rbp
  15:   c3                      retq

On arm the result with arm-eabi-gcc --RTS=ada-runtime/build/stm32f0/obj -g -c long.adb was as follows:

long.o:     file format elf32-littlearm


Disassembly of section .text:

00000000 <_ada_long>:
_ada_long():
/tmp/Ada/long.adb:2

procedure Long
   0:   e92d0810        push    {r4, fp}
   4:   e28db004        add     fp, sp, #4
   8:   e24dd010        sub     sp, sp, #16
/tmp/Ada/long.adb:5
is
   type Long is mod 2 ** 64;
   L1 : Long := Long'Last;
   c:   e3e03000        mvn     r3, #0
  10:   e3e04000        mvn     r4, #0
  14:   e50b300c        str     r3, [fp, #-12]
  18:   e50b4008        str     r4, [fp, #-8]
/tmp/Ada/long.adb:8
   L2 : Long;
begin
   L2 := L1 + 42;
  1c:   e3a03029        mov     r3, #41 ; 0x29
  20:   e3a04000        mov     r4, #0
  24:   e50b3014        str     r3, [fp, #-20]  ; 0xffffffec
  28:   e50b4010        str     r4, [fp, #-16]
/tmp/Ada/long.adb:9
end Long;
  2c:   e24bd004        sub     sp, fp, #4
  30:   e8bd0810        pop     {r4, fp}
  34:   e12fff1e        bx      lr

It seams that 64bit on 32bit arm are software emulated. Instead of one instruction each line required 4 of which two are STR instructions that write to memory. According to the ARM Infocenter these only need on cycle. Since this is a really simple example L1 is not loaded. If it was with LDR the performance impact would probably worse as LDR might block for more than one cycle.

If we want to prevent this we'd need to instruct the compiler to prevent this kind of code generation.

I also tried to restrict the available types via runtime as defined here. But changing these properties in the runtime results in the same generated code and no errors.

@treiher
Copy link
Collaborator

treiher commented Apr 17, 2020

Using Unsigned_Long with switches -gnata and -gnato13 causes a bug box in all currently available GNAT versions (Componolit/Workarounds#19).

@senier
Copy link
Member Author

senier commented Apr 17, 2020

So in theory if we try a protocol with a 32bit value on a 32bit only architecture with the current implementation it should already fail?

It should work, depending on what the compiler does with the 2**32 constant - from what you saw on ARM I suppose this is supported.

@senier
Copy link
Member Author

senier commented Apr 17, 2020

About the performance impact of 64bit values on 32bit platforms. I compiled the following code:

Could you redo this experiment with optimizations (-O1, -O2) enabled? I believe only this will give us a good insight on the real performance impact.

@jklmnn
Copy link
Member

jklmnn commented Apr 20, 2020

I had to change the example because even -O1 optimized the whole code away with a single return:

package Long is                                                                                                            
                                                                                                                           
   type Long is mod 2 ** 64;                                                                                               
                                                                                                                           
   procedure Add (A :     Long;                                                                                            
                  B :     Long;                                                                                            
                  C : out Long);                                                                                           
                                                                                                                           
end Long;

package body Long is                                                                                                       
                                                                                                                           
   procedure Add (A :     Long;                                                                                            
                  B :     Long;                                                                                            
                  C : out Long)                                                                                            
   is                                                                                                                      
   begin                                                                                                                   
      C := A + B;                                                                                                          
   end Add;                                                                                                                
                                                                                                                           
end Long;

gcc -g -c long.adb:

long__add():
/tmp/Ada/long.adb:4

package body Long is

   procedure Add (A :     Long;
   0:   55                      push   %rbp
   1:   48 89 e5                mov    %rsp,%rbp
   4:   48 89 7d e8             mov    %rdi,-0x18(%rbp)
   8:   48 89 75 e0             mov    %rsi,-0x20(%rbp)
/tmp/Ada/long.adb:9
                  B :     Long;
                  C : out Long)
   is
   begin
      C := A + B;
   c:   48 8b 55 e8             mov    -0x18(%rbp),%rdx
  10:   48 8b 45 e0             mov    -0x20(%rbp),%rax
  14:   48 01 d0                add    %rdx,%rax
  17:   48 89 45 f8             mov    %rax,-0x8(%rbp)
/tmp/Ada/long.adb:10
   end Add;
  1b:   48 8b 45 f8             mov    -0x8(%rbp),%rax
  1f:   90                      nop
  20:   5d                      pop    %rbp
  21:   c3                      retq

gcc -g -c -O(1|2) long.adb:

long__add():
/tmp/Ada/long.adb:9
   procedure Add (A :     Long;
                  B :     Long;
                  C : out Long)
   is
   begin
      C := A + B;
   0:   48 8d 04 37             lea    (%rdi,%rsi,1),%rax
/tmp/Ada/long.adb:10
   end Add;
   4:   c3                      retq

arm-eabi-gcc --RTS=ada-runtime/build/stm32f0/obj -g -c long.adb

long__add():
/tmp/Ada/long.adb:4

package body Long is

   procedure Add (A :     Long;
   0:   e92d0870        push    {r4, r5, r6, fp}
   4:   e28db00c        add     fp, sp, #12
   8:   e24dd018        sub     sp, sp, #24
   c:   e50b001c        str     r0, [fp, #-28]  ; 0xffffffe4
  10:   e50b1018        str     r1, [fp, #-24]  ; 0xffffffe8
  14:   e50b2024        str     r2, [fp, #-36]  ; 0xffffffdc
  18:   e50b3020        str     r3, [fp, #-32]  ; 0xffffffe0
/tmp/Ada/long.adb:9
                  B :     Long;
                  C : out Long)
   is
   begin
      C := A + B;
  1c:   e24b201c        sub     r2, fp, #28
  20:   e8920006        ldm     r2, {r1, r2}
  24:   e24b4024        sub     r4, fp, #36     ; 0x24
  28:   e8940018        ldm     r4, {r3, r4}
  2c:   e0935001        adds    r5, r3, r1
  30:   e0a46002        adc     r6, r4, r2
  34:   e50b5014        str     r5, [fp, #-20]  ; 0xffffffec
  38:   e50b6010        str     r6, [fp, #-16]
/tmp/Ada/long.adb:10
   end Add;
  3c:   e24b4014        sub     r4, fp, #20
  40:   e8940018        ldm     r4, {r3, r4}
  44:   e1a00000        nop                     ; (mov r0, r0)
  48:   e1a00003        mov     r0, r3
  4c:   e1a01004        mov     r1, r4
  50:   e24bd00c        sub     sp, fp, #12
  54:   e8bd0870        pop     {r4, r5, r6, fp}
  58:   e12fff1e        bx      lr

arm-eabi-gcc --RTS=ada-runtime/build/stm32f0/obj -g -c -O(1|2) long.adb

long__add():
/tmp/Ada/long.adb:4

package body Long is

   procedure Add (A :     Long;
   0:   e52db004        push    {fp}            ; (str fp, [sp, #-4]!)
/tmp/Ada/long.adb:9
                  B :     Long;
                  C : out Long)
   is
   begin
      C := A + B;
   4:   e090b002        adds    fp, r0, r2
   8:   e0a1c003        adc     ip, r1, r3
/tmp/Ada/long.adb:10
   end Add;
   c:   e1a0000b        mov     r0, fp
  10:   e1a0100c        mov     r1, ip
  14:   e49db004        pop     {fp}            ; (ldr fp, [sp], #4)
  18:   e12fff1e        bx      lr

On X86 with optimization it is done with a single LEA instruction that places the result in the result register. On ARM with optimization the addition itself needs 2 instructions (Example 3.4). It seems that the additional instructions are the overhead of the procedure call. They are quite similar to the unoptimized versions of both ARM and X86.

@senier senier added this to To do in RecordFlux 0.5 via automation May 11, 2020
@senier senier removed this from To do in RecordFlux 0.4 May 11, 2020
@senier senier added the bug label May 14, 2020
@senier senier added this to To do in RecordFlux 0.4 via automation May 14, 2020
@senier senier removed this from To do in RecordFlux 0.5 May 14, 2020
@treiher treiher added the generator Related to generator package (SPARK code generation) label May 14, 2020
@treiher treiher self-assigned this May 20, 2020
@treiher treiher moved this from To do to In progress in RecordFlux 0.4 May 20, 2020
@senier senier removed this from In progress in RecordFlux 0.4 May 27, 2020
@senier senier added this to To do in RecordFlux 0.5 via automation May 27, 2020
@treiher treiher moved this from To do to In progress in RecordFlux 0.5 Jun 3, 2020
@senier senier removed this from In progress in RecordFlux 0.5 Jun 6, 2020
@senier senier added this to To do in RecordFlux 0.4.1 via automation Jun 6, 2020
@senier senier moved this from To do to In progress in RecordFlux 0.4.1 Jun 6, 2020
RecordFlux 0.4.1 automation moved this from In progress to Done Jun 22, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug generator Related to generator package (SPARK code generation)
Projects
No open projects
Development

Successfully merging a pull request may close this issue.

3 participants