Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Switch lookup tables are not properly placed in program memory #47

Open
gergoerdi opened this issue May 6, 2017 · 18 comments
Open

Switch lookup tables are not properly placed in program memory #47

gergoerdi opened this issue May 6, 2017 · 18 comments
Labels
A-llvm Affects the LLVM AVR backend has-reduced-testcase A small LLVM IR file exists that demonstrates the problem

Comments

@gergoerdi
Copy link
Collaborator

gergoerdi commented May 6, 2017

The input code here is a slight tweak of #46, but the generated code is very different because it uses a lookup instead of bit-twiddling:

Rust:

#[inline(never)]
pub fn decode(n: u8) -> Option<()> {
    match n {
        0x0 => Some(()),
        0x1 => Some(()),
        0x2 => Some(()),
        0x3 => Some(()),
        0x4 => Some(()),
        0xe => Some(()),
        _ => None
    }
}

LLVM IR:

        .text
        .file   "external-05dffe7586b7d437.ll"
        .globl  _ZN8external6decode17h0bb113af5b7a3594E
        .p2align        1
        .type   _ZN8external6decode17h0bb113af5b7a3594E,@function
_ZN8external6decode17h0bb113af5b7a3594E: ; @_ZN8external6decode17h0bb113af5b7a3594E
; BB#0:                                 ; %start
        cpi     r24, 15
        brlo    .+2
        rjmp    LBB0_1
; BB#2:                                 ; %switch.lookup
        mov     r26, r24
        mov     r27, r24
        lsl     r27
        sbc     r27, r27
        subi    r26, -lo8(switch.table)
        sbci    r27, -hi8(switch.table)
        ld      r24, X
        ret
LBB0_1:
        ldi     r24, 0
        ret
.Lfunc_end0:
        .size   _ZN8external6decode17h0bb113af5b7a3594E, .Lfunc_end0-_ZN8external6decode17h0bb113af5b7a3594E

        .type   switch.table,@object    ; @switch.table
        .section        .rodata,"a",@progbits
switch.table:
        .ascii  "\001\001\001\001\001\000\000\000\000\000\000\000\000\000\001"
        .size   switch.table, 15

AVR assembly:

Contents of section .text:
 0000 8f3008f0 00c0a82f b82fbb0f bb0ba050  .0....././.....P
 0010 b0408c91 089580e0 0895               .@........      
Contents of section .rodata:
 0000 01010101 01000000 00000000 000001    ............... 

Disassembly of section .text:

00000000 <_ZN8external6decode17h0bb113af5b7a3594E>:
   0:   8f 30           cpi     r24, 0x0F       ; 15
   2:   08 f0           brcs    .+2             ; 0x6 <_ZN8external6decode17h0bb113af5b7a3594E+0x6>
   4:   00 c0           rjmp    .+0             ; 0x6 <_ZN8external6decode17h0bb113af5b7a3594E+0x6>
                        4: R_AVR_13_PCREL       .text+0x16
   6:   a8 2f           mov     r26, r24
   8:   b8 2f           mov     r27, r24
   a:   bb 0f           add     r27, r27
   c:   bb 0b           sbc     r27, r27
   e:   a0 50           subi    r26, 0x00       ; 0
                        e: R_AVR_LO8_LDI_NEG    .rodata
  10:   b0 40           sbci    r27, 0x00       ; 0
                        10: R_AVR_HI8_LDI_NEG   .rodata
  12:   8c 91           ld      r24, X
  14:   08 95           ret

00000016 <LBB0_1>:
  16:   80 e0           ldi     r24, 0x00       ; 0
  18:   08 95           ret
@gergoerdi
Copy link
Collaborator Author

gergoerdi commented May 6, 2017

In the fully linked .elf, the table address in subi/sbci seems wrong to me:

Contents of section .data:
 800100 01010101 01000000 00000000 00000100  ................

Disassembly of section .text:
...
000000b4 <_ZN8external6decode17hbbb21fc0c1cdbaf3E>:
  b4:   8f 30           cpi     r24, 0x0F       ; 15
  b6:   08 f0           brcs    .+2             ; 0xba <_ZN8external6decode17hbbb21fc0c1cdbaf3E+0x6>
  b8:   08 c0           rjmp    .+16            ; 0xca <LBB0_1>
  ba:   a8 2f           mov     r26, r24
  bc:   b8 2f           mov     r27, r24
  be:   bb 0f           add     r27, r27
  c0:   bb 0b           sbc     r27, r27
  c2:   a0 50           subi    r26, 0x00       ; 0
  c4:   bf 4f           sbci    r27, 0xFF       ; 255
  c6:   8c 91           ld      r24, X
  c8:   08 95           ret

@gergoerdi
Copy link
Collaborator Author

Should these lookup tables even be generated?! avr-llvm/llvm#88

@gergoerdi
Copy link
Collaborator Author

Using a linker script, I moved the lookup table into .text. The linked binary now looks more promising:

Sections:
Idx Name          Size      VMA       LMA       File off  Algn
  0 .data         00000000  00800100  000000e1  00000155  2**0
                  CONTENTS, ALLOC, LOAD, DATA
  1 .text         000000e1  00000000  00000000  00000074  2**1
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
  2 .stab         000006cc  00000000  00000000  00000158  2**2
                  CONTENTS, READONLY, DEBUGGING
  3 .stabstr      00000054  00000000  00000000  00000824  2**0
                  CONTENTS, READONLY, DEBUGGING
Disassembly of section .text:

000000b4 <_ZN8external6decode17hbbb21fc0c1cdbaf3E>:
  b4:   8f 30           cpi     r24, 0x0F       ; 15
  b6:   08 f0           brcs    .+2             ; 0xba <_ZN8external6decode17hbb
b21fc0c1cdbaf3E+0x6>
  b8:   08 c0           rjmp    .+16            ; 0xca <LBB0_1>
  ba:   a8 2f           mov     r26, r24
  bc:   b8 2f           mov     r27, r24
  be:   bb 0f           add     r27, r27
  c0:   bb 0b           sbc     r27, r27
  c2:   ae 52           subi    r26, 0x2E       ; 46
  c4:   bf 4f           sbci    r27, 0xFF       ; 255
  c6:   8c 91           ld      r24, X
  c8:   08 95           ret

000000d2 <_etext>:
  d2:   01 01           movw    r0, r2
  d4:   01 01           movw    r0, r2
  d6:   01 00           .word   0x0001  ; ????
        ...
  e0:   01 02           Address 0x00000000000000e0 is out of bounds.
.word   0xffff  ; ????

Where 0xff2e from 0xc2/0xc4 is exactly -0xe0, so this seems like it should work.

Unfortunately, decode(0x03) is still returning None.

@gergoerdi
Copy link
Collaborator Author

gergoerdi commented May 6, 2017

Details of decode going wrong with an input of 0x03 in r24. Note that r26 and r27 seem to point to the right .text address, but the ld still results in 0.

Breakpoint 1, 0x000000b4 in external::decode::hbbb21fc0c1cdbaf3 ()
=> 0x000000b4 <_ZN8external6decode17hbbb21fc0c1cdbaf3E+0>:	8f 30	cpi	r24, 0x0F	; 15
=> 0x000000b6 <_ZN8external6decode17hbbb21fc0c1cdbaf3E+2>:	08 f0	brcs	.+2      	;  0xba
=> 0x000000ba <_ZN8external6decode17hbbb21fc0c1cdbaf3E+6>:	a8 2f	mov	r26, r24
=> 0x000000bc <_ZN8external6decode17hbbb21fc0c1cdbaf3E+8>:	b8 2f	mov	r27, r24
=> 0x000000be <_ZN8external6decode17hbbb21fc0c1cdbaf3E+10>:	bb 0f	add	r27, r27
=> 0x000000c0 <_ZN8external6decode17hbbb21fc0c1cdbaf3E+12>:	bb 0b	sbc	r27, r27
=> 0x000000c2 <_ZN8external6decode17hbbb21fc0c1cdbaf3E+14>:	ae 52	subi	r26, 0x2E	; 46
=> 0x000000c4 <_ZN8external6decode17hbbb21fc0c1cdbaf3E+16>:	bf 4f	sbci	r27, 0xFF	; 255
=> 0x000000c6 <_ZN8external6decode17hbbb21fc0c1cdbaf3E+18>:	8c 91	ld	r24, X
r26            0xd5	213
r27            0x0	0
=> 0x000000c8 <_ZN8external6decode17hbbb21fc0c1cdbaf3E+20>:	08 95	ret
(gdb) i r r24
r24            0x0	0

@gergoerdi
Copy link
Collaborator Author

gergoerdi commented May 6, 2017

My working theory now is that the .text version is the one that should work, but we should be using LPM instead of LD.

@gergoerdi
Copy link
Collaborator Author

I've fixed this locally by matching load IR instructions with LPMRdZ and linking .rodata into .text with a linking script.

This works for the test case from this ticket, but unfortunately doesn't readily generalize to 16-bit loads, since the pseudo-instruction LPMWRdZ is not implemented. Next step: see if I can copy-paste LDWRdPtr's implementation.

gergoerdi added a commit to gergoerdi/llvm-avr that referenced this issue May 7, 2017
gergoerdi added a commit to gergoerdi/llvm-avr that referenced this issue May 7, 2017
gergoerdi added a commit to gergoerdi/llvm-avr that referenced this issue May 7, 2017
gergoerdi added a commit to gergoerdi/llvm-avr that referenced this issue May 7, 2017
@gergoerdi gergoerdi changed the title Table-based switch gives wrong result LPM needs to be used to load data from ProgMem May 10, 2017
@shepmaster shepmaster added A-llvm Affects the LLVM AVR backend has-reduced-testcase A small LLVM IR file exists that demonstrates the problem labels May 10, 2017
@dylanmckay
Copy link
Member

LLVM IR was not actually attached in the description so here is what I get when I compile it locally

; ModuleID = 'test.cgu-0.rs'
source_filename = "test.cgu-0.rs"
target datalayout = "e-p:16:16:16-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-n8"
target triple = "avr-atmel-none"

%"lib::option::Option<()>" = type { i8, [0 x i8], [0 x i8] }

; Function Attrs: noinline uwtable
define internal i8 @_ZN4test6decode17hcd01465f37da56a3E(i8) unnamed_addr #0 {
start:
  %_0 = alloca %"lib::option::Option<()>"
  switch i8 %0, label %bb7 [
    i8 0, label %bb1
    i8 1, label %bb2
    i8 2, label %bb3
    i8 3, label %bb4
    i8 4, label %bb5
    i8 14, label %bb6
  ]

bb1:                                              ; preds = %start
  %1 = getelementptr inbounds %"lib::option::Option<()>", %"lib::option::Option<()>"* %_0, i32 0, i32 0
  store i8 1, i8* %1
  br label %bb8

bb2:                                              ; preds = %start
  %2 = getelementptr inbounds %"lib::option::Option<()>", %"lib::option::Option<()>"* %_0, i32 0, i32 0
  store i8 1, i8* %2
  br label %bb8

bb3:                                              ; preds = %start
  %3 = getelementptr inbounds %"lib::option::Option<()>", %"lib::option::Option<()>"* %_0, i32 0, i32 0
  store i8 1, i8* %3
  br label %bb8

bb4:                                              ; preds = %start
  %4 = getelementptr inbounds %"lib::option::Option<()>", %"lib::option::Option<()>"* %_0, i32 0, i32 0
  store i8 1, i8* %4
  br label %bb8

bb5:                                              ; preds = %start
  %5 = getelementptr inbounds %"lib::option::Option<()>", %"lib::option::Option<()>"* %_0, i32 0, i32 0
  store i8 1, i8* %5
  br label %bb8

bb6:                                              ; preds = %start
  %6 = getelementptr inbounds %"lib::option::Option<()>", %"lib::option::Option<()>"* %_0, i32 0, i32 0
  store i8 1, i8* %6
  br label %bb8

bb7:                                              ; preds = %start
  %7 = getelementptr inbounds %"lib::option::Option<()>", %"lib::option::Option<()>"* %_0, i32 0, i32 0
  store i8 0, i8* %7
  br label %bb8

bb8:                                              ; preds = %bb1, %bb2, %bb3, %bb4, %bb5, %bb6, %bb7
  %8 = bitcast %"lib::option::Option<()>"* %_0 to i8*
  %9 = load i8, i8* %8, align 1
  ret i8 %9
}

; Function Attrs: nounwind uwtable
define void @rust_eh_personality() unnamed_addr #1 {
start:
  ret void
}

; Function Attrs: uwtable
define internal void @_ZN4test4main17hf4d700377696cbb0E() unnamed_addr #2 {
start:
  ret void
}

; Function Attrs: uwtable
define internal i16 @_ZN4test5start17h518035a39d5f5ca0E(i16, i8**) unnamed_addr #2 {
start:
  ret i16 0
}

define i16 @main(i16, i8**) unnamed_addr {
top:
  %2 = call i16 @_ZN4test5start17h518035a39d5f5ca0E(i16 %0, i8** %1)
  ret i16 %2
}

attributes #0 = { noinline uwtable }
attributes #1 = { nounwind uwtable }
attributes #2 = { uwtable }

!llvm.module.flags = !{!0}

!0 = !{i32 1, !"PIE Level", i32 2}

@dylanmckay
Copy link
Member

Continuing conversation from #53

It's interesting because it looks like LLVM is making the switch -> lookup table optimisation.

That means that in order to correctly mark the global variable with the program memory address space, we will need to modify LLVM.

@dylanmckay
Copy link
Member

dylanmckay commented May 13, 2017

This completely untested/uncompiled hack would probably fix it

diff --git a/lib/Transforms/Utils/SimplifyCFG.cpp b/lib/Transforms/Utils/SimplifyCFG.cpp
index 7b0bddbbb83..d0506c2ef17 100644
--- a/lib/Transforms/Utils/SimplifyCFG.cpp
+++ b/lib/Transforms/Utils/SimplifyCFG.cpp
@@ -4973,7 +4973,8 @@ SwitchLookupTable::SwitchLookupTable(
 
   Array = new GlobalVariable(M, ArrayTy, /*constant=*/true,
                              GlobalVariable::PrivateLinkage, Initializer,
-                             "switch.table");
+                             "switch.table", /*InsertBefore=*/ nullptr,
+                             GlobalValue::NotThreadLocal, /*AddressSpace=*/1);
   Array->setUnnamedAddr(GlobalValue::UnnamedAddr::Global);
   Kind = ArrayKind;
 }

Of course, if we wanted to upstream it, it would have to be more general purpose.

I can think of two solutions

  • Add a new function to TargetMachineInfo that just creates a GlobalVariable given all of the required arguments. Instead of building the GlobalVariable here, we could call the new function. We can then override this function specifically for AVR and set the address space there.
  • Add a custom pass that runs after simplify-cfgthat enumerates all global variables, inspecting if they contain SwitchTable objects, and sets the appropriate address space. This solution sucks.

@dylanmckay dylanmckay changed the title LPM needs to be used to load data from ProgMem Switch lookup tables are not properly placed in program memory May 13, 2017
@dylanmckay
Copy link
Member

I've been looking into fixing this properly via adding target-specific hooks to the TargetTransformInfo class.

I've been thinking about it and I don't think these lookup tables are a special case.

The switch table itself is just a global variable, i.e.

@switch.table.foo = private unnamed_addr constant [15 x i8] c"\01\01\01\01\01\00\00\00\00\00\00\00\00\00\01"

Now, there is nothing special about this - it is just an array of characters. In general, we should always be able to load things from RAM using the ld family of instructions.

Now, at chip startup, RAM is completely empty and all variables are stored in program memory. What should happen is that the startup code copies all variables that should exist in RAM, into the RAM. Now this should also include the switch table, which shouldn't work any different.

Now if we use ld to load a switch table in data space (address space #0, the default) and use it to lookup a value, it should work fine given that the lookup table is correctly copied to RAM.

Granted, we really would want switch tables to exist in program memory because it's just static branching data. I will finish writing my patch because it's good anyway, but I think there must be something deeper afoot.

Here is where I'm getting my information, questions 1-3.

@dylanmckay
Copy link
Member

Opened D34983.

@gergoerdi
Copy link
Collaborator Author

I am 100% percent against any solution that involves mindlessly copying switch lookup tables into RAM just so we can use ld to load them.

These lookup tables are generated by the compiler. On MCUs that have 512 bytes of RAM or even less, having them in RAM is definitely observable behaviour -- an observable leak of compiler behaviour.

We have everything we need to know, end to end, to avoid this copying: the Rust compiler synthesizes these globals and it also generates the LLVM IR for accessing them. This cannot be an unsolvable problem.

@dylanmckay
Copy link
Member

I completely agree, having lookup tables copied into ram would be a very ugly wart.

My point is that I can fix switch tables but if we currently cannot load those globals, are we even loading globals correctly now?

My patch completely fixes the switch table issue by the way, so they they only exist and are correctly loaded from program memory.

@shepmaster
Copy link
Member

are we even loading globals correctly now?

No, I don't think we are, but I think that responsibility is outside of the compiler proper. I have some old hacked up code locally that adds some linker things to get the size of the global section and then copies it out into RAM before Rust's main starts. I'm pretty sure that GCCs startup files do the same thing.

@dylanmckay
Copy link
Member

Do you know if this functionality is a part of the CRT? If so, that means we will get it for free if we link correctly with avr-gcc's CRT library.

@shepmaster
Copy link
Member

Do you know if this functionality is a part of the CRT?

I think so, but to be honest I don't fully recall and I'm not even sure I have a great distinction of what is and is not the CRT.

if we link correctly with avr-gcc's CRT library.

That would be the trick, wouldn't it? :-)

In case it wasn't clear, I really want to write as much of that in Rust as possible, so I'm always trying to remove any GCC bits ;-)

@dylanmckay
Copy link
Member

I think there are a few unresolved questions with switch tables:

  • Switch tables cannot be placed in memory on MCUs without the load program memory (lpm) instruction
  • Therefore, if we want switch lookup tables in program memory, we must still support loading from them from RAM

I think the easiest way forward is to disable switch lookup tables on AVR until these two problems are resolved (#47 and #69).

@dylanmckay
Copy link
Member

I have a WIP branch here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-llvm Affects the LLVM AVR backend has-reduced-testcase A small LLVM IR file exists that demonstrates the problem
Projects
None yet
Development

No branches or pull requests

3 participants