Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Frame pointer clobbered in some functions which have spills #38

Closed
gergoerdi opened this issue Apr 30, 2017 · 27 comments
Closed

Frame pointer clobbered in some functions which have spills #38

gergoerdi opened this issue Apr 30, 2017 · 27 comments
Labels
A-llvm Affects the LLVM AVR backend has-llvm-commit This issue should be fixed in upstream LLVM has-reduced-testcase A small LLVM IR file exists that demonstrates the problem

Comments

@gergoerdi
Copy link
Collaborator

I'm trying to add an interrupt handler to TIMER1_COMPA to flash PB5. Here's what I tried:

#[no_mangle]
pub unsafe extern "avr-interrupt" fn __vector_11() {
    volatile_store(PORTB, volatile_load(PORTB) ^ MASK);
}

simavr reports that the generated executable goes wrong after running the handler twice:

CORE: *** Invalid read address PC=0136 SP=2022 O=901f Address 2023 out of ram (08ff)
avr_sadly_crashed

Standalone, full test case:

#![feature(no_core)]
#![feature(lang_items)]
#![feature(fundamental)]
#![feature(intrinsics)]
#![feature(optin_builtin_traits)]
#![feature(asm)]
#![feature(abi_avr_interrupt)]

#![no_core]
#![no_main]

mod clone {
    use marker::Sized;

    pub trait Clone : Sized {
        fn clone(&self) -> Self;

        #[inline(always)]
        fn clone_from(&mut self, source: &Self) {
            *self = source.clone()
        }
    }
}

mod marker {
    use clone::Clone;

    #[lang = "sized"]
    #[fundamental]
    pub trait Sized {}

    #[lang = "copy"]
    pub trait Copy : Clone {}


    #[lang = "freeze"]
    unsafe trait Freeze {}

    unsafe impl Freeze for .. {}
}

mod ops {
    #[lang = "bitor"]
    pub trait BitOr {
        fn bitor(self, rhs: Self) -> Self;
    }

    impl BitOr for u8 {
        #[inline]
        fn bitor(self, rhs: u8) -> u8 { self | rhs }
    }


    #[lang = "bitxor"]
    pub trait BitXor {
        fn bitxor(self, rhs: Self) -> Self;
    }

    impl BitXor for u8 {
        #[inline]
        fn bitxor(self, other: u8) -> u8 { self ^ other }
    }
}

mod intrinsics {
    extern "rust-intrinsic" {
        pub fn volatile_load<T>(src: *const T) -> T;
        pub fn volatile_store<T>(dst: *mut T, val: T);
    }
}

use intrinsics::{volatile_load, volatile_store};

mod avr {
    pub const DDRB:   *mut u8  = 0x24 as *mut u8;
    pub const PORTB:  *mut u8  = 0x25 as *mut u8;
    pub const TCCR1B: *mut u8  = 0x81 as *mut u8;
    pub const TIMSK1: *mut u8  = 0x6f as *mut u8;
    pub const OCR1A:  *mut u16 = 0x88 as *mut u16;
}

use avr::*;

const MASK: u8 = 0b_0010_0000;

#[no_mangle]
pub extern fn main() {
    unsafe {
        volatile_store(DDRB, volatile_load(DDRB) | MASK);

        // Configure timer 1 for CTC mode, with divider of 64
        volatile_store(TCCR1B, volatile_load(TCCR1B) | 0b_0000_1101);

        // Timer frequency
        volatile_store(OCR1A, 62500);

        // Enable CTC interrupt
        volatile_store(TIMSK1, volatile_load(TIMSK1) | 0b_0000_0010);

        // Good to go!
        asm!("SEI");

        loop {}
    }
}

#[no_mangle]
pub unsafe extern "avr-interrupt" fn __vector_11() {
    volatile_store(PORTB, volatile_load(PORTB) ^ MASK);
}

@gergoerdi
Copy link
Collaborator Author

Let's compare the generated assembly code with GCC.

First of all, if the handler (__vector_11()) is empty, the generated assembly is pretty much the same, modulo the order of r0 and r1 on the stack:

  1. From GCC:
000000b0 <__vector_11>:
  b0:   1f 92           push    r1
  b2:   0f 92           push    r0
  b4:   0f b6           in      r0, 0x3f        ; 63
  b6:   0f 92           push    r0
  b8:   11 24           eor     r1, r1
  ba:   0f 90           pop     r0
  bc:   0f be           out     0x3f, r0        ; 63
  be:   0f 90           pop     r0
  c0:   1f 90           pop     r1
  c2:   18 95           reti
  1. From Rust + LLVM:
000000e6 <__vector_11>:
  e6:   0f 92           push    r0
  e8:   1f 92           push    r1
  ea:   0f b6           in      r0, 0x3f        ; 63
  ec:   0f 92           push    r0
  ee:   00 24           eor     r0, r0
  f0:   0f 90           pop     r0
  f2:   0f be           out     0x3f, r0        ; 63
  f4:   1f 90           pop     r1
  f6:   0f 90           pop     r0
  f8:   18 95           reti

@gergoerdi
Copy link
Collaborator Author

If I change __vector_11() to just

#[no_mangle]
pub unsafe extern "avr-interrupt" fn __vector_11() {
    volatile_store(PORTB, MASK);
}

then the generated assembly is still quite close (and there's no error message from simavr:

  1. From GCC:
000000b0 <__vector_11>:
  b0:   1f 92           push    r1
  b2:   0f 92           push    r0
  b4:   0f b6           in      r0, 0x3f        ; 63
  b6:   0f 92           push    r0
  b8:   11 24           eor     r1, r1
  ba:   8f 93           push    r24
  bc:   80 e2           ldi     r24, 0x20       ; 32
  be:   85 b9           out     0x05, r24       ; 5
  c0:   8f 91           pop     r24
  c2:   0f 90           pop     r0
  c4:   0f be           out     0x3f, r0        ; 63
  c6:   0f 90           pop     r0
  c8:   1f 90           pop     r1
  ca:   18 95           reti
  1. From Rust + LLVM:
000000e6 <__vector_11>:
  e6:   0f 92           push    r0
  e8:   1f 92           push    r1
  ea:   0f b6           in      r0, 0x3f        ; 63
  ec:   0f 92           push    r0
  ee:   00 24           eor     r0, r0
  f0:   8f 93           push    r24
  f2:   80 e2           ldi     r24, 0x20       ; 32
  f4:   85 b9           out     0x05, r24       ; 5
  f6:   00 c0           rjmp    .+0             ; 0xf8 <LBB1_1>

000000f8 <LBB1_1>:
  f8:   8f 91           pop     r24
  fa:   0f 90           pop     r0
  fc:   0f be           out     0x3f, r0        ; 63
  fe:   1f 90           pop     r1
 100:   0f 90           pop     r0
 102:   18 95           reti

@gergoerdi
Copy link
Collaborator Author

gergoerdi commented Apr 30, 2017

Things seem to turn weird when the original definition of __vector_11() (i.e. the one that XOr's MASK onto PORTB) is used:

  1. GCC: still very clean:
000000b0 <__vector_11>:
  b0:   1f 92           push    r1
  b2:   0f 92           push    r0
  b4:   0f b6           in      r0, 0x3f        ; 63
  b6:   0f 92           push    r0
  b8:   11 24           eor     r1, r1
  ba:   8f 93           push    r24
  bc:   9f 93           push    r25
  be:   85 b1           in      r24, 0x05       ; 5
  c0:   90 e2           ldi     r25, 0x20       ; 32
  c2:   89 27           eor     r24, r25
  c4:   85 b9           out     0x05, r24       ; 5
  c6:   9f 91           pop     r25
  c8:   8f 91           pop     r24
  ca:   0f 90           pop     r0
  cc:   0f be           out     0x3f, r0        ; 63
  ce:   0f 90           pop     r0
  d0:   1f 90           pop     r1
  d2:   18 95           reti
  1. Rust + LLVM: What is happening here?!
000000e6 <__vector_11>:
  e6:   0f 92           push    r0
  e8:   1f 92           push    r1
  ea:   0f b6           in      r0, 0x3f        ; 63
  ec:   0f 92           push    r0
  ee:   00 24           eor     r0, r0
  f0:   8f 93           push    r24
  f2:   9f 93           push    r25
  f4:   cf 93           push    r28
  f6:   df 93           push    r29
  f8:   cd b7           in      r28, 0x3d       ; 61
  fa:   de b7           in      r29, 0x3e       ; 62
  fc:   22 97           sbiw    r28, 0x02       ; 2
  fe:   0f b6           in      r0, 0x3f        ; 63
 100:   f8 94           cli
 102:   de bf           out     0x3e, r29       ; 62
 104:   0f be           out     0x3f, r0        ; 63
 106:   cd bf           out     0x3d, r28       ; 61
 108:   85 b1           in      r24, 0x05       ; 5
 10a:   8a 83           std     Y+2, r24        ; 0x02
 10c:   8a 81           ldd     r24, Y+2        ; 0x02
 10e:   89 83           std     Y+1, r24        ; 0x01
 110:   00 c0           rjmp    .+0             ; 0x112 <LBB1_1>
00000112 <LBB1_1>:
 112:   80 e2           ldi     r24, 0x20       ; 32
 114:   99 81           ldd     r25, Y+1        ; 0x01
 116:   98 27           eor     r25, r24
 118:   95 b9           out     0x05, r25       ; 5
 11a:   00 c0           rjmp    .+0             ; 0x11c <LBB1_2>

0000011c <LBB1_2>:
 11c:   df 91           pop     r29
 11e:   cf 91           pop     r28
 120:   9f 91           pop     r25
 122:   8f 91           pop     r24
 124:   0f 90           pop     r0
 126:   0f be           out     0x3f, r0        ; 63
 128:   22 96           adiw    r28, 0x02       ; 2
 12a:   0f b6           in      r0, 0x3f        ; 63
 12c:   f8 94           cli
 12e:   de bf           out     0x3e, r29       ; 62
 130:   0f be           out     0x3f, r0        ; 63
 132:   cd bf           out     0x3d, r28       ; 61
 134:   1f 90           pop     r1
 136:   0f 90           pop     r0
 138:   18 95           reti

@shepmaster
Copy link
Member

Thank you for a very complete test case! Could you share what compilation arguments you are using? I tend to always use --release. The number of push / pop feels like there's no optimization. Of course, that shouldn't affect correctness, but compilers....

@gergoerdi
Copy link
Collaborator Author

I'm not really using any compilation arguments except --target=avr-atmel-none --emit=obj. --release doesn't seem to be a recognized rustc option:

11:16:08 [cactus@galaxy interrupt-bug]$ "/home/cactus/prog/rust/rust-avr/build/build/x86_64-unknown-linux-gnu/stage1/bin/rustc" --target=avr-atmel-none src/main.rs --release  --emit=obj -o _build/main.o
error: Unrecognized option: 'release'.

@gergoerdi
Copy link
Collaborator Author

Hmmm playing around with cargo, I figured that I should pass -C opt-level=3 to rustc, which indeed results in much simpler assembly. Let me see if that works.

@gergoerdi
Copy link
Collaborator Author

Ha! Using -C opt-level=3, the resulting binary works fine in simavr:

000000a6 <__vector_11>:
  a6:	0f 92       	push	r0
  a8:	1f 92       	push	r1
  aa:	0f b6       	in	r0, 0x3f	; 63
  ac:	0f 92       	push	r0
  ae:	00 24       	eor	r0, r0
  b0:	8f 93       	push	r24
  b2:	9f 93       	push	r25
  b4:	80 e2       	ldi	r24, 0x20	; 32
  b6:	95 b1       	in	r25, 0x05	; 5
  b8:	98 27       	eor	r25, r24
  ba:	95 b9       	out	0x05, r25	; 5
  bc:	9f 91       	pop	r25
  be:	8f 91       	pop	r24
  c0:	0f 90       	pop	r0
  c2:	0f be       	out	0x3f, r0	; 63
  c4:	1f 90       	pop	r1
  c6:	0f 90       	pop	r0
  c8:	18 95       	reti

@gergoerdi gergoerdi changed the title Wrong interrupt handler generated for nontrivial handler body Wrong interrupt handler generated in non-optimized mode Apr 30, 2017
@shepmaster
Copy link
Member

Cool! Now, as to "wrong". The error message is:

Invalid read address PC=0136 SP=2022 O=901f Address 2023 out of ram (08ff)

It seems hard to believe that we could have actually run out of RAM; there's just not that much code! Can you see why it's trying to read from a piece of memory that's outside of the valid area?

My gut tells me that it has something to do with the fact that we aren't actually initializing the chip properly. I'm pretty sure if you look at GCC's assembly, there's some initializer code that sets up the stack pointer and friends. I thought I had some code that dealt with all that, but I appear to have never pushed it.

@shepmaster
Copy link
Member

shepmaster commented Apr 30, 2017

And for my own curiosity, could you add the optimized assembly from GCC for the handler? Always good to see how that stacks up.

@gergoerdi
Copy link
Collaborator Author

I'm pretty sure the GCC output in #38 (comment) is the optimized version.

@shepmaster
Copy link
Member

I'm pretty sure the GCC output in #38 (comment) is the optimized version.

That's cool; then they are the same (except the r0 / r1 preference)

@dylanmckay
Copy link
Member

My gut tells me that it has something to do with the fact that we aren't actually initializing the chip properly

@gergoerdi: I see that you've used rust to generate an object file - how did you link this? If the GCC version has startup code, that same code should be in avr-rust's executable because we don't have our own CRT library.

There's a similar problem in the AVR support of clang - it hasn't been taught to link a runtime library yet, and so all executables linked by it will be missing the code which initialises the stack and the IVT.

@gergoerdi
Copy link
Collaborator Author

I'm using GCC for linking:

avr-gcc -Os -Wl,--gc-sections -mmcu=atmega328p -o $OUTDIR/image.elf $OBJS

@dylanmckay
Copy link
Member

I've compiled the Rust code on my machine and linked it with the same command and it looks like it indeed does contain all the necesary runtime libraries and startup code.

I can also see the stack pointer being initialised in both executables

  6c:   cf ef           ldi     r28, 0xFF       ; 255
  6e:   d8 e0           ldi     r29, 0x08       ; 8
  70:   de bf           out     0x3e, r29       ; 62
  72:   cd bf           out     0x3d, r28       ; 61

@dylanmckay
Copy link
Member

When I step through and set a breakpoint on vector_11, here is the value of the SP register

SP             0x8008f3 0x8008f3
PC2            0xe6     230
pc             0xe6     0xe6 <__vector_11>

That seems interesting because with an SP that big, you'd need 8mb of RAM, and I was only simulating an atmega328p. I will debug the GCC generated executable and see the stack pointer when vector_11 is hit.

Also, I believe the actual error is hit on the very last line of vector_11 - the reti instruction. This is likely because the stack is corrupted and this

The return address is loaded from the STACK and the Global Interrupt Flag is set

This would be the very first ret/reti instruction hit in the enire program, which is why it is probably triggered here.

@dylanmckay
Copy link
Member

I may have been getting confused, it looks like SP always have 0x800000 added to it. When writing print SP, it displays 2303.

@dylanmckay
Copy link
Member

dylanmckay commented Apr 30, 2017

I've verified that it is not any code in main that corrupted the SP.

Here is the data flow vector_11

  ; load SP and subtract two from it
  in  r28, 61
  in  r29, 62
  sbiw  r28, 2
  in  r0, 63

  ; Store SP.
  ; At this point and afterwards, r29:r28 and SP is 0x08ea, which is perfect
  cli
  out 62, r29
  out 63, r0
  out 61, r28
  in  r24, 5
  std Y+2, r24
  ldd r24, Y+2
  std Y+1, r24
  rjmp  LBB1_1
LBB1_1:
  ldi r24, 32
  ldd r25, Y+1
  eor r25, r24
  out 5, r25
  rjmp  LBB1_2
LBB1_2:

  ; This corrupts the r29:r28 register. Befor this, it stores the value of SP, which should
  ; be used in the `adiw` instruction below to adjust the stack.
  ;
  ; Instead of this, `0` gets popped off so that r29:r28 is 0 and our stack is corrupted below when we subtract two and put it back in SP
  pop r29
  pop r28

  pop r25
  pop r24
  pop r0
  out 63, r0
  adiw  r28, 2

  ; Stack write begin (expanded from 'SPWRITE' node)
  in  r0, 63
  cli
  ; r29 is '32', r0 is '224', r28 is '34'
  ; These values are mucked up because the weird asm block corrupted our registers.
  out 62, r29 # this sets the high byte to `0x20`
  out 63, r0
  out 61, r28 # this sets the low byte to `0x22`
  ; Stack write end

  pop r1
  pop r0
  reti

I have a feeling this is related to the frame pointer, and how it is placed in the Y (r29:r28) register. It is likely we are adjusting the frame pointer while clobbering the frame pointer in the process.

EDIT: Added more annotations to the asm

@dylanmckay
Copy link
Member

Here is an LLVM IR file that can reproduce the problem.

; ModuleID = 'interrupt.cgu-0.rs'
source_filename = "interrupt.cgu-0.rs"
target datalayout = "e-p:16:16:16-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-n8"
target triple = "avr-atmel-none"

@__rustc_debug_gdb_scripts_section__ = internal unnamed_addr constant [34 x i8] c"\01gdb_load_rust_pretty_printers.py\00", section ".debug_gdb_scripts", align 1

; Function Attrs: nounwind uwtable
define void @main() unnamed_addr #0 !dbg !5 {
start:
  %tmp_ret2 = alloca i8
  %tmp_ret1 = alloca i8
  %tmp_ret = alloca i8
  %0 = load volatile i8, i8* inttoptr (i16 36 to i8*), align 1, !dbg !10
  store i8 %0, i8* %tmp_ret, !dbg !10
  %1 = load i8, i8* %tmp_ret, !dbg !10
  br label %bb1, !dbg !10

bb1:                                              ; preds = %start
  %2 = or i8 %1, 32, !dbg !10
  store volatile i8 %2, i8* inttoptr (i16 36 to i8*), align 1, !dbg !10
  br label %bb2, !dbg !10

bb2:                                              ; preds = %bb1
  %3 = load volatile i8, i8* inttoptr (i16 129 to i8*), align 1, !dbg !11
  store i8 %3, i8* %tmp_ret1, !dbg !11
  %4 = load i8, i8* %tmp_ret1, !dbg !11
  br label %bb3, !dbg !11

bb3:                                              ; preds = %bb2
  %5 = or i8 %4, 13, !dbg !11
  store volatile i8 %5, i8* inttoptr (i16 129 to i8*), align 1, !dbg !11
  br label %bb4, !dbg !11

bb4:                                              ; preds = %bb3
  store volatile i16 -3036, i16* inttoptr (i16 136 to i16*), align 2, !dbg !12
  br label %bb5, !dbg !12

bb5:                                              ; preds = %bb4
  %6 = load volatile i8, i8* inttoptr (i16 111 to i8*), align 1, !dbg !13
  store i8 %6, i8* %tmp_ret2, !dbg !13
  %7 = load i8, i8* %tmp_ret2, !dbg !13
  br label %bb6, !dbg !13

bb6:                                              ; preds = %bb5
  %8 = or i8 %7, 2, !dbg !13
  store volatile i8 %8, i8* inttoptr (i16 111 to i8*), align 1, !dbg !13
  br label %bb7, !dbg !13

bb7:                                              ; preds = %bb6
  call void asm "SEI", ""(), !dbg !14, !srcloc !15
  br label %bb8, !dbg !16

bb8:                                              ; preds = %bb8, %bb7
  br label %bb8, !dbg !16
}

; Function Attrs: nounwind uwtable
define avr_signalcc  void @__vector_11() unnamed_addr #0 !dbg !17 {
start:
  %tmp_ret = alloca i8
  %0 = load volatile i8, i8* inttoptr (i16 37 to i8*), align 1, !dbg !18
  store i8 %0, i8* %tmp_ret, !dbg !18
  %1 = load i8, i8* %tmp_ret, !dbg !18
  br label %bb1, !dbg !18

bb1:                                              ; preds = %start
  %2 = xor i8 %1, 32, !dbg !18
  store volatile i8 %2, i8* inttoptr (i16 37 to i8*), align 1, !dbg !18
  br label %bb2, !dbg !18

bb2:                                              ; preds = %bb1
  ret void, !dbg !19
}

attributes #0 = { nounwind uwtable "no-frame-pointer-elim"="true" }

!llvm.module.flags = !{!0, !1}
!llvm.dbg.cu = !{!2}

!0 = !{i32 1, !"PIE Level", i32 2}
!1 = !{i32 2, !"Debug Info Version", i32 3}
!2 = distinct !DICompileUnit(language: DW_LANG_Rust, file: !3, producer: "clang LLVM (rustc version 1.18.0-dev (f8982f5c8 2017-04-27))", isOptimized: false, runtimeVersion: 0, emissionKind: FullDebug, enums: !4)
!3 = !DIFile(filename: "interrupt", directory: "/home/dylan/projects/builds/rust-lang/avr-rust")
!4 = !{}
!5 = distinct !DISubprogram(name: "main", linkageName: "_ZN9interrupt4mainE", scope: !7, file: !6, line: 87, type: !8, isLocal: false, isDefinition: true, scopeLine: 87, flags: DIFlagPrototyped, isOptimized: false, unit: !2, templateParams: !4, variables: !4)
!6 = !DIFile(filename: "/tmp/interrupt.rs", directory: "/home/dylan/projects/builds/rust-lang/avr-rust")
!7 = !DINamespace(name: "interrupt", scope: null, file: !6, line: 1)
!8 = !DISubroutineType(types: !9)
!9 = !{null}
!10 = !DILocation(line: 89, scope: !5)
!11 = !DILocation(line: 92, scope: !5)
!12 = !DILocation(line: 95, scope: !5)
!13 = !DILocation(line: 98, scope: !5)
!14 = !DILocation(line: 101, scope: !5)
!15 = !{i32 1}
!16 = !DILocation(line: 103, scope: !5)
!17 = distinct !DISubprogram(name: "__vector_11", linkageName: "_ZN9interrupt11__vector_11E", scope: !7, file: !6, line: 108, type: !8, isLocal: false, isDefinition: true, scopeLine: 108, flags: DIFlagPrototyped, isOptimized: false, unit: !2, templateParams: !4, variables: !4)
!18 = !DILocation(line: 109, scope: !17)
!19 = !DILocation(line: 110, scope: !17)

@dylanmckay
Copy link
Member

This is indeed being caused by the frame pointer.

There is an SPWRITE node which is writing the corrupted value to the stack pointer. In a function with a frame pointer, the Y register should not be used at all.

This is probably caused by a hack I had to remove a while back in order to finish upstreaming the backend.

@shepmaster might remember this - avr-llvm/llvm#145

It looks like we removed the code which actually rewound the register allocator (avr-llvm/llvm#226) but we left a place in the code where it assumes that the FP lives in r29:r28.

There are two places I can see that are relevant

I will investigate to see if it's possible to just remove them, as we already decided that we could just stop using the FP.

@dylanmckay
Copy link
Member

I'm not entirely sure what the deal is on avr-llvm/llvm#145

I make the claim

From what I can tell, we still need a frame pointer in the case of variable sized allocas. If we push arbitrary sizes values onto the stack, the only way (as far as I can tell) is to store the frame pointer and restore it afterwards.

and the same day I say

Your comments made me think a bit more, any why our backend does this and no others do.

We reserve the Y register if a function has spills. This is super conservative, and I'm pretty sure does not need to be done. Spills by themselves become push/pop instructions, which do not require a frame pointer.

The hack mentioned in this issue is just a workaround so that we can reserve the FP in the case we have spills. I think it should be sufficient to remove the hack.

I didn't explain why I changed my mind. Thinking about it now, I still don't see a way to have dynamically sized allocas without a FP.

It looks like we're going to have to rebuild some kind of support for reserving the frame pointer. But at least we would've been forced to remove the hack anyway (it touched LLVM code common to all targets).

@dylanmckay
Copy link
Member

I think I've got a solution - previously we'd set the FP if there were any spills or allocas.

The reason we needed the rewinding trick is because we couldn't tell in advance if a given function has a spill. This meant that we had to hack the register allocator to work around it if we made a spill in a function.

If we amend it so that we only use an FP if we have allocas. This is something we can tell in advance.

@dylanmckay dylanmckay changed the title Wrong interrupt handler generated in non-optimized mode Frame pointer clobbered in some functions which have spills Apr 30, 2017
@dylanmckay
Copy link
Member

dylanmckay commented May 1, 2017

It will be quite hard to write out a frame pointer, and I think the performance of the generated code would be worse.

If we don't have a dedicated frame pointer, we'd need to scavenge a DREGS register for every FrameIndex, loading the SP and saving it every time.

It looks like we really will have to teach LLVM to mark the frame pointer as reserved if there is a spill. I'm not entirely sure how to do it, and so I've written an email to llvmdev.

In the meantime, we have a few different options.

  • Reinstate the hack ([AVR] Don't reserve the frame pointer when spilling registers avr-llvm/llvm#226) but only for this fork. We would never be able to upstream the hack
  • Enable the frame pointer for all functions with locals (absolutely terrible for performance).
  • Create a subclass of RegAllocGreedy and move the dirty hack into it. This means that the hack would be self-contained within lib/Target/AVR and would be mergeable.
  • Wait

@dylanmckay dylanmckay added the A-llvm Affects the LLVM AVR backend label May 1, 2017
@shepmaster
Copy link
Member

  • Reinstate the hack

I'd prefer not to go down this route, as it would give false hope that things are working when really they aren't.

Enable the frame pointer for all functions with locals (absolutely terrible for performance).

This is more palatable as it's "just" a performance loss. If the change were easy, I'd love to apply it and see how far the compilation gets.

Wait

This seems best currently; you've already got some interesting suggestions on the thread.

  • Create a subclass of RegAllocGreedy and move the dirty hack into it.

An interesting alternative. If we have to do something, at least it would be contained somewhat.

@dylanmckay
Copy link
Member

This is more palatable as it's "just" a performance loss. If the change were easy, I'd love to apply it and see how far the compilation gets.

I'm keen on this too. It's very easy to enable it globally, and I think it's better that we generate correct code now and properly fix it later.

I will add this to the avr-support branch.

@dylanmckay
Copy link
Member

When I enable the frame pointer for all functions, this specific problem is fixed.

I've found another problem which causes Y to be clobbered however.

The function emission code looks something like this

  • Emit function prologue
  • Restore callee saved registers
  • Emit function epilogue

It looks like Y is automatically restored before the epilogue is generated. The epilogue assumes that 'Y' still contains SP, but it in actual fact contains its original value at the time the function was called.

This should be an easy fix however.

@dylanmckay
Copy link
Member

I will raise another bug for the other issue.

I've just committed a patch to enable the FP for all functions to LLVM trunk r301881. I'm not gonna bother cherry-picking it until the second bug is fixed.

@dylanmckay dylanmckay added has-llvm-commit This issue should be fixed in upstream LLVM has-reduced-testcase A small LLVM IR file exists that demonstrates the problem labels May 1, 2017
dylanmckay pushed a commit that referenced this issue May 2, 2017
@dylanmckay
Copy link
Member

dylanmckay commented May 2, 2017

Fix has been cherry picked from llvm-mirror/llvm@2cc0b07, everything is shiny

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-llvm Affects the LLVM AVR backend has-llvm-commit This issue should be fixed in upstream LLVM has-reduced-testcase A small LLVM IR file exists that demonstrates the problem
Projects
None yet
Development

No branches or pull requests

3 participants