Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LLVM greedy register allocator does not know how to split live variables across register classes #37

Closed
shepmaster opened this issue Apr 28, 2017 · 38 comments
Assignees
Labels
A-libcore Affects compiling the core library A-llvm Affects the LLVM AVR backend has-llvm-commit This issue should be fixed in upstream LLVM has-reduced-testcase A small LLVM IR file exists that demonstrates the problem

Comments

@shepmaster
Copy link
Member

shepmaster commented Apr 28, 2017

(See below for a further minimized example)

Error message

LLVM ERROR: ran out of registers during register allocation

So from my testing, it seams like the allocator reserves the Z register for the ICALL instruction and then the register class ptrdispregs only has 1 register left and we can't use Y for source and destination.

Reproduction

; ModuleID = 'bugpoint-reduced-simplified.bc'
source_filename = "bugpoint-output-650d16e.bc"
target datalayout = "e-p:16:16:16-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-n8"
target triple = "avr"

%FmtFormatter.5.12.19.26.33.40.54.68.75.82.89.192 = type { i32, [0 x i8], i32, [0 x i8], %"Option<usize>.0.7.14.21.28.35.49.63.70.77.84.187", [0 x i8], %"Option<usize>.0.7.14.21.28.35.49.63.70.77.84.187", [0 x i8], { i8*, void (i8*)** }, [0 x i8], %"Iter<fmt::ArgumentV1>.4.11.18.25.32.39.53.67.74.81.88.191", [0 x i8], { %ArgumentV1.2.9.16.23.30.37.51.65.72.79.86.189*, i16 }, [0 x i8], i8, [3 x i8] }
%"Option<usize>.0.7.14.21.28.35.49.63.70.77.84.187" = type { i16, [0 x i16], [1 x i16] }
%"Iter<fmt::ArgumentV1>.4.11.18.25.32.39.53.67.74.81.88.191" = type { %ArgumentV1.2.9.16.23.30.37.51.65.72.79.86.189*, [0 x i8], %ArgumentV1.2.9.16.23.30.37.51.65.72.79.86.189*, [0 x i8], %"PhantomData<&fmt::ArgumentV1>.3.10.17.24.31.38.52.66.73.80.87.190", [0 x i8] }
%ArgumentV1.2.9.16.23.30.37.51.65.72.79.86.189 = type { %Void.1.8.15.22.29.36.50.64.71.78.85.188*, [0 x i8], i8 (%Void.1.8.15.22.29.36.50.64.71.78.85.188*, %FmtFormatter.5.12.19.26.33.40.54.68.75.82.89.192*)*, [0 x i8] }
%Void.1.8.15.22.29.36.50.64.71.78.85.188 = type { {}, [0 x i8] }
%"PhantomData<&fmt::ArgumentV1>.3.10.17.24.31.38.52.66.73.80.87.190" = type {}
%EscapeUnicode.6.13.20.27.34.41.55.69.76.83.90.193 = type { i32, [0 x i8], i16, [0 x i8], i8, [1 x i8] }

@str.2y = external constant [7 x i8]

; Function Attrs: uwtable
define void @EscapeDefaultStateDebug(%FmtFormatter.5.12.19.26.33.40.54.68.75.82.89.192* dereferenceable(32)) unnamed_addr #0 personality i32 (...)* @rust_eh_personality {
start:
  %_47 = alloca %EscapeUnicode.6.13.20.27.34.41.55.69.76.83.90.193*, align 2
  switch i2 undef, label %bb4 [
    i2 0, label %_ZN4core3fmt8builders10DebugTuple6finish17h8f1e007edfc4d0a6E.exit
  ]

_ZN4core3fmt8builders10DebugTuple6finish17h8f1e007edfc4d0a6E.exit: ; preds = %start
  ret void

bb4:                                              ; preds = %start
  %1 = getelementptr inbounds %FmtFormatter.5.12.19.26.33.40.54.68.75.82.89.192, %FmtFormatter.5.12.19.26.33.40.54.68.75.82.89.192* %0, i16 0, i32 8, i32 1
  %2 = load void (i8*)**, void (i8*)*** %1, align 2, !noalias !0, !nonnull !9
  %3 = getelementptr inbounds void (i8*)*, void (i8*)** %2, i16 3
  %4 = bitcast void (i8*)** %3 to i8 ({}*, i8*, i16)**
  %5 = load i8 ({}*, i8*, i16)*, i8 ({}*, i8*, i16)** %4, align 2, !invariant.load !9, !noalias !0, !nonnull !9
  %6 = tail call i8 %5({}* nonnull undef, i8* noalias nonnull readonly getelementptr inbounds ([7 x i8], [7 x i8]* @str.2y, i16 0, i16 0), i16 7), !noalias !10
  unreachable
}

declare i32 @rust_eh_personality(...) unnamed_addr

attributes #0 = { uwtable }

!0 = !{!1, !3, !5, !6, !8}
!1 = distinct !{!1, !2, !"_ZN4core3fmt9Formatter9write_str17hf8083fd39f1b0c0cE: argument 0"}
!2 = distinct !{!2, !"_ZN4core3fmt9Formatter9write_str17hf8083fd39f1b0c0cE"}
!3 = distinct !{!3, !4, !"_ZN4core3fmt8builders15debug_tuple_new17ha94131b12062f03fE: argument 0"}
!4 = distinct !{!4, !"_ZN4core3fmt8builders15debug_tuple_new17ha94131b12062f03fE"}
!5 = distinct !{!5, !4, !"_ZN4core3fmt8builders15debug_tuple_new17ha94131b12062f03fE: argument 1"}
!6 = distinct !{!6, !7, !"_ZN4core3fmt9Formatter11debug_tuple17hadc8e3bda8cfb00cE: argument 0"}
!7 = distinct !{!7, !"_ZN4core3fmt9Formatter11debug_tuple17hadc8e3bda8cfb00cE"}
!8 = distinct !{!8, !7, !"_ZN4core3fmt9Formatter11debug_tuple17hadc8e3bda8cfb00cE: argument 1"}
!9 = !{}
!10 = !{!3, !6}

I'm not actually sure if Rust or LLVM should be handling this...

@shepmaster shepmaster added A-libcore Affects compiling the core library has-reduced-testcase A small LLVM IR file exists that demonstrates the problem labels Apr 28, 2017
@shepmaster
Copy link
Member Author

The debug log from llc

@shepmaster
Copy link
Member Author

And the machine code when we are doing register allocation

# Machine code for function EscapeDefaultStateDebug: NoPHIs, TracksLiveness
Frame Objects:
  fi#0: size=2, align=2, at location [SP+2]
Function Live Ins: %R25R24 in %vreg0

BB#0: derived from LLVM BB %start
    Live Ins: %R25R24
	%vreg11<def> = COPY %R25R24; DREGS:%vreg11
	%vreg1<def> = LDIRdK 0; LD8:%vreg1
	CPIRdK %vreg1, 0, %SREG<imp-def>, %SREG<imp-use,kill>; LD8:%vreg1
	BRNEk <BB#2>, %SREG<imp-use,kill>
	RJMPk <BB#1>
    Successors according to CFG: BB#1(0x7ffff800 / 0x80000000 = 100.00%) BB#2(0x00000800 / 0x80000000 = 0.00%)

BB#1: derived from LLVM BB %_ZN4core3fmt8builders10DebugTuple6finish17h8f1e007edfc4d0a6E.exit
    Predecessors according to CFG: BB#0
	RET

BB#2: derived from LLVM BB %bb4
    Predecessors according to CFG: BB#0
	%vreg13<def> = COPY %vreg11; DREGS:%vreg13,%vreg11
	%vreg14<def> = COPY %vreg13; PTRDISPREGS:%vreg14 DREGS:%vreg13
	%vreg5<earlyclobber,def> = LDDWRdPtrQ %vreg14, 18; mem:LD2[%2](noalias=!1,!3,!5,!6,!8)(dereferenceable) PTRDISPREGS:%vreg5,%vreg14
	%vreg4<earlyclobber,def> = LDDWRdPtrQ %vreg5, 6; mem:LD2[%5](noalias=!1,!3,!5,!6,!8)(invariant) DREGS:%vreg4 PTRDISPREGS:%vreg5
	ADJCALLSTACKDOWN 0, %SP<imp-def,dead>, %SREG<imp-def,dead>, %SP<imp-use>
	%vreg6<def> = LDIWRdK <ga:@str.2y>; DLDREGS:%vreg6
	%vreg7<def> = LDIWRdK 7; DLDREGS:%vreg7
	%R23R22<def> = COPY %vreg6; DLDREGS:%vreg6
	%R21R20<def> = COPY %vreg7; DLDREGS:%vreg7
	%R31R30<def> = COPY %vreg4; DREGS:%vreg4
	ICALL %R31R30, %R25R24<undef>, %R23R22, %R21R20, <regmask %R2 %R3 %R4 %R5 %R6 %R7 %R8 %R9 %R10 %R11 %R12 %R13 %R14 %R15 %R16 %R17 %R28 %R29 %R3R2 %R5R4 %R7R6 %R9R8 %R11R10 %R13R12 %R15R14 %R17R16 %R29R28>, %SP<imp-use>, %R31R30<imp-use>, %SP<imp-def>, %R24<imp-def,dead>
	ADJCALLSTACKUP 0, 0, %SP<imp-def,dead>, %SREG<imp-def,dead>, %SP<imp-use>

# End machine code for function EscapeDefaultStateDebug.

@dylanmckay
Copy link
Member

This is almost certainly related to avr-llvm/llvm#1. The bug was really hard to fix (I had to figure out how the greedy register allocator worked and make a change to it so that it would attempt to split unspillable live ranges.

I'm surprised to see this again. Hopefully it's easier to solve this time lol

@shepmaster
Copy link
Member Author

One thing I notice but don't know if it's important: When selectOrSplit PTRDISPREGS is called at the end of the debug log, it doesn't pick R29R28, only R31R30. That seems like it would artifically increase the contention of registers.

@dylanmckay
Copy link
Member

That may be because R29R28 is the frame pointer register, which is enabled for functions in a few cases including if there was a stack spill.

@shepmaster
Copy link
Member Author

One thing I haven't been able to figure out yet: why would something be unspillable? With 32 registers, it seems like it should be conceptually pretty hard to actually run out...

@dylanmckay
Copy link
Member

dylanmckay commented Apr 30, 2017 via email

@shepmaster
Copy link
Member Author

only support using pointers from the X, Y, and Z registers.

only have two pointers in registers at a time in some cases.

But in this case, it looks like we are using PTRDISPREGS which only has 2 registers, Y and Z. If Y is taken up, then we only have Z, which fits with the output. Is using this register class, instead of PTRREGS suspicious at all?

@dylanmckay
Copy link
Member

Is using this register class, instead of PTRREGS suspicious at all?

Nah, it looks like it's PTRDISPREGS because the instruction is lddw, which (I can't remember why) but it only supports a small set of registers, named PTRDISPREGS because they can only be used in thie displacement instructions.

@shepmaster
Copy link
Member Author

I've been trying to comment out parts of libcore to get to the next error, but it feels like this is occurring for every implementation of core::fmt::Debug, of which there are a lot... probably means it's an important issue.

@dylanmckay
Copy link
Member

I've got an unfinished patch locally for #38 which may reduce the number of these (reserve the frame pointer in less cases).

@shepmaster
Copy link
Member Author

The alloca definitely plays a part - the resulting value is never used when it fails, but removing that line cause the compilation to succeed. Some quick searching seems to indicate that a frame pointer is often used when alloca is.

@shepmaster
Copy link
Member Author

I'm continuing to try to drive down the example size. This is what I have now:

source_filename = "bugpoint-output-87a64d6.bc"
target datalayout = "e-p:16:16:16-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-n8"
target triple = "avr"

%"fmt::Formatter" = type { i32, { i8*, void (i8*)** } }

@str.1b = external constant [0 x i8]

define void @"TryFromIntError::Debug"(%"fmt::Formatter"* dereferenceable(32)) unnamed_addr #0 personality i32 (...)* @rust_eh_personality {
start:
  %builder = alloca i8, align 8
  %1 = getelementptr inbounds %"fmt::Formatter", %"fmt::Formatter"* %0, i16 0, i32 1
  %2 = bitcast { i8*, void (i8*)** }* %1 to {}**
  %3 = load {}*, {}** %2, align 2
  %4 = getelementptr inbounds %"fmt::Formatter", %"fmt::Formatter"* %0, i16 0, i32 1, i32 1
  %5 = load void (i8*)**, void (i8*)*** %4, align 2
  %6 = getelementptr inbounds void (i8*)*, void (i8*)** %5, i16 3
  %7 = bitcast void (i8*)** %6 to i8 ({}*, i8*, i16)**
  %8 = load i8 ({}*, i8*, i16)*, i8 ({}*, i8*, i16)** %7, align 2
  %9 = tail call i8 %8({}* nonnull %3, i8* noalias nonnull readonly getelementptr inbounds ([0 x i8], [0 x i8]* @str.1b, i16 0, i16 0), i16 15)
  unreachable
}

declare i32 @rust_eh_personality(...) unnamed_addr

attributes #0 = { uwtable }

Things I've noticed:

  • Removing alloca - compiles. My guess is that alloca is triggering the frame pointer, reducing available registers.
  • Removing the first member of Formatter - compiles. My guess is that this allows us to use the non-displacement instructions.

@shepmaster
Copy link
Member Author

shepmaster commented May 1, 2017

Continuing to minimize:

source_filename = "bugpoint-output-87a64d6.bc"
target datalayout = "e-p:16:16:16-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-n8"
target triple = "avr"

%"fmt::Formatter" = type { i32, {}*, i8 ({}*)** }

define void @"TryFromIntError::Debug"(%"fmt::Formatter"*) {
start:
  %builder = alloca i8
  %1 = getelementptr %"fmt::Formatter", %"fmt::Formatter"* %0, i8 0, i32 1
  %2 = load {}*, {}** %1
  %3 = getelementptr %"fmt::Formatter", %"fmt::Formatter"* %0, i8 0, i32 2
  %4 = load i8 ({}*)**, i8 ({}*)*** %3
  %5 = load i8 ({}*)*, i8 ({}*)** %4
  %6 = tail call i8 %5({}* %2)
  unreachable
}
********** GREEDY REGISTER ALLOCATION **********
********** Function: TryFromIntError::Debug
********** Compute Spill Weights **********
********** Function: TryFromIntError::Debug
********** INTERVALS **********
R24 [0B,16r:0)[128r,160r:2)[160r,160d:1)  0@0B-phi 1@160r 2@128r
R25 [0B,16r:0)[128r,160r:1)  0@0B-phi 1@128r
%vreg2 [16r,96r:0)  0@16r
%vreg3 [80e,144r:0)  0@80e
%vreg4 [48e,80r:0)  0@48e
%vreg5 [96e,128r:0)  0@96e
RegMasks: 160r
********** MACHINEINSTRS **********
# Machine code for function TryFromIntError::Debug: NoPHIs, TracksLiveness
Frame Objects:
  fi#0: size=1, align=1, at location [SP+2]
Function Live Ins: %R25R24 in %vreg0

0B	BB#0: derived from LLVM BB %start
	    Live Ins: %R25R24
16B		%vreg2<def> = COPY %R25R24; PTRDISPREGS:%vreg2
48B		%vreg4<earlyclobber,def> = LDDWRdPtrQ %vreg2, 6; mem:LD2[%3] PTRDISPREGS:%vreg4,%vreg2
80B		%vreg3<earlyclobber,def> = LDWRdPtr %vreg4; mem:LD2[%4] DREGS:%vreg3 PTRDISPREGS:%vreg4
96B		%vreg5<earlyclobber,def> = LDDWRdPtrQ %vreg2, 4; mem:LD2[%1] DREGS:%vreg5 PTRDISPREGS:%vreg2
112B		ADJCALLSTACKDOWN 0, %SP<imp-def,dead>, %SREG<imp-def,dead>, %SP<imp-use>
128B		%R25R24<def> = COPY %vreg5; DREGS:%vreg5
144B		%R31R30<def> = COPY %vreg3; DREGS:%vreg3
160B		ICALL %R31R30<kill>, %R25R24, <regmask %R2 %R3 %R4 %R5 %R6 %R7 %R8 %R9 %R10 %R11 %R12 %R13 %R14 %R15 %R16 %R17 %R28 %R29 %R3R2 %R5R4 %R7R6 %R9R8 %R11R10 %R13R12 %R15R14 %R17R16 %R29R28>, %SP<imp-use>, %R31R30<imp-use>, %SP<imp-def>, %R24<imp-def,dead>
176B		ADJCALLSTACKUP 0, 0, %SP<imp-def,dead>, %SREG<imp-def,dead>, %SP<imp-use>

# End machine code for function TryFromIntError::Debug.


selectOrSplit DREGS:%vreg3 [80e,144r:0)  0@80e w=4.344086e-03
AllocationOrder(DREGS) = [ %R25R24 %R19R18 %R21R20 %R23R22 %R31R30 %R27R26 %R17R16 %R15R14 %R13R12 %R11R10 %R9R8 %R7R6 %R5R4 %R3R2 ]
hints: %R31R30
assigning %vreg3 to %R31R30: R30 [80e,144r:0)  0@80e R31 [80e,144r:0)  0@80e

selectOrSplit DREGS:%vreg5 [96e,128r:0)  0@96e w=4.665127e-03
hints: %R25R24
assigning %vreg5 to %R25R24: R24 [96e,128r:0)  0@96e R25 [96e,128r:0)  0@96e

selectOrSplit PTRDISPREGS:%vreg2 [16r,96r:0)  0@16r w=6.250000e-03
AllocationOrder(PTRDISPREGS) = [ %R31R30 ] (sub-class)
RS_Assign Cascade 0
should evict: %vreg3 [80e,144r:0)  0@80e w= 4.344086e-03
should evict: %vreg3 [80e,144r:0)  0@80e w= 4.344086e-03
evicting %R31R30 interference: Cascade 1
unassigning %vreg3 from %R31R30: R30 R31
assigning %vreg2 to %R31R30: R30 [16r,96r:0)  0@16r R31 [16r,96r:0)  0@16r
queuing new interval: %vreg3 [80e,144r:0)  0@80e

selectOrSplit DREGS:%vreg3 [80e,144r:0)  0@80e w=4.344086e-03
hints: %R31R30
missed hint %R31R30
assigning %vreg3 to %R19R18: R18 [80e,144r:0)  0@80e R19 [80e,144r:0)  0@80e

selectOrSplit PTRDISPREGS:%vreg4 [48e,80r:0)  0@48e w=INF
RS_Assign Cascade 0
evicting %R31R30 interference: Cascade 2
unassigning %vreg2 from %R31R30: R30 R31
assigning %vreg4 to %R31R30: R30 [48e,80r:0)  0@48e R31 [48e,80r:0)  0@48e
queuing new interval: %vreg2 [16r,96r:0)  0@16r

selectOrSplit PTRDISPREGS:%vreg2 [16r,96r:0)  0@16r w=6.250000e-03
RS_Assign Cascade 2
wait for second round
queuing new interval: %vreg2 [16r,96r:0)  0@16r

selectOrSplit PTRDISPREGS:%vreg2 [16r,96r:0)  0@16r w=6.250000e-03
RS_Split Cascade 2
Analyze counted 3 instrs in 1 blocks, through 0 blocks.
tryLocalSplit:  16r 48r 96r
%R31R30 16r-48r i=INF extend
%R31R30 48r-96r i=INF end
Split around 3 individual instrs.
AllocationOrder(DREGS) = [ %R25R24 %R19R18 %R21R20 %R23R22 %R31R30 %R27R26 %R17R16 %R15R14 %R13R12 %R11R10 %R9R8 %R7R6 %R5R4 %R3R2 ]
    skip:	16r	%vreg2<def> = COPY %R25R24; PTRDISPREGS:%vreg2
AllocationOrder(PTRDISPREGS) = [ %R31R30 ] (sub-class)
    enterIntvBefore 48r: valno 0
    leaveIntvAfter 48r: valno 0
    useIntv [40r;48r): [40r;48r):1
    enterIntvBefore 96r: valno 0
    leaveIntvAfter 96r: not live
    useIntv [88r;112B): [40r;48r):1 [88r;112B):2
Multi-mapped complement 0@44r for parent 0@16r hoist to BB#0 44r
Direct complement def at 16r
Removing 1 back-copies.
Removing 44r	%vreg8<def> = COPY %vreg2; PTRDISPREGS:%vreg8,%vreg2
  blit [16r,96r:0): [16r;40r)=0(%vreg8)(recalc) [40r;48r)=1(%vreg9):0 [48r;88r)=0(%vreg8)(recalc) [88r;96r)=2(%vreg10):0
  rewr BB#0	16r:0	%vreg8<def> = COPY %R25R24; PTRDISPREGS:%vreg8
  rewr BB#0	48B:1	%vreg4<earlyclobber,def> = LDDWRdPtrQ %vreg9, 6; mem:LD2[%3] PTRDISPREGS:%vreg4,%vreg9
  rewr BB#0	96B:2	%vreg5<earlyclobber,def> = LDDWRdPtrQ %vreg10, 4; mem:LD2[%1] DREGS:%vreg5 PTRDISPREGS:%vreg10
  rewr BB#0	40B:0	%vreg9<def> = COPY %vreg8; PTRDISPREGS:%vreg9,%vreg8
  rewr BB#0	88B:0	%vreg10<def> = COPY %vreg8; PTRDISPREGS:%vreg10,%vreg8
Inflated %vreg8 to DREGS
queuing new interval: %vreg8 [16r,88r:0)  0@16r
queuing new interval: %vreg9 [40r,48r:0)  0@40r
queuing new interval: %vreg10 [88r,96r:0)  0@88r

selectOrSplit DREGS:%vreg8 [16r,88r:0)  0@16r w=6.419491e-03
hints: %R25R24
assigning %vreg8 to %R25R24: R24 [16r,88r:0)  0@16r R25 [16r,88r:0)  0@16r

selectOrSplit PTRDISPREGS:%vreg9 [40r,48r:0)  0@40r w=INF
RS_Spill Cascade 0
Try last chance recoloring for %vreg9 [40r,48r:0)  0@40r
Try to assign: %vreg9 [40r,48r:0)  0@40r to %R31R30
unassigning %vreg4 from %R31R30: R30 R31
assigning %vreg9 to %R31R30: R30 [40r,48r:0)  0@40r R31 [40r,48r:0)  0@40r
Try to recolor: %vreg4 [48e,80r:0)  0@48e
RS_Assign Cascade 2
wait for second round
Fail to assign: %vreg9 [40r,48r:0)  0@40r to %R31R30
unassigning %vreg9 from %R31R30: R30 R31
assigning %vreg4 to %R31R30: R30 [48e,80r:0)  0@48e R31 [48e,80r:0)  0@48e
LLVM ERROR: ran out of registers during register allocation

@shepmaster
Copy link
Member Author

@dylanmckay yo, why does LDWRdPtr use PTRDISPREGS? Shouldn't the ld instruction work with PTRREGS?

Changing that locally allows this example code to compile (but I'm not sure it's correct as I get a rogue cli in the output)

@shepmaster
Copy link
Member Author

Also, I'm pretty sure that even with one register, the example LLVM IR should be able to be compiled to machine code. I feel like something's not quite right with spilling / reordering algorithm. I was trying to write out a proposed register allocation when I stumbled on the LDWRdPtr comment.

@dylanmckay
Copy link
Member

@dylanmckay yo, why does LDWRdPtr use PTRDISPREGS? Shouldn't the ld instruction work with PTRREGS?

I'll have a look in a few hours and get back to you.

Also, I'm pretty sure that even with one register, the example LLVM IR should be able to be compiled to machine code. I feel like something's not quite right with spilling / reordering algorithm. I was trying to write out a proposed register allocation when I stumbled on the LDWRdPtr comment.

Yeah, it is a bug in LLVM. I'll get more details in a bit but from memory, when LLVM runs out of registers it isn't smart enough to perform a cross-class copy (i.e. copy a PTRREGS to a DREGS) temporarily. I don't think we would have this issue if PTRREGS == DREGS. I believe the original AVR-LLVM developers put something on bugzilla, I'll dig it up.

@shepmaster
Copy link
Member Author

Changing that locally allows this example code to compile

Unfortunately, the real code still fails; guess that that one extra register wasn't enough.

@shepmaster
Copy link
Member Author

Yeah, it is a bug in LLVM. I'll get more details in a bit but from memory, when LLVM runs out of registers it isn't smart enough to perform a cross-class copy

How scary would implementing something like that be? It feels like changing the register allocator is A Big Deal. Beyond scariness, how complicated would you guess?

@jackpot51
Copy link

Guys, I have been following changes. I have removed bignum, and i128/u128 and am now on this error. If you need me to hack out Debug implementations I can do that too - then I will post my changes on a fork

@dylanmckay
Copy link
Member

dylanmckay commented May 3, 2017

Here's the patch where we decided to use the generic relaxation pass.

Here's the bug which discusses the register allocator problem.

@dylanmckay
Copy link
Member

How scary would implementing something like that be? It feels like changing the register allocator is A Big Deal. Beyond scariness, how complicated would you guess?

It's not too scary - there are a lot of test cases for the register allocator and so it is hard to break. I've worked on it before, but funnily enough I also caused a small regression that was discovered a few weeks later.

how complicated would you guess?

It depends on how much support there is for it. I'm sure there's some sort of mechanism for finding free registers in a register class and performing temporary copies to it, it shouldn't be too hard to make it work across class boundaries famous last words

@shepmaster
Copy link
Member Author

to use the generic relaxation pass.

@dylanmckay did you want to put this on #44 ?

the register allocator problem.

That does seem like the same root cause; too bad the earlier fix didn't solve it. 😿

@shepmaster
Copy link
Member Author

@jackpot51 there's no easy way to know which Rust constructs will trigger a given LLVM issue. Essentially, my process has been:

  1. attempt to build libcore
  2. get a failure
  3. reduce a testcase
  4. report that testcase here
  5. attempt to comment out the offending code
  6. repeat

You are welcome to try to comment out all the Debug implementations in libcore and see if that gets past this issue. It's possible that other constructs generate the same style of code and will be subject to the same problem.

@dylanmckay
Copy link
Member

dylanmckay commented May 3, 2017

It looks like the register allocator, upon seeing a live variable, deduces all of the allocatable registers (by looking at the required register class) and attempts to select one (RegAllocGreedy.cpp:2525).

If it can not find one, it attempts to evict a register, but it is given the exact same list of allocatable registers as before, only containing the registers from the original register class (RegAllocGreedy.cpp:2551).

I think it'll be doable to teach it to do this I was mixing up splitting and eviction.

I was mixing up eviction and splitting

@brainlag
Copy link

As far as I understand the problem occurs when LDDWRdPtrQ uses the ptrdispregs register class as target register like below in line 48B. This should work, but the allocator can't deal with this for some reason. So from my testing, it seams like (and I might be totally wrong on this) the allocator reserves the Z register for the ICALL instruction and then the register class ptrdispregs only has 1 register left and we can't use Y for source and destination. Removing the Z register from DREGS fixes the problem but removing Y register does not. Coincidence? Maybe or ICALL does reserve the Z register.

40B	  %11:ptrdispregs = COPY %10:dregs
48B	  early-clobber %4:ptrdispregs = LDDWRdPtrQ %11:ptrdispregs, 6 :: (dereferenceable load 2 from %ir.4)
80B	  early-clobber %3:dregs = LDDWRdPtrQ %4:ptrdispregs, 6 :: (load 2 from %ir.7)
88B	  %12:ptrdispregs = COPY %10:dregs
96B	  early-clobber %5:dregs = LDDWRdPtrQ %12:ptrdispregs, 4 :: (dereferenceable load 2 from %ir.2)

Sure we could fix up the register allocator but we don't have the resources. We should just disallow machine instructions like 48B and live with the extra copy instruction here and there.

Comments?

@dylanmckay
Copy link
Member

the allocator reserves the Z register for the ICALL instruction

Correct

and then the register class ptrdispregs only has 1 register left and we can't use Y for source and destination. Removing the Z register from DREGS fixes the problem but removing Y register does not. Coincidence?

Explained very well - I will edit the description to include this

Perhaps the Y register is being used as a frame pointer? All functions which have spills or allocas will always reserve Y to store the frame pointer (code).

We should just disallow machine instructions like 48B and live with the extra copy instruction here and there.

That sounds like a reasonable tradeoff to me.

I'm not sure what you mean by "like 48B" - what properties distinguish 48B from other wide load instructions?

I wonder how the LDDWRdPtrQ instruction was inserted in this case.

@brainlag
Copy link

I'm not sure what you mean by "like 48B" - what properties distinguish 48B from other wide load instructions?

The target is ptrdispregs register class while at 80B and 96B it is dregs.

@brainlag
Copy link

brainlag commented Mar 20, 2018

Here is my workaround for this bug and #95:

diff --git a/lib/Target/AVR/AVRInstrInfo.td b/lib/Target/AVR/AVRInstrInfo.td
index a2129cc0e2e..601cfa8d861 100644
--- a/lib/Target/AVR/AVRInstrInfo.td
+++ b/lib/Target/AVR/AVRInstrInfo.td
@@ -1222,7 +1222,7 @@ isReMaterializable = 1 in
   // ldd Rd,   P+q
   // ldd Rd+1, P+q+1
   let Constraints = "@earlyclobber $dst" in
-  def LDDWRdPtrQ : Pseudo<(outs DREGS:$dst),
+  def LDDWRdPtrQ : Pseudo<(outs DREGS_WITHOUT_Z:$dst),
                           (ins memri:$memri),
                           "lddw\t$dst, $memri",
                           [(set i16:$dst, (load addr:$memri))]>,
diff --git a/lib/Target/AVR/AVRRegisterInfo.td b/lib/Target/AVR/AVRRegisterInfo.td
index 8162f12052b..c139ca6863d 100644
--- a/lib/Target/AVR/AVRRegisterInfo.td
+++ b/lib/Target/AVR/AVRRegisterInfo.td
@@ -157,6 +157,18 @@ def DREGS : RegisterClass<"AVR", [i16], 8,
     R9R8, R7R6, R5R4, R3R2, R1R0
   )>;
 
+// Main 16-bit pair register class.
+def DREGS_WITHOUT_Z : RegisterClass<"AVR", [i16], 8,
+  (
+    // Return value and arguments.
+    add R25R24, R19R18, R21R20, R23R22,
+    // Scratch registers.
+    R27R26,
+    // Callee saved registers.
+    R29R28, R17R16, R15R14, R13R12, R11R10,
+    R9R8, R7R6, R5R4, R3R2, R1R0
+  )>;
+
 // 16-bit register class for immediate instructions.
 def DLDREGS : RegisterClass<"AVR", [i16], 8,
   (
diff --git a/test/CodeGen/AVR/pseudo/LDDWRdPtrQ-same-src-dst.mir b/test/CodeGen/AVR/pseudo/LDDWRdPtrQ-same-src-dst.mir
index df69f5fffa5..72b20d39d68 100644
--- a/test/CodeGen/AVR/pseudo/LDDWRdPtrQ-same-src-dst.mir
+++ b/test/CodeGen/AVR/pseudo/LDDWRdPtrQ-same-src-dst.mir
@@ -25,11 +25,11 @@ body: |
 
     ; CHECK-LABEL: test_lddwrdptrq
 
-    ; CHECK:      ldd [[SCRATCH:r[0-9]+]], Z+10
+    ; CHECK:      ldd [[SCRATCH:r[0-9]+]], Y+10
     ; CHECK-NEXT: push [[SCRATCH]]
-    ; CHECK-NEXT: ldd [[SCRATCH]], Z+11
-    ; CHECK-NEXT: mov r31, [[SCRATCH]]
-    ; CHECK-NEXT: pop r30
+    ; CHECK-NEXT: ldd [[SCRATCH]], Y+11
+    ; CHECK-NEXT: mov r29, [[SCRATCH]]
+    ; CHECK-NEXT: pop r28
 
-    early-clobber $r31r30 = LDDWRdPtrQ undef $r31r30, 10
+    early-clobber $r29r28 = LDDWRdPtrQ undef $r29r28, 10
 ...
diff --git a/test/CodeGen/AVR/pseudo/LDDWRdPtrQ.mir b/test/CodeGen/AVR/pseudo/LDDWRdPtrQ.mir
index 59b3ce8b602..96d3809ed2d 100644
--- a/test/CodeGen/AVR/pseudo/LDDWRdPtrQ.mir
+++ b/test/CodeGen/AVR/pseudo/LDDWRdPtrQ.mir
@@ -18,8 +18,8 @@ body: |
 
     ; CHECK-LABEL: test_lddwrdptrq
 
-    ; CHECK:      ldd     r30, Y+10
-    ; CHECK-NEXT: ldd     r31, Y+11
+    ; CHECK:      ldd     r28, Z+10
+    ; CHECK-NEXT: ldd     r29, Z+11
 
-    early-clobber $r31r30 = LDDWRdPtrQ undef $r29r28, 10
+    early-clobber $r29r28 = LDDWRdPtrQ undef $r31r30, 10
 ...
diff --git a/test/CodeGen/AVR/rust-avr-bug-37.ll b/test/CodeGen/AVR/rust-avr-bug-37.ll
new file mode 100644
index 00000000000..9c269d3dab1
--- /dev/null
+++ b/test/CodeGen/AVR/rust-avr-bug-37.ll
@@ -0,0 +1,25 @@
+; RUN: llc < %s -march=avr | FileCheck %s
+
+%"fmt::Formatter" = type { i32, { i8*, void (i8*)** } }
+
+@str.1b = external constant [0 x i8]
+
+define void @"TryFromIntError::Debug"(%"fmt::Formatter"* dereferenceable(32)) unnamed_addr #0 personality i32 (...)* @rust_eh_personality {
+; CHECK-LABEL: "TryFromIntError::Debug"
+start:
+  %builder = alloca i8, align 8
+  %1 = getelementptr inbounds %"fmt::Formatter", %"fmt::Formatter"* %0, i16 0, i32 1
+  %2 = bitcast { i8*, void (i8*)** }* %1 to {}**
+  %3 = load {}*, {}** %2, align 2
+  %4 = getelementptr inbounds %"fmt::Formatter", %"fmt::Formatter"* %0, i16 0, i32 1, i32 1
+  %5 = load void (i8*)**, void (i8*)*** %4, align 2
+  %6 = getelementptr inbounds void (i8*)*, void (i8*)** %5, i16 3
+  %7 = bitcast void (i8*)** %6 to i8 ({}*, i8*, i16)**
+  %8 = load i8 ({}*, i8*, i16)*, i8 ({}*, i8*, i16)** %7, align 2
+  %9 = tail call i8 %8({}* nonnull %3, i8* noalias nonnull readonly getelementptr inbounds ([0 x i8], [0 x i8]* @str.1b, i16 0, i16 0), i16 15)
+  unreachable
+}
+
+declare i32 @rust_eh_personality(...) unnamed_addr
+
+attributes #0 = { uwtable }
\ No newline at end of file
diff --git a/test/CodeGen/AVR/rust-avr-bug-95.ll b/test/CodeGen/AVR/rust-avr-bug-95.ll
new file mode 100644
index 00000000000..9534ceb26e7
--- /dev/null
+++ b/test/CodeGen/AVR/rust-avr-bug-95.ll
@@ -0,0 +1,37 @@
+; RUN: llc < %s -march=avr | FileCheck %s
+
+%"fmt::Formatter.1.77.153.229.305.381.1673" = type { [0 x i8], i32, [0 x i8], i32, [0 x i8], i8, [0 x i8], %"option::Option<usize>.0.76.152.228.304.380.1672", [0 x i8], %"option::Option<usize>.0.76.152.228.304.380.1672", [0 x i8], { {}*, {}* }, [0 x i8], { i8*, i8* }, [0 x i8], { [0 x { i8*, i8* }]*, i16 }, [0 x i8] }
+%"option::Option<usize>.0.76.152.228.304.380.1672" = type { [0 x i8], i8, [2 x i8] }
+
+@str.4S = external constant [5 x i8]
+
+; Function Attrs: uwtable
+define void @"_ZN65_$LT$lib..str..Chars$LT$$u27$a$GT$$u20$as$u20$lib..fmt..Debug$GT$3fmt17h76a537e22649f739E"(%"fmt::Formatter.1.77.153.229.305.381.1673"* dereferenceable(27) %__arg_0) unnamed_addr #0 personality i32 (...)* @rust_eh_personality {
+; CHECK-LABEL: "_ZN65_$LT$lib..str..Chars$LT$$u27$a$GT$$u20$as$u20$lib..fmt..Debug$GT$3fmt17h76a537e22649f739E"
+start:
+  %0 = getelementptr inbounds %"fmt::Formatter.1.77.153.229.305.381.1673", %"fmt::Formatter.1.77.153.229.305.381.1673"* %__arg_0, i16 0, i32 11, i32 0
+  %1 = load {}*, {}** %0, align 1, !noalias !0, !nonnull !9
+  %2 = getelementptr inbounds %"fmt::Formatter.1.77.153.229.305.381.1673", %"fmt::Formatter.1.77.153.229.305.381.1673"* %__arg_0, i16 0, i32 11, i32 1
+  %3 = bitcast {}** %2 to i1 ({}*, [0 x i8]*, i16)***
+  %4 = load i1 ({}*, [0 x i8]*, i16)**, i1 ({}*, [0 x i8]*, i16)*** %3, align 1, !noalias !0, !nonnull !9
+  %5 = getelementptr inbounds i1 ({}*, [0 x i8]*, i16)*, i1 ({}*, [0 x i8]*, i16)** %4, i16 3
+  %6 = load i1 ({}*, [0 x i8]*, i16)*, i1 ({}*, [0 x i8]*, i16)** %5, align 1, !invariant.load !9, !noalias !0, !nonnull !9
+  %7 = tail call zeroext i1 %6({}* nonnull %1, [0 x i8]* noalias nonnull readonly bitcast ([5 x i8]* @str.4S to [0 x i8]*), i16 5), !noalias !10
+  unreachable
+}
+
+declare i32 @rust_eh_personality(...) unnamed_addr
+
+attributes #0 = { uwtable }
+
+!0 = !{!1, !3, !5, !6, !8}
+!1 = distinct !{!1, !2, !"_ZN3lib3fmt9Formatter9write_str17ha1a9656fc66ccbe5E: %data.0"}
+!2 = distinct !{!2, !"_ZN3lib3fmt9Formatter9write_str17ha1a9656fc66ccbe5E"}
+!3 = distinct !{!3, !4, !"_ZN3lib3fmt8builders16debug_struct_new17h352a1de8f89c2bc3E: argument 0"}
+!4 = distinct !{!4, !"_ZN3lib3fmt8builders16debug_struct_new17h352a1de8f89c2bc3E"}
+!5 = distinct !{!5, !4, !"_ZN3lib3fmt8builders16debug_struct_new17h352a1de8f89c2bc3E: %name.0"}
+!6 = distinct !{!6, !7, !"_ZN3lib3fmt9Formatter12debug_struct17ha1ff79f633171b68E: argument 0"}
+!7 = distinct !{!7, !"_ZN3lib3fmt9Formatter12debug_struct17ha1ff79f633171b68E"}
+!8 = distinct !{!8, !7, !"_ZN3lib3fmt9Formatter12debug_struct17ha1ff79f633171b68E: %name.0"}
+!9 = !{}
+!10 = !{!3, !6}
\ No newline at end of file

shepmaster pushed a commit to avr-rust/libcore that referenced this issue Apr 25, 2018
shepmaster pushed a commit to avr-rust/libcore that referenced this issue Apr 25, 2018
@leo60228
Copy link

Is there a reason the above couldn't be merged?

@dylanmckay
Copy link
Member

I'm happy with merging it, so long as we file an AVR backend bug at https://bugs.llvm.org/

@dylanmckay
Copy link
Member

I've written a bug report here. I should get around to merging this tomorrow.

@leo60228
Copy link

Should I convert this to a PR? What branch should I do it on? I've only compiled master, but I don't have time to test anyway (compiling is so slow...).

@TimNN
Copy link

TimNN commented Oct 15, 2018

Friendly ping :) @brainlag's patch something that could be submitted upstream or something to be kept around in avr-rust?

Note that after applying this patch to fix one assertion (and already having some others applied for different issues) I'm getting the following assertion when compiling stock-libcore (Edit: when compiling with opt-level > 1):

rustc: /home/logic/avr/src/src/llvm/lib/CodeGen/MachineInstr.cpp:233: void llvm::MachineInstr::addOperand(llvm::MachineFunction&, const llvm::MachineOperand&): Assertion `(isImpReg || Op.isRegMask() || MCID->isVariadic() || OpNo < MCID->getNumOperands() || isMetaDataOp) && "Trying to add an operand to a machine instr that is already done!"' failed.

Could that be related to the proposed patch?

@TimNN
Copy link

TimNN commented Oct 15, 2018

I don't think this issue is related since after minimization it reproduces with an unpatched llc. (Sorry for the noise).

@dylanmckay
Copy link
Member

dylanmckay commented Nov 5, 2018

Upstreamed in r346114.

Great work @brainlag, you even added tests

Also raised LLVM bug PR39553 to track the workaround and a proper fix.

earl pushed a commit to earl/llvm-mirror that referenced this issue Nov 5, 2018
This is an AVR-specific workaround for a limitation of the register
allocator that only exposes itself on targets with high register
contention like AVR, which only has three pointer registers.

The three pointer registers are X, Y, and Z.
In most nontrivial functions, Y is reserved for the frame pointer,
as per the calling convention. This leaves X and Z. Some instructions,
such as LPM ("load program memory"), are only defined for the Z
register. Sometimes this just leaves X.

When the backend generates a LDDWRdPtrQ instruction with Z as the
destination pointer, it usually trips up the register allocator
with this error message:

  LLVM ERROR: ran out of registers during register allocation

This patch is a hacky workaround. We ban the LDDWRdPtrQ instruction
from ever using the Z register as an operand. This gives the
register allocator a bit more space to allocate, fixing the
regalloc exhaustion error.

Here is a description from the patch author Peter Nimmervoll

  As far as I understand the problem occurs when LDDWRdPtrQ uses
  the ptrdispregs register class as target register. This should work, but
  the allocator can't deal with this for some reason. So from my testing,
  it seams like (and I might be totally wrong on this) the allocator reserves
  the Z register for the ICALL instruction and then the register class
  ptrdispregs only has 1 register left and we can't use Y for source and
  destination. Removing the Z register from DREGS fixes the problem but
  removing Y register does not.

More information about the bug can be found on the avr-rust issue
tracker at avr-rust/rust-legacy-fork#37.

A bug has raised to track the removal of this workaround and a proper
fix; PR39553 at https://bugs.llvm.org/show_bug.cgi?id=39553.

Patch by Peter Nimmervoll

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@346114 91177308-0d34-0410-b5e6-96231b3b80d8
dylanmckay added a commit to dylanmckay/llvm that referenced this issue Nov 5, 2018
This is an AVR-specific workaround for a limitation of the register
allocator that only exposes itself on targets with high register
contention like AVR, which only has three pointer registers.

The three pointer registers are X, Y, and Z.
In most nontrivial functions, Y is reserved for the frame pointer,
as per the calling convention. This leaves X and Z. Some instructions,
such as LPM ("load program memory"), are only defined for the Z
register. Sometimes this just leaves X.

When the backend generates a LDDWRdPtrQ instruction with Z as the
destination pointer, it usually trips up the register allocator
with this error message:

  LLVM ERROR: ran out of registers during register allocation

This patch is a hacky workaround. We ban the LDDWRdPtrQ instruction
from ever using the Z register as an operand. This gives the
register allocator a bit more space to allocate, fixing the
regalloc exhaustion error.

Here is a description from the patch author Peter Nimmervoll

  As far as I understand the problem occurs when LDDWRdPtrQ uses
  the ptrdispregs register class as target register. This should work, but
  the allocator can't deal with this for some reason. So from my testing,
  it seams like (and I might be totally wrong on this) the allocator reserves
  the Z register for the ICALL instruction and then the register class
  ptrdispregs only has 1 register left and we can't use Y for source and
  destination. Removing the Z register from DREGS fixes the problem but
  removing Y register does not.

More information about the bug can be found on the avr-rust issue
tracker at avr-rust/rust-legacy-fork#37.

A bug has raised to track the removal of this workaround and a proper
fix; PR39553 at https://bugs.llvm.org/show_bug.cgi?id=39553.

Patch by Peter Nimmervoll
dylanmckay pushed a commit to dylanmckay/llvm that referenced this issue Nov 5, 2018
This is an AVR-specific workaround for a limitation of the register
allocator that only exposes itself on targets with high register
contention like AVR, which only has three pointer registers.

The three pointer registers are X, Y, and Z.
In most nontrivial functions, Y is reserved for the frame pointer,
as per the calling convention. This leaves X and Z. Some instructions,
such as LPM ("load program memory"), are only defined for the Z
register. Sometimes this just leaves X.

When the backend generates a LDDWRdPtrQ instruction with Z as the
destination pointer, it usually trips up the register allocator
with this error message:

  LLVM ERROR: ran out of registers during register allocation

This patch is a hacky workaround. We ban the LDDWRdPtrQ instruction
from ever using the Z register as an operand. This gives the
register allocator a bit more space to allocate, fixing the
regalloc exhaustion error.

Here is a description from the patch author Peter Nimmervoll

  As far as I understand the problem occurs when LDDWRdPtrQ uses
  the ptrdispregs register class as target register. This should work, but
  the allocator can't deal with this for some reason. So from my testing,
  it seams like (and I might be totally wrong on this) the allocator reserves
  the Z register for the ICALL instruction and then the register class
  ptrdispregs only has 1 register left and we can't use Y for source and
  destination. Removing the Z register from DREGS fixes the problem but
  removing Y register does not.

More information about the bug can be found on the avr-rust issue
tracker at avr-rust/rust-legacy-fork#37.

A bug has raised to track the removal of this workaround and a proper
fix; PR39553 at https://bugs.llvm.org/show_bug.cgi?id=39553.

Patch by Peter Nimmervoll

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@346114 91177308-0d34-0410-b5e6-96231b3b80d8
@shepmaster shepmaster added A-llvm Affects the LLVM AVR backend has-llvm-commit This issue should be fixed in upstream LLVM labels Nov 5, 2018
llvm-git-migration pushed a commit to llvm-git-prototype/llvm that referenced this issue Nov 6, 2018
This is an AVR-specific workaround for a limitation of the register
allocator that only exposes itself on targets with high register
contention like AVR, which only has three pointer registers.

The three pointer registers are X, Y, and Z.
In most nontrivial functions, Y is reserved for the frame pointer,
as per the calling convention. This leaves X and Z. Some instructions,
such as LPM ("load program memory"), are only defined for the Z
register. Sometimes this just leaves X.

When the backend generates a LDDWRdPtrQ instruction with Z as the
destination pointer, it usually trips up the register allocator
with this error message:

  LLVM ERROR: ran out of registers during register allocation

This patch is a hacky workaround. We ban the LDDWRdPtrQ instruction
from ever using the Z register as an operand. This gives the
register allocator a bit more space to allocate, fixing the
regalloc exhaustion error.

Here is a description from the patch author Peter Nimmervoll

  As far as I understand the problem occurs when LDDWRdPtrQ uses
  the ptrdispregs register class as target register. This should work, but
  the allocator can't deal with this for some reason. So from my testing,
  it seams like (and I might be totally wrong on this) the allocator reserves
  the Z register for the ICALL instruction and then the register class
  ptrdispregs only has 1 register left and we can't use Y for source and
  destination. Removing the Z register from DREGS fixes the problem but
  removing Y register does not.

More information about the bug can be found on the avr-rust issue
tracker at avr-rust/rust-legacy-fork#37.

A bug has raised to track the removal of this workaround and a proper
fix; PR39553 at https://bugs.llvm.org/show_bug.cgi?id=39553.

Patch by Peter Nimmervoll

llvm-svn=346114
shepmaster pushed a commit to avr-rust/llvm that referenced this issue Nov 21, 2018
@dylanmckay
Copy link
Member

The root cause has been fixed in upstream LLVM.

Raise #126 to track removal of the hack.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-libcore Affects compiling the core library A-llvm Affects the LLVM AVR backend has-llvm-commit This issue should be fixed in upstream LLVM has-reduced-testcase A small LLVM IR file exists that demonstrates the problem
Projects
None yet
Development

No branches or pull requests

7 participants