[RISCV] Factor out common SiFive7 scheduling model into an abstraction layer #144442
Co-authored-by: Michael Maitland <michaeltmaitland@gmail.com>
@llvm/pr-subscribers-backend-risc-v

Author: Min-Yih Hsu (mshockwave)

Changes

In preparation for sifive-x390's scheduling model, which shares quite a lot with the existing SiFive7 scheduling model, this patch factors out the components that will be shared between them.

Split out from #143938

Patch is 157.62 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/144442.diff

31 Files Affected. The truncated diff follows:
diff --git a/llvm/lib/Target/RISCV/RISCVSchedSiFive7.td b/llvm/lib/Target/RISCV/RISCVSchedSiFive7.td
index c1d7cd4a716e7..071b64571fe3c 100644
--- a/llvm/lib/Target/RISCV/RISCVSchedSiFive7.td
+++ b/llvm/lib/Target/RISCV/RISCVSchedSiFive7.td
@@ -186,1121 +186,1188 @@ class SiFive7AnyToGPRBypass<SchedRead read, int cycles = 2>
WriteIRem, WriteIRem32,
WriteLDB, WriteLDH, WriteLDW, WriteLDD]>;
-// SiFive7 machine model for scheduling and other instruction cost heuristics.
-def SiFive7Model : SchedMachineModel {
- let MicroOpBufferSize = 0; // Explicitly set to zero since SiFive7 is in-order.
- let IssueWidth = 2; // 2 micro-ops are dispatched per cycle.
- let LoadLatency = 3;
- let MispredictPenalty = 3;
- let CompleteModel = 0;
- let EnableIntervals = true;
- let UnsupportedFeatures = [HasStdExtZbkb, HasStdExtZbkc, HasStdExtZbkx,
- HasStdExtZcmt, HasStdExtZknd, HasStdExtZkne,
- HasStdExtZknh, HasStdExtZksed, HasStdExtZksh,
- HasStdExtZkr];
-}
-
// The SiFive7 microarchitecture has three pipelines: A, B, V.
// Pipe A can handle memory, integer alu and vector operations.
// Pipe B can handle integer alu, control flow, integer multiply and divide,
// and floating point computation.
// The V pipeline is modeled by the VCQ, VA, VL, and VS resources.
-let SchedModel = SiFive7Model in {
-let BufferSize = 0 in {
-def SiFive7PipeA : ProcResource<1>;
-def SiFive7PipeB : ProcResource<1>;
-def SiFive7IDiv : ProcResource<1>; // Int Division
-def SiFive7FDiv : ProcResource<1>; // FP Division/Sqrt
-def SiFive7VA : ProcResource<1>; // Arithmetic sequencer
-def SiFive7VL : ProcResource<1>; // Load sequencer
-def SiFive7VS : ProcResource<1>; // Store sequencer
-// The VCQ accepts instructions from the the A Pipe and holds them until the
-// vector unit is ready to dequeue them. The unit dequeues up to one instruction
-// per cycle, in order, as soon as the sequencer for that type of instruction is
-// available. This resource is meant to be used for 1 cycle by all vector
-// instructions, to model that only one vector instruction may be dequeued at a
-// time. The actual dequeueing into the sequencer is modeled by the VA, VL, and
-// VS sequencer resources below. Each of them will only accept a single
-// instruction at a time and remain busy for the number of cycles associated
-// with that instruction.
-def SiFive7VCQ : ProcResource<1>; // Vector Command Queue
-}
-
-def SiFive7PipeAB : ProcResGroup<[SiFive7PipeA, SiFive7PipeB]>;
-
-defvar SiFive7VLEN = 512;
-
-// Branching
-let Latency = 3 in {
-def : WriteRes<WriteJmp, [SiFive7PipeB]>;
-def : WriteRes<WriteJal, [SiFive7PipeB]>;
-def : WriteRes<WriteJalr, [SiFive7PipeB]>;
-}
-
-//Short forward branch
-def : WriteRes<WriteSFB, [SiFive7PipeA, SiFive7PipeB]> {
- let Latency = 3;
- let NumMicroOps = 2;
-}
-
-// Integer arithmetic and logic
-let Latency = 3 in {
-def : WriteRes<WriteIALU, [SiFive7PipeAB]>;
-def : WriteRes<WriteIALU32, [SiFive7PipeAB]>;
-def : WriteRes<WriteShiftImm, [SiFive7PipeAB]>;
-def : WriteRes<WriteShiftImm32, [SiFive7PipeAB]>;
-def : WriteRes<WriteShiftReg, [SiFive7PipeAB]>;
-def : WriteRes<WriteShiftReg32, [SiFive7PipeAB]>;
-}
+multiclass SiFive7ProcResources {
+ let BufferSize = 0 in {
+ def PipeA : ProcResource<1>;
+ def PipeB : ProcResource<1>;
+
+ def IDiv : ProcResource<1>; // Int Division
+ def FDiv : ProcResource<1>; // FP Division/Sqrt
+
+ def VA : ProcResource<1>; // Arithmetic sequencer
+
+ def VL : ProcResource<1>; // Load sequencer
+ def VS : ProcResource<1>; // Store sequencer
+ // The VCQ accepts instructions from the the A Pipe and holds them until the
+ // vector unit is ready to dequeue them. The unit dequeues up to one instruction
+ // per cycle, in order, as soon as the sequencer for that type of instruction is
+ // available. This resource is meant to be used for 1 cycle by all vector
+ // instructions, to model that only one vector instruction may be dequeued at a
+ // time. The actual dequeueing into the sequencer is modeled by the VA, VL, and
+ // VS sequencer resources below. Each of them will only accept a single
+ // instruction at a time and remain busy for the number of cycles associated
+ // with that instruction.
+ def VCQ : ProcResource<1>; // Vector Command Queue
+ }
-// Integer multiplication
-let Latency = 3 in {
-def : WriteRes<WriteIMul, [SiFive7PipeB]>;
-def : WriteRes<WriteIMul32, [SiFive7PipeB]>;
+ def PipeAB : ProcResGroup<[!cast<ProcResource>(NAME#"PipeA"),
+ !cast<ProcResource>(NAME#"PipeB")]>;
}
-// Integer division
-def : WriteRes<WriteIDiv, [SiFive7PipeB, SiFive7IDiv]> {
- let Latency = 66;
- let ReleaseAtCycles = [1, 65];
-}
-def : WriteRes<WriteIDiv32, [SiFive7PipeB, SiFive7IDiv]> {
- let Latency = 34;
- let ReleaseAtCycles = [1, 33];
-}
+multiclass SiFive7WriteResBase<int VLEN,
+ ProcResourceKind PipeA, ProcResourceKind PipeB, ProcResourceKind PipeAB,
+ ProcResourceKind IDiv, ProcResourceKind FDiv,
+ ProcResourceKind VA, ProcResourceKind VL, ProcResourceKind VS,
+ ProcResourceKind VCQ> {
-// Integer remainder
-def : WriteRes<WriteIRem, [SiFive7PipeB, SiFive7IDiv]> {
- let Latency = 66;
- let ReleaseAtCycles = [1, 65];
-}
-def : WriteRes<WriteIRem32, [SiFive7PipeB, SiFive7IDiv]> {
- let Latency = 34;
- let ReleaseAtCycles = [1, 33];
-}
+ // Branching
+ let Latency = 3 in {
+ def : WriteRes<WriteJmp, [PipeB]>;
+ def : WriteRes<WriteJal, [PipeB]>;
+ def : WriteRes<WriteJalr, [PipeB]>;
+ }
-// Bitmanip
-let Latency = 3 in {
-// Rotates are in the late-B ALU.
-def : WriteRes<WriteRotateImm, [SiFive7PipeB]>;
-def : WriteRes<WriteRotateImm32, [SiFive7PipeB]>;
-def : WriteRes<WriteRotateReg, [SiFive7PipeB]>;
-def : WriteRes<WriteRotateReg32, [SiFive7PipeB]>;
+ //Short forward branch
+ def : WriteRes<WriteSFB, [PipeA, PipeB]> {
+ let Latency = 3;
+ let NumMicroOps = 2;
+ }
-// clz[w]/ctz[w] are in the late-B ALU.
-def : WriteRes<WriteCLZ, [SiFive7PipeB]>;
-def : WriteRes<WriteCLZ32, [SiFive7PipeB]>;
-def : WriteRes<WriteCTZ, [SiFive7PipeB]>;
-def : WriteRes<WriteCTZ32, [SiFive7PipeB]>;
+ // Integer arithmetic and logic
+ let Latency = 3 in {
+ def : WriteRes<WriteIALU, [PipeAB]>;
+ def : WriteRes<WriteIALU32, [PipeAB]>;
+ def : WriteRes<WriteShiftImm, [PipeAB]>;
+ def : WriteRes<WriteShiftImm32, [PipeAB]>;
+ def : WriteRes<WriteShiftReg, [PipeAB]>;
+ def : WriteRes<WriteShiftReg32, [PipeAB]>;
+ }
-// cpop[w] look exactly like multiply.
-def : WriteRes<WriteCPOP, [SiFive7PipeB]>;
-def : WriteRes<WriteCPOP32, [SiFive7PipeB]>;
+ // Integer multiplication
+ let Latency = 3 in {
+ def : WriteRes<WriteIMul, [PipeB]>;
+ def : WriteRes<WriteIMul32, [PipeB]>;
+ }
-// orc.b is in the late-B ALU.
-def : WriteRes<WriteORCB, [SiFive7PipeB]>;
+ // Integer division
+ def : WriteRes<WriteIDiv, [PipeB, IDiv]> {
+ let Latency = 66;
+ let ReleaseAtCycles = [1, 65];
+ }
+ def : WriteRes<WriteIDiv32, [PipeB, IDiv]> {
+ let Latency = 34;
+ let ReleaseAtCycles = [1, 33];
+ }
-// min/max are in the late-B ALU
-def : WriteRes<WriteIMinMax, [SiFive7PipeB]>;
+ // Integer remainder
+ def : WriteRes<WriteIRem, [PipeB, IDiv]> {
+ let Latency = 66;
+ let ReleaseAtCycles = [1, 65];
+ }
+ def : WriteRes<WriteIRem32, [PipeB, IDiv]> {
+ let Latency = 34;
+ let ReleaseAtCycles = [1, 33];
+ }
-// rev8 is in the late-A and late-B ALUs.
-def : WriteRes<WriteREV8, [SiFive7PipeAB]>;
+ // Bitmanip
+ let Latency = 3 in {
+ // Rotates are in the late-B ALU.
+ def : WriteRes<WriteRotateImm, [PipeB]>;
+ def : WriteRes<WriteRotateImm32, [PipeB]>;
+ def : WriteRes<WriteRotateReg, [PipeB]>;
+ def : WriteRes<WriteRotateReg32, [PipeB]>;
-// shNadd[.uw] is on the early-B and late-B ALUs.
-def : WriteRes<WriteSHXADD, [SiFive7PipeB]>;
-def : WriteRes<WriteSHXADD32, [SiFive7PipeB]>;
-}
+ // clz[w]/ctz[w] are in the late-B ALU.
+ def : WriteRes<WriteCLZ, [PipeB]>;
+ def : WriteRes<WriteCLZ32, [PipeB]>;
+ def : WriteRes<WriteCTZ, [PipeB]>;
+ def : WriteRes<WriteCTZ32, [PipeB]>;
-// Single-bit instructions
-// BEXT[I] instruction is available on all ALUs and the other instructions
-// are only available on the SiFive7B pipe.
-let Latency = 3 in {
-def : WriteRes<WriteSingleBit, [SiFive7PipeB]>;
-def : WriteRes<WriteSingleBitImm, [SiFive7PipeB]>;
-def : WriteRes<WriteBEXT, [SiFive7PipeAB]>;
-def : WriteRes<WriteBEXTI, [SiFive7PipeAB]>;
-}
+ // cpop[w] look exactly like multiply.
+ def : WriteRes<WriteCPOP, [PipeB]>;
+ def : WriteRes<WriteCPOP32, [PipeB]>;
-// Memory
-def : WriteRes<WriteSTB, [SiFive7PipeA]>;
-def : WriteRes<WriteSTH, [SiFive7PipeA]>;
-def : WriteRes<WriteSTW, [SiFive7PipeA]>;
-def : WriteRes<WriteSTD, [SiFive7PipeA]>;
-def : WriteRes<WriteFST16, [SiFive7PipeA]>;
-def : WriteRes<WriteFST32, [SiFive7PipeA]>;
-def : WriteRes<WriteFST64, [SiFive7PipeA]>;
-
-let Latency = 3 in {
-def : WriteRes<WriteLDB, [SiFive7PipeA]>;
-def : WriteRes<WriteLDH, [SiFive7PipeA]>;
-def : WriteRes<WriteLDW, [SiFive7PipeA]>;
-def : WriteRes<WriteLDD, [SiFive7PipeA]>;
-}
+ // orc.b is in the late-B ALU.
+ def : WriteRes<WriteORCB, [PipeB]>;
-let Latency = 2 in {
-def : WriteRes<WriteFLD16, [SiFive7PipeA]>;
-def : WriteRes<WriteFLD32, [SiFive7PipeA]>;
-def : WriteRes<WriteFLD64, [SiFive7PipeA]>;
-}
+ // min/max are in the late-B ALU
+ def : WriteRes<WriteIMinMax, [PipeB]>;
-// Atomic memory
-def : WriteRes<WriteAtomicSTW, [SiFive7PipeA]>;
-def : WriteRes<WriteAtomicSTD, [SiFive7PipeA]>;
+ // rev8 is in the late-A and late-B ALUs.
+ def : WriteRes<WriteREV8, [PipeAB]>;
-let Latency = 3 in {
-def : WriteRes<WriteAtomicW, [SiFive7PipeA]>;
-def : WriteRes<WriteAtomicD, [SiFive7PipeA]>;
-def : WriteRes<WriteAtomicLDW, [SiFive7PipeA]>;
-def : WriteRes<WriteAtomicLDD, [SiFive7PipeA]>;
-}
+ // shNadd[.uw] is on the early-B and late-B ALUs.
+ def : WriteRes<WriteSHXADD, [PipeB]>;
+ def : WriteRes<WriteSHXADD32, [PipeB]>;
+ }
-// Half precision.
-let Latency = 5 in {
-def : WriteRes<WriteFAdd16, [SiFive7PipeB]>;
-def : WriteRes<WriteFMul16, [SiFive7PipeB]>;
-def : WriteRes<WriteFMA16, [SiFive7PipeB]>;
-}
-let Latency = 3 in {
-def : WriteRes<WriteFSGNJ16, [SiFive7PipeB]>;
-def : WriteRes<WriteFMinMax16, [SiFive7PipeB]>;
-}
+ // Single-bit instructions
+ // BEXT[I] instruction is available on all ALUs and the other instructions
+ // are only available on the B pipe.
+ let Latency = 3 in {
+ def : WriteRes<WriteSingleBit, [PipeB]>;
+ def : WriteRes<WriteSingleBitImm, [PipeB]>;
+ def : WriteRes<WriteBEXT, [PipeAB]>;
+ def : WriteRes<WriteBEXTI, [PipeAB]>;
+ }
-let Latency = 14, ReleaseAtCycles = [1, 13] in {
-def : WriteRes<WriteFDiv16, [SiFive7PipeB, SiFive7FDiv]>;
-def : WriteRes<WriteFSqrt16, [SiFive7PipeB, SiFive7FDiv]>;
-}
+ // Memory
+ def : WriteRes<WriteSTB, [PipeA]>;
+ def : WriteRes<WriteSTH, [PipeA]>;
+ def : WriteRes<WriteSTW, [PipeA]>;
+ def : WriteRes<WriteSTD, [PipeA]>;
+ def : WriteRes<WriteFST16, [PipeA]>;
+ def : WriteRes<WriteFST32, [PipeA]>;
+ def : WriteRes<WriteFST64, [PipeA]>;
+
+ let Latency = 3 in {
+ def : WriteRes<WriteLDB, [PipeA]>;
+ def : WriteRes<WriteLDH, [PipeA]>;
+ def : WriteRes<WriteLDW, [PipeA]>;
+ def : WriteRes<WriteLDD, [PipeA]>;
+ }
-// Single precision.
-let Latency = 5 in {
-def : WriteRes<WriteFAdd32, [SiFive7PipeB]>;
-def : WriteRes<WriteFMul32, [SiFive7PipeB]>;
-def : WriteRes<WriteFMA32, [SiFive7PipeB]>;
-}
-let Latency = 3 in {
-def : WriteRes<WriteFSGNJ32, [SiFive7PipeB]>;
-def : WriteRes<WriteFMinMax32, [SiFive7PipeB]>;
-}
+ let Latency = 2 in {
+ def : WriteRes<WriteFLD16, [PipeA]>;
+ def : WriteRes<WriteFLD32, [PipeA]>;
+ def : WriteRes<WriteFLD64, [PipeA]>;
+ }
-def : WriteRes<WriteFDiv32, [SiFive7PipeB, SiFive7FDiv]> { let Latency = 27;
- let ReleaseAtCycles = [1, 26]; }
-def : WriteRes<WriteFSqrt32, [SiFive7PipeB, SiFive7FDiv]> { let Latency = 27;
- let ReleaseAtCycles = [1, 26]; }
+ // Atomic memory
+ def : WriteRes<WriteAtomicSTW, [PipeA]>;
+ def : WriteRes<WriteAtomicSTD, [PipeA]>;
-// Double precision
-let Latency = 7 in {
-def : WriteRes<WriteFAdd64, [SiFive7PipeB]>;
-def : WriteRes<WriteFMul64, [SiFive7PipeB]>;
-def : WriteRes<WriteFMA64, [SiFive7PipeB]>;
-}
-let Latency = 3 in {
-def : WriteRes<WriteFSGNJ64, [SiFive7PipeB]>;
-def : WriteRes<WriteFMinMax64, [SiFive7PipeB]>;
-}
+ let Latency = 3 in {
+ def : WriteRes<WriteAtomicW, [PipeA]>;
+ def : WriteRes<WriteAtomicD, [PipeA]>;
+ def : WriteRes<WriteAtomicLDW, [PipeA]>;
+ def : WriteRes<WriteAtomicLDD, [PipeA]>;
+ }
-def : WriteRes<WriteFDiv64, [SiFive7PipeB, SiFive7FDiv]> { let Latency = 56;
- let ReleaseAtCycles = [1, 55]; }
-def : WriteRes<WriteFSqrt64, [SiFive7PipeB, SiFive7FDiv]> { let Latency = 56;
- let ReleaseAtCycles = [1, 55]; }
-
-// Conversions
-let Latency = 3 in {
-def : WriteRes<WriteFCvtI32ToF16, [SiFive7PipeB]>;
-def : WriteRes<WriteFCvtI32ToF32, [SiFive7PipeB]>;
-def : WriteRes<WriteFCvtI32ToF64, [SiFive7PipeB]>;
-def : WriteRes<WriteFCvtI64ToF16, [SiFive7PipeB]>;
-def : WriteRes<WriteFCvtI64ToF32, [SiFive7PipeB]>;
-def : WriteRes<WriteFCvtI64ToF64, [SiFive7PipeB]>;
-def : WriteRes<WriteFCvtF16ToI32, [SiFive7PipeB]>;
-def : WriteRes<WriteFCvtF16ToI64, [SiFive7PipeB]>;
-def : WriteRes<WriteFCvtF16ToF32, [SiFive7PipeB]>;
-def : WriteRes<WriteFCvtF16ToF64, [SiFive7PipeB]>;
-def : WriteRes<WriteFCvtF32ToI32, [SiFive7PipeB]>;
-def : WriteRes<WriteFCvtF32ToI64, [SiFive7PipeB]>;
-def : WriteRes<WriteFCvtF32ToF16, [SiFive7PipeB]>;
-def : WriteRes<WriteFCvtF32ToF64, [SiFive7PipeB]>;
-def : WriteRes<WriteFCvtF64ToI32, [SiFive7PipeB]>;
-def : WriteRes<WriteFCvtF64ToI64, [SiFive7PipeB]>;
-def : WriteRes<WriteFCvtF64ToF16, [SiFive7PipeB]>;
-def : WriteRes<WriteFCvtF64ToF32, [SiFive7PipeB]>;
-
-def : WriteRes<WriteFClass16, [SiFive7PipeB]>;
-def : WriteRes<WriteFClass32, [SiFive7PipeB]>;
-def : WriteRes<WriteFClass64, [SiFive7PipeB]>;
-def : WriteRes<WriteFCmp16, [SiFive7PipeB]>;
-def : WriteRes<WriteFCmp32, [SiFive7PipeB]>;
-def : WriteRes<WriteFCmp64, [SiFive7PipeB]>;
-def : WriteRes<WriteFMovI16ToF16, [SiFive7PipeB]>;
-def : WriteRes<WriteFMovF16ToI16, [SiFive7PipeB]>;
-def : WriteRes<WriteFMovI32ToF32, [SiFive7PipeB]>;
-def : WriteRes<WriteFMovF32ToI32, [SiFive7PipeB]>;
-def : WriteRes<WriteFMovI64ToF64, [SiFive7PipeB]>;
-def : WriteRes<WriteFMovF64ToI64, [SiFive7PipeB]>;
-}
+ // Half precision.
+ let Latency = 5 in {
+ def : WriteRes<WriteFAdd16, [PipeB]>;
+ def : WriteRes<WriteFMul16, [PipeB]>;
+ def : WriteRes<WriteFMA16, [PipeB]>;
+ }
+ let Latency = 3 in {
+ def : WriteRes<WriteFSGNJ16, [PipeB]>;
+ def : WriteRes<WriteFMinMax16, [PipeB]>;
+ }
-// 6. Configuration-Setting Instructions
-let Latency = 3 in {
-def : WriteRes<WriteVSETVLI, [SiFive7PipeA]>;
-def : WriteRes<WriteVSETIVLI, [SiFive7PipeA]>;
-def : WriteRes<WriteVSETVL, [SiFive7PipeA]>;
-}
+ let Latency = 14, ReleaseAtCycles = [1, 13] in {
+ def : WriteRes<WriteFDiv16, [PipeB, FDiv]>;
+ def : WriteRes<WriteFSqrt16, [PipeB, FDiv]>;
+ }
-// 7. Vector Loads and Stores
-// Unit-stride loads and stores can operate at the full bandwidth of the memory
-// pipe. The memory pipe is DLEN bits wide on x280.
-foreach mx = SchedMxList in {
- defvar Cycles = SiFive7GetCyclesDefault<mx>.c;
- defvar IsWorstCase = SiFive7IsWorstCaseMX<mx, SchedMxList>.c;
- let Latency = 4, AcquireAtCycles = [0, 1], ReleaseAtCycles = [1, !add(1, Cycles)] in {
- defm "" : LMULWriteResMX<"WriteVLDE", [SiFive7VCQ, SiFive7VL], mx, IsWorstCase>;
- defm "" : LMULWriteResMX<"WriteVLDFF", [SiFive7VCQ, SiFive7VL], mx, IsWorstCase>;
+ // Single precision.
+ let Latency = 5 in {
+ def : WriteRes<WriteFAdd32, [PipeB]>;
+ def : WriteRes<WriteFMul32, [PipeB]>;
+ def : WriteRes<WriteFMA32, [PipeB]>;
+ }
+ let Latency = 3 in {
+ def : WriteRes<WriteFSGNJ32, [PipeB]>;
+ def : WriteRes<WriteFMinMax32, [PipeB]>;
}
- let Latency = 1, AcquireAtCycles = [0, 1], ReleaseAtCycles = [1, !add(1, Cycles)] in
- defm "" : LMULWriteResMX<"WriteVSTE", [SiFive7VCQ, SiFive7VS], mx, IsWorstCase>;
-}
-foreach mx = SchedMxList in {
- defvar Cycles = SiFive7GetMaskLoadStoreCycles<mx>.c;
- defvar IsWorstCase = SiFive7IsWorstCaseMX<mx, SchedMxList>.c;
- let Latency = 4, AcquireAtCycles = [0, 1], ReleaseAtCycles = [1, !add(1, Cycles)] in
- defm "" : LMULWriteResMX<"WriteVLDM", [SiFive7VCQ, SiFive7VL], mx, IsWorstCase>;
- let Latency = 1, AcquireAtCycles = [0, 1], ReleaseAtCycles = [1, !add(1, Cycles)] in
- defm "" : LMULWriteResMX<"WriteVSTM", [SiFive7VCQ, SiFive7VS], mx, IsWorstCase>;
-}
+ def : WriteRes<WriteFDiv32, [PipeB, FDiv]> {
+ let Latency = 27;
+ let ReleaseAtCycles = [1, 26];
+ }
+ def : WriteRes<WriteFSqrt32, [PipeB, FDiv]> {
+ let Latency = 27;
+ let ReleaseAtCycles = [1, 26];
+ }
-// Strided loads and stores operate at one element per cycle and should be
-// scheduled accordingly. Indexed loads and stores operate at one element per
-// cycle, and they stall the machine until all addresses have been generated,
-// so they cannot be scheduled. Indexed and strided loads and stores have LMUL
-// specific suffixes, but since SEW is already encoded in the name of the
-// resource, we do not need to use LMULSEWXXX constructors. However, we do
-// use the SEW from the name to determine the number of Cycles.
-
-foreach mx = SchedMxList in {
- defvar VLDSX0Cycles = SiFive7GetCyclesDefault<mx>.c;
- defvar Cycles = SiFive7GetCyclesOnePerElement<mx, 8, SiFive7VLEN>.c;
- defvar IsWorstCase = SiFive7IsWorstCaseMX<mx, SchedMxList>.c;
- defm SiFive7 : LMULWriteResMXVariant<"WriteVLDS8", VLDSX0Pred, [SiFive7VCQ, SiFive7VL],
- 4, [0, 1], [1, !add(1, VLDSX0Cycles)], !add(3, Cycles),
- [0, 1], [1, !add(1, Cycles)], mx, IsWorstCase>;
- let Latency = !add(3, Cycles), AcquireAtCycles = [0, 1], ReleaseAtCycles = [1, !add(1, Cycles)] in {
- defm "" : LMULWriteResMX<"WriteVLDUX8", [SiFive7VCQ, SiFive7VL], mx, IsWorstCase>;
- defm "" : LMULWriteResMX<"WriteVLDOX8", [SiFive7VCQ, SiFive7VL], mx, IsWorstCase>;
+ // Double precision
+ let Latency = 7 in {
+ def : WriteRes<WriteFAdd64, [PipeB]>;
+ def : WriteRes<WriteFMul64, [PipeB]>;
+ def : WriteRes<WriteFMA64, [PipeB]>;
}
- let Latency = 1, AcquireAtCycles = [0, 1], ReleaseAtCycles = [1, !add(1, Cycles)] in {
- defm "" : LMULWriteResMX<"WriteVSTS8", [SiFive7VCQ, SiFive7VS], mx, IsWorstCase>;
- defm "" : LMULWriteResMX<"WriteVSTUX8", [SiFive7VCQ, SiFive7VS], mx, IsWorstCase>;
- defm "" : LMULWriteResMX<"WriteVSTOX8", [SiFive7VCQ, SiFive7VS], mx, IsWorstCase>;
+ let Latency = 3 in {
+ def : WriteRes<WriteFSGNJ64, [PipeB]>;
+ def : WriteRes<WriteFMinMax64, [PipeB]>;
}
-}
-// TODO: The MxLists need to be filtered by EEW. We only need to support
-// LMUL >= SEW_min/ELEN. Here, the smallest EEW prevents us from having MF8
-// since LMUL >= 16/64.
-foreach mx = ["MF4", "MF2", "M1", "M2", "M4", "M8"] in {
- defvar VLDSX0Cycles = SiFive7GetCyclesDefault<mx>.c;
- defvar Cycles = SiFive7GetCyclesOnePerElement<mx, 16, SiFive7VLEN>.c;
- defvar IsWorstCase = SiFive7IsWorstCaseMX<mx, SchedMxList>.c;
- defm SiFive7 : LMULWriteResMXVariant<"WriteVLDS16", VLDSX0Pred, [SiFive7VCQ, SiFive7VL],
- 4, [0, 1], [1, !add(1, VLDSX0Cycles)], !add(3, Cycles),
- [0, 1], [1, !add(1, Cycles)], mx, IsWorstCase>;
- let Latency = !add(3, Cycles), AcquireAtCycles = [0, 1], ReleaseAtCycles = [1, !add(1, Cycles)] in {
- defm "" : LMULWriteResMX<"WriteVLDUX16", [SiFive7VCQ, SiFive7VL], mx, IsWorstCase>;
- defm "" : LMULWriteResMX<"WriteVLDOX16", [SiFive7VCQ, SiFive7VL], mx, IsWorstCase>;
+
+ def : WriteRes<WriteFDiv64, [PipeB, FDiv]> {
+ let Latency = 56;
+ let ReleaseAtCycles = [1, 55];
}
- let Latency = 1, AcquireAtCycles ...
[truncated]
It is still large, but I tried to scan all the changes; more eyes are needed.
LGTM.
In preparation for sifive-x390's scheduling model, which shares quite a lot with the existing SiFive7 scheduling model, this patch factors out some of the components that will be shared between them. Notably:

- Processor resource definitions are factored out into a multiclass, `SiFive7ProcResources`. Similarly, WriteRes entries and bypass entries (i.e. ReadAdvance) are also factored out into their own multiclasses: `SiFive7WriteResBase` and `SiFive7ReadAdvance`, respectively.
- `SiFive7ProcResources`, `SiFive7WriteResBase`, and `SiFive7ReadAdvance` are encapsulated into a bigger multiclass, `SiFive7SchedResources`, which configures these components with parameters passed from the template arguments. An example of such a configuration value is the VLEN.

Split out from #143938
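
For readers less familiar with the pattern, here is a minimal standalone sketch of how a wrapper multiclass can stamp out per-core resources and pass them on as template arguments. It is not the code from this patch: the truncated diff does not show the body of `SiFive7SchedResources`, so every `Toy*` name, the `WriteToyVLoad` write type, the latency formula, and the 1024 VLEN value are assumptions made up for illustration. The small classes at the top are simplified stand-ins for those in `llvm/include/llvm/Target/TargetSchedule.td`, included so the file is self-contained.

```tablegen
// Simplified stand-ins for the TargetSchedule.td classes (assumption: only
// the fields needed for this illustration are modeled).
class ProcResourceKind;
class ProcResource<int num> : ProcResourceKind { int NumUnits = num; }
class ProcResGroup<list<ProcResource> resources> : ProcResourceKind {
  list<ProcResource> Resources = resources;
}
class SchedWrite;
class WriteRes<SchedWrite write, list<ProcResourceKind> resources> {
  SchedWrite WriteType = write;
  list<ProcResourceKind> ProcResources = resources;
  int Latency = 1;
}
def WriteIALU : SchedWrite;
def WriteToyVLoad : SchedWrite; // hypothetical write type, illustration only

// Every `def` in a multiclass is prefixed with the name given at the `defm`
// site (NAME), so one definition yields a distinct resource set per core.
multiclass ToyProcResources {
  def PipeA : ProcResource<1>;
  def PipeB : ProcResource<1>;
  def VL    : ProcResource<1>;
  // Sibling records are looked up by their final, prefixed names.
  def PipeAB : ProcResGroup<[!cast<ProcResource>(NAME#"PipeA"),
                             !cast<ProcResource>(NAME#"PipeB")]>;
}

// WriteRes entries take the resources as parameters instead of hard-coding
// record names such as SiFive7PipeB.
multiclass ToyWriteRes<int VLEN, ProcResourceKind PipeAB,
                       ProcResourceKind VL> {
  let Latency = 3 in
  def : WriteRes<WriteIALU, [PipeAB]>;
  // Made-up VLEN-dependent latency, only to show the parameter flowing in.
  let Latency = !add(3, !div(VLEN, 64)) in
  def : WriteRes<WriteToyVLoad, [VL]>;
}

// The wrapper instantiates the resources, then wires them into the entries.
multiclass ToySchedResources<int VLEN> {
  defm "" : ToyProcResources;
  defm "" : ToyWriteRes<VLEN,
                        !cast<ProcResourceKind>(NAME#"PipeAB"),
                        !cast<ProcResourceKind>(NAME#"VL")>;
}

// Two hypothetical cores sharing one description but differing in VLEN.
defm ToyA : ToySchedResources<512>;
defm ToyB : ToySchedResources<1024>;
```

Running this through `llvm-tblgen` with its default record-printing action should list `ToyAPipeA`, `ToyBPipeA`, and so on as separate records. In the real backend the entries would additionally sit under a `let SchedModel = ... in` scope, as the removed `let SchedModel = SiFive7Model in {` line in the diff shows.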