Support defining how instructions consume delay slots #6868
Closed
Labels
- Component: Architecture (issue needs changes to an architecture plugin)
- Effort: Medium (issues require < 1 month of work)
- Impact: Medium (issue is impactful with a bad, or no, workaround)
- State: Duplicate (issue is a duplicate of another issue)
What is the feature you'd like to have?
The architecture API supports defining a variable number of delay slots for branch instructions. After a branch instruction with a branch delay, each decoded instruction consumes exactly one delay slot. It would be great if an architecture plugin could define the number of delay slots each instruction consumes.
Is your feature request related to a problem?
For architectures with both branch delay and instruction-level parallelism, defining the branch delay without knowledge of the following instructions fails. Instructions that execute in parallel may be encoded separately but together consume only a single delay slot. Similarly, a single instruction might consume multiple delay slots.
Are any alternative solutions acceptable?
It should be possible to work around the missing feature. An architecture plugin either needs to look ahead or keep state for delayed branches. Adapting the suggestions mentioned in #866, one could lie about the maximum instruction size: increase it so that all instructions that might execute in a delay slot are included, and calculate the actual branch delay for each branch instruction. While this or similar solutions might work, the requested feature would make it much simpler and cleaner IMO.
Additional Information:
I am currently working on an architecture plugin for a TI DSP. It has a branch delay of 5. Up to 8 instructions can be executed in parallel. Each instruction encodes whether it is executed in parallel with the next instruction. Additionally, there is a NOP instruction that can take multiple cycles, to wait for instructions like branches to complete.
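To illustrate the look-ahead workaround for such a DSP, here is a minimal sketch in plain Python. It does not use the actual plugin API: the `Insn` type, its fields, and `delay_slot_extent` are hypothetical names made up for this example, and it assumes that instructions sharing a packet with the branch are not counted as delay slots and that a multi-cycle NOP simply uses up as many slots as it has cycles.

```python
from dataclasses import dataclass
from typing import List

BRANCH_DELAY = 5  # delay slots after a branch, as on the DSP described above


@dataclass
class Insn:
    """Hypothetical decoded instruction; not part of any real plugin API."""
    mnemonic: str
    parallel_with_next: bool = False  # executes together with the following instruction
    cycles: int = 1                   # > 1 only for a multi-cycle NOP


def delay_slot_extent(insns: List[Insn], branch_index: int) -> int:
    """Count the instructions after the branch that still execute before
    the branch takes effect, by looking ahead packet by packet."""
    i = branch_index
    # Skip to the end of the packet that contains the branch itself.
    while i < len(insns) and insns[i].parallel_with_next:
        i += 1
    i = min(i + 1, len(insns))  # first instruction of the next packet

    remaining = BRANCH_DELAY
    while i < len(insns) and remaining > 0:
        # Consume one whole packet; it costs as many slots as its longest member.
        packet_cycles = 1
        while True:
            insn = insns[i]
            packet_cycles = max(packet_cycles, insn.cycles)
            i += 1
            if not insn.parallel_with_next or i >= len(insns):
                break
        remaining -= packet_cycles
    return i - branch_index - 1


# Two parallel instructions (one slot) followed by a 4-cycle NOP (four slots)
# fill the branch delay with only three decoded instructions.
example = [
    Insn("B"),
    Insn("ADD", parallel_with_next=True),
    Insn("MPY"),
    Insn("NOP", cycles=4),
    Insn("SUB"),  # executes only after the branch has taken effect
]
extent = delay_slot_extent(example, 0)
```

Here `extent` is 3, whereas counting one slot per decoded instruction would wrongly pull `SUB` and the instruction after it into the delay window.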
This feature would be useful for any architecture with branch delay and ILP.
Currently, the `InstructionInfo.delaySlots` field seems to be used only for branch instructions. This field could be used to define how instructions consume delay slots. Then `0` could be used for parallel execution of instructions or `n>1` for instructions like my NOP.
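The proposed semantics can be sketched as a simple countdown. This is only an illustration of the idea in plain Python, not how the core actually works: `instructions_in_delay_window` is a hypothetical helper, and the per-instruction values stand in for what a plugin would report through the delay-slot field for non-branch instructions.

```python
from typing import Iterable


def instructions_in_delay_window(branch_delay: int, consumed: Iterable[int]) -> int:
    """Count how many instructions following a branch fall into its delay
    window, given the number of delay slots each of them consumes.

    0  -> instruction runs in parallel with the next one (no slot used yet)
    1  -> ordinary instruction, or the last instruction of a parallel group
    n  -> multi-cycle instruction such as an n-cycle NOP
    """
    remaining = branch_delay
    count = 0
    for slots in consumed:
        if remaining <= 0:
            break  # the branch has taken effect
        count += 1
        remaining -= slots
    return count


# ADD || MPY (0 and 1), then a 4-cycle NOP (4), then an unrelated instruction (1).
window = instructions_in_delay_window(5, [0, 1, 4, 1])
```

With a branch delay of 5, `window` is 3: the parallel pair uses one slot, the NOP uses the remaining four, and the last instruction is outside the window. The consumer of these values needs no knowledge of the architecture's parallelism encoding, which is the main appeal over the look-ahead workaround.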