#rerun tests
FinnWilkinson approved these changes on Sep 17, 2021:
All looks good to me
#rerun tests
seunghun1ee reviewed on Sep 30, 2021:
I found a potentially critical bug, so I committed the fix directly to this branch. Please have a look.
jj16791 approved these changes on Oct 1, 2021.
The order of rewinding was reversed so that destination registers are rewound in the correct history order. This prevents the register alias table from being left with a wrong mapping after rewinding. Ultimately, this fixes the issue where some operands read their values from an incorrect register because of the wrong mapping.
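To illustrate why the rewind order matters, here is a minimal sketch of a rename table with a per-physical-register history. The `RenameTable` class, `allocate`, `rewind` and `flush` are hypothetical simplifications, not SimEng's actual `RegisterAliasTable` API; only the `mappingTable_`/`historyTable_` names come from this PR. When one instruction renames the same architectural register twice, undoing the allocations newest-first restores the original mapping, while oldest-first leaves an already freed tag in the table.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified rename table: a single register file, integer tags.
class RenameTable {
 public:
  RenameTable(std::size_t archRegs, std::size_t physRegs)
      : mappingTable_(archRegs), historyTable_(physRegs) {
    // Architectural register i starts out mapped to physical register i.
    for (std::size_t i = 0; i < archRegs; ++i)
      mappingTable_[i] = static_cast<int>(i);
    // The remaining physical registers start on the free list.
    for (std::size_t p = archRegs; p < physRegs; ++p)
      freeList_.push_back(static_cast<int>(p));
  }

  // Rename `arch` to a fresh physical register, recording the mapping it replaces.
  int allocate(int arch) {
    assert(!freeList_.empty());
    int phys = freeList_.back();
    freeList_.pop_back();
    historyTable_[phys] = mappingTable_[arch];
    mappingTable_[arch] = phys;
    return phys;
  }

  // Undo one allocation: restore the previous mapping and free `phys`.
  void rewind(int arch, int phys) {
    mappingTable_[arch] = historyTable_[phys];
    freeList_.push_back(phys);
  }

  int mapping(int arch) const { return mappingTable_[arch]; }

 private:
  std::vector<int> mappingTable_;  // architectural -> physical
  std::vector<int> historyTable_;  // physical -> mapping it replaced
  std::vector<int> freeList_;
};

// Flush one instruction whose destinations `dests` were renamed, in order,
// to `phys`. Rewinding newest-first replays the history correctly.
void flush(RenameTable& rat, const std::vector<int>& dests,
           const std::vector<int>& phys) {
  assert(dests.size() == phys.size());
  for (std::size_t i = dests.size(); i-- > 0;) rat.rewind(dests[i], phys[i]);
}
```

With two destinations that are both architectural register 1, rewinding in allocation order would first restore the original mapping and then overwrite it with the first (now freed) tag, which matches the symptom described for `mappingTable_`.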
If `pc_` was not aligned to the `blockSize` boundary and `fetchBuffer_` was empty, `fetchData` would not be copied but used directly as an optimization. However, if `fetchData` did not contain enough bytes to start decoding, the function would exit and `fetchData` would be lost. To fix the bug, the optimization was removed and `fetchData` is always copied into `fetchBuffer_`. The optimization did not provide any performance improvement on the M1 Mac Mini.
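The fixed behaviour can be sketched as follows, assuming fixed 4-byte AArch64 instructions and ignoring `pc_` alignment and `blockSize` handling entirely. `FetchSketch`, its `tick` signature and `buffered` are hypothetical and are not SimEng's `FetchUnit`; the point is only that fetched bytes are always copied into the buffer, so a fetch too short to decode survives until the next call.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch (not SimEng's FetchUnit): every fetched block is copied
// into fetchBuffer_, so a fetch too short to decode is kept for the next tick.
class FetchSketch {
 public:
  // Returns how many complete 4-byte instructions are available this tick.
  // The caller may reuse or free `fetchData` as soon as this returns.
  std::size_t tick(const std::uint8_t* fetchData, std::size_t size) {
    assert(fetchData != nullptr || size == 0);
    // Always copy; never decode straight out of the caller's buffer.
    fetchBuffer_.insert(fetchBuffer_.end(), fetchData, fetchData + size);
    const std::size_t ready = fetchBuffer_.size() / kInstructionBytes;
    // Consume the complete instructions (a real fetch unit would pass them to
    // decode here) and keep any trailing partial instruction.
    fetchBuffer_.erase(
        fetchBuffer_.begin(),
        fetchBuffer_.begin() +
            static_cast<std::ptrdiff_t>(ready * kInstructionBytes));
    return ready;
  }

  std::size_t buffered() const { return fetchBuffer_.size(); }

 private:
  static constexpr std::size_t kInstructionBytes = 4;  // AArch64 encoding size
  std::vector<std::uint8_t> fetchBuffer_;
};
```

For example, a 2-byte fetch returns 0 and leaves 2 bytes buffered; a following 2-byte fetch completes one instruction and empties the buffer.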
The new `ror` implementation only works for type widths that are a power of 2. Instead of subtracting `amount` from the type width, it computes the additive inverse of `amount` modulo the type width. Unlike the subtraction, this does not cause undefined behaviour when `amount` is 0, because the complementary shift becomes 0 rather than a shift by the full type width. That is the only difference.
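A minimal sketch of this technique, assuming an unsigned template parameter whose bit width is a power of 2. The `ror` signature here is illustrative and may not match SimEng's. Because `width` is a power of 2, masking with `width - 1` computes "mod width", and `(0 - amount) & (width - 1)` equals `width - amount` for any non-zero reduced amount but 0 when `amount` is 0, so neither shift is ever by the full width.

```cpp
#include <cassert>
#include <climits>
#include <cstdint>
#include <type_traits>

// Sketch of a rotate-right that is well defined for amount == 0.
// Assumes T is an unsigned integer type whose bit width is a power of 2.
template <typename T>
T ror(T value, unsigned int amount) {
  static_assert(std::is_unsigned<T>::value, "ror expects an unsigned type");
  constexpr unsigned int width = sizeof(T) * CHAR_BIT;
  static_assert((width & (width - 1)) == 0, "width must be a power of 2");
  amount &= width - 1;  // amount mod width
  // Additive inverse of amount mod width, used instead of (width - amount):
  // it is 0 when amount is 0, avoiding an undefined full-width shift.
  const unsigned int inverse = (0u - amount) & (width - 1);
  return static_cast<T>((value >> amount) | (value << inverse));
}
```

For example, `ror<std::uint32_t>(0x12345678u, 8)` yields `0x78123456`, and any rotate by 0 returns the value unchanged. C++20 code could use `std::rotr` from `<bit>` instead.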
On decode, operand 0 of RET was always set to LR. This was problematic, as LR was used even when an explicit register operand was given. To fix this, `InstructionMetadata.cc` now sets operand 0 of RET to LR only when `operandCount` is zero.
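The corrected rule can be sketched with a hypothetical, heavily simplified metadata struct. `RetMetadata`, `fixRetOperands` and the plain register numbers are illustrative assumptions, not the Capstone-based structures used in `InstructionMetadata.cc`. Bumping `operandCount` to 1 when LR is filled in is also an assumption of this sketch. X30 is the AArch64 link register.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical register number for the link register (X30 on AArch64).
constexpr std::uint8_t kLinkRegister = 30;

// Hypothetical, simplified decode metadata: register numbers only.
struct RetMetadata {
  std::uint8_t operandCount = 0;
  std::uint8_t operands[1] = {0};
};

// Default RET's target to LR only when no register operand was decoded;
// an explicitly encoded register (e.g. `ret x5`) is left untouched.
void fixRetOperands(RetMetadata& md) {
  assert(md.operandCount <= 1);  // RET takes at most one register operand
  if (md.operandCount == 0) {
    md.operands[0] = kLinkRegister;
    md.operandCount = 1;  // assumption: the implicit operand is made explicit
  }
}
```

With this rule, a bare `ret` reads LR, while `ret x5` keeps register 5 as its target.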
Closed
Merged
When SimEng flushes an instruction whose destination operands include the same register more than once, rewinding the register renaming fails. This is because the order in which rewinding is applied using `historyTable_` was wrong. Ultimately, `mappingTable_` is left holding a physical register that has already been freed. Therefore, the order of calling `rewind()` was reversed so that `mappingTable_` is updated from `historyTable_` in the correct order.
In `FetchUnit::tick`, if `pc_` was not aligned to the `blockSize` boundary and `fetchBuffer_` was empty, `fetchData` would not be copied but used directly as an optimization. However, if `fetchData` did not contain enough bytes to start decoding right away, the function would exit and `fetchData` would be lost. To fix the bug, the optimization was removed and `fetchData` is always copied into `fetchBuffer_`. There was no observed performance difference.
Additionally, the pointer to the buffer passed to `predecode` is not guaranteed to be aligned. This caused misalignment bugs, as `aarch64::predecode` expected it to be 4-byte aligned. A workaround proposed by @FinnWilkinson fixed the bug by copying the buffer into a local variable.
The `ROR` implementation was found to be buggy, hence a modified version using modular arithmetic was implemented.
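The `predecode` alignment workaround (copying the buffer into a local variable) can be sketched with `std::memcpy`. `readInstructionWord` is a hypothetical helper, not SimEng's code. Reading through a `const uint32_t*` that is not 4-byte aligned is undefined behaviour in C++ and can fault on some hosts, while `memcpy` into a properly aligned local has no alignment requirement on the source.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: read a 4-byte instruction word from a byte pointer that
// may not be 4-byte aligned, by copying it into an aligned local variable
// instead of dereferencing a reinterpret_cast<const uint32_t*>.
std::uint32_t readInstructionWord(const void* ptr) {
  assert(ptr != nullptr);
  std::uint32_t encoding;
  std::memcpy(&encoding, ptr, sizeof(encoding));
  return encoding;  // bytes are interpreted in host byte order
}
```

On a little-endian host, the bytes `1f 20 03 d5` starting at an odd address decode to `0xd503201f` (the A64 `NOP` encoding). Compilers typically lower such a `memcpy` to a single unaligned load where the target allows it.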