-
Notifications
You must be signed in to change notification settings - Fork 414
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[GSoC] LLVM vectorization improvements #6533
[GSoC] LLVM vectorization improvements #6533
Conversation
test/llvm/vectorization/SKIPIF
Outdated
@@ -0,0 +1 @@ | |||
CHPL_LLVM!=llvm |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this file should end in a newline
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or something... never sure what the red characters in GitHub here at the end mean.
for i in vectorizeOnly(0..511) { | ||
A[A[i]] = B[i]; | ||
A[i] = B[i+1]; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this case vectorize without addLoopVectorizationHint? With or without the run-time check?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Anyway I'd consider passing -vectorize-scev-check-threshold=0 -runtime-memory-check-threshold=0 LLVM options to disable the runtime check.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(for testing purposes)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding loop vectorization (llvm.loop.vectorize.enable
) is not necessary to get anything vectorized, it only makes sense when we want to disable vectorization of particular loop. What is important is adding llvm.mem.parallel_loop_access
, because it greatly improves vectorization (we can remove runtime checks entirely and stuff gets vectorized anyway). I couldn't figure out working case by myself so I took this from llvm's test suite, llvm.mem.parallel_loop_access
is necessary to get this vectorized, because it's very tricky and not obvious that there is no loop carried dependency.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah, cool
I'd like to see this in two PRs:
|
@mppf |
compiler/util/clangUtil.cpp
Outdated
@@ -54,6 +54,8 @@ | |||
|
|||
#include "llvmDebug.h" | |||
|
|||
#include "llvm/Support/TargetRegistry.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I must be missing something - why is this #include now necessary? You didn't change anything else in this file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh well, apparently I forgot to remove this include, it's not necessary for anything to work.
|
compiler/codegen/CForLoop.cpp
Outdated
#ifdef HAVE_LLVM | ||
namespace | ||
{ | ||
llvm::MDNode* addLoopVectorizationHint(llvm::Instruction* instruction) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd expect this function to be static
rather than in an anonymous namespace. Arguably a style issue, but making it static
will fit in with the rest of the compiler. Vs when defining a type, you can't use static
.
compiler/codegen/CForLoop.cpp
Outdated
if(fNoVectorize == false && isOrderIndependent()) | ||
{ | ||
llvm::MDNode* llvmLoopMetadata = addLoopVectorizationHint(endLoopBranch); | ||
for(auto it = blockStmtBody->begin(); it != blockStmtBody->end(); it++) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Prefer ++it
to it++
for this kind of loop
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note with LLVM 5.0 (and not before) the recommendation changes to
for (Instruction &I : BB)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this code might only add parallel loop access metadata to simple loops.
What if we have
var sum = 0;
for i in vectorizeOnly(1..100) with (+ reduce sum) {
if i > 10 then
sum += i;
else
sum += 1;
}
? Won't this loop over blockStmtBody instructions not visit those instructions "inside" the conditional?
@@ -78,6 +91,8 @@ struct GenInfo { | |||
llvm::IRBuilder<> *builder; | |||
LayeredValueTable *lvt; | |||
|
|||
std::stack<LoopData> loopStack; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this stack only ever contains elements with parallel=true and since you're not extending other loop generation (while loops, other loops, ...), I think this stack should just store llvm::MDNode* for parallel loops and be called something like parallelLoopStack
.
Do you envision some case in which this stack will need to contain non-parallel loops?
@@ -389,6 +389,13 @@ llvm::StoreInst* codegenStoreLLVM(llvm::Value* val, | |||
llvm::MDNode* tbaa = NULL; | |||
if( USE_TBAA && valType ) tbaa = valType->symbol->llvmTbaaNode; | |||
if( tbaa ) ret->setMetadata(llvm::LLVMContext::MD_tbaa, tbaa); | |||
|
|||
if(!info->loopStack.empty()) { | |||
const auto &loopData = info->loopStack.top(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note in a comment here and in the PR message that you're only adding parallel_loop_access
hints for the innermost loops.
I'd like to see a future PR that adds the parallel loop access to function calls; see https://github.com/chapel-lang/chapel/blob/master/compiler/codegen/expr.cpp#L2222 |
I tested this, with this patch:
with --llvm --vectorize --fast and I saw 1 failure:
Please have a look at disabling the vectorize.enable by default (I think it's fine to leave in the code, but make it off by default somehow) and at getting llvmPrintIrFull to work in that configuration:
One way to do that would be to make llvmPrintIrFull.skipif containing:
Once you address these issues I think this is ready to merge. Thanks. |
Passed full local testing. |
Add vectorization improvement tests [PR by @coodie - thanks! Reviewed/tested by @mppf] This PR adds vectorization test testing changes related to #6533. Tests are done using FileCheck. It's best if this PR is merged after FileCheck is integrated into chapel's test suite because current solution adds PREDIFF script which runs FileCheck, and for every test empty .good file has to be added.
This branch introduces improvements to LLVM vectorization.
Improvements relate to adding "llvm.mem.parallel_loop_access" metadata to instructions inside parallel loops. Specifically we add the metadata to loads and stores in loops marked as order independent (vectorizeOnly for example). There are two ways of adding metadata to loops:
Implementation covers only 1).