[GSoC] LLVM vectorization improvements #6533

coodie · 2017-06-23T18:35:00Z

This branch introduces improvements to LLVM vectorization.

The improvements add "llvm.mem.parallel_loop_access" metadata to instructions inside parallel loops. Specifically, we add the metadata to loads and stores in loops marked as order independent (vectorizeOnly, for example). There are two ways of adding metadata to loops:

1) Metadata refers to the innermost loop the instruction is in.
2) Metadata refers to the stack of nested loops that the instruction is in.

The implementation covers only 1).
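
To make the difference between the two schemes concrete, here is a minimal sketch in plain C++. It does not use the LLVM API. `LoopId`, `innermostOnly`, and `allEnclosing` are hypothetical names introduced only for illustration: a string label stands in for a loop metadata node, and the vector lists the enclosing parallel loops from outermost to innermost.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for a loop metadata node: just a label.
using LoopId = std::string;

// Scheme 1: a memory access refers only to the innermost enclosing parallel loop.
std::vector<LoopId> innermostOnly(const std::vector<LoopId>& enclosing) {
  if (enclosing.empty()) return {};  // not inside any parallel loop: no tag
  return {enclosing.back()};
}

// Scheme 2: a memory access refers to every enclosing parallel loop.
std::vector<LoopId> allEnclosing(const std::vector<LoopId>& enclosing) {
  return enclosing;
}
```

For a nest of two parallel loops, the first function yields only the inner loop's label, while the second yields both. The PR description says only the first scheme is implemented.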

mppf · 2017-06-23T18:58:15Z

test/llvm/vectorization/SKIPIF

@@ -0,0 +1 @@
+CHPL_LLVM!=llvm


I think this file should end in a newline

Or something... never sure what the red characters in GitHub here at the end mean.

mppf · 2017-06-23T19:01:57Z

test/llvm/vectorization/parallel_loop.chpl

+  for i in vectorizeOnly(0..511) {
+      A[A[i]] = B[i];
+      A[i] = B[i+1];
+  }


Does this case vectorize without addLoopVectorizationHint? With or without the run-time check?

Anyway I'd consider passing -vectorize-scev-check-threshold=0 -runtime-memory-check-threshold=0 LLVM options to disable the runtime check.

(for testing purposes)

Adding the loop vectorization hint (llvm.loop.vectorize.enable) is not necessary to get anything vectorized; it only makes sense when we want to disable vectorization of a particular loop. What is important is adding llvm.mem.parallel_loop_access, because it greatly improves vectorization (we can remove the runtime checks entirely and the loop still gets vectorized). I couldn't come up with a working case myself, so I took this one from LLVM's test suite. llvm.mem.parallel_loop_access is necessary to get it vectorized, because it is tricky and not obvious that there is no loop-carried dependency.
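
For readers more familiar with C++ than Chapel, the loop under discussion can be sketched as follows. This is an illustration, not code from the PR. The function name `indirectStoreLoop` is hypothetical. `#pragma clang loop vectorize(assume_safety)` is Clang's way of asserting that a loop has no loop-carried dependencies. The assumption here is that it plays a role comparable to the metadata discussed above; how Clang encodes the assertion in IR depends on the Clang version. Other compilers ignore the pragma, possibly with a warning.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// C++ rendering of the test loop. The caller must guarantee that every
// A[i] for i < 512 is a valid index into A.
void indirectStoreLoop(std::vector<int>& A, const std::vector<int>& B) {
  const std::size_t n = 512;
  assert(A.size() >= n && B.size() >= n + 1);
#pragma clang loop vectorize(assume_safety)
  for (std::size_t i = 0; i < n; ++i) {
    A[static_cast<std::size_t>(A[i])] = B[i];  // indirect store: target unknown at compile time
    A[i] = B[i + 1];                           // direct store to the element just read
  }
}
```

The indirect store is what makes the dependency analysis hard: without an assertion from the programmer, the compiler has to assume that `A[A[i]]` might alias an element another iteration reads or writes.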

mppf · 2017-06-23T19:02:44Z

I'd like to see this in two PRs:

1) A PR adding the testing, but as .futures if they don't vectorize right without the hints
2) A PR adding the hints and moving those .futures to regular tests

coodie · 2017-06-27T12:11:55Z

@mppf
I've split this PR into testing and implementation, and synchronized both branches with master.

mppf · 2017-06-27T13:46:24Z

compiler/util/clangUtil.cpp

@@ -54,6 +54,8 @@

 #include "llvmDebug.h"

+#include "llvm/Support/TargetRegistry.h"


I must be missing something - why is this #include now necessary? You didn't change anything else in this file.

Oh well, apparently I forgot to remove this include. It's not necessary for anything to work.

mppf · 2017-06-27T14:45:17Z

test this with --vectorize

mppf · 2017-06-27T14:55:24Z

compiler/codegen/CForLoop.cpp

+#ifdef HAVE_LLVM
+namespace
+{
+  llvm::MDNode* addLoopVectorizationHint(llvm::Instruction* instruction)


I'd expect this function to be static rather than in an anonymous namespace. Arguably a style issue, but making it static will fit in with the rest of the compiler. By contrast, when defining a type you can't use static, so an anonymous namespace makes sense there.
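
The distinction can be sketched in a few lines of standalone C++. All names below are hypothetical and are not taken from the compiler sources. Both forms give internal linkage, so the choice for functions is a matter of style.

```cpp
// File-local function marked static: the style the review asks for.
static int addHintStatic(int x) { return x + 1; }

// `static` cannot be applied to a class or struct definition, so a type that
// should stay private to this translation unit goes in an anonymous namespace.
namespace {

struct LoopInfo {  // hypothetical helper type
  int depth;
};

// A function here is just as file-local as the static one above.
int addHintAnonymous(int x) { return x + 1; }

}  // namespace
```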

mppf · 2017-06-27T14:58:31Z

compiler/codegen/CForLoop.cpp

+    if(fNoVectorize == false && isOrderIndependent())
+    {
+        llvm::MDNode* llvmLoopMetadata = addLoopVectorizationHint(endLoopBranch);
+        for(auto it = blockStmtBody->begin(); it != blockStmtBody->end(); it++)


Prefer ++it to it++ for this kind of loop

Note with LLVM 5.0 (and not before) the recommendation changes to

for (Instruction &I : BB)

I think this code might only add parallel loop access metadata to simple loops.
What if we have

var sum = 0;
for i in vectorizeOnly(1..100) with (+ reduce sum) {
  if i > 10 then sum += i; else sum += 1;
}

? Won't this loop over the blockStmtBody instructions fail to visit the instructions "inside" the conditional?
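
The concern can be illustrated with a small standalone model. It is not the compiler's AST: `Stmt`, `countTopLevel`, `countRecursive`, and `makeAccess` are hypothetical names, and a node is either a memory access or a block with nested statements (such as a conditional). Iterating only over the direct children of the loop body misses accesses nested one level down.

```cpp
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical simplified statement tree.
struct Stmt {
  bool isMemoryAccess = false;
  std::vector<std::unique_ptr<Stmt>> children;
};

// Visits the direct children only, like a flat loop over the body.
std::size_t countTopLevel(const Stmt& body) {
  std::size_t n = 0;
  for (auto it = body.children.begin(); it != body.children.end(); ++it)
    if ((*it)->isMemoryAccess) ++n;
  return n;
}

// Visits the node and everything nested below it.
std::size_t countRecursive(const Stmt& stmt) {
  std::size_t n = stmt.isMemoryAccess ? 1 : 0;
  for (const auto& child : stmt.children) n += countRecursive(*child);
  return n;
}

// Builds a leaf node representing one load or store.
std::unique_ptr<Stmt> makeAccess() {
  std::unique_ptr<Stmt> s(new Stmt());
  s->isMemoryAccess = true;
  return s;
}
```

With one access directly in the body and two inside a nested conditional block, the flat walk finds one access and the recursive walk finds three.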

mppf · 2017-07-06T14:09:55Z

compiler/include/codegen.h

@@ -78,6 +91,8 @@ struct GenInfo {
  llvm::IRBuilder<> *builder;
  LayeredValueTable *lvt;

+  std::stack<LoopData> loopStack;


Since this stack only ever contains elements with parallel=true and since you're not extending other loop generation (while loops, other loops, ...), I think this stack should just store llvm::MDNode* for parallel loops and be called something like parallelLoopStack.
Do you envision some case in which this stack will need to contain non-parallel loops?

mppf · 2017-07-06T14:11:49Z

compiler/codegen/expr.cpp

@@ -389,6 +389,13 @@ llvm::StoreInst* codegenStoreLLVM(llvm::Value* val,
  llvm::MDNode* tbaa = NULL;
  if( USE_TBAA && valType ) tbaa = valType->symbol->llvmTbaaNode;
  if( tbaa ) ret->setMetadata(llvm::LLVMContext::MD_tbaa, tbaa);
+
+  if(!info->loopStack.empty()) {
+    const auto &loopData = info->loopStack.top();


Note in a comment here and in the PR message that you're only adding parallel_loop_access hints for the innermost loops.

mppf · 2017-07-06T14:14:52Z

I'd like to see a future PR that adds the parallel loop access to function calls; see https://github.com/chapel-lang/chapel/blob/master/compiler/codegen/expr.cpp#L2222

mppf · 2017-07-06T16:42:45Z

I tested this, with this patch:

diff --git a/compiler/codegen/CForLoop.cpp b/compiler/codegen/CForLoop.cpp
index bcf60f8..9f9dc84 100644
--- a/compiler/codegen/CForLoop.cpp
+++ b/compiler/codegen/CForLoop.cpp
@@ -40,7 +40,7 @@ static llvm::MDNode* generateLoopMetadata(bool vectorize)
   auto tmpNode        = llvm::MDNode::getTemporary(ctx, llvm::None);
   args.push_back(tmpNode.get());
 
-  if(vectorize)
+  if(vectorize && 0 /* off for testing */)
   {
     llvm::Metadata *loopVectorizeEnable[] = { llvm::MDString::get(ctx, "llvm.loop.vectorize.enable"),
                                               llvm::ConstantAsMetadata::get(llvm::ConstantInt::get(llvm::Type::getInt1Ty(ctx), true))};

with --llvm --vectorize --fast

and I saw 1 failure:

  compflags/coodie/llvmPrintIr/llvmPrintIrFull.chpl

Please have a look at disabling the vectorize.enable by default (I think it's fine to leave in the code, but make it off by default somehow) and at getting llvmPrintIrFull to work in that configuration:

start_test -compopts --llvm -compopts --vectorize -compopts --fast compflags/coodie/llvmPrintIr/

One way to do that would be to add an llvmPrintIrFull.skipif file containing:

COMPOPTS <= --fast

Once you address these issues I think this is ready to merge. Thanks.

mppf · 2017-07-06T19:43:08Z

Passed full local testing.

@coodie

Add vectorization improvement tests [PR by @coodie - thanks! Reviewed/tested by @mppf] This PR adds tests for the vectorization changes in #6533. The tests use FileCheck. It's best if this PR is merged after FileCheck is integrated into Chapel's test suite, because the current solution adds a PREDIFF script that runs FileCheck, and an empty .good file has to be added for every test.

mppf reviewed Jun 23, 2017

coodie mentioned this pull request Jun 27, 2017

[GSoC] Add vectorization improvement tests #6548

Merged

mppf reviewed Jun 27, 2017

Add parallel_loop_access metadata generation

fd64585

mppf reviewed Jul 6, 2017

coodie added 2 commits July 6, 2017 19:35

Remove adding llvm.loop.vectorize.enable and fix llvmPrintIr tests

37ac0d0

Change name of parameter in generateLoopMetadata and explain parallel_loop_access metadata adding

3a02cd2

mppf merged commit 76d4354 into chapel-lang:master Jul 6, 2017

mppf mentioned this pull request Jul 10, 2017

Improving the LLVM backend #5043

Closed

7 tasks

coodie changed the title from "LLVM vectorization improvements" to "[GSoC] LLVM vectorization improvements" on Aug 28, 2017

mppf mentioned this pull request Sep 17, 2018

LLVM vectorization #11162

Open
