
experiment with llvm vectorization passes #4786

Closed
StefanKarpinski opened this issue Nov 12, 2013 · 12 comments

Comments

@StefanKarpinski
Sponsor Member

This may be more applicable once we work with LLVM 3.4 (which may also depend on switching to using MCJIT), but there is now a fairly significant amount of support for autovectorization in LLVM.

@ArchRobison
Contributor

I'm interested in experimenting with adding an annotation akin to the OpenMP 4.0 "pragma omp simd". It would convey the information that exact sequential semantics are not required. For example, bounds checking would still happen, but a failed bounds check might terminate a loop earlier than if the sequential semantics were followed. Without that grant of permissiveness, autovectorizers are often thwarted.
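The proposed annotation might look like the following sketch (the `@simd` name and semantics here illustrate the proposal and its OpenMP inspiration; they were not a shipped feature at the time):

```julia
# Hypothetical sketch of the proposed annotation: the macro grants
# permission to relax exact sequential semantics, e.g. by reassociating
# the reduction below across vector lanes.
function relaxed_sum(a::Vector{Float64})
    s = 0.0
    @inbounds @simd for i in 1:length(a)
        s += a[i]    # order of additions may differ from sequential order
    end
    return s
end
```

With this grant, a failed bounds check (if checks were kept) could legally stop the loop a few iterations early relative to strict sequential execution.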

@StefanKarpinski
Sponsor Member Author

Also definitely worth considering, although I would, of course, prefer to avoid pragmas where possible.

@simonster
Member

I tried to enable the loop vectorizer in #3929, but I couldn't get it to work because of the way jl_value_t was getting handled in the instruction combining pass. This could use attention from someone who knows more about LLVM than I do. I also tried enabling the SLP vectorizer, but that made building the sysimg extremely slow; it seems like Julia sometimes ends up compiling functions with absurdly large numbers of variables.

@StefanKarpinski
Sponsor Member Author

> that made building the sysimg extremely slow

Might be worth it? Of course, the concern is more that it will make code compilation after building the system image very slow too.

@JeffBezanson
Sponsor Member

It'd probably be best to selectively enable it for non-huge functions.

@lindahua
Contributor

I agree with introducing some way to selectively enable LLVM auto-vectorization for a small set of functions for testing purposes.

If this works, micro-optimization like those in #5205 will no longer be needed.
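For instance, with vectorization working, a plain loop like this sketch could match hand-tuned code without manual unrolling (assuming the kind of `@simd` annotation discussed above):

```julia
# Sketch: a straightforward elementwise add that a working loop vectorizer
# could handle on its own, removing the need for manual micro-optimization.
function add!(c::Vector{Float64}, a::Vector{Float64}, b::Vector{Float64})
    @inbounds @simd for i in 1:length(c)
        c[i] = a[i] + b[i]
    end
    return c
end
```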

@ArchRobison
Contributor

I'm partway through implementing vectorization of loops that are marked by the programmer. The scheme is inspired by OpenMP 4.0's pragma omp simd. I was planning to create an issue describing what I have after the holidays. Here is a summary of what I'm trying to do:

  1. Have the programmer apply @inbounds to eliminate bounds checks. Maybe in the future we can make LLVM eliminate unnecessary checks. Item 2 should greatly help the necessary analysis. Or in the long term, just vectorize bounds checks too.
  2. Leverage type-based alias analysis in LLVM. I have this working. It greatly helps hoisting loop-invariant loads of fields like arrayptr and arraylen, and thus solves the "second issue" mentioned in WIP: Enable LLVM loop vectorizer (#3929). Alas, it's not much use without item 1, because code in the shadow of a bounds check is no longer guaranteed to be executed each iteration. I'd really like to find a way to teach LLVM to "speculatively" hoist such loads of fields in jl_array_t.
  3. Have the programmer mark the for loop with a @simd macro. The macro transforms the loop into an equivalent while loop that uses a loop test based on < instead of <=. That solves the "first issue" mentioned in WIP: Enable LLVM loop vectorizer (#3929) without having to set no-signed-wrap. (The < test will do the wrong thing if the original upper loop bound was INT_MAX. But so would no-signed-wrap.) Indeed items 1-3 are enough to vectorize simple loops that don't have memory dependence issues or for which LLVM can insert a run-time dependence test. (Sometimes it does, sometimes it doesn't. I haven't figured out its decision logic yet.)
  4. Have the @simd macro tell the LLVM loop vectorizer to ignore memory dependencies when considering whether to vectorize a loop. This is the reason that OpenMP 4.0 added the equivalent feature. I'm working on this part now. I was planning to attach LLVM Metadata to the "loop latch" BasicBlock, but found out today that LLVM currently does not allow attaching metadata to a BasicBlock (sigh). I'm going to try hacking something that attaches the metadata to an instruction in the block, and then figure out what I have to modify in LLVM to propagate the information.
    In lieu of step 4, my current prototype makes the loop vectorizer ignore memory dependences in any Julia function with a name beginning with banana. That's useful for experiments, but probably not production worthy :-)
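Steps 1 and 3 above can be sketched as a source-level transformation (the exact lowering the macro would emit is hypothetical here):

```julia
# Before: what the programmer writes (step 1 adds @inbounds, step 3 adds @simd)
#     @inbounds @simd for i = 1:n
#         y[i] += a * x[i]
#     end
#
# After: roughly what the @simd macro produces — a while loop whose test
# uses `<` instead of `<=`, so no-signed-wrap need not be assumed.
function axpy_lowered!(y::Vector{Float64}, a::Float64, x::Vector{Float64}, n::Int)
    i = 0
    @inbounds while i < n
        i += 1
        y[i] += a * x[i]
    end
    return y
end
```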

@StefanKarpinski
Sponsor Member Author

> In lieu of step 4, my current prototype makes the loop vectorizer ignore memory dependences in any Julia function with a name beginning with banana. That's useful for experiments, but probably not production worthy :-)

Seems like a fine interface to me.

@ViralBShah
Member

I like the idea of the banana interface too. :-)

@jiahao
Member

jiahao commented Apr 7, 2014

Presumably closed by PR above?

@simonster
Member

This isn't complete until we have the SLPVectorizer (#6271).
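Unlike the loop vectorizer, the SLP (superword-level parallelism) vectorizer targets straight-line code. A sketch of the kind of pattern it aims to pack into SIMD operations:

```julia
# Four independent, adjacent scalar stores that an SLP vectorizer may
# fuse into a single vector load/add/store sequence.
function add4!(c::Vector{Float64}, a::Vector{Float64}, b::Vector{Float64})
    @inbounds begin
        c[1] = a[1] + b[1]
        c[2] = a[2] + b[2]
        c[3] = a[3] + b[3]
        c[4] = a[4] + b[4]
    end
    return c
end
```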

@JeffBezanson
Sponsor Member

I think this is well underway and subsumed by more specific issues.

7 participants