
compiler optimization tracker #3440

Closed · 58 of 65 tasks
JeffBezanson opened this issue Jun 18, 2013 · 38 comments
Labels: performance (Must go faster)
@JeffBezanson (Member) commented Jun 18, 2013

This is an umbrella issue for compiler and other low-level optimizations. I'm including ones that are already done, to make it more interesting.

compiler:

RTS:

  • compress ASTs
  • method hash tables
  • more efficient method and cache representation
  • gc: 1-bit refcount (gc enhancements #261)
  • flatten arrays of tuples
  • use realloc in growing arrays (complicated by alignment constraints)
  • avoid extra buffer copy in uv_write
  • allow scheduler to be a function call, avoiding double task switch to get to next task
  • avoid some stack copies in Task; e.g. for tasks that never yield

larger projects:

performance-related features:

@ghost assigned JeffBezanson Jun 18, 2013
@ViralBShah (Member):

This is a really nice and useful tracker! We should also link these to our release notes.

@timholy (Member) commented Jun 18, 2013

It's not obvious to me that you gave yourself credit for your awesome recent work in making let faster :-). Presumably it's bundled into something else?

Thanks for this list. Mildly embarrassed by how many of the issues were filed by me, without me being able to do much to fix most of them.

@timholy (Member) commented Jun 18, 2013

...but here I go again...

In addition to "flatten arrays of tuples" is it at all feasible to "flatten arrays of types with fixed size"? See https://groups.google.com/d/msg/julia-dev/QOZfPkdRQwk/O-DgzNxbQegJ.
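
For reference, a minimal sketch of what such flattening would mean at the language level, using the immutable-type syntax of the era (the type name here is hypothetical; whether elements are actually stored inline is up to the runtime, not this code):

immutable Point        # fixed-size immutable type: two Float64 fields
    x::Float64
    y::Float64
end

# A flattened Array{Point} would store x1, y1, x2, y2, ... contiguously,
# instead of holding pointers to individually boxed Point objects.
a = [Point(i, 2i) for i in 1:4]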

@StefanKarpinski (Member):

We did ask for "ungracious demands" from the outset ;-)

@lindahua (Contributor):

Great stuff! I believe this will take the performance of Julia to new heights when done.

JeffBezanson added a commit that referenced this issue Jun 20, 2013
- make it an error to resize an array with shared data (fixes #3430)
- now able to use realloc to grow arrays (part of #3440, helps #3441)
- the new scheme is simpler. one Array owns the data, instead of
  tracking the buffers separately as mallocptr_t
- Array data can be allocated inline, with malloc, or from a pool
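
For context, the code path this commit changes is the one exercised by repeatedly growing an array; a hypothetical micro-benchmark (illustrative only, not taken from the commit):

function grow(n)
    a = Int[]            # starts with a small buffer
    for i in 1:n
        push!(a, i)      # growth can now use realloc to extend the buffer
    end                  # in place instead of always allocating and copying
    return a
end

grow(10^7)
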
@blakejohnson (Contributor):

What about the kind of constant folding Stefan mentions in #2741?

@IainNZ (Member) commented Jun 25, 2013

Thinking out loud: how hard would it be to set up something like http://speed.pypy.org/ ?
On the server side it seems clear what is required, but do we have a good body of performance tests? If this is something desirable, I'm sure it's something the community could pull together without any demands on core developers...

@staticfloat (Member):

@IainNZ; I'm working on that right now. :)

We have a small corpus of performance tests in test/perf2, but I do believe we will need to do some work to come up with small, self-contained, and relevant performance tests. I will open an issue regarding this in the near future, once I have more concrete work done.

@JeffBezanson (Member, Author):

related: 620141b

JeffBezanson added a commit that referenced this issue Jul 3, 2013
happens to make sparse mul faster; #942
JeffBezanson added a commit that referenced this issue Jul 7, 2013
this removes stores and loads, and if we can reduce the number of roots
in a function to zero we save significant overhead.
ref #3440
@malmaud (Contributor) commented Oct 24, 2013

I'd be interested in helping out on one of these issues, but I don't want to step on anyone's toes or duplicate effort. Can I get some guidance on which of these issues is in need of manpower?

@staticfloat (Member):

@JeffBezanson Is "bounds check elimination" covered by @inbounds?

@JeffBezanson (Member, Author):

No; ideally there should be some automatic bounds check removal.

@johnmyleswhite (Member):

What level of automaticity do you have in mind? I'd love it if bounds-checking were automatically turned off for loops that look like:

for i in 1:length(x)
    x[i] += 1
end

@JeffBezanson (Member, Author):

Yes, that's the kind of case that can be handled automatically.
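
Until that removal is automatic, the manual escape hatch is the @inbounds macro mentioned above; a minimal sketch (the programmer, not the compiler, guarantees the indices are valid):

function incr!(x)
    for i in 1:length(x)
        @inbounds x[i] += 1   # asserts the access is in bounds, letting the
    end                       # compiler elide the bounds check in this loop
    return x
end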

@ArchRobison (Contributor):

By the way, the TBAA support in #5355 partially fixes the "hoist 1-d array metadata loads" item. It's partial because bounds checks will stop the hoisting, since LLVM can't prove it's legal and profitable to hoist a load up over a branch.

@yuyichao (Contributor) commented Jan 8, 2015

@staticfloat Is there any update on having a performance test suite and something similar to speed.pypy.org? I found it very useful when working on optimizations (in addition to running the current tests to make sure I'm not breaking anything) to have something like this to measure the performance gain (or regression in some cases). It would be awesome to have GitHub integration, but something I can run locally is also good enough.

@JeffBezanson (Member, Author):

Running make in test/perf will run our benchmark suite.

@ghost commented Jan 8, 2015

@yuyichao: There is also the Julia Speed Center.

Edit: It appears not to have had any updates since June 2014, though, which was around when I last had a look at it.

@ViralBShah (Member):

@staticfloat can shed some light on the history there.

@staticfloat (Member):

Yes, I think @yuyichao was hinting at that when he pinged me. Essentially, I just haven't had time to rewrite speed.julialang.org in something more flexible than the original adaptation from the codespeed project. It's the next big Julia project I will tackle, but unfortunately time for big projects is thin for me at the moment.

@ViralBShah (Member):

Is it fair to tick off SIMD types on this list with tuple overhaul?

@ihnorton (Member):

I think ensuring proper alignment for SIMD is still TODO.

@ArchRobison (Contributor):

My impression is that recent Intel SIMD hardware is much less performance-picky about alignment than it used to be, but I have not run experiments.

@simonster (Member):

One bit that is (I think?) still missing is the ability to do vectorized arithmetic on tuples (or something that wraps them) without llvmcall.

@ArchRobison (Contributor):

The optional SLPVectorizer pass (enabled under -O) used to vectorize tuple arithmetic without any special wrapper, but it is currently failing to do so.
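
For concreteness, this is the sort of wrapper-free tuple arithmetic at issue; whether it actually compiles down to a single SIMD instruction depends on the pass firing (a sketch, not a guarantee):

# Elementwise add of two 4-tuples of Float32; the hoped-for output is
# one 4-wide vector add rather than four scalar adds.
vadd(a::NTuple{4,Float32}, b::NTuple{4,Float32}) =
    (a[1]+b[1], a[2]+b[2], a[3]+b[3], a[4]+b[4])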

@pao (Member) commented Apr 23, 2015

My impression is that recent Intel SIMD hardware is much less performance-picky about alignment than it used to be, but I have not run experiments.

And getting good performance out of less-recent hardware shouldn't be ignored if it's reasonable to get.

@ScottPJones (Contributor):

OK, this is great (thanks for the reference that led me here, @timholy). Which of the boxes that are not checked are already largely in place? Thanks.

@timholy (Member) commented May 2, 2015

I certainly am not qualified to give a whole list, but "more general inlining" and "inline declaration" are presumably largely done (there are still ambitions to introduce inlining controlled from the call site), and there's already some bounds check elimination and SIMD support.

@vtjnash (Member) commented Nov 18, 2015

checked off:

  • better type info for tasks/produce/consume (Channels)
  • SIMD support (#2299)
  • more general inlining
  • inline declaration (#1106)
  • pure function declaration (#414)

since the basic version of each of these has been implemented
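
For reference, a minimal sketch of what two of these declarations look like in use (forms from around this time; the function names are illustrative):

@inline double(x) = 2x             # inline declaration: request inlining

Base.@pure halve(n::Int) = n ÷ 2   # pure function declaration: promises no
                                   # side effects, enabling constant folding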

@vtjnash closed this as completed Nov 18, 2015
@vtjnash reopened this Nov 18, 2015
@eschnett (Contributor):

You may want to add @fastmath code generation.
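
That macro did land in Base; a minimal usage sketch, assuming @fastmath as it shipped in later Julia releases (it rewrites the arithmetic in the block to IEEE-relaxed variants):

function sumsq(x)
    s = zero(eltype(x))
    @fastmath for v in x
        s += v * v        # fast-math flags allow LLVM to reassociate and
    end                   # potentially vectorize this reduction
    return sqrt(s)
end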

@vtjnash (Member) commented Sep 12, 2017

Checked off a few more. We now have all of these, or they are tracked in other existing issues.

@vtjnash closed this as completed Sep 12, 2017