
compiler optimization tracker #3440

Closed · 58 of 65 tasks
JeffBezanson opened this issue Jun 18, 2013 · 38 comments
Labels: performance (Must go faster)
@JeffBezanson (Member) commented Jun 18, 2013

This is an umbrella issue for compiler and other low-level optimizations. I'm including ones that are already done, to make it more interesting.

compiler:

RTS:

  • compress ASTs
  • method hash tables
  • more efficient method and cache representation
  • gc: 1-bit refcount (gc enhancements #261)
  • flatten arrays of tuples
  • use realloc in growing arrays (complicated by alignment constraints)
  • avoid extra buffer copy in uv_write
  • allow scheduler to be a function call, avoiding double task switch to get to next task
  • avoid some stack copies in Task; e.g. for tasks that never yield

larger projects:

performance-related features:

@ghost assigned JeffBezanson Jun 18, 2013
@ViralBShah (Member):

This is a really nice and useful tracker! We should also link these to our release notes.

@timholy (Member) commented Jun 18, 2013

It's not obvious to me that you gave yourself credit for your awesome recent work in making let faster :-). Presumably it's bundled into something else?

Thanks for this list. Mildly embarrassed by how many of the issues were filed by me, without me being able to do much to fix most of them.

@timholy (Member) commented Jun 18, 2013

...but here I go again...

In addition to "flatten arrays of tuples" is it at all feasible to "flatten arrays of types with fixed size"? See https://groups.google.com/d/msg/julia-dev/QOZfPkdRQwk/O-DgzNxbQegJ.
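
For reference, a minimal sketch of what such flattening would mean at the language level, using the immutable-type syntax of the era (the type name here is hypothetical; whether elements are actually stored inline is up to the runtime, not this code):

immutable Point        # fixed-size immutable type: two Float64 fields
    x::Float64
    y::Float64
end

# A flattened Array{Point} would store x1, y1, x2, y2, ... contiguously,
# instead of holding pointers to individually boxed Point objects.
a = [Point(i, 2i) for i in 1:4]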

@StefanKarpinski (Member):

We did ask for "ungracious demands" from the outset ;-)

@lindahua (Contributor):

Great stuff! I believe this will take the performance of Julia to new heights when done.

JeffBezanson added a commit that referenced this issue Jun 20, 2013
- make it an error to resize an array with shared data (fixes #3430)
- now able to use realloc to grow arrays (part of #3440, helps #3441)
- the new scheme is simpler. one Array owns the data, instead of
  tracking the buffers separately as mallocptr_t
- Array data can be allocated inline, with malloc, or from a pool
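
For context, the code path this commit changes is the one exercised by repeatedly growing an array; a hypothetical micro-benchmark (illustrative only, not taken from the commit):

function grow(n)
    a = Int[]            # starts with a small buffer
    for i in 1:n
        push!(a, i)      # growth can now use realloc to extend the buffer
    end                  # in place instead of always allocating and copying
    return a
end

grow(10^7)
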
@blakejohnson (Contributor):

What about the kind of constant folding Stefan mentions in #2741?

@IainNZ (Member) commented Jun 25, 2013

Thinking out loud: how hard would it be to set up something like http://speed.pypy.org/ ?
On the server side it seems clear what is required, but do we have a good body of performance tests? If this is something desirable, I'm sure it's something the community could pull together without any demands on core developers...

@staticfloat (Member):

@IainNZ; I'm working on that right now. :)

We have a small corpus of performance tests in test/perf2, but I do believe we will need to do some work to come up with small, self-contained, and relevant performance tests. I will open an issue regarding this in the near future, once I have more concrete work done.

@JeffBezanson (Member, Author):

related: 620141b

JeffBezanson added a commit that referenced this issue Jul 3, 2013
happens to make sparse mul faster; #942
JeffBezanson added a commit that referenced this issue Jul 7, 2013
this removes stores and loads, and if we can reduce the number of roots
in a function to zero we save significant overhead.
ref #3440
@malmaud (Contributor) commented Oct 24, 2013

I'd be interested in helping out on one of these issues, but I don't want to step on anyone's toes or duplicate effort. Can I get some guidance on which of these issues is in need of manpower?

@staticfloat (Member):

@JeffBezanson Is "bounds check elimination" covered by @inbounds?

@JeffBezanson (Member, Author):

No; ideally there should be some automatic bounds check removal.

@johnmyleswhite (Member):

What level of automaticity do you have in mind? I'd love it if bounds-checking were automatically turned off for loops that look like:

for i in 1:length(x)
    x[i] += 1
end

@JeffBezanson (Member, Author):

Yes, that's the kind of case that can be handled automatically.
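
Until that removal is automatic, the manual escape hatch is the @inbounds macro mentioned above; a minimal sketch (the programmer, not the compiler, guarantees the indices are valid):

function incr!(x)
    for i in 1:length(x)
        @inbounds x[i] += 1   # asserts the access is in bounds, letting the
    end                       # compiler elide the bounds check in this loop
    return x
end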

@ArchRobison (Contributor):

By the way, the TBAA support in #5355 partially fixes the "hoist 1-d array metadata loads" item. It's partial because bounds checks will stop the hoisting, since LLVM can't prove it's legal and profitable to hoist a load up over a branch.

@yuyichao (Contributor) commented Jan 8, 2015

@staticfloat Is there any update on having a performance test suite and something similar to speed.pypy.org? I found it very useful when working on optimizations (in addition to running the current tests to make sure I'm not breaking anything) to have something like this to measure the performance gain (or regression in some cases). It would be awesome to have GitHub integration, but something I can run locally is also good enough.

@JeffBezanson (Member, Author):

Running make in test/perf will run our benchmark suite.

@ghost commented Jan 8, 2015

@yuyichao: There is also the Julia Speed Center.

Edit: It appears not to have had any updates since June 2014, though, which was around when I last had a look at it.

@ViralBShah (Member):

@staticfloat can shed some light on the history there.

@staticfloat (Member):

Yes, I think @yuyichao was hinting at that when he pinged me. Essentially, I just haven't had time to rewrite speed.julialang.org in something more flexible than the original adaptation from the codespeed project. It's the next big Julia project I will tackle, but unfortunately time for big projects is thin for me at the moment.

@ViralBShah (Member):

Is it fair to tick off SIMD types on this list with tuple overhaul?

@ihnorton (Member):

I think ensuring proper alignment for SIMD is still TODO.

@ArchRobison (Contributor):

My impression is that recent Intel SIMD hardware is much less performance-picky about alignment than it used to be, but I have not run experiments.

@simonster (Member):

One bit that is (I think?) still missing is the ability to do vectorized arithmetic on tuples (or something that wraps them) without llvmcall.

@ArchRobison (Contributor):

The optional SLPVectorizer pass (enabled under -O) used to vectorize tuple arithmetic without any special wrapper, but it is currently failing to do so.
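
For concreteness, this is the sort of wrapper-free tuple arithmetic at issue; whether it actually compiles down to a single SIMD instruction depends on the pass firing (a sketch, not a guarantee):

# Elementwise add of two 4-tuples of Float32; the hoped-for output is
# one 4-wide vector add rather than four scalar adds.
vadd(a::NTuple{4,Float32}, b::NTuple{4,Float32}) =
    (a[1]+b[1], a[2]+b[2], a[3]+b[3], a[4]+b[4])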

@pao (Member) commented Apr 23, 2015

My impression is that recent Intel SIMD hardware is much less performance-picky about alignment than it used to be, but I have not run experiments.

And getting good performance out of less-recent hardware shouldn't be ignored if it's reasonable to get.

@ScottPJones (Contributor):

OK, this is great (thanks for the reference that led me here, @timholy). Which of the boxes that are not checked are already largely in place? Thanks.

@timholy (Member) commented May 2, 2015

I certainly am not qualified to give a whole list, but "more general inlining" and "inline declaration" are presumably largely done (there are still ambitions to introduce inlining controlled from the call site), and there's already some bounds check elimination and SIMD support.

@vtjnash (Member) commented Nov 18, 2015

checked off:

  • better type info for tasks/produce/consume (Channels)
  • SIMD support (#2299)
  • more general inlining
  • inline declaration (#1106)
  • pure function declaration (#414)

since the basic version of each of these has been implemented
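
For reference, a minimal sketch of what two of these declarations look like in use (forms from around this time; the function names are illustrative):

@inline double(x) = 2x             # inline declaration: request inlining

Base.@pure halve(n::Int) = n ÷ 2   # pure function declaration: promises no
                                   # side effects, enabling constant folding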

@vtjnash closed this as completed Nov 18, 2015
@vtjnash reopened this Nov 18, 2015
@eschnett (Contributor):

You may want to add @fastmath code generation.
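
That macro did land in Base; a minimal usage sketch, assuming @fastmath as it shipped in later Julia releases (it rewrites the arithmetic in the block to IEEE-relaxed variants):

function sumsq(x)
    s = zero(eltype(x))
    @fastmath for v in x
        s += v * v        # fast-math flags allow LLVM to reassociate and
    end                   # potentially vectorize this reduction
    return sqrt(s)
end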

@vtjnash (Member) commented Sep 12, 2017

Checked off a few more. We now have all of these, or they are tracked in other existing issues.

@vtjnash closed this as completed Sep 12, 2017