
WIP/RFC unbox more immutables #18632

Open · wants to merge 25 commits

Conversation

9 participants
Contributor

carnaval commented Sep 22, 2016

The codegen/gc part of this is basically working.
I'm now wondering about semantics, and I'd like us to discuss the following issues a bit before I clean up the code and we start the review (there are a bunch of duplicate paths in codegen that can be merged together/simplified, and some things are plain wrong and/or inefficient).

This patch allows us to unbox most immutables. By unbox I mean: allocate/store them on the stack, inline them in other objects, and inline them in arrays.
Why most? There are (for now) two problems: cycles and #undef.

Cycles are a fundamental problem: if A has a field of type B and B a field of type A, we obviously can't inline them into each other. The cycle needs to be broken; the annoying part is that it should be done in a predictable way. For now, on this PR, it's done in DFS order, which means that, for example, the layout of B will differ depending on whether we ever instantiated an A before. Not good. The proposal I remember about that (from Jameson at JuliaCon, iirc) was to make types boxed iff they are part of any field cycle.
#undef is annoying because it makes a difference at the Julia level between isbits types and other immutables. To minimize breakage I've gone the route of preserving the current behavior.

So if A has a pointer field and we make, e.g., an uninitialized array of A, this branch uses the nullity of the field of A as a marker that the corresponding slot in the array is #undef. This only works if the field of a valid instance of A can never be null, i.e., if A.ninitialized >= field_index_of_the_ptr_field.
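That encoding can be modeled roughly like this (a Python sketch under stated assumptions; `InlineArray`, `marker_field` and everything else here are hypothetical names for illustration, not the actual C implementation):

```python
# Toy model of the null-as-#undef encoding described above: instances of an
# immutable with a never-#undef pointer field are stored inline in an array,
# and a null (None here) in that field marks the slot as #undef.

class UndefRefError(Exception):
    pass

class InlineArray:
    """Array whose slots hold inlined immutables (modeled as field dicts)."""
    def __init__(self, n, marker_field):
        # marker_field must never be #undef in a valid instance, i.e. its
        # field index must be <= ninitialized for the element type.
        self.marker_field = marker_field
        self.slots = [{marker_field: None} for _ in range(n)]  # all #undef

    def isassigned(self, i):
        return self.slots[i][self.marker_field] is not None

    def __setitem__(self, i, fields):
        # A valid instance always has the marker field set, so storing one
        # can never be confused with an #undef slot.
        assert fields[self.marker_field] is not None
        self.slots[i] = dict(fields)

    def __getitem__(self, i):
        if not self.isassigned(i):
            raise UndefRefError(f"undefined reference at index {i}")
        return self.slots[i]

a = InlineArray(2, marker_field="ptr")
a[0] = {"ptr": object(), "x": 1}
assert a.isassigned(0) and not a.isassigned(1)
```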

This makes most code (at least all the test suite :-)) work, but I think the following rules are really weird:

A type T will be inlined into fields/arrays and stack allocated if

  • it is immutable
  • it is not possible to reach itself through a sequence of field access
  • it has at least one never-#undef pointer field or no pointer fields at all
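A minimal sketch of these three rules as a decision procedure (Python, over a toy type table; the representation and all names are assumptions for illustration, not the actual layout code):

```python
# Toy decision procedure for the three rules above. Each entry maps a type
# name to (mutable?, fields), where a field is (type, stored_as_pointer,
# never_undef). The table is invented for illustration.

TYPES = {
    "Int":  (False, []),
    "A":    (False, [("B", True, True)]),
    "B":    (False, [("A", True, True)]),   # A <-> B: a field cycle
    "Vec":  (True,  []),                    # mutable, hence boxed
    "Pair": (False, [("Int", False, True), ("Vec", True, True)]),
}

def reaches(src, target, seen=()):
    # Rule 2 helper: can `target` be reached from `src` via field accesses?
    if src in seen:
        return False
    _, fields = TYPES[src]
    return any(ft == target or reaches(ft, target, seen + (src,))
               for ft, _, _ in fields)

def inlineable(name):
    mutable, fields = TYPES[name]
    if mutable:                       # rule 1: must be immutable
        return False
    if reaches(name, name):           # rule 2: must not reach itself
        return False
    ptr_flags = [nu for _, isptr, nu in fields if isptr]
    # rule 3: no pointer fields, or at least one that is never #undef
    return not ptr_flags or any(ptr_flags)

assert inlineable("Int") and inlineable("Pair")
assert not inlineable("A") and not inlineable("Vec")
```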

The only difference between a boxed and an unboxed type is memory layout, but I'd assume that we want that to be easily predictable, since, for example, people routinely interface with C.

An alternative proposed by Yichao was to make it entirely opt-in and error out if inlining was not possible. I'm worried this will lead to yet another annotation that people will sprinkle everywhere.

For performance, specially crafted tests (like summing lines of a very skinny matrix using subarrays) show some improvement by avoiding gc allocation. Not super satisfying for now, and casual inspection of the generated asm shows a lot of stack movement. We can work on that though, probably by improving LLVM's visibility of our rooting mechanism and/or just using statepoints.

(to sweeten the deal I've thrown in improved undef ref errors)


Member

JeffBezanson commented Sep 22, 2016

Awesome!!

Probably a predictable request: is it possible to have the part that's just an optimization (stack allocation) first, without changing the layout of anything? That could probably be merged very quickly.

How does stack marking work in the gc?


Contributor

carnaval commented Sep 22, 2016

Yeah, I'm just afraid that it'll make things worse: since every time the immutable goes in/out of local scope it'll have to be unboxed/boxed, we may end up making more boxes than today.

The gc objects on the stack have a pointer to them in the gc frame and the special treatment in gc is done by checking if they are inside the task stack's bounds.
With statepoints we could avoid that altogether and put the alloca's sp-offset directly in the stackmap, as well as the constant tag value, approaching zero-runtime cost.
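That scheme can be modeled roughly as follows (a Python sketch; the `Task` fields, addresses, and names are invented for illustration, not the real runtime structures):

```python
# Toy model of stack-allocated GC roots: the gc frame holds pointers to the
# objects, and the collector distinguishes stack objects from heap objects
# with an address-range check against the task's stack bounds.

class Task:
    def __init__(self, stack_lo, stack_hi):
        self.stack_lo, self.stack_hi = stack_lo, stack_hi
        self.gc_frame = []            # addresses of rooted objects

def is_stack_allocated(task, addr):
    # Stack objects get special treatment during marking (e.g. no heap
    # metadata to update); everything else is assumed heap-allocated.
    return task.stack_lo <= addr < task.stack_hi

task = Task(stack_lo=0x7000, stack_hi=0x8000)
task.gc_frame += [0x7ABC, 0x12345]    # one stack slot, one heap pointer
stack_roots = [a for a in task.gc_frame if is_stack_allocated(task, a)]
assert stack_roots == [0x7ABC]
```

With statepoints, as the comment notes, the range check could disappear entirely because the alloca's sp-offset would be recorded in the stackmap at compile time.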


Contributor

nalimilan commented Sep 22, 2016

Am I right that this will dramatically improve the performance of Nullable? :-)

// VecElement types are unwrapped in LLVM.
addr = strct.V;
else
assert(0);


@vchuravy

vchuravy Sep 22, 2016

Member

That assert should probably go away?



Member

yuyichao commented Sep 22, 2016

My argument for opt-in is also (local) predictability.

The question is: for given types T1 and T2, what information do we need to determine whether Tuple{T1,T2} will be stored inline? I think we should make this rely only on the properties of Tuple, T1 and T2, but not their interactions. This is not the case if we treat any type in the cycle as non-inlineable, since T2 might have a field Tuple{Int,T2}, in which case Tuple{Int,Float64} is inlined, Tuple{Float64,T2} is inlined, but Tuple{Int,T2} will not be. If we make this opt-in and mark Tuple as an inlined type, then this will be predictable with local information (in fact, it relies only on Tuple). There will be an error at type-construction time if T2 is marked as inlined too.
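The non-locality can be made concrete with a small sketch (Python; the encoding of Tuple types and fields is a toy model with invented names, not Julia's actual layout code):

```python
# Toy model of the cycle rule's non-locality. A Tuple type is represented as
# a Python tuple ("Tuple", e1, e2); named types map to their field types.
# T2 has a field of type Tuple{Int,T2}, as in the example above.

FIELDS = {"Int": [], "Float64": [], "T2": [("Tuple", "Int", "T2")]}

def fields(t):
    return list(t[1:]) if isinstance(t, tuple) else FIELDS[t]

def reaches(src, target, seen=()):
    if src in seen:
        return False
    return any(f == target or reaches(f, target, seen + (src,))
               for f in fields(src))

def inlined(t):
    # cycle rule: inline iff the type cannot reach itself through fields
    return not reaches(t, t)

assert inlined(("Tuple", "Int", "Float64"))
assert inlined(("Tuple", "Float64", "T2"))
assert not inlined(("Tuple", "Int", "T2"))  # depends on T2's fields
# Under opt-in, the decision would depend only on Tuple's own annotation,
# and additionally marking T2 inline would error when the type is defined.
```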

@musm musm referenced this pull request Sep 27, 2016

Closed

benchmark results repository #34

@ChrisRackauckas ChrisRackauckas referenced this pull request Nov 7, 2016

Closed

ODE/DAE solver interface #5

1 of 4 tasks complete

Member

quinnj commented Nov 17, 2016

Healthy bump for 0.6 timeline; I think we'd all love to see this get in.

@vchuravy vchuravy added this to the 1.0 milestone Dec 24, 2016


Member

vchuravy commented Dec 24, 2016

I would have liked to see this in 0.6 since I think this is a very valuable optimisation especially for RefValue and it is necessary on the GPU for some core features, but with feature-freeze for v0.6 in about a week this doesn't look likely.

It would be awesome to have this early on for the next release cycle!


Member

yuyichao commented Dec 24, 2016

This is actually independent of stack allocation of unescaped RefValue.


Member

vtjnash commented Dec 24, 2016

> I would have liked to see this in 0.6 since I think this is a very valuable optimisation especially for RefValue

Just thought I'd drop by to point out that this PR won't alter RefValue (that optimization is, most nearly, d189cb3). They are relatively orthogonal optimizations.


Member

vchuravy commented Dec 24, 2016

Yeah, sorry, I mixed the two things up in my head, since this includes a version of d189cb3 with stack_new.


Member

StefanKarpinski commented Jul 20, 2017

This is an optimization and therefore not release blocking. Realistically, this PR is not going to be merged in this form, so we may as well close it and take it off the 1.0 milestone.

@StefanKarpinski StefanKarpinski removed this from the 1.0 milestone Jul 20, 2017


Contributor

davidanthoff commented Jul 20, 2017

Is there a high level issue that tracks progress on this general theme? Would be good to have something open that refers to this optimization.

Member

StefanKarpinski commented Jul 20, 2017

cc @Keno


Member

vtjnash commented Jul 20, 2017

Would be nice to keep this open to make it easier to find, since while it won't be merged, we are likely to take many pieces from it.


Member

StefanKarpinski commented Jul 21, 2017

Why don't you make a "high level issue that tracks progress on this general theme" instead and link to this PR from there along with all the other relevant PRs?
