
WIP/RFC unbox more immutables #18632

Closed · wanted to merge 25 commits

Conversation

@carnaval (Contributor) commented Sep 22, 2016

The codegen/gc part of this is basically working.
I'm now wondering about semantics, and I'd like us to discuss the following issues a bit before I clean up the code and we start the review (there are a number of duplicate paths in codegen that can be merged/simplified, and some things are plainly wrong and/or inefficient).

This patch allows us to unbox most immutables. By "unbox" I mean: allocate/store them on the stack, inline them in other objects, and inline them in arrays.
Why only most? There are (for now) two problems: cycles and #undef.

Cycles are a fundamental problem: if A has a field of type B and B has a field of type A, we obviously can't inline them into each other. The cycle needs to be broken; the annoying part is that it should be broken in a predictable way. For now, on this PR, it's done in DFS order, which means that, for example, the layout of B will differ depending on whether we ever instantiated an A first. Not good. The proposal I remember for this (Jameson at JuliaCon, iirc) was to make types boxed iff they are part of any field cycle.
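A minimal sketch of such a cycle (hypothetical types; the `Any` field stands in for the forward reference, since the definitions are mutually recursive):

```julia
# A and B conceptually contain each other, so they cannot both be
# inlined; here the cycle is broken by leaving A's field abstract,
# which forces it to stay a boxed pointer.
struct A
    b::Any     # conceptually ::B; abstract, hence boxed
end
struct B
    a::A       # A is just one pointer wide, so it can be inlined here
end

a = A(B(A(nothing)))          # innermost A's field left as `nothing`
@assert a.b isa B
@assert a.b.a.b === nothing
```

Whichever side of the cycle ends up boxed determines the layout, which is why the DFS-order behavior described above is unpredictable.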
#undef is annoying because it creates a difference at the Julia level between isbits types and other immutables. To minimize breakage I've gone the route of preserving the current behavior.

So if A has a pointer field and we make, e.g., an uninitialized array of A, this branch uses the nullity of the field of A as a marker that the corresponding slot in the array is #undef. This only works if the field of a valid instance of A can never be null, i.e., if A.ninitialized >= field_index_of_the_ptr_field.
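This convention can be observed in modern Julia, where a descendant of this work eventually landed (in 1.5); the type here is illustrative:

```julia
# A has a pointer-typed field that a valid instance never leaves
# uninitialized, so an all-zero slot in an array of A can serve as the
# #undef marker. Requires Julia >= 1.5, where such immutables are
# stored inline in arrays.
struct A
    name::String   # pointer field, always set by the constructor
    n::Int
end

v = Vector{A}(undef, 2)
@assert !isassigned(v, 1)     # null `name` marks the slot as #undef
v[1] = A("x", 1)
@assert isassigned(v, 1)
@assert v[1].name == "x"
```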

This makes most code work (at least the whole test suite :-)), but I think the following rules are really weird:

A type T will be inlined into fields/arrays and stack allocated if

  • it is immutable
  • it cannot reach itself through a sequence of field accesses
  • it has at least one never-#undef pointer field, or no pointer fields at all
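The rules above can be poked at in modern Julia, where a descendant of this PR landed; `Base.allocatedinline` reports the resulting layout decision (type names here are illustrative):

```julia
# Checking the three rules against concrete types (Julia >= 1.5).
struct NoPtr          # immutable, acyclic, no pointer fields
    x::Int
    y::Float64
end
struct OnePtr         # immutable, one never-#undef pointer field
    s::String
end

@assert isbitstype(NoPtr)               # no pointers: fully unboxed
@assert !isbitstype(OnePtr)             # contains a pointer...
@assert Base.allocatedinline(OnePtr)    # ...but is still stored inline
@assert !Base.allocatedinline(String)   # String itself stays boxed
```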

The only difference between a boxed and an unboxed type is memory layout, but I'd assume we want layout to be easily predictable, since, for example, people routinely interface with C.
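For example, C interop depends on the compiler's layout choice being stable and obvious:

```julia
# A struct intended to mirror C's `struct { double x; double y; }`;
# ccall-based code depends on these offsets never changing.
struct CPoint
    x::Float64
    y::Float64
end

@assert isbitstype(CPoint)
@assert sizeof(CPoint) == 16
@assert fieldoffset(CPoint, 1) == 0
@assert fieldoffset(CPoint, 2) == 8
```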

An alternative proposed by Yichao was to make this entirely opt-in and error out when inlining is not possible. I'm worried this will lead to yet another annotation that people sprinkle everywhere.

For performance, specially crafted tests (like summing the rows of a very skinny matrix using subarrays) show some improvement from avoiding GC allocation. Not super satisfying for now, and casual inspection of the generated asm shows a lot of stack traffic. We can work on that, though, probably by improving LLVM's visibility into our rooting mechanism and/or just using statepoints.

(to sweeten the deal I've thrown in improved undef ref errors)

@JeffBezanson (Member) commented Sep 22, 2016

Awesome!!

Probably a predictable request: is it possible to have the part that's just an optimization (stack allocation) first, without changing the layout of anything? That could probably be merged very quickly.

How does stack marking work in the gc?

@carnaval (Contributor, Author) commented Sep 22, 2016

Yeah, I'm just afraid that it'll make things worse: since every time the immutable goes in or out of local scope it has to be unboxed/boxed, we may end up making more boxes than today.

The GC objects on the stack have a pointer to them in the GC frame, and the special treatment in the GC is done by checking whether they lie inside the task stack's bounds.
With statepoints we could avoid that altogether and put the alloca's sp-offset directly in the stackmap, along with the constant tag value, approaching zero runtime cost.
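The check described amounts to a simple bounds test; a hypothetical sketch (the helper name and the concrete addresses are invented):

```julia
# Hypothetical helper: the GC treats an object as stack-allocated iff
# its address falls inside the current task's stack bounds, and then
# scans it in place instead of trying to sweep or free it.
is_stack_object(p::UInt, stack_lo::UInt, stack_hi::UInt) =
    stack_lo <= p < stack_hi

@assert is_stack_object(UInt(0x7f0010), UInt(0x7f0000), UInt(0x800000))
@assert !is_stack_object(UInt(0x900000), UInt(0x7f0000), UInt(0x800000))
```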

@nalimilan (Contributor) commented Sep 22, 2016

Am I right that this will dramatically improve the performance of Nullable? :-)

// VecElement types are unwrapped in LLVM.
addr = strct.V;
else
    assert(0);

@vchuravy (Member) commented Sep 22, 2016 on the code above
That assert should probably go away?

@yuyichao (Contributor) commented Sep 22, 2016

My argument for opt-in is also (local) predictability.

The question is: for given types T1 and T2, what information do we need to determine whether Tuple{T1,T2} will be stored inline? I think this should rely only on the properties of Tuple, T1, and T2, not on their interactions. That is not the case if we treat any type in a cycle as non-inlineable: T2 might have a field Tuple{Int,T2}, in which case Tuple{Int,Float64} is inlined and Tuple{Float64,T2} is inlined, but Tuple{Int,T2} is not. If we make this opt-in and mark Tuple as an inlined type, then this is predictable with local information (in fact, it relies only on Tuple). An error is raised at type-construction time if T2 is also marked as inlined.
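A hypothetical sketch of that opt-in rule (every name here is invented, not a real Julia API): inlining is decided purely from which types are explicitly marked, and marking a type that would close an inline cycle fails immediately:

```julia
# Invented helpers: types are inlined iff marked, and a mark that
# closes a cycle through already-marked field types errors right away,
# mirroring the "error at type construction time" idea above.
const INLINE_MARKED = Set{DataType}()

function mark_inline!(T::DataType)
    stack = DataType[T]
    seen = Set{DataType}()
    while !isempty(stack)
        S = pop!(stack)
        for i in 1:fieldcount(S)
            F = fieldtype(S, i)
            F isa DataType || continue
            F === T && error("marking $T inline would create a field cycle")
            if F in INLINE_MARKED && !(F in seen)
                push!(seen, F)
                push!(stack, F)
            end
        end
    end
    push!(INLINE_MARKED, T)
    return T
end

struct P1; x::Int; end
struct P2; p::P1; end
mark_inline!(P1)
mark_inline!(P2)      # fine: P1's fields don't reach back to P2
@assert P1 in INLINE_MARKED && P2 in INLINE_MARKED
```

The key property is that the answer for `P2` depends only on `P2`'s own field types and the set of marks, never on how other types happen to have been laid out first.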

@quinnj (Member) commented Nov 17, 2016

Healthy bump for the 0.6 timeline; I think we'd all love to see this get in.

@vchuravy (Member) commented Dec 24, 2016

I would have liked to see this in 0.6, since I think this is a very valuable optimisation, especially for RefValue, and it is necessary on the GPU for some core features; but with the feature freeze for v0.6 in about a week this doesn't look likely.

It would be awesome to have this early on for the next release cycle!

@yuyichao (Contributor) commented Dec 24, 2016

This is actually independent of stack allocation of unescaped RefValue.

@vtjnash (Member) commented Dec 24, 2016

> I would have liked to see this in 0.6 since I think this is a very valuable optimisation especially for RefValue

Just thought I'd drop by to point out that this PR won't alter RefValue (that optimization is, most nearly, d189cb3). They are relatively orthogonal optimizations.

@vchuravy (Member) commented Dec 24, 2016

Yeah, sorry, I mixed the two things up in my head, since this includes a version of d189cb3 with stack_new.

@StefanKarpinski (Member) commented Jul 20, 2017

This is an optimization and therefore not release blocking. Realistically, this PR is not going to be merged in this form, so we may as well close it and take it off the 1.0 milestone.

@StefanKarpinski StefanKarpinski removed this from the 1.0 milestone Jul 20, 2017
@davidanthoff (Contributor) commented Jul 20, 2017

Is there a high-level issue that tracks progress on this general theme? It would be good to have something open that refers to this optimization.

@StefanKarpinski (Member) commented Jul 20, 2017

cc @Keno

@vtjnash (Member) commented Jul 20, 2017

It would be nice to keep this open to make it easier to find: while it won't be merged, we are likely to take many pieces from it.

@StefanKarpinski (Member) commented Jul 21, 2017

Why don't you make a "high level issue that tracks progress on this general theme" instead and link to this PR from there along with all the other relevant PRs?

@Keno (Member) commented Jan 17, 2020

@vtjnash are you ready to close this issue now that you've implemented many of the pieces?

@vtjnash vtjnash closed this Jan 17, 2020