
Conversation

@pchintalapudi (Member)

Stack allocation of small arrays can reduce GC pressure and occasionally expose additional optimization opportunities to LLVM.
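A minimal, hypothetical illustration of the kind of pattern this targets (the function below is illustrative, not taken from the PR): a small array whose size is known and which never escapes the function, so its backing memory could live on the stack instead of going through the GC.

```julia
# Hypothetical stack-allocation candidate: `tmp` is small, its size is
# known, and it never escapes `norm3`, so the heap allocation could be
# elided or replaced by stack memory.
function norm3(a::Float64, b::Float64, c::Float64)
    tmp = Vector{Float64}(undef, 3)
    tmp[1] = a
    tmp[2] = b
    tmp[3] = c
    s = 0.0
    for x in tmp
        s += x * x
    end
    return sqrt(s)
end
```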

@pchintalapudi (Member Author) commented Dec 28, 2021

Since my branches are part of a fork, I've created a PR comparing this one against the array removal PR immediately preceding it here: pchintalapudi#5

@DilumAluthge (Member)

cc: @chriselrod

@pchintalapudi added the compiler:codegen, performance, and arrays labels on Dec 29, 2021
@pchintalapudi force-pushed the pc/stack-array branch 3 times, most recently from 4843ca0 to b6cc766 on January 7, 2022
@pchintalapudi marked this pull request as draft on January 9, 2022
@pchintalapudi (Member Author)

Marking as draft because #43487, #43547, #43384, and #43548 are all still under review.

@pchintalapudi (Member Author)

I think I've squashed that bug and a few others now, and slightly upgraded the stack allocation process itself. Do you think you could try again now @chriselrod? Just as a note, arrays of references aren't currently being stack allocated because I haven't figured out how to make stack allocation play nicely with the garbage collector yet.
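A hedged illustration of the distinction mentioned above (the function and values are illustrative, not from the PR): arrays whose elements are references to heap objects must remain visible to the GC, so only arrays of isbits elements are candidates here.

```julia
# Hypothetical contrast between an isbits element type (a candidate for
# stack allocation under this PR) and a reference element type (not yet
# handled, since the GC must still be able to find the referenced objects).
function sum_lengths()
    bits = Int32[1, 2, 3]        # isbits elements: candidate
    refs = ["a", "bb", "ccc"]    # String elements are GC-managed references: not a candidate
    return sum(bits) + sum(length, refs)
end
```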

@chriselrod (Contributor)

> I think I've squashed that bug and a few others now, and slightly upgraded the stack allocation process itself. Do you think you could try again now @chriselrod? Just as a note, arrays of references aren't currently being stack allocated because I haven't figured out how to make stack allocation play nicely with the garbage collector yet.

The vectors contain structs wrapping UInt8. They're isbits.
However, these vectors are held in structs themselves.
There was another issue where, IIRC, you and others said this may cause problems for the escape analysis?

I see no difference in allocations.
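For concreteness, a hypothetical sketch of the layout described above (all names are illustrative, not from chriselrod's code): a vector of isbits wrapper structs that is itself stored as a field of another struct, which is the case where escape analysis may lose track of the array.

```julia
# Hypothetical pattern: a Vector of isbits structs wrapping UInt8,
# held inside another struct. The element type is isbits, but the
# vector escapes into the enclosing struct's field.
struct Nucleotide
    x::UInt8
end

struct Sequence
    data::Vector{Nucleotide}   # the vector is stored in a struct field
end

function count_matches(s::Sequence, target::Nucleotide)
    n = 0
    for nt in s.data
        n += nt.x == target.x
    end
    return n
end
```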

@N5N3 (Member) commented Jan 24, 2022

I tried the code here to resolve the TTFP issue in #43725.
It works well: the allocation disappeared and LLVM properly generated SIMD IR.
(For comparison, the MArray version failed to vectorize.)
I only have one question: is it necessary to force-inline every call on the "small array"?
During my trial, even a copyto! would block the optimization.
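A minimal sketch of the inlining concern above, assuming the optimization is defeated whenever the small array is passed to a non-inlined callee (the functions are hypothetical, not from the PR):

```julia
# If `sum3` is not inlined, `tmp` escapes through the call and can no
# longer be stack allocated; forcing inlining keeps the whole use of
# `tmp` visible to the allocation optimizer.
@inline function sum3(v::Vector{Float64})
    return v[1] + v[2] + v[3]
end

function kernel(a::Float64, b::Float64, c::Float64)
    tmp = Vector{Float64}(undef, 3)   # candidate for stack allocation
    tmp[1] = a
    tmp[2] = b
    tmp[3] = c
    return sum3(tmp)                  # must inline, or `tmp` escapes
end
```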

@Moelf (Contributor) commented Nov 2, 2022

This seems stalled, but it's really cool; can we push it across the finish line?

@KristofferC (Member)

This also needs some examples / benchmarks to show the effect.
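A minimal sketch of the kind of benchmark requested, assuming BenchmarkTools is available; the function below is illustrative, not from the PR, and the point would be to compare allocation counts with and without this branch.

```julia
using BenchmarkTools

# Illustrative workload: builds a small temporary array and reduces it.
# On current master this allocates; with stack allocation it should not.
function small_array_sum(x::Float64)
    tmp = [x, 2x, 3x]
    return sum(tmp)
end

@btime small_array_sum(1.5)   # check reported allocations before/after
```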

@vchuravy (Member) commented Nov 3, 2022

We have set this work aside for now in order to focus on other efforts. IIRC, there were some correctness concerns around the changes to alloc-opt.
