Add TapedTask type annotations to storage[:tapedtask] #121
This reduces the number of allocations for `@time callback !== nothing && callback()` inside `(tf::TapedFunction)(args...; callback=nothing)` from 1 allocation (48 bytes) to 0 allocations.
Why not? If there are other additional optimizations, we can register another version with them. We can register as many versions as we want, and according to ColPrac we even should make a new release as soon as new bugfixes or features are merged into the main branch (it's a bit less clear if only tests are updated).
Looks good. I'm curious: did you check whether a function barrier would fix these cases as well?
Co-authored-by: David Widmann <devmotion@users.noreply.github.com>
Would you prefer function barriers if they fix these cases? And if yes, could you tell me what the benefit is here? I only learned about function barriers today, so they have to sink in a bit more.
And to answer your question: nope. I don't mind trying it out. But, in that case, could you give an example of how I can rewrite one of the changes to a function barrier? 🙂
The advantage of a function barrier is that you don't have to reason about types, can write code more generically, and don't limit the implementation to the annotated type. Additionally, the part of the function that you move to a separate function will be compiled for the types of whatever it is called with, and hence Julia can (potentially) generate more efficient code than if there is an abstract type annotation.
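To make the two options concrete, here is a minimal sketch that compares a type assertion at the lookup site with a function barrier. `Payload`, `STORAGE`, and the `total_*` functions are hypothetical stand-ins invented for illustration; they are not part of Libtask.

```julia
# Hypothetical stand-in for a value kept in untyped storage (not Libtask's type).
struct Payload
    values::Vector{Int}
end

# Looking up a value in a Dict{Symbol,Any} is inferred as Any.
const STORAGE = Dict{Symbol,Any}(:payload => Payload([1, 2, 3]))

# Option 1: assert the type right after the lookup.
function total_asserted()
    p = STORAGE[:payload]::Payload
    return sum(p.values)
end

# Option 2: function barrier. The kernel is dispatched on, and compiled for,
# the runtime type of its argument, so no annotation is needed.
total_kernel(p) = sum(p.values)
total_barrier() = total_kernel(STORAGE[:payload])

# Both variants compute the same result.
total_asserted() == total_barrier()
```

The assertion variant ties the code to `Payload`, while the barrier variant keeps working if a different type with a `values` field is stored, at the cost of one dynamic dispatch at the call to `total_kernel`.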
I've tested it on both. For the first one, it also results in 0 allocations when writing:

```julia
function producer_core(ttask)
    if length(ttask.produced_val) > 0
        val = pop!(ttask.produced_val)
        put!(ttask.produce_ch, val)
        take!(ttask.consume_ch) # wait for next consumer
    end
    return nothing
end
producer() = producer_core(current_task().storage[:tapedtask])
```

instead of having one method and a type annotation. For the other one, it results in 2 allocations:

```julia
function produce_core(val, ttask)
    length(ttask.produced_val) > 1 &&
        error("There is a produced value which is not consumed.")
    push!(ttask.produced_val, val)
    return nothing
end
function produce(val)
    is_in_tapedtask() || return nothing
    @time produce_core(val, current_task().storage[:tapedtask])
end
```

whereas

```julia
function produce_core(val)
    ttask = current_task().storage[:tapedtask]::TapedTask
    length(ttask.produced_val) > 1 &&
        error("There is a produced value which is not consumed.")
    push!(ttask.produced_val, val)
    return nothing
end
function produce(val)
    is_in_tapedtask() || return nothing
    @time produce_core(val)
end
```

results in only 1 allocation. Also, the type annotation in this case should reduce compile time slightly because it avoids compiling one method instance. It should also avoid a method table lookup, and it makes it easier for the reader to know that a

Let me know how to continue from here to get this PR merged.
Did you compare timings, i.e. compile time and run time? I would assume that a function barrier is useful in particular if there are large optimization gains, and hence performance gains, from knowing type parameters.
In this case, it shouldn't matter because
Oh somehow I assumed it had a type parameter
I was thinking exactly the same this afternoon and have experimented with making

```julia
struct TapedTask{F}
    task::Task
    tf::TapedFunction{F}
    produce_ch::Channel{Any}
    consume_ch::Channel{Int}
    produced_val::Vector{Any}
    function TapedTask{F}( ... )
end
```

in the hope that it would fix inference at points where

For the real performance gains, it would be great if we can get one or more structs to be stack allocated. Unfortunately, that's currently not possible at all because we can't allocate a

Another thing that I noticed is that
I think it would be good to change the type nevertheless if there is no compelling reason to keep the abstract field. Generally, fields with abstract types should be avoided if possible.
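As a rough illustration of the general point about abstract fields (not of Libtask's actual definitions), the sketch below contrasts a field annotated with the abstract type `Function` with a parametric field. `LooseWrapper`, `TightWrapper`, and `call` are hypothetical names chosen for this example.

```julia
# Hypothetical wrappers for illustration only; not Libtask's types.
struct LooseWrapper
    f::Function   # abstract field type: the compiler cannot know the concrete function
end

struct TightWrapper{F}
    f::F          # concrete once the type parameter F is known
end

# Generic helper that calls the wrapped function.
call(w, x) = w.f(x)

loose = LooseWrapper(abs)
tight = TightWrapper(abs)

# The field of LooseWrapper is abstract; the field of the parametric wrapper is concrete.
loose_is_concrete = isconcretetype(fieldtype(LooseWrapper, :f))
tight_is_concrete = isconcretetype(fieldtype(typeof(tight), :f))

# Both wrappers give the same result; only inferability differs.
call(loose, -2) == call(tight, -2)
```

With the parametric version, `call` can be specialized on the wrapped function, whereas with the abstract field every access to `w.f` has to be treated as an unknown `Function`.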
I guess it might be useful to do this in a separate PR first and then reevaluate this PR.
Okay. Just to check: that's a breaking change then, because `CTask = TapedTask` is exported.
Yes, since it is exported, it would be breaking. It would be good to evaluate whether the alias is needed. I guess it exists for historical reasons (when there was only CTask but no TapedTask), but it seems a bit confusing without a clear benefit. Additionally, it seems to be defined incorrectly (line 19 in eca834b: `const CTask = TapedTask`). I opened #122.
Many thanks @rikhuijzer!
@rikhuijzer Did you rerun the benchmarks after the
Now, the function barrier should be better indeed because the type hint is not concrete. I've just released only Libtask 0.7. That release doesn't contain this PR. You or I can look into changing it to a function barrier and release once that's done.
I tested it yesterday and I couldn't find any differences in speed. My guess is that
In all these instances, the compiler wasn't able to figure out the type in Julia 1.7.2. By adding the type annotations, the number of allocations decreased. For example, this PR reduces the number of allocations for `callback !== nothing && callback()` inside `(tf::TapedFunction)(args...; callback=nothing)` from 1 allocation (48 bytes) to 0 allocations according to `@time`.
I ran the same selection of the Turing tests again as in #119 with `@time`, once on `master` and once on this PR.
So, this saves about 40 seconds and 7 GB of allocations on the Turing tests.
Let's not register a new version after merging this PR. I think that I can find some other optimization soon.