
Restructure makes a copy #146

Open
linusheck opened this issue Apr 24, 2023 · 4 comments · May be fixed by #165
Labels
enhancement New feature or request

Comments

@linusheck

linusheck commented Apr 24, 2023

In some situations you have to restructure a lot when using Flux, for instance if you want to run your batches as separate solves in DiffEqFlux using an EnsembleProblem. You have to use something like a ComponentArray to pass the parameters through the solver and to let the adjoint methods do their work in differentiating the solve. But restructuring via a ComponentArray is unreasonably(?) slow in Flux. Switching to Lux eliminates those problems, but this seems like something that could be handled better in Flux or ComponentArrays.

Example code:

using Flux, Profile, PProf, Random, ComponentArrays

layer_size = 256
layer = Flux.Dense(layer_size => layer_size)
params_, re = Flux.destructure(layer)
params = ComponentArray(params_)

# Rebuild the layer from the flat parameter vector on every step, as an
# EnsembleProblem-style workload would. (Renamed from `eval`, which cannot
# be redefined in Main.)
function apply_steps(steps, input)
    v = input
    for _ in 1:steps
        v = re(params)(v)
    end
    return v
end

Profile.clear()
@profile apply_steps(100_000, rand(Float32, layer_size))
pprof(; web=true)

Here we spend about 10% of the time in sgemv matrix multiplication, another 10% in the rest of the Dense call, and about 75% in the Restructure. This gets worse as the networks get smaller. As far as I can read the flame graph, the restructure spends a lot of its time in the GC:

[flame graph screenshot: roughly 75% of the time in Restructure, much of it in GC]

Could there be a way to mitigate this specific problem, in particular when the same parameters are reused? I think this would make some example code a lot faster too.
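
A minimal workaround sketch for the fixed-parameter case (reusing re, params, and layer_size from the snippet above; not part of the original report) is to pay the reconstruction cost once, outside the loop:

rebuilt = re(params)  # one reconstruction, one copy
function apply_steps_fast(layer, steps, input)
    v = input
    for _ in 1:steps
        v = layer(v)  # no per-step Restructure
    end
    return v
end
apply_steps_fast(rebuilt, 100_000, rand(Float32, layer_size))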

I also opened a Discourse thread about this because I'm not sure whether it's an issue with Flux specifically: https://discourse.julialang.org/t/flux-restructure-for-componentarrays-jl-unreasonably-slow/97849/3

@mcabbott
Member

I think the story here is that Restructure makes a new copy of the model, which allocates, and this costs about 30μs:

julia> @btime $re($params);  # This is the reconstruction cost
  min 11.125 μs, mean 29.891 μs (24 allocations, 257.95 KiB)

julia> @btime copy($params);  # ... and it's mostly allocation, same mean:
  min 6.007 μs, mean 29.209 μs (2 allocations, 257.05 KiB)

Normally the allocation of the forward pass dwarfs this, but if you don't make batches and do this for every sample, then the model allocation dominates.

julia> input = rand(Float32, layer_size);  # As above, batch size of 1

julia> @btime $layer($input);  # ... hence alloc much smaller here, unusual.
  min 6.800 μs, mean 8.970 μs (2 allocations, 2.12 KiB)

julia> @btime $re($params)($input);  # Complete, as above
  min 20.458 μs, mean 51.587 μs (26 allocations, 260.08 KiB)

This isn't a regime anyone got around to optimising. It would probably be easy to make Restructure either (1) copy into the existing model, or (2) create a new model out of views not copies.

Note that ComponentArrays is doing nothing here, params isa Vector. I think you should think of ComponentArrays as another idea in the same direction as destructure, "instead" not "and". They differ in lots of ways, of course.

Note also that allocations in the gradient will typically be larger than in the forward pass. When I last checked, destructure handled this more efficiently than ComponentArrays, which repeatedly made a copy of the full parameter vector. This will matter (for larger models) in exactly the regime you have selected.
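
A minimal sketch of option (2) for a single Dense layer, building the model out of views into the flat vector instead of copies (re_view is a hypothetical name, not Flux's API; it assumes destructure's weight-before-bias ordering and ignores non-trainable fields and tied arrays):

using Flux

function re_view(flat::AbstractVector, template::Flux.Dense)
    W, b = template.weight, template.bias
    nW, nb = length(W), length(b)
    Wv = reshape(view(flat, 1:nW), size(W))  # view, no copy
    bv = view(flat, nW+1:nW+nb)              # view, no copy
    Flux.Dense(Wv, bv, template.σ)           # wraps the views directly
end

For this layer, re_view(params_, layer)(input) should match re(params_)(input), minus the per-call allocation.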

@linusheck
Author

linusheck commented Apr 24, 2023

Thanks for your thorough reply! Good to know that ComponentArrays isn't doing anything. The offending line seems to be this one:

ProjectTo(y)(reshape(flat[o .+ (1:length(y))], axes(y))) # ProjectTo is just correcting eltypes

I wonder whether using ProjectTo a bit more cleverly might fix the allocations and allow views to be used.
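
A view-based variant of that line might look like this (same flat, o, and y as in the quoted code; ProjectTo is from ChainRulesCore; an untested sketch):

ProjectTo(y)(reshape(view(flat, o .+ (1:length(y))), axes(y)))  # view instead of copy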

@linusheck linusheck changed the title Restructure for ComponentArrays.jl unreasonably slow Restructure unreasonably slow Apr 24, 2023
@mcabbott mcabbott changed the title Restructure unreasonably slow Restructure makes a copy Apr 24, 2023
@mcabbott
Member

Changing that line to use a view would probably work fine. ProjectTo won't mind, but I'm not sure whether all operations downstream will be happy with a reshaped view (e.g. will CuArray dispatch still work?).

Maybe the bigger question is what the interface should be, e.g. whether this should be a keyword option.

Or whether it should be a separate destructure! which copies into the existing model. That might solve both questions.
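
A hypothetical sketch of such a destructure! (no function of this name exists in Flux or Optimisers at the time of writing): copy the flat vector back into the model's arrays in place and return the same model. It treats every numeric array as a parameter, so it ignores the trainable/non-trainable distinction and tied arrays:

using Flux, Functors

function destructure!(model, flat::AbstractVector)
    offset = 0
    fmap(model) do x
        x isa AbstractArray{<:Number} || return x
        copyto!(x, 1, flat, offset + 1, length(x))  # overwrite in place
        offset += length(x)
        return x
    end
    return model
end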

@mcabbott mcabbott transferred this issue from FluxML/Flux.jl May 1, 2023
@mcabbott mcabbott added the enhancement New feature or request label Oct 9, 2023
@mcabbott mcabbott linked a pull request Nov 2, 2023 that will close this issue
@CarloLucibello
Member

I think we should have a copyless alternative to restructure, with a possible name of ps, re = trainable_components(x), that returns a ComponentArray ps and a reconstruction function re such that re(ps) == x.
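
Hypothetical usage of the proposed API, just to illustrate the intended contract (trainable_components does not exist yet):

using Flux, ComponentArrays

model = Dense(4 => 4)
ps, re = trainable_components(model)  # proposed, not implemented
ps isa ComponentArray                 # flat but structured parameter view
re(ps) == model                       # reconstructs x, ideally without copying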
