Package Version
Status ~/.julia/environments/v1.8/Project.toml
  [587475ba] Flux v0.13.8 https://github.com/FluxML/Flux.jl.git#master
Julia Version
julia version 1.8.2
OS / Environment
OS: Arch Linux x86_64
Kernel: 6.0.6-arch1-1
Describe the bug
Flux appears to use an ever-increasing amount of memory when its reconstruction function (from Flux.destructure) and the Adam optimizer are used together with Distributed.jl.
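For context, a minimal sketch of the destructure/reconstruction API referred to above (standard Flux usage; model is only an illustrative name, not taken from the reproducer below):

using Flux

model = Chain(Dense(1000 => 1000, tanh))
theta, re = Flux.destructure(model)  # flat parameter vector + reconstruction closure
rebuilt = re(theta)                  # re(theta) rebuilds a Chain with those parameters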
Steps to Reproduce
Warning: This will quickly eat up your RAM.
I've identified four conditions that must all be met for this leak to occur:
- Must be using Distributed
- Must be using an optimizer
- Must be running update! on a worker
- Must call the reconstruction function re on a worker
using Distributed
addprocs(1)

@everywhere begin
    using Flux
    opt = Adam()
    theta, re = Chain(Dense(1000 => 1000, tanh)) |> Flux.destructure
end

println("Beginning loop")
for i in 1:1000
    println(i)
    # Note: bug doesn't occur if this remotecall targets process 1 instead of the worker
    fetch(remotecall(2) do
        # Note: bug doesn't occur if you remove the line below
        re(theta)
        1
    end)
    @everywhere begin
        grad = theta * 0.01
        # Note: bug doesn't occur if you replace update! with this line
        #theta .+= grad
        Flux.Optimise.update!(opt, theta, grad)
    end
end

Expected Results
This code should run to completion, using a roughly constant amount of memory on each iteration of the loop, since nothing is retained between iterations.
Observed Results
Memory usage quickly increases indefinitely
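As an optional diagnostic (not part of the original report), the growth can be quantified by logging the peak resident set size of the main process and the worker on every iteration. report_memory is a hypothetical helper name; worker id 2 matches the single worker added by addprocs(1) in the reproducer, and Sys.maxrss() returns the peak RSS (in bytes) of the process it runs in.

using Distributed

function report_memory(i)
    main_mb   = Sys.maxrss() / 1024^2                              # main process
    worker_mb = remotecall_fetch(() -> Sys.maxrss() / 1024^2, 2)   # worker 2
    println("iter $i: main ≈ $(round(main_mb, digits=1)) MiB, ",
            "worker ≈ $(round(worker_mb, digits=1)) MiB")
end

Calling report_memory(i) at the end of each loop iteration should make it easy to see whether the worker's RSS keeps climbing while the four conditions above are met.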
Relevant log output
No response