Add support for Hamiltonian Neural Networks #370
Conversation
Zygote seems to complain about mutating arrays when backpropagating through the gradient of the NN. Here's a small repro:

```julia
using Flux, Zygote

model = Chain(
    Dense(2, 10),
    Dense(10, 1)
)

# Inner gradient: derivative of the network output w.r.t. its input.
forward(m, x) = gradient(x -> m(x)[1], x)[1]

x = rand(2, 1)

# Outer gradient w.r.t. the parameters errors with a mutation complaint.
gradient(() -> sum(forward(model, x) - x), Flux.params(model))
```

Is this to be done in some other way?
Nesting Zygote is an issue. I think you need to use ReverseDiff over Zygote, or Zygote over ReverseDiff.
Yes, ReverseDiff over Zygote does solve this issue.
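A minimal sketch of the ReverseDiff-over-Zygote pattern described above: the inner input-gradient is taken with Zygote and the outer parameter-gradient with ReverseDiff. The `destructure`-based parameter handling is an assumption here, and whether the two ADs compose cleanly depends on package versions.

```julia
using Flux, Zygote, ReverseDiff

model = Chain(Dense(2, 10, tanh), Dense(10, 1))
x = rand(2, 1)

# Flatten the parameters into one vector so ReverseDiff can
# differentiate w.r.t. a single array (assumed handling).
p0, re = Flux.destructure(model)

# Inner gradient (w.r.t. the input) taken with Zygote.
inner(p) = Zygote.gradient(x -> sum(re(p)(x)), x)[1]

# Outer gradient (w.r.t. the parameters) taken with ReverseDiff.
ReverseDiff.gradient(p -> sum(inner(p) .- x), p0)
```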
For training HNNs, the standard way (torchdyn, the original paper) seems to be to supervise on the gradients of the network rather than on the solution from the ODE solver. I was wondering if we should have a
Yeah, that would make sense. It would then match well with our collocation tools: https://diffeqflux.sciml.ai/dev/examples/collocation/ We should also do a non-smoothed collocation method via DataInterpolations, and demonstrate that on this HamiltonianNN.
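For concreteness, a hypothetical sketch of what supervising on the network's gradients could look like: the loss matches the HNN's symplectic vector field against measured (or collocation-estimated) derivatives, with no ODE solve in the loop. The names here are illustrative, not the PR's API.

```julia
using Flux, Zygote

# Scalar "Hamiltonian" network on the state z = (q, p).
Hnet = Chain(Dense(2, 64, tanh), Dense(64, 1))

# Symplectic vector field for columns z = [q; p]: q̇ = ∂H/∂p, ṗ = -∂H/∂q.
function vector_field(Hnet, z)
    dH = Zygote.gradient(z -> sum(Hnet(z)), z)[1]
    vcat(dH[2:2, :], -dH[1:1, :])
end

# Regress directly on derivative targets dz instead of on the ODE solution.
# (Training this still needs the mixed-mode AD trick from above.)
loss(z, dz) = sum(abs2, vector_field(Hnet, z) .- dz)
```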
Also, just a note: some of the models from the HNN paper are impossible to fit directly from the solve on the time series. Specifically the double pendulum: since it's chaotic, the adjoints diverge exponentially fast.
ReverseDiff seems to bug out on GPU with:

```julia
hnn = HamiltonianNN(
    Chain(Dense(2, 64, relu), Dense(64, 1)) |> gpu
)
p = hnn.p;

# The forward pass works on GPU.
hnn(rand(2, 1) |> gpu)

# This errors: ReverseDiff's tape-based AD works on CPU arrays and
# does not support GPU arrays.
ReverseDiff.gradient(p -> sum(hnn(rand(2, 1) |> gpu, p)), p)
```
Yes, it does. I think we'll need to just skip that for now, open an issue, and grab what we can get. Keno's AD should fix this.
Looks great! 🎉
@ChrisRackauckas could you elaborate a bit on this?
Essentially, one way to perform this fitting is to fit splines (e.g. a cubic spline from DataInterpolations), take the derivative of the spline to get estimated derivatives, and then perform the fit directly against those derivative approximations. It would be nice to have this version under the same interface as the smoothed one, which is essentially the same idea except it uses regression splines under some smoothing kernel to get derivatives of very noisy, dense data.
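A small sketch of the non-smoothed version with DataInterpolations (toy data here; real use would feed trajectory samples):

```julia
using DataInterpolations

# Toy trajectory samples u(t) at times t.
t = collect(range(0.0, 2pi; length = 50))
u = sin.(t)

# Interpolating cubic spline through the samples.
spl = CubicSpline(u, t)

# Derivatives of the spline at the sample times become the
# regression targets for the network's predicted vector field.
du_est = DataInterpolations.derivative.(Ref(spl), t)
```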
Fixes #53
Fixes #357
Fixes #80 when this PR is used together with #355
TODOs:
- Working on GPU? Will be fixed with Keno's AD (see #370 (comment))
- HNN layer
- Neural Hamiltonian DE Layer
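A usage sketch tying the two layer TODOs together. The `HamiltonianNN` construction matches the snippet earlier in the thread; the `NeuralHamiltonianDE` call follows the NeuralODE-style constructor pattern and should be treated as an assumption:

```julia
using DiffEqFlux, Flux, OrdinaryDiffEq

# HNN layer: learns a scalar Hamiltonian and exposes its symplectic gradient.
hnn = HamiltonianNN(Chain(Dense(2, 64, relu), Dense(64, 1)))
p = hnn.p

# Evaluate the learned vector field at a state (q, p).
hnn(rand(Float32, 2, 1), p)

# Neural Hamiltonian DE layer: trajectories come from solving the
# induced dynamics (constructor arguments assumed, NeuralODE-style).
model = NeuralHamiltonianDE(hnn, (0.0f0, 1.0f0), Tsit5(),
                            save_everystep = false, save_start = true,
                            saveat = 0.1f0)
model(rand(Float32, 2, 1))
```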