
Add support for Hamiltonian Neural Networks #370

Merged
22 commits merged on Jul 30, 2020

Conversation

@avik-pal
Member
commented Jul 20, 2020

Fixes #53
Fixes #357
Fixes #80 when this PR is used together with #355

TODOs:

@avik-pal
Member Author

Zygote seems to complain about mutating arrays when backpropagating through the gradient of the NN. Here's a small repro:

using Flux, Zygote

# A small scalar-output network
model = Chain(
    Dense(2, 10),
    Dense(10, 1)
)

# Inner gradient: derivative of the network output with respect to its input
forward(m, x) = gradient(x -> m(x)[1], x)[1]

x = rand(2, 1)

# Outer gradient over the parameters; nesting Zygote like this throws
# "Mutating arrays is not supported"
gradient(() -> sum(forward(model, x) - x), Flux.params(model))

Should this be done in some other way?

@ChrisRackauckas
Member

Nesting Zygote is an issue. I think you need to use ReverseDiff over Zygote, or Zygote over ReverseDiff.

@avik-pal
Member Author

Yes, ReverseDiff over Zygote does solve this issue.
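For reference, here's roughly what that combination looks like on the repro above. This is just a sketch, not the final implementation; it assumes the parameters are carried as a flat vector via Flux.destructure so that ReverseDiff can take the outer gradient:

using Flux, Zygote, ReverseDiff

model = Chain(Dense(2, 10), Dense(10, 1))
p, re = Flux.destructure(model)  # flat parameter vector + reconstructor

# Inner gradient with respect to the input, taken with Zygote
forward(p, x) = Zygote.gradient(y -> sum(re(p)(y)), x)[1]

x = rand(2, 1)

# Outer gradient with respect to the parameters, taken with ReverseDiff
ReverseDiff.gradient(p -> sum(forward(p, x)), p)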

@avik-pal
Member Author

For training HNNs, the standard approach (torchdyn, the original paper) seems to be to supervise on the gradients of the network rather than on the solution from the ODE solver. I was wondering if we should have a HamiltonianNN layer, separate from the NeuralHamiltonianDE, which simply returns the gradient, with the latter acting as a wrapper over the former to compute solutions of the ODE.
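Roughly the split I have in mind, as a sketch with hypothetical internals (not final code; it assumes the state is stacked as u = [q; p]):

using Flux, Zygote, OrdinaryDiffEq

struct HamiltonianNN{M}
    model::M  # scalar-valued network approximating the Hamiltonian H(q, p)
end

# Returns the symplectic gradient (∂H/∂p, -∂H/∂q) for a stacked state
# u = [q; p]; this is what gets supervised directly against gradient data
function (hnn::HamiltonianNN)(u)
    ∇H = Zygote.gradient(v -> sum(hnn.model(v)), u)[1]
    n = size(u, 1) ÷ 2
    vcat(selectdim(∇H, 1, n+1:2n), -selectdim(∇H, 1, 1:n))
end

# Thin wrapper that just solves the ODE defined by the layer above
struct NeuralHamiltonianDE{H,T}
    hnn::H
    tspan::T
end

function (nhde::NeuralHamiltonianDE)(u0)
    prob = ODEProblem((u, p, t) -> nhde.hnn(u), u0, nhde.tspan)
    solve(prob, Tsit5(), saveat = 0.1)
end

# Usage: supervise hnn(u) on gradient data, or solve through the wrapper
hnn = HamiltonianNN(Chain(Dense(2, 64, relu), Dense(64, 1)))
node = NeuralHamiltonianDE(hnn, (0.0, 1.0))
node(rand(2))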

@ChrisRackauckas
Member

Yeah, that would make sense. It would then match well with our collocation tools:

https://diffeqflux.sciml.ai/dev/examples/collocation/
https://diffeqflux.sciml.ai/dev/Collocation/

We should also do a non-smoothed collocation method via DataInterpolations, and demonstrate that on this HamiltonianNN.

@ChrisRackauckas
Member

Also, just a note: some of the models from the HNN paper are impossible to fit directly from the solve on the time series. Specifically the double pendulum: since it's chaotic, the adjoints diverge exponentially fast.

@avik-pal
Member Author
commented Jul 25, 2020

ReverseDiff seems to bug out on GPU with "CuArray only supports bits types".

using DiffEqFlux, Flux, ReverseDiff

# Hamiltonian NN with the scalar Hamiltonian parameterized by a Flux model,
# moved to the GPU
hnn = HamiltonianNN(
    Chain(Dense(2, 64, relu), Dense(64, 1)) |> gpu
)

p = hnn.p;

# Forward pass works fine
hnn(rand(2, 1) |> gpu)

# Taking the gradient errors: ReverseDiff's tracked values are not bits
# types, so they cannot be stored in a CuArray
ReverseDiff.gradient(p -> sum(hnn(rand(2, 1) |> gpu, p)), p)

@ChrisRackauckas
Member

> ReverseDiff seems to bug out on GPU with "CuArray only supports bits types".

Yes it does. I think we'll need to just skip that for now, open an issue, and grab what we can get. Keno's AD should fix this.

@Vaibhavdixit02
Member

Looks great! 🎉

@avik-pal
Member Author

> We should also do a non-smoothed collocation method via DataInterpolations, and demonstrate that on this HamiltonianNN.

@ChrisRackauckas could you elaborate a bit on this?

@ChrisRackauckas
Member

Essentially, one way to perform this fitting is to fit splines, like a cubic spline from DataInterpolations, take the derivative of the spline to get estimated derivatives, and then perform the fit directly against those derivative approximations. It would be nice to have this version under the same interface as the smoothed one, which is essentially the same idea, except it uses regression splines under a smoothing kernel to get derivatives from very noisy and dense data.
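For concreteness, a minimal sketch of the non-smoothed version; the trajectory data here is illustrative, and CubicSpline and derivative are the relevant pieces of DataInterpolations:

using DataInterpolations

ts = 0.0:0.1:10.0
u = sin.(ts)                 # stand-in for a measured trajectory

spline = CubicSpline(u, ts)  # interpolating (non-smoothed) spline

# Estimated du/dt at the sample times, from the spline's derivative
du_est = DataInterpolations.derivative.(Ref(spline), ts)

# The fit then minimizes, e.g., sum(abs2, f(u_i, p) - du_est_i) over the
# parameters p, with no ODE solve in the loop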
