
add Deep Galerkin method #802

Merged
merged 26 commits into SciML:master on Mar 4, 2024
Conversation

ayushinav
Contributor

@ayushinav ayushinav commented Feb 8, 2024

Checklist

  • Appropriate tests were added
  • Any code changes were done in a way that does not break public API
  • All documentation related to code changes were updated
  • The new code follows the
    contributor guidelines, in particular the SciML Style Guide and
    COLPRAC.
  • Any new documentation only uses public API

Additional context

Adding the Deep Galerkin method described here; fixes #220. Not sure how to add further doc updates.

Review comments (outdated, resolved) on src/deep_galerkin.jl, test/other_algs_test.jl, and test/runtests.jl.
@sathvikbhagavan
Member

@ayushinav can you add the test group in https://github.com/SciML/NeuralPDE.jl/blob/master/.github/workflows/CI.yml#L27 to actually run the tests, and @ChrisRackauckas can you enable GHA workflows in the PR? (I guess once @ayushinav pushes his latest changes, especially adding the test group.)

@ayushinav
Contributor Author

@ChrisRackauckas and @sathvikbhagavan, I guess it successfully ran the tests. Please let me know if anything else needs to be added for this.

@ChrisRackauckas
Member

This is missing docs but I think it's good to go.

@ayushinav
Contributor Author

This is missing docs but I think it's good to go.

I thought the docs referred to the ? docstrings in the REPL, but I realize it might be something in the tutorials/examples?

@sathvikbhagavan
Member

I thought the docs referred to the ? docstrings in the REPL, but I realize it might be something in the tutorials/examples?

Yes, we should add a tutorial for it.

@ayushinav
Contributor Author

@ayushinav, one of the DGM tests is failing. Can you look into that?

Hopefully, the last commit will fix it; the test passes with a better margin on my system.

@ayushinav
Contributor Author

@ChrisRackauckas looks like it's all good

@ChrisRackauckas
Member

2 hour and 46 minute test suite: is going that far required for convergence?

@ayushinav
Contributor Author

2 hour and 46 minute test suite: is going that far required for convergence?

We have 3 examples for testing. Of all the configurations I could reasonably try, these seemed to work best. I could increase the tolerance threshold to pass the tests, but that doesn't seem like the best idea to me.

@ayushinav
Contributor Author

Solving PDEs using the Deep Galerkin Method

Overview

The Deep Galerkin Method is a meshless deep learning algorithm for solving high-dimensional PDEs. It approximates the solution of a PDE with a neural network. The loss function of the network is defined in a similar spirit to PINNs, composed of the PDE loss and the boundary condition loss.
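
Schematically (following the original DGM paper rather than any code in this PR), for a PDE $\partial_t u + \mathcal{L} u = 0$ with boundary data $g$ and initial data $u_0$, the network $f(t, x; \theta)$ is trained by minimizing squared residuals over the interior, the boundary, and the initial time slice:

$$ J(f) = \left\| \partial_t f + \mathcal{L} f \right\|^2_{[0,T] \times \Omega} + \left\| f - g \right\|^2_{[0,T] \times \partial \Omega} + \left\| f(0, \cdot) - u_0 \right\|^2_{\Omega} $$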

Since the cost functions can be computationally intensive to calculate, the algorithm instead randomly samples points and trains on them, similar to stochastic gradient descent.
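
In NeuralPDE.jl, this choice is expressed through the training strategy passed to the discretizer. As a minimal illustration (the same call is used in the full example further down):

using NeuralPDE

# quasi-random sampling of 4000 collocation points (same settings as the example below)
strategy = QuasiRandomTraining(4_000, minibatch = 500)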

Algorithm

The authors of DGM suggest a network composed of LSTM-type layers that work well for most parabolic and quasi-parabolic PDEs.

$$
\begin{align*}
S^1 &= \sigma_1(W^1 \vec{x} + b^1); \\
Z^l &= \sigma_1(U^{z,l} \vec{x} + W^{z,l} S^l + b^{z,l}); \quad l = 1, \ldots, L; \\
G^l &= \sigma_1(U^{g,l} \vec{x} + W^{g,l} S^l + b^{g,l}); \quad l = 1, \ldots, L; \\
R^l &= \sigma_1(U^{r,l} \vec{x} + W^{r,l} S^l + b^{r,l}); \quad l = 1, \ldots, L; \\
H^l &= \sigma_2(U^{h,l} \vec{x} + W^{h,l}(S^l \cdot R^l) + b^{h,l}); \quad l = 1, \ldots, L; \\
S^{l+1} &= (1 - G^l) \cdot H^l + Z^l \cdot S^l; \quad l = 1, \ldots, L; \\
f(t, x; \theta) &= \sigma_{out}(W S^{L+1} + b),
\end{align*}
$$

where $\vec{x}$ is the concatenated vector $(t, x)$ and $L$ is the number of LSTM-type layers in the network.
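
As a plain-Julia sketch of these updates (for illustration only, not the implementation added in this PR; it assumes $\sigma_1 = \sigma_2 = \tanh$, a linear output, and randomly initialized dense weights):

struct DGMLayer
    Uz; Wz; bz
    Ug; Wg; bg
    Ur; Wr; br
    Uh; Wh; bh
end

function DGMLayer(d::Int, m::Int)
    U() = randn(m, d) ./ sqrt(d)   # weights acting on the input x⃗ = (t, x)
    W() = randn(m, m) ./ sqrt(m)   # weights acting on the hidden state S
    b() = zeros(m)
    DGMLayer(U(), W(), b(), U(), W(), b(), U(), W(), b(), U(), W(), b())
end

# one LSTM-type update: compute S^{l+1} from S^l and the input x⃗
function (layer::DGMLayer)(x, S; σ1 = tanh, σ2 = tanh)
    Z = σ1.(layer.Uz * x .+ layer.Wz * S .+ layer.bz)
    G = σ1.(layer.Ug * x .+ layer.Wg * S .+ layer.bg)
    R = σ1.(layer.Ur * x .+ layer.Wr * S .+ layer.br)
    H = σ2.(layer.Uh * x .+ layer.Wh * (S .* R) .+ layer.bh)
    return (1 .- G) .* H .+ Z .* S
end

# full network: input layer, L LSTM-type layers, then a linear output layer
function dgm_forward(x, W1, b1, layers, Wout, bout; σ1 = tanh, σout = identity)
    S = σ1.(W1 * x .+ b1)           # S¹ = σ1(W¹ x⃗ + b¹)
    for layer in layers
        S = layer(x, S)             # S^{l+1}
    end
    return σout.(Wout * S .+ bout)  # f(t, x; θ) = σ_out(W S^{L+1} + b)
end

For the Burgers' example below, d = 2 inputs (t, x), m = 50, and L = 5 would be a plausible reading of the DeepGalerkin(2, 1, 50, 5, tanh, tanh, identity, strategy) call used there, but the exact argument meanings are defined by the implementation in this PR, not by this sketch.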

Example

Let's try to solve the following Burgers' equation using the Deep Galerkin Method for $\alpha = 0.05$ and compare our solution with the finite difference method:

$$ \partial_t u(t, x) + u(t, x) \partial_x u(t, x) - \alpha \partial_{xx} u(t, x) = 0 $$

defined over

$$ t \in [0, 1], x \in [-1, 1] $$

with boundary conditions

$$\begin{align*} u(0, x) &= -\sin(\pi x), \\ u(t, -1) &= 0, \\ u(t, 1) &= 0. \end{align*}$$

Copy-Pasteable code

using NeuralPDE
using ModelingToolkit, Optimization, OptimizationOptimisers
import Lux: tanh, identity
using Distributions
import ModelingToolkit: Interval, infimum, supremum
using MethodOfLines, OrdinaryDiffEq

@parameters x t
@variables u(..)

Dt = Differential(t)
Dx = Differential(x)
Dxx = Dx^2
α = 0.05;
# Burgers' equation
eq = Dt(u(t, x)) + u(t, x) * Dx(u(t, x)) - α * Dxx(u(t, x)) ~ 0

# boundary conditions
bcs = [
    u(0.0, x) ~ -sin(π * x),
    u(t, -1.0) ~ 0.0,
    u(t, 1.0) ~ 0.0
]

domains = [t ∈ Interval(0.0, 1.0), x ∈ Interval(-1.0, 1.0)]

# MethodOfLines, for FD solution
dx = 0.01
order = 2
discretization = MOLFiniteDifference([x => dx], t, saveat = 0.01)
@named pde_system = PDESystem(eq, bcs, domains, [t, x], [u(t, x)])
prob = discretize(pde_system, discretization)
sol = solve(prob, Tsit5())
ts = sol[t]
xs = sol[x]

u_MOL = sol[u(t,x)]

# NeuralPDE, using Deep Galerkin Method
strategy = QuasiRandomTraining(4_000, minibatch = 500);
# 2 inputs (t, x) and 1 output u(t, x); the remaining arguments set the network
# size, activations, output activation, and the training strategy
discretization = DeepGalerkin(2, 1, 50, 5, tanh, tanh, identity, strategy);
@named pde_system = PDESystem(eq, bcs, domains, [t, x], [u(t, x)]);
prob = discretize(pde_system, discretization);
global iter = 0;
callback = function (p, l)
    global iter += 1;
    if iter%20 == 0
        println("$iter => $l")
    end
    return false
end

res = Optimization.solve(prob, Adam(0.01); callback = callback, maxiters = 300);
phi = discretization.phi;

u_predict = [first(phi([t, x], res.minimizer)) for t in ts, x in xs]

diff_u = abs.(u_predict .- u_MOL);

using Plots
p1 = plot(ts, xs, u_MOL', linetype = :contourf, title = "FD");
p2 = plot(ts, xs, u_predict', linetype = :contourf, title = "predict");
p3 = plot(ts, xs, diff_u', linetype = :contourf, title = "error");
plot(p1, p2, p3)

[figure: contour plots of the FD solution, the DGM prediction, and the error]

@ChrisRackauckas
Member

Since the cost functions can be computationally intensive to calculate, the algorithm instead randomly samples points and trains on them, similar to stochastic gradient descent.

That's not entirely true. With quadrature it's not random. Delete that and I think this is good to go.

@ayushinav
Contributor Author

ayushinav commented Mar 4, 2024

Since the cost functions can be computationally intensive to calculate, the algorithm instead randomly samples points and trains on them, similar to stochastic gradient descent.

That's not entirely true. With quadrature it's not random. Delete that and I think this is good to go.

I agree that it's not entirely true for quadrature in general, and in NeuralPDE.jl we have the capability for different quadrature strategies as well, but for DGM, that's what they say. Here's the snippet from the paper, Section 2:
[screenshot from the DGM paper, Section 2, describing the random sampling of points]

Maybe I understood something wrong but wanted to clarify before going ahead.

@ChrisRackauckas
Member

That's true in their paper but not with the implementation you have in the library.

@ayushinav
Contributor Author

That's true in their paper but not with the implementation you have in the library.

I see. I thought using QuasiRandomTraining does the random sampling they do, but agree a user won't necessarily do that.
I'll push the changes for now and try to understand what's going on. Thanks!

@ChrisRackauckas
Member

Then say, "In this instance we will demonstrate training using Quasi-Random Sampling, a technique that ...."

@ayushinav
Contributor Author

Then say, "In this instance we will demonstrate training using Quasi-Random Sampling, a technique that ...."

Got it. That helps!

@ayushinav ayushinav reopened this Mar 4, 2024
@ayushinav ayushinav closed this Mar 4, 2024
@ayushinav ayushinav reopened this Mar 4, 2024
@ChrisRackauckas
Member

Let's merge, but please follow up with some test cuts. I think we can cut the quasi-random points from 4000 😅

@ChrisRackauckas ChrisRackauckas merged commit a20efb6 into SciML:master Mar 4, 2024
15 of 21 checks passed