
Insert nn into scalar_born: cannot backpropagate #49

Closed · acse-wz19 opened this issue Feb 13, 2023 · 31 comments

Comments

@acse-wz19

Can we insert a neural network into the scalar_born equation? I have tried it, but it failed.

@ar4
Owner

ar4 commented Feb 13, 2023 via email

@acse-wz19
Author

I want to insert a U-Net into the scalar_born function to produce the "scatter" parameter, but it seems to fail. I have tested the U-Net on its own and it is correct.

@acse-wz19
Author

It reports that parameters needed for autograd have been changed by an in-place operation.

@acse-wz19
Author

This is the code I use together with Deepwave:

@acse-wz19
Author

Do you have any idea about this? 😭

@ar4
Owner

ar4 commented Feb 13, 2023 via email

@acse-wz19
Author

I have done what you suggested but still have a problem. It shows this:
RuntimeError                              Traceback (most recent call last)
<ipython-input> in <module>
     47 )
     48 epoch_loss += loss.item()
---> 49 loss.backward()
     50 # loss.backward(retain_graph=True)
     51 optimiser.step()

1 frames
/usr/local/lib/python3.8/dist-packages/torch/autograd/__init__.py in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables, inputs)
    195     # some Python versions print out the first line of a multi-line function
    196     # calls in the traceback and some print out the last line
--> 197     Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
    198         tensors, grad_tensors_, retain_graph, create_graph, inputs,
    199         allow_unreachable=True, accumulate_grad=True)  # Calls into the C++ engine to run the backward pass

RuntimeError: Trying to backward through the graph a second time (or directly access saved tensors after they have already been freed). Saved intermediate values of the graph are freed when you call .backward() or autograd.grad(). Specify retain_graph=True if you need to backward through the graph a second time or if you need to access saved tensors after calling backward.
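For reference, a minimal sketch of the pattern that typically triggers this error (the toy net and x below are purely illustrative, not the code from this thread): a tensor computed through the network once, outside the loop, is backpropagated through in every iteration, so the second backward() finds the graph's saved tensors already freed.

import torch

# The graph through `net` is built once, outside the loop, but backward()
# is called in every iteration; after the first call the saved tensors are
# freed, so the second call raises the error above.
net = torch.nn.Linear(4, 4)
x = torch.randn(1, 4)
scatter = net(x)          # forward pass (and graph) built only once
for batch in range(2):
    loss = scatter.sum()
    loss.backward()       # second iteration: "Trying to backward through the graph a second time"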

@acse-wz19
Author

If I change back to loss.backward(retain_graph=True), it generates another problem:

RuntimeError                              Traceback (most recent call last)
<ipython-input> in <module>
     48 epoch_loss += loss.item()
     49 # loss.backward()
---> 50 loss.backward(retain_graph=True)
     51 optimiser.step()
     52 # scatter.detach()

1 frames
/usr/local/lib/python3.8/dist-packages/torch/autograd/__init__.py in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables, inputs)
    195     # some Python versions print out the first line of a multi-line function
    196     # calls in the traceback and some print out the last line
--> 197     Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
    198         tensors, grad_tensors_, retain_graph, create_graph, inputs,
    199         allow_unreachable=True, accumulate_grad=True)  # Calls into the C++ engine to run the backward pass

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [1, 64, 1, 1]] is at version 6; expected version 5 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True)
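As the hint in the message suggests, anomaly detection makes PyTorch also print the forward-pass location of the operation whose result was later modified in place. A self-contained toy illustration (torch.exp is used here only because it is an op that saves its output for the backward pass):

import torch

torch.autograd.set_detect_anomaly(True)  # adds the forward trace of the offending op to the error

w = torch.randn(3, requires_grad=True)
y = torch.exp(w)    # exp saves its output for use in its backward pass
y.add_(1)           # in-place edit bumps y's version, invalidating the saved output
y.sum().backward()  # raises the "modified by an inplace operation" error, now with the forward trace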

@acse-wz19
Author

What I want to do is use a neural network to train the scatter model, just as you said. But this problem always comes up, so I cannot update the parameters inside the network.

@ar4
Owner

ar4 commented Feb 13, 2023 via email

@ar4
Owner

ar4 commented Feb 13, 2023 via email

@ar4
Owner

ar4 commented Feb 13, 2023 via email

@acse-wz19
Author

Yes! It runs now. But a quick question: why do we need to update each batch rather than each epoch? In the examples in the Deepwave documentation, the updates are done each epoch.

@acse-wz19
Author

If we also update the background velocity using a neural network, will this work? Do we need to set scatter.requires_grad = False, or can they be updated simultaneously?

@ar4
Owner

ar4 commented Feb 13, 2023

That is good news.

If you only wish to run your network and modify its parameters once each epoch, rather than each batch, then I think your code can be modified to achieve that. It might reduce runtime (by running your network less frequently) and make updates more stable (as they will be based on the gradients from an entire epoch rather than just one batch), but will probably also make convergence take longer as your model will be updated much less frequently, and will lose the randomness benefit of small batch updates. If you wish to do it, then something like this might work:

for epoch in range(n_epochs):
    epoch_loss = 0
    # Run the network once per epoch and detach its output, so the batch
    # loop below does not backpropagate into the network's graph
    scatter = net(v_mig1.unsqueeze(0).unsqueeze(0))
    scatter1 = scatter.detach().squeeze(0).squeeze(0)
    scatter1.requires_grad_()
    optimiser1 = torch.optim.SGD([scatter1], lr=1)
    optimiser1.zero_grad()
    for batch in range(n_batch):
        batch_start = batch * n_shots_per_batch
        batch_end = min(batch_start + n_shots_per_batch, n_shots)
        if batch_end <= batch_start:
            continue
        s = slice(batch_start, batch_end)
        out = scalar_born(v_mig1, scatter1, dx, dt,
                          source_amplitudes=source_amplitudes[s],
                          source_locations=source_locations[s],
                          receiver_locations=receiver_locations[s],
                          pml_freq=freq)
        loss = (1e9 * loss_fn(out[-1] * mask[s],
                              observed_scatter_masked[s]))
        epoch_loss += loss.item()
        loss.backward()  # gradients accumulate in scatter1.grad over the epoch
    optimiser1.step()  # update scatter1 using the accumulated gradient
    scatter1 = scatter1.detach().unsqueeze(0).unsqueeze(0)
    # train net to produce scatter1
    for it in range(n_its):
        optimiser.zero_grad()
        scatter = net(v_mig1.unsqueeze(0).unsqueeze(0))
        loss = loss_fn(scatter, scatter1)
        loss.backward()
        optimiser.step()

There may be other, perhaps more elegant, ways. This one separates the estimation of the scattering model each epoch from running scalar_born. Within the loop over batches the gradients only flow back to scatter1 - they do not flow back into your network - so the intermediate tensors that your network saved during its forward pass (for use in its backward pass) are not needed inside this loop (and we avoid the problem of them being freed when loss.backward() is called, which caused your issue). After the loop over batches we then compare scatter (produced by your network) with scatter1 (updated after an epoch of Deepwave) and update your network parameters to try to match it.

In most cases the cost of running your neural network will be insignificant compared to the cost of running Deepwave, however, and so calling your neural network each batch (the way you currently do in your working code) will not substantially affect runtime (and avoids the complications in my code above of having multiple optimisers, etc.). You can still only update its parameters every epoch, rather than every batch, if you wish, by moving optimiser.zero_grad() and optimiser.step() outside the loop over batches. This will cause gradients to accumulate over batches and the accumulated update will only be applied to your network model parameters once per epoch.
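A sketch of that per-epoch variant, reusing the names from the code above (the network is still run every batch, so its graph is rebuilt each time, but the accumulated gradients are only applied once per epoch):

for epoch in range(n_epochs):
    epoch_loss = 0
    optimiser.zero_grad()            # zero once per epoch
    for batch in range(n_batch):
        batch_start = batch * n_shots_per_batch
        batch_end = min(batch_start + n_shots_per_batch, n_shots)
        if batch_end <= batch_start:
            continue
        s = slice(batch_start, batch_end)
        # rebuild the graph through the network every batch
        scatter = net(v_mig1.unsqueeze(0).unsqueeze(0)).squeeze(0).squeeze(0)
        out = scalar_born(v_mig1, scatter, dx, dt,
                          source_amplitudes=source_amplitudes[s],
                          source_locations=source_locations[s],
                          receiver_locations=receiver_locations[s],
                          pml_freq=freq)
        loss = 1e9 * loss_fn(out[-1] * mask[s], observed_scatter_masked[s])
        epoch_loss += loss.item()
        loss.backward()              # gradients accumulate in net's parameters
    optimiser.step()                 # single update per epoch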

A learning rate of 1 (combined with the large scaling applied to loss) might be appropriate for SGD when the velocity/scattering models are being updated directly, but when the parameters being updated are those of a neural network, then more traditional learning rates might be appropriate - you should check the amplitudes of the gradients of your network's parameters to see if they are reasonable.
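One simple way to inspect those gradient amplitudes after a backward pass, assuming net is the U-Net from the code above:

# Print gradient magnitudes of the network parameters after loss.backward()
for name, p in net.named_parameters():
    if p.grad is not None:
        print(f"{name}: max |grad| = {p.grad.abs().max():.3e}, "
              f"mean |grad| = {p.grad.abs().mean():.3e}")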

Regarding your second question, yes, you can also update velocity (and also source amplitude, if you wish) simultaneously. You will need to set requires_grad=True for it, and add it to the list of parameters being updated by your optimiser. I suggest that if you do invert for velocity, then you use a method such as one of the ones discussed in the Deepwave examples, for limiting the potential ranges of velocities.
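One possible sketch of that joint update (the sigmoid reparameterisation, the vmin/vmax bounds, and the learning rates below are illustrative assumptions in the spirit of the constrained-velocity approach mentioned above, not a prescription):

import torch
from deepwave import scalar_born

# Reparameterise velocity so the optimiser works on an unconstrained tensor
# while the model passed to scalar_born stays within [vmin, vmax].
vmin, vmax = 1400.0, 4500.0  # assumed bounds
v_param = torch.logit(((v_mig1 - vmin) / (vmax - vmin)).clamp(1e-4, 1 - 1e-4))
v_param = v_param.detach().requires_grad_()

optimiser = torch.optim.Adam([
    {'params': net.parameters(), 'lr': 1e-3},  # network producing scatter
    {'params': [v_param], 'lr': 1e-2},         # background velocity
])

for epoch in range(n_epochs):
    optimiser.zero_grad()
    v = vmin + (vmax - vmin) * torch.sigmoid(v_param)  # constrained velocity
    scatter = net(v_mig1.unsqueeze(0).unsqueeze(0)).squeeze(0).squeeze(0)
    # all shots at once for brevity; add the batch loop from above if needed
    out = scalar_born(v, scatter, dx, dt,
                      source_amplitudes=source_amplitudes,
                      source_locations=source_locations,
                      receiver_locations=receiver_locations,
                      pml_freq=freq)
    loss = 1e9 * loss_fn(out[-1] * mask, observed_scatter_masked)
    loss.backward()   # gradients flow to both net's parameters and v_param
    optimiser.step()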

@acse-wz19
Author

Thanks! That's quite useful; I will try it as you suggest!

@acse-wz19
Author

Hi,
If I just update the background velocity model and use the scalar_born function, the loss is unchanged.
Actually, the background velocity heavily affects RTM imaging performance, and it also affects the scattered wavefield. The loss staying unchanged is quite confusing to me if we input different background velocities.

@acse-wz19
Author

Can you explain it a little bit?

@ar4
Owner

ar4 commented Feb 13, 2023

Can you show me the code you used to conclude that the loss is unchanged when you update the background velocity model?

@ar4
Owner

ar4 commented Feb 13, 2023

And do you mean that the loss didn't change when you used completely different velocity models, or only that it didn't change over iterations when you were inverting for the velocity model? If the latter, have you checked that the velocity model actually changed over iterations?

@acse-wz19
Author

I use scalar_born to update the net; the net generates the velocity, and I feed this velocity into scalar_born. The loss for each epoch does not change, like this:

[screenshot of the per-epoch loss values, which do not change]

@acse-wz19
Author

Do you mean the loss is unchanged but the velocity has already been updated?

@ar4
Owner

ar4 commented Feb 14, 2023 via email

@acse-wz19
Author

acse-wz19 commented Feb 14, 2023 via email

@acse-wz19
Author

acse-wz19 commented Feb 14, 2023 via email

@ar4
Owner

ar4 commented Feb 14, 2023 via email

@acse-wz19
Author

acse-wz19 commented Feb 15, 2023 via email

@ar4
Owner

ar4 commented Feb 15, 2023 via email

@acse-wz19
Author

acse-wz19 commented Feb 15, 2023 via email

@ar4
Owner

ar4 commented Feb 23, 2023

Hi Weilin,

I am closing this issue, but please feel free to reopen it, or to create a new issue, if you have any more problems or questions.

ar4 closed this as completed Feb 23, 2023
@acse-wz19
Author

acse-wz19 commented Feb 23, 2023 via email
