Hi, I'm trying to use the discrete adjoint method, and I print the GPU memory usage during training:
memory before scheduling: 16.37 MB
Memory after scheduling: 16.38 MB
Memory after backward pass: 16.38 MB
Iter 0020 | Total Loss 0.609421
memory before scheduling: 16.48 MB
Memory after scheduling: 16.49 MB
Memory after backward pass: 16.49 MB
Iter 0040 | Total Loss 0.599637
memory before scheduling: 16.59 MB
Memory after scheduling: 16.59 MB
Memory after backward pass: 16.59 MB
Iter 0060 | Total Loss 0.530792
memory before scheduling: 16.70 MB
Memory after scheduling: 16.70 MB
Memory after backward pass: 16.70 MB
Iter 0080 | Total Loss 0.893818
memory before scheduling: 16.80 MB
Memory after scheduling: 16.81 MB
I added these lines to the code:
memory_before_scheduling = show_net_dyn_memory_usage()
pred_y = ode.odeint_adjoint(batch_y0, batch_t)
loss = torch.mean(torch.abs(pred_y - batch_y))
memory_after_scheduling = show_net_dyn_memory_usage()
loss.backward()
memory_after_backward = show_net_dyn_memory_usage()
optimizer.step()
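For reference, since show_net_dyn_memory_usage() isn't shown above, here is a hypothetical stand-in for such a helper, assuming it simply wraps torch.cuda.memory_allocated (my actual helper may differ):

```python
import torch

def show_gpu_memory_mb() -> float:
    """Hypothetical stand-in for show_net_dyn_memory_usage():
    report tensor memory currently allocated on the GPU, in MB."""
    if not torch.cuda.is_available():
        return 0.0  # no GPU: nothing allocated
    return torch.cuda.memory_allocated() / 2**20

print(f"allocated: {show_gpu_memory_mb():.2f} MB")
```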
The GPU memory increases over iterations, and the memory after the backward pass is not the same as the memory before scheduling (16.37 MB). I think it should stay around 16.37 MB if the memory used during the backward pass were freed. I believe the standard PyTorch backward frees that memory, so could you help me understand this?
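To check my assumption that a plain backward() releases the graph's buffers, I ran a toy example (unrelated to the ODE code above):

```python
import torch

# Toy graph: check that a plain loss.backward() releases the
# graph's saved intermediate buffers.
x = torch.randn(10, requires_grad=True)
loss = x.exp().sum()   # exp's backward saves its output in the graph
loss.backward()        # retain_graph defaults to False: buffers are freed

# A second backward fails precisely because those buffers are gone.
try:
    loss.backward()
except RuntimeError:
    print("second backward failed: graph buffers were freed")
```

So for an ordinary autograd graph the buffers are indeed released after the first backward, which is why I expected the allocated memory to return to the pre-forward level.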
If I'm making a mistake somewhere, or if this implementation inherently uses more memory, please let me know.
Thanks!