I wonder why alphas is detach()ed before being saved to self.alphas in the Attention class. I tried self.alphas = alphas, that is, without detach, and trained the model. There was no difference in performance, so I believe the reason lies elsewhere.
Thank you for your great teaching in your great book!
Thank you for supporting my work, and for your kind words :-)
Regarding the "detachment" of the alphas, the main idea is to prevent unintentional changes to the dynamic computation graph.
If you don't detach the alphas, it shouldn't change anything in the training process, as you already noticed.
But let's say you pause training and decide to take a peek at the alphas. You may end up performing an operation on them, and, since the graph keeps track of every operation performed on gradient-requiring tensors and their dependencies, it will impact the graph. That may be an issue if you resume training afterward.
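To make that concrete, here is a minimal sketch (not the book's code) contrasting an attached tensor with a detached one. The names `kept` and `peek` are just for illustration:

```python
import torch

w = torch.ones(3, requires_grad=True)
alphas = torch.softmax(w, dim=0)

kept = alphas           # still attached to the computation graph
peek = alphas.detach()  # shares the same data, but lives outside the graph

# Any operation on `kept` gets recorded in the graph;
# operations on `peek` are invisible to autograd
print(kept.requires_grad)   # True
print(peek.requires_grad)   # False
```

So poking at `peek` between training sessions cannot alter what backpropagation later sees.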
In other circumstances, like the validation loop, we wrap the operations with a no_grad context manager to prevent potential problems.
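A minimal sketch of that pattern, using a stand-in linear model rather than any model from the book:

```python
import torch

model = torch.nn.Linear(2, 1)
x = torch.randn(4, 2)

# Inside no_grad, no operations are recorded, so the
# validation pass cannot affect the training graph
with torch.no_grad():
    preds = model(x)

print(preds.requires_grad)  # False
```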
The same goes for the detachment of the alphas - it's there as a safeguard, to make sure that it's totally safe to play with the values in self.alphas. It's also convenient, because you'd need to detach them anyway if you wanted to convert the alphas to NumPy arrays.
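The NumPy point is easy to demonstrate. In this sketch, `scores` simply stands in for attention weights; it is not the Attention class from the book:

```python
import torch

x = torch.randn(1, 3, requires_grad=True)
scores = torch.softmax(x, dim=-1)  # stand-in for attention weights

# scores.numpy() would raise a RuntimeError, because scores requires grad;
# detaching first makes the conversion (and any further tinkering) safe
alphas = scores.detach()
arr = alphas.numpy()
print(arr.shape)  # (1, 3)
```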