Clarification regarding adding 1e-12 while calculating the scale #14

sayakpaul · 2021-02-23T05:23:23Z

Hi @davda54. Thank you for open-sourcing this implementation.

I wanted to know the reason for adding 1e-12 in https://github.com/davda54/sam/blob/main/sam.py#L18.


davda54 · 2021-02-23T07:12:28Z

Hi, thanks! :) A small positive number is added to the denominator for numerical stability: it avoids division by zero when grad_norm == 0.0.
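
To illustrate the point about the denominator, here is a minimal sketch in plain Python rather than the repository's actual code. The function name `perturbation_scale`, the flat list of gradient values, and the default `rho=0.05` are assumptions made for the example; only the `1e-12` term and the `grad_norm == 0.0` case come from this thread.

```python
import math

def perturbation_scale(grads, rho=0.05, eps=1e-12):
    """Hypothetical helper: rho divided by the L2 norm of the gradient.

    eps keeps the denominator strictly positive, so an all-zero gradient
    does not raise ZeroDivisionError.
    """
    grad_norm = math.sqrt(sum(g * g for g in grads))
    return rho / (grad_norm + eps)

# All-zero gradient: the scale is large but finite.
zero_scale = perturbation_scale([0.0, 0.0])

# The perturbation itself is still zero, because each gradient entry is zero.
zero_perturbation = [g * zero_scale for g in [0.0, 0.0]]

# Typical gradient with norm 5: eps is negligible next to the norm.
typical_scale = perturbation_scale([3.0, 4.0])
```

With a non-zero gradient the added term changes the result only far below float32 precision, so it acts purely as a guard for the degenerate case.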

sayakpaul · 2021-02-23T07:27:01Z

I see. That is what I had thought too. Thank you for confirming.

I am also assuming your e_w calculation uses the L2 norm, since the authors report that they get the best results with it?

davda54 · 2021-02-23T08:10:53Z

Exactly, I assume that p == q == 2 (similarly to the paper), which simplifies the equations.
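
As a rough sketch of that simplification, the snippet below compares a general p-norm form of e_w with the p == q == 2 special case. The general formula is written as I understand it from the SAM paper (with 1/p + 1/q = 1) and should be treated as an assumption, not a quotation. The function names `e_w_general` and `e_w_l2` are hypothetical, and the 1e-12 guard is left out to keep the comparison readable.

```python
import math

def e_w_general(grads, rho, p):
    """Assumed general form: rho * sign(g) * |g|^(q-1) / (||g||_q^q)^(1/p)."""
    q = p / (p - 1)  # conjugate exponent, so that 1/p + 1/q = 1
    norm_q_to_q = sum(abs(g) ** q for g in grads)  # ||g||_q^q
    denom = norm_q_to_q ** (1 / p)
    # copysign restores the sign of g after taking |g|^(q-1)
    return [rho * math.copysign(abs(g) ** (q - 1), g) / denom for g in grads]

def e_w_l2(grads, rho):
    """Special case p == q == 2: the gradient rescaled to L2 length rho."""
    grad_norm = math.sqrt(sum(g * g for g in grads))
    # No epsilon here, so this sketch would fail on an all-zero gradient.
    return [rho * g / grad_norm for g in grads]

grads = [3.0, -4.0]
general = e_w_general(grads, rho=0.05, p=2)
simplified = e_w_l2(grads, rho=0.05)
```

For p == 2 the exponent q - 1 equals 1 and the denominator reduces to the L2 norm, so both functions return the same vector.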

sayakpaul closed this as completed Feb 23, 2021

davda54 added the question label Mar 25, 2021
