Some comments on the "Bonus" section of this article from AI Summer.
Edit: The article has been updated with my demonstration (link).
- "When $y = 0$" and "When $y = 1$" are inverted.
- I have no idea why the authors replace $\hat{y}$ with $1 - \hat{y}$ when the class changes. If properly trained, the model weights should "push" the sigmoid to output 0 or 1 depending on the input $x$.
- The proposed demonstration does not actually prove anything:
  - When $y = 0$, negative class: $BCE = -\log(1 - \hat{y})$
  - When $y = 1$, positive class: $BCE = -\log(\hat{y})$
Let's assume that we have a simple neural network with weights $w$, a scalar input $x$, and a sigmoid output $\hat{y} = \sigma(wx)$.
The chain rule gives us the gradient of the loss $L$ with respect to the weights: $\frac{\partial L}{\partial w} = \frac{\partial L}{\partial \hat{y}} \cdot \frac{\partial \hat{y}}{\partial w}$.
MSE loss is expressed as follows: $L_{MSE} = (\hat{y} - y)^2$.
Thus, the gradient with respect to $w$ is $\frac{\partial L_{MSE}}{\partial w} = 2(\hat{y} - y)\,\hat{y}(1 - \hat{y})\,x$, since $\sigma'(wx) = \hat{y}(1 - \hat{y})$. We can see that the factor $\hat{y}(1 - \hat{y})$ goes to zero whenever the sigmoid saturates, regardless of the true label: a confidently wrong network ($\hat{y} \approx 1$ when $y = 0$, or $\hat{y} \approx 0$ when $y = 1$) receives almost no gradient and cannot correct itself.
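This vanishing behavior is easy to check numerically. A minimal sketch (the scalar model $\hat{y} = \sigma(wx)$ and the helper name `mse_grad_w` are mine, not from the article):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def mse_grad_w(w, x, y):
    # dL/dw for L = (y_hat - y)^2 with y_hat = sigmoid(w * x):
    # 2 * (y_hat - y) * y_hat * (1 - y_hat) * x
    y_hat = sigmoid(w * x)
    return 2.0 * (y_hat - y) * y_hat * (1.0 - y_hat) * x

x, y = 1.0, 0.0                # true class is negative
print(mse_grad_w(0.0, x, y))   # undecided network (y_hat = 0.5): gradient 0.25
print(mse_grad_w(10.0, x, y))  # confidently wrong (y_hat ~ 1): gradient ~ 1e-4
```

The confidently wrong network gets a much weaker learning signal than the undecided one, which is exactly the pathology the derivation exposes.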
When we try with a BCE loss, $L_{BCE} = -y \log(\hat{y}) - (1 - y) \log(1 - \hat{y})$, the sigmoid derivative cancels and the gradient reduces to $\frac{\partial L_{BCE}}{\partial w} = (\hat{y} - y)\,x$.
For $y = 0$: $\frac{\partial L_{BCE}}{\partial w} = \hat{y}\,x$. If the network is right and predicted the negative class, $\hat{y} \approx 0$ and the gradient vanishes; if it is wrong, $\hat{y} \approx 1$ and the gradient stays large.
For $y = 1$: $\frac{\partial L_{BCE}}{\partial w} = (\hat{y} - 1)\,x$. If the network is right, $\hat{y} \approx 1$ and the gradient vanishes; if it is wrong, the gradient stays large. The gradient only vanishes when the prediction is correct, which is exactly the behavior we want.
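The same numerical check with the BCE gradient (again a sketch using my own scalar setup, $\hat{y} = \sigma(wx)$):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def bce_grad_w(w, x, y):
    # For L = -y*log(y_hat) - (1 - y)*log(1 - y_hat) with y_hat = sigmoid(w * x),
    # the sigmoid derivative cancels and dL/dw = (y_hat - y) * x.
    y_hat = sigmoid(w * x)
    return (y_hat - y) * x

x, y = 1.0, 0.0                 # true class is negative
print(bce_grad_w(-10.0, x, y))  # confidently right (y_hat ~ 0): gradient ~ 5e-5
print(bce_grad_w(10.0, x, y))   # confidently wrong (y_hat ~ 1): gradient ~ 1.0
```

Unlike MSE, the gradient here stays large exactly when the network is wrong, and vanishes only when it is right.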