I found that the flag dataset has been preprocessed: all flags have almost the same norm.
np.linalg.norm(flags[c]) == 97.97
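With equal norms, the dot product between a query and each flag is proportional to their cosine similarity, so the softmax update rule becomes almost equivalent to a weighted sum based on cosine similarity.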
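To make the point concrete, here is a minimal sketch. It assumes the update rule is a softmax over scaled dot products followed by a weighted sum of the stored patterns. The random `patterns`, the `query`, and the value of `beta` are hypothetical stand-ins, not the tutorial's actual data or settings.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over a 1-D array.
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

rng = np.random.default_rng(0)

# Hypothetical stand-in for the flags: 5 random patterns rescaled to one common norm.
common_norm = 97.97
patterns = rng.normal(size=(5, 16))
patterns *= common_norm / np.linalg.norm(patterns, axis=1, keepdims=True)

query = rng.normal(size=16)
beta = 0.01  # hypothetical inverse temperature

# Softmax weights from dot products (the assumed update rule).
dot_scores = patterns @ query
w_dot = softmax(beta * dot_scores)

# Softmax weights from cosine similarity, rescaled by the shared factor
# common_norm * ||query||, which is the same for every pattern.
cos_sim = dot_scores / (np.linalg.norm(patterns, axis=1) * np.linalg.norm(query))
w_cos = softmax(beta * common_norm * np.linalg.norm(query) * cos_sim)

# Weighted sums of the stored patterns under both weightings.
update_dot = w_dot @ patterns
update_cos = w_cos @ patterns

print(np.allclose(w_dot, w_cos), np.allclose(update_dot, update_cos))
```

Because every pattern shares the same norm, the two weightings differ only by a constant rescaling of the temperature, so a dataset like this cannot show any behaviour that is specific to dot-product scores.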
Perhaps a more convincing dataset, for example one whose patterns differ in norm, could be provided.
Finally, thank you very much for creating this tutorial.