Minor pedagogical issues with momentum post. #51
Comments
Thanks for the thoughtful feedback! Since this feedback is potentially useful to future readers, I will provide a link to this in the appendix.
Matt Hoffman has noted this too, but I will stand by these statements. A speedup from O(n^2) to O(n log n) is close enough to quadratic to be called similar.
This is a good point. Bear in mind that it is not necessary to store all the iterates, just their scores on the held-out set, to determine where to stop. If you do wish to store models, however, even a few "checkpoints" come at a cost in memory.
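To make the point above concrete, here is a minimal early-stopping sketch (illustrative only, not code from the post): the loop tracks one scalar held-out score per step and at most one "best so far" checkpoint, so memory stays O(d) rather than O(T·d). The toy problem and all names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy least-squares problem with a held-out split.
X = rng.normal(size=(200, 20))
w_true = rng.normal(size=20)
y = X @ w_true + 0.5 * rng.normal(size=200)
X_tr, y_tr, X_va, y_va = X[:150], y[:150], X[150:], y[150:]

w = np.zeros(20)
step = 1e-3
best_score, best_w, best_t = np.inf, w.copy(), 0

for t in range(500):
    grad = X_tr.T @ (X_tr @ w - y_tr) / len(y_tr)
    w -= step * grad
    score = np.mean((X_va @ w - y_va) ** 2)  # store only the held-out score
    if score < best_score:                   # and, at most, one checkpoint
        best_score, best_w, best_t = score, w.copy(), t

print(best_t, best_score)
```

Storing the full iterate history would cost T copies of the parameter vector; this keeps two (current and best).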
The optimal step size is in fact very close to 2. It is hard to hit the optimum by dragging the slider, but if you click on the text above the arrow (or a little circle on the slider), it will snap to that point for you.
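For readers wondering where an "optimal step size" comes from, here is a small numeric check of the standard result (a sketch, not the article's code, with illustrative eigenvalues): gradient descent on a quadratic with eigenvalues lam contracts each mode by |1 - alpha * lam_i|, and the worst-case factor max_i |1 - alpha * lam_i| is minimized at alpha = 2 / (lam_min + lam_max).

```python
import numpy as np

lam = np.array([1.0, 10.0, 100.0])             # hypothetical eigenvalues
alphas = np.linspace(1e-4, 2 / lam.max(), 10001)

# Worst-case contraction factor over all eigenmodes, for each step size.
rates = np.max(np.abs(1 - np.outer(alphas, lam)), axis=1)

alpha_star = alphas[np.argmin(rates)]
print(alpha_star, 2 / (lam.min() + lam.max()))  # both close to 0.0198
```

The grid minimizer agrees with the closed-form 2 / (lam_min + lam_max), which is why the article can snap the slider to an exact optimum.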
This is true, and this discussion should serve as the clarification needed!
Thanks for the comments, Rif! It looks like there aren't any action items arising from this issue, so I'll close it. Feel free to discuss further despite that, or ask me to reopen if there's anything else.
For the sake of clarity: a real, symmetric matrix has real eigenvalues, and there's an assumption of positive definiteness.
Claiming "up to a quadratic speedup" in one sentence and then saying "similar to the FFT speedup or QuickSort" is potentially confusing --- those both give you speedups from n^2 to n log n, and I would not describe that as a "quadratic speedup."
"This gift, it seems, doesn't to come at a price. A beautiful free lunch [7] indeed." s/to come at a price/come at a price. Also I disagree with the argument: in general, keeping all the iterates comes at a huge cost in memory, and if you don't keep them, knowing when to stop is far from easy.
I think there is a mistake in the interactive graph right above "Choosing A Step-Size." It says 2 is the "optimal step size", but it looks like 2 is way too big, causing the biggest eigenvalue \lambda_3 to not converge at all.
The article introduces a convex quadratic, but doesn't explicitly mention the connection between convexity and nonnegative eigenvalues. This could be clarified.
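The connection asked about above can be checked numerically (an illustrative sketch, not code from the article): for f(w) = ½ wᵀAw − bᵀw with symmetric A, the curvature along any direction d is dᵀAd, which is nonnegative for every d exactly when all eigenvalues of A are nonnegative — i.e., when f is convex.

```python
import numpy as np

rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 4))
A = Q @ Q.T                        # symmetric and PSD by construction
eigvals = np.linalg.eigvalsh(A)    # real eigenvalues, ascending order

# Nonnegative eigenvalues <=> nonnegative curvature in every direction,
# since d^T A d = ||Q^T d||^2 >= 0 for all d.
print(eigvals.min() >= 0)
for _ in range(1000):
    d = rng.normal(size=4)
    assert d @ A @ d >= 0
```

A negative eigenvalue would give a direction of negative curvature, making the quadratic non-convex along that line.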