Minor pedagogical issues with momentum post. #51
Claiming "up to a quadratic speedup" in one sentence and then
"This gift, it seems, doesn't to come at a price. A beautiful free lunch indeed." s/to come at a price/come at a price/. Also, I disagree with the argument: in general, keeping all the iterates comes at a huge cost in memory, and if you don't keep them, knowing when to stop is far from easy.
I think there is a mistake in the interactive graph right above "Choosing A
The article introduces a convex quadratic, but doesn't explicitly mention the connection between convexity and nonnegative eigenvalues. This could be clarified.
Thanks for the thoughtful feedback! Since this feedback is potentially useful to future readers, I will provide a link to this in the appendix.
Matt Hoffman has noted this too, but I will stand by these statements. A speedup from O(n^2) to O(n log n) is close enough to quadratic that I think the description is fair.
This is a good point. Bear in mind that it is not necessary to store all the iterates to determine where to stop, just their scores on the held-out set. If you wish to store models, however, even just a few "checkpoints", they do come at a cost in memory.
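As a rough illustration of the point about scores, here is a minimal sketch of picking a stopping point from held-out losses alone. The function name `best_stopping_point` and the loss values are hypothetical, made up for this example; nothing here is taken from the article's code.

```python
def best_stopping_point(val_losses):
    """Return (index, loss) of the iterate with the lowest held-out loss.

    Only one scalar per iterate is consumed; no model parameters are stored.
    `val_losses` may be any iterable, so scores can be streamed during training.
    """
    best_k, best_loss = 0, float("inf")
    for k, loss in enumerate(val_losses):
        if loss < best_loss:
            best_k, best_loss = k, loss
    return best_k, best_loss


# Hypothetical held-out losses, one per iterate (illustrative numbers only).
val_losses = [1.00, 0.62, 0.41, 0.35, 0.33, 0.36, 0.44, 0.58]
k, loss = best_stopping_point(val_losses)
print(k, loss)  # prints "4 0.33"
```

This only tells you which iterate was best. To recover the model itself you would either rerun the optimizer up to that iterate or keep a single copy of the best-so-far parameters, which is where the memory cost mentioned above comes in.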
The optimal step size is in fact very close to 2. It is hard to hit the optimum by dragging the slider, but if you click on the text above the arrow (or a little circle on the slider), it will snap to that point for you.
This is true, and this discussion should serve as the clarification needed!
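For readers who want the connection spelled out, here is a short sketch. It assumes the quadratic has the form f(w) = ½ wᵀAw − bᵀw with A symmetric, and writes the eigendecomposition as A = QΛQᵀ with eigenvalues λ_i and orthonormal eigenvectors q_i; the symbols Q, Λ and x are notation chosen for this note.

```latex
\begin{align*}
f(w) &= \tfrac{1}{2}\, w^\top A w - b^\top w, \qquad \nabla^2 f(w) = A, \qquad A = Q \Lambda Q^\top \\
w^\top A w &= \sum_{i=1}^{n} \lambda_i x_i^2, \qquad x = Q^\top w \\
f \text{ convex} &\iff w^\top A w \ge 0 \ \text{ for all } w \iff \lambda_i \ge 0 \ \text{ for all } i
\end{align*}
```

If some λ_i were negative, then along the corresponding eigenvector f(t q_i) = ½ λ_i t² − t bᵀq_i would decrease without bound as |t| grows, so the quadratic would have no minimizer.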