post--momentum

# Minor pedagogical issues with momentum post. #51

Closed
opened this issue Apr 5, 2017 · 3 comments

### derifatives commented Apr 5, 2017

Claiming "up to a quadratic speedup" in one sentence and then saying "similar to the FFT speedup or QuickSort" is potentially confusing --- those both give you speedups from n^2 to n log n, and I would not describe them as a "quadratic speedup."

"This gift, it seems, doesn't to come at a price. A beautiful free lunch [7] indeed." s/to come at a price/come at a price. Also, I disagree with the argument: in general, keeping all the iterates comes at a huge cost in memory, and if you don't keep them, knowing when to stop is far from easy.

I think there is a mistake in the interactive graph right above "Choosing A Step-Size." It says 2 is the "optimal step size", but it looks like 2 is way too big, causing the biggest eigenvalue \lambda_3 not to converge at all.

The article introduces a convex quadratic, but doesn't explicitly mention the connection between that and nonnegative eigenvalues. This could be clarified.
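As context for the "quadratic speedup" point: for a quadratic objective, optimally tuned gradient descent converges at rate (κ-1)/(κ+1) while optimally tuned momentum converges at roughly (√κ-1)/(√κ+1), so the iteration count scales with √κ instead of κ. A minimal sketch of that gap, using a made-up diagonal quadratic (the eigenvalues here are hypothetical, not the post's):

```python
import numpy as np

# Hypothetical diagonal quadratic f(w) = 1/2 sum_i lam_i * w_i^2,
# with condition number kappa = 100. These eigenvalues are illustrative only.
lams = np.array([1.0, 10.0, 100.0])
w0 = np.ones(3)

def gd_iters(alpha, tol=1e-6, max_iter=100000):
    """Iterations for gradient descent to drive max|w_i| below tol."""
    w = w0.copy()
    for k in range(max_iter):
        if np.max(np.abs(w)) < tol:
            return k
        w = w - alpha * lams * w          # gradient of the diagonal quadratic is lam * w
    return max_iter

def momentum_iters(alpha, beta, tol=1e-6, max_iter=100000):
    """Iterations for heavy-ball momentum to drive max|w_i| below tol."""
    w, z = w0.copy(), np.zeros(3)
    for k in range(max_iter):
        if np.max(np.abs(w)) < tol:
            return k
        z = beta * z + lams * w           # velocity accumulates gradients
        w = w - alpha * z
    return max_iter

kappa = lams[-1] / lams[0]
# Optimal tunings for a quadratic:
alpha_gd = 2.0 / (lams[0] + lams[-1])
alpha_mo = (2.0 / (np.sqrt(lams[0]) + np.sqrt(lams[-1]))) ** 2
beta_mo = ((np.sqrt(kappa) - 1) / (np.sqrt(kappa) + 1)) ** 2

print(gd_iters(alpha_gd), momentum_iters(alpha_mo, beta_mo))
```

At κ = 100, momentum needs on the order of ten times fewer iterations, consistent with the √κ vs. κ scaling; whether that deserves the label "quadratic speedup" is exactly the terminology question raised above.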

### gabgoh commented Apr 5, 2017

Thanks for the thoughtful feedback! Since this feedback is potentially useful to future readers, I will provide a link to this in the appendix.

> Claiming "up to a quadratic speedup" in one sentence and then saying "similar to the FFT speedup or QuickSort" is potentially confusing --- those both give you speedups from n^2 to n log n, and I would not describe them as a "quadratic speedup."

Matt Hoffman has noted this too, but I will stand by these statements. A speedup from O(n^2) to O(n log n) is close enough to quadratic to be similar.

> "This gift, it seems, doesn't to come at a price. A beautiful free lunch [7] indeed." s/to come at a price/come at a price. Also, I disagree with the argument: in general, keeping all the iterates comes at a huge cost in memory, and if you don't keep them, knowing when to stop is far from easy.

This is a good point. Bear in mind it is not necessary to store all the iterates, just their scores on the held-out set, to determine where to stop. If you do wish to store models, however, even just a few "checkpoints", they come at a cost in memory.

> I think there is a mistake in the interactive graph right above "Choosing A Step-Size." It says 2 is the "optimal step size", but it looks like 2 is way too big, causing the biggest eigenvalue \lambda_3 not to converge at all.

The optimal step size is in fact very close to 2. It is hard to hit the optimum by dragging the slider, but if you click on the text above the arrow (or the little circle on the slider), it will snap to that point for you.

> The article introduces a convex quadratic, but doesn't explicitly mention the connection between that and nonnegative eigenvalues. This could be clarified.

This is true, and this discussion should serve as the clarification needed!
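On the step-size point: for gradient descent on a quadratic, the step size minimizing the worst-case per-coordinate contraction max_i |1 - αλ_i| is α* = 2/(λ_min + λ_max), and at that α* the two extreme eigenvalues share the same rate (κ-1)/(κ+1), which is very close to 1 when κ is large. So the largest eigenvalue does converge at the "optimal" step size, just very slowly, which can look like divergence in the plot. A quick check, with hypothetical eigenvalues (the post's actual values may differ):

```python
import numpy as np

# Hypothetical eigenvalues, sorted: lambda_1 <= lambda_2 <= lambda_3.
lam = np.array([1.0, 10.0, 100.0])

# Step size minimizing the worst-case contraction factor max_i |1 - alpha * lam_i|.
alpha_star = 2.0 / (lam[0] + lam[-1])

# Per-eigenvalue contraction factors at alpha_star.
rates = np.abs(1.0 - alpha_star * lam)
print(alpha_star, rates)
# The smallest and largest eigenvalues share the rate (kappa-1)/(kappa+1),
# close to 1 for large kappa: lambda_3 converges, but very slowly.
```

With these numbers the extreme rates are both 99/101 ≈ 0.98, so each iteration shaves off only about 2% of the error along λ_3 -- slow, but convergent.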

### colah commented Apr 5, 2017

 Thanks for the comments, Rif! It looks like there aren't any action items arising from this issue, so I'll close it. Feel free to discuss further despite that or ask me to reopen if there's anything else.

closed this as completed Apr 5, 2017

### ehsanmok commented Apr 6, 2017

For the sake of clarity: a real, symmetric matrix has real eigenvalues, and an assumption of positive definiteness of $A$ lies behind the description of unconstrained convex quadratic optimization.
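To make that connection concrete: a quadratic f(w) = ½ wᵀAw - bᵀw with symmetric A is convex iff all eigenvalues of A are nonnegative, and strictly convex (with unique minimizer w* = A⁻¹b) iff they are strictly positive. A small sketch with a toy matrix chosen for illustration:

```python
import numpy as np

# Toy symmetric matrix for illustration; eigenvalues are 1 and 3,
# so A is positive definite and the quadratic is strictly convex.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
eigs = np.linalg.eigvalsh(A)      # eigvalsh: eigenvalues of a symmetric matrix
print(eigs)

b = np.array([1.0, 0.0])
w_star = np.linalg.solve(A, b)    # unique minimizer when A is positive definite
# The gradient A w - b vanishes at w_star:
print(np.allclose(A @ w_star - b, 0))
```

A zero eigenvalue would still leave the quadratic convex but with a flat direction (no unique minimizer); a negative eigenvalue makes it nonconvex and unbounded below, which is why the post's analysis presumes nonnegative spectrum.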