Minor pedagogical issues with momentum post. #51

Closed
derifatives opened this Issue Apr 5, 2017 · 3 comments

Comments

@derifatives

derifatives commented Apr 5, 2017

Claiming "up to a quadratic speedup" in one sentence and then
saying "similar to the FFT speedup or QuickSort" is potentially
confusing --- those both give you speedups from n^2 to n log n,
and I would not describe them as "quadratic speedup."

"This gift, it seems, doesn't to come at a price. A beautiful free lunch [7] indeed." s/to come at a price/come at a price. Also I disagree with the argument: in general, keeping all the iterates comes at a huge cost in memory, and if you don't keep them, knowing when to stop is far from easy.

I think there is a mistake in the interactive graph right above "Choosing A
Step-Size." It says 2 is the "optimal step size", but it looks
like 2 is way too big, causing the biggest eigenvalue \lambda_3
to not converge at all.

The article introduces a convex quadratic, but doesn't explicitly mention the connection between that and nonnegative eigenvalues. This could be clarified.

@gabgoh


Collaborator

gabgoh commented Apr 5, 2017

Thanks for the thoughtful feedback! Since this feedback is potentially useful to future readers, I will provide a link to this in the appendix.

Claiming "up to a quadratic speedup" in one sentence and then saying "similar to the FFT speedup or QuickSort" is potentially confusing --- those both give you speedups from n^2 to n log n, and I would not describe them as "quadratic speedup."

Matt Hoffman has noted this too, but I will stand by these statements. A speedup from O(n^2) to O(n log n) is close enough to quadratic for the analogy to hold.
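For concreteness, here is a minimal sketch of the speedup being claimed, using the standard worst-case rates for gradient descent, (κ−1)/(κ+1), versus momentum, (√κ−1)/(√κ+1), on a quadratic with condition number κ (the value of κ below is illustrative, not from the post):

```python
import math

def iterations_to_eps(rate, eps=1e-6):
    # Steps k needed for an error decaying as rate**k to fall below eps.
    return math.ceil(math.log(eps) / math.log(rate))

kappa = 10_000.0  # hypothetical condition number
gd_rate = (kappa - 1) / (kappa + 1)                          # gradient descent
mom_rate = (math.sqrt(kappa) - 1) / (math.sqrt(kappa) + 1)   # momentum

# Momentum needs roughly sqrt(kappa) times fewer iterations --
# the "quadratic speedup" in kappa being discussed.
print(iterations_to_eps(gd_rate), iterations_to_eps(mom_rate))
```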

"This gift, it seems, doesn't to come at a price. A beautiful free lunch [7] indeed." s/to come at a price/come at a price. Also I disagree with the argument: in general, keeping all the iterates comes at a huge cost in memory, and if you don't keep them, knowing when to stop is far from easy.

This is a good point. Bear in mind it is not necessary to store all the iterates, just their scores on the held-out set, to determine where to stop. If you do wish to store models, even just a few "checkpoints", that does come at a cost in memory.
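A minimal sketch of what I mean: only the held-out scores are kept, and the stopping point is recovered from them (the scores below are made up for illustration):

```python
def best_stopping_step(val_scores):
    """Index of the best held-out score (lower is better).

    No iterates are stored -- only the scalar scores. The model at the
    chosen step can then be recovered from a single checkpoint or re-run.
    """
    best_idx, best = 0, float("inf")
    for i, s in enumerate(val_scores):
        if s < best:
            best_idx, best = i, s
    return best_idx

scores = [0.9, 0.5, 0.3, 0.35, 0.4]  # hypothetical held-out losses
print(best_stopping_step(scores))  # -> 2
```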

I think there is a mistake in the interactive graph right above "Choosing A Step-Size." It says 2 is the "optimal step size", but it looks like 2 is way too big, causing the biggest eigenvalue \lambda_3 to not converge at all.

The optimal step size is in fact very close to 2. It is hard to hit the optimum by dragging the slider, but if you click on the text above the arrow (or a little circle on the slider), it will snap to that point for you.
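To see numerically why the optimum sits so close to the edge, here is a small sketch (with illustrative eigenvalues, not the ones in the figure): for gradient descent on a quadratic, each eigencomponent of the error contracts by |1 − αλ_i| per step, so the worst-case rate is minimized at α* = 2/(λ_min + λ_max):

```python
import numpy as np

lam = np.array([0.02, 0.3, 1.0])  # hypothetical eigenvalues

def worst_rate(alpha):
    # Worst per-step contraction factor over all eigencomponents.
    return np.max(np.abs(1.0 - alpha * lam))

# Grid search over step sizes up to the divergence threshold 2/lam_max.
alphas = np.linspace(0.01, 2.0 / lam.max(), 2000)
best = alphas[np.argmin([worst_rate(a) for a in alphas])]
print(best, 2.0 / (lam.min() + lam.max()))  # numerically close
```

At α*, the largest eigenvalue's component contracts by |1 − α*λ_max| ≈ 0.96 per step, which is why λ_3 appears barely to converge in the figure even though the step size is optimal.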

The article introduces a convex quadratic, but doesn't explicitly mention the connection between that and nonnegative eigenvalues. This could be clarified.

This is true, and this discussion should serve as the clarification needed!

@colah


Member

colah commented Apr 5, 2017

Thanks for the comments, Rif!

It looks like there aren't any action items arising from this issue, so I'll close it. Feel free to discuss further despite that or ask me to reopen if there's anything else.

@colah colah closed this Apr 5, 2017

@ehsanmok


ehsanmok commented Apr 6, 2017

For the sake of clarity: a real symmetric matrix has real eigenvalues, and an assumption of positive definiteness of $A$ lies behind the description of unconstrained convex quadratic optimization.
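A small sketch of the connection (the matrices below are illustrative): for symmetric $A$, the quadratic $f(x) = \frac{1}{2}x^T A x - b^T x$ is convex iff all eigenvalues of $A$ are nonnegative, and strictly convex, with a unique minimum, iff they are strictly positive:

```python
import numpy as np

def is_convex_quadratic(A, tol=1e-10):
    # f(x) = 1/2 x^T A x - b^T x is convex iff A's eigenvalues are >= 0.
    eigvals = np.linalg.eigvalsh(A)  # real eigenvalues of a symmetric matrix
    return bool(np.all(eigvals >= -tol))

A_psd = np.array([[2.0, 1.0], [1.0, 2.0]])   # eigenvalues 1 and 3: convex
A_ind = np.array([[1.0, 0.0], [0.0, -1.0]])  # eigenvalues 1 and -1: a saddle
print(is_convex_quadratic(A_psd), is_convex_quadratic(A_ind))  # True False
```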
