Thanks for sharing. I found that in the 2-D simulation experiment the learning rate (injected Gaussian noise level) is kept constant, which doesn't satisfy Assumption 1 in your AAAI '16 paper. In previous work, e.g. Welling 2011, a polynomial decay scheme is applied instead. Will this be a problem?
In theory, the learning rate decay guarantees the asymptotic convergence. In practice, its implementation varies case by case.
Depending on your purpose, the answer can be different.
(1) To validate the theory, or pursue better performance, the decay scheme may help.
(2) To see the performance difference between SGLD and pSGLD, I don't think the decay scheme matters much: pSGLD gives better samples by adaptively adjusting the learning rate. That said, a more rigorous comparison would run both algorithms with the same learning rate decay; I wish I had done that in the first place.
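For concreteness, here is a minimal sketch (not the repository's code) of what such a comparison could look like: SGLD and pSGLD run under the *same* polynomial schedule eps_t = a * (b + t)^(-gamma) from Welling 2011, on a toy 2-D Gaussian target where the gradient of the log-density is available in closed form. The pSGLD update uses the RMSprop-style diagonal preconditioner from the AAAI '16 paper, but omits the Gamma(theta) correction term for brevity; all hyperparameter values below are illustrative assumptions.

```python
import numpy as np

def poly_decay(a, b, gamma, t):
    """Polynomial step-size schedule eps_t = a * (b + t)^(-gamma)."""
    return a * (b + t) ** (-gamma)

def grad_log_p(theta, cov_inv):
    """Gradient of log N(0, cov) at theta."""
    return -cov_inv @ theta

def sgld(theta0, cov_inv, n_steps, a=0.1, b=10.0, gamma=0.55, seed=0):
    rng = np.random.default_rng(seed)
    theta = theta0.copy()
    samples = []
    for t in range(n_steps):
        eps = poly_decay(a, b, gamma, t)
        noise = rng.normal(size=theta.shape) * np.sqrt(eps)
        theta = theta + 0.5 * eps * grad_log_p(theta, cov_inv) + noise
        samples.append(theta.copy())
    return np.array(samples)

def psgld(theta0, cov_inv, n_steps, a=0.1, b=10.0, gamma=0.55,
          alpha=0.99, lam=1e-5, seed=0):
    """pSGLD with an RMSprop-style diagonal preconditioner,
    omitting the Gamma(theta) correction term for simplicity."""
    rng = np.random.default_rng(seed)
    theta = theta0.copy()
    v = np.zeros_like(theta)  # running average of squared gradients
    samples = []
    for t in range(n_steps):
        eps = poly_decay(a, b, gamma, t)
        g = grad_log_p(theta, cov_inv)
        v = alpha * v + (1 - alpha) * g * g
        G = 1.0 / (lam + np.sqrt(v))  # diagonal preconditioner
        noise = rng.normal(size=theta.shape) * np.sqrt(eps * G)
        theta = theta + 0.5 * eps * G * g + noise
        samples.append(theta.copy())
    return np.array(samples)

cov = np.array([[1.0, 0.9], [0.9, 1.0]])  # strongly correlated target
cov_inv = np.linalg.inv(cov)
theta0 = np.array([3.0, -3.0])
s1 = sgld(theta0, cov_inv, 5000)
s2 = psgld(theta0, cov_inv, 5000)
```

Since both chains share the identical eps_t schedule, any difference in mixing along the correlated direction is attributable to the preconditioner rather than the step-size policy.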
I tried pSGLD on the first experiment in [Welling 2011], which has 2 modes in the posterior and two strongly negatively correlated variables. I get proper posterior samples with a similar learning rate annealing scheme.
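For anyone who wants to reproduce that target, here is a minimal sketch of the setup from [Welling 2011] with plain SGLD and polynomial annealing: priors theta1 ~ N(0, 10), theta2 ~ N(0, 1), and data x_i ~ 0.5*N(theta1, 2) + 0.5*N(theta1 + theta2, 2) with true theta = (0, 1) and N = 100. The minibatch size and schedule constants below are my own illustrative choices, not the values from either paper.

```python
import numpy as np

rng = np.random.default_rng(0)
N, sx2 = 100, 2.0
true_theta = np.array([0.0, 1.0])
comp = rng.random(N) < 0.5  # mixture component of each data point
x = np.where(comp, rng.normal(true_theta[0], np.sqrt(sx2), N),
                   rng.normal(true_theta.sum(), np.sqrt(sx2), N))

def grad_log_post(theta, xb, scale):
    """Gradient of log prior + (N / batch)-scaled minibatch log-likelihood."""
    t1, t2 = theta
    d1, d2 = xb - t1, xb - (t1 + t2)
    # Unnormalized component densities; normalizers cancel in the ratio.
    p1 = 0.5 * np.exp(-0.5 * d1**2 / sx2)
    p2 = 0.5 * np.exp(-0.5 * d2**2 / sx2)
    w2 = p2 / (p1 + p2)  # responsibility of the second component
    g1 = ((1 - w2) * d1 + w2 * d2) / sx2  # d log-lik / d theta1
    g2 = (w2 * d2) / sx2                  # d log-lik / d theta2
    grad_prior = np.array([-t1 / 10.0, -t2 / 1.0])
    return grad_prior + scale * np.array([g1.sum(), g2.sum()])

theta = np.zeros(2)
batch = 10
samples = []
for t in range(10000):
    eps = 0.05 * (10.0 + t) ** (-0.55)  # polynomial annealing
    xb = x[rng.choice(N, batch, replace=False)]
    g = grad_log_post(theta, xb, N / batch)
    theta = theta + 0.5 * eps * g + rng.normal(size=2) * np.sqrt(eps)
    samples.append(theta.copy())
samples = np.array(samples)
```

The posterior has modes near (0, 1) and (1, -1), which is what makes it a useful check: a sampler with a poorly tuned step size tends to get stuck in one mode.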
Thanks.