Why is Adam much slower than sgd? #1516
Comments
Also, Adam takes about 2 GB more memory than ccsgd. In my case, ccsgd only takes 2.5 GB, but Adam takes 4.5 GB, sometimes nearly 5 GB. Is there any way this can be optimised, or is it supposed to be like this?
ccsgd saves memory because it is implemented in C++ and doesn't allocate any temporary memory. You can implement Adam in C++ to get similar performance.
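For reference, here is a minimal sketch of what a fused, in-place Adam update could look like in C++. This is only an illustration under assumed names, signatures, and default hyperparameters, not the actual MXNet kernel; the point is that all work happens in-place over the weight, gradient, and state buffers, with no temporary allocations:

```cpp
#include <cmath>
#include <cstddef>

// Hypothetical fused Adam update (illustrative only): updates weights and
// the two per-weight moment buffers in place, without temporary arrays —
// the same property that keeps ccsgd cheap.
void adam_update(float* weight, const float* grad,
                 float* m, float* v,            // per-weight moment buffers
                 std::size_t n, int t,          // number of weights, step count
                 float lr = 1e-3f, float beta1 = 0.9f,
                 float beta2 = 0.999f, float eps = 1e-8f) {
  const float bc1 = 1.0f - std::pow(beta1, static_cast<float>(t));
  const float bc2 = 1.0f - std::pow(beta2, static_cast<float>(t));
  for (std::size_t i = 0; i < n; ++i) {
    m[i] = beta1 * m[i] + (1.0f - beta1) * grad[i];                 // first moment
    v[i] = beta2 * v[i] + (1.0f - beta2) * grad[i] * grad[i];       // second moment
    const float m_hat = m[i] / bc1;                                 // bias correction
    const float v_hat = v[i] / bc2;
    weight[i] -= lr * m_hat / (std::sqrt(v_hat) + eps);
  }
}
```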
The Stanford CS class CS231n: Convolutional Neural Networks for Visual Recognition compared various optimization algorithms and concluded that "in practice Adam is currently recommended as the default algorithm to use, and often works slightly better than RMSProp". It is definitely worthwhile to implement Adam in C++.
AdamUpdate is now available in C++. Closing this for now.
Could you help me and point me to some examples of Adam implemented in C++? I'm looking at the one in MLPACK, but it really isn't clear to me.
Internally, Adam needs two extra state variables (the first- and second-moment estimates) for each weight. Therefore, each update requires more operations than plain gradient descent, and the training time per sample increases; see the sketch below.
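For comparison with the Adam sketch above, here is what a plain SGD step looks like under the same assumed layout (again just an illustration, not the actual ccsgd kernel). SGD touches only the weight and gradient, with one multiply-add per weight and no extra state, whereas Adam also reads and writes the two moment buffers and performs extra multiplies, a square root, and a divide per weight:

```cpp
#include <cstddef>

// Plain SGD update (illustrative only): no optimizer state beyond the
// weights themselves, and one multiply-add per weight.
void sgd_update(float* weight, const float* grad,
                std::size_t n, float lr = 0.01f) {
  for (std::size_t i = 0; i < n; ++i) {
    weight[i] -= lr * grad[i];
  }
}
```

Roughly speaking, Adam keeps two extra float buffers of the same size as the weights, which is consistent with it using a few gigabytes more memory than ccsgd for a large model, as reported above.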
I tried using Adam in place of SGD. I changed nothing but the optimiser, and the speed dropped dramatically, which was confusing. With SGD I get 0.6 samples per second (I use a large input for FCN); with Adam I get 0.4 samples per second. I don't know if this makes sense, since Adam's complexity doesn't seem high enough to cause such a large difference in speed. Any idea?