
Trajectory optimization not stable #108
Open
yongxf opened this issue Aug 3, 2018 · 1 comment


yongxf commented Aug 3, 2018

Hi there,

Thanks for your excellent code. I am running it with my own MuJoCo model to do peg-in-hole insertion using algorithm_traj_opt only (no neural network yet). The first ~15 iterations look fine and the trajectory is converging.
However, things suddenly get worse after that: the Laplace estimate of the improvement produces a very large value, so the new eta grows very fast, and the program then crashes with a non-PD (non-positive-definite) error.
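(As far as I can tell, the "Laplace estimation" here is the usual second-order approximation of the expected cost under the Gaussian trajectory distribution; the form below is my guess at what is being computed, not copied from the repo:

```latex
% Assumed form of the Laplace estimate of the expected cost under the
% Gaussian state marginal N(mu_t, Sigma_t) at time t.
\mathbb{E}_{x_t \sim \mathcal{N}(\mu_t, \Sigma_t)}\big[c(x_t)\big]
  \approx c(\mu_t) + \tfrac{1}{2}\,\operatorname{tr}\!\big(\Sigma_t \,\nabla_x^2 c(\mu_t)\big)
```

so if the covariance or the cost Hessian blows up, the predicted improvement blows up with it.)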

I checked the iLQR paper, and it does not seem to use any Laplace estimation. Also, Qtt (the combined block of Qxx, Qxu, Quu) has a very different form from the equations in traj_opt_lqr_python.py. The iLQR paper I read is this one: https://homes.cs.washington.edu/~todorov/papers/TassaIROS12.pdf
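For reference, the backward pass in that paper (with the second-order dynamics terms dropped, as iLQR does) expands the Q-function as

```latex
% Backward-pass expansion from Tassa et al., IROS 2012 (iLQR variant,
% i.e. without the f_{xx}, f_{ux}, f_{uu} terms of full DDP).
Q_x    = \ell_x    + f_x^\top V'_x, \qquad
Q_u    = \ell_u    + f_u^\top V'_x, \\
Q_{xx} = \ell_{xx} + f_x^\top V'_{xx} f_x, \qquad
Q_{ux} = \ell_{ux} + f_u^\top V'_{xx} f_x, \qquad
Q_{uu} = \ell_{uu} + f_u^\top V'_{xx} f_u, \\
k = -Q_{uu}^{-1} Q_u, \qquad K = -Q_{uu}^{-1} Q_{ux},
```

which I could not map directly onto the joint Qtt block (with the eta terms) used in the code.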

Could you point me to the paper behind the Laplace estimation, and the paper whose iLQR formulation you implemented? Appreciate it!


yongxf commented Aug 4, 2018

The instability of iLQR comes from the eta update.
Eta penalizes one of the KL-divergence terms, and it is tuned by comparing kl_div with kl_step.
The problem is:

  1. When the MC cost increases ==> new_mult < 1 ==> step decreases (the actual improvement becomes much smaller than the predicted improvement, so the algorithm tries to reduce the step size).
  2. Step decreases ==> con > 0 (since kl_step = step * kl_base, the theoretical bound becomes stricter; you refer to kl_step in the code as epsilon, which is not correct, since epsilon controls the other KL-divergence term).
  3. con > 0 ==> eta increases (the stricter KL constraint means the current KL divergence now violates it, so more penalty is added, i.e. eta increases). A minimal sketch of this loop follows the list.
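To make the loop concrete, here is a minimal sketch of points 1-3; the names (step_mult, kl_base, eta) and the exact new_mult formula are my guesses at the structure, not code copied from traj_opt_lqr_python.py:

```python
# Hedged sketch of the step/eta feedback loop described in points 1-3.
# All names and formulas are assumptions, not the actual repo code.

def update_step_mult(predicted_impr, actual_impr, step_mult,
                     min_mult=0.1, max_mult=5.0):
    """Point 1: when the MC cost rises, actual_impr drops far below the
    Laplace-predicted improvement, new_mult < 1, and the step shrinks."""
    new_mult = predicted_impr / (2.0 * max(1e-4, predicted_impr - actual_impr))
    new_mult = min(max(new_mult, min_mult), max_mult)
    return new_mult * step_mult

def update_eta(kl_div, step_mult, kl_base, eta, eta_growth=2.0):
    """Points 2-3: kl_step = step_mult * kl_base shrinks with the step, so
    con = kl_div - kl_step turns positive and eta (the KL penalty) grows."""
    kl_step = step_mult * kl_base
    con = kl_div - kl_step
    if con > 0:
        eta *= eta_growth
    return eta

# Net effect: a bad rollout (cost increase) tightens the KL budget and
# inflates eta instead of relaxing anything, which is what I find
# unreasonable in the summary below.
```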

In summary:
when actual cost increases ==> penalization of KL divergence increases.

This is not reasonable, since putting more effort into the KL-divergence term makes the loss term even larger. After several iterations the robot was flailing around wildly.

The first several iterations are normal, though. I guess the scaling of the improvement in the new_mult calculation matters.

Any comment on this?
