IndustRealSim training time and inference performance #167
Hi Konstantin,

Thank you so much for your interest in IndustReal!

Please feel free to let us know if you have any other questions.

Best,
Hi Bingjie,

Indeed, the problem that Konstantin mentions happened to us as well. Before the fix you suggested, we got a 0% insertion success rate (no successes at all during training), and after it about 26% (modern GPU, about 10 hours of training). Is this supposed to happen, or is there some discrepancy?

Best,

P.S. When the results are logged in WandB, you measure insertion_successes/rewards over time/frame/iter, but your reported WandB loggings all share the same time scale. Shouldn't they each have a separate axis, or am I missing something here?
Hi Isidoros, Can you revert my previously suggested fix (i.e. change back to
The difference is that the robot end-effector control target is now calculated under the assumption that the gripper fingers are perfectly mirrored, which aligns with the real-world condition.

Bingjie
Hi Bingjie, thanks for your adjustments. I changed both lines and it turns out these changes work! After 1100 iterations of training, the robot seems able to successfully complete a peg insertion (PFA the corresponding webm). However, the insertion strategy does not seem very reliable or robust when trying other starting positions. Further training seems to lead to divergence, with the policy no longer able to insert the peg after the full 8192 iterations (PFA the corresponding webm). There was already strong degradation starting at 1200 iterations. When you said

Did you mean that you ran the whole 8192 iterations within those 10 hours, or did you terminate at a comparable ~1000 iterations?

Best regards,
Konstantin
Hi Konstantin,

Have you tried the changes suggested in this answer? Normally I stop the policy training when the logged success rate reaches ~85% at the most difficult level of the curriculum (i.e. logged

Best,
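The stopping rule described above can be sketched as a simple loop: stop once the logged success rate clears the target, or fall back to the best checkpoint when it has stopped improving. The function names and the simulated success curve below are illustrative, not the actual IndustReal/rl_games trainer:

```python
def train(success_per_iter, target_success=0.85, patience=200):
    """Early-stop sketch: stop when the logged success rate reaches the
    target, or when it has not improved for `patience` iterations
    (to dodge the PPO divergence reported in this thread)."""
    best_success, best_iter = 0.0, 0
    for it, success in enumerate(success_per_iter):
        if success > best_success:
            best_success, best_iter = success, it  # a checkpoint would be saved here
        if success >= target_success:
            return it, success                     # converged: stop early
        if it - best_iter >= patience:
            return best_iter, best_success         # diverging: fall back to best
    return best_iter, best_success

# Simulated curve: improves, then degrades (as reported in this thread).
curve = [min(0.9, i / 1000) if i < 1100 else max(0.0, 0.9 - (i - 1100) / 500)
         for i in range(3000)]
stop_iter, rate = train(curve)
# stops at iteration 850, where the simulated success rate first hits 0.85
```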
Hi Bingjie,
I have, thanks for sharing this! With these changes, the policy for the peg-hole task already converges to some reliable joining behaviour after just 200 episodes. After that it diverges, before again assembling relatively reliably at around 5000 episodes of training. Do you have an explanation for this very early convergence and the divergence afterwards? Given this low number of episodes to convergence, would it make sense to lower the tolerance between peg and hole?

Also, thanks for this note. But I don't understand the connection between the value of

Best regards,
Hi Konstantin,

For the divergence, does it happen with multiple random seeds? Decreasing the tolerance between peg and hole will definitely increase the difficulty of the task, which may introduce extra challenges during policy learning. Without careful evaluation, we cannot predict whether it would simply be solved by a longer training time.

Best,
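The difficulty trade-off can be pictured with a generic promotion/demotion rule over curriculum levels. This is a sketch only, with illustrative thresholds, not IndustReal's actual sampling-based curriculum:

```python
def update_level(level, success_rate, max_level=6,
                 promote_at=0.8, demote_at=0.1):
    """Advance to a harder level when the rolling success rate clears a
    threshold; back off when it collapses. Thresholds are illustrative."""
    if success_rate >= promote_at and level < max_level:
        return level + 1
    if success_rate <= demote_at and level > 0:
        return level - 1
    return level

# A run that improves, collapses once, then recovers.
level = 0
history = []
for rate in [0.9, 0.85, 0.05, 0.9]:
    level = update_level(level, rate)
    history.append(level)
# history == [1, 2, 1, 2]
```

Tightening the peg/hole tolerance effectively shifts the whole ladder toward harder levels, so the same success thresholds take longer to reach.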
Hi Bingjie,
I just started another run. Is there anything I need to do to use another random seed, or will one be chosen by default for every new training? Another training, where I logged using wandb, showed the same behaviour:

Thanks for the explanation using

Could you provide me with a wandb diagram of one of your successful trainings? Also, here is a graph of the gear insertion task: it does not even reach the next difficulty level of the curriculum after running for more than 2 days. It should use the same replacement as the peg-hole insertion, which you mentioned earlier:

Best regards,
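On seeds: in most RL training stacks the seed is a config value that must be varied explicitly per run rather than being randomized automatically. A stdlib-only sketch of the reproducibility this buys; set_global_seed is a hypothetical helper, and a real setup would also seed numpy, torch, and the simulator:

```python
import random

def set_global_seed(seed: int) -> None:
    # Hypothetical helper: the full setup would also seed numpy, torch,
    # and the simulator; here only Python's own RNG, as a sketch.
    random.seed(seed)

set_global_seed(42)
run_a = [random.random() for _ in range(3)]
set_global_seed(42)
run_b = [random.random() for _ in range(3)]
set_global_seed(7)
run_c = [random.random() for _ in range(3)]
# same seed -> identical stream; different seed -> a genuinely different run
```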
Hi Konstantin,

The issue you are running into could originate from the unstable training of PPO. This paper offers an in-depth analysis of best practices for training with PPO. Early stopping is an easy and effective approach in our case.

Best,
Hi Bingjie,

Thanks for the fix. It indeed allowed the insertion policy to train and, in 8-10 hours, reach performance similar to what you report in the paper for simulation.

Best,

P.S. Of course, if I let it train for longer, I also see the same unstable PPO behaviour that Konstantin reports above, across multiple seeds.
Hi Bingjie, thanks for pointing this out. I tried to train the gear insertion task with the additional functions

But it does not converge at all:

Is there anything else I could try?

Best,
Hi Konstantin,

I am not exactly sure what is happening. One thing you can try is to increase the number of points sampled on the gear mesh with this parameter in IndustRealTaskGearsInsert.yaml. This essentially samples more points on the gear mesh for the SDF reward query, and an increased number of points will better capture the geometry of the gear. I have attached my training plots below (with the default value).

Best,
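Why a higher sample count helps can be illustrated with a toy coverage measure: approximate a circle (a stand-in for the gear profile) with N random surface samples and check the worst-case gap from the true curve to the nearest sample. Purely illustrative and stdlib-only, not the repo's SDF-query code:

```python
import math
import random

def coverage_error(num_points, probes=1000, seed=0):
    """Worst-case distance from probe points on a unit circle to the
    nearest of `num_points` random samples on that circle."""
    rng = random.Random(seed)
    angles = [2 * math.pi * rng.random() for _ in range(num_points)]
    pts = [(math.cos(a), math.sin(a)) for a in angles]
    worst = 0.0
    for i in range(probes):
        t = 2 * math.pi * i / probes
        qx, qy = math.cos(t), math.sin(t)
        nearest = min(math.hypot(qx - x, qy - y) for (x, y) in pts)
        worst = max(worst, nearest)
    return worst

sparse = coverage_error(50)
dense = coverage_error(1000)
# denser sampling leaves smaller gaps along the surface, so an SDF-style
# reward queried at those points "sees" the geometry more faithfully
```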
First of all: Thanks for your very interesting paper about Sim2Real of Contact-Rich Assembly and making the related code public!
I just trained the policy from scratch for the task IndustRealTaskPegsInsert running:
Unlike the roughly 10 hours mentioned in the docs, this took about 50 hours on a modern PC with an RTX 4080.
Evaluating the inference with rendering running:
With the additional setting

num_envs=8

for performance reasons, it seems the insertion policy does not really work yet. PFA one close-up and one more distant video of those inferences.
Best regards,
Konstantin
IndustReal_PiH.webm
IndustReal_PiH_close.webm