I seriously doubt the rationality of the method used in the article. #1

Closed
chillybird opened this issue Jun 29, 2022 · 6 comments

@chillybird

First, only the reference motion of the thigh joints is used in the experiment. Given the characteristics of reinforcement learning, valid multi-gait motion cannot be obtained this way.

Second, the CPG signal is not used as an input to the policy network in the experiment; only the phase parameter is fed in. A one-hot encoding alone is not enough for the policy network to successfully learn different gaits.
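To make concrete what I mean (a rough sketch with illustrative names, not code from this repository), a policy observation built from only a one-hot gait label looks like this and carries no information about the oscillator phase itself:

```python
import numpy as np

# Hypothetical observation construction: the robot state is concatenated
# with a one-hot gait label only, not with the continuous CPG state.
GAITS = ["trot", "pace", "bound"]

def build_observation(robot_state: np.ndarray, gait: str) -> np.ndarray:
    one_hot = np.zeros(len(GAITS))
    one_hot[GAITS.index(gait)] = 1.0
    # The one-hot label is constant within an episode; it tells the policy
    # which gait is requested but nothing about the oscillator phase.
    return np.concatenate([robot_state, one_hot])
```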

What further supports this is that the URDF file is not provided with the code, and running the code in the repository does not reproduce the results shown in the article.

@chillybird
Author

Therefore, I do not recommend using the method from this article. Moreover, a multi-gait hierarchical learning framework was already proposed in 2001 (Decentralized Autonomous Control of a Quadruped Locomotion Robot, http://www.ynl.t.u-tokyo.ac.jp/project/RobotBrainCREST/publications/pdf/tsuchiya/3_05.pdf). This article does not offer any innovation.

@awesomericky
Owner

Thank you for your comments on this previous project of mine. However, I am not quite getting your points.

First, what do you mean when you say that using only the reference motions of the thigh joints is not appropriate given the characteristics of RL? When we train an RL-based locomotion policy, reference trajectories are often used as behavior priors to facilitate learning and generate more natural motions. This concept is called PMTG (Iscen et al.) and has been used in many impressive works (Lee et al., Miki et al.). In those papers, the behavior priors were given as foot trajectories. However, I thought giving behavior priors as thigh joint trajectories could be another choice, and that is what I tried in this project.
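As a rough sketch of this idea (illustrative values only, not the exact code of this repository), the policy output is treated as a residual on top of a periodic thigh-joint reference:

```python
import numpy as np

# Minimal PMTG-style sketch: a periodic thigh-joint reference acts as a
# behavior prior, and the policy only adds a small residual on top of it.
# Amplitude, frequency, and the residual scale are illustrative values.
def thigh_reference(t: float, phase_offset: float,
                    amplitude: float = 0.4, frequency: float = 1.5) -> float:
    return amplitude * np.sin(2.0 * np.pi * frequency * t + phase_offset)

def thigh_joint_target(t: float, phase_offset: float, policy_residual: float) -> float:
    # Policy search happens in the residual space, so exploration stays
    # close to the periodic prior.
    return thigh_reference(t, phase_offset) + 0.2 * policy_residual
```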

Second, what do you mean by the URDF? The URDF file is the open-source description of the robot used in the project. I used the file provided in the raisim simulator GitHub (link). As you mention, the neural network architecture has some limitations and could be improved by giving the current CPG state as an input rather than modulating it at the policy output. The policy would then generalize better to various cases (e.g. abrupt phase changes). However, in this work, we did not consider those cases.
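For example, the alternative I am describing would look roughly like this (a sketch assuming a (sin, cos) phase encoding, not the implemented code):

```python
import numpy as np

# Sketch of the alternative architecture: the per-leg CPG phases are
# appended to the policy observation as (sin, cos) features, so the policy
# can react to the current phase (e.g. an abrupt phase change) directly.
def observation_with_cpg_state(robot_state: np.ndarray,
                               leg_phases: np.ndarray) -> np.ndarray:
    cpg_features = np.concatenate([np.sin(leg_phases), np.cos(leg_phases)])
    return np.concatenate([robot_state, cpg_features])
```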

Finally, about the novelty of the work and the paper you pointed out: there are many works that use CPGs with model-based control to generate multiple gaits (one of them is the paper you linked). There are also many works that use predefined contact schedules with a model-predictive controller to generate various gaits. However, at the time I was working on this project, there were, to the best of my knowledge, not many works showing that multiple gaits can be learned by an RL-based controller with simple behavior priors. Recently there have been some nice works with different approaches, such as this paper.

@chillybird
Author


Thank you for your reply.

In the paper I linked, the Hopf oscillator is used as the CPG signal, but in your paper only a sine function is used as the CPG signal for the thigh, so the gait is not fully constrained. As a result, when I run the experiment with your code, all gaits converge to the same gait.
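To illustrate the difference I mean (a rough sketch, not code from either paper), compare a plain sine reference with a Hopf oscillator integrated over time:

```python
import numpy as np

# Plain sine reference: fixed amplitude and frequency, no internal dynamics.
def sine_cpg(t: float, omega: float = 2.0 * np.pi, phase: float = 0.0) -> float:
    return np.sin(omega * t + phase)

# One Euler step of a Hopf oscillator: the state (x, y) converges to a limit
# cycle of radius sqrt(mu), so the amplitude recovers after perturbations and
# several oscillators can be coupled through their phases. Parameters are
# illustrative only.
def hopf_step(x: float, y: float, dt: float, mu: float = 1.0,
              alpha: float = 10.0, omega: float = 2.0 * np.pi):
    r2 = x * x + y * y
    dx = alpha * (mu - r2) * x - omega * y
    dy = alpha * (mu - r2) * y + omega * x
    return x + dx * dt, y + dy * dt
```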

On the other hand, from the perspective of the code implementation, you redefined the dynamics of Laikago in the URDF file but did not include it in the code repository, which is unreasonable.

@chillybird
Author


If your code implementation is correct, then to obtain a conservative policy for any gait the feet of all legs should stay close to the ground; but in the video results you recorded, that is not the case.

@awesomericky
Owner

awesomericky commented Jul 9, 2022

Got your points. I am not an expert in CPG signals and the Hopf oscillator; however, parameterizing the action space with behavior priors (in this work, sine functions for the thigh joint angles) and performing policy search in that domain does help RL find the behaviors we want, such as multiple gait patterns. As you mentioned, the method does not completely constrain the gaits. Thus, poorly tuned reward functions and reward weights (and sometimes action exploration terms) may lead to a single gait. However, using the reward functions and reward weights given in the paper appendix ('cost' terminology was used in the paper rather than 'reward'), I could successfully learn multiple gaits just by changing the phase differences between legs, as shown in the video.
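As a minimal sketch of what "changing the phase differences between legs" means here (the offsets below are the common textbook ones, not necessarily the exact values used in this project):

```python
import numpy as np

# Illustrative phase offsets (radians) between legs, assuming the leg order
# LF, RF, LH, RH. These are standard textbook offsets, not necessarily the
# exact values used in this project.
GAIT_PHASE_OFFSETS = {
    "trot":  np.array([0.0, np.pi, np.pi, 0.0]),   # diagonal legs in phase
    "pace":  np.array([0.0, np.pi, 0.0, np.pi]),   # lateral legs in phase
    "bound": np.array([0.0, 0.0, np.pi, np.pi]),   # front/hind pairs in phase
}

def thigh_priors(t: float, gait: str,
                 amplitude: float = 0.4, omega: float = 2.0 * np.pi) -> np.ndarray:
    # The same sine prior for every leg, shifted by the gait-specific offsets.
    return amplitude * np.sin(omega * t + GAIT_PHASE_OFFSETS[gait])
```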

Sorry for the confusion caused by not including the URDF in the code repository. It was located in the parent folder of the current repository, following the raisim folder structure. The open-source Laikago URDF was used. To focus only on moving forward, I changed the types of the four hip joints (i.e. LF, LH, RF, RH) from 'revolute' to 'fixed'. Thus, the number of revolute joints of the robot, as can be checked in raisim Unity, was 8 (not 12).

Regarding the conservative gait strategy, do you mean torque usage? The swing leg trajectories do not need to stay close to the ground for all gaits, because our objective is not just minimizing torque usage. As you can check in the implementation and the paper appendix, there are several reward functions, including both torque rewards and foot clearance rewards. The torque reward minimizes torque usage, while the foot clearance reward lets the swing leg trajectory reach a sufficient maximum height above the ground (the desired height is set as a hyperparameter). Summing them up with appropriate weights results in locomotion that shows both suitable torque usage and foot clearance from the ground.
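Schematically, the combination looks like the following (weights and the desired clearance are placeholders; the actual values are listed in the paper appendix):

```python
import numpy as np

# Schematic combination of the two terms discussed above. Weights and the
# desired foot clearance are placeholders, not the values from the paper.
def locomotion_reward(joint_torques: np.ndarray,
                      swing_foot_heights: np.ndarray,
                      desired_clearance: float = 0.15,
                      w_torque: float = 1e-4,
                      w_clearance: float = 1.0) -> float:
    torque_term = -w_torque * float(np.sum(joint_torques ** 2))
    clearance_term = -w_clearance * float(
        np.sum((swing_foot_heights - desired_clearance) ** 2))
    return torque_term + clearance_term
```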

@chillybird
Author

chillybird commented Oct 11, 2022 via email
