I seriously doubt the rationality of the method used in the article. #1

Closed
chillybird opened this issue Jun 29, 2022 · 6 comments

@chillybird

First, only the reference motion of the thigh joints is used in the experiment. Given the characteristics of reinforcement learning, valid multi-gait motion cannot be obtained this way.

Second, the CPG signal is not used as an input to the policy network in the experiment; only the phase parameter is fed in. A one-hot encoding alone is not enough for the policy network to successfully learn different gaits.
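To make concrete what I mean (a rough sketch with illustrative names, not code from this repository), a policy observation built from only a one-hot gait label looks like this and carries no information about the oscillator phase itself:

```python
import numpy as np

# Hypothetical observation construction: the robot state is concatenated
# with a one-hot gait label only, not with the continuous CPG state.
GAITS = ["trot", "pace", "bound"]

def build_observation(robot_state: np.ndarray, gait: str) -> np.ndarray:
    one_hot = np.zeros(len(GAITS))
    one_hot[GAITS.index(gait)] = 1.0
    # The one-hot label is constant within an episode; it tells the policy
    # which gait is requested but nothing about the oscillator phase.
    return np.concatenate([robot_state, one_hot])
```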

What further supports this is that the URDF file is not provided with the code, and running the code in the repository does not reproduce the results shown in the article.

@chillybird
Author

Therefore, I do not recommend using the method from this article. Moreover, a multi-gait hierarchical learning framework was already proposed in 2001 (Decentralized Autonomous Control of a Quadruped Locomotion Robot, http://www.ynl.t.u-tokyo.ac.jp/project/RobotBrainCREST/publications/pdf/tsuchiya/3_05.pdf). This article does not offer any innovation.

@awesomericky
Owner

Thank you for your comments on this previous project of mine. However, I am not quite getting your points.

First, what do you mean when you say that using only the reference motions of the thigh joints is not appropriate given the characteristics of RL? When we train an RL-based locomotion policy, reference trajectories are often used as behavior priors to facilitate learning and generate more natural motions. This concept is called PMTG (Iscen et al.) and has been used in many impressive works (Lee et al., Miki et al.). In those papers, the behavior priors were given as foot trajectories. However, I thought giving behavior priors as thigh joint trajectories could be another choice, and that is what I tried in this project.
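As a rough sketch of this idea (illustrative values only, not the exact code of this repository), the policy output is treated as a residual on top of a periodic thigh-joint reference:

```python
import numpy as np

# Minimal PMTG-style sketch: a periodic thigh-joint reference acts as a
# behavior prior, and the policy only adds a small residual on top of it.
# Amplitude, frequency, and the residual scale are illustrative values.
def thigh_reference(t: float, phase_offset: float,
                    amplitude: float = 0.4, frequency: float = 1.5) -> float:
    return amplitude * np.sin(2.0 * np.pi * frequency * t + phase_offset)

def thigh_joint_target(t: float, phase_offset: float, policy_residual: float) -> float:
    # Policy search happens in the residual space, so exploration stays
    # close to the periodic prior.
    return thigh_reference(t, phase_offset) + 0.2 * policy_residual
```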

Second, what do you mean by the URDF? The URDF file is the open-source description of the robot used in the project. I used the file provided in the raisim simulator GitHub (link). As you mention, the neural network architecture has some limitations and could be improved by giving the current CPG state as an input rather than modulating it at the policy output. The policy would then generalize better to various cases (e.g. abrupt phase changes). However, in this work, we did not consider those cases.
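For example, the alternative I am describing would look roughly like this (a sketch assuming a (sin, cos) phase encoding, not the implemented code):

```python
import numpy as np

# Sketch of the alternative architecture: the per-leg CPG phases are
# appended to the policy observation as (sin, cos) features, so the policy
# can react to the current phase (e.g. an abrupt phase change) directly.
def observation_with_cpg_state(robot_state: np.ndarray,
                               leg_phases: np.ndarray) -> np.ndarray:
    cpg_features = np.concatenate([np.sin(leg_phases), np.cos(leg_phases)])
    return np.concatenate([robot_state, cpg_features])
```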

Finally, about the novelty of the work and the paper you pointed out: there are many works that use CPGs with model-based control to generate multiple gaits (one of them is the paper you linked). There are also many works that use predefined contact schedules with a model-predictive controller to generate various gaits. However, at the time I was working on this project, there were, to the best of my knowledge, not many works showing that multiple gaits can be learned by an RL-based controller with simple behavior priors. Recently there have been some nice works with different approaches, such as this paper.

@chillybird
Author


Thank you for your reply.

In the paper I linked, the Hopf oscillator is used as the CPG signal, but in your paper only a sine function is used as the CPG signal for the thigh, so the gait is not fully constrained. As a result, when I run the experiment with your code, all gaits converge to the same gait.
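To illustrate the difference I mean (a rough sketch, not code from either paper), compare a plain sine reference with a Hopf oscillator integrated over time:

```python
import numpy as np

# Plain sine reference: fixed amplitude and frequency, no internal dynamics.
def sine_cpg(t: float, omega: float = 2.0 * np.pi, phase: float = 0.0) -> float:
    return np.sin(omega * t + phase)

# One Euler step of a Hopf oscillator: the state (x, y) converges to a limit
# cycle of radius sqrt(mu), so the amplitude recovers after perturbations and
# several oscillators can be coupled through their phases. Parameters are
# illustrative only.
def hopf_step(x: float, y: float, dt: float, mu: float = 1.0,
              alpha: float = 10.0, omega: float = 2.0 * np.pi):
    r2 = x * x + y * y
    dx = alpha * (mu - r2) * x - omega * y
    dy = alpha * (mu - r2) * y + omega * x
    return x + dx * dt, y + dy * dt
```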

On the other hand, from the perspective of the code implementation, you redefined the dynamics of Laikago in the URDF file but did not include it in the code repository, which is unreasonable.

@chillybird
Author


If your code implementation is correct, then to obtain a conservative policy for any gait the feet of all legs should stay close to the ground; but in the video results you recorded, that is not the case.

@awesomericky
Owner

awesomericky commented Jul 9, 2022

Got your points. I am not an expert in CPG signals and the Hopf oscillator; however, parameterizing the action space with behavior priors (in this work, sine functions for the thigh joint angles) and performing policy search in that domain does help RL find the behaviors we want, such as multiple gait patterns. As you mentioned, the method does not completely constrain the gaits. Thus, poorly tuned reward functions and reward weights (and sometimes action exploration terms) may lead to a single gait. However, using the reward functions and reward weights given in the paper appendix ('cost' terminology was used in the paper rather than 'reward'), I could successfully learn multiple gaits just by changing the phase differences between legs, as shown in the video.
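As a minimal sketch of what "changing the phase differences between legs" means here (the offsets below are the common textbook ones, not necessarily the exact values used in this project):

```python
import numpy as np

# Illustrative phase offsets (radians) between legs, assuming the leg order
# LF, RF, LH, RH. These are standard textbook offsets, not necessarily the
# exact values used in this project.
GAIT_PHASE_OFFSETS = {
    "trot":  np.array([0.0, np.pi, np.pi, 0.0]),   # diagonal legs in phase
    "pace":  np.array([0.0, np.pi, 0.0, np.pi]),   # lateral legs in phase
    "bound": np.array([0.0, 0.0, np.pi, np.pi]),   # front/hind pairs in phase
}

def thigh_priors(t: float, gait: str,
                 amplitude: float = 0.4, omega: float = 2.0 * np.pi) -> np.ndarray:
    # The same sine prior for every leg, shifted by the gait-specific offsets.
    return amplitude * np.sin(omega * t + GAIT_PHASE_OFFSETS[gait])
```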

Sorry for the confusion caused by not including the URDF in the code repository. It was located in the parent folder of the current repository, following the raisim folder structure. The open-source Laikago URDF was used. To focus only on moving forward, I changed the types of the four hip joints (i.e. LF, LH, RF, RH) from 'revolute' to 'fixed'. Thus, the number of revolute joints of the robot, as can be checked in raisim Unity, was 8 (not 12).

Regarding the conservative gait strategy, do you mean torque usage? The swing leg trajectories do not need to stay close to the ground for all gaits, because our objective is not just minimizing torque usage. As you can check in the implementation and the paper appendix, there are several reward functions, including both torque rewards and foot clearance rewards. The torque reward minimizes torque usage, while the foot clearance reward lets the swing leg trajectory reach a sufficient maximum height above the ground (the desired height is set as a hyperparameter). Summing them up with appropriate weights results in locomotion that shows both suitable torque usage and foot clearance from the ground.
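Schematically, the combination looks like the following (weights and the desired clearance are placeholders; the actual values are listed in the paper appendix):

```python
import numpy as np

# Schematic combination of the two terms discussed above. Weights and the
# desired foot clearance are placeholders, not the values from the paper.
def locomotion_reward(joint_torques: np.ndarray,
                      swing_foot_heights: np.ndarray,
                      desired_clearance: float = 0.15,
                      w_torque: float = 1e-4,
                      w_clearance: float = 1.0) -> float:
    torque_term = -w_torque * float(np.sum(joint_torques ** 2))
    clearance_term = -w_clearance * float(
        np.sum((swing_foot_heights - desired_clearance) ** 2))
    return torque_term + clearance_term
```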

@chillybird
Author

chillybird commented Oct 11, 2022 via email
