Curriculum learning for grasping environment #62

Closed
10 of 23 tasks
AndrejOrsula opened this issue Mar 10, 2021 · 1 comment
Labels
EPIC 🦄 As epic as it can ever get

Comments

@AndrejOrsula
Owner

AndrejOrsula commented Mar 10, 2021

Idea/topic 1: Decouple the entire Grasp task into sub-routines (primitive tasks) and train them individually until fully mastered (success rate and/or reward above a certain value)

Here is a list of examples that could serve as such sub-routines. For each of them, there is a corresponding termination condition for success, as well as sparse and dense reward options (and their alternatives). A minimal sketch of two of these dense rewards is given after the list.

  1. Robot must approach the object
    • Termination (success):
      • Distance between robot tool centre point (TCP) and the closest object is less than a threshold
      • Robot finger(s) collide with the object
    • Reward
      • Sparse reward:
        • Constant positive reward once episode is terminated due to success
      • Dense reward:
        • (relative +-) Positive/negative reward based on how much closer/further the robot TCP is to the closest object compared to the previous step
        • (absolute) Negative distance between robot TCP and closest object
  2. Robot must touch the object
    • Termination (success):
      • Any finger must be in contact with an object
    • Reward
      • Sparse reward:
        • Constant positive reward once episode is terminated due to success
      • Dense reward:
        • ?
  3. Robot must grasp the object
    • Termination (success):
      • Fingers must be in contact with an object (contact normals cannot point in the same direction, i.e. no pushing of object)
      • Same as above, but for X number of consecutive steps
        • This would be preferred; however, the agent currently has no temporal information in its observations
    • Reward
      • Sparse reward:
        • Constant positive reward once episode is terminated due to success
      • Dense reward:
        • ? Reward engineering for this one is quite difficult, not to speak of the quality of the grasp.
          • One could give a small reward if a part of the object's geometry is located between the fingers. Alternatively, one could look at the distance to the object along each finger's actuation direction. A negative reward could also be given for each step the gripper is closed and not contacting any object.
          • For now, sparse reward might be much more descriptive.
  4. Robot must lift the grasped object
    • Termination (success):
      • An object is lifted above a certain height threshold while being in contact with the fingers (contact normals cannot point in the same direction, i.e. no pushing of object)
    • Reward
      • Sparse reward:
        • Constant positive reward once episode is terminated due to success
      • Dense reward:
        • (relative +-) Positive/negative reward based on how much higher/lower an object is compared to the previous step
          • The object must be in contact with fingers.
        • (relative +[-]) Positive/negative reward based on how much higher/lower an object is compared to the previous step
          • The object must be in contact with fingers if it is higher. No negative reward will be given if the object is falling (with no contact).
        • (relative +) Only positive reward based on how much higher an object is compared to the previous step
        • (absolute) Negative distance between robot TCP and closest object
  5. One could continue from this (or the previous) step to other actions, e.g. placing. I am not looking into that within the scope of this project.
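
As referenced above, here is a minimal sketch of how the dense rewards for the approach (1) and lift (4) sub-tasks could be computed. The state container and all names are illustrative only and do not correspond to this project's actual task API.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class GraspStateSketch:
    """Illustrative per-step state; the real task exposes this information differently."""
    tcp_position: np.ndarray      # (3,) robot TCP position
    object_position: np.ndarray   # (3,) position of the closest object
    object_in_contact: bool       # whether any finger is in contact with the object


def approach_reward(prev: GraspStateSketch, curr: GraspStateSketch,
                    relative: bool = True, scale: float = 1.0) -> float:
    """Dense reward for sub-task 1 (approach the object)."""
    curr_dist = float(np.linalg.norm(curr.tcp_position - curr.object_position))
    if relative:
        # (relative +-): reward the decrease in TCP-to-object distance since the previous step
        prev_dist = float(np.linalg.norm(prev.tcp_position - prev.object_position))
        return scale * (prev_dist - curr_dist)
    # (absolute): negative distance between the TCP and the closest object
    return -scale * curr_dist


def lift_reward(prev: GraspStateSketch, curr: GraspStateSketch,
                scale: float = 1.0) -> float:
    """Dense reward for sub-task 4 (lift the grasped object), (relative +[-]) variant."""
    delta_height = float(curr.object_position[2] - prev.object_position[2])
    if curr.object_in_contact:
        # Reward/penalize the height gained/lost while the object is held
        return scale * delta_height
    # No negative reward if the object falls without being in contact
    return 0.0
```

The same relative/absolute split would apply to the touch and grasp sub-tasks, where the sparse success bonus is likely the safer starting point.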

Variant a (Selected): Update termination state and reward function to the next sub-task only after the current sub-task performs well

Start training from the first sub-task, including only the termination and reward for that task. After a certain reward is accumulated (or success rate is achieved), update the termination and reward function to take the next sub-task into account. For the reward, this could be done in a few ways (see the sketch after this list):

  • Use only the reward for the latest objective and rely on the replay buffer and current policy so that the agent keeps reaching the goals of the previous sub-tasks
  • Keep both the previous and the new reward function
  • Keep both the previous and the new reward function, but down-scale the reward of the previous sub-task by some constant factor
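
Below is a minimal sketch of Variant a, assuming per-sub-task reward callables such as the `approach_reward`/`lift_reward` above; the threshold, window and down-scaling values are placeholders rather than settings from this project.

```python
from collections import deque


class RewardCurriculumSketch:
    """Switch to the next sub-task's reward once the current sub-task performs well."""

    def __init__(self, stage_rewards, success_threshold=0.8, window=100,
                 previous_scale=0.1):
        self.stage_rewards = stage_rewards        # one reward callable per sub-task, in order
        self.success_threshold = success_threshold
        self.previous_scale = previous_scale      # 0.0 keeps only the latest reward, 1.0 keeps all
        self.successes = deque(maxlen=window)     # rolling window of episode outcomes
        self.stage = 0

    def report_episode(self, success: bool) -> None:
        """Record an episode outcome and advance to the next sub-task once mastered."""
        self.successes.append(success)
        if (len(self.successes) == self.successes.maxlen
                and sum(self.successes) / len(self.successes) >= self.success_threshold
                and self.stage < len(self.stage_rewards) - 1):
            self.stage += 1
            self.successes.clear()

    def reward(self, prev_state, curr_state) -> float:
        """Reward of the current sub-task plus down-scaled rewards of previous sub-tasks."""
        total = self.stage_rewards[self.stage](prev_state, curr_state)
        for previous in self.stage_rewards[:self.stage]:
            total += self.previous_scale * previous(prev_state, curr_state)
        return total
```

Setting `previous_scale` to 0.0 corresponds to the first option above, 1.0 to the second, and anything in between to the third.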

Variant b: Design separate tasks (environments) - not really curriculum learning, but it might be of interest

In this variant, each sub-task would have its own task (environment) that provides an adequate starting position and termination condition. Once each sub-task reaches a relatively high success rate, save transitions from all sub-tasks into a single replay buffer and use them as demonstrations of each step to train the agent from start to finish. This could also be done in an offline fashion.

The advantage of this approach is the clear separation of the goals that the agent should reach. The disadvantage is that the agent might not be as robust and might not learn the connections/transitions between the sub-tasks correctly.
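
A rough sketch of Variant b under the usual Gym-style step/reset interface; the environment list, the per-task policies and the replay buffer's `add()` signature are all assumptions, not this project's actual classes.

```python
def collect_sub_task_demonstrations(envs, policies, episodes_per_env, replay_buffer):
    """Roll out a (near-)mastered policy in each sub-task environment and pool the transitions.

    The pooled buffer can then serve as demonstrations for training the full grasping
    task from start to finish, or for offline training.
    """
    for env, policy in zip(envs, policies):
        for _ in range(episodes_per_env):
            observation = env.reset()
            done = False
            while not done:
                action = policy(observation)
                next_observation, reward, done, info = env.step(action)
                replay_buffer.add(observation, action, reward, next_observation, done)
                observation = next_observation
```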

Idea/topic 2: Restrict action space (and workspace) of robot and progressively increase it

Restricting the action space could aid exploration because less randomness would be required to reach a rewarding state. The workspace can grow with the success rate / average reward, or simply with the number of environment steps.

The action space can be restricted by:

  • Position goal: using a growing axis-aligned bounding box should do the job
    • Volume in which objects are spawned must be adjusted accordingly
  • Orientation goal (only for full 3D): a top-down orientation is more likely to result in a successful grasp than a bottom-up one

Restricting the gripper action probably does not make much sense; I am not sure what the point would be.
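
A small sketch of the growing axis-aligned bounding box for the position goal; the linear interpolation by a progress value in [0, 1] (driven by success rate, average reward, or step count) is only one assumed way of scheduling the growth.

```python
import numpy as np


def workspace_bounds(centre, initial_half_extents, full_half_extents, progress):
    """Axis-aligned bounding box that grows linearly from the initial to the full workspace."""
    progress = float(np.clip(progress, 0.0, 1.0))
    half_extents = ((1.0 - progress) * np.asarray(initial_half_extents)
                    + progress * np.asarray(full_half_extents))
    return np.asarray(centre) - half_extents, np.asarray(centre) + half_extents


def clamp_position_goal(goal, lower, upper):
    """Clamp the commanded position goal into the current workspace bounds.

    The volume in which objects are spawned should be adjusted with the same bounds.
    """
    return np.clip(np.asarray(goal), lower, upper)
```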

Idea/topic 3: Make the environment progressively more difficult

Apply the randomizer (random objects and ground plane) once the simpler scenario is solved.
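
For illustration only, one way to gate the randomization on performance; the configuration keys are hypothetical and do not correspond to the project's actual randomizer options.

```python
def randomizer_config(success_rate, thresholds=(0.6, 0.8)):
    """Enable more randomization as the success rate on the simpler scenario improves."""
    level = sum(success_rate >= threshold for threshold in thresholds)
    return {
        "random_object_models": level >= 1,  # multiple, random objects
        "random_ground_plane": level >= 2,   # randomized ground plane
    }
```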

@AndrejOrsula
Owner Author

The first two points are addressed by #65.

Making the environment progressively more difficult (multiple, random objects) can be done manually, and it also seems to be the easiest way (better performance, and it is easier to determine when to change).
