
In-hand manipulation with proximal policy optimization (PPO)

This project trains the Allegro hand for in-hand manipulation in simulation. The robotic Allegro hand is modeled on human anatomy, and its fingertips measure haptic pressure via tactile sensing. The real hand is actuated with fluids, which are hard to model; this motivates model-free reinforcement learning. In this project the Allegro hand learns to find a stabilizing grasp for a small bar. The project defines two experimental setups. The first one aims to find a reward function definition that enables training progression. The second setup simulates a trajectory optimization in the form of noise.

framework overview

  • tensorflow 1.x for machine learning
  • pybullet for the Allegro hand simulation
  • gym(.Env) as the superclass for the hand environments (a minimal skeleton is sketched below)
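As a rough, illustrative skeleton (not the project's actual class), a hand environment built on gym.Env and pybullet could look like this; the observation size is a placeholder and only the 16-joint action dimension reflects the Allegro hand:

```python
import gym
import numpy as np
from gym import spaces


class AllegroHandSketch(gym.Env):
    """Illustrative gym.Env skeleton; allegro_env defines the real spaces and pybullet wiring."""

    def __init__(self):
        # 16 actuated joints on the Allegro hand; the observation size is a placeholder.
        self.action_space = spaces.Box(low=-1.0, high=1.0, shape=(16,), dtype=np.float32)
        self.observation_space = spaces.Box(low=-np.inf, high=np.inf, shape=(32,), dtype=np.float32)

    def reset(self):
        # Would reset the pybullet world and sample from the initial state distribution.
        return np.zeros(self.observation_space.shape, dtype=np.float32)

    def step(self, action):
        # Would apply joint targets in pybullet and compute the grasp reward.
        obs = np.zeros(self.observation_space.shape, dtype=np.float32)
        reward, done, info = 0.0, False, {}
        return obs, reward, done, info
```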

packages overview

ausy_base:

  1. Learning script
  2. PPO implementation (its clipped objective is sketched after this list)
  3. Policy and value function model
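For reference, the heart of PPO is the clipped surrogate objective. Below is a minimal TF 1.x-style sketch of that loss; the placeholder tensors and the clipping range of 0.2 are assumptions, and the ausy_base implementation may structure this differently (the log-probabilities would normally come from the policy model rather than placeholders).

```python
import tensorflow.compat.v1 as tf

tf.disable_v2_behavior()

clip_eps = 0.2  # assumed clipping range

advantage = tf.placeholder(tf.float32, [None], name="advantage")
log_prob_new = tf.placeholder(tf.float32, [None], name="log_prob_new")  # from the current policy
log_prob_old = tf.placeholder(tf.float32, [None], name="log_prob_old")  # from the data-gathering policy

ratio = tf.exp(log_prob_new - log_prob_old)
clipped_ratio = tf.clip_by_value(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
# PPO maximizes the elementwise minimum of the clipped and unclipped terms.
surrogate_loss = -tf.reduce_mean(tf.minimum(ratio * advantage, clipped_ratio * advantage))
```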

allegro_pybullet:

  1. Meshes for the hand and the grasping object
  2. Pybullet interface and manipulation interface for the hand (loading is sketched after this list)
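As a hedged illustration, loading a hand model into pybullet follows the usual URDF workflow; the file name, pose, and control targets below are assumptions rather than the package's actual interface.

```python
import pybullet as p

client = p.connect(p.DIRECT)  # headless; use p.GUI for a rendered window
p.setGravity(0, 0, -9.81)

# Hypothetical URDF path; allegro_pybullet ships its own meshes.
hand_id = p.loadURDF("allegro_hand.urdf", basePosition=[0, 0, 0.2], useFixedBase=True)

num_joints = p.getNumJoints(hand_id)
# Drive all joints towards a target configuration and advance the simulation.
p.setJointMotorControlArray(hand_id, list(range(num_joints)), p.POSITION_CONTROL,
                            targetPositions=[0.0] * num_joints)
p.stepSimulation()
```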

hand_env:

  1. Gym classes for the Allegro hand (they define the state and action spaces; a rollout sketch follows this list)
    1. Grasp-learning hand (allegro_env)
    2. Grasp-learning hand under the influence of noise (noisy_hand)
    3. Grasp-learning hand which stacks a previously trained policy (trained_env)
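The environments follow the standard gym rollout interface; a sketch (the import path and class name are assumptions about the package layout):

```python
from hand_env.allegro_env import AllegroEnv  # hypothetical import path

env = AllegroEnv()                              # grasp-learning hand
obs = env.reset()
for _ in range(200):
    action = env.action_space.sample()          # random policy as a stand-in
    obs, reward, done, info = env.step(action)  # old gym API: 4-tuple return
    if done:
        obs = env.reset()
env.close()
```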

setting_utils:

  1. controls the experiment parameters
  2. holds the positions of the initial state distribution
  3. defines the reward functions (a sketch in the spirit of weighted_or_dist follows this list)
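Purely as an illustration of a weighted orientation/distance reward in the spirit of weighted_or_dist (the terms and weights are assumptions; the real definitions live in setting_utils):

```python
import numpy as np


def weighted_or_dist_sketch(bar_pos, bar_orn, target_pos, target_orn,
                            w_dist=1.0, w_orient=0.5):
    """Penalize the bar drifting or rotating away from a target pose."""
    dist_err = np.linalg.norm(np.asarray(bar_pos) - np.asarray(target_pos))
    # Quaternion similarity: |<q1, q2>| equals 1 when the orientations match.
    orient_err = 1.0 - abs(float(np.dot(np.asarray(bar_orn), np.asarray(target_orn))))
    return -(w_dist * dist_err + w_orient * orient_err)
```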

performance_analysis:

  1. holds code to create graphs of the training progression, etc.

quick start (currently unavailable; I do not have any license information about the Allegro hand implementation)

  1. install the requirements (e.g. via pip from the requirements file)
  2. adapt the hard-coded logging folder in setting_utils.paramhandler
  3. run learn_lower with gui=True
  4. wait for learning progression
    1. You can measure the learning progression by the "number of trajectories" per training-data gathering cycle. You can inspect this variable via TensorBoard (see the logging sketch after this list).
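A minimal sketch of how such a per-cycle scalar can be written for TensorBoard with TF 1.x summaries (the tag, values, and output folder are assumptions; the project wires this up through setting_utils):

```python
import tensorflow.compat.v1 as tf

tf.disable_v2_behavior()

writer = tf.summary.FileWriter("documents/tb/experiment_0")  # one folder per experiment
for iteration in range(3):
    n_trajectories = 42  # would be counted during the data-gathering cycle
    summary = tf.Summary(value=[tf.Summary.Value(tag="number_of_trajectories",
                                                 simple_value=n_trajectories)])
    writer.add_summary(summary, global_step=iteration)
writer.flush()
writer.close()
```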

implementation notes

  • setting_utils sets up the TensorBoard logging
  • the project only ran on local machines -> no CLI arguments; alter the source code directly
  • TensorFlow 1.x requires messy model saving/loading handling (see the sketch after this list)
  • files saved per run (default output: documents/tb, with a new folder per experiment):
    1. progress.csv with the training progress
    2. a config file with the parameter settings
  • pybullet runs in a single thread; the easiest way to speed up the experiments is to run several of them in parallel
  • the show_finger option of the envs visualizes the tactile pressure, but slows the simulation down considerably
  • because of pybullet, you either render every time step or none at all
  • many TensorFlow warnings are thrown because of the project's age
  • policy loading and saving is mostly managed via naming conventions
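For context, the TF 1.x save/restore pattern that those naming conventions have to work around looks roughly like this; the scope names and checkpoint path are assumptions:

```python
import os

import tensorflow.compat.v1 as tf

tf.disable_v2_behavior()

with tf.variable_scope("policy"):
    obs = tf.placeholder(tf.float32, [None, 32], name="obs")
    action = tf.layers.dense(obs, 16, name="action_head")

saver = tf.train.Saver()  # collects all variables of the current graph

os.makedirs("documents/tb/experiment_0", exist_ok=True)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    saver.save(sess, "documents/tb/experiment_0/policy.ckpt")
    # Restoring later requires rebuilding the exact same graph (same scopes and
    # shapes), which is why consistent naming conventions matter here.
    saver.restore(sess, "documents/tb/experiment_0/policy.ckpt")
```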

model notes

A 2-hidden-layer MLP (64 units per layer), used for both the value function and the policy.
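A minimal TF 1.x sketch of such an MLP (the tanh activation, scope names, and input/output sizes are assumptions):

```python
import tensorflow.compat.v1 as tf

tf.disable_v2_behavior()


def mlp(inputs, output_dim, scope):
    """Two hidden layers of 64 units each, as described above."""
    with tf.variable_scope(scope):
        h1 = tf.layers.dense(inputs, 64, activation=tf.tanh, name="hidden1")
        h2 = tf.layers.dense(h1, 64, activation=tf.tanh, name="hidden2")
        return tf.layers.dense(h2, output_dim, activation=None, name="out")


obs = tf.placeholder(tf.float32, [None, 32], name="obs")  # placeholder observation size
policy_out = mlp(obs, output_dim=16, scope="policy")      # e.g. one output per joint
value_out = mlp(obs, output_dim=1, scope="value_function")
```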

How to enable logging in TensorBoard

Run in your command line: tensorboard --logdir=path/to/tensorboard/data
With the fixed default output folder, "path/to/tensorboard/data" is "~/documents/tb".

Results

After around 600 training and data gathering iterations the policy learns a stabilizing grasp. The following images show a policy trained with the reward function weighted_or_dist.

[Images: stable position b, stable position c, stable position d]

The second experiment shows how a simulation of the upper trajectory optimization influences the ability to learn the stabilizing grasp. The graph shows the training progression under the influence of noise, across 20 training sessions, each of which has a different noise input (the simulated upper trajectory optimization signal). The noise magnitudes are spaced between 0.1 and 20 in a geometric progression. The color of the lines indicates the noise magnitude: the higher the noise, the darker the line, so a gradient from bright (at the bottom) to dark (at the top) appears in the graph.
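Assuming the 20 magnitudes include both endpoints, the geometric spacing can be reproduced with numpy:

```python
import numpy as np

# 20 noise magnitudes spaced geometrically between 0.1 and 20
noise_levels = np.geomspace(0.1, 20.0, num=20)
```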

Each iteration has a fixed number of simulation steps. The lower the number of trajectories, the more stable the grasp of the bar. The graph indicates the relation between the noise magnitude and the agent's capability to learn a stabilizing policy.
