Merge pull request #2 from HaozhiQi/hqi.dev
Improvements and Bug Fix
HaozhiQi committed Dec 3, 2023
2 parents 2e36ae6 + 5aa5974 commit 539de4b
Showing 4 changed files with 97 additions and 65 deletions.
28 changes: 15 additions & 13 deletions README.md
@@ -1,6 +1,6 @@
# In-Hand Object Rotation via Rapid Motor Adaptation
# In-Hand Object Rotation Codebase

This repository contains a reference PyTorch implementation of the paper:
This codebase was initially built for the code release of the following paper:

<b>In-Hand Object Rotation via Rapid Motor Adaptation</b> <br>
[Haozhi Qi*](https://haozhi.io/),
@@ -17,11 +17,15 @@ Conference on Robot Learning (CoRL), 2022 <br>
<img src="https://user-images.githubusercontent.com/10141467/204687717-bb649cb5-ab0f-4450-a98b-2d40788029f6.gif" width="1000"/>
</p>

After the initial release, we are still actively developing this project, adding new features and resolving bugs. As a result, some of the experiment numbers may be inconsistent with those reported in the above paper. Please check out version [0.0.1](https://github.com/HaozhiQi/hora/tree/v0.0.1) if you want to reproduce the numbers reported in the paper.

We also maintain a [changelog and list of bugs](docs/changelog.md) found during this development process.

## Disclaimer

It is worth noting that:
1. Simulation: The method is developed and debugged using IsaacGym Preview 3.0 ([Download](https://drive.google.com/file/d/1oK-QMZ40PO60PFWWsTmtK5ToFDkbL6R0/)), IsaacGymEnvs ([e860979](https://github.com/NVIDIA-Omniverse/IsaacGymEnvs/tree/e86097999b88da28b5252be16f81c595bbb3fca5)). Versions newer than these should work, but have not been extensively tested yet.
2. Hardware: The method is developed using an internal version of AllegroHand. We also provide a reference implementation (see the *Training the Policy* section for details) and [video results](https://haozhi.io/hora/allegro_v4) using the public AllegroHand-v4.
1. Simulation: The repo is mainly developed and debugged using IsaacGym Preview 4.0 ([Download](https://drive.google.com/file/d/1StaRl_hzYFYbJegQcyT7-yjgutc6C7F9)). Please note that the results will be inconsistent if you train with IsaacGym Preview 3.0.
2. Hardware: The method is developed using an internal version of AllegroHand. We also provide a reference implementation (but please refer to the version [0.0.1](https://github.com/HaozhiQi/hora/tree/v0.0.1) README) and [video results](https://haozhi.io/hora/allegro_v4) using the public AllegroHand-v4.
3. Results: The reward numbers in this repository are higher than those reported in the paper. This is because we changed the `reset` function ordering to follow [LeggedGym](https://github.com/leggedrobotics/legged_gym) instead of the one in [IsaacGymEnvs](https://github.com/NVIDIA-Omniverse/IsaacGymEnvs/blob/e8f1c66b24/isaacgymenvs/tasks/base/vec_task.py).

## Installation
@@ -67,36 +71,34 @@ This section can verify whether you install the repository and dependencies corr
Download a pretrained policy:
```
cd outputs/AllegroHandHora/
gdown 1AKecNsQZ56TCyJU49DU06GxnQRbeawMu -O hora.zip
unzip hora.zip -d ./hora
gdown 17fr40KQcUyFXz4W1ejuLTzRqP-Qu9EPS -O hora_v0.0.2.zip
unzip hora_v0.0.2.zip -d ./hora_v0.0.2
cd ../../
```

The directory structure should look like:
```
outputs/
  AllegroHandHora/
    hora/
    hora_v0.0.2/
      stage1_nn/   # stage 1 checkpoints
      stage1_tb/   # stage 1 tensorboard records
      stage2_nn/   # stage 2 checkpoints
      stage2_tb/   # stage 2 tensorboard records
```

Visualize it by running the following commands. Note that the stage 1 policy refers to the one trained with privileged object information, while the stage 2 policy refers to the one trained with proprioceptive history. The stage 2 policy is also the one we deployed in the real world.

```
# s1 and s2 stands for stage 1 and 2, respectively
scripts/vis_s1.sh hora
scripts/vis_s2.sh hora
scripts/vis_s1.sh hora_v0.0.2
scripts/vis_s2.sh hora_v0.0.2
```

Evaluate these two policies by running:

```
# change {GPU_ID} to a valid number
scripts/eval_s1.sh ${GPU_ID} hora
scripts/eval_s2.sh ${GPU_ID} hora
scripts/eval_s1.sh ${GPU_ID} hora_v0.0.2
scripts/eval_s2.sh ${GPU_ID} hora_v0.0.2
```

## Training the Policy
7 changes: 7 additions & 0 deletions docs/changelog.md
@@ -0,0 +1,7 @@
# Changelog and Bugs

December 1st, 2023 [v0.0.2]:
- Read the angular velocity at the control frequency. Previously, the angular velocity was obtained at the simulation frequency; we found this results in oscillatory behavior, since the policy learns to exploit this phenomenon (a minimal sketch of the control-frequency computation follows this list).
- Remove the hand-crafted lower and upper bounds of the privileged information.
- Remove online mass randomization, since it has no effect after the simulation is created.
- Change the angular velocity max clip limit from 0.5 to 0.4 to compensate for the higher rotation speed.
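
Below is a minimal standalone sketch, not verbatim repo code, of how the first and last items fit together: the object's rotation between consecutive control steps is converted to an axis-angle vector, divided by the control-step duration to get an angular velocity, and the projected speed is clipped for the rotation reward. The helper functions, the (x, y, z, w) quaternion layout, and the numeric `control_freq_inv`/`dt` values here are illustrative assumptions; the actual change is in the `hora/tasks/allegro_hand_hora.py` diff below.

```
import torch


def quat_conjugate(q):
    # negate the vector part, keep the real part (layout is (x, y, z, w))
    return torch.cat([-q[..., :3], q[..., 3:]], dim=-1)


def quat_mul(a, b):
    # quaternion product in (x, y, z, w) layout
    ax, ay, az, aw = a.unbind(-1)
    bx, by, bz, bw = b.unbind(-1)
    return torch.stack([
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
        aw * bw - ax * bx - ay * by - az * bz,
    ], dim=-1)


def control_freq_angvel(object_rot, object_rot_prev, control_freq_inv, dt):
    # relative rotation over one control step, converted to an axis-angle vector,
    # then divided by the control-step duration (control_freq_inv sim steps of length dt)
    rel = quat_mul(object_rot, quat_conjugate(object_rot_prev))
    vec_norm = torch.norm(rel[..., :3], dim=-1, keepdim=True)
    angle = 2.0 * torch.atan2(vec_norm, rel[..., 3:])
    axis = rel[..., :3] / vec_norm.clamp(min=1e-8)
    return axis * angle / (control_freq_inv * dt)


# illustrative values: identity -> ~0.4 rad about z within one 0.05 s control step
object_rot_prev = torch.tensor([[0.0, 0.0, 0.0, 1.0]])
object_rot = torch.tensor([[0.0, 0.0, 0.19867, 0.98007]])
angvel = control_freq_angvel(object_rot, object_rot_prev, control_freq_inv=6, dt=1.0 / 120.0)

rot_axis = torch.tensor([[0.0, 0.0, 1.0]])
vec_dot = (angvel * rot_axis).sum(-1)
# the rotation reward clips the projected speed; 0.4 is the new max from the changelog
rotate_reward = torch.clamp(vec_dot, max=0.4)
print(angvel, rotate_reward)  # ~8 rad/s about z before clipping, reward clipped to 0.4
```
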
125 changes: 74 additions & 51 deletions hora/tasks/allegro_hand_hora.py
@@ -10,7 +10,7 @@
import numpy as np
from isaacgym import gymtorch
from isaacgym import gymapi
from isaacgym.torch_utils import to_torch, unscale, quat_apply, tensor_clamp, torch_rand_float
from isaacgym.torch_utils import to_torch, unscale, quat_apply, tensor_clamp, torch_rand_float, quat_conjugate, quat_mul
from glob import glob
from hora.utils.misc import tprint
from .base.vec_task import VecTask
@@ -99,6 +99,8 @@ def __init__(self, config, sim_device, graphics_device_id, headless):
self.rot_axis_buf = torch.zeros((self.num_envs, 3), device=self.device, dtype=torch.float)

# useful buffers
self.object_rot_prev = self.object_rot.clone()
self.object_pos_prev = self.object_pos.clone()
self.init_pose_buf = torch.zeros((self.num_envs, self.num_dofs), device=self.device, dtype=torch.float)
self.actions = torch.zeros((self.num_envs, self.num_actions), device=self.device, dtype=torch.float)
self.torques = torch.zeros((self.num_envs, self.num_actions), device=self.device, dtype=torch.float)
@@ -197,7 +199,7 @@ def _create_envs(self, num_envs, spacing, num_per_row):
num_scales = len(self.randomize_scale_list)
obj_scale = np.random.uniform(self.randomize_scale_list[i % num_scales] - 0.025, self.randomize_scale_list[i % num_scales] + 0.025)
self.gym.set_actor_scale(env_ptr, object_handle, obj_scale)
self._update_priv_buf(env_id=i, name='obj_scale', value=obj_scale, lower=0.6, upper=0.9)
self._update_priv_buf(env_id=i, name='obj_scale', value=obj_scale)

obj_com = [0, 0, 0]
if self.randomize_com:
@@ -208,7 +210,7 @@ def _create_envs(self, num_envs, spacing, num_per_row):
np.random.uniform(self.randomize_com_lower, self.randomize_com_upper)]
prop[0].com.x, prop[0].com.y, prop[0].com.z = obj_com
self.gym.set_actor_rigid_body_properties(env_ptr, object_handle, prop)
self._update_priv_buf(env_id=i, name='obj_com', value=obj_com, lower=-0.02, upper=0.02)
self._update_priv_buf(env_id=i, name='obj_com', value=obj_com)

obj_friction = 1.0
if self.randomize_friction:
@@ -223,7 +225,17 @@ def _create_envs(self, num_envs, spacing, num_per_row):
p.friction = rand_friction
self.gym.set_actor_rigid_shape_properties(env_ptr, object_handle, object_props)
obj_friction = rand_friction
self._update_priv_buf(env_id=i, name='obj_friction', value=obj_friction, lower=0.0, upper=1.5)
self._update_priv_buf(env_id=i, name='obj_friction', value=obj_friction)

if self.randomize_mass:
prop = self.gym.get_actor_rigid_body_properties(env_ptr, object_handle)
for p in prop:
p.mass = np.random.uniform(self.randomize_mass_lower, self.randomize_mass_upper)
self.gym.set_actor_rigid_body_properties(env_ptr, object_handle, prop)
self._update_priv_buf(env_id=i, name='obj_mass', value=prop[0].mass)
else:
prop = self.gym.get_actor_rigid_body_properties(env_ptr, object_handle)
self._update_priv_buf(env_id=i, name='obj_mass', value=prop[0].mass)

if self.aggregate_mode > 0:
self.gym.end_aggregate(env_ptr)
@@ -236,24 +248,6 @@ def _create_envs(self, num_envs, spacing, num_per_row):
self.object_indices = to_torch(self.object_indices, dtype=torch.long, device=self.device)

def reset_idx(self, env_ids):
if self.randomize_mass:
lower, upper = self.randomize_mass_lower, self.randomize_mass_upper

for env_id in env_ids:
env = self.envs[env_id]
handle = self.gym.find_actor_handle(env, 'object')
prop = self.gym.get_actor_rigid_body_properties(env, handle)
for p in prop:
p.mass = np.random.uniform(lower, upper)
self.gym.set_actor_rigid_body_properties(env, handle, prop)
self._update_priv_buf(env_id=env_id, name='obj_mass', value=prop[0].mass, lower=0, upper=0.2)
else:
for env_id in env_ids:
env = self.envs[env_id]
handle = self.gym.find_actor_handle(env, 'object')
prop = self.gym.get_actor_rigid_body_properties(env, handle)
self._update_priv_buf(env_id=env_id, name='obj_mass', value=prop[0].mass, lower=0, upper=0.2)

if self.randomize_pd_gains:
self.p_gain[env_ids] = torch_rand_float(
self.randomize_p_gain_lower, self.randomize_p_gain_upper, (len(env_ids), self.num_actions),
@@ -331,33 +325,36 @@ def compute_reward(self, actions):
# work and torque penalty
torque_penalty = (self.torques ** 2).sum(-1)
work_penalty = ((self.torques * self.dof_vel_finite_diff).sum(-1)) ** 2
obj_linv_pscale = self.object_linvel_penalty_scale
pose_diff_pscale = self.pose_diff_penalty_scale
torque_pscale = self.torque_penalty_scale
work_pscale = self.work_penalty_scale

self.rew_buf[:], log_r_reward, olv_penalty = compute_hand_reward(
self.object_linvel, obj_linv_pscale,
self.object_angvel, self.rot_axis_buf, self.rotate_reward_scale,
self.angvel_clip_max, self.angvel_clip_min,
pose_diff_penalty, pose_diff_pscale,
torque_penalty, torque_pscale,
work_penalty, work_pscale,
# Compute offset in radians. Radians -> radians / sec
angdiff = quat_to_axis_angle(quat_mul(self.object_rot, quat_conjugate(self.object_rot_prev)))
object_angvel = angdiff / (self.control_freq_inv * self.dt)
vec_dot = (object_angvel * self.rot_axis_buf).sum(-1)
rotate_reward = torch.clip(vec_dot, max=self.angvel_clip_max, min=self.angvel_clip_min)
# linear velocity: use position difference instead of self.object_linvel
object_linvel = ((self.object_pos - self.object_pos_prev) / (self.control_freq_inv * self.dt)).clone()
object_linvel_penalty = torch.norm(object_linvel, p=1, dim=-1)

self.rew_buf[:] = compute_hand_reward(
object_linvel_penalty, self.object_linvel_penalty_scale,
rotate_reward, self.rotate_reward_scale,
pose_diff_penalty, self.pose_diff_penalty_scale,
torque_penalty, self.torque_penalty_scale,
work_penalty, self.work_penalty_scale,
)
self.reset_buf[:] = self.check_termination(self.object_pos)
self.extras['rotation_reward'] = log_r_reward.mean()
self.extras['object_linvel_penalty'] = olv_penalty.mean()
self.extras['rotation_reward'] = rotate_reward.mean()
self.extras['object_linvel_penalty'] = object_linvel_penalty.mean()
self.extras['pose_diff_penalty'] = pose_diff_penalty.mean()
self.extras['work_done'] = work_penalty.mean()
self.extras['torques'] = torque_penalty.mean()
self.extras['roll'] = self.object_angvel[:, 0].mean()
self.extras['pitch'] = self.object_angvel[:, 1].mean()
self.extras['yaw'] = self.object_angvel[:, 2].mean()
self.extras['roll'] = object_angvel[:, 0].mean()
self.extras['pitch'] = object_angvel[:, 1].mean()
self.extras['yaw'] = object_angvel[:, 2].mean()

if self.evaluate:
finished_episode_mask = self.reset_buf == 1
self.stat_sum_rewards += self.rew_buf.sum()
self.stat_sum_rotate_rewards += log_r_reward.sum()
self.stat_sum_rotate_rewards += rotate_reward.sum()
self.stat_sum_torques += self.torques.abs().sum()
self.stat_sum_obj_linvel += (self.object_linvel ** 2).sum(-1).sum()
self.stat_sum_episode_length += (self.reset_buf == 0).sum()
@@ -408,6 +405,8 @@ def pre_physics_step(self, actions):
targets = self.prev_targets + 1 / 24 * self.actions
self.cur_targets[:] = tensor_clamp(targets, self.allegro_hand_dof_lower_limits, self.allegro_hand_dof_upper_limits)
self.prev_targets[:] = self.cur_targets.clone()
self.object_rot_prev[:] = self.object_rot
self.object_pos_prev[:] = self.object_pos

if self.force_scale > 0.0:
self.rb_forces *= torch.pow(self.force_decay, self.dt / self.force_decay_interval)
@@ -618,23 +617,47 @@ def _init_object_pose(self):


def compute_hand_reward(
object_linvel, object_linvel_penalty_scale: float,
object_angvel, rotation_axis, rotate_reward_scale: float,
angvel_clip_max: float, angvel_clip_min: float,
object_linvel_penalty, object_linvel_penalty_scale: float,
rotate_reward, rotate_reward_scale: float,
pose_diff_penalty, pose_diff_penalty_scale: float,
torque_penalty, torque_pscale: float,
work_penalty, work_pscale: float,
):
rotate_reward_cond = (rotation_axis[:, -1] != 0).float()
vec_dot = (object_angvel * rotation_axis).sum(-1)
rotate_reward = torch.clip(vec_dot, max=angvel_clip_max, min=angvel_clip_min)
rotate_reward = rotate_reward_scale * rotate_reward * rotate_reward_cond
object_linvel_penalty = torch.norm(object_linvel, p=1, dim=-1)

reward = rotate_reward
reward = rotate_reward_scale * rotate_reward
# penalize the object's linear velocity (L1 norm), plus pose, torque, and work terms
reward = reward + object_linvel_penalty * object_linvel_penalty_scale
reward = reward + pose_diff_penalty * pose_diff_penalty_scale
reward = reward + torque_penalty * torque_pscale
reward = reward + work_penalty * work_pscale
return reward, rotate_reward, object_linvel_penalty
return reward


def quat_to_axis_angle(quaternions: torch.Tensor) -> torch.Tensor:
"""
Convert rotations given as quaternions to axis/angle.
Adapted from PyTorch3D:
https://pytorch3d.readthedocs.io/en/latest/_modules/pytorch3d/transforms/rotation_conversions.html#quaternion_to_axis_angle
Args:
quaternions: quaternions with real part last,
as tensor of shape (..., 4).
Returns:
Rotations given as a vector in axis angle form, as a tensor
of shape (..., 3), where the magnitude is the angle
turned anticlockwise in radians around the vector's
direction.
"""
norms = torch.norm(quaternions[..., :3], p=2, dim=-1, keepdim=True)
half_angles = torch.atan2(norms, quaternions[..., 3:])
angles = 2 * half_angles
eps = 1e-6
small_angles = angles.abs() < eps
sin_half_angles_over_angles = torch.empty_like(angles)
sin_half_angles_over_angles[~small_angles] = (
torch.sin(half_angles[~small_angles]) / angles[~small_angles]
)
# for x small, sin(x/2) is about x/2 - (x/2)^3/6
# so sin(x/2)/x is about 1/2 - (x*x)/48
sin_half_angles_over_angles[small_angles] = (
0.5 - (angles[small_angles] * angles[small_angles]) / 48
)
return quaternions[..., :3] / sin_half_angles_over_angles
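
As a quick sanity check of the convention described in the docstring (vector part first, real part last), here is a hypothetical standalone snippet, not part of this commit, assuming the `quat_to_axis_angle` defined above is in scope:

```
import math

import torch

# 90-degree rotation about the z-axis, stored as (x, y, z, w) with the real part last
q = torch.tensor([[0.0, 0.0, math.sin(math.pi / 4), math.cos(math.pi / 4)]])
print(quat_to_axis_angle(q))  # expected: tensor([[0.0000, 0.0000, 1.5708]]), i.e. pi/2 about z
```
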
2 changes: 1 addition & 1 deletion scripts/deploy.sh
@@ -1,3 +1,3 @@
#!/bin/bash
CACHE=$1
python deploy.py checkpoint=outputs/AllegroHandHora/"${CACHE}"/stage2_nn/last.pth
python deploy.py checkpoint=outputs/AllegroHandHora/"${CACHE}"/stage2_nn/model_last.ckpt
