diff --git a/README.md b/README.md
index 385444d..c6fe953 100644
--- a/README.md
+++ b/README.md
@@ -1,6 +1,6 @@
-# In-Hand Object Rotation via Rapid Motor Adaptation
+# In-Hand Object Rotation Codebase
 
-This repository contains a reference PyTorch implementation of the paper:
+This codebase was initially built for the code release of the following paper:
 
 In-Hand Object Rotation via Rapid Motor Adaptation
 [Haozhi Qi*](https://haozhi.io/),
@@ -17,11 +17,15 @@ Conference on Robot Learning (CoRL), 2022

+After the initial release, we are still actively building this project by adding new features and fixing previously discovered bugs. Therefore, some experiment numbers may be inconsistent with what was reported in the above paper. Please check out version [0.0.1](https://github.com/HaozhiQi/hora/tree/v0.0.1) if you want to reproduce the numbers reported in the paper.
+
+We also maintain a [changelog and list of bugs](docs/changelog.md) found during this development process.
+
 ## Disclaimer
 
 It is worth noticing that:
 
-1. Simulation: The method is developed and debugged using IsaacGym Preview 3.0 ([Download](https://drive.google.com/file/d/1oK-QMZ40PO60PFWWsTmtK5ToFDkbL6R0/)), IsaacGymEnvs ([e860979](https://github.com/NVIDIA-Omniverse/IsaacGymEnvs/tree/e86097999b88da28b5252be16f81c595bbb3fca5)). Versions newer than these should work, but have not been extensively tested yet.
-2. Hardware: The method is developed using an internal version of AllegroHand. We also provide a reference implementation (see the *Training the Policy* section for details) and [video results](https://haozhi.io/hora/allegro_v4) using the public AllegroHand-v4.
+1. Simulation: The repo is mainly developed and debugged using IsaacGym Preview 4.0 ([Download](https://drive.google.com/file/d/1StaRl_hzYFYbJegQcyT7-yjgutc6C7F9)). Please note that results will be inconsistent if you train with IsaacGym Preview 3.0.
+2. Hardware: The method is developed using an internal version of the AllegroHand. We also provide a reference implementation (please refer to the version [0.0.1](https://github.com/HaozhiQi/hora/tree/v0.0.1) readme) and [video results](https://haozhi.io/hora/allegro_v4) using the public AllegroHand-v4.
 3. Results: The reward number in this repository are higher than what is reported in the paper. This is because we change the `reset` function order following [LeggedGym](https://github.com/leggedrobotics/legged_gym) instead of the one in [IsaacGymEnvs](https://github.com/NVIDIA-Omniverse/IsaacGymEnvs/blob/e8f1c66b24/isaacgymenvs/tasks/base/vec_task.py).
 
 ## Installation
@@ -67,8 +71,8 @@ This section can verify whether you install the repository and dependencies corr
 
 Download a pretrained policy:
 ```
 cd outputs/AllegroHandHora/
-gdown 1AKecNsQZ56TCyJU49DU06GxnQRbeawMu -O hora.zip
-unzip hora.zip -d ./hora
+gdown 17fr40KQcUyFXz4W1ejuLTzRqP-Qu9EPS -O hora_v0.0.2.zip
+unzip hora_v0.0.2.zip -d ./hora_v0.0.2
 cd ../../
 ```
 
@@ -76,27 +80,25 @@ The data structure should look like:
 
 ```
 outputs/
   AllegroHandHora/
-    hora/
+    hora_v0.0.2/
       stage1_nn/  # stage 1 checkpoints
-      stage1_tb/  # stage 1 tensorboard records
       stage2_nn/  # stage 2 checkpoints
-      stage2_tb/  # stage 2 tensorboard records
 ```
 
 Visualize it by running the following command. Note that stage 1 policy refers to the one trained with privileged object information while stage 2 policy refers to the one trained with proprioceptive history. The stage 2 policy is also what we deployed in the real-world.
 
 ```
 # s1 and s2 stands for stage 1 and 2, respectively
-scripts/vis_s1.sh hora
-scripts/vis_s2.sh hora
+scripts/vis_s1.sh hora_v0.0.2
+scripts/vis_s2.sh hora_v0.0.2
 ```
 
 Evaluate this two policies by running:
 ```
 # change {GPU_ID} to a valid number
-scripts/eval_s1.sh ${GPU_ID} hora
-scripts/eval_s2.sh ${GPU_ID} hora
+scripts/eval_s1.sh ${GPU_ID} hora_v0.0.2
+scripts/eval_s2.sh ${GPU_ID} hora_v0.0.2
 ```
 
 ## Training the Policy
diff --git a/docs/changelog.md b/docs/changelog.md
new file mode 100644
index 0000000..e2e1367
--- /dev/null
+++ b/docs/changelog.md
@@ -0,0 +1,7 @@
+# Changelog and Bugs
+
+December 1st, 2023 [v0.0.2].
+- Read angular velocity at control frequency. Previously, the angular velocity was obtained at simulation frequency. We found this results in oscillating behavior since the policy learns to exploit this phenomenon.
+- Remove the hand-crafted lower and upper bounds of the privileged information.
+- Remove online mass randomization since it has no effect after the simulation is created.
+- Change the angular velocity max clip limit from 0.5 to 0.4 to compensate for the higher rotation speed.
\ No newline at end of file
diff --git a/hora/tasks/allegro_hand_hora.py b/hora/tasks/allegro_hand_hora.py
index c1acf76..cc9551d 100644
--- a/hora/tasks/allegro_hand_hora.py
+++ b/hora/tasks/allegro_hand_hora.py
@@ -10,7 +10,7 @@ import numpy as np
 from isaacgym import gymtorch
 from isaacgym import gymapi
-from isaacgym.torch_utils import to_torch, unscale, quat_apply, tensor_clamp, torch_rand_float
+from isaacgym.torch_utils import to_torch, unscale, quat_apply, tensor_clamp, torch_rand_float, quat_conjugate, quat_mul
 from glob import glob
 from hora.utils.misc import tprint
 from .base.vec_task import VecTask
@@ -99,6 +99,8 @@ def __init__(self, config, sim_device, graphics_device_id, headless):
         self.rot_axis_buf = torch.zeros((self.num_envs, 3), device=self.device, dtype=torch.float)
 
         # useful buffers
+        self.object_rot_prev = self.object_rot.clone()
+        self.object_pos_prev = self.object_pos.clone()
         self.init_pose_buf = torch.zeros((self.num_envs, self.num_dofs), device=self.device, dtype=torch.float)
         self.actions = torch.zeros((self.num_envs, self.num_actions), device=self.device, dtype=torch.float)
         self.torques = torch.zeros((self.num_envs, self.num_actions), device=self.device, dtype=torch.float)
@@ -197,7 +199,7 @@ def _create_envs(self, num_envs, spacing, num_per_row):
             num_scales = len(self.randomize_scale_list)
             obj_scale = np.random.uniform(self.randomize_scale_list[i % num_scales] - 0.025, self.randomize_scale_list[i % num_scales] + 0.025)
             self.gym.set_actor_scale(env_ptr, object_handle, obj_scale)
-            self._update_priv_buf(env_id=i, name='obj_scale', value=obj_scale, lower=0.6, upper=0.9)
+            self._update_priv_buf(env_id=i, name='obj_scale', value=obj_scale)
 
             obj_com = [0, 0, 0]
             if self.randomize_com:
@@ -208,7 +210,7 @@ def _create_envs(self, num_envs, spacing, num_per_row):
                            np.random.uniform(self.randomize_com_lower, self.randomize_com_upper)]
                 prop[0].com.x, prop[0].com.y, prop[0].com.z = obj_com
                 self.gym.set_actor_rigid_body_properties(env_ptr, object_handle, prop)
-            self._update_priv_buf(env_id=i, name='obj_com', value=obj_com, lower=-0.02, upper=0.02)
+            self._update_priv_buf(env_id=i, name='obj_com', value=obj_com)
 
             obj_friction = 1.0
             if self.randomize_friction:
@@ -223,7 +225,17 @@ def _create_envs(self, num_envs, spacing, num_per_row):
                     p.friction = rand_friction
                 self.gym.set_actor_rigid_shape_properties(env_ptr, object_handle, object_props)
                 obj_friction = rand_friction
-            self._update_priv_buf(env_id=i, name='obj_friction', value=obj_friction, lower=0.0, upper=1.5)
+            self._update_priv_buf(env_id=i, name='obj_friction', value=obj_friction)
+
+            if self.randomize_mass:
+                prop = self.gym.get_actor_rigid_body_properties(env_ptr, object_handle)
+                for p in prop:
+                    p.mass = np.random.uniform(self.randomize_mass_lower, self.randomize_mass_upper)
+                self.gym.set_actor_rigid_body_properties(env_ptr, object_handle, prop)
+                self._update_priv_buf(env_id=i, name='obj_mass', value=prop[0].mass)
+            else:
+                prop = self.gym.get_actor_rigid_body_properties(env_ptr, object_handle)
+                self._update_priv_buf(env_id=i, name='obj_mass', value=prop[0].mass)
 
             if self.aggregate_mode > 0:
                 self.gym.end_aggregate(env_ptr)
@@ -236,24 +248,6 @@ def _create_envs(self, num_envs, spacing, num_per_row):
         self.object_indices = to_torch(self.object_indices, dtype=torch.long, device=self.device)
 
     def reset_idx(self, env_ids):
-        if self.randomize_mass:
-            lower, upper = self.randomize_mass_lower, self.randomize_mass_upper
-
-            for env_id in env_ids:
-                env = self.envs[env_id]
-                handle = self.gym.find_actor_handle(env, 'object')
-                prop = self.gym.get_actor_rigid_body_properties(env, handle)
-                for p in prop:
-                    p.mass = np.random.uniform(lower, upper)
-                self.gym.set_actor_rigid_body_properties(env, handle, prop)
-                self._update_priv_buf(env_id=env_id, name='obj_mass', value=prop[0].mass, lower=0, upper=0.2)
-        else:
-            for env_id in env_ids:
-                env = self.envs[env_id]
-                handle = self.gym.find_actor_handle(env, 'object')
-                prop = self.gym.get_actor_rigid_body_properties(env, handle)
-                self._update_priv_buf(env_id=env_id, name='obj_mass', value=prop[0].mass, lower=0, upper=0.2)
-
         if self.randomize_pd_gains:
             self.p_gain[env_ids] = torch_rand_float(
                 self.randomize_p_gain_lower, self.randomize_p_gain_upper, (len(env_ids), self.num_actions),
@@ -331,33 +325,36 @@ def compute_reward(self, actions):
         # work and torque penalty
         torque_penalty = (self.torques ** 2).sum(-1)
         work_penalty = ((self.torques * self.dof_vel_finite_diff).sum(-1)) ** 2
-        obj_linv_pscale = self.object_linvel_penalty_scale
-        pose_diff_pscale = self.pose_diff_penalty_scale
-        torque_pscale = self.torque_penalty_scale
-        work_pscale = self.work_penalty_scale
-
-        self.rew_buf[:], log_r_reward, olv_penalty = compute_hand_reward(
-            self.object_linvel, obj_linv_pscale,
-            self.object_angvel, self.rot_axis_buf, self.rotate_reward_scale,
-            self.angvel_clip_max, self.angvel_clip_min,
-            pose_diff_penalty, pose_diff_pscale,
-            torque_penalty, torque_pscale,
-            work_penalty, work_pscale,
+        # compute the rotation offset in radians and convert it to radians / sec
+        angdiff = quat_to_axis_angle(quat_mul(self.object_rot, quat_conjugate(self.object_rot_prev)))
+        object_angvel = angdiff / (self.control_freq_inv * self.dt)
+        vec_dot = (object_angvel * self.rot_axis_buf).sum(-1)
+        rotate_reward = torch.clip(vec_dot, max=self.angvel_clip_max, min=self.angvel_clip_min)
+        # linear velocity: use position difference instead of self.object_linvel
+        object_linvel = ((self.object_pos - self.object_pos_prev) / (self.control_freq_inv * self.dt)).clone()
+        object_linvel_penalty = torch.norm(object_linvel, p=1, dim=-1)
+
+        self.rew_buf[:] = compute_hand_reward(
+            object_linvel_penalty, self.object_linvel_penalty_scale,
+            rotate_reward, self.rotate_reward_scale,
+            pose_diff_penalty, self.pose_diff_penalty_scale,
+            torque_penalty, self.torque_penalty_scale,
+            work_penalty, self.work_penalty_scale,
         )
         self.reset_buf[:] = self.check_termination(self.object_pos)
-        self.extras['rotation_reward'] = log_r_reward.mean()
-        self.extras['object_linvel_penalty'] = olv_penalty.mean()
+        self.extras['rotation_reward'] = rotate_reward.mean()
+        self.extras['object_linvel_penalty'] = object_linvel_penalty.mean()
         self.extras['pose_diff_penalty'] = pose_diff_penalty.mean()
         self.extras['work_done'] = work_penalty.mean()
         self.extras['torques'] = torque_penalty.mean()
-        self.extras['roll'] = self.object_angvel[:, 0].mean()
-        self.extras['pitch'] = self.object_angvel[:, 1].mean()
-        self.extras['yaw'] = self.object_angvel[:, 2].mean()
+        self.extras['roll'] = object_angvel[:, 0].mean()
+        self.extras['pitch'] = object_angvel[:, 1].mean()
+        self.extras['yaw'] = object_angvel[:, 2].mean()
         if self.evaluate:
             finished_episode_mask = self.reset_buf == 1
             self.stat_sum_rewards += self.rew_buf.sum()
-            self.stat_sum_rotate_rewards += log_r_reward.sum()
+            self.stat_sum_rotate_rewards += rotate_reward.sum()
             self.stat_sum_torques += self.torques.abs().sum()
             self.stat_sum_obj_linvel += (self.object_linvel ** 2).sum(-1).sum()
             self.stat_sum_episode_length += (self.reset_buf == 0).sum()
@@ -408,6 +405,8 @@ def pre_physics_step(self, actions):
         targets = self.prev_targets + 1 / 24 * self.actions
         self.cur_targets[:] = tensor_clamp(targets, self.allegro_hand_dof_lower_limits, self.allegro_hand_dof_upper_limits)
         self.prev_targets[:] = self.cur_targets.clone()
+        self.object_rot_prev[:] = self.object_rot
+        self.object_pos_prev[:] = self.object_pos
 
         if self.force_scale > 0.0:
             self.rb_forces *= torch.pow(self.force_decay, self.dt / self.force_decay_interval)
@@ -618,23 +617,47 @@ def _init_object_pose(self):
 
 
 def compute_hand_reward(
-    object_linvel, object_linvel_penalty_scale: float,
-    object_angvel, rotation_axis, rotate_reward_scale: float,
-    angvel_clip_max: float, angvel_clip_min: float,
+    object_linvel_penalty, object_linvel_penalty_scale: float,
+    rotate_reward, rotate_reward_scale: float,
     pose_diff_penalty, pose_diff_penalty_scale: float,
     torque_penalty, torque_pscale: float,
    work_penalty, work_pscale: float,
 ):
-    rotate_reward_cond = (rotation_axis[:, -1] != 0).float()
-    vec_dot = (object_angvel * rotation_axis).sum(-1)
-    rotate_reward = torch.clip(vec_dot, max=angvel_clip_max, min=angvel_clip_min)
-    rotate_reward = rotate_reward_scale * rotate_reward * rotate_reward_cond
-    object_linvel_penalty = torch.norm(object_linvel, p=1, dim=-1)
-
-    reward = rotate_reward
+    reward = rotate_reward_scale * rotate_reward
     # Distance from the hand to the object
     reward = reward + object_linvel_penalty * object_linvel_penalty_scale
     reward = reward + pose_diff_penalty * pose_diff_penalty_scale
     reward = reward + torque_penalty * torque_pscale
     reward = reward + work_penalty * work_pscale
-    return reward, rotate_reward, object_linvel_penalty
+    return reward
+
+
+def quat_to_axis_angle(quaternions: torch.Tensor) -> torch.Tensor:
+    """
+    Convert rotations given as quaternions to axis/angle.
+    Adapted from PyTorch3D:
+    https://pytorch3d.readthedocs.io/en/latest/_modules/pytorch3d/transforms/rotation_conversions.html#quaternion_to_axis_angle
+    Args:
+        quaternions: quaternions with real part last,
+            as tensor of shape (..., 4).
+    Returns:
+        Rotations given as a vector in axis angle form, as a tensor
+        of shape (..., 3), where the magnitude is the angle
+        turned anticlockwise in radians around the vector's
+        direction.
+    """
+    norms = torch.norm(quaternions[..., :3], p=2, dim=-1, keepdim=True)
+    half_angles = torch.atan2(norms, quaternions[..., 3:])
+    angles = 2 * half_angles
+    eps = 1e-6
+    small_angles = angles.abs() < eps
+    sin_half_angles_over_angles = torch.empty_like(angles)
+    sin_half_angles_over_angles[~small_angles] = (
+        torch.sin(half_angles[~small_angles]) / angles[~small_angles]
+    )
+    # for x small, sin(x/2) is about x/2 - (x/2)^3/6
+    # so sin(x/2)/x is about 1/2 - (x*x)/48
+    sin_half_angles_over_angles[small_angles] = (
+        0.5 - (angles[small_angles] * angles[small_angles]) / 48
+    )
+    return quaternions[..., :3] / sin_half_angles_over_angles
diff --git a/scripts/deploy.sh b/scripts/deploy.sh
index 38c80c3..d017e6a 100755
--- a/scripts/deploy.sh
+++ b/scripts/deploy.sh
@@ -1,3 +1,3 @@
 #!/bin/bash
 CACHE=$1
-python deploy.py checkpoint=outputs/AllegroHandHora/"${CACHE}"/stage2_nn/last.pth
\ No newline at end of file
+python deploy.py checkpoint=outputs/AllegroHandHora/"${CACHE}"/stage2_nn/model_last.ckpt
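
For reference, below is a minimal, self-contained sketch of the finite-difference rotation reward that the `compute_reward` changes above introduce (the change described by the changelog entry on reading angular velocity at control frequency). The `quat_mul` / `quat_conjugate` helpers are re-implemented for IsaacGym's xyzw quaternion layout only so the snippet runs without `isaacgym` installed, `quat_to_axis_angle` is a simplified variant of the function added in `allegro_hand_hora.py`, and the `rotate_reward` wrapper as well as the `dt`, `control_freq_inv`, and `clip_min` values are illustrative assumptions; only the 0.4 max clip is taken from the changelog.

```
# Minimal, self-contained sketch of the new rotation reward (not part of the repo).
# Quaternions use IsaacGym's xyzw layout; quat_mul / quat_conjugate are re-implemented
# here only so the example runs without isaacgym installed.
import math
import torch


def quat_conjugate(q: torch.Tensor) -> torch.Tensor:
    # negate the imaginary (xyz) part, keep the real (w) part
    return torch.cat([-q[..., :3], q[..., 3:]], dim=-1)


def quat_mul(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    # Hamilton product for xyzw quaternions
    x1, y1, z1, w1 = a.unbind(-1)
    x2, y2, z2, w2 = b.unbind(-1)
    return torch.stack([
        w1 * x2 + x1 * w2 + y1 * z2 - z1 * y2,
        w1 * y2 - x1 * z2 + y1 * w2 + z1 * x2,
        w1 * z2 + x1 * y2 - y1 * x2 + z1 * w2,
        w1 * w2 - x1 * x2 - y1 * y2 - z1 * z2,
    ], dim=-1)


def quat_to_axis_angle(q: torch.Tensor) -> torch.Tensor:
    # simplified version of the conversion added in allegro_hand_hora.py (real part last)
    norms = torch.norm(q[..., :3], p=2, dim=-1, keepdim=True)
    angles = 2.0 * torch.atan2(norms, q[..., 3:])
    return q[..., :3] * angles / norms.clamp(min=1e-9)


def rotate_reward(object_rot, object_rot_prev, rot_axis, control_freq_inv, dt,
                  clip_min=-0.1, clip_max=0.4):
    # rotation offset (radians) over one control step -> angular velocity (radians / sec)
    angdiff = quat_to_axis_angle(quat_mul(object_rot, quat_conjugate(object_rot_prev)))
    object_angvel = angdiff / (control_freq_inv * dt)
    # reward the component of the angular velocity along the desired rotation axis
    vec_dot = (object_angvel * rot_axis).sum(-1)
    return torch.clip(vec_dot, min=clip_min, max=clip_max)


if __name__ == '__main__':
    # one env rotating 0.02 rad about +z per control step (values are illustrative)
    dt, control_freq_inv = 1.0 / 120.0, 6
    theta = 0.02
    q_prev = torch.tensor([[0.0, 0.0, 0.0, 1.0]])
    q_now = torch.tensor([[0.0, 0.0, math.sin(theta / 2), math.cos(theta / 2)]])
    z_axis = torch.tensor([[0.0, 0.0, 1.0]])
    print(rotate_reward(q_now, q_prev, z_axis, control_freq_inv, dt))
    # raw angular velocity is 0.02 / (6 / 120) = 0.4 rad/s, which sits at the 0.4 max clip
```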