Merge pull request #2 from HaozhiQi/hqi.dev
Improvements and Bug Fix
HaozhiQi committed Dec 3, 2023
2 parents 2e36ae6 + 5aa5974 commit 539de4b
Showing 4 changed files with 97 additions and 65 deletions.
28 changes: 15 additions & 13 deletions README.md
@@ -1,6 +1,6 @@
# In-Hand Object Rotation via Rapid Motor Adaptation
# In-Hand Object Rotation Codebase

This repository contains a reference PyTorch implementation of the paper:
This codebase was initially built for the code release of the following paper:

<b>In-Hand Object Rotation via Rapid Motor Adaptation</b> <br>
[Haozhi Qi*](https://haozhi.io/),
@@ -17,11 +17,15 @@ Conference on Robot Learning (CoRL), 2022 <br>
<img src="https://user-images.githubusercontent.com/10141467/204687717-bb649cb5-ab0f-4450-a98b-2d40788029f6.gif" width="1000"/>
</p>

After the initial release, we are still actively developing this project, adding new features and resolving bugs. As a result, some of the experiment numbers may be inconsistent with those reported in the above paper. Please check out version [0.0.1](https://github.com/HaozhiQi/hora/tree/v0.0.1) if you want to reproduce the numbers reported in the paper.

We also maintain a [changelog and list of bugs](docs/changelog.md) found during this development process.

## Disclaimer

It is worth noting that:
1. Simulation: The method is developed and debugged using IsaacGym Preview 3.0 ([Download](https://drive.google.com/file/d/1oK-QMZ40PO60PFWWsTmtK5ToFDkbL6R0/)), IsaacGymEnvs ([e860979](https://github.com/NVIDIA-Omniverse/IsaacGymEnvs/tree/e86097999b88da28b5252be16f81c595bbb3fca5)). Versions newer than these should work, but have not been extensively tested yet.
2. Hardware: The method is developed using an internal version of AllegroHand. We also provide a reference implementation (see the *Training the Policy* section for details) and [video results](https://haozhi.io/hora/allegro_v4) using the public AllegroHand-v4.
1. Simulation: The repo is mainly developed and debugged using IsaacGym Preview 4.0 ([Download](https://drive.google.com/file/d/1StaRl_hzYFYbJegQcyT7-yjgutc6C7F9)). Please note that the results will be inconsistent if you train with IsaacGym Preview 3.0.
2. Hardware: The method is developed using an internal version of AllegroHand. We also provide a reference implementation (but please refer to the version [0.0.1](https://github.com/HaozhiQi/hora/tree/v0.0.1) README) and [video results](https://haozhi.io/hora/allegro_v4) using the public AllegroHand-v4.
3. Results: The reward numbers in this repository are higher than those reported in the paper. This is because we changed the `reset` function ordering to follow [LeggedGym](https://github.com/leggedrobotics/legged_gym) instead of the one in [IsaacGymEnvs](https://github.com/NVIDIA-Omniverse/IsaacGymEnvs/blob/e8f1c66b24/isaacgymenvs/tasks/base/vec_task.py).

## Installation
@@ -67,36 +71,34 @@ This section can verify whether you install the repository and dependencies corr
Download a pretrained policy:
```
cd outputs/AllegroHandHora/
gdown 1AKecNsQZ56TCyJU49DU06GxnQRbeawMu -O hora.zip
unzip hora.zip -d ./hora
gdown 17fr40KQcUyFXz4W1ejuLTzRqP-Qu9EPS -O hora_v0.0.2.zip
unzip hora_v0.0.2.zip -d ./hora_v0.0.2
cd ../../
```

The directory structure should look like:
```
outputs/
  AllegroHandHora/
    hora/
    hora_v0.0.2/
      stage1_nn/   # stage 1 checkpoints
      stage1_tb/   # stage 1 tensorboard records
      stage2_nn/   # stage 2 checkpoints
      stage2_tb/   # stage 2 tensorboard records
```

Visualize it by running the following commands. Note that the stage 1 policy refers to the one trained with privileged object information, while the stage 2 policy refers to the one trained with proprioceptive history. The stage 2 policy is also the one we deployed in the real world.

```
# s1 and s2 stands for stage 1 and 2, respectively
scripts/vis_s1.sh hora
scripts/vis_s2.sh hora
scripts/vis_s1.sh hora_v0.0.2
scripts/vis_s2.sh hora_v0.0.2
```

Evaluate these two policies by running:

```
# change {GPU_ID} to a valid number
scripts/eval_s1.sh ${GPU_ID} hora
scripts/eval_s2.sh ${GPU_ID} hora
scripts/eval_s1.sh ${GPU_ID} hora_v0.0.2
scripts/eval_s2.sh ${GPU_ID} hora_v0.0.2
```

## Training the Policy
7 changes: 7 additions & 0 deletions docs/changelog.md
@@ -0,0 +1,7 @@
# Changelog and Bugs

December 1st, 2023 [v0.0.2]:
- Read the angular velocity at the control frequency. Previously, the angular velocity was obtained at the simulation frequency; we found this results in oscillatory behavior, since the policy learns to exploit this phenomenon (a minimal sketch of the control-frequency computation follows this list).
- Remove the hand-crafted lower and upper bounds of the privileged information.
- Remove online mass randomization, since it has no effect after the simulation is created.
- Change the angular velocity max clip limit from 0.5 to 0.4 to compensate for the higher rotation speed.
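
Below is a minimal standalone sketch, not verbatim repo code, of how the first and last items fit together: the object's rotation between consecutive control steps is converted to an axis-angle vector, divided by the control-step duration to get an angular velocity, and the projected speed is clipped for the rotation reward. The helper functions, the (x, y, z, w) quaternion layout, and the numeric `control_freq_inv`/`dt` values here are illustrative assumptions; the actual change is in the `hora/tasks/allegro_hand_hora.py` diff below.

```
import torch


def quat_conjugate(q):
    # negate the vector part, keep the real part (layout is (x, y, z, w))
    return torch.cat([-q[..., :3], q[..., 3:]], dim=-1)


def quat_mul(a, b):
    # quaternion product in (x, y, z, w) layout
    ax, ay, az, aw = a.unbind(-1)
    bx, by, bz, bw = b.unbind(-1)
    return torch.stack([
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
        aw * bw - ax * bx - ay * by - az * bz,
    ], dim=-1)


def control_freq_angvel(object_rot, object_rot_prev, control_freq_inv, dt):
    # relative rotation over one control step, converted to an axis-angle vector,
    # then divided by the control-step duration (control_freq_inv sim steps of length dt)
    rel = quat_mul(object_rot, quat_conjugate(object_rot_prev))
    vec_norm = torch.norm(rel[..., :3], dim=-1, keepdim=True)
    angle = 2.0 * torch.atan2(vec_norm, rel[..., 3:])
    axis = rel[..., :3] / vec_norm.clamp(min=1e-8)
    return axis * angle / (control_freq_inv * dt)


# illustrative values: identity -> ~0.4 rad about z within one 0.05 s control step
object_rot_prev = torch.tensor([[0.0, 0.0, 0.0, 1.0]])
object_rot = torch.tensor([[0.0, 0.0, 0.19867, 0.98007]])
angvel = control_freq_angvel(object_rot, object_rot_prev, control_freq_inv=6, dt=1.0 / 120.0)

rot_axis = torch.tensor([[0.0, 0.0, 1.0]])
vec_dot = (angvel * rot_axis).sum(-1)
# the rotation reward clips the projected speed; 0.4 is the new max from the changelog
rotate_reward = torch.clamp(vec_dot, max=0.4)
print(angvel, rotate_reward)  # ~8 rad/s about z before clipping, reward clipped to 0.4
```
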
125 changes: 74 additions & 51 deletions hora/tasks/allegro_hand_hora.py
@@ -10,7 +10,7 @@
import numpy as np
from isaacgym import gymtorch
from isaacgym import gymapi
from isaacgym.torch_utils import to_torch, unscale, quat_apply, tensor_clamp, torch_rand_float
from isaacgym.torch_utils import to_torch, unscale, quat_apply, tensor_clamp, torch_rand_float, quat_conjugate, quat_mul
from glob import glob
from hora.utils.misc import tprint
from .base.vec_task import VecTask
@@ -99,6 +99,8 @@ def __init__(self, config, sim_device, graphics_device_id, headless):
self.rot_axis_buf = torch.zeros((self.num_envs, 3), device=self.device, dtype=torch.float)

# useful buffers
self.object_rot_prev = self.object_rot.clone()
self.object_pos_prev = self.object_pos.clone()
self.init_pose_buf = torch.zeros((self.num_envs, self.num_dofs), device=self.device, dtype=torch.float)
self.actions = torch.zeros((self.num_envs, self.num_actions), device=self.device, dtype=torch.float)
self.torques = torch.zeros((self.num_envs, self.num_actions), device=self.device, dtype=torch.float)
@@ -197,7 +199,7 @@ def _create_envs(self, num_envs, spacing, num_per_row):
num_scales = len(self.randomize_scale_list)
obj_scale = np.random.uniform(self.randomize_scale_list[i % num_scales] - 0.025, self.randomize_scale_list[i % num_scales] + 0.025)
self.gym.set_actor_scale(env_ptr, object_handle, obj_scale)
self._update_priv_buf(env_id=i, name='obj_scale', value=obj_scale, lower=0.6, upper=0.9)
self._update_priv_buf(env_id=i, name='obj_scale', value=obj_scale)

obj_com = [0, 0, 0]
if self.randomize_com:
@@ -208,7 +210,7 @@ def _create_envs(self, num_envs, spacing, num_per_row):
np.random.uniform(self.randomize_com_lower, self.randomize_com_upper)]
prop[0].com.x, prop[0].com.y, prop[0].com.z = obj_com
self.gym.set_actor_rigid_body_properties(env_ptr, object_handle, prop)
self._update_priv_buf(env_id=i, name='obj_com', value=obj_com, lower=-0.02, upper=0.02)
self._update_priv_buf(env_id=i, name='obj_com', value=obj_com)

obj_friction = 1.0
if self.randomize_friction:
@@ -223,7 +225,17 @@ def _create_envs(self, num_envs, spacing, num_per_row):
p.friction = rand_friction
self.gym.set_actor_rigid_shape_properties(env_ptr, object_handle, object_props)
obj_friction = rand_friction
self._update_priv_buf(env_id=i, name='obj_friction', value=obj_friction, lower=0.0, upper=1.5)
self._update_priv_buf(env_id=i, name='obj_friction', value=obj_friction)

if self.randomize_mass:
prop = self.gym.get_actor_rigid_body_properties(env_ptr, object_handle)
for p in prop:
p.mass = np.random.uniform(self.randomize_mass_lower, self.randomize_mass_upper)
self.gym.set_actor_rigid_body_properties(env_ptr, object_handle, prop)
self._update_priv_buf(env_id=i, name='obj_mass', value=prop[0].mass)
else:
prop = self.gym.get_actor_rigid_body_properties(env_ptr, object_handle)
self._update_priv_buf(env_id=i, name='obj_mass', value=prop[0].mass)

if self.aggregate_mode > 0:
self.gym.end_aggregate(env_ptr)
@@ -236,24 +248,6 @@ def _create_envs(self, num_envs, spacing, num_per_row):
self.object_indices = to_torch(self.object_indices, dtype=torch.long, device=self.device)

def reset_idx(self, env_ids):
if self.randomize_mass:
lower, upper = self.randomize_mass_lower, self.randomize_mass_upper

for env_id in env_ids:
env = self.envs[env_id]
handle = self.gym.find_actor_handle(env, 'object')
prop = self.gym.get_actor_rigid_body_properties(env, handle)
for p in prop:
p.mass = np.random.uniform(lower, upper)
self.gym.set_actor_rigid_body_properties(env, handle, prop)
self._update_priv_buf(env_id=env_id, name='obj_mass', value=prop[0].mass, lower=0, upper=0.2)
else:
for env_id in env_ids:
env = self.envs[env_id]
handle = self.gym.find_actor_handle(env, 'object')
prop = self.gym.get_actor_rigid_body_properties(env, handle)
self._update_priv_buf(env_id=env_id, name='obj_mass', value=prop[0].mass, lower=0, upper=0.2)

if self.randomize_pd_gains:
self.p_gain[env_ids] = torch_rand_float(
self.randomize_p_gain_lower, self.randomize_p_gain_upper, (len(env_ids), self.num_actions),
@@ -331,33 +325,36 @@ def compute_reward(self, actions):
# work and torque penalty
torque_penalty = (self.torques ** 2).sum(-1)
work_penalty = ((self.torques * self.dof_vel_finite_diff).sum(-1)) ** 2
obj_linv_pscale = self.object_linvel_penalty_scale
pose_diff_pscale = self.pose_diff_penalty_scale
torque_pscale = self.torque_penalty_scale
work_pscale = self.work_penalty_scale

self.rew_buf[:], log_r_reward, olv_penalty = compute_hand_reward(
self.object_linvel, obj_linv_pscale,
self.object_angvel, self.rot_axis_buf, self.rotate_reward_scale,
self.angvel_clip_max, self.angvel_clip_min,
pose_diff_penalty, pose_diff_pscale,
torque_penalty, torque_pscale,
work_penalty, work_pscale,
# Compute offset in radians. Radians -> radians / sec
angdiff = quat_to_axis_angle(quat_mul(self.object_rot, quat_conjugate(self.object_rot_prev)))
object_angvel = angdiff / (self.control_freq_inv * self.dt)
vec_dot = (object_angvel * self.rot_axis_buf).sum(-1)
rotate_reward = torch.clip(vec_dot, max=self.angvel_clip_max, min=self.angvel_clip_min)
# linear velocity: use position difference instead of self.object_linvel
object_linvel = ((self.object_pos - self.object_pos_prev) / (self.control_freq_inv * self.dt)).clone()
object_linvel_penalty = torch.norm(object_linvel, p=1, dim=-1)

self.rew_buf[:] = compute_hand_reward(
object_linvel_penalty, self.object_linvel_penalty_scale,
rotate_reward, self.rotate_reward_scale,
pose_diff_penalty, self.pose_diff_penalty_scale,
torque_penalty, self.torque_penalty_scale,
work_penalty, self.work_penalty_scale,
)
self.reset_buf[:] = self.check_termination(self.object_pos)
self.extras['rotation_reward'] = log_r_reward.mean()
self.extras['object_linvel_penalty'] = olv_penalty.mean()
self.extras['rotation_reward'] = rotate_reward.mean()
self.extras['object_linvel_penalty'] = object_linvel_penalty.mean()
self.extras['pose_diff_penalty'] = pose_diff_penalty.mean()
self.extras['work_done'] = work_penalty.mean()
self.extras['torques'] = torque_penalty.mean()
self.extras['roll'] = self.object_angvel[:, 0].mean()
self.extras['pitch'] = self.object_angvel[:, 1].mean()
self.extras['yaw'] = self.object_angvel[:, 2].mean()
self.extras['roll'] = object_angvel[:, 0].mean()
self.extras['pitch'] = object_angvel[:, 1].mean()
self.extras['yaw'] = object_angvel[:, 2].mean()

if self.evaluate:
finished_episode_mask = self.reset_buf == 1
self.stat_sum_rewards += self.rew_buf.sum()
self.stat_sum_rotate_rewards += log_r_reward.sum()
self.stat_sum_rotate_rewards += rotate_reward.sum()
self.stat_sum_torques += self.torques.abs().sum()
self.stat_sum_obj_linvel += (self.object_linvel ** 2).sum(-1).sum()
self.stat_sum_episode_length += (self.reset_buf == 0).sum()
@@ -408,6 +405,8 @@ def pre_physics_step(self, actions):
targets = self.prev_targets + 1 / 24 * self.actions
self.cur_targets[:] = tensor_clamp(targets, self.allegro_hand_dof_lower_limits, self.allegro_hand_dof_upper_limits)
self.prev_targets[:] = self.cur_targets.clone()
self.object_rot_prev[:] = self.object_rot
self.object_pos_prev[:] = self.object_pos

if self.force_scale > 0.0:
self.rb_forces *= torch.pow(self.force_decay, self.dt / self.force_decay_interval)
@@ -618,23 +617,47 @@ def _init_object_pose(self):


def compute_hand_reward(
object_linvel, object_linvel_penalty_scale: float,
object_angvel, rotation_axis, rotate_reward_scale: float,
angvel_clip_max: float, angvel_clip_min: float,
object_linvel_penalty, object_linvel_penalty_scale: float,
rotate_reward, rotate_reward_scale: float,
pose_diff_penalty, pose_diff_penalty_scale: float,
torque_penalty, torque_pscale: float,
work_penalty, work_pscale: float,
):
rotate_reward_cond = (rotation_axis[:, -1] != 0).float()
vec_dot = (object_angvel * rotation_axis).sum(-1)
rotate_reward = torch.clip(vec_dot, max=angvel_clip_max, min=angvel_clip_min)
rotate_reward = rotate_reward_scale * rotate_reward * rotate_reward_cond
object_linvel_penalty = torch.norm(object_linvel, p=1, dim=-1)

reward = rotate_reward
reward = rotate_reward_scale * rotate_reward
# penalize the object's linear velocity (L1 norm), plus pose, torque, and work terms
reward = reward + object_linvel_penalty * object_linvel_penalty_scale
reward = reward + pose_diff_penalty * pose_diff_penalty_scale
reward = reward + torque_penalty * torque_pscale
reward = reward + work_penalty * work_pscale
return reward, rotate_reward, object_linvel_penalty
return reward


def quat_to_axis_angle(quaternions: torch.Tensor) -> torch.Tensor:
"""
Convert rotations given as quaternions to axis/angle.
Adapted from PyTorch3D:
https://pytorch3d.readthedocs.io/en/latest/_modules/pytorch3d/transforms/rotation_conversions.html#quaternion_to_axis_angle
Args:
quaternions: quaternions with real part last,
as tensor of shape (..., 4).
Returns:
Rotations given as a vector in axis angle form, as a tensor
of shape (..., 3), where the magnitude is the angle
turned anticlockwise in radians around the vector's
direction.
"""
norms = torch.norm(quaternions[..., :3], p=2, dim=-1, keepdim=True)
half_angles = torch.atan2(norms, quaternions[..., 3:])
angles = 2 * half_angles
eps = 1e-6
small_angles = angles.abs() < eps
sin_half_angles_over_angles = torch.empty_like(angles)
sin_half_angles_over_angles[~small_angles] = (
torch.sin(half_angles[~small_angles]) / angles[~small_angles]
)
# for x small, sin(x/2) is about x/2 - (x/2)^3/6
# so sin(x/2)/x is about 1/2 - (x*x)/48
sin_half_angles_over_angles[small_angles] = (
0.5 - (angles[small_angles] * angles[small_angles]) / 48
)
return quaternions[..., :3] / sin_half_angles_over_angles
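
As a quick sanity check of the convention described in the docstring (vector part first, real part last), here is a hypothetical standalone snippet, not part of this commit, assuming the `quat_to_axis_angle` defined above is in scope:

```
import math

import torch

# 90-degree rotation about the z-axis, stored as (x, y, z, w) with the real part last
q = torch.tensor([[0.0, 0.0, math.sin(math.pi / 4), math.cos(math.pi / 4)]])
print(quat_to_axis_angle(q))  # expected: tensor([[0.0000, 0.0000, 1.5708]]), i.e. pi/2 about z
```
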
2 changes: 1 addition & 1 deletion scripts/deploy.sh
@@ -1,3 +1,3 @@
#!/bin/bash
CACHE=$1
python deploy.py checkpoint=outputs/AllegroHandHora/"${CACHE}"/stage2_nn/last.pth
python deploy.py checkpoint=outputs/AllegroHandHora/"${CACHE}"/stage2_nn/model_last.ckpt
