# **HalfCheetah-v4 Environment Description**

## **Action Space**
The action space is a `Box(-1, 1, (6,), float32)`. Each element of the action is the torque applied at one of the cheetah's six hinge joints (rotors).

| Num | Action | Control Min | Control Max | Name (in corresponding XML file) | Joint | Unit         |
|-----|--------|-------------|-------------|----------------------------------|-------|--------------|
| 0   | Torque applied on the back thigh rotor | -1 | 1 | `bthigh` | hinge | torque (N m) |
| 1   | Torque applied on the back shin rotor  | -1 | 1 | `bshin`  | hinge | torque (N m) |
| 2   | Torque applied on the back foot rotor  | -1 | 1 | `bfoot`  | hinge | torque (N m) |
| 3   | Torque applied on the front thigh rotor | -1 | 1 | `fthigh` | hinge | torque (N m) |
| 4   | Torque applied on the front shin rotor | -1 | 1 | `fshin`  | hinge | torque (N m) |
| 5   | Torque applied on the front foot rotor | -1 | 1 | `ffoot`  | hinge | torque (N m) |
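The table fixes the order of the six actuators. As a minimal sketch, an action can be assembled by joint name and clipped to the `[-1, 1]` bounds of the `Box`. The helper `make_action` and the `ACTION_NAMES` tuple are hypothetical, not part of Gymnasium.

```python
# Actuator order copied from the action table (indices 0-5).
# ACTION_NAMES and make_action are hypothetical helpers, not a Gymnasium API.
ACTION_NAMES = ("bthigh", "bshin", "bfoot", "fthigh", "fshin", "ffoot")


def make_action(**torques: float) -> list:
    """Build a 6-element action from per-joint torques, clipped to [-1, 1].

    Joints that are not specified default to 0.0 (no torque).
    """
    unknown = set(torques) - set(ACTION_NAMES)
    if unknown:
        raise ValueError(f"unknown joint(s): {sorted(unknown)}")
    # Clip each requested torque into the action-space bounds.
    return [max(-1.0, min(1.0, float(torques.get(name, 0.0)))) for name in ACTION_NAMES]


action = make_action(bthigh=0.5, ffoot=-2.0)
print(action)  # [0.5, 0.0, 0.0, 0.0, 0.0, -1.0]
```

Before passing the result to `env.step`, you would typically convert it to a `float32` array, e.g. `numpy.asarray(action, dtype=numpy.float32)`.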

---

## **Observation Space**
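By default, the observation is an `ndarray` of shape `(17,)` (the space is `Box(-Inf, Inf, (17,), float64)`): the positions of the cheetah's body parts (8 values) followed by their velocities (9 values). The x-coordinate of the cheetah's root is excluded by default. Passing `exclude_current_positions_from_observation=False` when constructing the environment includes it as the first element, giving shape `(18,)`.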

### **Observation Dimensions:**
| Num | Observation                                  | Min  | Max  | Name (in corresponding XML file) | Joint | Unit               |
|-----|----------------------------------------------|------|------|----------------------------------|-------|--------------------|
| 0   | z-coordinate of the front tip                | -Inf | Inf  | `rootz`                          | slide | position (m)             |
| 1   | Angle of the front tip                       | -Inf | Inf  | `rooty`                          | hinge | angle (rad)              |
| 2   | Angle of the back thigh                      | -Inf | Inf  | `bthigh`                         | hinge | angle (rad)              |
| 3   | Angle of the back shin                       | -Inf | Inf  | `bshin`                          | hinge | angle (rad)              |
| 4   | Angle of the back foot                       | -Inf | Inf  | `bfoot`                          | hinge | angle (rad)              |
| 5   | Angle of the front thigh                     | -Inf | Inf  | `fthigh`                         | hinge | angle (rad)              |
| 6   | Angle of the front shin                      | -Inf | Inf  | `fshin`                          | hinge | angle (rad)              |
| 7   | Angle of the front foot                      | -Inf | Inf  | `ffoot`                          | hinge | angle (rad)              |
| 8   | Velocity of the front tip along the x-axis   | -Inf | Inf  | `rootx`                          | slide | velocity (m/s)           |
| 9   | Velocity of the front tip along the z-axis   | -Inf | Inf  | `rootz`                          | slide | velocity (m/s)           |
| 10  | Angular velocity of the front tip            | -Inf | Inf  | `rooty`                          | hinge | angular velocity (rad/s) |
| 11  | Angular velocity of the back thigh           | -Inf | Inf  | `bthigh`                         | hinge | angular velocity (rad/s) |
| 12  | Angular velocity of the back shin            | -Inf | Inf  | `bshin`                          | hinge | angular velocity (rad/s) |
| 13  | Angular velocity of the back foot            | -Inf | Inf  | `bfoot`                          | hinge | angular velocity (rad/s) |
| 14  | Angular velocity of the front thigh          | -Inf | Inf  | `fthigh`                         | hinge | angular velocity (rad/s) |
| 15  | Angular velocity of the front shin           | -Inf | Inf  | `fshin`                          | hinge | angular velocity (rad/s) |
| 16  | Angular velocity of the front foot           | -Inf | Inf  | `ffoot`                          | hinge | angular velocity (rad/s) |
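Because positions and velocities are simply concatenated, the observation can be split by index. The sketch below is a hypothetical helper (not a Gymnasium API) that assumes the default 17-element layout from the table, so it would need adjusting if the x-coordinate is included.

```python
# Joint names in observation order, taken from the observation table.
# Assumes the default layout (root x-coordinate excluded from positions).
POSITION_NAMES = ("rootz", "rooty", "bthigh", "bshin", "bfoot", "fthigh", "fshin", "ffoot")
VELOCITY_NAMES = ("rootx", "rootz", "rooty", "bthigh", "bshin", "bfoot", "fthigh", "fshin", "ffoot")


def split_observation(obs):
    """Split a 17-element observation into dicts of named positions and velocities.

    Works for lists and 1-D NumPy arrays alike (only len() and slicing are used).
    """
    expected = len(POSITION_NAMES) + len(VELOCITY_NAMES)
    if len(obs) != expected:
        raise ValueError(f"expected {expected} values, got {len(obs)}")
    n_pos = len(POSITION_NAMES)
    positions = dict(zip(POSITION_NAMES, obs[:n_pos]))    # indices 0-7
    velocities = dict(zip(VELOCITY_NAMES, obs[n_pos:]))   # indices 8-16
    return positions, velocities


obs = [float(i) for i in range(17)]  # placeholder values equal to their index
positions, velocities = split_observation(obs)
print(positions["bthigh"], velocities["rootx"])  # 2.0 8.0
```

Note that `rootz` and `rooty` appear in both dictionaries: once as a position or angle, once as a velocity.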

---

## **Rewards**

The reward has two components:

- **Forward Reward**: Rewards moving forward (in the positive x direction). It is computed as:
  \[
  \text{forward\_reward} = \text{forward\_reward\_weight} \times \frac{x_{\text{after}} - x_{\text{before}}}{dt}
  \]
  where \(x_{\text{before}}\) and \(x_{\text{after}}\) are the x-coordinates of the cheetah before and after the action. The default `forward_reward_weight` is 1.0, and the default `dt` is 0.05 s (`frame_skip = 5` times a frametime of 0.01 s).

- **Control Cost**: Penalizes large actions. It is computed as:
  \[
  \text{ctrl\_cost} = \text{ctrl\_cost\_weight} \times \sum_{i=0}^{5} \text{action}_i^2
  \]
  The default `ctrl_cost_weight` is 0.1.

The total reward returned at each step is:
\[
\text{reward} = \text{forward\_reward} - \text{ctrl\_cost}
\]
The individual reward components are also reported in the `info` dictionary.
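The reward terms can be recomputed by hand, for example to sanity-check logged rewards. This sketch assumes the default weights (`forward_reward_weight = 1.0`, `ctrl_cost_weight = 0.1`) and `dt = 0.05`. `half_cheetah_reward` is a hypothetical helper, not Gymnasium's implementation.

```python
def half_cheetah_reward(x_before, x_after, action, dt=0.05,
                        forward_reward_weight=1.0, ctrl_cost_weight=0.1):
    """Recompute one step's reward from the formulas above.

    Returns (reward, forward_reward, ctrl_cost). Defaults are assumed, not read
    from a live environment.
    """
    x_velocity = (x_after - x_before) / dt               # finite-difference forward speed (m/s)
    forward_reward = forward_reward_weight * x_velocity
    ctrl_cost = ctrl_cost_weight * sum(a * a for a in action)
    return forward_reward - ctrl_cost, forward_reward, ctrl_cost


reward, fwd, ctrl = half_cheetah_reward(x_before=0.0, x_after=0.1, action=[0.5] * 6)
print(round(reward, 6))  # 1.85
```

With these numbers the cheetah moves 0.1 m in 0.05 s (2 m/s), the control cost is 0.1 × 6 × 0.25 = 0.15, and the total reward is about 1.85. Moving backwards (`x_after < x_before`) makes the forward reward negative.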

---

## **Starting State**
Before noise is added, the initial observation is the 17-dimensional zero vector:

\[
(0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0)
\]

- **Positional values**: The first 8 values.
- **Velocity values**: The last 9 values.

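Noise is added to the initial state for stochasticity, with its magnitude set by `reset_noise_scale` (default 0.1):
- Positional values get uniform noise in `[-reset_noise_scale, reset_noise_scale]`.
- Velocity values get Gaussian noise with mean 0 and standard deviation `reset_noise_scale` (standard normal noise multiplied by `reset_noise_scale`).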
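The reset noise can be sketched as follows. For readability this works at the observation level (the environment itself perturbs the full MuJoCo position and velocity vectors, including the excluded x-coordinate), and it uses Python's `random` module instead of the environment's seeded NumPy generator. `noisy_initial_observation` is a hypothetical name.

```python
import random

N_POSITIONS = 8   # observation indices 0-7
N_VELOCITIES = 9  # observation indices 8-16


def noisy_initial_observation(reset_noise_scale=0.1, rng=None):
    """Apply the described reset noise to an all-zero 17-element state (sketch)."""
    if rng is None:
        rng = random.Random()
    # Positions: uniform noise in [-scale, scale].
    positions = [rng.uniform(-reset_noise_scale, reset_noise_scale) for _ in range(N_POSITIONS)]
    # Velocities: standard normal noise scaled by reset_noise_scale.
    velocities = [reset_noise_scale * rng.gauss(0.0, 1.0) for _ in range(N_VELOCITIES)]
    return positions + velocities


obs0 = noisy_initial_observation(rng=random.Random(0))
print(len(obs0))  # 17
```

Passing a seeded `random.Random` makes the sketch reproducible, much as `env.reset(seed=...)` does for the real environment.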

---

## **Episode End**
The episode never terminates. It is truncated when:
- The episode length reaches 1000 timesteps.
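Truncation is purely a step count: HalfCheetah never reports `terminated=True`, so every episode runs to the limit. `gymnasium.make("HalfCheetah-v4")` already applies this limit, but the mechanism can be sketched with a small wrapper. `StepLimit` and the stand-in environment `NeverTerminates` are hypothetical illustrations in the spirit of Gymnasium's `TimeLimit` wrapper, not its actual code.

```python
class StepLimit:
    """Sketch of step-count truncation around any Gymnasium-style env.

    `env` must provide reset(**kwargs) -> (obs, info) and
    step(action) -> (obs, reward, terminated, truncated, info).
    """

    def __init__(self, env, max_episode_steps=1000):
        self.env = env
        self.max_episode_steps = max_episode_steps
        self._elapsed = 0

    def reset(self, **kwargs):
        self._elapsed = 0  # restart the step counter for each episode
        return self.env.reset(**kwargs)

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        self._elapsed += 1
        if self._elapsed >= self.max_episode_steps:
            truncated = True  # time limit reached
        return obs, reward, terminated, truncated, info


class NeverTerminates:
    """Hypothetical stand-in env: like HalfCheetah, it never sets terminated=True."""

    def reset(self, **kwargs):
        return [0.0] * 17, {}

    def step(self, action):
        return [0.0] * 17, 0.0, False, False, {}


env = StepLimit(NeverTerminates())
env.reset()
steps = 0
terminated = truncated = False
while not (terminated or truncated):
    _, _, terminated, truncated, _ = env.step([0.0] * 6)
    steps += 1
print(steps, terminated, truncated)  # 1000 False True
```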