# Explanation

Training reinforcement learning policies in simulation allows parallelization and scale far beyond what real-world data collection can offer, since simulations run on demand and are limited only by the available compute. However, policies trained this way only transfer to reality if the simulator models contact and rigid-body dynamics accurately, and training is only practical if the simulator is computationally efficient.

Prior to MuJoCo, no simulator met both of these needs for robotics training. MuJoCo (Multi-Joint Dynamics with Contact) introduced a simulator that is fast, parallelizable, and accurate enough for this purpose. The paper announces it as publicly available and free for non-profit research; it was only open-sourced later, in 2022. It has been widely used for robotics simulation training since.

We will see that the quality and accuracy of the simulation software is a direct constraint on how well the learned models transfer to reality.

# Notes

> Existing physics engines can be used to test controllers that are
> already designed. However they lack the speed, accuracy and overall feature sets needed to automate the controller design process itself.

Current physics engines aren’t fast enough for controller design. Tools that are used for controller design don’t have physics simulation capabilities.

They suggest that the absence of good simulation tools to design controllers may be one reason modern robots perform poorly.

> We believe that numerical optimization is the most powerful and generally applicable tool for automating processes that would otherwise require human intelligence.

This is the design philosophy behind MuJoCo. They’re also right. Numerical optimization underlies ML, and probably the human brain. What does this suggest about numbers and information?

> The essence of control optimization is to automatically construct many candidate controllers, evaluate their performance in simulation, and use the data to construct better controllers.

The design process that motivated the creation of MuJoCo.
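The loop described in the quote can be sketched with a toy example. This is a minimal illustration, not the paper's method: the unit point mass, the PD controller, the cost function, and the random-search strategy are all assumptions chosen to keep the sketch short.

```python
import random

STEPS = 300  # dynamics evaluations per rollout


def rollout_cost(kp, kd, h=0.01):
    """Simulate a unit point mass under u = -kp*q - kd*v and return its cost."""
    q, v, cost = 1.0, 0.0, 0.0
    for _ in range(STEPS):
        u = -kp * q - kd * v          # candidate controller
        v += h * u                    # toy dynamics: unit mass, no contact
        q += h * v
        cost += h * (q * q + 1e-3 * u * u)
    return cost


def optimize(iterations=50, candidates=20, seed=0):
    """Random search: sample controllers near the best one, keep improvements."""
    rng = random.Random(seed)
    best, best_cost = (1.0, 0.0), rollout_cost(1.0, 0.0)
    evaluations = STEPS
    for _ in range(iterations):
        for _ in range(candidates):
            cand = (best[0] + rng.gauss(0.0, 0.5), best[1] + rng.gauss(0.0, 0.5))
            cost = rollout_cost(*cand)    # evaluate the candidate in simulation
            evaluations += STEPS
            if cost < best_cost:          # use the data to pick a better controller
                best, best_cost = cand, cost
    return best, best_cost, evaluations


if __name__ == "__main__":
    gains, cost, n = optimize()
    print(f"best gains {gains}, cost {cost:.4f}, dynamics evaluations {n}")
```

Even this two-parameter toy needs roughly 300,000 dynamics evaluations, which gives a feel for why simulator speed matters so much for real controllers.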

> Either way, optimizing a controller requires a vast number of dynamics evaluations for different states and controls.

In a recent work, they needed 200,000,000 evaluations, which took 10 minutes using their software, versus about 1 month on the previous standard software (Open Dynamics Engine, ODE). That is a speedup of more than 3 orders of magnitude (roughly 4,000x).

This speedup comes from better compute utilization, parallelization, and higher accuracy/stability, which allows larger time steps per calculation.
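The arithmetic behind these figures is easy to check. The sketch below assumes "1 month" means 30 days; the other numbers are the ones quoted in the notes.

```python
import math

evaluations = 200_000_000
mujoco_seconds = 10 * 60             # 10 minutes
ode_seconds = 30 * 24 * 60 * 60      # "1 month", assumed to be 30 days

speedup = ode_seconds / mujoco_seconds
orders_of_magnitude = math.log10(speedup)
throughput = evaluations / mujoco_seconds

print(f"speedup: {speedup:.0f}x (~{orders_of_magnitude:.1f} orders of magnitude)")
print(f"throughput: {throughput:,.0f} evaluations per second")
```

The implied throughput of about 333,000 evaluations per second is in the same range as the "nearly 400,000 evaluations per second" quoted in the timing tests.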

> In the context of control optimization, however, the controller is being
> "tuned" to the engine and not the other way around.

If the physics engine allows cheating, the controller will exploit this cheat. So the engine has to be accurate.

Prior physics engines either enforced joint constraints numerically or ignored contact dynamics; neither approach is sufficient for robotics.

> These observations indicated that we need a new engine, representing the state in joint coordinates and simulating contacts in ways that are related to LCP but better.

So they made MuJoCo - **Mu**lti-**Jo**int Dynamics with **Co**ntact.

Contact dynamics simulation is still an area of active development, unlike smooth multi-joint dynamics, which is considered solved.

MuJoCo is also built with several added benefits on top of a traditional simulator, like evaluating systems in parallel (useful for ML), inverse dynamics, a convenient language/compatibility, etc.

### Algorithmic Foundations

**1. Equations of Motion and Smooth Dynamics**

They use the following quantities:

| **Symbol**       | **Definition**                                         | **Meaning**                                                                                                                        |
| ---------------- | ------------------------------------------------------ | ---------------------------------------------------------------------------------------------------------------------------------- |
| $\textrm{q}$     | position in generalized coordinates                    | The momentary configuration of the entire system (the full state is $\textrm{q}$ together with $\textrm{v}$). The end goal of simulation is just to render accurate positions over time. |
| $\textrm{v}$     | velocity in generalized coordinates                    | The momentary velocities of the entire system (changes in $\textrm{q}$).                                                           |
| $M$              | inertia matrix in generalized coordinates              | Specifies how mass is distributed throughout the system to resist change in motion.                                                |
| $\textrm{b}$     | “bias” forces: Coriolis, centrifugal, gravity, springs | Forces that act without any applied control or contact: velocity-dependent terms (Coriolis, centrifugal) plus passive forces such as gravity and springs. |
| $\tau$           | external/applied forces                                | Forces applied on the system in simulation. Ex: resisting forces applied to an actuator.                                           |
| $\phi$           | equality constraints: $\phi(\textrm{q}) = 0$           | Relations between coordinates that must hold exactly, like two bodies attached at a point. Non-penetration is not one of these; it is handled by the contact terms below. |
| $J_E$            | Jacobian of equality constraints                       | How changes in $\textrm{q}$ change the equality constraint values $\phi(\textrm{q})$.                                               |
| $\textrm{v}^*_E$ | desired velocity in equality constraint coordinates    | Defines how quickly the system will readjust to fix itself when an equality constraint is violated                                 |
| $\textrm{f}_E$   | impulse caused by equality constraints                 | The forces caused by maintaining the equality constraints (like those implied by a stationary object).                             |
| $J_C$            | Jacobian of active contacts                            | Maps velocities in generalized coordinates to velocities at the active contact points: $\textrm{v}_C = J_C \textrm{v}$.            |
| $\textrm{v}_C$   | velocity in contact coordinates                        | How the contact coordinates are moving over time. Useful for modeling contact behavior, like friction.                             |
| $\textrm{f}_C$   | impulse caused by contacts                             | The impulses generated at contacts; objects don’t penetrate each other, so they push on each other instead.                        |
| $\textrm{h}$     | time step                                              | Shorter time step means more accuracy but requires more computational resources.                                                   |

The first calculation is the standard equations of motion and smooth dynamics in continuous time, which ultimately determine how all the bodies move.
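Using the symbols from the table, the equations of motion can be sketched as follows. This is assembled from the table rather than copied from the paper, so the sign convention for $\textrm{b}$ (bias forces on the right-hand side) and the subscripts $t$ and $t+h$ are assumptions:

$$
M \, d\textrm{v} = (\textrm{b} + \tau)\, dt + J_E^T \textrm{f}_E + J_C^T \textrm{f}_C
$$

Discretized over one time step $\textrm{h}$, this becomes

$$
M (\textrm{v}_{t+h} - \textrm{v}_t) = \textrm{h}\,(\textrm{b} + \tau) + J_E^T \textrm{f}_E + J_C^T \textrm{f}_C, \qquad \textrm{v}_C = J_C \textrm{v}_{t+h}
$$

The constraint and contact terms are impulses, so they already include the time step; only the smooth forces are multiplied by $\textrm{h}$.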

They calculate this with the following steps:

1. Compute the positions and orientations of all rigid bodies (forward kinematics); detect potential collisions; construct Jacobians $J_E$, $J_C$
2. Compute the inertia matrix $M$ and the bias forces $\textrm{b}$
3. Express the equality constraint impulse $\textrm{f}_E$ as a function of the (unknown) contact impulses $\textrm{f}_C$, which are calculated later. Apply constraint stabilization.
4. Solve for $\textrm{f}_C$ and $\textrm{v}_C$
5. Integrate everything numerically to get the next state.

Steps 3, 4, and 5 involve complex calculations of contact impulses, for which MuJoCo implements its own algorithms.
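The five steps above can be traced on the smallest possible system. The sketch below is a toy, not MuJoCo's algorithm: it assumes a 1D point mass above a floor at `q = 0`, a single frictionless contact, no equality constraints, and semi-implicit Euler integration. The function name and parameters are hypothetical.

```python
def step(q, v, tau=0.0, m=1.0, g=9.81, h=0.01):
    """Advance a 1D point mass above a floor at q = 0 by one time step."""
    # Step 1: kinematics and collision detection. The contact is active when
    # the mass touches or penetrates the floor; J_C is just 1 in this toy.
    contact_active = q <= 0.0

    # Step 2: inertia matrix and bias force (gravity only).
    M = m
    b = -m * g

    # Step 3: skipped, because the toy has no equality constraints (f_E = 0).

    # Step 4: velocity without contact, then the contact impulse f_C. The
    # impulse is just large enough to stop downward motion and never pulls.
    v_free = v + h * (b + tau) / M
    f_C = max(0.0, -M * v_free) if contact_active else 0.0
    v_next = v_free + f_C / M

    # Step 5: integrate the position with the new velocity.
    q_next = q + h * v_next
    return q_next, v_next, f_C


if __name__ == "__main__":
    print(step(1.0, 0.0))    # free fall: no contact impulse
    print(step(0.0, -1.0))   # hitting the floor: impulse cancels the velocity
```

Because contact is only detected once `q <= 0`, a falling mass penetrates slightly before it stops. A real engine detects and handles contacts much more carefully.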

**2. Solving for the Contact Impulse**

Next they solve for the contact impulses, which determine the forces that the rigid bodies exert on each other.

Instead of using the standard approach, MuJoCo provides 3 of its own algorithms for this step.

**3. Implicit Complementarity Solver**

The most accurate MuJoCo solver computes an exact solution for steps 3, 4, and 5 using the complementarity constraint (two rigid bodies either are in contact and exert a force on each other, or are not in contact and exert no force).
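For a single frictionless contact, the complementarity condition can be written as below. The subscript $N$ for the normal direction is notation introduced here for illustration, and the frictionless case is a simplifying assumption:

$$
\textrm{f}_N \ge 0, \qquad \textrm{v}_N \ge 0, \qquad \textrm{f}_N \, \textrm{v}_N = 0
$$

Either the contact pushes ($\textrm{f}_N > 0$) and the bodies do not separate ($\textrm{v}_N = 0$), or the bodies separate ($\textrm{v}_N > 0$) and no impulse acts ($\textrm{f}_N = 0$).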

**4. Convex Solver**

A trade-off against the implicit complementarity solver: slightly less accurate, but far cheaper to compute.

**5. Diagonal Solver**

The least accurate but fastest contact solver.

**6. Computational Complexity**

> The bottleneck now is in memory access. Thus the performance of physics engines such as MuJoCo tends to be dominated by cache misses more than traditional computational complexity considerations, and the only way to assess performance reliably is to run extensive timing tests.

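The speed of simulation is limited by memory access (cache misses) more than by the number of arithmetic operations, so performance has to be measured empirically rather than predicted from complexity analysis.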
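A timing test of the kind the quote calls for can be sketched as below. The `toy_dynamics` function is a hypothetical stand-in for a real dynamics evaluation, and a pure-Python loop says nothing about MuJoCo's actual speed or cache behavior; the point is only the measurement pattern of repeating the run and keeping the fastest time.

```python
import time


def toy_dynamics(q, v, h=0.01):
    """One dynamics evaluation for a falling, lightly damped point mass."""
    a = -9.81 - 0.1 * v
    v = v + h * a
    return q + h * v, v


def evaluations_per_second(n=100_000, repeats=5):
    """Time n evaluations several times and report the rate of the fastest run."""
    best = float("inf")
    for _ in range(repeats):
        q, v = 1.0, 0.0
        start = time.perf_counter()
        for _ in range(n):
            q, v = toy_dynamics(q, v)
        best = min(best, time.perf_counter() - start)
    return n / best


if __name__ == "__main__":
    print(f"{evaluations_per_second():,.0f} evaluations per second")
```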

**7. Inverse Dynamics**

> We now describe the computation of inverse dynamics, which is a unique feature of MuJoCo.

Most physics simulators don’t have inverse dynamics capabilities like MuJoCo.

This is useful for computing the torques needed to make a robot follow a specific trajectory.
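The idea can be illustrated on a single pendulum with no contacts. This is a simplified sketch: MuJoCo's inverse dynamics also accounts for contacts, which this toy ignores, and the constants and function names are hypothetical. The sign convention (bias force added to the applied torque) follows the symbol table.

```python
import math

MASS, LENGTH, GRAVITY = 1.0, 1.0, 9.81
INERTIA = MASS * LENGTH * LENGTH   # M for a point mass on a massless rod


def bias_force(q):
    """Bias term b: gravity torque, with q measured from the downward vertical."""
    return -MASS * GRAVITY * LENGTH * math.sin(q)


def forward_dynamics(q, tau):
    """Given the applied torque, return the resulting acceleration."""
    return (bias_force(q) + tau) / INERTIA


def inverse_dynamics(q, acc):
    """Given a desired acceleration, return the torque that produces it."""
    return INERTIA * acc - bias_force(q)


if __name__ == "__main__":
    hold = inverse_dynamics(0.3, 0.0)   # torque needed to hold the pendulum still
    print(f"holding torque at 0.3 rad: {hold:.3f}")
    print(f"resulting acceleration: {forward_dynamics(0.3, hold):.3f}")
```

Applying `inverse_dynamics` to the accelerations along a planned trajectory yields the torque profile a controller would need to track it.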

### Modeling

**1. Different ways to construct a MuJoCo model**

A MuJoCo model can be expressed in 3 different formats, which all contain the same information:

1. XML file in MJCF format
2. C++ API calls for model construction
3. Low-level C model generated by the compiler

The XML file just describes the same model that the C++ API calls would build, and both are eventually compiled into the C model.

![Screenshot 2024-11-05 at 10.40.07 AM.png](../../../images/Screenshot_2024-11-05_at_10.40.07_AM.png)

Any information missing from the model is filled in with defaults.

**2. Elements of a MuJoCo model**

1. **Bodies** - Elements used to build kinematic trees
2. **Joints** - Define degrees of freedom between a body and its parent
3. **DOF** - Degree of freedom available
4. **Geom** - Massless geometric objects used for collisions
5. **Site** - Points of interest
6. **Constraint** - Impose any kinematic equality constraints like 3D position constraints, joint angle constraints, etc.
7. **Tendon** - Spatial paths that can be used for actuation
8. **Actuator** - Have control inputs, activation states (for pneumatics), and gains.
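Several of these elements appear in even a tiny MJCF file. The sketch below uses present-day MJCF syntax, which may differ from the format described in the 2012 paper; the pendulum model and all element names in it are made up for illustration. It only parses the XML with Python's standard library and does not run a simulation.

```python
import xml.etree.ElementTree as ET

# A hypothetical single-pendulum model: one body, one hinge joint, one geom,
# one site at the tip, and one motor actuator driving the joint.
MJCF = """
<mujoco model="pendulum">
  <worldbody>
    <body name="arm" pos="0 0 1">
      <joint name="hinge" type="hinge" axis="0 1 0"/>
      <geom name="rod" type="capsule" fromto="0 0 0 0 0 -0.5" size="0.02"/>
      <site name="tip" pos="0 0 -0.5"/>
    </body>
  </worldbody>
  <actuator>
    <motor name="torque" joint="hinge" gear="1"/>
  </actuator>
</mujoco>
"""


def count_elements(xml_text):
    """Count how many of each model element the MJCF text declares."""
    root = ET.fromstring(xml_text)
    tags = ["body", "joint", "geom", "site", "motor"]
    return {tag: len(root.findall(f".//{tag}")) for tag in tags}


if __name__ == "__main__":
    print(count_elements(MJCF))
```

Attributes that are left out, such as the geom's density or the joint's range, are the kind of missing information that gets filled in with defaults. If the `mujoco` Python package is installed, the same string should load with `mujoco.MjModel.from_xml_string(MJCF)`.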

### Timing Tests

MuJoCo has comparable speed to SD/FAST.

> On a single desktop machine, we are able to run nearly 400,000 evaluations per second including contact dynamics.

### Summary

> In terms of smooth multi-joint dynamics, single-threaded MuJoCo is comparable to SD/FAST

> MuJoCo was developed to enable our research in model based control.

> The experience so far indicates that it is a very useful and widely applicable tool, that can accelerate progress in robotic control. Thus we have decided to make it publicly available. It will be free for non-profit research.
