
Feature request: Faster minimizer #334

Closed · jchodera opened this issue Feb 15, 2014 · 11 comments

@jchodera (Member)

The existing L-BFGS minimizer works very well, but is quite slow on the CUDA and OpenCL platforms due to the need to push positions and pull forces across the host bus every step. It would be great to have a minimizer that ran exclusively on the GPU and was perhaps "good enough" for many tasks, but faster.

I've implemented a very rudimentary CustomIntegrator-based version as an example, but more sophisticated possibilities exist, and a custom-kernel-based approach could be superior.

import simtk.openmm as mm
import simtk.unit as units

def GradientDescentMinimizationIntegrator(initial_step_size=0.01*units.angstroms):
    """
    Construct a simple gradient descent minimization integrator.

    Parameters
    ----------
    initial_step_size : simtk.unit.Quantity compatible with nanometers, default: 0.01*simtk.unit.angstroms
        The norm of the initial step size guess.

    Returns
    -------
    integrator : simtk.openmm.CustomIntegrator
        A gradient descent minimization integrator.

    Notes
    -----
    An adaptive step size is used.

    Examples
    --------

    Create a gradient descent minimization integrator.

    >>> integrator = GradientDescentMinimizationIntegrator()

    """

    timestep = 1.0 * units.femtoseconds
    integrator = mm.CustomIntegrator(timestep)

    integrator.addGlobalVariable("step_size", initial_step_size/units.nanometers)
    integrator.addGlobalVariable("energy_old", 0)
    integrator.addGlobalVariable("energy_new", 0)
    integrator.addGlobalVariable("delta_energy", 0)
    integrator.addGlobalVariable("accept", 0)
    integrator.addGlobalVariable("fnorm2", 0)
    integrator.addPerDofVariable("x_old", 0)

    # Update context state.
    integrator.addUpdateContextState()

    # Constrain positions.
    integrator.addConstrainPositions()

    # Store old energy and positions.
    integrator.addComputeGlobal("energy_old", "energy")
    integrator.addComputePerDof("x_old", "x")

    # Compute sum of squared norm.
    integrator.addComputeSum("fnorm2", "f^2")

    # Take step.
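    # delta(fnorm2) is 1 only when fnorm2 is exactly zero, guarding the
    # normalization against division by zero when all forces vanish.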
    integrator.addComputePerDof("x", "x+step_size*f/sqrt(fnorm2 + delta(fnorm2))")
    integrator.addConstrainPositions()

    # Ensure we only keep steps that go downhill in energy.
    integrator.addComputeGlobal("energy_new", "energy")
    integrator.addComputeGlobal("delta_energy", "energy_new-energy_old")
    # Accept only if the energy went down; delta(energy - energy_new) also
    # rejects the step when the new energy is NaN, since delta(nan) is 0.
    integrator.addComputeGlobal("accept", "step(-delta_energy) * delta(energy - energy_new)")

    integrator.addComputePerDof("x", "accept*x + (1-accept)*x_old")

    # Update step size: double it after an accepted step, halve it after a rejected one.
    integrator.addComputeGlobal("step_size", "step_size * (2.0*accept + 0.5*(1-accept))")

    return integrator

This has been submitted to the simtk.org feature request tracker:
https://simtk.org/tracker/index.php?func=detail&aid=1987&group_id=161&atid=436
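
For completeness, here's roughly how you'd drive it (a minimal sketch, assuming you already have an OpenMM System and starting positions in hand; the 100-step count is arbitrary):

    import simtk.openmm as mm

    # system, positions: an existing simtk.openmm.System and the
    # coordinates to be minimized (both assumed, not built here).
    integrator = GradientDescentMinimizationIntegrator()
    context = mm.Context(system, integrator)
    context.setPositions(positions)

    # Each step attempts one downhill move and adapts the step size.
    integrator.step(100)

    state = context.getState(getEnergy=True, getPositions=True)
    print(state.getPotentialEnergy())
    minimized_positions = state.getPositions()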

@rmcgibbo (Contributor)

When you run the L-BFGS minimizer on a GPU platform, do you know the ratio of time spent computing forces on the GPU to the overhead of the data transfer and the CPU-side calculation? What is the maximum potential speedup that could come from moving the optimizer onto the GPU?
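
One crude way to measure that (a sketch, assuming `context` is an already-built Context on the CUDA or OpenCL platform and `positions` its starting coordinates; the energy-only call is just a rough proxy for the GPU-resident cost):

    import time

    def time_it(fn, n=100):
        t0 = time.time()
        for _ in range(n):
            fn()
        return (time.time() - t0) / n

    # Full inner-loop round trip, as in LocalEnergyMinimizer: push positions,
    # run the force kernels, pull every force back across the bus.
    def round_trip():
        context.setPositions(positions)
        context.getState(getForces=True, getEnergy=True).getForces()

    # Energy only: the same kernels run, but only a scalar comes back to the
    # host, which roughly approximates a GPU-resident force evaluation.
    full = time_it(round_trip)
    gpu_only = time_it(lambda: context.getState(getEnergy=True))

    print("approx. transfer/CPU overhead: %.3f ms per call" % (1000 * (full - gpu_only)))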

@jlmaccal (Contributor)

+1 for a faster minimizer

@jchodera (Member, Author)

Well, for a system with only 1552 atoms, this is a factor of ten:

LocalEnergyMinimizer       took    1.650 s for 100 steps of gradient descent.
CustomIntegrator minimizer took   11.116 s for 100 steps of gradient descent.

Though that should obviously be "L-BFGS" instead of "gradient descent" for the LocalEnergyMinimizer.

I haven't done any profiling, just raw timings.
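
The timings came from a quick script along these lines (a sketch, not the exact script; `system` and `positions` are whatever test system you load, and the tolerance is just a round number, nothing tuned):

    import time
    import simtk.openmm as mm
    import simtk.unit as units

    # L-BFGS path: positions and forces cross the host bus every iteration.
    context = mm.Context(system, mm.VerletIntegrator(1.0*units.femtoseconds))
    context.setPositions(positions)
    initial = time.time()
    mm.LocalEnergyMinimizer.minimize(context, 10.0, 100)  # tolerance, maxIterations
    print("LocalEnergyMinimizer took %8.3f s" % (time.time() - initial))

    # CustomIntegrator path: everything stays on the GPU.
    integrator = GradientDescentMinimizationIntegrator()
    context = mm.Context(system, integrator)
    context.setPositions(positions)
    initial = time.time()
    integrator.step(100)
    print("CustomIntegrator minimizer took %8.3f s" % (time.time() - initial))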

@jchodera (Member, Author)

I mean, the L-BFGS implementation is great---robust and parsimonious in its use of force calls, and quadratically convergent near the minimum.

But, you know, this is slow to have in an inner loop:

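    // Every L-BFGS iteration pays a host-to-device position upload plus a
    // full device-to-host force download: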
    context.setPositions(positions);
    context.computeVirtualSites();
    State state = context.getState(State::Forces | State::Energy);
    const vector<Vec3>& forces = state.getForces();

@rmcgibbo (Contributor)

L-BFGS was 10x faster than your CustomIntegrator to do the same number of steps? Did you have that backwards -- or, if not, what exactly is the use case for this CustomIntegrator?

@jchodera (Member, Author)

> L-BFGS was 10x faster than your CustomIntegrator to do the same number of steps? Did you have that backwards -- or, if not, what exactly is the use case for this CustomIntegrator?

Whoops, I think I had that backwards. Let me try that again.

@jchodera (Member, Author)

OK, I had completely messed that up somehow.

The timing for villin (1VII) in implicit solvent is actually much closer than I thought:

LocalEnergyMinimizer for 100 steps:    1.042 s
CustomIntegrator     for 100 steps:    0.837 s

@jchodera (Member, Author)

Also not that different for explicit solvent (8701 atoms):

LocalEnergyMinimizer for 100 steps:    2.426 s
CustomIntegrator     for 100 steps:    1.580 s

OK, maybe I'm totally wrong here.

@jchodera (Member, Author)

Also, I'll note that the first time you use the CustomIntegrator version, there is a large startup cost for kernel compilation:

LocalEnergyMinimizer for 100 steps:    2.914 s
CustomIntegrator     for 100 steps:   11.790 s
LocalEnergyMinimizer for 100 steps:    2.431 s
CustomIntegrator     for 100 steps:    1.617 s

@rmcgibbo (Contributor)

So the overhead of transferring the data back and forth between the CPU and GPU in LocalEnergyMinimizer is somewhere between 20% and 2x. Obviously this depends on system size and bus speeds, but it's not huge. IMO, this doesn't seem like a big win.
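
(For what it's worth, the warm-run timings above work out to 1.042/0.837 ≈ 1.25 for implicit solvent and 2.426/1.580 ≈ 1.54 for explicit, so roughly 25-50% in those particular runs.)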

@jchodera (Member, Author)

Agreed. This needs more research before it's clearly beneficial. Closing for now.
