
Feature request: Faster minimizer #334

Closed · jchodera opened this issue Feb 15, 2014 · 11 comments

@jchodera (Member)

The existing L-BFGS minimizer works very well, but is quite slow on the CUDA and OpenCL platforms due to the need to push positions and pull forces across the host bus every step. It would be great to have a minimizer that ran exclusively on the GPU and was perhaps "good enough" for many tasks, but faster.

I've implemented a very rudimentary CustomIntegrator-based version as an example, but more sophisticated possibilities exist, and a custom-kernel-based approach could be superior.

import simtk.openmm as mm
import simtk.unit as units

def GradientDescentMinimizationIntegrator(initial_step_size=0.01*units.angstroms):
    """
    Construct a simple gradient descent minimization integrator.

    Parameters
    ----------
    initial_step_size : simtk.unit.Quantity compatible with nanometers, default: 0.01*simtk.unit.angstroms
        The norm of the initial step size guess.

    Returns
    -------
    integrator : simtk.openmm.CustomIntegrator
        A gradient descent minimization integrator.

    Notes
    -----
    An adaptive step size is used.

    Examples
    --------

    Create a gradient descent minimization integrator.

    >>> integrator = GradientDescentMinimizationIntegrator()

    """

    timestep = 1.0 * units.femtoseconds
    integrator = mm.CustomIntegrator(timestep)

    integrator.addGlobalVariable("step_size", initial_step_size/units.nanometers)
    integrator.addGlobalVariable("energy_old", 0)
    integrator.addGlobalVariable("energy_new", 0)
    integrator.addGlobalVariable("delta_energy", 0)
    integrator.addGlobalVariable("accept", 0)
    integrator.addGlobalVariable("fnorm2", 0)
    integrator.addPerDofVariable("x_old", 0)

    # Update context state.
    integrator.addUpdateContextState()

    # Constrain positions.
    integrator.addConstrainPositions()

    # Store old energy and positions.
    integrator.addComputeGlobal("energy_old", "energy")
    integrator.addComputePerDof("x_old", "x")

    # Compute sum of squared norm.
    integrator.addComputeSum("fnorm2", "f^2")

    # Take step.
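    # delta(fnorm2) is 1 only when fnorm2 is exactly zero, guarding the
    # normalization against division by zero when all forces vanish.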
    integrator.addComputePerDof("x", "x+step_size*f/sqrt(fnorm2 + delta(fnorm2))")
    integrator.addConstrainPositions()

    # Ensure we only keep steps that go downhill in energy.
    integrator.addComputeGlobal("energy_new", "energy")
    integrator.addComputeGlobal("delta_energy", "energy_new-energy_old")
    # Accept only if the energy went down; delta(energy - energy_new) also
    # rejects the step when the new energy is NaN, since delta(nan) is 0.
    integrator.addComputeGlobal("accept", "step(-delta_energy) * delta(energy - energy_new)")

    integrator.addComputePerDof("x", "accept*x + (1-accept)*x_old")

    # Update step size: double it after an accepted step, halve it after a rejected one.
    integrator.addComputeGlobal("step_size", "step_size * (2.0*accept + 0.5*(1-accept))")

    return integrator

This has been submitted to the simtk.org feature request tracker:
https://simtk.org/tracker/index.php?func=detail&aid=1987&group_id=161&atid=436
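
For completeness, here's roughly how you'd drive it (a minimal sketch, assuming you already have an OpenMM System and starting positions in hand; the 100-step count is arbitrary):

    import simtk.openmm as mm

    # system, positions: an existing simtk.openmm.System and the
    # coordinates to be minimized (both assumed, not built here).
    integrator = GradientDescentMinimizationIntegrator()
    context = mm.Context(system, integrator)
    context.setPositions(positions)

    # Each step attempts one downhill move and adapts the step size.
    integrator.step(100)

    state = context.getState(getEnergy=True, getPositions=True)
    print(state.getPotentialEnergy())
    minimized_positions = state.getPositions()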

@rmcgibbo (Contributor)

When you run the L-BFGS minimizer on a GPU platform, do you know the ratio of time spent computing forces on the GPU to the overhead of the data transfer and the CPU-side calculation? What is the maximum potential speedup that could come from moving the optimizer onto the GPU?
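
One crude way to measure that (a sketch, assuming `context` is an already-built Context on the CUDA or OpenCL platform and `positions` its starting coordinates; the energy-only call is just a rough proxy for the GPU-resident cost):

    import time

    def time_it(fn, n=100):
        t0 = time.time()
        for _ in range(n):
            fn()
        return (time.time() - t0) / n

    # Full inner-loop round trip, as in LocalEnergyMinimizer: push positions,
    # run the force kernels, pull every force back across the bus.
    def round_trip():
        context.setPositions(positions)
        context.getState(getForces=True, getEnergy=True).getForces()

    # Energy only: the same kernels run, but only a scalar comes back to the
    # host, which roughly approximates a GPU-resident force evaluation.
    full = time_it(round_trip)
    gpu_only = time_it(lambda: context.getState(getEnergy=True))

    print("approx. transfer/CPU overhead: %.3f ms per call" % (1000 * (full - gpu_only)))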

@jlmaccal (Contributor)

+1 for a faster minimizer

@jchodera (Member, Author)

Well, for a system with only 1552 atoms, this is a factor of ten:

LocalEnergyMinimizer       took    1.650 s for 100 steps of gradient descent.
CustomIntegrator minimizer took   11.116 s for 100 steps of gradient descent.

Though that should obviously be "L-BFGS" instead of "gradient descent" for the LocalEnergyMinimizer.

I haven't done any profiling, just raw timings.
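
The timings came from a quick script along these lines (a sketch, not the exact script; `system` and `positions` are whatever test system you load, and the tolerance is just a round number, nothing tuned):

    import time
    import simtk.openmm as mm
    import simtk.unit as units

    # L-BFGS path: positions and forces cross the host bus every iteration.
    context = mm.Context(system, mm.VerletIntegrator(1.0*units.femtoseconds))
    context.setPositions(positions)
    initial = time.time()
    mm.LocalEnergyMinimizer.minimize(context, 10.0, 100)  # tolerance, maxIterations
    print("LocalEnergyMinimizer took %8.3f s" % (time.time() - initial))

    # CustomIntegrator path: everything stays on the GPU.
    integrator = GradientDescentMinimizationIntegrator()
    context = mm.Context(system, integrator)
    context.setPositions(positions)
    initial = time.time()
    integrator.step(100)
    print("CustomIntegrator minimizer took %8.3f s" % (time.time() - initial))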

@jchodera (Member, Author)

I mean, the L-BFGS implementation is great---robust and parsimonious in its use of force calls, and quadratically convergent near the minimum.

But, you know, this is slow to have in an inner loop:

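    // Every L-BFGS iteration pays a host-to-device position upload plus a
    // full device-to-host force download: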
    context.setPositions(positions);
    context.computeVirtualSites();
    State state = context.getState(State::Forces | State::Energy);
    const vector<Vec3>& forces = state.getForces();

@rmcgibbo (Contributor)

L-BFGS was 10x faster than your CustomIntegrator to do the same number of steps? Did you have that backwards -- or, if not, what exactly is the use case for this CustomIntegrator?

@jchodera (Member, Author)

> L-BFGS was 10x faster than your CustomIntegrator to do the same number of steps? Did you have that backwards -- or, if not, what exactly is the use case for this CustomIntegrator?

Whoops, I think I had that backwards. Let me try that again.

@jchodera (Member, Author)

OK, I had completely messed that up somehow.

The timing for villin (1VII) in implicit solvent is actually much closer than I thought:

LocalEnergyMinimizer for 100 steps:    1.042 s
CustomIntegrator     for 100 steps:    0.837 s

@jchodera (Member, Author)

Also not that different for explicit solvent (8701 atoms):

LocalEnergyMinimizer for 100 steps:    2.426 s
CustomIntegrator     for 100 steps:    1.580 s

OK, maybe I'm totally wrong here.

@jchodera (Member, Author)

Also, I'll note that the first time you use the CustomIntegrator version, there is a large startup cost for kernel compilation:

LocalEnergyMinimizer for 100 steps:    2.914 s
CustomIntegrator     for 100 steps:   11.790 s
LocalEnergyMinimizer for 100 steps:    2.431 s
CustomIntegrator     for 100 steps:    1.617 s

@rmcgibbo (Contributor)

So the overhead of transferring the data back and forth between the CPU and GPU in LocalEnergyMinimizer is somewhere between 20% and 2x. Obviously this depends on system size and bus speeds, but it's not huge. IMO, this doesn't seem like a big win.
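
(For what it's worth, the warm-run timings above work out to 1.042/0.837 ≈ 1.25 for implicit solvent and 2.426/1.580 ≈ 1.54 for explicit, so roughly 25-50% in those particular runs.)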

@jchodera (Member, Author)

Agreed. This needs more research before it's clearly beneficial. Closing for now.
