# GPU and Multi-GPU Support

> **Warning** 
> GPU and multi-GPU support is under development and hasn't been thoroughly tested yet. Proceed with caution!

With PyTorch as its backend, QMCTorch can leverage the GPUs available on your hardware.
You must have the CUDA version of PyTorch installed (https://pytorch.org/).
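
A quick way to check whether the installed PyTorch build can see a GPU is sketched below. It uses only standard PyTorch calls and is independent of QMCTorch; the variable names are illustrative.

```python
import torch

# True only if this PyTorch build has CUDA support and a GPU is visible.
cuda_ok = torch.cuda.is_available()

# Number of GPUs PyTorch can see (0 when CUDA is unavailable).
n_gpu = torch.cuda.device_count()

# torch.version.cuda is the CUDA version PyTorch was built against,
# or None for builds without CUDA.
print(f'CUDA available: {cuda_ok}, GPUs: {n_gpu}, built with CUDA: {torch.version.cuda}')
```

If the first value is `False`, the GPU cells in the rest of this page only print a message and do not run.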


Let's first import everything and create a molecule:


In [13]:
import torch
from torch import optim
from qmctorch.scf import Molecule
from qmctorch.wavefunction import SlaterJastrow
from qmctorch.sampler import Metropolis
from qmctorch.utils import (plot_energy, plot_data)
mol = Molecule(atom='H 0. 0. 0; H 0. 0. 1.', unit='bohr', redo_scf=True)

INFO:QMCTorch|
INFO:QMCTorch| SCF Calculation
INFO:QMCTorch|  Removing H2_pyscf_sto-3g.hdf5 and redo SCF calculations
INFO:QMCTorch|  Running scf  calculation
converged SCF energy = -1.06599946214331
INFO:QMCTorch|  Molecule name       : H2
INFO:QMCTorch|  Number of electrons : 2
INFO:QMCTorch|  SCF calculator      : pyscf
INFO:QMCTorch|  Basis set           : sto-3g
INFO:QMCTorch|  SCF                 : HF
INFO:QMCTorch|  Number of AOs       : 2
INFO:QMCTorch|  Number of MOs       : 2
INFO:QMCTorch|  SCF Energy          : -1.066 Hartree


## Running on a single GPU

GPU acceleration has been streamlined in QMCTorch: the only modification your code needs is to specify `cuda=True`
when declaring the wave function and the sampler. This automatically ports all the necessary tensors to the GPU and offloads the corresponding operations there.

In [9]:
if torch.cuda.is_available():
    wf = SlaterJastrow(mol, cuda=True)
    sampler = Metropolis(nwalkers=100, nstep=500, step_size=0.25,
                     nelec=wf.nelec, ndim=wf.ndim,
                     init=mol.domain('atomic'),
                     move={'type': 'all-elec', 'proba': 'normal'},
                     cuda=True)
else:
    print('CUDA not available, install torch with cuda support to proceed')

CUDA not available, install torch with cuda support to proceed
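
For readers unfamiliar with what "porting tensors to the GPU" means, the sketch below shows the generic PyTorch pattern. It is not QMCTorch's internal code: the `pos` tensor is a hypothetical stand-in for walker positions, and the snippet falls back to the CPU when no GPU is visible so that it runs anywhere.

```python
import torch

# Pick the GPU when one is visible to PyTorch, otherwise fall back to the CPU.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Hypothetical stand-in for walker positions: 100 walkers, 2 electrons x 3 dims.
pos = torch.rand(100, 6)
pos = pos.to(device)

# Operations on a tensor run on the device where the tensor lives.
r2 = (pos ** 2).sum(dim=1)
print(r2.device.type, tuple(r2.shape))
```

With `cuda=True`, you do not have to write such transfers yourself.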


In [None]:
lr_dict = [{'params': wf.jastrow.parameters(), 'lr': 3E-3},
           {'params': wf.ao.parameters(), 'lr': 1E-6},
           {'params': wf.mo.parameters(), 'lr': 1E-3},
           {'params': wf.fc.parameters(), 'lr': 2E-3}]
opt = optim.Adam(lr_dict, lr=1E-3)

## Multi-GPU Support

Multiple GPUs are supported through the `Horovod` library (https://github.com/horovod/horovod).
A dedicated QMCTorch solver has been developed to handle multiple GPUs. Import it and use it like the
normal solver; only a few modifications are required to use Horovod:

In [12]:
import horovod.torch as hvd
from qmctorch.solver import SolverSlaterJastrowHorovod

hvd.init()
if torch.cuda.is_available():
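    # Pin each process to one GPU on its node. Horovod's own examples use
    # hvd.local_rank() here, since hvd.rank() can exceed the per-node GPU count.
    torch.cuda.set_device(hvd.local_rank())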
    solver = SolverSlaterJastrowHorovod(wf=wf, sampler=sampler,
                                        optimizer=opt,
                                        rank=hvd.rank())
    
else:
    print('CUDA not available, install torch with cuda support to proceed')

CUDA not available, install torch with cuda support to proceed


In [None]:
# configure the solver
if torch.cuda.is_available():
    solver.configure(track=['local_energy'], freeze=['ao', 'mo'],
                    loss='energy', grad='auto',
                    ortho_mo=False, clip_loss=False,
                    resampling={'mode': 'update',
                                'resample_every': 1,
                                'nstep_update': 50})

    # optimize the wave function
    obs = solver.run(250)

    if hvd.rank() == 0:
        plot_energy(obs.local_energy, e0=-1.1645, show_variance=True)
        plot_data(solver.observable, obsname='jastrow.weight')
else:
    print('CUDA not available, install torch with cuda support to proceed')

As you can see, some classes need the rank of the process when they are defined. This simply
ensures that only the master process generates the HDF5 files containing the information about the calculation.
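
The idea of rank-gated output can be sketched in plain Python as follows. This is only an illustration: `save_results` is a hypothetical helper that writes JSON, the energy is a placeholder value, and the four "processes" are simulated by a loop. QMCTorch writes its HDF5 files through its own machinery.

```python
import json
import os
import tempfile


def save_results(rank, results, path):
    """Write results to disk only on the master process (rank 0).

    Hypothetical helper for illustration, not part of QMCTorch.
    """
    if rank != 0:
        return False  # worker processes skip the write
    with open(path, 'w') as f:
        json.dump(results, f)
    return True


# Simulate 4 processes running the same code with different ranks.
path = os.path.join(tempfile.mkdtemp(), 'results.json')
written = [save_results(rank, {'energy': -1.0}, path) for rank in range(4)]
print(written)  # [True, False, False, False]
```

Every process runs the same script, so without such a guard all of them would try to write the same file.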

It is currently difficult to use Horovod on multiple nodes from a Jupyter notebook. Instead, put all the code in a Python file and run it with the following command:



```
horovodrun -np 2 python <example>.py
```

See the Horovod documentation for more details: https://github.com/horovod/horovod


This solver distributes the `Nw` walkers over the `Np` processes. For example, specifying 2000 walkers
and using 4 processes leads to each process using only 500 walkers. During the optimization of the wave function,
each process computes the gradients of the variational parameters using its local 500 walkers.
The gradients are then averaged over all the processes before the optimization step takes place. This data-parallel
model has been very successful in machine learning applications (http://jmlr.org/papers/volume20/18-789/18-789.pdf).
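
The arithmetic of this scheme can be illustrated with a small pure-Python sketch. It is not the solver's implementation: the processes are simulated in a single loop, `split_walkers` and `mean` are hypothetical helpers, and the gradient is treated as a plain average of random per-walker contributions, which simplifies the actual variational Monte Carlo estimator.

```python
import random


def split_walkers(nwalkers, nproc):
    """Walkers per process for an even split (this sketch assumes divisibility)."""
    if nwalkers % nproc != 0:
        raise ValueError('nwalkers must be divisible by nproc in this sketch')
    return nwalkers // nproc


def mean(values):
    return sum(values) / len(values)


random.seed(0)
nwalkers, nproc = 2000, 4
per_proc = split_walkers(nwalkers, nproc)

# Hypothetical per-walker gradient contributions for a single parameter.
grads = [random.gauss(0.0, 1.0) for _ in range(nwalkers)]

# Each process averages over its own shard of walkers ...
local = [mean(grads[p * per_proc:(p + 1) * per_proc]) for p in range(nproc)]
# ... and the local averages are then averaged across processes.
averaged = mean(local)

# With equal-sized shards this matches the average over all walkers.
assert abs(averaged - mean(grads)) < 1e-12
print(per_proc)  # 500
```

Because every shard has the same size, averaging the local averages gives the same result as averaging over all 2000 walkers at once, up to floating-point rounding.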