# Using DFTK on GPUs

In this example we will look at how DFTK can be used on
Graphics Processing Units.
In its current state, runs on Nvidia GPUs
using the [CUDA.jl](https://github.com/JuliaGPU/CUDA.jl) Julia
package are better supported and have considerably fewer rough
edges than runs on AMD GPUs.

> **GPU parallelism not supported everywhere**
>
> GPU support is still a relatively new feature in DFTK.
> While basic SCF computations and e.g. forces are supported,
> this is not yet the case for all parts of the code.
> In most cases there is no intrinsic limitation and typically it only takes
> minor code modifications to make a routine work on GPUs.
> If you require GPU support in one of our routines where this is not
> yet available, feel free to open an issue on GitHub or otherwise get in touch.

In [1]:
using AtomsBuilder
using DFTK
using PseudoPotentialData

**Model setup.** The first step is to set up a `Model` in DFTK.
This proceeds exactly as in the standard CPU case
(see also our Tutorial).

In [2]:
silicon = bulk(:Si)

model  = model_DFT(silicon;
                   functionals=PBE(),
                   pseudopotentials=PseudoFamily("dojo.nc.sr.pbe.v0_4_1.standard.upf"))
nothing  # hide

Next is the selection of the computational architecture.
This determines whether the computation will run
on the CPU or on a GPU.

**Nvidia GPUs.**
Supported via [CUDA.jl](https://github.com/JuliaGPU/CUDA.jl).
Right now `Libxc` only supports CUDA 11,
so we need to explicitly request the 11.8 CUDA runtime:

In [3]:
using CUDA
CUDA.set_runtime_version!(v"11.8")  # Note: This requires a restart of Julia
architecture = DFTK.GPU(CuArray)

[ Info: Configure the active project to use CUDA 11.8; please re-start Julia for this to take effect.


DFTK.GPU{CUDA.CuArray}()

**AMD GPUs.** Supported via [AMDGPU.jl](https://github.com/JuliaGPU/AMDGPU.jl).
Here you need to [install ROCm](https://rocm.docs.amd.com/) manually.
With that in place you can then select:

In [4]:
using AMDGPU
architecture = DFTK.GPU(ROCArray)

DFTK.GPU{AMDGPU.ROCArray}()

**Portable architecture selection.**
To make sure this script also runs on the GitHub CI (where we don't have GPUs
available), we check for the availability of GPUs before selecting an
architecture:

In [5]:
architecture = has_cuda() ? DFTK.GPU(CuArray) : DFTK.CPU()

DFTK.CPU()
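
If a script should also pick up AMD GPUs, the same check can be extended to a priority-ordered fallback. The following is a minimal sketch and not part of DFTK: `pick_backend` is a hypothetical helper that takes the availability flags as plain booleans, so the selection logic runs without any GPU package loaded. It assumes the flags would be obtained from `CUDA.functional()` and `AMDGPU.functional()` in a real script.

```julia
# Hypothetical helper (not part of DFTK): choose a backend label from
# availability flags, preferring CUDA, then AMD, then falling back to the CPU.
function pick_backend(cuda_ok::Bool, amd_ok::Bool)
    cuda_ok && return :cuda
    amd_ok  && return :amdgpu
    return :cpu
end

# Without any GPU available the helper falls back to the CPU.
backend = pick_backend(false, false)
```

With both GPU packages loaded, the label could then be mapped onto the constructors shown above: `DFTK.GPU(CuArray)` for `:cuda`, `DFTK.GPU(ROCArray)` for `:amdgpu` and `DFTK.CPU()` otherwise.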

**Basis and SCF.**
Based on the `architecture` we construct a `PlaneWaveBasis` object
as usual:

In [6]:
basis  = PlaneWaveBasis(model; Ecut=30, kgrid=(5, 5, 5), architecture)
nothing  # hide

... and run the SCF and some post-processing:

In [7]:
scfres = self_consistent_field(basis; tol=1e-6)
compute_forces(scfres)

n     Energy            log10(ΔE)   log10(Δρ)   Diag   Δtime
---   ---------------   ---------   ---------   ----   ------
  1   -8.457794835537                   -0.94    5.4    4.51s
  2   -8.459902661378       -2.68       -1.79    1.0    2.22s
  3   -8.460055001716       -3.82       -2.91    2.2    425ms
  4   -8.460064827817       -5.01       -3.41    2.9    507ms
  5   -8.460064912734       -7.07       -3.99    1.6    357ms
  6   -8.460064920264       -8.12       -5.22    1.8    384ms
  7   -8.460064920676       -9.39       -5.52    3.4    556ms
  8   -8.460064920680      -11.46       -6.42    1.0    313ms


2-element Vector{StaticArraysCore.SVector{3, Float64}}:
 [-1.3436212867122692e-14, -1.3735065952625096e-14, -1.843419811527009e-14]
 [1.520098257072764e-14, 1.6187347619217484e-14, 1.81556731815166e-14]
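
The forces printed above are of order `1e-14`, which is zero to numerical precision, as expected for the ideal diamond structure of silicon, where the forces vanish by symmetry. A small sanity check along these lines could look as follows. This is a sketch that uses the printed values copied in by hand; the tolerance `1e-10` and the names `max_force` and `forces_vanish` are assumptions chosen for illustration. In a live session one would use the result of `compute_forces(scfres)` instead.

```julia
using LinearAlgebra

# Forces copied from the output above (one 3-vector per atom).
forces = [
    [-1.3436212867122692e-14, -1.3735065952625096e-14, -1.843419811527009e-14],
    [ 1.520098257072764e-14,   1.6187347619217484e-14,  1.81556731815166e-14],
]

# Largest force magnitude over all atoms.
max_force = maximum(norm, forces)

# Illustrative tolerance (an assumption, not a DFTK default).
forces_vanish = max_force < 1e-10
```

Running the same check on the CPU and on the GPU is a cheap way to confirm that both architectures agree on this symmetric test system.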

> **GPU performance**
>
> Our current (February 2025) benchmarks show DFTK to have reasonable performance
> on Nvidia / CUDA GPUs with a 50-fold to 100-fold speed-up over single-threaded
> CPU execution. However, support on AMD GPUs has been benchmarked less and
> there are likely rough edges. Overall this feature is relatively new
> and we appreciate any experience reports or bug reports.