# Using DFTK on GPUs

In this example we will look at how DFTK can be used on
Graphics Processing Units (GPUs).
In its current state, runs on Nvidia GPUs
using the [CUDA.jl](https://github.com/JuliaGPU/CUDA.jl) Julia
package are better supported than runs on AMD GPUs and have considerably fewer rough
edges.

> **GPU parallelism not supported everywhere**
>
> GPU support is still a relatively new feature in DFTK.
> While basic SCF computations and e.g. forces are supported,
> this is not yet the case for all parts of the code.
> In most cases there is no intrinsic limitation and it typically takes only
> minor code modifications to make a routine work on GPUs.
> If you require GPU support in a routine where it is not
> yet available, feel free to open an issue on GitHub or otherwise get in touch.

In [1]:
using AtomsBuilder
using DFTK
using PseudoPotentialData

**Model setup.** The first step is to set up a `Model` in DFTK.
This proceeds exactly as in the standard CPU case
(see also our Tutorial).

In [2]:
silicon = bulk(:Si)

model  = model_DFT(silicon;
                   functionals=PBE(),
                   pseudopotentials=PseudoFamily("dojo.nc.sr.pbe.v0_4_1.standard.upf"))
nothing  # hide

Next is the selection of the computational architecture.
This determines whether the computation will be run
on the CPU or on a GPU.

**Nvidia GPUs.**
Supported via [CUDA.jl](https://github.com/JuliaGPU/CUDA.jl).
If you install the CUDA package, all required Nvidia CUDA libraries
are downloaded automatically, so the only thing
you have to do is:

In [3]:
using CUDA
architecture = DFTK.GPU(CuArray)

DFTK.GPU{CUDA.CuArray}()

**AMD GPUs.** Supported via [AMDGPU.jl](https://github.com/JuliaGPU/AMDGPU.jl).
Here you need to [install ROCm](https://rocm.docs.amd.com/) manually.
With that in place you can then select:

In [4]:
using AMDGPU
architecture = DFTK.GPU(ROCArray)

DFTK.GPU{AMDGPU.ROCArray}()

**Portable architecture selection.**
To make sure this script runs on the GitHub CI (where we don't have GPUs
available), we check for the availability of GPUs before selecting an
architecture:

In [5]:
architecture = has_cuda() ? DFTK.GPU(CuArray) : DFTK.CPU()

DFTK.CPU()
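
The same idea can be extended to cover both GPU vendors. The following is a minimal sketch, not part of DFTK itself: the helper name `pick_architecture` is hypothetical, and the use of `CUDA.functional()` and `AMDGPU.functional()` as availability checks is this sketch's choice rather than something the text above prescribes. It assumes both the CUDA and AMDGPU packages are installed.

```julia
using DFTK
using CUDA
using AMDGPU

# Hypothetical helper (not a DFTK function): prefer an Nvidia GPU,
# then an AMD GPU, and fall back to the CPU if neither is usable.
function pick_architecture()
    if CUDA.functional()
        DFTK.GPU(CuArray)
    elseif AMDGPU.functional()
        DFTK.GPU(ROCArray)
    else
        DFTK.CPU()
    end
end

architecture = pick_architecture()
```

The resulting `architecture` can be passed to `PlaneWaveBasis` in the same way as the value selected above.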

**Basis and SCF.**
Based on the `architecture` we construct a `PlaneWaveBasis` object
as usual:

In [6]:
basis  = PlaneWaveBasis(model; Ecut=30, kgrid=(5, 5, 5), architecture)
nothing  # hide

... and run the SCF and some post-processing:

In [7]:
scfres = self_consistent_field(basis; tol=1e-6)
compute_forces(scfres)

n     Energy            log10(ΔE)   log10(Δρ)   Diag   Δtime
---   ---------------   ---------   ---------   ----   ------
  1   -8.457733577446                   -0.94    5.3    10.2s
  2   -8.459884888532       -2.67       -1.78    1.0    4.37s
  3   -8.460052313781       -3.78       -2.90    2.0    408ms
  4   -8.460064817780       -4.90       -3.35    3.2    513ms
  5   -8.460064906221       -7.05       -3.89    1.3    346ms
  6   -8.460064914616       -8.08       -5.02    1.3    336ms
  7   -8.460064915255       -9.19       -5.26    3.2    515ms
  8   -8.460064915266      -10.97       -6.34    1.0    315ms


2-element Vector{StaticArraysCore.SVector{3, Float64}}:
 [-1.317671654738741e-14, -1.7324545955180307e-14, -1.8237922430123883e-14]
 [1.6196773518316994e-14, 1.875733847196515e-14, 1.9610438540167647e-14]
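
To help read this output, the short sketch below recomputes two quantities from the numbers printed above, using only plain Julia. It assumes that the `log10(ΔE)` column is the base-10 logarithm of the absolute energy change between successive iterations, which is consistent with the printed values. It also checks that the forces are at the level of numerical noise, as expected for the ideal diamond structure of bulk silicon, where symmetry requires the forces to vanish. The variable names are illustrative.

```julia
using LinearAlgebra

# Total energies of the first five SCF iterations, copied from the table above.
energies = [-8.457733577446, -8.459884888532, -8.460052313781,
            -8.460064817780, -8.460064906221]

# Assumed definition of the log10(ΔE) column: log10 of the absolute
# energy change between one iteration and the previous one.
log10_dE = [round(log10(abs(energies[i] - energies[i - 1])); digits=2)
            for i in 2:length(energies)]
println("log10(ΔE) for iterations 2 to 5: ", log10_dE)

# Forces as printed above, one 3-vector per atom.
forces = [[-1.317671654738741e-14, -1.7324545955180307e-14, -1.8237922430123883e-14],
          [ 1.6196773518316994e-14,  1.875733847196515e-14,   1.9610438540167647e-14]]

# Largest force norm over all atoms; it should be tiny for this structure.
max_force = maximum(norm, forces)
println("largest force norm: ", max_force)
```

Only the first five iterations are used because the later energy differences approach the number of digits printed in the table.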

> **GPU performance**
>
> Our current (May 2025) benchmarks show DFTK to have reasonable performance
> on Nvidia / CUDA GPUs with up to a 100-fold speed-up over single-threaded
> CPU execution. However, AMD GPUs have been benchmarked less thoroughly and
> there are likely rough edges. Since GPU support in DFTK is relatively new,
> we appreciate any experience reports or bug reports.
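
To get a rough impression of the speed-up on your own hardware, a timing comparison along the following lines may be useful. This is only a sketch: the helper `time_scf` is hypothetical, the first SCF run is discarded because it includes compilation time, and the small silicon cell used here will likely not show a speed-up comparable to the one quoted above. The CPU timing also depends on the number of threads Julia and the linear algebra libraries are using.

```julia
using AtomsBuilder
using DFTK
using PseudoPotentialData
using CUDA

model = model_DFT(bulk(:Si);
                  functionals=PBE(),
                  pseudopotentials=PseudoFamily("dojo.nc.sr.pbe.v0_4_1.standard.upf"))

# Hypothetical helper: time one SCF run on the given architecture.
function time_scf(model, architecture)
    basis = PlaneWaveBasis(model; Ecut=30, kgrid=(5, 5, 5), architecture)
    self_consistent_field(basis; tol=1e-6)            # warm-up run, includes compilation
    @elapsed self_consistent_field(basis; tol=1e-6)   # timed run, in seconds
end

t_cpu = time_scf(model, DFTK.CPU())
println("CPU: ", round(t_cpu; digits=2), " s")

# Only attempt the GPU run if a working CUDA setup is available.
if CUDA.functional()
    t_gpu = time_scf(model, DFTK.GPU(CuArray))
    println("GPU: ", round(t_gpu; digits=2), " s")
    println("speed-up: ", round(t_cpu / t_gpu; digits=1), "x")
end
```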