# Using DFTK on GPUs

In this example we will look at how DFTK can be used on
Graphics Processing Units (GPUs).
In its current state, runs on Nvidia GPUs
using the [CUDA.jl](https://github.com/JuliaGPU/CUDA.jl) Julia
package are better supported than runs on AMD GPUs and have considerably fewer rough
edges.

> **GPU parallelism not supported everywhere**
>
> GPU support is still a relatively new feature in DFTK.
> While basic SCF computations and e.g. forces are supported,
> this is not yet the case for all parts of the code.
> In most cases there is no intrinsic limitation, and typically it only takes
> minor code modifications to make it work on GPUs.
> If you require GPU support in one of our routines where this is not
> yet supported, feel free to open an issue on GitHub or otherwise get in touch.

In [1]:
using AtomsBuilder
using DFTK
using PseudoPotentialData

**Model setup.** The first step is to set up a `Model` in DFTK.
This proceeds exactly as in the standard CPU case
(see also our Tutorial).

In [2]:
silicon = bulk(:Si)

model  = model_DFT(silicon;
                   functionals=PBE(),
                   pseudopotentials=PseudoFamily("dojo.nc.sr.pbe.v0_4_1.standard.upf"))
nothing  # hide

Next comes the selection of the computational architecture.
This determines whether the computation will run
on the CPU or on a GPU.

**Nvidia GPUs.**
Supported via [CUDA.jl](https://github.com/JuliaGPU/CUDA.jl).
If you install the CUDA package, all required Nvidia CUDA libraries
will be downloaded automatically, so the only thing
you have to do is:

In [3]:
using CUDA
architecture = DFTK.GPU(CuArray)

DFTK.GPU{CUDA.CuArray}()

**AMD GPUs.** Supported via [AMDGPU.jl](https://github.com/JuliaGPU/AMDGPU.jl).
Here you need to [install ROCm](https://rocm.docs.amd.com/) manually.
With that in place you can then select:

In [4]:
using AMDGPU
architecture = DFTK.GPU(ROCArray)

DFTK.GPU{AMDGPU.ROCArray}()

**Portable architecture selection.**
To make sure this script runs on the GitHub CI (where we don't have GPUs
available), we check for the availability of GPUs before selecting an
architecture:

In [5]:
architecture = has_cuda() ? DFTK.GPU(CuArray) : DFTK.CPU()

DFTK.CPU()
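
The check above only distinguishes between CUDA and the CPU. A minimal sketch of how the same idea might be extended to AMD GPUs is given below. The helper `pick_backend` is hypothetical (it is not part of DFTK), and the commented usage assumes that `CUDA.functional()` and `AMDGPU.functional()` are the appropriate availability checks for your installed package versions.

```julia
# Hypothetical helper, not part of DFTK: choose a backend label from
# availability flags, so the decision logic can be tried without a GPU.
function pick_backend(; cuda_ok::Bool, amdgpu_ok::Bool)
    cuda_ok   && return :cuda     # prefer Nvidia, the better-supported path
    amdgpu_ok && return :amdgpu
    return :cpu                   # fallback, e.g. on CI machines without GPUs
end

# Intended use with the packages loaded earlier (left as comments because
# it needs CUDA.jl and AMDGPU.jl to be installed and loadable):
#   backend = pick_backend(; cuda_ok=CUDA.functional(),
#                            amdgpu_ok=AMDGPU.functional())
#   architecture = backend === :cuda   ? DFTK.GPU(CuArray)  :
#                  backend === :amdgpu ? DFTK.GPU(ROCArray) : DFTK.CPU()
```

Keeping the decision in a small pure function makes the fallback order explicit and easy to change, for example if AMD GPUs should be preferred on a particular cluster.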

**Basis and SCF.**
Based on the `architecture` we construct a `PlaneWaveBasis` object
as usual:

In [6]:
basis  = PlaneWaveBasis(model; Ecut=30, kgrid=(5, 5, 5), architecture)
nothing  # hide

... and run the SCF and some post-processing:

In [7]:
scfres = self_consistent_field(basis; tol=1e-6)
compute_forces(scfres)

n     Energy            log10(ΔE)   log10(Δρ)   Diag   Δtime
---   ---------------   ---------   ---------   ----   ------
  1   -8.457742907378                   -0.94    5.4    2.30s
  2   -8.459895326468       -2.67       -1.78    1.0    1.32s
  3   -8.460053507663       -3.80       -2.91    2.2    386ms
  4   -8.460064825356       -4.95       -3.36    3.1    1.13s
  5   -8.460064906652       -7.09       -3.89    1.4    386ms
  6   -8.460064914748       -8.09       -4.98    1.6    346ms
  7   -8.460064915251       -9.30       -5.22    3.1    462ms
  8   -8.460064915265      -10.85       -6.59    1.0    291ms


2-element Vector{StaticArraysCore.SVector{3, Float64}}:
 [-1.5206091441501848e-14, -1.3099935824332006e-14, -1.8761543096106978e-14]
 [1.618725078393129e-14, 1.3620191176754727e-14, 1.913060458590724e-14]
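
As a rough guide to reading this output, the sketch below recomputes two entries of the `log10(ΔE)` column and checks the size of the forces. It assumes that `log10(ΔE)` is the base-10 logarithm of the absolute change in total energy between consecutive iterations. The numbers are copied from the output above, and the threshold `1e-10` is an arbitrary illustrative choice.

```julia
# Total energies copied from the first three rows of the SCF table above.
energies = [-8.457742907378, -8.459895326468, -8.460053507663]

# Assumption: log10(ΔE) is log10 of the absolute energy change between
# consecutive SCF iterations.
log_dE = [log10(abs(energies[i] - energies[i - 1])) for i in 2:length(energies)]
rounded_log_dE = round.(log_dE; digits=2)  # compare with the -2.67 and -3.80 entries

# Forces copied from the output above; in a live session one would use
#   forces = compute_forces(scfres)
forces = [[-1.5206091441501848e-14, -1.3099935824332006e-14, -1.8761543096106978e-14],
          [1.618725078393129e-14, 1.3620191176754727e-14, 1.913060458590724e-14]]

# Largest force component in absolute value.
max_force = maximum(f -> maximum(abs, f), forces)

# Illustrative threshold for "numerically zero" (an arbitrary choice).
forces_negligible = max_force < 1e-10
```

In the ideal diamond structure of silicon the forces vanish by symmetry, so components of order `1e-14` are numerical noise rather than a physical result.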

> **GPU performance**
>
> Our current (May 2025) benchmarks show DFTK to have reasonable performance
> on Nvidia / CUDA GPUs, with up to a 100-fold speed-up over single-threaded
> CPU execution. Support on AMD GPUs, however, has been benchmarked less and
> there are likely rough edges. Since GPU support in DFTK is relatively new,
> we appreciate any experience reports or bug reports.