# CUDA tensor network contraction demo

## Requirements
- A CUDA-capable GPU must be available on the system.
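
Before running the cells below, it can help to confirm that CUDA.jl actually sees a usable GPU. The following is a minimal sketch that is not part of the original notebook. It assumes only that CUDA.jl is installed, and it relies on `CUDA.functional()`, `CUDA.device()` and `CUDA.name`; the variable name `gpu_ok` is illustrative.

```julia
using CUDA

# Report whether CUDA.jl found a working driver, toolkit and device.
gpu_ok = CUDA.functional()

if gpu_ok
    # Print the name of the currently active device
    println("CUDA is functional on: ", CUDA.name(CUDA.device()))
else
    @warn "No functional CUDA setup detected; the GPU cells in this demo will fail."
end
```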

In [1]:
using Tenet
using EinExprs
using Adapt
using CUDA
using BenchmarkTools


Create a random tensor network and find its contraction path:

In [2]:
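# Initialize a random tensor network
regularity = 6   # average number of indices per tensor
ntensors = 10    # number of tensors in the network
tn = rand(TensorNetwork, ntensors, regularity)
# Search for a contraction path with the exhaustive optimizer from EinExprs
path = einexpr(tn; optimizer=Exhaustive())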

SizedEinExpr{Symbol}(EinExpr{Symbol}(Symbol[], EinExpr{Symbol}[EinExpr{Symbol}([:C, :a, :N, :I], EinExpr{Symbol}[]), EinExpr{Symbol}([:C, :a, :N, :I], EinExpr{Symbol}[EinExpr{Symbol}([:C, :N, :G, :Z, :U, :R, :E, :X, :S], EinExpr{Symbol}[]), EinExpr{Symbol}([:a, :I, :G, :Z, :U, :R, :E, :X, :S], EinExpr{Symbol}[EinExpr{Symbol}([:B, :D, :G, :Z], EinExpr{Symbol}[]), EinExpr{Symbol}([:B, :a, :I, :D, :U, :R, :E, :X, :S], EinExpr{Symbol}[EinExpr{Symbol}([:I, :D, :E, :O, :M], EinExpr{Symbol}[]), EinExpr{Symbol}([:B, :a, :U, :R, :X, :S, :O, :M], EinExpr{Symbol}[EinExpr{Symbol}([:B, :V, :P, :a, :W, :A], EinExpr{Symbol}[EinExpr{Symbol}([:B, :V, :J, :b, :c, :P], EinExpr{Symbol}[]), EinExpr{Symbol}([:J, :b, :c, :a, :W, :A], EinExpr{Symbol}[])]), EinExpr{Symbol}([:V, :P, :U, :R, :X, :S, :W, :A, :O, :M], EinExpr{Symbol}[EinExpr{Symbol}([:P, :X, :T, :L, :H], EinExpr{Symbol}[]), EinExpr{Symbol}([:V, :U, :R, :S, :W, :A, :O, :M, :T, :L, :H], EinExpr{Symbol}[EinExpr{Symbol}([:R, :S, :A, :K, :M, :d, :Y, :T

Move the tensor data to the GPU by converting the underlying arrays to `CuArray`s:

In [3]:
cudatn = adapt(CuArray, tn)

TensorNetwork (#tensors=10, #inds=30)
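
To double-check that the conversion really moved the data, one can inspect the array type behind each tensor. This is a sketch under assumptions rather than part of the original notebook: it presumes that `tensors(tn)` returns the tensors of the network and that `parent(tensor)` returns the array a tensor wraps. It rebuilds a network with the same parameters so that the snippet runs on its own, which requires a working GPU; the names `on_gpu` and `on_cpu` are illustrative.

```julia
using Tenet
using Adapt
using CUDA

# Rebuild a random network with the same parameters as above
tn = rand(TensorNetwork, 10, 6)
cudatn = adapt(CuArray, tn)

# Assumption: `parent` exposes the array wrapped by each tensor
on_gpu = all(t -> parent(t) isa CuArray, tensors(cudatn))
on_cpu = all(t -> parent(t) isa Array, tensors(tn))

println("GPU copy backed by CuArray: ", on_gpu)
println("Original backed by Array:   ", on_cpu)
```

If both checks print `true`, `adapt` has produced a GPU copy and left the original network untouched.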

Benchmark the contraction on the GPU. `CUDA.@sync` blocks until all GPU work has finished, so the measurement covers the whole contraction:

In [4]:
@benchmark CUDA.@sync contract(cudatn; path)

BenchmarkTools.Trial: 3371 samples with 1 evaluation.
 Range (min … max):  1.324 ms …  16.946 ms  ┊ GC (min … max): 0.00% … 55.47%
 Time  (median):     1.389 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   1.469 ms ± 940.462 μs  ┊ GC (mean ± σ):  3.02% ±  4.18%

Benchmark the same contraction on the CPU, using the original tensor network, for comparison:

In [5]:
@benchmark CUDA.@sync contract(tn; path)

BenchmarkTools.Trial: 40 samples with 1 evaluation.
 Range (min … max):   65.622 ms … 261.489 ms  ┊ GC (min … max):  3.43% … 74.33%
 Time  (median):      97.514 ms               ┊ GC (median):     2.04%
 Time  (mean ± σ):   130.156 ms ±  69.544 ms  ┊ GC (mean ± σ):  39.14% ± 29.56%
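
The two medians reported above give a quick estimate of the speedup for this particular machine, network and contraction path. The snippet below is only an illustration: the numbers are copied by hand from the outputs of cells 4 and 5, and the variable names are made up for this example.

```julia
gpu_median_ms = 1.389    # median time of the CuArray contraction (In [4])
cpu_median_ms = 97.514   # median time of the Array contraction (In [5])

# Ratio of CPU time to GPU time; values above 1 mean the GPU run was faster
speedup = cpu_median_ms / gpu_median_ms
println("Median speedup (CPU time / GPU time): ", round(speedup; digits=1), "x")
```

For this run the ratio comes out at roughly 70x. It should be read as a single data point, since the result depends on the hardware, the tensor sizes and the chosen path.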