In [1]:
filepath = "test_data/H2.wfn"

include("../wfn.jl");
include("../ext21.jl");

In [2]:
# Struct containing all relevant WFN information.
# device = gpu stores the data structures on the GPU.
f = read_wfn(filepath, device = gpu);
f.title

"H2_gas"

### Some information from the loaded WFN file

In [3]:
#The nuclei positions
f.nuclei_pos

2×3 CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}:
 0.0  0.0   0.69742
 0.0  0.0  -0.69742

### Finding the critical points

In [4]:
#Initial guesses for our Newton-Raphson search algorithm
r⃗ = [0 0 0;
     0.1 0 0.7;
     0 0 -0.7;
     0 1 0;
     1 -1 0;
     0.4 0.5 0.5;
     1 1 1;
     0 0 0.6;
     0 0 0.5;
     0 10 10] .|> Float32 |> gpu

10×3 CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}:
 0.0   0.0   0.0
 0.1   0.0   0.7
 0.0   0.0  -0.7
 0.0   1.0   0.0
 1.0  -1.0   0.0
 0.4   0.5   0.5
 1.0   1.0   1.0
 0.0   0.0   0.6
 0.0   0.0   0.5
 0.0  10.0  10.0

In [5]:
r⃗_found, ρ⃗, iters = find_critical_ρ_points(r⃗, f);

The critical points found:

In [6]:
r⃗_found

10×3 CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}:
 -2.77729f-18   1.11746f-16  -1.41111f-9
 -8.21886f-7    4.11652f-18   0.670376
  9.64199f-19   1.21076f-17  -0.670377
  1.59395f-17   4.45761       2.85738f-11
  3.40471      -3.40471       1.28304f-10
  2.57315       3.21644       0.532813
  3.18416       3.18416       2.3226
 -1.6956f-18    7.15339f-18   0.670375
 -2.77729f-18   1.11743f-16   2.15256f-5
  0.0          12.5571       12.4105

The number of iterations performed for each starting point:

In [7]:
iters

10×1 CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}:
 15.0
 15.0
 15.0
 15.0
 15.0
 15.0
 15.0
 15.0
 15.0
 15.0

The electron density ρ at each returned point:

In [8]:
ρ⃗

10-element CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}:
 0.2637181
 0.41073042
 0.41073045
 6.173389f-5
 2.9519706f-5
 0.00011567754
 1.9402674f-5
 0.41073042
 0.2637181
 3.3823704f-16
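Picking the distinct critical points out of these arrays can be automated. The helper `distinct_critical_points` below is hypothetical (it is not part of `wfn.jl` or `ext21.jl`): as a heuristic, it discards rows whose density is at most `ρmin`, then keeps a row only if it lies farther than `tol` from every row already kept. Both thresholds (10⁻³) are assumptions. The example data is the GPU output above typed in by hand; results held in `CuArray`s can first be copied to the CPU with `Array(r⃗_found)` and `Array(ρ⃗)`.

```julia
using LinearAlgebra: norm

# Hypothetical helper, not part of wfn.jl or ext21.jl.
# Drops rows whose density is ≤ ρmin (heuristic for searches that drifted away),
# then keeps a row only if it is farther than `tol` from every row kept so far.
function distinct_critical_points(points::AbstractMatrix, ρ::AbstractVector;
                                  ρmin::Real = 1e-3, tol::Real = 1e-3)
    kept = Vector{Vector{Float64}}()
    for (row, ρi) in zip(eachrow(points), ρ)
        ρi > ρmin || continue
        p = Float64.(row)
        all(q -> norm(p - q) > tol, kept) && push!(kept, p)
    end
    # Stack the kept points back into an n×3 matrix.
    return isempty(kept) ? zeros(0, size(points, 2)) : permutedims(reduce(hcat, kept))
end

# GPU results from the cells above, copied by hand.
r_found = Float32[-2.77729f-18  1.11746f-16  -1.41111f-9;
                  -8.21886f-7   4.11652f-18   0.670376;
                   9.64199f-19  1.21076f-17  -0.670377;
                   1.59395f-17  4.45761       2.85738f-11;
                   3.40471     -3.40471       1.28304f-10;
                   2.57315      3.21644       0.532813;
                   3.18416      3.18416       2.3226;
                  -1.6956f-18   7.15339f-18   0.670375;
                  -2.77729f-18  1.11743f-16   2.15256f-5;
                   0.0         12.5571       12.4105]
ρ_found = Float32[0.2637181, 0.41073042, 0.41073045, 6.173389f-5, 2.9519706f-5,
                  0.00011567754, 1.9402674f-5, 0.41073042, 0.2637181, 3.3823704f-16]

cps = distinct_critical_points(r_found, ρ_found)  # 3×3 matrix, one critical point per row
```

A greedy pass is quadratic in the number of kept points, which is harmless here because only a handful of distinct critical points exist.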

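The searches converge to three distinct critical points:
- (0, 0, 0), reached from rows 1 and 9 of `r⃗`, with ρ ≈ 0.2637
- (0, 0, 0.6704), reached from rows 2 and 8, with ρ ≈ 0.4107
- (0, 0, -0.6704), reached from row 3, with ρ ≈ 0.4107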

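The remaining starting points (rows 4 to 7 and 10) do not converge to a critical point: their iterates drift away from the molecule into regions where ρ is about 1.2 × 10⁻⁴ or smaller. In these cases the algorithm diverges because of the initial guess. As the `iters` output shows, every search also runs for the full 15 iterations. Detecting divergence and notifying the user, as well as stopping early once convergence is detected, are desired features that will be implemented in a future update.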
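Both missing features can be sketched on a toy problem. The function `newton_critical_point` below is a hypothetical single-point Newton-Raphson search, not the implementation in `ext21.jl`. It applies the update r ← r − H(r)⁻¹∇ρ(r), where H is the Hessian of ρ, stops early once the step length falls below `tol`, and flags the search as diverged when ρ drops below `ρmin` or the point leaves a sphere of radius `rmax`. The toy density (two Gaussians with exponent 2 centered at z = ±0.7), the thresholds and all names are assumptions chosen for illustration. Stopping on the step length rather than on ‖∇ρ‖ is deliberate: far from the molecule ∇ρ is also numerically zero, so a gradient test alone would report diverged searches as converged.

```julia
using LinearAlgebra

# Toy density (assumption): two isotropic Gaussians exp(-ALPHA‖r - c‖²) on the z axis.
const CENTERS = ([0.0, 0.0, 0.7], [0.0, 0.0, -0.7])
const ALPHA = 2.0

density(r) = sum(exp(-ALPHA * sum(abs2, r - c)) for c in CENTERS)

# Analytic gradient and Hessian of the toy density.
function grad_hess(r)
    g = zeros(3)
    H = zeros(3, 3)
    for c in CENTERS
        d = r - c
        e = exp(-ALPHA * sum(abs2, d))
        g .+= -2ALPHA .* e .* d
        H .+= e .* (4ALPHA^2 .* (d * d') - 2ALPHA * I)
    end
    return g, H
end

# Hypothetical Newton-Raphson search for one critical point (∇ρ = 0).
# Returns (point, status, iterations), with status :converged, :diverged or :maxiter.
function newton_critical_point(r0; tol = 1e-8, maxiter = 50, ρmin = 1e-6, rmax = 20.0)
    r = float.(collect(r0))
    for iter in 1:maxiter
        # Divergence: the point is in near-vacuum or too far from the origin.
        if density(r) < ρmin || norm(r) > rmax
            return r, :diverged, iter - 1
        end
        g, H = grad_hess(r)
        Δ = H \ g          # Newton step: solve H Δ = ∇ρ
        r -= Δ
        # Early stopping on the step length, not on ‖∇ρ‖ (which is ~0 in vacuum too).
        if norm(Δ) < tol
            return r, :converged, iter
        end
    end
    return r, :maxiter, maxiter
end

r_mid, status_mid, n_mid = newton_critical_point([0.0, 0.0, 0.05])  # starts near the midpoint
r_far, status_far, n_far = newton_critical_point([0.0, 10.0, 10.0])  # starts far from both centers
```

Returning a status per point, instead of a fixed iteration count, is what would let a batched version report which searches should be discarded.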

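## Using the CPU

Because the implementation is not tied to CUDA arrays, the same algorithm can also run on the CPU. To do so, the WFN data structures and the initial guesses are created on the CPU instead, by passing `device = cpu` to `read_wfn` and piping the guesses through `cpu`.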

In [10]:
f = read_wfn(filepath, device = cpu);
r⃗ = [0 0 0;
     0.1 0 0.7;
     0 0 -0.7;
     0 1 0;
     1 -1 0;
     0.4 0.5 0.5;
     1 1 1;
     0 0 0.6;
     0 0 0.5;
     0 10 10] .|> Float32 |> cpu
r⃗_found, ρ⃗, iters = find_critical_ρ_points(r⃗, f);

In [11]:
r⃗_found

10×3 Matrix{Float32}:
 -2.77729f-18   1.11746f-16  -1.61965f-9
 -8.21886f-7    6.91228f-18   0.670376
  9.64199f-19   1.21076f-17  -0.670377
  1.59386f-17   4.45761      -1.2829f-11
  3.40471      -3.40471      -3.36881f-11
  2.57315       3.21644       0.532813
  3.18416       3.18416       2.3226
 -1.6956f-18    7.15339f-18   0.670375
 -2.77729f-18   1.11743f-16   2.15223f-5
  0.0          12.5571       12.4105

In [12]:
ρ⃗

10-element Vector{Float32}:
 0.2637181
 0.41073036
 0.41073042
 6.173389f-5
 2.951971f-5
 0.00011567752
 1.9402662f-5
 0.4107304
 0.26371813
 3.3823773f-16

In [13]:
iters

10×1 Matrix{Float64}:
 15.0
 15.0
 15.0
 15.0
 15.0
 15.0
 15.0
 15.0
 15.0
 15.0

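The results agree with the GPU run to within Float32 rounding: ρ differs only in the last one or two digits, and the coordinates differ by a few times 10⁻⁹ or less. Execution times, however, differ considerably because of the parallelism of GPUs, and the gap grows with the size of the problem.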
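The notebook does not show how `find_critical_ρ_points` is written internally. One common way to get this kind of CPU/GPU portability in Julia, sketched here as an assumption, is to express the computation with allocation through `similar`, slicing, broadcasting and reductions, which CUDA.jl turns into GPU kernels when the inputs are `CuArray`s. The function `gaussian_density` is hypothetical and evaluates a toy sum of Gaussians (exponent and centers chosen for illustration) at every row of an n×3 matrix of points.

```julia
# Hypothetical device-agnostic function (not the notebook's code): evaluates a toy
# sum of Gaussians exp(-α‖r - c‖²) at every row of R (n×3). It only uses `similar`,
# slicing, broadcasting and reductions, so the same method is expected to accept CuArrays.
function gaussian_density(R::AbstractMatrix{T}, centers::AbstractMatrix{T}, α::T) where {T<:AbstractFloat}
    ρ = fill!(similar(R, size(R, 1)), zero(T))  # allocated on the same device as R
    for k in axes(centers, 1)
        c = centers[k:k, :]                      # 1×3 row, broadcast against R
        d2 = vec(sum(abs2, R .- c; dims = 2))    # squared distances, length n
        ρ .+= exp.(-α .* d2)
    end
    return ρ
end

R = Float32[0 0 0; 0 0 0.7; 0 10 10]
centers = Float32[0 0 0.7; 0 0 -0.7]
ρ = gaussian_density(R, centers, 2f0)
# With CUDA.jl loaded, gaussian_density(cu(R), cu(centers), 2f0) calls the same method on CuArrays.
```

Each row is independent, which is the property that lets a GPU evaluate thousands of starting points at once.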

# Trying a different molecule

In [16]:
using BenchmarkTools

filepath = "test_data/S1-4PPdm.wfn"
i⃗ = [0 0 0;
     0.1 0 0.3;
     0 0 -0.2;
     0 1 0;
     1 -1 0;
     0.4 0.5 0.5;
     1 1 1;
     -0.7 0 0;
     0 0 0.5;
     0 10 10] .|> Float32;

In [17]:
f = read_wfn(filepath, device = cpu);
r⃗ = i⃗ |> cpu

@btime r⃗_found, ρ⃗, iters = find_critical_ρ_points(r⃗, f);

  100.255 ms (8530 allocations: 444.46 MiB)


In [18]:
f = read_wfn(filepath, device = gpu);
r⃗ = i⃗ |> gpu

@btime r⃗_found, ρ⃗, iters = find_critical_ρ_points(r⃗, f);

  30.941 ms (103450 allocations: 5.08 MiB)


## Trying a bigger search set

In [23]:
using Distributions
#10,000 initial points
n_points_search = 10_000
d = Normal()

i⃗ = rand(d, (n_points_search,3)) .|> Float32;
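Because the outcome of each search depends on its starting point, an alternative to an isotropic normal cloud around the origin is to seed guesses near chemically meaningful locations. The hypothetical `seed_guesses` below (not what this notebook does) places a guess at every nucleus and at the midpoint of every pair of nuclei, then adds `n_jitter` copies of each seed perturbed by Gaussian noise of standard deviation `σ`; both parameters are assumptions. The example uses the H2 nuclear positions printed earlier; with a loaded file, `Array(f.nuclei_pos)` would supply them.

```julia
using Random

# Hypothetical seeding strategy: nuclei + pairwise midpoints, each followed by
# `n_jitter` Gaussian-perturbed copies with standard deviation σ.
function seed_guesses(nuclei::AbstractMatrix{T}; n_jitter::Int = 4, σ::Real = 0.1,
                      rng::AbstractRNG = Random.default_rng()) where {T<:AbstractFloat}
    seeds = [collect(row) for row in eachrow(nuclei)]
    n = size(nuclei, 1)
    for i in 1:n, j in i+1:n
        push!(seeds, (nuclei[i, :] .+ nuclei[j, :]) ./ 2)
    end
    guesses = Vector{Vector{T}}()
    for s in seeds
        push!(guesses, s)
        for _ in 1:n_jitter
            push!(guesses, s .+ T(σ) .* randn(rng, T, length(s)))
        end
    end
    return permutedims(reduce(hcat, guesses))  # (#seeds × (1 + n_jitter)) × 3
end

nuclei = Float32[0 0 0.69742; 0 0 -0.69742]
guesses = seed_guesses(nuclei; rng = MersenneTwister(0))  # 15×3 Matrix{Float32}
# guesses |> gpu would move them to the GPU, as in the cells above.
```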

In [24]:
f = read_wfn(filepath, device = cpu);
r⃗ = i⃗ |> cpu

@btime r⃗_found, ρ⃗, iters = find_critical_ρ_points(r⃗, f);

  41.066 s (8742 allocations: 70.92 GiB)


In [25]:
f = read_wfn(filepath, device = gpu);
r⃗ = i⃗ |> gpu

@btime r⃗_found, ρ⃗, iters = find_critical_ρ_points(r⃗, f);

  4.908 s (10011072 allocations: 307.41 MiB)
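Dividing the CPU timings by the GPU timings makes the trend explicit: the speedup grows from about 3.2× with 10 starting points to about 8.4× with 10,000. The snippet only does arithmetic on the numbers printed above. Note that the memory column of `@btime` counts host (CPU) heap allocations, so the device memory used by the `CuArray`s is not included in the 5.08 MiB and 307.41 MiB figures.

```julia
# Timings reported by @btime above for S1-4PPdm.wfn, in seconds.
labels = ["10 points", "10,000 points"]
t_cpu = [100.255e-3, 41.066]
t_gpu = [30.941e-3, 4.908]

speedup = t_cpu ./ t_gpu
for (label, s) in zip(labels, speedup)
    println(rpad(label, 15), round(s; digits = 2), "×")
end
# 10 points      3.24×
# 10,000 points  8.37×
```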
