This is a simple example showing how to use this package to train a machine learning force field using your data.

In this example, we will study a system of two helium atoms interacting through a Lennard-Jones potential. The example is divided into two main sections:

1) Model Setup and Training:
    We will load a pre-generated dataset containing configurations of two helium atoms in a simulation box and use the SymmLearn functions to build and train our neural network model.

2) Results and Comparison:
    Finally, we will analyze the results. For this simple case, it is possible to visualize the trained force field and compare it to the reference Lennard-Jones potential.


In [1]:
#using Symmlearn
using Plots

include("../src/MLTrain.jl")
include("../src/Data_prep.jl")
include("../src/Utils.jl")

fc (generic function with 1 method)

The .xyz dataset can be loaded as shown in the next code block.

This dataset consists of 1000 samples. The energy of each configuration was computed using a normalized Lennard-Jones potential with $\sigma$ = 1 and $\epsilon$ = 1. For each sample, the distance between the two helium atoms was randomly generated between 0.95 $\sigma$ and 2.5 $\sigma$.
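To make the description of the dataset concrete, here is a minimal sketch of how reference energies and forces of this kind could be generated. It is not the script that produced helium_LJ_dataset.xyz: the helper `sample_distances`, the uniform sampling and the random seed are assumptions made for illustration only.

```julia
using Random

# Reduced Lennard-Jones energy with σ = 1 and ϵ = 1
lj_energy(r) = 4 * ((1 / r)^12 - (1 / r)^6)

# Radial force F(r) = -dE/dr (positive values are repulsive)
lj_force(r) = 24 * (2 * (1 / r)^13 - (1 / r)^7)

# Hypothetical helper: draw n distances uniformly in [r_min, r_max]
function sample_distances(n; r_min = 0.95, r_max = 2.5, rng = Random.default_rng())
    return r_min .+ (r_max - r_min) .* rand(rng, n)
end

distances = sample_distances(1000; rng = MersenneTwister(42))
energies = lj_energy.(distances)
forces = lj_force.(distances)
```

In reduced units the energy crosses zero at r = 1 and reaches its minimum of -1 at r = 2^(1/6), which is a quick sanity check for any dataset built this way.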

In [2]:
file_path = "helium_LJ_dataset.xyz"

Train, Val, Test_data, energy_mean, energy_std, forces_mean, forces_std, species, all_cells = xyz_to_nn_input(file_path)


((Float32[0.0 0.0 … 0.0 0.0; 0.0 0.0 … 0.0 0.0; … ; 0.0 0.0 … 0.0 0.0; 0.0 0.0 … 0.0 0.0], Dict{Symbol, Any}[Dict(:energy => 0.5706239f0, :forces => Float32[-0.01270491, 0.01270491, 0.0, 0.0, 0.0, 0.0]), Dict(:energy => -1.3315994f0, :forces => Float32[-0.55432314, 0.55432314, 0.0, 0.0, 0.0, 0.0]), Dict(:energy => -1.2531202f0, :forces => Float32[2.3644674, -2.3644674, 0.0, 0.0, 0.0, 0.0]), Dict(:energy => -0.05493946f0, :forces => Float32[-0.2099825, 0.2099825, 0.0, 0.0, 0.0, 0.0]), Dict(:energy => 2.6043594f0, :forces => Float32[8.81632, -8.81632, 0.0, 0.0, 0.0, 0.0]), Dict(:energy => -1.7122737f0, :forces => Float32[-0.54902565, 0.54902565, 0.0, 0.0, 0.0, 0.0]), Dict(:energy => 0.36671016f0, :forces => Float32[-0.07163917, 0.07163917, 0.0, 0.0, 0.0, 0.0]), Dict(:energy => 0.4526745f0, :forces => Float32[-0.04544816, 0.04544816, 0.0, 0.0, 0.0, 0.0]), Dict(:energy => -0.17145172f0, :forces => Float32[-0.24886188, 0.24886188, 0.0, 0.0, 0.0, 0.0]), Dict(:energy => -1.5806979f0, :forces 

The xyz_to_nn_input function returns the data already split into training, validation and test sets, together with the mean and standard deviation of both the energies and the forces, which are needed to denormalize the predictions later. It also returns the species and the lattice parameters; the latter are used by the model to compute interatomic distances under periodic boundary conditions (in this example we won't be using PBC, as the helium atoms are confined in a box).

In the next block we will build and train our model. Since the system is trivial, training on the forces is not needed, so only the energies are used.
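The role of the returned means and standard deviations can be illustrated with a small sketch of Z-score normalization and its inverse. The function names `zscore_normalize` and `zscore_denormalize` and the sample values are hypothetical; the exact convention used inside xyz_to_nn_input is assumed here, not confirmed.

```julia
using Statistics

# Standardize values to zero mean and unit standard deviation
function zscore_normalize(x)
    μ = mean(x)
    s = std(x)
    return (x .- μ) ./ s, μ, s
end

# Invert the standardization to recover the original units
zscore_denormalize(z, μ, s) = z .* s .+ μ

raw_energies = Float32[-0.8, -0.2, 0.5, 1.9]   # made-up values
normalized, e_mean, e_std = zscore_normalize(raw_energies)
recovered = zscore_denormalize(normalized, e_mean, e_std)
```

The same inversion, `prediction .* energy_std .+ energy_mean`, is what turns the network output back into energies in units of $\epsilon$.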
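For readers unfamiliar with symmetry functions, the following sketch shows a radial descriptor of the Behler-Parrinello type with a cosine cutoff. It is only an illustration: the names `cutoff` and `radial_descriptor`, the parameter values, and the assumption that the package's G1 functions have this Gaussian form are not taken from the package source.

```julia
# Cosine cutoff: decays smoothly from 1 at r = 0 to 0 at r = r_c
cutoff(r, r_c) = r < r_c ? 0.5 * (cos(π * r / r_c) + 1) : zero(r)

# Gaussian radial descriptor centred at r_s with width parameter η
radial_descriptor(r, η, r_s, r_c) = exp(-η * (r - r_s)^2) * cutoff(r, r_c)

r_c = 2.5                                               # cutoff radius in units of σ
params = [(η = 1.0, r_s = 1.0), (η = 1.0, r_s = 2.0)]   # illustrative values

# Feature vector describing one interatomic distance
features(r) = [radial_descriptor(r, p.η, p.r_s, r_c) for p in params]

features(1.2)
```

With two such functions, each pair distance is mapped to a two-component feature vector that a small neural network can turn into an energy.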

In [None]:
#define the model using 2 G1 symmetry functions

model = build_model(species, 2, 2.5f0)

#train the model

trained_model, train_loss, val_loss = train_model!(
    model,
    Train[1],
    Train[2],
    Val[1],
    Val[2],
    loss_function_no_forces;
    initial_lr=0.1, epochs=500, batch_size=32, verbose=false
)


model grads computed !(branches = NamedTuple{(:distance_layer, :species_model)}[(distance_layer = (central_atom_idx = nothing,), species_model = (layers = ((W_eta = Float32[-0.019740928, -0.0009503402, 0.001628182, -0.041982353, -0.007922775], W_Fs = Float32[-0.038765647, -1.3397384f-5, -0.0035681424, 0.039990976, 0.012407556], cutoff = nothing, charge = nothing), (weight = Float32[0.0407371 0.06192997 0.029348617 0.04408522 0.058383893; -0.0011781747 -0.0019225425 -0.0010035547 -0.0014431726 -0.0018729098; 0.03864465 0.058742654 0.027828028 0.04180778 0.055371746; 0.01649801 0.025071083 0.0118698515 0.01783757 0.023627557; -0.071290985 -0.10829899 -0.0512521 -0.07703523 -0.102049746; -0.038758345 -0.058739442 -0.027679643 -0.041685533 -0.05526981; -0.0073456652 -0.011168169 -0.0052920543 -0.007949609 -0.010528169; 0.011642973 0.017633233 0.008305152 0.012510711 0.016589528; 0.009552914 0.014426646 0.006770973 0.010216702 0.013557776; 0.053959984 0.08200425 0.03883731 0.058355108 0.077

Now that our model has been trained, we can look at the results. The plot compares the predicted energy of each pair in the test set, as a function of the distance between the two atoms, with the LJ potential.

In [None]:
# Denormalize model predictions for the test set using Z-score inversion
test_guess = trained_model(Test_data[1])' .* energy_std .+ energy_mean

# Extract interatomic distances from the test set (for plotting)
test_distances = [Test_data[1][i, 4] for i in 1:size(Test_data[1], 1)]


# Extract true energies from the test set
test_energy = [d[:energy] for d in Test_data[2]]



# Define Lennard-Jones potential curve for reference
r = 0.95:0.01:2.5
lj_energy = 4 .* ((1 ./ r).^12 .- (1 ./ r).^6)

# Plot predictions vs Lennard-Jones potential
scatter(test_distances, test_guess,
    label="Model prediction (Test Set)",
    alpha=0.5,
    color="cyan"
)

plot!(r, lj_energy,
    label="Lennard-Jones Potential",
    color="black",
    lw=2
)

xlabel!("Distance [σ]")
ylabel!("Energy [ϵ]")
title!("Helium energy prediction - Test Set")


As expected, the model reproduces the Lennard-Jones potential very well.
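The visual agreement can also be expressed as a number. The sketch below defines two common error metrics and applies them to made-up predictions; the helper names `rmse`, `mae` and `lj_reference`, and the example values, are illustrative and are not results from this notebook.

```julia
using Statistics

# Root-mean-square error and mean absolute error between two vectors
rmse(pred, ref) = sqrt(mean((pred .- ref) .^ 2))
mae(pred, ref) = mean(abs.(pred .- ref))

# Reference Lennard-Jones energy in reduced units
lj_reference(r) = 4 * ((1 / r)^12 - (1 / r)^6)

# Made-up predictions: the reference curve plus small deviations
example_distances = [1.0, 1.12, 1.5, 2.0]
example_reference = lj_reference.(example_distances)
example_predictions = example_reference .+ [0.01, -0.02, 0.005, 0.0]

println("RMSE = ", rmse(example_predictions, example_reference))
println("MAE  = ", mae(example_predictions, example_reference))
```

With the variables defined in this notebook, something like `rmse(vec(test_guess), lj_reference.(test_distances))` should give the corresponding test-set error, assuming `test_guess` can be flattened to a vector in the same order as `test_distances`.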