# Example

This example walks through the whole workflow, from the initial atomic structures to training, evaluation, and prediction. It includes:


1. Read input atomic structures (saved as extxyz files) and create descriptors and their derivatives.

2. Read inputs and outputs into a Data object.

3. Create a TensorFlow dataset for training.

4. Train the potential and apply it for prediction.

5. Save the trained model and then load it for retraining or prediction.


The code has been tested on TensorFlow 2.5 and 2.6.

In [1]:
%reset -s -f
import atomdnn

# 'float64' is used by default for reading data and for training
atomdnn.data_type = 'float64'

# force and stress are evaluated by default;
# if only the potential energy is needed, set compute_force to False
atomdnn.compute_force = True

# the default value converts eV/A^3 to GPa
# note: a positive predicted stress means tension and a negative one means compression
stress_unit_convert = 160.2176 

import numpy as np
import pickle
import tensorflow as tf
from atomdnn import data
from atomdnn.data import Data
from atomdnn.data import *
from atomdnn.atom_io import *
from atomdnn import network
from atomdnn.network import Network
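
The stress conversion factor above can be checked from the SI definitions of the electronvolt and the ångström. The short sketch below is only an arithmetic check and does not use atomdnn; the variable names are illustrative.

```python
# Derive the eV/A^3 -> GPa conversion factor from SI definitions.
EV_IN_JOULE = 1.602176634e-19   # exact value since the 2019 SI redefinition
ANGSTROM_IN_METER = 1.0e-10

# 1 eV/A^3 expressed in J/m^3, which is the same as Pa
pascal_per_ev_per_a3 = EV_IN_JOULE / ANGSTROM_IN_METER**3
ev_per_a3_to_gpa = pascal_per_ev_per_a3 / 1.0e9

print(f"{ev_per_a3_to_gpa:.7f}")
```

The result is approximately 160.2177 GPa, which agrees with the value 160.2176 used in the notebook to about six significant figures.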

## Create descriptors

In [2]:
!rm descriptors/*

In [3]:
!cat extxyz/1tp_XYZ.0

54
Lattice="19.173938 0.000456 -0.025754 -0.000245 10.309404 0.000007 0.013561 0.000002 30.000000 " Properties=species:S:1:pos:R:3:forces:R:3 energy=-323.21193092 pbc="T T T"
Mo	      2.14530      0.00007      3.83746         0.001201     -0.000018     -0.003032
Mo	      4.47897      1.71834      4.01086         0.004828     -0.000024      0.006136
Mo	      2.14523      3.43649      3.83747         0.001155      0.000033     -0.003011
Mo	      4.47888      5.15479      4.01087         0.004849      0.000022      0.006115
Mo	      2.14514      6.87299      3.83747         0.001160     -0.000022     -0.002974
Mo	      4.47880      8.59130      4.01087         0.004843      0.000013      0.006131
Mo	      8.53662      0.00021      3.82889         0.003276      0.000018     -0.004118
Mo	     10.87026      1.71849      4.00228        -0.004636      0.000005      0.005790
Mo	      8.53653      3.43665      3.82889         0.003274     -0.000016     -0.004149
Mo	     10.87018      
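
To make the file layout above concrete, here is a minimal sketch of a parser for one extxyz frame: the first line is the atom count, the second line holds `key=value` metadata (lattice, energy, column layout), and each following line is one atom. The function name `parse_extxyz_frame` is hypothetical and is not part of atomdnn; the sample is a shortened two-atom version of the file shown above, and the column layout `species, pos(3), forces(3)` is assumed from its `Properties` entry.

```python
import re

def parse_extxyz_frame(text):
    """Parse one extxyz frame into atom count, lattice, energy and atom rows."""
    lines = text.strip().splitlines()
    natoms = int(lines[0])
    header = lines[1]
    # Header entries are either key="quoted value" or key=bare_value.
    info = {}
    for match in re.finditer(r'(\w+)=(?:"([^"]*)"|(\S+))', header):
        quoted, bare = match.group(2), match.group(3)
        info[match.group(1)] = quoted if quoted is not None else bare
    lattice = [float(x) for x in info["Lattice"].split()]
    energy = float(info["energy"])
    atoms = []
    for row in lines[2:2 + natoms]:
        fields = row.split()
        position = tuple(float(x) for x in fields[1:4])
        force = tuple(float(x) for x in fields[4:7])
        atoms.append((fields[0], position, force))
    return natoms, lattice, energy, atoms

# Shortened sample: header copied from the file above, atom count reduced to 2.
sample = """2
Lattice="19.173938 0.000456 -0.025754 -0.000245 10.309404 0.000007 0.013561 0.000002 30.000000 " Properties=species:S:1:pos:R:3:forces:R:3 energy=-323.21193092 pbc="T T T"
Mo 2.14530 0.00007 3.83746 0.001201 -0.000018 -0.003032
Mo 4.47897 1.71834 4.01086 0.004828 -0.000024 0.006136
"""
natoms, lattice, energy, atoms = parse_extxyz_frame(sample)
print(natoms, energy, atoms[0][1])
```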

**Read input atomic structures (saved as extxyz files) and create descriptors and their derivatives**


In [4]:
# descriptor = {'name': 'acsf', 
#               'cutoff': 6.5,
#               'etaG2':[0.01,0.05,0.1,0.5,1,5,10], 
#               'etaG4': [0.01], 
#               'zeta': [0.08,0.2,1.0,5.0,10.0,50.0,100.0],
#               'lambda': [1.0, -1.0]}
descriptor = {'name': 'acsf', 
              'cutoff': 6.5,
              'etaG2':[1.0]}

# define the LAMMPS executable (serial or MPI)
# LAMMPS has to be compiled with the added compute and dump_local subroutines (inside atomdnn/lammps)
lmpexe = 'lmp_serial' 
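
For orientation, the `etaG2` entry refers to the radial (G2) atom-centered symmetry function. The sketch below uses the standard Behler-Parrinello form with a cosine cutoff; it is an illustration only, the LAMMPS compute shipped with atomdnn may differ in details such as shifts or normalization, and the neighbor distances are made up.

```python
import math

def cosine_cutoff(r, r_cut):
    """Cosine cutoff function: 1 at r = 0, decaying smoothly to 0 at r_cut."""
    if r >= r_cut:
        return 0.0
    return 0.5 * (math.cos(math.pi * r / r_cut) + 1.0)

def g2(distances, eta, r_cut, r_shift=0.0):
    """Radial symmetry function G2 of one atom from its neighbor distances."""
    return sum(
        math.exp(-eta * (r - r_shift) ** 2) * cosine_cutoff(r, r_cut)
        for r in distances
    )

# Hypothetical neighbor distances (in angstrom) of a single atom.
neighbor_distances = [2.7, 2.7, 3.5, 7.0]
print(g2(neighbor_distances, eta=1.0, r_cut=6.5))
```

Neighbors beyond the cutoff (7.0 Å here) contribute nothing, which is why the `cutoff` value also sets how many neighbor pairs appear in the derivative files.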

In [5]:
!ls ./extxyz/

1tp_XYZ.0


In [6]:
!ls descriptors/

In [7]:
xyzfile_path = './extxyz' 
xyzfile_name = '1tp_XYZ.*'
descriptors_path = './descriptors'
descriptor_filename = 'dump_fingerprints' # a series of dump_fingerprints.* files will be created
der_filename ='dump_fingerprints_der'

# this will create a series of files for descriptors and their derivatives inside descriptors_path
# by default, descriptor files are saved as 'dump_fp.*' and derivatives are saved as 'dump_der.*'
create_descriptors(xyzfile_path = xyzfile_path,
                   xyzfile_name = xyzfile_name,
                   lmpexe = lmpexe,
                   descriptors_path = descriptors_path, 
                   descriptor = descriptor, 
                   descriptor_filename = descriptor_filename, 
                   der_filename = der_filename)

Start creating fingerprints ...
nombreee: ./extxyz/1tp_XYZ.0
nombreee: ./extxyz/1tp_XYZ.1
Finish creating descriptors and their derivatives from total 1 images.
It took 0.15 seconds.
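
The log above lists `1tp_XYZ.0` and `1tp_XYZ.1` but reports one image, which suggests that files are probed by consecutive index until one is missing. The sketch below illustrates that idea under this assumption; `count_sequential_files` is a hypothetical helper, not an atomdnn function, and the actual implementation may differ.

```python
import os
import tempfile

def count_sequential_files(directory, stem):
    """Count files stem.0, stem.1, ... and stop at the first missing index."""
    index = 0
    while os.path.isfile(os.path.join(directory, f"{stem}.{index}")):
        index += 1
    return index

# Demonstration in a throwaway directory with placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    for i in (0, 1, 3):   # index 2 is deliberately missing
        with open(os.path.join(tmp, f"1tp_XYZ.{i}"), "w") as handle:
            handle.write("placeholder\n")
    n_images = count_sequential_files(tmp, "1tp_XYZ")

print(n_images)
```

If the files are indeed discovered this way, they need to be numbered consecutively from 0; a gap in the numbering would hide every file after it.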


In [8]:
!ls extxyz/

1tp_XYZ.0


In [9]:
!ls descriptors/

dump_fingerprints.0  dump_fingerprints_der.0  in.gen_descriptors  log.lammps


## Read inputs and outputs

**Read inputs and outputs into a Data object** 

In [10]:
!cat $descriptors_path/in.gen_descriptors

clear
dimension 3
boundary p p p
units metal
atom_style atomic
variable cutoff equal 6.5
read_data lmpdatafile
mass * 1.0
pair_style zero ${cutoff} nocoeff
pair_coeff * * 1.0 1.0
neighbor 0.0 bin
compute 1 all fingerprints 6.5 etaG2 1.0 end
compute 2 all derivatives  6.5 etaG2 1.0 end
dump dump_fingerprints all custom 1 ${dp_filename}.${file_id} id type c_1[3*]
dump_modify dump_fingerprints sort id format float %20.10g
dump dump_primes all local 1 ${der_filename}.${file_id} c_2[*3] c_2[6*]
dump_modify dump_primes format float %20.10g
fix NVE all nve
run 0

In [11]:
!ls $descriptors_path/

dump_fingerprints.0  dump_fingerprints_der.0  in.gen_descriptors  log.lammps


In [12]:
# create a Data object
mote2data = Data()

# read inputs: descriptors and their derivatives
fp_filename = descriptors_path + '/dump_fingerprints.*'
der_filename = descriptors_path + '/dump_fingerprints_der.*'

mote2data.read_inputdata(fp_filename = fp_filename,der_filename = der_filename)


Reading fingerprints data from LAMMPS dump files ./descriptors/dump_fingerprints.i
filenamee: ./descriptors/dump_fingerprints.0
fd: 0
filenamee: ./descriptors/dump_fingerprints.1
  Finish reading fingerprints from total 1 images.

  image number = 1
  max number of atom = 54
  number of fingerprints = 2
  type of atoms = 2

Reading derivative data from a series of files ./descriptors/dump_fingerprints_der.i
This may take a while for large data set ...
der_filename: ./descriptors/dump_fingerprints_der.0
der_filename: ./descriptors/dump_fingerprints_der.1
  Finish reading dGdr derivatives from total 1 images.

  Pad zeros to derivatives data if needed ...
  Pading is not needed.

  image number = 1
  max number of derivative pairs = 1890
  number of fingerprints = 2

  It took 0.12 seconds to read the derivatives data.


In [13]:
# read outputs: potential energy, force and stress from extxyz files
mote2data.read_outputdata(xyzfile_path=xyzfile_path, xyzfile_name=xyzfile_name, read_stress=False,verbose=True)

Reading outputs from extxyz files ...
filename: ./extxyz/1tp_XYZ.0
filename: ./extxyz/1tp_XYZ.1
  Finish reading outputs from total 1 images.



In [14]:
!cat $descriptors_path/dump_fingerprints.0

ITEM: TIMESTEP
0
ITEM: NUMBER OF ATOMS
54
ITEM: BOX BOUNDS xy xz yz pp pp pp
-2.6734297530652765e-02 1.9173955473260285e+01 1.7175047861629940e-07
0.0000000000000000e+00 1.0309426050837425e+01 -2.6734297530652765e-02
0.0000000000000000e+00 2.9999991152958270e+01 2.2047923872627226e-05
ITEM: ATOMS id type c_1[3] c_1[4] 
1 1      0.0002617672238       0.001868217552 
2 1      0.0002617825297       0.001870045681 
3 1      0.0002617805478       0.001868200669 
4 1      0.0002617556251         0.0018700645 
5 1      0.0002617439652        0.00186822605 
6 1      0.0002617535823       0.001870046932 
7 1      0.0002617941223       0.001868237721 
8 1      0.0002617988521       0.001870077542 
9 1       0.000261782674       0.001868285981 
10 1      0.0002617638871       0.001870083249 
11 1      0.0002617569446        0.00186825264 
12 1      0.0002617710014       0.001870094686 
13 1      0.0002617653311       0.001868272925 
14 1      0.0002617662266       0.00187007
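
The fingerprint dump above is a plain LAMMPS custom dump, so it can be inspected without atomdnn. The following sketch reads the `ITEM: ATOMS` section into rows; `read_dump_atoms` is a hypothetical helper, the sample is a shortened copy of the output above (box bounds omitted), and it assumes that `id` and `type` are the first two columns, as in the header shown.

```python
def read_dump_atoms(text):
    """Return (column_names, rows) from the ATOMS section of a LAMMPS dump."""
    lines = text.strip().splitlines()
    for i, line in enumerate(lines):
        if line.startswith("ITEM: ATOMS"):
            columns = line.split()[2:]
            rows = []
            for row in lines[i + 1:]:
                if row.startswith("ITEM:"):
                    break
                fields = row.split()
                # id and type are integers, the remaining columns are floats
                rows.append([int(fields[0]), int(fields[1])]
                            + [float(x) for x in fields[2:]])
            return columns, rows
    raise ValueError("no 'ITEM: ATOMS' section found")

sample = """ITEM: TIMESTEP
0
ITEM: NUMBER OF ATOMS
2
ITEM: ATOMS id type c_1[3] c_1[4]
1 1      0.0002617672238       0.001868217552
2 1      0.0002617825297       0.001870045681
"""
columns, rows = read_dump_atoms(sample)
fingerprints = [row[2:] for row in rows]   # one list of fingerprints per atom
print(columns, len(fingerprints), len(fingerprints[0]))
```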

In [15]:
!cat $descriptors_path/dump_fingerprints_der.0

ITEM: TIMESTEP
0
ITEM: NUMBER OF ENTRIES
5670
ITEM: BOX BOUNDS xy xz yz pp pp pp
-0.0267343 19.174 1.7175e-07
0 10.3094 -0.0267343
0 30 2.20479e-05
ITEM: ENTRIES c_2[1] c_2[2] c_2[3] c_2[6] c_2[7] 
                   0                  159          2.809147721                    0     -3.023505726e-15 
                   1                   26         -5.154709601                    0      2.329636048e-14 
                   1                    2          1.866641364                    0      8.919950987e-15 
                   0                  193        -0.3297541221                    0      2.940360377e-09 
                   1                   53         -3.436460574                    0      4.091058331e-09 
                   1                    2          2.448748624                    0      1.656657379e-09 
                   0                  194         -3.582184318                    0       3.24516557e-19 
                   1                   54   

                  33                   51         -1.718227634                    0       5.97634684e-06 
                   2                    2          5.409570153                    0     -1.029828466e-05 
                  32                  155          17.25624431     -3.644439282e-13                    0 
                  33                   18         -1.718211794      1.303664926e-13                    0 
                   2                    1          4.016895966     -1.189802898e-13                    0 
                  32                  171          16.58741232                    0     -6.531909311e-21 
                  33                   48         -3.436464611                    0      5.429099564e-21 
                   2                    2          5.990220554                    0     -5.594986088e-21 
                  32                   30          6.943591122                    0      7.972424212e-22 
                  33                 

In [16]:
mote2data.num_fingerprints

2

## Create TFdataset

**Create a TensorFlow dataset for training**

In [17]:
# convert data to tensors
mote2data.convert_data_to_tensor()

Conversion may take a while for large datasets...
It took 0.0241 second.


In [18]:
# create tensorflow dataset
tf_dataset = tf.data.Dataset.from_tensor_slices((mote2data.input_dict,mote2data.output_dict))

dataset_path = './example_tfdataset'

# save the dataset
tf.data.experimental.save(tf_dataset, dataset_path)

# save the element_spec to disk for future loading; this is only needed for TensorFlow versions lower than 2.6
with open(dataset_path + '/element_spec', 'wb') as out_: 
    pickle.dump(tf_dataset.element_spec, out_)

**Note: The three steps above only need to be done once per data set. Training uses only the saved TensorFlow dataset.**

## Training

**Load the dataset and train the model**

In [19]:
# load the TensorFlow dataset; for TensorFlow versions lower than 2.6, element_spec must be specified

with open(dataset_path + '/element_spec', 'rb') as in_:
    element_spec = pickle.load(in_)

dataset = tf.data.experimental.load(dataset_path,element_spec=element_spec)
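
Following the comment in this cell, whether the pickled `element_spec` is needed depends on the TensorFlow version. A small sketch of such a check is given below; `needs_element_spec` is a hypothetical helper, and the 2.6 threshold is taken from this notebook rather than verified independently. In practice one would pass `tf.__version__`.

```python
def needs_element_spec(tf_version):
    """True if the version is lower than 2.6 (threshold stated in this notebook)."""
    major, minor = (int(part) for part in tf_version.split(".")[:2])
    return (major, minor) < (2, 6)

for version in ("2.5.0", "2.6.0"):
    print(version, needs_element_spec(version))
```

Comparing `(major, minor)` tuples avoids the pitfall of string comparison, where `"2.10"` would sort before `"2.6"`.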

In [20]:
# split the data into training, validation and testing sets

# train_dataset, val_dataset, test_dataset = split_dataset(dataset,0.7,0.2,0.1,shuffle=True)
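
The idea behind a 0.7/0.2/0.1 split can be sketched on plain indices. This is not the implementation of `split_dataset`; `split_indices` and its `seed` argument are hypothetical and only illustrate how the fractions translate into subset sizes.

```python
import random

def split_indices(n_items, fractions=(0.7, 0.2, 0.1), shuffle=True, seed=0):
    """Partition range(n_items) into train/validation/test index lists."""
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    indices = list(range(n_items))
    if shuffle:
        random.Random(seed).shuffle(indices)   # reproducible shuffle
    n_train = round(fractions[0] * n_items)
    n_val = round(fractions[1] * n_items)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]           # remainder goes to the test set
    return train, val, test

train, val, test = split_indices(100)
print(len(train), len(val), len(test))
```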

In [21]:
get_fingerprints_num(dataset)

2

In [22]:
# Build the network
# See section 'Training' for a detailed description of the Network object.

elements = ['Mo','Te']
act_fun = 'tanh' # activation function
nfp = get_fingerprints_num(dataset) # number of fingerprints (or descriptors)
arch = [5,5,5] # NN layers

weights_init = tf.ones
bias_init = tf.zeros

model = Network(elements = elements,
                num_fingerprints = nfp,
                arch = arch,
                activation_function = act_fun,
                weights_initializer=weights_init,
                bias_initializer=bias_init)

In [23]:
for i in model.params:
    print(i)

In [24]:
# # Train the model 

# opt = 'adam' # optimizer
# loss_fun = 'mae' # loss function
# # scaling = 'std' # scaling the training data with standardization
# lr = 0.02 # learning rate
# loss_weights = {'pe' : 1, 'force' : 1, 'stress': 0.1} # the weights in loss function

# model.train(train_dataset, val_dataset, \
#             optimizer=opt, \
#             loss_fun = loss_fun, \
#             batch_size=30, \
#             lr=lr, \
#             epochs=50, \
#             scaling=scaling, \
#             loss_weights=loss_weights, \
#             compute_all_loss=True, \
#             shuffle=True, \
#             append_loss=True)

In [25]:
# plot the training loss

# model.plot_loss(start_epoch=1)

In [26]:
model.loss_fun = 'mae'
model.loss_weights = {'pe' : 0.01, 'force' : 1}

In [27]:
# Evaluate using the first image in the dataset

results = model.evaluate(dataset.take(1), return_prediction=True)

atom_pee: tf.Tensor(
[[[0.26598629]
  [0.26621604]
  [0.26598585]
  [0.26621503]
  [0.26598445]
  [0.26621259]
  [0.26599216]
  [0.26622205]
  [0.26599675]
  [0.2662184 ]
  [0.26598938]
  [0.26622071]
  [0.26599296]
  [0.26621783]
  [0.26598764]
  [0.2662148 ]
  [0.26598869]
  [0.26621266]
  [0.07966993]
  [0.15658691]
  [0.0799325 ]
  [0.15635178]
  [0.07965864]
  [0.15658323]
  [0.07993202]
  [0.15634626]
  [0.07965953]
  [0.15658216]
  [0.07992991]
  [0.15634878]
  [0.0796658 ]
  [0.15659498]
  [0.07993415]
  [0.15634882]
  [0.07966272]
  [0.1565862 ]
  [0.0799358 ]
  [0.15634493]
  [0.07966337]
  [0.15659038]
  [0.07992957]
  [0.15634711]
  [0.07966157]
  [0.15659317]
  [0.07992835]
  [0.15634532]
  [0.07965761]
  [0.15659354]
  [0.07992707]
  [0.1563498 ]
  [0.07965831]
  [0.1565985 ]
  [0.07992732]
  [0.1563429 ]]], shape=(1, 54, 1), dtype=float64)
Evaluation loss is:
        pe_loss:       3.3225e+02
     force_loss:       2.8461e-01
     total_loss:       3.6072e+00
The total l

In [28]:
results['pe']

array([9.04262323])
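
The printed losses can be cross-checked by hand. The sketch below assumes that the total loss is the weighted sum of the individual losses with the weights set in `model.loss_weights`; the numbers are copied from the outputs in this notebook (the reference energy comes from the extxyz header).

```python
# Values copied from the notebook outputs.
reference_pe = -323.21193092   # energy in the extxyz header
predicted_pe = 9.04262323      # results['pe']
force_loss = 2.8461e-01        # printed force_loss (MAE)
loss_weights = {"pe": 0.01, "force": 1.0}

# With a single image, the MAE of the energy is just the absolute error.
pe_loss = abs(predicted_pe - reference_pe)

# Assumed form of the total loss: weighted sum of the individual losses.
total_loss = loss_weights["pe"] * pe_loss + loss_weights["force"] * force_loss

print(f"pe_loss = {pe_loss:.4e}, total_loss = {total_loss:.4e}")
```

Both values agree with the printed `pe_loss` and `total_loss` to the displayed precision, which supports the weighted-sum reading. The large energy error is expected here, since the network has not been trained and uses constant initial weights.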

In [29]:
forces = results['force']

In [30]:
forces_outfile = open('atomdnn_forces.txt', 'w')

In [31]:
for i,row in enumerate(forces[0]):
#     print(f'{row[0]:12.9f} {row[1]:12.9f} {row[2]:12.9f}')
    forces_outfile.write(f'{row[0]:12.9f} {row[1]:12.9f} {row[2]:12.9f}\n')
    
forces_outfile.close()

In [32]:
!cat atomdnn_forces.txt

-0.343794587 -0.000012877  0.217295459
 0.343564866 -0.000011535 -0.217630817
-0.343845516  0.000042913  0.217246173
 0.343554749  0.000030960 -0.217665161
-0.343805659 -0.000002186  0.217215998
 0.343501408 -0.000047559 -0.217623363
-0.343860304 -0.000011428  0.217165631
 0.343589732 -0.000018106 -0.217601014
-0.343839021  0.000046777  0.217105471
 0.343574374  0.000044123 -0.217577459
-0.343814983 -0.000006471  0.217142016
 0.343551516 -0.000051575 -0.217617532
-0.343855471 -0.000061726  0.217255369
 0.343470042  0.000014616 -0.217862979
-0.343793792  0.000023229  0.217254849
 0.343556558 -0.000015494 -0.217721672
-0.343844138  0.000017728  0.217257328
 0.343544531  0.000009593 -0.217759278
-0.108360684  0.000004733  0.509910183
 0.035104614 -0.000017248  1.345921912
 0.108761181  0.000012110 -0.511254307
-0.035158033 -0.000000786 -1.344152306
-0.108381281 -0.000013654  0.509843577
 0.035134821  0.000022682  1.345884580
 0.108737773  0.000005213 -0.511251447
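
Opening the file in one cell and closing it in another works, but a `with` block guarantees that the file is closed even if writing fails. A minimal sketch of the same output format follows; `write_forces` is a hypothetical helper, and the two sample rows are copied from the output above.

```python
import os
import tempfile

def write_forces(path, forces):
    """Write one 'fx fy fz' line per atom; the file is closed on exit."""
    with open(path, "w") as outfile:
        for fx, fy, fz in forces:
            outfile.write(f"{fx:12.9f} {fy:12.9f} {fz:12.9f}\n")

# Two rows taken from the predicted forces printed above.
sample_forces = [
    (-0.343794587, -0.000012877, 0.217295459),
    (0.343564866, -0.000011535, -0.217630817),
]
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "atomdnn_forces.txt")
    write_forces(path, sample_forces)
    with open(path) as infile:
        written = infile.read().splitlines()

print(written[0])
```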


In [33]:
for i in model.variables:
    print(i.shape)

(2, 5)
(1, 5)
(2, 5)
(1, 5)
(5, 5)
(1, 5)
(5, 5)
(1, 5)
(5, 1)
(1,)
(5, 5)
(1, 5)
(5, 5)
(1, 5)
(5, 1)
(1,)
()
(3,)
()
(2,)
()
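
The first 16 shapes printed above are consistent with two fully connected networks, one per element, each with three hidden layers of width 5 acting on 2 fingerprints. The sketch below reconstructs the shapes of one such network; it is inferred from the printed output rather than from the atomdnn source, the order in `model.variables` differs, and the trailing scalar and vector entries are not covered.

```python
def layer_shapes(num_fingerprints, arch):
    """Weight and bias shapes of one element's fully connected network."""
    shapes = []
    n_in = num_fingerprints
    for n_out in arch:
        shapes.append((n_in, n_out))   # weight matrix of a hidden layer
        shapes.append((1, n_out))      # bias of a hidden layer
        n_in = n_out
    shapes.append((n_in, 1))           # weights of the scalar output layer
    shapes.append((1,))                # bias of the output layer
    return shapes

def count_parameters(shapes):
    """Total number of scalar parameters over all shapes."""
    total = 0
    for shape in shapes:
        size = 1
        for dim in shape:
            size *= dim
        total += size
    return total

shapes = layer_shapes(num_fingerprints=2, arch=[5, 5, 5])
print(shapes)
print(count_parameters(shapes))
```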


In [34]:
(2, 5)
(1, 5)
(5, 5)
(1, 5)
(5, 1)
(1,)
(2, 5)
(1, 5)
(5, 5)
(1, 5)
(5, 1)
(1,)

(1,)

In [35]:
# predict using the first 5 images in the test dataset

# input_dict = get_input_dict(test_dataset.take(5))
# model.predict(input_dict)

## Save/load model

**Save the trained model**

In [36]:
# we rewrite the descriptor here to emphasize that it should be the same one defined above
# descriptor = {'name': 'acsf', 
#               'cutoff': 6.5,
#               'etaG2':[0.01,0.05,0.1,0.5,1,5,10], 
#               'etaG4': [0.01], 
#               'zeta': [0.08,0.2,1.0,5.0,10.0,50.0,100.0],
#               'lambda': [1.0, -1.0]}

# save_dir = 'example.tfdnn'
# network.save(model,save_dir,descriptor=descriptor)

**Load the trained model for continued training and prediction**

In [37]:
# imported_model = network.load(save_dir)

# # Re-train the model 
# loss_weights = {'pe' : 1, 'force' : 1, 'stress': 0.1}

# opt = 'Adam'
# loss_fun = 'rmse'
# scaling = 'std'

# imported_model.train(train_dataset, val_dataset, \
#             optimizer=opt, \
#             loss_fun = loss_fun, \
#             batch_size=30, \
#             lr=0.02, \
#             epochs=5, \
#             scaling=scaling, \
#             loss_weights=loss_weights, \
#             compute_all_loss=True, \
#             shuffle=True, \
#             append_loss=True)

In [38]:
# imported_model.evaluate(test_dataset.take(5),return_prediction=False)

In [39]:
# input_dict = get_input_dict(test_dataset.take(5))
# imported_model.predict(input_dict)

This should be in atomdnn [check]