# PANNA Tutorial

## Part 3 - NN Interatomic potential: Extract & Use elsewhere


### Part 3a - Extract the network weights

As of now, the network is still in a format that can only be used within TensorFlow. In this part of the tutorial we will see 
* how to extract it and 
* how to use it elsewhere, e.g. within LAMMPS or ASE.

The second part of this tutorial requires LAMMPS to be installed on your system with the PANNA plugin.
See that part for installation instructions.
The last part requires ASE; see that cell for installation details.

If LAMMPS and the plugin are already installed, please modify the executable string in the next cell as appropriate.
The rest of the setup is the same as in the previous notebooks.

In [1]:
import os

# Specify the absolute path to PANNA (or leave this relative path)
panna_dir = os.path.abspath('../..')

# In case you need to mount the drive
# from google.colab import drive
# drive.mount('/content/drive')
# panna_dir = '/content/drive/MyDrive/your_path_to_panna'

# Cleaning up path for command line
panna_cmdir = panna_dir.replace(' ', '\\ ')

# Check if PANNA is installed, otherwise install it
try:
  import panna
  print("PANNA is installed correctly")
except ModuleNotFoundError:
  print("PANNA not found, attempting to install")
  !pip install {panna_cmdir}

# Specify your lammps command
lammps_cmd = 'path_to_lammps/bin/lmp'
lammps_cmd = lammps_cmd.replace(' ', '\\ ')

PANNA is installed correctly
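
The manual `replace` in the setup cell only handles spaces. As a hedged alternative, Python's standard `shlex.quote` quotes any character that is special to a POSIX shell. The sketch below uses a made-up path for illustration and is not part of the PANNA setup itself.

```python
import shlex

def shell_safe(path: str) -> str:
    """Return `path` quoted so a POSIX shell treats it as a single argument."""
    return shlex.quote(path)

# Hypothetical path containing a space, for illustration only
example_dir = '/content/drive/My Drive/panna'
print(shell_safe(example_dir))
```

Paths without special characters are returned unchanged, so the helper can be applied unconditionally.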


The script that converts the network model from TF checkpoint format to one that can be used outside TF is ``extract_weights.py``. The configuration file for this script is minimal:

In [2]:
!cat {panna_cmdir+'/doc/tutorial/input_files/myextract.ini'}

[IO_INFORMATION]
network_dir = ./mytrain/_models
step_number = -1
output_dir = ./panna_weights
output_type = PANNA
train_input = ./input_files/mytrain.ini
gvector_input = ./input_files/mygvect_sample.ini


In [3]:
!cd {panna_cmdir+'/doc/tutorial/'}; python {panna_cmdir+'/src/panna/extract_weights.py'} --config ./input_files/myextract.ini

2023-01-30 12:09:40.618302: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda-11.2/lib64
2023-01-30 12:09:40.618346: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda-11.2/lib64
INFO - 
    ____   _    _   _ _   _    _           
   |  _ \ / \  | \ | | \ | |  / \     
   | |_) / _ \ |  \| |  \| | / _ \     
   |  __/ ___ \| |\  | |\  |/ ___ \    
   |_| /_/   \_\_| \_|_| \_/_/   \_\ 

 Properties from Artificial Neural Network Architectures

INFO - PBC will be determined by lattice parameters in the json file for each example
INFO - Radial Gaussian centers are set by Rs0_rad, Rc_rad, RsN_r

The script runs some checks and logs what kind of input this network was prepared with (e.g. the descriptor type). Then it stores the weights in the specified format. Above we specified the PANNA format, which stores the metadata in a `.json` file and all parameters in several `.npy` files for easy manipulation.

In [4]:
# PANNA data format outputs:
!ls {panna_cmdir+'/doc/tutorial/panna_weights'}

C_l0_b.npy  C_l2_w.npy	H_l2_b.npy		N_l1_b.npy  O_l0_w.npy
C_l0_w.npy  H_l0_b.npy	H_l2_w.npy		N_l1_w.npy  O_l1_b.npy
C_l1_b.npy  H_l0_w.npy	networks_metadata.json	N_l2_b.npy  O_l1_w.npy
C_l1_w.npy  H_l1_b.npy	N_l0_b.npy		N_l2_w.npy  O_l2_b.npy
C_l2_b.npy  H_l1_w.npy	N_l0_w.npy		O_l0_b.npy  O_l2_w.npy
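
As an illustration of the "easy manipulation" mentioned above, here is a minimal sketch that loads the `.npy` files with NumPy. The naming pattern `<species>_l<layer>_<w|b>.npy` is inferred from the listing, and the helper name `load_panna_weights` is made up for this example; it is not a PANNA function.

```python
import re
from pathlib import Path

import numpy as np

# Assumed naming pattern, inferred from the directory listing:
# <species>_l<layer index>_<w|b>.npy
_PATTERN = re.compile(r'^([A-Za-z]+)_l(\d+)_([wb])\.npy$')

def load_panna_weights(weights_dir):
    """Return {species: {layer: {'w': array, 'b': array}}} from a folder."""
    networks = {}
    for path in sorted(Path(weights_dir).glob('*.npy')):
        match = _PATTERN.match(path.name)
        if match is None:
            continue  # skip files that do not follow the assumed pattern
        species, layer, kind = match.group(1), int(match.group(2)), match.group(3)
        networks.setdefault(species, {}).setdefault(layer, {})[kind] = np.load(path)
    return networks

# Example use in this notebook (path taken from the cells above):
# nets = load_panna_weights(panna_dir + '/doc/tutorial/panna_weights')
# print({sp: sorted(layers) for sp, layers in nets.items()})
```

Printing the `shape` of each array is a quick way to confirm that the extracted layers match the architecture you trained.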


If you are interested in using the potential in a code like LAMMPS, you can use the corresponding format:

In [3]:
!cat {panna_cmdir+'/doc/tutorial/input_files/myextract_lammps.ini'}

[IO_INFORMATION]
network_dir = ./mytrain/_models
step_number = -1
output_dir = ./lammps_weights
output_type = LAMMPS
train_input = ./input_files/mytrain.ini
gvector_input = ./input_files/mygvect_sample.ini


In [6]:
!cd {panna_cmdir+'/doc/tutorial/'}; python {panna_cmdir+'/src/panna/extract_weights.py'} --config ./input_files/myextract_lammps.ini

2023-01-30 12:12:20.596443: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda-11.2/lib64
2023-01-30 12:12:20.596558: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda-11.2/lib64
INFO - 
    ____   _    _   _ _   _    _           
   |  _ \ / \  | \ | | \ | |  / \     
   | |_) / _ \ |  \| |  \| | / _ \     
   |  __/ ___ \| |\  | |\  |/ ___ \    
   |_| /_/   \_\_| \_|_| \_/_/   \_\ 

 Properties from Artificial Neural Network Architectures

INFO - PBC will be determined by lattice parameters in the json file for each example
INFO - Radial Gaussian centers are set by Rs0_rad, Rc_rad, RsN_r

In [7]:
# LAMMPS data format outputs:
!ls {panna_cmdir+'/doc/tutorial/lammps_weights/'}

panna.in  weights_C.dat  weights_H.dat	weights_N.dat  weights_O.dat


In [8]:
# And the file specifying the network reads
!cat {panna_cmdir+'/doc/tutorial/lammps_weights/panna.in'}

!gversion = 1
[GVECT_PARAMETERS]
Nspecies = 4
species = H,C,O,N
RsN_rad = 16
eta_rad = [16.0, 16.0, 16.0, 16.0, 16.0, 16.0, 16.0, 16.0, 16.0, 16.0, 16.0, 16.0, 16.0, 16.0, 16.0, 16.0]
Rc_rad = 4.6
Rs_rad = [0.5, 0.75625, 1.0125, 1.2687499999999998, 1.525, 1.78125, 2.0374999999999996, 2.2937499999999997, 2.55, 2.80625, 3.0625, 3.3187499999999996, 3.5749999999999997, 3.83125, 4.0874999999999995, 4.34375]
RsN_ang = 4
eta_ang = [6.0, 6.0, 6.0, 6.0]
Rc_ang = 3.1
Rs_ang = [0.5, 1.15, 1.7999999999999998, 2.4499999999999997]
ThetasN = 8
zeta = [50, 50, 50, 50, 50, 50, 50, 50]
Thetas = [0.19634954084936207, 0.5890486225480862, 0.9817477042468103, 1.3744467859455345, 1.7671458676442586, 2.1598449493429825, 2.552544031041707, 2.945243112740431]

[H]
Nlayers = 3
sizes = 128,32,1
file = weights_H.dat
activations = 1,1,0

[C]
Nlayers = 3
sizes = 128,32,1
file = weights_C.dat
activations = 1,1,0

[O]
Nlayers = 3
sizes = 128,32,1
file = weights_O.dat
activations = 1,1,0
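
The `panna.in` file is close to INI syntax, so it can be inspected from Python. The sketch below assumes that dropping the leading `!gversion` line is enough for the standard `configparser` module to read it; the helper names are hypothetical and the sample text is an abridged copy of the file above.

```python
import configparser

def read_lammps_panna_config(text):
    """Parse the text of a panna.in-style file into a ConfigParser."""
    # Lines starting with '!' (e.g. '!gversion = 1') come before any section
    # header, which configparser rejects, so they are dropped here.
    body = '\n'.join(line for line in text.splitlines()
                     if not line.lstrip().startswith('!'))
    parser = configparser.ConfigParser()
    parser.read_string(body)
    return parser

def layer_sizes(parser, species):
    """Return the layer sizes of one species network as a list of ints."""
    return [int(n) for n in parser[species]['sizes'].split(',')]

# Abridged copy of the file printed above
sample = """!gversion = 1
[GVECT_PARAMETERS]
Nspecies = 4
species = H,C,O,N
RsN_rad = 16

[H]
Nlayers = 3
sizes = 128,32,1
file = weights_H.dat
activations = 1,1,0
"""

cfg = read_lammps_panna_config(sample)
print(cfg['GVECT_PARAMETERS']['species'].split(','))  # ['H', 'C', 'O', 'N']
print(layer_sizes(cfg, 'H'))                          # [128, 32, 1]
```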

### Part 3b - Use the PANNA potential in an MD calculation with LAMMPS

The mock network we have created so far is not the best force field out there, as we used little data and a very short training. To demonstrate this step, we will instead use a network that is fairly close to a published one (see https://www.nature.com/articles/s41524-021-00508-6).

This is a Carbon potential with a G-vector of size 144 and two hidden layers with 64 and 32 nodes, which results in a network with ~10k parameters. The network is provided in LAMMPS format in the following folder:

In [2]:
!ls {panna_cmdir+'/doc/tutorial/tutorial_data/C_net/'}

weights_C.dat  weights.out
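
The "~10k parameters" estimate can be checked with a few lines of arithmetic. This sketch assumes fully connected layers with one bias per node and a single output node (the energy contribution of the atom); the function name is made up for the example.

```python
def count_parameters(input_size, layer_sizes):
    """Count weights and biases of a fully connected network."""
    total = 0
    previous = input_size
    for size in layer_sizes:
        total += previous * size + size  # weight matrix plus bias vector
        previous = size
    return total

# G-vector size and hidden layers from the text; the single output node is an assumption
print(count_parameters(144, [64, 32, 1]))  # 11393
```

Most of the parameters sit in the first layer (144 x 64 weights), so the descriptor size dominates the cost of the network.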


To use this potential, LAMMPS and the PANNA plugin need to be installed.

The main LAMMPS code can be downloaded here: https://www.lammps.org/download.html

For the PANNA plugin, the files `pair_panna.cpp` and `pair_panna.h` found in `src/panna/interfaces/` should be copied to the LAMMPS `src` directory before compilation.

While you can find better compilation instructions for your system on the LAMMPS website, a sample installation procedure with `cmake` would look like this:
 
```
cd lammps/
mkdir build; cd build
cmake ../cmake
cmake --build .
```
which should create the `lmp` executable under `build`.

If you have not done so already, please specify your LAMMPS command in the setup cell before proceeding.
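
Before launching a run, it can help to confirm that the executable is actually found. The following is a minimal sketch using the standard `shutil.which`; the helper name is hypothetical and the path is the placeholder from the setup cell.

```python
import shutil

def find_executable(command):
    """Return the full path of `command` if it is an executable file, else None."""
    return shutil.which(command)

# Placeholder from the setup cell; use the plain path here, without the
# backslash escapes that were added for the shell.
candidate = 'path_to_lammps/bin/lmp'
resolved = find_executable(candidate)
if resolved is None:
    print(f"'{candidate}' was not found: check the path to your LAMMPS build")
else:
    print(f"Using LAMMPS executable: {resolved}")
```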

Now we can combine the executable and the Carbon potential to run a basic LAMMPS MD simulation. As an example, the following input file sets up an NVE run with initial velocities drawn at 300 K:

In [3]:
!cat {panna_cmdir+'/doc/tutorial/input_files/lammps_nve.in'}

##########################################################
# Simple cell relaxation CUBIC CELL
##########################################################
dimension       3
boundary        p p p   # periodic boundary conditions 
atom_style      atomic
atom_modify     map array

units           metal
variable a equal 3.58
lattice         diamond $a
region          box prism 0 1.0 0 1.0 0 1.0 0.0 0.0 0.0
create_box      1 box
create_atoms    1 box

pair_style      panna
pair_coeff * * tutorial_data/C_net/ weights.out

mass 1 12.0107

velocity all create 300.0 4928459

timestep 0.001
thermo           5
thermo_style custom step temp pe etotal press vol cella cellb cellc
fix 1 all nve

dump            23 all custom 1 lammps_nve.dat id type x y z fx fy fz 

dump_modify     23 element C

run 100


We can now briefly run LAMMPS (it should take a few seconds) and inspect the output log file.

In [4]:
!cd {panna_cmdir+'/doc/tutorial/'}; {lammps_cmd} -i input_files/lammps_nve.in

LAMMPS (24 Dec 2020)
OMP_NUM_THREADS environment is not set. Defaulting to 1 thread. (src/comm.cpp:94)
  using 1 OpenMP thread(s) per MPI task
Lattice spacing in x,y,z = 3.5800000 3.5800000 3.5800000
Created triclinic box = (0.0000000 0.0000000 0.0000000) to (3.5800000 3.5800000 3.5800000) with tilt (0.0000000 0.0000000 0.0000000)
  1 by 1 by 1 MPI processor grid
Created 8 atoms
  create_atoms CPU = 0.000 seconds
Loading PANNA pair parameters from tutorial_data/C_net//weights.out
G Version is 0
G Version is 0
G Version is 0
G Version is 0
G Version is 0
G Version is 0
G Version is 0
G Version is 0
G Version is 0
G Version is 0
G Version is 0
G Version is 0
G Version is 0
G Version is 0
Network loaded!
Neighbor list info ...
  update every 10 steps, delay 0 steps, check yes
  max neighbors/atom: 2000, page size: 100000
  master list distance cutoff = 5.6
  ghost atom cutoff = 5.6
  binsize = 2.8, bins = 2 2 2
  1 neighbor lists, perpetual/occasional/extra = 1 0 0
  (1) pair panna, perpe

In [5]:
!head -20 {panna_cmdir+'/doc/tutorial/log.lammps'}

LAMMPS (24 Dec 2020)
OMP_NUM_THREADS environment is not set. Defaulting to 1 thread. (src/comm.cpp:94)
  using 1 OpenMP thread(s) per MPI task
##########################################################
# Simple cell relaxation CUBIC CELL
##########################################################
dimension       3
boundary        p p p   # periodic boundary conditions
atom_style      atomic
atom_modify     map array

units           metal
variable a equal 3.58
lattice         diamond $a
lattice         diamond 3.58
Lattice spacing in x,y,z = 3.5800000 3.5800000 3.5800000
region          box prism 0 1.0 0 1.0 0 1.0 0.0 0.0 0.0
create_box      1 box
Created triclinic box = (0.0000000 0.0000000 0.0000000) to (3.5800000 3.5800000 3.5800000) with tilt (0.0000000 0.0000000 0.0000000)
  1 by 1 by 1 MPI processor grid
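
Besides the log, the run writes positions and forces to `lammps_nve.dat` through the `dump` command in the input file. That file is not shown in this tutorial, so the sketch below assumes the usual LAMMPS text dump layout (`ITEM:` headers followed by data lines); the parser and the sample with made-up numbers are for illustration only.

```python
def read_lammps_dump(text):
    """Parse LAMMPS text dump content into a list of frames.

    Each frame is a dict with the timestep, the column names and one row
    of floats per atom. Box information is skipped in this sketch.
    """
    lines = text.splitlines()
    frames = []
    natoms = 0
    i = 0
    while i < len(lines):
        header = lines[i].strip()
        if header == 'ITEM: TIMESTEP':
            frames.append({'timestep': int(lines[i + 1]), 'columns': [], 'atoms': []})
            i += 2
        elif header == 'ITEM: NUMBER OF ATOMS':
            natoms = int(lines[i + 1])
            i += 2
        elif header.startswith('ITEM: ATOMS'):
            frames[-1]['columns'] = header.split()[2:]
            rows = lines[i + 1:i + 1 + natoms]
            frames[-1]['atoms'] = [[float(v) for v in row.split()] for row in rows]
            i += 1 + natoms
        else:
            i += 1  # box bounds and anything else are ignored
    return frames

# Made-up two-atom frame in the assumed layout
sample = """ITEM: TIMESTEP
0
ITEM: NUMBER OF ATOMS
2
ITEM: BOX BOUNDS xy xz yz pp pp pp
0.0 3.58 0.0
0.0 3.58 0.0
0.0 3.58 0.0
ITEM: ATOMS id type x y z fx fy fz
1 1 0.0 0.0 0.0 0.1 -0.2 0.3
2 1 0.895 0.895 0.895 -0.1 0.2 -0.3
"""

frames = read_lammps_dump(sample)
print(frames[0]['columns'])
```

In the notebook, the same function could be applied to the content of `panna_dir + '/doc/tutorial/lammps_nve.dat'` after the run.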


### Part 3c - Use the PANNA potential in an MD calculation with ASE

In the following, we will show how to use a potential with the ASE code.
Similar to what was shown previously, a potential can be extracted from the training by calling ``extract_weights.py`` and setting ``output_type`` to ``ASE``. This will create a text configuration file (see below for a look at this file) and copy the desired model to a folder.
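
As a sketch of what such a configuration could look like, the snippet below starts from the `myextract.ini` content shown in Part 3a and only changes `output_type`. The output folder name is hypothetical, and whether the ASE output needs additional keys is not covered here.

```python
import configparser
import io

# Contents of myextract.ini as printed earlier in this tutorial
base_ini = """[IO_INFORMATION]
network_dir = ./mytrain/_models
step_number = -1
output_dir = ./panna_weights
output_type = PANNA
train_input = ./input_files/mytrain.ini
gvector_input = ./input_files/mygvect_sample.ini
"""

config = configparser.ConfigParser()
config.optionxform = str  # keep key names exactly as written
config.read_string(base_ini)

config['IO_INFORMATION']['output_type'] = 'ASE'
# Hypothetical output folder name, chosen for this example only
config['IO_INFORMATION']['output_dir'] = './ase_weights'

buffer = io.StringIO()
config.write(buffer)
ase_ini = buffer.getvalue()
print(ase_ini)
```

The resulting text can be saved to a file of your choice and passed to ``extract_weights.py`` with `--config`, as done for the PANNA and LAMMPS formats.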

As done with LAMMPS, we will use a premade potential for Carbon in this example: a modest model with a descriptor of size 152, and 2 hidden layers of size 64 and 32.

If you have not installed ASE, the next cell gives you an opportunity to do so.

In [4]:
# Uncomment this cell if you want to install ASE
#!pip install ase

# Importing ASE modules
import ase
import ase.io
import ase.md.langevin
import ase.units

# Importing our calculator
from panna.interfaces.ASE import PANNACalculator

The configuration file generated for ASE simply contains some information taken from your training and gvector input. In addition, it specifies the location of the parameter file.

All of this is plain text and can be edited, but make sure that any change stays compatible with the parameter file.

In [5]:
!cat {panna_cmdir+'/doc/tutorial/tutorial_data/C_ASE/C_model.in'}

[IO_INFORMATION]
input_format = example
network_file = ./model/epoch_140_step_500000

[DATA_INFORMATION]
atomic_sequence = C
output_offset = 0.0

[DEFAULT_NETWORK]
g_size = 152
architecture = 64:32:1
trainable = 1:1:1

[SYMMETRY_FUNCTION]
type = mBP
species = C

[GVECT_PARAMETERS]
gvect_parameters_unit = angstrom
eta_rad = 32
Rc_rad = 5.0
Rs0_rad = 0.5
RsN_rad = 24
eta_ang = 16.0
Rc_ang = 5.0
Rs0_ang = 0.5
RsN_ang = 8
zeta = 128.0
ThetasN = 16
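
The `g_size` entry has to agree with the descriptor parameters. A minimal consistency check is sketched below, under the assumption that the descriptor has one radial block per species and one radial-angular block per unordered pair of species; this counting rule is an assumption that reproduces the value in the file above, and the function name is made up.

```python
def mbp_descriptor_size(n_species, rsn_rad, rsn_ang, thetas_n):
    """Descriptor length, assuming one radial block per species and one
    angular block per unordered pair of species."""
    n_pairs = n_species * (n_species + 1) // 2
    return n_species * rsn_rad + n_pairs * rsn_ang * thetas_n

# Values from C_model.in: one species, RsN_rad = 24, RsN_ang = 8, ThetasN = 16
print(mbp_descriptor_size(1, 24, 8, 16))  # 152
```

If you edit `RsN_rad`, `RsN_ang` or `ThetasN` by hand, the stored network will no longer match the descriptor, which is the compatibility issue mentioned above.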


In the next cell we show a small example of running a few steps of Langevin dynamics.
We load a small configuration of 16 Carbon atoms, create our calculator, attach it to the configuration, and then set up and run the dynamics.

Here, we simply print the initial and final positions to show that the dynamics has run. The plugin will use your GPU if available and should be quite fast, apart from the first step, in which the model needs to be set up. To show this, we run the first step separately (the calculator might also output several TensorFlow warnings).

In [7]:
myconf = ase.io.read(panna_dir+'/doc/tutorial/tutorial_data/C_ASE/myconf.xyz', ':')[0]
pcalc = PANNACalculator(config=panna_dir+'/doc/tutorial/tutorial_data/C_ASE/C_model.in')
myconf.calc = pcalc
print(myconf.get_positions())
print("Setting up the calculator")
dyn = ase.md.langevin.Langevin(myconf, ase.units.fs, temperature_K=300, friction=2e-3)
dyn.run(1)
print("Setup finished, starting 100 steps of dynamics")
dyn.run(100)
print("Dynamics finished")
print(myconf.get_positions())

INFO - reading /home/pellegrini/panna_prerelease/doc/tutorial/tutorial_data/C_ASE/C_model.in
INFO - Found a default network!
INFO - This network size will be used as default for all species unless specified otherwise
INFO - Radial Gaussian centers are set by Rs0_rad, Rc_rad, RsN_rad
INFO - Angular descriptor centers are set by ThetasN
INFO - Radial-angular Gaussian centers are set by Rs0_ang, Rc_ang, RsN_ang
[[ 1.3319575   1.56879852  1.18932058]
 [ 4.12971908  4.01165022  1.18729786]
 [ 0.49888834  1.61777113  2.34113123]
 [ 0.98289671  2.32240847  0.03216651]
 [ 0.84791568  0.86458353  3.49852078]
 [ 4.81177067  2.48599037  3.49436865]
 [-0.18364178  3.14267971  0.03403708]
 [ 5.2960202   3.19093074  1.18550115]
 [ 3.29723893  4.06126866  2.33947783]
 [ 3.78063809  4.76468126  0.02990831]
 [ 3.6455188   3.30681149  3.49613758]
 [-0.66784648  2.43782079  2.34283072]
 [ 3.32801088  0.69459256  0.03340893]
 [ 2.01206277  0.04104273  3.49528784]
 [ 2.49613995  0.74529307  1.18610153]
 [ 
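
Rather than comparing the two printed arrays by eye, the motion can be summarized with a single number. The sketch below computes the largest displacement of any atom between two position arrays; the helper name and the sample positions are made up for illustration.

```python
import numpy as np

def max_displacement(before, after):
    """Largest distance moved by any atom between two (N, 3) position arrays."""
    delta = np.asarray(after) - np.asarray(before)
    return float(np.linalg.norm(delta, axis=1).max())

# Made-up positions for two atoms, in angstrom
start = np.array([[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]])
end = np.array([[0.0, 0.0, 0.1], [1.0, 1.3, 1.4]])
print(max_displacement(start, end))
```

In the notebook, you would store `myconf.get_positions()` in a variable before calling `dyn.run` and pass it together with the positions obtained after the run.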

In [8]:
# Run this to cleanup the tutorial directory
# (will also cleanup the outputs of tutorial 2)
!cd {panna_cmdir+'/doc/tutorial'}; rm -rf mytrain mytrain_logs myvalidation tf.log lammps_weights panna_weights lammps_nve.dat log.lammps

##### Now you know how to generate NN interatomic potentials -- go ahead and use them for good!
##### The world needs your solutions for the climate crisis!