<a href="https://colab.research.google.com/github/knc6/jarvis-tools-notebooks/blob/master/jarvis-tools-notebooks/Tantalum_MLFF_FitSnap.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Step 1: Install LAMMPS and FitSNAP

In [1]:
!python --version

Python 3.10.12


In [42]:
pip install jarvis-tools

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting jarvis-tools
  Downloading jarvis_tools-2023.5.26-py2.py3-none-any.whl (974 kB)
Collecting spglib>=1.14.1 (from jarvis-tools)
  Downloading spglib-2.0.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (515 kB)
Collecting xmltodict>=0.11.0 (from jarvis-tools)
  Downloading xmltodict-0.13.0-py2.py3-none-any.whl (10.0 kB)
Installing collected packages: xmltodict, spglib, jarvis-tools
Successfully installed jarvis-tools-2023.5.26 spglib-2.0.2 xmltodict-0.13.0


If you are running locally and have already installed LAMMPS and FitSNAP, skip this step.

In [2]:
# Install LAMMPS with Python interface.

!apt-get update
!apt install -y cmake build-essential git ccache openmpi-bin libopenmpi-dev python3.10-venv
!pip install --upgrade pip
!pip install numpy torch scipy virtualenv psutil pandas tabulate mpi4py Cython scikit-learn
!pip install ase
!pip install fitsnap3
%cd /content
!rm -rf lammps
!git clone https://github.com/lammps/lammps.git lammps
%cd /content/lammps
!rm -rf build
!mkdir build
%cd build
!cmake ../cmake -DLAMMPS_EXCEPTIONS=yes \
               -DBUILD_SHARED_LIBS=yes \
               -DMLIAP_ENABLE_PYTHON=yes \
               -DPKG_PYTHON=yes \
               -DPKG_ML-SNAP=yes \
               -DPKG_ML-IAP=yes \
               -DPKG_ML-PACE=yes \
               -DPKG_SPIN=yes \
               -DPYTHON_EXECUTABLE:FILEPATH=`which python`
!make -j 2
!make install-python

# Install FitSNAP.

%cd /content
!rm -rf FitSNAP
!git clone https://github.com/FitSNAP/FitSNAP
#!git clone -b collected-changes https://github.com/rohskopf/FitSNAP

# Set environment variables.

!$PYTHONPATH
%env PYTHONPATH=/env/python:/bin/bash:
%env LD_LIBRARY_PATH=/usr/local/nvidia/lib:/usr/local/nvidia/lib64:/content/lammps/build

# Move into FitSNAP directory
%cd FitSNAP

Get:1 http://security.ubuntu.com/ubuntu focal-security InRelease [114 kB]
Get:2 https://cloud.r-project.org/bin/linux/ubuntu focal-cran40/ InRelease [3,622 B]
Hit:3 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64  InRelease
Hit:4 http://archive.ubuntu.com/ubuntu focal InRelease
Hit:5 http://ppa.launchpad.net/c2d4u.team/c2d4u4.0+/ubuntu focal InRelease
Get:6 http://archive.ubuntu.com/ubuntu focal-updates InRelease [114 kB]
Hit:7 http://ppa.launchpad.net/cran/libgit2/ubuntu focal InRelease
Get:8 http://security.ubuntu.com/ubuntu focal-security/restricted amd64 Packages [2,400 kB]
Hit:9 http://ppa.launchpad.net/deadsnakes/ppa/ubuntu focal InRelease
Get:10 http://archive.ubuntu.com/ubuntu focal-backports InRelease [108 kB]
Get:11 http://security.ubuntu.com/ubuntu focal-security/main amd64 Packages [2,803 kB]
Get:12 http://security.ubuntu.com/ubuntu focal-security/universe amd64 Packages [1,063 kB]
Get:13 http://ppa.launchpad.net/graphics-drivers/ppa/ubuntu focal 

# Check if Python LAMMPS is working

In [3]:
import lammps
lmp = lammps.lammps()
print(lmp)

<lammps.core.lammps object at 0x7fd941263c70>


# Basic use

Necessary imports:

In [36]:
from mpi4py import MPI
import numpy as np
from fitsnap3lib.fitsnap import FitSnap

Then set up a communicator. This simple example uses the world communicator, which is also the default if no communicator is specified. Important points on parallelism:

- To take advantage of MPI processes, put these lines in a script and run it with `mpirun -np P python script.py`, where `P` is the number of processes.
- Examples of this are shown in the `examples/library` directory.

In [37]:
# Set up your communicator.
comm = MPI.COMM_WORLD

The first mandatory step is to create input containing desired settings. These settings include everything from which descriptors to calculate, descriptor settings, which solver to use in performing a fit, how to separate the data into groups, and other options.

Important points on input settings:
- `settings` can be a dictionary defined like below, or a path to a traditional FitSNAP input script like `/path/to/Ta-example.in`.
- Some sections are not required. E.g.
    - `SCRAPER` is only required if you want to use native file scrapers.
    - `SOLVER` is only required if you want to use native model solvers and error analysis.
    - `GROUPS` is only required if you want to define groups of configurations with names and loss function weights.
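For readers who prefer the file-based route, the sketch below shows how a nested `settings` dictionary maps onto INI-style text, assuming the traditional input script uses the section/key layout that Python's `configparser` reads. The helper name `settings_to_ini` and the two-section example are illustrative only and not part of FitSNAP.

```python
import configparser
import io

def settings_to_ini(settings):
    """Render a nested {section: {key: value}} dict as INI-style text."""
    parser = configparser.ConfigParser()
    # Keep keys case-sensitive (configparser lowercases them by default).
    parser.optionxform = str
    for section, options in settings.items():
        parser[section] = {key: str(value) for key, value in options.items()}
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()

# Trimmed, hypothetical settings used only to show the mapping.
example = {
    "BISPECTRUM": {"numTypes": 1, "twojmax": 6, "type": "Ta"},
    "SOLVER": {"solver": "SVD"},
}
ini_text = settings_to_ini(example)
print(ini_text)
```

Writing `ini_text` to a file would give something that could be passed as a path instead of the dictionary, provided the real input format matches this assumption.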

In [55]:
# Create an input dictionary containing settings.

settings = \
{
"BISPECTRUM":
    {
    "numTypes": 1,
    "twojmax": 6,
    "rcutfac": 4.67637,
    "rfac0": 0.99363,
    "rmin0": 0.0,
    "wj": 1.0,
    "radelem": 0.5,
    "type": "Ta",
    "wselfallflag": 0,
    "chemflag": 0,
    "bzeroflag": 0,
    "quadraticflag": 0,
    },
"CALCULATOR":
    {
    "calculator": "LAMMPSSNAP",
    "energy": 1,
    "force": 1,
    "stress": 1
    },
"ESHIFT":
    {
    "Ta": 0.0
    },
"SOLVER":
    {
    "solver": "SVD",
    "compute_testerrs": 1,
    "detailed_errors": 1
    },
"SCRAPER":
    {
    "scraper": "JSON"
    },
"PATH":
    {
    "dataPath": "examples/Ta_Linear_JCP2014/JSON"
    },
"OUTFILE":
    {
    "metrics": "Ta_metrics.md",
    "potential": "Ta_pot"
    },
"REFERENCE":
    {
    "units": "metal",
    "atom_style": "atomic",
    "pair_style": "hybrid/overlay zero 10.0 zbl 4.0 4.8",
    "pair_coeff1": "* * zero",
    "pair_coeff2": "* * zbl 73 73"
    },
"EXTRAS":
    {
    "dump_dataframe": 1
    },
"GROUPS":
    {
    "group_sections": "name training_size testing_size eweight fweight vweight",
    "group_types": "str float float float float float",
    "smartweights": 0,
    "random_sampling": 0,
    "Displaced_A15" :  "0.8    0.2       100             1               1.00E-08",
    "Displaced_BCC" :  "0.8    0.2       100             1               1.00E-08",
    "Displaced_FCC" :  "0.8    0.2       100             1               1.00E-08",
    "Elastic_BCC"   :  "0.8    0.2     1.00E-08        1.00E-08        0.0001",
    "Elastic_FCC"   :  "0.8    0.2     1.00E-09        1.00E-09        1.00E-09",
    "GSF_110"       :  "0.8    0.2      100             1               1.00E-08",
    "GSF_112"       :  "0.8   0.2      100             1               1.00E-08",
    "Liquid"        :  "0.8    0.2       4.67E+02        1               1.00E-08",
    "Surface"       :  "0.8    0.2       100             1               1.00E-08",
    "Volume_A15"    :  "0.8    0.2      1.00E+00        1.00E-09        1.00E-09",
    "Volume_BCC"    :  "0.8   0.2      1.00E+00        1.00E-09        1.00E-09",
    "Volume_FCC"    :  "0.8    0.2      1.00E+00        1.00E-09        1.00E-09"
    }
}
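To make the `GROUPS` layout concrete: each group row is a whitespace-separated string whose columns follow `group_sections` (after the leading `name` column) and whose types follow `group_types`. Below is a minimal sketch of that reading, where `parse_group_row` is a hypothetical helper and not FitSNAP's own parser.

```python
def parse_group_row(name, row, group_sections, group_types):
    """Map one GROUPS row onto named, typed fields."""
    casters = {"str": str, "float": float, "int": int}
    fields = group_sections.split()
    types = group_types.split()
    # The first column ("name") is the dictionary key itself, so prepend it.
    values = [name] + row.split()
    if not (len(fields) == len(types) == len(values)):
        raise ValueError(f"{name}: expected {len(fields)} columns, got {len(values)}")
    return {f: casters[t](v) for f, t, v in zip(fields, types, values)}

sections = "name training_size testing_size eweight fweight vweight"
types = "str float float float float float"
liquid = parse_group_row(
    "Liquid", "0.8    0.2       4.67E+02        1               1.00E-08", sections, types
)
print(liquid)
```

Read this way, the `Liquid` group uses 80% of its configurations for training, 20% for testing, and weights energies far more heavily than stresses.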

Create an instance of `fitsnap` by feeding this input dictionary, along with the optional communicator, into the `FitSnap` class.

In [56]:
fs = FitSnap(settings, comm=comm, arglist=["--overwrite"])

In [57]:
fs.config

<fitsnap3lib.io.input.Config at 0x7fd80a889e70>

This creates a `fitsnap` instance which contains its own data, such as shared and distributed memory arrays in `fitsnap.pt`, and input settings in `fitsnap.config`. The shared and distributed memory arrays are associated with all processes in the supplied communicator `comm`.

Now we can use high-level library functions to perform a fit, with the following steps:
1. Scrape data to fit to. This is parallelized over all processes in `comm` with data stored in the `fitsnap.data` dictionary. Each MPI process has a different list of `data` dictionaries.
2. Calculate descriptors. This is parallelized over all processes in `comm` by operating on each configuration in `fitsnap.data`.
3. Fit potential. This is not parallelized over processes in most basic examples, e.g. the SVD solver will use all the data in a shared array on rank 0, and perform a simple least squares fit.
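As a rough picture of step 1, the sketch below splits a list of configuration indices into contiguous, near-equal chunks, one per MPI rank. The function `chunk_for_rank` and the contiguous-block scheme are assumptions for illustration; FitSNAP's actual distribution of configurations may differ.

```python
def chunk_for_rank(n_items, rank, size):
    """Return the half-open index range [start, stop) owned by `rank`."""
    base, extra = divmod(n_items, size)
    # The first `extra` ranks take one additional item each.
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop

n_configs = 363  # number of configurations in the tantalum data set
size = 4         # hypothetical number of MPI processes
ranges = [chunk_for_rank(n_configs, r, size) for r in range(size)]
print(ranges)
```

Each rank would then scrape and process only its own slice, which is why every process ends up with a different list of `data` dictionaries.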

### 1. Scrape data

This step collects data (configurations of atoms) and injects it into a list of `FitSnap` data dictionaries. Users are
free to do this manually using their own formats. We will explore this option later in the tutorial. Here we provide a native
high-level function `scrape_configs()` for this purpose. As an instance-owned function, `scrape_configs()` will scrape
according to the previously input `settings`. Most high-level functions in `FitSnap` act the same way; the `settings`
determine the state of a `FitSnap` instance which then determines the behavior of the high-level functions.

In [58]:
fs.scrape_configs()

'scrape_configs' took 452.83 ms on rank 0


This generates a list of `FitSnap` data dictionaries:

In [59]:
print(len(fs.data))

363


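Each dictionary is formatted as shown by `print(fs.data[0])` further below. First, define a helper that converts an entry into a JARVIS `Atoms` object: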

In [60]:
from jarvis.core.atoms import Atoms as JAtoms
def fs_to_jatoms(fs_entry):
    """Convert one FitSnap data dictionary into a JARVIS Atoms object."""
    # Positions are Cartesian (angstrom), so pass cartesian=True.
    return JAtoms(
        elements=fs_entry['AtomTypes'],
        coords=fs_entry['Positions'],
        lattice_mat=fs_entry['Lattice'],
        cartesian=True,
    )


In [61]:
atms = fs_to_jatoms(fs.data[0])

In [62]:
atms


Ta64
1.0
10.6000003815 0.0 0.0
0.0 10.6000003815 0.0
0.0 0.0 10.6000003815
Ta
64
Cartesian
0.00046 10.54735 10.589320000000003
6.564150000000001 8.01596 10.54595
9.3142 8.00708 0.07335000000000001
5.28371 6.62455 2.60926
5.365 9.35513 2.65232
7.994210000000001 5.272850000000001 1.26512
7.936770000000001 5.32858 3.9651400000000003
5.298310000000001 0.06384 5.34991
7.981010000000001 2.63404 7.9256
6.686870000000001 2.58375 5.27465
9.31724 2.5934500000000003 5.36746
5.30469 1.39236 7.900550000000001
5.36637 3.9647400000000004 7.94588
7.968520000000001 10.544160000000002 6.595420000000001
8.0183 7.9703100000000004 2.5795900000000005
8.015530000000002 0.03504000000000001 9.33832
2.6718500000000005 7.9987900000000005 7.94992
1.3242500000000001 7.976250000000001 5.27971
3.9968500000000002 7.991850000000001 5.33585
0.00789 6.579790000000001 7.982410000000001
10.562580000000002 9.24146 7.928210000000001
2.62183 5.25654 6.57113
2.62673 5.246950000000001 9.20881
5.2818000000000005 5.2846200000000

Convert the dataset to JARVIS `Atoms` format:

In [63]:
from jarvis.db.jsonutils import dumpjson
mem=[]
for ii,i in enumerate(fs.data):
  ta_id='Ta_fit_'+str(ii)
  atms = fs_to_jatoms(i)
  info={}
  info['atoms']=atms.to_dict()
  info['id']=ta_id
  info['energy']=i['Energy']
  info['energy_per_atom']=i['Energy']/atms.num_atoms
  info['forces']=i['Forces'].tolist()
  info['stress']=i['Stress'].tolist()
  mem.append(info)


In [64]:
dumpjson(data=mem,filename='ta_fitsnap.json')

In [65]:
print(fs.data[0])

{'PositionsStyle': 'angstrom', 'AtomTypeStyle': 'chemicalsymbol', 'StressStyle': 'bar', 'LatticeStyle': 'angstrom', 'EnergyStyle': 'electronvolt', 'ForcesStyle': 'electronvoltperangstrom', 'File': 'A15_3.json', 'Group': 'Displaced_A15', 'Stress': array([[23907.05,   444.79,  -765.15],
       [  444.79, 24515.94,   375.59],
       [ -765.15,   375.59, 27487.83]]), 'Positions': array([[4.600000e-04, 1.054735e+01, 1.058932e+01],
       [2.731290e+00, 2.710270e+00, 2.648050e+00],
       [1.304160e+00, 2.706760e+00, 5.081000e-02],
       [3.922400e+00, 2.568020e+00, 5.914000e-02],
       [1.052754e+01, 1.327140e+00, 2.649460e+00],
       [1.058917e+01, 3.982070e+00, 2.625480e+00],
       [2.647920e+00, 2.452000e-02, 1.380810e+00],
       [2.594880e+00, 5.512000e-02, 3.951990e+00],
       [5.221380e+00, 1.052799e+01, 3.746000e-02],
       [7.886340e+00, 2.631380e+00, 2.619960e+00],
       [6.694520e+00, 2.721900e+00, 6.773000e-02],
       [9.213730e+00, 2.710140e+00, 5.862000e-02],
       [5

This is the format used by `FitSnap` to feed data into LAMMPS for descriptor calculations in the next step.
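The `*Style` keys record the units of each field; here stresses are in bar. If a downstream tool expects GPa or eV/Å³, a conversion along these lines may help. The helper name `convert_stress` is hypothetical, and the stress values are the ones printed above.

```python
BAR_TO_GPA = 1.0e-4              # 1 bar = 1e5 Pa = 1e-4 GPa
EV_PER_A3_TO_GPA = 160.2176634   # 1 eV/Å^3 expressed in GPa

def convert_stress(stress, factor):
    """Scale every component of a 3x3 stress tensor given as nested lists."""
    return [[component * factor for component in row] for row in stress]

# Stress tensor of the first configuration, in bar.
stress_bar = [
    [23907.05, 444.79, -765.15],
    [444.79, 24515.94, 375.59],
    [-765.15, 375.59, 27487.83],
]
stress_gpa = convert_stress(stress_bar, BAR_TO_GPA)
stress_ev_a3 = convert_stress(stress_gpa, 1.0 / EV_PER_A3_TO_GPA)
print(stress_gpa[0][0], stress_ev_a3[0][0])
```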

### 2. Calculate descriptors

Here we use the native high-level `process_configs()` function, which does the following:
- Allocates shared memory arrays (if using MPI) to store descriptor and fitting information.
- Loops through all the configurations in the `fitsnap.data` list of dictionaries containing configuration info.
- Calculates descriptors for these configurations and stores the information in the shared arrays `fitsnap.pt.shared_arrays`.

In [66]:
fs.process_configs()

'process_configs' took 3106.77 ms on rank 0
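The size of the fitting matrix can be estimated from the data set. Assuming one row per configuration energy, three rows per atom for forces, and six rows per configuration for the independent stress components (an assumption that matches the counts in the error table and the 15213-row dataframe printed later in this notebook), a quick tally looks like this. `fitting_rows` is an illustrative helper, not a FitSNAP function.

```python
def fitting_rows(n_configs, n_atoms_total, energy=True, force=True, stress=True):
    """Count rows of the least-squares matrix for the enabled targets."""
    rows = 0
    if energy:
        rows += n_configs          # one energy per configuration
    if force:
        rows += 3 * n_atoms_total  # x, y, z force component per atom
    if stress:
        rows += 6 * n_configs      # six independent stress components
    return rows

# 363 configurations; 12672 force rows imply 12672 / 3 = 4224 atoms in total.
total = fitting_rows(n_configs=363, n_atoms_total=4224)
print(total)  # 363 + 12672 + 2178 = 15213
```

Switching off forces and stresses in the `CALCULATOR` section would, under the same assumption, shrink the matrix to 363 rows.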


### 3. Perform fit

Fit a model with the native high-level `perform_fit()` function, which does the following:

- Solves the ML problem to get model coefficients, such as with linear regression or NNs, depending on the choice of
  solver in the `settings` dictionary.
- Analyzes errors associated with the fits, which are stored in the `fitsnap.solver.errors` dataframe.
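For the linear case, the fit amounts to weighted least squares. The sketch below solves such a problem with NumPy's SVD on synthetic data; it illustrates the technique only and is not FitSNAP's solver code. The names `A`, `b`, and `w` stand for a descriptor matrix, targets, and row weights, and the random data are made up.

```python
import numpy as np

def svd_least_squares(A, b, w):
    """Minimize ||w * (A @ x - b)||_2 using the SVD pseudo-inverse."""
    Aw = A * w[:, None]   # scale each row by its weight
    bw = b * w
    U, s, Vt = np.linalg.svd(Aw, full_matrices=False)
    # Discard singular values that are numerically zero.
    keep = s > s.max() * max(Aw.shape) * np.finfo(float).eps
    return Vt[keep].T @ ((U[:, keep].T @ bw) / s[keep])

rng = np.random.default_rng(0)
A = rng.normal(size=(50, 4))             # 50 rows, 4 descriptors (synthetic)
x_true = np.array([1.0, -2.0, 0.5, 3.0])
b = A @ x_true                           # noise-free targets
w = rng.uniform(0.5, 2.0, size=50)       # arbitrary positive row weights
x_fit = svd_least_squares(A, b, w)
print(np.allclose(x_fit, x_true))
```

With noise-free targets the weights do not change the solution; with real data they decide how strongly energies, forces, and stresses of each group pull on the coefficients.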

In [67]:
fs.perform_fit()

'fit' took 30.04 ms on rank 0
'error_analysis' took 451.81 ms on rank 0


  rsq = 1 - ssr / np.sum(np.square(g['truths'] - (g['truths'] / nconfig).sum()))
  w_rsq = 1 - w_ssr / np.sum(np.square((g['weights'] * g['truths']) - (g['weights'] * g['truths'] / w_nconfig).sum()))
  rsq = 1 - ssr / np.sum(np.square(g['truths'] - (g['truths'] / nconfig).sum()))
  w_rsq = 1 - w_ssr / np.sum(np.square((g['weights'] * g['truths']) - (g['weights'] * g['truths'] / w_nconfig).sum()))


Useful objects generated by this fit:

In [54]:
# Dataframe of detailed errors per group.
print(fs.solver.errors)

                                          ncount           mae          rmse  \
Group      Weighting  Testing  Subsystem                                       
*ALL       Unweighted Training Energy        363  1.127867e-01  3.797693e-01   
                               Force       12672  7.575758e-02  1.609730e-01   
                               Stress       2178  6.833857e+04  3.817442e+05   
           weighted   Training Energy        363  2.608423e-01  6.132321e-01   
                               Force       12672  7.574500e-02  1.609730e-01   
...                                          ...           ...           ...   
Volume_FCC Unweighted Training Force         372  3.256274e-15  7.553926e-15   
                               Stress        186  3.042005e+05  1.079178e+06   
           weighted   Training Energy         31  8.120769e-01  1.181203e+00   
                               Force         372  3.256274e-24  7.553926e-24   
                               Stress   

In [14]:
# List of fitting coefficients (for linear models).
print(fs.solver.fit)

[-2.97994849e+00 -1.14374540e-02 -7.65461855e-03 -5.02616837e-02
 -1.49917503e-01  9.46827936e-02  5.82627755e-02  6.06076097e-02
 -1.15443486e-01 -1.70155723e-01 -1.05692177e-01  3.97826631e-02
 -1.13740488e-01  4.04876497e-02 -7.26629413e-02 -6.48706053e-02
 -9.53306396e-02 -1.02394326e-01 -1.57112283e-01  4.85467075e-02
  2.49466074e-03  1.21982221e-03 -4.97372495e-02 -5.14062785e-02
 -3.41562112e-02 -1.59489125e-02 -1.50097346e-02 -6.22553797e-03
 -6.50157917e-02  3.96654127e-02  1.07549953e-02]


In [51]:
print(fs.solver.df)

None


In [15]:
# Dataframe containing all fitting info and metrics.
print(fs.solver.df)

         0             1             2             3             4  \
0      1.0  1.009033e+02      2.780691  6.357142e-01  8.179305e+00   
1      0.0  4.627919e+00      0.813000 -2.328668e-01  1.287718e+00   
2      0.0 -7.121813e-01      0.619934 -8.374431e-01  2.850890e+00   
3      0.0 -1.093067e+00     -0.748435  1.442453e-01  1.758252e+00   
4      0.0 -2.880540e+00     -0.476735  8.152660e-01 -5.128569e+00   
...    ...           ...           ...           ...           ...   
15208  0.0  1.724161e+06 -59282.146145 -6.858124e+04  1.261169e+06   
15209  0.0  1.724161e+06 -59282.146145 -6.858124e+04  1.261169e+06   
15210  0.0  3.823340e-11      0.000000  0.000000e+00 -2.867505e-11   
15211  0.0  3.823340e-11      0.000000  0.000000e+00 -1.911670e-11   
15212  0.0  0.000000e+00      0.000000  5.973969e-13 -1.433753e-11   

                  5             6             7             8             9  \
0     -2.940313e+00  1.045951e+00  1.264225e+00  6.488682e+01 -2.653318e+00   
1

### 4. Writing output files

In [16]:
# Write LAMMPS potential files.
fs.output.write_lammps(fs.solver.fit)
# Write error analysis.
fs.output.write_errors(fs.solver.errors)
# Look at files:
!ls

docs	     FitSNAP.df      README.md	    Ta_pot.snapcoeff  tutorial.ipynb
examples     LICENSE	     setup.cfg	    Ta_pot.snapparam
fitsnap3     log.lammps      Ta_metrics.md  tests
fitsnap3lib  pyproject.toml  Ta_pot.mod     tools


# Perform fits on multiple instances with different settings

Let's say we want to perform multiple fits with different settings, like different `twojmax` values.

In [17]:
# Make list of twojmax values to scan:
twojmax_list = [2,4,6,8,10]
# Make list of settings for each twojmax:
from copy import deepcopy
settings_list = [deepcopy(settings) for i in twojmax_list]
for i, twojmax in enumerate(twojmax_list):
    settings_list[i]["BISPECTRUM"]["twojmax"] = twojmax

print(len(settings_list))

5


Make a list of `FitSnap` instances, each with different settings:

In [18]:
instances = [FitSnap(setting, comm=comm, arglist=["--overwrite"]) for setting in settings_list]
print(instances)

[<fitsnap3lib.fitsnap.FitSnap object at 0x7fd80f78ee60>, <fitsnap3lib.fitsnap.FitSnap object at 0x7fd80f78e1d0>, <fitsnap3lib.fitsnap.FitSnap object at 0x7fd80f78f670>, <fitsnap3lib.fitsnap.FitSnap object at 0x7fd80f78db40>, <fitsnap3lib.fitsnap.FitSnap object at 0x7fd80f7bf6d0>]


Loop over all instances and fit:

In [19]:
for i, instance in enumerate(instances):
    print(f"--- Instance {i} with twojmax = {instance.config.sections['BISPECTRUM'].twojmax}")
    # No need to scrape configurations again, just use the previously scraped configs by injecting
    # the previous instance data into this instance data.
    instance.process_configs(data=fs.data)
    # Perform fit using the internal fitting data of this instance.
    instance.perform_fit()
    # Grab errors.
    f_mae = instance.solver.errors['mae'][('*ALL', 'Unweighted', 'Training', 'Force')]
    e_mae = instance.solver.errors['mae'][('*ALL', 'Unweighted', 'Training', 'Energy')]

--- Instance 0 with twojmax = ['2']
'process_configs' took 1789.51 ms on rank 0
'fit' took 10.09 ms on rank 0
'error_analysis' took 292.51 ms on rank 0
--- Instance 1 with twojmax = ['4']
'process_configs' took 2108.96 ms on rank 0
'fit' took 19.98 ms on rank 0
'error_analysis' took 340.94 ms on rank 0
--- Instance 2 with twojmax = ['6']
'process_configs' took 3493.98 ms on rank 0
'fit' took 39.57 ms on rank 0
'error_analysis' took 435.78 ms on rank 0
--- Instance 3 with twojmax = ['8']
'process_configs' took 7938.77 ms on rank 0
'fit' took 49.40 ms on rank 0
'error_analysis' took 347.72 ms on rank 0
--- Instance 4 with twojmax = ['10']
'process_configs' took 14606.70 ms on rank 0
'fit' took 90.91 ms on rank 0
'error_analysis' took 351.23 ms on rank 0


Look at the errors:

In [20]:
# Now each instance contains fitting information (configurations and their descriptors) and errors.
for instance in instances:
    # Extract specific errors from the errors dataframe.
    # NOTE: No `Testing` key will exist if no testing groups were defined in `settings`.
    ftest_mae = instance.solver.errors['mae'][('*ALL', 'Unweighted', 'Training', 'Force')]
    etest_mae = instance.solver.errors['mae'][('*ALL', 'Unweighted', 'Training', 'Energy')]
    print(f"{instance.config.sections['BISPECTRUM'].twojmax[0]} \
          {ftest_mae:0.5f}     {etest_mae:0.5f}")

2           0.39726     0.97260
4           0.15141     0.16422
6           0.07576     0.11279
8           0.06785     0.07044
10           0.05353     0.05356
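The table suggests diminishing returns as `twojmax` grows, while the descriptor count (and the `process_configs` time reported above) keeps rising. The sketch below tabulates both, using the closed form for the number of bispectrum components at even `twojmax`, K = m(m+1)(2m+1)/6 with m = twojmax/2 + 1, which reproduces the 30 components seen for `twojmax = 6`. The MAE values are copied from the output above; `n_bispectrum` is an illustrative helper.

```python
def n_bispectrum(twojmax):
    """Number of bispectrum components for an even twojmax."""
    if twojmax % 2:
        raise ValueError("this closed form is for even twojmax only")
    m = twojmax // 2 + 1
    return m * (m + 1) * (2 * m + 1) // 6

# (twojmax, force MAE, energy MAE) copied from the table above.
results = [
    (2, 0.39726, 0.97260),
    (4, 0.15141, 0.16422),
    (6, 0.07576, 0.11279),
    (8, 0.06785, 0.07044),
    (10, 0.05353, 0.05356),
]
previous = None
for twojmax, f_mae, e_mae in results:
    # Relative drop in force MAE compared with the previous twojmax.
    gain = "" if previous is None else f"{100 * (previous - f_mae) / previous:5.1f}%"
    print(f"{twojmax:>2}  K={n_bispectrum(twojmax):>3}  force MAE={f_mae:.5f}  drop={gain}")
    previous = f_mae
```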


#### Note on shared memory (if using MPI).
If using MPI, each instance allocates shared memory for storing the parallel arrays. Users must therefore take care not to allocate too many `FitSnap` instances, and to properly free the memory associated with unused instances. `FitSnap` frees its shared array memory in its destructor (`__del__`), which runs once the last reference to an instance is removed:

In [21]:
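# Free shared memory of all instances (only necessary if using MPI).
# `del instance` only unbinds the loop variable, so also empty the list that
# still references the instances.
for instance in instances:
    del instance
instances.clear()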
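One Python subtlety is worth keeping in mind here: `del name` removes a name, not an object, and a destructor only runs once nothing references the object. The toy class below (hypothetical, standing in for an object that owns shared memory) shows that a list holding the instances keeps them alive until the list itself is emptied. The behaviour shown relies on CPython's reference counting.

```python
freed = []

class Resource:
    """Toy stand-in for an object that releases memory in its destructor."""

    def __init__(self, label):
        self.label = label

    def __del__(self):
        freed.append(self.label)

items = [Resource(i) for i in range(3)]
for item in items:
    del item              # unbinds the loop variable only
freed_after_loop = list(freed)

items.clear()             # drop the last references held by the list
freed_after_clear = list(freed)
print(freed_after_loop, freed_after_clear)
```

Nothing is freed by the loop alone; the destructors run only after the list lets go of its references.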

This example looped over fits sequentially, where each fit shared the same communicator. One could however use split communicators to achieve fits in parallel.

# How to just get the descriptors for a data set?

Sometimes we want to simply extract descriptors for data analysis without going through the pain of
performing a fit.

TODO: Show example of extracting descriptors from configs then doing data analysis (t-SNE)

### Extracting SNAP descriptors.

If we're only calculating descriptors, we just need a simple `settings` dictionary.

In [22]:
settings = \
{
"BISPECTRUM":
    {
    "numTypes": 1,
    "twojmax": 6,
    "rcutfac": 4.67637,
    "rfac0": 0.99363,
    "rmin0": 0.0,
    "wj": 1.0,
    "radelem": 0.5,
    "type": "Ta",
    "wselfallflag": 0,
    "bzeroflag": 1,
    "bikflag": 1
    },
"CALCULATOR":
    {
    "calculator": "LAMMPSSNAP",
    "energy": 1,
    "force": 0,
    "stress": 0,
    "per_atom_energy": 1
    },
"REFERENCE":
    {
    "units": "metal",
    "atom_style": "atomic",
    "pair_style": "zero 6.0",
    "pair_coeff": "* *"
    }
}

Make an instance like usual:

In [23]:
fs = FitSnap(settings, arglist=["--overwrite"])

Get data from ASE `Atoms` objects:

In [24]:
!pip install ase
from ase.io import read
frames = read("examples/Ta_XYZ/XYZ/Displaced_FCC.xyz", ":")
print(type(frames))

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
<class 'list'>


Use our ASE scraper to convert the list of `Atoms` objects into `FitSnap` data dictionaries:

In [25]:
from fitsnap3lib.scrapers.ase_funcs import ase_scraper
data = ase_scraper(frames)

Loop over configurations and calculate descriptors for each separately.

In [26]:
for i, configuration in enumerate(data):
    print(i)
    a,b,w = fs.calculator.process_single(configuration)
    print(np.shape(a))

0
(48, 30)
1
(48, 30)
2
(48, 30)
3
(48, 30)
4
(48, 30)
5
(48, 30)
6
(48, 30)
7
(48, 30)
8
(48, 30)


### Extracting ACE descriptors.

### WARNING: ACE descriptors are not supported in colab yet.

Declare settings dictionary:

In [27]:
settings = \
{
"ACE":
    {
    "numTypes": 1,
    "ranks": "1 2 3",
    "lmax":  "1 2 2",
    "nmax": "22 2 2",
    "nmaxbase": 22,
    "rcutfac": 4.604694451,
    "lambda": 3.5,
    "type": "Ta",
    "lmin": 0,
    "bzeroflag": 1,
    "bikflag": 1,
    "RPI_heuristic": "root_SO3_span"
    },
"CALCULATOR":
    {
    "calculator": "LAMMPSPACE",
    "energy": 1,
    "force": 0,
    "stress": 0,
    "per_atom_energy": 1
    },
"REFERENCE":
    {
    "units": "metal",
    "atom_style": "atomic",
    "pair_style": "zero 6.0",
    "pair_coeff": "* *"
    }
}

Make `FitSnap` instance:

In [28]:
fs = FitSnap(settings, arglist=["--overwrite"])

Generating your first pickled library of Wigner 3j coefficients. This will take a few moments...


Get configurations from somewhere (e.g. ASE):

In [29]:
from ase.io import read
from fitsnap3lib.scrapers.ase_funcs import ase_scraper
frames = read("examples/Ta_XYZ/XYZ/Displaced_FCC.xyz", ":")
data = ase_scraper(frames)

Now `data` is a list of dictionaries containing structural info:

In [32]:
print(len(data))

9


Loop over these configurations and calculate ACE descriptors:

In [31]:
for i, configuration in enumerate(data):
    print(i)
    a,b,w = fs.calculator.process_single(configuration)
    print(np.shape(a))

0


Exception: ignored

# How to process configs once then do multiple fits?

This is useful if doing many fits with the same calculator (descriptor) settings but different solver settings. We can do this by:

1. Using one `FitSnap` instance to process configs and store data in its shared arrays.
2. Using this data as input to the solver functions of another instance.

First let's make an instance for calculating descriptors.


In [None]:
settings = \
{
"BISPECTRUM":
    {
    "numTypes": 1,
    "twojmax": 6,
    "rcutfac": 4.67637,
    "rfac0": 0.99363,
    "rmin0": 0.0,
    "wj": 1.0,
    "radelem": 0.5,
    "type": "Ta",
    "wselfallflag": 0,
    "chemflag": 0,
    "bzeroflag": 1,
    "bikflag": 1,
    "dgradflag": 1
    },
"CALCULATOR":
    {
    "calculator": "LAMMPSSNAP",
    "energy": 1,
    "force": 1,
    "per_atom_energy": 1,
    "nonlinear": 1
    },
"PYTORCH":
    {
    "layer_sizes": "num_desc 64 64 1",
    "learning_rate": 1e-4,
    "num_epochs": 10,
    "batch_size": 4, # 363 configs in entire set
    "save_state_output": "Ta_Pytorch.pt"
    },
"SOLVER":
    {
    "solver": "PYTORCH"
    },
"SCRAPER":
    {
    "scraper": "JSON"
    },
"PATH":
    {
    "dataPath": "examples/Ta_Linear_JCP2014/JSON"
    },
"REFERENCE":
    {
    "units": "metal",
    "atom_style": "atomic",
    "pair_style": "hybrid/overlay zero 6.0 zbl 4.0 4.8",
    "pair_coeff1": "* * zero",
    "pair_coeff2": "* * zbl 73 73"
    },
"GROUPS":
    {
    "group_sections": "name training_size testing_size eweight fweight",
    "group_types": "str float float float float",
    "smartweights": 0,
    "random_sampling": 0,
    "Displaced_A15" :  "0.7 0.3 1e-2 1",
    "Displaced_BCC" :  "0.7 0.3 1e-2 1",
    "Displaced_FCC" :  "0.7 0.3 1e-2 1",
    "Elastic_BCC"   :  "0.7 0.3 1e-2 1",
    "Elastic_FCC"   :  "0.7 0.3 1e-2 1",
    "GSF_110"       :  "0.7 0.3 1e-2 1",
    "GSF_112"       :  "0.7 0.3 1e-2 1",
    "Liquid"        :  "0.7 0.3 1e-2 1",
    "Surface"       :  "0.7 0.3 1e-2 1",
    "Volume_A15"    :  "0.7 0.3 1e-2 1",
    "Volume_BCC"    :  "0.7 0.3 1e-2 1",
    "Volume_FCC"    :  "0.7 0.3 1e-2 1"
    }
}

In [None]:
fs1 = FitSnap(settings, arglist=["--overwrite"])
fs1.scrape_configs()
fs1.process_configs()

Now use the descriptor data from this instance (which is stored in `fs1.pt`), to perform many fits
with other instances possessing different settings.
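Because `settings` is a nested dictionary, variants should be built from deep copies so that changing one variant does not silently alter the others. A small sketch follows, where `with_override` is a hypothetical convenience function and the trimmed `base` dictionary stands in for the full settings above.

```python
from copy import deepcopy

def with_override(settings, section, key, value):
    """Return a deep copy of `settings` with one option replaced."""
    variant = deepcopy(settings)
    variant.setdefault(section, {})[key] = value
    return variant

base = {"PYTORCH": {"learning_rate": 1e-4, "num_epochs": 10}}
variants = [with_override(base, "PYTORCH", "learning_rate", lr) for lr in (1e-3, 1e-6)]
print([v["PYTORCH"]["learning_rate"] for v in variants])
print(base["PYTORCH"]["learning_rate"])  # the base settings are untouched
```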

In [None]:
# Fit with a larger learning rate:
settings2 = deepcopy(settings)
settings2["PYTORCH"]["learning_rate"] = 1e-3
fs2 = FitSnap(settings2, arglist=["--overwrite"])
# Fit with the shared array data from instance `fs1`.
fs2.solver.perform_fit(pt=fs1.pt)

In [None]:
# Fit with a smaller learning rate:
settings3 = deepcopy(settings)
settings3["PYTORCH"]["learning_rate"] = 1e-6
fs3 = FitSnap(settings3, arglist=["--overwrite"])
# Fit with the shared array data from instance `fs1`.
fs3.solver.perform_fit(pt=fs1.pt)

Get errors of the two instances.

In [None]:
# For NNs, `solver.errors` is currently a tuple of dictionaries.
# Errors for larger learning rate:
fs2.solver.error_analysis()
(mae_f, mae_e, rmse_f, rmse_e, count_train, count_test) = fs2.solver.errors
# Look at force MAE of specific group:
print(mae_f["Displaced_A15"])

In [None]:
# Errors for smaller learning rate:
fs3.solver.error_analysis()
(mae_f, mae_e, rmse_f, rmse_e, count_train, count_test) = fs3.solver.errors
# Look at force MAE of specific group:
print(mae_f["Displaced_A15"])

# Hierarchical parallelism with custom communicators

These simple examples all used a single world communicator. Our design, however, allows one to create many instances, each with a different communicator, and so to get creative with how fits are done in parallel. For example, one could split the communicator among groups of processes and then perform multiple fits *in parallel*, where each fit is itself performed in parallel using the processes in its communicator. This is beyond the scope of an IPython notebook since it requires writing custom Python scripts with MPI.
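As a starting point for such a script, the helper below assigns each rank a group index ("color") in contiguous blocks. In an MPI script the color would typically be passed to mpi4py's `Split(color, key)` to obtain one sub-communicator per group, which could then be handed to `FitSnap(settings, comm=sub_comm)`. The helper `color_for_rank` and the block layout are assumptions, and the MPI calls appear only as comments so the snippet runs without MPI.

```python
def color_for_rank(rank, size, n_groups):
    """Assign `rank` to one of `n_groups` contiguous blocks of ranks."""
    if not 0 < n_groups <= size:
        raise ValueError("need between 1 and `size` groups")
    return rank * n_groups // size

size, n_groups = 8, 2  # hypothetical: 8 processes split into 2 independent fits
colors = [color_for_rank(r, size, n_groups) for r in range(size)]
print(colors)

# Inside a script launched with mpirun, the same idea would read:
#   from mpi4py import MPI
#   world = MPI.COMM_WORLD
#   color = color_for_rank(world.Get_rank(), world.Get_size(), n_groups)
#   sub_comm = world.Split(color=color, key=world.Get_rank())
```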

# How do the shared arrays work?

Each `FitSnap` instance contains shared arrays inside the `fitsnap.pt.shared_arrays` dictionary. The descriptor array is stored in `fitsnap.pt.shared_arrays['a'].array`. The contents of this array are shared in memory between all processes in the instance communicator `fitsnap.pt._comm` (the same communicator we passed when creating the instance). When one process in `comm` changes an element of the shared array, the change is visible to all other processes in the same communicator. This matters because, although each process holds its own `fitsnap.pt` object, the contents of `fitsnap.pt.shared_arrays['a'].array` are shared.
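The idea of several handles onto one block of memory can be tried without MPI. The sketch below uses Python's standard `multiprocessing.shared_memory` as an analogue: two handles attach to the same named block, and a write through one is seen through the other. This is not the mechanism FitSNAP uses (which relies on MPI shared memory); it only illustrates the sharing semantics, and both handles live in a single process for simplicity.

```python
from multiprocessing import shared_memory

# Create a small shared block and a second, independent handle to it by name.
owner = shared_memory.SharedMemory(create=True, size=4)
viewer = shared_memory.SharedMemory(name=owner.name)
try:
    owner.buf[0] = 42        # write through the first handle
    seen = viewer.buf[0]     # read the same byte through the second handle
    print(seen)
finally:
    viewer.close()
    owner.close()
    owner.unlink()           # release the block once nobody needs it
```

As with the `FitSnap` instances, the block must be released explicitly; otherwise it outlives the handles that pointed to it.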