# SNAP Databases

* Author: Xin Chen
* Email: Bismarrck@me.com
* Date: Apr 10, 2019

This notebook demonstrates how to build 
**[SNAP](https://github.com/materialsvirtuallab/snap)** datasets for TensorAlloy.
The SNAP dataset, built by Shyue Ping Ong's group, contains 3973 unique Ni-Mo structures:

1. Ni: 461 structures
2. Mo: 284 structures
3. Ni$_{\mathrm{3}}$Mo: 321 structures
4. Ni$_{\mathrm{4}}$Mo: 321 structures
5. Ni$_{\mathrm{Mo}}$: 1668 structures
6. Mo$_{\mathrm{Ni}}$: 918 structures

**References** 

1. [PHYSICAL REVIEW B 98, 094104 (2018)](https://journals.aps.org/prb/abstract/10.1103/PhysRevB.98.094104)
2. [PHYSICAL REVIEW MATERIALS 1, 043603 (2017)](https://journals.aps.org/prmaterials/abstract/10.1103/PhysRevMaterials.1.043603)

In [1]:
from __future__ import print_function

In [2]:
import zipfile
import shutil
import json
import glob
import os
import requests
import numpy as np

from os.path import join, basename, splitext, exists
from pymatgen.core.structure import Structure
from ase.calculators.singlepoint import SinglePointCalculator
from ase import Atoms
from ase.io import write, read
from typing import List

from tensoralloy.io.read import read_file

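The function below converts a list of `dict` objects to `ase.Atoms` objects and then
writes the trajectory to an `extxyz` file.

**Note:** the stress values in the JSON files are stored in units of `-1 * kbar`, so the function flips their sign before attaching them to the calculator.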
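
The sign convention in the note can be illustrated with a minimal sketch. The helper names and the sample numbers are made up for illustration, and the conversion to eV/Å$^3$ is an assumption about what a downstream reader might want; it is not part of `json2extxyz`.

```python
# 1 eV/Å^3 = 160.2176634 GPa and 1 kbar = 0.1 GPa.
EV_PER_A3_IN_GPA = 160.2176634
KBAR_TO_EV_PER_A3 = 0.1 / EV_PER_A3_IN_GPA


def raw_to_kbar(raw_stress):
    """Flip the sign of the stored stress components to obtain kbar."""
    return [-x for x in raw_stress]


def kbar_to_ev_per_a3(stress_kbar):
    """Convert stress components from kbar to eV/Å^3."""
    return [x * KBAR_TO_EV_PER_A3 for x in stress_kbar]


# Made-up components for illustration only (not taken from the dataset).
raw = [10.0, -2.5, 0.0, 0.0, 0.0, 0.0]
stress_kbar = raw_to_kbar(raw)
stress_ev = kbar_to_ev_per_a3(stress_kbar)
print(stress_kbar)
print(stress_ev)
```
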

In [3]:
def json2extxyz(list_of_dicts: List[dict], output_file, weights: np.ndarray,
                source: str, tag: str):
    """
    Convert the JSON-serialized structures in `list_of_dicts` to `Atoms` 
    objects and write the trajectory to an `extxyz` file.
    
    Parameters
    ----------
    list_of_dicts : List[dict]
        A list of JSON-serialized dicts.
    output_file : str
        The output file to write.
    weights : array_like
        The confidences of energy, forces and stress.
    source : str
        The source of this JSON file.
    tag : str
        The tag of this JSON file.
    
    Returns
    -------
    ntotal : int
        The total number of structures found.
    
    """
    trajectory = []
    for i in range(len(list_of_dicts)):
        structure = Structure.from_dict(list_of_dicts[i]['structure'])
        atoms = Atoms(list(map(str, structure.species)), 
                      structure.cart_coords, 
                      cell=structure.lattice.matrix,
                      pbc=[True, True, True])
        if 'outputs' in list_of_dicts[i]:
            # Copy the dicts so the input data is not modified in place;
            # otherwise a second call would flip the stress sign back.
            outputs = dict(list_of_dicts[i]['outputs'])
            outputs['stress'] = [-x for x in outputs['stress']]
            calc = SinglePointCalculator(atoms, **outputs)
            atoms.calc = calc
            params = dict(list_of_dicts[i]['params'])
            params.pop('pp')
            atoms.info = dict(tags=list_of_dicts[i]['tags'], **params)
        else:
            # The Mo dataset was created separately so the format is different.
            data = list_of_dicts[i]['data']
            results = {
                'energy': data['energy_per_atom'] * len(atoms),
                'forces': data['forces'],
                'stress': [-x for x in data['virial_stress']],
            }
            calc = SinglePointCalculator(atoms, **results)
            atoms.calc = calc
        atoms.info['weights'] = weights
        atoms.info['source'] = f"{source}.{tag}.{i}"
        trajectory.append(atoms)
    write(output_file, trajectory, format='extxyz')
    return len(trajectory)
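
The two JSON layouts handled by `json2extxyz` can be seen more easily without the ASE and pymatgen objects. The sketch below uses a hypothetical helper, `extract_results`, and made-up entries; only the key names (`outputs`, `data`, `energy_per_atom`, `forces`, `virial_stress`) come from the function above.

```python
def extract_results(entry, natoms):
    """Return energy, forces and sign-flipped stress for one JSON entry."""
    if 'outputs' in entry:
        # Ni and Ni-Mo layout: results are stored under 'outputs'.
        results = dict(entry['outputs'])  # shallow copy, input stays untouched
        results['stress'] = [-x for x in results['stress']]
        return results
    # Mo layout: per-atom energy and virial stress are stored under 'data'.
    data = entry['data']
    return {
        'energy': data['energy_per_atom'] * natoms,
        'forces': data['forces'],
        'stress': [-x for x in data['virial_stress']],
    }


# Made-up entries for illustration only (not taken from the dataset).
ni_like = {'outputs': {'energy': -11.0,
                       'forces': [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]],
                       'stress': [1.0, 2.0, 3.0, 0.0, 0.0, 0.0]}}
mo_like = {'data': {'energy_per_atom': -10.0,
                    'forces': [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]],
                    'virial_stress': [4.0, 5.0, 6.0, 0.0, 0.0, 0.0]}}

ni_results = extract_results(ni_like, natoms=2)
mo_results = extract_results(mo_like, natoms=2)
print(ni_results['stress'][:3], mo_results['energy'])
```

Because the helper copies the `outputs` dict, the sign flip is applied exactly once even if the same entry is processed again.
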

Download the original SNAP dataset if it is not already present, then unzip it.

In [4]:
local_dir = 'snap-master'
local_filename = 'snap-master.zip'

def download_or_unzip():
    if not exists(local_filename):
        url = "https://codeload.github.com/materialsvirtuallab/snap/zip/master"
        r = requests.get(url, stream=True)
        r.raise_for_status()  # Stop early instead of saving an error page.
        with open(local_filename, 'wb') as f:
            shutil.copyfileobj(r.raw, f)
    if exists(local_dir):
        shutil.rmtree(local_dir, ignore_errors=True)
    with zipfile.ZipFile(local_filename, 'r') as zip_ref:
        zip_ref.extractall('.')
    if exists('__MACOSX'):
        shutil.rmtree('__MACOSX')

download_or_unzip()
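
The unzip step can be tried offline with a small throwaway archive. This is a minimal sketch: `extract_fresh`, the `demo-master` folder name and the archive contents are all hypothetical and only mimic the layout of the real download.

```python
import os
import shutil
import tempfile
import zipfile


def extract_fresh(zip_path, dest_dir, top_level):
    """Extract `zip_path` into `dest_dir`, replacing a stale `top_level` folder."""
    target = os.path.join(dest_dir, top_level)
    if os.path.exists(target):
        shutil.rmtree(target)
    with zipfile.ZipFile(zip_path, 'r') as zip_ref:
        zip_ref.extractall(dest_dir)
    return target


with tempfile.TemporaryDirectory() as tmp:
    # Build a tiny archive that mimics the <name>/<source>/training layout.
    zip_path = os.path.join(tmp, 'demo.zip')
    with zipfile.ZipFile(zip_path, 'w') as zf:
        zf.writestr('demo-master/Mo/training/Elastic.json', '[]')
    target = extract_fresh(zip_path, tmp, 'demo-master')
    extracted = sorted(os.listdir(os.path.join(target, 'Mo', 'training')))
print(extracted)
```
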

Generate the extxyz files.

There are four subsets:

* Mo : AIMD (NPT), AIMD (NVT), Elastic, GB, Surface, Vacancy, Dope(Ni)
* Ni : AIMD, Elastic, Surface, Vacancy, Surface (test), Dope(Mo)
* Ni3Mo : AIMD, Elastic
* Ni4Mo : AIMD, Elastic

The original papers assign different confidences (weights) to different types of structures:

* AIMD, Surface, GB: 
    - energy: (0.5, 3000)
    - forces: (0.001, 100)
* Elastic: 
    - energy: (0.05, 10000)
    - stress: (0.001, 10)
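
The cell below assigns one weight vector (energy, forces, stress) to each file by matching the file name against a set of keywords. A minimal sketch of that lookup as a standalone function follows; `pick_weights` is a hypothetical name, plain tuples replace the NumPy arrays, and the values are the ones used in this notebook rather than the ones from the papers.

```python
# Weight vectors (energy, forces, stress) as used in this notebook.
ALL_WEIGHTS = {
    'aimd': (1.0, 1.0, 0.0),
    'surface': (1.0, 1.0, 0.0),
    'gb': (1.0, 1.0, 0.0),
    'vacancy': (1.0, 1.0, 0.0),
    'elastic': (10.0, 0.0, 1.0),
}


def pick_weights(tag):
    """Return the weight vector whose keyword occurs in the file tag."""
    lowered = tag.lower()
    for key, values in ALL_WEIGHTS.items():
        if key in lowered:
            return values
    if 'dope' in lowered:
        # Doped structures reuse the AIMD weights.
        return ALL_WEIGHTS['aimd']
    raise KeyError(f"The weights of {tag} cannot be set!")


for tag in ['AIMD_NPT', 'Ni3Mo_Elastic', 'Mo_dopedwith_Ni']:
    print(tag, pick_weights(tag))
```

A tag that matches none of the keywords raises `KeyError`, so a new file type cannot slip through with undefined weights.
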

In [5]:
sources = ['Mo', 'Ni', 'Ni-Mo']
total = 0
counts = []
all_weights = {
    'aimd': np.array([1.0, 1.0, 0.0]),
    'surface': np.array([1.0, 1.0, 0.0]),
    'gb': np.array([1.0, 1.0, 0.0]),
    'vacancy': np.array([1.0, 1.0, 0.0]),
    'elastic': np.array([10.0, 0.0, 1.0])
}

for source in sources:
    training_dir = join(local_dir, source, 'training')
    list_of_files = list(glob.glob(f"{training_dir}/*.json"))
    for afile in list_of_files:
        # The tag is the file name without its extension, e.g. 'AIMD_NPT'.
        tag = splitext(basename(afile))[0]
        output_file = f"{source.replace('-', '')}.{tag}.extxyz"
        
        for key, values in all_weights.items():
            if key in tag.lower():
                weights = values
                break
            elif 'dope' in tag:
                weights = all_weights['aimd']
                break
        else:
            raise KeyError(f"The weights of {afile} cannot be set!")
        
        with open(afile) as fp:
            try:
                n = json2extxyz(json.load(fp), 
                                output_file, 
                                weights=weights,
                                source=source,
                                tag=tag)
                total += n
            except Exception as excp:
                print(f"Failed to process {afile}: {excp}")
            else:
                print(f"{afile}, {n} structures, weights={weights}")
print(f"Total {total} structures.")

snap-master/Mo/training/Elastic.json, 67 structures, weights=[10.  0.  1.]
snap-master/Mo/training/Surface.json, 11 structures, weights=[1. 1. 0.]
snap-master/Mo/training/Vacancy.json, 24 structures, weights=[1. 1. 0.]
snap-master/Mo/training/GB.json, 13 structures, weights=[1. 1. 0.]
snap-master/Mo/training/AIMD_NPT.json, 49 structures, weights=[1. 1. 0.]
snap-master/Mo/training/AIMD_NVT.json, 120 structures, weights=[1. 1. 0.]
snap-master/Ni/training/Elastic.json, 121 structures, weights=[10.  0.  1.]
snap-master/Ni/training/Surface.json, 13 structures, weights=[1. 1. 0.]
snap-master/Ni/training/Vacancy.json, 40 structures, weights=[1. 1. 0.]
snap-master/Ni/training/Surface_test.json, 7 structures, weights=[1. 1. 0.]
snap-master/Ni/training/AIMD.json, 280 structures, weights=[1. 1. 0.]
snap-master/Ni-Mo/training/Mo_dopedwith_Ni.json, 918 structures, weights=[1. 1. 0.]
snap-master/Ni-Mo/training/Ni3Mo_AIMD.json, 200 structures, weights=[1. 1. 0.]
snap-master/Ni-Mo/training/Ni_dopedwit

#### 1. SNAP

Merge these extxyz files into a single `extxyz` file.

In [6]:
if exists('snap.extxyz'):
    os.remove('snap.extxyz')
all_files = glob.glob('*.*.extxyz')
filename = 'snap.extxyz' 
with open(filename, 'w') as fp:    
    for afile in all_files:
        print(afile)
        with open(afile, 'r') as obj:
            fp.write(obj.read())
all_files.append(filename)
read_file(filename, units={'stress': 'kbar'}, verbose=True)

NiMo.Ni3Mo_AIMD.extxyz
NiMo.Ni4Mo_Elastic.extxyz
Mo.AIMD_NVT.extxyz
Mo.Elastic.extxyz
NiMo.Ni4Mo_AIMD.extxyz
Mo.GB.extxyz
Ni.Surface_test.extxyz
Mo.Vacancy.extxyz
NiMo.Ni_dopedwith_Mo.extxyz
Ni.Surface.extxyz
NiMo.Mo_dopedwith_Ni.extxyz
Ni.AIMD.extxyz
Ni.Elastic.extxyz
Mo.AIMD_NPT.extxyz
Ni.Vacancy.extxyz
NiMo.Ni3Mo_Elastic.extxyz
Mo.Surface.extxyz
Extract cartesian coordinates ...
Progress:    3900 /      -1 | Speed = 310.6
Total 3973 structures, time: 12.812 sec


<ase.db.sqlite.SQLite3Database at 0x1c35812400>
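
The same concatenation pattern recurs for each subset. A minimal sketch of a reusable helper follows; `concat_files` is a hypothetical name, and the demo files contain placeholder text rather than real `extxyz` frames. Unlike the cells in this notebook, the helper sorts its inputs for a reproducible order and skips the output file if it happens to match the pattern.

```python
import glob
import os
import tempfile


def concat_files(pattern, output_file):
    """Concatenate all files matching `pattern` into `output_file`."""
    out_path = os.path.abspath(output_file)
    # Collect the inputs before opening the output for writing.
    inputs = sorted(f for f in glob.glob(pattern)
                    if os.path.abspath(f) != out_path)
    with open(output_file, 'w') as fp:
        for afile in inputs:
            with open(afile, 'r') as obj:
                fp.write(obj.read())
    return inputs


with tempfile.TemporaryDirectory() as tmp:
    # Placeholder contents for illustration only.
    for name, text in [('Ni.A.extxyz', 'a\n'), ('Ni.B.extxyz', 'b\n')]:
        with open(os.path.join(tmp, name), 'w') as fp:
            fp.write(text)
    output = os.path.join(tmp, 'snap-Ni.extxyz')
    used = concat_files(os.path.join(tmp, 'Ni.*.extxyz'), output)
    used_names = [os.path.basename(f) for f in used]
    with open(output) as fp:
        merged = fp.read()
print(used_names)
```
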

#### 2. Ni

Create a **Ni** `extxyz` file.

In [7]:
all_Ni_files = glob.glob('Ni.*.extxyz')
filename = 'snap-Ni.extxyz' 
with open(filename, 'w') as fp:    
    for afile in all_Ni_files:
        print(afile)
        with open(afile, 'r') as obj:
            fp.write(obj.read())
all_files.append(filename)
read_file(filename, units={'stress': 'kbar'}, verbose=True)

Ni.Surface_test.extxyz
Ni.Surface.extxyz
Ni.AIMD.extxyz
Ni.Elastic.extxyz
Ni.Vacancy.extxyz
Extract cartesian coordinates ...
Progress:     400 /      -1 | Speed = 288.7
Total 461 structures, time: 1.610 sec


<ase.db.sqlite.SQLite3Database at 0x1c3b08f2b0>

#### 3. Mo

Create a **Mo** `extxyz` file.

In [8]:
all_Mo_files = glob.glob('Mo.*.extxyz')
filename = 'snap-Mo.extxyz'
with open(filename, 'w') as fp:    
    for afile in all_Mo_files:
        print(afile)
        with open(afile, 'r') as obj:
            fp.write(obj.read())
all_files.append(filename)
read_file(filename, units={'stress': 'kbar'}, verbose=True)

Mo.AIMD_NVT.extxyz
Mo.Elastic.extxyz
Mo.GB.extxyz
Mo.Vacancy.extxyz
Mo.AIMD_NPT.extxyz
Mo.Surface.extxyz
Extract cartesian coordinates ...
Progress:     200 /      -1 | Speed = 317.8
Total 284 structures, time: 0.878 sec


<ase.db.sqlite.SQLite3Database at 0x1c3b08f400>

#### 4. Ni$_3$Mo

In [9]:
all_Ni3Mo_files = glob.glob("NiMo.Ni3Mo*.extxyz")
filename = 'snap-Ni3Mo.extxyz'
with open(filename, 'w') as fp:    
    for afile in all_Ni3Mo_files:
        print(afile)
        with open(afile, 'r') as obj:
            fp.write(obj.read())
all_files.append(filename)
read_file(filename, units={'stress': 'kbar'}, verbose=True)

NiMo.Ni3Mo_AIMD.extxyz
NiMo.Ni3Mo_Elastic.extxyz
Extract cartesian coordinates ...
Progress:     300 /      -1 | Speed = 278.3
Total 321 structures, time: 1.156 sec


<ase.db.sqlite.SQLite3Database at 0x1c34fbcb38>

#### 5. Ni$_4$Mo

In [10]:
all_Ni4Mo_files = glob.glob("NiMo.Ni4Mo*.extxyz")
filename = 'snap-Ni4Mo.extxyz'
with open(filename, 'w') as fp:    
    for afile in all_Ni4Mo_files:
        print(afile)
        with open(afile, 'r') as obj:
            fp.write(obj.read())
all_files.append(filename)
read_file(filename, units={'stress': 'kbar'}, verbose=True)

NiMo.Ni4Mo_Elastic.extxyz
NiMo.Ni4Mo_AIMD.extxyz
Extract cartesian coordinates ...
Progress:     300 /      -1 | Speed = 281.0
Total 321 structures, time: 1.145 sec


<ase.db.sqlite.SQLite3Database at 0x1c3b08f4a8>

#### 6. Ni$_\mathrm{Mo}$

In [11]:
all_NidMo_files = glob.glob("NiMo.Ni_dopedwith_Mo.extxyz")
filename = 'snap-NidMo.extxyz'
with open(filename, 'w') as fp:    
    for afile in all_NidMo_files:
        print(afile)
        with open(afile, 'r') as obj:
            fp.write(obj.read())
all_files.append(filename)
read_file(filename, units={'stress': 'kbar'}, verbose=True)

NiMo.Ni_dopedwith_Mo.extxyz
Extract cartesian coordinates ...
Progress:    1600 /      -1 | Speed = 318.8
Total 1668 structures, time: 5.233 sec


<ase.db.sqlite.SQLite3Database at 0x1c358b40b8>

#### 7. Mo$_\mathrm{Ni}$

In [12]:
all_ModNi_files = glob.glob("NiMo.Mo_dopedwith_Ni.extxyz")
filename = 'snap-ModNi.extxyz' 
with open(filename, 'w') as fp:    
    for afile in all_ModNi_files:
        print(afile)
        with open(afile, 'r') as obj:
            fp.write(obj.read())
all_files.append(filename)
read_file(filename, units={'stress': 'kbar'}, verbose=True)

NiMo.Mo_dopedwith_Ni.extxyz
Extract cartesian coordinates ...
Progress:     900 /      -1 | Speed = 319.5
Total 918 structures, time: 2.874 sec


<ase.db.sqlite.SQLite3Database at 0x1c358b4ac8>
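
As a sanity check, the six subset totals reported above should add up to the 3973 structures of the merged file. The sketch below checks that sum and shows one way to count frames in an `extxyz` file without ASE. It assumes the usual layout of an atom-count line, a comment line, then one line per atom; `count_frames` is a hypothetical helper and the two demo frames are made up.

```python
def count_frames(lines):
    """Count frames in an (ext)xyz file given as an iterable of lines."""
    lines = list(lines)
    count = 0
    pos = 0
    while pos < len(lines):
        if not lines[pos].strip():
            pos += 1  # tolerate blank lines between frames
            continue
        natoms = int(lines[pos])
        pos += natoms + 2  # atom-count line + comment line + atom lines
        count += 1
    return count


# Two made-up frames for illustration only.
demo_lines = [
    '2',
    'Lattice="3 0 0 0 3 0 0 0 3" Properties=species:S:1:pos:R:3',
    'Ni 0.0 0.0 0.0',
    'Ni 1.5 1.5 1.5',
    '1',
    'Lattice="3 0 0 0 3 0 0 0 3" Properties=species:S:1:pos:R:3',
    'Mo 0.0 0.0 0.0',
]
nframes = count_frames(demo_lines)

# Subset totals as printed by the cells above.
subset_totals = {'Ni': 461, 'Mo': 284, 'Ni3Mo': 321, 'Ni4Mo': 321,
                 'NidMo': 1668, 'ModNi': 918}
total = sum(subset_totals.values())
print(nframes, total)
```

For a file on disk, `count_frames(open(path))` would work the same way, since iterating over a file yields its lines.
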

#### Clean up

Delete the generated `extxyz` files and the extracted `snap-master` directory.

In [13]:
for afile in all_files:
    if exists(afile):
        os.remove(afile)
if exists(local_dir):
    shutil.rmtree(local_dir, ignore_errors=True)