# SNAP Database Tool

This notebook demonstrates how to convert 
**SNAP/[json](http://pymatgen.org/_modules/pymatgen/core/structure.html#IMolecule.as_dict)** files
to 
**ASE/[extxyz](https://wiki.fysik.dtu.dk/ase/ase/io/io.html#ase.io.write)** files.

* Author: Xin Chen
* Email: Bismarrck@me.com
* Date: Jan 10, 2019
* Reference: [PHYSICAL REVIEW B 98, 094104 (2018)](https://journals.aps.org/prb/abstract/10.1103/PhysRevB.98.094104)

In [1]:
from __future__ import print_function

In [2]:
import zipfile
import shutil
import json
import glob
import os

from os.path import join, basename, splitext, exists
from pymatgen.core.structure import Structure
from ase.calculators.singlepoint import SinglePointCalculator
from ase import Atoms
from ase.io import write, read
from typing import List

The function below converts a list of JSON-serialized `dict` records to `ase.Atoms` objects and
writes the resulting trajectory to an `extxyz` file.

In [3]:
def convert_pymatgen_json_to_extxyz(list_of_dicts: List[dict], output_file):
    """
    Convert the JSON-serialized structures in `list_of_dicts` to `Atoms` 
    objects and write the trajectory to an `extxyz` file.
    
    Parameters
    ----------
    list_of_dicts : List[dict]
        A list of JSON-serialized dicts.
    output_file : str
        The output file to write.
    
    """
    trajectory = []
    for entry in list_of_dicts:
        structure = Structure.from_dict(entry['structure'])
        atoms = Atoms(list(map(str, structure.species)),
                      structure.cart_coords,
                      cell=structure.lattice.matrix,
                      pbc=[True, True, True])
        if 'outputs' in entry:
            results = entry['outputs']
            # Copy `params` so the input record is not modified, then drop
            # the 'pp' entry if it is present.
            params = dict(entry['params'])
            params.pop('pp', None)
            atoms.info = dict(tags=entry['tags'], **params)
        else:
            # The Mo dataset was created separately so the format is different.
            data = entry['data']
            results = {
                'energy': data['energy_per_atom'] * len(atoms),
                'forces': data['forces'],
                'stress': data['virial_stress'],
            }
        # Attach the reference results so they are written with each frame.
        atoms.calc = SinglePointCalculator(atoms, **results)
        trajectory.append(atoms)
    write(output_file, trajectory, format='extxyz')
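
The two record layouts handled by the function can be illustrated without pymatgen or ASE. The following is a minimal sketch, not part of the original notebook: `extract_results` is a hypothetical helper, and the sample records contain made-up values that only mimic the keys the function reads (`outputs`, `params`, `tags`, `data`).

```python
def extract_results(record, num_atoms):
    """Return the calculator results for one record (hypothetical helper)."""
    if 'outputs' in record:
        # Layout 1: results are stored directly under 'outputs'.
        return dict(record['outputs'])
    # Layout 2 (Mo dataset): the energy is stored per atom and the stress
    # is stored under 'virial_stress'.
    data = record['data']
    return {
        'energy': data['energy_per_atom'] * num_atoms,
        'forces': data['forces'],
        'stress': data['virial_stress'],
    }


# Made-up sample records with two atoms each; the values are illustrative only.
record_a = {
    'outputs': {'energy': -7.0, 'forces': [[0.0, 0.0, 0.0]] * 2},
    'params': {'pp': 'placeholder'},
    'tags': ['example'],
}
record_b = {
    'data': {
        'energy_per_atom': -3.5,
        'forces': [[0.0, 0.0, 0.0]] * 2,
        'virial_stress': [0.0] * 6,
    },
}

results_a = extract_results(record_a, num_atoms=2)
results_b = extract_results(record_b, num_atoms=2)
print(results_a['energy'], results_b['energy'])
```

Both layouts end up with the same `energy`, `forces` and `stress` keys, which is why a single `SinglePointCalculator` call can serve either branch.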

Generate the `extxyz` files for the sources listed in `sources` below (Mo, Ni and Ni-Mo).

The database contains five systems, each with several subsets:

* Cu : aimd, elastic, md, surface, vacancy
* Mo : aimd (npt), aimd (nvt), elastic, gb, surface, vacancy, doped_with_ni
* Ni : aimd, elastic, surface, vacancy, surface (test), doped_with_Mo
* Ni3Mo : aimd, elastic
* Ni4Mo : aimd, elastic

In [4]:
if not exists('snap'):
    with zipfile.ZipFile('snap.zip', 'r') as zip_ref:
        zip_ref.extractall('.')
    if exists('__MACOSX'):
        shutil.rmtree('__MACOSX')
        
sources = ['Mo', 'Ni', 'Ni-Mo']

for source in sources:
    training_dir = join('snap', source, 'training')
    list_of_files = glob.glob(join(training_dir, '*.json'))
    for afile in list_of_files:
        tag = splitext(basename(afile))[0]
        if source == 'Ni-Mo':
            # Assumption: the Ni-Mo file names already carry the compound
            # name (Ni3Mo or Ni4Mo), so no source prefix is added.
            output_file = f"{tag}.extxyz"
        else:
            output_file = f"{source}.{tag}.extxyz"
        with open(afile) as fp:
            try:
                convert_pymatgen_json_to_extxyz(json.load(fp), output_file)
            except Exception as excp:
                print(f"Failed to process {afile}: {excp}")
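
The output naming for the single-element sources (Mo and Ni) can be sketched in isolation. This is a minimal illustration rather than the notebook's own code: `extxyz_name` is a hypothetical helper, and the example path is inferred from the `Ni.AIMD.extxyz` file name that the notebook prints later.

```python
from os.path import basename, splitext


def extxyz_name(source, json_path):
    """Build '<source>.<tag>.extxyz' from a source name and a JSON path."""
    # The tag is the JSON file name without its directory and extension.
    tag = splitext(basename(json_path))[0]
    return f"{source}.{tag}.extxyz"


name = extxyz_name('Ni', 'snap/Ni/training/AIMD.json')
print(name)
```

Prefixing each file with its source keeps subsets that share a tag, such as the Mo and Ni `Elastic` sets, from overwriting each other in the working directory.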

Merge these `extxyz` files into a single `extxyz` file.

In [5]:
if exists('snap.extxyz'):
    os.remove('snap.extxyz')
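# Skip merged files (snap.extxyz, snap-Ni.extxyz) left over from earlier runs.
all_files = [f for f in glob.glob('*.extxyz') if not f.startswith('snap')]
with open('snap.extxyz', 'w') as fp: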
    for afile in all_files:
        with open(afile, 'r') as obj:
            fp.write(obj.read())
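
Because the merge is a plain text concatenation, one possible sanity check is to compare the number of frames in the merged file with the sum over the input files. The sketch below assumes the standard xyz layout (an atom-count line, a comment line, then one line per atom); `count_frames` is a hypothetical helper and the sample file is made up solely to exercise it.

```python
import os
import tempfile


def count_frames(path):
    """Count frames in an (ext)xyz file by walking the atom-count headers."""
    frames = 0
    with open(path) as fp:
        while True:
            header = fp.readline()
            if not header.strip():
                break  # end of file (or a trailing blank line)
            num_atoms = int(header)
            # Skip the comment line and the `num_atoms` atom lines.
            for _ in range(num_atoms + 1):
                fp.readline()
            frames += 1
    return frames


# Tiny made-up two-frame file, used only to exercise the helper.
comment = ('Lattice="3.0 0.0 0.0 0.0 3.0 0.0 0.0 0.0 3.0" '
           'Properties=species:S:1:pos:R:3\n')
sample = (
    "1\n" + comment +
    "Ni 0.0 0.0 0.0\n"
    "2\n" + comment +
    "Ni 0.0 0.0 0.0\n"
    "Ni 1.5 1.5 1.5\n"
)
with tempfile.TemporaryDirectory() as tmpdir:
    sample_path = os.path.join(tmpdir, 'sample.extxyz')
    with open(sample_path, 'w') as fp:
        fp.write(sample)
    num_frames = count_frames(sample_path)
print(num_frames)
```

In the notebook, the check would amount to comparing `count_frames('snap.extxyz')` with `sum(count_frames(f) for f in all_files)`.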

Create an `extxyz` file that contains only the **Ni** subsets.

In [6]:
all_files = glob.glob('Ni.*.extxyz')
with open('snap-Ni.extxyz', 'w') as fp:    
    for afile in all_files:
        print(afile)
        with open(afile, 'r') as obj:
            fp.write(obj.read())

Ni.Surface_test.extxyz
Ni.Surface.extxyz
Ni.AIMD.extxyz
Ni.Elastic.extxyz
Ni.Vacancy.extxyz
