### Modeling of Cluster Distributions

The objectives of this demo are to 1) introduce the four goals of the cluster counting module, 2) explain the algorithm behind each goal, and 3) demonstrate the procedure for each goal with examples. This report has five parts:

1. Introduction
2. Goal One - Identify Distinct Clusters (understand the lattice/structure)
3. Goal Two - Count Clusters
4. Goal Three - Generate Random Structures With/Without Rules
5. Goal Four - Titrate Clusters

### Introduction

The reactivity of materials depends on the distribution of important groups of atoms, which we refer to here as clusters. Given a bulk material property, such as the element ratio, we want to compute the cluster distributions in order to understand the reactivity quantitatively. The ultimate goal of this cluster counting module is to compute cluster distributions statistically under different conditions, such as different crystal structures, different rules for atom locations, and different counting and titration priorities. This ultimate goal can be divided into four specific goals. First, given a crystal structure, we want to understand the lattice and identify distinct clusters. Second, we want to count the different types of clusters in a given structure configuration (which specifies the atom type at each site). Third, we want to generate random structure configurations for any given element ratio. Last, we want to titrate clusters one by one to avoid double counting clusters that share sites with each other.

### Goal One - Identify Distinct Clusters (understand the lattice/structure)

Algorithm 

To understand the distribution of clusters, we first need to distinguish between different clusters. Clusters differ from each other based on atom distances and symmetries. 
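
As a rough illustration of the distance criterion only, the sketch below groups every pair of points by separation. It is not how corrdump works internally: corrdump also uses the space group, so two pairs with the same distance can still be symmetrically distinct. The function name `group_pairs_by_distance` and the unit-cube input are hypothetical.

```python
import itertools
import math
from collections import defaultdict

def group_pairs_by_distance(points, max_distance, decimals=6):
    """Group all point pairs by their rounded separation (distance only, no symmetry)."""
    groups = defaultdict(list)
    for (i, p), (j, q) in itertools.combinations(enumerate(points), 2):
        d = math.dist(p, q)
        if d <= max_distance:
            groups[round(d, decimals)].append((i, j))
    return dict(groups)

# Hypothetical toy input: the 8 corners of a unit cube
corners = list(itertools.product((0.0, 1.0), repeat=3))
pair_groups = group_pairs_by_distance(corners, max_distance=1.5)
for distance, pairs in sorted(pair_groups.items()):
    print(distance, len(pairs))
```

With a cutoff of 1.5, the cube edges and face diagonals are kept and the body diagonals are excluded, which mirrors the role of the maximum-distance options passed to corrdump.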

To achieve this goal, we use the Alloy Theoretic Automated Toolkit (ATAT) developed by Axel van de Walle. In ATAT, the corrdump program takes lattice parameters and site positions as input, determines the space group of the lattice, and finds all symmetrically distinct clusters based on that space group. When analyzing distinct clusters, corrdump only considers sites that can be occupied by at least two types of elements. Sites that can accommodate only one type of element are used only for the symmetry analysis. The installation and modification of ATAT are described in the supporting information.
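
To make the distinction between the two kinds of sites concrete, here is a minimal sketch that reads a lat.in-style text and flags sites listing more than one element. It assumes the layout with one header line (a, b, c, alpha, beta, gamma), three lattice-vector lines, and then one line per site ending in a comma-separated element list. The helper `find_mixed_sites` and the example contents are hypothetical and are not part of classes.py or utilities.py.

```python
def find_mixed_sites(lat_in_text):
    """Parse site lines and mark those that can host two or more element types."""
    lines = [ln.split() for ln in lat_in_text.strip().splitlines() if ln.strip()]
    site_lines = lines[4:]  # assumed layout: 1 header line + 3 lattice-vector lines
    sites = []
    for idx, fields in enumerate(site_lines):
        species = fields[3].split(',')
        sites.append({
            'index': idx,
            'frac': tuple(float(x) for x in fields[:3]),
            'species': species,
            'mixed': len(species) > 1,  # only these sites would enter the clusters
        })
    return sites

# Hypothetical example contents, not the lat.in used in this report
example = """\
3.0 3.0 3.0 90 90 90
1 0 0
0 1 0
0 0 1
0.0 0.0 0.0 Si,Al
0.5 0.5 0.5 O
"""
sites = find_mixed_sites(example)
for site in sites:
    print(site['index'], site['species'], site['mixed'])
```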

Input:
1. lat.in  
2. str_dim.txt

Procedures:
1. Run the two Python files (classes.py and utilities.py), which contain the required functions, in a Jupyter notebook (or import them in a Python file).
        %run classes.py
        %run utilities.py
        (from classes import *)
        (from utilities import *)
2. Create a Lattice instance from lat.in: 
        Lattice(folder_path for lat.in)
3. Run the corrdump program to generate the clusters.out file, which contains the information for each cluster type: 
        corrdump -l=[lat.in file path] -cf=[clusters.out file path] -2=[max distance for 2-body cluster] -3=[max distance for 3-body cluster]
4. Visualize the example cluster given by corrdump for each type: 
        lattice.read_clusters_out()
        lattice.visualize_cluster(cluster_type)
5. Create a Structure instance from the lattice parameters and structure dimensions, and prepare str.out: 
        Structure(lattice, folder_path for lat.in and str_dim.txt)
        structure.prepare_str_out()
6. Run the corrdump program to generate a full list of clusters for the super cell defined by the structure dimensions:
        corrdump -l=[lat.in file path] -s=[str.out file path] -cf=[clusters.out file path] -2=[max distance for 2-body cluster] -3=[max distance for 3-body cluster] >> [cluster_list.csv file path]
7. Read the full cluster list and visualize clusters for each type:
        structure.read_cluster_list()
        structure.visualize_one_cluster_one_example()
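
The two corrdump calls in steps 3 and 6 differ only in the `-s` option and the output redirection, so they can be assembled by one helper. The sketch below is one possible way to do that with `subprocess` instead of a shell string. The helper names `build_corrdump_command` and `run_corrdump` are hypothetical, and only the flags listed in the steps above are used.

```python
import subprocess

def build_corrdump_command(folder_path, maxdis_2, maxdis_3=None, with_structure=False):
    """Assemble the corrdump argument list from the flags used in steps 3 and 6."""
    cmd = ['corrdump', f'-l={folder_path}/lat.in']
    if with_structure:
        cmd.append(f'-s={folder_path}/str.out')
    cmd.append(f'-cf={folder_path}/clusters.out')
    cmd.append(f'-2={maxdis_2}')
    if maxdis_3 is not None:
        cmd.append(f'-3={maxdis_3}')
    return cmd

def run_corrdump(cmd, stdout_path=None):
    """Run the command; raises FileNotFoundError if corrdump is not on the PATH."""
    if stdout_path is None:
        return subprocess.run(cmd, capture_output=True, text=True)
    with open(stdout_path, 'a') as out:  # 'a' mirrors the shell's >>
        return subprocess.run(cmd, stdout=out, stderr=subprocess.PIPE, text=True)

# Only build and print the commands here; nothing is executed
step3 = build_corrdump_command('jobs', 11)
step6 = build_corrdump_command('jobs', 11, with_structure=True)
print(' '.join(step3))
print(' '.join(step6))
```

The object returned by `run_corrdump` exposes `returncode` and `stderr`, so error messages can be inspected in the notebook rather than in the terminal.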

Output:
1. A lattice class which contains:  
1) lattice parameters: a, b, c, alpha, beta, gamma  
2) lattice constants: u, v, w  
3) sites information: site index, atom types, xyz coordinates, fractional coordinates  
4) cluster types: number of atoms in the cluster, maximum distance between two atoms in the cluster, multiplicity in one lattice unit cell, one example cluster represented in fractional coordinates for each type.  
        
2. A structure class which contains:   
1) lattice parameters: a, b, c, alpha, beta, gamma  
2) lattice constants: u, v, w  
3) structure constants: nu, nv, nw  
4) sites information stored in a dataframe: site index, atom types, xyz coordinates, fractional coordinates  
5) cluster types: all clusters of each type in one super cell, represented in fractional coordinates, in xyz coordinates, and in site indices.  

Example:

In [10]:
#import other useful packages
import pandas as pd
import numpy as np
import os
from ase import Atoms
from ase.io import read, write
from ase.visualize import view

In [11]:
#run the two python files
%run classes.py
%run utilities.py

In [12]:
#prepare lat.in and str_dim.txt and put them in the folder given by folder_path
folder_path = 'jobs'
#create a Lattice instance from lat.in
lattice = Lattice(folder_path)

In [13]:
#set the maximum distance between two atoms in 2-body clusters (and, optionally, in 3-body clusters)
maxdis_2 = 11
#maxdis_3 = 15
#run corrdump to generate the clusters.out file; the folder path for lat.in and clusters.out was set above
#os.system returns either 0 or 256 here: 0 means no error message, and 256 means at least one error message,
#which is printed in the terminal; without a str.out file, it returns 256 with the message
#"Unable to open structure file", which is fine at this step
os.system('corrdump -l={0}/lat.in -cf={0}/clusters.out -2={1}'.format(folder_path, maxdis_2))

256
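
The value 256 is consistent with how `os.system` reports results on POSIX systems, where it returns the raw wait status rather than the exit code itself. Assuming the notebook runs on such a system, the sketch below decodes that status by hand. The helper name `decode_wait_status` is hypothetical; Python 3.9 and later also provide `os.waitstatus_to_exitcode` for the same purpose.

```python
def decode_wait_status(status):
    """Split a POSIX wait status into (exit_code, signal_number)."""
    exit_code = (status >> 8) & 0xFF   # the high byte holds the exit code
    signal_number = status & 0x7F      # the low 7 bits hold a terminating signal, if any
    return exit_code, signal_number

print(decode_wait_status(256))  # (1, 0): the program exited with code 1
print(decode_wait_status(0))    # (0, 0): the program exited normally
```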

In [14]:
#read clusters.out
lattice.read_clusters_out()

In [15]:
lattice.cluster_type_numbers

[1, 1, 51, 0, 0, 0, 0, 0, 0, 0]
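
One plausible reading of this list, which is an assumption rather than documented behavior, is that entry n gives the number of distinct n-body cluster types. The 51 in position 2 would then be the number of 2-body types, matching keys such as '2-43'. Under that assumption, and assuming labels start at 1, the labels can be generated instead of hard-coding a range. The helper `cluster_type_labels` is hypothetical.

```python
# Values copied from the output above
cluster_type_numbers = [1, 1, 51, 0, 0, 0, 0, 0, 0, 0]

def cluster_type_labels(type_numbers, n_body):
    """Build labels such as '2-1' ... '2-51' (assumes labels are numbered from 1)."""
    return [f'{n_body}-{i}' for i in range(1, type_numbers[n_body] + 1)]

pair_labels = cluster_type_labels(cluster_type_numbers, 2)
print(len(pair_labels), pair_labels[0], pair_labels[-1])
```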

In [16]:
lattice.clusters['2-43']

{'eg_frac': [array([0.66697, 0.56003, 1.22823]),
  array([1.33303, 0.89307, 0.77177])],
 'm': 18,
 'max_d': 10.37589}
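
In this record, `eg_frac` is the example cluster in fractional coordinates, `m` is the multiplicity, and `max_d` is the maximum distance between two atoms. The sketch below shows how such a distance could be recomputed from fractional coordinates, assuming they are expressed in the cell defined by a, b, c, alpha, beta, gamma. The functions `cell_matrix` and `max_pair_distance` are hypothetical helpers, and the cubic cell is a toy input, not the lattice used in this example.

```python
import numpy as np

def cell_matrix(a, b, c, alpha, beta, gamma):
    """Return a 3x3 matrix whose rows are the Cartesian cell vectors (angles in degrees)."""
    al, be, ga = np.radians([alpha, beta, gamma])
    cx = c * np.cos(be)
    cy = c * (np.cos(al) - np.cos(be) * np.cos(ga)) / np.sin(ga)
    cz = np.sqrt(c**2 - cx**2 - cy**2)
    return np.array([
        [a, 0.0, 0.0],
        [b * np.cos(ga), b * np.sin(ga), 0.0],
        [cx, cy, cz],
    ])

def max_pair_distance(frac_points, cell):
    """Largest Cartesian distance between any two points (no periodic images applied)."""
    cart = np.asarray(frac_points) @ cell
    diffs = cart[:, None, :] - cart[None, :, :]
    return float(np.linalg.norm(diffs, axis=-1).max())

# Hypothetical cubic cell with edge length 2
cell = cell_matrix(2.0, 2.0, 2.0, 90, 90, 90)
pair = [np.array([0.0, 0.0, 0.0]), np.array([0.5, 0.5, 0.5])]
d = max_pair_distance(pair, cell)
print(round(d, 5))
```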

In [19]:
cluster_type='2-19'
lattice.visualize_cluster(cluster_type)
c= read(folder_path+'/lattice_clusters/xyzs/cluster-{}.xyz'.format(cluster_type))

In [39]:
#visualize the cluster example given by corrdump for each of the 2-body types 2-1 to 2-40
for i in range(1, 41):
    cluster_type = '2-' + str(i)
    lattice.visualize_cluster(cluster_type)
    c = read(folder_path + '/lattice_clusters/xyzs/cluster-{}.xyz'.format(cluster_type))

In [20]:
#initialize a class of Structure with lattice parameters and structure dimensions
structure = Structure(lattice=lattice, folder_path=folder_path)
structure.prepare_str_out()

In [21]:
#run corrdump to generate a full list of clusters for the super cell defined by the structure dimensions
#again, the return value is either 0 or 256: 0 means no error message, and 256 means at least one error message, which is printed in the terminal
#note that >> appends to cluster_list.csv, so you may want to remove an old copy before re-running this cell
os.system('corrdump -l={0}/lat.in -s={0}/str.out -cf={0}/clusters.out -2={1} >> {0}/cluster_list.csv'.format(folder_path, maxdis_2))

0

In [22]:
#read the full cluster list
structure.read_cluster_list()

In [23]:
#create xyz and image files for clusters in a specific type
cluster_type='2-11'
structure.visualize_one_cluster_type_all_examples(cluster_type)

In [24]:
import pickle
#save the lattice object; the with block closes the file after writing
with open(folder_path+"/lattice.p","wb") as f:
    pickle.dump(lattice, f)

In [25]:
with open(folder_path+"/structure.p","wb") as f:
    pickle.dump(structure, f)

In [26]:
penalty={'2-1':20,'2-2':20,'2-3':20,'2-4':20 }

In [27]:
with open(folder_path+"/penalty.p","wb") as f:
    pickle.dump(penalty, f)
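
The saved files can be read back in a later session with `pickle.load`. Below is a minimal round-trip sketch using the penalty dictionary and a temporary folder. The helper names `save_pickle` and `load_pickle` are hypothetical.

```python
import os
import pickle
import tempfile

def save_pickle(obj, path):
    """Write obj to path, closing the file afterwards."""
    with open(path, 'wb') as fh:
        pickle.dump(obj, fh)

def load_pickle(path):
    """Read back an object written by save_pickle."""
    with open(path, 'rb') as fh:
        return pickle.load(fh)

penalty = {'2-1': 20, '2-2': 20, '2-3': 20, '2-4': 20}
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'penalty.p')
    save_pickle(penalty, path)
    restored = load_pickle(path)
print(restored == penalty)  # True
```

Note that loading lattice.p or structure.p requires the Lattice and Structure classes to be defined first (for example by running classes.py again), because pickle stores a reference to the class rather than its code.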