In [None]:
# run to pretty-print results
import numpy as np

np.set_printoptions(precision=5, suppress=True)

import warnings
warnings.filterwarnings("ignore", category=DeprecationWarning)

The amd package contains functions for calculating and comparing AMDs/PDDs, as well as functionality for reading .cif files.

Get the package with pip:
```shell
$ pip install average-minimum-distance
```

**Import average-minimum-distance by running**

In [None]:
import amd

# Reading cifs

To read .cif files with amd, use an ```amd.CifReader``` object (the .cif files in this notebook can be found in the tests folder of this project). A CifReader is an iterator, so looping over it yields every structure in a file. By default, a structure that cannot be read or is otherwise invalid ('bad') is skipped and a warning is printed. The reader yields ```PeriodicSet``` objects, which can be passed directly to the AMD/PDD calculator functions below.
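The skip-and-warn behaviour described above is a common iterator pattern. Below is a minimal pure-Python sketch of that pattern. It is not amd's actual code: ```lazy_reader```, ```raw_blocks``` and ```parse``` are hypothetical names, and a ```ValueError``` stands in for whatever makes a structure 'bad'. Because it is a generator, nothing is parsed until you loop over it. Calling ```list``` on it reads everything at once.

```python
import warnings


def lazy_reader(raw_blocks, parse):
    """Yield parsed items one at a time, skipping (with a warning) any that fail.

    Hypothetical illustration: `raw_blocks` stands in for the data blocks of a
    .cif file and `parse` for a structure parser; this is not amd's internal code.
    """
    for i, block in enumerate(raw_blocks):
        try:
            yield parse(block)
        except ValueError as err:
            # skip the bad item, warn, and carry on with the next one
            warnings.warn(f"skipping block {i}: {err}")


blocks = ["1.0", "not a number", "2.5"]
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    parsed = list(lazy_reader(blocks, float))  # float() plays the role of the parser

print(parsed)       # the two blocks that parsed successfully
print(len(caught))  # one warning, for the block that failed
```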

In [None]:
# one cif, many structures
reader = amd.CifReader('T2_experimental.cif')
for periodic_set in reader:
    print(periodic_set.name, periodic_set.motif.shape[0])   # print name & number of motif points
    periodic_set.cell                                       # access unit cell

# if you don't care about lazy reading and just want a list of structures
exp_structures = list(amd.CifReader('T2_experimental.cif'))

Setting ```remove_hydrogens=True``` excludes hydrogen atoms from the motif:

In [None]:
# one cif, one structure

gamma_withH = list(amd.CifReader('T2_gamma.cif'))[0]
print(gamma_withH.motif.shape[0])   # n motif points
gamma_noH = list(amd.CifReader('T2_gamma.cif', remove_hydrogens=True))[0]
print(gamma_noH.motif.shape[0])     # n motif points

# Calculating AMDs and PDDs

The main calculator functions ```amd.AMD``` and ```amd.PDD``` accept two arguments: a crystal and an integer ```k```. The crystal can be either an output of a CifReader (or CSDReader) or a pair of numpy arrays ```(motif, cell)``` that you build yourself.
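If a structure is given in fractional coordinates, it must be converted to Cartesian coordinates before building the tuple. The sketch below assumes, without confirmation from this notebook, that the rows of ```cell``` are the lattice vectors and that the motif is expected in Cartesian coordinates, so the conversion is the single matrix product ```frac_motif @ cell```. The body-centred cubic crystal is a made-up example.

```python
import numpy as np

# Hypothetical example: body-centred cubic crystal with edge length 2.
# Assumption: rows of `cell` are lattice vectors and the motif is Cartesian,
# so fractional coordinates are converted with frac_motif @ cell.
cell = 2.0 * np.identity(3)
frac_motif = np.array([[0.0, 0.0, 0.0],
                       [0.5, 0.5, 0.5]])
motif = frac_motif @ cell   # Cartesian positions of the two motif points

crystal = (motif, cell)     # the (motif, cell) pair described above
print(motif)
```

With amd installed, the ```crystal``` tuple would then be passed as ```amd.AMD(crystal, k)```, just like the cubic example further down.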

In [None]:
k = 100

# one AMD from a .cif with one structure
gamma = list(amd.CifReader('T2_gamma.cif'))[0]
gamma_amd = amd.AMD(gamma, k) 
print(gamma_amd)

# list of amds from a .cif with many structures
exp_structures = amd.CifReader('T2_experimental.cif')
experimental_amds = [amd.AMD(periodic_set, k) for periodic_set in exp_structures]

In [None]:
k = 100

# the pdd interface is the same as amd
gamma = list(amd.CifReader('T2_gamma.cif'))[0]
print(gamma)
print('unit cell:\n', gamma.cell)
print('5 points of the motif:\n', gamma.motif[:5])
gamma_pdd = amd.PDD(gamma, k)

print('PDD:\n', gamma_pdd)

In [None]:
# creating a motif, cell pair manually

import numpy as np

k = 100

# from a tuple (motif, cell) of numpy arrays
motif = np.array([[0,0,0]]) # one point at the origin
cell = np.identity(3)       # unit cell = identity (cube with unit edges)
cubic_pdd = amd.PDD((motif, cell), k)

print(cubic_pdd)
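To see where the numbers in a PDD come from, the brute-force sketch below (numpy only, not the package's algorithm) lists, for each motif point, the sorted distances to its k nearest neighbours in the periodic set. It then averages those rows to get an AMD-like vector. It assumes all k nearest neighbours lie within the searched block of lattice translations (```r``` cells in each direction) and gives every motif point equal weight; amd's own output may differ in details such as weights or merged rows. For the unit cube above, the first six distances are 1 (neighbours along the axes) and the next twelve are sqrt(2) (face-diagonal neighbours).

```python
import itertools

import numpy as np


def brute_force_pdd(motif, cell, k, r=2):
    """Sorted distances from each motif point to its k nearest neighbours.

    Sketch for small, well-shaped cells: only lattice translations with integer
    coefficients in [-r, r] are searched (assumed large enough to contain all
    k nearest neighbours). Rows of `cell` are taken as lattice vectors.
    """
    motif = np.asarray(motif, dtype=float)
    cell = np.asarray(cell, dtype=float)
    shifts = np.array(list(itertools.product(range(-r, r + 1), repeat=3))) @ cell
    # every point of the periodic set inside the searched block of cells
    cloud = (motif[None, :, :] + shifts[:, None, :]).reshape(-1, 3)
    dists = np.linalg.norm(motif[:, None, :] - cloud[None, :, :], axis=-1)
    dists.sort(axis=1)
    return dists[:, 1:k + 1]  # drop column 0: each point's zero distance to itself


motif = np.array([[0.0, 0.0, 0.0]])
cell = np.identity(3)
rows = brute_force_pdd(motif, cell, k=18)
weights = np.full(len(rows), 1 / len(rows))  # equal weight per motif point
amd_sketch = weights @ rows                   # weighted average of the rows
print(amd_sketch[:6])  # six distances of 1
print(amd_sketch[6:])  # twelve distances of sqrt(2)
```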

# Comparing AMDs and PDDs

Several functions compare crystals by AMD or PDD. The names 'cdist' and 'pdist' come from scipy: cdist compares one set against another, whereas pdist compares all pairs within a single set. So to compare with AMDs use ```amd.AMD_pdist``` or ```amd.AMD_cdist```, and to compare with PDDs use ```amd.PDD_pdist``` or ```amd.PDD_cdist```. Example:

In [None]:
# compares T2-gamma to all experimental structures by AMD, k=100.

import numpy as np

exp_structures = list(amd.CifReader('T2_experimental.cif'))
gamma = list(amd.CifReader('T2_gamma.cif'))[0]

k = 100
exp_amds = [amd.AMD(s, k) for s in exp_structures]
gamma_amd = amd.AMD(gamma, k)

dm = amd.AMD_cdist(gamma_amd, exp_amds)  # compare gamma to exp structures
closest_indices = np.argsort(dm)[0]      # indices sorted from closest to farthest
for i in closest_indices:
    print(exp_structures[i].name, dm[0][i])  # print structures in order of distance


The outputs of these functions mimic scipy's [cdist](https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.distance.cdist.html) and [pdist](https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.distance.pdist.html): cdist returns a 2D distance matrix and pdist returns a 'condensed distance vector' ([see scipy's squareform](https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.distance.squareform.html)).
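The difference in output shapes can be reproduced with a few lines of numpy. This sketch uses made-up length-3 vectors and is not amd's implementation. It applies the default chebyshev (l-infinity) metric, builds a cdist-style matrix and a pdist-style condensed vector, and then rebuilds the square matrix as squareform does. The hypothetical helper ```condensed_index``` gives the position of pair (i, j) in the condensed vector, following scipy's row-by-row ordering of pairs with i < j.

```python
import numpy as np


def chebyshev_cdist(A, B):
    """2D matrix of l-infinity distances between every row of A and every row of B."""
    A, B = np.atleast_2d(A), np.atleast_2d(B)
    return np.abs(A[:, None, :] - B[None, :, :]).max(axis=-1)


def chebyshev_pdist(A):
    """Condensed vector of pairwise l-infinity distances within A (pairs i < j)."""
    A = np.atleast_2d(A)
    i, j = np.triu_indices(len(A), k=1)  # (0,1), (0,2), ..., (1,2), ...
    return np.abs(A[i] - A[j]).max(axis=-1)


def condensed_to_square(cdm):
    """Rebuild the symmetric n x n distance matrix, like scipy's squareform."""
    cdm = np.asarray(cdm, dtype=float)
    n = int(round((1 + np.sqrt(1 + 8 * len(cdm))) / 2))  # len(cdm) == n*(n-1)/2
    square = np.zeros((n, n))
    i, j = np.triu_indices(n, k=1)
    square[i, j] = cdm
    square[j, i] = cdm
    return square


def condensed_index(i, j, n):
    """Position of the pair (i, j), i < j, in a condensed vector for n items."""
    return n * i - i * (i + 1) // 2 + (j - i - 1)


# toy stand-ins for three AMDs (made-up numbers, not real crystals)
amds = np.array([[1.0, 2.0, 3.0],
                 [1.0, 2.5, 2.0],
                 [0.5, 2.0, 3.5]])

dm = chebyshev_cdist(amds[0], amds)  # shape (1, 3): one reference vs three
cdm = chebyshev_pdist(amds)          # shape (3,): pairs (0,1), (0,2), (1,2)
square = condensed_to_square(cdm)    # shape (3, 3) with a zero diagonal
print(dm)
print(cdm)
print(square[1, 2] == cdm[condensed_index(1, 2, 3)])
```

With scipy installed, ```scipy.spatial.distance.cdist``` and ```pdist``` with ```metric='chebyshev'``` should give the same numbers on these toy vectors.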

These functions take the same optional arguments; if you want to change the defaults to see their effect on comparisons, details are in the readme and documentation. Briefly:
- ```metric``` (default 'chebyshev', aka l-infinity) is the metric used to compare AMDs or the rows of PDDs. It can be any metric accepted by [scipy's cdist function](https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.distance.cdist.html).
- ```k``` (default None) is the value of k used for the comparison. If it is an int, the inputs are simply truncated to that length. It can also be a ```range```, in which case the inputs are compared over that range of k and the resulting sequence of distances is collapsed with a norm, optionally specified with ```ord```, which accepts any ```ord``` accepted by [numpy's norm function](https://numpy.org/doc/stable/reference/generated/numpy.linalg.norm.html).
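The ```range``` behaviour can be pictured with a small helper. This is a hypothetical sketch of the description above, not amd's code, and it makes no claim about the default value of ```ord```. For each k in the range, both AMDs are truncated to their first k entries and compared with the chebyshev metric. The resulting sequence of distances is then collapsed with ```np.linalg.norm(..., ord=ord)```.

```python
import numpy as np


def compare_over_k(amd_a, amd_b, k_range, ord=None):
    """Chebyshev distance between AMD prefixes for each k, collapsed with a norm.

    Hypothetical helper illustrating the described behaviour; amd's own
    functions and defaults may differ.
    """
    amd_a, amd_b = np.asarray(amd_a), np.asarray(amd_b)
    # comparing at a single k means truncating both inputs to their first k entries
    per_k = np.array([np.abs(amd_a[:k] - amd_b[:k]).max() for k in k_range])
    return per_k, np.linalg.norm(per_k, ord=ord)


a = [1.0, 2.0, 3.0]  # made-up AMD values
b = [1.0, 2.5, 2.0]
per_k, collapsed = compare_over_k(a, b, range(1, 4), ord=np.inf)
print(per_k)      # distances for k = 1, 2, 3
print(collapsed)  # with ord=np.inf: the largest per-k distance
```

With ```ord=np.inf``` the collapsed value is the largest per-k distance (1.0 here); with ```ord=1``` it is their sum (1.5 here).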

The functions that use PDDs also take a parameter ```verbose```; if True, they print an ETA to the terminal.

In [None]:
k = 100

exp_structures = list(amd.CifReader('T2_experimental.cif'))
experimental_pdds = [amd.PDD(periodic_set, k) for periodic_set in exp_structures]

# compare experimental structures pairwise by PDD
# returns a condensed distance vector
cdm = amd.PDD_pdist(experimental_pdds)