# Basic example of how to use the rylm package to identify the structure of a set of points

Let us start by importing numpy and the key classes we will use from the module.

In [1]:
import numpy as np
from rylm.rylm import Rylm, Similarity

As a first example, define a set of test points in an ideal square planar arrangement. 

Note that the first point is assumed to be the origin of the set of points.

In [2]:
test_points = np.array(
    [
        [0.0, 0.0, 0.0],
        [1.0, 0.0, 0.0],
        [0.0, 1.0, 0.0],
        [-1.0, 0.0, 0.0],
        [0.0, -1.0, 0.0],
    ]
)
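
One reading of this convention, stated here as an assumption rather than a description of the package internals, is that every point after the first is a neighbour whose bond vector is measured from the first point. A minimal sketch in plain Python (the helper name `bond_vectors` is hypothetical):

```python
def bond_vectors(points):
    """Return vectors from the first point (the centre) to every other point."""
    centre = points[0]
    return [[p - c for p, c in zip(point, centre)] for point in points[1:]]


square_planar = [
    [0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [-1.0, 0.0, 0.0],
    [0.0, -1.0, 0.0],
]

vectors = bond_vectors(square_planar)
n_coord = len(vectors)  # number of neighbours around the centre
print(n_coord, vectors)
```

Under this reading the five test points describe one centre with four neighbours, and a structure whose centre is not at `[0, 0, 0]` is handled by the same subtraction.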

## Calculating the Rylm fingerprint

Next, we will instantiate the `Rylm` class that is used to calculate the spherical harmonics fingerprint of the system.  

By default:
- the frequencies used are `frequencies=[4, 6, 8, 10, 12]`
- we include the Wigner 3j values, `include_w=True`
- we include the coordination number in the fingerprint, `include_n_coord=True`

Including the coordination number can help differentiate structures that otherwise have similar geometries.

In [3]:
rylm = Rylm(include_n_coord=True, include_w=True, frequencies=[4, 6, 8, 10, 12])

To calculate the fingerprint, we pass the test points to the `calculate` function

In [4]:
fingerprint_test = rylm.calculate(test_points)

The fingerprint is returned as a dataclass that also stores the settings used to compute it, so that when comparing fingerprints we can check they were computed on an equivalent basis.

The values of the fingerprint itself are stored in a dictionary called `values` within the fingerprint object
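
To illustrate why carrying the settings alongside the values is useful, here is a minimal sketch of a dataclass with the same field names that appear in the notebook output, plus a check that two fingerprints share a basis. `FingerprintSketch` and `same_basis` are hypothetical names for illustration; the real class in rylm may be defined differently.

```python
from dataclasses import dataclass, field


@dataclass
class FingerprintSketch:
    """Hypothetical stand-in mirroring the fields shown in the notebook output."""

    frequencies: list
    include_w: bool
    include_n_coord: bool
    values: dict = field(default_factory=dict)


def same_basis(a, b):
    """True when two fingerprints were computed with identical settings."""
    return (
        a.frequencies == b.frequencies
        and a.include_w == b.include_w
        and a.include_n_coord == b.include_n_coord
    )


fp_a = FingerprintSketch([4, 6], True, True, {"q4": 0.829, "n_coord": 4})
fp_b = FingerprintSketch([4, 6, 8], True, True)
print(same_basis(fp_a, fp_a), same_basis(fp_a, fp_b))
```

Comparing `values` only makes sense when such a check passes, since fingerprints built from different frequencies contain different entries.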

In [7]:
fingerprint_test

Fingerprint(frequencies=[4, 6, 8, 10, 12], include_w=True, include_n_coord=True, values={'q4': np.float32(0.82915616), 'w4': np.float32(0.124970965), 'q6': np.float32(0.5863019), 'w6': np.float32(-0.0072146733), 'q8': np.float32(0.7979466), 'w8': np.float32(0.06380072), 'q10': np.float32(0.61396503), 'w10': np.float32(-0.016309233), 'q12': np.float32(0.78281087), 'w12': np.float32(0.042356532), 'n_coord': 4})
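
The `q` entries above are numerically consistent with the standard Steinhardt bond-orientational order parameter. By the spherical harmonic addition theorem, that quantity can be written using only the angles between bond vectors, so it can be cross-checked without evaluating spherical harmonics. The sketch below is an independent check under that assumption, not the package's implementation; `legendre` and `steinhardt_q` are hypothetical helper names, and the `w` entries are not covered.

```python
import math


def legendre(l, x):
    """Legendre polynomial P_l(x) via the three-term recurrence."""
    if l == 0:
        return 1.0
    p_prev, p_curr = 1.0, x
    for n in range(1, l):
        p_prev, p_curr = p_curr, ((2 * n + 1) * x * p_curr - n * p_prev) / (n + 1)
    return p_curr


def steinhardt_q(points, l):
    """q_l = sqrt((1 / N^2) * sum_ij P_l(cos gamma_ij)) over the N bond vectors.

    Assumption: the first point is the centre and all others are neighbours.
    """
    centre = points[0]
    bonds = []
    for point in points[1:]:
        vec = [p - c for p, c in zip(point, centre)]
        norm = math.sqrt(sum(v * v for v in vec))
        bonds.append([v / norm for v in vec])  # unit bond vector
    total = 0.0
    for u in bonds:
        for v in bonds:
            cos_gamma = max(-1.0, min(1.0, sum(a * b for a, b in zip(u, v))))
            total += legendre(l, cos_gamma)
    return math.sqrt(max(total, 0.0)) / len(bonds)


square_planar = [
    [0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [-1.0, 0.0, 0.0],
    [0.0, -1.0, 0.0],
]

for l in (4, 6, 8, 10):
    print(f"q{l}", steinhardt_q(square_planar, l))
```

For the ideal square planar points, `q4` evaluates to sqrt(11)/4, approximately 0.829156, which agrees with the `q4` value in the fingerprint to single precision.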

## Calculating similarity

Now we can compare the fingerprint of the test points to known fingerprints. For this example, we will compare to fingerprints of square planar, tetrahedral, and octahedral structures.

I've included a few different test cases in the data folder. We will load these in, calculate their fingerprints, and store them in a dict.

In [9]:
from rylm.data import structures as struct


tetrahedral_fingerprint = rylm.calculate(struct.tetrahedron)
square_planar_fingerprint = rylm.calculate(struct.square_planar)
octahedral_fingerprint = rylm.calculate(struct.octahedron)

library_of_fingerprints = {
    "tetrahedral": tetrahedral_fingerprint,
    "square planar": square_planar_fingerprint,
    "octahedral": octahedral_fingerprint,
}

In the original Rylm paper, we used the Euclidean distance metric to compare fingerprints, where a value of 0 is a perfect match. 

Here we will instantiate the `Similarity` class to calculate the similarity between fingerprints.  

By default, `Similarity` uses:
- the Euclidean distance, `metric="euclidean"` (currently the only metric I've added)
- normalization of the values to between 0 and 1, based on the magnitude of the fingerprints, `normalize=True`

In [10]:
similarity_metric = Similarity(metric="euclidean", normalize=True)
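
The exact normalization used by `Similarity` is not spelled out here. One generic way to scale a Euclidean distance into the range 0 to 1 using the magnitudes of the two vectors is to divide by the sum of their norms, since the triangle inequality guarantees the bound. The sketch below uses that choice purely as an assumption; `normalized_euclidean` is a hypothetical helper, not part of rylm, and it should not be expected to reproduce the numbers printed in this notebook.

```python
import math


def normalized_euclidean(a, b):
    """Euclidean distance between two dicts of values, scaled into [0, 1].

    Assumption: the scale factor is ||a|| + ||b||; rylm may normalise differently.
    """
    if a.keys() != b.keys():
        raise ValueError("fingerprints must contain the same entries")
    keys = sorted(a)
    distance = math.sqrt(sum((a[k] - b[k]) ** 2 for k in keys))
    scale = math.sqrt(sum(a[k] ** 2 for k in keys)) + math.sqrt(
        sum(b[k] ** 2 for k in keys)
    )
    return distance / scale if scale > 0 else 0.0


# Toy values chosen only to exercise the function.
first = {"q4": 0.8, "q6": 0.6}
second = {"q4": 0.6, "q6": 0.8}
print(normalized_euclidean(first, first), normalized_euclidean(first, second))
```

Identical inputs give 0, and the result can never exceed 1, which matches the qualitative behaviour described above.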

To calculate the similarity, we call the `calculate` function and pass the test and known fingerprints.

Here, we will loop over the entries in the `library_of_fingerprints` dict we defined above, keeping track of the best match (i.e., the lowest value).

In [11]:
best_match = {"value": -1, "name": "none"}
for key, fingerprint in library_of_fingerprints.items():
    value = similarity_metric.calculate(fingerprint_test, fingerprint)
    print(f"Similarity between test points and {key} structure: {value}")
    if best_match["value"] == -1 or value < best_match["value"]:
        best_match["value"] = value
        best_match["name"] = key
print("\n")
print(
    f"Best match for test points is {best_match['name']} with similarity {best_match['value']}"
)

Similarity between test points and tetrahedral structure: 0.05594049149262718
Similarity between test points and square planar structure: 0.0
Similarity between test points and octahedral structure: 0.1184366611049144


Best match for test points is square planar with similarity 0.0
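
The same bookkeeping can be written more compactly by collecting the scores in a dict and taking the minimum, which also avoids the `-1` sentinel. A minimal sketch using the scores printed above:

```python
# Scores copied from the output above (lower is better, 0 is a perfect match).
scores = {
    "tetrahedral": 0.05594049149262718,
    "square planar": 0.0,
    "octahedral": 0.1184366611049144,
}

# min() over the keys, ranked by their score.
best_name = min(scores, key=scores.get)
print(f"Best match is {best_name} with similarity {scores[best_name]}")
```

Keeping the full `scores` dict is also handy when the runner-up is of interest, for example to judge how clear-cut the assignment is.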


Not surprisingly, we have the best match with the square planar structure, given that we defined an idealized structure that matches the one in the library.

Note that the Rylm fingerprint is invariant to rotation. Defining the square planar structure in a different orientation (e.g., in the x-z plane rather than the x-y plane) gives an equivalent fingerprint.

In [12]:
test_points = np.array(
    [
        [0.0, 0.0, 0.0],
        [1.0, 0.0, 0.0],
        [0.0, 0.0, 1.0],
        [-1.0, 0.0, 0.0],
        [0.0, 0.0, -1.0],
    ]
)

In [14]:
fingerprint_test = rylm.calculate(test_points)

value = similarity_metric.calculate(fingerprint_test, square_planar_fingerprint)
print(value)

2.7923603013628186e-08
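
A rotation leaves every angle between bond vectors unchanged, which is the geometric reason an angle-based descriptor can be independent of orientation. The sketch below rotates the four neighbours from the x-y plane into the x-z plane and checks that the pairwise dot products are preserved. The helper names are hypothetical, and the code illustrates the principle rather than rylm internals.

```python
import math


def rotate_about_x(point, angle):
    """Rotate a 3D point about the x-axis by `angle` radians."""
    x, y, z = point
    c, s = math.cos(angle), math.sin(angle)
    return [x, c * y - s * z, s * y + c * z]


def pairwise_dots(points):
    """Dot products between all ordered pairs of vectors."""
    return [sum(a * b for a, b in zip(p, q)) for p in points for q in points]


neighbours = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [-1.0, 0.0, 0.0], [0.0, -1.0, 0.0]]
rotated = [rotate_about_x(p, math.pi / 2) for p in neighbours]

# Largest change in any pairwise dot product caused by the rotation.
max_change = max(
    abs(a - b) for a, b in zip(pairwise_dots(neighbours), pairwise_dots(rotated))
)
print(rotated)
print(max_change)
```

The small non-zero similarity printed above is plausibly floating-point round-off, given that the fingerprint values are stored as `np.float32`.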


## Non-ideal structures

Spherical harmonics are particularly useful when systems have some level of noise or non-ideality, because the fingerprint is often fairly insensitive to small perturbations. Additionally, we can always add structures to our library that have some predetermined level of noise or non-ideality.

To show this, the following takes the square planar structure and applies a random perturbation to the points.

In [15]:
np.random.seed(42)  # for reproducibility

# To better test, we can perturb the points slightly and see if the best match remains the same
perturbed_test_points = test_points + np.random.normal(0, 0.05, test_points.shape)

print(perturbed_test_points)

[[ 0.02483571 -0.00691322  0.03238443]
 [ 1.07615149 -0.01170767 -0.01170685]
 [ 0.07896064  0.03837174  0.97652628]
 [-0.972872   -0.02317088 -0.02328649]
 [ 0.01209811 -0.09566401 -1.08624589]]
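
To put the size of this perturbation in context, the sketch below measures how far each point moved, using the coordinates printed above. For Gaussian noise with standard deviation 0.05 on each of three coordinates, the root-mean-square displacement is expected to be near 0.05 × sqrt(3), about 0.087, or roughly 9% of the unit bond length.

```python
import math

ideal = [
    [0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0],
    [-1.0, 0.0, 0.0],
    [0.0, 0.0, -1.0],
]
# Perturbed coordinates copied from the output above.
perturbed = [
    [0.02483571, -0.00691322, 0.03238443],
    [1.07615149, -0.01170767, -0.01170685],
    [0.07896064, 0.03837174, 0.97652628],
    [-0.972872, -0.02317088, -0.02328649],
    [0.01209811, -0.09566401, -1.08624589],
]

# Distance each point moved, and the root-mean-square over all points.
displacements = [math.dist(p, q) for p, q in zip(ideal, perturbed)]
rms = math.sqrt(sum(d * d for d in displacements) / len(displacements))
print([round(d, 3) for d in displacements], round(rms, 3))
```

Note that the first point is perturbed as well, so every bond vector is affected by the shift of the centre in addition to the shift of its own neighbour.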


We can calculate the fingerprint as before and compare to the known structures, where we still identify this structure as square planar.

In [16]:
fingerprint_perturbed = rylm.calculate(perturbed_test_points)

best_match_perturbed = {"value": -1, "name": "none"}
print("\nComparing perturbed test points to known structures:")
for key, fingerprint in library_of_fingerprints.items():
    value = similarity_metric.calculate(fingerprint_perturbed, fingerprint)
    print(f"Similarity between perturbed test points and {key} structure: {value}")
    if best_match_perturbed["value"] == -1 or value < best_match_perturbed["value"]:
        best_match_perturbed["value"] = value
        best_match_perturbed["name"] = key
print("\n")
print(
    f"Best match for perturbed test points is {best_match_perturbed['name']} with similarity {best_match_perturbed['value']}"
)


Comparing perturbed test points to known structures:
Similarity between perturbed test points and tetrahedral structure: 0.052710335039327726
Similarity between perturbed test points and square planar structure: 0.004640672707834218
Similarity between perturbed test points and octahedral structure: 0.1189650277141725


Best match for perturbed test points is square planar with similarity 0.004640672707834218


The following cell includes two realistic structures taken from systems in the tmQM dataset. Visually, I identify these as square planar and octahedral, although both are slightly non-ideal.

In [17]:
# should be square planar, identifier ABAFOZ
test_points = np.array(
    [
        [0.30918241, 0.42624341, 0.37986646],
        [0.1073847, 0.40108952, 0.45357848],
        [0.23021752, 0.61345033, 0.30564696],
        [0.50053981, 0.49386629, 0.33569512],
        [0.34640283, 0.22676314, 0.42446571],
    ]
)

fingerprint_test1 = rylm.calculate(test_points)

# The following visually appears octahedral, identifier CECBES
test_points2 = np.array(
    [
        [1.02725949, 0.13879648, 0.38548378],
        [1.14499885, 0.23049694, 0.54779695],
        [1.22150795, 0.08531673, 0.30058462],
        [1.0110957, 0.32119666, 0.30138697],
        [0.9162874, 0.07910506, 0.22024935],
        [1.01663292, -0.04125328, 0.47262867],
        [0.83552833, 0.17913083, 0.47278508],
    ]
)

fingerprint_test2 = rylm.calculate(test_points2)

fingerprints = {"test_system1": fingerprint_test1, "test_system2": fingerprint_test2}


Loop over the two fingerprints and compare each to the known library.

In [19]:

for name, fingerprint in fingerprints.items():
    best_match = {"value": -1, "name": "none"}
    for key, ref_fingerprint in library_of_fingerprints.items():
        value = similarity_metric.calculate(fingerprint, ref_fingerprint)
        print(f"Similarity between {name} and {key} structure: {value}")
        if best_match["value"] == -1 or value < best_match["value"]:
            best_match["value"] = value
            best_match["name"] = key
    print(
        f"\nBest match for {name} is {best_match['name']} with similarity {best_match['value']}\n"
    )


Similarity between test_system1 and tetrahedral structure: 0.04538574365906486
Similarity between test_system1 and square planar structure: 0.02940211971267979
Similarity between test_system1 and octahedral structure: 0.12462587535679397

Best match for test_system1 is square planar with similarity 0.02940211971267979

Similarity between test_system2 and tetrahedral structure: 0.13454565970404667
Similarity between test_system2 and square planar structure: 0.12068986527982108
Similarity between test_system2 and octahedral structure: 0.00798987306621414

Best match for test_system2 is octahedral with similarity 0.00798987306621414
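
As an independent sanity check on the visual assignment, the bond angles around the centre can be computed directly. The sketch below does this for the ABAFOZ coordinates given earlier, assuming they are Cartesian and that the first point is the centre; `bond_angles` is a hypothetical helper, not part of rylm.

```python
import math
from itertools import combinations


def bond_angles(points):
    """Sorted angles (degrees) between all pairs of centre-to-neighbour vectors."""
    centre = points[0]
    bonds = [[p - c for p, c in zip(point, centre)] for point in points[1:]]
    angles = []
    for u, v in combinations(bonds, 2):
        cos_angle = sum(a * b for a, b in zip(u, v)) / (
            math.hypot(*u) * math.hypot(*v)
        )
        # Clamp to guard against round-off pushing the cosine outside [-1, 1].
        angles.append(math.degrees(math.acos(max(-1.0, min(1.0, cos_angle)))))
    return sorted(angles)


# ABAFOZ coordinates copied from the cell above.
abafoz = [
    [0.30918241, 0.42624341, 0.37986646],
    [0.1073847, 0.40108952, 0.45357848],
    [0.23021752, 0.61345033, 0.30564696],
    [0.50053981, 0.49386629, 0.33569512],
    [0.34640283, 0.22676314, 0.42446571],
]

print([round(angle, 1) for angle in bond_angles(abafoz)])
```

An ideal square planar centre gives four 90° angles and two 180° angles. For ABAFOZ the four smallest angles lie between about 80° and 105° and the two largest exceed 160°, which fits a distorted square planar geometry and agrees with the fingerprint match.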

