# Find Similar Compounds
Given our list of "interesting" compounds, check whether any similar stable compounds exist in the OQMD.

In [1]:
from pymatgen.core import Composition
import pandas as pd
import json

## Load in Stable Compounds from OQMD
Read the data file that was used to generate the training set for the DL model.

In [2]:
oqmd_all = pd.read_csv('oqmd_all.txt', sep=r'\s+')
print('Read %d entries'%len(oqmd_all))

Read 506114 entries




In [3]:
oqmd_all['stability'] = pd.to_numeric(oqmd_all['stability'], errors='coerce')

In [4]:
oqmd_all.query('stability <= 0', inplace=True)
print('%d stable compounds'%len(oqmd_all))

21947 stable compounds
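
The two cells above convert the `stability` column to numbers and keep only entries with a non-positive value. As a rough illustration of why the coercion step matters, here is a pure-Python sketch of the same idea. It assumes the raw column may contain a non-numeric placeholder such as `"None"`; that placeholder, the sample values, and the helper `to_number` are hypothetical and not taken from the data file.

```python
import math

def to_number(value):
    """Convert a raw field to float, returning NaN when it cannot be parsed."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return math.nan

# Hypothetical raw stability values as they might appear in a text file.
raw = ["-0.12", "0.0", "0.35", "None"]
stabilities = [to_number(v) for v in raw]

# NaN compares False against any number, so unparsable rows are dropped too.
stable = [s for s in stabilities if s <= 0]
print(stable)  # [-0.12, 0.0]
```

The pandas version behaves the same way: values that become NaN under `errors='coerce'` fail the `stability <= 0` comparison and are filtered out.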


### Generate Lookup Values for Each Entry
Classify each entry by the stoichiometry and group of each element. Examples:
- NaCl is 50% of a group 1 element and 50% of group 17
- NaKCl2 is 25% of two different group 1 elements and 50% of a group 17 element
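
The classification in the examples above can be sketched without pymatgen. This is a minimal illustration, assuming a small hand-written `GROUPS` table (hypothetical, covering only the elements used here) and compositions given as plain element-to-amount dicts. The notebook itself takes group numbers from pymatgen.

```python
# Hypothetical lookup table: periodic-table group for a few elements only.
GROUPS = {"Na": 1, "K": 1, "Cl": 17}

def prototype_from_amounts(amounts):
    """Return a sorted tuple of (group, atomic fraction) pairs."""
    total = sum(amounts.values())
    return tuple(sorted((GROUPS[el], n / total) for el, n in amounts.items()))

nacl = prototype_from_amounts({"Na": 1, "Cl": 1})
nakcl2 = prototype_from_amounts({"Na": 1, "K": 1, "Cl": 2})
kcl = prototype_from_amounts({"K": 1, "Cl": 1})

print(nacl)         # ((1, 0.5), (17, 0.5))
print(nakcl2)       # ((1, 0.25), (1, 0.25), (17, 0.5))
print(nacl == kcl)  # True
```

Because only the group and the fraction enter the key, NaCl and KCl map to the same lookup value, which is what makes the key usable for finding chemically similar compounds.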

In [5]:
oqmd_all['comp_obj'] = [Composition(x) for x in oqmd_all['comp']]

Compute lookup values

In [6]:
def get_prototype(comp):
    """Return a sorted tuple of (group number, atomic fraction) pairs for a composition."""
    return tuple(sorted((e.group, y) for e, y in comp.fractional_composition.items()))

In [7]:
oqmd_all['prototype'] = oqmd_all['comp_obj'].apply(get_prototype)

Get list of examples for each prototype

In [8]:
prototypes = {x: [c.get_integer_formula_and_factor()[0] for c in group['comp_obj']]
              for x, group in oqmd_all.groupby('prototype')}



In [9]:
print('Found %d prototypes'%len(prototypes))

Found 9211 prototypes
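
The grouping step above collects, for every prototype, the formulas of all stable entries that share it. A minimal stdlib sketch of that idea follows; the `entries` list is hypothetical sample data, and `examples_by_prototype` plays the role of the notebook's `prototypes` dictionary.

```python
from collections import defaultdict

# Hypothetical (formula, prototype) pairs standing in for the stable OQMD entries.
entries = [
    ("NaCl", ((1, 0.5), (17, 0.5))),
    ("KBr", ((1, 0.5), (17, 0.5))),
    ("MgO", ((2, 0.5), (16, 0.5))),
]

# Append each formula to the list stored under its prototype key.
examples_by_prototype = defaultdict(list)
for formula, proto in entries:
    examples_by_prototype[proto].append(formula)

examples_by_prototype = dict(examples_by_prototype)
print(len(examples_by_prototype))                    # 2
print(examples_by_prototype[((1, 0.5), (17, 0.5))])  # ['NaCl', 'KBr']
```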


## Check Whether Interesting Compositions Are Similar to Those in the OQMD
Look up the prototype of each interesting composition in the prototype dictionary built above.
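
The lookup itself is a plain dictionary access with an empty-list default. The sketch below shows the pattern on hand-written data: `known` stands in for the prototype dictionary and `candidates` for the interesting compounds, and both are hypothetical miniatures rather than the real tables.

```python
# Hypothetical prototype -> example-formula mapping (the role of `prototypes`).
known = {((1, 0.1), (3, 0.2), (17, 0.7)): ["KSc2F7"]}

# Hand-written prototypes for two candidate compounds.
candidates = {
    "KSc2Br7": ((1, 0.1), (3, 0.2), (17, 0.7)),
    "LiTi4N5": ((1, 0.1), (4, 0.4), (15, 0.5)),
}

# dict.get returns [] when no stable compound shares the candidate's prototype.
matches = {name: known.get(proto, []) for name, proto in candidates.items()}
print(matches["KSc2Br7"])  # ['KSc2F7']
print(matches["LiTi4N5"])  # []
```

Since the keys contain floating-point fractions, the match relies on exact equality. Rounding the fractions before building the key would be one way to make the lookup more tolerant, though the notebook does not do this.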

In [10]:
with open('interesting_compounds.list') as fp:
    interesting_list = json.load(fp)

In [11]:
interesting_list = pd.DataFrame({'composition': interesting_list})

In [12]:
interesting_list['comp_obj'] = [Composition(x) for x in interesting_list['composition']]

In [13]:
interesting_list['prototype'] = interesting_list['comp_obj'].apply(get_prototype)

In [14]:
interesting_list['similar'] = [prototypes.get(x,[]) for x in interesting_list['prototype']]

The following table lists, for each compound from our DL predictions, the stable OQMD compounds that share its prototype. An empty list means no stable OQMD compound has the same prototype.

In [15]:
interesting_list

| | composition | comp_obj | prototype | similar |
|---|---|---|---|---|
| 0 | KSc2Br7 | (K, Sc, Br) | ((1, 0.1), (3, 0.2), (17, 0.7)) | [KSc2F7] |
| 1 | KHfBr5 | (K, Hf, Br) | ((1, 0.14285714285714285), (4, 0.1428571428571... | [RbHfF5] |
| 2 | CsNa2CdF4 | (Cs, Na, Cd, F) | ((1, 0.125), (1, 0.25), (12, 0.125), (17, 0.5)) | [] |
| 3 | Na2CrPbF5 | (Na, Cr, Pb, F) | ((1, 0.2222222222222222), (6, 0.11111111111111... | [] |
| 4 | K2W2N5 | (K, W, N) | ((1, 0.2222222222222222), (6, 0.22222222222222... | [] |
| 5 | LiTi4N5 | (Li, Ti, N) | ((1, 0.1), (4, 0.4), (15, 0.5)) | [] |
| 6 | Ba3NaPtO4 | (Ba, Na, Pt, O) | ((1, 0.1111111111111111), (2, 0.33333333333333... | [] |
| 7 | K2P(WN2)2 | (K, P, W, N) | ((1, 0.2222222222222222), (6, 0.22222222222222... | [] |
| 8 | Sc2SeBr5 | (Sc, Se, Br) | ((3, 0.25), (16, 0.125), (17, 0.625)) | [] |
| 9 | Sc3SBr6 | (Sc, S, Br) | ((3, 0.3), (16, 0.1), (17, 0.6)) | [] |
