### Generate a list of all SMACT-allowed compositions
This notebook is a simple demo of how to use `SMACT` to generate a list of element compositions that could later serve as input to a machine-learned or other heuristic screening model.

This example generates a list of allowed ternary oxides.

In [41]:
### Imports

import smact
import smact.screening as screening
from datetime import datetime
import itertools
import multiprocessing

In [49]:
### Define the elements we are interested in

all_el = smact.element_dictionary()   # A dictionary of all element objects
symbol_list = list(all_el)   # A list of all element symbols (the dictionary keys)

# Decide which elements you want to exclude (e.g. based on radioactivity, toxicity, etc.)
do_not_want = ['H', 'He', 'B', 'C', 'O', 'Ne', 'Ar', 'Kr', 'Tc', 'Xe', 'Rn',
              'Ac', 'Th', 'Pa', 'U', 'Np', 'Pu', 'Am', 'Cm', 'Bk', 'Cf', 'Es', 'Fm', 'Md', 'No', 'Lr',
              'Ra', 'Fr', 'At', 'Po', 'Pm', 'Eu', 'Tb', 'Yb']
symbols = [x for x in symbol_list if x not in do_not_want]
used_elements = [all_el[x] for x in symbols]

In [51]:
### Define a function to perform the SMACT test

def smact_test(els):
    all_compounds = []
    elements = [e.symbol for e in els] + ['O']    # We tack O on because we want oxides
    paul_a, paul_b = els[0].pauling_eneg, els[1].pauling_eneg
    electronegativities = [paul_a, paul_b, all_el['O'].pauling_eneg]
    
    for ox_a, ox_b in itertools.product(els[0].oxidation_states, els[1].oxidation_states):
        ox_states = [ox_a, ox_b, -2]   # We hard code the oxidation state of O we want
        # Charge balance test
        cn_e, cn_r = smact.neutral_ratios(ox_states, threshold = 8)
        if cn_e:
            # Electronegativity test
            electroneg_OK = screening.pauling_test(ox_states, electronegativities)
            if electroneg_OK:
                compound = tuple([elements,cn_r[0]])
                all_compounds.append(compound)
    return all_compounds
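
The charge balance step above can be illustrated without `SMACT`. The sketch below is a minimal brute-force version, not the library's implementation: `neutral_ratios_sketch` is a hypothetical name, and the assumption is that each stoichiometric coefficient runs from 1 to `threshold` and that only coprime sets of coefficients are kept.

```python
from functools import reduce
from itertools import product
from math import gcd


def neutral_ratios_sketch(ox_states, threshold=8):
    """Hypothetical helper: list charge-neutral stoichiometries.

    Tries every set of coefficients from 1 to `threshold` and keeps those
    that are coprime and whose total charge sums to zero.
    """
    ratios = []
    for counts in product(range(1, threshold + 1), repeat=len(ox_states)):
        # Skip multiples of a simpler ratio, e.g. (2, 2, 2) duplicates (1, 1, 1)
        if reduce(gcd, counts) != 1:
            continue
        # Keep the ratio only if the charges cancel exactly
        if sum(ox * n for ox, n in zip(ox_states, counts)) == 0:
            ratios.append(counts)
    return ratios


# Example: oxidation states +1, +2 and -2 (the last one standing in for O)
ratios = neutral_ratios_sketch([1, 2, -2])
print(ratios[0])
```

As in `smact_test`, only the first ratio found is inspected here; the full list contains every neutral stoichiometry within the threshold.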

In [62]:
### Generate the list of element compositions

# Use itertools combinations to generate all the ternary oxide chemical systems
all_el_combos = itertools.combinations(used_elements, 2)

# Use multiprocessing to generate our list
start = datetime.now()
p = multiprocessing.Pool()
result = p.map(smact_test, all_el_combos)
print('Time taken to generate list:  {0}'.format(datetime.now()-start))

Time taken to generate list:  0:00:05.612067


In [72]:
# Flatten the list of lists
flat_list = [item for sublist in result for item in sublist]
print('Number of compositions:  {0}'.format(len(flat_list)))
print('Each list entry looks like this, with elements followed by stoichiometries of each: ')
for i in flat_list[:5]:
    print(i)

Number of compositions:  38922
Each list entry looks like this, with elements followed by stoichiometries of each: 
(['Li', 'Be', 'O'], (1, 1, 1))
(['Li', 'Be', 'O'], (2, 1, 2))
(['Li', 'N', 'O'], (5, 1, 1))
(['Li', 'N', 'O'], (4, 1, 1))
(['Li', 'N', 'O'], (3, 1, 1))


In [None]:
### You could pickle this list for later use

import pickle
with open('Ternary_oxides.pkl', 'wb') as f:
    pickle.dump(flat_list, f)
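
To reuse the pickled list later it has to be read back in binary mode. The sketch below shows a round trip on two example entries copied from the output above; the temporary directory is an assumption made only so that the example leaves no files behind.

```python
import os
import pickle
import tempfile

# Two entries in the same (elements, stoichiometries) layout as the list above
compositions = [(['Li', 'Be', 'O'], (1, 1, 1)), (['Li', 'Be', 'O'], (2, 1, 2))]

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'Ternary_oxides.pkl')
    # Write the list in binary mode
    with open(path, 'wb') as f:
        pickle.dump(compositions, f)
    # Read it back, also in binary mode
    with open(path, 'rb') as f:
        restored = pickle.load(f)

print(restored == compositions)
```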

In [68]:
### You could turn the compositions into reduced formulas using pymatgen

from pymatgen.core import Composition   # Older pymatgen releases also allowed `from pymatgen import Composition`
def comp_maker(comp):
    # Build a formula string such as 'Li2Be1O2' from (elements, stoichiometries)
    form = ''.join('{0}{1}'.format(el, amt) for el, amt in zip(comp[0], comp[1]))
    # Let pymatgen reduce and tidy the formula
    return Composition(form).reduced_formula

pretty_formulas = p.map(comp_maker, flat_list)
print('Each list entry now looks like this: ')
for i in pretty_formulas[:5]:
    print(i)
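
If pymatgen is not available, a rough stand-in can be written in plain Python. The sketch below is an assumption-laden simplification, not pymatgen's algorithm: `formula_string` is a hypothetical helper that only divides the stoichiometries by their greatest common divisor and omits coefficients of 1. Unlike `reduced_formula`, it does not reorder elements or group polyatomic units.

```python
from functools import reduce
from math import gcd


def formula_string(comp):
    """Hypothetical helper: turn (elements, stoichiometries) into a formula string."""
    elements, amounts = comp
    # Divide through by the greatest common divisor to reduce the formula
    divisor = reduce(gcd, amounts)
    parts = []
    for el, amt in zip(elements, amounts):
        amt //= divisor
        # Omit the coefficient when it is 1, as in conventional formulas
        parts.append(el if amt == 1 else '{0}{1}'.format(el, amt))
    return ''.join(parts)


print(formula_string((['Li', 'Be', 'O'], (2, 1, 2))))
```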

In [80]:
### Finally, you could put this into a pandas DataFrame
import pandas as pd
new_data = pd.DataFrame({'pretty_formula': pretty_formulas})
# Drop any duplicate compositions
new_data = new_data.drop_duplicates(subset = 'pretty_formula')
new_data.describe()

       pretty_formula
count           27412
unique          27412
top         Sn(BrO2)2
freq                1


_D. W. Davies_

___

As an alternative to multiprocessing, you could use the cell below to run the SMACT test within nested for-loops. This serial version is slower for the ternary case, and for quaternary combinations it can take many hours.
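
A quick count of chemical systems suggests why the quaternary case is so much heavier. The sketch below assumes a pool of 70 candidate elements, which is a placeholder value and not the actual length of `used_elements`; the oxidation-state loop inside each system adds a further factor on top of this.

```python
from math import comb

n_elements = 70  # Hypothetical pool size; substitute len(used_elements)

# Ternary oxides: choose 2 elements to combine with O
ternary_systems = comb(n_elements, 2)
# Quaternary oxides: choose 3 elements to combine with O
quaternary_systems = comb(n_elements, 3)

print('Ternary systems:    {0}'.format(ternary_systems))
print('Quaternary systems: {0}'.format(quaternary_systems))
print('Ratio:              {0:.1f}'.format(quaternary_systems / ternary_systems))
```

The ratio equals `(n_elements - 2) / 3`, so the number of systems alone grows by more than an order of magnitude before the extra oxidation-state combinations are counted.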

In [84]:
# Without multiprocessing option
all_compounds = []
start = datetime.now()
for els in itertools.combinations(used_elements, 2):
    elements = [e.symbol for e in els] + ['O']
    paul_a, paul_b = els[0].pauling_eneg, els[1].pauling_eneg
    electronegativities = [paul_a, paul_b, 3.44]   # 3.44 is the Pauling electronegativity of O
    for ox_a, ox_b in itertools.product(els[0].oxidation_states, els[1].oxidation_states):
        ox_states = [ox_a, ox_b, -2]   # The oxidation state of O is hard-coded to -2
        # Test for charge balance
        cn_e, cn_r = smact.neutral_ratios(ox_states, threshold=8)
        if cn_e:
            # Electronegativity test
            electroneg_OK = screening.pauling_test(ox_states, electronegativities)
            if electroneg_OK:
                compound = tuple([elements,cn_r[0]])
                all_compounds.append(compound)
print('Time taken to generate list:  {0}'.format(datetime.now() - start))

Time taken to generate list:  0:00:19.048249
