# Analyze Stable QHs
This notebook analyzes the 55 quaternary Heusler compounds (QHs) discovered in this work to determine whether they share any common features.

In [1]:
from matminer.utils.data import MagpieData
from pymatgen.core import Composition, Element
from scipy.special import comb
import pandas as pd
import numpy as np
import itertools
import json
import os

Variables to change

In [2]:
elem_space = "Ac Ag Al As Au B Ba Be Bi Ca Cd Ce Co Cr Cs Cu Dy Er Eu " \
"Fe Ga Gd Ge Hf Hg Ho In Ir K La Li Lu Mg Mn Mo Na Nb Nd Ni Np Os Pa Pb " \
"Pd Pm Pr Pt Pu Rb Re Rh Ru Sb Sc Si Sm Sn Sr Ta Tb Tc Te Th Ti Tl Tm U V W Y Yb Zn Zr".split(" ")

In [3]:
no_rare_earths = [e for e in elem_space if not Element(e).is_rare_earth_metal]
print('There are %d elements that are not rare earths'%len(no_rare_earths))

There are 52 elements that are not rare earths


## Load in the Dataset
Read in the list of new stable QHs found by the search and the original training set

In [4]:
data = pd.read_csv('stable_compounds.csv')

In [5]:
data.head()

|   | Stable | Formation (DFT) | Hull (DFT) | N_valence |
|---|---|---|---|---|
| 0 | LiMgSnPt | -0.698367 | -0.041047 | 17 |
| 1 | LiZnGaPd | -0.515589 | -0.020172 | 16 |
| 2 | LiAgCdIn | -0.208676 | -0.008758 | 7 |
| 3 | LiAgInAu | -0.347074 | -0.019099 | 6 |
| 4 | LiPdCdIn | -0.464809 | -0.022169 | 16 |


Compute `Composition` objects for each QH

In [6]:
data['comp obj'] = data['Stable'].apply(Composition)

In [7]:
def get_n_valence(elem):
    """Get the number of valence electrons for an element"""
    
    # Noble gases have no valence electrons
    if elem.is_noble_gas:
        return 0
    
    # The valence count is derived from the group number
    g = elem.group
    
    if elem == Element('Lu') or elem == Element('Lr'):
        return 3 # Special case: full f block
    elif elem.is_rare_earth_metal:
        return g
    else:
        # For groups 11-18 the d shell is full, so its 10 electrons are not counted
        # (Cu- and Zn-group elements are treated as having full d shells)
        return g if g < 11 else g - 10
assert get_n_valence(Element('Li')) == 1
assert get_n_valence(Element('Al')) == 3
assert get_n_valence(Element('Ga')) == 3
assert get_n_valence(Element('Pd')) == 10
assert get_n_valence(Element('Cl')) == 7
assert get_n_valence(Element('Lu')) == 3
assert get_n_valence(Element('Xe')) == 0

In [8]:
assert np.isclose(data['N_valence'] - data['comp obj'].apply(lambda x: sum(get_n_valence(e) for e in x)), 0).all()
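The valence-counting rule in `get_n_valence` can be checked by hand against the five rows printed earlier. The sketch below is a minimal, pymatgen-free version of the same rule; the `GROUPS` dictionary is a hypothetical lookup covering only the elements in those rows (the notebook takes group numbers from `Element.group`), and it ignores the noble-gas and rare-earth branches.

```python
import re

# Hypothetical lookup: periodic-table group numbers for the elements in the
# five rows shown above (the notebook uses pymatgen's Element.group instead).
GROUPS = {"Li": 1, "Mg": 2, "Zn": 12, "Cd": 12, "Ga": 13, "In": 13,
          "Sn": 14, "Pd": 10, "Pt": 10, "Ag": 11, "Au": 11}


def n_valence_from_group(group: int) -> int:
    """Groups 1-10 count all electrons implied by the group number;
    groups 11-18 drop the 10 electrons of the full d shell."""
    return group if group < 11 else group - 10


def formula_valence(formula: str) -> int:
    """Sum the valence electrons of an equiatomic formula such as 'LiMgSnPt'."""
    symbols = re.findall(r"[A-Z][a-z]?", formula)  # split into element symbols
    return sum(n_valence_from_group(GROUPS[s]) for s in symbols)


for formula in ["LiMgSnPt", "LiZnGaPd", "LiAgCdIn", "LiAgInAu", "LiPdCdIn"]:
    print(formula, formula_valence(formula))
```

For example, LiMgSnPt gives 1 + 2 + (14 - 10) + 10 = 17, matching the `N_valence` column.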

Load in the training set, keeping only the compositions and their stabilities

In [9]:
def load_training_set():
    """Load in the Quaternary Heuslers from the training set
    
    :return: pd.DataFrame with columns:
        composition - Composition of a QH
        stability - lowest (most stable) stability value recorded at that composition"""
    
    # Load from disk
    with open(os.path.join('..', 'datasets', 'quat-heuslers.json')) as fp:
        temp = json.load(fp)
    
    # Get the composition and stability
    data = pd.DataFrame({
        'composition': [x['composition'] for x in temp['entries']],
        'stability': [x['class']['measured'] for x in temp['entries']]
    })
    
    # Return the lowest stability at each composition
    data.sort_values('stability', ascending=True, inplace=True)
    return data.drop_duplicates('composition', keep='first')
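Sorting by stability and then keeping the first row per composition is a "minimum per group" operation. The stdlib sketch below shows the same logic on a few hypothetical entries (the compositions and values are made up for illustration).

```python
# Toy entries with hypothetical values: several calculations can share a composition.
entries = [
    {"composition": "LiMgSnPt", "stability": 0.05},
    {"composition": "LiMgSnPt", "stability": -0.02},
    {"composition": "CoFeMnSi", "stability": 0.10},
]


def lowest_stability(entries):
    """Keep the minimum 'stability' per composition, mirroring
    sort_values(ascending=True) followed by drop_duplicates(keep='first')."""
    best = {}
    for entry in entries:
        comp, value = entry["composition"], entry["stability"]
        if comp not in best or value < best[comp]:
            best[comp] = value
    return best


print(lowest_stability(entries))
```

In pandas, `data.groupby('composition')['stability'].min()` expresses the same reduction without relying on row order.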

In [10]:
training_data = load_training_set()

In [11]:
training_data['comp obj'] = training_data['composition'].apply(Composition)

Remove training-set entries that contain any element outside the search space

In [12]:
elem_space_pmg = [Element(e) for e in elem_space]

In [13]:
training_data['in_search'] = training_data['comp obj'].apply(lambda x: all([e in elem_space_pmg for e in x.keys()]))
training_data.query('in_search == True', inplace=True)
print('Total number of compositions in training set:', len(training_data))

Total number of compositions in training set: 31924
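The filter above checks each element against a list, which is a linear scan per lookup. A set-based subset test expresses the same condition more directly. This is a minimal sketch with a hypothetical element subset; with pymatgen, `set(x.keys()) <= set(elem_space_pmg)` should behave the same way, since `Element` objects are hashable.

```python
# Hypothetical subset of allowed elements, for illustration only.
allowed = {"Li", "Mg", "Sn", "Pt", "Co", "Fe", "Mn", "Si"}


def in_search_space(symbols, allowed):
    """True if every element of a composition is in the allowed set."""
    return set(symbols) <= allowed  # subset test with average O(1) lookups


print(in_search_space(["Li", "Mg", "Sn", "Pt"], allowed))  # all allowed
print(in_search_space(["Li", "Mg", "Sn", "Xe"], allowed))  # Xe is not allowed
```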


## Are there any common elements?
Test whether any elements appear especially frequently in this dataset

In [14]:
search_space_size = comb(len(no_rare_earths), 4) # Number of compositions in the search space

In [15]:
possible_appearances = comb(len(no_rare_earths) - 1, 3) # Number of compositions in the search space that contain a given element

In [16]:
expected_n_appearences = possible_appearances / search_space_size * 55
# Each element appears in ([n-1] choose 3) of the (n choose 4) compositions. I approximate a random
# selection by assuming that each of the 55 QHs contains a given element with this probability,
# and that repeated draws do not change it (reasonable since the search space is very large)
print('If elements appear randomly, each element should appear in %.2f results'%expected_n_appearences)

If elements appear randomly, each element should appear in 4.23 results
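The expected count can be written out explicitly. A composition containing a given element is one of C(n-1, 3) out of C(n, 4) compositions, a fraction that simplifies to 4/n; with n = 52 elements and 55 results this gives 55 * 4/52, about 4.23. Because the 55 results are drawn without replacement, the count is hypergeometric, and its mean (draws * successes / population) equals this same value, so the large-population approximation is not actually needed for the expectation. A minimal check with `math.comb`:

```python
from math import comb

n_elements = 52   # elements left after removing rare earths
n_found = 55      # new stable QHs

search_space_size = comb(n_elements, 4)           # all 4-element compositions
containing_one = comb(n_elements - 1, 3)          # compositions containing a given element
p_contains = containing_one / search_space_size   # simplifies to 4 / n_elements

expected = n_found * p_contains
print(search_space_size, containing_one, round(expected, 2))
```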


Compute how many times each element appears

In [17]:
def count_appearances(data, elems=no_rare_earths):
    """Count how many compositions in a dataset contain each element
    
    :param data: DataFrame, dataset to assess (must have a 'comp obj' column)
    :param elems: list of str, element symbols to count
    :return: DataFrame, with columns:
        element - Element
        count - Number of compositions that contain that element"""
    appearances = pd.DataFrame({'element': elems})
    appearances['element'] = appearances['element'].apply(Element)
    appearances['count'] = appearances['element'].apply(lambda x: sum(x in c for c in data['comp obj']))
    return appearances
appearances = count_appearances(data)

In [18]:
appearances['ratio'] = appearances['count'] / expected_n_appearences

In [19]:
appearances.sort_values('ratio', ascending=False, inplace=True)
appearances.head(10)

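|   | element | count | ratio |
|---|---|---|---|
| 22 | Li | 43 | 10.163636 |
| 0 | Ag | 21 | 4.963636 |
| 23 | Mg | 19 | 4.490909 |
| 19 | In | 16 | 3.781818 |
| 38 | Sc | 15 | 3.545455 |
| 3 | Au | 15 | 3.545455 |
| 31 | Pd | 14 | 3.309091 |
| 50 | Zn | 13 | 3.072727 |
| 15 | Ga | 11 | 2.6 |
| 1 | Al | 9 | 2.127273 |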


*Finding*: Li appears in this dataset far more often than random selection would predict: it is in 43 of the 55 QHs, about 10 times the expected 4.23.
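One way to quantify how unlikely the Li count is under the random-selection model used above is the hypergeometric upper-tail probability of Li appearing in at least 43 of the 55 results. This is a sketch added for illustration, not part of the original analysis; it computes the tail exactly with `math.comb`.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(population N, K successes, n draws)."""
    total = comb(N, n)
    upper = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return upper / total  # exact integer arithmetic, one final division


N = comb(52, 4)  # 4-element compositions in the search space
K = comb(51, 3)  # compositions that contain Li
n = 55           # stable QHs found
p_li = hypergeom_sf(43, N, K, n)
print(f"P(Li in >= 43 of 55 by chance) = {p_li:.3g}")
```

The result is vanishingly small, so even after accounting for the 52 elements that were checked, the Li excess cannot be explained by random selection (under this chemistry-agnostic model).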

In [20]:
appearances.tail()

|   | element | count | ratio |
|---|---|---|---|
| 14 | Fe | 0 | 0.0 |
| 16 | Ge | 0 | 0.0 |
| 17 | Hf | 0 | 0.0 |
| 21 | K | 0 | 0.0 |
| 26 | Na | 0 | 0.0 |


In [21]:
appearances[appearances['element'] == Element('Rb')]

|   | element | count | ratio |
|---|---|---|---|
| 33 | Rb | 0 | 0.0 |


*Question*: Does Li appear often in the training set?

In [22]:
training_set_appearances = count_appearances(training_data)

In [23]:
training_set_appearances['number stable'] = count_appearances(training_data.query('stability <= 0'))['count']
training_set_appearances['frac stable'] = training_set_appearances['number stable'] / training_set_appearances['count']

In [24]:
training_set_appearances['frac searched'] = training_set_appearances['count'] / possible_appearances

In [25]:
training_set_appearances.sort_values('number stable', ascending=False).head(10)

|   | element | count | number stable | frac stable | frac searched |
|---|---|---|---|---|---|
| 22 | Li | 3583 | 136 | 0.037957 | 0.172053 |
| 10 | Co | 6957 | 53 | 0.007618 | 0.33407 |
| 1 | Al | 2319 | 52 | 0.022423 | 0.111357 |
| 20 | Ir | 2009 | 49 | 0.02439 | 0.096471 |
| 45 | Ti | 2397 | 47 | 0.019608 | 0.115102 |
| 15 | Ga | 2319 | 46 | 0.019836 | 0.111357 |
| 13 | Cu | 7615 | 43 | 0.005647 | 0.365666 |
| 16 | Ge | 2367 | 31 | 0.013097 | 0.113661 |
| 39 | Si | 2364 | 31 | 0.013113 | 0.113517 |
| 14 | Fe | 6949 | 29 | 0.004173 | 0.333685 |


*Finding*: Li occurs most frequently among the stable compounds in the training set. However, Ag and Mg, which are common among the new QHs, are not among the elements that appear most often in stable training-set compounds.
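The two derived columns in the training-set table are simple ratios: `frac stable` is the number of stable compositions divided by the number evaluated, and `frac searched` is the number evaluated divided by C(51, 3), the number of search-space compositions containing that element. The sketch below recomputes them for Li and Co using counts copied from the table.

```python
from math import comb

possible_appearances = comb(51, 3)  # search-space compositions containing a given element

# (total evaluated, number stable), copied from the training-set table
counts = {"Li": (3583, 136), "Co": (6957, 53)}

rows = {
    elem: (stable / total, total / possible_appearances)
    for elem, (total, stable) in counts.items()
}
for elem, (frac_stable, frac_searched) in rows.items():
    print(f"{elem}: frac stable = {frac_stable:.6f}, frac searched = {frac_searched:.6f}")
```

Li has the highest stable fraction in this list even though only about 17% of the Li-containing compositions have been evaluated.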

## Are there common combinations of groups?
Test whether our stable compounds contain elements from certain groups more frequently than others

In [26]:
data['groups'] = data['comp obj'].apply(lambda x: tuple(sorted(e.group for e in x.keys())))

In [27]:
data['groups'].value_counts().head(10)

(1, 3, 10, 13)     6
(1, 3, 11, 13)     4
(1, 10, 12, 13)    4
(1, 11, 12, 13)    3
(2, 3, 11, 12)     3
(1, 2, 10, 13)     3
(1, 11, 11, 13)    3
(1, 2, 11, 13)     3
(1, 2, 10, 12)     2
(3, 11, 11, 13)    2
Name: groups, dtype: int64

In [28]:
Li_plus_1011 = sum(data['groups'].apply(lambda x: 1 in x and (10 in x or 11 in x)))
print('%d compounds contain a group 1 element and an element from group 10 or 11'%Li_plus_1011)

38 compounds contain a group 1 element and an element from group 10 or 11
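The grouping logic above reduces each composition to a sorted tuple of group numbers and then counts tuples or tests them with a predicate. The stdlib sketch below applies the same idea to the five formulas shown at the top of the notebook; `GROUPS` is a hypothetical lookup standing in for pymatgen's `Element.group`.

```python
import re
from collections import Counter

# Hypothetical group lookup for the example formulas only.
GROUPS = {"Li": 1, "Mg": 2, "Zn": 12, "Cd": 12, "Ga": 13, "In": 13,
          "Sn": 14, "Pd": 10, "Pt": 10, "Ag": 11, "Au": 11}


def group_signature(formula):
    """Sorted tuple of group numbers, e.g. 'LiAgInAu' -> (1, 11, 11, 13)."""
    return tuple(sorted(GROUPS[s] for s in re.findall(r"[A-Z][a-z]?", formula)))


def has_group1_and_10_or_11(signature):
    """Same predicate as the notebook: group 1 plus group 10 or 11."""
    return 1 in signature and (10 in signature or 11 in signature)


formulas = ["LiMgSnPt", "LiZnGaPd", "LiAgCdIn", "LiAgInAu", "LiPdCdIn"]
signatures = [group_signature(f) for f in formulas]
print(Counter(signatures).most_common())
print(sum(has_group1_and_10_or_11(s) for s in signatures))
```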


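*Finding*: The most common group combinations contain elements from groups 1, 3, 10 or 11, and 13. More broadly, 38 of the 55 compounds pair a group 1 element with an element from group 10 or 11.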

## What is the most common number of valence electrons?

## Did we run out of 18 electron compounds to search?
We find few 18- or 24-electron compounds among the new QHs. Is that simply because there are none left in the search space?

In [29]:
all_possible_compounds = pd.DataFrame({'composition': [''.join(c) for c in itertools.combinations(no_rare_earths, 4)]})
print('Generated %d possible compounds'%len(all_possible_compounds))

Generated 270725 possible compounds


In [30]:
all_possible_compounds['comp obj'] = all_possible_compounds['composition'].apply(Composition)

In [31]:
train_set_comps = set(training_data['comp obj'].apply(lambda x: x.reduced_formula))
all_possible_compounds['in_train'] = all_possible_compounds['comp obj'].apply(lambda x: x.reduced_formula)\
    .apply(lambda x: x in train_set_comps)

In [32]:
print('%d compounds of the search space have not yet been evaluated'%sum(np.logical_not(all_possible_compounds['in_train'])))

242809 compounds of the search space have not yet been evaluated
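The membership test above works because `reduced_formula` gives the same string regardless of the order in which the elements were written. For equiatomic formulas, a frozenset of element symbols is another order-independent key. This is a minimal sketch on a hypothetical five-element space and a one-entry training set; it ignores stoichiometric counts, so it only applies to 1:1:1:1 compositions.

```python
import re
from itertools import combinations


def composition_key(formula):
    """Order-independent key for an equiatomic formula such as 'LiMgSnPt'."""
    return frozenset(re.findall(r"[A-Z][a-z]?", formula))


elements = ["Li", "Mg", "Sn", "Pt", "Pd"]             # hypothetical element set
trained = {composition_key(f) for f in ["PtSnMgLi"]}  # hypothetical training set

candidates = ["".join(c) for c in combinations(elements, 4)]
unevaluated = [c for c in candidates if composition_key(c) not in trained]
print(len(candidates), len(unevaluated))
```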


In [33]:
all_possible_compounds['N_valence'] = all_possible_compounds['comp obj'].apply(lambda x: sum(get_n_valence(e) for e in x))

In [34]:
n_unfound_1824 = len(all_possible_compounds.query('in_train == False and (N_valence == 18 or N_valence == 24)'))
print('%d compounds have not been searched and have 18 or 24 electrons'%n_unfound_1824)
print('If we picked randomly, we would expect to find %d compounds with 18 or 24 electrons'%(
    n_unfound_1824 / sum(np.logical_not(all_possible_compounds['in_train'])) * len(data)))

22876 compounds have not been searched and have 18 or 24 electrons
If we picked randomly, we would expect to find 5 compounds with 18 or 24 electrons
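The printed expectation is truncated by `%d`; the unrounded value and a hypergeometric lower-tail probability put the observed count of 3 in context. This sketch reuses the counts printed above and the same random-selection assumption; it is an illustration, not part of the original analysis.

```python
from math import comb

N_unsearched = 242809  # unevaluated compositions (from the output above)
K_1824 = 22876         # of those, compositions with 18 or 24 valence electrons
n_found = 55           # new stable QHs
observed = 3           # new QHs with 18 or 24 electrons

expected = n_found * K_1824 / N_unsearched


def hypergeom_cdf(k, N, K, n):
    """P(X <= k) for X ~ Hypergeometric(population N, K successes, n draws)."""
    lower = sum(comb(K, i) * comb(N - K, n - i) for i in range(0, k + 1))
    return lower / comb(N, n)


p_low = hypergeom_cdf(observed, N_unsearched, K_1824, n_found)
print(f"expected = {expected:.2f}, P(X <= {observed}) = {p_low:.2f}")
```

The expected count is about 5.2, and the probability of seeing 3 or fewer by chance comes out around 0.2, so under this model the shortfall is modest.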


In [35]:
print('Only %d new QHs have 18 or 24 electrons'%len(
    data.query('N_valence == 18 or N_valence == 24')
))

Only 3 new QHs have 18 or 24 electrons


*Finding*: Plenty of unsearched 18- and 24-electron compounds remain (22876), so we did not run out of them; we simply predict few of them to be stable.