### ElemNet Installation

ElemNet is a deep neural network model that takes only elemental compositions as input and uses artificial intelligence to automatically capture the essential chemistry needed to predict materials properties. Because it learns the chemical interactions and similarities between different elements on its own, ElemNet can predict phase diagrams even for chemical systems absent from the training dataset. It does so more accurately than conventional machine learning models built on physical attributes that leverage domain knowledge.

ElemNet is a 17-layered fully connected network for the prediction of formation energy (enthalpy) from elemental compositions only. This repository contains the model weights and a Jupyter notebook for making predictions using the ElemNet model.

Input: a 2D NumPy array in which rows are compounds and columns hold the elemental composition over 86 elements, in this order: ['H', 'Li', 'Be', 'B', 'C', 'N', 'O', 'F', 'Na', 'Mg', 'Al', 'Si', 'P', 'S', 'Cl', 'K', 'Ca', 'Sc', 'Ti', 'V', 'Cr', 'Mn', 'Fe', 'Co', 'Ni', 'Cu', 'Zn', 'Ga', 'Ge', 'As', 'Se', 'Br', 'Kr', 'Rb', 'Sr', 'Y', 'Zr', 'Nb', 'Mo', 'Tc', 'Ru', 'Rh', 'Pd', 'Ag', 'Cd', 'In', 'Sn', 'Sb', 'Te', 'I', 'Xe', 'Cs', 'Ba', 'La', 'Ce', 'Pr', 'Nd', 'Pm', 'Sm', 'Eu', 'Gd', 'Tb', 'Dy', 'Ho', 'Er', 'Tm', 'Yb', 'Lu', 'Hf', 'Ta', 'W', 'Re', 'Os', 'Ir', 'Pt', 'Au', 'Hg', 'Tl', 'Pb', 'Bi', 'Ac', 'Th', 'Pa', 'U', 'Np', 'Pu']. The composition must not contain any element from ['He', 'Ne', 'Ar', 'Po', 'At', 'Rn', 'Fr', 'Ra'].

Output: a 1D NumPy array with one predicted formation energy per compound (one value per input row).
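
The following is a minimal sketch of building that input array. It assumes each column holds the element's atomic fraction, so that each row sums to 1. The text above says only "elemental compositions", so check the normalization against the ElemNet code. `compositions_to_input` and its `{element: amount}` input format are hypothetical and are not part of the ElemNet repository.

```python
import numpy as np

# The 86 input elements, in the column order listed above.
ELEMENTS = [
    'H', 'Li', 'Be', 'B', 'C', 'N', 'O', 'F', 'Na', 'Mg', 'Al', 'Si', 'P', 'S',
    'Cl', 'K', 'Ca', 'Sc', 'Ti', 'V', 'Cr', 'Mn', 'Fe', 'Co', 'Ni', 'Cu', 'Zn',
    'Ga', 'Ge', 'As', 'Se', 'Br', 'Kr', 'Rb', 'Sr', 'Y', 'Zr', 'Nb', 'Mo', 'Tc',
    'Ru', 'Rh', 'Pd', 'Ag', 'Cd', 'In', 'Sn', 'Sb', 'Te', 'I', 'Xe', 'Cs', 'Ba',
    'La', 'Ce', 'Pr', 'Nd', 'Pm', 'Sm', 'Eu', 'Gd', 'Tb', 'Dy', 'Ho', 'Er', 'Tm',
    'Yb', 'Lu', 'Hf', 'Ta', 'W', 'Re', 'Os', 'Ir', 'Pt', 'Au', 'Hg', 'Tl', 'Pb',
    'Bi', 'Ac', 'Th', 'Pa', 'U', 'Np', 'Pu',
]
assert len(ELEMENTS) == 86
INDEX = {el: i for i, el in enumerate(ELEMENTS)}


def compositions_to_input(compositions):
    """Hypothetical helper: list of {element: amount} dicts -> (n, 86) fraction array."""
    X = np.zeros((len(compositions), len(ELEMENTS)))
    for row, comp in enumerate(compositions):
        total = sum(comp.values())
        for el, amount in comp.items():
            if el not in INDEX:
                # Covers He, Ne, Ar, Po, At, Rn, Fr, Ra and anything heavier than Pu
                raise ValueError(f'{el} is not one of the 86 ElemNet input elements')
            X[row, INDEX[el]] = amount / total  # assumed: atomic fraction
    return X


X = compositions_to_input([{'Na': 1, 'Cl': 1}, {'Fe': 2, 'O': 3}])
print(X.shape)
```

The dict-based input keeps the sketch free of pymatgen. A `pymatgen` `Composition` object could be converted the same way.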

Installation directions are in the ElemNet README:

    https://github.com/NU-CUCIS/ElemNet/blob/master/README.md
    
#### Installation requirements
The basic requirement for reusing these environments is a Python 3.6.3 Jupyter environment with the packages listed in requirements.txt.

Some analyses required the use of Magpie, which requires Java JDK 1.7 or greater. See the Magpie documentation for details.

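Installing the requirements failed with:

    ERROR: No matching distribution found for numpy==1.22.0

NumPy 1.22.0 requires Python 3.8 or newer, which conflicts with the Python 3.6.3 listed above. As a fix I upgraded Python to 3.8+ with `brew upgrade python`, but I am not sure yet whether that resolved it.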
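
To find out whether the upgrade took effect, run a quick check in the same environment (Jupyter kernel or terminal) that installs requirements.txt. The helper name `supports_numpy_1_22` is hypothetical. The 3.8 floor is NumPy 1.22's minimum supported Python version.

```python
import sys

# NumPy 1.22.x needs Python 3.8 or newer, so pip finds no numpy==1.22.0
# distribution for an older interpreter such as Python 3.6.3.
NUMPY_1_22_MIN_PYTHON = (3, 8)


def supports_numpy_1_22(version_info=sys.version_info):
    """Return True if a (major, minor, ...) version meets NumPy 1.22's minimum."""
    return tuple(version_info[:2]) >= NUMPY_1_22_MIN_PYTHON


# sys.executable shows which interpreter is actually running. After
# `brew upgrade python`, the Jupyter kernel or the `pip` on PATH may still
# point at an older Python.
print(sys.executable, sys.version.split()[0], supports_numpy_1_22())
```

Installing with `python -m pip install -r requirements.txt` ties pip to the same interpreter this check reports.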

In [1]:

    # Requirements
    import warnings
    import pandas as pd, sklearn, numpy as np, matplotlib.pyplot as plt
    warnings.filterwarnings('ignore')

The imports printed a warning quoting this scikit-learn source line:

    LARGE_SPARSE_SUPPORTED = LooseVersion(scipy_version) >= '0.14.0'


### Running the code

According to the README in the elemnet folder, you can run the code by passing a sample config file to dl_regressors.py as follows:

    python dl_regressors.py --config_file sample/sample-run.config

The config file defines the loss_type, training_data_path, test_data_path, label, input_type [elements_tl for ElemNet] and other runtime parameters. For transfer learning used in paper [2], you need to set 'model_path' to the model checkpoint trained on the larger dataset (OQMD in our case) [e.g. "model_path":"sample/sample_model"] in the config file. The output log from this sample run is provided in the sample/sample.log file.
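
Here is a sketch of writing such a config from Python. It assumes the file is a JSON object, as the `"model_path":"sample/sample_model"` example suggests. Only the key names come from the README text above. Every value except `elements_tl` and `sample/sample_model` is a placeholder; replace it with the corresponding value in sample/sample-run.config. The output file name `my-run.config` is hypothetical.

```python
import json

config = {
    "loss_type": "mae",                      # placeholder value
    "training_data_path": "data/train.csv",  # placeholder path
    "test_data_path": "data/test.csv",       # placeholder path
    "label": "formation_energy",             # placeholder column name
    "input_type": "elements_tl",             # ElemNet input type, per the README
    # Only needed for transfer learning: checkpoint trained on the larger dataset
    "model_path": "sample/sample_model",
}

with open("my-run.config", "w") as f:
    json.dump(config, f, indent=2)
```

It would then be run as `python dl_regressors.py --config_file my-run.config`. The sample config may define other runtime parameters that this sketch omits. Copying sample/sample-run.config and editing it is the safer starting point.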

However, when running 

    python dl_regressors.py --config_file sample/sample-run.config

I get the following error message:
 
    zsh: illegal hardware instruction  python dl_regressors.py --config_file sample/sample-run.config
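
One possible cause, not confirmed here, involves Apple Silicon Macs. There, an x86_64 Python running under Rosetta 2 can abort with "illegal hardware instruction" when a compiled dependency executes instructions that Rosetta does not translate. A TensorFlow wheel compiled with AVX is one example, since Rosetta has historically lacked AVX support. The sketch below reports the interpreter's architecture. It also asks macOS, through the `sysctl.proc_translated` key, whether the process is translated. The function name is hypothetical.

```python
import ctypes
import ctypes.util
import platform


def running_under_rosetta():
    """True if this process is an x86_64 binary translated by Rosetta 2 (macOS only)."""
    if platform.system() != "Darwin":
        return False
    libc = ctypes.CDLL(ctypes.util.find_library("c"))
    translated = ctypes.c_int(0)
    size = ctypes.c_size_t(ctypes.sizeof(translated))
    # sysctlbyname returns 0 on success. On Intel Macs the key does not exist
    # and the call fails, which also means "not translated".
    status = libc.sysctlbyname(
        b"sysctl.proc_translated",
        ctypes.byref(translated),
        ctypes.byref(size),
        None,
        ctypes.c_size_t(0),
    )
    return status == 0 and translated.value == 1


print("machine:", platform.machine(), "| under Rosetta:", running_under_rosetta())
```

Suppose this prints `x86_64` and `True` on an Apple Silicon Mac. In that case, one way to test the hypothesis is to run a native arm64 Python with packages built for arm64.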

## Testing some notebooks

For the most part, the repository has a good walkthrough of many modules and includes many notebooks, so you can copy and paste code after grabbing all the necessary files. The main problem I ran into below was the pymatgen import path (`Composition` now lives under `pymatgen.core`), which was an easy fix. The same update is needed throughout the repository so other users can avoid this problem.

# Find Similar Compounds
Given our list of "interesting" compounds, see if we can find any similar stable compounds in the OQMD

In [21]:

    # IPython's cd magic with no argument changes to the home directory
    cd

/Users/emiljaffal


In [22]:

    import pandas as pd
    import json

In [23]:

    import pymatgen
    from pymatgen.core.composition import Composition
    # Adjusted from `from pymatgen import Composition`, which fails in newer pymatgen.
    # See: https://matsci.org/t/python-problem-with-pymatgen/35720

## Load in Stable Compounds from OQMD
Reading from the data file that was used to generate the training set for the DL model.

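In [24]:

    # Path is relative to the home directory set by `cd` above.
    # sep=r'\s+' replaces delim_whitespace=True, which is deprecated in pandas 2.2+.
    oqmd_all = pd.read_csv('desktop/elemnet/elemnet/data/oqmd_all.data', sep=r'\s+')
    print('Read %d entries' % len(oqmd_all))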

Read 506114 entries


In [25]:

    # Non-numeric stability values become NaN instead of raising an error
    oqmd_all['stability'] = pd.to_numeric(oqmd_all['stability'], errors='coerce')

In [26]:

    oqmd_all.query('stability <= 0', inplace=True)
    print('%d stable compounds' % len(oqmd_all))

21947 stable compounds
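
The coerce-then-query pattern above can be checked on a tiny made-up table (hypothetical rows, not OQMD data). The check shows why `errors='coerce'` matters. An unparseable stability value becomes NaN, and because `NaN <= 0` is False, `query` drops that row along with the unstable ones.

```python
import io

import pandas as pd

# Hypothetical whitespace-delimited data shaped like oqmd_all.data
raw = io.StringIO(
    "comp stability\n"
    "NaCl -0.01\n"
    "KCl 0.00\n"
    "FeO 0.12\n"
    "XY unknown\n"
)
df = pd.read_csv(raw, sep=r"\s+")
# 'unknown' cannot be parsed, so it becomes NaN instead of raising
df["stability"] = pd.to_numeric(df["stability"], errors="coerce")
stable = df.query("stability <= 0")
print(stable["comp"].tolist())  # ['NaCl', 'KCl']
```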


### Generate Lookup Values for Each Entry
Classify each entry by the stoichiometry and group of each element. Examples:
- NaCl is 50% of a group 1 element and 50% of group 17
- NaKCl2 is 25% of two different group 1 elements and 50% of a group 17 element
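
The classification can be sketched without pymatgen. The `GROUP` table below is a hypothetical stand-in covering only the example elements; the notebook reads the group from pymatgen's `Element.group` instead. Each composition maps to a sorted tuple of (group, atomic fraction) pairs, so NaCl and KCl share a prototype.

```python
# Hypothetical group table for the example elements only
GROUP = {"Na": 1, "K": 1, "Cl": 17}


def prototype(composition):
    """Sorted (group, atomic fraction) pairs for an {element: amount} dict."""
    total = sum(composition.values())
    return tuple(sorted((GROUP[el], amount / total) for el, amount in composition.items()))


print(prototype({"Na": 1, "Cl": 1}))          # ((1, 0.5), (17, 0.5))
print(prototype({"Na": 1, "K": 1, "Cl": 2}))  # ((1, 0.25), (1, 0.25), (17, 0.5))
```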

In [27]:

    oqmd_all['comp_obj'] = [Composition(x) for x in oqmd_all['comp']]

Compute lookup values

In [28]:

    def get_prototype(comp):
        """Sorted (group, atomic fraction) pairs describing a pymatgen Composition."""
        return tuple(sorted((e.group, y) for e, y in comp.fractional_composition.items()))

In [29]:

    oqmd_all['prototype'] = oqmd_all['comp_obj'].apply(get_prototype)

Get list of examples for each prototype

In [30]:

    prototypes = {x: [c.get_integer_formula_and_factor()[0] for c in group['comp_obj']]
                  for x, group in oqmd_all.groupby('prototype')}

In [31]:

    print('Found %d prototypes' % len(prototypes))

Found 9211 prototypes
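
What the grouping produces can be seen on a toy table with precomputed prototypes (hypothetical rows standing in for `oqmd_all`). The result has one dictionary entry per distinct prototype, each listing the formulas that share it. Setting `sort=False` only skips sorting the group keys.

```python
import pandas as pd

# Hypothetical rows: formula plus its (group, fraction) prototype
df = pd.DataFrame({
    "formula": ["NaCl", "KCl", "NaKCl2", "MgO"],
    "prototype": [
        ((1, 0.5), (17, 0.5)),
        ((1, 0.5), (17, 0.5)),
        ((1, 0.25), (1, 0.25), (17, 0.5)),
        ((2, 0.5), (16, 0.5)),
    ],
})

# One entry per distinct prototype, listing the formulas that share it
lookup = {proto: group["formula"].tolist()
          for proto, group in df.groupby("prototype", sort=False)}
print(len(lookup))  # 3
```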


## Find if Interesting Compositions are Similar to those in the OQMD
Use the prototype list we worked up earlier

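In [32]:

    with open('interesting_compounds.list') as f:
        interesting_list = json.load(f)

FileNotFoundError: [Errno 2] No such file or directory: 'interesting_compounds.list'

The file is not in the working directory (the home directory, after the `cd` above). As a result, `interesting_list` is never defined, and every cell below fails with a `NameError`.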
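
Below is a sketch of a more explicit loader. It assumes the file is a JSON array of composition strings, which is what the next cell's `pd.DataFrame({'composition': interesting_list})` implies. `load_interesting` and its `base_dir` argument are hypothetical; point `base_dir` at wherever the file actually lives.

```python
import json
from pathlib import Path


def load_interesting(base_dir, name="interesting_compounds.list"):
    """Hypothetical loader: read a JSON list of composition strings from base_dir/name."""
    path = Path(base_dir).expanduser() / name
    if not path.is_file():
        raise FileNotFoundError(f"{path} not found; pass the directory that contains {name}")
    with path.open() as f:
        data = json.load(f)
    # Assumed format: a flat JSON array of strings such as ["NaCl", "Fe2O3"]
    if not isinstance(data, list) or not all(isinstance(x, str) for x in data):
        raise ValueError(f"{path} should contain a JSON array of composition strings")
    return data
```

With an explicit directory, a missing file reports the full path it looked for instead of a bare relative name.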

In [33]:

    interesting_list = pd.DataFrame({'composition': interesting_list})

NameError: name 'interesting_list' is not defined

In [34]:

    interesting_list['comp_obj'] = [Composition(x) for x in interesting_list['composition']]

NameError: name 'interesting_list' is not defined

In [35]:

    interesting_list['prototype'] = interesting_list['comp_obj'].apply(get_prototype)

NameError: name 'interesting_list' is not defined

In [36]:

    interesting_list['similar'] = [prototypes.get(x, []) for x in interesting_list['prototype']]

NameError: name 'interesting_list' is not defined

The following table would show, for each composition from our DL predictions, the stable OQMD compounds that share its prototype. Each "similar" example is a stable compound in the OQMD.
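
The lookup in cell [36] can be exercised offline with a toy `prototypes` dictionary. The sketch pairs it with a hypothetical list of interesting compositions whose prototypes are precomputed. For a composition with no matching stable prototype, `dict.get(p, [])` returns an empty list instead of raising `KeyError`.

```python
import pandas as pd

# Toy lookup in the same shape as `prototypes` (hypothetical entries)
prototypes = {
    ((1, 0.5), (17, 0.5)): ["NaCl", "KCl"],
    ((2, 0.5), (16, 0.5)): ["MgO"],
}

# Hypothetical "interesting" compositions with their prototypes precomputed
interesting = pd.DataFrame({
    "composition": ["RbBr", "Li2O"],
    "prototype": [((1, 0.5), (17, 0.5)), ((1, 2 / 3), (16, 1 / 3))],
})

# Prototypes missing from the lookup get an empty list of similar compounds
interesting["similar"] = [prototypes.get(p, []) for p in interesting["prototype"]]
print(interesting[["composition", "similar"]])
```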

In [37]:

    interesting_list

NameError: name 'interesting_list' is not defined