### pySIPFENN MGF-PSU Workshop (Feb 2023)

This Jupyter notebook is a brief walkthrough of the core functionalities of the **pySIPFENN** package, short for **py**(**S**tructure-**I**nformed **P**rediction of **F**ormation **E**nergy using **N**eural **N**etworks), which is available through the PyPI repository. A fuller description of its capabilities is given on the PSU Phases Research Lab webpage at _phaseslab.com/sipfenn_.

<p align="center">
  <img src="assets/neuralnetcolorized.png" width="1000"/>
</p>

### Install pySIPFENN

Installing pySIPFENN is simple, using either the **PyPI** package repository or a clone of the **GitHub** repository. While not required, it is recommended to first set up a virtual environment using venv or Conda. This ensures that a supported version of Python (3.9+) is used and that there are no dependency conflicts. To create one with Conda:

    conda create -n pysipfenn-workshop python=3.9 jupyter
    conda activate pysipfenn-workshop
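
Once the environment is active, a quick check from within Python (or the notebook kernel) can confirm that the interpreter meets the 3.9+ requirement. This is a minimal sketch; _meets_min_python_ is a hypothetical helper, not part of pySIPFENN.

```python
import sys

def meets_min_python(min_version=(3, 9)):
    """Return True if the running interpreter is at least `min_version` (major, minor)."""
    # sys.version_info[:2] is a plain (major, minor) tuple, so tuples compare element-wise
    return sys.version_info[:2] >= min_version

print(sys.version)
print("Python 3.9+ available:", meets_min_python())
```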

Then install pySIPFENN from PyPI with:

    pip install pysipfenn

Alternatively, you can install pySIPFENN in editable mode from a local copy of the GitHub repository, obtained by cloning it:

    git clone https://github.com/PhasesResearchLab/pySIPFENN.git

Or by downloading it as a ZIP file. Then, move into the pySIPFENN folder and install it in editable (-e) mode:

    cd pySIPFENN
    pip install -e .
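
Either way, you can confirm that the package is visible to the active interpreter before moving on. The sketch below uses only the standard library; the helper names are hypothetical, and it assumes the distribution name _pysipfenn_ matches the import name, as the _pip install pysipfenn_ and _from pysipfenn import Calculator_ lines suggest.

```python
import importlib.util
from importlib.metadata import PackageNotFoundError, version

def installed_version(distribution):
    """Return the installed version string of `distribution`, or None if it is absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None

def is_importable(module):
    """Return True if top-level `module` can be found on the current import path."""
    return importlib.util.find_spec(module) is not None

print("pysipfenn importable:", is_importable("pysipfenn"))
print("pysipfenn version:", installed_version("pysipfenn"))
```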


### Starting pySIPFENN

To use pySIPFENN for straightforward calculations, only the Calculator class is needed. It handles fetching and identifying NN models, as well as running them later.

In [None]:
from pysipfenn import Calculator

Now initialize the Calculator. When run, it should list all detected models
(e.g. ✔ SIPFENN_Krajewski2020 Standard Materials Model)
as well as those declared in the _modelsSIPFENN/models.json_ file but not detected.
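To give a rough idea of what this check involves, here is a minimal sketch that compares the declarations in _models.json_ with the model files present in a folder. It is not pySIPFENN's own code: the helper names are hypothetical, and it assumes each model file is named _KEY.onnx_ (KEY being the field name in _models.json_) and that each entry has a _name_ field, as in the _models.json_ example at the end of this notebook.

```python
import json
from pathlib import Path

def detect_models(declared, available_files):
    """Split declared model keys into (found, missing) lists.

    Assumption: a model counts as available when a file named '<key>.onnx' exists.
    """
    found = [key for key in declared if f"{key}.onnx" in available_files]
    missing = [key for key in declared if f"{key}.onnx" not in available_files]
    return found, missing

def report_models(models_dir):
    """Read models_dir/models.json and print a ✔/✘ line per declared model."""
    models_dir = Path(models_dir)
    declared = json.loads((models_dir / "models.json").read_text(encoding="utf-8"))
    available = {p.name for p in models_dir.glob("*.onnx")}
    found, missing = detect_models(declared, available)
    for key in found:
        print(f"✔ {declared[key]['name']}")
    for key in missing:
        print(f"✘ {declared[key]['name']}")
    return found, missing

# Hypothetical usage: report_models("path/to/modelsSIPFENN")
```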

In [None]:
c = Calculator()

If this is the first run of pySIPFENN and no models are available, the four default models (as of Feb 2023) can be fetched from Zenodo with a single call:

In [None]:
c.downloadModels()
#c.loadModels()

For testing purposes, a single model is sufficient and will download faster. E.g., the lightweight model ('SIPFENN_Krajewski2020_NN24') can be acquired in about 1/30 of the time required to download all four.

In [None]:
#c.downloadModels(network='SIPFENN_Krajewski2020_NN24')
#c.loadModels()

### Simple run from directory

The simplest and most common use of pySIPFENN is to deploy it on a directory (folder) containing atomic structure files such as POSCAR or CIF. To do so, one simply specifies its location and which descriptor (feature vector) should be used. The latter determines which ML models will be run, as each model requires a specific, ordered list of features as input. While the exact model can be specified by the user, by default all applicable models are run, since running a model is 1-3 orders of magnitude faster than calculating the descriptor.

    c.runFromDirectory(directory='myInputFiles', descriptor='KS2022')
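
The rule "the descriptor decides which models run" boils down to filtering the declared models by their _descriptor_ field. The sketch below illustrates that filtering on a hypothetical registry shaped like the _models.json_ entries shown at the end of this notebook; the model keys are made up and this is not pySIPFENN's internal logic.

```python
def models_for_descriptor(declared, descriptor):
    """Return the keys of all declared models that take `descriptor` as input."""
    return [key for key, spec in declared.items() if spec.get("descriptor") == descriptor]

# Hypothetical registry mirroring the structure of models.json entries
declared = {
    "ExampleModel_1": {"name": "Example model 1", "descriptor": "Ward2017"},
    "ExampleModel_2": {"name": "Example model 2", "descriptor": "Ward2017"},
    "ExampleModel_3": {"name": "Example model 3", "descriptor": "KS2022"},
}
print(models_for_descriptor(declared, "Ward2017"))  # ['ExampleModel_1', 'ExampleModel_2']
```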

In this demonstration, a set of test files shipped with pySIPFENN under **directory**

    pysipfenn/tests/testCaseFiles/exampleInputFiles/

is used. However, feel free to change **directory** to point to your own structure files. Please note that the file extension (e.g. _.POSCAR_) is required for the input to be read correctly.
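Since a bare file name such as _POSCAR_ lacks the required extension, it can help to scan your folder first. This minimal sketch only flags names with no extension at all; which specific extensions pySIPFENN recognizes is not covered here, and the helper names are hypothetical.

```python
import os
from pathlib import PurePath

def names_missing_extension(names):
    """Return the (sorted) file names that have no extension, e.g. a bare 'POSCAR'."""
    return sorted(n for n in names if not PurePath(n).suffix)

def check_directory(directory):
    """List regular files in `directory` that lack an extension."""
    files = [n for n in os.listdir(directory) if os.path.isfile(os.path.join(directory, n))]
    return names_missing_extension(files)

print(names_missing_extension(["0-Cr8Fe18Ni4.POSCAR", "POSCAR", "Fe.cif"]))  # ['POSCAR']
```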

Furthermore, you can specify whether pySIPFENN should run in serial or parallel **calculation mode**. The parallel mode is generally faster, but it uses more system resources and may be slower on low-power machines with too few CPU cores. Serial mode is also preferred when there are fewer than 5 calculations per worker, due to multiprocessing overhead.
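The rule of thumb above can be written as a tiny helper that picks the mode from the number of structures and workers. This is a hedged sketch: _suggest_mode_ is hypothetical, the threshold of 5 per worker is taken from the text, and it assumes _'serial'_ is the value pySIPFENN accepts for the serial mode.

```python
def suggest_mode(n_structures, max_workers, min_per_worker=5):
    """Suggest 'serial' or 'parallel' based on calculations per worker.

    Parallel runs only pay off once each worker gets at least `min_per_worker` jobs.
    """
    if max_workers <= 1 or n_structures / max_workers < min_per_worker:
        return "serial"
    return "parallel"

print(suggest_mode(12, 4))  # 3 per worker  -> serial
print(suggest_mode(40, 4))  # 10 per worker -> parallel
```

The result could then be passed as the _mode_ argument, together with the same _max_workers_ value.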

In [None]:
c.runFromDirectory(directory='../pysipfenn/tests/testCaseFiles/exampleInputFiles/',
                   descriptor='Ward2017',
                   mode='parallel', max_workers=4)

Now all results are obtained and stored within the **c** Calculator object, in a few conveniently named exposed variables:
_predictions_ and _inputFiles_. The descriptor data is also retained in _descriptorData_ if needed. Let's look at the first 3 entries.

In [None]:
print(c.inputFiles[:3])
print(c.predictions[:3])
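Because _inputFiles_ and _predictions_ are parallel lists, they can be zipped into one record per structure. The sketch below assumes that each entry of _predictions_ holds one value per model in a known model order; the model names and numbers are made up for illustration, not actual pySIPFENN outputs. For real runs, the dedicated method described next does this for you.

```python
def pair_results(input_files, predictions, model_names):
    """Combine parallel lists into one dict per structure.

    Assumes predictions[i] holds one value per model (in model_names order)
    for the structure in input_files[i].
    """
    if len(input_files) != len(predictions):
        raise ValueError("input_files and predictions must have the same length")
    return [
        {"name": name, **dict(zip(model_names, values))}
        for name, values in zip(input_files, predictions)
    ]

# Made-up example data
results = pair_results(["a.POSCAR", "b.POSCAR"], [[-0.10, -0.12], [0.05, 0.04]], ["M1", "M2"])
print(results)
```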

For user convenience, a few methods are provided for extracting the results. E.g., if pySIPFENN has been run on structure files, the _get_resultDictsWithNames()_ method is available to conveniently pass the results forward in the code.

In [None]:
c.get_resultDictsWithNames()

Alternatively, if the results are to be preserved in a spreadsheet, they can be exported to a CSV file.

In [None]:
c.writeResultsToCSV('myFirstResults_pySIPFENN.csv')
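If you prefer to control the CSV layout yourself, a list of flat result dicts can be written with the standard _csv_ module. This is not how _writeResultsToCSV_ is implemented; it is a sketch that assumes every dict has the same keys (as in the output of _get_resultDictsWithNames()_, presumably), and the data shown is made up.

```python
import csv
import io

def write_result_dicts(result_dicts, stream):
    """Write a list of flat dicts (one per structure) as CSV rows to `stream`."""
    if not result_dicts:
        return
    writer = csv.DictWriter(stream, fieldnames=list(result_dicts[0].keys()))
    writer.writeheader()
    writer.writerows(result_dicts)

# Write to an in-memory buffer to inspect the output
buffer = io.StringIO()
write_result_dicts([{"name": "a.POSCAR", "M1": -0.1, "M2": -0.12}], buffer)
print(buffer.getvalue())
```

To write to disk instead, open the file with _open(path, "w", newline="")_ so the _csv_ module controls the line endings.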

### Sigma-Phase 5-sublattice model

(description)

<p align="center">
  <img src="assets/112-Cr12Fe10Ni8.png" width="1000"/>
</p>

In [None]:
from pymatgen.core import Structure
from pymatgen.symmetry.analyzer import SpacegroupAnalyzer

In [None]:
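# Load an example sigma-phase structure shipped with pySIPFENN
baseStructure = Structure.from_file('../pysipfenn/tests/testCaseFiles/exampleInputFiles/0-Cr8Fe18Ni4.POSCAR')
# Map all species onto one placeholder species 'A', so that the symmetry analysis
# sees only the lattice (sublattices) and not the chemical ordering
baseStructure.replace_species({'Cr': 'A', 'Fe': 'A', 'Ni': 'A'})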

In [None]:
spgA = SpacegroupAnalyzer(baseStructure)

In [None]:
# Sites sharing the same 'equivalent_atoms' value belong to the same sublattice
eqAtoms = spgA.get_symmetry_dataset()['equivalent_atoms']
# Group site indices by their representative (unique) site
uniqueDict = {}
for site, unique in enumerate(eqAtoms):
    uniqueDict.setdefault(unique, []).append(site)
print(uniqueDict)

In [None]:
from itertools import product
allPermutations = list(product(['Fe', 'Cr', 'Ni'], repeat=5))
print(f'Obtained {len(allPermutations)} permutations of the sublattice occupancy')
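Each occupancy pattern fixes the overall composition of the cell, weighted by how many sites each sublattice holds. The sketch below assumes the standard sigma-phase multiplicities 2, 4, 8, 8, 8 (Wyckoff positions 2a, 4f, 8i, 8i, 8j; 30 atoms per cell). The actual sublattice order follows _uniqueDict_, so in this notebook the list should rather come from _[len(sites) for sites in uniqueDict.values()]_.

```python
from collections import Counter
from itertools import product

def composition(occupancy, multiplicities):
    """Count atoms per element for one sublattice occupancy pattern."""
    counts = Counter()
    for element, n_sites in zip(occupancy, multiplicities):
        counts[element] += n_sites
    return counts

# Assumed multiplicities (2a, 4f, 8i, 8i, 8j); replace with the ones from uniqueDict
multiplicities = [2, 4, 8, 8, 8]
occupancies = list(product(['Fe', 'Cr', 'Ni'], repeat=5))
compositions = {occ: composition(occ, multiplicities) for occ in occupancies}
print(dict(sorted(compositions[('Fe', 'Cr', 'Ni', 'Fe', 'Cr')].items())))  # {'Cr': 12, 'Fe': 10, 'Ni': 8}
```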

In [None]:
structList = []
for permutation in allPermutations:
    tempStructure = baseStructure.copy()
    # Assign element `el` to every site of the corresponding sublattice
    for unique, el in zip(uniqueDict, permutation):
        for site in uniqueDict[unique]:
            tempStructure.replace(site, el)
    structList.append(tempStructure)
print(structList[172])

In [None]:
c = Calculator()

In [None]:
c.runModels(structList=structList, descriptor='KS2022', mode='parallel', max_workers=6)

In [None]:
c.get_resultDicts()

In [None]:
c.runModels(structList=structList, descriptor='Ward2017', mode='parallel', max_workers=6)

In [None]:
c.get_resultDicts()
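Since _structList_ was built in the same order as _allPermutations_, the predictions can be mapped back to sublattice occupancies and ranked, e.g. to spot the lowest predicted formation energy. This sketch assumes each entry of _c.predictions_ is a list with one value per model; _rank_configurations_ is a hypothetical helper, and the toy numbers below are made up.

```python
def rank_configurations(occupancies, predictions, model_index=0):
    """Return (occupancy, value) pairs sorted from lowest to highest predicted value.

    Assumes predictions[i] is a list with one value per model for occupancies[i].
    """
    if len(occupancies) != len(predictions):
        raise ValueError("occupancies and predictions must have the same length")
    pairs = [(occ, preds[model_index]) for occ, preds in zip(occupancies, predictions)]
    return sorted(pairs, key=lambda pair: pair[1])

# Toy data standing in for allPermutations and c.predictions
toy_occupancies = [('Fe',) * 5, ('Cr',) * 5, ('Ni',) * 5]
toy_predictions = [[0.02], [0.05], [-0.01]]
for occ, value in rank_configurations(toy_occupancies, toy_predictions):
    print(occ, value)
```

In the notebook, the call would look like _rank_configurations(allPermutations, c.predictions)[:5]_.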

### Add a new model!

Adding a new model that accepts one of the descriptors (feature vectors) implemented in pySIPFENN is very easy, whether it is a model re-trained to fit a specific set of species or an entirely new architecture. It doesn't even need to be created in PyTorch, as pySIPFENN imports models in the ONNX format, which almost all ML frameworks can export to.
To add your model, just put it in the _modelsSIPFENN_ directory in the pySIPFENN location and add a brief definition to the _models.json_ file: a field name matching the model file name, a descriptive name, and the descriptor used. E.g.:

      "SIPFENN_myNewModel": {
        "name": "SIPFENN_Krajewski2022 KS2022 Novel Materials Model - Retrained for Personal Needs",
        "descriptor": "KS2022"
      }
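
Instead of editing _models.json_ by hand, the entry above could also be added with a short script. This is a minimal sketch with hypothetical helper names; the path to _modelsSIPFENN_ depends on where pySIPFENN is installed, and since the file is rewritten with its own formatting, keeping a backup copy first is a good idea.

```python
import json
from pathlib import Path

def with_model_entry(models, key, name, descriptor):
    """Return a copy of `models` with one declaration added or replaced."""
    updated = dict(models)
    updated[key] = {"name": name, "descriptor": descriptor}
    return updated

def register_model(models_json, key, name, descriptor):
    """Read models.json, add the declaration, and write the file back."""
    path = Path(models_json)
    models = json.loads(path.read_text(encoding="utf-8"))
    updated = with_model_entry(models, key, name, descriptor)
    path.write_text(json.dumps(updated, indent=4), encoding="utf-8")
    return updated

# Hypothetical usage (adjust the path to your installation):
# register_model("path/to/modelsSIPFENN/models.json", "SIPFENN_myNewModel",
#                "SIPFENN_Krajewski2022 KS2022 Novel Materials Model - Retrained for Personal Needs",
#                "KS2022")
```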

Then just re-initialize the calculator and everything should be loaded automatically!

In [None]:
c = Calculator()

In [None]:
c.runModels(structList=structList, descriptor='KS2022', mode='parallel', max_workers=6)