This is a random forest model that predicts the space group of inorganic materials from their composition. It combines a new feature set with standard composition features such as Magpie descriptors.
If you find our software useful, please cite it as:
Li Y, Dong R, Yang W, et al. Composition based crystal materials symmetry prediction using machine learning with enhanced descriptors[J]. Computational Materials Science, 2021, 198: 110686.
Developed on 2021-04-30 at:
School of Mechanical Engineering
Guizhou University, Guiyang, China
Machine Learning and Evolution Laboratory
Department of Computer Science and Engineering
University of South Carolina, Columbia, USA
Our space group prediction model for cubic materials is trained on the dataset `ML/cubic.csv` using the script `ML/RF_of_us.py`. The datasets used to train the models for the other crystal systems can be downloaded here: data.csv. The ML folder also contains the frameworks from two previous works on space group classification.
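The Magpie descriptors mentioned above summarize elemental properties over a composition with simple statistics. As a rough illustration only, the sketch below computes Magpie-style statistics (fraction-weighted mean and average deviation, plus minimum, maximum and range) of a single property, the atomic number. It is not this model's feature set: real Magpie descriptors cover many more elemental properties, and the helper names `parse_formula` and `magpie_like_stats` and the small `ATOMIC_NUMBER` table are hypothetical choices made for this example.

```python
import re
from typing import Dict

# Hypothetical minimal property table: atomic numbers for the elements used in
# this README's examples only. A real featurizer covers the whole periodic table.
ATOMIC_NUMBER = {"O": 8, "Na": 11, "Al": 13, "Si": 14, "S": 16, "Cl": 17, "Zn": 30, "Bi": 83}

_TOKEN = re.compile(r"([A-Z][a-z]?)(\d*\.?\d*)")


def parse_formula(formula: str) -> Dict[str, float]:
    """Parse a simple formula such as 'Na4Cl4O12' into {element: count}.
    Parentheses and hydrates are not supported in this sketch."""
    tokens = _TOKEN.findall(formula)
    if not tokens or "".join(el + n for el, n in tokens) != formula:
        raise ValueError(f"unsupported formula: {formula!r}")
    counts: Dict[str, float] = {}
    for element, amount in tokens:
        counts[element] = counts.get(element, 0.0) + (float(amount) if amount else 1.0)
    return counts


def magpie_like_stats(formula: str) -> Dict[str, float]:
    """Magpie-style statistics of the atomic number over a composition.
    Raises KeyError for elements missing from ATOMIC_NUMBER."""
    counts = parse_formula(formula)
    total = sum(counts.values())
    fractions = {el: n / total for el, n in counts.items()}
    values = {el: ATOMIC_NUMBER[el] for el in counts}
    mean = sum(fractions[el] * values[el] for el in counts)
    # Fraction-weighted mean absolute deviation from the weighted mean.
    avg_dev = sum(fractions[el] * abs(values[el] - mean) for el in counts)
    return {
        "mean": mean,
        "avg_dev": avg_dev,
        "minimum": min(values.values()),
        "maximum": max(values.values()),
        "range": max(values.values()) - min(values.values()),
    }
```

For `Na4Cl4O12` the atomic fractions are 0.2, 0.2 and 0.6, so the weighted mean atomic number is 0.2·11 + 0.2·17 + 0.6·8 = 10.4.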
Space group prediction performance for each crystal system (10-fold cross-validation):
Crystal system | Accuracy | F1 score | Recall | Precision | MCC |
---|---|---|---|---|---|
Triclinic | 0.835±0.013 | 0.834±0.013 | 0.835±0.013 | 0.835±0.013 | 0.665±0.026 |
Monoclinic | 0.712±0.009 | 0.703±0.010 | 0.712±0.009 | 0.715±0.010 | 0.647±0.011 |
Orthorhombic | 0.755±0.005 | 0.746±0.006 | 0.755±0.005 | 0.759±0.005 | 0.729±0.006 |
Tetragonal | 0.849±0.013 | 0.840±0.014 | 0.849±0.013 | 0.846±0.013 | 0.832±0.015 |
Trigonal | 0.824±0.012 | 0.818±0.012 | 0.824±0.012 | 0.823±0.013 | 0.797±0.014 |
Hexagonal | 0.909±0.008 | 0.906±0.008 | 0.909±0.008 | 0.908±0.008 | 0.888±0.010 |
Cubic | 0.961±0.006 | 0.959±0.006 | 0.961±0.006 | 0.960±0.005 | 0.945±0.008 |
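The metrics in the table can be computed from lists of true and predicted space group numbers. The pure-Python sketch below assumes that F1 score, recall and precision are support-weighted averages over space groups. This is an assumption, but it is consistent with Recall equaling Accuracy in every row, because weighted recall is always identical to accuracy. MCC is computed with its multiclass generalization. Under 10-fold cross-validation these values would be computed on each held-out fold and then summarized. In practice scikit-learn's `accuracy_score`, `f1_score(average="weighted")` and `matthews_corrcoef` would normally be used; the function name `space_group_metrics` is hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Sequence


def space_group_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, support-weighted precision/recall/F1, and multiclass MCC."""
    if len(y_true) == 0 or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    n = len(y_true)
    true_count = Counter(y_true)   # support of each space group
    pred_count = Counter(y_pred)   # how often each space group was predicted
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # true positives
    correct = sum(hits.values())

    precision = recall = f1 = 0.0
    for label, support in true_count.items():
        tp = hits[label]
        p = tp / pred_count[label] if pred_count[label] else 0.0
        r = tp / support
        f = 2 * p * r / (p + r) if p + r else 0.0
        weight = support / n       # weight each class by its share of samples
        precision += weight * p
        recall += weight * r
        f1 += weight * f

    # Multiclass MCC (Gorodkin's R_K), the form scikit-learn's matthews_corrcoef uses.
    labels = set(true_count) | set(pred_count)
    cov_tp = correct * n - sum(pred_count[k] * true_count[k] for k in labels)
    cov_pp = n * n - sum(pred_count[k] ** 2 for k in labels)
    cov_tt = n * n - sum(true_count[k] ** 2 for k in labels)
    mcc = cov_tp / math.sqrt(cov_pp * cov_tt) if cov_pp and cov_tt else 0.0

    return {"accuracy": correct / n, "precision": precision,
            "recall": recall, "f1": f1, "mcc": mcc}
```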
To use this model, first create an environment with the required dependencies. With Anaconda, this can be done with the following commands:
```bash
conda create --name SG_predict python=3.6
conda activate SG_predict
conda install --channel conda-forge pymatgen
pip install matminer
pip install scikit-learn==0.24.1
```
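To confirm that the pinned scikit-learn version is the one actually imported inside the environment, you can check each package's `__version__` attribute. This is a minimal sketch: `installed_version` is a hypothetical helper that returns `None` both for packages that fail to import and for packages that do not expose `__version__` at the top level, which some packages may not. It only uses `importlib.import_module`, so it also runs under Python 3.6.

```python
import importlib
from typing import Optional


def installed_version(module_name: str) -> Optional[str]:
    """Return module.__version__, or None if the module cannot be imported
    or does not define a top-level __version__."""
    try:
        module = importlib.import_module(module_name)
    except Exception:  # ImportError, or any other error raised at import time
        return None
    return getattr(module, "__version__", None)
```

scikit-learn is imported as `sklearn`, so inside the SG_predict environment `installed_version("sklearn")` should return `'0.24.1'`.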
Once the environment is set up, install SG_predict with the following commands:
```bash
conda activate SG_predict
git clone https://github.com/Yuxinya/SG_predict
cd SG_predict
pip install -e .
```
Pre-trained models are stored online (Google Drive/figshare). Download the file model.zip, copy it into the SG_predict directory, and extract it; the `model` folder should then be located in the SG_predict directory. If you skip this step, the model can only classify a material when its crystal system is provided.
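The extraction step can also be scripted. The sketch below assumes model.zip has already been downloaded and that its top-level entry is a folder named `model`; the function name `extract_models` and the example paths are hypothetical.

```python
import zipfile
from pathlib import Path


def extract_models(zip_path: str, repo_dir: str) -> Path:
    """Extract model.zip into the SG_predict checkout and return the model folder.
    Assumes the archive's top-level entry is a folder named 'model'."""
    repo = Path(repo_dir)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(str(repo))  # str() keeps this working on early Python 3.6
    model_dir = repo / "model"
    if not model_dir.is_dir():
        raise FileNotFoundError(
            f"{model_dir} not found after extraction; check the archive layout")
    return model_dir
```

Usage: `extract_models("model.zip", "/path/to/SG_predict")`.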
To test your installation, run the following example from your SG_predict directory:
```bash
cd /path/to/SG_predict/
python predict.py -i full_formula -s crystal_system
```
For example:
```bash
python predict.py -i Zn24Si24Bi16O96 -s cubic
python predict.py -i Zn24Si24Bi16O96
```
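In the examples in this README, every element in a full formula carries an explicit count, as in `Zn24Si24Bi16O96` and `Na8Al6Si6S1O28` (note the explicit `S1`). If your formulas omit counts of 1 (e.g. `NaCl`), a small helper can add them. This sketch is based on that reading of the examples; this README does not say whether predict.py itself requires explicit 1s, and `to_full_formula` is a hypothetical name.

```python
import re

_TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")


def to_full_formula(formula: str) -> str:
    """Rewrite e.g. 'NaCl' as 'Na1Cl1', keeping element order.
    Only simple formulas (no parentheses, no fractional counts) are handled."""
    tokens = _TOKEN.findall(formula)
    if not tokens or "".join(el + n for el, n in tokens) != formula:
        raise ValueError(f"unsupported formula: {formula!r}")
    return "".join(el + (n if n else "1") for el, n in tokens)
```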
The following crystal_system values are accepted:
```
crystal # crystal system unknown
cubic
hexagonal
trigonal
tetragonal
orthorhombic
monoclinic
triclinic
```
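To predict many formulas, the command above can be driven from Python. The sketch assumes it is run from the SG_predict directory and that predict.py takes exactly the `-i`/`-s` flags documented here; when no crystal system is given it omits `-s`, mirroring the second example above. The names `ACCEPTED_SYSTEMS`, `build_predict_command` and `run_predict` are hypothetical.

```python
import subprocess
import sys
from typing import List, Optional

# Values listed in this README; "crystal" means the crystal system is unknown.
ACCEPTED_SYSTEMS = {"crystal", "cubic", "hexagonal", "trigonal", "tetragonal",
                    "orthorhombic", "monoclinic", "triclinic"}


def build_predict_command(formula: str, crystal_system: Optional[str] = None) -> List[str]:
    """Return the argument list for predict.py, validating the crystal system."""
    cmd = [sys.executable, "predict.py", "-i", formula]
    if crystal_system is not None:
        system = crystal_system.lower()
        if system not in ACCEPTED_SYSTEMS:
            raise ValueError(f"unknown crystal system: {crystal_system!r}")
        cmd += ["-s", system]
    return cmd


def run_predict(formula: str, crystal_system: Optional[str] = None) -> str:
    """Run predict.py and return its raw stdout (the output format is not parsed here)."""
    result = subprocess.run(build_predict_command(formula, crystal_system),
                            stdout=subprocess.PIPE, universal_newlines=True, check=True)
    return result.stdout
```

`subprocess.run` is called with `stdout=subprocess.PIPE` and `universal_newlines=True` rather than the newer `capture_output`/`text` arguments, so the sketch also works in the Python 3.6 environment created above.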
You can also use the algorithm to train and test a model and to make predictions on your own data:
```bash
cd /path/to/SG_predict/
python model.py -data <your_data.csv> -type <train|test|predict>
```
For example:
```bash
python model.py -data data/train.csv -type train
python model.py -data data/test.csv -type test
python model.py -data data/predict.csv -type predict
```
The following .csv format is accepted for training and testing:
formula | space_group |
---|---|
Na8Al6Si6S1O28 | 195 |
Na4Cl4O12 | 198 |
The following .csv format is accepted for prediction:
formula |
---|
Na8Al6Si6S1O28 |
Na4Cl4O12 |
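Files in these formats can be written with Python's `csv` module. This sketch assumes the header names are literally `formula` and `space_group`, as in the tables above; the function `write_sg_csv` and the file names are hypothetical. Passing no space groups produces a prediction file with only the formula column.

```python
import csv
from typing import Optional, Sequence


def write_sg_csv(path: str, formulas: Sequence[str],
                 space_groups: Optional[Sequence[int]] = None) -> None:
    """Write a train/test file (formula, space_group) or, if space_groups is None,
    a predict file (formula only)."""
    # Validate before opening, so a bad call does not truncate an existing file.
    if space_groups is not None:
        if len(space_groups) != len(formulas):
            raise ValueError("formulas and space_groups must have the same length")
        for sg in space_groups:
            if not 1 <= sg <= 230:  # there are 230 crystallographic space groups
                raise ValueError(f"space group number out of range: {sg}")
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        if space_groups is None:
            writer.writerow(["formula"])
            writer.writerows([f] for f in formulas)
        else:
            writer.writerow(["formula", "space_group"])
            writer.writerows(zip(formulas, space_groups))
```

Usage: `write_sg_csv("data/train.csv", ["Na8Al6Si6S1O28", "Na4Cl4O12"], [195, 198])`.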