See the paper: "Representation of Inorganic Synthesis Reactions and Prediction: Graphical Framework and Datasets."
The datasets used in this paper can be obtained either by running the code directly (see the "Creating Data Sets" section) or by downloading them from Kaggle, which hosts the datasets used in the experiments.
If you download them from Kaggle, make sure the two folders extracted from the archives are placed at
Data/filtered-mp-data and Data/filtered-ag-data, matching the zip file names.
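If you use the Kaggle archives, a quick check like the following can confirm that the folders ended up where the scripts expect them. This is a minimal sketch, not a script shipped with the repository: the helper name missing_dataset_dirs is hypothetical, and it assumes you run it from the repository root.

```python
"""Check that the Kaggle datasets were extracted to the expected folders.

Hypothetical helper, not part of the repository. The only facts it relies on
are the two folder paths named in this README, relative to the repository root.
"""
from pathlib import Path

# Folder paths taken from the README; everything else here is illustrative.
EXPECTED_DIRS = ("Data/filtered-mp-data", "Data/filtered-ag-data")


def missing_dataset_dirs(repo_root="."):
    """Return the expected dataset folders that do not exist under repo_root."""
    root = Path(repo_root)
    return [d for d in EXPECTED_DIRS if not (root / d).is_dir()]


if __name__ == "__main__":
    missing = missing_dataset_dirs()
    if missing:
        print("Missing dataset folders:", ", ".join(missing))
    else:
        print("Both dataset folders are in place.")
```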
- Before running the scripts, install the required packages by running pip install -r requirements.txt from this directory.
- Then, check that ActionGraphs are being created correctly by running the Jupyter notebook in example/. It converts two Materials Project synthesis reactions into ActionGraphs and then displays the graphs.
- Create a Materials Project account and obtain an API key from the Materials Project website.
- Place this key in a text file called api-key.txt in utils/.
- Run fetch_synthesis_data.py and then filter_synthesis_data.py in utils/. This creates dataset 1 (raw Materials Project data).
- Run convert.py in ag-knn-test/. This converts the first dataset into serialized ActionGraphs, producing dataset 2.
- Ensure the two datasets match by running remover.py in utils/.
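The dataset-creation steps above can be chained in one script. The following is a minimal sketch, not part of the repository: the function build_dataset and its dry_run flag are hypothetical, and it assumes each script takes no command-line arguments and is meant to be run from its own folder. By default it only lists the steps without running anything.

```python
"""Run the dataset-creation steps from the README in order.

Hypothetical convenience wrapper, not part of the repository. Assumes each
script needs no arguments and is run with its own folder as the working
directory; check the scripts before relying on that.
"""
import subprocess
import sys
from pathlib import Path

# (folder, script) pairs in the order the README lists them.
STEPS = [
    ("utils", "fetch_synthesis_data.py"),   # download Materials Project data
    ("utils", "filter_synthesis_data.py"),  # filter it into dataset 1
    ("ag-knn-test", "convert.py"),          # convert to ActionGraphs (dataset 2)
    ("utils", "remover.py"),                # make the two datasets match
]


def build_dataset(repo_root=".", dry_run=True):
    """Run each step (or only list it when dry_run=True); return the step paths."""
    root = Path(repo_root)
    if not dry_run and not (root / "utils" / "api-key.txt").is_file():
        raise FileNotFoundError("utils/api-key.txt with your Materials Project API key is missing")
    planned = []
    for folder, script in STEPS:
        planned.append(f"{folder}/{script}")
        if not dry_run:
            # check=True raises CalledProcessError and stops at the first failing step.
            subprocess.run([sys.executable, script], cwd=root / folder, check=True)
    return planned


if __name__ == "__main__":
    for step in build_dataset(dry_run=True):
        print("would run", step)
```

Calling build_dataset(dry_run=False) from the repository root would actually execute the scripts; stopping at the first failure avoids building dataset 2 from an incomplete dataset 1.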
- To run the model on dataset 1, run pipeline.py in knn-baseline/. This featurizes the data, then trains, evaluates, and saves the model.
- To run the model on dataset 2, first find the maximum number of nodes: run find_max_nodes.py in utils/ and set the MAX_NODES variable in ag-knn-test/featurize.py to the result. Then run pipeline.py in ag-knn-test/.
- To run the PCA experiment, run run_pca_experiment.py in ag-knn-test/. This also saves the relevant plots.
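Updating MAX_NODES by hand is straightforward, but it can also be scripted. The sketch below is hypothetical (set_max_nodes is not part of the repository) and assumes ag-knn-test/featurize.py contains a single top-level line of the form MAX_NODES = <integer>; if the assignment looks different, edit the file by hand.

```python
"""Write the value reported by find_max_nodes.py into featurize.py.

Hypothetical helper, not part of the repository. Assumes featurize.py
assigns MAX_NODES on one line, e.g. `MAX_NODES = 40`.
"""
import re
from pathlib import Path

# Matches "MAX_NODES = <digits>" at the start of a line; group 1 is the number.
_MAX_NODES_LINE = re.compile(r"^MAX_NODES\s*=\s*(\d+)", re.MULTILINE)


def set_max_nodes(max_nodes, path="ag-knn-test/featurize.py"):
    """Replace the integer assigned to MAX_NODES in `path`; return the old value."""
    file = Path(path)
    source = file.read_text()
    match = _MAX_NODES_LINE.search(source)
    if match is None:
        raise ValueError(f"no 'MAX_NODES = <int>' line found in {path}")
    old_value = int(match.group(1))
    # Replace only the matched text so any trailing comment on the line is kept.
    new_source = source[:match.start()] + f"MAX_NODES = {int(max_nodes)}" + source[match.end():]
    file.write_text(new_source)
    return old_value
```

For example, running set_max_nodes(57) from the repository root would write 57 into ag-knn-test/featurize.py, where 57 stands in for whatever find_max_nodes.py reports.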
The visualizations can be generated by running the visualize_features.py script in each model's folder. The feature distributions should be identical. The results are saved in the Data/ directory.
- The original data is sourced from the Materials Project.
- The Materials Project data is licensed under the Creative Commons Attribution 4.0 International License (CC BY 4.0).
- The code in this repository does not redistribute any Materials Project data. It only provides scripts to download, filter, and transform the data.
This project uses data from the Materials Project, which is licensed under CC BY 4.0. The data used here has been filtered to remove unwanted synthesis reactions and transformed into ActionGraphs.
Materials Project: A. Jain, S.P. Ong, G. Hautier, W. Chen, W.D. Richards, S. Dacek, S. Cholia, D. Gunter, D. Skinner, G. Ceder, and K.A. Persson. "The Materials Project: A materials genome approach to accelerating materials innovation." APL Materials 1, 011002 (2013). https://materialsproject.org
- The code in this repository is licensed under the BSD 3-Clause License (see LICENSE.rst).
- Any data generated using this code and derived from Materials Project data must comply with the CC BY 4.0 license.