8bitsam/actiongraph-testbench

Documentation for the ActionGraph Dataset and Benchmarks

Authors and Contributors

See paper: "Representation of Inorganic Synthesis Reactions and Prediction: Graphical Framework and Datasets."

Data Availability

The dataset used in this paper can be obtained in two ways: generate it by running the code (see the "Creating datasets" section), or download it from Kaggle, which hosts the dataset used in the experiments.

If you download it this way, ensure the two folders extracted from the archives are placed at Data/filtered-mp-data and Data/filtered-ag-data, corresponding to the zip file names.
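A quick way to confirm the layout before running anything is a small check like the one below. It is only a sketch: the two folder names come from this README, while the helper `missing_dataset_dirs` is a hypothetical name and is not part of the repository.

```python
from pathlib import Path

# Folder names as given in this README; the helper itself is illustrative.
EXPECTED_DIRS = ("Data/filtered-mp-data", "Data/filtered-ag-data")


def missing_dataset_dirs(repo_root="."):
    """Return the expected dataset folders that are absent under repo_root."""
    root = Path(repo_root)
    return [d for d in EXPECTED_DIRS if not (root / d).is_dir()]


if __name__ == "__main__":
    missing = missing_dataset_dirs()
    if missing:
        print("Missing dataset folders:", ", ".join(missing))
    else:
        print("Both dataset folders are in place.")
```

Run it from the repository root; an empty result means both extracted folders are where the scripts expect them.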

Running Code

Initialization

  1. Before running the scripts, install the required packages by running pip install -r requirements.txt from this directory.
  2. Then test that ActionGraphs are created correctly by running the Jupyter notebook in example/. It should convert two Materials Project synthesis reactions into ActionGraphs and display the graphs.

Creating datasets

  1. Create a Materials Project account and obtain an API key from https://materialsproject.org.
  2. Place this key in a text file called api-key.txt in utils/
  3. Run fetch_synthesis_data.py and then filter_synthesis_data.py in utils/. This will create dataset 1 (raw Materials Project data).
  4. Run convert.py in ag-knn-test/. This will convert the first dataset into serialized ActionGraphs, producing dataset 2.
  5. Ensure the datasets match by running remover.py in utils/.
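The steps above can be sketched as a single driver script. The script names, folders, and the api-key.txt location are taken from the list; everything else is an assumption, in particular that each script is meant to be launched from inside its own folder and takes no arguments. The names `DATASET_STEPS`, `api_key_ready`, and `run_dataset_steps` are hypothetical and do not exist in the repository.

```python
import subprocess
import sys
from pathlib import Path

# (folder, script) pairs in the order the list above gives them.
DATASET_STEPS = [
    ("utils", "fetch_synthesis_data.py"),
    ("utils", "filter_synthesis_data.py"),
    ("ag-knn-test", "convert.py"),
    ("utils", "remover.py"),
]


def api_key_ready(repo_root="."):
    """True if utils/api-key.txt exists and contains non-whitespace text."""
    key_file = Path(repo_root) / "utils" / "api-key.txt"
    return key_file.is_file() and key_file.read_text().strip() != ""


def run_dataset_steps(repo_root=".", steps=DATASET_STEPS):
    """Run each script in order, stopping at the first failure."""
    root = Path(repo_root)
    if not api_key_ready(root):
        raise FileNotFoundError("utils/api-key.txt is missing or empty")
    for folder, script in steps:
        # Assumption: each script is run with its own folder as the working directory.
        subprocess.run([sys.executable, script], cwd=root / folder, check=True)
```

Calling `run_dataset_steps()` from the repository root would execute the four scripts in sequence; `check=True` makes a failing step raise immediately so later steps never run on incomplete data.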

Training Models

  1. To run the model on dataset 1, run the script pipeline.py in knn-baseline/. This will featurize the data, train, evaluate, and save the model.
  2. To run the model on dataset 2, first find the maximum number of nodes. To do this, run find_max_nodes.py in utils/ and update the MAX_NODES variable in ag-knn-test/featurize.py. Then, run pipeline.py in ag-knn-test/.
  3. To run the PCA experiment, simply run run_pca_experiment.py in ag-knn-test/. This will also save relevant plots.
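To illustrate why step 2 needs the maximum node count first: graphs have varying numbers of nodes, while a distance-based model (the folder names suggest k-nearest neighbours) needs feature vectors of one fixed length. The sketch below shows one common way to get there, zero-padding up to a fixed node count. It is an assumption about the general idea, not a description of what find_max_nodes.py or featurize.py actually do; the graph representation (a list of per-node feature lists) and both function names are hypothetical.

```python
def max_node_count(graphs):
    """Largest number of nodes in any graph (0 for an empty collection)."""
    return max((len(g) for g in graphs), default=0)


def pad_and_flatten(graph, max_nodes, feature_dim, fill=0.0):
    """Flatten per-node features and pad to max_nodes * feature_dim values."""
    if len(graph) > max_nodes:
        raise ValueError("graph has more nodes than max_nodes")
    flat = []
    for node in graph:
        if len(node) != feature_dim:
            raise ValueError("node feature length does not match feature_dim")
        flat.extend(node)
    # Pad the missing nodes so every graph yields a vector of equal length.
    flat.extend([fill] * ((max_nodes - len(graph)) * feature_dim))
    return flat


# Two toy graphs with 2 features per node (values are made up).
graphs = [
    [[1.0, 2.0]],
    [[3.0, 4.0], [5.0, 6.0]],
]
MAX_NODES = max_node_count(graphs)
vectors = [pad_and_flatten(g, MAX_NODES, feature_dim=2) for g in graphs]
```

If MAX_NODES were set lower than the largest graph in the dataset, a scheme like this would have to reject or truncate that graph, which is why the value is computed from the data before featurizing.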

Visualizing Data

The visualizations can be generated by running the visualize_features.py scripts in the respective folders for each model. The feature distributions should be identical. The results will be saved within the Data/ directory.

Data Source and Licensing

Attribution

This project uses data from the Materials Project, which is licensed under CC BY 4.0. The data used here has been filtered to remove unwanted synthesis reactions and transformed into ActionGraphs.

Materials Project: A. Jain, S.P. Ong, G. Hautier, W. Chen, W.D. Richards, S. Dacek, S. Cholia, D. Gunter, D. Skinner, G. Ceder, and K.A. Persson. "The Materials Project: A materials genome approach to accelerating materials innovation." APL Materials 1, 011002 (2013). https://materialsproject.org

License

  • The code in this repository is licensed under the BSD 3-Clause License (see LICENSE.rst).
  • Any data generated using this code and derived from Materials Project data must comply with the CC BY 4.0 license.
