Polymer Information Extraction

This repo contains the code for the paper 'A general-purpose material property data extraction pipeline from large polymer corpora using natural language processing' [1].

Requirements and Setup

  • Python 3.7
  • PyTorch (version 1.10.0)
  • Transformers (version 4.17.0)

You can install all required Python packages from the provided environment.yml file by running conda env create -f environment.yml
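After creating the environment, it can be useful to confirm that the pinned versions listed above are the ones actually installed. The following is a minimal sketch, not part of this repo: the helper names `version_matches` and `check_environment` are hypothetical, and the choice to ignore local build tags such as `+cu113` is an assumption.

```python
import sys

# Versions taken from the requirements list in this README.
REQUIRED = {"torch": "1.10.0", "transformers": "4.17.0"}


def version_matches(installed, required):
    """Compare versions, ignoring a local build tag such as '+cu113' (assumption)."""
    return installed.split("+")[0] == required


def check_environment(installed_versions, python_version=None):
    """Return a list of human-readable problems; an empty list means all checks passed.

    `installed_versions` maps package name to its version string.
    """
    if python_version is None:
        python_version = sys.version_info[:2]
    problems = []
    if tuple(python_version) != (3, 7):
        problems.append(
            "Python 3.7 expected, found %d.%d" % (python_version[0], python_version[1])
        )
    for name, required in REQUIRED.items():
        found = installed_versions.get(name)
        if found is None:
            problems.append("%s is not installed" % name)
        elif not version_matches(found, required):
            problems.append("%s %s expected, found %s" % (name, required, found))
    return problems
```

In practice the mapping could be built from `torch.__version__` and `transformers.__version__` inside the activated conda environment.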

Running the code

An example script and parameters for training the NER model are provided in the file run_ner.sh.

The script for fine-tuning of the masked language model can be run by using the following command:

python run_mlm.py \
    --model_name_or_path bert-base \
    --train_file /path/to/train/file \
    --do_train \
    --do_eval \
    --output_dir /output

Run python data_extraction.py to combine the NER predictions into material property records using heuristic rules.
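The heuristic rules themselves live in data_extraction.py and are not described in this README. As a rough illustration of the kind of step that usually comes before such rules, the sketch below groups token-level tags into entity spans. It is not the repo's implementation: the function name `group_bio_tags`, the BIO tagging scheme, and the label names in the example (`POLYMER`, `PROP_NAME`, `PROP_VALUE`) are all assumptions made for the example.

```python
def group_bio_tags(tokens, tags):
    """Merge token-level BIO tags into a list of (label, text) entity spans.

    Assumes tags of the form 'B-LABEL', 'I-LABEL', or 'O' (hypothetical scheme).
    """
    entities = []
    current_label, current_tokens = None, []

    def flush():
        # Store the span collected so far, if any.
        if current_tokens:
            entities.append((current_label, " ".join(current_tokens)))

    for token, tag in zip(tokens, tags):
        starts_span = tag.startswith("B-") or (
            tag.startswith("I-") and tag[2:] != current_label
        )
        if starts_span:
            # A 'B-' tag, or an 'I-' tag with a new label, opens a new span.
            flush()
            current_label, current_tokens = tag[2:], [token]
        elif tag.startswith("I-"):
            # Continuation of the current span.
            current_tokens.append(token)
        else:
            # 'O' closes any open span.
            flush()
            current_label, current_tokens = None, []
    flush()
    return entities


tokens = ["Polystyrene", "has", "a", "glass", "transition",
          "temperature", "of", "100", "°C"]
tags = ["B-POLYMER", "O", "O", "B-PROP_NAME", "I-PROP_NAME",
        "I-PROP_NAME", "O", "B-PROP_VALUE", "I-PROP_VALUE"]
spans = group_bio_tags(tokens, tags)
```

Once spans like these are available, rule-based logic can associate a material with a property name and its value within the same sentence or abstract.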

The NER model used for sequence labeling can be found here

The MaterialsBERT language model that is used as the encoder for the above NER model can be found here

Please cite our paper if you use the code or data in this repo:

@article{materialsbert,
  title={A general-purpose material property data extraction pipeline from large polymer corpora using natural language processing},
  author={Shetty, Pranav and Rajan, Arunkumar Chitteth and Kuenneth, Chris and Gupta, Sonakshi and Panchumarti, Lakshmi Prerana and Holm, Lauren and Zhang, Chao and Ramprasad, Rampi},
  journal={npj Computational Materials},
  volume={9},
  number={1},
  pages={52},
  year={2023},
  publisher={Nature Publishing Group UK London}
}

References

[1] Shetty, P., Rajan, A., Kuenneth, C., Gupta, S., Panchumarti, L., Holm, L., Zhang, C. & Ramprasad, R. A general-purpose material property data extraction pipeline from large polymer corpora using natural language processing. npj Computational Materials 9, 52 (2023).
