CambridgeMolecularEngineering/chemdataextractor2

ChemDataExtractor

ChemDataExtractor v2 is a toolkit for extracting chemical information from the scientific literature. It supports Python 3.5 to 3.8.

Installation

Create a virtual environment, for example with conda:

conda create -n cde2 python=3.8

Activate the cde2 environment:

conda activate cde2

Install chemdataextractor2 with pip:

pip install chemdataextractor2
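
Because only Python 3.5 to 3.8 is listed as supported, it can help to confirm that the activated environment uses an interpreter in that range. The following is a minimal sketch using only the standard library; the names MIN_SUPPORTED, MAX_SUPPORTED and is_supported are hypothetical helpers for illustration and are not part of ChemDataExtractor.

```python
import sys

# Supported range as stated in this README: Python 3.5 through 3.8.
# These constants and the helper below are illustrative, not part of the toolkit.
MIN_SUPPORTED = (3, 5)
MAX_SUPPORTED = (3, 8)


def is_supported(version_info=None):
    """Return True if the (major, minor) version lies in the stated range."""
    if version_info is None:
        version_info = sys.version_info
    major_minor = (version_info[0], version_info[1])
    return MIN_SUPPORTED <= major_minor <= MAX_SUPPORTED


if __name__ == "__main__":
    current = "{}.{}".format(sys.version_info[0], sys.version_info[1])
    if is_supported():
        print("Python " + current + " is within the supported range.")
    else:
        print("Python " + current + " is outside the supported range (3.5 to 3.8).")
```

Running this inside the cde2 environment created above should report a version within the range, since that environment was created with python=3.8.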

Features

  • HTML, XML and PDF document readers
  • Chemistry-aware natural language processing pipeline
  • Chemical named entity recognition
  • Rule-based parsing grammars for property and spectra extraction
  • Table parser for extracting tabulated data
  • Document processing to resolve data interdependencies

Documentation & Development

Please read the documentation for instructions on contributing to the project.

https://cambridgemolecularengineering-chemdataextractor-development.readthedocs-hosted.com/en/latest/

License

ChemDataExtractor v2 is licensed under the MIT license, a permissive, business-friendly license for open source software.

MIT license: https://github.com/CambridgeMolecularEngineering/ChemDataExtractor/blob/master/LICENSE