SarkisovTeam/SyntheticOracle

Synthetic Oracle

A Python-based tool for parsing and interpreting experimental synthesis reports.

View Worked Example · Report Bug · Request Feature

Table of Contents
  1. About The Project
  2. Getting Started
  3. Usage
  4. Roadmap
  5. Contributing
  6. License
  7. Contact
  8. Acknowledgments

About The Project

A Python-based tool for large-scale analysis and comparison of synthesis protocols. This module contains tools to:

  1. Organise a sequence of synthesis actions from a well-structured .json object (typically generated as an output of our preprocessing workflow)
  2. Cross-reference chemical names against the PubChem database of chemical entities
  3. Standardise all physical quantities into their SI units
  4. Perform quantitative meta-analysis on the resultant corpus of synthesis protocols
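As a rough illustration of step 3, the sketch below converts a few quantity strings of the kind shown in the Usage section into SI units. It is a minimal example under stated assumptions and not the package's own implementation: the names SI_FACTORS, QUANTITY and to_si, the unit table, and the handling of "oC" are all hypothetical.

```python
import re

# Hypothetical lookup table: unit label -> (SI unit, multiplicative factor).
# The units listed here are illustrative and not taken from the package.
SI_FACTORS = {
    "g": ("kg", 1e-3),
    "mg": ("kg", 1e-6),
    "mL": ("m^3", 1e-6),
    "L": ("m^3", 1e-3),
    "mmol": ("mol", 1e-3),
    "mol": ("mol", 1.0),
    "hours": ("s", 3600.0),
    "h": ("s", 3600.0),
    "min": ("s", 60.0),
}

# Matches the first "number unit" pair, e.g. "0.5 g" or "300 oC".
QUANTITY = re.compile(r"(?P<value>\d+(?:\.\d+)?)\s*(?P<unit>[A-Za-z]+)")


def to_si(text):
    """Return (value, si_unit) for the first quantity found in text, else None."""
    match = QUANTITY.search(text)
    if match is None:
        return None
    value = float(match.group("value"))
    unit = match.group("unit")
    # Celsius needs an offset rather than a scale factor.
    if unit == "oC":
        return value + 273.15, "K"
    if unit not in SI_FACTORS:
        return None
    si_unit, factor = SI_FACTORS[unit]
    return value * factor, si_unit


for raw in ["0.5 g", "0.2 mmol", "0.6 mL", "at 300 oC", "for 24 hours"]:
    print(raw, "->", to_si(raw))
```

Unrecognised units return None here rather than raising, so that a large corpus can be scanned without stopping at the first unusual entry; a real pipeline may prefer to log such cases.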

(back to top)

Getting Started

Prerequisites

This branch of the software is Python-native and requires a Python 3.7+ environment (although there are known issues with Python 3.10 as of 31 January 2023). All required packages can be installed through pip with the following command:

pip install -r requirements.txt

Installation

  1. Clone the repository
    git clone https://github.com/SarkisovTeam/SyntheticOracle.git
  2. Install pip requirements (preferably in a fresh Python environment)
    pip install -r requirements.txt

(back to top)

Usage

The software reads in raw synthesis sequences from JSON objects with the following tag titles:

{
  "Step name": {
    "0": "Add"
  },
  "text": {
    "0": "Some raw text here"
  },
  "new_chemicals": {
    "0": [
      {
        "name": "Methanol",
        "mass": "0.5 g",
        "other_amount": "0.2 mmol",
        "volume": "0.6 mL"
      }
    ]
  },
  "temp": {
    "0": ["at 300 oC"]
  },
  "time": {
    "0": ["for 24 hours"]
  }
}

Descriptions of how to produce these data structures are provided in the companion repository here. Once generated, the JSON data can be used to instantiate a Sequence object. Through the extract_chemicals class method, a ChemicalList object is created containing a set of records for each chemical mentioned in the sequence, which can be converted to a summarised BillOfMaterials using the produce_bill_of_mats method. Finally, the extract_conditions method of the Sequence object can produce a table of synthesis times and temperatures (a Conditions object). A worked example of the data workflow is shown in the example folder.
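To make the input layout concrete, the sketch below reads the JSON shown above with the standard library only and regroups it into one record per synthesis step. It deliberately does not use the Sequence, ChemicalList, BillOfMaterials or Conditions classes, whose exact signatures are not given here; the helper name steps_from_columns is hypothetical, and the sketch assumes that each top-level tag maps step indices (as strings) to values.

```python
import json

raw = """
{
  "Step name": {"0": "Add"},
  "text": {"0": "Some raw text here"},
  "new_chemicals": {"0": [{"name": "Methanol", "mass": "0.5 g",
                           "other_amount": "0.2 mmol", "volume": "0.6 mL"}]},
  "temp": {"0": ["at 300 oC"]},
  "time": {"0": ["for 24 hours"]}
}
"""


def steps_from_columns(data):
    """Turn {tag: {step_index: value}} into a list of per-step dicts.

    Steps are ordered by their integer index; a tag with no entry for a
    given step is filled with None.
    """
    indices = sorted({idx for column in data.values() for idx in column}, key=int)
    return [
        {tag: column.get(idx) for tag, column in data.items()}
        for idx in indices
    ]


steps = steps_from_columns(json.loads(raw))
for step in steps:
    # Collect the chemical names mentioned in this step, if any.
    names = [chem["name"] for chem in (step["new_chemicals"] or [])]
    print(step["Step name"], names, step["temp"], step["time"])
```

Each resulting dictionary holds everything recorded for a single action (step name, raw text, chemicals, temperature and time), which is a convenient shape for inspecting a sequence by hand before passing the JSON to the package.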

(back to top)

Roadmap

Short-term (< 1 month)

  • Well-documented demo files (and/or Jupyter notebooks) demonstrating code features and functionality
  • Jupyter notebook examples to reproduce information and figures from the associated manuscript
  • Pruning old and unused data

Medium-term (< 12 months)

  • Identification of MOFID to automatically identify chemical constraints on synthesis
  • Quality metrics for synthesis parsing
  • Generating material-by-material synthesis reports with summary data and details, à la David Fairen-Jimenez's MOFexplorer
    • Define steps to build a Plotly dashboard
    • Put them in here

See the open issues for a full list of proposed features (and known issues).

(back to top)

Contributing

Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.

If you have a suggestion that would make this better, please fork the repo and create a pull request. You can also simply open an issue with the tag "enhancement". Don't forget to give the project a star! Thanks again!

  1. Fork the Project
  2. Create your Feature Branch (git checkout -b feature/AmazingFeature)
  3. Commit your Changes (git commit -m 'Add some AmazingFeature')
  4. Push to the Branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

(back to top)

License

Distributed under the MIT License. See LICENSE.txt for more information.

(back to top)

Contact

Joe Manning - @jrhmanning - joseph.manning@manchester.ac.uk

(back to top)
