A Python-based tool for parsing and interpreting experimental synthesis reports.
View Worked Example · Report Bug · Request Feature
A Python-based tool for large-scale analysis and comparison of synthesis protocols. This module contains tools to:
- Organise a sequence of synthesis actions from a well-structured JSON object (typically generated as output from our preprocessing workflow)
- Cross-reference chemical names against the PubChem database of chemical entities
- Standardise all physical quantities into their SI units
- Perform quantitative meta-analysis on the resultant corpus of synthesis protocols
This branch of the software is Python-native and requires a Python 3.7+ environment (although there are issues with Python 3.10 as of 31-1-23). All required packages can be installed through pip with the following command:
pip install -r requirements.txt
- Clone the repository
git clone https://github.com/SarkisovTeam/SyntheticOracle.git
- Install pip requirements (preferably in a fresh Python environment)
pip install -r requirements.txt
The software reads in raw synthesis sequences from JSON objects with the following tag titles:
{
"Step name": {
"0": "Add"
},
"text": {
"0": "Some raw text here"
},
"new_chemicals": {
"0": [
{
"name": "Methanol",
"mass": "0.5 g",
"other_amount": "0.2 mmol",
"volume": "0.6 mL"
}
]
},
"temp": {
"0": ["at 300 oC"]
},
"time": {
"0": ["for 24 hours"]
}
}
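The structure above is column-oriented: each tag maps a step index (stored as a string) to that step's value. As an illustration only, the sketch below pivots it into one record per step using the standard library. The function name `steps_from_json` is hypothetical and is not part of this package.

```python
import json
from typing import Any, Dict, List

# The example object from above, inlined so the sketch is self-contained.
RAW = """
{
  "Step name": {"0": "Add"},
  "text": {"0": "Some raw text here"},
  "new_chemicals": {"0": [{"name": "Methanol", "mass": "0.5 g",
                           "other_amount": "0.2 mmol", "volume": "0.6 mL"}]},
  "temp": {"0": ["at 300 oC"]},
  "time": {"0": ["for 24 hours"]}
}
"""


def steps_from_json(raw: str) -> List[Dict[str, Any]]:
    """Pivot the column-oriented JSON into a list of per-step dicts (hypothetical helper)."""
    columns = json.loads(raw)
    # Collect every step index that appears under any tag, in numeric order.
    indices = sorted({i for col in columns.values() for i in col}, key=int)
    # A tag with no entry for a given step is filled with None.
    return [{tag: col.get(i) for tag, col in columns.items()} for i in indices]


steps = steps_from_json(RAW)
for step in steps:
    chemical_names = [c["name"] for c in step["new_chemicals"] or []]
    print(step["Step name"], chemical_names)
```

Reading the data step by step like this can be a quick way to check that a file has the expected tags before passing it to the package.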
Descriptions of how to produce these data structures are provided in the companion repository here. Once generated, the JSON data can be used to instantiate a Sequence object. Through the extract_chemicals class method, a ChemicalList object is created containing a set of records for each chemical mentioned in the sequence, which can be converted to a summarised BillOfMaterials using the produce_bill_of_mats method. Finally, the extract_conditions method of the Sequence object can produce a table of synthesis times and temperatures (a Conditions object). A worked example of the data workflow is shown in the example folder.
- Well-documented demo files (and/or Jupyter notebooks) demonstrating code features and functionality
- Jupyter notebook examples to reproduce information and figures from the associated manuscript
- Pruning old and unused data
- Identification of MOFID to automatically identify chemical constraints on synthesis
- Quality metrics for synthesis parsing
- Generating material-by-material synthesis reports with summary data and details, à la David Fairen-Jimenez's MOFexplorer
- Define steps to build a plotly dashboard
- Put them in here
See the open issues for a full list of proposed features (and known issues).
Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.
If you have a suggestion that would make this better, please fork the repo and create a pull request. You can also simply open an issue with the tag "enhancement". Don't forget to give the project a star! Thanks again!
- Fork the Project
- Create your Feature Branch (git checkout -b feature/AmazingFeature)
- Commit your Changes (git commit -m 'Add some AmazingFeature')
- Push to the Branch (git push origin feature/AmazingFeature)
- Open a Pull Request
Distributed under the MIT License. See LICENSE.txt for more information.
Joe Manning - @jrhmanning - joseph.manning@manchester.ac.uk