QupKake - Predict micro-pKa of organic molecules

QupKake combines GFN2-xTB calculations with graph neural networks to accurately predict micro-pKa values of organic molecules. It accompanies the paper "QupKake: Integrating Machine Learning and Quantum Chemistry for micro-pKa Predictions".

Requirements

Python >= 3.9
pytorch >= 2.0
pytorch_geometric >= 2.3.0
pytorch_lightning >= 2.0.2
rdkit >= 2022.03.03
xtb == 6.4.1
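
To compare an existing environment against the versions listed above, a short sketch like the following prints what is installed. It assumes the usual import names (torch, torch_geometric, pytorch_lightning, rdkit), which differ from some of the package names in the list, and that python is on PATH. The function name report_versions is only illustrative.

```shell
# Print the installed version of each Python dependency.
# "not installed" is also printed if python itself cannot be found.
report_versions() {
    for module in torch torch_geometric pytorch_lightning rdkit; do
        version=$(python -c "import $module; print($module.__version__)" 2>/dev/null) \
            || version="not installed"
        echo "$module: $version"
    done
}

report_versions
```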

Installation

We recommend using conda to install QupKake.

Option 1

Clone the repository:

git clone https://github.com/Shualdon/QupKake.git
cd QupKake

Create a conda environment from the environment.yaml file:

conda env create -f environment.yaml
conda activate qupkake

This will create a conda environment with all the dependencies installed.

Install the package:

pip install .

Option 2

Create a conda environment:

conda create -n qupkake python=3.9
conda activate qupkake

Clone the repository and install using pip:

git clone https://github.com/Shualdon/QupKake.git
cd QupKake
pip install .

This will install the package and the remaining dependencies.

`xtb` Installation

Due to bugs in the conda version of xtb, it should be installed from source, and the path to the executable should be set up before running QupKake:

export XTBPATH=/path/to/xtb/executable

Follow the xtb documentation for more information.

The Linux binaries of xtb come with the package and will be used by default if neither the conda package nor the $XTBPATH environment variable is set up.
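
Before running QupKake, it can help to confirm that $XTBPATH points at an executable file rather than, say, a directory. The sketch below is one way to do that. The function name check_xtb_path is hypothetical and not part of QupKake, and an "unset" result is not necessarily a problem on Linux, where the bundled binaries are used.

```shell
# Report whether XTBPATH is unset, points at an executable file ("ok"),
# or points at something else ("invalid").
check_xtb_path() {
    if [ -z "${XTBPATH:-}" ]; then
        echo "unset"
    elif [ -f "$XTBPATH" ] && [ -x "$XTBPATH" ]; then
        echo "ok"
    else
        echo "invalid"
    fi
}

check_xtb_path
```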

Usage

QupKake can be used as a Python package or as a command line tool, so it can be called from your own code or run as a stand-alone program.

Command line

Once installed, QupKake can be used as a command line tool. The general syntax for running the program is:

$ qupkake <input_type> <input> <flags>

The general flags that can be used are:

-r, --root: Root directory for processing data. Default: data

-t, --tautomerize: Find the most stable tautomer for the molecule(s). Default: False

-mp [N], --multiprocessing [N]: Use multiprocessing. If given without a number, multiprocessing is enabled; if followed by a number, that number of subprocesses is used. Default: False

QupKake accepts two types of input:

1. A single molecule as a SMILES string:

$ qupkake smiles "SMILES"

Specific flags for this input type are:

-n, --name: molecule name. Default: molecule

-o, --output: output file name (SDF with pKa predictions). Default: qupkake_output.sdf
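
As a worked example, the sketch below runs a prediction for acetic acid with a custom name and output file, plus the tautomer search. The molecule, the name, and the file name are illustrative choices, and the guard around the call is only there so the snippet fails gently when the conda environment is not active.

```shell
# Acetic acid as a SMILES string. It is quoted when used so the shell
# does not interpret the parentheses.
smiles="CC(=O)O"
name="acetic_acid"
output="${name}.sdf"

if command -v qupkake >/dev/null 2>&1; then
    # -n and -o override the defaults; -t enables the tautomer search.
    qupkake smiles "$smiles" -n "$name" -o "$output" -t
else
    echo "qupkake not found on PATH; activate the conda environment first" >&2
fi
```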

2. A CSV or SDF file containing multiple molecules

$ qupkake file <filename>

Specific flags for this input type are:

-s, --smiles_col: column name for SMILES strings. Default: smiles

-n, --name_col: column name for molecule names. Default: name

-o, --output: output file name (SDF with pKa predictions). Default: qupkake_output.sdf
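
A minimal sketch of the file-based workflow follows. It writes a small CSV that uses the default column names and then runs the prediction with two subprocesses. The file names and the three molecules are made up for illustration, and -s and -n are spelled out even though they match the defaults.

```shell
# Write a small CSV with the default column names (name, smiles).
cat > molecules.csv <<'EOF'
name,smiles
acetic_acid,CC(=O)O
phenol,Oc1ccccc1
pyridine,c1ccncc1
EOF

if command -v qupkake >/dev/null 2>&1; then
    # -mp 2 requests two subprocesses; -o sets the output SDF name.
    qupkake file molecules.csv -s smiles -n name -o molecules_pka.sdf -mp 2
else
    echo "qupkake not found on PATH; activate the conda environment first" >&2
fi
```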

Python package

TBA

Citation

If you use this package in your research or application, please cite the following paper:

BibTeX

@article{qupkake, 
    title={QupKake: Integrating Machine Learning and Quantum Chemistry for micro-pKa Predictions}, 
    DOI={10.26434/chemrxiv-2023-gxplb}, 
    journal={ChemRxiv}, 
    publisher={Cambridge Open Engage}, 
    author={Abarbanel, Omri and Hutchison, Geoffrey}, 
    year={2023}
}

Copyright

Acknowledgements

Project based on the Computational Molecular Science Python Cookiecutter version 1.1.

