QupKake combines GFN2-xTB calculations with graph neural networks to accurately predict micro-pKa values of organic molecules. It accompanies the paper: QupKake: Integrating Machine Learning and Quantum Chemistry for micro-pKa Predictions.
- Python >= 3.9
- pytorch >= 2.0
- pytorch_geometric >= 2.3.0
- pytorch_lightning >= 2.0.2
- rdkit >= 2022.03.03
- xtb == 6.4.1
We recommend using conda to install QupKake.
Clone the repository:
git clone https://github.com/Shualdon/QupKake.git
cd QupKake
Create a conda environment from the environment.yml file:
conda env create -f environment.yml
conda activate qupkake
This will create a conda environment with all the dependencies installed.
Install the package:
pip install .
Alternatively, create a minimal conda environment:
conda create -n qupkake python=3.9
conda activate qupkake
Clone the repository and install using pip:
git clone https://github.com/Shualdon/QupKake.git
cd QupKake
pip install .
This will install the package and the remaining dependencies.
Due to bugs in the conda version of xtb, it should be installed from source, and the path to the executable should be set before running QupKake:
export XTBPATH=/path/to/xtb/executable
Follow the xtb documentation for more information.
The Linux binaries of xtb come with the package and will be used by default if neither the conda package nor the $XTBPATH environment variable is set up.
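Before launching a long run, it can help to confirm that $XTBPATH really points at an executable file. The sketch below is a user-side sanity check only: the helper name find_xtb is hypothetical, and the fallback to an xtb found on PATH is an assumption, not a description of how QupKake itself locates xtb.

```python
import os
import shutil
from typing import Mapping, Optional


def find_xtb(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return a usable xtb executable path, or None if none is found.

    Hypothetical helper: checks $XTBPATH first, then falls back to an
    `xtb` executable on PATH. QupKake's own lookup may differ.
    """
    candidate = env.get("XTBPATH")
    if candidate:
        # $XTBPATH must name the executable itself, not its directory.
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
        return None
    # No $XTBPATH set: look for `xtb` on the search path instead.
    return shutil.which("xtb", path=env.get("PATH"))


xtb = find_xtb()
print("xtb executable:", xtb if xtb else "not found via $XTBPATH or PATH")
```

A None result with $XTBPATH set usually means the variable points at a directory or at a file without execute permission.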
QupKake can be used as a Python package or as a command line tool, so it can be called from your own code or run as a stand-alone program.
Once installed, QupKake can be used as a command line tool. The general syntax for running the program is:
$ qupkake <input_type> <input> <flags>
The general flags that can be used are:
-r, --root: Root directory for processing data. Default: data
-t, --tautomerize: Find the most stable tautomer for the molecule(s). Default: False
-mp [N], --multiprocessing [N]: Use multiprocessing. True if used alone; if followed by a number, that number of subprocesses is used. Default: False
QupKake accepts two types of input, a SMILES string or a file:
$ qupkake smiles "SMILES"
Specific flags for this input type are:
-n, --name: molecule name. Default: molecule
-o, --output: output file name (SDF with pKa predictions). Default: qupkake_output.sdf
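When the command is driven from a script, it is convenient to assemble the argument list in one place. The following is a minimal sketch that uses only the flags documented above; the function name build_smiles_command and its parameter names are hypothetical, and the example SMILES (acetic acid) is chosen purely for illustration.

```python
import shlex
from typing import List, Optional


def build_smiles_command(
    smiles: str,
    name: str = "molecule",
    output: str = "qupkake_output.sdf",
    root: str = "data",
    tautomerize: bool = False,
    processes: Optional[int] = None,
) -> List[str]:
    """Build the argument list for `qupkake smiles` (hypothetical helper)."""
    # Follows the documented order: qupkake <input_type> <input> <flags>
    cmd = ["qupkake", "smiles", smiles, "-n", name, "-o", output, "-r", root]
    if tautomerize:
        cmd.append("-t")
    if processes is not None:
        # -mp followed by a number sets the number of subprocesses.
        cmd += ["-mp", str(processes)]
    return cmd


cmd = build_smiles_command("CC(=O)O", name="acetic_acid", output="acetic_acid.sdf")
# Print a shell-safe version of the command for inspection.
print(shlex.join(cmd))
# To run it (requires QupKake to be installed):
#   import subprocess
#   subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems with SMILES strings, which often contain parentheses and other special characters.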
$ qupkake file <filename>
Specific flags for this input type are:
-s, --smiles_col: column name for SMILES strings. Default: smiles
-n, --name_col: column name for molecule names. Default: name
-o, --output: output file name (SDF with pKa predictions). Default: qupkake_output.sdf
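For batch runs, the input file can be generated programmatically. The sketch below assumes that a CSV file with a header row is an accepted input format, which the column-name flags suggest but this section does not state outright; the helper names write_input_csv and build_file_command are hypothetical, and where QupKake expects the file to live relative to the root directory is not covered here.

```python
import csv
from pathlib import Path
from typing import Iterable, List, Tuple


def write_input_csv(path: Path, molecules: Iterable[Tuple[str, str]]) -> None:
    """Write (name, smiles) pairs to a CSV file with the default column names."""
    with path.open("w", newline="") as handle:
        writer = csv.writer(handle)
        # Header names match the documented defaults for -n and -s.
        writer.writerow(["name", "smiles"])
        writer.writerows(molecules)


def build_file_command(
    path: Path,
    smiles_col: str = "smiles",
    name_col: str = "name",
    output: str = "qupkake_output.sdf",
) -> List[str]:
    """Build the argument list for `qupkake file` (hypothetical helper)."""
    return ["qupkake", "file", str(path), "-s", smiles_col, "-n", name_col, "-o", output]


input_path = Path("molecules.csv")
write_input_csv(input_path, [("acetic_acid", "CC(=O)O"), ("phenol", "Oc1ccccc1")])
print(build_file_command(input_path))
```

If the file uses different headers, pass them through smiles_col and name_col rather than renaming the columns.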
TBA
If you use this package in your research or application, please cite the following paper:
@article{qupkake,
title={QupKake: Integrating Machine Learning and Quantum Chemistry for micro-pKa Predictions},
DOI={10.26434/chemrxiv-2023-gxplb},
journal={ChemRxiv},
publisher={Cambridge Open Engage},
author={Abarbanel, Omri and Hutchison, Geoffrey},
year={2023}
}
Copyright (c) 2023, Omri D Abarbanel, Hutchison Group, University of Pittsburgh, PA, USA.
Project based on the Computational Molecular Science Python Cookiecutter version 1.1.