HSPOC

H-SPOC descriptor generator

About The Project

pKa is one of the most fundamental physicochemical properties of a compound. Microscopic pKa (micro-pKa) values at specific sites are important in research on organic chemical reactivity, protein docking and drug design. However, determining micro-pKa is challenging for both experimental measurement and theoretical calculation. Although micro-pKa is a valuable concept, data remain scarce and methods for predicting it accurately are still under development. In this work, building on iBonD, a reliable and accurate experimental pKa database, we developed a high-precision machine learning method, based on the H-SPOC descriptor, for predicting pKa at any local site in a small molecule. The model achieves R² = 0.95 and RMSE = 1.45, and reaches state-of-the-art performance in the SAMPL6 and SAMPL7 challenges. In further tests, H-SPOC also proved capable of micro-pKa prediction and conformer-specific prediction.

Getting Started

Installation

Clone the repo

git clone https://github.com/DeepSynthesis/HSPOC-version-1.0

Create a new Python environment

conda create -n HSPOC-env python=3.11 -y
conda activate HSPOC-env
pip install rdkit==2023.03.3 
pip install networkx==3.3
conda install pandas=2.0
conda install numpy=1.24
conda install scikit-learn=1.2.2
conda install xgboost=1.7
conda install lightgbm=4.3.0
conda install catboost=1.2.3
conda install matplotlib=3.7

Usage

Descriptor generation and pKa prediction: open ./scripts/hspoc_NoStructure_v1.py and find the following line (around line 143):
```
 datafilename='csvFileName'
```
Change 'csvFileName' on that line to the name of your data file. The data file (.csv) must include columns named "ID", "solvent", "SMILES", "H_index" and "filetype". Then run:
```
cd scripts
python hspoc_NoStructure_v1.py
```
After that, the descriptors for your data are saved in 'PredicPSPOC.csv', and the prediction results are saved in './Pred/After******.csv'.
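
Because the script expects the five columns listed above, it can help to check the header of your data file before running it. The following is a minimal sketch using only the Python standard library. The helper name `missing_columns` is hypothetical and is not part of this repository; only the column names come from this README.

```python
import csv

# Column names as listed in this README; the helper itself is illustrative only.
REQUIRED_COLUMNS = ["ID", "solvent", "SMILES", "H_index", "filetype"]


def missing_columns(csv_path):
    """Return the required column names that are absent from the CSV header."""
    # "utf-8-sig" also tolerates a byte-order mark written by some spreadsheet tools.
    with open(csv_path, newline="", encoding="utf-8-sig") as handle:
        header = next(csv.reader(handle), [])
    present = {name.strip() for name in header}
    return [name for name in REQUIRED_COLUMNS if name not in present]
```

Calling `missing_columns("your_data.csv")` returns an empty list when every required column is present, and otherwise lists the ones that still need to be added.
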
Modeling:
```
python Methods.py
```
Predict contributions in the pH range 0 to 14:

Prepare a file like ./Pred/Gly_states.csv, with the SMILES of the acid species listed in order of dissociation (the pKa column is not required).

Then run
```
python Get_pH_contribution.py 
```
After prediction, the results can be found in './Pred/' and a plot is saved as './script/Gly_Pred.png'.
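
To illustrate what a per-species contribution over a pH range means, the sketch below applies the standard equilibrium expression for a polyprotic acid: the species that has lost i protons has a relative weight of 10^(i·pH − pKa_1 − … − pKa_i). This is not necessarily how Get_pH_contribution.py computes its output. The function name `species_fractions` is hypothetical, and the glycine pKa values are common literature values used only as an example, not predictions from this model.

```python
def species_fractions(pka_values, ph):
    """Fractions of each protonation state, ordered from most to least protonated.

    pka_values must be given in dissociation order (pKa1, pKa2, ...).
    """
    # The fully protonated species has weight 1; each further dissociation
    # multiplies the weight by 10 ** (pH - pKa) for that step.
    weights = [1.0]
    exponent = 0.0
    for pka in pka_values:
        exponent += ph - pka
        weights.append(10.0 ** exponent)
    total = sum(weights)
    return [w / total for w in weights]


# Illustrative literature pKa values for glycine (assumed, not model output).
GLYCINE_PKA = [2.34, 9.60]

for ph in range(0, 15, 2):
    fractions = species_fractions(GLYCINE_PKA, ph)
    print(ph, [round(f, 3) for f in fractions])
```

With predicted pKa values in place of the literature ones, the same function gives the fraction of each species at any pH, which is the kind of curve a plot such as Gly_Pred.png would show.
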
