# CafChem tools for predicting pharmacokinetic properties using pksmart

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/MauricioCafiero/CafChem/blob/main/notebooks/PK_prediction_CafChem.ipynb)

## This notebook allows you to:
- Generate predictions of pharmacokinetic properties using pksmart.
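- Properties include:
  * steady-state volume of distribution (VDss),
  * total body clearance (CL),
  * half-life (t½),
  * fraction unbound in plasma (fu),
  * mean residence time (MRT).
- Predicted fold-errors, with lower and upper bounds, are also produced.
- Predicted rat, dog, and monkey VDss, CL, and fu are returned alongside the human values.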
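The fold-error and bound columns appear to be linked in a simple way. In the first row of `out`, VDss = 0.87491 L/kg with a fold error of 2.95 gives 0.87491 / 2.95 ≈ 0.29658 and 0.87491 × 2.95 ≈ 2.581, which match the reported lower and upper bounds, and CL behaves the same way. The sketch below reproduces that relationship. It is inferred from this notebook's output, not taken from the pksmart documentation, and `fold_error_bounds` is a hypothetical helper rather than part of pksmart.

```python
def fold_error_bounds(prediction, fold_error):
    """Return (lower, upper) bounds for a prediction with a given fold error.

    Hypothetical helper (not part of pksmart): assumes the bounds are the
    prediction divided and multiplied by the fold error, which matches the
    VDss and CL columns in this notebook's output.
    """
    if fold_error < 1:
        raise ValueError("a fold error is a ratio and should be >= 1")
    return prediction / fold_error, prediction * fold_error


# Row 0 of `out`: human VDss (L/kg) and CL (mL/min/kg) with their fold errors
vdss_low, vdss_high = fold_error_bounds(0.87491, 2.95)
cl_low, cl_high = fold_error_bounds(3.981289, 5.43)
print(f"VDss: {vdss_low:.5f} to {vdss_high:.5f} L/kg")
print(f"CL:   {cl_low:.5f} to {cl_high:.5f} mL/min/kg")
```

Because the error is multiplicative, the interval is symmetric on a log scale rather than around the prediction itself.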

## Requirements:
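- The set-up cell installs pksmart with pip; rdkit, which is imported alongside it, is expected to be installed as one of its dependencies.
- Runs on a CPU runtime, either in Colab or on your local machine.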

## Set-up
- Install pksmart, then import it along with pandas, NumPy, and RDKit.

In [1]:
!pip install -q pksmart

In [2]:
import pksmart
import pandas as pd
import numpy as np
from rdkit import Chem
from rdkit.Chem import Draw, AllChem
import glob
import os

## Get the pksmart help screen

In [3]:
!pksmart -h

usage: pksmart [-h] [--smiles SMILES] [--file FILE]


Abstract:
Drug exposure is a key contributor to the safety and efficacy of drugs. It can be defined using human pharmacokinetic (PK) parameters that affect the blood concentration profile of a drug, s

## Get PK predictions
- First, read in a source of SMILES strings. In this example they are loaded from a CSV file into a list called `smiles`.
- Then run the loop to predict the values.
- The `get_props` function runs pksmart on a single SMILES string and returns the predictions as a DataFrame.
- The values for all molecules in the loop are collected in a DataFrame called *out*, which can be viewed in the following cell.

In [4]:
# path to a Kaggle dataset; replace with your own CSV of SMILES
df = pd.read_csv('/kaggle/input/statin905/905-unique-statins.csv')
smiles = df["Ligand SMILES"].to_list()
print(len(smiles))

905


In [5]:
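def get_props(smile):
    # run the pksmart CLI on one SMILES string (IPython shell escape)
    !pksmart --smiles '{smile}'
    # pksmart writes its predictions to a CSV in the working directory;
    # read back the most recently created one
    files = glob.glob('*.csv')
    latest = max(files, key=os.path.getctime)
    return pd.read_csv(latest)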
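`get_props` reads whichever CSV in the working directory is newest. The `!` shell escape does not raise when the command fails, so if a pksmart call exits without writing a file, the previous molecule's results would be read again with no warning. A minimal sketch of a stricter lookup, assuming pksmart writes (or overwrites) a CSV in the working directory on each successful run: record the time before the call and only accept a CSV modified after it. `newest_csv_since` is a hypothetical helper, not part of pksmart.

```python
import glob
import os


def newest_csv_since(start_time, directory=".", slack=1.0):
    """Return the newest *.csv in `directory` modified at or after start_time.

    Hypothetical helper (not part of pksmart). `slack` (seconds) allows for
    coarse filesystem timestamps. Raises FileNotFoundError if no CSV
    qualifies, e.g. because the pksmart run failed and wrote no output.
    """
    cutoff = start_time - slack
    candidates = [
        path
        for path in glob.glob(os.path.join(directory, "*.csv"))
        if os.path.getmtime(path) >= cutoff
    ]
    if not candidates:
        raise FileNotFoundError(f"no CSV written in {directory!r} since {start_time}")
    return max(candidates, key=os.path.getmtime)


# Possible use inside get_props (IPython cell, with `import time`):
#     start = time.time()
#     !pksmart --smiles '{smile}'
#     return pd.read_csv(newest_csv_since(start))
```

Raising here lets the `try`/`except` in the prediction loop report the failed molecule instead of silently duplicating a row.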

### This will generate a lot of output! Note that it is currently set to run only 5 molecules.

In [6]:
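frames = []

# five molecules (indices 1 to 5); widen the slice to run the full list
for smile in smiles[1:6]:
    try:
        # drop the sodium counter-ion from salt forms before prediction
        smile = smile.replace("[Na+].", "")
        frames.append(get_props(smile))
    except Exception as err:
        print(f"error for {smile}: {err}")

# combine the per-molecule results; stays empty if every call failed
out = pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()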

2025-10-07 15:10:26.754 | INFO     | pksmart.main:main:575 - Log level set to INFO
2025-10-07 15:10:26.755 | INFO     | pksmart.main:predict_pk_params:410 - Starting PK parameter prediction
2025-10-07 15:10:26.757 | INFO     | pksmart.main:predict_pk_params:417 - Standardizing SMILES and calculating descriptors
100%|█████████████████████████████████████████████| 1/1 [00:00<00:00,  1.92it/s]
2025-10-07 15:10:32.129 | INFO     | pksmart.main:predict_pk_params:431 - Predicting animal pharmacokinetic parameters
2025-10-07 15:10:33.630 | INFO     | pksmart.main:predict_pk_params:449 - Predicting human pharmacokinetic parameters
2025-10-07 15:10:34.212 | INFO     | pksmart.main:main:585 - 
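The loop removes sodium with `smile.replace("[Na+].", "")`, which only catches a sodium ion written before the parent molecule; a SMILES ending in `.[Na+]` would pass through unchanged. pksmart logs a SMILES standardization step that may already deal with salts, but a slightly more general pre-filter is easy to sketch in plain Python. `strip_counterions` and its ion list are hypothetical and deliberately small. A calcium salt written as `[Ca+2].X.X` still leaves two copies of the anion `X`; RDKit's `rdMolStandardize.LargestFragmentChooser` is a more thorough alternative.

```python
# Hypothetical helper, not part of pksmart: a small, non-exhaustive set of
# simple inorganic counter-ions written as single-atom SMILES fragments.
COUNTERIONS = {"[Na+]", "[K+]", "[Li+]", "[Ca+2]", "[Ca++]", "[Mg+2]", "[Cl-]", "[Br-]"}


def strip_counterions(smiles, counterions=COUNTERIONS):
    """Drop listed counter-ion fragments from a dot-separated SMILES string.

    Unlike smile.replace("[Na+].", ""), this also removes an ion written
    after the parent molecule. If every fragment is a listed ion, the
    input is returned unchanged.
    """
    kept = [frag for frag in smiles.split(".") if frag not in counterions]
    return ".".join(kept) if kept else smiles


print(strip_counterions("CC(=O)[O-].[Na+]"))  # prints CC(=O)[O-]
```

In the loop, `smile = strip_counterions(smile)` could take the place of the `replace` call.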

In [7]:
out.head()

| | smiles_r | VDss_L_kg | Volume_of_distribution_(VDss)_folderror | Volume_of_distribution_(VDss)_lowerbound | Volume_of_distribution_(VDss)_upperbound | CL_mL_min_kg | Clearance_(CL)_folderror | Clearance_(CL)_lowerbound | Clearance_(CL)_upperbound | Fraction_unbound_in_plasma_(fup) | ... | comments | dog_VDss_L_kg | dog_CL_mL_min_kg | dog_fup | monkey_VDss_L_kg | monkey_CL_mL_min_kg | monkey_fup | rat_VDss_L_kg | rat_CL_mL_min_kg | rat_fup |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | `CC(C)n1c(CC[C@@H](O)C[C@@H](O)CC([O-])=O)c(c(c...` | 0.87491 | 2.95 | 0.29658 | 2.580986 | 3.981289 | 5.43 | 0.733202 | 21.618401 | 0.045549 | ... | | 1.469133 | 7.256471 | 0.235075 | 0.775937 | 8.712392 | 0.234171 | 1.91501 | 15.143788 | 0.12899 |
| 1 | `CC(C)n1c(CC[C@@H](O)C[C@@H](O)CC([O-])=O)c(c(c...` | 1.193819 | 2.95 | 0.404684 | 3.521767 | 4.033709 | 5.43 | 0.742856 | 21.903038 | 0.07692 | ... | | 1.205301 | 6.746506 | 0.248666 | 0.583302 | 7.716065 | 0.26024 | 2.143999 | 18.121 | 0.246816 |
| 2 | `CC(C)n1c(CC[C@@H](O)C[C@@H](O)CC([O-])=O)c(c2C...` | 0.974609 | 2.87 | 0.339585 | 2.797129 | 3.799489 | 4.92 | 0.772254 | 18.693484 | 0.055848 | ... | | 1.694422 | 9.776354 | 0.193109 | 1.150591 | 10.172208 | 0.239051 | 1.608044 | 23.183914 | 0.151055 |
| 3 | `CC(C)n1c(CC[C@@H](O)C[C@@H](O)CC([O-])=O)c(c2C...` | 0.974609 | 2.87 | 0.339585 | 2.797129 | 3.799489 | 4.92 | 0.772254 | 18.693484 | 0.055848 | ... | | 1.694422 | 9.776354 | 0.193109 | 1.150591 | 10.172208 | 0.239051 | 1.608044 | 23.183914 | 0.151055 |
| 4 | `CC(C)n1c(CC[C@@H](O)C[C@@H](O)CC([O-])=O)c(c(c...` | 0.648362 | 2.92 | 0.222042 | 1.893216 | 2.807339 | 5.25 | 0.534731 | 14.738532 | 0.067571 | ... | | 1.070188 | 5.015993 | 0.268459 | 0.579078 | 6.825263 | 0.217323 | 1.389224 | 12.878917 | 0.217148 |


## View just Human, Monkey, Rat or Dog values

In [17]:
df_human = out[['smiles_r', 'VDss_L_kg', 'CL_mL_min_kg','Fraction_unbound_in_plasma_(fup)','MRT_hr','thalf_hr']]

df_dog = out[['smiles_r','dog_VDss_L_kg','dog_CL_mL_min_kg', 'dog_fup']]

df_monkey = out[['smiles_r','monkey_VDss_L_kg','monkey_CL_mL_min_kg', 'monkey_fup']]

df_rat = out[['smiles_r','rat_VDss_L_kg','rat_CL_mL_min_kg', 'rat_fup']]
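The hard-coded column lists work, but the animal columns share a `<species>_` prefix, so the selection can also be built from `out.columns`. A minimal sketch, assuming the column naming seen in this notebook's output; `species_columns` is a hypothetical helper, not part of pksmart.

```python
SPECIES = ("dog", "monkey", "rat")


def species_columns(columns, species):
    """Return 'smiles_r' plus every column whose name starts with '<species>_'.

    Hypothetical helper (not part of pksmart); assumes the column naming seen
    in this notebook's output, e.g. 'rat_VDss_L_kg', 'rat_fup'.
    """
    if species not in SPECIES:
        raise ValueError(f"species must be one of {SPECIES}, got {species!r}")
    prefix = f"{species}_"
    return ["smiles_r"] + [col for col in columns if col.startswith(prefix)]


# A subset of the column names that appear in `out`
columns = ["smiles_r", "VDss_L_kg", "CL_mL_min_kg",
           "dog_VDss_L_kg", "dog_CL_mL_min_kg", "dog_fup",
           "rat_VDss_L_kg", "rat_CL_mL_min_kg", "rat_fup"]
print(species_columns(columns, "rat"))
# In the notebook: df_rat = out[species_columns(out.columns, "rat")]
```

pandas' `out.filter(regex=r"^(smiles_r|rat_)")` gives the same selection in a single call; the helper form just fails loudly on a misspelled species.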

In [18]:
df_human.head()

| | smiles_r | VDss_L_kg | CL_mL_min_kg | Fraction_unbound_in_plasma_(fup) | MRT_hr | thalf_hr |
|---|---|---|---|---|---|---|
| 0 | `CC(C)n1c(CC[C@@H](O)C[C@@H](O)CC([O-])=O)c(c(c...` | 0.87491 | 3.981289 | 0.045549 | 4.444865 | 7.668279 |
| 1 | `CC(C)n1c(CC[C@@H](O)C[C@@H](O)CC([O-])=O)c(c(c...` | 1.193819 | 4.033709 | 0.07692 | 5.117584 | 8.204483 |
| 2 | `CC(C)n1c(CC[C@@H](O)C[C@@H](O)CC([O-])=O)c(c2C...` | 0.974609 | 3.799489 | 0.055848 | 5.599033 | 8.135067 |
| 3 | `CC(C)n1c(CC[C@@H](O)C[C@@H](O)CC([O-])=O)c(c2C...` | 0.974609 | 3.799489 | 0.055848 | 5.599033 | 8.135067 |
| 4 | `CC(C)n1c(CC[C@@H](O)C[C@@H](O)CC([O-])=O)c(c(c...` | 0.648362 | 2.807339 | 0.067571 | 3.363455 | 5.515214 |
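Two textbook relationships can serve as a rough consistency check on the human predictions: for intravenous dosing VDss = CL × MRT, and for a one-compartment model t½ = ln 2 × MRT. With the units used here (L/kg and mL/min/kg), MRT in hours is VDss × 1000 / (CL × 60). For the first row of `df_human` this gives about 3.66 h against a predicted `MRT_hr` of 4.44 h, and a one-compartment t½ of about 2.54 h against a predicted `thalf_hr` of 7.67 h. The gap suggests that pksmart predicts each parameter with its own model rather than deriving MRT and t½ from VDss and CL, so these identities should not be expected to hold exactly. The helpers below are hypothetical, not part of pksmart.

```python
import math


def implied_mrt_hr(vdss_l_per_kg, cl_ml_min_kg):
    """MRT (h) implied by VDss = CL * MRT, given VDss in L/kg and CL in mL/min/kg."""
    vdss_ml_per_kg = vdss_l_per_kg * 1000.0      # L/kg -> mL/kg
    mrt_min = vdss_ml_per_kg / cl_ml_min_kg      # (mL/kg) / (mL/min/kg) = min
    return mrt_min / 60.0                        # min -> h


def one_compartment_thalf_hr(mrt_hr):
    """Half-life (h) for a one-compartment model, where t1/2 = ln(2) * MRT."""
    return math.log(2) * mrt_hr


# Row 0 of df_human: VDss_L_kg, CL_mL_min_kg, MRT_hr, thalf_hr
vdss, cl, mrt_pred, thalf_pred = 0.87491, 3.981289, 4.444865, 7.668279
mrt_calc = implied_mrt_hr(vdss, cl)
print(f"MRT: implied {mrt_calc:.2f} h vs predicted {mrt_pred:.2f} h")
print(f"t1/2: one-compartment {one_compartment_thalf_hr(mrt_calc):.2f} h "
      f"vs predicted {thalf_pred:.2f} h")
```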


In [19]:
df_rat.head()

| | smiles_r | rat_VDss_L_kg | rat_CL_mL_min_kg | rat_fup |
|---|---|---|---|---|
| 0 | `CC(C)n1c(CC[C@@H](O)C[C@@H](O)CC([O-])=O)c(c(c...` | 1.91501 | 15.143788 | 0.12899 |
| 1 | `CC(C)n1c(CC[C@@H](O)C[C@@H](O)CC([O-])=O)c(c(c...` | 2.143999 | 18.121 | 0.246816 |
| 2 | `CC(C)n1c(CC[C@@H](O)C[C@@H](O)CC([O-])=O)c(c2C...` | 1.608044 | 23.183914 | 0.151055 |
| 3 | `CC(C)n1c(CC[C@@H](O)C[C@@H](O)CC([O-])=O)c(c2C...` | 1.608044 | 23.183914 | 0.151055 |
| 4 | `CC(C)n1c(CC[C@@H](O)C[C@@H](O)CC([O-])=O)c(c(c...` | 1.389224 | 12.878917 | 0.217148 |
