# ADMET Screening and Drug-Likeness Filtering Pipeline

## Overview
This notebook provides an automated workflow for ADMET (Absorption, Distribution, Metabolism, Excretion, and Toxicity) screening and drug-likeness evaluation of small-molecule candidates following virtual screening.

## Screening Criteria
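- Lipinski Rule of Five (molecular weight, LogP, and hydrogen bond donor/acceptor limits; at most one violation allowed)
- Veber rules (≤ 10 rotatable bonds, TPSA ≤ 140 Å²)
- Bioavailability score (≥ 0.55)
- Synthetic accessibility score (≤ 6)
- Absorption: ESOL solubility class and GI absorption
- Distribution: BBB permeability and P-gp substrate status
- Metabolism: CYP inhibition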

## Workflow Steps
1. Load compound dataset with precomputed ADMET descriptors
2. Apply drug-likeness filters
3. Apply absorption, distribution, and metabolism filters
4. Export filtered candidates

This workflow is designed for reproducible medicinal chemistry filtering prior to molecular dynamics validation.


# Step 1: Load Data

In [None]:
"""
ADMET Dataset Loading Module

This cell loads computed ADMET descriptors for screened ligands.
"""

import pandas as pd
import os

# Define input file
admet_file = 'admet_descriptors.csv'

if not os.path.exists(admet_file):
    raise FileNotFoundError(f"{admet_file} not found in project directory.")

# Load ADMET results
df = pd.read_csv(admet_file)

print(f"Initial number of compounds: {df.shape[0]}")
df.head()
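
The filtering cells below look up columns by name, so a missing or renamed column only surfaces as a `KeyError` partway through the cascade. The following is a minimal sketch of an early check, assuming the column names used elsewhere in this notebook. `REQUIRED_COLUMNS` and `missing_columns` are hypothetical names introduced for illustration, and the toy frame stands in for `df`.

```python
import pandas as pd

# Column names referenced by the filtering cells in this notebook.
REQUIRED_COLUMNS = [
    'ID', 'Lipinski #violations', '#Rotatable bonds', 'TPSA',
    'Bioavailability Score', 'Synthetic Accessibility', 'ESOL Class',
    'GI absorption', 'BBB permeant', 'Pgp substrate',
    'CYP1A2 inhibitor', 'CYP2C19 inhibitor', 'CYP2C9 inhibitor',
    'CYP2D6 inhibitor', 'CYP3A4 inhibitor',
]

def missing_columns(frame, required=REQUIRED_COLUMNS):
    """Return the required column names that are absent from `frame`."""
    return [col for col in required if col not in frame.columns]

# Toy frame for illustration only; in the notebook, pass `df` instead.
example = pd.DataFrame({'ID': ['cpd_1'], 'TPSA': [75.0]})
absent = missing_columns(example)
print(f"Missing columns: {absent}")
```

In the notebook itself, raising an error when `missing_columns(df)` is non-empty would stop the run before any filter is applied.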

# **DRUG LIKENESS FILTER**

# Step 1: Apply Lipinski’s Rule of Five

In [None]:
"""
Lipinski Rule of Five Filtering

The Lipinski Rule of Five is applied to evaluate oral drug-likeness.
Compounds with more than one violation are excluded from further analysis.
"""

# Apply Lipinski filtering (allow ≤1 violation)
lipinski_violations_col = 'Lipinski #violations'

dropped_lipinski = df[df[lipinski_violations_col] > 1]
filtered_df = df[df[lipinski_violations_col] <= 1]

print(f"Compounds remaining after Lipinski filtering: {filtered_df.shape[0]}")

if not dropped_lipinski.empty:
    print("Excluded due to excessive Lipinski violations:")
    print(dropped_lipinski['ID'].tolist())
else:
    print("No compounds excluded by Lipinski criteria.")
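
The cell above relies on a precomputed `Lipinski #violations` column. Where only the raw descriptors are available, the count can be derived directly from the four Rule of Five limits (molecular weight ≤ 500 Da, LogP ≤ 5, at most 5 hydrogen bond donors, at most 10 hydrogen bond acceptors). The sketch below assumes hypothetical column names `MW`, `LogP`, `HBD`, and `HBA` and uses toy values. A precomputed column may differ from this count, depending on which LogP estimate the source tool used.

```python
import pandas as pd

def count_lipinski_violations(frame):
    """Count Rule of Five violations per compound (0 to 4)."""
    violations = (
        (frame['MW'] > 500).astype(int)      # molecular weight above 500 Da
        + (frame['LogP'] > 5).astype(int)    # lipophilicity above 5
        + (frame['HBD'] > 5).astype(int)     # more than 5 H-bond donors
        + (frame['HBA'] > 10).astype(int)    # more than 10 H-bond acceptors
    )
    return violations

# Toy descriptors for illustration only (hypothetical column names).
toy = pd.DataFrame({
    'ID':   ['cpd_1', 'cpd_2', 'cpd_3'],
    'MW':   [320.4, 612.7, 540.1],
    'LogP': [2.1, 6.3, 3.0],
    'HBD':  [2, 6, 1],
    'HBA':  [5, 12, 7],
})
toy['Lipinski #violations'] = count_lipinski_violations(toy)

# Same tolerance as the cell above: allow at most one violation.
kept = toy[toy['Lipinski #violations'] <= 1]
print(kept['ID'].tolist())
```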

# Step 2: Apply Veber’s Rule

In [None]:
"""
Veber Rule Filtering

The Veber criteria are applied to assess oral bioavailability.
Compounds exceeding 10 rotatable bonds or TPSA > 140 Å² are excluded.
"""

# Apply Veber filtering to Lipinski-filtered compounds
dropped_veber = filtered_df[
    (filtered_df['#Rotatable bonds'] > 10) |
    (filtered_df['TPSA'] > 140)
]

filtered_df = filtered_df[
    (filtered_df['#Rotatable bonds'] <= 10) &
    (filtered_df['TPSA'] <= 140)
]

print(f"Compounds remaining after Lipinski + Veber filtering: {filtered_df.shape[0]}")

if not dropped_veber.empty:
    print("Excluded due to Veber criteria:")
    print(dropped_veber['ID'].tolist())
else:
    print("No compounds excluded by Veber criteria.")
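
The filtering cells in this notebook share one pattern: build a boolean condition, split the table into kept and excluded rows, and report the excluded IDs. A minimal sketch of that pattern as a reusable helper follows. `apply_filter` is a hypothetical name, and the toy values are for illustration only. Because the excluded rows are taken as the complement of the kept rows, a compound with a missing descriptor ends up in the excluded list rather than disappearing from both.

```python
import pandas as pd

def apply_filter(frame, keep_mask, label, id_col='ID'):
    """Split `frame` by a boolean mask and report what was excluded."""
    kept = frame[keep_mask]
    dropped = frame[~keep_mask]  # exact complement of `kept`
    print(f"{label}: {len(kept)} kept, {len(dropped)} excluded")
    if not dropped.empty:
        print(f"Excluded IDs: {dropped[id_col].tolist()}")
    return kept, dropped

# Toy data for illustration only; cpd_4 has a missing TPSA value.
toy = pd.DataFrame({
    'ID': ['cpd_1', 'cpd_2', 'cpd_3', 'cpd_4'],
    '#Rotatable bonds': [4, 12, 7, 3],
    'TPSA': [90.0, 80.0, 150.0, float('nan')],
})

# Veber criteria as used above: <= 10 rotatable bonds and TPSA <= 140.
# Comparisons against NaN evaluate to False, so cpd_4 is excluded.
veber_ok = (toy['#Rotatable bonds'] <= 10) & (toy['TPSA'] <= 140)
kept, dropped = apply_filter(toy, veber_ok, 'Veber')
```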

# Step 3: Filter for Bioavailability

In [None]:
"""
Bioavailability Filtering

A bioavailability score threshold (≥ 0.55) is applied to prioritise
compounds with a higher probability of adequate oral absorption.
"""

# Apply bioavailability filtering
dropped_bio = filtered_df[filtered_df['Bioavailability Score'] < 0.55]

filtered_df = filtered_df[
    filtered_df['Bioavailability Score'] >= 0.55
]

print(f"Compounds remaining after Lipinski + Veber + Bioavailability filtering: {filtered_df.shape[0]}")

if not dropped_bio.empty:
    print("Excluded due to low predicted bioavailability:")
    print(dropped_bio['ID'].tolist())
else:
    print("No compounds excluded by bioavailability criteria.")

# Step 4: Synthetic Accessibility

In [None]:
"""
Synthetic Accessibility Filtering

Compounds with a synthetic accessibility score > 6 are excluded
to prioritise molecules with practical feasibility for laboratory synthesis.
"""

# Apply synthetic accessibility filtering
dropped_syn = filtered_df[
    filtered_df['Synthetic Accessibility'] > 6
]

filtered_df = filtered_df[
    filtered_df['Synthetic Accessibility'] <= 6
]

print(f"Compounds remaining after drug-likeness filtering (Lipinski + Veber + bioavailability + synthetic accessibility): {filtered_df.shape[0]}")

if not dropped_syn.empty:
    print("Excluded due to poor synthetic accessibility:")
    print(dropped_syn['ID'].tolist())
else:
    print("No compounds excluded by synthetic accessibility criteria.")

In [None]:
print("\nCompounds shortlisted after drug-likeness filtering:")
print(filtered_df['ID'].tolist())

# **ABSORPTION FILTER**

# Step 1: Water Solubility

In [None]:
# Keep 'Very soluble', 'Soluble', and 'Moderately soluble'
soluble_classes = ['Very soluble', 'Soluble', 'Moderately soluble']
is_soluble = filtered_df['ESOL Class'].isin(soluble_classes)

dropped_soluble = filtered_df[~is_soluble]
filtered_df = filtered_df[is_soluble]

print(f"Compounds remaining after ESOL solubility filtering: {filtered_df.shape[0]}")

if not dropped_soluble.empty:
    print("Compounds dropped due to ESOL Class filter:")
    print(dropped_soluble['ID'].tolist())
else:
    print("No compounds dropped due to ESOL Class filter.")

In [None]:
"""
Interim shortlist at this point in the cascade (not yet final).
"""

print("\nCompounds shortlisted so far:")
print(filtered_df['ID'].tolist())

# Step 2: Filter by GI Absorption

Gastrointestinal (GI) Absorption Filtering

Only compounds predicted to exhibit high gastrointestinal absorption are retained to prioritise candidates with favourable oral exposure.

In [None]:
# Apply GI absorption filtering
dropped_gi = filtered_df[
    filtered_df['GI absorption'] != 'High'
]

filtered_df = filtered_df[
    filtered_df['GI absorption'] == 'High'
]

print(f"Compounds remaining after drug-likeness + solubility + GI filtering: {filtered_df.shape[0]}")

if not dropped_gi.empty:
    print("Excluded due to low predicted GI absorption:")
    print(dropped_gi['ID'].tolist())
else:
    print("No compounds excluded by GI absorption criteria.")

# **DISTRIBUTION FILTER**

# Step 1: Blood–Brain Barrier Permeability

In [None]:
"""
Blood–Brain Barrier (BBB) Permeability Filtering

This step should be applied depending on the therapeutic target profile.
For central nervous system (CNS) indications, BBB-permeant compounds are prioritised.
For peripheral targets, non-permeant compounds may be preferred.
"""

**Option A — Retain BBB-Permeant Compounds (CNS Targets)**

In [None]:
# Retain compounds predicted to cross the BBB (CNS-targeted studies)

dropped_bbb = filtered_df[
    filtered_df['BBB permeant'] != 'Yes'
]

filtered_df = filtered_df[
    filtered_df['BBB permeant'] == 'Yes'
]

print(f"Compounds remaining after BBB-permeability filtering (CNS focus): {filtered_df.shape[0]}")

if not dropped_bbb.empty:
    print("Excluded due to lack of predicted BBB permeability:")
    print(dropped_bbb['ID'].tolist())
else:
    print("No compounds excluded by BBB permeability criteria.")

**Option B — Exclude BBB-Permeant Compounds (Peripheral Targets)**

In [None]:
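# Retain compounds predicted NOT to cross the BBB (peripheral-targeted studies).
# Run this cell instead of Option A, not after it.

# Excluded rows are the complement of the retained rows, so compounds
# without a 'No' prediction are listed rather than silently lost.
dropped_bbb = filtered_df[
    filtered_df['BBB permeant'] != 'No'
]

filtered_df = filtered_df[
    filtered_df['BBB permeant'] == 'No'
]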

print(f"Compounds remaining after BBB exclusion (non-CNS focus): {filtered_df.shape[0]}")

if not dropped_bbb.empty:
    print("Excluded due to predicted BBB permeability:")
    print(dropped_bbb['ID'].tolist())
else:
    print("No compounds excluded by BBB permeability criteria.")
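
Options A and B are alternatives, and running both in sequence would leave an empty table. One way to make the choice explicit is a single flag that selects the direction of the filter. The sketch below assumes the `BBB permeant` column holds the strings 'Yes' and 'No'. `filter_bbb` and `cns_target` are hypothetical names introduced for illustration.

```python
import pandas as pd

def filter_bbb(frame, cns_target):
    """Keep BBB-permeant compounds for CNS targets, non-permeant otherwise."""
    wanted = 'Yes' if cns_target else 'No'
    keep_mask = frame['BBB permeant'] == wanted
    # Return (kept, dropped); dropped is the exact complement of kept.
    return frame[keep_mask], frame[~keep_mask]

# Toy data for illustration only.
toy = pd.DataFrame({
    'ID': ['cpd_1', 'cpd_2', 'cpd_3'],
    'BBB permeant': ['Yes', 'No', 'Yes'],
})

kept_cns, dropped_cns = filter_bbb(toy, cns_target=True)
kept_peripheral, dropped_peripheral = filter_bbb(toy, cns_target=False)
print(kept_cns['ID'].tolist())
print(kept_peripheral['ID'].tolist())
```

In the notebook, the equivalent call would be `filtered_df, dropped_bbb = filter_bbb(filtered_df, cns_target=True)` for a CNS project, with the flag set once near the top.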

# Step 2: Remove P-gp Substrates
P-glycoprotein (P-gp) substrates are actively effluxed, which can limit CNS penetration and oral bioavailability.

In [None]:
# Remove P-gp substrates
dropped_pg = filtered_df[filtered_df['Pgp substrate'] != 'No']
filtered_df = filtered_df[filtered_df['Pgp substrate'] == 'No']

print(f"After removing P-gp substrates: {filtered_df.shape[0]} compounds left")

if not dropped_pg.empty:
    print("Compounds dropped due to Pgp substrate filter:")
    print(dropped_pg['ID'].tolist())
else:
    print("No compounds dropped due to Pgp substrate filter.")

# **METABOLISM FILTER**

Run only one of the three options below; each one overwrites `filtered_df`.

# Option 1: Remove All CYP Inhibitors

In [None]:
"""
Cytochrome P450 (CYP) Inhibition Filtering

Compounds predicted to inhibit major CYP isoforms are excluded
to reduce the risk of metabolic drug–drug interactions and
unfavourable pharmacokinetic liabilities.
"""

# Major CYP isoforms commonly evaluated in early drug discovery
cyp_columns = [
    'CYP1A2 inhibitor',
    'CYP2C19 inhibitor',
    'CYP2C9 inhibitor',
    'CYP2D6 inhibitor',
    'CYP3A4 inhibitor'
]

# Flag compounds predicted to be non-inhibitors of every listed CYP isoform
non_inhibitor = (filtered_df[cyp_columns] == 'No').all(axis=1)

# Exclude compounds that inhibit any isoform (or lack a 'No' prediction)
dropped_cyp = filtered_df[~non_inhibitor]

# Retain only non-inhibitors across all listed CYPs
filtered_df = filtered_df[non_inhibitor]

print(f"Compounds remaining after CYP inhibition filtering: {filtered_df.shape[0]}")

if not dropped_cyp.empty:
    print("Excluded due to predicted CYP inhibition:")
    print(dropped_cyp['ID'].tolist())
else:
    print("No compounds excluded by CYP inhibition criteria.")

# Option 2: Allow at Most 1 CYP Inhibition per Compound

In [None]:
"""
Cytochrome P450 (CYP) Inhibition Filtering – Relaxed Strategy

To balance developability and chemical diversity, compounds predicted
to inhibit more than one major CYP isoform are excluded.
Compounds with ≤1 predicted CYP inhibition are retained.
"""

# Major CYP isoforms evaluated
cyp_columns = [
    'CYP1A2 inhibitor',
    'CYP2C19 inhibitor',
    'CYP2C9 inhibitor',
    'CYP2D6 inhibitor',
    'CYP3A4 inhibitor'
]

# Count predicted CYP inhibitions per compound
cyp_inhibition_count = (filtered_df[cyp_columns] == 'Yes').sum(axis=1)

# Identify compounds exceeding tolerance threshold
dropped_cyp = filtered_df[cyp_inhibition_count > 1]

# Retain compounds with ≤1 CYP inhibition
filtered_df = filtered_df[cyp_inhibition_count <= 1]

print(f"Compounds remaining after relaxed CYP filtering (≤1 inhibition): {filtered_df.shape[0]}")

if not dropped_cyp.empty:
    print("Excluded due to multiple predicted CYP inhibitions:")
    print(dropped_cyp['ID'].tolist())
else:
    print("No compounds excluded under relaxed CYP criteria.")

# Option 3: Allow at Most 1 CYP Inhibition, Excluding CYP3A4 Inhibitors

Cytochrome P450 (CYP) Inhibition Filtering – Selective Relaxed Strategy

To balance metabolic risk and chemical diversity:

• Compounds with no predicted CYP inhibition are retained.

• Compounds with at most one CYP inhibition are retained only if CYP3A4 is not inhibited.

• Compounds predicted to inhibit CYP3A4 are excluded due to its central role in drug metabolism, unless chemical space is critically limited.

In [None]:
# Major CYP isoforms evaluated
cyp_columns = [
    'CYP1A2 inhibitor',
    'CYP2C19 inhibitor',
    'CYP2C9 inhibitor',
    'CYP2D6 inhibitor',
    'CYP3A4 inhibitor'
]

# Count total CYP inhibitions
cyp_inhibition_count = (filtered_df[cyp_columns] == 'Yes').sum(axis=1)

# Define selective relaxed criteria:
# Keep compounds with:
# 1) Zero CYP inhibition
# OR
# 2) ≤1 inhibition AND NOT CYP3A4 inhibitor

criteria = (
    (cyp_inhibition_count == 0) |
    (
        (cyp_inhibition_count <= 1) &
        (filtered_df['CYP3A4 inhibitor'] == 'No')
    )
)

dropped_cyp = filtered_df[~criteria]
filtered_df = filtered_df[criteria]

print(f"Compounds remaining after selective CYP filtering: {filtered_df.shape[0]}")

if not dropped_cyp.empty:
    print("Excluded due to CYP inhibition risk:")
    print(dropped_cyp['ID'].tolist())
else:
    print("No compounds excluded under selective CYP criteria.")
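
The three CYP options differ only in the condition used to retain a compound. The sketch below gathers them behind one strategy name so the conditions can be compared side by side on the same toy data. `cyp_keep_mask` and the strategy labels 'strict', 'relaxed', and 'selective' are hypothetical names for Options 1, 2, and 3. The conditions themselves mirror the cells above.

```python
import pandas as pd

CYP_COLUMNS = [
    'CYP1A2 inhibitor', 'CYP2C19 inhibitor', 'CYP2C9 inhibitor',
    'CYP2D6 inhibitor', 'CYP3A4 inhibitor',
]

def cyp_keep_mask(frame, strategy):
    """Return a boolean mask of compounds to retain under `strategy`."""
    count = (frame[CYP_COLUMNS] == 'Yes').sum(axis=1)
    if strategy == 'strict':       # Option 1: no predicted inhibition
        return (frame[CYP_COLUMNS] == 'No').all(axis=1)
    if strategy == 'relaxed':      # Option 2: at most one inhibition
        return count <= 1
    if strategy == 'selective':    # Option 3: at most one, never CYP3A4
        return (count == 0) | (
            (count <= 1) & (frame['CYP3A4 inhibitor'] == 'No')
        )
    raise ValueError(f"Unknown strategy: {strategy!r}")

# Toy predictions for illustration only.
toy = pd.DataFrame({
    'ID': ['cpd_1', 'cpd_2', 'cpd_3', 'cpd_4'],
    'CYP1A2 inhibitor':  ['No', 'No', 'No', 'Yes'],
    'CYP2C19 inhibitor': ['No', 'No', 'No', 'No'],
    'CYP2C9 inhibitor':  ['No', 'No', 'No', 'Yes'],
    'CYP2D6 inhibitor':  ['No', 'Yes', 'No', 'No'],
    'CYP3A4 inhibitor':  ['No', 'No', 'Yes', 'No'],
})

retained = {
    name: toy[cyp_keep_mask(toy, name)]['ID'].tolist()
    for name in ('strict', 'relaxed', 'selective')
}
print(retained)
```

On this toy set, the compound that inhibits only CYP3A4 passes the relaxed strategy but not the selective one, which is the practical difference between Options 2 and 3.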

# **FINAL DATASET EXPORT**
The fully prioritised compound set is exported for downstream validation, such as the molecular dynamics studies noted in the overview.

In [None]:
# Define output filename
final_output_file = "final_prioritised_ligands.csv"

# Export final filtered dataset
filtered_df.to_csv(final_output_file, index=False)

print(f"Final selected compounds saved to '{final_output_file}'")
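
Each filtering cell prints its own remaining-compound count, but those counts are scattered through the notebook output. A minimal sketch of collecting them into a single attrition table is shown below. `run_cascade` and the three stages are hypothetical and use toy data. Real stages would reuse the conditions from the cells above.

```python
import pandas as pd

def run_cascade(frame, stages):
    """Apply (label, mask_function) stages in order and log attrition."""
    log = [{'stage': 'input', 'remaining': len(frame)}]
    for label, mask_function in stages:
        frame = frame[mask_function(frame)]
        log.append({'stage': label, 'remaining': len(frame)})
    return frame, pd.DataFrame(log)

# Toy data for illustration only.
toy = pd.DataFrame({
    'ID': ['cpd_1', 'cpd_2', 'cpd_3', 'cpd_4'],
    'Lipinski #violations': [0, 2, 1, 0],
    'TPSA': [80.0, 95.0, 150.0, 60.0],
    'GI absorption': ['High', 'High', 'High', 'Low'],
})

stages = [
    ('Lipinski', lambda d: d['Lipinski #violations'] <= 1),
    ('TPSA', lambda d: d['TPSA'] <= 140),
    ('GI absorption', lambda d: d['GI absorption'] == 'High'),
]

final, attrition = run_cascade(toy, stages)
print(attrition)
```

Saving the attrition table next to the exported CSV would keep a record of how many compounds each stage removed.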