# Sigma-1 Receptor (SIGMAR1) Drug Repurposing Pipeline

**Target:** Human Sigma-1 Receptor | **ChEMBL ID:** CHEMBL287  
**Goal:** Identify FDA-approved drugs (Phase 4) that bind SIGMAR1 and select the top 5 repurposing candidates for molecular docking  

---

### Background
Sigma-1 Receptor (SIGMAR1) is an intracellular chaperone protein located at the endoplasmic reticulum-mitochondria interface. It plays a key role in neuroprotection and neuroplasticity, neuroinflammation modulation, and cellular stress response (ER stress). It is a validated and emerging target for neurological and psychiatric disorders, making it an excellent candidate for drug repurposing.

### Why Repurposing?
Repurposing approved (Phase 4) drugs can shorten or skip early-phase safety work, since human safety, tolerability, pharmacokinetics, and bioavailability are already characterized at approved doses. This can substantially reduce time and cost compared to de novo drug development.

## Step 0 — Import Libraries and Configure ChEMBL Client

In [7]:
from chembl_webresource_client.new_client import new_client
from chembl_webresource_client.settings import Settings
import pandas as pd
import warnings
warnings.filterwarnings('ignore')

Settings.Instance().CACHING = False

activity  = new_client.activity
molecule  = new_client.molecule
mechanism = new_client.mechanism

print('Libraries loaded successfully.')

Libraries loaded successfully.


## Step 1 — Confirm the Target

We work exclusively with **Human Sigma-1 Receptor (CHEMBL287)**: the repurposing goal is human therapeutic application, and docking will be performed on the human sigma-1R crystal structure (PDB: 6DK1).

In [8]:
TARGET_ID = 'CHEMBL287'

print('=' * 60)
print('TARGET  : Human Sigma Non-Opioid Intracellular Receptor 1')
print(f'ChEMBL  : {TARGET_ID}')
print('PDB     : 6DK1 (crystal structure with pentazocine bound)')
print('Gene    : SIGMAR1')
print('Organism: Homo sapiens')
print('=' * 60)

TARGET  : Human Sigma Non-Opioid Intracellular Receptor 1
ChEMBL  : CHEMBL287
PDB     : 6DK1 (crystal structure with pentazocine bound)
Gene    : SIGMAR1
Organism: Homo sapiens


## Step 2 — Retrieve Quantitative Binding Activity Records

We filter specifically for:
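- **`standard_type` = Ki or IC50**: affinity and potency endpoints, which excludes ADMET-style readouts. `standard_type` alone does not guarantee a binding assay, since an IC50 can also come from a functional assay; restricting further would need a filter on the activity record's `assay_type` field.
- **`pchembl_value` not null**: only records with a computed pChEMBL value, ensuring every record is quantitatively comparable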
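
As a reminder of what the pChEMBL scale means, the conversion can be sketched in a few lines. pChEMBL is the negative base-10 logarithm of the molar Ki/IC50, so a value reported in nM maps to `9 - log10(nM)`. The helper names below are illustrative only and are not part of the ChEMBL client.

```python
import math


def pchembl_from_nm(value_nm: float) -> float:
    """Convert a Ki or IC50 given in nanomolar to the -log10(molar) scale."""
    if value_nm <= 0:
        raise ValueError("concentration must be positive")
    # 1 nM = 1e-9 M, so -log10(value_nm * 1e-9) = 9 - log10(value_nm)
    return 9.0 - math.log10(value_nm)


def nm_from_pchembl(pchembl: float) -> float:
    """Inverse conversion: pChEMBL back to nanomolar."""
    return 10 ** (9.0 - pchembl)


print(round(pchembl_from_nm(1000.0), 2))  # 1 µM  -> 6.0
print(round(pchembl_from_nm(36.0), 2))    # 36 nM -> 7.44
```

Each unit on this scale is a tenfold change in concentration, which is why a cutoff of 6 corresponds to 1 µM.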

**Why not use the mechanism table as a filter?**  
The ChEMBL mechanism table stores only a drug's PRIMARY approved mechanism. Drugs like fluvoxamine (SSRI), donepezil (AChE inhibitor), dextromethorphan (antitussive), and pentazocine (opioid analgesic) all have documented SIGMAR1 binding in peer-reviewed literature but are absent from the mechanism table because SIGMAR1 is not their primary approved target. Using the mechanism table as a filter would silently discard exactly the repurposing candidates we are looking for — causing a 90% data loss.

In [None]:
print('Fetching quantitative binding activity records from ChEMBL...')
print('Filters: standard_type = Ki or IC50, pChEMBL value present\n')

activities = activity.filter(
    target_chembl_id      = TARGET_ID,
    standard_type__in     = ['Ki', 'IC50'],
    pchembl_value__isnull = False
).only(
    'molecule_chembl_id',
    'molecule_pref_name',
    'parent_molecule_chembl_id',
    'standard_type',
    'standard_value',
    'standard_units',
    'pchembl_value'
)

activities_df = pd.DataFrame(activities)
activities_df['pchembl_value'] = pd.to_numeric(activities_df['pchembl_value'], errors='coerce')
activities_df.to_csv('claude_activities3.csv', index=False)

print(f'Total activity records retrieved  : {len(activities_df)}')
print(f'Unique molecules with binding data: {activities_df["parent_molecule_chembl_id"].nunique()}')
print('claude_activities3.csv saved.')
activities_df.head()

Fetching quantitative binding activity records from ChEMBL...
Filters: standard_type = Ki or IC50, pChEMBL value present



## Step 3 — Aggregate pChEMBL Values Per Molecule

A single molecule may have multiple binding records from different labs and publications. We compute three statistics per molecule:

- **`mean_pchembl`**: average binding affinity across all assays. Primary ranking metric.
- **`best_pchembl`**: highest pChEMBL ever recorded. Shows the binding ceiling.
- **`assay_count`**: number of activity records with a pChEMBL value. A rough reliability indicator: a high count suggests reproducible binding, although the records are not guaranteed to come from independent labs.

**Note on the syntax:** Inside `.agg()`, writing `('pchembl_value', 'mean')` is identical to calling `.mean()` — the string `'mean'` is how pandas receives the instruction. The name on the left (e.g. `mean_pchembl`) becomes the new column name.

In [None]:
# dropna() removes blank parent IDs, unique() removes duplicates,
# tolist() converts to plain Python list which the ChEMBL API requires
parent_ids = activities_df['parent_molecule_chembl_id'].dropna().unique().tolist()

agg_df = (
    activities_df
    .groupby('parent_molecule_chembl_id')
    .agg(
        mean_pchembl = ('pchembl_value', 'mean'),
        best_pchembl = ('pchembl_value', 'max'),
        assay_count  = ('pchembl_value', 'count')
    )
    .reset_index()
    .round({'mean_pchembl': 2, 'best_pchembl': 2})
)

print(f'Unique parent molecules after aggregation: {len(agg_df)}')
print(f'\nTop 10 by mean pChEMBL:')
agg_df.sort_values('mean_pchembl', ascending=False).head(10)
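
One consequence of ranking on `mean_pchembl` is worth spelling out: averaging on the log scale is equivalent to taking a geometric mean of the underlying concentrations, which damps the influence of a single weak measurement. The sketch below uses made-up Ki values purely for illustration.

```python
import math
from statistics import mean

# Hypothetical Ki values (nM) for one molecule measured in three assays
ki_nm = [10.0, 100.0, 1000.0]

# Convert each value to the pChEMBL scale: 9 - log10(nM)
pchembl = [9.0 - math.log10(k) for k in ki_nm]

mean_pchembl = mean(pchembl)                 # average on the log scale
implied_ki_nm = 10 ** (9.0 - mean_pchembl)   # geometric mean of the Ki values
arithmetic_ki_nm = mean(ki_nm)               # plain average, for comparison

print(mean_pchembl)       # 7.0
print(implied_ki_nm)      # 100.0
print(arithmetic_ki_nm)   # 370.0
```

The plain average is dominated by the weakest measurement, while the log-scale mean sits at the middle value.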

## Step 4 — Fetch Molecule Metadata (Phase, Name)

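We retrieve `max_phase` and `pref_name` for each parent molecule. `max_phase = 4` is ChEMBL's highest phase and means the drug is approved; ChEMBL draws approvals from several regulators, so this pipeline treats Phase 4 as a proxy for FDA approval rather than proof of it. We batch in groups of 100 to avoid ChEMBL API timeouts with large ID lists.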

In [None]:
print(f'Fetching molecule metadata for {len(parent_ids)} unique parent molecules...')

all_mols   = []
batch_size = 100

for i in range(0, len(parent_ids), batch_size):
    batch  = parent_ids[i : i + batch_size]
    result = molecule.filter(
        molecule_chembl_id__in = batch
    ).only('molecule_chembl_id', 'max_phase', 'pref_name')
    all_mols.extend(list(result))
    print(f'  Batch {i//batch_size + 1} done — total fetched: {len(all_mols)}')

mol_df = pd.DataFrame(all_mols)
mol_df['max_phase'] = pd.to_numeric(mol_df['max_phase'], errors='coerce')
mol_df.to_csv('claude_molecule3.csv', index=False)

print(f'\nTotal molecule records : {len(mol_df)}')
print(f'Phase 4 drugs          : {(mol_df["max_phase"] == 4).sum()}')
print('claude_molecule3.csv saved.')
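
The batching pattern used above (and again in Step 5) can be factored into a small generator. This is a minimal sketch; `chunked` is a hypothetical helper name, and the demo IDs are placeholders rather than real query input.

```python
from typing import Iterator, List, Sequence


def chunked(ids: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `ids`, each holding at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(ids), size):
        yield list(ids[start:start + size])


# Placeholder IDs, only to show how 250 items split into batches of 100
demo_ids = [f"CHEMBL{n}" for n in range(1, 251)]
batches = list(chunked(demo_ids, 100))

print([len(b) for b in batches])  # [100, 100, 50]
```

Each batch could then be passed to `molecule.filter(molecule_chembl_id__in=batch)` exactly as in the loop above.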

## Step 5 — Fetch Mechanism Labels (Action Type)

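The mechanism table is used **only as a label here, not as a filter gate**. We LEFT JOIN it onto our molecule list, so every molecule survives the merge. Drugs without a mechanism entry simply get `NaN` for `action_type`; they drop out only at the Step 7 filter, which is why Step 8 adds well-documented binders back by hand. Note that the lookup below is keyed on molecule ID, not on target, so `action_type` reflects the drug's annotated mechanism at whatever target ChEMBL lists for it. This is fundamentally different from the original pipeline, where mechanism was the filter itself, causing 90% data loss.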

In [None]:
print('Fetching mechanism labels...')

all_mec = []

for i in range(0, len(parent_ids), batch_size):
    batch  = parent_ids[i : i + batch_size]
    result = mechanism.filter(
        molecule_chembl_id__in = batch
    ).only('molecule_chembl_id', 'action_type', 'parent_molecule_chembl_id')
    all_mec.extend(list(result))

mec_df = pd.DataFrame(all_mec) if all_mec else pd.DataFrame(
    columns=['molecule_chembl_id', 'action_type', 'parent_molecule_chembl_id']
)

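# One action_type per parent molecule (drop_duplicates keeps the first row; any additional mechanisms for the same molecule are discarded)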
mec_clean = (
    mec_df[['parent_molecule_chembl_id', 'action_type']]
    .drop_duplicates(subset='parent_molecule_chembl_id')
)

mec_df.to_csv('claude_mechanism3.csv', index=False)

print(f'Mechanism records fetched: {len(mec_df)}')
print('\nAction types present:')
print(mec_df['action_type'].value_counts())
print('\nclaude_mechanism3.csv saved.')
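
Because a drug can carry more than one mechanism row, it may help to inspect all action types per parent molecule before collapsing to one. The following is a plain-Python sketch of that check, assuming rows shaped like the mechanism records fetched above; the function name and the placeholder IDs are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping, Optional


def action_types_by_parent(
    rows: Iterable[Mapping[str, Optional[str]]]
) -> Dict[str, List[str]]:
    """Collect every distinct action_type recorded for each parent molecule."""
    grouped = defaultdict(set)
    for row in rows:
        parent = row.get("parent_molecule_chembl_id")
        action = row.get("action_type")
        if parent and action:  # skip rows with a missing ID or label
            grouped[parent].add(action.strip().upper())
    return {parent: sorted(actions) for parent, actions in grouped.items()}


# Placeholder records, not real ChEMBL data
demo_rows = [
    {"parent_molecule_chembl_id": "CHEMBL_A", "action_type": "Agonist"},
    {"parent_molecule_chembl_id": "CHEMBL_A", "action_type": "ANTAGONIST"},
    {"parent_molecule_chembl_id": "CHEMBL_B", "action_type": "INHIBITOR"},
    {"parent_molecule_chembl_id": "CHEMBL_C", "action_type": None},
]

labels = action_types_by_parent(demo_rows)
print(labels)
```

A molecule that maps to several labels is a cue to check which target each label refers to before trusting the single value kept in `mec_clean`.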

## Step 6 — Merge Everything Into One Master Dataframe

We merge three sources of information using **LEFT JOIN** — every molecule from `agg_df` is kept regardless of whether it has a mechanism entry or molecule metadata. No molecule is silently dropped at this stage.

In [None]:
merged_df = (
    agg_df
    .merge(
        mol_df.rename(columns={'molecule_chembl_id': 'parent_molecule_chembl_id'}),
        on='parent_molecule_chembl_id', how='left'
    )
    .merge(
        mec_clean,
        on='parent_molecule_chembl_id', how='left'
    )
)

merged_df['max_phase']   = pd.to_numeric(merged_df['max_phase'], errors='coerce')
merged_df['action_type'] = merged_df['action_type'].str.upper().str.strip()

print(f'Merged dataframe shape: {merged_df.shape}')
print(f'\nAction type distribution (top 10):')
print(merged_df['action_type'].value_counts(dropna=False).head(10))

## Step 7 — Filter to Phase 4 Agonists and Modulators

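We keep only drugs that are Phase 4 approved and carry an agonist or modulator action type. The intent is to favor drugs that activate or positively modulate SIGMAR1, the desired direction for neuroprotection, and to exclude antagonists and inhibitors, which block receptor function. Because the mechanism lookup in Step 5 is keyed on molecule rather than target, the `action_type` may describe the drug's action at a different target, so hits from this filter still need the manual review in Step 10.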

In [None]:
AGONIST_TYPES = ['AGONIST', 'MODULATOR', 'POSITIVE ALLOSTERIC MODULATOR', 'PARTIAL AGONIST']

pipeline_df = (
    merged_df[
        (merged_df['max_phase']   == 4) &
        (merged_df['action_type'].isin(AGONIST_TYPES))
    ]
    .copy()
    .sort_values('mean_pchembl', ascending=False)
    .reset_index(drop=True)
)

pipeline_df['selection_basis'] = 'Identified via ChEMBL quantitative binding pipeline (Ki/IC50, Phase 4, agonist/modulator filter)'
pipeline_df['key_reference']   = 'ChEMBL Database (CHEMBL287 activity records)'
pipeline_df['source']          = 'Pipeline Derived'

print(f'Phase 4 agonists/modulators from pipeline: {len(pipeline_df)}')
pipeline_df[['pref_name', 'mean_pchembl', 'best_pchembl', 'assay_count', 'action_type']]

## Step 8 — Knowledge-Guided Candidates

The following four drugs are confirmed SIGMAR1 binders in `claude_activities3.csv` but are absent from the mechanism table under SIGMAR1 because their primary approved mechanism is a different target. They are manually added here with published literature evidence. Their pChEMBL values are pulled directly from the activities data for quantitative consistency.

| Drug | Primary Mechanism | SIGMAR1 Evidence |
|---|---|---|
| Pentazocine | Opioid receptor agonist | Prototypical sigma-1R reference agonist; crystal structure PDB 6DK1 |
| Dextromethorphan | NMDA antagonist | Sigma-1R agonist (Ki=142-652 nM); effect proven sigma-1R specific |
| Donepezil | AChE inhibitor | IC50=14.6 nM; 93% PET occupancy at therapeutic doses |
| Fluvoxamine | SSRI | Highest sigma-1R affinity among all SSRIs (Ki=36 nM); human PET confirmed |

In [None]:
knowledge_guided = pd.DataFrame([
    {
        'parent_molecule_chembl_id': 'CHEMBL60542',
        'pref_name'                : 'PENTAZOCINE',
        'max_phase'                : 4,
        'action_type'              : 'AGONIST',
        'source'                   : 'Knowledge Guided',
        'selection_basis'          : 'Prototypical SIGMAR1 reference agonist. Standard radioligand assay across all sigma-1R pharmacology is the [3H](+)-pentazocine displacement assay (Ki ~7 nM). Human sigma-1R crystal structure solved with pentazocine bound (PDB: 6DK1).',
        'key_reference'            : 'Huang et al., 2017, Cell; Schmidt et al., 2016, Nature'
    },
    {
        'parent_molecule_chembl_id': 'CHEMBL52440',
        'pref_name'                : 'DEXTROMETHORPHAN',
        'max_phase'                : 4,
        'action_type'              : 'AGONIST',
        'source'                   : 'Knowledge Guided',
        'selection_basis'          : 'Documented SIGMAR1 agonist (Ki=142-652 nM). Neuroprotective effects confirmed as SIGMAR1-specific: other NMDA antagonists with equivalent NMDA affinity failed to replicate its effect on levodopa-induced dyskinesia, proving the mechanism is SIGMAR1-mediated.',
        'key_reference'            : 'Nguyen et al., 2014, Trends in Pharmacological Sciences'
    },
    {
        'parent_molecule_chembl_id': 'CHEMBL502',
        'pref_name'                : 'DONEPEZIL',
        'max_phase'                : 4,
        'action_type'              : 'AGONIST',
        'source'                   : 'Knowledge Guided',
        'selection_basis'          : 'Binds SIGMAR1 with IC50=14.6 nM. PET imaging confirmed 93% receptor occupancy at therapeutic doses. Blocking SIGMAR1 abolishes donepezil neuroprotection while leaving cholinergic activity intact — proving SIGMAR1 contribution is mechanistically independent.',
        'key_reference'            : 'Meunier et al., 2006, British Journal of Pharmacology'
    },
    {
        'parent_molecule_chembl_id': 'CHEMBL814',
        'pref_name'                : 'FLUVOXAMINE',
        'max_phase'                : 4,
        'action_type'              : 'AGONIST',
        'source'                   : 'Knowledge Guided',
        'selection_basis'          : 'Highest SIGMAR1 binding affinity among ALL approved SSRIs (Ki=17-36 nM vs paroxetine Ki=1893-2041 nM — 100-fold difference). Human PET study confirmed dose-dependent SIGMAR1 occupancy in all brain regions at therapeutic doses. Clinical trial conducted specifically targeting SIGMAR1 (TOGETHER trial, 2021).',
        'key_reference'            : 'Hashimoto et al., 2009, Biological Psychiatry; TOGETHER trial 2021'
    },
])

# Pull pChEMBL values from activities data
pchembl_lookup = agg_df.set_index('parent_molecule_chembl_id')[['mean_pchembl', 'best_pchembl', 'assay_count']]

knowledge_guided = knowledge_guided.merge(
    pchembl_lookup, on='parent_molecule_chembl_id', how='left'
)

print('Knowledge-guided candidates with ChEMBL binding confirmation:')
knowledge_guided[['pref_name', 'mean_pchembl', 'best_pchembl', 'assay_count', 'action_type']]

## Step 9 — Merge Pipeline and Knowledge-Guided Into Master List

We combine both groups using `pd.concat` — this stacks rows together when both dataframes share the same columns. Knowledge-guided candidates go first so that if any drug appears in both groups, the knowledge-guided version (with richer literature information) is retained via `drop_duplicates(keep='first')`.

In [None]:
shared_cols = [
    'parent_molecule_chembl_id', 'pref_name', 'max_phase', 'action_type',
    'mean_pchembl', 'best_pchembl', 'assay_count',
    'selection_basis', 'key_reference', 'source'
]

master_df = pd.concat(
    [knowledge_guided[shared_cols], pipeline_df[shared_cols]],
    ignore_index = True
)

master_df = (
    master_df
    .drop_duplicates(subset='parent_molecule_chembl_id', keep='first')
    .sort_values('mean_pchembl', ascending=False)
    .reset_index(drop=True)
)

master_df.to_csv('claude_master_candidates.csv', index=False)

print(f'Master candidate list: {len(master_df)} drugs')
print('claude_master_candidates.csv saved.\n')
master_df[['pref_name', 'mean_pchembl', 'best_pchembl', 'assay_count', 'action_type', 'source']]
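
The keep-first behaviour described above can be illustrated without pandas. This is a sketch under the assumption that each candidate is a dict with the same keys as the dataframe columns; `merge_candidates` and the demo records are hypothetical.

```python
from typing import Dict, List, Sequence


def merge_candidates(
    preferred: Sequence[Dict],
    fallback: Sequence[Dict],
    key: str = "parent_molecule_chembl_id",
) -> List[Dict]:
    """Stack two candidate lists, keep the first row per key, sort by score."""
    seen = set()
    merged = []
    for row in list(preferred) + list(fallback):
        if row[key] in seen:
            continue  # a later duplicate loses to the earlier (preferred) row
        seen.add(row[key])
        merged.append(row)
    # Highest mean_pchembl first; rows without a score go last
    return sorted(
        merged,
        key=lambda r: (r.get("mean_pchembl") is None, -(r.get("mean_pchembl") or 0.0)),
    )


# Placeholder records, not real pipeline output
knowledge = [{"parent_molecule_chembl_id": "X", "mean_pchembl": 7.1, "source": "Knowledge Guided"}]
pipeline = [
    {"parent_molecule_chembl_id": "X", "mean_pchembl": 7.1, "source": "Pipeline Derived"},
    {"parent_molecule_chembl_id": "Y", "mean_pchembl": 8.0, "source": "Pipeline Derived"},
    {"parent_molecule_chembl_id": "Z", "mean_pchembl": None, "source": "Pipeline Derived"},
]

master = merge_candidates(knowledge, pipeline)
print([(r["parent_molecule_chembl_id"], r["source"]) for r in master])
```

Placing rows with a missing score last matches the default behaviour of `sort_values`, which also puts `NaN` at the end.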

## Step 10 — Apply Elimination Criteria to Select Final 5 Docking Candidates

Three elimination rules are applied:

**Rule 1 — pChEMBL threshold (≥ 6)**  
pChEMBL < 6 means binding weaker than 1 µM — below the medicinal chemistry threshold for biologically meaningful binding.

**Rule 2 — Literature contradiction**  
Drugs are excluded when published literature directly contradicts the ChEMBL data. Fentanyl's published sigma-1R IC50 is ~5000 nM (pChEMBL ~5.3), far below its apparent ChEMBL value, which reflects analogue assay data. Additionally, sigma-1R has been shown to *inhibit* mu-opioid receptor analgesia, making fentanyl conceptually unsuitable.

**Rule 3 — Single assay, no mechanistic support**  
Drugs are excluded when they have only one assay and no published mechanistic evidence that sigma-1R engagement produces a real biological effect.

**Pentazocine is reserved as the docking reference standard — not counted among the 5.**

In [None]:
EXCLUDED = {
    'FENTANYL'      : 'Rule 2 — literature IC50 ~5000 nM contradicts ChEMBL data; sigma-1R inhibits its opioid mechanism',
    'BREXPIPRAZOLE' : 'Rule 3 — single assay; published review lists it among antipsychotics WITHOUT sigma-1R affinity',
    'TESTOSTERONE'  : 'Rule 1 & 3 — pChEMBL 5.92 borderline; single assay; no mechanistic evidence',
    'LASMIDITAN'    : 'Rule 1 & 3 — pChEMBL 5.5 below threshold; no sigma-1R mechanistic basis',
    'RAMELTEON'     : 'Rule 1 & 3 — pChEMBL 5.05 far below threshold; nonspecific screening hit',
}
STANDARD = ['PENTAZOCINE', '(+)-PENTAZOCINE']

print('Elimination log:')
for drug, reason in EXCLUDED.items():
    print(f'  EXCLUDED — {drug}: {reason}')
print(f'  STANDARD — PENTAZOCINE: Reserved as docking reference (PDB: 6DK1)\n')

final_5 = (
    master_df[
        (master_df['mean_pchembl'] >= 6) &
        (~master_df['pref_name'].isin(list(EXCLUDED.keys()))) &
        (~master_df['pref_name'].isin(STANDARD))
    ]
    .sort_values('mean_pchembl', ascending=False)
    .head(5)
    .reset_index(drop=True)
)

final_5.to_csv('claude_final_5_candidates.csv', index=False)
print('Final 5 docking candidates saved to claude_final_5_candidates.csv')
final_5[['pref_name', 'mean_pchembl', 'best_pchembl', 'assay_count', 'action_type', 'source']]
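
For auditing, it can be useful to record why each candidate was dropped rather than only which ones survived. Below is a minimal plain-Python sketch of the three-way screen (reference standard, manual exclusion, pChEMBL threshold); the function name, drug names, and scores are placeholders, not pipeline results.

```python
from typing import Dict, List, Optional, Set, Tuple


def screen(
    candidates: List[Dict],
    excluded: Dict[str, str],
    standards: Set[str],
    min_pchembl: float = 6.0,
    top_n: int = 5,
) -> Tuple[List[Dict], List[Tuple[str, str]]]:
    """Return (kept candidates, elimination log) for a list of candidate rows."""
    kept, log = [], []
    for row in candidates:
        name = row["pref_name"]
        score: Optional[float] = row.get("mean_pchembl")
        if name in standards:
            log.append((name, "reserved as docking reference"))
        elif name in excluded:
            log.append((name, excluded[name]))
        elif score is None or score < min_pchembl:
            log.append((name, f"mean pChEMBL below {min_pchembl}"))
        else:
            kept.append(row)
    kept.sort(key=lambda r: r["mean_pchembl"], reverse=True)
    return kept[:top_n], log


# Placeholder candidates for illustration only
demo = [
    {"pref_name": "REFERENCE_LIGAND", "mean_pchembl": 8.1},
    {"pref_name": "DRUG_A", "mean_pchembl": 7.4},
    {"pref_name": "DRUG_B", "mean_pchembl": 5.5},
    {"pref_name": "DRUG_C", "mean_pchembl": 6.9},
    {"pref_name": "DRUG_D", "mean_pchembl": None},
    {"pref_name": "DRUG_E", "mean_pchembl": 7.9},
]
kept, log = screen(
    demo,
    excluded={"DRUG_E": "Rule 2: literature contradicts database value"},
    standards={"REFERENCE_LIGAND"},
)
print([r["pref_name"] for r in kept])  # ['DRUG_A', 'DRUG_C']
```

Note that the result can hold fewer than `top_n` rows when not enough candidates pass; the same is true of `.head(5)` in the cell above.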

## Step 11 — Final Summary and Docking Protocol Overview

### Docking Protocol (to follow after this pipeline)

1. **Download structure:** PDB ID 6DK1 (human sigma-1R crystallized with pentazocine)
2. **Protein preparation:** Remove pentazocine, water molecules, and heteroatoms. Add hydrogen atoms. Assign charges.
3. **Protocol validation (critical):** Redock pentazocine back into the prepared structure. Calculate RMSD between redocked pose and original crystal pose. **RMSD must be ≤ 2.0 Å** — this validates the entire docking protocol.
4. **Define binding site:** Grid/box centered on the pentazocine binding pocket from the crystal structure.
5. **Dock all 5 candidates** using the same validated settings.
6. **Compare scores against pentazocine baseline** — candidates scoring equal to or better are the strongest repurposing hits.
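
The redocking check in step 3 and the box center in step 4 reduce to two small calculations, sketched here in plain Python. The sketch assumes both poses list the same heavy atoms in the same order, are expressed in the same receptor frame (so no superposition is applied), and need no symmetry correction; the coordinates are made up for illustration.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsd(pose_a: Sequence[Coord], pose_b: Sequence[Coord]) -> float:
    """Root-mean-square deviation between two poses with matched atom order."""
    if not pose_a or len(pose_a) != len(pose_b):
        raise ValueError("poses must be non-empty and the same length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(pose_a, pose_b)
    )
    return math.sqrt(squared / len(pose_a))


def centroid(coords: Sequence[Coord]) -> Coord:
    """Geometric center of a set of atoms, usable as a docking box center."""
    n = len(coords)
    return (
        sum(c[0] for c in coords) / n,
        sum(c[1] for c in coords) / n,
        sum(c[2] for c in coords) / n,
    )


# Hypothetical coordinates (Å): the redocked pose is shifted 0.5 Å along x
crystal_pose = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
redocked_pose = [(0.5, 0.0, 0.0), (2.0, 0.0, 0.0), (2.0, 1.5, 0.0)]

value = rmsd(crystal_pose, redocked_pose)
print(round(value, 3), value <= 2.0)  # 0.5 True
print(centroid(crystal_pose))         # (1.0, 0.5, 0.0)
```

In practice the coordinates would come from the crystal ligand in 6DK1 and from the docking program's output pose.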

In [None]:
print('=' * 65)
print('SIGMA-1R REPURPOSING PIPELINE — COMPLETE SUMMARY')
print('=' * 65)
print(f'  Total activity records (Ki/IC50) retrieved : {len(activities_df)}')
print(f'  Unique parent molecules with binding data  : {len(agg_df)}')
print(f'  Phase 4 approved molecules                 : {(merged_df["max_phase"] == 4).sum()}')
print(f'  Phase 4 agonists/modulators (pipeline)     : {len(pipeline_df)}')
print(f'  Knowledge-guided candidates added          : {len(knowledge_guided)}')
print(f'  Final docking candidates selected          : {len(final_5)}')
print(f'  Docking reference standard                 : PENTAZOCINE (PDB: 6DK1)')
print('=' * 65)
print('\nFINAL 5 CANDIDATES:')
for i, row in final_5.iterrows():
    print(f'  {i+1}. {str(row["pref_name"]):20s} | pChEMBL: {row["mean_pchembl"]} | {row["source"]}')
print('\nDOCKING STANDARD:')
print('     PENTAZOCINE          | Prototypical sigma-1R agonist | PDB: 6DK1')
print('=' * 65)