# Sequence exploration for the protein-ligand complex

This notebook isolates the bioinformatics checks (sequence, UniProt, PDB similarity) so the main introduction notebook can stay focused on visualization and parameter creation.


## Table of contents

- [Data sources](#data-sources)
- [Notebook settings](#notebook-settings)
- [Sequence parsing](#sequence-parsing)
- [UniProt BLAST](#uniprot-blast)
- [UniProt entry details](#uniprot-entry-details)
- [PDB BLAST](#pdb-blast)
- [Explore BindingDB](#explore-bindingdb)
- [BindingDB ligand dataframe](#bindingdb-ligand-dataframe)
- [Ligand clustering](#ligand-clustering)
- [Explore ChEMBL](#explore-chembl)
- [Explore PDBe-KB](#explore-pdbe-kb)
- [DrugBank and Drugs@FDA](#drugbank-and-drugsfda)


## Table of contents

- [Sequence exploration for the protein-ligand complex](#sequence-exploration-for-the-protein-ligand-complex)
- [Data sources](#data-sources)
- [Sequence parsing](#sequence-parsing)
- [UniProt BLAST](#uniprot-blast)
- [UniProt entry details](#uniprot-entry-details)
- [PDB BLAST](#pdb-blast)
- [Explore BindingDB](#explore-bindingdb)
- [BindingDB ligand dataframe](#bindingdb-ligand-dataframe)
- [Ligand clustering](#ligand-clustering)
- [Explore ChEMBL](#explore-chembl)
- [Explore PDBe-KB](#explore-pdbe-kb)
- [DrugBank and Drugs@FDA](#drugbank-and-drugsfda)


## Data sources

We reuse the protein and ligand files from `data/complex/`. The molecule parsing and sequence queries below demonstrate how to go from a PDB to UniProt accessions and PDB codes.


## Notebook settings

Define limits and environment-wide constants here so every downstream cell can reuse them.


In [1]:
# how many results to return from each database

UNIPROT_LIMIT = 20
PDB_LIMIT = 20


## Sequence parsing

Extract the protein sequence from the input PDB and keep it in `SEQ` for downstream queries.


In [2]:
import os
from pathlib import Path
from Bio.PDB import PDBParser

COURSE_DIR = Path(os.environ.get("COURSE_DIR", str(Path.home() / "Concepcion26"))).expanduser()
PROTEIN_PDB = COURSE_DIR / "data" / "complex" / "protein.pdb"

parser = PDBParser(QUIET=True)
structure = parser.get_structure("protein", str(PROTEIN_PDB))
three_to_one = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
    "HID": "H", "HIE": "H", "HIP": "H",
}

sequences = {}
for model in structure:
    for chain in model:
        seq = []
        for residue in chain:
            if residue.id[0] != " ":
                continue
            seq.append(three_to_one.get(residue.resname, "X"))
        if seq:
            sequences[chain.id] = "".join(seq)
    break

print("Protein PDB:", PROTEIN_PDB)
if sequences:
    for chain_id, seq in sequences.items():
        print(f"Chain {chain_id}: {len(seq)} residues")
        print(seq)
else:
    print("No chains parsed from the structure.")

SEQ = next(iter(sequences.values()), "")


Protein PDB: /home/jordivilla/Concepcion26/data/complex/protein.pdb
Chain A: 304 residues
SGFRKMAFPSGKVEGCMVQVTCGTTTLNGLWLDDVVYCPRHVICTSEDMLNPNYEDLLIRKSNHNFLVQAGNVQLRVIGHSMQNCVLKLKVDTANPKTPKYKFVRIQPGQTFSVLACYNGSPSGVYQCAMRPNFTIKGSFLNGSCGSVGFNIDYDCVSFCYMHHMELPTGVHAGTDLEGNFYGPFVDRQTAQAAGTDTTITVNVLAWLYAAVINGDRWFLNRFTTTLNDFNLVAMKYNYEPLTQDHVDILGPLSAQTGIAVLDMCASLKELLQNGMNGRTILGSALLEDEFTPFDVVRQCSGVT


## UniProt BLAST

Get some information from the pdbfile. Run BLASTP against UniProt (via SwissProt) to collect candidate accessions.


In [3]:
from Bio.Blast import NCBIWWW, NCBIXML

sequence = globals().get("SEQ", "")

def run_uniprot_blast(seq, max_hits=UNIPROT_LIMIT):
    trimmed = seq if len(seq) <= 500 else seq[:500]
    if not trimmed:
        return []
    try:
        handle = NCBIWWW.qblast("blastp", "swissprot", trimmed, hitlist_size=max_hits, format_type="XML")
    except Exception as exc:
        print("UniProt BLAST request failed:", exc)
        return []
    try:
        record = NCBIXML.read(handle)
    except Exception as exc:
        print("Could not parse UniProt BLAST output:", exc)
        return []
    hits = []
    for alignment in record.alignments:
        desc = alignment.hit_def
        identity = alignment.hsps[0].identities if alignment.hsps else 0
        accessions = []
        for token in desc.split():
            if token.count("|") >= 2:
                accessions.append(token.split("|")[1])
        hits.append((alignment.accession, accessions, identity))
    return hits

if not sequence:
    print("Sequence missing; rerun the parsing cell.")
else:
    UNIPROT_HITS = run_uniprot_blast(sequence)
    globals()["UNIPROT_HITS"] = UNIPROT_HITS
    globals()["TOP_UNIPROT_ACCESSION"] = UNIPROT_HITS[0][0] if UNIPROT_HITS else None
    if UNIPROT_HITS:
        print("Top UniProt hits (primary accession / parsed ids / identities):")
        for accession, parsed, identity in UNIPROT_HITS:
            print(f"  {accession} | {parsed or ['(no parsed ids)']} | identities={identity}")
    else:
        print("No UniProt hits returned.")


Top UniProt hits (primary accession / parsed ids / identities):
  P0DTC1 | ['(no parsed ids)'] | identities=304
  P0DTD1 | ['(no parsed ids)'] | identities=304
  P0C6F5 | ['(no parsed ids)'] | identities=292
  P0C6U8 | ['(no parsed ids)'] | identities=292
  P0C6V9 | ['(no parsed ids)'] | identities=292
  P0C6T7 | ['(no parsed ids)'] | identities=291
  P0C6X7 | ['(no parsed ids)'] | identities=292
  P0C6F8 | ['(no parsed ids)'] | identities=290
  P0C6W2 | ['(no parsed ids)'] | identities=290
  P0C6W6 | ['(no parsed ids)'] | identities=291
  P0C6T6 | ['(no parsed ids)'] | identities=158
  P0C6W5 | ['(no parsed ids)'] | identities=158
  K9N638 | ['(no parsed ids)'] | identities=156
  K9N7C7 | ['(no parsed ids)'] | identities=156
  P0C6T5 | ['(no parsed ids)'] | identities=154
  P0C6W4 | ['(no parsed ids)'] | identities=154
  P0C6U9 | ['(no parsed ids)'] | identities=154
  P0C6X8 | ['(no parsed ids)'] | identities=154
  P0C6T4 | ['(no parsed ids)'] | identities=153
  P0C6F7 | ['(no parsed 

## UniProt entry details

Once the BLAST above has identified UniProt accessions, this cell fetches entry summaries via the [REST API](https://www.uniprot.org/help/api_queries).


In [4]:
import requests

hits = globals().get("UNIPROT_HITS") or []

def fetch_entry(accession):
    url = f"https://rest.uniprot.org/uniprotkb/{accession}.json"
    resp = requests.get(url, params={"format": "json"}, timeout=15)
    resp.raise_for_status()
    return resp.json()

def summarize(entry):
    name = entry.get("proteinDescription", {}).get("recommendedName", {}).get("fullName", {}).get("value")
    if not name:
        name = entry.get("proteinName", {}).get("value") or entry.get("entryType", "<entry>")
    organism = entry.get("organism", {}).get("scientificName")
    length = entry.get("sequence", {}).get("length") or entry.get("length")
    function = next((c.get("texts", [])[0].get("value") for c in entry.get("comments", []) if c.get("commentType", c.get("type")) in ("FUNCTION", "function") and c.get("texts")), None)
    return name, organism, length, function

if not hits:
    print("No UniProt BLAST hits yet; rerun that cell first.")
else:
    print("Fetching UniProt entry details for BLAST hits...")
    for accession, parsed, identity in hits[:UNIPROT_LIMIT]:
        try:
            entry = fetch_entry(accession)
        except requests.HTTPError as exc:
            print("Failed to fetch", accession, exc)
            continue
        name, organism, length, function = summarize(entry)
        print(f"{accession} (identity {identity}) -> {name} | {organism or '<organism?>'} | length={length or '?'}")
        if function:
            print(f"  Function: {function}")


Fetching UniProt entry details for BLAST hits...
P0DTC1 (identity 304) -> Replicase polyprotein 1a | Severe acute respiratory syndrome coronavirus 2 | length=4405
  Function: Multifunctional protein involved in the transcription and replication of viral RNAs. Contains the proteinases responsible for the cleavages of the polyprotein
P0DTD1 (identity 304) -> Replicase polyprotein 1ab | Severe acute respiratory syndrome coronavirus 2 | length=7096
  Function: Multifunctional protein involved in the transcription and replication of viral RNAs. Contains the proteinases responsible for the cleavages of the polyprotein
P0C6F5 (identity 292) -> Replicase polyprotein 1a | Bat coronavirus 279/2005 | length=4388
  Function: The papain-like proteinase (PL-PRO) is responsible for the cleavages located at the N-terminus of replicase polyprotein. In addition, PL-PRO possesses a deubiquitinating/deISGylating activity and processes both 'Lys-48'- and 'Lys-63'-linked polyubiquitin chains from cellular s

## PDB BLAST

Search the PDB with BLASTP to find structural relatives to the protein.


In [5]:
from Bio.Blast import NCBIWWW, NCBIXML
from time import perf_counter
import os
from pathlib import Path
import requests

COURSE_DIR = Path(os.environ.get("COURSE_DIR", str(Path.home() / "Concepcion26"))).expanduser()
PDB_OUT = COURSE_DIR / "results" / "01-introduction-sequence-check" / "pdb"
PDB_OUT.mkdir(parents=True, exist_ok=True)

sequence = globals().get("SEQ", "")

rcsb_entry_url = "https://data.rcsb.org/rest/v1/core/entry/{}"
pdb_file_url = "https://files.rcsb.org/download/{}.pdb"


def run_pdb_blast(seq, max_hits=PDB_LIMIT):
    print(f"BLAST request: submitting {len(seq)} aa sequence to the PDB (max {max_hits} hits)...")
    start = perf_counter()
    try:
        handle = NCBIWWW.qblast("blastp", "pdb", seq, hitlist_size=max_hits, format_type="XML")
    except Exception as exc:
        print("PDB BLAST request failed:", exc)
        return []
    try:
        record = NCBIXML.read(handle)
    except Exception as exc:
        print("Could not parse PDB BLAST response:", exc)
        return []
    duration = perf_counter() - start
    align_count = len(record.alignments)
    print(f"PDB BLAST finished in {duration:.1f}s with {align_count} alignments.")
    if not align_count:
        return []
    hits = []
    for idx, alignment in enumerate(record.alignments, start=1):
        print(f"  processing alignment {idx}/{align_count}: {alignment.accession} ({alignment.hit_def.split()[0]})")
        accessions = []
        for token in alignment.hit_def.split():
            if token.count("|") >= 2:
                accessions.append(token.split("|")[1])
        hits.append((alignment.accession, accessions, alignment.hsps[0].identities if alignment.hsps else 0, alignment.hsps[0].bits if alignment.hsps else 0))
    return hits


def fetch_rcsb_summary(pdb_code):
    try:
        resp = requests.get(rcsb_entry_url.format(pdb_code), timeout=10)
        resp.raise_for_status()
        data = resp.json()
        title = data.get("struct", {}).get("title")
        exp = data.get("exptl", [{}])[0].get("method")
        resolution = data.get("rcsb_entry_info", {}).get("resolution_combined")
        return title, exp, resolution
    except Exception as exc:
        print(f"  Could not fetch RCSB metadata for {pdb_code}: {exc}")
        return None, None, None


if not sequence:
    print("Sequence missing; rerun the parsing cell.")
else:
    PDB_HITS = run_pdb_blast(sequence)
    globals()["PDB_HITS"] = PDB_HITS
    globals()["TOP_PDB_ACCESSION"] = PDB_HITS[0][0] if PDB_HITS else None
    if not PDB_HITS:
        print("No PDB hits returned.")
    else:
        print("Top PDB hits (accession / parsed ids / identity / bits):")
        pdb_codes = []
        for acc, parsed, identity, bits in PDB_HITS:
            pdb_code = acc[:4]
            pdb_codes.append(pdb_code.lower())
            print(f"  {acc} | {parsed or ['(no accession)']} | identities={identity} | bits={bits}")
        seen = []
        for pdb_code in pdb_codes:
            if pdb_code in seen:
                continue
            seen.append(pdb_code)
            title, method, resolution = fetch_rcsb_summary(pdb_code)
            print(f"- {pdb_code.upper()}: {title or '<no title>'} | method={method or '<unknown>'} | resolution={resolution or '<n/a>'}")


BLAST request: submitting 304 aa sequence to the PDB (max 20 hits)...
PDB BLAST finished in 422.9s with 20 alignments.
  processing alignment 1/20: 8I4S_A (Chain)
  processing alignment 2/20: 6XA4_A (Chain)
  processing alignment 3/20: 9LVR_A (Chain)
  processing alignment 4/20: 8ZQ8_A (Chain)
  processing alignment 5/20: 7W9G_A (Chain)
  processing alignment 6/20: 7VU6_A (Chain)
  processing alignment 7/20: 7CWC_A (Chain)
  processing alignment 8/20: 7KFI_A (Chain)
  processing alignment 9/20: 7VTH_A (Chain)
  processing alignment 10/20: 9ASV_A (Chain)
  processing alignment 11/20: 9DTZ_A (Chain)
  processing alignment 12/20: 6M0K_A (Chain)
  processing alignment 13/20: 9KGJ_A (Chain)
  processing alignment 14/20: 7CB7_A (Chain)
  processing alignment 15/20: 6XMK_A (Chain)
  processing alignment 16/20: 7BRO_A (Chain)
  processing alignment 17/20: 5R7Y_A (Chain)
  processing alignment 18/20: 6YB7_A (Chain)
  processing alignment 19/20: 9NNG_A (Chain)
  processing alignment 20/20: 8VQX_

## Explore BindingDB

BindingDB collects binding affinity data for small molecules versus protein targets; the cell below fetches the records for the top UniProt and PDB accessions.


In [6]:
import requests
from requests.exceptions import ReadTimeout, RequestException

API_BASE = "https://www.bindingdb.org/rwd/bind/BindingDBRESTfulAPI.jsp"
uniprot_hits = globals().get("UNIPROT_HITS") or []
pdb_hits = [hit[0][:4].upper() for hit in (globals().get("PDB_HITS") or []) if hit]
pdb_hits = list(dict.fromkeys(pdb_hits))[:PDB_LIMIT]


def flatten_bindingdb_response(data):
    if isinstance(data, list):
        return data
    if isinstance(data, dict):
        for key in ("records", "ligands", "data", "entries", "hits", "bindEntries", "bindings"):
            value = data.get(key)
            if isinstance(value, list) and value:
                return value
        nested = []
        for value in data.values():
            if isinstance(value, list):
                nested.extend(value)
        if nested:
            return nested
        return [data]
    return []


def fetch_bindingdb(params):
    try:
        resp = requests.get(API_BASE, params=params, headers={"Accept": "application/json"}, timeout=15)
        resp.raise_for_status()
        data = resp.json()
    except ReadTimeout:
        print("BindingDB call timed out (15s)", params)
        return []
    except RequestException as exc:
        print("BindingDB call failed:", exc)
        return []
    except ValueError:
        text = resp.text.strip() if resp is not None else ""
        print("BindingDB returned non-JSON; snippet:", text[:400])
        if text:
            return [line.strip() for line in text.splitlines() if line.strip()]
        return []
    return flatten_bindingdb_response(data)

bindingdb_ligands = []
uniprot_records = {}
for accession, *_ in uniprot_hits[:UNIPROT_LIMIT]:
    if not accession:
        continue
    params = {"target": "uniprot", "targetid": accession, "format": "json"}
    print("Fetching ligands for UniProt", accession)
    ligands = fetch_bindingdb(params)
    uniprot_records[accession] = ligands
    bindingdb_ligands.extend([{"source": "uniprot", "accession": accession, "record": ligand} for ligand in ligands])
    print(f"  {len(ligands)} ligand records")

similar_proteins = {}
for code in pdb_hits:
    params = {"pdb": code, "format": "json"}
    print("Fetching BindingDB records for PDB", code)
    entries = fetch_bindingdb(params)
    if not entries:
        continue
    similar_proteins[code] = entries
    bindingdb_ligands.extend([{"source": "pdb", "accession": code, "record": entry} for entry in entries])
    print(f"  {len(entries)} records for PDB {code}")

globals()["BINDINGDB_LIGANDS"] = bindingdb_ligands
globals()["BINDINGDB_SIMILAR"] = similar_proteins
globals()["BINDINGDB_UNIPROT_RECORDS"] = uniprot_records
print("Stored", len(bindingdb_ligands), "records in BINDINGDB_LIGANDS.")


Fetching ligands for UniProt P0DTC1
BindingDB call failed: Expecting value: line 18 column 1 (char 19)
  0 ligand records
Fetching ligands for UniProt P0DTD1
BindingDB call failed: Expecting value: line 18 column 1 (char 19)
  0 ligand records
Fetching ligands for UniProt P0C6F5
BindingDB call failed: Expecting value: line 18 column 1 (char 19)
  0 ligand records
Fetching ligands for UniProt P0C6U8
BindingDB call failed: Expecting value: line 18 column 1 (char 19)
  0 ligand records
Fetching ligands for UniProt P0C6V9
BindingDB call failed: Expecting value: line 18 column 1 (char 19)
  0 ligand records
Fetching ligands for UniProt P0C6T7
BindingDB call failed: Expecting value: line 18 column 1 (char 19)
  0 ligand records
Fetching ligands for UniProt P0C6X7
BindingDB call failed: Expecting value: line 18 column 1 (char 19)
  0 ligand records
Fetching ligands for UniProt P0C6F8
BindingDB call failed: Expecting value: line 18 column 1 (char 19)
  0 ligand records
Fetching ligands for Uni

## BindingDB ligand dataframe

Convert the stored ligand records into a `pandas.DataFrame` for downstream analysis, extracting ligand names and PubChem IDs.


In [7]:
import pandas as pd

records = globals().get("BINDINGDB_LIGANDS") or []
if not records:
    print("No BindingDB ligand records yet; rerun the BindingDB cell after the ligand fetch completes.")
else:
    parsed = []
    for item in records:
        raw = item.get("record")
        if isinstance(raw, dict):
            entry = dict(raw)
        elif isinstance(raw, str):
            entry = {
                "raw": raw,
                "source_string": raw,
            }
        else:
            continue
        entry["source_type"] = item.get("source")
        entry["source_query"] = item.get("accession")
        parsed.append(entry)
    if not parsed:
        print("No structured entry data available; raw records were returned.")
    else:
        df = pd.DataFrame(parsed)
        def tseries(col):
            if col in df.columns:
                return df[col]
            return pd.Series([None] * len(df), index=df.index)
        df = df.assign(
            name=tseries("name").fillna(tseries("ligandName")).fillna(tseries("compoundName")).fillna(tseries("LIGAND_NAME")),
            pubchem_cid=tseries("pubchem_cid").fillna(tseries("pubChemCompoundID")).fillna(tseries("pubchemCID")).fillna(tseries("cid"))
        )
        subset = df[[col for col in ("name", "pubchem_cid", "source_type", "source_query") if col in df.columns]]
        print("Unique ligands discovered in BindingDB:")
        print(subset.drop_duplicates().to_dict("records"))
        globals()["BINDINGDB_LIGANDS_DF"] = subset


No BindingDB ligand records yet; rerun the BindingDB cell after the ligand fetch completes.


## Ligand clustering

Use RDKit fingerprints to group the BindingDB ligands we collected before querying ChEMBL, highlighting the main clusters and their PubChem IDs.


In [8]:
from rdkit import Chem
from rdkit.Chem import AllChem, DataStructs
from sklearn.cluster import AgglomerativeClustering
import numpy as np

ligand_df = globals().get("BINDINGDB_LIGANDS_DF")
if ligand_df is None or ligand_df.empty:
    print("No BindingDB ligand dataframe yet; run the ligand-summary cell first.")
else:
    smiles_col = next((col for col in ligand_df.columns if "smiles" in col.lower()), None)
    if smiles_col is None:
        print("No SMILES column found in ligand dataframe (looked for columns containing 'smiles').")
    else:
        rows = []
        fps = []
        for idx, row in ligand_df.iterrows():
            smi = row.get(smiles_col)
            if not isinstance(smi, str) or not smi.strip():
                continue
            mol = Chem.MolFromSmiles(smi)
            if mol is None:
                continue
            fp = AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=1024)
            name = row.get("name") or row.get("ligandName") or row.get("compoundName") or f"ligand_{idx}"
            pubchem = row.get("pubchem_cid") or row.get("PubChemCompoundID") or row.get("pubChemCompoundID") or row.get("cid")
            rows.append((idx, name, smi, pubchem))
            fps.append(fp)
        n = len(fps)
        if n == 0:
            print("No valid ligands with SMILES could be parsed.")
        elif n == 1:
            print("Only one ligand available; nothing to cluster.")
        else:
            dist = np.zeros((n, n))
            for i in range(n):
                for j in range(i + 1, n):
                    sim = DataStructs.TanimotoSimilarity(fps[i], fps[j])
                    dist[i, j] = dist[j, i] = 1.0 - sim
            n_clusters = min(4, n)
            clustering = AgglomerativeClustering(n_clusters=n_clusters, affinity="precomputed", linkage="average")
            labels = clustering.fit_predict(dist)
            clusters = {i: [] for i in range(n_clusters)}
            for label, info in zip(labels, rows):
                clusters[label].append(info)
            for label, items in clusters.items():
                print(f"Cluster {label} ({len(items)} ligands):")
                for item in items:
                    idx, name, smi, pubchem = item
                    print(f"  - {name or '<unnamed>'} | SMILES={smi[:60]}{'...' if len(smi)>60 else ''} | PubChem={pubchem or 'N/A'}")
            globals()["BINDINGDB_LIGAND_CLUSTERS"] = clusters


No BindingDB ligand dataframe yet; run the ligand-summary cell first.


## Explore ChEMBL

ChEMBL provides curated bioactivity tables; this cell searches for targets with the UniProt accession.


In [9]:
import requests

accession = globals().get("TOP_UNIPROT_ACCESSION")
if not accession:
    print("No UniProt accession available for ChEMBL search.")
else:
    url = "https://www.ebi.ac.uk/chembl/api/data/target/search.json"
    params = {"query": accession}
    resp = requests.get(url, params=params, timeout=15)
    resp.raise_for_status()
    hits = resp.json().get("targets", [])
    print(f"ChEMBL targets matching {accession}:")
    for target in hits[:UNIPROT_LIMIT]:
        print("  " + target.get("target_chembl_id", "<none>") + " / " + target.get("pref_name", "<unnamed>"))


HTTPError: 400 Client Error: Bad Request for url: https://www.ebi.ac.uk/chembl/api/data/target/search.json?query=P0DTC1

## Explore PDBe-KB

PDBe-KB maps UniProt sequences to PDB entries; the snippet below prints that mapping for the top accession.


In [None]:
import requests

accession = globals().get("TOP_UNIPROT_ACCESSION")
if not accession:
    print("No UniProt accession available for PDBe-KB search.")
else:
    url = f"https://www.ebi.ac.uk/pdbe/api/mappings/uniprot/{accession}"
    resp = requests.get(url, timeout=15)
    resp.raise_for_status()
    mapping = resp.json().get(accession, {})
    print("PDBe-KB mappings for", accession)
    print(mapping)


## DrugBank and Drugs@FDA

These cells open the approved/investigational ligand search pages for the UniProt accession.


In [None]:
import requests

accession = globals().get("TOP_UNIPROT_ACCESSION")
if not accession:
    print("No UniProt accession available for drug database searches.")
else:
    urls = [
        ("DrugBank", f"https://go.drugbank.com/unearth/q?searcher=drugs&q={accession}"),
        ("Drugs@FDA", f"https://www.accessdata.fda.gov/scripts/cder/daf/index.cfm?event=basicSearch.process&ApplNo={accession}"),
    ]
    for label, url in urls:
        print("Querying", label, url)
        resp = requests.get(url, timeout=15)
        print("Status:", resp.status_code)
        snippet = resp.text[:500] + ("..." if len(resp.text) > 500 else "")
        print(snippet)
