# Ontology Tagger Tutorial
A hands-on notebook that shows how to configure BioPortal access, download ontology versions, and annotate terms in strict or fallback mode.


## 1) Prerequisites
- Python 3.9+
- BioPortal account + API key
- Install deps: `pip install -r requirements.txt`
- Key files: `tagger.py`, `utils.py`, `const.py`, `main.py`
- Caching: OWL downloads saved under `downloads/{ontology}_{version}.owl`
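
As an illustration of the cache naming pattern above, here is a minimal sketch that builds the expected path for one ontology version. The helper name `cached_owl_path` is hypothetical and not part of `tagger.py` or `utils.py`; only the `downloads/{ontology}_{version}.owl` pattern comes from this tutorial.

```python
from pathlib import Path


def cached_owl_path(ontology: str, version: str, root: Path = Path("downloads")) -> Path:
    """Build the cache location for one ontology version (hypothetical helper)."""
    return root / f"{ontology}_{version}.owl"


path = cached_owl_path("CL", "2025-10-16")
print(path.as_posix())  # downloads/CL_2025-10-16.owl
print("cached" if path.exists() else "not downloaded yet")
```

Checking `path.exists()` before a run is a quick way to tell whether a download will be reused.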


## 2) Configure your BioPortal API key
1. Log in at https://bioportal.bioontology.org/accounts
2. Copy your API key from your account page.
3. Set it as an environment variable or in a `.env` file in the repo root:
   - `export BIOPORTAL_API_KEY=your_key`
   - or create `.env` with `BIOPORTAL_API_KEY=your_key`
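
The lookup order described in step 3 could be sketched as follows. This is an assumption about how such a loader might behave, not the actual implementation of `get_api_key` in `utils.py`; the function name `load_api_key` and its parsing rules are hypothetical.

```python
import os
from pathlib import Path
from typing import Optional


def load_api_key(env_file: Path = Path(".env"), name: str = "BIOPORTAL_API_KEY") -> Optional[str]:
    """Return the key from the environment, else from a simple KEY=value file."""
    value = os.environ.get(name)
    if value:
        return value
    if env_file.is_file():
        for line in env_file.read_text(encoding="utf-8").splitlines():
            line = line.strip()
            # Skip blanks, comments, and lines that are not KEY=value pairs
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, val = line.partition("=")
            if key.strip() == name:
                return val.strip().strip('"').strip("'")
    return None


key = load_api_key()
# Report only whether a key was found; never print the key itself
print("API key found" if key else "API key missing")
```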


In [5]:
# 3) Imports & client setup
from pathlib import Path

from tagger import BioPortalClient, OntologyTagger
from utils import get_api_key, outcomes_to_json, save_json_output

api_key = get_api_key()
client = BioPortalClient(api_key=api_key)
tagger = OntologyTagger(client)
print("Client ready; API key loaded.")


Client ready; API key loaded.


In [6]:
# Confirm the key loaded without printing the secret into the notebook output
print("API key loaded:", bool(api_key))

API key loaded: True


In [7]:
# 4) Inspect available submissions (versions) for an ontology
subs = client.list_submissions("CL")
[(s.get("version"), s.get("submissionId")) for s in subs[:5]]  # show first few


[('2025-12-17', 126),
 ('2025-11-25', 125),
 ('2025-11-03', 124),
 ('2025-10-16', 123),
 ('2025-07-30', 122)]
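
To check a version string against this listing before annotating, a small lookup like the one below may help. It is a sketch that works on plain dictionaries shaped like the output above; `find_submission_id` is a hypothetical helper, not a method of `BioPortalClient`.

```python
from typing import Optional

# Sample data copied from the listing above
subs = [
    {"version": "2025-12-17", "submissionId": 126},
    {"version": "2025-11-25", "submissionId": 125},
    {"version": "2025-11-03", "submissionId": 124},
    {"version": "2025-10-16", "submissionId": 123},
    {"version": "2025-07-30", "submissionId": 122},
]


def find_submission_id(submissions: list, version: str) -> Optional[int]:
    """Return the submissionId whose version matches exactly, or None."""
    for sub in submissions:
        if sub.get("version") == version:
            return sub.get("submissionId")
    return None


print(find_submission_id(subs, "2025-10-16"))  # 123
print(find_submission_id(subs, "2024-01-01"))  # None
```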

## 5) Annotate with strict version (no fallback)
- `on_version_missing="error"` selects strict mode.
- If the requested version is missing, `annotate_terms` raises `OntologyVersionNotFound`.
- If the concept is not present in that version, the outcome is reported for that version only, with no fallback to the latest version.
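
The version policy can be pictured with the sketch below. It is not the tagger's code: `resolve_version` is a hypothetical helper, the exception class is a local stand-in that only shares its name with the one mentioned above, and picking the latest version with `max` assumes ISO-style date strings.

```python
class OntologyVersionNotFound(Exception):
    """Local stand-in; the real exception lives in the tutorial's own code."""


def resolve_version(requested, available, on_version_missing="error"):
    """Pick the version to annotate against (hypothetical helper)."""
    if requested in available:
        return requested
    if on_version_missing == "latest" and available:
        # Assumes ISO-style date strings, which sort chronologically as text
        return max(available)
    if on_version_missing in ("error", "latest"):
        raise OntologyVersionNotFound(f"version {requested!r} not found")
    raise ValueError(f"unknown policy: {on_version_missing!r}")


available = ["2025-12-17", "2025-11-25", "2025-11-03", "2025-10-16", "2025-07-30"]
print(resolve_version("2025-10-16", available))            # 2025-10-16
print(resolve_version("2024-01-01", available, "latest"))  # 2025-12-17
try:
    resolve_version("2024-01-01", available, "error")
except OntologyVersionNotFound as exc:
    print("Strict mode raised:", exc)
```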


In [8]:
# Strict example
try:
    outcomes_strict = tagger.annotate_terms(
        ontology="CL",
        terms=["melanocyte"],
        version="2025-10-16",
        on_version_missing="error",
    )
    print(outcomes_to_json(outcomes_strict))
except Exception as e:
    print("Strict mode raised:", e)
finally:
    print("Download URL (if any):", client.last_download_url)


[
  {
    "input_text": "melanocyte",
    "standardized_term": "melanocyte",
    "ontology_id": "CL_0000148",
    "ontology_version": "2025-10-16",
    "comment": "matched in user specified version"
  }
]
Download URL (if any): (cached) downloads/CL_2025-10-16.owl


## 6) Annotate with flexible fallback
- `on_version_missing="latest"` selects fallback mode.
- If the requested version is missing, or the concept is absent from that version, the tagger falls back to the latest version and records the fallback in the `comment` field.
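
For the concept-level fallback, a toy sketch over in-memory label indexes is shown below. Everything here is illustrative: `annotate_with_fallback`, the index layout, the fallback comment wording, and the `EX_0000001` entry are made up for the example; only the two comment strings "matched in user specified version" and "not matched at all" appear in this notebook's outputs.

```python
def annotate_with_fallback(term, requested, latest, labels_by_version):
    """Try the requested version first, then the latest (hypothetical helper).

    labels_by_version maps version -> {lowercased label: ontology_id}.
    """
    key = term.strip().lower()
    attempts = (
        (requested, "matched in user specified version"),
        # Hypothetical wording; the tagger's real fallback comment may differ
        (latest, f"not found in {requested}; fell back to latest"),
    )
    for version, comment in attempts:
        ontology_id = labels_by_version.get(version, {}).get(key)
        if ontology_id is not None:
            return {"input_text": term, "ontology_id": ontology_id,
                    "ontology_version": version, "comment": comment}
    return {"input_text": term, "ontology_id": None,
            "ontology_version": None, "comment": "not matched at all"}


# Toy index; "example cell" / EX_0000001 is a placeholder, not a real concept
index = {
    "2025-10-16": {"melanocyte": "CL_0000148"},
    "2025-12-17": {"melanocyte": "CL_0000148", "example cell": "EX_0000001"},
}
for term in ["melanocyte", "example cell", "monoctye"]:
    print(annotate_with_fallback(term, "2025-10-16", "2025-12-17", index))
```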


In [None]:
# Flexible example
outcomes_flex = tagger.annotate_terms(
    ontology="CL",
    terms=["melanocyte"],
    version="2025-10-16",
    on_version_missing="latest",
)
print(outcomes_to_json(outcomes_flex))
print("Download URL (latest or fallback):", client.last_download_url)


## 7) Annotate from a text file (one term per line)
Pass `file_path=Path("test.txt")` in place of `terms`. Empty lines are ignored.
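
Reading one term per line while skipping empty lines might look like the sketch below. `read_terms` is a hypothetical helper written for illustration; the tagger's own file handling may differ, for example in how it treats surrounding whitespace.

```python
import tempfile
from pathlib import Path


def read_terms(file_path: Path) -> list[str]:
    """Return non-empty lines, stripped of surrounding whitespace."""
    lines = file_path.read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]


# Demo on a throwaway file so the sketch runs anywhere
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "test.txt"
    sample.write_text("b cell\n\nt cell\n   \ncd4 t cell\n", encoding="utf-8")
    print(read_terms(sample))  # ['b cell', 't cell', 'cd4 t cell']
```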


In [9]:
# File-based annotation
outcomes_file = tagger.annotate_terms(
    ontology="CL",
    file_path=Path("test.txt"),
    version="2025-10-16",
    on_version_missing="latest",
)
print(outcomes_to_json(outcomes_file))


[
  {
    "input_text": "b cell",
    "standardized_term": "B cell",
    "ontology_id": "CL_0000236",
    "ontology_version": "2025-10-16",
    "comment": "matched in user specified version"
  },
  {
    "input_text": "t cell",
    "standardized_term": "T cell",
    "ontology_id": "CL_0000084",
    "ontology_version": "2025-10-16",
    "comment": "matched in user specified version"
  },
  {
    "input_text": "cd4 t cell",
    "standardized_term": "T-cell surface glycoprotein CD4 (mouse)",
    "ontology_id": "PR_P06332",
    "ontology_version": "2025-10-16",
    "comment": "matched in user specified version"
  },
  {
    "input_text": "monoctye",
    "standardized_term": null,
    "ontology_id": null,
    "ontology_version": null,
    "comment": "not matched at all"
  }
]


## 8) Save annotation results to JSON
Use `save_json_output` to write outcomes to disk (parents are created automatically).
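
The "parents are created automatically" behavior can be reproduced with the standard library as sketched below. `write_json` is a hypothetical stand-in, not the actual `save_json_output` from `utils.py`, and it takes plain dictionaries rather than the tagger's outcome objects.

```python
import json
import tempfile
from pathlib import Path


def write_json(records: list, path: Path) -> Path:
    """Write records as indented JSON, creating missing parent directories."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(records, indent=2), encoding="utf-8")
    return path


# Demo in a throwaway directory; "results/" does not exist beforehand
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "results" / "annotations_flex.json"
    write_json([{"input_text": "melanocyte", "ontology_id": "CL_0000148"}], target)
    print(target.exists())  # True
```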


In [11]:
# Save outcomes (e.g., from the flexible run) to a JSON file
save_path = Path("results/annotations_flex.json")
save_json_output(outcomes_flex, save_path)
save_path


PosixPath('results/annotations_flex.json')

## 9) Notes on ontology IDs and matching
- Stored ontology IDs are short fragments (no full URIs).
- Presence in the specified version is checked by ID and prefLabel in the downloaded OWL file.
- If BioPortal returns a null prefLabel, the tagger looks the label up in the OWL file by ID (reverse lookup).
- Rerunning with the same version reuses the cached OWL file from `downloads/`.
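
The short-ID convention and the reverse lookup can be illustrated with the sketch below, which parses a tiny inline RDF/XML snippet using the standard library. The helpers `short_id` and `label_for_id` are hypothetical, the snippet assumes the class carries an `rdfs:label` directly under `owl:Class`, and the tagger may well use a different parser or strategy.

```python
import xml.etree.ElementTree as ET
from typing import Optional

NS = {
    "owl": "http://www.w3.org/2002/07/owl#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
}
RDF_ABOUT = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}about"

# Minimal stand-in for a downloaded OWL file
OWL_SNIPPET = """
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:Class rdf:about="http://purl.obolibrary.org/obo/CL_0000148">
    <rdfs:label>melanocyte</rdfs:label>
  </owl:Class>
</rdf:RDF>
"""


def short_id(uri: str) -> str:
    """Keep only the last path or fragment segment of a URI."""
    return uri.rstrip("/").rsplit("/", 1)[-1].rsplit("#", 1)[-1]


def label_for_id(root: ET.Element, ontology_id: str) -> Optional[str]:
    """Find the rdfs:label of the top-level owl:Class with this short ID."""
    for cls in root.iterfind("owl:Class", NS):
        if short_id(cls.get(RDF_ABOUT, "")) == ontology_id:
            label = cls.find("rdfs:label", NS)
            return label.text if label is not None else None
    return None


root = ET.fromstring(OWL_SNIPPET)
print(short_id("http://purl.obolibrary.org/obo/CL_0000148"))  # CL_0000148
print(label_for_id(root, "CL_0000148"))                       # melanocyte
```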


## 10) Troubleshooting
- Restart kernel if you change code; stale imports can mask fixes.
- If a downloaded OWL file turns out to contain JSON, check that the URL includes `/download` and the `apikey` query parameter (the client also sets the Authorization header).
- Inspect `client.last_download_url` to confirm the endpoint used.
- Ensure the requested version string matches a submission in `client.list_submissions()`.
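
To tell quickly whether a download is JSON rather than OWL/XML, the first meaningful byte is usually enough. The sketch below is a heuristic written for illustration; `sniff_payload` is a hypothetical helper and not part of the client.

```python
def sniff_payload(data: bytes) -> str:
    """Guess the payload type from its first non-whitespace byte."""
    # Drop a UTF-8 byte-order mark and leading whitespace before inspecting
    head = data.lstrip(b"\xef\xbb\xbf \t\r\n")[:1]
    if head in (b"{", b"["):
        return "json"
    if head == b"<":
        return "xml"
    return "unknown"


print(sniff_payload(b'{"errors": ["unauthorized"]}'))        # json
print(sniff_payload(b'<?xml version="1.0"?><rdf:RDF/>'))    # xml
```

For a cached file, passing the first few hundred bytes (for example `path.read_bytes()[:256]`) is sufficient.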


## 11) Verify Reverse Lookup Fix
The following block verifies that `standardized_term` is correctly populated from the OWL file even if BioPortal returns null.

In [1]:
# Verify the fix for 'melanocyte', where standardized_term was previously null.
# This cell is self-contained: it runs even if earlier cells were skipped (dependencies must be installed).
try:
    from tagger import BioPortalClient, OntologyTagger
    from utils import get_api_key
except ImportError as e:
    # Report the failure instead of hiding it; the setup below would otherwise fail with a NameError
    print(f"Import failed: {e}")

# Ensure tagger is initialized if it doesn't exist in the current session
if 'tagger' not in locals():
    print("Initializing tagger for verification...")
    try:
        api_key = get_api_key()
        client = BioPortalClient(api_key=api_key)
        tagger = OntologyTagger(client)
    except Exception as e:
        print(f"Setup failed: {e}")
        print("Please ensure .env is set or run the setup cells above.")

if 'tagger' in locals():
    print("Verifying reverse lookup...")
    try:
        outcomes_verify = tagger.annotate_terms(
            ontology="CL",
            terms=["melanocyte"],
            version="2025-10-16",
            on_version_missing="latest"
        )

        for outcome in outcomes_verify:
            print(f"Term: {outcome.input_text}")
            print(f"Standardized Term: {outcome.standardized_term}")
            print(f"ID: {outcome.ontology_id}")
            if outcome.standardized_term:
                print("SUCCESS: Standardized term populated via reverse lookup.")
            else:
                print("FAILURE: Standardized term is null.")
    except Exception as e:
        print(f"Verification run failed: {e}")

Initializing tagger for verification...
Verifying reverse lookup...
Term: melanocyte
Standardized Term: melanocyte
ID: CL_0000148
SUCCESS: Standardized term populated via reverse lookup.
