JonasHeinickeBio/biomedical-knowledge-lookup

🧬 Biomedical Knowledge Lookup

Requires Python 3.10+. License: MIT.

A unified Python library for looking up biological concepts across 29+ biomedical knowledge sources, including BioPortal, OLS, UMLS, ChEMBL, and DisGeNET. Built for bioinformatics researchers, knowledge graph developers, and biomedical data scientists.

✨ Features

  • 🔍 29+ Knowledge Sources: Comprehensive coverage of biomedical ontologies and databases
  • ⚡ Unified API: Single interface for all sources with consistent results
  • 🔄 Multi-source Annotation: Cross-reference concepts across multiple databases
  • 📊 RDF Export: Convert results to RDF format for knowledge graphs
  • 💾 Intelligent Caching: Built-in caching system for performance optimization
  • 🔄 Async Support: Asynchronous operations for scalable applications
  • 🧪 Comprehensive Testing: Full test suite with unit and integration tests
  • 📚 Rich Documentation: Extensive examples and API documentation

🚀 Quick Start

Installation

pip install biomedical-knowledge-lookup
# or
poetry add biomedical-knowledge-lookup
# or from source
git clone https://github.com/JonasHeinickeBio/biomedical-knowledge-lookup.git
cd biomedical-knowledge-lookup
poetry install

Basic Usage

import asyncio

from knowledge_lookup import CentralKnowledgeLookup, KnowledgeSource


async def main():
    # Initialize the lookup system
    lookup = CentralKnowledgeLookup()

    # Search for concepts across multiple sources
    results = await lookup.search_concepts(
        "diabetes mellitus",
        sources=[KnowledgeSource.BIOPORTAL, KnowledgeSource.OLS, KnowledgeSource.UMLS],
    )

    # Get detailed information about a specific concept
    concept_details = await lookup.get_concept_details("DOID:9351")

    # Export results to RDF
    rdf_graph = lookup.export_to_rdf(results)


# `await` is only valid inside a coroutine, so run everything via asyncio.run()
asyncio.run(main())
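
For intuition about what a multi-source search involves, the sketch below fans one query out to several sources concurrently with `asyncio.gather` and tolerates a source that fails. It is not the library's implementation: `fake_search`, `search_all`, and the plain-string source names are hypothetical stand-ins.

```python
import asyncio


async def fake_search(source: str, query: str) -> list[str]:
    """Hypothetical stand-in for one source adapter; returns canned hits."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    if source == "BROKEN":
        raise RuntimeError("source unavailable")
    return [f"{source}:{query}"]


async def search_all(query: str, sources: list[str]) -> dict[str, list[str]]:
    """Query every source concurrently; a failing source yields an empty list."""
    outcomes = await asyncio.gather(
        *(fake_search(source, query) for source in sources),
        return_exceptions=True,  # collect errors instead of aborting the batch
    )
    return {
        source: [] if isinstance(outcome, Exception) else outcome
        for source, outcome in zip(sources, outcomes)
    }


results = asyncio.run(search_all("diabetes", ["OLS", "BROKEN"]))
```

Collecting exceptions per source keeps one unavailable service from discarding the hits returned by the others.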

Advanced Usage with Multi-source Annotation

import asyncio

from knowledge_lookup import MultiSourceAnnotator


async def main():
    # Annotate text with concepts from multiple sources
    annotator = MultiSourceAnnotator()
    annotations = await annotator.annotate_text(
        "Type 2 diabetes is associated with insulin resistance",
        confidence_threshold=0.7,
    )

    # Get consensus annotations across sources
    consensus = annotator.get_consensus_annotations(annotations)


asyncio.run(main())
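
One plausible reading of "consensus" is agreement between sources: keep a concept only if enough independent sources report it. The sketch below illustrates that idea with a simple vote count. The `consensus` function, the `min_sources` parameter, and the per-source data are assumptions for illustration; how `get_consensus_annotations` actually decides is not described here.

```python
from collections import Counter


def consensus(annotations: dict[str, list[str]], min_sources: int = 2) -> list[str]:
    """Return concept IDs reported by at least `min_sources` distinct sources."""
    counts = Counter(
        concept_id
        for concept_ids in annotations.values()
        for concept_id in set(concept_ids)  # count each source at most once
    )
    return sorted(cid for cid, n in counts.items() if n >= min_sources)


# Hypothetical per-source annotations; the identifiers are illustrative.
per_source = {
    "BIOPORTAL": ["DOID:9352", "HP:0000855"],
    "OLS": ["DOID:9352"],
    "UMLS": ["HP:0000855", "DOID:9352", "DOID:9352"],
}
agreed = consensus(per_source)
```

Raising `min_sources` trades recall for precision: with `min_sources=3`, only concepts that every source in this example reports would remain.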

📋 Supported Knowledge Sources

| Source | Description | API Key Required |
|---|---|---|
| BioPortal | NCBO BioPortal ontology repository | Yes |
| OLS | Ontology Lookup Service | No |
| UMLS | Unified Medical Language System | Yes |
| ChEMBL | Chemical database | No |
| DisGeNET | Disease-gene associations | No |
| DrugBank | Drug information database | No |
| Ensembl | Genome annotation database | No |
| Gene Ontology | Molecular function/process/component | No |
| HPO | Human Phenotype Ontology | No |
| Mondo | Mondo Disease Ontology | No |
| OpenTargets | Target-disease associations | No |
| PubChem | Chemical information | No |
| Reactome | Pathway database | No |
| UniProt | Protein sequence database | No |
| WikiData | Structured knowledge base | No |
| ZOOMA | Ontology mapping service | No |
| And 13+ more... | See full list in documentation | Varies |

πŸ—οΈ Architecture

knowledge_lookup/
β”œβ”€β”€ adapters/           # Individual source adapters
β”œβ”€β”€ models.py          # Data models and enums
β”œβ”€β”€ central_lookup.py  # Main lookup coordinator
β”œβ”€β”€ multi_source_annotator.py  # Cross-source annotation
β”œβ”€β”€ rdf_converter.py   # RDF export utilities
β”œβ”€β”€ cache.py          # Caching system
└── base.py           # Abstract base classes

📖 Documentation

Additional Resources

Example Notebooks

Explore interactive examples in the examples/ directory:

  • Basic concept lookup
  • Multi-source annotation
  • RDF export and knowledge graph construction
  • Performance benchmarking

🔧 Configuration

API Keys

Some sources require API keys. Set them as environment variables:

export BIOPORTAL_API_KEY="your_key_here"
export UMLS_API_KEY="your_key_here"
# ... etc

Or create a .env file:

BIOPORTAL_API_KEY=your_key_here
UMLS_API_KEY=your_key_here

Advanced Configuration

from knowledge_lookup import CentralKnowledgeLookup, KnowledgeSource, LookupConfig

config = LookupConfig(
    rate_limits={
        KnowledgeSource.BIOPORTAL: 10,  # requests per second
        KnowledgeSource.OLS: 20,
    },
    cache_enabled=True,
    cache_dir="./cache",
)

lookup = CentralKnowledgeLookup(config)
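
To make the "requests per second" setting concrete, here is a minimal sketch of one common way to enforce such a limit: spacing calls at least 1/rate seconds apart. The `RateLimiter` class is hypothetical, and the library's own rate limiting may work differently.

```python
import asyncio
import time


class RateLimiter:
    """Spaces calls at least 1/rate seconds apart (hypothetical helper)."""

    def __init__(self, requests_per_second: float) -> None:
        self._interval = 1.0 / requests_per_second
        self._next_slot = 0.0  # monotonic time of the next free slot
        self._lock = asyncio.Lock()

    async def wait(self) -> float:
        """Sleep until the next free slot; return the seconds slept."""
        async with self._lock:
            now = time.monotonic()
            delay = max(0.0, self._next_slot - now)
            # Reserve the slot after this one for the next caller.
            self._next_slot = max(now, self._next_slot) + self._interval
        if delay:
            await asyncio.sleep(delay)
        return delay


async def demo() -> list[float]:
    limiter = RateLimiter(requests_per_second=10)
    return [await limiter.wait() for _ in range(3)]


delays = asyncio.run(demo())
```

The first call passes immediately; each later call waits for the remainder of the 0.1-second interval. One limiter per source would match the per-source shape of `rate_limits`.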

🧪 Testing

# Run all tests
poetry run pytest

# Run specific test categories
poetry run pytest -m "unit"        # Unit tests only
poetry run pytest -m "integration" # Integration tests
poetry run pytest -m "not slow"    # Skip slow tests

# Run with coverage
poetry run pytest --cov=knowledge_lookup
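
The `-m` filters above select tests by marker. As a sketch of how such markers are typically declared, the `conftest.py` fragment below registers them through pytest's `pytest_configure` hook and `config.addinivalue_line`. The marker descriptions are assumptions, and this project may register its markers elsewhere, for example in `pyproject.toml`.

```python
# conftest.py (sketch): register the custom markers used by the -m filters.
MARKERS = {
    "unit": "fast tests with no network access",
    "integration": "tests that call live knowledge sources",
    "slow": "long-running tests",
}


def pytest_configure(config) -> None:
    """pytest hook: declare each marker so `-m` filters raise no warnings."""
    for name, description in MARKERS.items():
        config.addinivalue_line("markers", f"{name}: {description}")
```

A test then opts in with a decorator such as `@pytest.mark.unit` or `@pytest.mark.slow`, and `-m "not slow"` deselects everything carrying the `slow` marker.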

🤝 Contributing

We welcome contributions! Please see our Contributing Guide for details.

Adding New Adapters

  1. Extend KnowledgeSourceAdapter in base.py
  2. Implement required methods: search_concepts(), get_concept_details()
  3. Add to adapters/__init__.py
  4. Add tests in tests/unit/test_adapters/
  5. Update documentation

Development Setup

git clone https://github.com/JonasHeinickeBio/biomedical-knowledge-lookup.git
cd biomedical-knowledge-lookup
poetry install
poetry run pre-commit install

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

  • Built upon the AID-PAIS Knowledge Graph project
  • Thanks to all contributors and the biomedical research community
  • Special thanks to the maintainers of the various knowledge sources

📞 Support

🔬 Citation

If you use this library in your research, please cite:

@software{heinicke_biomedical_knowledge_lookup_2025,
  author = {Heinicke, Jonas},
  title = {Biomedical Knowledge Lookup: Unified biological concept lookup across 29+ biomedical knowledge sources},
  url = {https://github.com/JonasHeinickeBio/biomedical-knowledge-lookup},
  version = {1.0.0},
  year = {2025}
}

⭐ Star this repository if you find it useful!
