A unified Python library for biological concept lookup across 29+ biomedical knowledge sources including BioPortal, OLS, UMLS, ChEMBL, DisGeNET, and more. Built for bioinformatics researchers, knowledge graph developers, and biomedical data scientists.
- 29+ Knowledge Sources: Comprehensive coverage of biomedical ontologies and databases
- Unified API: Single interface for all sources with consistent results
- Multi-source Annotation: Cross-reference concepts across multiple databases
- RDF Export: Convert results to RDF format for knowledge graphs
- Intelligent Caching: Built-in caching system for performance optimization
- Async Support: Asynchronous operations for scalable applications
- Comprehensive Testing: Full test suite with unit and integration tests
- Rich Documentation: Extensive examples and API documentation
pip install biomedical-knowledge-lookup
# or
poetry add biomedical-knowledge-lookup
# or from source
git clone https://github.com/JonasHeinickeBio/biomedical-knowledge-lookup.git
cd biomedical-knowledge-lookup
poetry install

import asyncio

from knowledge_lookup import CentralKnowledgeLookup, KnowledgeSource


async def main():
    # Initialize the lookup system
    lookup = CentralKnowledgeLookup()

    # Search for concepts across multiple sources
    results = await lookup.search_concepts(
        "diabetes mellitus",
        sources=[KnowledgeSource.BIOPORTAL, KnowledgeSource.OLS, KnowledgeSource.UMLS],
    )

    # Get detailed information about a specific concept
    concept_details = await lookup.get_concept_details("DOID:9351")

    # Export results to RDF
    rdf_graph = lookup.export_to_rdf(results)


# The await calls above must run inside a coroutine
asyncio.run(main())

from knowledge_lookup import MultiSourceAnnotator
import asyncio


async def main():
    # Annotate text with concepts from multiple sources
    annotator = MultiSourceAnnotator()
    annotations = await annotator.annotate_text(
        "Type 2 diabetes is associated with insulin resistance",
        confidence_threshold=0.7,
    )

    # Get consensus annotations across sources
    consensus = annotator.get_consensus_annotations(annotations)


# The await call above must run inside a coroutine
asyncio.run(main())

| Source | Description | API Key Required |
|---|---|---|
| BioPortal | NCBO BioPortal ontology repository | Yes |
| OLS | Ontology Lookup Service | No |
| UMLS | Unified Medical Language System | Yes |
| ChEMBL | Chemical database | No |
| DisGeNET | Disease-gene associations | No |
| DrugBank | Drug information database | No |
| Ensembl | Genome annotation database | No |
| Gene Ontology | Molecular function/process/component | No |
| HPO | Human Phenotype Ontology | No |
| Mondo | Mondo Disease Ontology | No |
| OpenTargets | Target-disease associations | No |
| PubChem | Chemical information | No |
| Reactome | Pathway database | No |
| UniProt | Protein sequence database | No |
| Wikidata | Structured knowledge base | No |
| ZOOMA | Ontology mapping service | No |
| And 13+ more... | See full list in documentation | Varies |
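
Because only some sources need credentials, it can help to check the environment before running a lookup. The sketch below is an illustration only and not part of the library: the `REQUIRED_KEYS` mapping and the `missing_api_keys` helper are hypothetical, and the variable names are the ones given in the configuration section of this README.

```python
import os

# Hypothetical mapping from source name to the environment variable holding its key.
# Only the two sources marked "Yes" in the table above are listed.
REQUIRED_KEYS = {
    "BioPortal": "BIOPORTAL_API_KEY",
    "UMLS": "UMLS_API_KEY",
}


def missing_api_keys(environ=None):
    """Return the names of sources whose API key is unset or empty."""
    env = os.environ if environ is None else environ
    return [source for source, var in REQUIRED_KEYS.items() if not env.get(var)]


if __name__ == "__main__":
    missing = missing_api_keys()
    if missing:
        print("Missing API keys for:", ", ".join(missing))
    else:
        print("All required API keys are set.")
```

Passing a plain dict instead of relying on `os.environ` keeps the helper easy to test without touching the real environment.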
knowledge_lookup/
├── adapters/                  # Individual source adapters
├── models.py                  # Data models and enums
├── central_lookup.py          # Main lookup coordinator
├── multi_source_annotator.py  # Cross-source annotation
├── rdf_converter.py           # RDF export utilities
├── cache.py                   # Caching system
└── base.py                    # Abstract base classes
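
This layout suggests a coordinator that dispatches a query to several adapters and caches what comes back. The following is a sketch under stated assumptions, not the library's implementation: `FakeAdapter`, `InMemoryCache`, and `search_all` are hypothetical stand-ins, and the real classes in central_lookup.py and cache.py may look quite different.

```python
import asyncio


class FakeAdapter:
    """Hypothetical stand-in for a source adapter; returns canned results."""

    def __init__(self, name, records):
        self.name = name
        self.records = records
        self.calls = 0  # counts how often the "remote" source was queried

    async def search_concepts(self, query):
        self.calls += 1
        await asyncio.sleep(0)  # yield control, as a real network call would
        return [r for r in self.records if query.lower() in r.lower()]


class InMemoryCache:
    """Hypothetical cache keyed by (source name, query)."""

    def __init__(self):
        self._store = {}

    def get(self, key):
        return self._store.get(key)

    def set(self, key, value):
        self._store[key] = value


async def search_all(query, adapters, cache):
    """Query every adapter concurrently, reusing cached results when present."""

    async def one(adapter):
        key = (adapter.name, query)
        cached = cache.get(key)
        if cached is not None:
            return adapter.name, cached
        result = await adapter.search_concepts(query)
        cache.set(key, result)
        return adapter.name, result

    pairs = await asyncio.gather(*(one(a) for a in adapters))
    return dict(pairs)


async def demo():
    adapters = [
        FakeAdapter("OLS", ["diabetes mellitus", "asthma"]),
        FakeAdapter("Mondo", ["type 2 diabetes mellitus"]),
    ]
    cache = InMemoryCache()
    first = await search_all("diabetes", adapters, cache)
    second = await search_all("diabetes", adapters, cache)  # served from cache
    return first, second, [a.calls for a in adapters]


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Using `asyncio.gather` lets slow sources overlap instead of running one after another, and the cache key includes the source name so results from different sources never collide.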
Explore interactive examples in the examples/ directory:
- Basic concept lookup
- Multi-source annotation
- RDF export and knowledge graph construction
- Performance benchmarking
Some sources require API keys. Set them as environment variables:
export BIOPORTAL_API_KEY="your_key_here"
export UMLS_API_KEY="your_key_here"
# ... etc

Or create a .env file:

BIOPORTAL_API_KEY=your_key_here
UMLS_API_KEY=your_key_here

from knowledge_lookup import CentralKnowledgeLookup, KnowledgeSource, LookupConfig

config = LookupConfig(
    rate_limits={
        KnowledgeSource.BIOPORTAL: 10,  # requests per second
        KnowledgeSource.OLS: 20,
    },
    cache_enabled=True,
    cache_dir="./cache",
)
lookup = CentralKnowledgeLookup(config)

# Run all tests
poetry run pytest
# Run specific test categories
poetry run pytest -m "unit" # Unit tests only
poetry run pytest -m "integration" # Integration tests
poetry run pytest -m "not slow" # Skip slow tests
# Run with coverage
poetry run pytest --cov=knowledge_lookup

We welcome contributions! Please see our Contributing Guide for details.

To add a new knowledge source adapter:
- Extend KnowledgeSourceAdapter in base.py
- Implement required methods: search_concepts(), get_concept_details()
- Add to adapters/__init__.py
- Add tests in tests/unit/test_adapters/
- Update documentation
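
The steps above can be sketched as follows. Only the names `KnowledgeSourceAdapter`, `search_concepts()`, and `get_concept_details()` come from this README; the method signatures, the `Concept` dataclass, and `ToyAdapter` are assumptions for illustration, and the base class is redefined locally so the example runs without the library installed.

```python
import asyncio
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Concept:
    """Hypothetical result record; the library's real model may differ."""

    identifier: str
    label: str


class KnowledgeSourceAdapter(ABC):
    """Local stand-in for the base class in base.py (signatures assumed)."""

    @abstractmethod
    async def search_concepts(self, query: str) -> List[Concept]:
        ...

    @abstractmethod
    async def get_concept_details(self, identifier: str) -> Optional[Concept]:
        ...


class ToyAdapter(KnowledgeSourceAdapter):
    """Toy adapter backed by an in-memory dict instead of a remote API."""

    def __init__(self):
        self._concepts = {
            "DOID:9351": Concept("DOID:9351", "diabetes mellitus"),
            "TOY:0001": Concept("TOY:0001", "example concept"),  # made-up entry
        }

    async def search_concepts(self, query: str) -> List[Concept]:
        # Case-insensitive substring match on the label
        q = query.lower()
        return [c for c in self._concepts.values() if q in c.label.lower()]

    async def get_concept_details(self, identifier: str) -> Optional[Concept]:
        # Returns None when the identifier is unknown
        return self._concepts.get(identifier)


if __name__ == "__main__":
    adapter = ToyAdapter()
    print(asyncio.run(adapter.search_concepts("diabetes")))
    print(asyncio.run(adapter.get_concept_details("DOID:9351")))
```

A real adapter would replace the dict lookups with calls to the source's API and would follow whatever signatures base.py actually defines.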
git clone https://github.com/JonasHeinickeBio/biomedical-knowledge-lookup.git
cd biomedical-knowledge-lookup
poetry install
poetry run pre-commit install

This project is licensed under the MIT License - see the LICENSE file for details.

- Built upon the AID-PAIS Knowledge Graph project
- Thanks to all contributors and the biomedical research community
- Special thanks to the maintainers of the various knowledge sources
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Email: jonas.heinicke@helmholtz-hzi.de
If you use this library in your research, please cite:
@software{heinicke_biomedical_knowledge_lookup_2025,
author = {Heinicke, Jonas},
title = {Biomedical Knowledge Lookup: Unified biological concept lookup across 29+ biomedical knowledge sources},
url = {https://github.com/JonasHeinickeBio/biomedical-knowledge-lookup},
version = {1.0.0},
year = {2025}
}

Star this repository if you find it useful!