# Notebook 08: Integration Recommendations

## Objective

Provide actionable production recommendations for biomapper2 based on v3 findings.

## Key Deliverables

1. Production-ready `/one-hop` wrapper functions
2. Guidance on when to use search vs. graph traversal
3. Hybrid approach implementation guide
4. API integration code snippets

In [1]:
# Standard imports
import sys
from pathlib import Path
from datetime import datetime

# Add project root to path
PROJECT_ROOT = Path.cwd().parents[1]
sys.path.insert(0, str(PROJECT_ROOT / 'src'))
sys.path.insert(0, str(Path.cwd()))

# Import utilities
from kg_o1_v3_utils import save_json, load_json

# Output directory (created if missing so later saves do not fail)
OUTPUT_DIR = Path.cwd() / 'outputs'
OUTPUT_DIR.mkdir(parents=True, exist_ok=True)

print(f"Project root: {PROJECT_ROOT}")
print(f"Output directory: {OUTPUT_DIR}")

Project root: /home/trentleslie/Insync/projects/biomapper2
Output directory: /home/trentleslie/Insync/projects/biomapper2/notebooks/kg_o1_v3/outputs


## 1. Load v3 Findings

In [2]:
# Load gap analysis from NB07
gap_path = OUTPUT_DIR / 'semantic_gap_analysis_v3.json'
if gap_path.exists():
    gap_analysis = load_json(gap_path)
    print("Loaded gap analysis from NB07")

    capability_level = gap_analysis.get('semantic_gap_assessment', {}).get('capability_level', 'UNKNOWN')
    print(f"Semantic capability level: {capability_level}")
else:
    print("WARNING: Gap analysis not found. Using defaults.")
    gap_analysis = {}
    capability_level = 'UNKNOWN'

Loaded gap analysis from NB07
Semantic capability level: MINIMAL
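
The cell above reads a nested value with chained `.get()` calls and falls back to `'UNKNOWN'`. A minimal sketch of the same lookup as a reusable helper is shown here. `get_nested` is a hypothetical name, not part of `kg_o1_v3_utils`, and the example dictionary only mimics the one field this notebook reads.

```python
from collections.abc import Mapping
from typing import Any


def get_nested(data: Mapping, *keys: str, default: Any = None) -> Any:
    """Walk nested mappings key by key, returning `default` on the first miss."""
    current: Any = data
    for key in keys:
        if not isinstance(current, Mapping) or key not in current:
            return default
        current = current[key]
    return current


# Illustrative data only; the real NB07 file may contain more fields.
example = {"semantic_gap_assessment": {"capability_level": "MINIMAL"}}

level = get_nested(example, "semantic_gap_assessment", "capability_level", default="UNKNOWN")
missing = get_nested({}, "semantic_gap_assessment", "capability_level", default="UNKNOWN")
print(level, missing)
```

Unlike chained `.get()` calls, this version also tolerates an intermediate value that is not a dictionary.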


## 2. Production-Ready One-Hop Wrapper

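The wrapper below is proposed code to add to biomapper2 for semantic graph traversal. It is stored as a string so it can be saved with the other recommendations in Section 6; it is not executed in this notebook.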

In [3]:
# Production one-hop wrapper code
one_hop_wrapper_code = '''
"""One-hop graph traversal for semantic queries."""

from typing import Optional
from biomapper2.utils import kestrel_request


def get_semantic_relations(
    entity_id: str,
    direction: str = "both",
    predicate_filter: Optional[str] = None,
    category_filter: Optional[str] = None,
    limit: int = 50,
) -> list[dict]:
    """
    Get semantic relations for an entity via one-hop traversal.
    
    Use this for semantic queries like:
    - "What pathways does X participate in?"
    - "What diseases are associated with X?"
    - "What genes interact with X?"
    
    For entity resolution (vocabulary mapping), use hybrid_search instead.
    
    Args:
        entity_id: Entity ID (e.g., "CHEBI:4167")
        direction: "forward", "reverse", or "both"
        predicate_filter: Filter by predicate (e.g., "participates_in")
        category_filter: Filter by end category (e.g., "Pathway")
        limit: Maximum results
    
    Returns:
        List of relation dictionaries with predicate, object_id, object_name
    """
    payload = {
        "start_node_ids": entity_id,
        "direction": direction,
        "limit": limit,
        "mode": "slim",
    }
    
    if predicate_filter:
        payload["predicate_filter"] = predicate_filter
    if category_filter:
        payload["end_category_filter"] = category_filter
    
    try:
        response = kestrel_request("POST", "one-hop", json=payload)
        
        if isinstance(response, list):
            return response
        elif isinstance(response, dict) and "error" not in response:
            return response.get("edges", response.get("results", []))
        return []
        
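    except Exception as exc:
        # Best-effort lookup: report the failure, then fall back to "no relations"
        import logging

        logging.getLogger(__name__).warning(
            "one-hop request failed for %s: %s", entity_id, exc
        )
        return []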


def get_pathways_for_entity(entity_id: str, limit: int = 20) -> list[dict]:
    """Convenience function to get pathways for a metabolite."""
    return get_semantic_relations(
        entity_id,
        direction="forward",
        predicate_filter="participates_in",
        category_filter="Pathway",
        limit=limit,
    )


def get_associated_diseases(entity_id: str, limit: int = 20) -> list[dict]:
    """Convenience function to get diseases associated with an entity."""
    return get_semantic_relations(
        entity_id,
        direction="both",
        predicate_filter="associated_with",
        category_filter="Disease",
        limit=limit,
    )
'''

print("Production One-Hop Wrapper Code:")
print("="*60)
print(one_hop_wrapper_code)


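The wrapper accepts three response shapes from `/one-hop`: a bare list, a dict carrying `edges` or `results`, and a dict carrying `error`. The sketch below isolates that normalization so it can be exercised without network access. `fake_kestrel_request` is a hypothetical stand-in for `kestrel_request`, and the canned IDs (`EX:pathway-1`, `CHEBI:0000`) are placeholders, not real graph content.

```python
from typing import Any, Callable


def normalize_one_hop_response(response: Any) -> list:
    """Mirror the wrapper's handling: list passthrough, dict -> edges/results, else []."""
    if isinstance(response, list):
        return response
    if isinstance(response, dict) and "error" not in response:
        return response.get("edges", response.get("results", []))
    return []


def fake_kestrel_request(method: str, endpoint: str, json: dict) -> Any:
    """Hypothetical stub for kestrel_request: returns canned data, no network."""
    canned = {
        "CHEBI:4167": {"edges": [{"predicate": "participates_in", "object_id": "EX:pathway-1"}]},
        "CHEBI:0000": {"error": "not found"},
    }
    return canned.get(json["start_node_ids"], [])


def relations_for(entity_id: str, request: Callable[..., Any] = fake_kestrel_request) -> list:
    """Build the same payload as the wrapper and normalize whatever comes back."""
    payload = {"start_node_ids": entity_id, "direction": "both", "limit": 50, "mode": "slim"}
    return normalize_one_hop_response(request("POST", "one-hop", json=payload))


print(relations_for("CHEBI:4167"))
print(relations_for("CHEBI:0000"))
```

Passing the request function as an argument is only a testing convenience here; the proposed wrapper calls `kestrel_request` directly.
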
## 3. When to Use Each Method

In [4]:
# Decision tree for method selection
decision_tree = {
    'entity_resolution': {
        'description': 'Finding the same entity across different vocabulary IDs',
        'examples': [
            'What is the HMDB ID for glucose?',
            'Map CHEBI:4167 to KEGG',
            'Find DRUGBANK equivalents for this compound',
        ],
        'recommended_method': 'hybrid_search + reranking',
        'expected_performance': '95%+ EM',
        'code_snippet': 'hybrid_search("glucose", limit=10)',
    },
    'semantic_1_hop': {
        'description': 'Finding entities related by semantic predicates',
        'examples': [
            'What pathways does glucose participate in?',
            'What diseases are associated with cholesterol?',
            'What genes are affected by this drug?',
        ],
        'recommended_method': 'get_semantic_relations() via one-hop',
        'expected_performance': 'Varies by predicate coverage',
        'code_snippet': 'get_semantic_relations("CHEBI:4167", predicate_filter="participates_in")',
    },
    'semantic_multi_hop': {
        'description': 'Finding connections through multiple semantic hops',
        'examples': [
            'How is glucose connected to diabetes?',
            'What pathways link these two metabolites?',
        ],
        'recommended_method': 'BFS path finding with safeguards',
        'expected_performance': 'Depends on graph connectivity',
        'code_snippet': 'find_path_bfs(start_id, end_id, max_hops=3, semantic_only=True)',
    },
    'hybrid_query': {
        'description': 'Complex queries requiring both entity resolution and semantic traversal',
        'examples': [
            'Find pathways for "vitamin B12" (need to resolve name first)',
            'Get disease associations for metabolites in my dataset',
        ],
        'recommended_method': 'Search first, then graph traversal',
        'expected_performance': 'Compound accuracy',
        'code_snippet': '''# Step 1: Resolve entity\nresults = hybrid_search("vitamin B12")\nentity_id = results[0]["id"]\n\n# Step 2: Get semantic relations\npathways = get_pathways_for_entity(entity_id)''',
    },
}

print("METHOD SELECTION GUIDE")
print("="*60)

for query_type, info in decision_tree.items():
    print(f"\n{query_type.upper()}")
    print("-" * 40)
    print(f"Description: {info['description']}")
    print("\nExamples:")
    for ex in info['examples']:
        print(f"  - {ex}")
    print(f"\nMethod: {info['recommended_method']}")
    print(f"Performance: {info['expected_performance']}")
    print("\nCode:")
    print(f"  {info['code_snippet']}")

METHOD SELECTION GUIDE

ENTITY_RESOLUTION
----------------------------------------
Description: Finding the same entity across different vocabulary IDs

Examples:
  - What is the HMDB ID for glucose?
  - Map CHEBI:4167 to KEGG
  - Find DRUGBANK equivalents for this compound

Method: hybrid_search + reranking
Performance: 95%+ EM

Code:
  hybrid_search("glucose", limit=10)

SEMANTIC_1_HOP
----------------------------------------
Description: Finding entities related by semantic predicates

Examples:
  - What pathways does glucose participate in?
  - What diseases are associated with cholesterol?
  - What genes are affected by this drug?

Method: get_semantic_relations() via one-hop
Performance: Varies by predicate coverage

Code:
  get_semantic_relations("CHEBI:4167", predicate_filter="participates_in")

SEMANTIC_MULTI_HOP
----------------------------------------
Description: Finding connections through multiple semantic hops

Examples:
  - How is glucose connected to diabetes?
  - What
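
The guide above implies a simple routing rule: a name needs resolving before traversal, while an identifier can go straight to one-hop. A minimal sketch of that rule follows, assuming identifiers follow a `PREFIX:LOCAL_ID` pattern such as `CHEBI:4167`. `looks_like_curie` and `choose_method` are hypothetical helpers, and multi-hop routing is left out because it needs two endpoints.

```python
import re

# Assumption: identifiers look like PREFIX:LOCAL_ID with no whitespace.
CURIE_PATTERN = re.compile(r"^[A-Za-z][\w.]*:\S+$")


def looks_like_curie(text: str) -> bool:
    """True when the input already looks like a compact identifier."""
    return bool(CURIE_PATTERN.match(text.strip()))


def choose_method(text: str, wants_relations: bool) -> str:
    """Return one of the decision_tree keys for a single-entity query."""
    if not wants_relations:
        return "entity_resolution"
    # An ID can be traversed directly; a free-text name must be resolved first.
    return "semantic_1_hop" if looks_like_curie(text) else "hybrid_query"


for query, wants in [("glucose", False), ("CHEBI:4167", True), ("vitamin B12", True)]:
    print(f"{query!r} -> {choose_method(query, wants)}")
```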

## 4. Hybrid Query Implementation

In [5]:
# Hybrid query implementation code
# (the target module must also provide hybrid_search and get_semantic_relations)
hybrid_query_code = '''
"""Hybrid query implementation combining search and graph traversal."""

from typing import Optional


def query_with_semantic_expansion(
    query: str,
    predicates: Optional[list[str]] = None,
    search_limit: int = 5,
    relation_limit: int = 20,
) -> dict:
    """
    Execute a hybrid query: search for entity, then expand semantically.
    
    Args:
        query: Natural language query or entity name
        predicates: Predicates to expand (default: participates_in, associated_with, affects)
        search_limit: Max entities to consider from search
        relation_limit: Max relations per entity
    
    Returns:
        Dict with resolved_entities and their semantic_relations
    """
    if predicates is None:
        predicates = ["participates_in", "associated_with", "affects"]
    
    # Step 1: Resolve entity via search
    search_results = hybrid_search(query, limit=search_limit)
    
    if not search_results:
        return {"resolved_entities": [], "semantic_relations": []}
    
    # Step 2: Get semantic relations for top entities
    all_relations = []
    
    for entity in search_results[:search_limit]:
        entity_id = entity.get("id")
        if not entity_id:
            continue  # skip search hits without an ID; one-hop needs one
        entity_name = entity.get("name", entity_id)
        
        for predicate in predicates:
            relations = get_semantic_relations(
                entity_id,
                predicate_filter=predicate,
                limit=relation_limit,
            )
            
            for rel in relations:
                all_relations.append({
                    "source_entity": entity_name,
                    "source_id": entity_id,
                    "predicate": predicate,
                    "target_entity": rel.get("object_name", rel.get("end_node_name")),
                    "target_id": rel.get("object_id", rel.get("end_node_id")),
                    "target_category": rel.get("object_category", rel.get("category")),
                })
    
    return {
        "resolved_entities": search_results[:search_limit],
        "semantic_relations": all_relations,
    }


# Example usage:
# result = query_with_semantic_expansion(
#     "glucose",
#     predicates=["participates_in", "affects"],
# )
# print(f"Found {len(result[\'semantic_relations\'])} relations")
'''

print("Hybrid Query Implementation:")
print("="*60)
print(hybrid_query_code)


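The expansion loop in the hybrid query appends every relation it receives. If the endpoint ever returns the same edge more than once, the result list will contain duplicates. The sketch below shows one way to drop them, keyed on (source, predicate, target). `expand_entities` and `fake_relations` are hypothetical names, and `EX:1` is a placeholder target ID.

```python
from typing import Callable, Iterable, Optional


def expand_entities(
    entity_ids: Iterable[Optional[str]],
    predicates: Iterable[str],
    get_relations: Callable[[str, str], list],
) -> list:
    """Collect relations per (entity, predicate), skipping empty IDs and duplicates."""
    predicate_list = list(predicates)
    seen = set()
    merged = []
    for entity_id in entity_ids:
        if not entity_id:
            continue
        for predicate in predicate_list:
            for rel in get_relations(entity_id, predicate):
                # Same field fallback as the hybrid query code
                target_id = rel.get("object_id", rel.get("end_node_id"))
                key = (entity_id, predicate, target_id)
                if key in seen:
                    continue
                seen.add(key)
                merged.append(
                    {"source_id": entity_id, "predicate": predicate, "target_id": target_id}
                )
    return merged


def fake_relations(entity_id: str, predicate: str) -> list:
    """Hypothetical stub: returns one edge twice, under both field spellings."""
    if entity_id == "CHEBI:4167" and predicate == "participates_in":
        return [{"object_id": "EX:1"}, {"end_node_id": "EX:1"}]
    return []


rows = expand_entities(["CHEBI:4167", None], ["participates_in", "affects"], fake_relations)
print(rows)
```
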
## 5. Integration Checklist

In [6]:
# Integration checklist
integration_checklist = [
    {
        'task': 'Add one-hop wrapper to biomapper2',
        'file': 'src/biomapper2/core/semantic_query.py',
        'priority': 'HIGH',
        'status': 'TODO',
    },
    {
        'task': 'Add hybrid query function',
        'file': 'src/biomapper2/core/semantic_query.py',
        'priority': 'HIGH',
        'status': 'TODO',
    },
    {
        'task': 'Add convenience functions (get_pathways, get_diseases)',
        'file': 'src/biomapper2/core/semantic_query.py',
        'priority': 'MEDIUM',
        'status': 'TODO',
    },
    {
        'task': 'Add BFS path finding with safeguards',
        'file': 'src/biomapper2/core/path_finder.py',
        'priority': 'LOW',
        'status': 'TODO',
    },
    {
        'task': 'Update Mapper class with semantic query methods',
        'file': 'src/biomapper2/mapper.py',
        'priority': 'MEDIUM',
        'status': 'TODO',
    },
    {
        'task': 'Add tests for semantic queries',
        'file': 'tests/test_semantic_query.py',
        'priority': 'HIGH',
        'status': 'TODO',
    },
    {
        'task': 'Document semantic query capabilities',
        'file': 'docs/semantic_queries.md',
        'priority': 'LOW',
        'status': 'TODO',
    },
]

print("INTEGRATION CHECKLIST")
print("="*60)

for item in integration_checklist:
    priority_marker = {'HIGH': '!', 'MEDIUM': '-', 'LOW': ' '}[item['priority']]
    print(f"\n[{priority_marker}] {item['task']}")
    print(f"    File: {item['file']}")
    print(f"    Priority: {item['priority']}")

INTEGRATION CHECKLIST

[!] Add one-hop wrapper to biomapper2
    File: src/biomapper2/core/semantic_query.py
    Priority: HIGH

[!] Add hybrid query function
    File: src/biomapper2/core/semantic_query.py
    Priority: HIGH

[-] Add convenience functions (get_pathways, get_diseases)
    File: src/biomapper2/core/semantic_query.py
    Priority: MEDIUM

[ ] Add BFS path finding with safeguards
    File: src/biomapper2/core/path_finder.py
    Priority: LOW

[-] Update Mapper class with semantic query methods
    File: src/biomapper2/mapper.py
    Priority: MEDIUM

[!] Add tests for semantic queries
    File: tests/test_semantic_query.py
    Priority: HIGH

[ ] Document semantic query capabilities
    File: docs/semantic_queries.md
    Priority: LOW
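
The checklist lists "BFS path finding with safeguards" without showing what the safeguards look like. The sketch below is one possible shape, not the implementation from the earlier notebook: a breadth-first search over a caller-supplied `neighbors` function, bounded by a hop limit and a cap on visited nodes. `bfs_path_sketch`, its parameters, and the toy graph are all assumptions for illustration.

```python
from collections import deque
from typing import Callable, Iterable, Optional


def bfs_path_sketch(
    start_id: str,
    end_id: str,
    neighbors: Callable[[str], Iterable[str]],
    max_hops: int = 3,
    max_visited: int = 1000,
) -> Optional[list]:
    """Return a shortest path by hop count, or None if none is found within the limits."""
    if start_id == end_id:
        return [start_id]
    visited = {start_id}
    queue = deque([[start_id]])
    while queue:
        path = queue.popleft()
        if len(path) - 1 >= max_hops:
            continue  # hop budget used up on this branch
        for next_id in neighbors(path[-1]):
            if next_id in visited:
                continue
            if next_id == end_id:
                return path + [next_id]
            visited.add(next_id)
            if len(visited) > max_visited:
                return None  # explosion safeguard: stop instead of expanding further
            queue.append(path + [next_id])
    return None


# Toy graph standing in for one-hop lookups
toy_graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": ["E"]}


def toy_neighbors(node_id: str) -> list:
    return toy_graph.get(node_id, [])


print(bfs_path_sketch("A", "E", toy_neighbors))
print(bfs_path_sketch("A", "E", toy_neighbors, max_hops=2))
```

In a real integration, `neighbors` would wrap a one-hop call, so `max_visited` also bounds the number of requests issued.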


## 6. Save Integration Recommendations

In [7]:
# Save integration recommendations
output_data = {
    'timestamp': datetime.now().isoformat(),
    'v3_capability_level': capability_level,
    'decision_tree': decision_tree,
    'integration_checklist': integration_checklist,
    'code_snippets': {
        'one_hop_wrapper': one_hop_wrapper_code,
        'hybrid_query': hybrid_query_code,
    },
    'key_recommendations': [
        'Continue using search for entity resolution (95%+ EM)',
        'Add one-hop wrapper for semantic queries',
        'Implement hybrid approach for complex queries',
        'Use BFS with safeguards for multi-hop paths',
    ],
}

save_json(output_data, OUTPUT_DIR / 'integration_recommendations.json')
print(f"\nIntegration recommendations saved to: {OUTPUT_DIR / 'integration_recommendations.json'}")


Integration recommendations saved to: /home/trentleslie/Insync/projects/biomapper2/notebooks/kg_o1_v3/outputs/integration_recommendations.json
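
A quick check that the saved file contains every expected top-level key can catch a partial write before downstream use. The sketch below uses only the standard library rather than `load_json`, whose behavior is not shown in this notebook. `missing_keys` is a hypothetical helper, and the demo writes a deliberately incomplete file to a temporary directory.

```python
import json
import tempfile
from pathlib import Path

# Top-level keys written by the save cell above
REQUIRED_KEYS = {
    "timestamp",
    "v3_capability_level",
    "decision_tree",
    "integration_checklist",
    "code_snippets",
    "key_recommendations",
}


def missing_keys(path: Path) -> set:
    """Return the required top-level keys absent from the JSON file at `path`."""
    with path.open(encoding="utf-8") as handle:
        data = json.load(handle)
    return REQUIRED_KEYS - set(data)


with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "integration_recommendations.json"
    demo_path.write_text(
        json.dumps({"timestamp": "2000-01-01T00:00:00", "decision_tree": {}}),
        encoding="utf-8",
    )
    absent = missing_keys(demo_path)

print(sorted(absent))
```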


## Summary

In [8]:
# Final summary
print("\n" + "="*60)
print("NOTEBOOK 08 COMPLETE - KG-o1 v3 SERIES COMPLETE")
print("="*60)

print("\nv3 Exploration Summary:")
print("  - Tested /one-hop endpoint for semantic graph traversal")
print("  - Classified predicates as SEMANTIC vs EQUIVALENCY")
print("  - Extracted semantic subgraphs with true relations")
print("  - Implemented BFS path finding with explosion safeguards")
print("  - Generated and validated semantic QA pairs")
print("  - Compared search vs graph traversal performance")
print("  - Quantified v2 vs v3 capability gap")
print("  - Provided production integration recommendations")

# v2 exact-match score from the NB07 gap analysis; 0.952 is the fallback when the key is missing
v2_exact_match = (
    gap_analysis.get('capability_comparison', {})
    .get('v2_vocabulary_qa', {})
    .get('exact_match', 0.952)
)

print("\nKey Takeaway:")
print(f"  - Search: Best for entity resolution ({v2_exact_match * 100:.0f}% EM)")
print("  - Graph: Adds semantic capability (pathways, diseases, etc.)")
print("  - Hybrid: Combine both for comprehensive coverage")

print(f"\nOutput files in: {OUTPUT_DIR}")
print("\nNext steps:")
print("  1. Review integration checklist")
print("  2. Add code to biomapper2")
print("  3. Update KG_O1_EXPLORATION_REPORT.md with v3 findings")


NOTEBOOK 08 COMPLETE - KG-o1 v3 SERIES COMPLETE

v3 Exploration Summary:
  - Tested /one-hop endpoint for semantic graph traversal
  - Classified predicates as SEMANTIC vs EQUIVALENCY
  - Extracted semantic subgraphs with true relations
  - Implemented BFS path finding with explosion safeguards
  - Generated and validated semantic QA pairs
  - Compared search vs graph traversal performance
  - Quantified v2 vs v3 capability gap
  - Provided production integration recommendations

Key Takeaway:
  - Search: Best for entity resolution (95% EM)
  - Graph: Adds semantic capability (pathways, diseases, etc.)
  - Hybrid: Combine both for comprehensive coverage

Output files in: /home/trentleslie/Insync/projects/biomapper2/notebooks/kg_o1_v3/outputs

Next steps:
  1. Review integration checklist
  2. Add code to biomapper2
  3. Update KG_O1_EXPLORATION_REPORT.md with v3 findings
