# Notebook 01: One-Hop API Discovery (GO/NO-GO Gate)

## Critical Mission

**v2 concluded KRAKEN has a "semantic gap" but NEVER tested the `/one-hop` endpoint.**

This notebook definitively answers: **Does `/one-hop` return semantic relations?**

This is a **GATE CHECK**: if `/one-hop` doesn't exist or returns only vocabulary-equivalency relations,
we pivot immediately rather than continuing with NB02-08.

## Decision Criteria

| Finding | Decision |
|---------|----------|
| `/one-hop` returns `participates_in`, `catalyzed_by`, etc. | **GO** - proceed with v3 |
| `/one-hop` only returns `same_as`, `equivalent_to` | **PIVOT** - still vocabulary, not semantic |
| `/one-hop` doesn't exist (404) | **PIVOT** - use Reactome/KEGG fallback |
| Endpoint exists but empty results | **INVESTIGATE** - may need different entity types |

In [2]:
# Standard imports
import sys
import json
from pathlib import Path
from datetime import datetime
from collections import Counter, defaultdict

# Add project root to path
PROJECT_ROOT = Path.cwd().parents[1]
sys.path.insert(0, str(PROJECT_ROOT / 'src'))
sys.path.insert(0, str(Path.cwd()))

# Import utilities
from kg_o1_v3_utils import (
    test_one_hop, get_predicates, parse_one_hop_edges, get_semantic_edges,
    classify_predicate, classify_all_predicates,
    hybrid_search, save_json, load_json,
    TEST_ENTITIES, SEMANTIC_PREDICATES, EQUIVALENCY_PREDICATES,
)
from biomapper2.utils import kestrel_request

# Output directory
OUTPUT_DIR = Path.cwd() / 'outputs'
OUTPUT_DIR.mkdir(exist_ok=True)

print(f"Project root: {PROJECT_ROOT}")
print(f"Output directory: {OUTPUT_DIR}")

Project root: /home/trentleslie/Insync/projects/biomapper2
Output directory: /home/trentleslie/Insync/projects/biomapper2/notebooks/kg_o1_v3/outputs


## 1. Test /one-hop Endpoint Existence

First, let's check if the endpoint exists at all.

In [3]:
# Test 1: Check if /one-hop endpoint exists
print("="*60)
print("TEST 1: Checking /one-hop endpoint existence")
print("="*60)

test_entity_id, test_entity_name = TEST_ENTITIES[0]  # glucose
print(f"\nTest entity: {test_entity_name} ({test_entity_id})")

one_hop_result = test_one_hop(test_entity_id, direction="both")

# Check for 404 error
if isinstance(one_hop_result, dict) and one_hop_result.get('error') == 'endpoint_not_found':
    print("\n" + "!" * 60)
    print("CRITICAL: /one-hop endpoint does NOT exist (404)")
    print("!" * 60)
    endpoint_exists = False
else:
    print(f"\nEndpoint exists! Response type: {type(one_hop_result).__name__}")
    endpoint_exists = True
    
    # Show response structure
    if isinstance(one_hop_result, dict):
        print(f"Response keys: {list(one_hop_result.keys())}")
    elif isinstance(one_hop_result, list):
        print(f"Response is a list with {len(one_hop_result)} items")
        if one_hop_result:
            print(f"First item keys: {list(one_hop_result[0].keys()) if isinstance(one_hop_result[0], dict) else 'N/A'}")

TEST 1: Checking /one-hop endpoint existence

Test entity: glucose (CHEBI:4167)

Endpoint exists! Response type: dict
Response keys: ['edge_schema', 'results', 'nodes']
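
The response keys suggest that edges are not returned as ready-made dicts, which is why later cells go through `parse_one_hop_edges`. The sketch below shows one way such a parser could work. It rests on assumptions the output above does not confirm: that `edge_schema` is a list of field names, and that `results` maps each end node ID to a list of positional rows in schema order. The name `rows_to_edge_dicts` is hypothetical and is not part of `kg_o1_v3_utils`.

```python
from typing import Any


def rows_to_edge_dicts(response: dict[str, Any]) -> list[dict[str, Any]]:
    """Turn a columnar one-hop response into a flat list of edge dicts.

    ASSUMED layout (not confirmed by the notebook output):
      response["edge_schema"] -> list of field names
      response["results"]     -> {end_node_id: [row, row, ...]}, where each
                                 row lists values in edge_schema order
    """
    schema = response.get("edge_schema", [])
    edges = []
    for end_node_id, rows in response.get("results", {}).items():
        for row in rows:
            edge = dict(zip(schema, row))      # pair field names with values
            edge["end_node_id"] = end_node_id  # keep the grouping key on the edge
            edges.append(edge)
    return edges


# Toy response built by hand to match the assumed layout
toy_response = {
    "edge_schema": ["subject_id", "predicate", "object_id"],
    "results": {
        "MONDO:0004946": [["CHEBI:4167", "biolink:treats", "MONDO:0004946"]],
    },
    "nodes": {},
}
toy_edges = rows_to_edge_dicts(toy_response)
print(toy_edges[0]["predicate"])
```

If the real layout differs, only the two lookups at the top of the function should need to change.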


In [4]:
# If endpoint doesn't exist, check alternative endpoints
if not endpoint_exists:
    print("\nChecking alternative endpoints...")
    
    alternative_endpoints = [
        ('get-edges', {'node_id': test_entity_id}),
        ('predicates', None),
        ('similar-nodes', {'node_id': test_entity_id, 'limit': 5}),
    ]
    
    for endpoint, payload in alternative_endpoints:
        try:
            if payload:
                result = kestrel_request('POST', endpoint, json=payload)
            else:
                result = kestrel_request('GET', endpoint)
            print(f"  {endpoint}: EXISTS (type: {type(result).__name__})")
        except Exception as e:
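            # Compare against None explicitly: some HTTP response objects
            # (e.g. requests.Response) are falsy for 4xx/5xx status codes
            response = getattr(e, 'response', None)
            if response is not None and hasattr(response, 'status_code'):
                print(f"  {endpoint}: {response.status_code}")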
            else:
                print(f"  {endpoint}: ERROR - {e}")

## 2. Test Multiple Entities and Directions

If the endpoint exists, test it with multiple known metabolites.

In [5]:
# Test 2: Multiple entities and directions
if endpoint_exists:
    print("="*60)
    print("TEST 2: Testing multiple entities and directions")
    print("="*60)
    
    all_results = []
    
    for entity_id, entity_name in TEST_ENTITIES:
        for direction in ["forward", "reverse", "both"]:
            result = test_one_hop(entity_id, direction=direction)
            
            # Use parse_one_hop_edges to properly extract edges
            edges = parse_one_hop_edges(result)
            
            all_results.append({
                'entity_id': entity_id,
                'entity_name': entity_name,
                'direction': direction,
                'num_edges': len(edges),
                'edges': edges,
            })
            
            print(f"{entity_name} ({direction}): {len(edges)} edges")
else:
    all_results = []

TEST 2: Testing multiple entities and directions
glucose (forward): 105 edges
glucose (reverse): 78 edges
glucose (both): 141 edges
NAD+ (forward): 52 edges
NAD+ (reverse): 46 edges
NAD+ (both): 64 edges
cholesterol (forward): 84 edges
cholesterol (reverse): 62 edges
cholesterol (both): 98 edges
alanine (forward): 105 edges
alanine (reverse): 78 edges
alanine (both): 141 edges
ATP (forward): 16 edges
ATP (reverse): 44 edges
ATP (both): 55 edges
water (forward): 87 edges
water (reverse): 63 edges
water (both): 113 edges


In [6]:
# Summarize results
if all_results:
    # Sums over the forward, reverse, and both queries, so one edge may be counted several times
    total_edges = sum(r['num_edges'] for r in all_results)
    entities_with_edges = len(set(r['entity_id'] for r in all_results if r['num_edges'] > 0))
    
    print(f"\nSummary:")
    print(f"  Total edges found: {total_edges}")
    print(f"  Entities with edges: {entities_with_edges}/{len(TEST_ENTITIES)}")
    
    # Show sample edge structure (now properly parsed as dict)
    for r in all_results:
        if r['edges']:
            print(f"\nSample edge structure from {r['entity_name']}:")
            edge = r['edges'][0]
            print(f"  subject_id: {edge.get('subject_id')}")
            print(f"  predicate: {edge.get('predicate')}")
            print(f"  object_id: {edge.get('object_id')}")
            print(f"  end_node_id: {edge.get('end_node_id')}")
            break


Summary:
  Total edges found: 1432
  Entities with edges: 6/6

Sample edge structure from glucose:
  subject_id: CHEBI:4167
  predicate: biolink:mentioned_in_clinical_trials_for
  object_id: MONDO:0004946
  end_node_id: MONDO:0004946
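
Because `all_results` holds the `forward`, `reverse`, and `both` queries for every entity, the total above probably counts many edges more than once (for glucose, 105 + 78 = 183, yet `both` returned 141). A minimal sketch of counting distinct edges instead, assuming each parsed edge is a dict with the `subject_id`, `predicate`, and `object_id` keys shown in the sample. `count_unique_edges` is a hypothetical helper, not part of `kg_o1_v3_utils`.

```python
def count_unique_edges(results):
    """Count distinct (subject, predicate, object) triples across all queries."""
    seen = set()
    for r in results:
        for edge in r["edges"]:
            seen.add((edge.get("subject_id"), edge.get("predicate"), edge.get("object_id")))
    return len(seen)


# Toy data: the same edge returned by a "forward" and a "both" query
toy_edge = {"subject_id": "CHEBI:4167", "predicate": "biolink:treats", "object_id": "MONDO:0004946"}
toy_results = [
    {"direction": "forward", "edges": [toy_edge]},
    {"direction": "both", "edges": [toy_edge, {**toy_edge, "predicate": "biolink:related_to"}]},
]
unique_count = count_unique_edges(toy_results)
print(unique_count)  # 2 distinct triples out of 3 returned edges
```

Collapsing on the triple also merges edges that differ only in other fields (for example provenance), so whether this is the right notion of "unique" depends on what the downstream notebooks need.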


## 3. Analyze Predicates (CRITICAL CHECK)

This is the **most important analysis**. We need to determine if the predicates are:
- **SEMANTIC**: `participates_in`, `catalyzes`, `treats`, etc.
- **EQUIVALENCY**: `same_as`, `equivalent_to`, `xref`, etc.
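
`classify_predicate` lives in `kg_o1_v3_utils` and its implementation is not shown in this notebook. As an illustration of the idea only, a classifier of this kind could be a plain set lookup. The two sets below are small hypothetical stand-ins, not the actual contents of `SEMANTIC_PREDICATES` or `EQUIVALENCY_PREDICATES`.

```python
# Hypothetical stand-ins for the real predicate sets
SEMANTIC_EXAMPLE = {"biolink:participates_in", "biolink:catalyzes", "biolink:treats"}
EQUIVALENCY_EXAMPLE = {"biolink:same_as", "biolink:equivalent_to", "biolink:xref"}


def classify_predicate_sketch(predicate: str) -> str:
    """Return 'semantic', 'equivalency', or 'unknown' for a predicate CURIE."""
    if predicate in SEMANTIC_EXAMPLE:
        return "semantic"
    if predicate in EQUIVALENCY_EXAMPLE:
        return "equivalency"
    return "unknown"  # never guess: unlisted predicates are reported as such


for pred in ["biolink:treats", "biolink:same_as", "biolink:produces"]:
    print(pred, "->", classify_predicate_sketch(pred))
```

Under this design, a predicate absent from both sets is reported as unknown rather than assigned to either class, so the unknown bucket doubles as a to-do list for extending the sets.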

In [7]:
# Test 3: Extract and classify all predicates
if all_results:
    print("="*60)
    print("TEST 3: Predicate Analysis (CRITICAL CHECK)")
    print("="*60)
    
    # Collect all predicates (edges are now properly parsed dicts)
    predicate_counts = Counter()
    predicate_examples = defaultdict(list)
    
    for r in all_results:
        for edge in r['edges']:
            pred = edge.get('predicate', 'unknown')
            predicate_counts[pred] += 1
            if len(predicate_examples[pred]) < 3:
                predicate_examples[pred].append({
                    'subject': r['entity_name'],
                    'subject_id': edge.get('subject_id'),
                    'object_id': edge.get('object_id', edge.get('end_node_id', 'unknown')),
                })
    
    print(f"\nUnique predicates found: {len(predicate_counts)}")
    print("\nPredicate frequency:")
    for pred, count in predicate_counts.most_common(20):
        # Use a separate name so the `classification` dict built in the next cell is never overwritten
        pred_class = classify_predicate(pred)
        marker = "SEMANTIC" if pred_class == 'semantic' else "EQUIV" if pred_class == 'equivalency' else "???"
        print(f"  {count:4d}x  [{marker:8s}] {pred}")

TEST 3: Predicate Analysis (CRITICAL CHECK)

Unique predicates found: 18

Predicate frequency:
   490x  [SEMANTIC] biolink:related_to
   359x  [SEMANTIC] biolink:subclass_of
   129x  [SEMANTIC] biolink:has_participant
    92x  [SEMANTIC] biolink:has_input
    80x  [SEMANTIC] biolink:in_clinical_trials_for
    67x  [SEMANTIC] biolink:close_match
    50x  [SEMANTIC] biolink:has_chemical_role
    46x  [SEMANTIC] biolink:treats
    34x  [SEMANTIC] biolink:mentioned_in_clinical_trials_for
    28x  [SEMANTIC] biolink:chemically_similar_to
    18x  [SEMANTIC] biolink:has_output
    11x  [SEMANTIC] biolink:applied_to_treat
    10x  [SEMANTIC] biolink:has_part
     7x  [???     ] biolink:physically_interacts_with
     4x  [???     ] biolink:produces
     4x  [EQUIV   ] biolink:same_as
     2x  [SEMANTIC] biolink:affects
     1x  [SEMANTIC] biolink:causes


In [8]:
# Classify all predicates
if all_results and predicate_counts:
    all_predicates = list(predicate_counts.keys())
    classification = classify_all_predicates(all_predicates)
    
    semantic_count = sum(predicate_counts[p] for p in classification['semantic'])
    equiv_count = sum(predicate_counts[p] for p in classification['equivalency'])
    unknown_count = sum(predicate_counts[p] for p in classification['unknown'])
    total_count = semantic_count + equiv_count + unknown_count
    
    print("\n" + "="*60)
    print("PREDICATE CLASSIFICATION SUMMARY")
    print("="*60)
    print(f"\nSemantic predicates: {len(classification['semantic'])} types ({semantic_count} edges, {100*semantic_count/total_count:.1f}%)")
    for p in classification['semantic']:
        print(f"  - {p}")
    
    print(f"\nEquivalency predicates: {len(classification['equivalency'])} types ({equiv_count} edges, {100*equiv_count/total_count:.1f}%)")
    for p in classification['equivalency']:
        print(f"  - {p}")
    
    print(f"\nUnknown predicates: {len(classification['unknown'])} types ({unknown_count} edges, {100*unknown_count/total_count:.1f}%)")
    for p in classification['unknown']:
        print(f"  - {p}")


PREDICATE CLASSIFICATION SUMMARY

Semantic predicates: 15 types (1417 edges, 99.0%)
  - biolink:mentioned_in_clinical_trials_for
  - biolink:in_clinical_trials_for
  - biolink:treats
  - biolink:related_to
  - biolink:applied_to_treat
  - biolink:has_chemical_role
  - biolink:subclass_of
  - biolink:close_match
  - biolink:has_output
  - biolink:has_participant
  - biolink:has_part
  - biolink:has_input
  - biolink:chemically_similar_to
  - biolink:affects
  - biolink:causes

Equivalency predicates: 1 types (4 edges, 0.3%)
  - biolink:same_as

Unknown predicates: 2 types (11 edges, 0.8%)
  - biolink:produces
  - biolink:physically_interacts_with


## 4. Semantic Relation Examples

If semantic predicates exist, show examples of the entity-relation-entity triplets.

In [9]:
# Show semantic relation examples
if all_results and predicate_counts:
    semantic_examples = []
    
    for p in classification.get('semantic', []):
        if p in predicate_examples:
            for ex in predicate_examples[p]:
                semantic_examples.append({
                    'subject': ex['subject'],
                    'predicate': p,
                    'object': ex.get('object_id', ex.get('object', 'unknown')),
                })
    
    if semantic_examples:
        print("\n" + "="*60)
        print("SEMANTIC RELATION EXAMPLES (TRUE MULTI-HOP CAPABILITY!)")
        print("="*60)
        for ex in semantic_examples[:10]:
            print(f"  {ex['subject']} --[{ex['predicate']}]--> {ex['object']}")
    else:
        print("\n" + "!"*60)
        print("NO SEMANTIC RELATIONS FOUND")
        print("!"*60)
        print("This confirms v2's finding: KRAKEN is vocabulary-focused, not semantic.")


SEMANTIC RELATION EXAMPLES (TRUE MULTI-HOP CAPABILITY!)
  glucose --[biolink:mentioned_in_clinical_trials_for]--> MONDO:0004946
  glucose --[biolink:mentioned_in_clinical_trials_for]--> MONDO:0005148
  glucose --[biolink:mentioned_in_clinical_trials_for]--> MONDO:0002909
  glucose --[biolink:in_clinical_trials_for]--> MONDO:0004946
  glucose --[biolink:in_clinical_trials_for]--> MONDO:0004946
  glucose --[biolink:in_clinical_trials_for]--> MONDO:0005148
  glucose --[biolink:treats]--> MONDO:0004946
  glucose --[biolink:treats]--> MONDO:0004946
  glucose --[biolink:treats]--> MONDO:0005148
  glucose --[biolink:related_to]--> MONDO:0004946


## 5. Test with Predicate Filters

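Filter the retrieved edges for specific semantic predicates. The filtering is done client-side; whether the API itself accepts a predicate filter is not tested here.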

In [10]:
# Test predicate filtering
# NOTE: The API may not support a predicate_filter parameter, so this cell filters client-side
if endpoint_exists:
    print("="*60)
    print("TEST 5: Predicate Filtering (API capability check)")
    print("="*60)
    
    # Instead of filtering at API level (which may not be supported),
    # we filter the edges we already retrieved
    print("\nPredicate filtering by post-processing retrieved edges:")
    
    target_predicates = [
        'biolink:participates_in',
        'biolink:treats', 
        'biolink:in_clinical_trials_for',
        'biolink:associated_with',
        'biolink:related_to',
    ]
    
    for entity_id, entity_name in TEST_ENTITIES[:3]:
        print(f"\n{entity_name} ({entity_id}):")
        
        # Get all edges first
        result = test_one_hop(entity_id, direction="both", limit=50)
        all_edges = parse_one_hop_edges(result)
        
        for target_pred in target_predicates:
            matching_edges = [e for e in all_edges if e.get('predicate') == target_pred]
            if matching_edges:
                print(f"  {target_pred}: {len(matching_edges)} edges")
                for e in matching_edges[:2]:
                    obj_id = e.get('object_id', e.get('end_node_id', '?'))
                    print(f"    --> {obj_id}")

TEST 5: Predicate Filtering (API capability check)

Predicate filtering by post-processing retrieved edges:

glucose (CHEBI:4167):
  biolink:treats: 10 edges
    --> MONDO:0004946
    --> MONDO:0004946
  biolink:in_clinical_trials_for: 14 edges
    --> MONDO:0004946
    --> MONDO:0004946
  biolink:related_to: 104 edges
    --> MONDO:0004946
    --> CHEBI:4167

NAD+ (CHEBI:15846):
  biolink:treats: 1 edges
    --> UMLS:C1112459
  biolink:in_clinical_trials_for: 3 edges
    --> MONDO:0100233
    --> MONDO:0100096
  biolink:related_to: 52 edges
    --> CHEBI:13389
    --> CHEBI:13389

cholesterol (CHEBI:16113):
  biolink:treats: 5 edges
    --> MONDO:0021187
    --> MONDO:0010035
  biolink:in_clinical_trials_for: 10 edges
    --> HP:0003124
    --> MONDO:0021187
  biolink:related_to: 66 edges
    --> HP:0003124
    --> CHEBI:16113
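
The cell above scans the full edge list once per target predicate. An equivalent single-pass grouping is sketched below, assuming edges are dicts with a `predicate` key. `group_by_predicate` is a hypothetical helper, and the `EXAMPLE:` identifiers are placeholders rather than real node IDs.

```python
from collections import defaultdict


def group_by_predicate(edges, wanted):
    """Group edges by predicate, keeping only predicates listed in `wanted`."""
    wanted = set(wanted)
    grouped = defaultdict(list)
    for edge in edges:
        pred = edge.get("predicate")
        if pred in wanted:
            grouped[pred].append(edge)
    return dict(grouped)


toy_edges = [
    {"predicate": "biolink:treats", "object_id": "EXAMPLE:1"},
    {"predicate": "biolink:related_to", "object_id": "EXAMPLE:2"},
    {"predicate": "biolink:same_as", "object_id": "EXAMPLE:3"},
]
grouped = group_by_predicate(toy_edges, ["biolink:treats", "biolink:related_to"])
print({pred: len(items) for pred, items in grouped.items()})
```

Target predicates with no matching edges are simply absent from the result, which mirrors the cell's behavior of printing nothing for them.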


## 6. Test Category Filtering

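Approximate the end-node categories (Pathway, Disease, etc.) from the ID prefix of each end node. The edges are post-processed here rather than filtered by category at the API level.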

In [11]:
# Test category filtering
# Similar to predicate filtering, we may need to post-process rather than API-filter
if endpoint_exists:
    print("="*60)
    print("TEST 6: Category Analysis (from edges)")
    print("="*60)
    
    # Analyze categories from the edges we've collected
    category_counts = Counter()
    category_examples = defaultdict(list)
    
    for r in all_results:
        for edge in r['edges']:
            # The end_node_id can tell us category info
            end_node = edge.get('end_node_id', edge.get('object_id', ''))
            
            # Extract category from node ID prefix
            if ':' in end_node:
                prefix = end_node.split(':')[0]
                category_counts[prefix] += 1
                if len(category_examples[prefix]) < 3:
                    category_examples[prefix].append({
                        'subject': r['entity_name'],
                        'predicate': edge.get('predicate', ''),
                        'object_id': end_node,
                    })
    
    print("\nEnd node category distribution (by ID prefix):")
    for cat, count in category_counts.most_common(15):
        print(f"  {cat}: {count} edges")
        for ex in category_examples[cat][:2]:
            print(f"    {ex['subject']} --[{ex['predicate']}]--> {ex['object_id']}")

TEST 6: Category Analysis (from edges)

End node category distribution (by ID prefix):
  CHEBI: 766 edges
    glucose --[biolink:has_chemical_role]--> CHEBI:78675
    glucose --[biolink:related_to]--> CHEBI:78675
  MONDO: 225 edges
    glucose --[biolink:mentioned_in_clinical_trials_for]--> MONDO:0004946
    glucose --[biolink:in_clinical_trials_for]--> MONDO:0004946
  GO: 184 edges
    glucose --[biolink:has_output]--> GO:0006094
    glucose --[biolink:has_output]--> GO:0006094
  UMLS: 95 edges
    glucose --[biolink:related_to]--> UMLS:C4082776
    glucose --[biolink:related_to]--> UMLS:C4082776
  SMPDB: 66 edges
    NAD+ --[biolink:has_participant]--> SMPDB:SMP0017930
    NAD+ --[biolink:has_participant]--> SMPDB:SMP0017930
  MESH: 31 edges
    glucose --[biolink:subclass_of]--> MESH:D000429
    glucose --[biolink:subclass_of]--> MESH:D000429
  HP: 22 edges
    cholesterol --[biolink:related_to]--> HP:0003124
    cholesterol --[biolink:related_to]--> HP:0003124
  PathWhiz: 12 edges
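
A CURIE prefix names the source vocabulary rather than a Biolink category, so the distribution above is only a proxy for category. One way to make the proxy explicit is a small lookup table, sketched here. The mapping is an illustrative assumption (for example, `GO` covers processes, functions, and cellular components, and `UMLS` spans many semantic types), and `coarse_category` is a hypothetical helper.

```python
# Illustrative, deliberately incomplete mapping from CURIE prefix to a coarse label
PREFIX_TO_CATEGORY = {
    "CHEBI": "chemical",
    "MONDO": "disease",
    "HP": "phenotype",
    "SMPDB": "pathway",
    "GO": "GO term (process, function, or component)",
}


def coarse_category(curie: str) -> str:
    """Map a CURIE such as 'MONDO:0004946' to a coarse category label."""
    prefix, sep, _ = curie.partition(":")
    if not sep:
        return "unknown"  # not a CURIE at all
    return PREFIX_TO_CATEGORY.get(prefix, f"other ({prefix})")


for node_id in ["MONDO:0004946", "SMPDB:SMP0017930", "UMLS:C4082776"]:
    print(node_id, "->", coarse_category(node_id))
```

Prefixes missing from the table are surfaced as `other (...)` rather than silently dropped, so gaps in the mapping stay visible in the counts.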


## 7. Check /predicates Endpoint

Get the full list of available predicates in KRAKEN.

In [12]:
# Check /predicates endpoint
print("="*60)
print("TEST 7: /predicates Endpoint")
print("="*60)

all_predicates_from_api = get_predicates()

if all_predicates_from_api:
    print(f"\nTotal predicates from API: {len(all_predicates_from_api)}")
    
    api_classification = classify_all_predicates(all_predicates_from_api)
    
    print(f"\nClassification:")
    print(f"  Semantic: {len(api_classification['semantic'])}")
    for p in api_classification['semantic']:
        print(f"    - {p}")
    
    print(f"\n  Equivalency: {len(api_classification['equivalency'])}")
    for p in api_classification['equivalency']:
        print(f"    - {p}")
    
    print(f"\n  Unknown: {len(api_classification['unknown'])}")
    for p in api_classification['unknown'][:10]:  # Show first 10
        print(f"    - {p}")
    if len(api_classification['unknown']) > 10:
        print(f"    ... and {len(api_classification['unknown']) - 10} more")
else:
    print("\n/predicates endpoint not available or returned empty results")

TEST 7: /predicates Endpoint

Total predicates from API: 87

Classification:
  Semantic: 25
    - biolink:actively_involved_in
    - biolink:affects
    - biolink:applied_to_treat
    - biolink:associated_with
    - biolink:capable_of
    - biolink:catalyzes
    - biolink:causes
    - biolink:chemically_similar_to
    - biolink:close_match
    - biolink:coexists_with
    - biolink:contributes_to
    - biolink:correlated_with
    - biolink:expressed_in
    - biolink:has_chemical_role
    - biolink:has_input
    - biolink:has_metabolite
    - biolink:has_output
    - biolink:has_part
    - biolink:has_participant
    - biolink:in_clinical_trials_for
    - biolink:located_in
    - biolink:participates_in
    - biolink:related_to
    - biolink:subclass_of
    - biolink:treats

  Equivalency: 1
    - biolink:same_as

  Unknown: 61
    - biolink:affects_likelihood_of
    - biolink:affects_response_to
    - biolink:ameliorates_condition
    - biolink:assesses
    - biolink:associated_with_res
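
A useful cross-check is to compare the predicates seen on one-hop edges with those advertised by `/predicates`. In the outputs above, `biolink:mentioned_in_clinical_trials_for` is classified as semantic among the observed edges yet does not appear in the semantic list printed here, which suggests the two sources may not fully agree. A minimal sketch of that comparison with set operations follows; it uses short hand-written lists rather than the real data, and `compare_predicates` is a hypothetical helper.

```python
def compare_predicates(observed, advertised):
    """Report predicates that appear in only one of the two sources."""
    observed, advertised = set(observed), set(advertised)
    return {
        "observed_not_advertised": sorted(observed - advertised),
        "advertised_not_observed": sorted(advertised - observed),
    }


# Hand-written toy lists, not the real notebook data
toy_observed = ["biolink:treats", "biolink:mentioned_in_clinical_trials_for"]
toy_advertised = ["biolink:treats", "biolink:participates_in"]
diff = compare_predicates(toy_observed, toy_advertised)
print(diff)
```

In the notebook this could be called with `predicate_counts.keys()` and `all_predicates_from_api`, assuming the latter is a flat list of predicate strings.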

## 8. GO/NO-GO Decision

Based on all the tests, make the final decision.

In [13]:
# GO/NO-GO Decision
print("="*60)
print("GO/NO-GO DECISION")
print("="*60)

decision_data = {
    'timestamp': datetime.now().isoformat(),
    'endpoint_exists': endpoint_exists,
    'total_edges_found': sum(r['num_edges'] for r in all_results) if all_results else 0,
    'unique_predicates': len(predicate_counts) if 'predicate_counts' in globals() and predicate_counts else 0,
    'semantic_predicate_count': len(classification.get('semantic', [])) if 'classification' in globals() else 0,
    'equivalency_predicate_count': len(classification.get('equivalency', [])) if 'classification' in globals() else 0,
    'semantic_edge_percent': 0,
    'decision': None,
    'reasoning': None,
}

# Calculate semantic percentage
if 'classification' in globals() and 'predicate_counts' in globals():
    semantic_count = sum(predicate_counts.get(p, 0) for p in classification.get('semantic', []))
    total_count = sum(predicate_counts.values())
    if total_count > 0:
        decision_data['semantic_edge_percent'] = 100 * semantic_count / total_count

# Make decision
if not endpoint_exists:
    decision_data['decision'] = 'PIVOT'
    decision_data['reasoning'] = '/one-hop endpoint does not exist (404). Must use Reactome/KEGG fallback.'
elif decision_data['total_edges_found'] == 0:
    decision_data['decision'] = 'INVESTIGATE'
    decision_data['reasoning'] = 'Endpoint exists but returned no edges. May need different entity types or parameters.'
elif decision_data['semantic_predicate_count'] == 0:
    decision_data['decision'] = 'PIVOT'
    decision_data['reasoning'] = 'No semantic predicates found. Confirms v2: KRAKEN is vocabulary-focused only.'
elif decision_data['semantic_edge_percent'] < 10:
    decision_data['decision'] = 'PIVOT'
    decision_data['reasoning'] = f'Only {decision_data["semantic_edge_percent"]:.1f}% semantic edges. Insufficient for multi-hop reasoning.'
else:
    decision_data['decision'] = 'GO'
    decision_data['reasoning'] = f'Found {decision_data["semantic_predicate_count"]} semantic predicates with {decision_data["semantic_edge_percent"]:.1f}% semantic edges. Proceed with v3.'

# Display decision
print(f"\n{'='*40}")
print(f"DECISION: {decision_data['decision']}")
print(f"{'='*40}")
print(f"\nReasoning: {decision_data['reasoning']}")
print(f"\nKey metrics:")
print(f"  - Endpoint exists: {decision_data['endpoint_exists']}")
print(f"  - Total edges: {decision_data['total_edges_found']}")
print(f"  - Semantic predicates: {decision_data['semantic_predicate_count']}")
print(f"  - Semantic edge %: {decision_data['semantic_edge_percent']:.1f}%")

GO/NO-GO DECISION

DECISION: GO

Reasoning: Found 15 semantic predicates with 99.0% semantic edges. Proceed with v3.

Key metrics:
  - Endpoint exists: True
  - Total edges: 1432
  - Semantic predicates: 15
  - Semantic edge %: 99.0%
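
The decision cell mixes the rule with notebook state, which makes the rule hard to test in isolation. A sketch of the same ordering of checks as a pure function is shown below. The 10% threshold comes from the cell above; the function name `decide`, its parameter names, and the shortened reasoning strings are hypothetical.

```python
def decide(endpoint_exists, total_edges, semantic_predicates, semantic_pct,
           min_semantic_pct=10.0):
    """Return (decision, reasoning), applying the checks in the cell's order."""
    if not endpoint_exists:
        return "PIVOT", "endpoint missing"
    if total_edges == 0:
        return "INVESTIGATE", "no edges returned"
    if semantic_predicates == 0:
        return "PIVOT", "no semantic predicates"
    if semantic_pct < min_semantic_pct:
        return "PIVOT", "too few semantic edges"
    return "GO", "semantic relations present"


# Metrics reported by this notebook run
print(decide(True, 1432, 15, 99.0))  # ('GO', 'semantic relations present')
```

Keeping the threshold as a parameter makes it easy to see how sensitive the gate is to that choice.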


## 9. Save Audit Results

In [14]:
# Save audit results
audit_data = {
    'timestamp': datetime.now().isoformat(),
    'go_no_go_decision': decision_data,
    'endpoint_tests': {
        'one_hop_exists': endpoint_exists,
        'test_entities': TEST_ENTITIES,
        'results_per_entity': [
            {
                'entity_id': r['entity_id'],
                'entity_name': r['entity_name'],
                'direction': r['direction'],
                'num_edges': r['num_edges'],
            }
            for r in all_results
        ] if all_results else [],
    },
    'predicate_analysis': {
        'unique_predicates': list(predicate_counts.keys()) if 'predicate_counts' in globals() and predicate_counts else [],
        'predicate_counts': dict(predicate_counts) if 'predicate_counts' in globals() and predicate_counts else {},
        'classification': classification if 'classification' in globals() else {},
        'predicate_examples': dict(predicate_examples) if 'predicate_examples' in globals() else {},
    },
    'api_predicates': all_predicates_from_api if 'all_predicates_from_api' in globals() else [],
}

save_json(audit_data, OUTPUT_DIR / 'one_hop_api_audit.json')
print(f"\nAudit results saved to: {OUTPUT_DIR / 'one_hop_api_audit.json'}")


Audit results saved to: /home/trentleslie/Insync/projects/biomapper2/notebooks/kg_o1_v3/outputs/one_hop_api_audit.json
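
Later notebooks can read the saved decision back before doing any work. A minimal sketch follows, assuming `save_json` writes plain JSON that preserves the `audit_data` structure shown above. `load_gate_decision` is a hypothetical helper, and the demo writes to a temporary file rather than the real audit path.

```python
import json
import tempfile
from pathlib import Path


def load_gate_decision(audit_path):
    """Return the decision string stored under go_no_go_decision.decision."""
    with open(audit_path, encoding="utf-8") as fh:
        audit = json.load(fh)
    return audit["go_no_go_decision"]["decision"]


# Demonstrate on a throwaway file with the same nesting as audit_data
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "one_hop_api_audit.json"
    demo_path.write_text(
        json.dumps({"go_no_go_decision": {"decision": "GO"}}),
        encoding="utf-8",
    )
    decision = load_gate_decision(demo_path)

print(decision)
```

A downstream notebook could stop early when the returned value is anything other than `GO`.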


## Summary

This notebook tested the `/one-hop` endpoint to determine if KRAKEN supports true semantic graph traversal.

**Key findings are saved to `outputs/one_hop_api_audit.json`.**

### Next Steps Based on Decision:

- **GO**: Proceed to NB02 (Predicate & Relationship Mapping)
- **PIVOT**: Document the finding and consider the Reactome/KEGG fallback; v3 is not viable with current KRAKEN
- **INVESTIGATE**: Try different entity types or parameters before deciding

In [15]:
# Final summary
print("\n" + "="*60)
print("NOTEBOOK 01 COMPLETE")
print("="*60)
print(f"\nDecision: {decision_data['decision']}")
print(f"\nNext step: ", end="")

if decision_data['decision'] == 'GO':
    print("Proceed to NB02: Predicate & Relationship Mapping")
elif decision_data['decision'] == 'PIVOT':
    print("Document finding. Consider Reactome/KEGG fallback approach.")
else:
    print("Investigate further with different entity types or parameters.")


NOTEBOOK 01 COMPLETE

Decision: GO

Next step: Proceed to NB02: Predicate & Relationship Mapping
