# Example 5: Complete Mechanism Analysis

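This notebook demonstrates an end-to-end analysis across multiple knowledge graphs (KGs):
1. Start with space biology differential expression (GeneLab)
2. Annotate genes with drugs, diseases, pathways, anatomy, and GO terms (PrimeKG)
3. Find drug-disease mechanisms for the top disease (PrimeKG)
4. Add tissue context (PrimeKG anatomy)
5. Include geographic disease patterns (SPOKE-OKN)
6. Identify associated social determinants of health (SPOKE-OKN)
7. Find candidate drug countermeasures and check their interactions
8. Create a comprehensive visualization
9. Compile a summary report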

In [None]:
from mcp_space_life_sciences import IntegratedKGClient
import matplotlib.pyplot as plt
import pandas as pd
import networkx as nx

client = IntegratedKGClient()

## Step 1: GeneLab - Space Biology Starting Point

In [None]:
# Get differential expression from microgravity experiment
space_genes = client.get_genelab_de_genes(
    assay_id="OSD-48-EXAMPLE",
    log2fc_threshold=1.5,  # Strong changes only
    padj_threshold=0.01
)

print(f"Upregulated in space: {len(space_genes['upregulated'])}")
print(f"Downregulated in space: {len(space_genes['downregulated'])}")
print(f"\nTop upregulated genes: {space_genes['upregulated'][:10]}")
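
To make the two thresholds concrete, here is a minimal sketch of the kind of filtering they imply. It assumes that `log2fc_threshold` is applied to the absolute log2 fold change and that `padj_threshold` is an upper bound on the adjusted p-value; the client's actual behavior may differ. The rows and gene names are made up for illustration, and `split_de_genes` is a hypothetical helper, not part of `IntegratedKGClient`.

```python
# Hypothetical differential-expression rows; gene symbols and values are
# invented for illustration and do not come from any GeneLab dataset.
de_rows = [
    {"gene": "GENE_A", "log2fc": 2.3, "padj": 0.001},
    {"gene": "GENE_B", "log2fc": -1.8, "padj": 0.004},
    {"gene": "GENE_C", "log2fc": 0.9, "padj": 0.0005},  # effect too small
    {"gene": "GENE_D", "log2fc": 3.1, "padj": 0.2},     # not significant
    {"gene": "GENE_E", "log2fc": -2.6, "padj": 0.009},
]


def split_de_genes(rows, log2fc_threshold=1.5, padj_threshold=0.01):
    """Split rows into up/down gene lists, strongest effect first.

    Assumed semantics: keep a row when |log2fc| >= log2fc_threshold
    and padj < padj_threshold.
    """
    significant = [
        r for r in rows
        if r["padj"] < padj_threshold and abs(r["log2fc"]) >= log2fc_threshold
    ]
    ranked = sorted(significant, key=lambda r: abs(r["log2fc"]), reverse=True)
    return {
        "upregulated": [r["gene"] for r in ranked if r["log2fc"] > 0],
        "downregulated": [r["gene"] for r in ranked if r["log2fc"] < 0],
    }


example_split = split_de_genes(de_rows)
print(example_split)
```

With these toy rows, `GENE_C` is dropped for its small fold change and `GENE_D` for its adjusted p-value, which is the behavior the "strong changes only" comment above suggests.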

## Step 2: PrimeKG - Complete Gene Annotation

In [None]:
# Enrich upregulated genes with ALL PrimeKG annotations
enrichment = client.enrich_genes_with_primekg(
    gene_names=space_genes['upregulated'][:50],
    include_drugs=True,
    include_diseases=True,
    include_pathways=True,
    include_anatomy=True,
    include_go_terms=True
)

print("\nEnrichment Summary:")
print(f"  Drugs: {len(enrichment.get('drugs', []))}")
print(f"  Diseases: {len(enrichment.get('diseases', []))}")
print(f"  Pathways: {len(enrichment.get('pathways', []))}")
print(f"  Anatomies: {len(enrichment.get('anatomies', []))}")
print(f"  GO Terms: {len(enrichment.get('go_terms', []))}")

## Step 3: Disease Mechanism Discovery

In [None]:
# Focus on the top diseases from the enrichment
top_diseases = enrichment.get('diseases', [])[:5]
if not top_diseases:
    raise ValueError("Enrichment returned no diseases; cannot run mechanism analysis")

print("\nTop diseases affected by space-altered genes:")
for disease in top_diseases:
    print(f"  - {disease['name']}: {disease['gene_count']} genes")

# Find the complete mechanism for the top-ranked disease
disease_name = top_diseases[0]['name']
mechanism = client.find_drug_disease_mechanisms(
    disease_name=disease_name,
    max_depth=3  # Multi-hop queries
)

print(f"\nMechanism analysis for {disease_name}:")
print(f"  Direct drug targets: {len(mechanism.get('direct_targets', []))}")
print(f"  Pathway-mediated: {len(mechanism.get('pathway_mediated', []))}")
print(f"  Gene-mediated: {len(mechanism.get('gene_mediated', []))}")
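
The `max_depth=3` argument suggests a bounded multi-hop search between drugs and the disease. The sketch below shows one way such a search can work on a toy graph: a depth-limited traversal that returns every simple path with at most `max_depth` edges. All node names are hypothetical, and `paths_within` is an illustrative helper; how the client actually traverses PrimeKG is not shown in this notebook.

```python
from collections import defaultdict

# Toy undirected graph; every node and edge here is hypothetical.
edges = [
    ("drug:D1", "gene:G1"),
    ("gene:G1", "disease:X"),
    ("drug:D2", "gene:G2"),
    ("gene:G2", "pathway:P1"),
    ("pathway:P1", "disease:X"),
    ("drug:D3", "gene:G3"),
    ("gene:G3", "gene:G4"),
    ("gene:G4", "pathway:P2"),
    ("pathway:P2", "disease:X"),
]

adjacency = defaultdict(set)
for a, b in edges:
    adjacency[a].add(b)
    adjacency[b].add(a)


def paths_within(source, target, max_depth):
    """Return all simple paths from source to target using at most max_depth edges."""
    found = []
    stack = [(source, [source])]
    while stack:
        node, path = stack.pop()
        if node == target:
            found.append(path)
            continue
        if len(path) - 1 >= max_depth:
            continue  # hop budget exhausted on this branch
        for neighbor in sorted(adjacency[node]):
            if neighbor not in path:  # keep paths simple (no revisits)
                stack.append((neighbor, path + [neighbor]))
    return found


for drug in ("drug:D1", "drug:D2", "drug:D3"):
    hits = paths_within(drug, "disease:X", max_depth=3)
    print(drug, "->", hits if hits else "no path within 3 hops")
```

In this toy graph, `drug:D1` reaches the disease through a single gene, `drug:D2` through a gene and a pathway, and `drug:D3` needs four hops, so it is excluded at depth 3.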

## Step 4: Tissue-Specific Analysis

In [None]:
# Find which tissues express space-altered genes
tissue_expression = {}
for anatomy in enrichment.get('anatomies', [])[:10]:
    genes_in_tissue = client.find_genes_in_anatomy(
        anatomy_name=anatomy['name'],
        gene_filter=space_genes['upregulated'][:50]
    )
    tissue_expression[anatomy['name']] = len(genes_in_tissue)

print("\nTissue expression of space-altered genes:")
for tissue, count in sorted(tissue_expression.items(), 
                           key=lambda x: x[1], 
                           reverse=True)[:10]:
    print(f"  {tissue}: {count} genes")
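
Conceptually, the per-tissue count is a set overlap between the genes expressed in a tissue and the space-altered genes. The following sketch shows that idea with plain Python sets. The tissue and gene names are placeholders, and it is only an assumption that `gene_filter` works as an intersection like this.

```python
# Hypothetical tissue -> expressed-gene sets; names are illustrative only.
tissue_genes = {
    "tissue_1": {"GENE_A", "GENE_B", "GENE_C"},
    "tissue_2": {"GENE_B"},
    "tissue_3": {"GENE_X"},
}
altered = {"GENE_A", "GENE_B", "GENE_D"}  # stand-in for the space-altered genes

# Count how many altered genes are expressed in each tissue.
overlap_counts = {
    tissue: len(genes & altered) for tissue, genes in tissue_genes.items()
}

# Rank tissues by overlap, largest first.
ranked_tissues = sorted(overlap_counts.items(), key=lambda kv: kv[1], reverse=True)
for tissue, count in ranked_tissues:
    print(f"  {tissue}: {count} genes")
```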

## Step 5: Geographic Disease Context (SPOKE-OKN)

In [None]:
# Get disease prevalence for top diseases
disease_prevalence = client.get_disease_prevalence_by_location(
    disease_names=[d['name'] for d in top_diseases],
    location="United States"
)

df_prevalence = pd.DataFrame(disease_prevalence)
print("\nDisease prevalence by state (top 10):")
print(df_prevalence.nlargest(10, 'prevalence')[['disease', 'state', 'prevalence', 'year']])

## Step 6: Social Determinants of Health (SPOKE-OKN)

In [None]:
# Get social determinants of health (SDoH) associated with the top diseases
sdoh_associations = client.find_sdoh_disease_associations(
    diseases=[d['name'] for d in top_diseases[:3]],
    p_value_threshold=0.05
)

print("\nSocial determinants associated with space-related diseases:")
for assoc in sdoh_associations[:10]:
    print(f"  {assoc['sdoh']} → {assoc['disease']}: "
          f"enrichment={assoc['enrichment']:.2f}, p={assoc['p_value']:.4f}")
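
When many SDoH-disease pairs are screened against a single p-value cutoff, some will pass by chance. Whether the client already corrects for multiple testing is not stated here, so the sketch below is an optional, assumed post-processing step: a plain-Python Benjamini-Hochberg adjustment. The p-values are invented, and `benjamini_hochberg` is a hypothetical helper.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Invented p-values for four hypothetical associations.
raw_p = [0.01, 0.04, 0.03, 0.2]
adj_p = benjamini_hochberg(raw_p)

raw_hits = sum(p < 0.05 for p in raw_p)
adj_hits = sum(p < 0.05 for p in adj_p)
print(f"Passing at 0.05: raw={raw_hits}, adjusted={adj_hits}")
```

In this toy case three associations pass the raw cutoff but only one survives adjustment, which is why the reported associations are best read as candidates rather than confirmed effects.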

## Step 7: Drug Countermeasure Discovery

In [None]:
# Find drugs for top pathways
top_pathways = enrichment.get('pathways', [])[:5]
pathway_drugs = client.find_drugs_for_pathway(
    pathway_names=[p['name'] for p in top_pathways]
)

print("\nPotential drug countermeasures:")
for drug in pathway_drugs[:10]:
    print(f"  {drug['drug_name']}:")
    print(f"    Targets: {drug['target_count']} genes")
    print(f"    Pathways: {drug['pathway_count']}")

# Check for drug-drug interactions
top_drug_names = [d['drug_name'] for d in pathway_drugs[:5]]
interactions = client.find_drug_interactions(
    drug_names=top_drug_names
)

if interactions:
    print("\nWarning - Drug interactions detected:")
    for interaction in interactions:
        print(f"  {interaction['drug1']} ↔ {interaction['drug2']}: "
              f"{interaction['risk_level']}")

## Step 8: Comprehensive Visualization

In [None]:
# Create multi-KG network
fig = client.create_multi_kg_network(
    genes=space_genes['upregulated'][:20],
    diseases=[d['name'] for d in top_diseases[:3]],
    drugs=[d['drug_name'] for d in pathway_drugs[:5]],
    pathways=[p['name'] for p in top_pathways[:3]],
    anatomies=[a['name'] for a in enrichment.get('anatomies', [])[:3]]
)

plt.title("Complete Mechanism: Space Biology → Pathways → Diseases → Drugs")
plt.show()
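
For readers who want to see what such a network looks like underneath, here is a minimal sketch that builds a small typed graph directly with `networkx`. It assumes nothing about how `create_multi_kg_network` is implemented; the entity names, the `kind` attribute, and the color palette are all illustrative choices.

```python
import networkx as nx

# Hypothetical entities; none of these names come from GeneLab, PrimeKG, or SPOKE-OKN.
layers = {
    "gene": ["GENE_A", "GENE_B"],
    "pathway": ["PATHWAY_1"],
    "disease": ["DISEASE_X"],
    "drug": ["DRUG_1"],
}
links = [
    ("GENE_A", "PATHWAY_1"),
    ("GENE_B", "PATHWAY_1"),
    ("GENE_A", "DISEASE_X"),
    ("PATHWAY_1", "DISEASE_X"),
    ("DRUG_1", "GENE_B"),
]

# Tag each node with its entity type so layers can be styled separately.
G = nx.Graph()
for kind, names in layers.items():
    for name in names:
        G.add_node(name, kind=kind)
G.add_edges_from(links)

# One color per entity type, in node order, ready to pass to a drawing call.
palette = {"gene": "tab:blue", "pathway": "tab:green",
           "disease": "tab:red", "drug": "tab:orange"}
node_colors = [palette[kind] for _, kind in G.nodes(data="kind")]

# Rank nodes by degree to spot hubs that connect several layers.
hubs = sorted(G.degree, key=lambda pair: pair[1], reverse=True)
print(hubs[:3])
```

The `node_colors` list can be passed as `node_color` to `nx.draw_networkx` to render the layers in different colors.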

## Step 9: Generate Comprehensive Report

In [None]:
# Compile complete analysis
report = {
    "space_biology": {
        "experiment": "OSD-48 Microgravity",
        "upregulated_genes": len(space_genes['upregulated']),
        "downregulated_genes": len(space_genes['downregulated'])
    },
    "biological_impact": {
        "affected_diseases": len(enrichment.get('diseases', [])),
        "affected_pathways": len(enrichment.get('pathways', [])),
        "affected_tissues": len(tissue_expression)
    },
    "therapeutic_options": {
        "candidate_drugs": len(pathway_drugs),  # pathway_drugs lists drugs, not targets
        "drug_interactions": len(interactions) if interactions else 0
    },
    "geographic_context": {
        "disease_prevalence_records": len(df_prevalence),
        "sdoh_associations": len(sdoh_associations)
    }
}

print("\n" + "="*60)
print("COMPLETE MECHANISM ANALYSIS REPORT")
print("="*60)

for section, data in report.items():
    print(f"\n{section.upper().replace('_', ' ')}:")
    for key, value in data.items():
        print(f"  {key.replace('_', ' ').title()}: {value}")

print("\n" + "="*60)
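
Because the report is a plain nested dictionary of counts, it can be saved as JSON for later comparison between runs. The sketch below shows one way to do that. `save_report` and the output filename are hypothetical, and the counts are placeholders rather than results; a real run would pass the `report` dictionary instead.

```python
import json
import tempfile
from pathlib import Path

# Placeholder counts only; these are not results from any analysis.
example_report = {
    "space_biology": {"upregulated_genes": 0, "downregulated_genes": 0},
    "geographic_context": {"disease_prevalence_records": 0, "sdoh_associations": 0},
}


def save_report(report, path):
    """Write the report as indented JSON and return the path written."""
    path = Path(path)
    path.write_text(json.dumps(report, indent=2, sort_keys=True), encoding="utf-8")
    return path


out_path = save_report(example_report, Path(tempfile.gettempdir()) / "mechanism_report.json")

# Read the file back to confirm the round trip preserves the structure.
reloaded = json.loads(out_path.read_text(encoding="utf-8"))
print(f"Saved {len(reloaded)} report sections to {out_path}")
```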

## Summary

This notebook demonstrated a complete multi-KG analysis:

### Data Flow:
1. ✅ **GeneLab**: Space biology differential expression
2. ✅ **PrimeKG**: Disease/pathway/drug/anatomy/GO annotations
3. ✅ **SPOKE-OKN**: Geographic disease prevalence and social determinants of health (SDoH)

### Insights Generated:
- Space-induced gene changes mapped to diseases
- Biological pathways affected by spaceflight
- Tissue-specific expression patterns
- Potential drug countermeasures
- Geographic disease distribution
- Social determinants influencing risk
- Drug interaction warnings

### Integration Value:
Connecting all three knowledge graphs provides:
- **Systems-level understanding** from genes → pathways → diseases → geography
- **Therapeutic discovery** with safety checks (drug-drug interactions)
- **Risk assessment** combining biology and social factors
- **Mechanistic insights** spanning the molecular to the population level

No single knowledge graph covers this full range: GeneLab supplies the spaceflight expression data, PrimeKG the molecular and clinical annotations, and SPOKE-OKN the population-level context.