# HPCMA Atlas Query Examples

## Interactive Tutorial for the Hypertension Pan-Comorbidity Multi-Modal Atlas

This notebook provides practical examples for querying the HPCMA biological atlas database.

**Repository**: https://github.com/Benjamin-JHou/HPCMA

## Setup: Load Required Libraries

```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from pathlib import Path

# Set display options
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)

print("✅ Libraries loaded successfully!")
```
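
Every example below reads CSV files through paths relative to the notebook's directory (`../atlas_resource/` and `../results/`). A minimal sketch for checking up front that those files exist, so that a wrong working directory shows up as one clear message rather than a `FileNotFoundError` partway through an example. The helper name `missing_atlas_files` is hypothetical; the file list just collects the paths this notebook uses.

```python
from pathlib import Path

# Relative paths used by the examples in this notebook
ATLAS_FILES = [
    '../atlas_resource/hypertension_atlas_master_table.csv',
    '../atlas_resource/mechanism_axis_clusters.csv',
    '../results/multimodal_risk_score.csv',
    '../results/cross_disease_gene_influence_score.csv',
    '../atlas_resource/gene_disease_celltype_annotation.csv',
    '../atlas_resource/multilayer_network_edges.csv',
]

def missing_atlas_files(base_dir='.', files=ATLAS_FILES):
    """Return the paths in `files` that are not regular files under base_dir (hypothetical helper)."""
    base = Path(base_dir)
    return [f for f in files if not (base / f).is_file()]

missing = missing_atlas_files()
if missing:
    print("Missing files (check the working directory):")
    for f in missing:
        print(f"  {f}")
else:
    print("All atlas files found.")
```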

## Example 1: Query Gene → Show Diseases + Cell Types

**Query Type**: Gene-centric exploration

**Input**: Gene symbol (e.g., ACE, NOS3, UMOD)

**Output**: Associated diseases, cell types, mechanisms, and interventions

```python
# Load the master atlas table (paths are relative to the notebook's directory)
master_df = pd.read_csv('../atlas_resource/hypertension_atlas_master_table.csv')

def query_gene_atlas(gene_symbol):
    """
    Query comprehensive atlas information for a specific gene.

    Parameters
    ----------
    gene_symbol : str
        HGNC gene symbol (e.g., 'ACE', 'NOS3')

    Returns
    -------
    pandas.DataFrame or None
        All master-table rows for the gene, or None if it is not in the atlas.
    """
    gene_data = master_df[master_df['gene'] == gene_symbol]

    if gene_data.empty:
        print(f"⚠️ Gene '{gene_symbol}' not found in atlas")
        return None

    print(f"\n{'='*60}")
    print(f"🧬 ATLAS QUERY: {gene_symbol}")
    print(f"{'='*60}\n")

    # Basic information
    diseases = gene_data['disease'].unique()
    cell_types = gene_data['cell_type'].unique()
    tissues = gene_data['Tissue'].unique()
    mechanism = gene_data['mechanism_axis'].iloc[0]  # taken from the first matching row

    # map(str, ...) keeps join() from failing on missing (NaN) values
    print(f"📊 DISEASES ({len(diseases)}):\n  " + "\n  ".join(map(str, diseases)))
    print(f"\n🔬 CELL TYPES ({len(cell_types)}):\n  " + "\n  ".join(map(str, cell_types)))
    print(f"\n🏥 TISSUES ({len(tissues)}):\n  " + "\n  ".join(map(str, tissues)))
    print(f"\n🔄 MECHANISM AXIS:\n  {mechanism}")

    # Mendelian randomization (MR) and colocalization evidence for each matching row
    print("\n📈 CAUSAL EVIDENCE:")
    for _, row in gene_data.iterrows():
        print(f"  • {row['disease']}:")
        print(f"      MR Effect: {row['mr_beta']:.3f}")
        print(f"      Coloc PPH4: {row['pph4']:.2f}")
        print(f"      Priority Score: {row['priority_score']:.1f}")

    # Clinical intervention (taken from the first matching row)
    intervention = gene_data['clinical_intervention'].iloc[0]
    print(f"\n💊 CLINICAL INTERVENTION:\n  {intervention}")

    print(f"\n{'='*60}\n")

    return gene_data

# Example: Query ACE gene
ace_data = query_gene_atlas('ACE')
```

```python
# Query another gene: NOS3 (Nitric Oxide Synthase 3)
nos3_data = query_gene_atlas('NOS3')
```
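
The causal-evidence printout lists every row for one gene. To shortlist genes across the whole table instead, colocalization analyses commonly keep loci with PPH4 ≥ 0.8 (a widely used but not universal cutoff; the atlas itself may use a different one). A minimal sketch of that filter, assuming the master table's column names (`gene`, `disease`, `mr_beta`, `pph4`, `priority_score`); the function name, the default threshold, and the toy rows are illustrative assumptions, not atlas values.

```python
import pandas as pd

def strong_coloc_hits(df, min_pph4=0.8):
    """Keep rows with PPH4 >= min_pph4, ranked by priority score (illustrative helper)."""
    hits = df[df['pph4'] >= min_pph4]
    cols = ['gene', 'disease', 'mr_beta', 'pph4', 'priority_score']
    return hits.sort_values('priority_score', ascending=False)[cols].reset_index(drop=True)

# Toy rows with placeholder gene names; all values are made up for illustration
toy = pd.DataFrame({
    'gene': ['GENE_A', 'GENE_A', 'GENE_B', 'GENE_C'],
    'disease': ['CAD', 'CKD', 'Stroke', 'T2D'],
    'mr_beta': [0.12, -0.05, 0.30, 0.08],
    'pph4': [0.91, 0.42, 0.85, 0.79],
    'priority_score': [7.5, 3.1, 8.2, 6.0],
})

print(strong_coloc_hits(toy))
```

On the real data the call would be `strong_coloc_hits(master_df)`.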

## Example 2: Query Disease → Show Mechanism Axis

**Query Type**: Disease-centric exploration

**Input**: Disease name (e.g., CAD, CKD, Stroke)

**Output**: All causal genes, cell types, and mechanism axes

```python
import re

# Load mechanism clusters
mech_df = pd.read_csv('../atlas_resource/mechanism_axis_clusters.csv')

def query_disease_mechanism(disease_name):
    """
    Query the mechanism axis and associated genes for a disease.

    Searches the mechanism-cluster descriptions first and falls back to
    the master table if no description mentions the disease.

    Parameters
    ----------
    disease_name : str
        Disease abbreviation (e.g., 'CAD', 'CKD', 'Stroke')

    Returns
    -------
    pandas.DataFrame or None
        Matching mechanism-table or master-table rows, or None if not found.
    """
    # Match the disease name as a whole word, so that e.g. 'AD' does not
    # also match descriptions that mention 'CAD'
    pattern = rf"\b{re.escape(disease_name)}\b"
    disease_mech = mech_df[mech_df['Axis_Description'].str.contains(
        pattern, case=False, na=False, regex=True
    )]

    if disease_mech.empty:
        # Fall back to exact matching in the master table
        disease_genes = master_df[master_df['disease'] == disease_name]
        if disease_genes.empty:
            print(f"⚠️ Disease '{disease_name}' not found")
            return None

        print(f"\n{'='*60}")
        print(f"🏥 DISEASE: {disease_name}")
        print(f"{'='*60}\n")

        mechanism_axes = disease_genes['mechanism_axis'].unique()
        print(f"🔄 MECHANISM AXES ({len(mechanism_axes)}):")
        for axis in mechanism_axes:
            print(f"  • {axis}")

        print(f"\n🧬 CAUSAL GENES ({len(disease_genes)} total):")
        for _, row in disease_genes.iterrows():
            print(f"  • {row['gene']}")
            print(f"      Cell Type: {row['cell_type']}")
            print(f"      Tissue: {row['Tissue']}")
            print(f"      MR Beta: {row['mr_beta']:.3f}")
            print(f"      Priority: {row['priority_score']:.1f}")

        print(f"\n{'='*60}\n")
        return disease_genes

    # Display from the mechanism table
    print(f"\n{'='*60}")
    print(f"🏥 DISEASE: {disease_name}")
    print(f"{'='*60}\n")

    # If several axes match, only the first one is summarized here
    axis_name = disease_mech['mechanism_axis'].iloc[0]
    axis_desc = disease_mech['Axis_Description'].iloc[0]
    bio_mech = disease_mech['Biological_Mechanism'].iloc[0]

    print(f"🔄 MECHANISM AXIS:\n  {axis_name}")
    print(f"\n📋 DESCRIPTION:\n  {axis_desc}")
    print(f"\n🔬 BIOLOGICAL MECHANISM:\n  {bio_mech}")

    genes = disease_mech['gene'].unique()
    cell_types = disease_mech['cell_type'].unique()

    print(f"\n🧬 KEY GENES ({len(genes)}):")
    for gene in genes:
        gene_row = disease_mech[disease_mech['gene'] == gene].iloc[0]
        print(f"  • {gene} ({gene_row['cell_type']}) - Score: {gene_row['mechanism_score']:.2f}")

    print(f"\n🔬 CELL TYPES ({len(cell_types)}):")
    for ct in cell_types:
        print(f"  • {ct}")

    print(f"\n{'='*60}\n")

    return disease_mech

# Example: Query Chronic Kidney Disease (CKD)
ckd_data = query_disease_mechanism('CKD')
```

```python
# Query another disease: CAD (Coronary Artery Disease)
cad_data = query_disease_mechanism('CAD')
```
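
For an overview of all diseases at once, rather than one disease per call, a cross-tabulation of the master table shows how many gene records each disease contributes to each mechanism axis. This sketch assumes only the `disease` and `mechanism_axis` columns used above; the toy rows and axis labels are placeholders, not atlas content.

```python
import pandas as pd

# Toy long-format table: one row per gene-disease record (placeholder values)
toy = pd.DataFrame({
    'gene': ['GENE_A', 'GENE_B', 'GENE_C', 'GENE_D', 'GENE_E'],
    'disease': ['CAD', 'CAD', 'CKD', 'CKD', 'CKD'],
    'mechanism_axis': ['Axis_1', 'Axis_2', 'Axis_1', 'Axis_1', 'Axis_3'],
})

# Rows = diseases, columns = mechanism axes, cells = number of gene records
axis_counts = pd.crosstab(toy['disease'], toy['mechanism_axis'])
print(axis_counts)
```

On the atlas itself this would be `pd.crosstab(master_df['disease'], master_df['mechanism_axis'])`.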

## Example 3: Query Patient → Show MMRS Risk

**Query Type**: Patient risk assessment

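**Input**: Patient identifier (`Sample_ID`); the function looks up that patient's precomputed per-disease risk probabilities in `multimodal_risk_score.csv`

**Output**: Per-disease risk categories and a composite Multi-Modal Risk Score (MMRS), computed here as the unweighted mean of the six disease probabilities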

```python
# Load precomputed per-disease risk probabilities
risk_df = pd.read_csv('../results/multimodal_risk_score.csv')

def query_patient_risk(patient_id):
    """
    Query the multi-modal risk profile for a specific patient.

    Parameters
    ----------
    patient_id : str or int
        Patient identifier, matched against the 'Sample_ID' column

    Returns
    -------
    pandas.Series or None
        The patient's row from the risk table, or None if not found.
    """
    patient_data = risk_df[risk_df['Sample_ID'] == patient_id]

    if patient_data.empty:
        print(f"⚠️ Patient '{patient_id}' not found")
        return None

    patient = patient_data.iloc[0]

    print(f"\n{'='*60}")
    print(f"👤 PATIENT RISK PROFILE: {patient_id}")
    print(f"{'='*60}\n")

    diseases = ['CAD', 'Stroke', 'CKD', 'T2D', 'Depression', 'AD']

    # Risk stratification thresholds applied to a probability
    def get_risk_level(prob):
        if prob < 0.15:
            return '🟢 LOW'
        elif prob < 0.30:
            return '🟡 MODERATE'
        elif prob < 0.45:
            return '🟠 HIGH'
        else:
            return '🔴 VERY HIGH'

    print("📊 INDIVIDUAL DISEASE RISKS:")
    risks = []
    for disease in diseases:
        prob = patient[disease]
        level = get_risk_level(prob)
        risks.append(prob)
        print(f"  {disease:12s}: {prob:.3f}  {level}")

    # MMRS: unweighted mean of the six disease probabilities
    mmrs = np.mean(risks)
    mmrs_level = get_risk_level(mmrs)

    print("\n🎯 MULTI-MODAL RISK SCORE (MMRS):")
    print(f"  Composite Score: {mmrs:.3f}")
    print(f"  Risk Category:   {mmrs_level}")

    # Find the highest-risk disease
    max_idx = int(np.argmax(risks))
    print(f"\n⚠️  HIGHEST RISK: {diseases[max_idx]} ({risks[max_idx]:.3f})")

    print(f"\n{'='*60}\n")

    return patient

# Query the first patient in the dataset
first_patient = risk_df['Sample_ID'].iloc[0]
patient_profile = query_patient_risk(first_patient)
```
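
`query_patient_risk` scores one patient at a time. The same logic (MMRS as the unweighted mean of the six disease probabilities, categorized with the same 0.15 / 0.30 / 0.45 cutoffs as `get_risk_level`) can be applied to every row at once with pandas. This is a sketch: the function name `cohort_mmrs` and the toy probabilities are illustrative, and the column names follow `multimodal_risk_score.csv` as used above.

```python
import numpy as np
import pandas as pd

DISEASES = ['CAD', 'Stroke', 'CKD', 'T2D', 'Depression', 'AD']
# Same cutoffs as get_risk_level: [.., 0.15) LOW, [0.15, 0.30) MODERATE,
# [0.30, 0.45) HIGH, [0.45, ..) VERY HIGH
BINS = [-np.inf, 0.15, 0.30, 0.45, np.inf]
LABELS = ['LOW', 'MODERATE', 'HIGH', 'VERY HIGH']

def cohort_mmrs(df, diseases=DISEASES):
    """Add MMRS, its risk category, and the highest-risk disease to every row (illustrative)."""
    out = df.copy()
    out['MMRS'] = out[diseases].mean(axis=1)
    # right=False makes each bin closed on the left, matching the `<` tests
    out['MMRS_category'] = pd.cut(out['MMRS'], bins=BINS, labels=LABELS, right=False)
    out['highest_risk_disease'] = out[diseases].idxmax(axis=1)
    return out

# Toy probabilities for two hypothetical samples
toy = pd.DataFrame({
    'Sample_ID': ['S1', 'S2'],
    'CAD': [0.10, 0.65], 'Stroke': [0.05, 0.40], 'CKD': [0.20, 0.60],
    'T2D': [0.10, 0.45], 'Depression': [0.05, 0.35], 'AD': [0.02, 0.50],
})

scored = cohort_mmrs(toy)
print(scored[['Sample_ID', 'MMRS', 'MMRS_category', 'highest_risk_disease']])
```

With real data, `cohort_mmrs(risk_df)` yields a table that can be sorted by `MMRS` to find the highest-risk samples.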

```python
# Visualize risk distribution
def plot_risk_distribution():
    """Plot the risk distribution for each disease and print summary statistics."""
    diseases = ['CAD', 'Stroke', 'CKD', 'T2D', 'Depression', 'AD']

    # One histogram panel per disease (2 x 3 grid)
    fig, axes = plt.subplots(2, 3, figsize=(15, 10))
    axes = axes.flatten()

    for ax, disease in zip(axes, diseases):
        risks = risk_df[disease]

        ax.hist(risks, bins=30, color='steelblue', edgecolor='black', alpha=0.7)
        ax.axvline(risks.mean(), color='red', linestyle='--', linewidth=2, label=f'Mean: {risks.mean():.3f}')
        ax.set_xlabel('Risk Probability')
        ax.set_ylabel('Count')
        ax.set_title(f'{disease} Risk Distribution')
        ax.legend()

    plt.tight_layout()
    plt.show()

    # Summary statistics
    print("\n📊 RISK DISTRIBUTION SUMMARY:")
    for disease in diseases:
        risks = risk_df[disease]
        print(f"{disease:12s}: Mean={risks.mean():.3f}, Std={risks.std():.3f}, Range=[{risks.min():.3f}, {risks.max():.3f}]")

plot_risk_distribution()
```

## Example 4: Cross-Disease Gene Analysis

Find genes involved in three or more diseases (pleiotropic effects), ranked by total influence score.

```python
# Load cross-disease influence table
cross_df = pd.read_csv('../results/cross_disease_gene_influence_score.csv')

# Find pleiotropic genes (involved in 3+ diseases), strongest influence first
pleiotropic = cross_df[cross_df['n_diseases_involved'] >= 3].sort_values(
    'total_influence_score', ascending=False
)

print(f"\n{'='*60}")
print("🔗 PLEIOTROPIC GENES (3+ Diseases)")
print(f"{'='*60}\n")

# Show the top 10 genes by total influence score
for _, row in pleiotropic.head(10).iterrows():
    print(f"🧬 {row['gene']}")
    print(f"   Diseases Involved: {row['n_diseases_involved']}")
    print(f"   Mechanism Axis: {row['mechanism_axis']}")
    print(f"   Top Cell Type: {row['top_cell_type']}")
    print(f"   Influence Score: {row['total_influence_score']:.2f}")
    print(f"   Disease Contributions:\n      {row['Disease_Contributions']}")
    print()
```
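
The `n_diseases_involved` column is precomputed in `cross_disease_gene_influence_score.csv`. To get a comparable count from a long gene-disease table such as the master table, one option is to count distinct diseases per gene. This is an illustrative sketch, not necessarily how the atlas defines the column (the atlas may, for example, apply evidence thresholds first); the function name and toy rows are placeholders.

```python
import pandas as pd

def count_diseases_per_gene(df, min_diseases=3):
    """Count distinct diseases per gene and keep genes with at least min_diseases."""
    counts = df.groupby('gene')['disease'].nunique().rename('n_diseases')
    return counts[counts >= min_diseases].sort_values(ascending=False)

toy = pd.DataFrame({
    'gene': ['GENE_A'] * 4 + ['GENE_B'] * 2 + ['GENE_C'] * 3,
    'disease': ['CAD', 'CKD', 'Stroke', 'CAD',  # repeated CAD row is counted once
                'T2D', 'CKD',
                'AD', 'Depression', 'Stroke'],
})

print(count_diseases_per_gene(toy))
```

Using `nunique()` rather than `size()` keeps repeated gene-disease rows (for instance, one per cell type) from inflating the count.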

## Example 5: Cell Type Enrichment Analysis

Explore which cell types carry the most disease-relevant gene annotations.

```python
# Load cell type annotation
cell_df = pd.read_csv('../atlas_resource/gene_disease_celltype_annotation.csv')

# Keep disease-relevant annotations and count them per cell type
# (value_counts() counts annotation rows, not distinct genes)
relevant_cells = cell_df[cell_df['is_disease_relevant'] == True]
cell_counts = relevant_cells['cell_type'].value_counts()

print(f"\n{'='*60}")
print("🔬 DISEASE-RELEVANT CELL TYPES")
print(f"{'='*60}\n")

print("Top Cell Types by Disease Relevance:")
for cell_type, count in cell_counts.head(10).items():
    print(f"  {cell_type:20s}: {count:3d} gene annotations")

# Visualize
plt.figure(figsize=(12, 6))
cell_counts.head(10).plot(kind='bar', color='teal', edgecolor='black')
plt.title('Top 10 Disease-Relevant Cell Types')
plt.xlabel('Cell Type')
plt.ylabel('Number of Gene Annotations')
plt.xticks(rotation=45, ha='right')
plt.tight_layout()
plt.show()
```
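
`value_counts()` counts annotation rows, so a gene annotated for several diseases in the same cell type is counted more than once. If the annotation file has a `gene` column and one row per gene, disease, and cell type (an assumption here; check `atlas_data_dictionary.csv`), counting distinct genes per cell type gives a per-gene tally. A sketch on placeholder data comparing the two counts:

```python
import pandas as pd

# Placeholder annotations; column names other than 'gene' follow the cell above
toy = pd.DataFrame({
    'gene': ['GENE_A', 'GENE_A', 'GENE_B', 'GENE_C', 'GENE_C'],
    'disease': ['CAD', 'CKD', 'CAD', 'Stroke', 'Stroke'],
    'cell_type': ['Endothelial', 'Endothelial', 'Endothelial', 'Podocyte', 'Podocyte'],
    'is_disease_relevant': [True, True, True, True, False],
})

relevant = toy[toy['is_disease_relevant']]
rows_per_cell = relevant['cell_type'].value_counts()      # annotation rows
genes_per_cell = (relevant.groupby('cell_type')['gene']   # distinct genes
                  .nunique()
                  .sort_values(ascending=False))
print(pd.DataFrame({'annotation_rows': rows_per_cell, 'unique_genes': genes_per_cell}))
```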

## Example 6: Network Visualization

Identify hub genes in the gene-disease layer of the multilayer network.

```python
# Load network edges
network_df = pd.read_csv('../atlas_resource/multilayer_network_edges.csv')

# Find hub genes: genes with the most edges in the gene-disease layer
gene_disease_edges = network_df[network_df['Layer'] == 'Gene_Disease']
hub_genes = gene_disease_edges.groupby('Source').size().sort_values(ascending=False)

print(f"\n{'='*60}")
print("🕸️  NETWORK HUB GENES (Top Connections)")
print(f"{'='*60}\n")

print("Top 10 Hub Genes:")
for gene, connections in hub_genes.head(10).items():
    # Cast targets to str so join() does not fail on non-string values
    connected_diseases = gene_disease_edges.loc[
        gene_disease_edges['Source'] == gene, 'Target'
    ].astype(str).tolist()
    print(f"  {gene:10s}: {connections} connections")
    print(f"             → {', '.join(connected_diseases)}")
    print()
```
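
`groupby('Source').size()` counts edges, so any duplicated gene-disease edges in the file would inflate a gene's degree. A sketch that counts distinct targets per gene and collects them in one pass, assuming the same `Layer`, `Source`, and `Target` columns; the function name `rank_hub_genes`, the second layer name `Gene_CellType`, and the toy edges are hypothetical.

```python
import pandas as pd

def rank_hub_genes(edges, layer='Gene_Disease', top_n=10):
    """Rank source genes by number of distinct targets within one network layer (illustrative)."""
    layer_edges = edges[edges['Layer'] == layer]
    summary = layer_edges.groupby('Source')['Target'].agg(
        n_targets='nunique',
        targets=lambda s: ', '.join(sorted(s.astype(str).unique())),
    )
    return summary.sort_values('n_targets', ascending=False).head(top_n)

# Toy edge list; 'Gene_CellType' is a hypothetical second layer name
toy_edges = pd.DataFrame({
    'Layer': ['Gene_Disease'] * 5 + ['Gene_CellType'],
    'Source': ['GENE_A', 'GENE_A', 'GENE_A', 'GENE_A', 'GENE_B', 'GENE_A'],
    'Target': ['CAD', 'CKD', 'CAD', 'Stroke', 'T2D', 'Endothelial'],
})

print(rank_hub_genes(toy_edges))
```

Here GENE_A has four gene-disease edges but only three distinct diseases, so its degree is reported as 3.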

## Summary

This notebook demonstrated:

1. ✅ **Gene Query**: ACE, NOS3 → Diseases, cell types, interventions
2. ✅ **Disease Query**: CKD, CAD → Mechanism axes and causal genes
3. ✅ **Patient Risk**: Per-disease risk probabilities → MMRS composite risk score
4. ✅ **Cross-Disease**: Pleiotropic genes affecting multiple conditions
5. ✅ **Cell Types**: Disease-relevant cell populations
6. ✅ **Network**: Hub genes and connectivity patterns

**Next Steps:**
- Explore `ATLAS_USAGE_GUIDE.md` for more query patterns
- Check `atlas_data_dictionary.csv` for column definitions
- Use the REST API for programmatic access

**Citation:** See main repository README.md