# Causal Search Demo

This notebook demonstrates how to use the new Causal Search method in GraphRAG. Causal Search performs causal analysis on knowledge graphs through a two-stage process:

1. **Stage 1**: Extract extended graph information (k + s nodes) and generate a causal analysis report
2. **Stage 2**: Use the causal report to generate the final response to the user query
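
The two stages above can be sketched roughly as follows. This is a minimal illustration, not GraphRAG's implementation: `select_extended_nodes`, `two_stage_answer`, and `stub_llm` are hypothetical names, and the sketch assumes that k is the usual top-k entity count and s is the number of extra nodes taken from the same relevance ranking.

```python
from typing import Callable, List


def select_extended_nodes(ranked_nodes: List[str], k: int, s: int) -> List[str]:
    """Return the first k + s nodes of a relevance-ranked list (assumed reading of "k + s")."""
    if k < 0 or s < 0:
        raise ValueError("k and s must be non-negative")
    return ranked_nodes[: k + s]


def two_stage_answer(
    query: str,
    ranked_nodes: List[str],
    k: int,
    s: int,
    llm: Callable[[str], str],
) -> str:
    """Hypothetical two-stage flow: build a causal report, then answer from it."""
    nodes = select_extended_nodes(ranked_nodes, k, s)
    # Stage 1: ask the model for a causal analysis report over the extended node set.
    report = llm("Write a causal analysis report for: " + ", ".join(nodes))
    # Stage 2: answer the user query using the report as context.
    return llm(f"Causal report:\n{report}\n\nAnswer this query: {query}")


def stub_llm(prompt: str) -> str:
    """Stand-in for a real model call; it only reports the prompt length."""
    return f"<model output for a {len(prompt)}-character prompt>"


answer = two_stage_answer("What drives churn?", ["a", "b", "c", "d"], k=2, s=1, llm=stub_llm)
print(answer)
```

Passing the model as a plain callable keeps the sketch runnable without API keys; the real engine is created through `get_causal_search_engine`, as shown later in this notebook.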

## Key Features

- Extended node extraction beyond local search limits
- Two-stage processing for comprehensive causal analysis
- Automatic output saving to data folders
- Configurable parameters for retrieval breadth and context proportions
- Integration with existing GraphRAG pipeline

## Prerequisites

Before running this notebook, ensure you have:

1. Run the GraphRAG indexing pipeline to generate entities, relationships, and community reports
2. Set up your configuration in `settings.yaml` with causal search parameters
3. Configured your language models and API keys
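
A quick way to confirm the first prerequisite is to look for the index tables on disk. The sketch below assumes the pipeline wrote each table as `<name>.parquet` into an `output` folder under the project root; both the folder name and the helper `missing_tables` are assumptions for illustration, so adjust them to your storage configuration.

```python
from pathlib import Path
from typing import List, Sequence

# Table names match the ones loaded later in this notebook.
REQUIRED_TABLES = ("entities", "relationships", "text_units", "community_reports")


def missing_tables(output_dir: Path, tables: Sequence[str] = REQUIRED_TABLES) -> List[str]:
    """Return the required table names that have no matching .parquet file."""
    return [name for name in tables if not (output_dir / f"{name}.parquet").is_file()]


# Assumed location: <project root>/output
missing = missing_tables(Path("./ragtest") / "output")
if missing:
    print(f"Missing index tables: {missing}")
else:
    print("All required index tables found")
```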

In [None]:
# Import required libraries
import asyncio
import json
import pandas as pd
from pathlib import Path
from typing import Any, Dict, List

# GraphRAG imports
from graphrag.config.load_config import load_config
from graphrag.query.factory import get_causal_search_engine
from graphrag.utils.api import create_storage_from_config
from graphrag.utils.storage import load_table_from_storage
from graphrag.query.structured_search.causal_search.search import CausalSearchError

## Configuration Setup

First, let's load the GraphRAG configuration and set up the environment.

In [None]:
# Configuration setup
ROOT_DIR = Path("./ragtest")  # Adjust this path to your project root
CONFIG_FILE = None  # Use default settings.yaml

# Load configuration
try:
    config = load_config(ROOT_DIR, CONFIG_FILE)
    print("✅ Configuration loaded successfully")
    print(f"📁 Root directory: {ROOT_DIR}")
    print(f"🔧 Causal search s_parameter: {config.causal_search.s_parameter}")
    print(f"🔧 Causal search top_k_mapped_entities: {config.causal_search.top_k_mapped_entities}")
    print(f"🔧 Causal search max_context_tokens: {config.causal_search.max_context_tokens}")
except Exception as e:
    print(f"❌ Failed to load configuration: {e}")
    raise

## Data Loading

Load the required data from your GraphRAG pipeline outputs.

In [None]:
# Load data from GraphRAG pipeline outputs
async def load_graphrag_data():
    """Load entities, relationships, text units, and community reports."""
    try:
        # Create storage object
        storage = create_storage_from_config(config.output)
        
        # Load required data
        entities = await load_table_from_storage("entities", storage)
        relationships = await load_table_from_storage("relationships", storage)
        text_units = await load_table_from_storage("text_units", storage)
        community_reports = await load_table_from_storage("community_reports", storage)
        
        # Try to load covariates if they exist
        covariates = {}
        try:
            covariates = await load_table_from_storage("covariates", storage)
        except Exception:  # Covariates are optional; avoid a bare except that would also catch KeyboardInterrupt
            print("ℹ️  No covariates found, using empty dict")
        
        print(f"✅ Loaded {len(entities)} entities")
        print(f"✅ Loaded {len(relationships)} relationships")
        print(f"✅ Loaded {len(text_units)} text units")
        print(f"✅ Loaded {len(community_reports)} community reports")
        
        return entities, relationships, text_units, community_reports, covariates
        
    except Exception as e:
        print(f"❌ Failed to load data: {e}")
        raise

# Load the data
entities, relationships, text_units, community_reports, covariates = await load_graphrag_data()

## Create Causal Search Engine

Now let's create the causal search engine using the loaded data.

In [None]:
# Create causal search engine
try:
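    # For demo purposes, we'll create a mock embedding store.
    # In a real scenario, you would use your actual embedding store.
    # Note: if BaseVectorStore declares abstract methods beyond filter_by_id in your
    # GraphRAG version, the mock must implement them too or instantiation will fail.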
    from graphrag.vector_stores.base import BaseVectorStore
    
    class MockEmbeddingStore(BaseVectorStore):
        """Mock embedding store for demonstration purposes."""
        def __init__(self):
            self.entities = {}
        
        def filter_by_id(self, entity_keys):
            pass
    
    # Create mock embedding store
    mock_embedding_store = MockEmbeddingStore()
    
    # Create causal search engine
    causal_search = get_causal_search_engine(
        config=config,
        reports=community_reports,
        text_units=text_units,
        entities=entities,
        relationships=relationships,
        covariates=covariates,
        response_type="Multiple Paragraphs",
        description_embedding_store=mock_embedding_store
    )
    
    print("✅ Causal search engine created successfully")
    print(f"🔧 s_parameter: {causal_search.s_parameter}")
    print(f"🔧 max_context_tokens: {causal_search.max_context_tokens}")
    
except Exception as e:
    print(f"❌ Failed to create causal search engine: {e}")
    raise

## Demo Queries

Let's demonstrate the causal search with some example queries.

In [None]:
# Example queries for causal analysis
demo_queries = [
    "What are the main causal relationships in this dataset?",
    "How do different factors influence the outcomes?",
    "What causes the observed patterns in the data?",
    "Analyze the causal factors driving the main themes."
]

print("📝 Demo Queries:")
for i, query in enumerate(demo_queries, 1):
    print(f"{i}. {query}")

# Select a query to run
selected_query = demo_queries[0]  # You can change this index
print(f"\n🎯 Selected query: {selected_query}")

## Run Causal Search

Now let's run the causal search with our selected query.

In [None]:
# Run causal search
async def run_causal_search_demo(query: str):
    """Run causal search and display results."""
    try:
        print(f"🚀 Starting causal search for: '{query}'")
        print("⏳ This may take a few minutes...")
        
        # Run the search
        result = await causal_search.search(
            query=query,
            top_k_mapped_entities=15,  # Override default if needed
            top_k_relationships=15
        )
        
        print("✅ Causal search completed successfully!")
        print(f"⏱️  Completion time: {result.completion_time:.2f} seconds")
        print(f"🤖 LLM calls: {result.llm_calls}")
        print(f"📝 Prompt tokens: {result.prompt_tokens}")
        print(f"📤 Output tokens: {result.output_tokens}")
        
        return result
        
    except CausalSearchError as e:
        print(f"❌ Causal search failed: {e}")
        return None
    except Exception as e:
        print(f"❌ Unexpected error: {e}")
        return None

# Run the search
result = await run_causal_search_demo(selected_query)

## Display Results

Let's examine the results from our causal search.

In [None]:
# Display search results
if result:
    print("📊 Causal Search Results")
    print("=" * 50)
    
    # Display the response
    print("\n🎯 Final Response:")
    print("-" * 30)
    print(result.response)
    
    # Display context information
    print("\n🔍 Context Information:")
    print("-" * 30)
    if result.context_data:
        for key, value in result.context_data.items():
            if isinstance(value, pd.DataFrame):
                print(f"📋 {key}: {len(value)} records")
                if len(value) > 0:
                    print(f"   Columns: {list(value.columns)}")
                    print("   Sample data:")
                    print(value.head(2).to_string())
            else:
                print(f"📋 {key}: {type(value)}")
    else:
        print("No context data available")
        
    # Display context text
    print("\n📄 Context Text:")
    print("-" * 30)
    if result.context_text:
        if isinstance(result.context_text, str):
            print(f"Context length: {len(result.context_text)} characters")
            print("Preview:")
            print((result.context_text[:500] + "...") if len(result.context_text) > 500 else result.context_text)
        else:
            print(f"Context type: {type(result.context_text)}")
    else:
        print("No context text available")
else:
    print("❌ No results to display")

## Check Generated Outputs

The causal search automatically saves its outputs under `data/outputs` and its saved prompts under `data/prompts`. Let's check what was generated.

In [None]:
# Check generated output files
def check_generated_outputs():
    """Check what output files were generated by the causal search."""
    print("📁 Checking Generated Outputs")
    print("=" * 40)
    
    # Check outputs directory
    outputs_dir = Path("data/outputs")
    if outputs_dir.exists():
        print(f"✅ Outputs directory: {outputs_dir}")
        
        # Check for causal search outputs
        network_data_file = outputs_dir / "causal_search_network_data.json"
        causal_report_file = outputs_dir / "causal_search_report.md"
        
        if network_data_file.exists():
            print(f"✅ Network data file: {network_data_file}")
            print(f"   Size: {network_data_file.stat().st_size} bytes")
        else:
            print(f"❌ Network data file not found: {network_data_file}")
            
        if causal_report_file.exists():
            print(f"✅ Causal report file: {causal_report_file}")
            print(f"   Size: {causal_report_file.stat().st_size} bytes")
        else:
            print(f"❌ Causal report file not found: {causal_report_file}")
    else:
        print(f"❌ Outputs directory not found: {outputs_dir}")
    
    # Check prompts directory
    prompts_dir = Path("data/prompts")
    if prompts_dir.exists():
        print(f"\n✅ Prompts directory: {prompts_dir}")
        
        # Check for saved prompts
        discovery_prompt_file = prompts_dir / "causal_discovery_prompt.txt"
        summary_prompt_file = prompts_dir / "causal_summary_prompt.txt"
        
        if discovery_prompt_file.exists():
            print(f"✅ Discovery prompt file: {discovery_prompt_file}")
        else:
            print(f"❌ Discovery prompt file not found: {discovery_prompt_file}")
            
        if summary_prompt_file.exists():
            print(f"✅ Summary prompt file: {summary_prompt_file}")
        else:
            print(f"❌ Summary prompt file not found: {summary_prompt_file}")
    else:
        print(f"\n❌ Prompts directory not found: {prompts_dir}")

# Check outputs
check_generated_outputs()
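
Once the files exist, it can help to look inside the network data file before reading it in full. The schema of `causal_search_network_data.json` is not documented in this notebook, so the sketch below makes no assumption about it and only summarizes the top-level structure; `summarize_json_file` and `describe` are hypothetical helpers.

```python
import json
from pathlib import Path
from typing import Any, Dict


def describe(value: Any) -> str:
    """Describe a JSON value by its type, plus its length for containers."""
    if isinstance(value, (list, dict)):
        return f"{type(value).__name__}, length {len(value)}"
    return type(value).__name__


def summarize_json_file(path: Path) -> Dict[str, str]:
    """Map each top-level key of a JSON file to a short description of its value."""
    data = json.loads(path.read_text(encoding="utf-8"))
    if not isinstance(data, dict):
        # Non-object roots (for example a list) get a single synthetic entry.
        return {"<root>": describe(data)}
    return {key: describe(value) for key, value in data.items()}


network_data_file = Path("data/outputs") / "causal_search_network_data.json"
if network_data_file.exists():
    for key, summary in summarize_json_file(network_data_file).items():
        print(f"{key}: {summary}")
```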

## Summary

This notebook has demonstrated:

- ✅ **Causal Search Setup**: Configuration loading and data preparation
- ✅ **Engine Creation**: Building the causal search engine with proper parameters
- ✅ **Query Execution**: Running causal search with example queries
- ✅ **Results Analysis**: Examining search results and generated outputs
- ✅ **Output Files**: Checking automatically generated network data and causal reports
- ✅ **Configuration Tuning**: Inspecting configured parameters and overriding them per query

## Next Steps

1. **Customize Configuration**: Adjust parameters in `settings.yaml` based on your needs
2. **Test Different Queries**: Try various query types to understand causal analysis capabilities
3. **Monitor Performance**: Track token usage and processing times for optimization
4. **Review Outputs**: Analyze generated reports to improve causal analysis quality
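
For the performance monitoring step, one option is to accumulate the metrics that each search result exposes. The sketch below is a hypothetical helper (`RunStats` is not part of GraphRAG) that relies only on the `completion_time`, `llm_calls`, `prompt_tokens`, and `output_tokens` attributes printed earlier in this notebook.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class RunStats:
    """Running totals over several search results (hypothetical helper)."""

    runs: int = 0
    completion_time: float = 0.0
    llm_calls: int = 0
    prompt_tokens: int = 0
    output_tokens: int = 0

    def add(self, result: Any) -> None:
        """Accumulate the metric attributes of one search result."""
        self.runs += 1
        self.completion_time += result.completion_time
        self.llm_calls += result.llm_calls
        self.prompt_tokens += result.prompt_tokens
        self.output_tokens += result.output_tokens

    @property
    def mean_completion_time(self) -> float:
        """Average seconds per run, or 0.0 before any run is recorded."""
        return self.completion_time / self.runs if self.runs else 0.0

    @property
    def total_tokens(self) -> int:
        """Prompt plus output tokens across all recorded runs."""
        return self.prompt_tokens + self.output_tokens
```

A typical use would be to create one `stats = RunStats()` and call `stats.add(result)` after each successful query, skipping runs where `run_causal_search_demo` returned `None`.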

## CLI Usage

You can also use causal search from the command line:

```bash
graphrag query \
  --root ./ragtest \
  --method causal \
  --query "What are the causal relationships in this dataset?"
```

For more information, see the comprehensive documentation in `docs/causal_search_usage.md`.