# LangChain/LangGraph Framework Baseline for Clinical Trial NLP

## Overview

This notebook demonstrates how to use LangChain and LangGraph to build a stateful, graph-based agent system for clinical trial natural language inference (NLI). LangChain is one of the most widely used frameworks for LLM applications, and LangGraph adds graph-based orchestration with shared state on top of it.

### Why LangChain/LangGraph?
- **Mature ecosystem**: One of the most established LLM application frameworks
- **Rich integrations**: Extensive tool and service integrations
- **Stateful workflows**: LangGraph enables complex, stateful agent interactions
- **Advanced patterns**: Support for complex reasoning and decision patterns
- **Community support**: Large community and extensive documentation
- **Production ready**: Battle-tested in numerous real-world applications

### Agent Architecture with LangGraph
We'll implement a graph-based workflow of five nodes that share a single state object:
1. **Clinical Data Extractor**: Processes and structures trial data
2. **Medical Analysis Node**: Expert medical reasoning
3. **Statistical Analysis Node**: Numerical and statistical validation
4. **Logic Verification Node**: Logical consistency checking
5. **Decision Synthesis Node**: Final entailment classification
6. **State Management**: Persistent state shared by every node in the workflow (a cross-cutting concern rather than a separate node)

## Setup and Installation

First, let's set up our environment and import the necessary libraries:

In [None]:
# Load environment variables
from dotenv import load_dotenv
import os

load_dotenv()
print("✅ Environment loaded")

In [None]:
# Import required libraries
import json
import pandas as pd
from tqdm import tqdm
from typing import Dict, List, Any, Optional, TypedDict, Annotated
import warnings
warnings.filterwarnings('ignore')

# LangChain imports
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain_core.messages import HumanMessage, AIMessage, SystemMessage
from langchain_core.prompts import ChatPromptTemplate, PromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain_core.documents import Document  # current import path; langchain.schema is deprecated

# LangGraph imports
from langgraph.graph import StateGraph, END
from langgraph.checkpoint.sqlite import SqliteSaver  # recent releases ship this in the langgraph-checkpoint-sqlite package
import operator

print("✅ All libraries imported successfully")

## Data Loading and Utilities

Let's create utility functions for loading and processing clinical trial data:

In [None]:
def load_clinical_trial(trial_id: str) -> Dict[str, Any]:
    """Load clinical trial data from JSON file.
    
    Args:
        trial_id: The NCT identifier for the clinical trial
        
    Returns:
        Dictionary containing trial data or error information
    """
    try:
        file_path = os.path.join("training_data", "CT json", f"{trial_id}.json")
        with open(file_path, "r", encoding="utf-8") as f:
            data = json.load(f)
        return data
    except FileNotFoundError:
        return {"error": f"Clinical trial {trial_id} not found"}
    except Exception as e:
        return {"error": f"Error loading {trial_id}: {str(e)}"}

def load_dataset(filepath: str) -> Dict[str, Any]:
    """Load training or test dataset.
    
    Args:
        filepath: Path to the JSON dataset file
        
    Returns:
        Dictionary containing the dataset
    """
    try:
        with open(filepath, "r", encoding="utf-8") as f:
            return json.load(f)
    except Exception as e:
        print(f"Error loading dataset: {e}")
        return {}

def create_trial_documents(trial_data: Dict[str, Any]) -> List[Document]:
    """Create LangChain documents from trial data for better processing.
    
    Args:
        trial_data: Clinical trial data dictionary
        
    Returns:
        List of Document objects for LangChain processing
    """
    if "error" in trial_data:
        return [Document(page_content=f"Error: {trial_data['error']}", metadata={"section": "error"})]
    
    documents = []
    trial_id = trial_data.get("Clinical Trial ID", "Unknown")
    
    # Create documents for each section
    sections = {
        "Eligibility": trial_data.get("Eligibility", []),
        "Intervention": trial_data.get("Intervention", []),
        "Results": trial_data.get("Results", []),
        "Adverse_Events": trial_data.get("Adverse_Events", [])
    }
    
    for section_name, section_data in sections.items():
        if section_data:
            if isinstance(section_data, list):
                content = "\n".join(str(item) for item in section_data)
            else:
                content = str(section_data)
            
            documents.append(Document(
                page_content=content,
                metadata={
                    "trial_id": trial_id,
                    "section": section_name
                }
            ))
    
    return documents

# Test utilities
sample_trial = load_clinical_trial("NCT00066573")
sample_docs = create_trial_documents(sample_trial)
print(f"✅ Data utilities ready. Sample trial: {sample_trial.get('Clinical Trial ID', 'Error')}")
print(f"Created {len(sample_docs)} documents from sample trial")

## Model Configuration

Set up the ChatGoogleGenerativeAI model for LangChain:

In [None]:
# Initialize ChatGoogleGenerativeAI model
llm = ChatGoogleGenerativeAI(
    model="gemini-2.5-flash",
    temperature=0.1,  # Low temperature for consistent results
    google_api_key=os.getenv("GEMINI_API_KEY")
)

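# Initialize checkpointer for state persistence.
# Pass a sqlite3 connection directly: in recent langgraph releases
# SqliteSaver.from_conn_string() returns a context manager rather than a saver.
# An in-memory database only lives as long as this kernel session.
import sqlite3
checkpointer = SqliteSaver(sqlite3.connect(":memory:", check_same_thread=False))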

print("✅ Model and checkpointer configured")

## State Definition

Define the state structure for our LangGraph workflow:

In [None]:
class ClinicalAnalysisState(TypedDict):
    """State schema for clinical trial analysis workflow."""
    
    # Input data
    statement: str
    primary_trial_id: str
    secondary_trial_id: Optional[str]
    focus_section: Optional[str]
    
    # Trial data
    primary_trial_data: Dict[str, Any]
    secondary_trial_data: Optional[Dict[str, Any]]
    trial_documents: List[Document]
    
    # Analysis results
    medical_analysis: Optional[str]
    statistical_analysis: Optional[str]
    logical_analysis: Optional[str]
    
    # Final decision
    final_decision: Optional[str]
    confidence_score: Optional[float]
    
    # Workflow control
    next_action: Optional[str]
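    # Plain list (no operator.add reducer): the nodes below return the whole state,
    # so a reducer would re-append the existing messages after every node.
    error_messages: List[str]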

print("✅ State schema defined")
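
To make the role of a reducer on a state key concrete, here is a minimal sketch in plain Python. It does not use LangGraph itself: `merge_update` is a hypothetical stand-in for the way a graph runtime folds a node's return value into the stored state, and the messages are made up. It shows the difference between a node that returns only its new entries and one that returns the whole list.

```python
import operator
from typing import Any, Callable, Dict


def merge_update(current: Dict[str, Any], update: Dict[str, Any],
                 reducers: Dict[str, Callable[[Any, Any], Any]]) -> Dict[str, Any]:
    """Hypothetical stand-in for folding a node's output into the stored state."""
    merged = dict(current)
    for key, value in update.items():
        if key in reducers:
            # Keys with a reducer combine the stored value with the returned one
            merged[key] = reducers[key](current.get(key, []), value)
        else:
            # Keys without a reducer are simply overwritten
            merged[key] = value
    return merged


reducers = {"error_messages": operator.add}
state = {"statement": "s", "error_messages": ["load failed"]}

# A node that returns only its new entries: the reducer appends them once
partial = merge_update(state, {"error_messages": ["parse failed"]}, reducers)

# A node that returns the whole, already extended list: old entries are repeated
full = merge_update(
    state, {"error_messages": state["error_messages"] + ["parse failed"]}, reducers
)

print(partial["error_messages"])
print(full["error_messages"])
```

The practical rule this suggests: pair a list reducer with nodes that return partial updates, or drop the reducer when nodes return the full state.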

## Node Definitions

Define the analysis nodes for our LangGraph workflow:

In [None]:
def clinical_data_extractor(state: ClinicalAnalysisState) -> ClinicalAnalysisState:
    """Extract and structure clinical trial data."""
    
    try:
        # Load primary trial data
        primary_data = load_clinical_trial(state["primary_trial_id"])
        state["primary_trial_data"] = primary_data
        
        # Load secondary trial data if provided
        if state["secondary_trial_id"]:
            secondary_data = load_clinical_trial(state["secondary_trial_id"])
            state["secondary_trial_data"] = secondary_data
        
        # Create documents for processing
        documents = create_trial_documents(primary_data)
        if state["secondary_trial_data"]:
            documents.extend(create_trial_documents(state["secondary_trial_data"]))
        
        state["trial_documents"] = documents
        state["next_action"] = "medical_analysis"
        
    except Exception as e:
        state["error_messages"].append(f"Data extraction error: {str(e)}")
        state["next_action"] = "end"
    
    return state

print("✅ Data extractor node defined")

In [None]:
def medical_analysis_node(state: ClinicalAnalysisState) -> ClinicalAnalysisState:
    """Perform medical analysis of the statement against trial data."""
    
    try:
        # Create medical analysis prompt
        medical_prompt = ChatPromptTemplate.from_messages([
            SystemMessage(content="""
You are a Medical Expert specializing in clinical trial analysis.
Your role is to analyze statements from a medical perspective and assess their accuracy against clinical trial evidence.

Focus on:
- Medical terminology accuracy
- Clinical relevance and significance
- Medical plausibility of claims
- Clinical context and implications
- Medical evidence alignment

Provide a thorough medical analysis and end with:
MEDICAL_VERDICT: [SUPPORTS/CONTRADICTS/UNCLEAR] - brief reasoning
            """.strip()),
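            # Use a ("human", template) tuple: a HumanMessage object is passed through
            # verbatim, so its {placeholders} would never be filled in
            ("human", """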
STATEMENT TO ANALYZE: "{statement}"

CLINICAL TRIAL EVIDENCE:
{trial_evidence}

Please provide your medical analysis of this statement against the trial evidence.
            """.strip())
        ])
        
        # Prepare trial evidence
        trial_evidence = ""
        for doc in state["trial_documents"]:
            if state["focus_section"] and doc.metadata.get("section") != state["focus_section"]:
                continue
            trial_evidence += f"\n{doc.metadata.get('section', 'Unknown')}:\n{doc.page_content}\n"
        
        # Run medical analysis
        medical_chain = medical_prompt | llm | StrOutputParser()
        medical_result = medical_chain.invoke({
            "statement": state["statement"],
            "trial_evidence": trial_evidence
        })
        
        state["medical_analysis"] = medical_result
        state["next_action"] = "statistical_analysis"
        
    except Exception as e:
        state["error_messages"].append(f"Medical analysis error: {str(e)}")
        state["next_action"] = "statistical_analysis"  # Continue with next analysis
    
    return state

print("✅ Medical analysis node defined")

In [None]:
def statistical_analysis_node(state: ClinicalAnalysisState) -> ClinicalAnalysisState:
    """Perform statistical and numerical analysis."""
    
    try:
        # Create statistical analysis prompt
        statistical_prompt = ChatPromptTemplate.from_messages([
            SystemMessage(content="""
You are a Statistical Analyst specializing in clinical trial data analysis.
Your role is to analyze numerical claims, statistics, and quantitative relationships in clinical trials.

Focus on:
- Numerical accuracy and verification
- Statistical significance and validity
- Quantitative relationships and comparisons
- Data calculations and mathematical reasoning
- Confidence intervals and error margins

Perform detailed calculations and end with:
STATISTICAL_VERDICT: [ACCURATE/INACCURATE/PARTIALLY_ACCURATE] - numerical reasoning
            """.strip()),
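            # Use a ("human", template) tuple: a HumanMessage object is passed through
            # verbatim, so its {placeholders} would never be filled in
            ("human", """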
STATEMENT TO ANALYZE: "{statement}"

CLINICAL TRIAL DATA:
{trial_evidence}

Please perform statistical analysis of the numerical claims in this statement.
            """.strip())
        ])
        
        # Prepare trial evidence (focus on Results section for statistical data)
        trial_evidence = ""
        for doc in state["trial_documents"]:
            # Prioritize Results and statistical sections
            if doc.metadata.get("section") in ["Results", "Adverse_Events"] or not state["focus_section"]:
                trial_evidence += f"\n{doc.metadata.get('section', 'Unknown')}:\n{doc.page_content}\n"
        
        # Run statistical analysis
        statistical_chain = statistical_prompt | llm | StrOutputParser()
        statistical_result = statistical_chain.invoke({
            "statement": state["statement"],
            "trial_evidence": trial_evidence
        })
        
        state["statistical_analysis"] = statistical_result
        state["next_action"] = "logical_analysis"
        
    except Exception as e:
        state["error_messages"].append(f"Statistical analysis error: {str(e)}")
        state["next_action"] = "logical_analysis"  # Continue with next analysis
    
    return state

print("✅ Statistical analysis node defined")

In [None]:
def logical_analysis_node(state: ClinicalAnalysisState) -> ClinicalAnalysisState:
    """Perform logical consistency analysis."""
    
    try:
        # Create logical analysis prompt
        logical_prompt = ChatPromptTemplate.from_messages([
            SystemMessage(content="""
You are a Logic Analyst specializing in reasoning validation and consistency checking.
Your role is to validate logical relationships, consistency, and reasoning soundness.

Focus on:
- Logical structure and coherence
- Cause-and-effect relationships
- Internal consistency
- Validity of inferences
- Detection of logical fallacies
- Reasoning pattern analysis

Provide logical analysis and end with:
LOGICAL_VERDICT: [SOUND/UNSOUND/QUESTIONABLE] - logical reasoning
            """.strip()),
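            # Use a ("human", template) tuple: a HumanMessage object is passed through
            # verbatim, so its {placeholders} would never be filled in
            ("human", """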
STATEMENT TO ANALYZE: "{statement}"

EVIDENCE CONTEXT:
{trial_evidence}

Please analyze the logical consistency and reasoning of this statement.
            """.strip())
        ])
        
        # Prepare trial evidence
        trial_evidence = ""
        for doc in state["trial_documents"]:
            if state["focus_section"] and doc.metadata.get("section") != state["focus_section"]:
                continue
            trial_evidence += f"\n{doc.metadata.get('section', 'Unknown')}:\n{doc.page_content[:500]}...\n"
        
        # Run logical analysis
        logical_chain = logical_prompt | llm | StrOutputParser()
        logical_result = logical_chain.invoke({
            "statement": state["statement"],
            "trial_evidence": trial_evidence
        })
        
        state["logical_analysis"] = logical_result
        state["next_action"] = "decision_synthesis"
        
    except Exception as e:
        state["error_messages"].append(f"Logical analysis error: {str(e)}")
        state["next_action"] = "decision_synthesis"  # Continue to final decision
    
    return state

print("✅ Logical analysis node defined")
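
The medical, statistical, and logical nodes each rebuild the evidence string with nearly the same loop. As a sketch of how that loop could be factored out, the helper below filters by section and optionally truncates each section. `Doc` is a hypothetical stand-in for LangChain's `Document` so the example runs without any dependencies, and `build_evidence` and `max_chars` are illustrative names, not part of the notebook. Unlike the logical node, it appends `...` only when text was actually cut.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Doc:
    """Hypothetical stand-in for a LangChain Document (page_content + metadata)."""
    page_content: str
    metadata: Dict[str, str] = field(default_factory=dict)


def build_evidence(docs: List[Doc], focus_section: Optional[str] = None,
                   max_chars: Optional[int] = None) -> str:
    """Join trial sections into one evidence string for a prompt."""
    parts = []
    for doc in docs:
        section = doc.metadata.get("section", "Unknown")
        # Skip sections other than the requested one, if a focus is given
        if focus_section and section != focus_section:
            continue
        text = doc.page_content
        # Truncate long sections and mark the cut
        if max_chars is not None and len(text) > max_chars:
            text = text[:max_chars] + "..."
        parts.append(f"\n{section}:\n{text}\n")
    return "".join(parts)


# Made-up documents, for illustration only
docs = [
    Doc("Outcome A: 10/50", {"section": "Results"}),
    Doc("Age >= 18", {"section": "Eligibility"}),
]
print(build_evidence(docs, focus_section="Results"))
print(build_evidence(docs, max_chars=5))
```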

In [None]:
def decision_synthesis_node(state: ClinicalAnalysisState) -> ClinicalAnalysisState:
    """Synthesize all analyses and make final decision."""
    
    try:
        # Create decision synthesis prompt
        decision_prompt = ChatPromptTemplate.from_messages([
            SystemMessage(content="""
You are the Decision Synthesizer responsible for making final entailment classifications.
Your role is to synthesize expert analyses and determine the final verdict.

Classification Rules:
- ENTAILMENT: Statement is directly supported by the trial evidence
- CONTRADICTION: Statement is refuted or contradicted by the trial evidence

Weigh all evidence types:
- Medical expert analysis (clinical accuracy)
- Statistical analysis (numerical validity)
- Logical analysis (reasoning soundness)

Provide reasoning and confidence, then end with:
FINAL_DECISION: [Entailment/Contradiction]
CONFIDENCE: [0.0-1.0]
            """.strip()),
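            # Use a ("human", template) tuple: a HumanMessage object is passed through
            # verbatim, so its {placeholders} would never be filled in
            ("human", """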
ORIGINAL STATEMENT: "{statement}"

MEDICAL ANALYSIS:
{medical_analysis}

STATISTICAL ANALYSIS:
{statistical_analysis}

LOGICAL ANALYSIS:
{logical_analysis}

Based on these expert analyses, provide your final entailment decision.
            """.strip())
        ])
        
        # Run decision synthesis
        decision_chain = decision_prompt | llm | StrOutputParser()
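        # The analysis keys exist in the state with value None when a node failed,
        # so use `or` rather than a .get() default to substitute the placeholder text
        decision_result = decision_chain.invoke({
            "statement": state["statement"],
            "medical_analysis": state.get("medical_analysis") or "Not available",
            "statistical_analysis": state.get("statistical_analysis") or "Not available",
            "logical_analysis": state.get("logical_analysis") or "Not available"
        })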
        
        # Parse decision and confidence
        if "FINAL_DECISION: Entailment" in decision_result:
            final_decision = "Entailment"
        elif "FINAL_DECISION: Contradiction" in decision_result:
            final_decision = "Contradiction"
        else:
            # Fallback parsing
            if "entailment" in decision_result.lower() and "contradiction" not in decision_result.lower():
                final_decision = "Entailment"
            else:
                final_decision = "Contradiction"
        
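        # Extract confidence score
        confidence = 0.5  # Default
        try:
            if "CONFIDENCE:" in decision_result:
                confidence_str = decision_result.split("CONFIDENCE:")[1].strip().split()[0]
                # The model may echo the brackets from the prompt, e.g. "[0.8]"
                confidence = float(confidence_str.strip("[]"))
        except (ValueError, IndexError):
            pass  # Keep the default when the score is missing or malformed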
        
        state["final_decision"] = final_decision
        state["confidence_score"] = confidence
        state["next_action"] = "end"
        
    except Exception as e:
        state["error_messages"].append(f"Decision synthesis error: {str(e)}")
        state["final_decision"] = "Contradiction"  # Conservative fallback
        state["confidence_score"] = 0.1
        state["next_action"] = "end"
    
    return state

print("✅ Decision synthesis node defined")
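
The string matching in the synthesis node depends on the model reproducing `FINAL_DECISION: Entailment` exactly. As a possible alternative, the sketch below parses the verdict with regular expressions that tolerate brackets, markdown emphasis, and case differences. `parse_verdict` and both patterns are illustrative names and choices, not part of the notebook; the function returns `None` for the decision when no verdict line is found, leaving the fallback to the caller.

```python
import re
from typing import Optional, Tuple

# Allow whitespace, "*" and "[" between the marker and the value
DECISION_RE = re.compile(r"FINAL_DECISION:[\s*\[]*(entailment|contradiction)", re.IGNORECASE)
CONFIDENCE_RE = re.compile(r"CONFIDENCE:[\s*\[]*([0-9]*\.?[0-9]+)", re.IGNORECASE)


def parse_verdict(text: str, default_confidence: float = 0.5) -> Tuple[Optional[str], float]:
    """Extract (decision, confidence) from the synthesizer's free-text output."""
    decision_match = DECISION_RE.search(text)
    # Normalize the label to "Entailment" / "Contradiction"
    decision = decision_match.group(1).capitalize() if decision_match else None

    confidence = default_confidence
    confidence_match = CONFIDENCE_RE.search(text)
    if confidence_match:
        # Clamp to the [0, 1] range requested in the prompt
        confidence = min(1.0, max(0.0, float(confidence_match.group(1))))
    return decision, confidence


print(parse_verdict("Reasoning...\nFINAL_DECISION: [Entailment]\nCONFIDENCE: [0.85]"))
print(parse_verdict("**FINAL_DECISION:** contradiction\n**CONFIDENCE:** 0.3."))
print(parse_verdict("The model produced no verdict line."))
```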

## Workflow Definition

Create the LangGraph workflow by connecting all nodes:

In [None]:
def create_clinical_analysis_workflow():
    """Create the clinical analysis workflow using LangGraph."""
    
    # Create the StateGraph
    workflow = StateGraph(ClinicalAnalysisState)
    
    # Add nodes
    workflow.add_node("data_extraction", clinical_data_extractor)
    workflow.add_node("medical_analysis", medical_analysis_node)
    workflow.add_node("statistical_analysis", statistical_analysis_node)
    workflow.add_node("logical_analysis", logical_analysis_node)
    workflow.add_node("decision_synthesis", decision_synthesis_node)
    
    # Add edges
    workflow.set_entry_point("data_extraction")
    workflow.add_edge("data_extraction", "medical_analysis")
    workflow.add_edge("medical_analysis", "statistical_analysis")
    workflow.add_edge("statistical_analysis", "logical_analysis")
    workflow.add_edge("logical_analysis", "decision_synthesis")
    workflow.add_edge("decision_synthesis", END)
    
    # Compile the workflow
    app = workflow.compile(checkpointer=checkpointer)
    
    return app

# Create the workflow
clinical_workflow = create_clinical_analysis_workflow()

print("✅ LangGraph workflow created")
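
Each node sets `next_action`, but the graph above uses fixed edges, so that field never changes the path. One way to let the extractor's outcome steer the flow is a small routing function passed to `add_conditional_edges`. The sketch below is only the routing logic in plain Python; `route_after_extraction` is a hypothetical name, and the wiring appears as a comment because it would replace the fixed `data_extraction` to `medical_analysis` edge rather than be added next to it.

```python
from typing import Any, Dict


def route_after_extraction(state: Dict[str, Any]) -> str:
    """Pick the next node based on what data extraction produced."""
    primary = state.get("primary_trial_data") or {}
    # Stop early when the primary trial failed to load or produced no documents
    if "error" in primary or not state.get("trial_documents"):
        return "end"
    return "medical_analysis"


# Assumed wiring, in place of workflow.add_edge("data_extraction", "medical_analysis"):
# workflow.add_conditional_edges(
#     "data_extraction",
#     route_after_extraction,
#     {"medical_analysis": "medical_analysis", "end": END},
# )

# Made-up states, for illustration only
print(route_after_extraction({"primary_trial_data": {"error": "not found"}, "trial_documents": ["doc"]}))
print(route_after_extraction({"primary_trial_data": {"Results": ["..."]}, "trial_documents": ["doc"]}))
```

With early termination, `final_decision` stays unset, so the calling pipeline needs its own fallback for that case.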

## Analysis Pipeline

Create the main pipeline function that uses our LangGraph workflow:

In [None]:
def langchain_analysis_pipeline(statement: str, primary_id: str, secondary_id: Optional[str] = None, 
                               section_id: Optional[str] = None, verbose: bool = False) -> str:
    """
    Run the complete LangChain/LangGraph analysis pipeline.
    
    Args:
        statement: The natural language statement to analyze
        primary_id: Primary clinical trial ID
        secondary_id: Secondary trial ID for comparison statements
        section_id: Relevant section of the trial
        verbose: Whether to print intermediate results
        
    Returns:
        Final decision: 'Entailment' or 'Contradiction'
    """
    
    try:
        if verbose:
            print(f"📄 Analyzing: {statement[:100]}...")
            print(f"🏥 Primary Trial: {primary_id}")
            if secondary_id:
                print(f"🏥 Secondary Trial: {secondary_id}")
        
        # Create initial state
        initial_state = {
            "statement": statement,
            "primary_trial_id": primary_id,
            "secondary_trial_id": secondary_id,
            "focus_section": section_id,
            "primary_trial_data": {},
            "secondary_trial_data": None,
            "trial_documents": [],
            "medical_analysis": None,
            "statistical_analysis": None,
            "logical_analysis": None,
            "final_decision": None,
            "confidence_score": None,
            "next_action": None,
            "error_messages": []
        }
        
        # Run the workflow
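        # One thread per (trial, statement) pair. Slicing the id to 10 characters would
        # leave only one character after "analysis_", so nearly all runs would share a thread.
        config = {"configurable": {"thread_id": f"analysis_{primary_id}_{abs(hash(statement))}"}}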
        result = clinical_workflow.invoke(initial_state, config)
        
        if verbose:
            print(f"🩺 Medical Analysis: {'✅' if result.get('medical_analysis') else '❌'}")
            print(f"📊 Statistical Analysis: {'✅' if result.get('statistical_analysis') else '❌'}")
            print(f"🧠 Logical Analysis: {'✅' if result.get('logical_analysis') else '❌'}")
            print(f"⚖️ Final Decision: {result.get('final_decision', 'Unknown')}")
            print(f"🎯 Confidence: {(result.get('confidence_score') or 0.0):.2f}")
            
            if result.get("error_messages"):
                print(f"⚠️ Errors: {len(result['error_messages'])}")
            print("-" * 50)
        
        # final_decision may be present but None, so fall back with `or`
        return result.get("final_decision") or "Contradiction"
        
    except Exception as e:
        if verbose:
            print(f"❌ Error in LangChain pipeline: {e}")
        return "Contradiction"  # Conservative fallback

print("✅ LangChain analysis pipeline ready")
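
Python's built-in `hash()` is salted per process for strings, so a thread id derived from it changes between kernel restarts. If checkpoints are ever stored in a file-backed database, a digest-based id stays stable across sessions. The sketch below is one way to build such an id; `make_thread_id` and the 16-character truncation are illustrative choices, not part of the notebook.

```python
import hashlib
from typing import Optional


def make_thread_id(statement: str, primary_id: str, secondary_id: Optional[str] = None) -> str:
    """Build a deterministic thread id from the inputs of one analysis."""
    key = "|".join([primary_id, secondary_id or "", statement])
    # SHA-256 is stable across processes; 16 hex characters keep the id short
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()[:16]
    return f"analysis_{digest}"


# Made-up inputs, for illustration only
first = make_thread_id("The primary endpoint was met", "NCT00066573")
second = make_thread_id("The primary endpoint was not met", "NCT00066573")
print(first)
print(second)
```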

## Test Example

Let's test our LangChain/LangGraph system:

In [None]:
# Test with a sample statement
test_statement = "there is a 13.2% difference between the results from the two the primary trial cohorts"
test_primary_id = "NCT00066573"

print(f"Testing LangChain/LangGraph system with statement:")
print(f"'{test_statement}'")
print(f"Primary trial: {test_primary_id}")
print("\n" + "="*80)

# Run the analysis with verbose output
result = langchain_analysis_pipeline(
    statement=test_statement,
    primary_id=test_primary_id,
    section_id="Results",
    verbose=True
)

print(f"\n🎯 LANGCHAIN RESULT: {result}")
print("="*80)

## Evaluation on Training Data

Let's evaluate our LangChain/LangGraph system on training data:

In [None]:
# Load training data
train_data = load_dataset("training_data/train.json")
print(f"Loaded {len(train_data)} training examples")

# Evaluate on a sample (adjust sample_size as needed)
sample_size = 20
examples = list(train_data.items())[:sample_size]

print(f"\nEvaluating LangChain/LangGraph system on {len(examples)} examples...")

results = []
correct = 0

for i, (uuid, example) in enumerate(tqdm(examples, desc="LangChain Processing")):
    try:
        statement = example.get("Statement")
        primary_id = example.get("Primary_id")
        secondary_id = example.get("Secondary_id")
        section_id = example.get("Section_id")
        expected = example.get("Label")
        
        if not statement or not primary_id:
            results.append({
                "uuid": uuid,
                "expected": expected,
                "predicted": "SKIPPED",
                "correct": False
            })
            continue
        
        # Get prediction from LangChain/LangGraph system
        predicted = langchain_analysis_pipeline(
            statement=statement,
            primary_id=primary_id,
            secondary_id=secondary_id,
            section_id=section_id,
            verbose=False
        )
        
        # Check if correct
        is_correct = (predicted.strip() == expected.strip())
        if is_correct:
            correct += 1
            
        results.append({
            "uuid": uuid,
            "statement": statement[:100] + "..." if len(statement) > 100 else statement,
            "expected": expected,
            "predicted": predicted,
            "correct": is_correct
        })
        
        status = "✅" if is_correct else "❌"
        print(f"Example {i+1:2d}: {expected:12} -> {predicted:12} {status}")
        
    except Exception as e:
        print(f"Error processing example {i+1}: {e}")
        results.append({
            "uuid": uuid,
            "expected": expected,
            "predicted": "ERROR",
            "correct": False
        })

# Calculate accuracy
accuracy = correct / len(examples) if examples else 0
print(f"\n📊 LangChain/LangGraph Results:")
print(f"Accuracy: {accuracy:.2%} ({correct}/{len(examples)})")

# Store results for comparison
langchain_results = results.copy()
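
Accuracy alone hides whether errors lean toward one label, which matters here because every fallback path returns `Contradiction`. As a sketch, the helper below computes precision, recall, and F1 for the `Entailment` class from a list shaped like `results`. `summarize_results` is a hypothetical name, and the rows in the usage example are made up for illustration; they are not evaluation output.

```python
from typing import Dict, List


def summarize_results(results: List[Dict[str, object]], positive: str = "Entailment") -> Dict[str, float]:
    """Compute precision, recall and F1 for the positive label."""
    tp = fp = fn = 0
    for row in results:
        expected_pos = row.get("expected") == positive
        predicted_pos = row.get("predicted") == positive
        if predicted_pos and expected_pos:
            tp += 1
        elif predicted_pos:
            fp += 1
        elif expected_pos:
            fn += 1  # includes SKIPPED / ERROR rows whose gold label is positive

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    unusable = sum(1 for row in results if row.get("predicted") in ("SKIPPED", "ERROR"))
    return {"precision": precision, "recall": recall, "f1": f1, "unusable": float(unusable)}


# Made-up rows, for illustration only
toy_results = [
    {"expected": "Entailment", "predicted": "Entailment"},
    {"expected": "Entailment", "predicted": "Contradiction"},
    {"expected": "Contradiction", "predicted": "Entailment"},
    {"expected": "Contradiction", "predicted": "Contradiction"},
    {"expected": "Entailment", "predicted": "ERROR"},
]
summary = summarize_results(toy_results)
print(summary)
```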

## State Inspection

Let's inspect the stateful capabilities of our LangGraph workflow:

In [None]:
# Demonstrate state persistence and inspection
print("🔍 LangGraph State Inspection:")
print("=" * 50)

# Run a simple analysis and inspect intermediate states
test_config = {"configurable": {"thread_id": "demo_analysis"}}
test_state = {
    "statement": "The primary endpoint was met",
    "primary_trial_id": "NCT00066573",
    "secondary_trial_id": None,
    "focus_section": "Results",
    "primary_trial_data": {},
    "secondary_trial_data": None,
    "trial_documents": [],
    "medical_analysis": None,
    "statistical_analysis": None,
    "logical_analysis": None,
    "final_decision": None,
    "confidence_score": None,
    "next_action": None,
    "error_messages": []
}

try:
    # Stream the workflow execution to see intermediate states
    print("Streaming workflow execution:")
    for step in clinical_workflow.stream(test_state, test_config):
        for node_name, node_output in step.items():
            print(f"\n📍 Node: {node_name}")
            if node_output.get("final_decision"):
                print(f"   Decision: {node_output['final_decision']}")
                print(f"   Confidence: {node_output.get('confidence_score', 'N/A')}")
            if node_output.get("error_messages"):
                print(f"   Errors: {len(node_output['error_messages'])}")
            print(f"   Next: {node_output.get('next_action', 'N/A')}")
    
    # Get final state
    final_state = clinical_workflow.get_state(test_config)
    print(f"\n🎯 Final State Summary:")
    print(f"   Thread ID: {test_config['configurable']['thread_id']}")
    print(f"   Final Decision: {final_state.values.get('final_decision')}")
    print(f"   Total Errors: {len(final_state.values.get('error_messages', []))}")
    
except Exception as e:
    print(f"Error in state inspection: {e}")

print("\n✅ State inspection completed")

## Generate Submission File

Let's generate a submission file using our LangChain/LangGraph system:

In [None]:
def generate_langchain_submission(test_file="test.json", output_file="langchain_submission.json", sample_size=None):
    """
    Generate submission file using LangChain/LangGraph system.
    
    Args:
        test_file: Path to test data
        output_file: Output submission file
        sample_size: Number of examples to process (None for all)
    """
    
    # Load test data
    test_data = load_dataset(test_file)
    if not test_data:
        print(f"❌ Could not load test data from {test_file}")
        return
    
    examples = list(test_data.items())
    if sample_size:
        examples = examples[:sample_size]
        
    print(f"🚀 Generating LangChain/LangGraph predictions for {len(examples)} examples...")
    
    submission = {}
    
    for i, (uuid, example) in enumerate(tqdm(examples, desc="LangChain Processing")):
        try:
            statement = example.get("Statement")
            primary_id = example.get("Primary_id")
            secondary_id = example.get("Secondary_id")
            section_id = example.get("Section_id")
            
            if not statement or not primary_id:
                submission[uuid] = {"Prediction": "Contradiction"}  # Default fallback
                continue
                
            # Get prediction from LangChain/LangGraph system
            prediction = langchain_analysis_pipeline(
                statement=statement,
                primary_id=primary_id,
                secondary_id=secondary_id,
                section_id=section_id,
                verbose=False
            )
            
            submission[uuid] = {"Prediction": prediction}
            
        except Exception as e:
            print(f"Error processing {uuid}: {e}")
            submission[uuid] = {"Prediction": "Contradiction"}  # Conservative fallback
    
    # Save submission file
    with open(output_file, "w", encoding="utf-8") as f:
        json.dump(submission, f, indent=2)
    
    print(f"✅ LangChain submission saved to {output_file}")
    return submission

# Generate submission for a small sample
langchain_submission = generate_langchain_submission(
    test_file="test.json", 
    output_file="langchain_submission.json",
    sample_size=10  # Adjust as needed
)

# The function returns None when the test file cannot be loaded
print(f"Generated predictions for {len(langchain_submission or {})} examples")

## Workflow Visualization

Let's visualize our LangGraph workflow structure:

In [None]:
# Display workflow information
print("🔄 LangGraph Workflow Structure:")
print("=" * 50)

workflow_steps = [
    "1. Data Extraction → Load and structure clinical trial data",
    "2. Medical Analysis → Expert medical reasoning and assessment",
    "3. Statistical Analysis → Numerical validation and calculations",
    "4. Logical Analysis → Reasoning consistency and soundness",
    "5. Decision Synthesis → Final entailment classification"
]

for step in workflow_steps:
    print(f"   {step}")

print("\n📊 State Management Features:")
state_features = [
    "• Persistent state across workflow steps",
    "• Error tracking and recovery mechanisms",
    "• Confidence scoring and decision rationale",
    "• Intermediate result storage and inspection",
    "• Thread-based conversation management"
]

for feature in state_features:
    print(f"   {feature}")

print("\n✅ Workflow visualization complete")

## Conclusion and Insights

### LangChain/LangGraph Framework Strengths:
1. **Mature ecosystem**: Most established and feature-rich LLM application framework
2. **Stateful workflows**: LangGraph enables complex, stateful agent interactions
3. **Rich integrations**: Extensive tool ecosystem and service integrations
4. **Advanced patterns**: Support for sophisticated reasoning and decision patterns
5. **Community support**: Large community, extensive documentation, and real-world examples
6. **Production ready**: Battle-tested in numerous enterprise applications

### Key Features Demonstrated:
- **Graph-based workflow**: Structured, stateful analysis pipeline
- **State persistence**: SQLite checkpointer (in-memory in this notebook) that keeps per-thread state for the lifetime of the session
- **Node specialization**: Dedicated analysis nodes for different domains
- **Error handling**: Robust error tracking and recovery mechanisms
- **Streaming support**: Real-time workflow execution monitoring
- **Configuration management**: Thread-based state management

### Architecture Benefits:
- **Flexible workflows**: Easy to modify and extend the analysis pipeline
- **State management**: Persistent context across complex multi-step processes
- **Debugging support**: Clear state inspection and intermediate result tracking
- **Scalable design**: Production-ready architecture for enterprise deployment
- **Integration ready**: Seamless integration with external tools and services

### Optimization Opportunities:
1. **Prompt engineering**: Fine-tune prompts for each analysis node
2. **Conditional routing**: Add conditional logic for different statement types
3. **Parallel processing**: Implement parallel analysis nodes where possible
4. **Tool integration**: Leverage LangChain's extensive tool ecosystem
5. **Advanced patterns**: Implement self-reflection and iterative improvement

### When to Use LangChain/LangGraph:
- Complex, multi-step reasoning workflows requiring state management
- Applications needing extensive tool and service integrations
- Enterprise systems requiring robust, production-ready frameworks
- Projects where community support and documentation are crucial
- Scenarios requiring sophisticated workflow patterns and customization

### Framework Comparison Summary:
- **vs AutoGen**: Better for structured workflows, AutoGen better for free-form conversations
- **vs Atomic Agents**: More comprehensive but heavier, Atomic better for pure performance
- **vs Agno**: Broader ecosystem, Agno better for built-in memory and knowledge

### LangChain/LangGraph Unique Advantages:
1. **Graph-based reasoning**: Native support for complex, conditional workflows
2. **State persistence**: Built-in checkpointing and conversation continuity
3. **Ecosystem maturity**: Extensive tools, integrations, and community support
4. **Enterprise features**: Production-ready with monitoring, logging, and debugging
5. **Flexibility**: Highly customizable workflows and integration patterns

LangChain/LangGraph is well suited to complex production environments where structured workflows, state management, and extensive integrations matter. It is a strong candidate for clinical NLP applications that need multi-step reasoning pipelines and inspectable intermediate state.