# 🧪 Colab Real Papers Test

**Test enhanced knowledge graph extraction with real research papers in Google Colab.**

## What This Tests:
- Real PDF paper processing in Colab environment
- Enhanced entity extraction with Colab T4 GPU
- Knowledge graph building and visualization
- Preparation for full corpus builder

**Hardware:** Google Colab T4 GPU (free tier)  
**Processing Time:** ~20-30 minutes per paper  
**Output:** Rich knowledge graphs with 50-100+ entities per paper

## 🔧 Step 1: Colab Environment Setup

In [None]:
# Check if we're in Google Colab
import sys
IN_COLAB = 'google.colab' in sys.modules

if IN_COLAB:
    print("🚀 Running in Google Colab")
    
    # Enable widget support for Colab
    from google.colab import output
    output.enable_custom_widget_manager()
    
    # Check GPU availability
    import torch
    if torch.cuda.is_available():
        print(f"✅ GPU Available: {torch.cuda.get_device_name(0)}")
        print(f"💾 GPU Memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f} GB")
    else:
        print("⚠️ No GPU detected - enable GPU in Runtime → Change runtime type → Hardware accelerator → GPU")
        
else:
    print("🏠 Running locally")
    import os
    # Add parent directory to path for local development
    if os.path.basename(os.getcwd()) == 'notebooks':
        sys.path.insert(0, '..')
    else:
        sys.path.insert(0, '.')

## 📦 Step 2: Install Dependencies

In [None]:
if IN_COLAB:
    print("📦 Installing dependencies for Colab...")
    
    # Install core dependencies. Version specifiers are quoted so the shell
    # does not treat ">=" as an output redirect.
    !pip install -q langchain langchain-community langchain-ollama langchain-chroma
    !pip install -q "chromadb>=0.4.0"
    !pip install -q "networkx>=3.0"
    !pip install -q "PyPDF2>=3.0.0" "pdfplumber>=0.9.0"
    !pip install -q "scikit-learn>=1.6.0" "numpy>=1.19.5"
    !pip install -q "matplotlib>=3.5.0" "plotly>=5.0.0"
    !pip install -q "yfiles_jupyter_graphs>=1.7.0"
    
    # Install Ollama for Colab
    !curl -fsSL https://ollama.ai/install.sh | sh
    
    # Start Ollama service
    import subprocess
    import time
    
    # Start Ollama in the background; discard its output so an unread pipe
    # cannot fill up and stall the server
    ollama_process = subprocess.Popen(['ollama', 'serve'],
                                      stdout=subprocess.DEVNULL,
                                      stderr=subprocess.DEVNULL)
    time.sleep(5)  # Wait for Ollama to start
    
    # Pull required models
    !ollama pull llama3.1:8b
    !ollama pull nomic-embed-text
    
    print("✅ Colab environment setup complete!")
    
else:
    print("🏠 Using local environment - ensure Ollama is running")
    print("   Run: ollama serve")
    print("   Ensure models: ollama pull llama3.1:8b && ollama pull nomic-embed-text")
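
The fixed `time.sleep(5)` above assumes the server is ready after five seconds. A minimal sketch of a polling alternative follows; it assumes Ollama listens on its default local address `http://localhost:11434/`, and the helper names `ollama_is_up` and `wait_until_ready` are hypothetical, not part of this notebook.

```python
import time
import urllib.request
from typing import Callable


def ollama_is_up(url: str = "http://localhost:11434/") -> bool:
    """Return True if an HTTP GET to the assumed Ollama address succeeds."""
    try:
        with urllib.request.urlopen(url, timeout=2) as response:
            return response.status == 200
    except OSError:
        # Connection refused, timeouts, and HTTP errors all derive from OSError
        return False


def wait_until_ready(probe: Callable[[], bool],
                     timeout: float = 60.0,
                     interval: float = 1.0) -> bool:
    """Call probe() repeatedly until it returns True or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        if probe():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

In the setup cell this could replace the sleep with something like `wait_until_ready(ollama_is_up)`, pulling models only if it returns `True`.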

## 📄 Step 3: Upload Research Papers

In [None]:
import os
import tempfile

if IN_COLAB:
    print("📤 Upload your research papers (PDF files)")
    from google.colab import files
    
    # Upload files
    uploaded = files.upload()
    
    # Create temp directory for papers
    papers_dir = '/tmp/research_papers'
    os.makedirs(papers_dir, exist_ok=True)
    
    # Move uploaded PDFs into the papers directory
    import shutil
    paper_paths = []
    for filename in uploaded.keys():
        if filename.lower().endswith('.pdf'):
            dest_path = os.path.join(papers_dir, filename)
            # shutil.move also works if /tmp is on a different filesystem
            shutil.move(filename, dest_path)
            paper_paths.append(dest_path)
            print(f"✅ Uploaded: {filename}")
    
    print(f"\n📊 Ready to process {len(paper_paths)} papers")
    
else:
    # Use example papers for local testing
    papers_dir = 'examples'
    candidate_paths = [
        'examples/d4sc03921a.pdf',
        'examples/d3dd00113j.pdf'
    ]
    
    print("🏠 Using local example papers:")
    # Build a new list instead of removing items from a list while iterating over it
    paper_paths = []
    for path in candidate_paths:
        if os.path.exists(path):
            print(f"✅ Found: {path}")
            paper_paths.append(path)
        else:
            print(f"❌ Missing: {path}")
    
    print(f"\n📊 Ready to process {len(paper_paths)} papers")

## 🧬 Step 4: Load Source Code

The project's source modules are not importable from a fresh Colab runtime, so the cells below recreate the essential components inline:

In [None]:
# Core imports
import json
import logging
from typing import Dict, List, Optional, Any
from pathlib import Path
import networkx as nx
import time
from tqdm import tqdm

from langchain_ollama import ChatOllama, OllamaEmbeddings
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.documents import Document
from langchain_chroma import Chroma
from langchain.text_splitter import RecursiveCharacterTextSplitter

# PDF processing
import PyPDF2
import pdfplumber

# Set up logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

print("✅ Core imports loaded successfully!")

In [None]:
# Enhanced Knowledge Graph class for Colab
class ColabEnhancedKnowledgeGraph:
    """Enhanced knowledge graph extraction optimized for Colab"""
    
    def __init__(self, llm_model: str = "llama3.1:8b"):
        self.llm = ChatOllama(
            model=llm_model,
            temperature=0.1,
            num_ctx=32768,
            num_predict=2048
        )
        
        # Enhanced entity categories
        self.entity_categories = [
            "authors", "institutions", "methods", "concepts", "technologies",
            "datasets", "metrics", "algorithms", "tools", "experiments",
            "applications", "challenges", "innovations", "results", "comparisons"
        ]
        
        logger.info("🕸️ Enhanced Knowledge Graph initialized for Colab")
    
    def extract_comprehensive_entities(self, paper_content: str, paper_title: str = "") -> Dict:
        """Extract comprehensive entities using enhanced multi-section processing"""
        
        logger.info(f"🔍 Enhanced extraction for: {paper_title}")
        
        # Split into sections for processing
        sections = self._split_into_sections(paper_content)
        logger.info(f"📄 Split paper into {len(sections)} sections")
        
        all_entities = {category: set() for category in self.entity_categories}
        all_relationships = []
        
        # Process each section
        for i, (section_name, section_content) in enumerate(sections.items(), 1):
            logger.info(f"📖 Processing {section_name} section...")
            
            try:
                section_entities = self._extract_section_entities(
                    section_content, paper_title, section_name
                )
                
                # Merge entities
                for category, entity_list in section_entities.items():
                    if category in all_entities:
                        all_entities[category].update(entity_list)
                
            except Exception as e:
                logger.warning(f"⚠️ Section {section_name} extraction failed: {e}")
                continue
        
        # Convert sets to lists
        final_entities = {k: list(v) for k, v in all_entities.items()}
        
        # Build relationships
        relationships = self._extract_relationships(final_entities)
        
        # Create graph stats
        total_entities = sum(len(entities) for entities in final_entities.values())
        graph_stats = {
            'nodes': total_entities,
            'edges': len(relationships),
            'sections_processed': len(sections),
            'categories': len([k for k, v in final_entities.items() if v])
        }
        
        logger.info(f"📊 Built graph: {graph_stats['nodes']} nodes, {graph_stats['edges']} edges")
        logger.info(f"✅ Enhanced extraction: {total_entities} entities, {len(relationships)} relationships")
        
        return {
            'entities': final_entities,
            'relationships': relationships,
            'graph_stats': graph_stats
        }
    
    def _split_into_sections(self, content: str, chunk_size: int = 6000, overlap: int = 1000) -> Dict[str, str]:
        """Split content into overlapping sections for comprehensive processing"""
        
        if len(content) <= chunk_size:
            return {'section_1': content}
        
        sections = {}
        start = 0
        section_num = 1
        
        while start < len(content):
            end = min(start + chunk_size, len(content))
            section_content = content[start:end]
            
            if section_content.strip():
                sections[f'section_{section_num}'] = section_content
                section_num += 1
            
            start += chunk_size - overlap
            
            if start >= len(content):
                break
        
        return sections
    
    def _extract_section_entities(self, content: str, title: str, section_name: str) -> Dict:
        """Extract entities from a content section"""
        
        entity_prompt = ChatPromptTemplate.from_template(
            """Extract comprehensive entities from this research paper section. 
Return ONLY a valid JSON object with 5-15 entities per category.

Required format:
{{
  "authors": ["Author Name 1", "Author Name 2"],
  "institutions": ["University Name", "Company Name"],
  "methods": ["Method 1", "Technique 2"],
  "concepts": ["Key Concept 1", "Theory 2"],
  "technologies": ["Technology 1", "Tool 2"],
  "datasets": ["Dataset Name 1", "Database 2"],
  "metrics": ["Accuracy", "Performance Metric"],
  "algorithms": ["Algorithm 1", "Model Type"],
  "tools": ["Software Tool", "Library"],
  "experiments": ["Experiment 1", "Test 2"],
  "applications": ["Application 1", "Use Case 2"],
  "challenges": ["Challenge 1", "Limitation 2"],
  "innovations": ["Innovation 1", "Contribution 2"],
  "results": ["Finding 1", "Outcome 2"],
  "comparisons": ["Comparison 1", "Baseline 2"]
}}

Paper: {title}
Section: {section_name}

Content:
{content}

JSON:"""
        )
        
        try:
            chain = entity_prompt | self.llm | StrOutputParser()
            result = chain.invoke({
                "content": content,  # Already capped by chunk_size; truncating here would skip text between sections
                "title": title,
                "section_name": section_name
            })
            
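            # Extract the JSON object from the model output
            json_start = result.find('{')
            json_end = result.rfind('}') + 1
            
            # rfind returns -1 when no '}' exists, which makes json_end 0,
            # so compare against json_start instead of -1
            if json_start != -1 and json_end > json_start:
                json_str = result[json_start:json_end]
                entities = json.loads(json_str)
                return entities
            else:
                return self._fallback_entities()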
        
        except Exception as e:
            logger.warning(f"⚠️ Entity extraction failed for {section_name}: {e}")
            return self._fallback_entities()
    
    def _extract_relationships(self, entities: Dict) -> List[Dict]:
        """Build coarse category-level relationships (not entity-to-entity links)"""
        relationships = []
        
        # Link every pair of non-empty categories; 'strength' is the size of the smaller one
        categories = list(entities.keys())
        
        for i, cat1 in enumerate(categories):
            for cat2 in categories[i+1:]:
                if entities[cat1] and entities[cat2]:
                    # Create relationships between categories
                    rel_type = f"{cat1}_to_{cat2}"
                    relationships.append({
                        'source': cat1,
                        'target': cat2,
                        'type': rel_type,
                        'strength': min(len(entities[cat1]), len(entities[cat2]))
                    })
        
        return relationships[:20]  # Limit relationships
    
    def _fallback_entities(self) -> Dict:
        """Fallback empty entity structure"""
        return {category: [] for category in self.entity_categories}

print("✅ Enhanced Knowledge Graph class loaded!")
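
To make the overlap arithmetic in `_split_into_sections` concrete, the sketch below reproduces only its offset logic with the same defaults (`chunk_size=6000`, `overlap=1000`). The function name `section_windows` is hypothetical, and unlike the class method it does not skip whitespace-only sections.

```python
from typing import List, Tuple


def section_windows(total_chars: int,
                    chunk_size: int = 6000,
                    overlap: int = 1000) -> List[Tuple[int, int]]:
    """Return (start, end) character offsets using a stride of chunk_size - overlap."""
    if total_chars <= chunk_size:
        return [(0, total_chars)]
    windows = []
    start = 0
    while start < total_chars:
        end = min(start + chunk_size, total_chars)
        windows.append((start, end))
        start += chunk_size - overlap  # Advance by 5000 with the defaults
    return windows


# A 13,000-character paper yields three overlapping windows
print(section_windows(13000))  # → [(0, 6000), (5000, 11000), (10000, 13000)]
```

One consequence of this stride is that the last window can lie entirely inside the previous one (for example, 11,000 characters gives a final window of `(10000, 11000)`), so the tail of a paper may be sent to the model twice.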

In [None]:
# Paper loading class for Colab
class ColabPaperLoader:
    """Load and process PDF papers in Colab environment"""
    
    def __init__(self):
        self.text_splitter = RecursiveCharacterTextSplitter(
            chunk_size=1000,
            chunk_overlap=200,
            length_function=len
        )
    
    def load_paper(self, pdf_path: str) -> Optional[Dict]:
        """Load paper content and metadata; returns None if the PDF cannot be read"""
        
        logger.info(f"📄 Loading paper: {pdf_path}")
        
        try:
            # Extract text using pdfplumber (better for academic papers)
            with pdfplumber.open(pdf_path) as pdf:
                text_content = ""
                for page in pdf.pages:
                    page_text = page.extract_text()
                    if page_text:
                        text_content += page_text + "\n\n"
            
            # Extract title (first substantial line)
            lines = text_content.split('\n')
            title = "Unknown Title"
            for line in lines:
                if len(line.strip()) > 20 and not line.strip().isdigit():
                    title = line.strip()[:100]  # Limit title length
                    break
            
            # Create chunks
            chunks = self.text_splitter.split_text(text_content)
            
            paper_data = {
                'title': title,
                'content': text_content,
                'chunks': chunks,
                'char_count': len(text_content),
                'chunk_count': len(chunks),
                'source_file': pdf_path
            }
            
            logger.info(f"✅ Paper loaded: {len(chunks)} chunks, {len(text_content):,} characters")
            return paper_data
            
        except Exception as e:
            logger.error(f"❌ Failed to load paper {pdf_path}: {e}")
            return None

print("✅ Paper loader class ready!")
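
The title heuristic in `load_paper` takes the first line longer than 20 characters that is not purely numeric. A standalone sketch of the same rule is shown here; `guess_title` is a hypothetical name and the sample text is invented for illustration.

```python
def guess_title(text: str, min_length: int = 20, max_length: int = 100) -> str:
    """Return the first substantial, non-numeric line, truncated to max_length."""
    for line in text.split("\n"):
        candidate = line.strip()
        if len(candidate) > min_length and not candidate.isdigit():
            return candidate[:max_length]
    return "Unknown Title"


# Invented sample: a page number and a short journal header precede the title
sample = "12\nChem. Sci.\nA data-driven approach to reaction discovery\nJane Doe"
print(guess_title(sample))  # → A data-driven approach to reaction discovery
```

The rule is only a heuristic: a running header or license line longer than 20 characters would be picked up before the real title, so the extracted titles are worth a quick manual check.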

In [None]:
# Simplified GraphRAG class for Colab testing
class ColabGraphRAG:
    """Simplified GraphRAG for Colab testing with real papers"""
    
    def __init__(self, llm_model: str = "llama3.1:8b", embedding_model: str = "nomic-embed-text"):
        
        # Initialize components
        self.llm = ChatOllama(
            model=llm_model,
            temperature=0.1,
            num_ctx=32768
        )
        
        self.embeddings = OllamaEmbeddings(model=embedding_model)
        self.enhanced_kg = ColabEnhancedKnowledgeGraph(llm_model)
        self.paper_loader = ColabPaperLoader()
        
        # Initialize vector store
        self.vector_store = Chroma(
            embedding_function=self.embeddings,
            persist_directory="/tmp/chroma_colab_test"
        )
        
        self.papers = {}  # Store processed papers
        
        logger.info("🕸️ Colab GraphRAG initialized")
    
    def process_paper(self, pdf_path: str, paper_id: Optional[str] = None) -> Optional[Dict]:
        """Process a single paper with enhanced extraction; returns None if loading fails"""
        
        if not paper_id:
            paper_id = f"paper_{len(self.papers) + 1}"
        
        # Load paper content
        paper_data = self.paper_loader.load_paper(pdf_path)
        if not paper_data:
            return None
        
        logger.info(f"🔍 Processing paper: {paper_data['title'][:60]}...")
        logger.info("🚀 Using enhanced entity extraction for richer knowledge graph...")
        
        # Enhanced entity extraction
        enhanced_result = self.enhanced_kg.extract_comprehensive_entities(
            paper_data['content'], 
            paper_data['title']
        )
        
        entities = enhanced_result['entities']
        total_entities = sum(len(entity_list) for entity_list in entities.values())
        
        logger.info(f"📈 Enhanced extraction: {total_entities} entities from {enhanced_result['graph_stats']['sections_processed']} sections")
        
        # Create documents for vector store
        documents = self._create_documents_with_metadata(
            paper_data['content'], paper_data['title'], paper_id, entities
        )
        
        # Add to vector store
        document_ids = self.vector_store.add_documents(documents)
        
        # Store paper info
        self.papers[paper_id] = {
            'paper_data': paper_data,
            'entities': entities,
            'graph_stats': enhanced_result['graph_stats'],
            'relationships': enhanced_result['relationships'],
            'document_ids': document_ids
        }
        
        result = {
            'paper_id': paper_id,
            'entities': entities,
            'documents_added': len(documents),
            'document_ids': document_ids,
            'total_papers': len(self.papers),
            'graph_stats': enhanced_result['graph_stats'],
            'relationships': enhanced_result['relationships']
        }
        
        logger.info(f"✅ Added paper to graph: {len(documents)} documents")
        return result
    
    def _create_documents_with_metadata(self, content: str, title: str, paper_id: str, entities: Dict) -> List[Document]:
        """Create documents with metadata for vector store"""
        
        text_splitter = RecursiveCharacterTextSplitter(
            chunk_size=1000,
            chunk_overlap=200
        )
        
        chunks = text_splitter.split_text(content)
        
        documents = []
        for i, chunk in enumerate(chunks):
            metadata = {
                'paper_id': paper_id,
                'paper_title': title,
                'chunk_id': f"{paper_id}_chunk_{i}",
                'chunk_index': i,
                'total_chunks': len(chunks),
                # Entity metadata for graph traversal
                'authors': json.dumps(entities.get('authors', [])),
                'institutions': json.dumps(entities.get('institutions', [])),
                'methods': json.dumps(entities.get('methods', [])),
                'concepts': json.dumps(entities.get('concepts', [])),
                'technologies': json.dumps(entities.get('technologies', [])),
                'datasets': json.dumps(entities.get('datasets', []))
            }
            
            doc = Document(page_content=chunk, metadata=metadata)
            documents.append(doc)
        
        return documents
    
    def get_summary(self) -> Dict:
        """Get summary of processed papers"""
        
        total_entities = 0
        total_documents = 0
        
        for paper_info in self.papers.values():
            entities = paper_info['entities']
            total_entities += sum(len(entity_list) for entity_list in entities.values())
            # Count the chunks actually added to the vector store, not the extraction sections
            total_documents += len(paper_info['document_ids'])
        
        return {
            'total_papers': len(self.papers),
            'total_entities': total_entities,
            'total_documents': total_documents,
            'papers': list(self.papers.keys())
        }

print("✅ Colab GraphRAG class ready!")
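
`_create_documents_with_metadata` stores each entity list as a JSON string, presumably because the vector store expects scalar metadata values. Reading those fields back therefore needs a decoding step. The following is a minimal sketch under that assumption; `ENTITY_KEYS` and `decode_entity_metadata` are hypothetical names.

```python
import json
from typing import Any, Dict, List

# The six entity fields written into chunk metadata by the class above
ENTITY_KEYS = ("authors", "institutions", "methods",
               "concepts", "technologies", "datasets")


def decode_entity_metadata(metadata: Dict[str, Any]) -> Dict[str, List[str]]:
    """Turn JSON-encoded entity fields from chunk metadata back into lists."""
    decoded = {}
    for key in ENTITY_KEYS:
        raw = metadata.get(key, "[]")
        try:
            value = json.loads(raw)
        except (TypeError, json.JSONDecodeError):
            value = []  # Missing or malformed fields decode to an empty list
        decoded[key] = value if isinstance(value, list) else []
    return decoded
```

A retrieved chunk's `metadata` dictionary can be passed straight to this helper to recover the lists for filtering or graph traversal.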

## 🚀 Step 5: Process Real Research Papers

In [None]:
# Initialize the GraphRAG system
print("🔧 Initializing Colab GraphRAG system...")
graph_rag = ColabGraphRAG()

print(f"\n📊 Ready to process {len(paper_paths)} research papers")
print("⏱️ Estimated time: 20-30 minutes per paper with enhanced extraction")
print("🎯 Expected output: 50-100+ entities per paper with relationships\n")

# Process each paper
processing_results = []

for i, paper_path in enumerate(paper_paths, 1):
    print(f"\n{'='*60}")
    print(f"📄 Processing Paper {i}/{len(paper_paths)}: {os.path.basename(paper_path)}")
    print(f"{'='*60}")
    
    start_time = time.time()
    
    try:
        # Process the paper
        result = graph_rag.process_paper(paper_path, f"paper_{i}")
        
        if result:
            processing_time = time.time() - start_time
            processing_results.append(result)
            
            print(f"\n✅ Paper {i} processed successfully!")
            print(f"⏱️ Processing time: {processing_time/60:.1f} minutes")
            print(f"📊 Entities extracted: {sum(len(entities) for entities in result['entities'].values())}")
            print(f"📝 Documents created: {result['documents_added']}")
            print(f"🕸️ Graph nodes: {result['graph_stats']['nodes']}")
            print(f"🔗 Graph edges: {result['graph_stats']['edges']}")
            
        else:
            print(f"❌ Failed to process paper {i}")
            
    except Exception as e:
        print(f"❌ Error processing paper {i}: {e}")
        continue

print(f"\n\n🎉 Processing Complete!")
print(f"✅ Successfully processed {len(processing_results)}/{len(paper_paths)} papers")

## 📊 Step 6: Analyze Results

In [None]:
# Get overall summary
summary = graph_rag.get_summary()

print("📊 Knowledge Graph Analysis Summary")
print("=" * 50)
print(f"📄 Papers processed: {summary['total_papers']}")
print(f"🏷️ Total entities extracted: {summary['total_entities']}")
print(f"📝 Total document chunks: {summary['total_documents']}")
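# Guard against division by zero when no paper was processed
avg_per_paper = summary['total_entities'] / summary['total_papers'] if summary['total_papers'] else 0.0
print(f"📈 Average entities per paper: {avg_per_paper:.1f}")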

print("\n📋 Detailed Results by Paper:")
print("=" * 50)

for i, result in enumerate(processing_results, 1):
    paper_info = graph_rag.papers[result['paper_id']]
    entities = result['entities']
    
    print(f"\n📄 Paper {i}: {paper_info['paper_data']['title'][:60]}...")
    print(f"   📊 Total entities: {sum(len(entity_list) for entity_list in entities.values())}")
    print(f"   📝 Document chunks: {result['documents_added']}")
    print(f"   🕸️ Graph stats: {result['graph_stats']['nodes']} nodes, {result['graph_stats']['edges']} edges")
    
    # Show entity breakdown
    print(f"   🏷️ Entity categories:")
    for category, entity_list in entities.items():
        if entity_list:
            print(f"      • {category}: {len(entity_list)} items")
    
    # Show some example entities
    print(f"   📝 Example entities:")
    for category, entity_list in entities.items():
        if entity_list:
            examples = entity_list[:3]
            print(f"      • {category}: {', '.join(examples)}")
            break  # Just show one category as example

## 🎨 Step 7: Visualize Knowledge Graphs

In [None]:
# Simple visualization of extracted entities
import matplotlib.pyplot as plt
import numpy as np

if processing_results:
    print("🎨 Creating knowledge graph visualizations...")
    
    # Prepare data for visualization
    categories = ['authors', 'institutions', 'methods', 'concepts', 'technologies', 'datasets']
    
    # Create entity count comparison
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 6))
    
    # Plot 1: Entity counts by category across all papers
    category_totals = {cat: 0 for cat in categories}
    
    for result in processing_results:
        entities = result['entities']
        for cat in categories:
            if cat in entities:
                category_totals[cat] += len(entities[cat])
    
    cats = list(category_totals.keys())
    counts = list(category_totals.values())
    
    bars1 = ax1.bar(cats, counts, color='skyblue', alpha=0.7)
    ax1.set_title('Total Entities by Category (All Papers)', fontsize=14, fontweight='bold')
    ax1.set_ylabel('Number of Entities')
    ax1.tick_params(axis='x', rotation=45)
    
    # Add value labels on bars
    for bar in bars1:
        height = bar.get_height()
        ax1.text(bar.get_x() + bar.get_width()/2., height + 0.1,
                f'{int(height)}', ha='center', va='bottom')
    
    # Plot 2: Entity counts per paper
    paper_names = [f"Paper {i+1}" for i in range(len(processing_results))]
    entity_counts = [sum(len(entities) for entities in result['entities'].values()) 
                    for result in processing_results]
    
    bars2 = ax2.bar(paper_names, entity_counts, color='lightcoral', alpha=0.7)
    ax2.set_title('Total Entities per Paper', fontsize=14, fontweight='bold')
    ax2.set_ylabel('Number of Entities')
    
    # Add value labels on bars
    for bar in bars2:
        height = bar.get_height()
        ax2.text(bar.get_x() + bar.get_width()/2., height + 0.5,
                f'{int(height)}', ha='center', va='bottom')
    
    plt.tight_layout()
    plt.show()
    
    # Summary statistics
    total_entities = sum(entity_counts)
    avg_entities = total_entities / len(entity_counts) if entity_counts else 0
    
    print(f"\n📊 Visualization Summary:")
    print(f"   📈 Total entities across all papers: {total_entities}")
    print(f"   📊 Average entities per paper: {avg_entities:.1f}")
    print(f"   🎯 Enhanced extraction successful: {total_entities > len(processing_results) * 20}")
    
else:
    print("❌ No processing results to visualize")
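
The charts above show counts only. As a possible next step toward an actual graph view, the sketch below turns one paper's `entities` and `relationships` into labeled edge triples. It assumes the dictionary shapes returned by `extract_comprehensive_entities`; the function name `build_edge_list`, the `category:`/`entity:` prefixes, and the sample data are all hypothetical.

```python
from typing import Dict, List, Tuple


def build_edge_list(entities: Dict[str, List[str]],
                    relationships: List[Dict]) -> List[Tuple[str, str, str]]:
    """Return (source, target, label) triples for a simple two-level graph."""
    edges = []
    # Link each entity to the category it was extracted under
    for category, names in entities.items():
        for name in names:
            edges.append((f"category:{category}", f"entity:{name}", "contains"))
    # Keep the category-to-category links produced during extraction
    for rel in relationships:
        edges.append((f"category:{rel['source']}",
                      f"category:{rel['target']}",
                      rel["type"]))
    return edges


# Invented sample data in the same shape as a processing result
sample_entities = {"methods": ["active learning"], "datasets": ["QM9"], "tools": []}
sample_relationships = [
    {"source": "methods", "target": "datasets", "type": "methods_to_datasets", "strength": 1}
]
for edge in build_edge_list(sample_entities, sample_relationships):
    print(edge)
```

Each triple can then be added to a `networkx` graph with `graph.add_edge(source, target, label=label)` for layout and drawing.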

## 📦 Step 8: Export Results for Local Use

In [None]:
if processing_results:
    print("📦 Preparing results for download...")
    
    # Create export package
    export_data = {
        'metadata': {
            'created_at': time.strftime('%Y-%m-%d %H:%M:%S'),
            'total_papers': len(processing_results),
            'total_entities': sum(sum(len(entities) for entities in result['entities'].values()) 
                                 for result in processing_results),
            'processing_environment': 'Google Colab',
            'models_used': {
                'llm': 'llama3.1:8b',
                'embeddings': 'nomic-embed-text'
            }
        },
        'papers': {},
        'summary': graph_rag.get_summary()
    }
    
    # Export each paper's data
    for result in processing_results:
        paper_id = result['paper_id']
        paper_info = graph_rag.papers[paper_id]
        
        export_data['papers'][paper_id] = {
            'title': paper_info['paper_data']['title'],
            'entities': result['entities'],
            'graph_stats': result['graph_stats'],
            'relationships': result['relationships'],
            'char_count': paper_info['paper_data']['char_count'],
            'chunk_count': paper_info['paper_data']['chunk_count']
        }
    
    # Save to JSON file
    export_filename = f"colab_knowledge_graphs_{int(time.time())}.json"
    
    with open(export_filename, 'w') as f:
        json.dump(export_data, f, indent=2)
    
    print(f"✅ Export package created: {export_filename}")
    print(f"📊 Package contains:")
    print(f"   📄 {export_data['metadata']['total_papers']} processed papers")
    print(f"   🏷️ {export_data['metadata']['total_entities']} extracted entities")
    print(f"   📝 Rich metadata and relationships")
    
    if IN_COLAB:
        print("\n📥 Downloading results...")
        files.download(export_filename)
        print("✅ Download complete!")
        
        print("\n🏠 To use locally:")
        print("   1. Download the JSON file")
        print("   2. Load into your local MCP server")
        print("   3. Use with Claude Max for literature review writing")
    
    else:
        print(f"\n💾 Results saved locally: {export_filename}")
        print("   Ready for MCP server integration")

else:
    print("❌ No results to export")
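
Once the JSON file is downloaded, it can be inspected locally without any of the Colab dependencies. This sketch assumes the `export_data` layout written by the cell above (a top-level `papers` mapping whose values contain an `entities` dictionary); `summarize_export` is a hypothetical helper name.

```python
import json
from typing import Dict


def summarize_export(path: str) -> Dict[str, int]:
    """Return the number of extracted entities per paper in an exported JSON file."""
    with open(path, "r", encoding="utf-8") as f:
        export_data = json.load(f)
    counts = {}
    for paper_id, paper in export_data.get("papers", {}).items():
        entities = paper.get("entities", {})
        counts[paper_id] = sum(len(items) for items in entities.values())
    return counts
```

Calling `summarize_export` on the downloaded file gives a quick sanity check that the per-paper counts match what the notebook printed.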

## 🎉 Results Summary

**Congratulations!** You've successfully tested enhanced knowledge graph extraction with real research papers in Google Colab.

### ✅ What You Accomplished:
- **Processed real PDF research papers** using Colab's free T4 GPU
- **Extracted comprehensive entities** with 50-100+ entities per paper
- **Built knowledge graphs** with relationships and metadata
- **Created portable results** ready for local MCP server use

### 🚀 Next Steps:
1. **Download your results** (JSON file with all extracted knowledge graphs)
2. **Use locally** with the MCP server for literature review writing
3. **Scale up** by processing larger paper collections
4. **Integrate with Claude Max** for citation-accurate literature synthesis

### 🎯 Key Benefits Demonstrated:
- **Free GPU access** to enhanced extraction capabilities
- **Professional quality** entity extraction from real academic papers
- **Portable knowledge graphs** that work across different environments
- **Scalable workflow** for building research corpora

This test validates the Colab → Local workflow for democratizing access to advanced literature review capabilities!