# 📚 Enhanced Literature Review System

**Your intelligent AI research assistant for comprehensive paper analysis and literature review preparation**

**Enhanced Features (v2.1.0):**
- 🔬 **Enhanced Paper Analysis**: Rich metadata extraction with domain classification
- 📍 **Citation Tracking**: Precise location mapping and verification
- 🕸️ **Knowledge Graph**: Entity extraction and relationship mapping
- 📊 **GraphRAG Compatible**: Export papers for cross-document literature reviews
- 💬 **Intelligent Chat**: Natural language interaction with papers
- 🧠 **Local Privacy**: Complete analysis using Ollama - no external APIs

**Perfect for:** Literature reviews, research synthesis, paper understanding, citation management

## 🛠️ Setup and Imports

In [None]:
import sys
import os
import json
from pathlib import Path

# Add the parent directory to the import path so the `src` package resolves
# (assumes the notebook is run from its own folder)
sys.path.append('..')

# Import enhanced literature review system
from src import (
    # Basic analysis
    analyze_paper_with_chat,
    UnifiedPaperChat,
    
    # Enhanced analysis for literature reviews
    analyze_paper_for_corpus,
    export_paper_for_corpus,
    track_citations_in_paper,
    verify_citation_accuracy,
    
    # Core components
    EnhancedPaperAnalyzer,
    CitationTracker
)

print("✅ Enhanced Literature Review System v2.1.0 loaded!")
print("🔧 Ensure Ollama is running with: llama3.1:8b and nomic-embed-text")
print("📚 Ready for comprehensive paper analysis and literature review preparation")
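
Before loading a paper, it can help to confirm that the two models named above are actually installed. The sketch below is not part of the `src` package: the helper names are hypothetical, and it assumes Ollama is serving on its default local address (`http://localhost:11434`) and exposes its model list at `/api/tags`.

```python
import json
import urllib.request

# Models the setup cell asks for.
REQUIRED_MODELS = ["llama3.1:8b", "nomic-embed-text"]


def missing_models(installed, required=REQUIRED_MODELS):
    """Return the required models that do not appear in `installed`."""
    def matches(req, name):
        # A requirement without a tag (e.g. "nomic-embed-text") matches any tag.
        return name == req or (":" not in req and name.split(":")[0] == req)

    return [req for req in required
            if not any(matches(req, name) for name in installed)]


def installed_ollama_models(base_url="http://localhost:11434"):
    """Ask the local Ollama server which models are installed (assumed address)."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
        payload = json.load(resp)
    return [model["name"] for model in payload.get("models", [])]
```

With Ollama running, `missing_models(installed_ollama_models())` should return an empty list. A connection error (an `OSError` subclass) suggests the server is not reachable at the assumed address.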

## 📄 Load Paper for Analysis

Choose your analysis approach:
- **Basic Chat**: Simple paper interaction
- **Enhanced Analysis**: Rich metadata for literature reviews
- **Corpus Export**: GraphRAG-ready format for multi-paper analysis

In [None]:
# 📁 Set path to your paper
pdf_path = "../examples/d4sc03921a.pdf"  # Example paper
# pdf_path = "/path/to/your/paper.pdf"    # 👈 Change this to your paper

print(f"📚 Loading paper for enhanced analysis: {Path(pdf_path).name}")
print("⏳ This takes 60-90 seconds (enhanced extraction + citations + entities)...")
print()

# Load with basic chat functionality
chat_system = analyze_paper_with_chat(pdf_path)

print("\n🎉 Paper loaded successfully!")
print("💬 Basic chat system ready")
print("📊 Enhanced analysis capabilities available")

## 🔬 Enhanced Paper Analysis

Extract comprehensive metadata from the paper, including domain classification, research type, and section-level analysis.

In [None]:
# 🔬 Run enhanced analysis for literature review preparation
print("🔬 Running Enhanced Paper Analysis...")
print("⏳ Extracting metadata, domain classification, citations...")

enhanced_doc = analyze_paper_for_corpus(pdf_path)

# Display enhanced metadata
metadata = enhanced_doc['metadata']

print("\n📊 ENHANCED PAPER ANALYSIS")
print("=" * 50)
print(f"📰 Title: {metadata['title']}")
print(f"👥 Authors: {', '.join(metadata['authors'])}")
print(f"📅 Year: {metadata['year']}")
print(f"🏛️ Domain: {metadata['domain']}")
print(f"🔬 Research Type: {metadata['research_type']}")
print(f"📊 Methodology: {metadata['methodology']}")

print(f"\n📈 CONTENT ANALYSIS:")
print(f"📝 Word Count: {metadata['word_count']:,}")
print(f"📑 Sections: {metadata['section_count']}")
print(f"📎 Citations: {metadata['citation_count']}")
print(f"📋 Abstract: {'✅ Found' if metadata['has_abstract'] else '❌ Missing'} ({len(metadata['abstract'] or '')} chars)")
print(f"🔖 Keywords: {metadata['keywords'][:5] if metadata['keywords'] else 'None found'}")

print(f"\n🕸️ EXTRACTED ENTITIES:")
entities = enhanced_doc['full_analysis']['entities']
for category, items in entities.items():
    if items and isinstance(items, list):
        print(f"  {category.title()}: {', '.join(items[:3])}{'...' if len(items) > 3 else ''}")
    elif items and isinstance(items, str):
        print(f"  {category.title()}: {items}")

print(f"\n📋 DOCUMENT ID: {enhanced_doc['document_id']}")

## 📍 Citation Tracking & Analysis

Locate each inline citation, parse the reference list, and capture the surrounding context to support literature review writing.

In [None]:
# 📍 Run citation tracking analysis
print("📍 Analyzing Citations & References...")

citations = track_citations_in_paper(
    enhanced_doc['content'], 
    enhanced_doc['metadata']
)

print("\n📎 CITATION ANALYSIS")
print("=" * 40)
print(f"📄 Paper: {citations['paper_info']['title'][:60]}...")
print(f"📊 Inline Citations: {len(citations['inline_citations'])}")
print(f"📚 Reference List: {len(citations['reference_list'])}")
print(f"📈 Citation Density: {citations['citation_density']['citations_per_1000_words']:.1f} per 1000 words")

if citations['inline_citations']:
    print(f"\n🔍 SAMPLE INLINE CITATIONS:")
    for i, cite in enumerate(citations['inline_citations'][:3]):
        print(f"  {i+1}. '{cite['text']}' (Line {cite['line_number']}, {cite['section']})")
        print(f"     Context: ...{cite['context']['sentence'][:100]}...")

if citations['reference_list']:
    print(f"\n📚 SAMPLE REFERENCES:")
    for i, ref in enumerate(citations['reference_list'][:3]):
        print(f"  [{ref['number']}] {ref['authors'][:2] if ref['authors'] else 'Unknown'}")
        print(f"      Title: {ref['title'][:60]}..." if ref['title'] else "      Title: Not extracted")
        print(f"      Year: {ref['year']}" if ref['year'] else "      Year: Unknown")

if citations['key_claims']:
    print(f"\n💡 KEY CLAIMS WITH CITATIONS:")
    for i, claim in enumerate(citations['key_claims'][:2]):
        print(f"  {i+1}. [{claim['type'].title()}] {claim['claim'][:80]}...")
        print(f"     Section: {claim['section']}")
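
To make the numbers above concrete, here is a minimal sketch of how bracketed numeric citations and a citations-per-1000-words figure could be computed. It is an illustration only, not the implementation behind `track_citations_in_paper`: the regex, the helper names, and the choice to count citation markers (rather than individual reference numbers) are all assumptions.

```python
import re

# Matches bracketed numeric citations such as [3], [1,2] or [4-6].
CITATION_RE = re.compile(r"\[(\d+(?:\s*[-,]\s*\d+)*)\]")


def expand_numbers(body):
    """Expand '1,2' to [1, 2] and '4-6' to [4, 5, 6]."""
    numbers = []
    for part in body.split(","):
        bounds = [int(piece) for piece in part.split("-")]
        numbers.extend(range(bounds[0], bounds[-1] + 1))
    return numbers


def find_inline_citations(text):
    """Return one dict per citation marker, with its 1-based line number."""
    found = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for match in CITATION_RE.finditer(line):
            found.append({
                "text": match.group(0),
                "line_number": line_number,
                "numbers": expand_numbers(match.group(1)),
            })
    return found


def citations_per_1000_words(text):
    """Citation markers per 1000 whitespace-separated words."""
    words = len(text.split())
    return 1000 * len(find_inline_citations(text)) / words if words else 0.0
```

Author-year styles such as "(Smith et al., 2020)" are outside the scope of this sketch and would need a separate pattern.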

## 📊 Corpus Export for Literature Reviews

Export the paper in GraphRAG-compatible format for cross-paper analysis and literature review systems.

In [None]:
# 📊 Export paper for corpus and literature review system
print("📊 Exporting for Literature Review Corpus...")
print("⏳ This includes chat capabilities + citation tracking...")

# Note: This may take longer as it runs full analysis
try:
    corpus_doc = export_paper_for_corpus(pdf_path)
    
    print("\n📦 CORPUS EXPORT COMPLETE")
    print("=" * 40)
    print(f"📋 Document ID: {corpus_doc['document_id']}")
    print(f"📰 Title: {corpus_doc['metadata']['title'][:50]}...")
    print(f"🏛️ Domain: {corpus_doc['metadata']['domain']}")
    print(f"🔗 GraphRAG Edges Available:")
    print(f"   • Authors: {len(corpus_doc['metadata']['authors'])}")
    print(f"   • Methods: {len(corpus_doc['metadata']['methods'])}")
    print(f"   • Concepts: {len(corpus_doc['metadata']['concepts'])}")
    print(f"   • Institutions: {len(corpus_doc['metadata']['institutions'])}")
    
    print(f"\n📍 Citation Tracking:")
    print(f"   • Inline Citations: {len(corpus_doc['citation_tracking']['inline_citations'])}")
    print(f"   • Reference List: {len(corpus_doc['citation_tracking']['reference_list'])}")
    print(f"   • Section Mapping: {len(corpus_doc['citation_tracking']['sections'])} sections")
    
    print(f"\n🚀 Capabilities: {', '.join(corpus_doc['corpus_metadata']['capabilities'])}")
    print(f"📅 Export Date: {corpus_doc['corpus_metadata']['export_date']}")
    print(f"🔧 Analysis Version: {corpus_doc['corpus_metadata']['analysis_version']}")
    
    print("\n✅ Paper ready for GraphRAG and literature review systems!")
    
except Exception as e:
    # The full export is the slowest step; a failure here does not invalidate enhanced_doc
    print(f"⚠️ Corpus export timed out or failed: {e}")
    print("💡 The enhanced_doc from the analysis cell above is sufficient for most uses")
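
Because the export is slow, it is worth writing the result to disk once it succeeds. The helper below is a sketch under stated assumptions: `save_corpus_doc` and the `corpus_exports` folder are hypothetical names, and the export is assumed to be a plain dict with a `document_id` key (as printed above). Values that JSON cannot encode natively, such as dates, are stringified.

```python
import json
import re
from pathlib import Path


def save_corpus_doc(doc, out_dir="corpus_exports"):
    """Write one exported document to <out_dir>/<document_id>.json and return the path."""
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)

    # Replace characters that are unsafe in file names.
    safe_id = re.sub(r"[^\w.-]", "_", str(doc["document_id"]))
    target = out_path / f"{safe_id}.json"

    with target.open("w", encoding="utf-8") as fh:
        # default=str is a fallback for values JSON cannot encode natively.
        json.dump(doc, fh, ensure_ascii=False, indent=2, default=str)
    return target
```

After a successful export, `save_corpus_doc(corpus_doc)` would leave a JSON file that can be reloaded later with `json.load`.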

## 💬 Intelligent Chat Interface

Ask questions about your paper. The system routes each question to the RAG pipeline, the Knowledge Graph, or both, depending on the question type.

In [None]:
# 💬 Get suggested questions based on enhanced analysis
suggestions = chat_system.suggest_questions()

print("💡 INTELLIGENT QUESTION SUGGESTIONS")
print("=" * 45)
print("Based on your paper's content and entities:")
for i, suggestion in enumerate(suggestions[:8], 1):
    print(f"  {i}. {suggestion}")

print("\n💬 Use these questions in the chat cells below!")
print("🔀 The system automatically routes to RAG or Knowledge Graph based on question type")
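
The routing logic itself lives inside `chat_system` and is not shown in this notebook. As a rough mental model only, a keyword heuristic like the one below would send entity-style questions to the graph and content-style questions to RAG. The hint lists, the `guess_mode` name, and the `'rag'` label are assumptions; `'graph'` and `'both'` are the mode names the chat cells check for.

```python
import re

# Hypothetical hint words; the real router may use entirely different signals.
GRAPH_HINTS = {"who", "author", "authors", "method", "methods",
               "institution", "institutions", "connected", "related"}
CONTENT_HINTS = {"findings", "results", "conclusion", "conclusions",
                 "why", "how", "explain", "summarize"}


def guess_mode(question):
    """Guess a routing mode ('rag', 'graph' or 'both') from the question's words."""
    words = set(re.findall(r"[a-z]+", question.lower()))
    wants_graph = bool(words & GRAPH_HINTS)
    wants_content = bool(words & CONTENT_HINTS)
    if wants_graph and wants_content:
        return "both"
    if wants_graph:
        return "graph"
    # Default to content retrieval when nothing entity-specific is asked.
    return "rag"
```

Comparing `guess_mode(question)` with the `mode` field returned by `chat_system.chat(question)` is a quick way to see where the real router disagrees with this toy version.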

In [None]:
# 💬 CHAT: Content Analysis (RAG Mode)
question = "What are the main findings and results of this research?"

print(f"❓ Question: {question}")
print(f"🎯 Expected Mode: RAG (content analysis)")
print("\n🤖 Answer:")

response = chat_system.chat(question)
print(response['answer'])

print(f"\n📊 Routing: {response.get('mode', 'unknown')} | Source: {response.get('source', 'mixed')}")

In [None]:
# 💬 CHAT: Entity Exploration (Graph Mode)
question = "Who are the authors and what methods did they use?"

print(f"❓ Question: {question}")
print(f"🎯 Expected Mode: Graph (entity-focused)")
print("\n🤖 Answer:")

response = chat_system.chat(question)
print(response['answer'])

if 'entities' in response:
    print(f"\n🔍 Related Entities: {response['entities']}")

print(f"\n📊 Routing: {response.get('mode', 'unknown')} | Source: {response.get('source', 'mixed')}")

In [None]:
# 💬 CHAT: Combined Analysis (Both RAG + Graph)
question = "How do the concepts and methods relate to the overall findings?"

print(f"❓ Question: {question}")
print(f"🎯 Expected Mode: Both (complex analysis)")
print("\n🤖 Answer:")

response = chat_system.chat(question)
print(response['answer'])

print(f"\n📊 Routing: {response.get('mode', 'unknown')} | Source: {response.get('rag_source', 'mixed')}")
if 'graph_context' in response:
    print(f"🕸️ Graph Context: {response['graph_context']['total_nodes']} entities used")

In [None]:
# 💬 CUSTOM CHAT: Your Question
question = "YOUR QUESTION HERE"  # 👈 Write your own question

print(f"❓ Your Question: {question}")
print("\n🤖 AI Assistant:")

response = chat_system.chat(question)
print(response.get('answer', response.get('error', 'No response')))

if 'answer' in response:
    print(f"\n📊 Routing: {response.get('mode', 'unknown')} | Source: {response.get('source', 'mixed')}")
    if response.get('mode') == 'graph' and 'connections' in response:
        print(f"🔗 Entity Connections Found: {len(response['connections'])}")
    elif response.get('mode') == 'both' and 'graph_context' in response:
        print(f"🕸️ Combined Analysis: RAG + {response['graph_context']['total_nodes']} entities")

## 🕸️ Knowledge Graph Deep Dive

Explore entities, relationships, and discover connections in your paper.

In [None]:
# 🕸️ Knowledge Graph Overview
graph_summary = chat_system.kg.get_graph_summary()

if 'error' not in graph_summary:
    print("🕸️ KNOWLEDGE GRAPH ANALYSIS")
    print("=" * 35)
    print(f"📊 Total Entities: {graph_summary['total_nodes']}")
    print(f"🔗 Total Relationships: {graph_summary['total_edges']}")
    print(f"🎯 Graph Density: {graph_summary['graph_density']:.3f}")
    
    print("\n📁 ENTITIES BY CATEGORY:")
    for category, count in graph_summary['nodes_by_category'].items():
        print(f"  • {category.title()}: {count}")
    
    print("\n⭐ MOST CONNECTED ENTITIES:")
    for item in graph_summary['most_connected'][:5]:
        print(f"  • {item['node']} (connection score: {item['connections']:.2f})")
else:
    print(f"❌ Graph summary not available: {graph_summary['error']}")
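
The graph density figure is easier to interpret with the standard definition in mind: the fraction of possible edges that are actually present. The sketch below uses the usual formula. Whether `get_graph_summary()` treats the graph as directed is not stated in this notebook, so `directed` is a parameter here and the function name is illustrative.

```python
def graph_density(num_nodes, num_edges, directed=True):
    """Fraction of possible edges present: 0.0 means no edges, 1.0 means fully connected."""
    if num_nodes < 2:
        return 0.0  # no edges are possible with fewer than two nodes

    # n * (n - 1) ordered pairs for a directed graph, half that when undirected.
    possible_edges = num_nodes * (num_nodes - 1)
    if not directed:
        possible_edges //= 2
    return num_edges / possible_edges
```

For example, 5 entities joined by 4 directed relationships give a density of 0.2. Knowledge graphs extracted from a single paper are typically sparse, so low values are expected.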

In [None]:
# 🔍 Entity Deep Dive - Pick an interesting entity to explore
entities = chat_system.get_entities()

# Pick the first entity from the first non-empty category
entity_to_explore = None
for category, items in entities.items():
    if items and isinstance(items, list):
        entity_to_explore = items[0]
        break

if entity_to_explore:
    print(f"🔍 DEEP DIVE: '{entity_to_explore}'")
    print("=" * 50)
    
    exploration = chat_system.explore_entity(entity_to_explore)
    
    if 'error' not in exploration:
        print(f"📊 Entity: {exploration['node']}")
        print(f"📁 Category: {exploration['category']}")
        print(f"🔗 Direct Connections: {exploration['total_connections']}")
        
        if exploration['connections']:
            print("\n🕸️ RELATIONSHIPS:")
            for conn in exploration['connections'][:5]:  # Show top 5
                print(f"  • {conn['relationship']} → {conn['target']} ({conn['category']})")
            
            if len(exploration['connections']) > 5:
                print(f"  ... and {len(exploration['connections']) - 5} more connections")
        
        if 'rag_context' in exploration:
            print(f"\n💬 CONTEXT FROM PAPER:")
            context = exploration['rag_context']
            print(f"  {context[:200]}{'...' if len(context) > 200 else ''}")
    else:
        print(f"❌ Could not explore entity: {exploration['error']}")
else:
    print("❌ No entities found to explore")
    print("💡 Try loading a paper with more structured content")

## 📝 Literature Review Applications

See how this analysis supports literature review writing with citation verification.

In [None]:
# 📝 Literature Review Example: Citation Verification
print("📝 LITERATURE REVIEW WRITING EXAMPLE")
print("=" * 45)

# Simulate a literature review sentence with citations
sample_review_text = """
Recent advances in machine learning have shown significant promise in chemistry applications [1,2].
The authors demonstrated improved accuracy in molecular property prediction [3],
achieving results that outperform traditional computational methods.
"""

print("📄 Sample Literature Review Text:")
print(sample_review_text)

# Create evidence sources from our paper
evidence_sources = [{
    'paper': enhanced_doc['metadata'],
    'citation': enhanced_doc['metadata']['title'],
    'content': enhanced_doc['content'][:1000]  # First 1000 chars as evidence
}]

# Verify citation accuracy
verification = verify_citation_accuracy(sample_review_text, evidence_sources)

print("\n🔍 CITATION VERIFICATION RESULTS:")
print(f"📊 Total Citations Found: {verification['total_citations']}")
print(f"✅ Verified Citations: {verification['verified_citations']}")
print(f"❌ Unverified Citations: {len(verification['unverified_citations'])}")
print(f"📈 Accuracy Score: {verification['accuracy_score']:.1%}")

if verification['unverified_citations']:
    print("\n⚠️ Citations needing sources:")
    for cite in verification['unverified_citations']:
        print(f"  • {cite['text']} (Line {cite['line_number']})")

print("\n💡 This shows how the system supports literature review writing with citation verification!")
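
The accuracy score is reported as a fraction of citations that could be verified. The sketch below shows that arithmetic in isolation. It is a simplification, not the logic of `verify_citation_accuracy`: here a citation counts as verified when its number appears in a set of known reference numbers, and both `score_citations` and `known_refs` are hypothetical names. The result keys mirror the ones printed above, except that `unverified_citations` holds plain numbers.

```python
import re


def score_citations(review_text, known_refs):
    """Count bracketed numeric citations and the share found in `known_refs`."""
    cited = []
    for body in re.findall(r"\[([\d,\s]+)\]", review_text):
        cited.extend(int(number) for number in re.findall(r"\d+", body))

    verified = [n for n in cited if n in known_refs]
    unverified = [n for n in cited if n not in known_refs]
    total = len(cited)
    return {
        "total_citations": total,
        "verified_citations": len(verified),
        "unverified_citations": unverified,
        "accuracy_score": len(verified) / total if total else 0.0,
    }
```

Applied to the sample text with `known_refs={1, 3}`, three citations are found, two are verified, and `[2]` is flagged as needing a source.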

## 📋 Paper Summary & Citation

Get a complete overview and properly formatted citation for reference.

In [None]:
# 📋 Complete Paper Summary
overview = chat_system.get_paper_overview()
paper_info = chat_system.rag.paper_data

print("📋 COMPLETE PAPER SUMMARY")
print("=" * 30)
print(f"📰 **Title:** {paper_info['title']}")
print(f"👥 **Authors:** {', '.join(enhanced_doc['metadata']['authors']) if enhanced_doc['metadata']['authors'] else 'Not extracted'}")
print(f"📅 **Year:** {enhanced_doc['metadata']['year'] or 'Not found'}")
print(f"🏛️ **Domain:** {enhanced_doc['metadata']['domain']}")
print(f"🔬 **Research Type:** {enhanced_doc['metadata']['research_type']}")

print(f"\n📊 **Analysis Stats:**")
print(f"  • Content: {overview['paper_info']['chunks']} chunks, {enhanced_doc['metadata']['word_count']:,} words")
print(f"  • Structure: {enhanced_doc['metadata']['section_count']} sections")
print(f"  • Citations: {enhanced_doc['metadata']['citation_count']} references")
print(f"  • Entities: {overview['knowledge_graph']['total_nodes']} extracted")
print(f"  • Relationships: {overview['knowledge_graph']['total_edges']} mapped")

print(f"\n📝 **Formatted Citation:**")
print(f"{paper_info['citation']}")

print(f"\n🔖 **Document ID:** {enhanced_doc['document_id']}")
print(f"📅 **Analysis Date:** {enhanced_doc['metadata']['processed_date'][:10]}")
print(f"🚀 **System Version:** 2.1.0 (Enhanced Literature Review)")

## 🚀 Next Steps: Literature Review System

**🎉 You've successfully analyzed a paper with the Enhanced Literature Review System!**

### **What You've Accomplished:**
- ✅ **Enhanced Paper Analysis** with rich metadata extraction
- ✅ **Citation Tracking** with precise location mapping
- ✅ **Knowledge Graph** construction and exploration
- ✅ **Intelligent Chat** with automatic routing
- ✅ **Corpus Export** for GraphRAG compatibility
- ✅ **Literature Review** preparation and citation verification

### **Literature Review Workflow:**

1. **📚 Multiple Papers**: Use this notebook to analyze several papers
2. **🔗 Cross-Paper Analysis**: Export each paper for corpus building
3. **🕸️ GraphRAG Integration**: Coming in Phase 2 for cross-document discovery
4. **📝 Automated Writing**: Future Phase 3 for literature review generation

### **Advanced Usage Tips:**

- **🔄 Batch Processing**: Run this notebook on multiple papers and collect the corpus exports
- **🎯 Domain Focus**: The system classifies papers by domain for thematic grouping
- **📍 Citation Management**: Use the citation tracking for precise source attribution
- **🔍 Entity Linking**: Similar entities across papers will enable cross-document discovery
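
The batch-processing tip can be sketched as a loop over a folder of PDFs. This is an illustration under assumptions: `export_folder` is a hypothetical helper, and the export function is passed in as an argument so that any callable taking a PDF path (for example `export_paper_for_corpus`) can be used. Failures are collected rather than raised, so one problematic paper does not stop the run.

```python
from pathlib import Path


def export_folder(pdf_dir, export_fn):
    """Run `export_fn` on every PDF in `pdf_dir`; return (results, failures) keyed by file name."""
    results, failures = {}, {}
    for pdf in sorted(Path(pdf_dir).glob("*.pdf")):
        try:
            results[pdf.name] = export_fn(str(pdf))
        except Exception as exc:
            # Record the error message and continue with the next paper.
            failures[pdf.name] = str(exc)
    return results, failures
```

A call such as `export_folder("../examples", export_paper_for_corpus)` would process every PDF in the examples folder. Each paper runs the full analysis, so expect the run time to scale with the number of papers.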

### **Coming Soon:**
- **📊 Corpus Manager**: Multi-paper database with GraphRAG
- **🔍 Literature Discovery**: Cross-paper entity and theme identification
- **✍️ Review Generator**: Automated literature review writing with citations

**🎯 Perfect for researchers, academics, and anyone conducting literature reviews!**

---

**💡 Pro Tip**: Save your corpus exports and enhanced analyses - they'll be compatible with the upcoming multi-paper GraphRAG system!