# Explanation of Key Components


**This version does not use Embeddings**


### Data Collection Pipeline:

Uses arXiv API to search for papers based on research queries               
Prioritizes HTML versions of papers (available for papers submitted since December 2023)  
Falls back to abstracts for older papers without HTML versions          
Caches paper information to avoid redundant processing            
Extracts clean text content using BeautifulSoup for HTML papers     
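
The HTML-first strategy with an abstract fallback and a cache can be sketched as follows. This is a minimal illustration, not the notebook's implementation: `fetch_html` is a hypothetical callable standing in for the network request, `resolve_content` and `cache` are names chosen for this example, and the paper IDs are made up.

```python
from typing import Callable, Dict, Optional


def resolve_content(
    paper_id: str,
    abstract: str,
    fetch_html: Callable[[str], Optional[str]],
    cache: Dict[str, str],
) -> str:
    """Return full text when an HTML version exists, otherwise the abstract."""
    # Serve repeated requests from the cache to avoid redundant processing
    if paper_id in cache:
        return cache[paper_id]

    # fetch_html is a hypothetical stand-in: it returns extracted text,
    # or None when no HTML version is available
    content = fetch_html(paper_id) or abstract
    cache[paper_id] = content
    return content


# Usage with a fake fetcher that only "knows" one (made-up) paper ID
fake_pages = {"2401.00001": "Full text extracted from HTML"}
cache: Dict[str, str] = {}
new_paper = resolve_content("2401.00001", "abstract A", fake_pages.get, cache)
old_paper = resolve_content("1706.00002", "abstract B", fake_pages.get, cache)
```

Passing the fetcher in as an argument keeps the fallback logic testable without touching the network.
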

### Query Enhancement:

Employs few-shot prompting with domain-specific examples        
Expands original queries to include related concepts and terminology         
Extracts key terms from expanded queries for more effective searching         
Adapts to specific research domains with customized examples          
Handles both general research and specialized topics (like Vision Transformers)    
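
A few-shot prompt of the kind described above can be assembled from example pairs. The sketch below is an assumption-laden illustration: `build_few_shot_prompt` is a hypothetical helper, and the shortened example expansions are placeholders rather than the notebook's full examples.

```python
from typing import List, Tuple


def build_few_shot_prompt(examples: List[Tuple[str, str]], query: str) -> str:
    """Assemble a few-shot prompt from (query, expansion) example pairs."""
    parts = ["Based on the examples below, expand my research query:", ""]
    for i, (example_query, expansion) in enumerate(examples, start=1):
        parts.append(f"Example {i}:")
        parts.append(f'Query: "{example_query}"')
        parts.append(f"Expanded: {expansion}")
        parts.append("")
    # The prompt ends with an open "Expanded:" slot for the model to complete
    parts.append(f'Query: "{query}"')
    parts.append("Expanded:")
    return "\n".join(parts)


# Placeholder examples, shortened for illustration
examples = [
    ("transformer models for NLP", "Key aspects include attention mechanisms and pre-training."),
    ("graph neural networks applications", "Key dimensions include GCN, GAT, and GraphSAGE."),
]
prompt = build_few_shot_prompt(examples, "diffusion models for images")
```

Keeping the examples in a list makes it straightforward to swap in a domain-specific set, as the notebook does for Vision Transformer queries.
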

### LangGraph RAG Workflow:

Implements a structured workflow with defined nodes and transitions       
Maintains comprehensive state throughout the research process          
Runs a four-stage pipeline: query expansion → paper search → analysis → response generation  
Each node updates specific parts of the state without losing information          
Handles the full research journey from question to comprehensive analysis      
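
The idea that each node returns only the keys it changes, while the rest of the state is preserved, can be illustrated with a plain-Python stand-in. This sketch deliberately does not use LangGraph; `run_pipeline`, `expand`, and `search` are hypothetical names, and the merge shown here only approximates the overwrite-by-key behaviour the workflow relies on.

```python
from typing import Any, Callable, Dict, List

State = Dict[str, Any]


def run_pipeline(state: State, nodes: List[Callable[[State], State]]) -> State:
    """Run nodes in order, merging each node's partial update into the state."""
    for node in nodes:
        update = node(state)
        # Only the keys a node returns are overwritten; all others are kept
        state = {**state, **update}
    return state


def expand(state: State) -> State:
    # Hypothetical node: returns just the key it is responsible for
    return {"expanded_query": state["query"] + " (expanded)"}


def search(state: State) -> State:
    # Hypothetical node: reads the previous node's output from the state
    return {"search_results": [f"paper about {state['expanded_query']}"]}


final_state = run_pipeline({"query": "vision transformers"}, [expand, search])
```

After the run, `final_state` still contains the original `query` alongside the keys added by each node.
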

### Paper Analysis Capabilities:

Generates in-depth analyses of retrieved papers using few-shot learning       
Identifies connections between papers and the research question  
Extracts key contributions, methodologies, and technical details           
Provides research context through carefully selected examples          
Synthesizes information across multiple papers for comprehensive understanding          

### User Interaction:

Provides formatted Markdown output for easy reading in notebooks      
Displays expanded query to show understanding of research needs         
Presents comprehensive research analysis with insights and connections          
Lists top papers with titles, authors, publication dates, and abstracts       
Includes direct links to original papers on arXiv          
Simple interface for entering research queries and viewing results          
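
A Markdown listing like the one described above can be produced from the paper dictionaries the pipeline builds (keys `title`, `authors`, `published`, `url`, `abstract`). The helper below is a hypothetical sketch, not the notebook's display function, and the sample record is a made-up placeholder.

```python
from typing import Any, Dict, List


def format_papers_markdown(
    papers: List[Dict[str, Any]], limit: int = 5, abstract_chars: int = 300
) -> str:
    """Render paper dictionaries as a Markdown listing."""
    lines = ["## Top Papers", ""]
    for i, paper in enumerate(papers[:limit], start=1):
        # The title doubles as a direct link to the paper
        lines.append(f"### {i}. [{paper['title']}]({paper['url']})")
        lines.append(f"**Authors:** {paper['authors']}")
        lines.append("")
        lines.append(f"**Published:** {paper['published']}")
        lines.append("")
        abstract = paper["abstract"]
        if len(abstract) > abstract_chars:
            # Truncate long abstracts and mark the cut
            abstract = abstract[:abstract_chars] + "..."
        lines.append(abstract)
        lines.append("")
    return "\n".join(lines)


# Placeholder record; every value below is made up for illustration
sample_papers = [{
    "title": "An Example Paper",
    "authors": "A. Author, B. Author",
    "published": "2024-01-15",
    "url": "https://arxiv.org/abs/0000.00000",
    "abstract": "x" * 400,
}]
markdown_text = format_papers_markdown(sample_papers)
```

In a notebook, `display(Markdown(markdown_text))` renders the result, using the `IPython.display` names imported in the configuration cell.
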

## Customizable Parameters in the Pipeline

Here are the key parameters to modify to experiment with results:

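**Model Parameters**  
Model Version: Change `model = genai.GenerativeModel('models/gemini-1.5-pro-latest')` to use a different Gemini model  
Temperature: Pass a `generation_config` when creating the model to control creativity (e.g., `model = genai.GenerativeModel('models/gemini-1.5-pro-latest', generation_config={"temperature": 0.2})`). Note that `expand_research_query()` and `analyze_papers()` pass their own `generation_config` with a temperature of 1.0 on each call  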

**Search Parameters**  
Max Results: Modify `max_results=20` in `arxiv_api_search()` to return more or fewer papers  
Sort Criteria: Change `sort_by=arxiv.SortCriterion.Relevance` to sort by another criterion such as `SubmittedDate` or `LastUpdatedDate`  
Category Filter: Customize the category filter logic in `arxiv_api_search()` (currently adds `cat:cs.CV` when the query mentions "vision" or "image")  
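
The notebook's `arxiv_api_search()` appends the category filter to the query text separated by a space. One way to make the Boolean structure explicit is sketched below. It assumes the field prefixes (`all:`, `cat:`) and the `AND`/`OR` operators of the arXiv API query syntax; `build_arxiv_query` is a hypothetical helper, and whether this form returns better results for a given query has not been tested here.

```python
from typing import List, Optional


def build_arxiv_query(phrases: List[str], category: Optional[str] = None) -> str:
    """Build an arXiv API query string from phrases and an optional category."""
    # Match any of the phrases in any field
    quoted = " OR ".join(f'all:"{phrase}"' for phrase in phrases)
    query = f"({quoted})"
    if category:
        # Restrict the phrase matches to a single arXiv category
        query = f"{query} AND cat:{category}"
    return query


query_string = build_arxiv_query(["vision transformer", "ViT"], category="cs.CV")
```

The resulting string could be passed as the `query` argument of `arxiv.Search` in place of the space-joined version.
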

**Content Processing**  
Papers Analyzed: Change the number of papers analyzed in `analyze_papers_node()` and `analyze_papers()` (currently the top 5)  
Few-Shot Examples: Modify the examples in `expand_research_query()` and `analyze_papers()` to better fit your domain  

**Display Settings**  
Display Limit: Change the slice in `results["search_results"][:5]` in `display_langgraph_results()` to show more papers  
Abstract Length: Adjust `paper['abstract'][:300]` to show more or less text  

**Workflow Configuration**  
Node Ordering: Rearrange the workflow by modifying the edges in `create_research_workflow()`  
Initial State: Add fields to the initial state in `research_arxiv_langgraph()`  


In [41]:
#!pip install -U google-generativeai langchain-google-genai

## Imports and Configuration

In [13]:
# ============= Imports and Configuration =============

import os
import arxiv
import requests
import numpy as np
import uuid
from typing import TypedDict, List, Dict, Any, Sequence, Optional, Callable
from bs4 import BeautifulSoup
from IPython.display import Markdown, display

# LangGraph imports
from langgraph.graph import StateGraph

# LLM and message handling imports
import google.generativeai as genai
from langchain_core.messages import HumanMessage, AIMessage, ToolMessage

# Vector database imports
import chromadb
from chromadb.utils import embedding_functions

"""
langgraph  Version: 0.3.34
google-generativeai Version: 0.8.5
"""


## State and Configuration 

In [26]:
class RAGState(TypedDict):
    """State for the RAG workflow.
    
    Attributes:
        query: The original user query
        expanded_query: Query after expansion with the LLM
        context: List of contextualized information from papers
        messages: List of chat messages in the conversation
        search_results: List of papers retrieved from arXiv
        analysis: Generated analysis of the papers
        embedding_results: Papers retrieved via vector search
    """
    query: str
    expanded_query: str
    context: List[str]
    messages: List[Dict[str, Any]]
    search_results: List[Dict[str, Any]]
    analysis: str
    embedding_results: List[Dict[str, Any]]


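# API Configuration: read the key from the environment; never hard-code credentials
GOOGLE_API_KEY = os.environ.get("GOOGLE_API_KEY")
if not GOOGLE_API_KEY:
    raise RuntimeError("Set the GOOGLE_API_KEY environment variable before running this notebook")
genai.configure(api_key=GOOGLE_API_KEY)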

# Create a model instance
model = genai.GenerativeModel('models/gemini-1.5-pro-latest')

# Create a custom embedding function for ChromaDB
class GoogleEmbeddingFunction(chromadb.utils.embedding_functions.EmbeddingFunction):
    """Custom embedding function that uses Google's Generative AI API."""
    
    def __init__(self, api_key: str):
        """Initialize with Google API key.
        
        Args:
            api_key: Google API key
        """
        self.api_key = api_key
        genai.configure(api_key=api_key)
    
    def __call__(self, texts: List[str]) -> List[List[float]]:
        """Generate embeddings for the provided texts.
        
        Args:
            texts: List of texts to embed
            
        Returns:
            List of embedding vectors
        """
        embeddings = []
        
        for text in texts:
            try:
                # Truncate text if too long (API has limits)
                if len(text) > 8000:
                    text = text[:8000]
                
                # Use Google's embedding model via global function
                embedding_result = genai.embed_content(
                    model="embedding-001",
                    content=text,
                    task_type="retrieval_document",
                )
                
                # genai.embed_content returns a dict; the vector is stored under "embedding"
                if embedding_result and "embedding" in embedding_result:
                    embeddings.append(embedding_result["embedding"])
                else:
                    # Fallback: random embeddings if the API returns no vector
                    print("Warning: Failed to get embedding, using random fallback")
                    embeddings.append(np.random.rand(768).tolist())
            except Exception as e:
                print(f"Error generating embedding: {e}")
                # Fallback: random embeddings
                embeddings.append(np.random.rand(768).tolist())
                
        return embeddings

# Create a ChromaDB collection
def get_vector_db():
    """Get the vector database client and collection.
    
    Creates or retrieves a ChromaDB collection for storing paper embeddings
    using Google's text embeddings model.
    
    Returns:
        A tuple of (client, collection)
    """
    client = chromadb.Client()
    
    # Set up our custom embedding function
    embedding_function = GoogleEmbeddingFunction(api_key=GOOGLE_API_KEY)
    
    # Create or get a collection
    collection = client.get_or_create_collection(
        name="arxiv_papers",
        embedding_function=embedding_function
    )
    
    return client, collection

# Initialize the vector database
try:
    client, collection = get_vector_db()
    print("Successfully initialized vector database")
except Exception as e:
    print(f"Error initializing vector database: {e}")
    # No vector database: set collection to None so callers can detect this;
    # papers are still cached in the papers_db dictionary defined below
    collection = None
    print("Using fallback storage without vector search capabilities")

# Papers database cache
papers_db = {}

Successfully initialized vector database


## Query Expansion and Analysis 

In [27]:
def expand_research_query(query: str) -> str:
    """Expand a research query using few-shot prompting.
    
    Uses domain-specific examples to help the model generate
    a comprehensive expansion of the original query.
    
    Args:
        query: The original research query
        
    Returns:
        An expanded version of the query with additional concepts and terms
    """
    # Use Vision Transformer examples when the query is about vision transformers.
    # "vit" is matched as a whole word so that words like "activity" do not trigger it.
    if "vision transformer" in query.lower() or "vit" in query.lower().split():
        few_shot_examples = """
        Example 1:
        Query: "vision transformer architecture"
        Expanded: The query is about Vision Transformer (ViT) architectures for computer vision tasks. Key aspects to explore include: original ViT design and patch-based image tokenization; comparison with CNN architectures; attention mechanisms specialized for vision; hierarchical and pyramid vision transformers; efficiency improvements like token pruning and sparse attention; distillation techniques for vision transformers; adaptations for different vision tasks including detection and segmentation; recent innovations addressing quadratic complexity and attention saturation.
        
        Example 2: 
        Query: "how do vision transformers process images"
        Expanded: The query focuses on the internal mechanisms of how Vision Transformers process visual information. Key areas to investigate include: patch embedding processes; position embeddings for spatial awareness; self-attention mechanisms for global context; the role of MLP blocks in feature transformation; how class tokens aggregate information; patch size impact on performance and efficiency; multi-head attention design in vision applications; information flow through vision transformer layers; differences from convolutional approaches to feature extraction.
        """
    else:
        few_shot_examples = """
        Example 1:
        Query: "transformer models for NLP"
        Expanded: The query is about transformer architecture models used in natural language processing. Key aspects to explore include: BERT, GPT, T5, and other transformer variants; attention mechanisms; self-supervision and pre-training approaches; fine-tuning methods; performance on NLP tasks like translation, summarization, and question answering; efficiency improvements like distillation and pruning; recent innovations in transformer architectures.
        
        Example 2:
        Query: "reinforcement learning for robotics"
        Expanded: The query concerns applying reinforcement learning methods to robotic systems. Important areas to investigate include: policy gradient methods; Q-learning variants for continuous control; sim-to-real transfer; imitation learning; model-based RL for robotics; sample efficiency techniques; multi-agent RL for coordinated robots; safety constraints in robotic RL; real-world applications and benchmarks; hierarchical RL for complex tasks.
        
        Example 3:
        Query: "graph neural networks applications"
        Expanded: The query focuses on practical applications of graph neural networks. Key dimensions to explore include: GNN architectures (GCN, GAT, GraphSAGE); applications in chemistry and drug discovery; recommender systems using GNNs; traffic and transportation network modeling; social network analysis; knowledge graph completion; GNNs for computer vision tasks; scalability solutions for large graphs; theoretical foundations of graph representation learning.
        """
    
    prompt = f"""Based on the examples below, expand my research query to identify key concepts, relevant subtopics, and specific areas to explore:

    {few_shot_examples}

    Query: "{query}"
    Expanded:"""
    
    generation_config = {"temperature": 1.0}
    
    response = model.generate_content(prompt, generation_config=generation_config)
    
    return response.text


def analyze_papers(query: str, papers: List[Dict[str, Any]]) -> str:
    """Analyze papers using few-shot prompting with domain-specific examples.
    
    Generates a research analysis based on the retrieved papers and
    the original query using domain-specific examples.
    
    Args:
        query: The original research query
        papers: List of paper dictionaries containing metadata and content
        
    Returns:
        A comprehensive analysis of the papers in relation to the query
    """
    few_shot_examples = """
    Example 1:
    Papers:
    1. "Attention Is All You Need" - Introduced the transformer architecture relying entirely on attention mechanisms without recurrence or convolutions.
    2. "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding" - Proposed bidirectional training for transformers using masked language modeling.
    
    Analysis:
    These papers represent seminal work in transformer architectures for NLP. "Attention Is All You Need" established the foundation with the original transformer design using multi-head self-attention. BERT built upon this by introducing bidirectional context modeling and masked language modeling for pre-training, significantly advancing performance on downstream tasks. Key themes include attention mechanisms, pre-training objectives, and the importance of training methodology.
    
    Example 2:
    Query: "How does the Less-Attention Vision Transformer architecture address the computational inefficiencies and saturation problems of traditional Vision Transformers?"
    Papers: 
    1. "You Only Need Less Attention at Each Stage in Vision Transformers" - Proposed reusing early-layer attention scores through linear transformations to reduce computational costs.
    
    Analysis:
    Less-Attention Vision Transformer reduces ViT's quadratic attention cost by reusing early-layer attention scores through linear transformations. It also mitigates attention saturation using residual downsampling and a custom loss to preserve attention structure. This approach addresses two key limitations of traditional Vision Transformers: computational inefficiency due to quadratic complexity of self-attention, and the saturation problem where attention maps become increasingly similar in deeper layers.
    """
    
    # Format paper information
    paper_info = "\n".join([
        f"{i+1}. \"{p['title']}\" - {p['abstract'][:200]}..." 
        for i, p in enumerate(papers[:5])
    ])
    
    prompt = f"""Based on the examples below, analyze the following research papers related to "{query}" to identify key technical contributions, methodologies, and how they address specific challenges:

    {few_shot_examples}
    
    Papers:
    {paper_info}
    
    Analysis:"""
    
    response = model.generate_content(prompt, generation_config={"temperature": 1.0})
    
    return response.text


## arXiv Data Collection Functions 

In [28]:
def arxiv_api_search(query: str, max_results: int = 20) -> List[Any]:
    """Search arXiv for papers matching the query.
    
    Uses the arXiv API to find relevant papers based on the query,
    with optional filtering for specific categories.
    
    Args:
        query: The search query
        max_results: Maximum number of results to return (default: 20)
        
    Returns:
        List of arxiv.Result objects representing papers
    """
    # Use category filter for more relevant results
    category_filter = "cat:cs.CV" if "vision" in query.lower() or "image" in query.lower() else ""
    search_query = f"{query} {category_filter}".strip()
    
    # Create a Client instance and use it for search
    client = arxiv.Client()
    search = arxiv.Search(
        query=search_query,
        max_results=max_results,
        sort_by=arxiv.SortCriterion.Relevance,
        sort_order=arxiv.SortOrder.Descending
    )
    return list(client.results(search))


def check_html_available(paper_id: str) -> bool:
    """Check if HTML version is available for a paper.
    
    Tests if the paper has an HTML version on arXiv by
    sending a HEAD request to the HTML URL.
    
    Args:
        paper_id: The arXiv paper ID
        
    Returns:
        True if HTML version is available, False otherwise
    """
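    html_url = f"https://arxiv.org/html/{paper_id}"
    try:
        # Use a timeout so an unresponsive server cannot hang the pipeline
        response = requests.head(html_url, timeout=10)
    except requests.RequestException:
        return False
    return response.status_code == 200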


def get_html_content(paper_id: str) -> Optional[str]:
    """Get HTML content of a paper if available.
    
    Fetches and parses the HTML version of a paper, removing
    irrelevant elements and extracting the main content.
    
    Args:
        paper_id: The arXiv paper ID
        
    Returns:
        Extracted text content from the HTML version or None if unavailable
    """
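    html_url = f"https://arxiv.org/html/{paper_id}"
    try:
        # Use a timeout so an unresponsive server cannot hang the pipeline
        response = requests.get(html_url, timeout=30)
    except requests.RequestException:
        return None
    if response.status_code == 200: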
        soup = BeautifulSoup(response.text, 'html.parser')
        # Remove scripts, styles, and navigation elements
        for tag in soup(['script', 'style', 'nav', 'header', 'footer']):
            tag.decompose()
        # Get main content
        main_content = soup.find('main') or soup.find('body')
        if main_content:
            return main_content.get_text(separator='\n', strip=True)
    return None
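
The clean-up step in `get_html_content()` (drop scripts, styles, and navigation, then keep the visible text) can be tried offline with a standard-library stand-in. This sketch swaps BeautifulSoup for `html.parser.HTMLParser` purely so that it runs without extra dependencies; `TextExtractor` and `extract_text` are hypothetical names, and the sample HTML is made up.

```python
from html.parser import HTMLParser
from typing import List


class TextExtractor(HTMLParser):
    """Collect visible text, skipping content inside non-content tags."""

    SKIPPED = {"script", "style", "nav", "header", "footer"}

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0  # > 0 while inside a skipped element
        self.chunks: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def extract_text(html: str) -> str:
    """Return the visible text of an HTML document, one chunk per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


sample_html = (
    "<html><body><nav>Menu</nav><main><h1>Title</h1>"
    "<p>Body text.</p></main><script>var x = 1;</script></body></html>"
)
extracted = extract_text(sample_html)
```

Unlike the notebook's version, this stand-in does not restrict extraction to the `main` element. It only shows the tag-skipping idea.
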


## Vector Database Functions

In [29]:
def add_paper_to_vector_db(paper_info: Dict[str, Any]) -> None:
    """Add a paper to the vector database.
    
    Stores a paper's metadata and content in the vector database
    for semantic search.
    
    Args:
        paper_info: Dictionary containing paper metadata and content
    """
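    # Skip when the vector database failed to initialize
    if collection is None:
        return

    # Use a combination of title and content for embedding
    # (fall back to the abstract if no content was extracted)
    content = paper_info.get('content') or paper_info['abstract']
    text_to_embed = f"{paper_info['title']} {content[:5000]}"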
    
    # Add to the collection
    collection.add(
        ids=[paper_info['paper_id']],
        documents=[text_to_embed],
        metadatas=[{
            'title': paper_info['title'],
            'authors': paper_info['authors'],
            'published': paper_info['published'],
            'url': paper_info['url'],
            'abstract': paper_info['abstract']
        }]
    )
    
    print(f"Added paper to vector DB: {paper_info['title']}")


def query_vector_db(query: str, n_results: int = 5) -> List[Dict[str, Any]]:
    """Query the vector database for papers semantically similar to the query.
    
    Performs a semantic search in the vector database to find
    papers most relevant to the query.
    
    Args:
        query: The search query
        n_results: Maximum number of results to return
        
    Returns:
        List of paper dictionaries containing metadata
    """
    # Without a vector database there is nothing to search
    if collection is None:
        return []

    # Query the collection
    results = collection.query(
        query_texts=[query],
        n_results=n_results
    )
    
    papers = []
    
    if results and len(results['ids'][0]) > 0:
        for i in range(len(results['ids'][0])):
            paper_id = results['ids'][0][i]
            metadata = results['metadatas'][0][i]
            
            # Get the full paper info from our database
            if paper_id in papers_db:
                papers.append(papers_db[paper_id])
            else:
                # Reconstruct from metadata if not in cache
                papers.append({
                    'paper_id': paper_id,
                    'title': metadata['title'],
                    'authors': metadata['authors'],
                    'published': metadata['published'],
                    'url': metadata['url'],
                    'abstract': metadata['abstract'],
                    'content': metadata['abstract']  # Fall back to abstract
                })
    
    return papers
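
What a semantic query does conceptually is rank stored vectors by their closeness to the query vector. The sketch below illustrates this with cosine similarity in plain Python. It is not how ChromaDB is implemented, the distance metric used by the notebook's collection depends on the ChromaDB configuration, and the two-dimensional vectors are toy values.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(
    query_vec: List[float], doc_vecs: Dict[str, List[float]], n_results: int = 5
) -> List[Tuple[str, float]]:
    """Return the n_results document IDs most similar to the query vector."""
    scored = [
        (doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in doc_vecs.items()
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:n_results]


# Toy embeddings, made up for illustration
toy_vectors = {"paper_a": [1.0, 0.0], "paper_b": [0.0, 1.0], "paper_c": [1.0, 1.0]}
ranking = rank_by_similarity([1.0, 0.1], toy_vectors, n_results=2)
```

Because the ranking depends entirely on the vectors, a paper stored with a random fallback embedding will be ranked arbitrarily.
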


## LangGraph Workflow Nodes 

In [30]:
def query_expansion_node(state: RAGState) -> RAGState:
    """LangGraph node that expands the original query.
    
    Takes the original query and generates an expanded version
    with additional concepts and search terms.
    
    Args:
        state: The current LangGraph state
        
    Returns:
        Updated state with expanded query
    """
    query = state["query"]
    
    # Use a very explicit prompt to avoid the model repeating our instructions
    prompt = f"""
    Please expand the following research query:
    
    Query: "{query}"
    
    Provide a detailed expansion that identifies key concepts, 
    terminology, and relevant subtopics. Do not include phrases like
    "Query:" or "Expanded:" in your response. Just provide the expanded content.
    """
    
    response = model.generate_content(prompt)
    expanded_query = response.text.strip()
    
    print("EXPANSION NODE - Output expanded query:", expanded_query[:100] + "...")
    
    return {"expanded_query": expanded_query}


def search_papers_node(state: RAGState) -> RAGState:
    """LangGraph node that searches for papers based on the query.
    
    Uses the original and expanded query to search arXiv for relevant
    papers, processes them, and stores the results in both the regular
    database and the vector database.
    
    Args:
        state: The current LangGraph state
        
    Returns:
        Updated state with search results
    """
    print("SEARCH NODE - Input state keys:", list(state.keys()))
    
    query = state["query"]
    expanded_query = state["expanded_query"]
    
    # Extract actual search terms from expanded query
    # First, check whether the expanded query echoes the "Query:"/"Expanded:" labels from the prompt
    if "Query:" in expanded_query and "Expanded:" in expanded_query:
        # Extract just the expansion part
        expanded_query = expanded_query.split("Expanded:")[1].strip()
    
    # Extract key terms based on domain
    domain_specific_terms = []
    
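    # Domain detection logic ("vit" is matched as a whole word so that
    # words like "activity" do not trigger the Vision Transformer branch)
    if "vision transformer" in query.lower() or "vit" in query.lower().split():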
        domain_specific_terms = ["Vision Transformer", "ViT", "image patches", 
                          "self-attention", "transformer encoder", 
                          "multi-head attention", "computer vision"]
    elif "graph" in query.lower() and "neural" in query.lower():
        domain_specific_terms = ["Graph Neural Network", "GNN", "node embedding",
                          "message passing", "graph attention", "GraphSAGE"]
    # Add more domain detection as needed
    elif "reinforcement learning" in query.lower() or " rl " in f" {query.lower()} ":
        domain_specific_terms = ["Reinforcement Learning", "RL", "policy gradient", 
                          "Q-learning", "reward function", "MDP", "Markov Decision Process", 
                          "DDPG", "PPO", "TD learning", "actor-critic"]
    elif "large language model" in query.lower() or "llm" in query.lower():
        domain_specific_terms = ["Large Language Model", "LLM", "transformer", 
                          "attention mechanism", "GPT", "BERT", "prompt engineering", 
                          "fine-tuning", "few-shot learning", "instruction tuning"]
    elif "diffusion" in query.lower() and ("model" in query.lower() or "image" in query.lower()):
        domain_specific_terms = ["Diffusion Model", "DDPM", "latent diffusion", 
                          "score-based generative model", "noise prediction", 
                          "reverse diffusion", "U-Net", "text-to-image"]
    elif "robotics" in query.lower() or "robot" in query.lower():
        domain_specific_terms = ["Robotics", "robot learning", "manipulation", 
                          "grasping", "trajectory optimization", "inverse kinematics", 
                          "motion planning", "control policy", "sim2real"]
    elif "recommendation" in query.lower() or "recommender" in query.lower():
        domain_specific_terms = ["Recommender System", "collaborative filtering", 
                          "content-based filtering", "matrix factorization", 
                          "user embedding", "item embedding", "CTR prediction"]
    elif "computer vision" in query.lower() or "image" in query.lower():
        domain_specific_terms = ["Computer Vision", "CNN", "object detection", 
                          "segmentation", "image recognition", "feature extraction", 
                          "SIFT", "ResNet", "Faster R-CNN", "YOLO"]
    elif ("natural language" in query.lower() or "nlp" in query.lower()) and "transformer" not in query.lower():
        domain_specific_terms = ["Natural Language Processing", "NLP", "named entity recognition", 
                          "sentiment analysis", "text classification", "word embedding", 
                          "language model", "sequence-to-sequence", "LSTM", "RNN"]
    elif "generative" in query.lower() or "gan" in query.lower().split():
        domain_specific_terms = ["Generative Adversarial Network", "GAN", "StyleGAN", 
                          "generator", "discriminator", "adversarial training", 
                          "latent space", "mode collapse", "image synthesis"]
    elif "attention" in query.lower() or "transformer" in query.lower():
        domain_specific_terms = ["Transformer", "attention mechanism", "self-attention", 
                          "multi-head attention", "encoder-decoder", "positional encoding", 
                          "cross-attention", "attention weights"]
    elif "quantum" in query.lower() and ("computing" in query.lower() or "machine learning" in query.lower()):
        domain_specific_terms = ["Quantum Computing", "quantum machine learning", 
                          "quantum circuit", "qubit", "quantum gate", "variational quantum circuit", 
                          "QAOA", "quantum advantage", "quantum supremacy"]
    
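    # Generic ML terms for any ML-related query ("ai" is matched as a whole word
    # so that words like "training" do not trigger it)
    ml_phrases = ["machine learning", "neural network", "deep learning"]
    if any(term in query.lower() for term in ml_phrases) or "ai" in query.lower().split():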
        generic_ml_terms = ["neural network", "deep learning", "backpropagation", 
                     "gradient descent", "loss function", "activation function", 
                     "hyperparameter tuning", "regularization", "overfitting"]
        domain_specific_terms.extend(generic_ml_terms)
    
    # If no specific domain is detected, extract key terms from the expanded query
    if not domain_specific_terms and expanded_query:
        import re  # Local import: `re` is not among the notebook's top-level imports

        # Capitalized phrases (e.g. "Graph Neural Network") and quoted phrases
        # are treated as candidate concepts
        capitalized_pattern = re.compile(r'[A-Z][a-zA-Z0-9]+(?:[ \-][A-Z][a-zA-Z0-9]+)*')
        quoted_pattern = re.compile(r'"([^"]+)"')

        for sentence in expanded_query.split('. '):
            # Keep only capitalized terms longer than 3 characters
            candidates = [t for t in capitalized_pattern.findall(sentence) if len(t) > 3]
            candidates.extend(quoted_pattern.findall(sentence))
            for term in candidates:
                if term not in domain_specific_terms:  # Avoid duplicates
                    domain_specific_terms.append(term)
    
    # Log the detected terms
    if domain_specific_terms:
        print(f"Detected domain terms: {domain_specific_terms}")
    else:
        print("No specific domain terms detected, using original query only")
    
    # Create a clean search query by combining original and expanded
    search_query = query
    if domain_specific_terms:
        expanded_terms = " OR ".join(f'"{term}"' for term in domain_specific_terms)
        search_query = f'"{query}" OR ({expanded_terms})'
    
    print(f"Clean search query: {search_query}")
    
    # Regular search via arXiv API
    papers = arxiv_api_search(search_query)
    print(f"Found {len(papers)} papers via API search")
    
    # Process and store papers
    results = []
    for paper in papers:
        paper_id = paper.entry_id.split('/')[-1]
        
        # Skip if already in database
        if paper_id in papers_db:
            results.append(papers_db[paper_id])
            continue
            
        paper_info = {
            "paper_id": paper_id,
            "title": paper.title,
            "authors": ", ".join(author.name for author in paper.authors),
            "published": paper.published.strftime("%Y-%m-%d"),
            "url": paper.entry_id,
            "abstract": paper.summary,
            "has_html": check_html_available(paper_id)
        }
        
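        # Get HTML content if available; fall back to the abstract when the
        # HTML version is missing or text extraction returns nothing
        if paper_info["has_html"]:
            paper_info["content"] = get_html_content(paper_id) or paper_info["abstract"]
        else:
            paper_info["content"] = paper_info["abstract"]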
            
        # Store in our database
        papers_db[paper_id] = paper_info
        
        # Add to vector database
        add_paper_to_vector_db(paper_info)
        
        results.append(paper_info)
    
    print(f"Processed {len(results)} papers for state")
    
    # Now perform semantic search
    semantic_results = query_vector_db(query)
    print(f"Found {len(semantic_results)} papers via semantic search")
    
    # Combine results but prioritize semantic search results
    combined_results = []
    
    # First add semantic results
    paper_ids_added = set()
    for paper in semantic_results:
        combined_results.append(paper)
        paper_ids_added.add(paper["paper_id"])
    
    # Then add any API results not already included
    for paper in results:
        if paper["paper_id"] not in paper_ids_added:
            combined_results.append(paper)
            paper_ids_added.add(paper["paper_id"])
    
    # Return updated state
    updated_state = {
        "search_results": combined_results[:10],  # Limit to top 10 papers
        "embedding_results": semantic_results
    }
    print("SEARCH NODE - Output state keys:", list(updated_state.keys()))
    return updated_state


def analyze_papers_node(state: RAGState) -> RAGState:
    """LangGraph node that analyzes papers.
    
    Takes the search results and generates an analysis
    of the papers in relation to the original query.
    
    Args:
        state: The current LangGraph state
        
    Returns:
        Updated state with paper analysis and context
    """
    query = state["query"]
    search_results = state["search_results"]
    
    # Get the vector search results if available
    embedding_results = state.get("embedding_results", [])
    
    # Prioritize embedding results if available
    papers_to_analyze = embedding_results if embedding_results else search_results
    
    analysis = analyze_papers(query, papers_to_analyze)
    
    # Extract relevant content for context
    context = []
    for paper in papers_to_analyze[:5]:
        context.append(f"Title: {paper['title']}\nAuthors: {paper['authors']}\nAbstract: {paper['abstract']}")
    
    return {
        "analysis": analysis,
        "context": context
    }


def generate_response_node(state: RAGState) -> RAGState:
    """LangGraph node that generates the final response.
    
    Formats the analysis and adds it to the message history
    in the state.
    
    Args:
        state: The current LangGraph state
        
    Returns:
        Updated state with messages
    """
    # Only the analysis is needed here; query and context remain in the state
    analysis = state["analysis"]
    
    # Format the message as an AI response
    message = {
        "role": "assistant",
        "content": analysis
    }
    
    # Add the message to the state
    if "messages" not in state:
        state["messages"] = []
    
    state["messages"].append(message)
    
    return {"messages": state["messages"]}
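
Each node above returns only the keys it changes. For state keys declared without a reducer, LangGraph overwrites those keys with the returned values and leaves the rest of the state untouched. The pure-Python sketch below imitates that merge without importing LangGraph; `run_linear` and the two toy nodes are hypothetical stand-ins, not the notebook's functions.

```python
from typing import Any, Callable, Dict, List

State = Dict[str, Any]
Node = Callable[[State], State]


def run_linear(nodes: List[Node], state: State) -> State:
    """Run nodes in order, merging each partial update into a copy of the state."""
    current = dict(state)
    for node in nodes:
        update = node(current)
        current = {**current, **update}  # returned keys overwrite, others survive
    return current


def toy_search(state: State) -> State:
    # Stand-in for a search node: only touches "search_results"
    return {"search_results": [{"title": f"Paper about {state['query']}"}]}


def toy_analyze(state: State) -> State:
    # Stand-in for an analysis node: only touches "analysis"
    titles = [p["title"] for p in state["search_results"]]
    return {"analysis": f"{len(titles)} paper(s) analyzed"}


final = run_linear([toy_search, toy_analyze], {"query": "ViT", "analysis": ""})
print(sorted(final.keys()))  # ['analysis', 'query', 'search_results']
```

The `query` key is never returned by either toy node, yet it is still present at the end, which mirrors how the real final state keeps all of its keys.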

## LangGraph Workflow 

In [31]:
def create_research_workflow():
    """Create a LangGraph workflow for research.
    
    Defines the workflow graph with nodes for query expansion,
    paper search, analysis, and response generation.
    
    Returns:
        A compiled LangGraph workflow
    """
    # Initialize the workflow with the RAGState
    workflow = StateGraph(RAGState)
    
    # Add nodes to the graph
    workflow.add_node("query_expansion", query_expansion_node)
    workflow.add_node("search_papers", search_papers_node)
    workflow.add_node("analyze_papers", analyze_papers_node)
    workflow.add_node("generate_response", generate_response_node)
    
    # Add edges to connect the nodes
    workflow.add_edge("query_expansion", "search_papers")
    workflow.add_edge("search_papers", "analyze_papers")
    workflow.add_edge("analyze_papers", "generate_response")
    
    # Set the entry point
    workflow.set_entry_point("query_expansion")
    
    # Compile the workflow
    return workflow.compile()
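
`RAGState` is defined outside this section. Judging only from the keys initialized in `research_arxiv_langgraph`, it could look roughly like the `TypedDict` below. The class name `RAGStateSketch` and all field types are guesses; the real definition may differ.

```python
from typing import Any, Dict, List, TypedDict


class RAGStateSketch(TypedDict):
    """Hypothetical shape of the workflow state (assumed, not the notebook's class)."""
    query: str
    expanded_query: str
    context: List[str]
    messages: List[Dict[str, str]]
    search_results: List[Dict[str, Any]]
    embedding_results: List[Dict[str, Any]]
    analysis: str


empty_state: RAGStateSketch = {
    "query": "what are vision transformers?",
    "expanded_query": "",
    "context": [],
    "messages": [],
    "search_results": [],
    "embedding_results": [],
    "analysis": "",
}
print(sorted(RAGStateSketch.__annotations__))
```

None of these fields carries a reducer annotation in this sketch, so a node that returns `messages` would replace the whole list. That matches `generate_response_node`, which returns the full message list rather than a single new message.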

## Interface Functions

In [32]:
def research_arxiv_langgraph(query: str) -> Dict[str, Any]:
    """Research arXiv papers using the LangGraph workflow with embeddings.
    
    Main function that executes the full research pipeline
    on a given query, using both keyword and semantic search.
    
    Args:
        query: The research query
        
    Returns:
        The final state with all results
    """
    # Create the workflow
    workflow = create_research_workflow()
    
    # Initialize the state
    initial_state = {
        "query": query,
        "expanded_query": "",
        "context": [],
        "messages": [],
        "search_results": [],
        "embedding_results": [],
        "analysis": ""
    }
    
    # Execute the workflow
    final_state = workflow.invoke(initial_state)
    
    # Debug print state
    print("Final state keys:", list(final_state.keys()))
    
    # Check if search_results exists and has content
    if "search_results" not in final_state or not final_state["search_results"]:
        print("WARNING: No search results found in final state!")
        # If search_results got lost, fall back to the papers cached in the module-level papers_db
        if papers_db:
            print(f"Found {len(papers_db)} papers in papers_db, using those instead")
            final_state["search_results"] = list(papers_db.values())
    
    return final_state

## Display Functions 

In [33]:
def display_langgraph_results(results: Dict[str, Any]) -> None:
    """Display the research results in a formatted way.
    
    Creates formatted Markdown outputs for the expanded query,
    research analysis, and top papers, with separate sections
    for semantic search results.
    
    Args:
        results: The final state from the research workflow
    """
    from IPython.display import display, Markdown

    def format_paper(rank: int, paper: Dict[str, Any]) -> str:
        """Build an unindented Markdown entry so it is not rendered as a code block."""
        return "\n\n".join([
            f"**{rank}. {paper['title']}**",
            f"*Authors:* {paper['authors']}",
            f"*Published:* {paper['published']}",
            f"*URL:* {paper['url']}",
            f"*Abstract:* {paper['abstract'][:300]}...",
            "---",
        ])

    display(Markdown("### QUERY EXPANSION"))
    display(Markdown(results["expanded_query"]))

    display(Markdown("### RESEARCH ANALYSIS"))
    display(Markdown(results["analysis"]))

    # Show vector-based results first
    embedding_results = results.get("embedding_results") or []
    if embedding_results:
        display(Markdown("### TOP PAPERS (SEMANTIC SEARCH)"))
        display(Markdown(f"**Found {len(embedding_results)} papers via semantic search.**"))
        for i, paper in enumerate(embedding_results[:3], start=1):
            display(Markdown(format_paper(i, paper)))

    # Show all results
    display(Markdown("### ALL TOP PAPERS"))

    search_results = results.get("search_results") or []
    if not search_results:
        display(Markdown("**No papers found in search results.**"))
    else:
        display(Markdown(f"**Found {len(search_results)} papers total.**"))
        for i, paper in enumerate(search_results[:5], start=1):
            display(Markdown(format_paper(i, paper)))

In [34]:
# ============= Main Execution =============

if __name__ == "__main__":
    query = input("Enter your research query: ")
    results = research_arxiv_langgraph(query)
    display_langgraph_results(results)

EXPANSION NODE - Output expanded query: Vision Transformers (ViTs) represent a groundbreaking shift in computer vision, applying the Transfo...
SEARCH NODE - Input state keys: ['query', 'expanded_query', 'context', 'messages', 'search_results', 'analysis', 'embedding_results']
Detected domain terms: ['Vision Transformer', 'ViT', 'image patches', 'self-attention', 'transformer encoder', 'multi-head attention', 'computer vision']
Clean search query: "what are vision transformers?" OR ("Vision Transformer" OR "ViT" OR "image patches" OR "self-attention" OR "transformer encoder" OR "multi-head attention" OR "computer vision")
Found 20 papers via API search


Insert of existing embedding ID: 2312.03568v1
Add of existing embedding ID: 2312.03568v1


Added paper to vector DB: DocBinFormer: A Two-Level Transformer Network for Effective Document Image Binarization


Insert of existing embedding ID: 2207.11971v2
Add of existing embedding ID: 2207.11971v2


Added paper to vector DB: Jigsaw-ViT: Learning Jigsaw Puzzles in Vision Transformer


Insert of existing embedding ID: 2206.00481v2
Add of existing embedding ID: 2206.00481v2


Added paper to vector DB: Where are my Neighbors? Exploiting Patches Relations in Self-Supervised Vision Transformer


Insert of existing embedding ID: 2205.12041v1
Add of existing embedding ID: 2205.12041v1


Added paper to vector DB: Privacy-Preserving Image Classification Using Vision Transformer


Insert of existing embedding ID: 2108.01684v1
Add of existing embedding ID: 2108.01684v1


Added paper to vector DB: Vision Transformer with Progressive Sampling


Insert of existing embedding ID: 2406.12944v1
Add of existing embedding ID: 2406.12944v1


Added paper to vector DB: Semantic Graph Consistency: Going Beyond Patches for Regularizing Self-Supervised Vision Transformers


Insert of existing embedding ID: 2203.01587v3
Add of existing embedding ID: 2203.01587v3


Added paper to vector DB: Multi-Tailed Vision Transformer for Efficient Inference


Insert of existing embedding ID: 2306.02095v1
Add of existing embedding ID: 2306.02095v1


Added paper to vector DB: Content-aware Token Sharing for Efficient Semantic Segmentation with Vision Transformers


Insert of existing embedding ID: 2309.08035v1
Add of existing embedding ID: 2309.08035v1


Added paper to vector DB: Interpretability-Aware Vision Transformer


Insert of existing embedding ID: 2501.16227v1
Add of existing embedding ID: 2501.16227v1


Added paper to vector DB: PDC-ViT : Source Camera Identification using Pixel Difference Convolution and Vision Transformer


Insert of existing embedding ID: 2406.18051v1
Add of existing embedding ID: 2406.18051v1


Added paper to vector DB: ViT-1.58b: Mobile Vision Transformers in the 1-bit Era


Insert of existing embedding ID: 2211.06726v2
Add of existing embedding ID: 2211.06726v2


Added paper to vector DB: MultiCrossViT: Multimodal Vision Transformer for Schizophrenia Prediction using Structural MRI and Functional Network Connectivity Data


Insert of existing embedding ID: 2205.14949v1
Add of existing embedding ID: 2205.14949v1


Added paper to vector DB: HiViT: Hierarchical Vision Transformer Meets Masked Image Modeling


Insert of existing embedding ID: 2205.09995v1
Add of existing embedding ID: 2205.09995v1


Added paper to vector DB: Mask-guided Vision Transformer (MG-ViT) for Few-Shot Learning


Insert of existing embedding ID: 2203.08566v1
Add of existing embedding ID: 2203.08566v1


Added paper to vector DB: EDTER: Edge Detection with Transformer


Insert of existing embedding ID: 2204.10485v1
Add of existing embedding ID: 2204.10485v1


Added paper to vector DB: Attentions Help CNNs See Better: Attention-based Hybrid Image Quality Assessment Network


Insert of existing embedding ID: 2110.14731v3
Add of existing embedding ID: 2110.14731v3


Added paper to vector DB: Vision Transformer for Classification of Breast Ultrasound Images


Insert of existing embedding ID: 2203.11987v2
Add of existing embedding ID: 2203.11987v2


Added paper to vector DB: PaCa-ViT: Learning Patch-to-Cluster Attention in Vision Transformers


Insert of existing embedding ID: 2202.01884v1
Add of existing embedding ID: 2202.01884v1


Added paper to vector DB: Research on Patch Attentive Neural Process


Insert of existing embedding ID: 2407.19394v4
Add of existing embedding ID: 2407.19394v4


Added paper to vector DB: Depth-Wise Convolutions in Vision Transformers for Efficient Training on Small Datasets
Processed 20 papers for state
Found 5 papers via semantic search
SEARCH NODE - Output state keys: ['search_results', 'embedding_results']
Final state keys: ['query', 'expanded_query', 'context', 'messages', 'search_results', 'analysis', 'embedding_results']
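
The `Clean search query` line in the log combines the quoted original question with an OR-group of the detected domain terms. The helper that builds it is not shown in this section. A minimal sketch that reproduces the logged string, with `build_search_query` as a hypothetical name, might look like this:

```python
from typing import List


def build_search_query(query: str, domain_terms: List[str]) -> str:
    """Quote the query and OR it with a parenthesized group of quoted terms."""
    quoted_query = f'"{query}"'
    if not domain_terms:
        return quoted_query
    term_group = " OR ".join(f'"{term}"' for term in domain_terms)
    return f"{quoted_query} OR ({term_group})"


# Terms copied from the "Detected domain terms" log line
terms = [
    "Vision Transformer", "ViT", "image patches", "self-attention",
    "transformer encoder", "multi-head attention", "computer vision",
]
search_query = build_search_query("what are vision transformers?", terms)
print(search_query)
```

Quoting each term keeps multi-word phrases such as "image patches" together instead of letting them be matched as separate words.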


### QUERY EXPANSION

Vision Transformers (ViTs) represent a groundbreaking shift in computer vision, applying the Transformer architecture, originally designed for natural language processing, to image recognition and related tasks.  Understanding ViTs involves exploring several key concepts:

* **The Transformer Architecture:**  This foundation relies on self-attention mechanisms, allowing the model to weigh the importance of different parts of an input sequence (in NLP, words; in vision, image patches) in relation to each other.  Key components include:
    * **Self-Attention:** The core mechanism enabling the model to relate different parts of the input.  This involves calculating attention weights based on the relationships between different patches, effectively allowing the model to focus on relevant parts of the image.
    * **Multi-Head Self-Attention:**  Employing multiple self-attention mechanisms operating in parallel, each focusing on different aspects of the input, leading to a richer representation.
    * **Positional Encoding:** Since Transformers inherently lack sequential information, positional encodings are added to the input embeddings to represent the location of each patch within the image.
    * **Encoder-Decoder Structure (in some ViTs):** While original ViTs used only the encoder part, some variants incorporate a decoder for tasks like image generation and segmentation.

* **Image Patch Embeddings:** ViTs divide images into smaller patches, which are then flattened and linearly projected into embedding vectors.  These embeddings serve as the input to the Transformer encoder.  The patch size is a crucial hyperparameter affecting model performance and computational cost.

* **Comparison to Convolutional Neural Networks (CNNs):** Traditionally, CNNs dominated computer vision.  ViTs offer distinct advantages, including:
    * **Global Receptive Field:** ViTs can capture long-range dependencies within the image from the initial layers, unlike CNNs which require stacking multiple convolutional layers to achieve a larger receptive field.
    * **Scalability:** Transformers scale well with increasing data and model size, leading to improved performance on large datasets.

* **Variants and Applications of Vision Transformers:** Beyond the basic ViT architecture, numerous variants have been developed, addressing limitations and extending capabilities:
    * **Hybrid Architectures:** Combining CNNs and Transformers to leverage the strengths of both approaches.
    * **Hierarchical Transformers:**  Processing images at multiple scales to capture both local and global features more effectively.
    * **Vision Transformer based Detection and Segmentation:** Adapting ViTs for object detection, semantic segmentation, and instance segmentation tasks.
    * **Self-Supervised Learning with ViTs:**  Leveraging large amounts of unlabeled data to pre-train ViTs for improved performance.

* **Challenges and Future Directions:**
    * **Computational Cost:** ViTs can be computationally expensive, especially for high-resolution images.  Research focuses on reducing this cost through techniques like efficient attention mechanisms.
    * **Data Efficiency:**  While ViTs excel with large datasets, improving their performance on smaller datasets remains a challenge.
    * **Interpretability:**  Understanding the internal workings of ViTs and the reasoning behind their predictions is an active area of research.


By exploring these concepts and subtopics, one can gain a comprehensive understanding of Vision Transformers, their strengths and limitations, and their potential impact on the future of computer vision.
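
Two of the ideas listed above, image patch embeddings and self-attention, can be illustrated with a small pure-Python sketch. It is a toy under stated simplifications: a single-channel 4x4 image, no learned linear projection, no positional encoding, one attention head, and queries, keys, and values all equal to the raw flattened patches. The function names are illustrative.

```python
import math
from typing import List

Matrix = List[List[float]]


def patchify(image: Matrix, patch_size: int) -> Matrix:
    """Split an H x W single-channel image into flattened square patches."""
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image dimensions must be divisible by patch_size")
    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            patches.append([
                image[row][col]
                for row in range(top, top + patch_size)
                for col in range(left, left + patch_size)
            ])
    return patches


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens: Matrix) -> Matrix:
    """Scaled dot-product attention with Q = K = V = tokens (no learned weights)."""
    dim = len(tokens[0])
    output = []
    for query in tokens:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in tokens
        ]
        weights = softmax(scores)
        # Each output token is a weighted average of all value vectors
        output.append([
            sum(w * value[i] for w, value in zip(weights, tokens))
            for i in range(dim)
        ])
    return output


image = [[float(4 * r + c) for c in range(4)] for r in range(4)]  # 4x4 toy image
patches = patchify(image, patch_size=2)
attended = self_attention(patches)
print(len(patches), len(patches[0]))  # 4 4
```

Halving the patch size quadruples the number of tokens, and because every token attends to every other token, the attention cost grows with the square of that count. This is the trade-off behind the remark that patch size affects both performance and computational cost.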

### RESEARCH ANALYSIS

These papers explore various enhancements and applications of Vision Transformers (ViTs).  "Multi-Tailed Vision Transformer" focuses on **efficient inference** by using multiple "tails" with shared parameters, aiming to reduce computational costs during deployment. "Where are my Neighbors?" tackles the challenge of **training ViTs on smaller datasets** by leveraging patch relationships through self-supervision, mitigating the reliance on massive data.  "PaCa-ViT" addresses the **semantic gap between image patches and tokens** by introducing patch-to-cluster attention, potentially improving representation learning. "Mask-guided Vision Transformer" applies ViTs to **few-shot learning**, leveraging masks for guidance with limited labeled data. Finally, "Vision Transformer with Progressive Sampling" enhances **efficiency during training and inference** by progressively sampling patches, reducing the computational burden of attending to all patches at once. Key themes emerging from these papers include efficiency (both during training and inference), handling smaller datasets, and adapting ViTs to specific tasks like few-shot learning.  They address the limitations of standard ViTs related to computational cost and data requirements.


### TOP PAPERS (SEMANTIC SEARCH)

**Found 5 papers via semantic search.**


**1. Multi-Tailed Vision Transformer for Efficient Inference**

*Authors:* Yunke Wang, Bo Du, Wenyuan Wang, Chang Xu

*Published:* 2022-03-03

*URL:* http://arxiv.org/abs/2203.01587v3

*Abstract:* Recently, Vision Transformer (ViT) has achieved promising performance in
image recognition and gradually serves as a powerful backbone in various vision
tasks. To satisfy the sequential input of Transformer, the tail of ViT first
splits each image into a sequence of visual tokens with a fixed length...

---

**2. Where are my Neighbors? Exploiting Patches Relations in Self-Supervised Vision Transformer**

*Authors:* Guglielmo Camporese, Elena Izzo, Lamberto Ballan

*Published:* 2022-06-01

*URL:* http://arxiv.org/abs/2206.00481v2

*Abstract:* Vision Transformers (ViTs) enabled the use of the transformer architecture on
vision tasks showing impressive performances when trained on big datasets.
However, on relatively small datasets, ViTs are less accurate given their lack
of inductive bias. To this end, we propose a simple but still effect...

---

**3. PaCa-ViT: Learning Patch-to-Cluster Attention in Vision Transformers**

*Authors:* Ryan Grainger, Thomas Paniagua, Xi Song, Naresh Cuntoor, Mun Wai Lee, Tianfu Wu

*Published:* 2022-03-22

*URL:* http://arxiv.org/abs/2203.11987v2

*Abstract:* Vision Transformers (ViTs) are built on the assumption of treating image
patches as ``visual tokens" and learn patch-to-patch attention. The patch
embedding based tokenizer has a semantic gap with respect to its counterpart,
the textual tokenizer. The patch-to-patch attention suffers from the quadra...

---

### ALL TOP PAPERS

**Found 10 papers total.**


**1. Multi-Tailed Vision Transformer for Efficient Inference**

*Authors:* Yunke Wang, Bo Du, Wenyuan Wang, Chang Xu

*Published:* 2022-03-03

*URL:* http://arxiv.org/abs/2203.01587v3

*Abstract:* Recently, Vision Transformer (ViT) has achieved promising performance in
image recognition and gradually serves as a powerful backbone in various vision
tasks. To satisfy the sequential input of Transformer, the tail of ViT first
splits each image into a sequence of visual tokens with a fixed length...

---

**2. Where are my Neighbors? Exploiting Patches Relations in Self-Supervised Vision Transformer**

*Authors:* Guglielmo Camporese, Elena Izzo, Lamberto Ballan

*Published:* 2022-06-01

*URL:* http://arxiv.org/abs/2206.00481v2

*Abstract:* Vision Transformers (ViTs) enabled the use of the transformer architecture on
vision tasks showing impressive performances when trained on big datasets.
However, on relatively small datasets, ViTs are less accurate given their lack
of inductive bias. To this end, we propose a simple but still effect...

---

**3. PaCa-ViT: Learning Patch-to-Cluster Attention in Vision Transformers**

*Authors:* Ryan Grainger, Thomas Paniagua, Xi Song, Naresh Cuntoor, Mun Wai Lee, Tianfu Wu

*Published:* 2022-03-22

*URL:* http://arxiv.org/abs/2203.11987v2

*Abstract:* Vision Transformers (ViTs) are built on the assumption of treating image
patches as ``visual tokens" and learn patch-to-patch attention. The patch
embedding based tokenizer has a semantic gap with respect to its counterpart,
the textual tokenizer. The patch-to-patch attention suffers from the quadra...

---

**4. Mask-guided Vision Transformer (MG-ViT) for Few-Shot Learning**

*Authors:* Yuzhong Chen, Zhenxiang Xiao, Lin Zhao, Lu Zhang, Haixing Dai, David Weizhong Liu, Zihao Wu, Changhe Li, Tuo Zhang, Changying Li, Dajiang Zhu, Tianming Liu, Xi Jiang

*Published:* 2022-05-20

*URL:* http://arxiv.org/abs/2205.09995v1

*Abstract:* Learning with little data is challenging but often inevitable in various
application scenarios where the labeled data is limited and costly. Recently,
few-shot learning (FSL) gained increasing attention because of its
generalizability of prior knowledge to new tasks that contain only a few
samples. ...

---

**5. Vision Transformer with Progressive Sampling**

*Authors:* Xiaoyu Yue, Shuyang Sun, Zhanghui Kuang, Meng Wei, Philip Torr, Wayne Zhang, Dahua Lin

*Published:* 2021-08-03

*URL:* http://arxiv.org/abs/2108.01684v1

*Abstract:* Transformers with powerful global relation modeling abilities have been
introduced to fundamental computer vision tasks recently. As a typical example,
the Vision Transformer (ViT) directly applies a pure transformer architecture
on image classification, by simply splitting images into tokens with a...

---

In [None]:
what are vision transformers?