## Scientific Paper Agent using LangGraph
### Overview
This project implements an intelligent research assistant that helps users navigate, understand, and analyze scientific literature using LangGraph and advanced language models. By combining various academic APIs with sophisticated paper processing techniques, it creates a seamless experience for researchers, students, and professionals working with academic papers.

### Motivation
The goal is to reduce the time people invest in R&D and in reading, analyzing, and synthesizing academic papers.
Key challenges include:
* Extensive time commitment
* Inefficient search processes across fragmented database ecosystems
* The complex task of synthesizing and connecting findings across multiple papers
* Staying current with new publications

## Key Components
1. State-Driven Workflow Engine
* StateGraph Architecture: Five-node system for orchestrated research.
* Decision-Making Node: Query intent analysis and routing.
* Planning Node: Research strategy formulation.
* Tool Execution Node: Paper retrieval and processing.
* Judge Node: Quality validation and improvement cycles.

2. Paper Processing Integration
* Source Integration: CORE API for comprehensive paper access.
* Document Processing: PDF content extraction and text structure preservation.
3. Analysis Workflow
* State-aware processing pipeline
* Multi-step validation gates
* Quality-focused improvement cycles.
* Human-in-the-loop validation options.
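
The routing between the nodes listed above can be sketched in plain Python. This is a dependency-free stand-in and not LangGraph's API: the field names in `AgentState`, the keyword check in the decision node, and the "accept on the second attempt" rule in the judge node are all hypothetical placeholders for LLM calls. It also assumes that a rejected answer loops back to the planning node.

```python
from typing import Callable, Dict, List, Tuple, TypedDict

END = "__end__"  # sentinel marking the end of the workflow


class AgentState(TypedDict):
    """Shared state passed between nodes (field names are hypothetical)."""
    query: str
    requires_research: bool
    plan: List[str]
    draft: str
    is_good_answer: bool
    attempts: int


def decision_node(state: AgentState) -> AgentState:
    # Placeholder for an LLM call: queries mentioning "paper" need research.
    state["requires_research"] = "paper" in state["query"].lower()
    return state


def planning_node(state: AgentState) -> AgentState:
    # Placeholder plan; a real planner would be generated from the query.
    state["plan"] = ["search papers", "read top result", "summarize"]
    return state


def tool_node(state: AgentState) -> AgentState:
    # Placeholder for paper retrieval and processing.
    state["attempts"] += 1
    state["draft"] = f"draft v{state['attempts']} for: {state['query']}"
    return state


def judge_node(state: AgentState) -> AgentState:
    # Placeholder quality gate: accept the answer on the second attempt.
    state["is_good_answer"] = state["attempts"] >= 2
    return state


NODES: Dict[str, Callable[[AgentState], AgentState]] = {
    "decision": decision_node,
    "planning": planning_node,
    "tools": tool_node,
    "judge": judge_node,
}


def route(current: str, state: AgentState) -> str:
    """Pick the next node from the current node and the updated state."""
    if current == "decision":
        return "planning" if state["requires_research"] else END
    if current == "planning":
        return "tools"
    if current == "tools":
        return "judge"
    if current == "judge":
        return END if state["is_good_answer"] else "planning"
    raise ValueError(f"Unknown node: {current}")


def run(query: str, max_steps: int = 20) -> Tuple[AgentState, List[str]]:
    """Run the workflow and return the final state and the visited nodes."""
    state: AgentState = {
        "query": query, "requires_research": False, "plan": [],
        "draft": "", "is_good_answer": False, "attempts": 0,
    }
    current, visited = "decision", []
    for _ in range(max_steps):  # hard cap guards against endless cycles
        if current == END:
            break
        visited.append(current)
        state = NODES[current](state)
        current = route(current, state)
    return state, visited
```

Calling `run("Find papers on graph neural networks")` passes through the judge twice before finishing, while a query that needs no research ends right after the decision node. In LangGraph the same shape would typically be expressed with a `StateGraph`, conditional edges, and `END`.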


### Method details
#### The system requires

- Gemini API Key.
- CORE API key for paper retrieval. CORE is one of the largest online repositories for scientific papers, counting over 136 million papers, and offers a free API for personal use. A key can be requested here.
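
To illustrate how the CORE key is used, here is a minimal sketch that builds (but does not send) a search request. The endpoint URL, the `q` and `limit` parameters, and the Bearer-token header are assumptions based on the public CORE API v3 and should be checked against its documentation; `build_core_search_request` is a hypothetical helper name.

```python
import os
from typing import Dict, Optional, Tuple
from urllib.parse import urlencode

# Assumed CORE API v3 search endpoint; verify against the official docs.
CORE_SEARCH_URL = "https://api.core.ac.uk/v3/search/works"


def build_core_search_request(
    query: str, limit: int = 5, api_key: Optional[str] = None
) -> Tuple[str, Dict[str, str]]:
    """Return the URL and headers for a CORE search request (no network call)."""
    key = api_key or os.getenv("CORE_API_KEY")
    if not key:
        raise ValueError("CORE_API_KEY is not set")
    # urlencode escapes spaces and special characters in the query string.
    url = f"{CORE_SEARCH_URL}?{urlencode({'q': query, 'limit': limit})}"
    # Assumption: the key is sent as a Bearer token.
    headers = {"Authorization": f"Bearer {key}"}
    return url, headers
```

The returned pair can be passed to an HTTP client, for example `urllib3.PoolManager().request("GET", url, headers=headers)`.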
#### Technical Architecture:

- LangGraph for state orchestration.
- pdfplumber for document processing.
- Pydantic for structured data handling.

In [5]:
import io
import json
import os
import urllib3
import time

import pdfplumber
from dotenv import load_dotenv
from IPython.display import display, Markdown
from langchain_core.messages import BaseMessage, SystemMessage, ToolMessage, AIMessage
from langchain_core.tools import BaseTool, tool
from langgraph.graph import END, StateGraph
from langgraph.graph.state import CompiledStateGraph
from langgraph.graph.message import add_messages
from pydantic import BaseModel, Field
from typing import Annotated, ClassVar, Sequence, TypedDict, Optional

# Suppress the warning urllib3 emits for unverified HTTPS requests.
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)

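# Load API keys from a local .env file into the process environment.
load_dotenv()

# Fail early with a clear message if a key is missing
# (assigning None to os.environ would raise a TypeError).
for key in ("GOOGLE_API_KEY", "CORE_API_KEY"):
    if not os.getenv(key):
        raise EnvironmentError(f"Missing required environment variable: {key}")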

## Prompts
This cell contains the prompts used in the workflow.

The agent_prompt includes a section on how to use complex queries with the CORE API, which enables the agent to solve more complex tasks.
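
As a rough illustration of what a complex query might look like, the sketch below composes a query string from a few optional filters. The operators shown (quoted phrases, `title:` field lookup, `yearPublished>=`, `_exists_:fullText`, and `AND`) are assumptions about the CORE query language and should be confirmed in the CORE API documentation; `build_core_query` is a hypothetical helper and not part of the notebook.

```python
from typing import List, Optional


def build_core_query(
    phrase: Optional[str] = None,
    title: Optional[str] = None,
    year_from: Optional[int] = None,
    require_full_text: bool = False,
) -> str:
    """Join the provided filters into one query string using AND."""
    clauses: List[str] = []
    if phrase:
        clauses.append(f'"{phrase}"')  # exact phrase match
    if title:
        clauses.append(f'title:"{title}"')  # assumed field lookup syntax
    if year_from is not None:
        clauses.append(f"yearPublished>={year_from}")  # assumed range syntax
    if require_full_text:
        clauses.append("_exists_:fullText")  # assumed existence check
    return " AND ".join(clauses)
```

For instance, `build_core_query(phrase="graph neural networks", year_from=2020, require_full_text=True)` yields a single string that combines all three filters.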

In [None]:
# Prompt for the initial decision making on how to reply to the user 
decision_making_prompt = """
You are an experienced scientific researcher.
Your goal is to help the user with their scientific research.

Based on the user query, decide if you need to perform research or if you can answer the question directly.
- You should perform research if the user query requires any supporting evidence or information.
- You should answer the question directly if the user query is a simple question that can be answered without any supporting evidence or information.
"""

# Prompt for creating the step-by-step research plan
planning_prompt = """
You are an experienced scientific researcher.
Your goal is to make a new step by step plan to help the user with their scientific research.


