An automated system for reviewing research papers using Google AI (Gemini) or other LLMs via OpenRouter, with context from eminent researchers in the field.
- Download papers from ArXiv or direct URLs
- Advanced PDF text extraction using multiple methods (PyPDF2, PyMuPDF, pdfplumber); a fallback sketch follows this list
- Extract sections and structure from scientific papers
- Find eminent researchers in the paper's field using Semantic Scholar
- Retrieve context papers from these researchers
- Integrate with PubMed to find additional relevant papers
- Generate critical reviews using Google AI (Gemini), focusing on errors, inaccuracies, missing context, and citations
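The multi-method extraction mentioned above can be pictured as a simple fallback chain. The sketch below is illustrative only: the fallback order, function names, and the `downloads/example.pdf` path are assumptions, not the package's actual implementation.

```python
# Minimal sketch of multi-method PDF extraction with fallbacks.
# The fallback order here is illustrative, not necessarily what the package uses.

def extract_text(pdf_path: str) -> str:
    """Try PyMuPDF, then pdfplumber, then PyPDF2; return the first non-empty result."""

    def with_pymupdf(path):
        import fitz  # PyMuPDF
        with fitz.open(path) as doc:
            return "\n".join(page.get_text() for page in doc)

    def with_pdfplumber(path):
        import pdfplumber
        with pdfplumber.open(path) as pdf:
            return "\n".join(page.extract_text() or "" for page in pdf.pages)

    def with_pypdf2(path):
        from PyPDF2 import PdfReader
        reader = PdfReader(path)
        return "\n".join(page.extract_text() or "" for page in reader.pages)

    for extractor in (with_pymupdf, with_pdfplumber, with_pypdf2):
        try:
            text = extractor(pdf_path)
            if text and text.strip():
                return text
        except Exception:
            continue  # fall through to the next extraction method
    return ""

print(extract_text("downloads/example.pdf")[:500])  # hypothetical input file
```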
1. Clone this repository:

   ```bash
   git clone https://github.com/yourusername/roastchip.git
   cd roastchip
   ```

2. Create a virtual environment and install dependencies:

   ```bash
   python -m venv venv
   source venv/bin/activate  # On Windows: venv\Scripts\activate
   pip install -e .
   ```

3. Create a `.env` file with your API keys:

   ```bash
   cp .env.example .env
   ```

   Then edit the `.env` file to add your API keys and other configuration:

   ```bash
   # Google Gemini API key (required if not using OpenRouter)
   GOOGLE_AI_API_KEY=your_gemini_api_key_here
   GOOGLE_AI_API_BASE=https://generativelanguage.googleapis.com/v1beta
   GOOGLE_AI_MODEL=gemini-1.5-pro

   # OpenRouter API key (required if using OpenRouter)
   OPENROUTER_API_KEY=your_openrouter_api_key_here
   OPENROUTER_API_BASE=https://openrouter.ai/api/v1
   OPENROUTER_MODEL=google/gemini-1.5-pro  # Can also use google/gemini-2.5-pro, openai/gpt-4o, anthropic/claude-3-sonnet, etc.

   # Email for PubMed API access (required for PubMed integration)
   PUBMED_EMAIL=your_email@example.com

   # Semantic Scholar API key (optional, but recommended for higher rate limits)
   # Get your API key from https://www.semanticscholar.org/product/api
   SEMANTIC_SCHOLAR_API_KEY=your_semantic_scholar_api_key_here

   # Configuration
   MAX_PAPERS_PER_RESEARCHER=5
   MAX_RESEARCHERS=3
   DOWNLOAD_DIR=downloads
   REVIEWS_DIR=reviews
   ```
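At runtime these values are read from the environment. A minimal sketch of how that could look with `python-dotenv` is shown below; the loading code and the fallback defaults are an assumption for illustration, not the package's actual configuration module.

```python
# Minimal sketch of reading the .env configuration, assuming python-dotenv is installed.
import os
from dotenv import load_dotenv

load_dotenv()  # reads key=value pairs from .env into the process environment

GOOGLE_AI_API_KEY = os.getenv("GOOGLE_AI_API_KEY")
OPENROUTER_API_KEY = os.getenv("OPENROUTER_API_KEY")
MAX_RESEARCHERS = int(os.getenv("MAX_RESEARCHERS", "3"))
DOWNLOAD_DIR = os.getenv("DOWNLOAD_DIR", "downloads")

if not (GOOGLE_AI_API_KEY or OPENROUTER_API_KEY):
    raise SystemExit("Set GOOGLE_AI_API_KEY or OPENROUTER_API_KEY in your .env file")
```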
Getting API keys:
Google Gemini API key:
- Go to Google AI Studio
- Sign in with your Google account
- Click on "Get API key" in the top right corner
- Create a new API key or use an existing one
- Copy the API key and paste it in your `.env` file as `GOOGLE_AI_API_KEY`
OpenRouter API key:
- Go to OpenRouter
- Sign in or create an account
- Navigate to the API Keys section
- Create a new API key
- Copy the API key and paste it in your `.env` file as `OPENROUTER_API_KEY`
Semantic Scholar API key:
- Go to Semantic Scholar API
- Sign up for an API key
- Copy the API key and paste it in your `.env` file as `SEMANTIC_SCHOLAR_API_KEY`
Note: The system is configured to use Gemini 1.5 Pro by default, but you can specify other models like Gemini 2.5 Pro, GPT-4o, or Claude 3 Sonnet when using OpenRouter.
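For context, OpenRouter exposes an OpenAI-compatible chat completions endpoint, so a request with the configured model can be sketched roughly as follows. The prompt text is a placeholder; this is not the exact request the reviewer builds.

```python
# Minimal sketch of a chat completion request via OpenRouter's OpenAI-compatible API.
import os
import requests

response = requests.post(
    f"{os.getenv('OPENROUTER_API_BASE', 'https://openrouter.ai/api/v1')}/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
    json={
        "model": os.getenv("OPENROUTER_MODEL", "google/gemini-1.5-pro"),
        "messages": [
            {"role": "system", "content": "You are a critical academic reviewer."},
            {"role": "user", "content": "Review the following paper..."},  # placeholder prompt
        ],
    },
    timeout=120,
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```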
```bash
# Review a paper from ArXiv
python -m paper_reviewer.main review 2101.12345 --output review.json

# Review a paper from a URL
python -m paper_reviewer.main review-url https://example.com/paper.pdf --field cs.AI --output review.json

# Run a test review on a predefined paper
python -m paper_reviewer.main test --output review.json

# Process all PDFs in the downloads directory
python -m paper_reviewer.main process-all --field cs.AI
```
```
usage: main.py [-h] {review,review-url,test,process-all} ...

Automatically review research papers using Google AI with context from eminent researchers.

positional arguments:
  {review,review-url,test,process-all}
                        Command to execute
    review              Review a paper from ArXiv
    review-url          Review a paper from a URL
    test                Run a test review on a predefined paper
    process-all         Process all PDFs in the downloads directory

options:
  -h, --help            show this help message and exit
```
```
usage: main.py review [-h] [--use-rules] [--use-scholar] [--max-context-papers MAX_CONTEXT_PAPERS] [--max-researchers MAX_RESEARCHERS] [--no-pubmed] [--output OUTPUT] paper_id

positional arguments:
  paper_id              ArXiv paper ID (e.g., 2101.12345)

options:
  -h, --help            show this help message and exit
  --use-rules           Use reviewer rules from AIreviewer_rules.txt
  --use-scholar         Include Semantic Scholar results in the prompt
  --max-context-papers MAX_CONTEXT_PAPERS
                        Maximum number of context papers per researcher
  --max-researchers MAX_RESEARCHERS
                        Maximum number of researchers to consider
  --no-pubmed           Disable PubMed integration
  --output OUTPUT       Output file path (JSON format)
```
```
usage: main.py review-url [-h] [--use-rules] [--use-scholar] [--paper-id PAPER_ID] [--field FIELD] [--max-context-papers MAX_CONTEXT_PAPERS] [--max-researchers MAX_RESEARCHERS] [--no-pubmed] [--output OUTPUT] pdf_url

positional arguments:
  pdf_url               URL to the PDF file

options:
  -h, --help            show this help message and exit
  --use-rules           Use reviewer rules from AIreviewer_rules.txt
  --use-scholar         Include Semantic Scholar results in the prompt
  --paper-id PAPER_ID   Optional paper ID (will be generated if not provided)
  --field FIELD         Research field of the paper (e.g., cs.AI, physics.optics)
  --max-context-papers MAX_CONTEXT_PAPERS
                        Maximum number of context papers per researcher
  --max-researchers MAX_RESEARCHERS
                        Maximum number of researchers to consider
  --no-pubmed           Disable PubMed integration
  --output OUTPUT       Output file path (JSON format)
```
```
usage: main.py test [-h] [--use-rules] [--use-scholar] [--output OUTPUT]

options:
  -h, --help            show this help message and exit
  --use-rules           Use reviewer rules from AIreviewer_rules.txt
  --use-scholar         Include Semantic Scholar results in the prompt
  --output OUTPUT       Output file path (JSON format)
```
```
usage: main.py process-all [-h] [--use-rules] [--use-scholar] [--field FIELD] [--max-context-papers MAX_CONTEXT_PAPERS] [--max-researchers MAX_RESEARCHERS] [--no-pubmed]

options:
  -h, --help            show this help message and exit
  --use-rules           Use reviewer rules from AIreviewer_rules.txt
  --use-scholar         Include Semantic Scholar results in the prompt
  --field FIELD         Default research field for papers without ArXiv IDs
  --max-context-papers MAX_CONTEXT_PAPERS
                        Maximum number of context papers per researcher
  --max-researchers MAX_RESEARCHERS
                        Maximum number of researchers to consider
  --no-pubmed           Disable PubMed integration
```
```bash
python -m paper_reviewer.main review 2101.12345
```
This will:
- Download the paper with ID 2101.12345 from ArXiv
- Extract text from the PDF using the best available method
- Find top researchers in the paper's field using Semantic Scholar (a sketch of such a query follows this list)
- Retrieve context papers from these researchers
- Find related papers in PubMed
- Generate a critical review using Google AI (Gemini)
- Print the review to the console
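The researcher-lookup step can be approximated against the public Semantic Scholar Graph API. The sketch below searches for highly cited papers matching a query and tallies their authors; this is an illustrative heuristic, not necessarily the ranking the package uses.

```python
# Rough sketch: find frequently appearing authors among top-cited papers for a query.
import os
from collections import Counter

import requests

def top_researchers(query: str, limit: int = 20) -> list:
    headers = {}
    if os.getenv("SEMANTIC_SCHOLAR_API_KEY"):
        headers["x-api-key"] = os.environ["SEMANTIC_SCHOLAR_API_KEY"]
    resp = requests.get(
        "https://api.semanticscholar.org/graph/v1/paper/search",
        params={"query": query, "limit": limit, "fields": "title,citationCount,authors"},
        headers=headers,
        timeout=30,
    )
    resp.raise_for_status()
    papers = resp.json().get("data", [])
    counts = Counter(
        author["name"]
        for paper in sorted(papers, key=lambda p: p.get("citationCount", 0), reverse=True)
        for author in paper.get("authors", [])
    )
    return [name for name, _ in counts.most_common(3)]

print(top_researchers("gene set summarization large language models"))
```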
```bash
python -m paper_reviewer.main test
```
This will run a review on the paper "Gene Set Summarization using Large Language Models" (ArXiv ID: 2305.13338).
```bash
python -m paper_reviewer.main test --use-rules --use-scholar
```

When using the `--use-rules` option:
- The system will include the reviewer rules from `AIreviewer_rules.txt` in the prompt to the AI model
- The output filenames will indicate whether rules were used with `_rulesyes_` or `_rulesno_` in the filename
- The review structure and content will follow the guidelines specified in the rules file
- The full prompt will be saved to a file with the same prefix as the review file but with an additional `_prompt` suffix
- The prompt structure will also be saved as a JSON file with the same prefix as the review file but with an additional `_prompt.json` suffix, following the LinkML schema defined in `src/paper_reviewer/models/prompt_schema.linkml.yaml`
When using the `--use-scholar` option:
- The system will include Semantic Scholar results in the prompt to the AI model
- The output filenames will indicate whether Semantic Scholar was used with `_scholaryes_` or `_scholarno_` in the filename
- The review will include insights from related papers by eminent researchers in the field
You can use both options together to get the benefits of both structured reviews and context from related papers.
```bash
python -m paper_reviewer.main process-all --use-rules --use-scholar
```

This will:
- Find all PDF files in the `downloads` directory
- Process each PDF file to extract text
- Generate reviews for each PDF using Google AI (Gemini)
- Save the reviews and prompts to the `reviews` directory

The system will automatically detect ArXiv papers based on their filenames and use appropriate field settings for them. For non-ArXiv papers, it will use the field specified with the `--field` option (default: `cs.AI`).
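That detection can be pictured as a pattern match on the modern ArXiv identifier format (`YYMM.NNNNN`). The regex below is an assumption about how such a check might look, not the exact rule the package applies.

```python
# Sketch: treat a PDF as an ArXiv paper if its filename stem looks like a modern ArXiv ID.
import re
from pathlib import Path
from typing import Optional

ARXIV_ID = re.compile(r"^\d{4}\.\d{4,5}(v\d+)?$")

def detect_arxiv_id(pdf_path: str) -> Optional[str]:
    """Return the ArXiv ID if the filename stem looks like one, else None."""
    stem = Path(pdf_path).stem
    return stem if ARXIV_ID.match(stem) else None

print(detect_arxiv_id("downloads/2101.12345.pdf"))         # "2101.12345"
print(detect_arxiv_id("downloads/ISMEJ-D-23-00112.pdf"))   # None -> falls back to --field
```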
The project includes test scripts in the `src/tests` directory that use the already extracted text from the `raw_text` directory, avoiding the need to extract text from PDFs.
```bash
# Process all raw text files with reviewer rules
python -m tests.run_tests raw-text --use-rules

# Process all raw text files with Semantic Scholar context
python -m tests.run_tests raw-text --use-scholar

# Process all raw text files with both rules and Semantic Scholar
python -m tests.run_tests raw-text --use-rules --use-scholar

# Process a specific file with reviewer rules
python -m tests.run_tests raw-text --file raw_text/ISMEJ-D-23-00112.txt --use-rules

# Process a specific file with a custom output directory
python -m tests.run_tests raw-text --file raw_text/ISMEJ-D-23-00112.txt --output-dir custom_reviews
```

```bash
# Generate a review with reviewer rules
python -m tests.test_single_file raw_text/ISMEJ-D-23-00112.txt --use-rules

# Generate a review with Semantic Scholar context
python -m tests.test_single_file raw_text/ISMEJ-D-23-00112.txt --use-scholar

# Generate a review with a specific model
python -m tests.test_single_file raw_text/ISMEJ-D-23-00112.txt --use-rules --model "openai/gpt-4o"
```

```bash
# Run pytest tests (mocked API calls)
python -m pytest src/tests/test_all_llms_with_schema.py

# Run pytest tests with verbose output
python -m pytest -v src/tests/test_all_llms_with_schema.py

# Run actual tests with real API calls for all models
python -m src.tests.test_all_llms_with_schema

# Run actual tests with Gemini and reviewer rules
python -m src.tests.test_all_llms_with_schema --model gemini --use-rules

# Run actual tests with Claude
python -m src.tests.test_all_llms_with_schema --model claude

# Run actual tests with GPT-4o and a specific file
python -m src.tests.test_all_llms_with_schema --model gpt4o --file raw_text/ISMEJ-D-23-00112.txt
```

The tests with real API calls will generate the following files in the `test_reviews` directory:
- Text review files
- JSON review files
- Text prompt files
- JSON prompt files (following the LinkML schema)
See the tests README for more information.
The project includes an evaluation module in the `src/evaluation` directory that can be used to compare reviews generated with and without rules.
```bash
# Compare a specific pair of reviews generated with and without rules
python -m evaluation.compare_specific_reviews --rules-yes <path_to_rules_yes_review> --rules-no <path_to_rules_no_review>

# Compare all reviews for a given timestamp
python -m evaluation.compare_all_reviews --reviews-dir reviews --output-dir evaluation_results --timestamp <timestamp>

# Manually compare a pair of reviews
python -m evaluation.manual_compare --rules-yes <path_to_rules_yes_review> --rules-no <path_to_rules_no_review>
```

The evaluation module compares reviews based on several metrics (a rough illustration of a few of them follows the list):
- Basic Metrics: Sentence count, word count, average sentence length
- Structure Metrics: Section count, bullet point count
- Complexity Metrics: Lexical diversity, content word ratio
- Unique Content Analysis: Identifies unique sentences in each review
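As an illustration only (the actual evaluation code may tokenize differently), a few of these metrics can be computed like this:

```python
# Rough sketch of a few review metrics; tokenization is deliberately simple.
import re

def basic_metrics(review_text: str) -> dict:
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", review_text.strip()) if s]
    words = re.findall(r"[A-Za-z']+", review_text.lower())
    bullets = sum(1 for line in review_text.splitlines() if line.lstrip().startswith(("-", "*")))
    return {
        "sentence_count": len(sentences),
        "word_count": len(words),
        "avg_sentence_length": len(words) / max(len(sentences), 1),
        "bullet_point_count": bullets,
        "lexical_diversity": len(set(words)) / max(len(words), 1),
    }

print(basic_metrics("The methods section is unclear. - Figure 2 lacks error bars."))
```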
The evaluation results are saved in the following formats:
- JSON Files: Detailed comparison results for each paper
- Summary JSON: Overall results across all papers
- HTML Report: User-friendly visualization of the evaluation results
The system uses a structured prompt format defined by a LinkML schema. The prompt is saved in both text and JSON formats:
- Text Format: Human-readable prompt with clear section delimiters
- JSON Format: Structured representation of the prompt following the LinkML schema
Example of the prompt JSON structure:
```json
{
"introduction": {
"section_type": "INTRODUCTION",
"section_title": "REVIEWER ROLE",
"section_content": "You are a critical academic reviewer with expertise in analyzing research papers.",
"section_delimiter_start": "===",
"section_delimiter_end": "==="
},
"reviewer_rules": {
"section_type": "REVIEWER_RULES",
"section_title": "REVIEWER RULES",
"section_content": "1. Ethics & Integrity\n...\n\n2. Review Structure\n...\n\n3. Content‑Level Expectations\n...",
"section_delimiter_start": "===",
"section_delimiter_end": "==="
},
"task_description": {
"section_type": "TASK_DESCRIPTION",
"section_title": "REVIEW TASK",
"section_content": "Your task is to provide a thorough, critical review of the following paper...",
"section_delimiter_start": "===",
"section_delimiter_end": "==="
},
"paper_content": {
"section_type": "PAPER_CONTENT",
"section_title": "PAPER TO REVIEW",
"section_content": "...[paper content]...",
"section_delimiter_start": "===",
"section_delimiter_end": "==="
},
"context_papers": {
"section_type": "CONTEXT_PAPERS",
"section_title": "CONTEXT PAPERS",
"section_content": "To help with your review, here are relevant papers from eminent researchers...",
"section_delimiter_start": "===",
"section_delimiter_end": "==="
},
"review_instructions": {
"section_type": "REVIEW_INSTRUCTIONS",
"section_title": "REVIEW INSTRUCTIONS",
"section_content": "Please provide a comprehensive review that:\n1. Identifies any factual errors...",
"section_delimiter_start": "===",
"section_delimiter_end": "==="
},
"metadata": {
"model_name": "openai/gpt-4o",
"use_rules": true,
"context_papers_count": 5,
"semantic_scholar_papers_count": 3,
"pubmed_papers_count": 2
}
}
```
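Given that structure, the human-readable prompt can be rebuilt from the JSON sections. The snippet below is a minimal sketch; the `prompt.json` path and the assumption that sections appear in reading order are illustrative.

```python
# Sketch: rebuild the delimited text prompt from the structured JSON sections.
import json

with open("prompt.json") as f:  # hypothetical path to a saved *_prompt.json file
    prompt = json.load(f)

parts = []
for key, section in prompt.items():
    if key == "metadata" or not isinstance(section, dict):
        continue
    parts.append(
        f"{section['section_delimiter_start']} {section['section_title']} "
        f"{section['section_delimiter_end']}\n{section['section_content']}"
    )
print("\n\n".join(parts))
```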
## Pipeline Runner
The project includes a unified pipeline runner (`src.pipeline_runner`) that provides a consistent interface to run different stages of the paper review pipeline. This tool enforces consistent output directories and simplifies the execution of the entire workflow.
### Pipeline Stages
The pipeline runner supports the following stages:
1. **PDF to Text Extraction**: Extract text from PDF files
2. **Text to Reviews Generation**: Generate reviews from raw text with various parameter combinations
3. **Reviews to Evaluation**: Compare and evaluate generated reviews
4. **Evaluation to Visualization**: Create visual reports from evaluation data
### Directory Structure
The pipeline runner enforces a consistent directory structure:
- `pdf/`: PDF files
- `raw_text/`: Extracted text from PDFs
- `reviews/`: Generated reviews
- `evaluation/`: Evaluation results
- `evaluation_viz/`: Visualization of evaluation results
### Usage
#### Extract Text from PDFs
```bash
python -m src.pipeline_runner extract [options]
```

Options:
- `--pdf-dir`: Directory containing PDF files (default: "pdf")
- `--output-dir`: Directory to save extracted text (default: "raw_text")
- `--limit`: Maximum number of PDFs to process
```bash
python -m src.pipeline_runner review [options]
```

Options:
- `--raw-text-dir`: Directory containing raw text files (default: "raw_text")
- `--reviews-dir`: Directory to save reviews (default: "reviews")
- `--use-rules`: Use reviewer rules
- `--use-scholar`: Include Semantic Scholar results
- `--use-pubmed`: Include PubMed results
- `--timestamp`: Timestamp to use in filenames
- `--delay`: Delay in seconds between API calls (default: 30)
- `--model`: LLM model name to use (e.g., `gemini-1.5-pro`, `google/gemini-2.5-pro`, `openai/gpt-4o`, `anthropic/claude-3-sonnet`). Use `all` to run all three main models.
- `--use-openrouter`: Use OpenRouter API instead of Google AI
```bash
python -m src.pipeline_runner all-reviews [options]
```

This command generates reviews with all combinations of parameters (with/without rules, with/without scholar, with/without pubmed) using a consistent timestamp for easy comparison.

Options:
- `--raw-text-dir`: Directory containing raw text files (default: "raw_text")
- `--reviews-dir`: Directory to save reviews (default: "reviews")
- `--timestamp`: Timestamp to use in filenames
- `--delay`: Delay in seconds between API calls (default: 60)
- `--model`: LLM model name to use. Use `all` to run all three main models (Gemini 2.5 Pro, GPT-4o, and Claude 3 Sonnet)
```bash
python -m src.pipeline_runner evaluate [options]
```

Options:
- `--reviews-dir`: Directory containing review files (default: "reviews")
- `--output-dir`: Directory to save evaluation results (default: "evaluation")
- `--timestamp`: Timestamp to filter reviews (required)
```bash
python -m src.pipeline_runner visualize [options]
```

Options:
- `--summary-file`: Path to the evaluation summary JSON file (required)
- `--output-dir`: Directory to save visualization results
```bash
python -m src.pipeline_runner full [options]
```

This command runs the entire pipeline from PDF extraction to visualization in one go.

Options:
- `--pdf-dir`: Directory containing PDF files (default: "pdf")
- `--raw-text-dir`: Directory to save extracted text (default: "raw_text")
- `--reviews-dir`: Directory to save reviews (default: "reviews")
- `--evaluation-dir`: Directory to save evaluation results (default: "evaluation")
- `--evaluation-viz-dir`: Directory to save visualization results (default: "evaluation_viz")
- `--limit`: Maximum number of PDFs to process
- `--delay`: Delay in seconds between API calls (default: 60)
- `--model`: LLM model name to use. Use `all` to run all three main models (Gemini 2.5 Pro, GPT-4o, and Claude 3 Sonnet)
```bash
# Extract text from all PDFs in the pdf/ directory
python -m src.pipeline_runner extract --pdf-dir pdf --output-dir raw_text

# Extract text from only the first 3 PDFs
python -m src.pipeline_runner extract --pdf-dir pdf --output-dir raw_text --limit 3

# Extract text from a specific PDF
cp path/to/your/paper.pdf pdf/
python -m src.pipeline_runner extract --pdf-dir pdf --output-dir raw_text
```

```bash
# Generate reviews with rules and scholar for all papers in raw_text/
python -m src.pipeline_runner review --use-rules --use-scholar --use-pubmed
# Generate reviews with a specific model (Gemini 2.5 Pro)
python -m src.pipeline_runner review --use-rules --model "google/gemini-2.5-pro" --use-openrouter
# Generate reviews with OpenAI GPT-4o and include context from Semantic Scholar
python -m src.pipeline_runner review --use-rules --use-scholar --model "openai/gpt-4o" --use-openrouter
# Generate reviews with Claude 3 Sonnet with a longer delay between API calls
python -m src.pipeline_runner review --model "anthropic/claude-3-sonnet" --use-openrouter --delay 60
# Generate reviews with all three main models (Gemini 2.5 Pro, GPT-4o, and Claude 3 Sonnet)
python -m src.pipeline_runner review --use-rules --model all --use-openrouter --delay 60
# Generate reviews with a specific timestamp (useful for grouping related reviews)
python -m src.pipeline_runner review --use-rules --model "openai/gpt-4o" --use-openrouter --timestamp "20250504_120000"
```

This will generate the following files for each paper in the `raw_text/` directory:
- A text review file (e.g., `ISMEJ-D-23-00112_review_20250503_204228_rulesyes_scholarno_pubmedno_openai_gpt_4o.txt`)
- A JSON review file (e.g., `ISMEJ-D-23-00112_review_20250503_204228_rulesyes_scholarno_pubmedno_openai_gpt_4o.json`)
- A text prompt file (e.g., `ISMEJ-D-23-00112_review_20250503_204228_rulesyes_scholarno_pubmedno_openai_gpt_4o_prompt.txt`)
- A JSON prompt file (e.g., `ISMEJ-D-23-00112_review_20250503_204228_rulesyes_scholarno_pubmedno_openai_gpt_4o_prompt.json`)
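These filenames encode the run configuration, so they can be parsed back into their components. The pattern below is inferred from the examples above and is not a guaranteed format:

```python
# Sketch: split a review filename into its components, based on the pattern
# <paper>_review_<YYYYMMDD_HHMMSS>_rules{yes|no}_scholar{yes|no}_pubmed{yes|no}_<model>.<ext>
import re

FILENAME = re.compile(
    r"^(?P<paper>.+)_review_(?P<timestamp>\d{8}_\d{6})_"
    r"rules(?P<rules>yes|no)_scholar(?P<scholar>yes|no)_pubmed(?P<pubmed>yes|no)_"
    r"(?P<model>.+)\.(?P<ext>txt|json)$"
)

m = FILENAME.match("ISMEJ-D-23-00112_review_20250503_204228_rulesyes_scholarno_pubmedno_openai_gpt_4o.txt")
print(m.groupdict() if m else "not a review file")
```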
The JSON prompt file follows the LinkML schema defined in `src/paper_reviewer/models/prompt_schema.linkml.yaml` and contains structured sections for the prompt, using the same JSON structure shown in the example above.
#### Generate All Review Combinations
```bash
# Generate all combinations with default settings
python -m src.pipeline_runner all-reviews --delay 60
# Generate all combinations with a specific model
python -m src.pipeline_runner all-reviews --model "openai/gpt-4o" --delay 60
# Generate all combinations with all three main models (Gemini 2.5 Pro, GPT-4o, and Claude 3 Sonnet)
python -m src.pipeline_runner all-reviews --model all --delay 60
# Generate all combinations with a specific timestamp
python -m src.pipeline_runner all-reviews --timestamp "20250504_120000" --delay 60
```

This will generate reviews with all combinations of parameters:
- Without rules, without scholar, without pubmed
- With rules, without scholar, without pubmed
- Without rules, with scholar, with pubmed
- With rules, with scholar, with pubmed
All reviews will have the same timestamp for easy comparison. For each combination, the following files will be generated:
- Text review files
- JSON review files
- Text prompt files
- JSON prompt files (following the LinkML schema)
```bash
# Evaluate reviews with a specific timestamp
python -m src.pipeline_runner evaluate --timestamp 20250504_120000

# Evaluate reviews with custom directories
python -m src.pipeline_runner evaluate --reviews-dir custom_reviews --output-dir custom_evaluation --timestamp 20250504_120000
```

This will generate evaluation files in the `evaluation/` directory:
- Individual evaluation JSON files for each paper
- A summary JSON file with overall metrics
- Comparison data between different review configurations
```bash
# Visualize evaluation results from a summary file
python -m src.pipeline_runner visualize --summary-file evaluation/evaluation_summary_20250504_120000.json

# Visualize with a custom output directory
python -m src.pipeline_runner visualize --summary-file evaluation/evaluation_summary_20250504_120000.json --output-dir custom_viz
```

This will generate HTML visualization files in the `evaluation_viz/` directory with:
- Bar plots comparing metrics across different models and configurations
- Detailed comparisons of review content
- Visualizations of unique and common content between reviews
```bash
# Run the full pipeline with default settings
python -m src.pipeline_runner full --limit 5

# Run the full pipeline with a specific model
python -m src.pipeline_runner full --model "openai/gpt-4o" --limit 3

# Run the full pipeline with all three main models (Gemini 2.5 Pro, GPT-4o, and Claude 3 Sonnet)
python -m src.pipeline_runner full --model all --limit 3 --delay 60

# Run the full pipeline with custom directories
python -m src.pipeline_runner full --pdf-dir custom_pdf --raw-text-dir custom_text --reviews-dir custom_reviews
```

This will:
- Extract text from PDFs (limited by the --limit parameter if provided)
- Generate all review combinations
- Evaluate the reviews
- Create visualization reports
All outputs will use consistent timestamps throughout the pipeline for easy tracking.
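The timestamps in the examples follow a `YYYYMMDD_HHMMSS` pattern; if you want to pre-generate one to pass via `--timestamp`, something like the following works (the format string is inferred from the example values):

```python
# Generate a pipeline timestamp in the same YYYYMMDD_HHMMSS form used in the examples.
from datetime import datetime

timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
print(timestamp)  # e.g. 20250504_120000
```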
The project includes a script to run the full pipeline with multiple models in sequence. This is useful for comparing the performance of different LLM models on the same set of papers.
```bash
# Run the pipeline with all three models (Claude, GPT-4o, and Gemini)
python run_pipeline_with_models.py

# Run the pipeline with a specific set of models
python run_pipeline_with_models.py --models "anthropic/claude-3.7-sonnet:thinking" "openai/gpt-4o"

# Run the pipeline with custom directories
python run_pipeline_with_models.py --raw-text-dir custom_text --reviews-dir custom_reviews
```

This script will:
- Run the pipeline for each model with all parameter combinations (with/without rules, with/without scholar & pubmed)
- Use a consistent timestamp across all models for easy comparison
- Generate evaluation and visualization reports comparing the performance of different models
- Use a shorter delay (1 second) between OpenRouter API calls to speed up processing
Note: Before running this script, make sure you have installed the package in development mode with `pip install -e .` to ensure all dependencies are available.