AI-powered pipeline for conducting comprehensive literature reviews across any research domain. Configure your research topic, keywords, and evaluation criteria through simple JSON files—no code changes required.
Research-Agnostic: While originally built for neuromorphic computing research, the pipeline now supports any research domain through configurable research_config.json files. See Research Domain Configuration below.
For Codespaces, bootstrap the n8n integration with:
source ./bootstrap-n8n.sh
This enables AI-assisted workflow management. See Codespace n8n Setup for details.
Launch the web dashboard for a user-friendly interface:
./run_dashboard.sh
Then open http://localhost:8000 in your browser to:
- Upload PDFs
- Monitor job progress in real-time
- View logs and download reports
- Retry failed jobs
See Dashboard Guide for detailed instructions.
Run the full 5-stage pipeline with a single command:
python pipeline_orchestrator.py
With logging:
python pipeline_orchestrator.py --log-file pipeline.log
With custom configuration:
python pipeline_orchestrator.py --config pipeline_config.json
Resume from checkpoint:
python pipeline_orchestrator.py --resume
With custom research domain:
python pipeline_orchestrator.py --research-config domains/my-domain/research_config.json
Resume from specific stage:
python pipeline_orchestrator.py --resume-from judge
Batch mode (non-interactive):
# Run without user prompts - useful for CI/CD and automated testing
python pipeline_orchestrator.py --batch-mode
# Combine with other options
python pipeline_orchestrator.py --batch-mode --log-file batch.log
python pipeline_orchestrator.py --batch-mode --resume
Batch mode defaults:
- Pillar selection: ALL analyzable pillars
- Analysis mode: ONCE (single-pass)
- User prompts: Skipped
Custom output directory:
# Use custom output directory for gap analysis results
python pipeline_orchestrator.py --output-dir reviews/my_review
# Use environment variable
export LITERATURE_REVIEW_OUTPUT_DIR=reviews/my_review
python pipeline_orchestrator.py
# Multiple reviews in separate directories
python pipeline_orchestrator.py --output-dir reviews/baseline
python pipeline_orchestrator.py --output-dir reviews/update_jan_2025
Priority: CLI argument > Environment variable > Config file > Default (gap_analysis_output)
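The precedence above is a simple fallback chain. The snippet below is a minimal sketch of that resolution order, not the pipeline's actual implementation; only the flag, environment variable, and config key names come from the documentation above.

```python
# Sketch of the documented precedence: CLI > environment > config file > default.
# Illustrative only; pipeline_orchestrator.py may resolve this differently internally.
import os

def resolve_output_dir(cli_arg: str | None, config: dict) -> str:
    if cli_arg:                                               # --output-dir
        return cli_arg
    env_value = os.environ.get("LITERATURE_REVIEW_OUTPUT_DIR")
    if env_value:                                             # environment variable
        return env_value
    return config.get("output_dir", "gap_analysis_output")   # config file, then default

print(resolve_output_dir(None, {"output_dir": "reviews/my_review"}))
```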
For step-by-step control, run each stage individually:
# Stage 1: Initial paper review
python Journal-Reviewer.py
# Stage 2: Judge claims
python Judge.py
# Stage 3: Deep requirements analysis (if rejections exist)
python DeepRequirementsAnalyzer.py
python Judge.py # Re-judge DRA claims
# Stage 4: Sync to database
python sync_history_to_db.py
# Stage 5: Gap analysis and convergence
python Orchestrator.py
- Journal-Reviewer: Screen papers and extract claims
- Judge: Evaluate claims against requirements
- DeepRequirementsAnalyzer (DRA): Re-analyze rejected claims (conditional)
- Sync: Update CSV database from version history
- Orchestrator: Identify gaps, generate gap-closing search recommendations, and drive convergence
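Conceptually, the orchestrator just runs these scripts in sequence and inserts the DRA pass only when rejections exist. The sketch below illustrates that control flow; the has_rejections() check is a placeholder, not the pipeline's real rejection detection.

```python
# Minimal sketch of the 5-stage flow with a conditional DRA pass.
# has_rejections() is a stand-in for whatever the real orchestrator inspects.
import subprocess
import sys

def run(script: str) -> None:
    print(f"--- running {script} ---")
    subprocess.run([sys.executable, script], check=True)

def has_rejections() -> bool:
    return False  # placeholder: would inspect the Judge output in practice

run("Journal-Reviewer.py")            # Stage 1: screen papers, extract claims
run("Judge.py")                       # Stage 2: evaluate claims
if has_rejections():                  # Stage 3: only when rejections exist
    run("DeepRequirementsAnalyzer.py")
    run("Judge.py")                   # re-judge DRA claims
run("sync_history_to_db.py")          # Stage 4: update CSV database
run("Orchestrator.py")                # Stage 5: gap analysis and convergence
```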
Create a pipeline_config.json file:
{
"version": "1.2.0",
"version_history_path": "review_version_history.json",
"output_dir": "gap_analysis_output",
"stage_timeout": 7200,
"log_level": "INFO",
"retry_policy": {
"enabled": true,
"default_max_attempts": 3,
"default_backoff_base": 2,
"default_backoff_max": 60,
"circuit_breaker_threshold": 3,
"per_stage": {
"journal_reviewer": {
"max_attempts": 5,
"backoff_base": 2,
"backoff_max": 120,
"retryable_patterns": ["timeout", "rate limit", "connection error"]
}
}
}
}
Configuration Options:
- output_dir: Custom output directory for gap analysis results (default: gap_analysis_output)
- version_history_path: Path to the version history JSON file
- stage_timeout: Maximum time (seconds) for each stage
- log_level: Logging verbosity (DEBUG, INFO, WARNING, ERROR)
- retry_policy: Automatic retry configuration (see below)
The pipeline automatically retries transient failures like network timeouts and rate limits:
Enable retry (default):
{
"retry_policy": {
"enabled": true,
"default_max_attempts": 3
}
}
Disable retry:
{
"retry_policy": {
"enabled": false
}
}
Custom retry per stage:
{
"retry_policy": {
"per_stage": {
"journal_reviewer": {
"max_attempts": 5,
"backoff_base": 2,
"backoff_max": 120
}
}
}
}
Retryable errors:
- Network timeouts and connection errors
- Rate limiting (429, "too many requests")
- Service unavailable (503, 502, 504)
- Temporary failures
Non-retryable errors:
- Syntax errors, import errors
- File not found
- Permission denied (401, 403)
- Invalid configuration
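The retryable/non-retryable split can be thought of as pattern matching on the error message, combined with exponential backoff and a circuit breaker. The sketch below illustrates the idea using the defaults shown above; the function names, pattern lists, and exception handling are illustrative assumptions, not the pipeline's internal API.

```python
# Illustrative retry loop: pattern-match transient errors, back off exponentially,
# and stop once the circuit-breaker threshold is hit. Names are assumptions.
import time

RETRYABLE = ("timeout", "rate limit", "connection error", "429", "502", "503", "504")
NON_RETRYABLE = ("syntaxerror", "importerror", "file not found", "401", "403")

def is_retryable(message: str) -> bool:
    text = message.lower()
    if any(pattern in text for pattern in NON_RETRYABLE):
        return False
    return any(pattern in text for pattern in RETRYABLE)

def run_with_retry(stage, max_attempts=3, backoff_base=2, backoff_max=60, breaker=3):
    failures = 0
    for attempt in range(1, max_attempts + 1):
        try:
            return stage()
        except RuntimeError as exc:
            failures += 1
            out_of_budget = failures >= breaker or attempt == max_attempts
            if not is_retryable(str(exc)) or out_of_budget:
                raise
            delay = min(backoff_base ** attempt, backoff_max)
            print(f"Attempt {attempt} failed ({exc}); retrying in {delay}s")
            time.sleep(delay)
```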
Python Version: 3.12+
Pipeline:
pip install -r requirements-dev.txt
Web Dashboard:
pip install -r requirements-dashboard.txt
Create a .env file with your API key:
GEMINI_API_KEY=your_api_key_here
DASHBOARD_API_KEY=your-secure-api-key # For dashboard authentication
Literature-Review/
├── research_config.json # 🔬 Active research domain configuration
├── pillar_definitions.json # Requirements framework
├── domains/ # 🌐 Research domain configurations
│ ├── neuromorphic-computing/ # Default domain
│ ├── example-domain/ # Template for new domains
│ └── README.md # Guide for creating domains
├── docs/ # 📚 All documentation
│ ├── README.md # Documentation guide
│ ├── DASHBOARD_GUIDE.md # 🌐 Web dashboard guide
│ ├── RESEARCH_AGNOSTIC_ARCHITECTURE.md # Multi-domain support
│ ├── CONSOLIDATED_ROADMAP.md # ⭐ Master project roadmap
│ ├── architecture/ # System design & refactoring
│ ├── guides/ # Workflow & strategy guides
│ ├── status-reports/ # Progress tracking
│ └── assessments/ # Technical evaluations
├── task-cards/ # 📋 Implementation task cards
│ ├── README.md # Task cards guide
│ ├── agent/ # Agent improvement tasks
│ ├── automation/ # Reliability & error handling
│ ├── integration/ # Integration test specs
│ ├── e2e/ # End-to-end test specs
│ └── evidence-enhancement/ # Evidence quality features
├── reviews/ # 🔍 Review documentation
│ ├── README.md # Reviews guide
│ ├── pull-requests/ # PR assessments
│ ├── architecture/ # Design reviews
│ └── third-party/ # External audits
├── literature_review/ # 🐍 Main package code
│ ├── config/ # Research domain configuration
│ ├── analysis/ # Judge, DRA, Recommendations
│ ├── reviewers/ # Journal & Deep reviewers
│ ├── orchestrator.py # Pipeline coordination
│ └── utils/ # Shared utilities
├── webdashboard/ # 🌐 Web dashboard
│ ├── app.py # FastAPI application
│ ├── templates/ # HTML templates
│ └── static/ # CSS, JS, images
├── tests/ # 🧪 Test suite
│ ├── unit/ # Unit tests
│ ├── component/ # Component tests
│ ├── integration/ # Integration tests
│ ├── webui/ # Dashboard tests
│ └── e2e/ # End-to-end tests
└── scripts/ # 🔧 Utility scripts
Getting Started:
- docs/guides/WORKFLOW_EXECUTION_GUIDE.md - How to run the pipeline
- docs/CONSOLIDATED_ROADMAP.md ⭐ - Complete project overview
- docs/DASHBOARD_GUIDE.md - Web dashboard guide
Research Domain Configuration:
- docs/RESEARCH_AGNOSTIC_ARCHITECTURE.md - Multi-domain architecture guide
- domains/README.md - Creating new research domains
- research_config.json - Example configuration
Incremental Review Mode:
- docs/INCREMENTAL_REVIEW_USER_GUIDE.md - Complete incremental mode guide
- docs/INCREMENTAL_REVIEW_MIGRATION_GUIDE.md - Migration from previous versions
- docs/api/incremental_endpoints.yaml - REST API specification
- examples/incremental_review_examples.py - Code examples
Architecture & Design:
- docs/architecture/ARCHITECTURE_REFACTOR.md - Current repository structure
- docs/architecture/ARCHITECTURE_ANALYSIS.md - System architecture
Testing & Status:
- docs/status-reports/TESTING_STATUS_SUMMARY.md - Test coverage
- docs/TEST_MODIFICATIONS.md - Enhanced test specifications
Task Planning:
- task-cards/README.md - All implementation tasks (23 cards)
- task-cards/evidence-enhancement/ - Evidence quality features
See docs/README.md for complete documentation index.
- ✅ Automated Execution: Runs all 5 stages sequentially
- ✅ Conditional DRA: Only runs when rejections are detected
- ✅ Progress Logging: Timestamps and status for each stage
- ✅ Error Handling: Halts on failure with clear error messages
- ✅ Configurable: Customizable timeouts and paths
- ✅ Checkpoint/Resume: Resume from interruption points
- ✅ Automatic Retry: Retry transient failures with exponential backoff
- ✅ Circuit Breaker: Prevents infinite retry loops
- ✅ Retry History: Track all retry attempts in checkpoint file
- ✅ Incremental Review Mode: Only analyze new papers, preserve previous results, 60-80% faster
- ✅ Gap-Targeted Pre-filtering: Reduce analysis time and API costs by only analyzing papers likely to close open gaps
- ✅ Research-Agnostic (NEW!): Configure any research domain via research_config.json—no code changes required
The pipeline now supports any research domain through simple JSON configuration. No code changes required to switch between neuromorphic computing, climate science, biomedical research, or any other field.
How it works:
- Create a research_config.json defining your research topic, keywords, and evaluation criteria
- Optionally create a pillar_definitions.json with your requirements framework
- Run the pipeline with --research-config your_config.json
Quick Start:
# Use the default neuromorphic computing domain
python pipeline_orchestrator.py
# Use a custom research domain
python pipeline_orchestrator.py --research-config domains/climate-science/research_config.json
# Create a new domain from template
cp -r domains/example-domain domains/my-research
# Edit domains/my-research/research_config.json with your topic
python pipeline_orchestrator.py --research-config domains/my-research/research_config.json
Configuration File Structure:
{
"domain": {
"id": "my-research-domain",
"name": "My Research Domain"
},
"research_topic": {
"primary": "Your primary research question...",
"short_description": "brief domain focus"
},
"prompt_context": {
"researcher_role": "PhD-level research assistant specializing in..."
},
"vocabulary": {
"primary_keywords": ["keyword1", "keyword2"],
"secondary_keywords": ["technical-term1"]
},
"pillar_definitions_file": "pillar_definitions.json"
}
Benefits:
- No code changes: Switch domains by changing a config file
- Reproducible: Share configs for collaborative research
- Multi-domain: Run analyses for different research areas in parallel
- Backward compatible: Existing neuromorphic workflows continue to work
See domains/README.md for the complete configuration guide.
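Since a research config is plain JSON, switching domains amounts to loading a different file and handing its fields to the prompt builders. The sketch below shows one plausible way to load and sanity-check such a file; the required keys mirror the structure shown above, but the loader itself is illustrative, not the package's actual config module.

```python
# Illustrative loader for a research_config.json following the structure above.
# The literature_review package has its own config handling; this is only a sketch.
import json
from pathlib import Path

REQUIRED_KEYS = ("domain", "research_topic", "prompt_context", "vocabulary")

def load_research_config(path: str) -> dict:
    config = json.loads(Path(path).read_text(encoding="utf-8"))
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise ValueError(f"research config missing keys: {missing}")
    return config

cfg = load_research_config("domains/example-domain/research_config.json")
print(cfg["domain"]["name"], "->", cfg["vocabulary"]["primary_keywords"])
```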
Reduce analysis time and API costs by intelligently filtering papers before deep analysis. The pre-filter extracts unfilled gaps from previous analyses and scores each paper's relevance to those gaps.
How it works:
- Extracts gaps from previous gap analysis report
- Scores each paper's relevance to gaps using keyword matching
- Skips papers below relevance threshold
- Analyzes only gap-closing papers
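At its core, the scoring step above measures how much a paper's text overlaps with the keywords describing each open gap. The following toy scorer makes that idea concrete; the keywords, threshold check, and function name are illustrative, and the real pre-filter's formula is not shown here.

```python
# Toy relevance score: fraction of gap keywords found in a paper's title/abstract.
# Papers below the threshold (default 0.50) would be skipped before deep analysis.
def relevance_score(paper_text: str, gap_keywords: list[str]) -> float:
    text = paper_text.lower()
    hits = sum(1 for keyword in gap_keywords if keyword.lower() in text)
    return hits / len(gap_keywords) if gap_keywords else 0.0

gaps = ["on-chip learning", "spiking", "energy per synaptic operation"]
paper = "A survey of on-chip learning rules for spiking neural network hardware"
score = relevance_score(paper, gaps)
print(f"score={score:.2f}", "analyze" if score >= 0.50 else "skip")
```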
Usage:
# Default (50% threshold)
python pipeline_orchestrator.py --prefilter
# Aggressive mode (30% threshold, analyze more papers)
python pipeline_orchestrator.py --prefilter-mode aggressive
# Conservative mode (70% threshold, analyze fewer papers)
python pipeline_orchestrator.py --prefilter-mode conservative
# Custom threshold
python pipeline_orchestrator.py --relevance-threshold 0.65
# Disable pre-filtering
python pipeline_orchestrator.py --no-prefilter
Benefits:
- Cost Savings: Typical reduction of 50-70% in papers analyzed
- Time Savings: 60-80% faster incremental runs
- API Cost Reduction: $15-30 saved per run
- Accuracy: <5% false negative rate (relevant papers rarely skipped)
Configuration:
Add to pipeline_config.json:
{
"prefilter": {
"enabled": true,
"threshold": 0.50,
"mode": "auto"
}
}
Update existing reviews by adding new papers without re-analyzing the entire database. The incremental mode intelligently detects changes and only processes new or modified papers while preserving your previous analysis results.
How it works:
- Loads previous analysis - Reads existing gap report and orchestrator state
- Detects new papers - Compares database to find new or modified papers since last run
- Extracts gaps - Identifies unfilled requirements from previous analysis
- Scores relevance - Uses ML and keyword matching to predict which papers close gaps
- Pre-filters - Skips low-relevance papers (configurable threshold, default 50%)
- Analyzes - Runs deep analysis on filtered papers only
- Merges - Combines new evidence into existing report without data loss
- Tracks lineage - Records parent→child job relationship in state
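The change-detection step boils down to fingerprinting each paper (for example, by hashing its metadata) and comparing against the fingerprints recorded by the previous run. The sketch below illustrates that idea; the field names and fingerprint scheme are assumptions, and the pipeline's actual state layout may differ.

```python
# Toy change detection: hash each paper's metadata and diff against the last run.
# Field names and the fingerprint scheme are illustrative assumptions.
import hashlib
import json

def fingerprint(paper: dict) -> str:
    payload = json.dumps(paper, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()

def detect_new_or_modified(papers: dict[str, dict], previous: dict[str, str]) -> list[str]:
    return [paper_id for paper_id, meta in papers.items()
            if previous.get(paper_id) != fingerprint(meta)]

previous_run = {"paper-001": fingerprint({"title": "Old survey", "year": 2023})}
current_db = {
    "paper-001": {"title": "Old survey", "year": 2023},   # unchanged -> skipped
    "paper-002": {"title": "New results", "year": 2025},  # new -> analyzed
}
print(detect_new_or_modified(current_db, previous_run))   # ['paper-002']
```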
Quick Start:
# 1. Run baseline analysis
python pipeline_orchestrator.py --output-dir reviews/baseline
# 2. Add new papers to data/raw/
# 3. Run incremental update (default mode)
python pipeline_orchestrator.py --output-dir reviews/baseline
# Or explicitly enable incremental mode
python pipeline_orchestrator.py --incremental --output-dir reviews/baseline
Usage Examples:
# Preview what would be analyzed (dry-run)
python pipeline_orchestrator.py --incremental --dry-run
# Force full re-analysis (override incremental)
python pipeline_orchestrator.py --force
# Continue specific review with parent job tracking
python pipeline_orchestrator.py --incremental --parent-job-id review_20250115_103000
# Clear analysis cache and start fresh
python pipeline_orchestrator.py --clear-cache --force
Benefits:
- 60-80% faster - Only analyzes new, relevant papers
- Cost savings - $15-30 per incremental run vs $50+ for full analysis
- Preserves work - Builds on previous analysis without data loss
- Tracks changes - See gaps closed over time with job lineage
- Smart filtering - Automatic relevance scoring reduces wasted analysis
- Safe fallback - Automatically runs full mode if prerequisites missing
Prerequisites: Incremental mode requires:
- Previous gap_analysis_report.json in the output directory
- Complete orchestrator_state.json (analysis_completed: true)
If prerequisites are missing, the pipeline automatically falls back to full analysis mode.
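You can check the fallback decision by hand before a run. The snippet below mirrors the two documented prerequisites; it only reads the files listed above and is not the pipeline's internal check.

```python
# Check the two documented incremental-mode prerequisites in an output directory.
# The analysis_completed field name follows the documentation above.
import json
from pathlib import Path

def incremental_ready(output_dir: str) -> bool:
    base = Path(output_dir)
    report = base / "gap_analysis_report.json"
    state_file = base / "orchestrator_state.json"
    if not report.exists() or not state_file.exists():
        return False
    state = json.loads(state_file.read_text(encoding="utf-8"))
    return bool(state.get("analysis_completed"))

status = incremental_ready("reviews/baseline")
print("incremental ok" if status else "will fall back to full analysis")
```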
Advanced Options:
# Combine with pre-filtering for maximum efficiency
python pipeline_orchestrator.py --incremental --prefilter-mode aggressive
# Custom relevance threshold for pre-filtering
python pipeline_orchestrator.py --incremental --relevance-threshold 0.40
# Use separate directories for comparison
python pipeline_orchestrator.py --output-dir reviews/baseline
python pipeline_orchestrator.py --output-dir reviews/update_feb_2025
Configuration:
Add to pipeline_config.json:
{
"incremental": true,
"force": false,
"parent_job_id": null,
"relevance_threshold": 0.50,
"prefilter_enabled": true
}
Troubleshooting:
"Incremental prerequisites not met"
- Ensure previous gap_analysis_report.json exists in output directory
- Check orchestrator_state.json shows analysis_completed: true
- Run full analysis first:
python pipeline_orchestrator.py --force
"No new papers detected"
- Verify papers were added to data/raw/ directory
- Papers must be in JSON format with proper metadata
- Use --force to re-analyze all papers anyway
"No changes detected - all papers are up to date"
- This is normal! No new/modified papers were found
- Add new papers or use --force for full re-analysis
- Clear the cache with --clear-cache if fingerprints seem stale
The pipeline creates a pipeline_checkpoint.json file to track progress. If a pipeline fails, you can resume from the last successful stage:
# Resume from last checkpoint
python pipeline_orchestrator.py --resume
# Resume from specific stage
python pipeline_orchestrator.py --resume-from sync
View checkpoint status:
cat pipeline_checkpoint.json | jq '.stages'
View retry history:
cat pipeline_checkpoint.json | jq '.stages.journal_reviewer.retry_history'
The pipeline automatically retries transient failures:
- Network Timeout → Retry with exponential backoff
- Rate Limit → Wait and retry with increasing delays
- Syntax Error → Fail immediately (no retry)
- Circuit Breaker → Stop after 3 consecutive failures
Example retry flow:
- Attempt 1: Fails with "Connection timeout" → Wait 2s, retry
- Attempt 2: Fails with "Rate limit" → Wait 4s, retry
- Attempt 3: Succeeds → Continue to next stage
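The waits in that flow follow the exponential backoff settings (backoff_base, backoff_max) from the retry policy. This one-liner makes the schedule explicit using the documented defaults; it mirrors the configuration values rather than quoting the pipeline's code.

```python
# Backoff delays for base=2, cap=60: 2s, 4s, 8s, 16s, 32s (then capped at 60s).
backoff_base, backoff_max = 2, 60
delays = [min(backoff_base ** attempt, backoff_max) for attempt in range(1, 6)]
print(delays)  # [2, 4, 8, 16, 32]
```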
The pipeline generates analysis results in configurable directories:
gap_analysis_output/ # Research gap analysis results (default, customizable via --output-dir)
proof_scorecard_output/ # Proof scorecard outputs (CLI)
workspace/ # Dashboard job data and results
Custom Output Directory: You can specify a custom output directory for gap analysis results:
# Via CLI argument
python pipeline_orchestrator.py --output-dir reviews/baseline
# Via environment variable
export LITERATURE_REVIEW_OUTPUT_DIR=reviews/baseline
# Via config file
{
"output_dir": "reviews/baseline"
}
This enables organizing multiple review projects:
reviews/
├── baseline_2025_01/ # Initial review
├── update_2025_02/ # Monthly update
└── comparative_study/ # Comparative analysis
Note: These directories are gitignored as they contain generated artifacts. Run the pipeline to regenerate outputs locally.
Complete Output Reference:
- docs/OUTPUT_FILE_REFERENCE.md - Comprehensive list of all output files (CLI & Dashboard)
- docs/OUTPUT_MANAGEMENT_STRATEGY.md - Git policy and rationale
- docs/DASHBOARD_CLI_PARITY.md - Feature comparison
Typical Output Structure:
gap_analysis_output/
├── gap_analysis_report.json # Master analysis report
├── executive_summary.md # Human-readable summary
├── waterfall_Pillar_1-7.html # Pillar visualizations (7 files)
├── _OVERALL_Research_Gap_Radar.html # Overall radar chart
├── _Paper_Network.html # Paper network graph
├── _Research_Trends.html # Trend analysis
├── proof_chain.html/json # Evidence proof chains
├── sufficiency_matrix.html/json # Evidence sufficiency
├── triangulation.html/json # Multi-source verification
└── suggested_searches.json/md # Research recommendations
Regenerate Outputs:
# CLI
python pipeline_orchestrator.py path/to/paper.pdf
# Dashboard
# Use "Re-run Analysis" button or "Import Existing Results" featureSee docs/OUTPUT_FILE_REFERENCE.md for complete file descriptions, sizes, and formats.
All project documentation is organized in the docs/ folder:
| Resource | Description |
|---|---|
| docs/README.md | Documentation index and quick reference |
| docs/USER_MANUAL.md | Complete user manual |
| docs/DASHBOARD_GUIDE.md | Web dashboard guide |
| docs/API_REFERENCE.md | REST API documentation |
| docs/TESTING_GUIDE.md | Testing procedures |
| docs/DEPLOYMENT_GUIDE.md | Deployment instructions |
Implementation task cards are in task-cards/ - see task-cards/README.md for the index.
Archived implementation summaries, smoke test reports, and PR reviews are in docs/archive/.
Run the test suite:
pytest
Run specific test categories:
pytest -m unit # Unit tests only
pytest -m integration # Integration tests only
See LICENSE file for details.
Last automated review: July 23, 2024