Literature Review Automation System

CI badges: Integration Tests, E2E Tests, codecov

AI-powered pipeline for conducting comprehensive literature reviews across any research domain. Configure your research topic, keywords, and evaluation criteria through simple JSON files—no code changes required.

Research-Agnostic: While originally built for neuromorphic computing research, the pipeline now supports any research domain through configurable research_config.json files. See Research Domain Configuration below.

Quick Start

🔧 GitHub Codespace Setup

For Codespaces, bootstrap the n8n integration with:

source ./bootstrap-n8n.sh

This enables AI-assisted workflow management. See Codespace n8n Setup for details.

🌐 Web Dashboard (NEW!)

Launch the web dashboard for a user-friendly interface:

./run_dashboard.sh

Then open http://localhost:8000 in your browser to:

  • Upload PDFs
  • Monitor job progress in real-time
  • View logs and download reports
  • Retry failed jobs

See Dashboard Guide for detailed instructions.

Automated Pipeline (Recommended)

Run the full 5-stage pipeline with a single command:

python pipeline_orchestrator.py

With logging:

python pipeline_orchestrator.py --log-file pipeline.log

With custom configuration:

python pipeline_orchestrator.py --config pipeline_config.json

Resume from checkpoint:

python pipeline_orchestrator.py --resume

With custom research domain:

python pipeline_orchestrator.py --research-config domains/my-domain/research_config.json

Resume from specific stage:

python pipeline_orchestrator.py --resume-from judge

Batch mode (non-interactive):

# Run without user prompts - useful for CI/CD and automated testing
python pipeline_orchestrator.py --batch-mode

# Combine with other options
python pipeline_orchestrator.py --batch-mode --log-file batch.log
python pipeline_orchestrator.py --batch-mode --resume

Batch mode defaults:

  • Pillar selection: ALL analyzable pillars
  • Analysis mode: ONCE (single-pass)
  • User prompts: Skipped

Custom output directory:

# Use custom output directory for gap analysis results
python pipeline_orchestrator.py --output-dir reviews/my_review

# Use environment variable
export LITERATURE_REVIEW_OUTPUT_DIR=reviews/my_review
python pipeline_orchestrator.py

# Multiple reviews in separate directories
python pipeline_orchestrator.py --output-dir reviews/baseline
python pipeline_orchestrator.py --output-dir reviews/update_jan_2025

Priority: CLI argument > Environment variable > Config file > Default (gap_analysis_output)
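
As an illustration, this precedence could be resolved with a few lines of Python (a minimal sketch, not the orchestrator's actual code; only the environment variable name and the default directory are taken from above):

import os

def resolve_output_dir(cli_arg=None, config=None):
    # Precedence: CLI argument > LITERATURE_REVIEW_OUTPUT_DIR > config file > default
    if cli_arg:                                   # --output-dir
        return cli_arg
    env_dir = os.environ.get("LITERATURE_REVIEW_OUTPUT_DIR")
    if env_dir:                                   # environment variable
        return env_dir
    if config and config.get("output_dir"):      # pipeline_config.json
        return config["output_dir"]
    return "gap_analysis_output"                  # built-in default

print(resolve_output_dir(cli_arg="reviews/my_review"))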

Manual Execution

For step-by-step control, run each stage individually:

# Stage 1: Initial paper review
python Journal-Reviewer.py

# Stage 2: Judge claims
python Judge.py

# Stage 3: Deep requirements analysis (if rejections exist)
python DeepRequirementsAnalyzer.py
python Judge.py  # Re-judge DRA claims

# Stage 4: Sync to database
python sync_history_to_db.py

# Stage 5: Gap analysis and convergence
python Orchestrator.py

Pipeline Stages

  1. Journal-Reviewer: Screen papers and extract claims
  2. Judge: Evaluate claims against requirements
  3. DeepRequirementsAnalyzer (DRA): Re-analyze rejected claims (conditional)
  4. Sync: Update CSV database from version history
  5. Orchestrator: Identify gaps, generate gap-closing search recommendations, and drive convergence
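
For illustration, the same sequence, including the conditional DRA stage, could be scripted roughly as follows (a sketch only; has_rejections() is a hypothetical placeholder for the orchestrator's own rejection check):

import subprocess
import sys

def run(script):
    # Run one pipeline stage as a subprocess and fail fast on error.
    print(f"--> {script}")
    subprocess.run([sys.executable, script], check=True)

def has_rejections():
    # Hypothetical helper: the real orchestrator inspects its own state
    # to decide whether any claims were rejected by the Judge stage.
    return False

run("Journal-Reviewer.py")            # Stage 1: screen papers, extract claims
run("Judge.py")                       # Stage 2: evaluate claims
if has_rejections():                  # Stage 3 only runs when rejections exist
    run("DeepRequirementsAnalyzer.py")
    run("Judge.py")                   # re-judge the DRA claims
run("sync_history_to_db.py")          # Stage 4: sync CSV database
run("Orchestrator.py")                # Stage 5: gap analysis and convergence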

Configuration

Create a pipeline_config.json file:

{
  "version": "1.2.0",
  "version_history_path": "review_version_history.json",
  "output_dir": "gap_analysis_output",
  "stage_timeout": 7200,
  "log_level": "INFO",
  "retry_policy": {
    "enabled": true,
    "default_max_attempts": 3,
    "default_backoff_base": 2,
    "default_backoff_max": 60,
    "circuit_breaker_threshold": 3,
    "per_stage": {
      "journal_reviewer": {
        "max_attempts": 5,
        "backoff_base": 2,
        "backoff_max": 120,
        "retryable_patterns": ["timeout", "rate limit", "connection error"]
      }
    }
  }
}

Configuration Options:

  • output_dir: Custom output directory for gap analysis results (default: gap_analysis_output)
  • version_history_path: Path to version history JSON file
  • stage_timeout: Maximum time (seconds) for each stage
  • log_level: Logging verbosity (DEBUG, INFO, WARNING, ERROR)
  • retry_policy: Automatic retry configuration (see below)

Retry Configuration

The pipeline automatically retries transient failures like network timeouts and rate limits:

Enable retry (default):

{
  "retry_policy": {
    "enabled": true,
    "default_max_attempts": 3
  }
}

Disable retry:

{
  "retry_policy": {
    "enabled": false
  }
}

Custom retry per stage:

{
  "retry_policy": {
    "per_stage": {
      "journal_reviewer": {
        "max_attempts": 5,
        "backoff_base": 2,
        "backoff_max": 120
      }
    }
  }
}

Retryable errors:

  • Network timeouts and connection errors
  • Rate limiting (429, "too many requests")
  • Service unavailable (503, 502, 504)
  • Temporary failures

Non-retryable errors:

  • Syntax errors, import errors
  • File not found
  • Permission denied (401, 403)
  • Invalid configuration
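
For illustration, a simplified version of this classify-and-retry behavior might look like the sketch below (the pattern lists mirror the bullets above; the orchestrator's real implementation may differ in detail):

import time

RETRYABLE = ("timeout", "rate limit", "connection error",
             "too many requests", "429", "502", "503", "504")
NON_RETRYABLE = ("syntax", "import", "not found", "permission denied",
                 "401", "403", "invalid configuration")

def is_retryable(message):
    msg = message.lower()
    if any(pattern in msg for pattern in NON_RETRYABLE):
        return False
    return any(pattern in msg for pattern in RETRYABLE)

def run_with_retry(stage_fn, max_attempts=3, backoff_base=2, backoff_max=60):
    # Retry a stage on transient errors with exponential backoff.
    for attempt in range(1, max_attempts + 1):
        try:
            return stage_fn()
        except RuntimeError as exc:
            if attempt == max_attempts or not is_retryable(str(exc)):
                raise
            delay = min(backoff_base ** attempt, backoff_max)   # 2s, 4s, 8s, ...
            time.sleep(delay)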

Requirements

Python Version: 3.12+

Pipeline:

pip install -r requirements-dev.txt

Web Dashboard:

pip install -r requirements-dashboard.txt

Create a .env file with your API key:

GEMINI_API_KEY=your_api_key_here
DASHBOARD_API_KEY=your-secure-api-key  # For dashboard authentication

📁 Repository Structure

Literature-Review/
├── research_config.json           # 🔬 Active research domain configuration
├── pillar_definitions.json        # Requirements framework
├── domains/                        # 🌐 Research domain configurations
│   ├── neuromorphic-computing/    # Default domain
│   ├── example-domain/            # Template for new domains
│   └── README.md                  # Guide for creating domains
├── docs/                          # 📚 All documentation
│   ├── README.md                  # Documentation guide
│   ├── DASHBOARD_GUIDE.md         # 🌐 Web dashboard guide
│   ├── RESEARCH_AGNOSTIC_ARCHITECTURE.md  # Multi-domain support
│   ├── CONSOLIDATED_ROADMAP.md    # ⭐ Master project roadmap
│   ├── architecture/              # System design & refactoring
│   ├── guides/                    # Workflow & strategy guides
│   ├── status-reports/            # Progress tracking
│   └── assessments/               # Technical evaluations
├── task-cards/                    # 📋 Implementation task cards
│   ├── README.md                  # Task cards guide
│   ├── agent/                     # Agent improvement tasks
│   ├── automation/                # Reliability & error handling
│   ├── integration/               # Integration test specs
│   ├── e2e/                       # End-to-end test specs
│   └── evidence-enhancement/      # Evidence quality features
├── reviews/                       # 🔍 Review documentation
│   ├── README.md                  # Reviews guide
│   ├── pull-requests/             # PR assessments
│   ├── architecture/              # Design reviews
│   └── third-party/               # External audits
├── literature_review/             # 🐍 Main package code
│   ├── config/                    # Research domain configuration
│   ├── analysis/                  # Judge, DRA, Recommendations
│   ├── reviewers/                 # Journal & Deep reviewers
│   ├── orchestrator.py            # Pipeline coordination
│   └── utils/                     # Shared utilities
├── webdashboard/                  # 🌐 Web dashboard
│   ├── app.py                     # FastAPI application
│   ├── templates/                 # HTML templates
│   └── static/                    # CSS, JS, images
├── tests/                         # 🧪 Test suite
│   ├── unit/                      # Unit tests
│   ├── component/                 # Component tests
│   ├── integration/               # Integration tests
│   ├── webui/                     # Dashboard tests
│   └── e2e/                       # End-to-end tests
└── scripts/                       # 🔧 Utility scripts

Documentation

📖 Quick Links

  • Getting Started
  • Research Domain Configuration
  • Incremental Review Mode
  • Architecture & Design
  • Testing & Status
  • Task Planning

See docs/README.md for complete documentation index.

Pipeline Orchestrator Features

  • Automated Execution: Runs all 5 stages sequentially
  • Conditional DRA: Only runs when rejections are detected
  • Progress Logging: Timestamps and status for each stage
  • Error Handling: Halts on failure with clear error messages
  • Configurable: Customizable timeouts and paths
  • Checkpoint/Resume: Resume from interruption points
  • Automatic Retry: Retry transient failures with exponential backoff
  • Circuit Breaker: Prevents infinite retry loops
  • Retry History: Track all retry attempts in checkpoint file
  • Incremental Review Mode: Only analyze new papers, preserve previous results, 60-80% faster
  • Gap-Targeted Pre-filtering: Reduce analysis time and API costs by only analyzing papers likely to close open gaps
  • Research-Agnostic (NEW!): Configure any research domain via research_config.json—no code changes required

Research Domain Configuration (NEW!)

The pipeline now supports any research domain through simple JSON configuration. No code changes are required to switch between neuromorphic computing, climate science, biomedical research, or any other field.

How it works:

  1. Create a research_config.json defining your research topic, keywords, and evaluation criteria
  2. Optionally create a pillar_definitions.json with your requirements framework
  3. Run the pipeline with --research-config your_config.json

Quick Start:

# Use the default neuromorphic computing domain
python pipeline_orchestrator.py

# Use a custom research domain
python pipeline_orchestrator.py --research-config domains/climate-science/research_config.json

# Create a new domain from template
cp -r domains/example-domain domains/my-research
# Edit domains/my-research/research_config.json with your topic
python pipeline_orchestrator.py --research-config domains/my-research/research_config.json

Configuration File Structure:

{
  "domain": {
    "id": "my-research-domain",
    "name": "My Research Domain"
  },
  "research_topic": {
    "primary": "Your primary research question...",
    "short_description": "brief domain focus"
  },
  "prompt_context": {
    "researcher_role": "PhD-level research assistant specializing in..."
  },
  "vocabulary": {
    "primary_keywords": ["keyword1", "keyword2"],
    "secondary_keywords": ["technical-term1"]
  },
  "pillar_definitions_file": "pillar_definitions.json"
}
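
For example, a config like this could be loaded and sanity-checked with a few lines of Python (a sketch only; REQUIRED_KEYS is an assumed minimum, and the pipeline's own loading code under literature_review/config/ may validate more):

import json

REQUIRED_KEYS = ("domain", "research_topic", "vocabulary")   # assumed minimum

def load_research_config(path):
    with open(path, "r", encoding="utf-8") as fh:
        config = json.load(fh)
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise ValueError(f"{path} is missing keys: {missing}")
    return config

config = load_research_config("domains/my-research/research_config.json")
print(config["domain"]["name"], config["vocabulary"]["primary_keywords"])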

Benefits:

  • No code changes: Switch domains by changing a config file
  • Reproducible: Share configs for collaborative research
  • Multi-domain: Run analyses for different research areas in parallel
  • Backward compatible: Existing neuromorphic workflows continue to work

See domains/README.md for complete configuration guide.

Gap-Targeted Pre-filtering

Reduce analysis time and API costs by intelligently filtering papers before deep analysis. The pre-filter extracts unfilled gaps from previous analyses and scores each paper's relevance to those gaps.

How it works:

  1. Extracts gaps from previous gap analysis report
  2. Scores each paper's relevance to gaps using keyword matching
  3. Skips papers below relevance threshold
  4. Analyzes only gap-closing papers
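
The relevance scoring in step 2 can be pictured as a simple keyword-overlap measure. The sketch below only illustrates the idea (the gap keywords are made-up examples); it is not the pipeline's actual scorer:

def relevance_score(paper_text, gap_keywords):
    # Fraction of gap keywords found in the paper's title/abstract.
    text = paper_text.lower()
    hits = sum(1 for keyword in gap_keywords if keyword.lower() in text)
    return hits / len(gap_keywords) if gap_keywords else 0.0

gap_keywords = ["on-chip learning", "synaptic plasticity", "spiking neural network"]
abstract = "We demonstrate on-chip learning in a spiking neural network accelerator."
score = relevance_score(abstract, gap_keywords)
print("analyze" if score >= 0.50 else "skip", round(score, 2))   # 0.50 default; 0.30 aggressive, 0.70 conservative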

Usage:

# Default (50% threshold)
python pipeline_orchestrator.py --prefilter

# Aggressive mode (30% threshold, analyze more papers)
python pipeline_orchestrator.py --prefilter-mode aggressive

# Conservative mode (70% threshold, analyze fewer papers)
python pipeline_orchestrator.py --prefilter-mode conservative

# Custom threshold
python pipeline_orchestrator.py --relevance-threshold 0.65

# Disable pre-filtering
python pipeline_orchestrator.py --no-prefilter

Benefits:

  • Cost Savings: Typical reduction of 50-70% in papers analyzed
  • Time Savings: 60-80% faster incremental runs
  • API Cost Reduction: $15-30 saved per run
  • Accuracy: <5% false negative rate (relevant papers rarely skipped)

Configuration:

Add to pipeline_config.json:

{
  "prefilter": {
    "enabled": true,
    "threshold": 0.50,
    "mode": "auto"
  }
}

Incremental Review Mode (NEW!)

Update existing reviews by adding new papers without re-analyzing the entire database. The incremental mode intelligently detects changes and only processes new or modified papers while preserving your previous analysis results.

How it works:

  1. Loads previous analysis - Reads existing gap report and orchestrator state
  2. Detects new papers - Compares database to find new or modified papers since last run
  3. Extracts gaps - Identifies unfilled requirements from previous analysis
  4. Scores relevance - Uses ML and keyword matching to predict which papers close gaps
  5. Pre-filters - Skips low-relevance papers (configurable threshold, default 50%)
  6. Analyzes - Runs deep analysis on filtered papers only
  7. Merges - Combines new evidence into existing report without data loss
  8. Tracks lineage - Records parent→child job relationship in state
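
The change detection in step 2 can be pictured as fingerprinting each paper file and comparing against fingerprints recorded on the previous run. A rough sketch of that idea follows (paper_fingerprints.json is a hypothetical cache name; the pipeline's real cache format may differ):

import hashlib
import json
from pathlib import Path

def fingerprint(path):
    # Content hash used to flag new or modified papers.
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()

def detect_changes(paper_dir="data/raw", cache_file="paper_fingerprints.json"):
    cache_path = Path(cache_file)
    cache = json.loads(cache_path.read_text()) if cache_path.exists() else {}
    changed = []
    for paper in sorted(Path(paper_dir).glob("*.json")):
        digest = fingerprint(paper)
        if cache.get(paper.name) != digest:      # new paper or modified content
            changed.append(paper.name)
        cache[paper.name] = digest
    cache_path.write_text(json.dumps(cache, indent=2))
    return changed

print(detect_changes())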

Quick Start:

# 1. Run baseline analysis
python pipeline_orchestrator.py --output-dir reviews/baseline

# 2. Add new papers to data/raw/

# 3. Run incremental update (default mode)
python pipeline_orchestrator.py --output-dir reviews/baseline

# Or explicitly enable incremental mode
python pipeline_orchestrator.py --incremental --output-dir reviews/baseline

Usage Examples:

# Preview what would be analyzed (dry-run)
python pipeline_orchestrator.py --incremental --dry-run

# Force full re-analysis (override incremental)
python pipeline_orchestrator.py --force

# Continue specific review with parent job tracking
python pipeline_orchestrator.py --incremental --parent-job-id review_20250115_103000

# Clear analysis cache and start fresh
python pipeline_orchestrator.py --clear-cache --force

Benefits:

  • 60-80% faster - Only analyzes new, relevant papers
  • Cost savings - $15-30 per incremental run vs $50+ for full analysis
  • Preserves work - Builds on previous analysis without data loss
  • Tracks changes - See gaps closed over time with job lineage
  • Smart filtering - Automatic relevance scoring reduces wasted analysis
  • Safe fallback - Automatically runs full mode if prerequisites missing

Prerequisites: Incremental mode requires:

  • Previous gap_analysis_report.json in output directory
  • Complete orchestrator_state.json (analysis_completed: true)

If prerequisites are missing, the pipeline automatically falls back to full analysis mode.

Advanced Options:

# Combine with pre-filtering for maximum efficiency
python pipeline_orchestrator.py --incremental --prefilter-mode aggressive

# Custom relevance threshold for pre-filtering
python pipeline_orchestrator.py --incremental --relevance-threshold 0.40

# Use separate directories for comparison
python pipeline_orchestrator.py --output-dir reviews/baseline
python pipeline_orchestrator.py --output-dir reviews/update_feb_2025

Configuration:

Add to pipeline_config.json:

{
  "incremental": true,
  "force": false,
  "parent_job_id": null,
  "relevance_threshold": 0.50,
  "prefilter_enabled": true
}

Troubleshooting:

"Incremental prerequisites not met"

  • Ensure previous gap_analysis_report.json exists in output directory
  • Check orchestrator_state.json shows analysis_completed: true
  • Run full analysis first: python pipeline_orchestrator.py --force

"No new papers detected"

  • Verify papers were added to data/raw/ directory
  • Papers must be in JSON format with proper metadata
  • Use --force to re-analyze all papers anyway

"No changes detected - all papers are up to date"

  • This is normal! No new/modified papers were found
  • Add new papers or use --force for full re-analysis
  • Clear cache with --clear-cache if fingerprints seem stale

Checkpoint & Resume

The pipeline writes a pipeline_checkpoint.json file to track progress. If a run fails, you can resume from the last successful stage:

# Resume from last checkpoint
python pipeline_orchestrator.py --resume

# Resume from specific stage
python pipeline_orchestrator.py --resume-from sync

View checkpoint status:

cat pipeline_checkpoint.json | jq '.stages'

View retry history:

cat pipeline_checkpoint.json | jq '.stages.journal_reviewer.retry_history'
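
The same checkpoint can also be inspected from Python. The sketch below assumes each entry under stages carries a status field; only the stages and retry_history keys are confirmed by the commands above:

import json

with open("pipeline_checkpoint.json", encoding="utf-8") as fh:
    checkpoint = json.load(fh)

for stage, info in checkpoint.get("stages", {}).items():
    retries = len(info.get("retry_history", []))
    print(f"{stage}: status={info.get('status', 'unknown')}, retries={retries}")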

Error Recovery

The pipeline automatically retries transient failures:

  1. Network Timeout → Retry with exponential backoff
  2. Rate Limit → Wait and retry with increasing delays
  3. Syntax Error → Fail immediately (no retry)
  4. Circuit Breaker → Stop after 3 consecutive failures

Example retry flow:

  • Attempt 1: Fails with "Connection timeout" → Wait 2s, retry
  • Attempt 2: Fails with "Rate limit" → Wait 4s, retry
  • Attempt 3: Succeeds → Continue to next stage

Output Files

The pipeline generates analysis results in configurable directories:

gap_analysis_output/          # Research gap analysis results (default, customizable via --output-dir)
proof_scorecard_output/       # Proof scorecard outputs (CLI)
workspace/                    # Dashboard job data and results

Custom Output Directory: You can specify a custom output directory for gap analysis results:

# Via CLI argument
python pipeline_orchestrator.py --output-dir reviews/baseline

# Via environment variable
export LITERATURE_REVIEW_OUTPUT_DIR=reviews/baseline

# Via config file
{
  "output_dir": "reviews/baseline"
}

This enables organizing multiple review projects:

reviews/
├── baseline_2025_01/         # Initial review
├── update_2025_02/           # Monthly update
└── comparative_study/        # Comparative analysis

Note: These directories are gitignored as they contain generated artifacts. Run the pipeline to regenerate outputs locally.

Complete Output Reference:

Typical Output Structure:

gap_analysis_output/
├── gap_analysis_report.json              # Master analysis report
├── executive_summary.md                  # Human-readable summary
├── waterfall_Pillar_1-7.html             # Pillar visualizations (7 files)
├── _OVERALL_Research_Gap_Radar.html      # Overall radar chart
├── _Paper_Network.html                   # Paper network graph
├── _Research_Trends.html                 # Trend analysis
├── proof_chain.html/json                 # Evidence proof chains
├── sufficiency_matrix.html/json          # Evidence sufficiency
├── triangulation.html/json               # Multi-source verification
└── suggested_searches.json/md            # Research recommendations

Regenerate Outputs:

# CLI
python pipeline_orchestrator.py path/to/paper.pdf

# Dashboard
# Use "Re-run Analysis" button or "Import Existing Results" feature

See docs/OUTPUT_FILE_REFERENCE.md for complete file descriptions, sizes, and formats.

📚 Documentation

All project documentation is organized in the docs/ folder:

  • docs/README.md: Documentation index and quick reference
  • docs/USER_MANUAL.md: Complete user manual
  • docs/DASHBOARD_GUIDE.md: Web dashboard guide
  • docs/API_REFERENCE.md: REST API documentation
  • docs/TESTING_GUIDE.md: Testing procedures
  • docs/DEPLOYMENT_GUIDE.md: Deployment instructions

Task Cards

Implementation task cards are in task-cards/ - see task-cards/README.md for the index.

Historical Documentation

Archived implementation summaries, smoke test reports, and PR reviews are in docs/archive/.

Testing

Run the test suite:

pytest

Run specific test categories:

pytest -m unit          # Unit tests only
pytest -m integration   # Integration tests only

License

See LICENSE file for details.


Last automated review: July 23, 2024
