Literature Review Automation System

CI badges: Integration Tests, E2E Tests, codecov

AI-powered pipeline for conducting comprehensive literature reviews across any research domain. Configure your research topic, keywords, and evaluation criteria through simple JSON files—no code changes required.

Research-Agnostic: While originally built for neuromorphic computing research, the pipeline now supports any research domain through configurable research_config.json files. See Research Domain Configuration below.

Quick Start

🔧 GitHub Codespace Setup

For Codespaces, bootstrap the n8n integration with:

source ./bootstrap-n8n.sh

This enables AI-assisted workflow management. See Codespace n8n Setup for details.

🌐 Web Dashboard (NEW!)

Launch the web dashboard for a user-friendly interface:

./run_dashboard.sh

Then open http://localhost:8000 in your browser to:

  • Upload PDFs
  • Monitor job progress in real-time
  • View logs and download reports
  • Retry failed jobs

See Dashboard Guide for detailed instructions.

Automated Pipeline (Recommended)

Run the full 5-stage pipeline with a single command:

python pipeline_orchestrator.py

With logging:

python pipeline_orchestrator.py --log-file pipeline.log

With custom configuration:

python pipeline_orchestrator.py --config pipeline_config.json

Resume from checkpoint:

python pipeline_orchestrator.py --resume

With custom research domain:

python pipeline_orchestrator.py --research-config domains/my-domain/research_config.json

Resume from specific stage:

python pipeline_orchestrator.py --resume-from judge

Batch mode (non-interactive):

# Run without user prompts - useful for CI/CD and automated testing
python pipeline_orchestrator.py --batch-mode

# Combine with other options
python pipeline_orchestrator.py --batch-mode --log-file batch.log
python pipeline_orchestrator.py --batch-mode --resume

Batch mode defaults:

  • Pillar selection: ALL analyzable pillars
  • Analysis mode: ONCE (single-pass)
  • User prompts: Skipped

Custom output directory:

# Use custom output directory for gap analysis results
python pipeline_orchestrator.py --output-dir reviews/my_review

# Use environment variable
export LITERATURE_REVIEW_OUTPUT_DIR=reviews/my_review
python pipeline_orchestrator.py

# Multiple reviews in separate directories
python pipeline_orchestrator.py --output-dir reviews/baseline
python pipeline_orchestrator.py --output-dir reviews/update_jan_2025

Priority: CLI argument > Environment variable > Config file > Default (gap_analysis_output)
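
As an illustration, this precedence could be resolved with a few lines of Python (a minimal sketch, not the orchestrator's actual code; only the environment variable name and the default directory are taken from above):

import os

def resolve_output_dir(cli_arg=None, config=None):
    # Precedence: CLI argument > LITERATURE_REVIEW_OUTPUT_DIR > config file > default
    if cli_arg:                                   # --output-dir
        return cli_arg
    env_dir = os.environ.get("LITERATURE_REVIEW_OUTPUT_DIR")
    if env_dir:                                   # environment variable
        return env_dir
    if config and config.get("output_dir"):      # pipeline_config.json
        return config["output_dir"]
    return "gap_analysis_output"                  # built-in default

print(resolve_output_dir(cli_arg="reviews/my_review"))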

Manual Execution

For step-by-step control, run each stage individually:

# Stage 1: Initial paper review
python Journal-Reviewer.py

# Stage 2: Judge claims
python Judge.py

# Stage 3: Deep requirements analysis (if rejections exist)
python DeepRequirementsAnalyzer.py
python Judge.py  # Re-judge DRA claims

# Stage 4: Sync to database
python sync_history_to_db.py

# Stage 5: Gap analysis and convergence
python Orchestrator.py

Pipeline Stages

  1. Journal-Reviewer: Screen papers and extract claims
  2. Judge: Evaluate claims against requirements
  3. DeepRequirementsAnalyzer (DRA): Re-analyze rejected claims (conditional)
  4. Sync: Update CSV database from version history
  5. Orchestrator: Identify gaps, generate gap-closing search recommendations, and drive convergence
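
For illustration, the same sequence, including the conditional DRA stage, could be scripted roughly as follows (a sketch only; has_rejections() is a hypothetical placeholder for the orchestrator's own rejection check):

import subprocess
import sys

def run(script):
    # Run one pipeline stage as a subprocess and fail fast on error.
    print(f"--> {script}")
    subprocess.run([sys.executable, script], check=True)

def has_rejections():
    # Hypothetical helper: the real orchestrator inspects its own state
    # to decide whether any claims were rejected by the Judge stage.
    return False

run("Journal-Reviewer.py")            # Stage 1: screen papers, extract claims
run("Judge.py")                       # Stage 2: evaluate claims
if has_rejections():                  # Stage 3 only runs when rejections exist
    run("DeepRequirementsAnalyzer.py")
    run("Judge.py")                   # re-judge the DRA claims
run("sync_history_to_db.py")          # Stage 4: sync CSV database
run("Orchestrator.py")                # Stage 5: gap analysis and convergence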

Configuration

Create a pipeline_config.json file:

{
  "version": "1.2.0",
  "version_history_path": "review_version_history.json",
  "output_dir": "gap_analysis_output",
  "stage_timeout": 7200,
  "log_level": "INFO",
  "retry_policy": {
    "enabled": true,
    "default_max_attempts": 3,
    "default_backoff_base": 2,
    "default_backoff_max": 60,
    "circuit_breaker_threshold": 3,
    "per_stage": {
      "journal_reviewer": {
        "max_attempts": 5,
        "backoff_base": 2,
        "backoff_max": 120,
        "retryable_patterns": ["timeout", "rate limit", "connection error"]
      }
    }
  }
}

Configuration Options:

  • output_dir: Custom output directory for gap analysis results (default: gap_analysis_output)
  • version_history_path: Path to version history JSON file
  • stage_timeout: Maximum time (seconds) for each stage
  • log_level: Logging verbosity (DEBUG, INFO, WARNING, ERROR)
  • retry_policy: Automatic retry configuration (see below)

Retry Configuration

The pipeline automatically retries transient failures like network timeouts and rate limits:

Enable retry (default):

{
  "retry_policy": {
    "enabled": true,
    "default_max_attempts": 3
  }
}

Disable retry:

{
  "retry_policy": {
    "enabled": false
  }
}

Custom retry per stage:

{
  "retry_policy": {
    "per_stage": {
      "journal_reviewer": {
        "max_attempts": 5,
        "backoff_base": 2,
        "backoff_max": 120
      }
    }
  }
}

Retryable errors:

  • Network timeouts and connection errors
  • Rate limiting (429, "too many requests")
  • Service unavailable (503, 502, 504)
  • Temporary failures

Non-retryable errors:

  • Syntax errors, import errors
  • File not found
  • Permission denied (401, 403)
  • Invalid configuration
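
For illustration, a simplified version of this classify-and-retry behavior might look like the sketch below (the pattern lists mirror the bullets above; the orchestrator's real implementation may differ in detail):

import time

RETRYABLE = ("timeout", "rate limit", "connection error",
             "too many requests", "429", "502", "503", "504")
NON_RETRYABLE = ("syntax", "import", "not found", "permission denied",
                 "401", "403", "invalid configuration")

def is_retryable(message):
    msg = message.lower()
    if any(pattern in msg for pattern in NON_RETRYABLE):
        return False
    return any(pattern in msg for pattern in RETRYABLE)

def run_with_retry(stage_fn, max_attempts=3, backoff_base=2, backoff_max=60):
    # Retry a stage on transient errors with exponential backoff.
    for attempt in range(1, max_attempts + 1):
        try:
            return stage_fn()
        except RuntimeError as exc:
            if attempt == max_attempts or not is_retryable(str(exc)):
                raise
            delay = min(backoff_base ** attempt, backoff_max)   # 2s, 4s, 8s, ...
            time.sleep(delay)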

Requirements

Python Version: 3.12+

Pipeline:

pip install -r requirements-dev.txt

Web Dashboard:

pip install -r requirements-dashboard.txt

Create a .env file with your API key:

GEMINI_API_KEY=your_api_key_here
DASHBOARD_API_KEY=your-secure-api-key  # For dashboard authentication

📁 Repository Structure

Literature-Review/
├── research_config.json           # 🔬 Active research domain configuration
├── pillar_definitions.json        # Requirements framework
├── domains/                        # 🌐 Research domain configurations
│   ├── neuromorphic-computing/    # Default domain
│   ├── example-domain/            # Template for new domains
│   └── README.md                  # Guide for creating domains
├── docs/                          # 📚 All documentation
│   ├── README.md                  # Documentation guide
│   ├── DASHBOARD_GUIDE.md         # 🌐 Web dashboard guide
│   ├── RESEARCH_AGNOSTIC_ARCHITECTURE.md  # Multi-domain support
│   ├── CONSOLIDATED_ROADMAP.md    # ⭐ Master project roadmap
│   ├── architecture/              # System design & refactoring
│   ├── guides/                    # Workflow & strategy guides
│   ├── status-reports/            # Progress tracking
│   └── assessments/               # Technical evaluations
├── task-cards/                    # 📋 Implementation task cards
│   ├── README.md                  # Task cards guide
│   ├── agent/                     # Agent improvement tasks
│   ├── automation/                # Reliability & error handling
│   ├── integration/               # Integration test specs
│   ├── e2e/                       # End-to-end test specs
│   └── evidence-enhancement/      # Evidence quality features
├── reviews/                       # 🔍 Review documentation
│   ├── README.md                  # Reviews guide
│   ├── pull-requests/             # PR assessments
│   ├── architecture/              # Design reviews
│   └── third-party/               # External audits
├── literature_review/             # 🐍 Main package code
│   ├── config/                    # Research domain configuration
│   ├── analysis/                  # Judge, DRA, Recommendations
│   ├── reviewers/                 # Journal & Deep reviewers
│   ├── orchestrator.py            # Pipeline coordination
│   └── utils/                     # Shared utilities
├── webdashboard/                  # 🌐 Web dashboard
│   ├── app.py                     # FastAPI application
│   ├── templates/                 # HTML templates
│   └── static/                    # CSS, JS, images
├── tests/                         # 🧪 Test suite
│   ├── unit/                      # Unit tests
│   ├── component/                 # Component tests
│   ├── integration/               # Integration tests
│   ├── webui/                     # Dashboard tests
│   └── e2e/                       # End-to-end tests
└── scripts/                       # 🔧 Utility scripts

Documentation

📖 Quick Links

  • Getting Started
  • Research Domain Configuration
  • Incremental Review Mode
  • Architecture & Design
  • Testing & Status
  • Task Planning

See docs/README.md for complete documentation index.

Pipeline Orchestrator Features

  • Automated Execution: Runs all 5 stages sequentially
  • Conditional DRA: Only runs when rejections are detected
  • Progress Logging: Timestamps and status for each stage
  • Error Handling: Halts on failure with clear error messages
  • Configurable: Customizable timeouts and paths
  • Checkpoint/Resume: Resume from interruption points
  • Automatic Retry: Retry transient failures with exponential backoff
  • Circuit Breaker: Prevents infinite retry loops
  • Retry History: Track all retry attempts in checkpoint file
  • Incremental Review Mode: Only analyze new papers, preserve previous results, 60-80% faster
  • Gap-Targeted Pre-filtering: Reduce analysis time and API costs by only analyzing papers likely to close open gaps
  • Research-Agnostic (NEW!): Configure any research domain via research_config.json—no code changes required

Research Domain Configuration (NEW!)

The pipeline now supports any research domain through simple JSON configuration. No code changes are required to switch between neuromorphic computing, climate science, biomedical research, or any other field.

How it works:

  1. Create a research_config.json defining your research topic, keywords, and evaluation criteria
  2. Optionally create a pillar_definitions.json with your requirements framework
  3. Run the pipeline with --research-config your_config.json

Quick Start:

# Use the default neuromorphic computing domain
python pipeline_orchestrator.py

# Use a custom research domain
python pipeline_orchestrator.py --research-config domains/climate-science/research_config.json

# Create a new domain from template
cp -r domains/example-domain domains/my-research
# Edit domains/my-research/research_config.json with your topic
python pipeline_orchestrator.py --research-config domains/my-research/research_config.json

Configuration File Structure:

{
  "domain": {
    "id": "my-research-domain",
    "name": "My Research Domain"
  },
  "research_topic": {
    "primary": "Your primary research question...",
    "short_description": "brief domain focus"
  },
  "prompt_context": {
    "researcher_role": "PhD-level research assistant specializing in..."
  },
  "vocabulary": {
    "primary_keywords": ["keyword1", "keyword2"],
    "secondary_keywords": ["technical-term1"]
  },
  "pillar_definitions_file": "pillar_definitions.json"
}
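
For example, a config like this could be loaded and sanity-checked with a few lines of Python (a sketch only; REQUIRED_KEYS is an assumed minimum, and the pipeline's own loading code under literature_review/config/ may validate more):

import json

REQUIRED_KEYS = ("domain", "research_topic", "vocabulary")   # assumed minimum

def load_research_config(path):
    with open(path, "r", encoding="utf-8") as fh:
        config = json.load(fh)
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise ValueError(f"{path} is missing keys: {missing}")
    return config

config = load_research_config("domains/my-research/research_config.json")
print(config["domain"]["name"], config["vocabulary"]["primary_keywords"])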

Benefits:

  • No code changes: Switch domains by changing a config file
  • Reproducible: Share configs for collaborative research
  • Multi-domain: Run analyses for different research areas in parallel
  • Backward compatible: Existing neuromorphic workflows continue to work

See domains/README.md for complete configuration guide.

Gap-Targeted Pre-filtering

Reduce analysis time and API costs by intelligently filtering papers before deep analysis. The pre-filter extracts unfilled gaps from previous analyses and scores each paper's relevance to those gaps.

How it works:

  1. Extracts gaps from previous gap analysis report
  2. Scores each paper's relevance to gaps using keyword matching
  3. Skips papers below relevance threshold
  4. Analyzes only gap-closing papers
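
The relevance scoring in step 2 can be pictured as a simple keyword-overlap measure. The sketch below only illustrates the idea (the gap keywords are made-up examples); it is not the pipeline's actual scorer:

def relevance_score(paper_text, gap_keywords):
    # Fraction of gap keywords found in the paper's title/abstract.
    text = paper_text.lower()
    hits = sum(1 for keyword in gap_keywords if keyword.lower() in text)
    return hits / len(gap_keywords) if gap_keywords else 0.0

gap_keywords = ["on-chip learning", "synaptic plasticity", "spiking neural network"]
abstract = "We demonstrate on-chip learning in a spiking neural network accelerator."
score = relevance_score(abstract, gap_keywords)
print("analyze" if score >= 0.50 else "skip", round(score, 2))   # 0.50 default; 0.30 aggressive, 0.70 conservative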

Usage:

# Default (50% threshold)
python pipeline_orchestrator.py --prefilter

# Aggressive mode (30% threshold, analyze more papers)
python pipeline_orchestrator.py --prefilter-mode aggressive

# Conservative mode (70% threshold, analyze fewer papers)
python pipeline_orchestrator.py --prefilter-mode conservative

# Custom threshold
python pipeline_orchestrator.py --relevance-threshold 0.65

# Disable pre-filtering
python pipeline_orchestrator.py --no-prefilter

Benefits:

  • Cost Savings: Typical reduction of 50-70% in papers analyzed
  • Time Savings: 60-80% faster incremental runs
  • API Cost Reduction: $15-30 saved per run
  • Accuracy: <5% false negative rate (relevant papers rarely skipped)

Configuration:

Add to pipeline_config.json:

{
  "prefilter": {
    "enabled": true,
    "threshold": 0.50,
    "mode": "auto"
  }
}

Incremental Review Mode (NEW!)

Update existing reviews by adding new papers without re-analyzing the entire database. The incremental mode intelligently detects changes and only processes new or modified papers while preserving your previous analysis results.

How it works:

  1. Loads previous analysis - Reads existing gap report and orchestrator state
  2. Detects new papers - Compares database to find new or modified papers since last run
  3. Extracts gaps - Identifies unfilled requirements from previous analysis
  4. Scores relevance - Uses ML and keyword matching to predict which papers close gaps
  5. Pre-filters - Skips low-relevance papers (configurable threshold, default 50%)
  6. Analyzes - Runs deep analysis on filtered papers only
  7. Merges - Combines new evidence into existing report without data loss
  8. Tracks lineage - Records parent→child job relationship in state
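
The change detection in step 2 can be pictured as fingerprinting each paper file and comparing against fingerprints recorded on the previous run. A rough sketch of that idea follows (paper_fingerprints.json is a hypothetical cache name; the pipeline's real cache format may differ):

import hashlib
import json
from pathlib import Path

def fingerprint(path):
    # Content hash used to flag new or modified papers.
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()

def detect_changes(paper_dir="data/raw", cache_file="paper_fingerprints.json"):
    cache_path = Path(cache_file)
    cache = json.loads(cache_path.read_text()) if cache_path.exists() else {}
    changed = []
    for paper in sorted(Path(paper_dir).glob("*.json")):
        digest = fingerprint(paper)
        if cache.get(paper.name) != digest:      # new paper or modified content
            changed.append(paper.name)
        cache[paper.name] = digest
    cache_path.write_text(json.dumps(cache, indent=2))
    return changed

print(detect_changes())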

Quick Start:

# 1. Run baseline analysis
python pipeline_orchestrator.py --output-dir reviews/baseline

# 2. Add new papers to data/raw/

# 3. Run incremental update (default mode)
python pipeline_orchestrator.py --output-dir reviews/baseline

# Or explicitly enable incremental mode
python pipeline_orchestrator.py --incremental --output-dir reviews/baseline

Usage Examples:

# Preview what would be analyzed (dry-run)
python pipeline_orchestrator.py --incremental --dry-run

# Force full re-analysis (override incremental)
python pipeline_orchestrator.py --force

# Continue specific review with parent job tracking
python pipeline_orchestrator.py --incremental --parent-job-id review_20250115_103000

# Clear analysis cache and start fresh
python pipeline_orchestrator.py --clear-cache --force

Benefits:

  • 60-80% faster - Only analyzes new, relevant papers
  • Cost savings - $15-30 per incremental run vs $50+ for full analysis
  • Preserves work - Builds on previous analysis without data loss
  • Tracks changes - See gaps closed over time with job lineage
  • Smart filtering - Automatic relevance scoring reduces wasted analysis
  • Safe fallback - Automatically runs full mode if prerequisites missing

Prerequisites: Incremental mode requires:

  • Previous gap_analysis_report.json in output directory
  • Complete orchestrator_state.json (analysis_completed: true)

If prerequisites are missing, the pipeline automatically falls back to full analysis mode.

Advanced Options:

# Combine with pre-filtering for maximum efficiency
python pipeline_orchestrator.py --incremental --prefilter-mode aggressive

# Custom relevance threshold for pre-filtering
python pipeline_orchestrator.py --incremental --relevance-threshold 0.40

# Use separate directories for comparison
python pipeline_orchestrator.py --output-dir reviews/baseline
python pipeline_orchestrator.py --output-dir reviews/update_feb_2025

Configuration:

Add to pipeline_config.json:

{
  "incremental": true,
  "force": false,
  "parent_job_id": null,
  "relevance_threshold": 0.50,
  "prefilter_enabled": true
}

Troubleshooting:

"Incremental prerequisites not met"

  • Ensure previous gap_analysis_report.json exists in output directory
  • Check orchestrator_state.json shows analysis_completed: true
  • Run full analysis first: python pipeline_orchestrator.py --force

"No new papers detected"

  • Verify papers were added to data/raw/ directory
  • Papers must be in JSON format with proper metadata
  • Use --force to re-analyze all papers anyway

"No changes detected - all papers are up to date"

  • This is normal! No new/modified papers were found
  • Add new papers or use --force for full re-analysis
  • Clear cache with --clear-cache if fingerprints seem stale

Checkpoint & Resume

The pipeline writes a pipeline_checkpoint.json file to track progress. If a run fails, you can resume from the last successful stage:

# Resume from last checkpoint
python pipeline_orchestrator.py --resume

# Resume from specific stage
python pipeline_orchestrator.py --resume-from sync

View checkpoint status:

cat pipeline_checkpoint.json | jq '.stages'

View retry history:

cat pipeline_checkpoint.json | jq '.stages.journal_reviewer.retry_history'
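
The same checkpoint can also be inspected from Python. The sketch below assumes each entry under stages carries a status field; only the stages and retry_history keys are confirmed by the commands above:

import json

with open("pipeline_checkpoint.json", encoding="utf-8") as fh:
    checkpoint = json.load(fh)

for stage, info in checkpoint.get("stages", {}).items():
    retries = len(info.get("retry_history", []))
    print(f"{stage}: status={info.get('status', 'unknown')}, retries={retries}")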

Error Recovery

The pipeline automatically retries transient failures:

  1. Network Timeout → Retry with exponential backoff
  2. Rate Limit → Wait and retry with increasing delays
  3. Syntax Error → Fail immediately (no retry)
  4. Circuit Breaker → Stop after 3 consecutive failures

Example retry flow:

  • Attempt 1: Fails with "Connection timeout" → Wait 2s, retry
  • Attempt 2: Fails with "Rate limit" → Wait 4s, retry
  • Attempt 3: Succeeds → Continue to next stage

Output Files

The pipeline generates analysis results in configurable directories:

gap_analysis_output/          # Research gap analysis results (default, customizable via --output-dir)
proof_scorecard_output/       # Proof scorecard outputs (CLI)
workspace/                    # Dashboard job data and results

Custom Output Directory: You can specify a custom output directory for gap analysis results:

# Via CLI argument
python pipeline_orchestrator.py --output-dir reviews/baseline

# Via environment variable
export LITERATURE_REVIEW_OUTPUT_DIR=reviews/baseline

# Via config file
{
  "output_dir": "reviews/baseline"
}

This enables organizing multiple review projects:

reviews/
├── baseline_2025_01/         # Initial review
├── update_2025_02/           # Monthly update
└── comparative_study/        # Comparative analysis

Note: These directories are gitignored as they contain generated artifacts. Run the pipeline to regenerate outputs locally.

Complete Output Reference:

Typical Output Structure:

gap_analysis_output/
├── gap_analysis_report.json              # Master analysis report
├── executive_summary.md                  # Human-readable summary
├── waterfall_Pillar_1-7.html             # Pillar visualizations (7 files)
├── _OVERALL_Research_Gap_Radar.html      # Overall radar chart
├── _Paper_Network.html                   # Paper network graph
├── _Research_Trends.html                 # Trend analysis
├── proof_chain.html/json                 # Evidence proof chains
├── sufficiency_matrix.html/json          # Evidence sufficiency
├── triangulation.html/json               # Multi-source verification
└── suggested_searches.json/md            # Research recommendations

Regenerate Outputs:

# CLI
python pipeline_orchestrator.py path/to/paper.pdf

# Dashboard
# Use "Re-run Analysis" button or "Import Existing Results" feature

See docs/OUTPUT_FILE_REFERENCE.md for complete file descriptions, sizes, and formats.

📚 Documentation

All project documentation is organized in the docs/ folder:

  • docs/README.md: Documentation index and quick reference
  • docs/USER_MANUAL.md: Complete user manual
  • docs/DASHBOARD_GUIDE.md: Web dashboard guide
  • docs/API_REFERENCE.md: REST API documentation
  • docs/TESTING_GUIDE.md: Testing procedures
  • docs/DEPLOYMENT_GUIDE.md: Deployment instructions

Task Cards

Implementation task cards are in task-cards/ - see task-cards/README.md for the index.

Historical Documentation

Archived implementation summaries, smoke test reports, and PR reviews are in docs/archive/.

Testing

Run the test suite:

pytest

Run specific test categories:

pytest -m unit          # Unit tests only
pytest -m integration   # Integration tests only

License

See LICENSE file for details.


Last automated review: July 23, 2024
