ATHENA (Advanced Tool for High-throughput Experimental NGS Analysis) is a comprehensive, production-ready command-line bioinformatics pipeline designed for complete NGS data analysis from raw reads to assembled contigs. It integrates industry-standard tools including FastQC, Trimmomatic, SPAdes, and QUAST to provide an automated, parallel-processing workflow with advanced features like execution history tracking, resume capabilities, and production deployment options.
ATHENA provides:
- 🔍 Quality Assessment: Comprehensive FastQC analysis on raw sequencing data
- ✂️ Read Trimming: Advanced Trimmomatic processing to remove low-quality bases and adapters
- 🧬 Genome Assembly: High-quality de novo genome assembly using SPAdes
- 📊 Assembly Validation: Detailed QUAST assembly quality assessment
- 🔄 Quality Validation: Post-trimming FastQC analysis to validate improvements
- 📈 Automated Reporting: Rich quality control reports throughout the pipeline
- 📝 Execution History: Complete tracking of pipeline runs with resume capabilities
- 🔄 Smart Resume: Continue from failed steps without re-running completed stages
- 🚀 Production Ready: AWS deployment with multiple scaling strategies
- ⚡ High Performance: Multi-threaded execution with configurable resource allocation
- 🛡️ Error Recovery: Robust error handling with detailed diagnostics
- 📊 Rich Reporting: Terminal-based summaries with detailed metrics
| Command | Description | Use Case |
|---|---|---|
| `start` | Complete automated pipeline (FastQC → Trimmomatic → FastQC → SPAdes → QUAST) | Full genome analysis |
| `continue` | Resume pipeline from specific step | Recovery from failures |
| `fastqc` | Standalone FastQC quality control analysis | Quality assessment only |
| `trim` | Standalone Trimmomatic read trimming | Read preprocessing |
| `spades` | Standalone SPAdes genome assembly | Assembly only |
| `quast` | Standalone QUAST assembly quality assessment | Assembly validation |
| `clean` | Clean up temporary files and resources | Maintenance |
| `help` | Comprehensive usage information | Documentation |
- History Tracking: Complete JSON-based execution history with detailed metadata
- Smart Resume: Automatic detection of completed steps and intelligent restart
- Session Management: Named sessions for organized project management
- Error Recovery: Detailed error diagnostics with suggested solutions
- Automated Configuration: Fully automated pipeline execution with config files
- Remote Server Support: Execute pipelines on external servers with user credentials
- Multi-threaded Processing: Configurable thread allocation for optimal performance
- Memory Management: Intelligent memory allocation with user-defined limits
- Resource Monitoring: Real-time tracking of CPU, memory, and disk usage
- Parallel Execution: Concurrent processing of multiple samples
- Docker Support: Containerized deployment for consistent environments
- AWS Integration: Multiple cloud deployment strategies without credential sharing
- Configuration Management: YAML/JSON configuration files for reproducible runs
- Comprehensive Testing: Automated test suites with performance benchmarking
- Rich Terminal Output: Color-coded progress indicators and detailed status
- Comprehensive Reports: HTML and text-based quality summaries
- Metrics Tracking: Performance and quality metrics throughout pipeline
- Flexible I/O: Support for compressed/uncompressed files and directory inputs
- Operating System: Linux (Ubuntu 20.04+, CentOS 8+) or macOS (10.15+)
- CPU: Multi-core processor (4+ cores recommended)
- Memory: 8GB RAM minimum (16GB+ recommended for large assemblies)
- Storage: 50GB+ free space (varies with dataset size)
- C++17 compatible compiler (GCC 7+, Clang 5+, or MSVC 2017+)
- CMake 3.12 or higher
- Git (for version control)
- Python 3.6+ (for test suites and utilities)
- FastQC v0.11.9+ (quality control analysis)
- Trimmomatic v0.39+ (read trimming)
- SPAdes v3.13+ (genome assembly)
- QUAST v5.0+ (assembly quality assessment)
- Java Runtime Environment 8+ (for FastQC and Trimmomatic)
- CLI11 (command-line parsing): `external/CLI11.hpp`
- nlohmann/json (JSON processing): `external/json.hpp`
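Before building, it can help to confirm that the external tools listed above are reachable. The following is a minimal sketch, not part of ATHENA itself; the executable names in `REQUIRED_TOOLS` are assumptions (for example, SPAdes and QUAST are commonly installed as `spades.py` and `quast.py`, and Trimmomatic may only be available as a JAR on some systems), so adjust them to your installation.

```python
import shutil

# Executable names are assumptions for illustration; edit to match your system.
REQUIRED_TOOLS = ["cmake", "git", "java", "fastqc", "trimmomatic", "spades.py", "quast.py"]


def find_missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the entries of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        print("Missing from PATH:", ", ".join(missing))
    else:
        print("All listed tools were found on PATH.")
```

The `which` parameter exists only so the lookup can be replaced in tests; in normal use the default `shutil.which` is enough.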
# Clone the repository
git clone https://github.com/1337-R-D/Athena.git
cd Athena
# Navigate to the main build directory
cd Athena
# Create and enter build directory
mkdir -p build && cd build
# Configure and build
cmake ..
make -j$(nproc)
# Test the build
./athena --version
./athena help

# Debug build with additional information
cmake -DCMAKE_BUILD_TYPE=Debug ..
make -j$(nproc)
# Release build with optimizations
cmake -DCMAKE_BUILD_TYPE=Release ..
make -j$(nproc)
# Install system-wide (optional)
sudo make install
# Clean build
make clean

# Build main executable only
make athena
# Build and run application with test data
make run
# Run test suite
make test-commands
# Clean all build artifacts
make clean

For convenience, use the provided installation script:

# Make installation script executable
chmod +x installation.sh
# Run automated installation (installs dependencies and builds ATHENA)
./installation.sh
# Verify installation
./Athena/build/athena --version

# Build the main executable
make athena
# Build and run the application
make run
# Run the test suite
make test-commands

Run the full ATHENA pipeline for comprehensive NGS analysis from raw reads to assembled genome:

# Complete pipeline with paired-end reads
./build/athena start -1 sample_R1.fastq.gz -2 sample_R2.fastq.gz -o results/
# Complete pipeline with directory input (auto-detects paired files)
./build/athena start -d input_data/ -o results/
# With quality reports and verbose output
./build/athena start -1 sample_R1.fastq.gz -2 sample_R2.fastq.gz -o results/ -r -v

ATHENA now supports fully automated execution through configuration files, enabling seamless integration into workflows and remote server execution:

# Automated pipeline using config file
./build/athena start --config config.json
# Automated pipeline with custom output directory
./build/athena start --config config.json -o custom_results/

Configuration File Format (`config.json`):

{
  "remote": {
    "ip": "12.34.56.78",
    "user": "ubuntu",
    "key_path": "~/.ssh/private-key",
    "remote_dir": "/home/user1/job"
  },
  "pipeline": {
    "input1": "sample_R1.fastq.gz",
    "input2": "sample_R2.fastq.gz",
    "output_dir": "results/",
    "generate_reports": true,
    "verbose": false
  }
}

ATHENA provides seamless execution on external high-performance servers through secure SSH connections:

Features:
- 🔐 Secure Authentication: SSH key-based authentication for secure connections
- 🚀 High-Performance Computing: Leverage powerful remote servers for large-scale analysis
- 📁 Automatic File Management: Seamless file transfer and remote directory management
- 👥 Multi-User Support: Individual user credentials and isolated workspaces
- 🔄 Remote Resume: Continue interrupted analyses on remote servers
Getting Remote Access: To use remote server capabilities, contact the ATHENA team to obtain your personal credentials:
- Email: [olympus-]
- Request: Include your intended use case and computational requirements
- Credentials: You'll receive a personalized `config.json` file with your server access details

Remote Execution Example:
# Execute complete pipeline on remote server
./build/athena start --config your_credentials.json
# Resume interrupted remote job
./build/athena continue --config your_credentials.json -p remote_results/
# Run specific analysis step remotely
./build/athena spades --config your_credentials.json -1 R1.fastq -2 R2.fastq -o assembly/

ATHENA follows a structured 7-stage pipeline:

| Stage | Tool | Purpose | Outputs |
|---|---|---|---|
| 1. Initialization | Internal | Setup and validation | Session metadata, directory structure |
| 2. FastQC (Raw) | FastQC | Quality assessment of raw reads | HTML reports, quality metrics |
| 3. Trimmomatic | Trimmomatic | Remove low-quality bases and adapters | Trimmed paired/unpaired reads |
| 4. FastQC (Trimmed) | FastQC | Quality assessment post-trimming | Quality improvement metrics |
| 5. SPAdes Assembly | SPAdes | De novo genome assembly | Contigs, scaffolds, assembly graph |
| 6. QUAST Analysis | QUAST | Assembly quality assessment | Assembly statistics, reports |
| 7. Finalization | Internal | Cleanup and summary | Final reports, session completion |
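The tool stages in the table (stages 2 to 6) can also be approximated by chaining the standalone commands. The sketch below only builds the command lines, using the `-1`, `-2`, `-c`, and `-o` flags documented in this README. The sub-directory names and the trimmed and contig file names are assumptions for illustration, and this is not a description of how `start` is implemented internally.

```python
from pathlib import Path


def build_stage_commands(r1, r2, out_dir, athena="./build/athena"):
    """Return one argv list per tool stage, in pipeline order.

    Output locations and intermediate file names are hypothetical;
    check what each step actually writes before relying on them.
    """
    out = Path(out_dir)
    trimmed_dir = out / "trimmed"
    t1 = str(trimmed_dir / "trimmed_R1_paired.fq.gz")
    t2 = str(trimmed_dir / "trimmed_R2_paired.fq.gz")
    contigs = str(out / "assembly" / "contigs.fasta")
    return [
        [athena, "fastqc", "-1", str(r1), "-2", str(r2), "-o", str(out / "fastqc_raw")],
        [athena, "trim", "-1", str(r1), "-2", str(r2), "-o", str(trimmed_dir)],
        [athena, "fastqc", "-1", t1, "-2", t2, "-o", str(out / "fastqc_trimmed")],
        [athena, "spades", "-1", t1, "-2", t2, "-o", str(out / "assembly")],
        [athena, "quast", "-c", contigs, "-o", str(out / "quast")],
    ]


if __name__ == "__main__":
    for command in build_stage_commands("sample_R1.fastq.gz", "sample_R2.fastq.gz", "results"):
        print(" ".join(command))
```

Each list can be passed to `subprocess.run(command, check=True)` so that a failing stage stops the chain before later stages run.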
ATHENA tracks execution history and allows intelligent resumption:
# Resume from last failed step automatically
./build/athena continue -p results/
# Continue specific session
./build/athena continue -s session_name -p results/
# Resume from specific step (if previous steps completed)
./build/athena continue -p results/ --from spades

Resume capabilities:
- ✅ Automatic detection of completed steps
- ✅ Validation of intermediate files
- ✅ Error diagnostics and suggestions
- ✅ Session-based progress tracking
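
The idea behind detecting completed steps can be illustrated with a small sketch. The history layout used here (a `steps` list with `name` and `status` fields) and every step name other than `spades` are hypothetical; ATHENA's actual JSON history format may differ.

```python
import json

# Only "spades" appears as a step name in this README; the others are assumed.
STEP_ORDER = ["fastqc_raw", "trim", "fastqc_trimmed", "spades", "quast"]


def next_step(history, order=STEP_ORDER):
    """Return the first step in `order` not marked completed, or None if all are done."""
    completed = {
        step["name"]
        for step in history.get("steps", [])
        if step.get("status") == "completed"
    }
    for name in order:
        if name not in completed:
            return name
    return None


# Hypothetical history of a run that failed during assembly.
EXAMPLE_HISTORY = json.loads("""
{
  "session": "demo",
  "steps": [
    {"name": "fastqc_raw", "status": "completed"},
    {"name": "trim", "status": "completed"},
    {"name": "fastqc_trimmed", "status": "completed"},
    {"name": "spades", "status": "failed"}
  ]
}
""")

if __name__ == "__main__":
    print(next_step(EXAMPLE_HISTORY))
```

Because the function walks the steps in pipeline order, a run is resumed at the earliest incomplete step even if a later step happens to be marked as completed.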
# Analyze paired-end reads
./build/athena fastqc -1 sample_R1.fastq.gz -2 sample_R2.fastq.gz -o qc_results/
# With terminal quality report
./build/athena fastqc -1 sample_R1.fastq.gz -2 sample_R2.fastq.gz -o qc_results/ -r
# Verbose mode with detailed output
./build/athena fastqc -1 sample_R1.fastq.gz -2 sample_R2.fastq.gz -o qc_results/ -v
# Process directory of FASTQ files
./build/athena fastqc -d input_reads/ -o qc_results/ -r

# Trim paired-end reads with default settings
./build/athena trim -1 sample_R1.fastq.gz -2 sample_R2.fastq.gz -o trimmed_results/
# Trim with custom configuration and verbose output
./build/athena trim -1 sample_R1.fastq.gz -2 sample_R2.fastq.gz -o trimmed_results/ -v
# Process directory input
./build/athena trim -d raw_reads/ -o trimmed_results/

# Assemble trimmed paired-end reads
./build/athena spades -1 trimmed_R1_paired.fq.gz -2 trimmed_R2_paired.fq.gz -o assembly/
# Assembly with verbose logging
./build/athena spades -1 trimmed_R1_paired.fq.gz -2 trimmed_R2_paired.fq.gz -o assembly/ -v
# Assembly with quality report generation
./build/athena spades -1 trimmed_R1_paired.fq.gz -2 trimmed_R2_paired.fq.gz -o assembly/ -r

# Basic assembly quality evaluation
./build/athena quast -c contigs.fasta -o quast_results/
# With reference genome for comparison
./build/athena quast -c contigs.fasta -r reference_genome.fasta -o quast_results/
# Verbose analysis with detailed metrics
./build/athena quast -c contigs.fasta -o quast_results/ -v

# Display comprehensive help
./build/athena help
# Show version information
./build/athena --version
# Clean temporary files and directories
./build/athena clean
# Clean with confirmation prompts
./build/athena clean --interactive

- `-v, --version`: Display ATHENA version information
- `-h, --help`: Show command-specific help

- `-1, --input1 FILE`: First input FASTQ file (R1)
- `-2, --input2 FILE`: Second input FASTQ file (R2)
- `-d, --directory DIR`: Input directory containing FASTQ files
- `-o, --output DIR`: Output directory for results (required)
- `-c, --contig FILE`: Input contig/assembly file (QUAST only)
- `-r, --reference FILE`: Reference genome file (QUAST optional)
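
When a directory is passed with `-d, --directory`, paired files have to be matched up by name. The sketch below shows one way such pairing could work. The `<sample>_R1` / `<sample>_R2` naming convention and the accepted extensions are assumptions based on the file names used in this README, not ATHENA's documented detection rules.

```python
import re
from pathlib import Path

# Assumed convention: <sample>_R1.<ext> and <sample>_R2.<ext>, optionally gzipped.
PAIR_PATTERN = re.compile(r"^(?P<sample>.+)_R(?P<read>[12])\.(fastq|fq)(\.gz)?$")


def pair_fastq_files(names):
    """Group file names into {sample: (r1, r2)}, skipping incomplete pairs."""
    found = {}
    for name in names:
        match = PAIR_PATTERN.match(name)
        if match:
            found.setdefault(match.group("sample"), {})[match.group("read")] = name
    return {
        sample: (reads["1"], reads["2"])
        for sample, reads in sorted(found.items())
        if "1" in reads and "2" in reads
    }


def pair_fastq_dir(directory):
    """Apply pair_fastq_files to the regular files directly inside `directory`."""
    return pair_fastq_files(p.name for p in Path(directory).iterdir() if p.is_file())
```

Samples with only one mate are dropped silently here; a real tool would more likely report them so that a missing file is not overlooked.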

- `-r, --report`: Generate detailed terminal reports
- `-v, --verbose`: Enable verbose output and logging
- `-p, --project DIR`: Project directory for resume operations
- `-s, --session NAME`: Specific session name for operations
- `--from STEP`: Resume from specific pipeline step
- `--config FILE`: Configuration file for automated execution (JSON format)

Configuration File Options:
- Remote server credentials and connection details
- Pipeline parameters and input/output specifications
- Execution preferences (verbose, reports, threading)
- User-specific workspace and authentication settings
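
A configuration file can be checked for obvious omissions before it is passed to `--config`. This is a minimal sketch that mirrors the keys of the `config.json` example in this README. Treating these particular keys as required is an assumption, since the README does not state which fields are mandatory or whether the `remote` section can be omitted for local runs.

```python
import json

# Keys taken from the example config.json; which ones are truly required is assumed.
REQUIRED_KEYS = {
    "remote": ["ip", "user", "key_path", "remote_dir"],
    "pipeline": ["input1", "input2", "output_dir"],
}


def missing_config_keys(config, required=REQUIRED_KEYS):
    """Return a sorted list of missing sections or 'section.key' entries."""
    missing = []
    for section, keys in required.items():
        block = config.get(section)
        if not isinstance(block, dict):
            missing.append(section)
            continue
        missing.extend(f"{section}.{key}" for key in keys if key not in block)
    return sorted(missing)


def load_and_check(path):
    """Load a JSON config file and return (config, missing_keys)."""
    with open(path, encoding="utf-8") as handle:
        config = json.load(handle)
    return config, missing_config_keys(config)
```

For example, `load_and_check("config.json")` returns the parsed configuration together with a list such as `["remote.key_path"]` when that entry is absent.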
Clean Command Features:
- Removes test output directories (`test_output`, `fastqc_test_output`, etc.)
- Cleans temporary files (`.tmp`, `.log`, `*_fastqc.html`, `*_trimmed*.fq.gz`)
- Cleans cache directories containing ATHENA-related files
- Optional build directory cleanup with user confirmation
- Color-coded output for clear feedback
- Detailed summary of cleaned items
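
To preview what a cleanup of this kind would touch, the file patterns above can be applied in a dry run. The sketch below is not the `clean` command's implementation. It assumes `.tmp` and `.log` are meant as suffix matches (`*.tmp`, `*.log`), and it only lists files without deleting anything.

```python
import fnmatch
import os

# Patterns adapted from the list above; "*.tmp" and "*.log" are an assumed reading.
CLEAN_PATTERNS = ["*.tmp", "*.log", "*_fastqc.html", "*_trimmed*.fq.gz"]


def find_cleanup_candidates(root, patterns=CLEAN_PATTERNS):
    """Return sorted paths (relative to root) of files whose name matches a pattern."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if any(fnmatch.fnmatch(name, pattern) for pattern in patterns):
                full_path = os.path.join(dirpath, name)
                matches.append(os.path.relpath(full_path, root))
    return sorted(matches)


if __name__ == "__main__":
    for path in find_cleanup_candidates("."):
        print(path)
```

Reviewing this list first is a cheap safeguard, because patterns such as `*.log` can also match files that do not belong to ATHENA.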

- `-v, --version`: Show version information
- `-h, --help`: Display help message

- `-1, --input1 FILE`: First input FASTQ file (required)
- `-2, --input2 FILE`: Second input FASTQ file (required)
- `-o, --output DIR`: Output directory (required)
- `-r, --report`: Generate terminal quality reports

- `-1, --input1 FILE`: First input FASTQ file (required)
- `-2, --input2 FILE`: Second input FASTQ file (required)
- `-o, --output DIR`: Output directory (required)
- `-r, --report`: Generate terminal quality reports
- `-v, --verbose`: Show detailed FastQC output

- `-1, --input1 FILE`: First input FASTQ file (required)
- `-2, --input2 FILE`: Second input FASTQ file (required)
- `-o, --output DIR`: Output directory (required)
- `-r, --report`: Generate terminal reports (placeholder)
- `-v, --verbose`: Show detailed Trimmomatic output

- `-1, --input1 FILE`: First input FASTQ file (required)
- `-2, --input2 FILE`: Second input FASTQ file (required)
- `-o, --output DIR`: Output directory (required)
- `-r, --report`: Generate terminal reports
- `-v, --verbose`: Show detailed SPAdes output

- `-c, --contig FILE`: Input contig/assembly file (required)
- `-o, --output DIR`: Output directory (required)
- `-r, --reference FILE`: Reference genome file (optional)
- `-v, --verbose`: Show detailed QUAST output

ATHENA is continuously evolving to provide cutting-edge bioinformatics capabilities. Here are the exciting features currently in development:
- 📊 Comprehensive PDF Reports: Professional-grade analysis reports with publication-ready figures and tables
- 📈 Interactive Quality Dashboards: Web-based interactive visualizations for quality metrics and assembly statistics
- 🎯 Comparative Analysis Reports: Multi-sample comparison reports with statistical analysis
- 📋 Executive Summaries: High-level summary reports for project managers and stakeholders
Stay tuned for updates! Follow our development progress and contribute to the roadmap on our GitHub repository.
- FastQC Documentation: Detailed FastQC integration guide
- Trimmomatic Documentation: Trimmomatic configuration and usage
- SPAdes Documentation: SPAdes genome assembly guide
- QUAST Documentation: QUAST assembly quality assessment guide
- Contributors Guide: Development setup and contribution guidelines
- Official FastQC: Upstream FastQC documentation
- Official Trimmomatic: Upstream Trimmomatic documentation
- Official SPAdes: Upstream SPAdes documentation
- Official QUAST: Upstream QUAST documentation
This project is distributed under the terms specified in the repository license. Please refer to the LICENSE file for detailed information.
Contributions are welcome! Please read CONTRIBUTORS.md for:
- Development environment setup
- Code style guidelines
- Testing procedures
- Submission process
ATHENA - Streamlining NGS data preprocessing with automated quality control and read trimming.
