VarAnnote - Comprehensive Variant Analysis & Annotation Suite

🧬 A powerful toolkit for genomic variant annotation and clinical interpretation.

Features

  • Comprehensive Annotation: ClinVar, gnomAD, COSMIC, dbSNP integration
  • Functional Prediction: Gene symbols, consequences, pathogenicity scores
  • Multiple Output Formats: VCF, TSV, JSON
  • Command Line Interface: Easy-to-use CLI with progress bars
  • Modular Design: Each tool can be used independently
  • Academic Ready: Designed for research and publication

Installation

Option 1: Install from PyPI (Recommended)

pip install varannote

Option 2: Install from Source

git clone https://github.com/AtaUmutOZSOY/VarAnnote.git
cd VarAnnote
pip install -e .

Windows PATH Configuration

VarAnnote automatically configures PATH on Windows during installation. If you encounter any issues:

  1. Restart your terminal after installation - this is usually enough
  2. Alternative: Use python -m (always works):
    python -m varannote --help
    python -m varannote annotate input.vcf --output output.vcf
  3. Manual setup (if needed):
    python -m varannote setup-path
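
The fallback in step 2 can also be applied from a script. The following is a minimal sketch, assuming only what is stated above: that a `varannote` console script may or may not be on PATH, and that `python -m varannote` works regardless. The helper name `resolve_varannote_command` is hypothetical and not part of VarAnnote.

```python
import shutil
import sys


def resolve_varannote_command():
    """Return the argv prefix to use for invoking VarAnnote (hypothetical helper)."""
    # Prefer the console script when it is discoverable on PATH.
    executable = shutil.which("varannote")
    if executable is not None:
        return [executable]
    # Otherwise fall back to the module form, which does not depend on PATH.
    return [sys.executable, "-m", "varannote"]
```

Appending `["--help"]` to the returned list and passing it to `subprocess.run` gives the same result as either of the two help commands shown above.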

Verify Installation

# Test installation
varannote --version
# or
python -m varannote --version

# Test with help
varannote --help

Quick Start

Basic Variant Annotation

# Annotate variants with default databases
varannote annotate test_variants.vcf --output annotated.vcf

# Use specific databases
varannote annotate input.vcf -d clinvar -d gnomad --output result.vcf

# Output in different formats
varannote annotate input.vcf --format tsv --output result.tsv
varannote annotate input.vcf --format json --output result.json
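
When annotation is driven from a pipeline rather than typed by hand, the same invocations can be assembled in Python. This is a sketch that uses only the flags shown above (`-d`, `--format`, `--output`) and the `python -m varannote` form; the function names `build_annotate_command` and `run_annotate` are illustrative and not part of the package.

```python
import subprocess
import sys


def build_annotate_command(input_vcf, output_path, fmt="vcf", databases=()):
    """Assemble an argv list for `varannote annotate` (illustrative helper)."""
    if fmt not in {"vcf", "tsv", "json"}:
        raise ValueError(f"unsupported format: {fmt}")
    command = [sys.executable, "-m", "varannote", "annotate", str(input_vcf)]
    for name in databases:
        command += ["-d", name]  # one -d per database, as in the example above
    command += ["--format", fmt, "--output", str(output_path)]
    return command


def run_annotate(input_vcf, output_path, **kwargs):
    """Run the command; raises CalledProcessError on a non-zero exit status."""
    subprocess.run(build_annotate_command(input_vcf, output_path, **kwargs), check=True)
```

Passing the arguments as a list rather than a shell string avoids quoting problems with file paths that contain spaces.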

Pathogenicity Prediction

# Predict pathogenicity using ensemble model
varannote pathogenicity variants.vcf --model ensemble

# Use specific model with custom threshold
varannote pathogenicity variants.vcf --model cadd --threshold 0.7
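
This README does not describe how the models combine or scale their scores. Purely as an illustration of what a `--threshold` value does, the sketch below assumes scores normalized to the range 0 to 1 and an ensemble that is a plain average of the available scores. Both are assumptions, not VarAnnote's documented behavior, and the model names, function names, and labels are hypothetical.

```python
from statistics import mean


def ensemble_score(scores):
    """Average the available per-model scores; None marks a missing score."""
    present = [value for value in scores.values() if value is not None]
    if not present:
        return None
    return mean(present)


def classify(score, threshold=0.5):
    """Label a score relative to the threshold (labels are illustrative)."""
    if score is None:
        return "unscored"
    return "pathogenic" if score >= threshold else "not flagged"


# Hypothetical per-model scores for one variant.
combined = ensemble_score({"model_a": 1.0, "model_b": 0.5, "model_c": None})
print(combined, classify(combined, threshold=0.7))
```

Raising the threshold trades sensitivity for specificity: fewer variants are flagged, and those that are carry higher scores.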

Available Commands

varannote --help                    # Show all commands
varannote annotate --help           # Annotation help
varannote pathogenicity --help      # Pathogenicity prediction help
varannote pharmacogenomics --help   # Pharmacogenomics analysis help
varannote population-freq --help    # Population frequency help
varannote compound-het --help       # Compound heterozygote detection help
varannote segregation --help        # Family segregation analysis help

Command Reference

Main Commands

Command             Description
annotate            Comprehensive variant annotation
pathogenicity       Pathogenicity prediction
pharmacogenomics    Drug-gene interaction analysis
population-freq     Population frequency calculation
compound-het        Compound heterozygote detection
segregation         Family segregation analysis

Common Options

Option           Description
--output, -o     Output file path
--format, -f     Output format (vcf, tsv, json)
--genome, -g     Reference genome (hg19, hg38)
--verbose, -v    Enable verbose output

Input/Output Formats

Input

  • VCF files (.vcf, .vcf.gz)
  • Standard VCF format with CHROM, POS, REF, ALT fields
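
To make the required fields concrete, here is a minimal standard-library reader for the four columns listed above. It is a sketch for checking input files, not VarAnnote's own parser (`vcf_parser.py` is not shown in this README); the names `Variant` and `read_variants` are hypothetical.

```python
import gzip
from typing import Iterator, NamedTuple


class Variant(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alt: str


def read_variants(path: str) -> Iterator[Variant]:
    """Yield one Variant per ALT allele from a .vcf or .vcf.gz file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            if line.startswith("#") or not line.strip():
                continue  # skip meta-information, header, and blank lines
            fields = line.rstrip("\n").split("\t")
            chrom, pos, _variant_id, ref, alts = fields[:5]
            for alt in alts.split(","):  # multi-allelic sites list ALTs comma-separated
                yield Variant(chrom, int(pos), ref, alt)
```

In a VCF data line the first five tab-separated columns are CHROM, POS, ID, REF, and ALT, which is why the ID column is read and discarded here.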

Output

  • VCF: Annotated VCF with INFO fields
  • TSV: Tab-separated values for analysis
  • JSON: Structured data for programmatic use

Annotation Databases

Database    Description               Fields Added
ClinVar     Clinical significance     clinvar_significance, clinvar_id
gnomAD      Population frequencies    gnomad_af, gnomad_ac, gnomad_an
COSMIC      Cancer mutations          cosmic_id, cosmic_count
dbSNP       Variant identifiers       dbsnp_id
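
The added fields are most convenient to work with in TSV output. The sketch below filters for rare variants by `gnomad_af`. It assumes the TSV header uses the field names from the table above alongside CHROM, POS, REF, and ALT columns; the README does not specify the exact TSV layout, so the sample data and the function name `rare_variants` are illustrative only.

```python
import csv
import io

# Illustrative sample; the real column layout of VarAnnote's TSV is an assumption.
SAMPLE_TSV = (
    "CHROM\tPOS\tREF\tALT\tclinvar_significance\tgnomad_af\n"
    "chr1\t100\tA\tG\tPathogenic\t0.0001\n"
    "chr2\t200\tC\tT\tBenign\t0.25\n"
    "chr3\t300\tG\tA\t\t\n"
)


def rare_variants(handle, max_af=0.01):
    """Return rows whose gnomad_af is missing or below max_af."""
    rows = []
    for row in csv.DictReader(handle, delimiter="\t"):
        raw = (row.get("gnomad_af") or "").strip()
        if raw in {"", "."}:
            rows.append(row)  # no gnomAD frequency recorded: kept as rare
        elif float(raw) < max_af:
            rows.append(row)
    return rows


rare = rare_variants(io.StringIO(SAMPLE_TSV))
print([(row["CHROM"], row["POS"]) for row in rare])
```

Treating a missing frequency as rare is a design choice made for this sketch; a stricter analysis might instead exclude variants with no gnomAD record.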

Examples

Example 1: Basic Annotation

varannote annotate test_variants.vcf --output annotated.vcf --verbose

Output:

🧬 Annotating variants from test_variants.vcf
📊 Using genome: hg38
🗄️  Databases: clinvar, gnomad, dbsnp
🔧 Initialized VariantAnnotator with genome: hg38
📖 Reading variants from test_variants.vcf
🔍 Found 5 variants to annotate
Annotating variants  [####################################]  100%
✅ Annotation complete: 5 variants processed
📁 Output saved to: annotated.vcf

Example 2: TSV Output for Analysis

varannote annotate test_variants.vcf --format tsv --output results.tsv

Example 3: Pathogenicity Analysis

varannote pathogenicity test_variants.vcf --model ensemble --threshold 0.6

Development

Project Structure

VarAnnote/
├── setup.py                    # Package configuration
├── requirements.txt            # Dependencies
├── README.md                   # This file
├── test_variants.vcf           # Test data
└── varannote/
    ├── __init__.py            # Main package
    ├── cli.py                 # Command line interface
    ├── core/                  # Core functionality
    │   ├── annotator.py       # Variant annotation engine
    │   └── pathogenicity.py   # Pathogenicity prediction
    ├── tools/                 # Individual tools
    │   ├── annotator.py       # Annotation tool
    │   └── ...                # Other tools
    └── utils/                 # Utilities
        ├── vcf_parser.py      # VCF file parser
        └── annotation_db.py   # Database interface

Running Tests

# Install development dependencies
pip install -e ".[dev]"

# Run tests
pytest tests/

# Run with coverage
pytest --cov=varannote tests/

Contributing

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

Citation

If you use VarAnnote in your research, please cite:

APA Format:

Özsoy, A. U. (2025). VarAnnote: Comprehensive Variant Analysis & Annotation Suite (Version 1.0.0) [Computer software]. https://doi.org/10.5281/zenodo.15615370

BibTeX:

@software{ozsoy2025varannote,
  author = {Özsoy, Ata Umut},
  title = {VarAnnote: Comprehensive Variant Analysis \& Annotation Suite},
  url = {https://github.com/AtaUmutOZSOY/VarAnnote},
  doi = {10.5281/zenodo.15615370},
  version = {1.0.0},
  year = {2025}
}

IEEE Format:

A. U. Özsoy, "VarAnnote: Comprehensive Variant Analysis & Annotation Suite," Version 1.0.0, 2025, doi: 10.5281/zenodo.15615370. [Online]. Available: https://github.com/AtaUmutOZSOY/VarAnnote

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

  • BioPython community for sequence analysis tools
  • gnomAD consortium for population frequency data
  • ClinVar team for clinical variant curation
  • COSMIC database for cancer mutation data