algarsi3/pyAnnotator

PyAnnotator

A Python tool for genomic variant annotation and database management. Provides a CLI for annotating VCF files and managing annotation databases with support for multiple file formats and genome assemblies.

✨ Features

  • 📁 Multi-format Support: VCF, CSV, TSV, and Parquet files
  • 💬 Interactive CLI: User-friendly command-line interface
  • 🗄️ Database Management: Persistent annotation source storage
  • 🧬 VCF Processing: Advanced annotation with SnpEff and our custom annotation core
  • 🔬 Assembly Support: GRCh37 and GRCh38

⚙️ Installation

🚀 Quick Setup (Recommended)

./scripts/setup.sh

Alternatively:

make setup

📋 Prerequisites

  • Python 3.9+
  • bcftools for VCF processing
  • Conda (for conda setup)
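
The prerequisites above can be checked before running the setup script. The following is a minimal sketch using only the Python standard library; the helper name `missing_prerequisites` is hypothetical and not part of PyAnnotator, and the `version` and `which` parameters exist only so the check can be exercised without touching the real environment.

```python
import shutil
import sys


def missing_prerequisites(min_python=(3, 9), tools=("bcftools", "conda"),
                          version=None, which=shutil.which):
    """Return a list of human-readable problems; an empty list means all checks passed.

    Hypothetical helper for illustration, not part of PyAnnotator.
    """
    # Compare only (major, minor) against the minimum listed in the prerequisites.
    version = tuple(version or sys.version_info)[:2]
    problems = []
    if version < tuple(min_python):
        problems.append(
            "Python %d.%d+ required, found %d.%d" % (*min_python, *version)
        )
    # shutil.which returns None when an executable is not on PATH.
    for tool in tools:
        if which(tool) is None:
            problems.append("%s not found on PATH" % tool)
    return problems


if __name__ == "__main__":
    for problem in missing_prerequisites():
        print("MISSING:", problem)
```

If Conda is not being used for the setup, pass `tools=("bcftools",)` so that only bcftools is checked.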

🔧 Manual Installation

# Create and activate conda environment
conda create -n pyannotator python=3.9
conda activate pyannotator

# Install dependencies
pip install -r requirements.txt
conda install -c bioconda bcftools

# Setup database
cd prisma
python -m prisma generate
python -m prisma db push
cd ..

For detailed installation instructions, troubleshooting, and platform-specific setup, see SETUP.md.

🚀 Quick Start

# Load and index VCF (using SnpEff)
python pyannotator.py load-variant-file test.vcf.gz --snpeff --name "my_variant_file" --assembly GRCh38.105

# Load annotation database
python pyannotator.py load-database annotations.csv --name "my_db" --assembly GRCh38.105

# List databases
python pyannotator.py list-databases

# Annotate with interactive database selection
python pyannotator.py annotate input.vcf --assembly GRCh38.105 --interactive

# OR annotate with specific databases
python pyannotator.py annotate input.vcf --assembly GRCh38.105 --databases "DB1" "DB2"
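
To annotate several VCF files in one go, the annotate command above can be driven from a short Python script. This is a sketch under stated assumptions: the function names `build_annotate_command` and `annotate_many` are hypothetical, the script is assumed to run from the repository root, and only the flags shown in the Quick Start (`--assembly`, `--interactive`, `--databases`) are used.

```python
import subprocess
import sys


def build_annotate_command(vcf_path, assembly, databases=None, interactive=False):
    """Build the argument list for `pyannotator.py annotate`.

    Hypothetical wrapper for illustration. The Quick Start presents
    --interactive and --databases as alternatives, so this sketch emits
    one or the other, never both.
    """
    cmd = [sys.executable, "pyannotator.py", "annotate", str(vcf_path),
           "--assembly", assembly]
    if interactive:
        cmd.append("--interactive")
    elif databases:
        cmd += ["--databases", *databases]
    return cmd


def annotate_many(vcf_paths, assembly, databases):
    """Run the annotate command once per VCF file, stopping on the first failure."""
    for vcf_path in vcf_paths:
        # check=True raises CalledProcessError if the CLI exits non-zero.
        subprocess.run(
            build_annotate_command(vcf_path, assembly, databases),
            check=True,
        )
```

For example, `annotate_many(["a.vcf", "b.vcf"], "GRCh38.105", ["DB1", "DB2"])` would run the second Quick Start variant once for each file.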

⌨️ Basic Commands

| Command | Description |
|---|---|
| annotate ... | Annotate VCF file with selected databases |
| load-variant-file <file> | Load and index variant file |
| load-database <file> | Load and register annotation database |
| list-databases | Show registered databases |
| delete-all-databases | Remove all databases |

⌨️ Other Commands

| Command | Description |
|---|---|
| load-transcripts | Fetch transcripts from Ensembl and store in database |
| list-transcripts | List transcripts stored in database |
| clean-transcripts | Clean/delete transcripts from database |
| select-database | Interactive single database selection |
| select-databases | Interactive multiple database selection |

For detailed command options and examples, see COMMANDS.md.

💾 Database

Uses Prisma with SQLite. See DATABASE.md for schema details and CRUD operations.
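
Because the storage backend is SQLite, the database file can also be inspected directly with Python's built-in `sqlite3` module. The sketch below lists the tables in a SQLite file; the helper name `list_tables` is hypothetical, and the location of the database file is an assumption here (it is set in the Prisma schema, so check there or in DATABASE.md for the real path).

```python
import sqlite3


def list_tables(db_path):
    """Return the names of user tables in a SQLite database file, sorted by name.

    Hypothetical helper for illustration, not part of PyAnnotator.
    Note that sqlite3.connect creates an empty file if db_path does not exist.
    """
    conn = sqlite3.connect(db_path)
    try:
        # sqlite_master is SQLite's built-in catalog of schema objects;
        # names starting with "sqlite_" are reserved for internal tables.
        rows = conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "
            "ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]
```

Calling `list_tables("path/to/database.db")` (a placeholder path) returns the table names as a plain list, which is a quick way to confirm that `python -m prisma db push` created the schema.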

📚 Documentation
