Luisagi/screen-sra

screen-sra

A Snakemake workflow for quickly screening assembled genomes against SRA datasets.

What it does

For each genome and SRA run, the workflow:

  1. Downloads reads from SRA (fasterq-dump)
  2. Quality-trims reads (fastp)
  3. Maps reads to the reference genome (bwa mem + samtools)
  4. Computes per-feature coverage (bedtools)

Output: One Excel file per genome with mapping statistics and feature-level coverage.
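
The four steps above can be sketched as a chain of shell commands for one genome/run pair. This is only an illustration, not the workflow's actual rules: it assumes paired-end reads and an existing `bwa index` for the FASTA, and the helper name `build_commands` and all file names under `results/` are hypothetical. The CLI flags used are standard options of each tool.

```python
def build_commands(genome_id, sra_id, fasta, gff3, threads=8, outdir="results"):
    """Return illustrative shell commands for one genome/run pair.

    Assumptions (not taken from the workflow itself): paired-end reads,
    a pre-built bwa index for `fasta`, and the file naming used below.
    """
    raw1 = f"{outdir}/reads/{sra_id}_1.fastq"
    raw2 = f"{outdir}/reads/{sra_id}_2.fastq"
    trim1 = f"{outdir}/qc/{sra_id}_1.trimmed.fastq.gz"
    trim2 = f"{outdir}/qc/{sra_id}_2.trimmed.fastq.gz"
    bam = f"{outdir}/mapping/{genome_id}.{sra_id}.bam"
    table = f"{outdir}/gene_tables/{genome_id}.{sra_id}.tsv"

    return [
        # 1. Download reads from SRA (4 threads, as stated in the README)
        f"fasterq-dump {sra_id} --split-files --threads 4 --outdir {outdir}/reads",
        # 2. Quality-trim reads (4 threads, as stated in the README)
        f"fastp -i {raw1} -I {raw2} -o {trim1} -O {trim2} --thread 4 "
        f"--json {outdir}/qc/{sra_id}.json --html {outdir}/qc/{sra_id}.html",
        # 3. Map reads to the reference and sort the alignments
        f"bwa mem -t {threads} {fasta} {trim1} {trim2} "
        f"| samtools sort -@ {threads} -o {bam} -",
        # 4. Per-feature coverage from the GFF3 annotation
        f"bedtools coverage -a {gff3} -b {bam} > {table}",
    ]
```

Calling `build_commands("your_genome", "SRR123456", "path/to/genome.fna", "path/to/genome.gff3")` returns the four command strings in execution order, which can be printed for inspection.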

Quick start

# 1. Install dependencies
conda env create -f environment.yml
conda activate screen-sra

# 2. Edit config.yaml with your SRA IDs and genome files

# 3. Run (workflow parallelizes over SRA IDs/genomes with --cores)
snakemake --cores 16

# Note: `threads` in config.yaml (8 in the example) sets the thread count for `bwa mem` mapping.
# Download (fasterq-dump) and QC (fastp) each use 4 threads.

# 4. Check results in results/excel/
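
To get a feel for how `--cores` interacts with the per-step thread counts, the arithmetic can be sketched as follows. Snakemake keeps the summed threads of running jobs within `--cores` and scales a job's threads down if it asks for more than `--cores`. The function name is hypothetical, and the estimate ignores other resource limits and mixes of different job types running at once.

```python
def max_concurrent_jobs(cores: int, threads_per_job: int) -> int:
    """Upper bound on identical jobs that can run at once under --cores."""
    if cores < 1 or threads_per_job < 1:
        raise ValueError("cores and threads_per_job must be positive")
    # A job requesting more threads than --cores is scaled down to --cores.
    effective_threads = min(threads_per_job, cores)
    return cores // effective_threads
```

With `--cores 16`, this gives at most `max_concurrent_jobs(16, 8)` mapping jobs and `max_concurrent_jobs(16, 4)` download or QC jobs at the same time.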

Configuration

Edit config.yaml:

threads: 8                     # Threads for bwa mem mapping
sra_ids_file: SRR_Acc_List.txt # File with SRA IDs (one per line)
keep_aux: true                 # Keep intermediate files (true/false)
keep_mapping: true             # Keep CRAM/BAM files (true/false)

genomes:
  - genome_id: your_genome
    fasta: path/to/genome.fna
    gff3: path/to/genome.gff3
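
Before launching a long run, it can help to sanity-check the parsed configuration. The sketch below works on the dictionary obtained after loading config.yaml (for example with a YAML parser) and checks the keys shown above. The function name and the specific checks are assumptions for illustration; the workflow itself may validate differently or not at all.

```python
REQUIRED_GENOME_KEYS = ("genome_id", "fasta", "gff3")


def validate_config(config: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []

    threads = config.get("threads")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(threads, bool) or not isinstance(threads, int) or threads < 1:
        problems.append("threads must be a positive integer")

    sra_file = config.get("sra_ids_file")
    if not isinstance(sra_file, str) or not sra_file:
        problems.append("sra_ids_file must be a non-empty path")

    for key in ("keep_aux", "keep_mapping"):
        if not isinstance(config.get(key), bool):
            problems.append(f"{key} must be true or false")

    genomes = config.get("genomes")
    if not isinstance(genomes, list) or not genomes:
        problems.append("genomes must be a non-empty list")
        return problems

    seen_ids = set()
    for index, genome in enumerate(genomes):
        if not isinstance(genome, dict):
            problems.append(f"genomes[{index}] must be a mapping")
            continue
        missing = [k for k in REQUIRED_GENOME_KEYS if not genome.get(k)]
        if missing:
            problems.append(f"genomes[{index}] is missing: {', '.join(missing)}")
        genome_id = genome.get("genome_id")
        # Reports are written per genome_id, so duplicates would collide.
        if genome_id in seen_ids:
            problems.append(f"duplicate genome_id: {genome_id}")
        seen_ids.add(genome_id)
    return problems
```

The duplicate check reflects that the final report is named after `genome_id`, so two entries with the same ID would target the same Excel file.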

Inputs

  • SRA IDs: Text file with one SRA run ID per line (e.g., SRR123456)
  • Genomes: FASTA + GFF3 files for each reference genome
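
A small sketch of reading the SRA ID list: it skips blank lines, removes duplicates while keeping order, and rejects anything that does not look like a run accession (SRR, ERR, or DRR followed by digits). The function name and the strictness of the check are assumptions; the workflow may parse the file differently.

```python
import re

# Run accessions from SRA, ENA, and DDBJ: SRR/ERR/DRR followed by digits.
RUN_ACCESSION = re.compile(r"^[SED]RR\d+$")


def parse_sra_ids(text: str) -> list[str]:
    """Parse one-ID-per-line text into a de-duplicated list of run IDs."""
    ids = []
    seen = set()
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # tolerate blank lines
        if not RUN_ACCESSION.match(line):
            raise ValueError(f"line {lineno}: not an SRA run ID: {line!r}")
        if line not in seen:
            seen.add(line)
            ids.append(line)
    return ids
```

Typical use would be `parse_sra_ids(open("SRR_Acc_List.txt").read())`. Rejecting experiment or sample accessions (such as SRX or SRS IDs) early avoids a failed download later in the run.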

Outputs

results/
├── excel/
│   └── {genome_id}.xlsx    # Final reports (one per genome)
│       ├── General_mapping sheet
│       └── Per-SRA sample sheets
├── reads/                  # Raw fastq.gz files
├── qc/                     # fastp reports
├── mapping/                # CRAM/BAM alignment files
└── gene_tables/            # Per-sample TSV tables
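
After a run, a quick check that every configured genome has its report can be sketched as below. It relies only on the `results/excel/{genome_id}.xlsx` layout shown above; the function names are hypothetical.

```python
from pathlib import Path


def expected_reports(genome_ids, results_dir="results"):
    """Paths of the per-genome Excel reports implied by the layout above."""
    return [Path(results_dir) / "excel" / f"{gid}.xlsx" for gid in genome_ids]


def missing_reports(genome_ids, results_dir="results"):
    """Subset of expected reports that do not exist on disk."""
    return [p for p in expected_reports(genome_ids, results_dir) if not p.is_file()]
```

An empty list from `missing_reports(["your_genome"])` means the report for that genome was written.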

Notes

  • This pipeline is designed for bacterial genome screening with annotation files generated by Bakta.
  • Other annotation files should also work, depending on their GFF/GFF3 structure.
  • Code was reviewed and optimized with GPT-5.3-Codex.
