digenoma-lab/hic-scaffolding-nf

 
 

hic-scaffolding-nf

Nextflow pipeline for scaffolding genome assemblies with Hi-C reads

Download juicer tools

wget http://hicfiles.tc4ga.com.s3.amazonaws.com/public/juicer/juicer_tools_1.11.09_jcuda.0.8.jar
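
A quick sanity check on the downloaded file can save a failed run later. This is a minimal sketch, assuming a POSIX-like shell; check_jar is a hypothetical helper and not part of the pipeline. It only confirms that the file is non-empty and starts with the zip signature ("PK") that jar files share; it does not validate the contents.

```shell
# Hypothetical helper: return 0 if the file looks like a jar archive.
check_jar() {
    local jar="$1"
    # -s is true only when the file exists and is not empty.
    if [ ! -s "$jar" ]; then
        echo "missing or empty: $jar" >&2
        return 1
    fi
    # Jar files are zip archives, so the first two bytes should be "PK".
    if [ "$(head -c 2 "$jar")" != "PK" ]; then
        echo "not a zip/jar archive: $jar" >&2
        return 1
    fi
    echo "ok: $jar"
}

# Usage, once the wget command above has finished:
# check_jar juicer_tools_1.11.09_jcuda.0.8.jar
```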

Introduction

This pipeline requires the following inputs:

  1. A fasta file containing assembled contigs (--contigs)
  2. Hi-C reads in paired-end fastq(.gz) format (--r1Reads and --r2Reads)

It then performs the following tasks:

  1. Aligns the Hi-C reads to the contigs using chromap
  2. Scaffolds the contigs using yahs
  3. Prepares all the files you need to do manual curation in Juicebox

and produces the following outputs:

  • Alignments in bam format (out/chromap/aligned.bam)
  • A scaffolded assembly in both agp and fasta formats (out/scaffolds/yahs.out_scaffolds_final.[agp,fa])
  • .hic and .assembly files for loading in Juicebox Assembly Tools (out/juicebox_input/out_JBAT.[hic,assembly])
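
To confirm that a run produced everything listed above, something like the following could be used. It is a sketch only: check_outputs is a hypothetical helper, and it assumes the output directory is out (as in the paths above) unless another directory is passed as the first argument.

```shell
# Hypothetical helper: report expected output files that are missing or empty.
# Returns 0 when all five files are present, 1 otherwise.
check_outputs() {
    local outdir="${1:-out}"
    local missing=0
    local f
    for f in \
        chromap/aligned.bam \
        scaffolds/yahs.out_scaffolds_final.agp \
        scaffolds/yahs.out_scaffolds_final.fa \
        juicebox_input/out_JBAT.hic \
        juicebox_input/out_JBAT.assembly
    do
        if [ ! -s "$outdir/$f" ]; then
            echo "missing: $outdir/$f"
            missing=1
        fi
    done
    return "$missing"
}

# Usage, from the directory where the pipeline was run:
# check_outputs out && echo "all outputs present"
```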

Configuration

Running on Lewis

If you're running this on the Lewis cluster, I've already got a profile set up with everything you need, so just add -profile lewis to the command and you're good to go.

Running on another cluster/cloud/locally

This pipeline has the following dependencies:

  • Nextflow
  • chromap
  • yahs
  • JuicerTools

Nextflow must be in your path. You can have Nextflow create a conda environment containing chromap and yahs for you with -profile conda (note one dash!). JuicerTools is distributed as a jar file, so you need to tell the pipeline where it is by adding the argument --juicer-tools-jar /path/to/jar (note two dashes!). You can also put these settings in a config file called nextflow.config in the directory from which you're running the pipeline (see the Nextflow documentation).

Running

nextflow run WarrenLab/hic-scaffolding-nf \
    --contigs contigs.fa \
    --r1Reads hic_reads_R1.fastq.gz \
    --r2Reads hic_reads_R2.fastq.gz

Kutral example

    nextflow run hic-scaffolding-nf/main.nf \
          --contigs sl_female_ont_purge_r2.fasta \
          --r1Reads DDU_AAOSDF_4_1_HFYVJDSX7.UDI488_clean.fastq.gz \
          --r2Reads DDU_AAOSDF_4_2_HFYVJDSX7.UDI488_clean.fastq.gz \
          -profile uoh

You'll need to add a couple of options depending on your configuration (see the Configuration section above).
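
For a run outside Lewis, the two options from the Configuration section (-profile conda and --juicer-tools-jar) can be combined with the basic command. The wrapper below is a sketch under stated assumptions: run_scaffolding and the DRY_RUN variable are hypothetical names introduced here for illustration, and the conda profile is assumed to be the one you want. Setting DRY_RUN to any non-empty value prints the command instead of launching Nextflow.

```shell
# Hypothetical wrapper: launch the pipeline with the conda profile and an
# explicit JuicerTools jar. Arguments: contigs, R1 reads, R2 reads, jar path.
run_scaffolding() {
    if [ "$#" -ne 4 ]; then
        echo "usage: run_scaffolding CONTIGS R1 R2 JUICER_TOOLS_JAR" >&2
        return 2
    fi
    local contigs="$1" r1="$2" r2="$3" jar="$4"
    # ${DRY_RUN:+echo} expands to "echo" when DRY_RUN is non-empty,
    # and to nothing otherwise, so the same line prints or runs the command.
    ${DRY_RUN:+echo} nextflow run WarrenLab/hic-scaffolding-nf \
        --contigs "$contigs" \
        --r1Reads "$r1" \
        --r2Reads "$r2" \
        --juicer-tools-jar "$jar" \
        -profile conda
}

# Usage (prints the command without running it):
# DRY_RUN=1 run_scaffolding contigs.fa hic_reads_R1.fastq.gz hic_reads_R2.fastq.gz /path/to/jar
```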
