
HiFi : FALCON Unzip3 User Guide

Zev Kronenberg edited this page Aug 6, 2019 · 9 revisions

This guide outlines the steps for running FALCON-Unzip3 on HiFi data. It assumes that FALCON-Unzip3 was installed via pbBioConda and that FALCON has already been run on the input data. FALCON-Unzip has changed substantially enough to warrant this wiki page. For example, the polishing stage has been completely rewritten to use RACON, which decreases runtimes and increases base accuracy.

For reference, here are two papers that discuss the advantages of HiFi assembly, for human:

Warning: This is an unofficial developer guide meant to cover the basics until a formal document is written. If you find bugs or have problems or concerns, please raise them via GitHub issues.


Quick Run - TL;DR

Curly braces {} denote variables. For example, {movie} stands for a movie name, and there may be multiple movies.

1. Extract HiFi reads from PacBio BAMs:

samtools fasta {movie}.bam > {movie}.fasta
samtools fastq {movie}.bam > {movie}.fastq

2. Generate and index a single fastq file:

cat {movie}.fastq > {falcon_unzip_input}.fastq
samtools fqidx {falcon_unzip_input}.fastq
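
With several movies, the cat step needs to take in every per-movie FASTQ. The following is a minimal sketch, not part of the official workflow: the file names, the scratch directory and the toy reads are hypothetical placeholders created only so the snippet runs on its own. It assumes unwrapped four-line FASTQ records, and it only calls samtools fqidx when samtools is on the PATH.

```shell
#!/bin/sh
set -eu

# Hypothetical demo data: two tiny per-movie FASTQ files in a scratch directory.
workdir=$(mktemp -d)
printf '@movieA/1/ccs\nACGT\n+\nIIII\n' > "$workdir/movieA.fastq"
printf '@movieB/1/ccs\nTTGA\n+\nIIII\n' > "$workdir/movieB.fastq"

# Concatenate every per-movie FASTQ into the single FALCON-Unzip input.
# The output goes into its own directory so the glob cannot pick it up.
mkdir "$workdir/combined"
combined="$workdir/combined/falcon_unzip_input.fastq"
cat "$workdir"/*.fastq > "$combined"

# Sanity check (assumes four lines per record): records = lines / 4.
lines=$(wc -l < "$combined" | tr -d ' ')
n_records=$((lines / 4))
echo "records in combined FASTQ: $n_records"

# Index the combined file only when samtools is installed.
if command -v samtools >/dev/null 2>&1; then
    samtools fqidx "$combined"
fi
```

Comparing the record count against the sum of the per-movie counts is a cheap way to catch a movie that was left out of the concatenation.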

3. Edit and run the FALCON config. Be sure to set input_type = preads and to point input_fofn at the FASTA fofn. For additional details, see the FALCON/FALCON-Unzip documentation.

 [General]
 input_type = preads
 input_fofn = CCS.fasta.fofn

A fofn is a text file with a list of files. FALCON supports multiple input FASTA files.
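
One way to build the fofn is sketched below. The scratch directory and the placeholder FASTA files are hypothetical and exist only to make the snippet runnable; writing absolute paths is an assumption made here to avoid ambiguity about the working directory, not a documented requirement.

```shell
#!/bin/sh
set -eu

# Hypothetical demo data: two empty placeholder FASTA files.
workdir=$(mktemp -d)
touch "$workdir/movieA.fasta" "$workdir/movieB.fasta"

# Write one path per line; the file name matches input_fofn in the config.
# mktemp -d returns an absolute path, so the listed paths are absolute too.
fofn="$workdir/CCS.fasta.fofn"
printf '%s\n' "$workdir"/*.fasta > "$fofn"

# Check that every listed file actually exists before launching FALCON.
n_listed=0
while IFS= read -r path; do
    [ -f "$path" ] || { echo "missing: $path" >&2; exit 1; }
    n_listed=$((n_listed + 1))
done < "$fofn"
echo "fofn lists $n_listed files"
```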


4. Edit the FALCON-Unzip config (fc_unzip.cfg) in the main assembly directory.

New sections in HiFi FALCON-Unzip config:

A. In the [Unzip] block, specify the path to your FASTQ file:

[Unzip]
fastq=../example/m00001_00001_00001.fastq

B. We are in the process of deprecating/simplifying some of the job sections for HiFi FALCON-Unzip. All of the FALCON-Unzip job steps will remain for backward compatibility. However, if you're working with HiFi data, pay attention to [job.defaults] and [job.high]: these two sections configure the majority of resource requests.

  • Set up the job defaults (for polishing and other jobs):
[job.defaults]
NPROC={cpu}
MB={mb}
njobs={njobs}

We recommend 2-4 CPUs for [job.defaults].

  • Set up the high-resource section (for mapping/sorting):
[job.high]
njobs={njobs}
NPROC={cpu}
MB={mb}

This section controls a few stages that require a lot of resources, such as read mapping and assembly.

  • Set up the high-memory section (for single-CPU jobs that require a lot of RAM):
[job.highmem]
njobs={njobs}
NPROC={cpu}
MB={mb}

This section controls the haplotig assembly stage.

5. Run:

fc_unzip --target="ccs" unzip.cfg

Common questions

Can I mix CLR and CCS/HiFi data?

Not currently. It's difficult to mix CLR and HiFi data because they have different error profiles and read lengths.

Why are both FASTA and FASTQ required?

FALCON uses FASTA, and FALCON-Unzip uses FASTQ for polishing. The FASTA and FASTQ files must contain the same reads.
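
A quick way to confirm that the two files hold the same reads is to compare their read names. This is a minimal sketch under stated assumptions: the demo files and read names are hypothetical, the FASTQ is assumed to use unwrapped four-line records, and only the first whitespace-delimited field of each header is compared.

```shell
#!/bin/sh
set -eu

# Hypothetical demo data: the same two reads in FASTA and FASTQ form.
workdir=$(mktemp -d)
printf '>read1\nACGT\n>read2\nTTGA\n' > "$workdir/demo.fasta"
printf '@read1\nACGT\n+\nIIII\n@read2\nTTGA\n+\nIIII\n' > "$workdir/demo.fastq"

# FASTA read names: header lines start with ">"; keep the first field.
awk '/^>/ { sub(/^>/, "", $1); print $1 }' "$workdir/demo.fasta" \
    | sort > "$workdir/fasta.names"

# FASTQ read names: the first line of each four-line record.
awk 'NR % 4 == 1 { sub(/^@/, "", $1); print $1 }' "$workdir/demo.fastq" \
    | sort > "$workdir/fastq.names"

# comm -3 prints names found in only one of the two sorted lists,
# so zero lines of output means both files contain the same reads.
n_mismatched=$(comm -3 "$workdir/fasta.names" "$workdir/fastq.names" | wc -l | tr -d ' ')
echo "read names present in only one file: $n_mismatched"
```

On real data, replace the demo files with the FASTA given to FALCON and the FASTQ given to FALCON-Unzip; any non-zero count points to reads missing from one side.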

How long does HiFi FALCON-Unzip take?

HiFi FALCON-Unzip is faster than traditional FALCON-Unzip. P-read generation in FALCON, an expensive step, is skipped, and the reworked read-tracking stages further reduce runtimes. Roughly speaking, a human genome takes two to four days to run through HiFi FALCON-Unzip on a cluster.

What type of accuracy can I expect?

On the human datasets we've tested, the primary contigs and haplotigs have a Phred-scaled QV of 50. The phasing accuracy of the haplotigs, for human, is > 99.9%.
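
To put the QV figure in perspective, a Phred-scaled quality value is defined as QV = -10 * log10(P), where P is the per-base error probability. The short sketch below only does that arithmetic for QV 50; the variable name is illustrative.

```shell
#!/bin/sh
# Convert a Phred-scaled QV into an error probability and an
# "about one error per N bases" figure: P = 10^(-QV/10), N = 10^(QV/10).
qv_summary=$(awk -v qv=50 'BEGIN {
    printf "QV %d: error probability %g, about 1 error per %d bases",
           qv, 10 ^ (-qv / 10), 10 ^ (qv / 10)
}')
echo "$qv_summary"
```

In other words, QV 50 corresponds to an error probability of 10^-5, or roughly one error per 100,000 bases.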
