Skip to content
Branch: master
Find file History
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Type Name Latest commit message Commit time
Failed to load latest commit information.

These steps describe how to call with freebayes and then use bcftools consensus to create a polished fasta file. Requirements are:

  • freebayes (>=1.3.1)
  • bcftools (>=1.8)
  • samtools
  1. Setup:

  2. Index the assembly fasta file and input BAM/CRAM file if not already indexed:

     samtools faidx $fasta
     samtools index $bam
  3. Parallelise into chunks across the genome. mean_cov is taken from longranger summary.csv and used for skipping excessive coverage region. This awk command will print out the commands to be run:

     awk '{print "freebayes --bam $bam --region "$1":1-"$2" --skip-coverage $((mean_cov*12)) -f $fasta | bcftools view --no-version -Ob -o "$1":1-"$2".bcf"}' $fasta.fai
  4. Make a list of the output files (in the same order as the reference) to be concatenated together when done:

     awk '{print $1":1-"$2".bcf"}' $ref.fai > concat_list.txt
  5. When all parallel jobs finished, concatenate and normalise non-REF variants:

     bcftools concat -nf concat_list.txt | bcftools view -Ou -e'type="ref"' | bcftools norm -Ob -f $fasta -o $sample.bcf
     bcftools index $sample.bcf
  6. Create the polished fasta with bcftools consensus. This will do very basic filtering of the callset (QUAL>1) and select only homozygous ALT and heterozygous non-REF sites. At heterozygous non-REF sites (both alleles do not match the reference fasta), the longest allele will be chosen.

     bcftools consensus -i'QUAL>1 && (GT="AA" || GT="Aa")' -Hla -f $fasta $sample.bcf > $sample.fasta

    Alternatively, if you have a phased VCF and would like to consistently pick one of the haplotypes at all heterozygous sites, set -H to 1 or 2:

     bcftools consensus -i'QUAL>1' -H1 -f $fasta $sample.bcf > $sample.fasta
You can’t perform that action at this time.