v0.18.2 - Hotfix release for ambiguous alignments

FelixKrueger released this on Jun 28, 2017

Bismark

  • Changed the timing of when the 'ambiguous within the same thread' status of alignments is reset. Previously, some alignments were incorrectly considered ambiguous (see here). This affected Bowtie 2 alignments only.

bismark2bedGraph

  • The option --ample_mem is now mutually exclusive with specifying memory for the UNIX sort command via the option --buffer_size.

v0.18.1

FelixKrueger released this on May 22, 2017

Bismark

  • Commented out warning messages for certain ambiguous paired-end alignments.

v0.18.0 - further NOMe-Seq support and bug fixes

FelixKrueger released this on May 15, 2017

Release Notes for Bismark v0.18.0

  • Changed FindBin qw($Bin) to FindBin qw($RealBin) for bismark, bismark_methylation_extractor, bismark2report and bismark2summary so that symlinks are resolved before calling different modules.

Bismark

  • Fixed the behaviour of (very rare) ambiguous corner cases where a sequence had a perfect sequence duplication within the valid paired-end distance.

Methylation Extractor

  • Added a new option --yacht (for Yet Another Context Hunting Tool) that writes out additional information about the read each methylation call belongs to; its output is meant to be fed into the NOMe_filtering script (see below). This option writes out a single 'any_C_context' file that lists all methylation calls of a read consecutively. Since its intended use is single-cell NOMe-Seq data, it only works in single-end mode (paired-end reads often suffer from chimaera problems...)

--yacht adds three additional columns to the standard methylation call files:

<read start> <read end> <read orientation>

For forward reads (+ orientation) the start position is the leftmost position, whereas for reverse reads (- orientation) it is the rightmost position.
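The sketch below reads these extended call lines in Python. It is not part of Bismark: it assumes the five standard tab-separated columns of a methylation call file (read ID, '+'/'-' methylation state, chromosome, position, call letter) followed by the three --yacht columns, and that any header line has already been skipped. The class and function names are hypothetical, and treating the read end of a reverse read as its leftmost coordinate is an assumption.

```python
from itertools import groupby
from typing import Iterable, Iterator, List, NamedTuple, Tuple


class YachtCall(NamedTuple):
    read_id: str
    methylated: bool   # assumed column 2: '+' = methylated, '-' = unmethylated
    chrom: str
    pos: int           # position of the methylation call
    context: str       # call letter, e.g. Z/z (CpG), X/x (CHG), H/h (CHH)
    read_start: int    # leftmost for '+' reads, rightmost for '-' reads
    read_end: int
    orientation: str   # '+' forward read, '-' reverse read

    def span(self) -> Tuple[int, int]:
        """Return (leftmost, rightmost) read coordinates regardless of orientation."""
        return min(self.read_start, self.read_end), max(self.read_start, self.read_end)


def parse_yacht_line(line: str) -> YachtCall:
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 8:
        raise ValueError(f"expected 8 tab-separated columns, got {len(fields)}")
    read_id, state, chrom, pos, context, start, end, orientation = fields
    return YachtCall(read_id, state == "+", chrom, int(pos), context,
                     int(start), int(end), orientation)


def calls_by_read(lines: Iterable[str]) -> Iterator[List[YachtCall]]:
    """Yield all calls of one read at a time (calls of a read are consecutive)."""
    calls = (parse_yacht_line(line) for line in lines if line.strip())
    for _, group in groupby(calls, key=lambda call: call.read_id):
        yield list(group)
```

Because the 'any_C_context' file lists the calls of a read consecutively, itertools.groupby can collect one read at a time without holding the whole file in memory.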

  • Changed FindBin qw($Bin) to FindBin qw($RealBin) so that symlinks are resolved before calling different modules.

NOMe_filtering

This script reads in methylation call files from the Bismark methylation extractor that contain additional information about the reads the methylation calls belong to (i.e. output generated with --yacht). It processes entire (single-end) reads and then filters calls for NOMe-Seq (nucleosome occupancy and methylome sequencing) positions, where accessible DNA gets methylated in a GpC context:

 (i) filters CpGs to only output cytosines in A-CG and T-CG context
(ii) filters GC context to only report cytosines in GC-A, GC-C and GC-T context

Both of these measures aim to reduce unwanted biases, i.e. the influence of G-CG (intended) and C-CG (off-target) on endogenous CpG methylation, and the influence of CpG methylation on (the NOMe-Seq specific) GC context methylation.
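To make the two filters concrete, here is a hypothetical Python helper that classifies a single cytosine given the reference sequence. It is an illustrative sketch, not NOMe_filtering's implementation; it uses 0-based string indices and handles bottom-strand cytosines by working on the reverse complement.

```python
from typing import Optional

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def nome_context(seq: str, pos: int, strand: str = "+") -> Optional[str]:
    """Classify the cytosine at 0-based index `pos` on `strand`.

    Returns "CpG" for A-CG / T-CG, "GpC" for GC-A / GC-C / GC-T,
    and None for every other position (e.g. G-CG, C-CG, GC-G).
    """
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    seq = seq.upper()
    if strand == "-":
        # A bottom-strand C is a top-strand G; flip so it can be read 5'->3'.
        pos = len(seq) - 1 - pos
        seq = reverse_complement(seq)
    # One flanking base is needed on each side.
    if pos <= 0 or pos >= len(seq) - 1 or seq[pos] != "C":
        return None
    before, after = seq[pos - 1], seq[pos + 1]
    if after == "G" and before in "AT":
        return "CpG"   # endogenous CpG, not confounded by GpC methylation
    if before == "G" and after in "ACT":
        return "GpC"   # NOMe-Seq accessibility signal, not a CpG
    return None
```

A cytosine in G-C-G is both a GpC and a CpG, so both rules drop it, which is the overlap the two filters are meant to exclude.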

The NOMe-Seq filtering output reports cytosines in CpG context only if they are in A-CG or T-CG context,
and cytosines in GC context only when the C is not in CpG context. The output file is tab-delimited and in
the following format (1-based coords):

<readID>  <chromosome>  <read start>  <read end>  <count methylated CpG>  <count non-methylated CpG>  <count methylated GC>  <count non-methylated GC>
HWI-D00436:298:C9KY4ANXX:1:1101:2035:2000_1:N:0:_ACAGTGGT 10 8517979 8518098 0 1 0 1
HWI-D00436:298:C9KY4ANXX:1:1101:5072:1993_1:N:0:_ACAGTGGT 8 9476630 9476748 0 0 0 2

coverage2cytosine

  • Fixed an issue in --merge_CpG mode caused by chromosomes ending in CG.

  • Fixed an issue caused by specifying --zero as well as --merge_CpG.

bam2nuc

  • Fixed an issue where the option --output_dir had been ignored.

filter_non_conversion

  • Removed help text indicating that this script also performed the deduplication.

v0.17.0 - Filter non-conversion, Documentation and convenience updates

FelixKrueger released this on Jan 18, 2017

Bismark

  • The option --dovetail is now the default behaviour for paired-end Bowtie2 libraries to assist with
    alignments that have undergone 5'-end trimming. It can be disabled using the new option --no_dovetail.
  • Added a time stamp to the Bismark run.
  • Chromosome names with leading spaces now cause Bismark to bail.
  • Fixed path handling for --multicore mode when --prefix had been specified as well.
  • Bismark now quits if Bowtie or Bowtie 2 could not be executed properly.
  • Bismark now bails if supplied filenames do not exist.

Documentation

  • Added an overview of different library types and kits to the Bismark User Guide.
  • Also added documentation for Bismark modules bam2nuc, bismark2report, bismark2summary and filter_non_conversion.
  • Added a Markdown to HTML converter (make_docs.pl; thanks to Phil Ewels).

filter_non_conversion

  • Added a new script that filters out reads or read pairs whose apparent non-CG methylation exceeds a certain threshold (3 by default). Optionally, the non-CG methylation calls may be required to occur on consecutive non-CG positions, using the option --consecutive.
  • Added a time stamp to the filtering step.
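A minimal Python sketch of the threshold logic on a single read's methylation call string (Bismark's XM tag, upper case = methylated). It assumes that only CHG (X/x) and CHH (H/h) calls count as non-CG, that a read is removed once the number of methylated non-CG calls reaches the threshold, and that with --consecutive the run is counted over non-CG calls only, so intervening CpG calls do not break it. How filter_non_conversion combines the two mates of a pair is not shown, and the function name is hypothetical.

```python
NON_CG_CALLS = "xXhH"  # assumption: CHG (x/X) and CHH (h/H) calls are the non-CG calls


def should_filter(xm: str, threshold: int = 3, consecutive: bool = False) -> bool:
    """Decide whether a read looks incompletely bisulfite-converted."""
    non_cg = [call for call in xm if call in NON_CG_CALLS]
    if not consecutive:
        # Total number of methylated non-CG calls anywhere in the read.
        return sum(call.isupper() for call in non_cg) >= threshold
    # Consecutive mode: `threshold` methylated calls in a row among non-CG calls.
    run = 0
    for call in non_cg:
        run = run + 1 if call.isupper() else 0
        if run >= threshold:
            return True
    return False
```

If your version of the script only removes reads whose count strictly exceeds the threshold, the two `>=` comparisons become `>`.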

bismark2bedGraph

  • When creating temporary files, / characters in chromosome names are now replaced with _ (underscores), as was already done for | (pipe) characters, since a / would otherwise lead to attempts to write files into non-existent directories.
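The renaming amounts to a character substitution before the chromosome name becomes part of a path. A minimal Python sketch; temp_safe_chrom and the '.tmp' naming scheme of temp_file_path are hypothetical and not bismark2bedGraph's actual file names:

```python
import os


def temp_safe_chrom(chrom: str) -> str:
    """Replace '|' and '/' so a chromosome name can be used inside a file name."""
    return chrom.replace("|", "_").replace("/", "_")


def temp_file_path(out_dir: str, chrom: str, suffix: str = ".tmp") -> str:
    # Hypothetical naming scheme, for illustration only.
    return os.path.join(out_dir, temp_safe_chrom(chrom) + suffix)
```

Without the substitution, a name such as scaffold/1 would make os.path.join point into a subdirectory called scaffold, which does not exist.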

deduplicate_bismark

  • Single-/paired-end detection now also accepts --1 or --2.
  • Added EOF/truncation detection, which causes the deduplicator to die if an input file is truncated.

bismark_methylation_extractor

  • Single-/paired-end detection now also accepts --1 or --2.
  • Added EOF/truncation detection, which causes the methylation extractor to die if an input file is truncated.
  • Added a fatal ID1/ID2 check to paired-end extraction so that files which went out of sync at a later stage do not complete silently (but incorrectly!).

bismark2report

  • Major refactoring of bismark2report; the output should look the same, though. Massive thanks to Phil Ewels for this.

coverage2cytosine

  • Added a new option --NOMe-seq to filter nucleosome occupancy and methylome sequencing (NOMe-Seq) data where accessible DNA gets enzymatically methylated in a GpC context. The option --NOMe-seq:
     i) filters the genome-wide CpG-report to only output cytosines in ACG and TCG context
    ii) filters the GC context output to only report cytosines in GCA, GCC and GCT context

Both of these measures aim to reduce unwanted biases, namely the influence of GCG and CCG on endogenous CpG methylation, and the influence of CpG methylation on (the NOMe-Seq specific) GC context methylation. PLEASE NOTE that NOMe-Seq data requires as input a .cov.gz file that has been generated in non-CG mode (--CX).

bismark_genome_preparation

  • Fixed a bug that arose when --genomic_composition was specified (now moving back to the genome directory for in silico conversion).

0.16.3 - Additional bug fix for ambiguous Bowtie 2 alignments

FelixKrueger released this on Jul 25, 2016

Bismark

  • Essential: Fixed another bug where a subset of ambiguous Bowtie 2 alignments were considered unique even though
    they had already been ambiguous in a different thread, e.g.:

    Read 1: AS:i:0 XS:i:0
    Read 2: AS:i:0

    In such cases the 'ambiguous within thread' variable is now only reset if the second alignment is truly better.
    This also affects the --ambig_bam output.

  • Added support for large Bowtie (1) index files ending in .ebwtl, which were introduced in Bowtie v1.1.0.
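The reset rule of the ambiguity fix can be illustrated with a small Python sketch; it is not Bismark's Perl code. It assumes each alignment thread reports at most one (AS, XS) pair for the read, reads the 'Read 1 / Read 2' example as the results of two threads for the same read, and treats every tie between threads as ambiguous. Bismark's real decision involves further corner cases that are not modelled here, and is_ambiguous is a hypothetical name.

```python
from typing import Iterable, Optional, Tuple

# One entry per alignment thread: (AS, XS) from Bowtie 2, with XS = None when no
# second-best hit was reported; the entry is None if the thread found no alignment.
ThreadHit = Optional[Tuple[int, Optional[int]]]


def is_ambiguous(hits: Iterable[ThreadHit]) -> bool:
    best_as = None
    ambiguous = False
    for hit in hits:
        if hit is None:
            continue
        as_score, xs_score = hit
        # Ambiguous within this thread: the second-best hit scores as well as the best.
        within_thread = xs_score is not None and xs_score == as_score
        if best_as is None or as_score > best_as:
            # Only a strictly better alignment may reset the ambiguity flag.
            best_as = as_score
            ambiguous = within_thread
        elif as_score == best_as:
            # Tie with an earlier thread: treated as ambiguous in this sketch.
            ambiguous = True
        # A worse alignment leaves the flag untouched.
    return ambiguous
```

With the example hits [(0, 0), (0, None)] the flag stays set, whereas [(-12, -12), (0, None)] is reset because the second thread's alignment is strictly better.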

v0.16.2 - Includes essential bug fix for Bowtie 2 alignments

FelixKrueger released this on Jul 19, 2016

  • Changed the Shebang in all scripts of the Bismark suite to #!/usr/bin/env perl instead of
    #!/usr/bin/perl

Bismark

  • Essential: Fixed a bug for Bowtie 2 alignments where reads that should be considered ambiguous were incorrectly assigned to the first alignment thread. This error had crept in when the behaviour of
    corner cases was changed in v0.16.0. Thanks to John Gaspar for spotting this!

deduplicate_bismark

  • Now bails with a useful error message when the input files are empty.

bismark_genome_preparation

  • Added new option --genomic_composition so that the genomic composition can be calculated and written right at the genome preparation stage rather than later by bam2nuc.

bam2nuc

  • Now also calculates a fold coverage for the various (di-)nucleotides. The changes in the nucleotide_stats text file are also picked up and plotted by bismark2report
  • Added a new option --genomic_composition_only to just process the genomic sequence without requiring any data files

bismark2summary

  • Added option -o/--basename <filename> to specify a certain filename. If not specified the name will
    remain bismark_summary_report.txt/html
  • Added documentation and the options --help and --version to be consistent with the rest of Bismark
  • Added option --title <string> to give the HTML report a different title

0.16.1

FelixKrueger released this on Apr 25, 2016

Bismark


  • Removed a rogue warn/sleep statement, used to check the resetting of best alignment scores for paired-end/Bowtie2 alignments, which obviously slowed alignments down massively. Sorry for that.

v0.16.0

FelixKrueger released this on Apr 20, 2016

Bismark


  • File endings .fastq | .fq | .fastq.gz | .fq.gz are now removed from output file names (unless the output name was specified with --basename) in a bid to reduce the length of the already long file names.
  • Added the new option --dovetail (turned on by default for --pbat libraries), which allows dovetailing reads to be reported. For a more in-depth description see #14.
  • Changed the behaviour in corner cases where several non-directional alignments could exist for the very same position but on different strands: the best alignment now trumps the weaker one. For example, after relaxing the alignment criteria to allow ~60 mismatches for a paired-end alignment, we found an alignment to the OT strand with a combined AS of -324, but also an alignment to the CTOB strand with an AS of 0 (perfect alignment). The CTOB alignment now trumps the OT alignment, and the methylation information is reported for the bottom strand. Credits go to Sylvain Foret (ANU, Canberra) for bringing this to our attention!

New module: bismark2summary

[Image: Bismark summary]

New module: bam2nuc

  • The new Bismark module bam2nuc calculates the average mono- and di-nucleotide coverage of libraries and compares this to the average genomic composition. bam2nuc can be called straight from within Bismark (option --nucleotide_coverage) or run stand-alone. bam2nuc creates a ...nucleotide_stats.txt file that is also automatically detected by bismark2report and incorporated into the HTML report.

[Image: (di-)nucleotide coverage]
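The comparison bam2nuc reports can be sketched as follows, assuming that the fold coverage of a (di-)nucleotide is its percentage in the library divided by its percentage in the genome. This is a hypothetical Python illustration, not bam2nuc's code. Since bisulfite-converted reads no longer show their original C content, the sample sequences passed in are assumed to be the reference sequence covered by each alignment.

```python
from collections import Counter
from typing import Dict, Iterable


def composition(seqs: Iterable[str], k: int) -> Dict[str, float]:
    """Percentage of each k-mer (k=1 mono-, k=2 di-nucleotides), ignoring non-ACGT k-mers."""
    counts: Counter = Counter()
    for seq in seqs:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if all(base in "ACGT" for base in kmer):
                counts[kmer] += 1
    total = sum(counts.values())
    return {kmer: 100.0 * n / total for kmer, n in counts.items()} if total else {}


def fold_coverage(sample_seqs: Iterable[str], genome_seqs: Iterable[str], k: int) -> Dict[str, float]:
    sample = composition(sample_seqs, k)
    genome = composition(genome_seqs, k)
    # 1.0 = represented as in the genome; >1 over-represented, <1 under-represented.
    return {kmer: sample.get(kmer, 0.0) / pct for kmer, pct in genome.items()}
```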

bismark2_sitrep.tpl


  • Removed an extra function call in bismark_sitrep.tpl so that the M-bias 2 plot is drawn once the M-bias 1 plot has finished drawing (with certain browsers and data, drawing both plots in parallel could result in a white placeholder only).

methylation extractor


  • Altering the file path handling of coverage2cytosine and bismark2bedGraph also required some changes in the methylation extractor.

bismark2bedGraph


  • Input file path handling has been completely reworked. The output file, which can be specified as -o output.bedGraph, now has to be a single file name and must not contain any path information. A particular output folder may be specified with -dir /any/path/.
  • Addressing the file path handling issue also fixed a similar issue with the option --remove_spaces when -o had been specified.

coverage2cytosine


  • Replaced zcat with gunzip -c when reading a gzipped coverage file. This should avoid crashes on some Mac platforms, where zcat
    requires a file to end in .Z (which a .cov.gz file doesn't).
  • Changed the way in which the coverage input file is handed over from the methylation_extractor
    to coverage2cytosine (previously the path information might have been part of the file name;
    it is now only part of the -dir output_directory option).

v0.15.0

FelixKrueger released this on Jan 14, 2016

Bismark


  • Added option --se/--single_end <list>. This sets single-end mapping mode explicitly, with the
    file names given as <list>. The file names may be provided as a comma- (,) or colon- (:) separated
    list.
  • Added option --genome_folder <path/to/genome> as alternative to supplying the genome as the
    first argument.
  • Added an option --rg_tag to print an @RG header line as well as an RG:Z: tag for each read.
    The ID and SAMPLE fields default to 'SAMPLE', but can be specified manually with --rg_id or
    --rg_sample.
  • Added new option --ambig_bam for Bowtie2-mode only, which writes out a single alignment for
    sequences with multiple alignments to a special file ending in .ambiguous.bam. The alignments
    are in Bowtie2 format and do not contain any Bismark-specific entries such as the methylation
    call etc. These ambiguous BAM files are intended to be used as coverage estimators for variant
    callers. Works for single-end and paired-end alignments in single or multi-core mode.
  • Added the new options --cram and --cram_ref to Bismark for both paired- and single-end alignments
    in single or multi-core mode. This option requires Samtools version 1.2 or higher. A genome
    FastA reference may be supplied as a single file with the option --cram_ref; if this is not
    specified, the file is derived from the reference FastA file(s) used for the Bismark run and written
    to the file Bismark_genome_CRAM_reference.mfa in the output directory.
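Two of these options are easy to illustrate. The Python sketch below splits a --se argument on commas or colons, and builds an @RG header line plus an RG:Z: tag using the ID and SM fields defined by the SAM specification. The helper names are hypothetical, mapping the SAMPLE field to SM is an assumption, and the exact header line Bismark writes may contain additional fields.

```python
import re
from typing import List


def parse_se_list(arg: str) -> List[str]:
    """Split a --se/--single_end value on commas or colons, dropping empty entries."""
    return [name for name in re.split(r"[,:]", arg) if name]


def rg_header_line(rg_id: str = "SAMPLE", rg_sample: str = "SAMPLE") -> str:
    # SAM spec: an @RG line must carry ID; SM holds the sample name.
    return f"@RG\tID:{rg_id}\tSM:{rg_sample}"


def add_rg_tag(sam_record: str, rg_id: str = "SAMPLE") -> str:
    # The RG:Z: value of a record refers to the ID of an @RG header line.
    return sam_record.rstrip("\n") + f"\tRG:Z:{rg_id}"
```

Note that splitting on colons means file names that themselves contain a colon cannot be passed this way.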

deduplicate_bismark


  • Improved handling of empty input files (previously the script died during the percentage calculation
    instead of reporting N/A).
  • Added a note mentioning that Read1 and Read2 of paired-end files are expected to follow each
    other in two consecutive lines, which may require name-sorting prior to deduplication. Also
    added a check that reads the first 100000 lines to see if the file appears to have been sorted
    and bails out if this is the case.
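One way such a check could work is to confirm that records arrive in consecutive mate pairs. The Python sketch below is hypothetical: it compares read names pairwise within the first 100000 lines and assumes that mate names are identical apart from an optional trailing /1 or /2, which may not match how deduplicate_bismark itself compares IDs.

```python
from typing import Iterable


def strip_mate_suffix(name: str) -> str:
    # Hypothetical normalisation: drop a trailing /1 or /2 mate marker.
    return name[:-2] if name.endswith(("/1", "/2")) else name


def appears_sorted(sam_lines: Iterable[str], max_lines: int = 100_000) -> bool:
    """Return True if records do not come in consecutive Read1/Read2 pairs."""
    names = []
    for i, line in enumerate(sam_lines):
        if i >= max_lines:
            break
        if line.startswith("@"):
            continue  # skip SAM header lines
        names.append(strip_mate_suffix(line.split("\t", 1)[0]))
    # Read1 should be directly followed by its Read2: compare records (0,1), (2,3), ...
    return any(r1 != r2 for r1, r2 in zip(names[0::2], names[1::2]))
```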

methylation extractor


  • Added support for CRAM files (this option requires Samtools version 1.2 or higher)

bismark2bedGraph


  • Changed the way gzip-compressed input files are handled when using the UNIX sort command (i.e. with
    --scaffolds/--gazillion or without --ample_memory).

coverage2cytosine


  • Added option --gzip to compress output files. This currently only works for the default CpG_report
    and CX_report output files (and thus not with the options --gc or --split_files). The option --gzip
    is now also passed on from the bismark_methylation_extractor.
  • Added a check to bail if no information was found in the coverage file, e.g. if a wrong file path for a .cov.gz file had been specified

bismark_genome_preparation


  • Added process handling to the child processes.

Bismark v0.14.5

FelixKrueger released this on Nov 8, 2015

20-08-2015: 0.14.5 released - minor fix

  • deduplicate_bismark: Changed all literal samtools calls to use $samtools_path.
