
@FelixKrueger FelixKrueger released this Nov 19, 2019 · 14 commits to master since this release

Bismark

  • Accepted pull request to fix the MAPQ score calculation in local mode.

methylation_consistency

Assets 2
  • 0.22.2
  • f960b3a

@FelixKrueger FelixKrueger released this Oct 16, 2019 · 27 commits to master since this release

  • Added FAQ document for questions that keep coming up. Will be populated over time.

Bismark

  • the option --non_bs_mm is now only allowed in end-to-end mode

  • Fixed the calculation of non bisulfite mismatches for paired-end data which happened correctly only when R2 had an InDel (see here)

  • When the option -u was used in conjunction with --parallel, only -u sequences will be written to the temporary subset files for each spawn of Bismark (previously, the entire file was split for --parallel, but then only a small subset of those files was used for -u, which resulted in very long runs even for a small number of analysed sequences)

deduplicate_bismark

  • the command deduplicate_bismark *bam now works again. Previously the output file names were accidentally all derived from the first supplied file.

coverage2cytosine

  • Added new option --coverage_threshold INT. Positions have to be covered by at least INT calls (irrespective of their methylation state) before they get reported. For NOMe-seq, the minimum threshold is automatically set to 1 unless specified explicitly. Setting a coverage threshold does not work in conjunction with --merge_CpGs (as all genomic CpGs are required for this). Default: 0 (i.e. all genomic positions get reported)
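
The thresholding rule described for --coverage_threshold can be illustrated with a minimal Python sketch. This is not Bismark's implementation (Bismark is written in Perl); the record layout and the function name `apply_coverage_threshold` are assumptions made purely for illustration.

```python
from typing import Iterable, Iterator, Tuple

# Hypothetical record layout for this sketch (not Bismark's file format):
# (chromosome, position, methylated_count, unmethylated_count)
Position = Tuple[str, int, int, int]


def apply_coverage_threshold(
    positions: Iterable[Position], threshold: int = 0
) -> Iterator[Position]:
    """Yield positions covered by at least `threshold` calls in total.

    Coverage counts methylated and unmethylated calls alike, so the
    methylation state itself plays no role in the filter.
    """
    for chrom, pos, meth, unmeth in positions:
        if meth + unmeth >= threshold:
            yield (chrom, pos, meth, unmeth)


positions = [("chr1", 100, 3, 1), ("chr1", 101, 0, 0), ("chr1", 150, 0, 2)]
reported = list(apply_coverage_threshold(positions, threshold=2))
```

With the default threshold of 0 every position passes, including the uncovered one, which matches the stated default of reporting all genomic positions.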

bismark2report

  • Added seconds to the timestamp in the report statement (the missing seconds caused a warning on certain, but not all, platforms)

bismark2summary

  • Now reads splitting reports even for non-deduplicated files (such as RRBS).

@FelixKrueger FelixKrueger released this Apr 21, 2019 · 66 commits to master since this release

Bismark

  • Hot-fixed (read: removed) the cause of delay during the MD:Z: field computation for reads containing a deletion (which was roughly equal to 1 second per read). Apologies, I did it again...

  • Changed the default --score_min function for HISAT2 in --local mode back to a linear function (instead of using the logarithmic model that is employed by Bowtie 2). The default is now --score_min L,0,-0.2 for both end-to-end (default) and --local mode. It should be mentioned that we currently don't understand how exactly the scoring mode in HISAT2 works (even though the scores appear to be all negative with a maximum value of 0), so this might change somewhat in the future. See here for more info.
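
As a worked example of what a linear function such as L,0,-0.2 implies, the sketch below evaluates it for a few read lengths. It assumes the function notation documented for Bowtie 2, where `L,a,b` means f(x) = a + b * x with x being the read length; how HISAT2 applies the resulting value internally is, as noted above, not fully understood. The function name is hypothetical.

```python
def linear_score_min(
    read_length: int, constant: float = 0.0, coefficient: float = -0.2
) -> float:
    """Evaluate a linear minimum-score function f(x) = constant + coefficient * x.

    The defaults correspond to the function string L,0,-0.2.
    """
    return constant + coefficient * read_length


# Minimum alignment score a read of the given length would have to reach.
for length in (50, 100, 150):
    print(length, linear_score_min(length))
```

Under this reading, longer reads are allowed proportionally more negative scores, i.e. more mismatches or gaps in absolute terms.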


@FelixKrueger FelixKrueger released this Apr 16, 2019 · 76 commits to master since this release

Expanding on our observation that single-cell BS-seq, or PBAT libraries in general, can generate chimeric read pairs, a recent publication by Wu et al. described in further detail that intra-fragment chimeras can hinder the efficient alignment of single-cell BS-seq libraries. In it, the authors described a pipeline that uses paired-end alignments first, followed by a second, single-end alignment step that uses local alignments in a bid to improve the mapping of intra-molecular chimeras. To allow this type of improvement for single-cell or PBAT libraries, we have been experimenting with allowing local alignments.

Please note that we still do not recommend using local alignments as a means to magically increase mapping efficiencies (please see here), but we do acknowledge that PBAT/scBS-seq/scNMT-seq are exceptional applications where local alignments might indeed make a difference (there is only so much data to be had from a single cell...).
We didn't have the time yet to set more appropriate or stringent default values for local alignments (suggestions welcome), nor did we investigate whether the methylation extraction will require an additional --ignore flag if a read was found to be soft-clipped (the so-called 'micro-homology domains'). This might be added in the near future.

Bismark

  • Added support for local alignments by introducing the new option --local. This means that the CIGAR operation S (soft-clipping) is now supported

  • fixed typo in option --path_to_bowtie2 (a single missing 2 was preventing the specified path from being accepted)

  • fixed typo in option --no-spliced-alignment in HISAT2 mode

  • fixed missing end-of-line character for unmapped or ambiguous FastQ sequences in paired-end FastQ mode

  • fixed output file naming in --hisat2 and --parallel mode (_hisat2 was missing in --parallel mode). Thanks to @phue for spotting this.

bismark_genome_preparation

  • Added option --large-index to force the generation of LARGE genome indexes. This may be required for indexing extremely large genomes (e.g. the Axolotl (32 GigaBases)) in --parallel mode. For more information on why the indexing was failing previously see here

bismark_methylation_extractor

  • Now supporting reads containing soft-clipped bases (CIGAR operation S)

bam2nuc

  • Now supporting reads containing soft-clipped bases (CIGAR operation S)

deduplicate_bismark

  • Now supporting reads containing soft-clipped bases (CIGAR operation S)
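
To make the effect of soft-clipping concrete, here is a small Python sketch that summarises a CIGAR string. It follows the SAM specification, under which S (soft-clip) consumes read bases but no reference bases, while N (skipped region, used for spliced alignments) consumes reference bases only. The function name `cigar_summary` is hypothetical and this is not the code used in the Bismark modules.

```python
import re

# One CIGAR element: a length followed by a single operation character.
CIGAR_ELEMENT = re.compile(r"(\d+)([MIDNSHP=X])")

# Per the SAM specification:
CONSUMES_READ = set("MIS=X")
CONSUMES_REFERENCE = set("MDN=X")


def cigar_summary(cigar: str) -> dict:
    """Return soft-clipped bases, read length and reference span of a CIGAR."""
    soft_clipped = read_length = reference_span = 0
    for length, op in CIGAR_ELEMENT.findall(cigar):
        n = int(length)
        if op == "S":
            soft_clipped += n
        if op in CONSUMES_READ:
            read_length += n
        if op in CONSUMES_REFERENCE:
            reference_span += n
    return {
        "soft_clipped": soft_clipped,
        "read_length": read_length,
        "reference_span": reference_span,
    }


local = cigar_summary("5S90M5S")        # local alignment with clipped ends
spliced = cigar_summary("50M1000N50M")  # spliced alignment
```

Any tool that walks along the read and the reference in parallel has to skip soft-clipped bases in the read without advancing the reference position, which is why each module needed explicit support.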

@FelixKrueger FelixKrueger released this Mar 14, 2019 · 102 commits to master since this release

For the upcoming version Bismark has undergone some substantial changes, which sometimes affect more than one module within the Bismark suite. Here is a short description of the major changes:

[Retired]: Bowtie 1 support

  • Bowtie (1) support, and all of its options, has been completely dropped from bismark_genome_preparation and bismark. This decision was not made lightly, but it seems no one is using the original Bowtie short read aligner anymore, even short reads have moved on...
  • Consequently, the option --vanilla and its handling has been removed from a number of modules (bismark_genome_preparation, bismark, bismark_methylation_extractor and deduplicate_bismark). Too bad, I liked that name...

[Added]: HISAT2 support

  • Instead, the DNA and RNA aligner HISAT2 has been added as a new choice of aligner. The reason for this is not necessarily that RNA methylation is now a thing, but certain alignment modes (see below) do require splice-aware mapping if we don't want to miss out on a whole class of (spliced) alignments. Bowtie 2 is the default mode; HISAT2 alignments can be enabled with the option --hisat2

  • Similar to the Bowtie 2 mode, alignments with HISAT2 are restricted to global (end-to-end) alignments, i.e. soft-clipping is disabled. Furthermore, in paired-end mode, the options --no-mixed and --no-discordant are permanently enabled, meaning that only properly aligned read pairs are reported.

  • As the --hisat2 mode supports spliced alignments, the new CIGAR operation N is now supported in all Bismark modules (this includes bismark_genome_preparation, bismark, bismark_methylation_extractor, deduplicate_bismark and some others).

At the time of writing this, the --hisat2 mode appears to be working as expected. It should be mentioned however that we have not done a lot of testing of these new files, so comments and feedback are welcome.

SLAM-seq mode

We also added a new, experimental and completely different type of alignment for SLAM-seq type data (option --slam). This fairly recent method to interrogate newly synthesized messenger RNA is akin to bisulfite conversion, in that newly synthesized RNA may contain T to C conversions following an alkylation reaction (original publication and https://www.nature.com/articles/nmeth.4435). The new Bismark alignment mode --slam performs T>C conversions of both the genome (in the genome preparation step) and the subsequent alignment steps (Bismark alignment step). Currently, the rest of the processing of SLAM-seq data hijacks the standard methylation pipeline:

  • T>C conversions are written out as methylation events in CpG context, while T-T matches are scored as unmethylated events in CpG context. Other cytosine contexts are not being used.

So in a nutshell: methylation calls in --slam mode are either Ts (unmethylated calls = matches at T positions), or T to C mismatches (methylated calls = C mismatches at T positions).
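
The call logic summarised above can be sketched in a few lines of Python. The sketch assumes a gapless, forward-strand alignment in which the read and the reference stretch have equal length, and it borrows Bismark's usual call characters for CpG context (Z for methylated, z for unmethylated). The function name `slam_calls` is hypothetical; this is an illustration, not the actual implementation.

```python
def slam_calls(reference: str, read: str) -> str:
    """Return a per-base call string for a SLAM-seq style alignment.

    'Z' marks a T>C mismatch (scored like a methylated call),
    'z' marks a T-T match (scored like an unmethylated call),
    '.' marks every other position.
    """
    calls = []
    for ref_base, read_base in zip(reference.upper(), read.upper()):
        if ref_base == "T" and read_base == "C":
            calls.append("Z")
        elif ref_base == "T" and read_base == "T":
            calls.append("z")
        else:
            calls.append(".")
    return "".join(calls)


print(slam_calls("ATTGT", "ATCGT"))  # prints ".zZ.z"
```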

It should be noted that this is currently an experimental workflow. One might argue that T/C conversion aware (or T/C mis-mapping agnostic) mapping is currently not necessary for SLAM-seq, NASC-Seq, or scSLAM-seq data as the labeling reaction is very inefficient (1 in only 50 to 200 newly incorporated Ts is a 4sU, which may get alkylated). This might be true - for now. If and when the conversion reaction improves over time, C/T agnostic mapping, similar to bisulfite-Seq data, might very well become necessary.

Here is a screenshot of a comparison of aligning the same data (SLAM-seq-like) with Bismark in Bowtie 2 mode (top track) and HISAT2 mode (middle track). Alignments with HISAT2 recover a lot more alignments to short exons, as well as exon-exon spanning reads (evidenced in bottom track):

[Figure: Bowtie 2 / HISAT2 alignment comparison]

  • Added documentation for NOMe-seq or scNMT-seq processing.

bismark

  • Dropped support for Bowtie

  • Removed all traces of --vanilla

  • Added support for HISAT2 with option --hisat2.

  • Added HISAT2 option --no-spliced-alignment to disable spliced alignments altogether

  • Added HISAT2 option --known-splicesite-infile <path> to provide a list of known splice sites.

  • Added option --slam to allow T/C mismatch agnostic mapping (3-letter alignment). More here.

  • Added a new option --icpc to truncate read IDs at the first space (or tab) encountered in the (FastQ) read ID. Spaces and tabs are sometimes used to add comments to a FastQ entry; the default behaviour is to replace them with underscores.
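
The difference between the default read ID handling and --icpc can be shown with a short Python sketch. Both function names are hypothetical, and the read ID is an invented example; the sketch only mirrors the behaviour described in the note above.

```python
import re


def read_id_default(read_id: str) -> str:
    """Default behaviour as described: replace spaces and tabs with underscores."""
    return re.sub(r"[ \t]", "_", read_id)


def read_id_icpc(read_id: str) -> str:
    """--icpc behaviour as described: keep everything before the first space or tab."""
    return re.split(r"[ \t]", read_id, maxsplit=1)[0]


example = "@SRR001.1 HWI:1:2 length=50"
print(read_id_default(example))  # prints "@SRR001.1_HWI:1:2_length=50"
print(read_id_icpc(example))     # prints "@SRR001.1"
```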

bismark_genome_preparation

  • Dropped support for Bowtie

  • Added support for HISAT2 with option --hisat2.

  • Added option --slam. Instead of performing an in-silico bisulfite conversion, this mode transforms T to C (forward strand), or A to G (reverse strand). The folder structure and rest of the indexing process is currently exactly the same as for bisulfite sequences, but this might change at some point. This means that a genome prepared in --slam mode is currently indistinguishable from a true Bisulfite Genome (until the alignments are in) so please make sure you name the genome folder appropriately to avoid confusion.
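
The genome transformation described for --slam amounts to two simple substitutions, sketched here in Python. The function names are hypothetical and the sketch ignores everything else the indexing step does (folder layout, FastA headers, calling the aligner's indexer).

```python
def slam_convert_forward(sequence: str) -> str:
    """Forward-strand conversion as described for --slam: T becomes C."""
    return sequence.upper().replace("T", "C")


def slam_convert_reverse(sequence: str) -> str:
    """Reverse-strand conversion as described for --slam: A becomes G."""
    return sequence.upper().replace("A", "G")


genome = "ACGTTA"
print(slam_convert_forward(genome))  # prints "ACGCCA"
print(slam_convert_reverse(genome))  # prints "GCGTTG"
```

Because the converted sequences carry no marker of which conversion was applied, the advice to name the genome folder clearly is worth following.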

deduplicate_bismark

  • Removed all traces of --vanilla

  • --bam mode is now the default. Uncompressed SAM output may still be obtained using the new option --sam

  • Added new option -o/--outfile <basename>. File endings such as .bam, .sam, .txt or .gz are removed from this basename, and .deduplicated.bam (or .multiple.deduplicated.bam in --multiple mode) is then appended for consistency.

  • Added support for new CIGAR operation N
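
The output naming for the -o/--outfile option listed above can be sketched as follows. The function name is hypothetical, and stripping several stacked endings (e.g. .sam.gz) in a loop is an assumption of this sketch rather than documented behaviour.

```python
import re

# File endings named in the release note.
KNOWN_ENDING = re.compile(r"\.(bam|sam|txt|gz)$")


def dedup_output_name(basename: str, multiple: bool = False) -> str:
    """Derive a deduplicated output file name from a user-supplied basename."""
    stem = basename
    # Assumption: stacked endings such as ".sam.gz" are all removed.
    while KNOWN_ENDING.search(stem):
        stem = KNOWN_ENDING.sub("", stem)
    suffix = ".multiple.deduplicated.bam" if multiple else ".deduplicated.bam"
    return stem + suffix


print(dedup_output_name("sample.bam"))                    # sample.deduplicated.bam
print(dedup_output_name("sample.sam.gz", multiple=True))  # sample.multiple.deduplicated.bam
```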

bismark_methylation_extractor

  • Added support for new CIGAR operation N for all extraction modes

  • Removed all traces of --vanilla

bismark2summary/bismark2report

  • Adapted to work with Bismark HISAT2 reports instead of Bowtie 1 reports.

bam2nuc

  • Spliced reads are now also skipped when determining the genomic base composition (as are reads with InDels).

@FelixKrueger FelixKrueger released this Feb 1, 2019 · 168 commits to master since this release

This is an early notice that this will be the last release of Bismark that supports the use of Bowtie 1. We have added warning statements to both the genome preparation and alignment steps to warn users that Bowtie 1 is now deprecated. All Bowtie 1 functionality and support will disappear in a future release. Please shout now if you think this will be a disaster for you...

bismark

  • Added check to prevent users from inadvertently specifying the very same file as both R1 and R2

  • Added a check for file truncation, or more generally the same number of reads between R1 and R2 for paired-end FastQ files (directional, non-directional and PBAT mode).

  • Added Travis CI testing for most Bismark modules and commands. This should help to spot problems early, e.g. if I release a new version right before the Christmas holidays ...

  • Changed error message for failed fork command in --parallel mode to [FATAL ERROR]: ... to alert users that something isn't working as intended.
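
A paired-end consistency check of the kind listed above (same number of reads in R1 and R2, no truncated file) might look like the following Python sketch. It assumes plain four-line FastQ records and uses hypothetical function names; Bismark's own check is implemented differently and also handles compressed input.

```python
import io
from typing import TextIO


def count_fastq_records(handle: TextIO) -> int:
    """Count FastQ records, assuming exactly four lines per record."""
    lines = sum(1 for _ in handle)
    if lines % 4 != 0:
        raise ValueError("FastQ input looks truncated: line count is not a multiple of 4")
    return lines // 4


def check_paired_end(r1: TextIO, r2: TextIO) -> int:
    """Return the shared read count, or raise if R1 and R2 disagree."""
    n1, n2 = count_fastq_records(r1), count_fastq_records(r2)
    if n1 != n2:
        raise ValueError(f"R1 has {n1} reads but R2 has {n2} reads")
    return n1


record = "@read1\nACGT\n+\nIIII\n"
pairs = check_paired_end(io.StringIO(record * 2), io.StringIO(record * 2))
```

Failing early on a mismatch is cheaper than discovering a truncated R2 file after a long alignment run.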

bismark_genome_preparation

  • Added multi-threading to the Bowtie2-based genome preparation (thanks to Rahul Karnik)

  • Added test to see whether specified files exist, or die otherwise

bismark2summary

  • Fixed division by zero errors when a C-context was not covered by any reads. This will now use values of 0/0 for the context plots, which looks a bit odd, but at least it still works.

  • Detects if (non-deduplicated) RRBS and WGBS samples are mixed together, and bails with a meaningful error message.

bam2nuc

  • Changed samtools to $samtools_path during single-end/paired-end file testing.

bismark_methylation_extractor

  • Changed the order in which --ample_mem and --buffer_size are checked.

@FelixKrueger FelixKrueger released this Apr 26, 2018 · 246 commits to master since this release

bismark_methylation_extractor

  • The methylation extractor now creates output directories if they don't exist already.

  • The options --ample_mem and --buffer_size <string> are now mutually exclusive.

  • Changed the directory being passed on when --cytosine_report is specified from 'parent directory' to 'output directory'.

bismark2report

  • Major rewrite of bismark2report: HTML files are now rendered using Plotly.js [plotly.js v1.39.4] which is completely open source and free to use. Highcharts and JQuery were dropped, as was raised here: #177.
    The files bioinfo.logo, bismark.logo, plot.ly and plotly_template.tpl are read in dynamically from a new folder plotly. bismark_sitrep and all its contents no longer ship with Bismark. The Bismark HTML reports should be completely self-contained, here is an example paired-end Bismark report.

bismark2summary

  • Major rewrite of bismark2summary: HTML files are now rendered using Plotly.js [plotly.js v1.39.4] which is completely open source and free to use. Highcharts and JQuery were dropped, as was raised here: #177. The files bioinfo.logo, bismark.logo, plot.ly and plotly_template.tpl are read in dynamically from a new folder plotly. bismark_sitrep and all its contents no longer ship with Bismark. The Bismark HTML Summary reports should be completely self-contained, here is an example of a percent alignment plot for a single cell experiment:
    Alignment Summary Report scBS-seq.

And finally, here are some examples for a WGBS summary report, an RRBS report (no deduplication), and the full scBS-seq report and scBS-seq data file.


@FelixKrueger FelixKrueger released this Oct 13, 2017 · 298 commits to master since this release

Bismark

  • Changed the methylation call behaviour so that insertions in a read (which are filled in with X for the methylation call) are also considered as Unknown context for the methylation call. Here is issue #135.

filter_non_conversion

  • Added new options --percentage_cutoff [int] and --minimum_count [int] to allow filtering reads for non-bisulfite conversion using an overall methylation percentage and count cutoff. Here is issue #122.
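
One possible reading of the percentage and count cutoffs is sketched below in Python. The sketch works on a Bismark-style methylation call string, where H/h and X/x denote methylated/unmethylated calls in CHH and CHG context. How exactly the two cutoffs are combined is an assumption here (the count is taken as the minimum number of non-CG calls, and the comparison is inclusive), and the function name is hypothetical.

```python
def looks_unconverted(
    call_string: str, percentage_cutoff: int, minimum_count: int
) -> bool:
    """Flag a read as a likely non-conversion candidate.

    Assumption: a read is flagged when it has at least `minimum_count`
    non-CG calls and at least `percentage_cutoff` percent of them are
    methylated.
    """
    methylated = sum(call_string.count(c) for c in "HX")
    unmethylated = sum(call_string.count(c) for c in "hx")
    total = methylated + unmethylated
    if total == 0 or total < minimum_count:
        return False
    return 100.0 * methylated / total >= percentage_cutoff


# 4 methylated and 4 unmethylated non-CG calls: 50% methylation in 8 calls.
flagged = looks_unconverted("hhHHHxXh..", percentage_cutoff=20, minimum_count=5)
```

Requiring a minimum count avoids discarding reads on the basis of one or two calls, where a high percentage carries little evidence.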

deduplicate_bismark

  • Added option --multiple to the deduplicator to treat several input SAM/BAM files as the same sample. Here is issue #107.

  • Added option --output_dir to deduplicate_bismark so that it can be used in the Google cloud. Here is issue #123

coverage2cytosine

  • Output files are now handled better and more consistently. Default processing now produces the following output files (with --gzip):
    CpG_report.txt(.gz) or
    CX_report.txt(.gz)

  • The option --NOMe-Seq now produces four output files (with --gzip):
    NOMe.CpG_report.txt(.gz)
    NOMe.CpG.cov(.gz)
    NOMe.GpC_report.txt(.gz)
    NOMe.GpC.cov(.gz)

The option --split_by_chromosome should work in either default, --gc or --NOMe-seq mode.

  • NOMe-Seq processing is now ignoring positions that were not covered by any reads.

  • Improved handling of the --output_dir, i.e. the folder will be created if it doesn't exist already and making the path absolute.

  • Added new option --discordance <int> to allow filtering for discordance of top and bottom strand when in --merge_CpG mode. CpG positions for which either the top or bottom strand was not measured at all will not be assessed for discordance and hence appear in the regular 'merged_CpG_evidence.cov' file. More details in issue #91.

  • Fixed context extraction for Gs at positions 1 and 2 of a chromosome/contig. Also, the last cytosine positions of chromosomes without coverage are now ignored in the same way as for covered chromosomes (issue #127).
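
The strand discordance filter listed above can be illustrated with a Python sketch. It assumes that discordance is measured as the absolute difference in percent methylation between the top and bottom strand and that a position is discordant when this difference exceeds the given value; both the definition and the function name are assumptions for illustration only. The rule that positions measured on one strand only are not assessed comes from the note itself.

```python
def is_discordant(
    top_meth: int,
    top_unmeth: int,
    bottom_meth: int,
    bottom_unmeth: int,
    max_difference: int,
) -> bool:
    """Decide whether the two strands of a CpG disagree too strongly.

    Positions where either strand has no calls are never assessed and
    therefore never count as discordant.
    """
    top_total = top_meth + top_unmeth
    bottom_total = bottom_meth + bottom_unmeth
    if top_total == 0 or bottom_total == 0:
        return False
    top_pct = 100.0 * top_meth / top_total
    bottom_pct = 100.0 * bottom_meth / bottom_total
    return abs(top_pct - bottom_pct) > max_difference


# 90% methylation on the top strand versus 10% on the bottom strand.
strongly_discordant = is_discordant(9, 1, 1, 9, max_difference=20)
```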

copy_files_for_release

  • Is now working from any location.

@FelixKrueger FelixKrueger released this Jun 28, 2017 · 335 commits to master since this release

Bismark

  • Changed the timing of when alignments that are ambiguous within the same thread are reset. Previously some alignments were incorrectly considered ambiguous (see here). This affected Bowtie 2 alignments only.

bismark2bedGraph

  • The option --ample_mem is now mutually exclusive with specifying memory for the UNIX sort command via the option --buffer_size.