Salmon v0.6.0

@rob-p rob-p released this 01 Jan 20:16

This is a fairly major new release of Salmon (thus the major version bump). It includes some new features and makes minor but backward-incompatible changes to the output format. Many of these changes track the latest changes to Sailfish.

Note for OSX binary:

If you receive a message that a library cannot be found (i.e. if you run into an @rpath issue), try running Salmon using the following command:

$ DYLD_FALLBACK_LIBRARY_PATH=<PATH_TO_SALMON>/lib <PATH_TO_SALMON>/bin/salmon

If this works, you can add the library path to the DYLD_FALLBACK_LIBRARY_PATH variable automatically by placing the line:

export DYLD_FALLBACK_LIBRARY_PATH=<PATH_TO_SALMON>/lib:$DYLD_FALLBACK_LIBRARY_PATH

in your ~/.profile file.

Major Changes

  • Default index --- The quasi index has been made the default type. This means that it is no longer necessary to provide the --type option to the index command. The fmd index remains enabled, but may be removed in a future version. We urge you to move over to the quasi index if you are not already using it.
  • Sequence-specific bias correction --- The old bias correction methodology has been removed from Salmon and replaced with a new sequence-specific bias correction model. Bias correction is enabled with the --biasCorrect flag. The new model has two main benefits over the old one. First, it should more accurately correct for sequence-specific biases, leading to better estimates in biased samples. Second, it should not suffer from the pathological "over-correction" failure cases of the old model --- if there is no substantial bias in the sample, it should have only a minimal effect on quantification results.
  • New output format --- The new output format adds another column, EffectiveLength, which records the effective length of each transcript. This is the third column, so the TPM and NumReads columns have each moved one position to the right. Also, the quant.sf output file has been simplified and now contains no comment lines. The first row in the file is an (un-commented) header that lists the column names, and the subsequent rows are the quantification estimates.
  • Information about the command used --- Since the comment lines have been removed from the quant.sf file, the command information they contained (and more) is now written to other locations. A JSON-formatted file called cmd_info.json in the top-level output directory contains the relevant command line parameters (which used to appear in the quant.sf comments).
  • Meta-information about the run --- Quite a bit of useful information appears in the file aux/meta_info.json under the main quantification directory. This records information such as the number of reads processed, the number mapped, the percentage mapped, and which type of posterior sampling (e.g. Gibbs / bootstrap), if any, was performed.
  • Auxiliary parameters from the run --- In addition to the meta_info.json file, the aux/ directory of the main quantification directory contains other useful files. Specifically, it contains gzipped binary data for any bootstrap or Gibbs samples that were generated, and gzipped binary data about the fragment length distribution and bias parameters (the latter is only meaningful if bias correction was performed).
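
Downstream scripts that parsed the old, commented quant.sf will need adjusting. The following is a minimal Python sketch of reading the new layout, not an official parser. It assumes the file is tab-separated and that the first column holds the transcript identifier while every other column is numeric; neither assumption is stated in these notes. Column names are taken from the header row rather than hard-coded, and the function names read_quant_sf and load_run_info are hypothetical.

```python
import csv
import json
from pathlib import Path


def read_quant_sf(path):
    """Parse a header-led, comment-free quant.sf into a list of dicts."""
    rows = []
    with open(path, newline="") as handle:
        # Assumption: fields are tab-separated.
        reader = csv.DictReader(handle, delimiter="\t")
        for row in reader:
            parsed = {}
            for index, (column, value) in enumerate(row.items()):
                # Assumption: the first column is the transcript identifier
                # and all remaining columns hold numeric values.
                parsed[column] = value if index == 0 else float(value)
            rows.append(parsed)
    return rows


def load_run_info(quant_dir):
    """Load cmd_info.json and aux/meta_info.json when they are present."""
    quant_dir = Path(quant_dir)
    info = {}
    for label, relative in (("cmd_info", "cmd_info.json"),
                            ("meta_info", "aux/meta_info.json")):
        candidate = quant_dir / relative
        if candidate.is_file():
            with open(candidate) as handle:
                # No key names are assumed; the parsed JSON is returned as-is.
                info[label] = json.load(handle)
    return info
```

Looking columns up by header name (for example row["TPM"]) rather than by position keeps a script working if the column order changes again, as it did with the addition of EffectiveLength in this release.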

Minor Changes

  • Position-specific start distribution --- Modeling of the position-specific start distribution has been improved, and the way that it is enabled / disabled has been changed. This model is off by default, but can be enabled with the --useFSPD flag.

Bug Fixes

  • This release fixes a bug where the mapping location of a fragment may have been miscalculated by a small number of bases in certain cases. This in turn could lead to a small shift in the fragment length distribution and in the resulting quantification estimates.

Acknowledgements

  • Special thanks go to Ayush Sengupta for helping out with the implementation of sequence-specific bias correction.
  • Special thanks go to Mike Love for testing the effectiveness of the sequence-specific bias correction implementation (in Sailfish, but this uses the same model) on some experimental (GEUVADIS) data!

Note

As you may notice, there are two DebianSqueeze binaries listed below. The binary called SalmonBeta-0.6.0_DebianSqueeze.tar.gz is the "standard" binary, which is built to use the JEMalloc memory allocator. In certain situations (involving files on NFS) this allocator has been observed to segfault upon program termination. This doesn't seem to affect the results, which have already been written by the time this occurs. However, if you encounter this problem, you can try SalmonBeta-0.6.0_DebianSqueeze_tcmalloc.tar.gz, which is built to use the TCMalloc memory allocator instead and doesn't seem to suffer from the same issue.