
Major updates:

  • STARsolo can count multi-gene (multi-mapping) reads with the --soloMultiMappers EM [Uniform Rescue PropUnique] options.
  • PR #1163: SIMDe selects the correct SIMD extensions based on the g++ -m flag: the compilation option CXXFLAGS_SIMD is preset to -mavx2 but can be set to the desired target architecture. Many thanks to Michael R. Crusoe @mr-c, Evan Nemerson @nemequ and Steffen Möller @smoe!
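
To illustrate what the --soloMultiMappers options do, here is a minimal Python sketch of three ways a read compatible with several genes could be split among them: Uniform (equal shares), PropUnique (shares proportional to each gene's unique counts) and an EM-style iteration (shares proportional to the current abundance estimates). The function names and data layout are hypothetical, Rescue is not shown, and STARsolo's actual implementation may differ in details.

```python
from collections import Counter
from typing import Dict, List, Set

# unique: per-gene counts from reads assigned to a single gene
# multi_reads: one set of candidate genes per multi-gene read


def uniform(unique: Counter, multi_reads: List[Set[str]]) -> Dict[str, float]:
    """Uniform: each multi-gene read adds 1/n to each of its n genes."""
    counts = {g: float(c) for g, c in unique.items()}
    for genes in multi_reads:
        for g in genes:
            counts[g] = counts.get(g, 0.0) + 1.0 / len(genes)
    return counts


def prop_unique(unique: Counter, multi_reads: List[Set[str]]) -> Dict[str, float]:
    """PropUnique: shares proportional to unique counts (uniform if all are zero)."""
    counts = {g: float(c) for g, c in unique.items()}
    for genes in multi_reads:
        total = sum(unique[g] for g in genes)
        for g in genes:
            share = unique[g] / total if total > 0 else 1.0 / len(genes)
            counts[g] = counts.get(g, 0.0) + share
    return counts


def em(unique: Counter, multi_reads: List[Set[str]], n_iter: int = 100) -> Dict[str, float]:
    """EM-style: repeatedly re-split multi-gene reads by current abundances."""
    abundance = uniform(unique, multi_reads)  # initial guess, all shared genes > 0
    for _ in range(n_iter):
        new = {g: float(unique[g]) for g in abundance}
        for genes in multi_reads:
            total = sum(abundance[g] for g in genes)
            for g in genes:
                new[g] += abundance[g] / total
        abundance = new
    return abundance
```

With unique counts A=3, B=1 and four reads shared by A and B, Uniform gives 5 and 3, while PropUnique and the EM fixed point both give 6 and 2.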

New options and features:

  • New option: --soloUMIfiltering MultiGeneUMI_All to filter out all UMIs mapping to multiple genes (for uniquely mapping reads)
  • New script extras/scripts/calcUMIperCell.awk to calculate the total number of UMIs per cell and the filtering status from the STARsolo matrix.mtx
  • New option: --outSJtype None to omit outputting splice junctions to SJ.out.tab
  • Simple script to convert spliced junctions (SJ.out.tab) to BED12 for UCSC display: extras/scripts/sjBED12.awk
  • PR #1164: SOURCE_DATE_EPOCH to make the build more reproducible
  • PR #1157: print STAR command line and version information to stdout
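
The following Python sketch is not calcUMIperCell.awk; it only shows the underlying per-cell summation, assuming matrix.mtx is a standard Matrix Market coordinate file with genes as rows, cells as columns, and the count in the third column. The function name is hypothetical.

```python
import io
from typing import Dict, TextIO


def umis_per_cell(mtx: TextIO) -> Dict[int, float]:
    """Sum the value column of a Matrix Market coordinate file per column (cell)."""
    totals: Dict[int, float] = {}
    size_line_seen = False
    for line in mtx:
        if line.startswith("%"):
            continue  # banner and comment lines
        fields = line.split()
        if not fields:
            continue
        if not size_line_seen:
            size_line_seen = True  # "nGenes nCells nEntries" line
            continue
        _gene, cell, value = int(fields[0]), int(fields[1]), float(fields[2])
        totals[cell] = totals.get(cell, 0.0) + value
    return totals


if __name__ == "__main__":
    example = ("%%MatrixMarket matrix coordinate integer general\n%\n"
               "3 2 4\n1 1 5\n2 1 3\n3 2 7\n1 2 2\n")
    print(umis_per_cell(io.StringIO(example)))
```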

Changes in behavior:

  • Minor changes to the statistics output (Features.csv and Summary.csv) to accommodate multimappers.
  • Modified option: --limitIObufferSize now requires two numbers: separate sizes for the input and output buffers.

Bug fixes:

  • PR #1156: clean opal/opal.o
  • Issue #1166: seg-fault for STARsolo --soloCBwhitelist None (no whitelist) with barcodes longer than 16 bases
  • Issue #1167: STARsolo CR/UR SAM tags are scrambled in TranscriptomeSAM file Aligned.toTranscriptome.out.bam. This bug appeared in 2.7.7a.
  • Issue #1177: Added file checks for --inputBAMfile.
  • Issue #1180: Output the actual number of alignments in NH attributes even if --outSAMmultNmax is set to a smaller value.
  • Issue #1190: Allow GX/GN output for non-STARsolo runs.
  • Issue #1220: corrupt SAM/BAM files for --outFilterType BySJout. The bug was introduced with the changes in 2.7.7a.
  • Issue #1211: scrambled CB tags in BAM output for --soloCBwhitelist None --soloFeatures Gene GeneFull.
  • Fixed a bug causing seg-faults with --clipAdapterType CellRanger4 option.

This release contains many major and minor STARsolo upgrades, bug fixes, and behavior changes.
STARsolo detailed description: https://github.com/alexdobin/STAR/blob/master/docs/STARsolo.md

Major new features:

  • --runMode soloCellFiltering option for cell filtering (calling) of the raw count matrix, without re-mapping
  • Input from SAM/BAM for STARsolo, with options --soloInputSAMattrBarcodeSeq and --soloInputSAMattrBarcodeQual to specify SAM tags for the barcode read sequence and qualities
  • --clipAdapterType CellRanger4 option for 5' TSO adapter and 3' polyA-tail clipping of the reads to better match CellRanger >= 4.0.0 mapping results
  • --soloBarcodeMate to support scRNA-seq protocols in which one of the paired-end mates contains both barcode sequence and cDNA (e.g. 10X 5' protocol)

New options:

  • --soloCellFilter EmptyDrops_CR option for cell filtering (calling) nearly identical to that of CellRanger 3 and 4
  • --readFilesSAMattrKeep to specify which SAM attributes from the input SAM to keep in the output
  • --soloUMIdedup 1MM_Directional_UMItools option matching the "directional" method in UMI-tools (Smith, Heger and Sudbery, Genome Research 2017)
  • --soloUMIdedup NoDedup option for counting reads per gene, i.e. no UMI deduplication
  • --soloUMIdedup 1MM_CR option for 1 mismatch UMI deduplication similar to CellRanger >= 3.0
  • --soloUMIfiltering MultiGeneUMI_CR option filters lower-count UMIs that map to more than one gene, matching CellRanger >= 3.0
  • --soloCBmatchWLtype 1MM_multi_Nbase_pseudocounts option, which allows 1MM multimatching to the WL for barcodes with N-bases (to better match CellRanger >= 3.0)
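
The "directional" rule from UMI-tools can be sketched as follows: within one cell and gene, UMI a absorbs UMI b if they differ by one mismatch and count(a) >= 2 * count(b) - 1, and each cluster grown from a not-yet-absorbed UMI counts as one molecule. This is a minimal Python sketch of the published method, not STARsolo's code; the function names are hypothetical.

```python
from typing import Dict, List


def hamming(a: str, b: str) -> int:
    """Number of differing positions between two equal-length UMIs."""
    return sum(x != y for x, y in zip(a, b))


def directional_count(umi_counts: Dict[str, int]) -> int:
    """Number of molecules after UMI-tools-style directional clustering."""
    # Visit UMIs from most to least abundant (ties broken alphabetically).
    umis = sorted(umi_counts, key=lambda u: (-umi_counts[u], u))
    # Directed edge a -> b means a can absorb b.
    edges: Dict[str, List[str]] = {
        a: [b for b in umis
            if b != a and len(a) == len(b) and hamming(a, b) == 1
            and umi_counts[a] >= 2 * umi_counts[b] - 1]
        for a in umis
    }
    seen = set()
    clusters = 0
    for start in umis:
        if start in seen:
            continue
        clusters += 1  # new molecule rooted at an unabsorbed UMI
        seen.add(start)
        stack = [start]
        while stack:  # absorb everything reachable along directed edges
            node = stack.pop()
            for nxt in edges[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
    return clusters
```

Here AAAA (10 reads) absorbs AAAT (3 reads) but not AAAG (8 reads), so {"AAAA": 10, "AAAT": 3, "AAAG": 8, "TTTT": 2} is counted as three molecules.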

Changes in behavior:

  • The UMI deduplication/correction specified in --soloUMIdedup is used for statistics output, filtering, and UB tag in BAM output.
  • If UMI or CB are not defined, the UB and CB tags in BAM output will contain "-" (instead of missing these tags).
  • For --soloUMIfiltering MultiGeneUMI option, the reads with multi-gene UMIs will have UB tag "-" in BAM output.
  • Different --soloUMIdedup counts, if requested, are recorded in separate .mtx files.
  • Cell-filtered Velocyto matrices are generated using Gene cell filtering.
  • Velocyto spliced/unspliced/ambiguous counts are reported in separate .mtx files.
  • Read clipping options --clip* now require specifying the values for all read mates, even if they are identical.
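
A toy Python sketch contrasting two ways of handling a UMI whose reads map to different genes within one cell. The MultiGeneUMI_All-style filter (the 2.7.8a option) drops every such UMI. For the CellRanger-like variant, the sketch assumes the rule "keep the gene with strictly the most reads for that UMI, drop the UMI on a tie"; that rule is an assumption, not taken from the STAR source, and all names are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

Read = Tuple[str, str, str]  # (cell barcode, UMI, gene) of one read
Key = Tuple[str, str]        # (cell barcode, UMI)


def genes_per_umi(reads: Iterable[Read]) -> Dict[Key, Counter]:
    """Count reads per gene for every (CB, UMI) pair."""
    by_umi: Dict[Key, Counter] = defaultdict(Counter)
    for cb, umi, gene in reads:
        by_umi[(cb, umi)][gene] += 1
    return by_umi


def filter_all(reads: Iterable[Read]) -> Dict[Key, str]:
    """Drop every UMI seen with more than one gene."""
    return {key: next(iter(genes))
            for key, genes in genes_per_umi(reads).items() if len(genes) == 1}


def filter_cr_like(reads: Iterable[Read]) -> Dict[Key, str]:
    """Keep the gene with strictly the most reads; drop the UMI on a tie (assumed rule)."""
    kept: Dict[Key, str] = {}
    for key, genes in genes_per_umi(reads).items():
        top = genes.most_common(2)
        if len(top) == 1 or top[0][1] > top[1][1]:
            kept[key] = top[0][0]
    return kept
```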

Bugfixes:

  • Issue #1107: fixed a bug causing seg-fault for --soloType SmartSeq with only one (pair of) fastq file(s)
  • Issue #1129: fixed an issue with short barcode sequences and --soloBarcodeReadLength 0
  • Issue #796: Fixed a problem with GX/GN tag output for --soloFeatures GeneFull option
  • PR #1012: fixed the bug with --soloCellFilter TopCells option
  • Fixed an issue that was causing a slightly underestimated value of Q30 'Bases in RNA read' in Solo.out/Gene/Summary.csv

@alexdobin alexdobin released this Dec 28, 2020

Major new feature: STARconsensus: mapping RNA-seq reads to consensus genome.

  • Insert (consensus) variants from a VCF file into the reference genome at the genome generation step with --genomeTransformVCF Variants.vcf --genomeTransformType Haploid
  • Map to the transformed genome. Alignments (SAM/BAM) and spliced junctions (SJ.out.tab) can be transformed back to the original (reference) coordinates with --genomeTransformOutput SAM and/or SJ
  • More information: https://github.com/alexdobin/STAR/tree/master/docs/STARconsensus.md
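
A toy Python sketch of the idea behind STARconsensus: apply haploid variants to a reference sequence and record, for every base of the transformed sequence, its original reference coordinate, which is what makes transforming alignments back possible. The sketch assumes 0-based positions (VCF itself is 1-based), non-overlapping variants, and VCF-style indels whose REF and ALT share the leading anchor base; inserted bases get None. It is not STAR's implementation, and the names are hypothetical.

```python
from typing import List, Optional, Tuple

Variant = Tuple[int, str, str]  # (0-based position, REF allele, ALT allele)


def apply_variants(ref: str, variants: List[Variant]) -> Tuple[str, List[Optional[int]]]:
    """Insert haploid variants; return the new sequence and, per base, its
    reference coordinate (None for inserted bases)."""
    seq: List[str] = []
    to_ref: List[Optional[int]] = []
    prev = 0  # first reference base not yet copied
    for pos, ref_allele, alt in sorted(variants):
        if ref[pos:pos + len(ref_allele)] != ref_allele:
            raise ValueError(f"REF allele mismatch at {pos}")
        if pos < prev:
            raise ValueError(f"overlapping variant at {pos}")
        for i in range(prev, pos):  # unchanged bases before the variant
            seq.append(ref[i])
            to_ref.append(i)
        for k, base in enumerate(alt):
            seq.append(base)
            # leading ALT bases pair with REF bases; extra ALT bases are insertions
            to_ref.append(pos + k if k < len(ref_allele) else None)
        prev = pos + len(ref_allele)  # REF bases beyond len(alt) are deleted
    for i in range(prev, len(ref)):  # unchanged tail
        seq.append(ref[i])
        to_ref.append(i)
    return "".join(seq), to_ref
```

For "ACGTACGT" with an SNV at 1 (C>G), a deletion at 3 (TA>T) and an insertion at 5 (C>CTT), the result is "AGGTCTTGT" with coordinates [0, 1, 2, 3, 5, None, None, 6, 7]: reference base 4 is gone and the two inserted T bases have no reference position.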

Minor bug fixes:

  • Deprecated --genomeConsensusFile option. Please use --genomeTransformVCF and --genomeTransformType options instead.
  • Issue #1040: fixed a bug causing rare seg-faults for paired-end --soloType SmartSeq runs.
  • Issue #1071: fixed a bug that can cause a crash for STARsolo runs with a small number of cells.

@alexdobin alexdobin released this Sep 19, 2020

Major new feature:

Output multimapping chimeric alignments in BAM format using
--chimMultimapNmax N>1 --chimOutType WithinBAM --outSAMtype BAM Unsorted [and/or] SortedByCoordinate
Many thanks to Sebastian @suhrig who implemented this feature!
A more detailed description by Sebastian is available in PR #802.

Minor features and bug fixes:

  • Issue #1008: fixed the problem with Unmapped.out.mate? output for --soloType CB_samTagOut.
  • PR #1012: fixed the bug with --soloCellFilter TopCells option.
  • Issue #786: fixed the bug causing the "Different SJ motifs" problem for overlapping mates.
  • Issue #945: GX/GN can be output for all --soloType, as well as for non-solo runs.

@alexdobin alexdobin released this Aug 17, 2020

Bug-fix release.

  • Issue #988: proceed reading from GTF after a warning that exon end is past chromosome end.
  • Issue #978: fixed corrupted transcriptInfo.tab in genome generation for cases where GTF file contains extra chromosomes not present in FASTA files.
  • Issue #945: output GX/GN for --soloFeatures GeneFull.
  • Implemented removal of control characters from the ends of input read lines, for compatibility with files pre-processed on Windows.

@alexdobin alexdobin released this Aug 1, 2020

Bug-fix release.

  • Issue #558: Fixed a bug that can cause a seg-fault in STARsolo run with paired-end reads that have protruding ends.
  • Issue #952: Increased the maximum allowed length of the SAM tags in the input SAM files.
  • Issue #955: fixed seg-fault-causing bug for --soloFeatures SJ option.
  • Issue #963: When reading GTF file, skip any exons that extend past the end of the chromosome, and give a warning.
  • Issue #965: output genome sizes with and without padding into Log.out.
  • Docker build: switched to debian:stable-slim in the Dockerfile.
  • --soloType CB_samTagOut now allows output of (uncorrected) UMI sequences and quality scores with SAM tags UR and UY.
  • Throw an error if FIFO file cannot be created on non-Linux partitions.

@alexdobin alexdobin released this Jun 16, 2020

STAR 2.7.5a 2020/06/16

Major new features:

  • Implemented STARsolo quantification for Smart-seq with --soloType SmartSeq option.
  • Implemented --readFilesManifest option to input a list of input read files.
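
A small Python sketch that builds the contents of a --readFilesManifest file. It assumes the layout described in the STAR manual: three tab-separated columns (read 1 file, read 2 file or "-" for single-end reads, read-group line) and no tabs inside file names. The function name, file names and sample IDs are hypothetical.

```python
from typing import Iterable, Optional, Tuple

Sample = Tuple[str, Optional[str], str]  # (read 1 file, read 2 file or None, read group)


def manifest_text(samples: Iterable[Sample]) -> str:
    """Build manifest content: one tab-separated line per read file (pair)."""
    lines = []
    for read1, read2, read_group in samples:
        for name in (read1, read2 or ""):
            if "\t" in name:
                raise ValueError(f"tab in file name: {name!r}")
        # single-end reads use "-" in the read 2 column
        lines.append("\t".join([read1, read2 if read2 else "-", read_group]))
    return "\n".join(lines) + "\n"
```

Writing this text to a file (e.g. a hypothetical manifest.tsv) and passing --readFilesManifest manifest.tsv should let STAR read all listed files without enumerating them in --readFilesIn.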

Minor features and bug fixes:

  • Change in STARsolo SJ output behavior: junctions are output even if reads do not match genes.
  • Fixed a bug with solo SJ output for large genomes.
  • N-characters in --soloAdapterSequence are not counted as mismatches, allowing for multiple adapters (e.g. ddSeq).
  • SJ.out.tab is sym-linked as features.tsv for Solo SJ output.
  • Issue #882: 3rd field is now optional in Solo Gene features.tsv with --soloOutFormatFeaturesGeneField3.
  • Issue #883: Patch for FreeBSD in SharedMemory and Makefile improvements.
  • Issue #902: Fixed seg-fault for STARsolo CB/UB SAM attributes output with --soloFeatures GeneFull --outSAMunmapped Within options.
  • Issue #934: Fixed a problem with annotated junctions that was causing very rare seg-faults.
  • Issue #936: Throw an error if an empty whitelist is provided to STARsolo.
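
The N-wildcard rule for --soloAdapterSequence can be illustrated with a short Python sketch: N positions in the adapter never count as mismatches, so one pattern can match several adapter variants. The function names are hypothetical and the naive left-to-right scan is only for illustration; STARsolo's adapter search may work differently.

```python
def adapter_mismatches(segment: str, adapter: str) -> int:
    """Mismatches between an adapter and an equal-length read segment;
    an N in the adapter matches any base."""
    if len(segment) != len(adapter):
        raise ValueError("segment and adapter must have the same length")
    return sum(1 for r, a in zip(segment, adapter) if a != "N" and r != a)


def find_adapter(read: str, adapter: str, max_mismatches: int) -> int:
    """First start position where the adapter matches within max_mismatches, else -1."""
    for start in range(len(read) - len(adapter) + 1):
        window = read[start:start + len(adapter)]
        if adapter_mismatches(window, adapter) <= max_mismatches:
            return start
    return -1
```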

@alexdobin alexdobin released this Jun 1, 2020

Fixed multiple bugs and issues.

This version requires re-generation of the genome indexes.

  • Fixed the long-standing seg-fault problem for small genomes.
  • Issue #784: Fixed a seg-fault in STARsolo for cases where no cell barcodes matched whitelist.
  • Issue #798: Fixed the problem with the average of Solo Q30 Bases in Summary.csv.
  • Issue #843, #880: Throw an error if a read file in --readFilesIn does not exist when using --readFilesCommand.
  • Issue #864: Fixed seg-fault for STARsolo runs with very small number of reads or cells.
  • Issue #881: Check whether --genomeDir exists and create it if necessary.
  • Issue #882: Added 3rd column "Gene Expression" to solo features.tsv file for better compatibility with downstream tools.
  • Issue #902: Fixed seg-fault for STARsolo CB/UB SAM attributes output with --soloFeatures GeneFull only option.
  • Issue #907: Fixed the bug that prevented output of STARsolo GX/GN tags into the Aligned.out.bam if --quantMode TranscriptomeSAM is used.
  • Issue #910: The output directory in --outFileNamePrefix is checked and created if it does not exist.
  • If solo barcode read length is not checked (--soloBarcodeReadLength 0) and it is shorter than CB+UMI length, the barcode is padded with Ns and not counted.
  • For genome generation runs, the Log.out file is moved into the --genomeDir directory.
  • Fixed a bug with solo SJ output for large genomes.
  • Implemented --seedMapMin option (previously hard-coded) to define minimum seed length.

@alexdobin alexdobin released this Oct 8, 2019

Major new STARsolo features:

  • Output enhancements:
    • Summary.csv statistics output for raw and filtered cells useful for quick run quality assessment.
    • --soloCellFilter option for basic filtering of the cells, similar to the methods used by CellRanger 2.2.x.
  • Better compatibility with CellRanger 3.x.x:
    • --soloUMIfiltering MultiGeneUMI option introduced in CellRanger 3.x.x for filtering UMI collisions between different genes.
    • --soloCBmatchWLtype 1MM_multi_pseudocounts option, introduced in CellRanger 3.x.x, which slightly changes the posterior probability calculation for CB with 1 mismatch.
  • Velocyto spliced/unspliced/ambiguous quantification:
    • --soloFeatures Velocyto option to produce Spliced, Unspliced, and Ambiguous counts similar to the velocyto.py tool developed by La Manno et al. This option is under active development and the results may change in future versions.
  • Support for complex barcodes, e.g. inDrop:
    • Complex barcodes in STARsolo with --soloType CB_UMI_Complex, --soloCBmatchWLtype, --soloAdapterSequence, --soloAdapterMismatchesNmax, --soloCBposition, --soloUMIposition
  • BAM tags:
    • CB/UB for corrected CellBarcode/UMI
    • GX/GN for gene ID/name
  • STARsolo most up-to-date documentation.
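
For orientation, here is a minimal Python sketch of a CellRanger 2.2-style cell filter, assuming the usual description of that method: among the expected number of cells, take the UMI count at the (1 - maxPercentile) quantile as a robust maximum, then keep cells with at least robustMax / maxMinRatio UMIs. The defaults 3000, 0.99 and 10 mirror the --soloCellFilter CellRanger2.2 defaults in the STAR manual; the rounding and tie handling here are assumptions, and the function name is hypothetical.

```python
from typing import List


def cellranger22_filter(umi_counts: List[int], n_expected: int = 3000,
                        max_percentile: float = 0.99,
                        max_min_ratio: float = 10.0) -> List[int]:
    """Indices of cells passing a CellRanger 2.2-style UMI threshold."""
    if not umi_counts:
        return []
    ranked = sorted(umi_counts, reverse=True)
    # Robust maximum: count at the (1 - max_percentile) quantile of the expected cells.
    idx = min(int(round(n_expected * (1.0 - max_percentile))), len(ranked) - 1)
    threshold = max(1.0, ranked[idx] / max_min_ratio)
    return [i for i, c in enumerate(umi_counts) if c >= threshold]
```

With counts [1000, 900, 850, 50, 120, 5, 0] and n_expected=3, the robust maximum is 1000, the threshold is 100, and cells 0, 1, 2 and 4 pass.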

@alexdobin alexdobin released this Oct 4, 2019

  • Fixed the problem with no header in Chimeric.out.sam