BUS format of a number of 10x datasets

TENxBUSData

This package is a thin wrapper for downloading five different 10x datasets in the BUS format from within R. For each dataset, the following files are downloaded:

  1. output.sorted.txt: information of transcripts compatible with each UMI for each cell barcode in text format
  2. output.sorted: binary format of output.sorted.txt
  3. matrix.ec: transcript equivalence classes in this dataset
  4. transcripts.txt: transcripts in the transcriptome index, used in kallisto when generating the bus format
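
As a rough illustration of how the text files relate to each other, here is a minimal base R sketch that resolves each BUS record to the transcripts it is compatible with. It does not use this package or BUSpaRse. The toy contents are made up, and the column layout (barcode, UMI, equivalence class, count for output.sorted.txt; equivalence class index followed by comma-separated, 0-based transcript indices for matrix.ec) is an assumption based on the usual kallisto and bustools text output, so check it against your own files.

```r
# Toy stand-ins for the downloaded files; the contents are invented.
bus_txt <- c(
  "AAACCTGAGAAACCAT\tTTGACCGTAA\t0\t1",
  "AAACCTGAGAAACCAT\tGGCATTACGT\t2\t3",
  "AAACCTGAGAAACGAG\tCCGTAAGTCA\t1\t1"
)
ec_txt <- c("0\t0", "1\t1", "2\t0,2")        # stand-in for matrix.ec
tx_txt <- c("ENST_A", "ENST_B", "ENST_C")    # stand-in for transcripts.txt

# Assumed columns of output.sorted.txt: barcode, UMI, EC index, count.
bus <- read.table(text = bus_txt, sep = "\t",
                  col.names = c("barcode", "umi", "ec", "count"),
                  colClasses = c("character", "character", "integer", "integer"))

# Assumed columns of matrix.ec: EC index, comma-separated transcript indices.
ecs <- read.table(text = ec_txt, sep = "\t",
                  col.names = c("ec", "tx_idx"),
                  colClasses = c("integer", "character"))

# Indices are assumed to be 0-based, while R vectors are 1-based.
ec2tx <- lapply(strsplit(ecs$tx_idx, ","),
                function(i) tx_txt[as.integer(i) + 1L])
names(ec2tx) <- ecs$ec

# Attach the compatible transcripts to every BUS record.
bus$transcripts <- unname(vapply(ec2tx[as.character(bus$ec)],
                                 paste, character(1), collapse = ","))
print(bus)
```

With real data, the `text =` arguments would be replaced by the paths to the downloaded files, and `tx_txt` by `readLines()` on transcripts.txt.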

These files should be sufficient to generate a sparse matrix with the package BUSpaRse. See these notebooks for how these files were generated using kallisto and bustools and how we can generate a sparse matrix from these files.

The main purpose of this package, and of the package BUSpaRse, is to let advanced users experiment with different ways to collapse UMIs mapped to multiple genes, to error-correct barcodes, or to adapt the BUS format for other purposes. The most recent versions of kallisto and bustools should suffice to generate the gene count matrix from FASTQ files. You may also do so with BUSpaRse, but it is less efficient than bustools. However, the BUSpaRse code is easier to tweak for experimentation than the bustools code, because R and Rcpp are easier to work with than pure C++.
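
To make "different ways to collapse UMIs" concrete, here is a minimal sketch of one possible policy: discard every UMI whose equivalence class is compatible with more than one gene, then count the remaining unique UMIs per gene and barcode. This is only an illustration in base R, not the method used by BUSpaRse or bustools. The `bus` table and the `ec2gene` mapping are hypothetical toy inputs.

```r
# Hypothetical toy inputs: one row per (barcode, UMI, EC) record,
# plus the genes each equivalence class is compatible with.
bus <- data.frame(
  barcode = c("BC1", "BC1", "BC1", "BC2", "BC2"),
  umi     = c("U1", "U2", "U3", "U1", "U2"),
  ec      = c(0L, 1L, 2L, 0L, 0L),
  stringsAsFactors = FALSE
)
ec2gene <- list("0" = "geneA", "1" = "geneB", "2" = c("geneA", "geneB"))

# Look up the genes for each record and keep only unambiguous ones.
genes <- ec2gene[as.character(bus$ec)]
unique_hit <- lengths(genes) == 1L
kept <- bus[unique_hit, ]
kept$gene <- unlist(genes[unique_hit], use.names = FALSE)

# Count each (barcode, UMI, gene) combination once.
kept <- unique(kept[, c("barcode", "umi", "gene")])
counts <- table(gene = kept$gene, barcode = kept$barcode)
print(counts)
```

Other policies, such as splitting a multi-gene UMI evenly among its genes, would only change the filtering step. For real datasets a sparse matrix is a better fit than the dense table used here.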
