export from tome to any other format? #29

maximilianh · 2019-12-04T13:01:57Z

Hi, we have a tome file that we need to process. Is there a function or some other way to get the data out of the .tome file in a standard format, such as .mtx, .h5, or a .csv or .tsv file with one gene per line and the gene ID in the first column (possibly with the symbol, separated by | or similar)?

I can see that tome is very good at importing files, but I cannot see an export function...

thanks!
Max

hypercompetent · 2019-12-06T00:11:01Z

Hi Max,

.tome files are HDF5 files that store sparse matrices, so you should be able to extract the data back out to .h5 (following 10x conventions) or .mtx. Here are some examples in R.

If your target is Python, let me know. I think I have some code that may be able to go straight from .tome into a scipy sparse csc matrix.

For .mtx

library(rhdf5)
library(scrattch.io)

tome_file <- "//allen/programs/celltypes/workgroups/rnaseqanalysis/shiny/tomes/facs/mouse_V1_ALM_20170913/faster_transcrip.tome"

# Read the sparse matrix for exon counts
tome_matrix <- read_tome_dgCMatrix(tome_file,
                                   "/data/exon")

# Write to .mtx using the Matrix package
Matrix::writeMM(tome_matrix,
                "tome.mtx")

# Read the sample and gene names (row and column names, respectively)
sample_names <- h5read(tome_file, "/sample_names")
gene_names <- h5read(tome_file, "/gene_names")

# Write row and column names to .csv
write.csv(sample_names, "row_sample_names.csv")
write.csv(gene_names, "col_gene_names.csv")
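One thing to keep in mind with the .mtx route: the MatrixMarket format has no place for row or column names, which is why the names are written to separate files above. Here is a minimal sketch of reading the matrix back and reattaching the names. It uses a small made-up matrix in place of real tome data, so the toy values and the s1/g1 style names are assumptions for illustration only.

```r
library(Matrix)

# Toy stand-in for tome_matrix: 3 samples (rows) x 4 genes (columns).
# These values and names are invented for the example.
toy <- sparseMatrix(i = c(1, 2, 3, 3),
                    j = c(1, 3, 2, 4),
                    x = c(5, 2, 7, 1),
                    dims = c(3, 4))
sample_names <- c("s1", "s2", "s3")
gene_names <- c("g1", "g2", "g3", "g4")

# Write to a temporary .mtx file
mtx_file <- tempfile(fileext = ".mtx")
writeMM(toy, mtx_file)

# readMM() returns a triplet-form sparse matrix without dimnames;
# convert to column-compressed form and reattach the names
restored <- as(readMM(mtx_file), "CsparseMatrix")
dimnames(restored) <- list(sample_names, gene_names)

restored["s3", "g4"]
```

The same pattern should apply to the real files: read tome.mtx, then assign the names read from the two .csv files, keeping the row and column order unchanged.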

There is a 10x .h5 output function (write_dgCMatrix_h5()), but it looks like it may be out of date with the current structure used by 10x. Here's a way to output to the current structure:

library(rhdf5)
library(scrattch.io)

tome_file <- "//allen/programs/celltypes/workgroups/rnaseqanalysis/shiny/tomes/facs/mouse_V1_ALM_20170913/faster_transcrip.tome"

# Read the sparse matrix for exon counts
tome_matrix <- read_tome_dgCMatrix(tome_file,
                                   "/data/exon")

# Transpose to match the orientation expected by 10x
tome_matrix <- Matrix::t(tome_matrix)

# Now sample_names correspond to columns, gene_names to rows
sample_names <- h5read(tome_file, "/sample_names")
gene_names <- h5read(tome_file, "/gene_names")

# Output data in .h5 locations
h5_file <- "path_to_your.h5"

# Build groups
h5createFile(h5_file)
h5createGroup(h5_file, "/matrix")
h5createGroup(h5_file, "/matrix/features")

# Create Datasets and write their sparse matrix components
h5createDataset(h5_file, dataset = "/matrix/data", dims = length(tome_matrix@x), chunk = 1000)
h5write(tome_matrix@x, h5_file, "/matrix/data")

h5createDataset(h5_file, dataset = "/matrix/indices", dims = length(tome_matrix@i), chunk = 1000)
h5write(tome_matrix@i, h5_file, "/matrix/indices")

h5createDataset(h5_file, dataset = "/matrix/indptr", dims = length(tome_matrix@p), chunk = 1000)
h5write(tome_matrix@p, h5_file, "/matrix/indptr")

# Add shape/dims and row and column names
h5write(dim(tome_matrix), h5_file, "/matrix/shape")

h5write(sample_names, h5_file, "/matrix/barcodes")
h5write(gene_names, h5_file,  "/matrix/features/id")
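To make it clearer what ends up in /matrix/data, /matrix/indices and /matrix/indptr, here is a small sketch of the three dgCMatrix slots used above. The toy matrix is made up for illustration; the point is only that @i and @p are already 0-based inside the Matrix package, so they are written out unchanged.

```r
library(Matrix)

# Toy matrix with genes as rows and samples as columns,
# i.e. the orientation after the transpose step. Values are invented.
toy <- sparseMatrix(i = c(1, 3, 2, 1, 4),
                    j = c(1, 1, 2, 3, 3),
                    x = c(5, 2, 7, 1, 3),
                    dims = c(4, 3))

toy@x  # nonzero values, column by column: 5 2 7 1 3
toy@i  # 0-based row index of each value:  0 2 1 0 3
toy@p  # 0-based column pointers:          0 2 3 5

# The pointer vector always has one more entry than there are columns
length(toy@p) == ncol(toy) + 1

# The three vectors plus the dimensions are enough to rebuild the matrix
rebuilt <- sparseMatrix(i = toy@i, p = toy@p, x = toy@x,
                        dims = dim(toy), index1 = FALSE)
```

If a reader of the .h5 file reports a shape or index error, comparing these vector lengths against /matrix/shape is a reasonable first check.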

I wouldn't recommend a .csv or .tsv, as expanding these files out to full matrices instead of sparse formats can make the resulting files very large. However, there is a .csv export function, write_dgCMatrix_csv():

library(rhdf5)
library(scrattch.io)

tome_file <- "//allen/programs/celltypes/workgroups/rnaseqanalysis/shiny/tomes/facs/mouse_V1_ALM_20170913/faster_transcrip.tome"

# Read the sparse matrix for exon counts
tome_matrix <- read_tome_dgCMatrix(tome_file,
                                   "/data/exon")

# Transpose to genes as rows if that's what you'd like to use
tome_matrix <- Matrix::t(tome_matrix)

# Write to .csv
write_dgCMatrix_csv(tome_matrix,
                    "tome.csv",
                    col1_name = "geneId",
                    chunk_size = 1000)
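To get a rough feel for the size difference before committing to .csv, you can compare the number of stored values with the number of cells in the full matrix. This is only a back-of-the-envelope sketch on a randomly generated matrix; the dimensions and the 5% density are assumptions, not properties of any real tome file.

```r
library(Matrix)

set.seed(1)
# Random sparse matrix standing in for real data (assumed: 5% nonzero)
toy <- rsparsematrix(nrow = 2000, ncol = 500, density = 0.05)

n_cells <- prod(dim(toy))      # cells a dense matrix or .csv must represent
n_nonzero <- length(toy@x)     # values the sparse format actually stores

dense_bytes <- n_cells * 8     # 8 bytes per double if fully expanded in memory
sparse_bytes <- as.numeric(object.size(toy))

c(cells = n_cells, nonzero = n_nonzero)
c(dense_MB = dense_bytes / 1e6, sparse_MB = sparse_bytes / 1e6)
```

Running the same two lines on dim(tome_matrix) and length(tome_matrix@x) gives an estimate for your own data.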

maximilianh · 2020-01-28T16:43:45Z

Hi, would it be possible to provide these files in some other format, like .mtx? Getting all these packages to work on our server is painful: it requires a certain version of R, which seems like a lot of work just for reading a few files. Would you mind providing these files in another format besides your own? It would be very much appreciated and may help increase community uptake of your results...
many thanks!
Max

wuzhaoqi1015 · 2020-04-02T12:44:27Z

Hello, I got the following error when I used it. Could you tell me the reason?

sample_name <- read_tome_sample_names(tome)
Error in H5Fopen(file, "H5F_ACC_RDONLY", native = native) :
HDF5. File accessibilty. Unable to open file.
gene_name <- read_tome_gene_names(tome)
Error in H5Fopen(file, "H5F_ACC_RDONLY", native = native) :
HDF5. File accessibilty. Unable to open file.

adrisede · 2020-04-02T18:04:50Z

Hello wuzhaoqi1015,

From both of your inquiries, it looks like the test dataset may not have been loaded properly.

Try this:

library(scrattch.io)
library("rhdf5")

tome <- system.file("testdata/tome",
                    "transcrip.tome",
                    package = "scrattch.io")
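Since system.file() returns an empty string when it cannot find the file or the package, it can help to check the path before passing it to any of the read functions. A minimal sketch follows; check_tome_path() is a hypothetical helper written for this example, not part of scrattch.io.

```r
# Hypothetical helper (not a scrattch.io function): returns TRUE if the
# path is non-empty and points at an existing file, otherwise FALSE with a message
check_tome_path <- function(path) {
  ok <- nzchar(path) && file.exists(path)
  if (!ok) {
    message("No file found at '", path,
            "'; check the path and that scrattch.io is installed.")
  }
  ok
}

tome <- system.file("testdata/tome",
                    "transcrip.tome",
                    package = "scrattch.io")

if (check_tome_path(tome)) {
  message("Found test tome at ", tome)
}
```

The same check works for your own .tome files: pass the full path and confirm it returns TRUE before calling read_tome_sample_names() or read_tome_gene_names().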

wuzhaoqi1015 · 2020-04-03T08:35:06Z

Thank you for your reply to the previous message. I have some questions.

1. I would like to get a matrix with row and column names. After extracting the sparse matrix, can I add column and row names as follows?

a <- read_tome_dgCMatrix(tome, "data/t_exon") # read exon
b <- read_tome_dgCMatrix(tome, "data/t_intron") # read intron
sample_name <- read_tome_sample_names(tome)
gene_name <- read_tome_gene_names(tome)
rownames(a) <- gene_name
colnames(a) <- sample_name

2. I want to export this matrix to a file, but when I run write_dgCMatrix_csv() I get an error. What is the reason for it? Would it be possible to use as.matrix() and then write.csv() instead?

write_dgCMatrix_csv(a, "filename", col1_name = "gene_names", chunk_size = 2000)
[1] "Writing rows 1 to 2000"
Error in data.frame(..., check.names = FALSE) :
arguments imply differing number of rows: 0, 2000

maximilianh · 2020-04-03T11:20:45Z

Hi wuzhaoqi1015, rest assured that everyone has the same problem as you. I think the authors should provide all expression matrices in a normal format that anyone can read, ideally as text-based csv files, one for the matrix and one for the metadata.

KaitlynPrice · 2020-04-10T18:45:59Z

I am also getting this error:

Thank you for your reply to the previous message. I have some questions.

1. I would like to get a matrix with row and column names. After extracting the sparse matrix, can I add column and row names as follows?

a <- read_tome_dgCMatrix(tome, "data/t_exon") # read exon
b <- read_tome_dgCMatrix(tome, "data/t_intron") # read intron
sample_name <- read_tome_sample_names(tome)
gene_name <- read_tome_gene_names(tome)
rownames(a) <- gene_name
colnames(a) <- sample_name

2. I want to export this matrix to a file, but when I run write_dgCMatrix_csv() I get an error. What is the reason for it? Would it be possible to use as.matrix() and then write.csv() instead?

write_dgCMatrix_csv(a, "filename", col1_name = "gene_names", chunk_size = 2000)
[1] "Writing rows 1 to 2000"
Error in data.frame(..., check.names = FALSE) :
arguments imply differing number of rows: 0, 2000
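On the as.matrix() plus write.csv() question: that can work for small matrices, but converting everything at once may use a lot of memory. A possible fallback is to convert and write a block of rows at a time with base R. The sketch below is only an illustration on a made-up matrix; write_sparse_csv_chunked() is a hypothetical helper, not a scrattch.io function, and this does not diagnose the error above. The length checks are there because mismatched name lengths usually mean the matrix is in the other orientation and may need Matrix::t() first.

```r
library(Matrix)

# Hypothetical helper: write a named sparse matrix to CSV in row blocks,
# converting only one block to a dense matrix at a time
write_sparse_csv_chunked <- function(mat, file,
                                     col1_name = "gene_names",
                                     chunk_size = 2000) {
  stopifnot(nrow(mat) > 0,
            !is.null(rownames(mat)),
            !is.null(colnames(mat)))
  # Header line: first-column label followed by the column names
  header <- c(col1_name, colnames(mat))
  write.table(t(header), file, sep = ",",
              row.names = FALSE, col.names = FALSE)
  for (s in seq(1, nrow(mat), by = chunk_size)) {
    e <- min(s + chunk_size - 1, nrow(mat))
    block <- as.matrix(mat[s:e, , drop = FALSE])
    df <- data.frame(rownames(block), block, check.names = FALSE)
    write.table(df, file, sep = ",", append = TRUE,
                row.names = FALSE, col.names = FALSE)
  }
  invisible(file)
}

# Toy data: 5 genes (rows) x 3 samples (columns), values invented
a <- sparseMatrix(i = c(1, 2, 4, 5), j = c(1, 3, 2, 3),
                  x = c(4, 1, 9, 2), dims = c(5, 3))
gene_name <- paste0("gene", 1:5)
sample_name <- paste0("sample", 1:3)

# Confirm the names fit the matrix before assigning them
stopifnot(length(gene_name) == nrow(a), length(sample_name) == ncol(a))
rownames(a) <- gene_name
colnames(a) <- sample_name

csv_file <- tempfile(fileext = ".csv")
write_sparse_csv_chunked(a, csv_file, chunk_size = 2)
back <- read.csv(csv_file, check.names = FALSE)
```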
