write_matrix_dir does not write rownames and colnames of a matrix transformed from dgCMatrix #29
Also, is there any way to change the rownames and colnames of an IterableMatrix?
Here is my experience. As a result of all this, when I called saveRDS on a SeuratV5 object, the RNA assay stored as an IterableMatrix was also saved to the RDS file. This took very long, especially for a SeuratV5 object with multiple layers, because every layer was saved by saving the whole IterableMatrix. An object taking up 30GB of RAM ended up taking more than 100GB of disk space, and I had to stop the process and delete the RDS file. I think all of this could be avoided by successfully writing the original matrix with its rownames and colnames. Thanks for any help!
Row names: Xkr4, Rp1 ... 4933409K07Rik Data type: double Queued Operations:
Row names: unknown names Data type: double Queued Operations:
Row names: Feature1, Feature2 ... Feature41029 Data type: double Queued Operations:
BPCells uses
Code example reading and changing row/col names
library(BPCells)
x <- matrix(1:12, nrow=3)
rownames(x) <- paste0("row", seq_len(nrow(x)))
colnames(x) <- paste0("col", seq_len(ncol(x)))
x
# col1 col2 col3 col4
# row1 1 4 7 10
# row2 2 5 8 11
# row3 3 6 9 12
x_sparse <- as(x, "dgCMatrix")
x_sparse
# 3 x 4 sparse Matrix of class "dgCMatrix"
# col1 col2 col3 col4
# row1 1 4 7 10
# row2 2 5 8 11
# row3 3 6 9 12
x_bpcells <- as(x_sparse, "IterableMatrix")
x_bpcells
# 3 x 4 IterableMatrix object with class Iterable_dgCMatrix_wrapper
# Row names: row1, row2, row3
# Col names: col1, col2 ... col4
# Data type: double
# Storage order: column major
# Queued Operations:
# 1. Load dgCMatrix from memory
rownames(x_bpcells) <- paste0("newrow", seq_len(nrow(x)))
colnames(x_bpcells) <- paste0("newcol", seq_len(ncol(x)))
dir_path <- tempfile()
x_bpcells_dir <- write_matrix_dir(x_bpcells, dir_path)
# Warning: Matrix compression performs poorly with non-integers.
# • Consider calling convert_matrix_type if a compressed integer matrix is intended.
# This message is displayed once every 8 hours.
x_bpcells_dir2 <- open_matrix_dir(dir_path)
x_bpcells_dir2
# 3 x 4 IterableMatrix object with class MatrixDir
# Row names: newrow1, newrow2, newrow3
# Col names: newcol1, newcol2 ... newcol4
# Data type: double
# Storage order: column major
# Queued Operations:
# 1. Load compressed matrix from directory /tmp/RtmpNGXgHy/file23f2df836ad
I hope that helps answer your question. The examples you provide are complicated by the fact that I don't have access to the same dataset, and you're also using Seurat objects as an intermediate. If you're still having issues, I'd encourage you to simplify your problem into a reproducible example that I can take a closer look at.
EDIT: one additional note is that changing the row/col names of a BPCells disk-backed object does not alter the data on disk. If you want to save the new row/col names on disk, you'll need to write the matrix again, or change the row/col names prior to importing as a BPCells object.
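To illustrate the EDIT note above, here is a minimal sketch of the "write the matrix again" route. It uses only functions that already appear in this thread (`write_matrix_dir`, `open_matrix_dir`, `rownames<-`); the toy matrix, the `renamed` prefix, and the variable names are made up for illustration, and the second half assumes a BPCells version that includes the dimnames fix discussed later in the thread.

```r
library(BPCells)
library(Matrix)

# Toy matrix with dimnames (illustrative values only)
x <- matrix(as.numeric(1:12), nrow = 3)
rownames(x) <- paste0("row", seq_len(nrow(x)))
colnames(x) <- paste0("col", seq_len(ncol(x)))

# First write: the names on disk are row1..row3
dir_a <- tempfile()
on_disk <- write_matrix_dir(as(as(x, "dgCMatrix"), "IterableMatrix"), dir_a)

# Renaming the R object does not touch the files in dir_a
rownames(on_disk) <- paste0("renamed", seq_len(nrow(x)))
names_in_a <- rownames(open_matrix_dir(dir_a))

# To persist the new names, write the renamed object to a new directory
dir_b <- tempfile()
rewritten <- write_matrix_dir(on_disk, dir_b)
names_in_b <- rownames(open_matrix_dir(dir_b))

print(names_in_a)
print(names_in_b)
```

Writing to a second directory rather than overwriting the first keeps the original files intact in case the rewritten copy needs to be checked before the old one is removed.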
I also got the same correct result using an example similar to yours (the example in #23, actually). I think the key to the solution might be related to this message: "3 x 4 IterableMatrix object with class Iterable_dgCMatrix_wrapper", which is what I also got in my test after converting from dgCMatrix. However, in my buggy code, I got "41029 x 646765 IterableMatrix object with class MatrixSubset" after converting from dgCMatrix.
Thank you very much for your example of how to change the row and col names! Actually, the raw.RDS was generated by SeuratV4 in the past. I think to reproduce this bug, you can use SeuratV4 on a small raw matrix.
- Previously, changing the dimnames on a transformed matrix would not affect the dimnames when writing that matrix to disk.
- Following a similar strategy to cell/chr renaming in fragments, where we add a new layer into the delayed operations
- Also added an unrelated fix to properly export `merge_cells()`
Thanks for the bug report! You were right that the issue had to do with the MatrixSubset class (actually any transformation shared the problem). I've fixed this now so updated dimnames should be properly saved when you write a matrix. Please comment/reopen if this didn't actually solve your problem.
Thank you very much for this quick response.
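A quick way to check whether an installed BPCells version has this fix is a round trip through a subset matrix. The following is a sketch under assumptions: the toy data and the names `geneA` and `geneC` are invented, and the subset is used only because the thread reports the `MatrixSubset` class as one of the affected transformations.

```r
library(BPCells)
library(Matrix)

# Toy matrix with dimnames (illustrative values only)
x <- matrix(as.numeric(1:12), nrow = 3)
rownames(x) <- paste0("row", seq_len(nrow(x)))
colnames(x) <- paste0("col", seq_len(ncol(x)))
x_bpcells <- as(as(x, "dgCMatrix"), "IterableMatrix")

# Subsetting queues a transformation on top of the in-memory matrix
x_sub <- x_bpcells[c(1, 3), ]

# Rename the transformed matrix, then write it and read it back
rownames(x_sub) <- c("geneA", "geneC")
dir_path <- tempfile()
written <- write_matrix_dir(x_sub, dir_path)
reopened <- open_matrix_dir(dir_path)

# TRUE means the updated dimnames reached the disk
names_survived <- identical(rownames(reopened), c("geneA", "geneC"))
print(names_survived)
```

If this prints `FALSE`, the installed version likely predates the fix described above.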
Hello again,
And here is a reproducible example:
In the meantime, while this issue gets resolved, and with all due respect, is there any workaround you'd suggest? Please don't get me wrong, your package has been doing wonders for my job, I'd just really like to be able to move forward with my project.
Hi @Dario-Rocha, thanks for the very clear report with the reproducible example -- it made it quick to reproduce the bug on my end. I've fixed this issue in the current main branch, so you should be set for now. In the future, similar bugs having to do with dimnames failing to write to disk should be fixed by calling I appreciate you taking the time to bring up this issue -- it helps make BPCells better for everyone
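library(Seurat)
library(BPCells)
g <- readRDS("raw.RDS") # object whose RNA assay is an Assay3 (v3 assay) holding a dgCMatrix
g[["RNA"]] <- as(g[["RNA"]], Class = "Assay5") # convert to a v5 assay
# Wrap the counts as a BPCells matrix, then write them to disk
g[["RNA"]]$counts <- as(g[["RNA"]]$counts, "IterableMatrix", strict = FALSE)
write_matrix_dir(mat = g[["RNA"]]$counts, dir = "BPCell/counts", overwrite = TRUE)
# Rebuild the Seurat object from the on-disk matrix
g <- CreateSeuratObject(open_matrix_dir("BPCell/counts"))
g[["RNA"]]
The output of g[["RNA"]] is a matrix with row names Feature1, Feature2, Feature3, ... and column names Cells_1, Cell_2, ...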
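For the symptom described above, one possible stopgap is to copy the names back from the original matrix after reopening the directory. This is only a sketch and not a fix confirmed by the maintainer: it assumes the source dgCMatrix is still available in memory, it uses a toy matrix in place of the Seurat counts, and the names `gene1` and `cell1` are invented. It relies on the `rownames<-` and `colnames<-` assignments shown in the maintainer's example.

```r
library(BPCells)
library(Matrix)

# Toy stand-in for the counts matrix (gene and cell names are made up)
m <- matrix(as.numeric(1:12), nrow = 3)
rownames(m) <- paste0("gene", seq_len(nrow(m)))
colnames(m) <- paste0("cell", seq_len(ncol(m)))
counts <- as(m, "dgCMatrix")

# Write through BPCells and reopen from disk
dir_path <- tempfile()
written <- write_matrix_dir(as(counts, "IterableMatrix"), dir_path)
reopened <- open_matrix_dir(dir_path)

# If the names did not survive the round trip, restore them from the source
if (!identical(rownames(reopened), rownames(counts))) {
  rownames(reopened) <- rownames(counts)
}
if (!identical(colnames(reopened), colnames(counts))) {
  colnames(reopened) <- colnames(counts)
}

print(rownames(reopened))
```

As the maintainer's note earlier in the thread says, assigning names this way changes only the R object, so the directory on disk keeps whatever names were written.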