Error Loading CellBender v3alpha H5 output in R #145

Closed
samuel-marsh opened this issue Aug 16, 2022 · 6 comments
Labels
docs Documentation

@samuel-marsh

Hi,

First, thanks so much for developing and updating this awesome software! I have been using v2 on Terra for some time, and when I recently saw v3alpha I decided to give it a go (the additional HTML report is awesome!). However, I'm running into an issue loading the data in R using Seurat. Previously I could use Seurat's Read10X_h5 command. Now, however, when I do that (with either the filtered or the full .h5 file), I get the following error:

test <- Read10X_h5("/PATH/SampleB_CB_out.h5", use.names = T)
Error in `[[.H5File`(infile, paste0(genome, "/data")) : 
  An object with name droplet_latents/data does not exist in this group

I saw a note in another issue that the h5 output format changed in v3, but it didn't say how. What is the best way to load the v3alpha h5 files in R?
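Judging from the error message, `Read10X_h5` treats every top-level group in the file as a genome and reads `<group>/data` from each one. CellBender v3 output has extra top-level groups such as `droplet_latents` that contain no `data` dataset, so that read fails. The Python sketch below models the situation with plain dicts instead of real HDF5 I/O; the layout is a simplified assumption, not the full CellBender schema, and the function names are hypothetical.

```python
# Simplified model of the top-level layout of an HDF5 file:
# group name -> names of the datasets directly inside it.
# "matrix" follows the 10x CSC layout; the contents of "droplet_latents"
# are omitted (assumption); per the error above it has no "data" dataset.
V3_LAYOUT = {
    "matrix": {"barcodes", "data", "indices", "indptr", "shape"},
    "droplet_latents": set(),
}


def failing_genome_groups(layout):
    """Mimic a loader that treats every top-level group as a genome and
    reads '<group>/data'; return the groups where that read would fail."""
    return [name for name, datasets in layout.items() if "data" not in datasets]


def matrix_only(layout):
    """Restrict the loader to the one group that holds the count matrix."""
    return {name: ds for name, ds in layout.items() if name == "matrix"}


print(failing_genome_groups(V3_LAYOUT))               # ['droplet_latents']
print(failing_genome_groups(matrix_only(V3_LAYOUT)))  # []
```

Restricting the reader to the `matrix` group is the idea behind both workarounds discussed in this thread: copying only `/matrix` into a new file, or writing a loader that reads only that group.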

Thanks!
Sam

@samuel-marsh
Author

I should also mention that the rhdf5 package errors too:

test <- h5read("/PATH/SampleB_CB_out_filtered.h5")
Error in H5Lexists(loc$H5Identifier, name) : 
  argument "name" is missing, with no default

@samuel-marsh
Author

Sorry, I just found the updated tutorial doc detailing the solution!
https://github.com/broadinstitute/CellBender/blob/sf_dev_0.3.0_postreg/docs/source/tutorial/remove_background/index.rst

@samuel-marsh
Author

Hi @sjfleming

I just wanted to reopen to add a note for you: it may be worthwhile for your tutorial to suggest that users supply the --complevel flag to the PyTables ptrepack command. Otherwise, when processing a large number of large files, the total output size increases substantially.
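The size jump is consistent with ptrepack writing the copied datasets without compression unless told otherwise (as I read the PyTables docs, `--complevel` defaults to 0, meaning no compression). The Python sketch below illustrates the effect with the standard-library `zlib` module, the same deflate algorithm PyTables uses by default, on synthetic count-like integers; the data are made up and this is an analogy, not PyTables itself.

```python
import array
import random
import zlib


def make_counts(n, seed=0):
    """Synthetic UMI-like counts (made up): mostly zeros and small integers."""
    rng = random.Random(seed)
    return array.array("i", (rng.choice([0, 0, 0, 0, 1, 1, 2, 3]) for _ in range(n)))


raw = make_counts(100_000).tobytes()
# Level 0 stores the bytes uncompressed (plus a little framing overhead),
# analogous to writing the datasets with no compression filter.
stored = zlib.compress(raw, 0)
# Level 5 mirrors the --complevel 5 setting suggested above.
deflated = zlib.compress(raw, 5)

print(f"raw={len(raw)} level0={len(stored)} level5={len(deflated)}")
```

Count data with many zeros and small values compresses very well, which is why re-enabling compression brings the repacked file back to roughly the original size.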

I just ran things with --complevel 5, and the output was similar in size to the original CellBender output.

# cell bender file
jupyter@7f2121d7448a:~/Cell_Bender_New$ ls -s
29172 SampleA_CB_10K_150_out_filtered.h5 

# no compression
ptrepack SampleA_CB_10K_out_filtered.h5:/matrix seurat_h5/SampleA_CB_CB_out_filtered_seurat.h5:/matrix
jupyter@7f2121d7448a:~/Cell_Bender_New$ ls -s
247648 SampleA_CB_CB_out_filtered_seurat.h5 

# compression level 5
jupyter@7f2121d7448a:~/Cell_Bender_New$ ptrepack --complevel 5 SampleA_CB_10K_out_filtered.h5:/matrix seurat_h5/SampleA_CB_CB_out_filtered_seurat_comp.h5:/matrix
jupyter@7f2121d7448a:~/Cell_Bender_New/seurat_h5$ ls -s
 24812 SampleA_CB_CB_out_filtered_seurat_comp.h5
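When there are many samples, the repack step can be scripted. Below is a minimal Python sketch, assuming ptrepack is on PATH and that the CellBender outputs share a `*_out_filtered.h5` naming pattern; both that pattern and the `<stem>_seurat.h5` output name are hypothetical conventions. It only uses the `--complevel` flag and the `file.h5:/matrix` source/destination syntax from the commands above.

```python
import shlex
import subprocess
from pathlib import Path


def ptrepack_commands(h5_files, out_dir, complevel=5):
    """Build one ptrepack command per CellBender .h5 file, copying only the
    /matrix group and enabling compression via --complevel.
    The "<stem>_seurat.h5" output naming is a hypothetical convention."""
    out_dir = Path(out_dir)
    cmds = []
    for src in map(Path, h5_files):
        dst = out_dir / f"{src.stem}_seurat.h5"
        cmds.append([
            "ptrepack", "--complevel", str(complevel),
            f"{src}:/matrix", f"{dst}:/matrix",
        ])
    return cmds


def run_commands(cmds, out_dir, dry_run=True):
    """Print the commands (dry run) or execute them, stopping on the first failure."""
    if not dry_run:
        Path(out_dir).mkdir(parents=True, exist_ok=True)
    for cmd in cmds:
        if dry_run:
            print(shlex.join(cmd))
        else:
            subprocess.run(cmd, check=True)


# Usage: preview the commands for matching files in the current directory.
files = sorted(Path(".").glob("*_out_filtered.h5"))
run_commands(ptrepack_commands(files, "seurat_h5"), "seurat_h5")
```

Building the argument lists separately from running them keeps a dry run cheap and avoids shell-quoting problems, since `subprocess.run` receives a list rather than a shell string.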

Best,
Sam

@samuel-marsh samuel-marsh changed the title Error Loading CellBender v3alpha H5 outout in R Error Loading CellBender v3alpha H5 output in R Aug 16, 2022
@samuel-marsh
Author

Hi,

As an FYI, to simplify things on the R end, I just wrote a helper function for my package, scCustomize, that loads the count matrix from the new CellBender outputs entirely within R, without requiring PyTables. It's basically a slimmed-down version of Seurat::Read10X_h5 that only reads the matrix.
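Reading only the matrix comes down to pulling four datasets from the `matrix` group (`data`, `indices`, `indptr`, `shape`) and assembling a compressed-sparse-column (CSC) matrix, which is how 10x-style HDF5 files store counts. The Python sketch below uses plain lists in place of HDF5 reads and densifies the result for clarity; it is not scCustomize's R code, and a real loader would keep the matrix sparse.

```python
def csc_to_dense(data, indices, indptr, shape):
    """Assemble a dense features x barcodes matrix from the CSC arrays that
    10x-style HDF5 files keep under /matrix (data, indices, indptr, shape)."""
    n_rows, n_cols = shape
    dense = [[0] * n_cols for _ in range(n_rows)]
    for col in range(n_cols):
        # Non-zero entries of column `col` are data[indptr[col]:indptr[col + 1]],
        # and indices[k] gives the row (feature) of entry k.
        for k in range(indptr[col], indptr[col + 1]):
            dense[indices[k]][col] = data[k]
    return dense


# Toy arrays (made up): 3 features x 3 barcodes with three non-zero counts.
print(csc_to_dense(data=[5, 1, 2], indices=[0, 2, 1], indptr=[0, 2, 2, 3], shape=(3, 3)))
# [[5, 0, 0], [0, 0, 2], [1, 0, 0]]
```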

https://github.com/samuel-marsh/scCustomize/blob/58529b4dc870860c64d5c96eb078f353d7c4ca2b/R/Read_%26_Write_Data.R#L1054

I may update the function name, and I will add it to the vignette for my other scCustomize CellBender functions. I'm planning the v0.8.0 release of scCustomize by the end of the month at the latest.

Thanks again,
Sam

@sjfleming
Member

Hi @samuel-marsh, this is great advice about --complevel 5 in ptrepack; I had not noticed that the file size was increasing. I will definitely add this to the docs. Thanks!

And that sounds great about your scCustomize read function too. I don't know if it's of interest to you, but you could also try to get your changes merged into Seurat by making a pull request there. They actually did accept a change that I asked for in the past to enable CellBender (v1 or v2? now I forget) outputs to be read. They might accept something like this. It makes a whole lot of sense to me to make the data loader less "fragile".

Also I'm glad you found the v3-alpha workflow. Any other feedback about it? Did it run without error on your data? (I'm still working on getting a few more changes in before I make v0.3.0 official.)

@sjfleming sjfleming added the docs Documentation label Aug 19, 2022
@sjfleming sjfleming added this to the v0.3.0 milestone Aug 19, 2022
@samuel-marsh
Author

Hey @sjfleming
Sorry for the delayed response! Yes, v3 alpha seemed to run without any issues (running on Terra)!

I think I will try to submit the primary function to SeuratWrappers in the near future (that seems to be their recommendation for adding import functions for other formats).

I’ve also created multi-directory and multi-file wrappers in scCustomize to read in the files from a whole experiment (stored either in subdirectories or all within one directory). These make things easy for large experiments and add parallelization for extra speed (even though HDF5 files are already pretty quick to read).
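The two layouts described above (one subdirectory per sample, or all files in one directory) plus parallel reading can be sketched in Python as follows. The function names and parameters are hypothetical, `reader` stands in for any per-file loader, and this is not scCustomize's implementation. A thread pool is used for simplicity; whether it actually speeds things up depends on whether the reader releases Python's GIL during I/O, and `ProcessPoolExecutor` is the usual alternative when it does not.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def find_sample_files(root, file_name=None, pattern="*.h5"):
    """Map sample name -> file path for two layouts:
    <root>/<sample>/<file_name> (one subdirectory per sample) when file_name
    is given, otherwise every file in root matching pattern."""
    root = Path(root)
    if file_name is not None:
        return {d.name: d / file_name for d in sorted(root.iterdir())
                if d.is_dir() and (d / file_name).is_file()}
    return {p.stem: p for p in sorted(root.glob(pattern))}


def read_all(files, reader, workers=4):
    """Apply `reader` to every file concurrently, keeping results keyed by
    sample name (Executor.map preserves input order)."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(zip(files.keys(), pool.map(reader, files.values())))
```

For example, `read_all(find_sample_files("experiment", file_name="out_filtered.h5"), my_loader)` would load one matrix per sample subdirectory, where `my_loader` is whatever single-file reader is in use.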

I also added support for examining feature-level changes within a dual-assay Seurat object (samuel-marsh/scCustomize#70 (comment)); a matrix-to-matrix version is coming. (Side note: it's great to have this already included in the new v3 HTML report.) This just provides an easy wrapper for examining the changes within R/Seurat.
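Examining feature-level changes between the raw and CellBender assays boils down to comparing per-feature totals before and after background removal. The Python sketch below assumes both assays have already been summed per feature into dicts (a hypothetical input format) and uses made-up gene names and counts; it is an illustration, not scCustomize's implementation.

```python
def feature_changes(raw_totals, cb_totals, top_n=5):
    """Rank features by counts removed by CellBender.

    raw_totals / cb_totals map feature name -> counts summed over all cells
    in the raw and CellBender assays (hypothetical input format).
    Returns (feature, raw, cellbender, removed, fraction_removed) tuples."""
    rows = []
    for feature, raw in raw_totals.items():
        cb = cb_totals.get(feature, 0)
        removed = raw - cb
        frac = removed / raw if raw else 0.0
        rows.append((feature, raw, cb, removed, frac))
    rows.sort(key=lambda r: r[3], reverse=True)
    return rows[:top_n]


# Made-up totals for three features.
raw = {"GeneA": 1000, "GeneB": 50, "GeneC": 300}
cellbender = {"GeneA": 900, "GeneB": 50, "GeneC": 30}
for row in feature_changes(raw, cellbender):
    print(row)
```

Sorting by absolute counts removed highlights abundant ambient features; sorting by `fraction_removed` instead would surface features that are almost entirely background.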

Best,
Sam

Projects
remove-background
  
Awaiting triage
Development

No branches or pull requests

2 participants