Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

can't read mudata created with muon (python) #9

Closed
bio-la opened this issue Jul 28, 2022 · 7 comments
Closed

can't read mudata created with muon (python) #9

bio-la opened this issue Jul 28, 2022 · 7 comments
Labels
0.8 AnnData v0.8 spec

Comments

@bio-la
Copy link

bio-la commented Jul 28, 2022

Hello, thanks for working on interoperability between seurat and mudata!

I can't read a mudata that I created following your multimodal tutorial using ReadH5MU

test<-ReadH5MU("data_test.dir/pbmc_w3_teaseq.h5mu")
Error in dataset[[name]]$read() : attempt to apply non-function

I have no problems loading the object with muon in python

import muon as mu
mu.read_h5mu("data_test.dir/pbmc_w3_teaseq.h5mu")
MuData object with n_obs × n_vars = 5805 × 113187
  obs:  'sample', 'well', 'leiden_multiplex', 'leiden_mofa', 'leiden_wnn'
  var:  'highly_variable', 'gene_ids', 'feature_types', 'genome', 'interval'
  obsm: 'X_mofa', 'X_umap', 'X_wnn_umap'
  varm: 'LFs'
  obsp: 'mofa_connectivities', 'mofa_distances', 'wnn_connectivities', 'wnn_distances'
  3 modalities
    rna:        5805 x 16381
      obs:      'n_genes_by_counts', 'total_counts', 'total_counts_mt', 'pct_counts_mt', 'leiden'
      var:      'gene_ids', 'feature_types', 'genome', 'interval', 'mt', 'n_cells_by_counts', 'mean_counts', 'pct_dropout_by_counts', 'total_counts', 'highly_variable', 'means', 'dispersions', 'dispersions_norm', 'mean', 'std'
      uns:      'hvg', 'leiden', 'leiden_colors', 'log1p', 'neighbors', 'pca', 'umap'
      obsm:     'X_pca', 'X_umap'
      varm:     'PCs'
      layers:   'lognorm'
      obsp:     'connectivities', 'distances'
    atac:       5805 x 96760
      obs:      'n_fragments', 'n_duplicate', 'n_mito', 'n_unique', 'altius_count', 'altius_frac', 'gene_bodies_count', 'gene_bodies_frac', 'peaks_count', 'peaks_frac', 'tss_count', 'tss_frac', 'barcodes', 'cell_name', 'well_id', 'chip_id', 'batch_id', 'pbmc_sample_id', 'DoubletScore', 'DoubletEnrichment', 'TSSEnrichment', 'n_genes_by_counts', 'total_counts', 'n_counts', 'leiden'
      var:      'gene_ids', 'feature_types', 'genome', 'interval', 'n_cells_by_counts', 'mean_counts', 'pct_dropout_by_counts', 'total_counts', 'highly_variable', 'means', 'dispersions', 'dispersions_norm', 'mean', 'std'
      uns:      'hvg', 'leiden', 'leiden_colors', 'log1p', 'neighbors', 'pca', 'umap'
      obsm:     'X_pca', 'X_umap'
      varm:     'PCs'
      layers:   'counts', 'lognorm'
      obsp:     'connectivities', 'distances'
    prot:       5805 x 46
      obs:      'total_counts'
      var:      'highly_variable'
      uns:      'neighbors', 'pca', 'umap'
      obsm:     'X_pca', 'X_umap'
      varm:     'PCs'
      layers:   'counts'
      obsp:     'connectivities', 'distances'

I can explore the h5 but it breaks where the error says. it also seems to expect some attributes that I don't have in the mudata

h5 <- open_and_check_mudata("~/Documents/devel/data_test.dir/pbmc_w3_teaseq.h5mu")
metadata <- read_with_index(h5[["obs"]])
dataset = h5[['obs']]
dataset_attr <- tryCatch({
  h5attributes(dataset)
  }, error = function(e) {
  list("_index" = "_index")
  })
  indexcol <- "_index"
if ("_index" %in% names(dataset_attr)) {
  indexcol <- dataset_attr$`_index`
}
dataset_attr
columns <- names(dataset)
columns <- columns[columns != "__categories"]
columns

dataset[["sample"]]$read()

Error in dataset[[name]]$read() : attempt to apply non-function

values_attr <-h5attributes(dataset)
values_attr 
$`column-order`
[1] "sample" "well"  

$`_index`
[1] "_index"

$`encoding-type`
[1] "dataframe"

$`encoding-version`
[1] "0.2.0"

# so the following line will be NULL
# values_attr$categories

any suggestions?

thanks!

R version 4.1.2 (2021-11-01)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Big Sur 11.3.1

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] bmcite.SeuratData_0.3.0 pbmc3k.SeuratData_3.1.4 SeuratData_0.2.2        hdf5r_1.3.5             MuDataSeurat_0.0.0.9000 magrittr_2.0.3          datapasta_3.1.0        
 [8] forcats_0.5.1           stringr_1.4.0           dplyr_1.0.8             purrr_0.3.4             readr_2.1.2             tidyr_1.2.0             tibble_3.1.6           
[15] ggplot2_3.3.5           tidyverse_1.3.1        

loaded via a namespace (and not attached):
  [1] readxl_1.4.0          backports_1.4.1       plyr_1.8.7            igraph_1.3.0          lazyeval_0.2.2        splines_4.1.2         listenv_0.8.0         scattermore_0.8      
  [9] digest_0.6.29         htmltools_0.5.2       fansi_1.0.3           tensor_1.5            cluster_2.1.3         ROCR_1.0-11           tzdb_0.3.0            remotes_2.4.2        
 [17] globals_0.14.0        modelr_0.1.8          matrixStats_0.62.0    spatstat.sparse_2.1-0 prettyunits_1.1.1     colorspace_2.0-3      rappdirs_0.3.3        rvest_1.0.2          
 [25] ggrepel_0.9.1         haven_2.4.3           callr_3.7.0           crayon_1.5.1          jsonlite_1.8.0        spatstat.data_2.1-4   survival_3.3-1        zoo_1.8-9            
 [33] glue_1.6.2            polyclip_1.10-0       gtable_0.3.0          leiden_0.3.9          clipr_0.8.0           pkgbuild_1.3.1        future.apply_1.8.1    abind_1.4-5          
 [41] scales_1.1.1          DBI_1.1.2             spatstat.random_2.2-0 miniUI_0.1.1.1        Rcpp_1.0.8.3          viridisLite_0.4.0     xtable_1.8-4          reticulate_1.24      
 [49] spatstat.core_2.4-2   bit_4.0.4             htmlwidgets_1.5.4     httr_1.4.2            anndata_0.7.5.3       RColorBrewer_1.1-3    ellipsis_0.3.2        Seurat_4.1.0         
 [57] ica_1.0-2             pkgconfig_2.0.3       uwot_0.1.11           dbplyr_2.1.1          deldir_1.0-6          utf8_1.2.2            tidyselect_1.1.2      rlang_1.0.2          
 [65] reshape2_1.4.4        later_1.3.0           munsell_0.5.0         cellranger_1.1.0      tools_4.1.2           cli_3.3.0             generics_0.1.2        broom_0.7.12         
 [73] ggridges_0.5.3        fastmap_1.1.0         goftest_1.2-3         processx_3.5.3        bit64_4.0.5           fs_1.5.2              fitdistrplus_1.1-8    RANN_2.6.1           
 [81] pbapply_1.5-0         future_1.24.0         nlme_3.1-157          mime_0.12             formatR_1.12          xml2_1.3.3            compiler_4.1.2        rstudioapi_0.13      
 [89] plotly_4.10.0         curl_4.3.2            png_0.1-7             spatstat.utils_2.3-0  reprex_2.0.1          stringi_1.7.6         ps_1.6.0              lattice_0.20-45      
 [97] Matrix_1.4-1          SeuratDisk_0.0.0.9019 vctrs_0.3.8           pillar_1.7.0          lifecycle_1.0.1       spatstat.geom_2.4-0   lmtest_0.9-40         RcppAnnoy_0.0.19     
[105] addinexamples_0.1.0   data.table_1.14.2     cowplot_1.1.1         irlba_2.3.5           httpuv_1.6.5          patchwork_1.1.1       R6_2.5.1              promises_1.2.0.1     
[113] KernSmooth_2.23-20    gridExtra_2.3         parallelly_1.31.0     codetools_0.2-18      MASS_7.3-56           assertthat_0.2.1      rprojroot_2.0.3       withr_2.5.0          
[121] SeuratObject_4.0.4    sctransform_0.3.3     mgcv_1.8-40           parallel_4.1.2        hms_1.1.1             grid_4.1.2            rpart_4.1.16          Rtsne_0.15           
[129] shiny_1.7.1           lubridate_1.8.0      
@gtca
Copy link
Contributor

gtca commented Jul 28, 2022

Hey @bio-la, thanks for the detailed report, I think the issue is just with anndata v0.8 breaking forward compatibility, i.e. AnnData files from versions >=0.8 cannot be read with packages that were targeting earlier versions of files.
This is of course something we're implementing next for MuDataMAE/MuDataSeurat packages but I don't have a solid timeline to share yet...

@bio-la
Copy link
Author

bio-la commented Jul 28, 2022

thanks for answering this! do you mean that mudataSeurat was built expecting objects created with anndata <0.8, and which version of muon/mudata ?
in the meantime do you think I should be able to create a version of the mudata by downgrading muon ? (I'm currently running latest muon )
thanks!
f

@gtca
Copy link
Contributor

gtca commented Jul 28, 2022

Sorry for being unclear. That should be a better way to phrase it: MuDataSeurat was built before AnnData v0.8 spec and hasn't been updated to handle new versions of AnnData files yet.
Basically, it is the same issue as using anndata <0.8 in Python to read new files (scverse/anndata#698, scverse/anndata#739).

I believe using mudata <0.2.0 (e.g. 0.1.2) together with anndata <0.8.0 (e.g. 0.7.8) should help with that!
muon relies on mudata for serialisation so its version shouldn't be of a problem here.

@gtca gtca added the 0.8 AnnData v0.8 spec label Aug 3, 2022
@gtca
Copy link
Contributor

gtca commented Aug 4, 2022

Hey @bio-la, I added new categorical values handling in the latest commits so you might be able to read files written by the latest anndata/mudata. This is not fully compliant with the rest of the new encoding spec but relying on rhdf5 to figure out all the other types seems to do the trick so far for the files I've tested it on.

I.e. this is a small fix to remedy the current issue but also the first step in the direction of this library becoming v0.8-compliant. 😃

tea <- ReadH5MU("muon-tutorials/data/teaseq/pbmc_w3_teaseq.h5mu")
tea
# An object of class Seurat 
# 113187 features across 5805 samples within 3 assays 
# Active assay: rna (16381 features, 2910 variable features)
#  2 other assays present: atac, prot
#  10 dimensional reductions calculated: MOFA, MOFA_UMAP, UMAP, WNN_UMAP, rnaPCA, rnaUMAP, atacPCA, atacUMAP, protPCA, protUMAP

You can also note that it respects mod-order attribute now and keeps the rna as the default assay.

@rcannood
Copy link

Hi @gtca !

I was encountering the above issue as well. I installed the latest development version of MuDataSeurat and am now experiencing the following error:

library(MuDataSeurat)

## VIASH START
par <- list(
  input = "resources_test/pbmc_1k_protein_v3/pbmc_1k_protein_v3_mms.h5mu",
  output = "output.rds"
)
## VIASH END

obj <- ReadH5MU(par$input)
Error in h(simpleError(msg, call)) : 
  error in evaluating the argument 'x' in selecting a method for function 't': attempt to apply non-function
In addition: Warning messages:
1: In missing_on_read("/var", paste0("global variables metadata (",  :
  Missing on read: /var. Seurat does not support global variables metadata (genome, feature_types, gene_symbol).
2: In missing_on_read(paste0("some of mod/", modalityname, "/layers"),  :
  Missing on read: some of mod/rna/layers. Seurat does not support custom layers, unless labeled 'counts'.

Files which trigger this error:

rcannood added a commit to openpipelines-bio/openpipeline that referenced this issue Sep 29, 2022
rcannood added a commit to openpipelines-bio/openpipeline that referenced this issue Sep 30, 2022
* Add initial config generator for cellranger

* add back lanes

* fix variable

* Untar reference input and add fastqs as symlinks

* make fixes to pipeline execution script

* fix processing of input files

* make sure fastqs parameter is added to the csv

* fix localcores parameter

* fix directory traversal issue

* silly commit message

* fix pebkac issues

* Add beginnings of tests, fix output

* os.path.listdir() to os.listdir()

* update script

* update workflowhelper

* clean up bin

* test bd rhap using new reference, untar if need be

* remove old resources scripts

* update bdrhap_5kjrt script

* update to new references

* painstakingly fix bd rhapsody component

* fix bd 5kjrt script

* allow velocyto to work with compressed gtf files

* fix velocyto

* add links to original pages

* fix test

* fix pbmc script

* add something to assert

* fix test

* Move to pathlib

* Fix checking of ouput files.

* Shutil doest not accept Path objects on python < 3.9

* Fix paths in tests

* Fix paths in tests

* CI Only download viash (ci force)

* fix names

* update changelog #ci force

* Allow push run when using: ci force

* Undo changes to labels

* Use correct conditional check (ci force)

* Fix typo (ci force)

* Correct conditional (ci force)

* Fix typo (ci force)

* obsm location was changed in mms test resource

* feature_type -> feature_types

* `convert/from_h5mu_to_seurat`: Disabled because MuDataSeurat is currently broken, see [PMBio/MuDataSeurat#9.

Co-authored-by: Dries Schaumont <5946712+DriesSchaumont@users.noreply.github.com>
@DriesSchaumont
Copy link

Hi @gtca, I am still observing the issue @rcannood described. I was wondering if there were any updates regarding supporting objects created with anndata > 0.8? Thanks!

@DriesSchaumont
Copy link

Thanks a lot, @ilia-kats 🙇

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
0.8 AnnData v0.8 spec
Projects
None yet
Development

No branches or pull requests

4 participants