Hi there. I noticed that the read_parquet call here realizes the transcripts into memory as a data.frame. Instead, I would propose something like the following, which realizes only i) the cell IDs and ii) the unique FOV entries of the cell IDs present in the data. This also avoids costly dplyr computations (group_by, select, distinct, left_join) on a data.frame that could contain millions of entries, which defeats the purpose of having a .parquet file to begin with.
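# assumes arrow (read_parquet) and dplyr (pull) are attached
mol <- metadata(xen)$transcripts  # location of the transcripts .parquet file
mol <- read_parquet(mol, as_data_frame=FALSE)  # this is important: keep an Arrow Table, not a data.frame
idx <- pull(mol, "cell_id")  # realize only the cell_id column
idx <- match(xen$cell_id, idx)  # first matching transcript row for each cell
fov <- pull(mol[idx, ], "fov")  # FOV entry of that row, one per cell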