-
Notifications
You must be signed in to change notification settings - Fork 25
Description
Hi, I have a problem that I cannot sort out. I am doing a three species coparison from public single cell datasets that were processed with Seurat. I used the SeuratDisk to convert the lognormalized and annotated experiments into h5ad objects. I loaded the files and created sam objects and corrected batch effects as suggested in #83.
from samalg import SAM
import scanpy as sc
# load data into adata objects
mm10_adata = sc.read_h5ad("/scratch/jmontenegro/samap/data/mm10/sce_data.h5ad")
spur_adata = sc.read_h5ad("/scratch/jmontenegro/samap/data/spur4.2/spur4.2.h5ad)
nv2_adata = sc.read_h5ad("/scratch/jmontenegro/samap/data/nv2/Nv2.18-48h.h5ad")
# convert to sam objects
mm10_sam=SAM(counts=mm10_adata)
mm10_sam.run(batch_key="stage")
spur_sam=SAM(counts=spur_adata)
spur_sam.run(batch_key="orig.ident")
# correct the orig.ident into a category
nv2_adata.obs["orig.ident"] = nv2_adata.obs["orig.ident"].astype("category")
nv2_sam=SAM(counts=nv2_adata)
nv2_sam.run(batch_key="orig.ident")
I used mmseqs to generate all vs all blast-like tables between the transcripts of all three species and generated the following directory:
/scratch/jmontenegro/samap/mmseqs/ > ls -ltrh
total 804M
-rw-r--r-- 1 jmontenegro molevo 121M May 14 17:17 nv2_to_mm10.txt
-rw-r--r-- 1 jmontenegro molevo 81M May 14 17:19 nv2_to_spur.txt
-rw-r--r-- 1 jmontenegro molevo 86M May 14 17:24 spur_to_nv2.txt
-rw-r--r-- 1 jmontenegro molevo 181M May 14 17:29 mm10_to_nv2.txt
-rw-r--r-- 1 jmontenegro molevo 139M May 14 17:32 spur_to_mm10.txt
-rw-r--r-- 1 jmontenegro molevo 199M May 14 17:32 mm10_to_spur.txt
Because the transcripts' names do not match the gene names I created mapping tables and generated lists of tuples as suggested in #81:
mm10_map = '/scratch/jmontenegro/samap/data/mm10/GRCm38.92_gene_transcript.map'
with open(mm10_map, 'r') as f:
mm10_map_list = []
for line in f:
parts = line.strip().split('\t')
if len(parts) == 2:
gene, transcript = parts
mm10_map_list.append((transcript, gene))
mm10_map_list
[('ENSMUST00000231106', 'ENSMUSG00000116508'), ('ENSMUST00000230662', 'ENSMUSG00000116514'), ('ENSMUST00000230244', 'ENSMUSG00000116516'), ('ENSMUST00000228976', 'ENSMUSG00000116518'), ('ENSMUST00000230732', 'ENSMUSG00000116518'), ('ENSMUST00000230045', 'ENSMUSG00000116519'), ('ENSMUST00000230696', 'ENSMUSG00000116520'), ('ENSMUST00000229187', 'ENSMUSG00000116521'), ('ENSMUST00000230927', 'ENSMUSG00000116525'), ('ENSMUST00000229675', 'ENSMUSG00000116528')
....
]
I did the same for the other 2 species and generated the nv2_map_list and spur_map_list. Once everything was ready I tried creating the samap object:
sams={'nv2':nv2_sam, 'mm10':mm10_sam, 'spur':spur_sam}
sm=SAMAP(sams, f_maps='/scratch/jmontenegro/samap/mmseqs/', names={'nv2':nv2_map_list, 'mm10':mm10_map_list, 'spur':spur_map_list})
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/lisc/app/conda/miniforge3/envs/samap-1.0.16/lib/python3.9/site-packages/samap/mapping.py", line 143, in __init__
gnnm, gns, gns_dict = _calculate_blast_graph(
File "/lisc/app/conda/miniforge3/envs/samap-1.0.16/lib/python3.9/site-packages/samap/mapping.py", line 1070, in _calculate_blast_graph
raise FileExistsError(
FileExistsError: BLAST mapping tables with the input IDs (nv2 and mm10) not found in the specified path.
I don't understand why the blast results cannot be found. And specifically, why the mm10 and nv2, but the spur seems to be ok? Could you point me in the right direction?
Best regards,