Skip to content

BLAST mapping tables not found #162

@jdmontenegro

Description

@jdmontenegro

Hi, I have a problem that I cannot sort out. I am doing a three species coparison from public single cell datasets that were processed with Seurat. I used the SeuratDisk to convert the lognormalized and annotated experiments into h5ad objects. I loaded the files and created sam objects and corrected batch effects as suggested in #83.

from samalg import SAM
import scanpy as sc

# load data into adata objects
mm10_adata = sc.read_h5ad("/scratch/jmontenegro/samap/data/mm10/sce_data.h5ad")
spur_adata = sc.read_h5ad("/scratch/jmontenegro/samap/data/spur4.2/spur4.2.h5ad)
nv2_adata = sc.read_h5ad("/scratch/jmontenegro/samap/data/nv2/Nv2.18-48h.h5ad")

# convert to sam objects
mm10_sam=SAM(counts=mm10_adata)
mm10_sam.run(batch_key="stage")
spur_sam=SAM(counts=spur_adata)
spur_sam.run(batch_key="orig.ident")
# correct the orig.ident into a category
nv2_adata.obs["orig.ident"] = nv2_adata.obs["orig.ident"].astype("category")
nv2_sam=SAM(counts=nv2_adata)
nv2_sam.run(batch_key="orig.ident")

I used mmseqs to generate all vs all blast-like tables between the transcripts of all three species and generated the following directory:

/scratch/jmontenegro/samap/mmseqs/ > ls -ltrh
total 804M
-rw-r--r-- 1 jmontenegro molevo 121M May 14 17:17 nv2_to_mm10.txt
-rw-r--r-- 1 jmontenegro molevo  81M May 14 17:19 nv2_to_spur.txt
-rw-r--r-- 1 jmontenegro molevo  86M May 14 17:24 spur_to_nv2.txt
-rw-r--r-- 1 jmontenegro molevo 181M May 14 17:29 mm10_to_nv2.txt
-rw-r--r-- 1 jmontenegro molevo 139M May 14 17:32 spur_to_mm10.txt
-rw-r--r-- 1 jmontenegro molevo 199M May 14 17:32 mm10_to_spur.txt

Because the transcripts' names do not match the gene names I created mapping tables and generated lists of tuples as suggested in #81:

mm10_map = '/scratch/jmontenegro/samap/data/mm10/GRCm38.92_gene_transcript.map'
with open(mm10_map, 'r') as f:
    mm10_map_list = []
    for line in f:
        parts = line.strip().split('\t')
        if len(parts) == 2:
            gene, transcript = parts
            mm10_map_list.append((transcript, gene))
mm10_map_list
[('ENSMUST00000231106', 'ENSMUSG00000116508'), ('ENSMUST00000230662', 'ENSMUSG00000116514'), ('ENSMUST00000230244', 'ENSMUSG00000116516'), ('ENSMUST00000228976', 'ENSMUSG00000116518'), ('ENSMUST00000230732', 'ENSMUSG00000116518'), ('ENSMUST00000230045', 'ENSMUSG00000116519'), ('ENSMUST00000230696', 'ENSMUSG00000116520'), ('ENSMUST00000229187', 'ENSMUSG00000116521'), ('ENSMUST00000230927', 'ENSMUSG00000116525'), ('ENSMUST00000229675', 'ENSMUSG00000116528') 
....
]

I did the same for the other 2 species and generated the nv2_map_list and spur_map_list. Once everything was ready I tried creating the samap object:

sams={'nv2':nv2_sam, 'mm10':mm10_sam, 'spur':spur_sam}
sm=SAMAP(sams, f_maps='/scratch/jmontenegro/samap/mmseqs/', names={'nv2':nv2_map_list, 'mm10':mm10_map_list, 'spur':spur_map_list})
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/lisc/app/conda/miniforge3/envs/samap-1.0.16/lib/python3.9/site-packages/samap/mapping.py", line 143, in __init__
    gnnm, gns, gns_dict = _calculate_blast_graph(
  File "/lisc/app/conda/miniforge3/envs/samap-1.0.16/lib/python3.9/site-packages/samap/mapping.py", line 1070, in _calculate_blast_graph
    raise FileExistsError(
FileExistsError: BLAST mapping tables with the input IDs (nv2 and mm10) not found in the specified path.

I don't understand why the blast results cannot be found. And specifically, why the mm10 and nv2, but the spur seems to be ok? Could you point me in the right direction?
Best regards,

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions