# Run BLAST

In [1]:
file1 = 'planarian_transcriptome.fasta'
type1 = 'nucl' #or 'prot' if file1 is a proteome
id1 = 'pl' #2-character ID (e.g. 'hu' for human)

file2 = 'schisto_transcriptome.fasta'

type2 = 'nucl'
id2 = 'sc' #2-character ID (e.g. 'mo' for mouse)
!bash map_genes.sh --tr1 {file1} --t1 {type1} --n1 {id1} --tr2 {file2} --t2 {type2} --n2 {id2}

Running tblastx in both directions
^C


# Run SAMap

In [1]:
from samap.mapping import SAMAP
from samap.analysis import get_mapping_scores, GenePairFinder
from samalg import SAM

SAMap accepts file paths to unprocessed, raw `.h5ad` files. Alternatively, if you already have processed `SAM` objects, you can pass them in directly.


Before running SAMap, you should have run `map_genes.sh`, which expects a 2-character identifier for each species. For example, planarians and schistosomes might get the identifiers `pl` and `sc`, respectively. `map_genes.sh` generates a `maps/` directory containing the BLAST results for the transcriptome mapping. Pass the same species IDs and the path to the `maps/` directory to SAMap.
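Because a mistyped ID or a wrong `maps/` path only surfaces once SAMap starts, a small pre-flight check can save a failed run. The helper below is a minimal sketch; `check_samap_inputs` is a hypothetical name and is not part of SAMap. It only checks the two constraints stated above (2-character IDs and an existing directory).

```python
from pathlib import Path


def check_samap_inputs(id1, id2, f_maps):
    """Hypothetical pre-flight check (not part of SAMap)."""
    # map_genes.sh expects 2-character species identifiers.
    for species_id in (id1, id2):
        if len(species_id) != 2:
            raise ValueError(
                f"species ID {species_id!r} must be exactly 2 characters"
            )
    # The BLAST step should have produced this directory.
    maps_dir = Path(f_maps)
    if not maps_dir.is_dir():
        raise FileNotFoundError(f"maps directory not found: {maps_dir}")
    return maps_dir
```

Calling `check_samap_inputs('pl', 'sc', 'maps/')` before constructing `SAMAP` would raise early if either input is off.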

In [2]:
id1 = 'pl'
id2 = 'sc'

In [4]:
# passing in file names (SAMap will process the data with SAM and save the resulting objects to two `.h5ad` files.)
fn1 = '/media/storage/Dropbox/scrna/notebooks/data/planarian.h5ad' #processed data will be automatically saved to `/path/to/file/file1_pr.h5ad`
fn2 = '/media/storage/Dropbox/scrna/notebooks/data/schisto.h5ad' #processed data will be automatically saved to `/path/to/file/file2_pr.h5ad`
# runs SAMAP (f_maps should be the path to the 'maps' directory generated in the BLAST step above)
sm = SAMAP(fn1,fn2,id1,id2,f_maps = '/media/storage/Dropbox/scrna/notebooks/maps/')
samap = sm.run()

"""
# passing in already-processed SAM objects
sam1=SAM()
sam2=SAM()
sam1.load_data('/path/to/file1_pr.h5ad')
sam2.load_data('/path/to/file2_pr.h5ad')

sm = SAMAP(sam1,sam2,id1,id2,f_maps = 'maps/')
samap = sm.run()
"""

Preparing data 1 for SAMap.
Preparing data 2 for SAMap.
11630 `pl` genes and 7427 `sc` gene symbols match between the datasets and the BLAST graph.
Stitching SAM 0 and SAM 1
Found 88729 gene pairs
Recomputing PC projections with gene pair subsets...
Running hsnwlib
Using leiden_clusters and leiden_clusters cluster labels.
Out-neighbor smart expansion 1
Out-neighbor smart expansion 2
Indegree coarsening
0/3 (0, 7657) True
1/3 (20000, 7657) True
2/3 (40000, 7657) True
Concatenating SAM objects...
ITERATION: 0 
Average alignment score (A.S.):  0.4494758679567957 
Max A.S. improvement: 0.5895449249249848 
Min A.S. improvement: 0.0
Calculating gene-gene correlations in the homology graph...
Stitching SAM 0 and SAM 1
Found 36515 gene pairs
Recomputing PC projections with gene pair subsets...
Running hsnwlib
Using leiden_clusters and leiden_clusters cluster labels.
Out-neighbor smart expansion 1
Out-neighbor smart expansion 2
Indegree coarsening
0/3 (0, 7657) True
1/3 (20000, 7657) True
2/3 (

To calculate alignment scores between cell types, we can use `get_mapping_scores`. This function will use the combined SAM object produced by SAMap to calculate alignment scores between cell types in the provided cell type annotation columns of `sam.adata.obs`. If no cell type annotations exist, the leiden clusters generated by SAM can be used (`k1=k2='leiden_clusters'`).

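The resulting tables list, for each cell type in organism 1 (`D1`) and organism 2 (`D2`), the cell types of the other organism ranked by alignment score, highest first. `MappingTable` holds the pairwise alignment scores between cell types.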

In [5]:
k1 = 'cluster' #cell types annotation key in `sm.sam1.adata.obs`
k2 = 'tissue' #cell types annotation key in `sm.sam2.adata.obs`
D1,D2,MappingTable = get_mapping_scores(sm,k1,k2)

In [6]:
D1

Unnamed: 0_level_0,pl_Neoblast: 0,pl_Neoblast: 0,pl_Cathepsin+ cells: 10,pl_Cathepsin+ cells: 10,pl_Muscle: 14,pl_Muscle: 14,pl_Epidermal: 2,pl_Epidermal: 2,pl_Intestine: 30,pl_Intestine: 30,pl_Protonephridia: 29,pl_Protonephridia: 29,pl_Muscle: 13,pl_Muscle: 13,pl_Epidermal: 11,pl_Epidermal: 11,pl_Neural: 8,pl_Neural: 8,pl_Epidermal: 3,pl_Epidermal: 3,pl_Cathepsin+ cells: 4,pl_Cathepsin+ cells: 4,pl_Neoblast: 5,pl_Neoblast: 5,pl_Neural: 20,pl_Neural: 20,pl_Epidermal: 35,pl_Epidermal: 35,pl_Pharynx: 37,pl_Pharynx: 37,pl_Muscle: 7,pl_Muscle: 7,pl_Cathepsin+ cells: 28,pl_Cathepsin+ cells: 28,pl_Neural: 23,pl_Neural: 23,pl_Neural: 1,pl_Neural: 1,pl_Intestine: 6,pl_Intestine: 6,pl_Epidermal: 24,pl_Epidermal: 24,pl_Neural: 18,pl_Neural: 18,pl_Neoblast: 22,pl_Neoblast: 22,pl_Parapharyngeal: 38,pl_Parapharyngeal: 38,pl_Muscle: 16,pl_Muscle: 16,pl_Pharynx: 25,pl_Pharynx: 25,pl_Neural: 9,pl_Neural: 9,pl_Parapharyngeal: 12,pl_Parapharyngeal: 12,pl_Protonephridia: 26,pl_Protonephridia: 26,pl_Neural: 21,pl_Neural: 21,pl_Cathepsin+ cells: 15,pl_Cathepsin+ cells: 15,pl_Cathepsin+ cells: 17,pl_Cathepsin+ cells: 17,pl_Neural: 33,pl_Neural: 33,pl_Protonephridia: 40,pl_Protonephridia: 40,pl_Intestine: 19,pl_Intestine: 19,pl_Neural: 32,pl_Neural: 32,pl_Neural: 36,pl_Neural: 36,pl_Neural: 34,pl_Neural: 34,pl_Cathepsin+ cells: 39,pl_Cathepsin+ cells: 39,pl_Pharynx: 27,pl_Pharynx: 27,pl_Cathepsin+ cells: 42,pl_Cathepsin+ cells: 42,pl_Cathepsin+ cells: 31,pl_Cathepsin+ cells: 31,pl_Intestine: 43,pl_Intestine: 43,pl_Cathepsin+ cells: 41,pl_Cathepsin+ cells: 41
Unnamed: 0_level_1,Cluster,Alignment score,Cluster,Alignment score,Cluster,Alignment score,Cluster,Alignment score,Cluster,Alignment score,Cluster,Alignment score,Cluster,Alignment score,Cluster,Alignment score,Cluster,Alignment score,Cluster,Alignment score,Cluster,Alignment score,Cluster,Alignment score,Cluster,Alignment score,Cluster,Alignment score,Cluster,Alignment score,Cluster,Alignment score,Cluster,Alignment score,Cluster,Alignment score,Cluster,Alignment score,Cluster,Alignment score,Cluster,Alignment score,Cluster,Alignment score,Cluster,Alignment score,Cluster,Alignment score,Cluster,Alignment score,Cluster,Alignment score,Cluster,Alignment score,Cluster,Alignment score,Cluster,Alignment score,Cluster,Alignment score,Cluster,Alignment score,Cluster,Alignment score,Cluster,Alignment score,Cluster,Alignment score,Cluster,Alignment score,Cluster,Alignment score,Cluster,Alignment score,Cluster,Alignment score,Cluster,Alignment score,Cluster,Alignment score,Cluster,Alignment score,Cluster,Alignment score,Cluster,Alignment score,Cluster,Alignment score
0,sc_Neoblast,0.598764,sc_Cathepsin,0.58303,sc_Muscle,0.551705,sc_Tegument_prog,0.529709,sc_Intestine,0.423742,sc_Flame cells,0.384437,sc_Muscle,0.377412,sc_Tegument,0.36336,sc_Neural,0.343862,sc_Tegument_prog,0.329127,sc_Cathepsin,0.284357,sc_Neoblast,0.271645,sc_Neural,0.260733,sc_Tegument,0.258121,sc_Tegument,0.250161,sc_Muscle,0.242052,sc_Cathepsin,0.223917,sc_Neural,0.208507,sc_Neural,0.185195,sc_Tegument_prog,0.173964,sc_Tegument,0.163489,sc_Neural,0.154325,sc_Neoblast,0.153925,sc_Gland,0.152423,sc_Muscle,0.146065,sc_Tegument,0.139406,sc_Neural,0.122546,sc_Parenchymal,0.122045,sc_Flame cells,0.12063,sc_Neural,0.115261,sc_Cathepsin,0.075856,sc_Cathepsin,0.0687788,sc_Neural,0.066575,sc_Tegument,0.0475718,sc_Cathepsin,0.0449842,sc_Neural_KK7,0.0432449,sc_Tegument_prog,0.0417364,sc_Neural,0.0311426,sc_Cathepsin,0.0252688,sc_Neural,0.0234031,sc_Cathepsin,0.00965553,sc_Cathepsin,0.00865499,sc_Cathepsin,0.00612058,sc_Cathepsin,0.00399417
1,sc_Tegument_prog,0.000372316,sc_Tegument_prog,0.00652464,sc_Intestine,0.0221045,sc_Tegument,0.0110845,sc_Cathepsin,0.0359549,sc_Cathepsin,0.010683,sc_Neural,0.000674606,sc_Tegument_prog,0.0916215,sc_Neural_KK7,0.0103679,sc_Parenchymal,0.0589594,sc_Neoblast,0.0461106,sc_Parenchymal,0.0172562,sc_Tegument_prog,0.00115135,sc_Tegument_prog,0.0616554,sc_Tegument_prog,0.0163983,sc_Parenchymal,0.0222642,sc_Tegument_prog,0.00262725,sc_Cathepsin,0.00138224,sc_Parenchymal,0.0875216,sc_Cathepsin,0.0990904,sc_Neural,0.0451302,sc_Parenchymal,0.00538424,sc_Parenchymal,0.00415196,sc_Tegument_prog,0.00494203,sc_Intestine,0.000538733,sc_Tegument_prog,0.0406316,sc_Muscle,0.00489054,sc_Cathepsin,0.0854291,sc_Parenchymal,0.0207327,sc_Neoblast,0.00364086,sc_Parenchymal,0.00650258,sc_Muscle,0.00632711,sc_Muscle,0.000355507,sc_Flame cells,0.0358028,sc_Parenchymal,0.0194232,sc_Cathepsin,0.0143484,sc_Muscle,0.00310874,sc_Muscle,0.00211209,sc_Muscle,0.00266928,sc_Tegument,0.000546309,sc_Tegument_prog,0.000454047,sc_Neural,0.000191134,sc_Intestine,0.00286722,sc_Muscle,5.30182e-05
2,sc_Neural_KK7,0.000255347,sc_Tegument,0.00312608,sc_Neural_KK7,0.007516,sc_Parenchymal,0.00218263,sc_Gland,0.0266434,sc_Neural,0.00523998,sc_Neural_KK7,0.000261405,sc_Cathepsin,0.0410767,sc_Muscle,0.00518293,sc_Neoblast,0.0427878,sc_Parenchymal,0.03626,sc_Neural,0.00284452,sc_Neoblast,0.000850726,sc_Intestine,0.000743582,sc_Neural,0.000910168,sc_Neoblast,0.0219356,sc_Neoblast,0.001225,sc_Muscle,0.00112905,sc_Neoblast,0.0304383,sc_Neoblast,0.03546,sc_Tegument_prog,0.0114,sc_Neoblast,0.00112168,sc_Cathepsin,0.00214021,sc_Neoblast,0.00336135,sc_Neural_KK7,0.00044142,sc_Parenchymal,0.0112741,sc_Parenchymal,0.000518203,sc_Neural,0.0841057,sc_Cathepsin,0.00655298,sc_Tegument_prog,0.00083906,sc_Muscle,0.00389578,sc_Neural,0.00360345,sc_Parenchymal,9.39531e-05,sc_Tegument_prog,0.00383181,sc_Tegument_prog,0.00822886,sc_Tegument,0.011094,sc_Parenchymal,0.00310838,sc_Parenchymal,0.00079361,sc_Tegument_prog,0.000513047,sc_Muscle,0.000439206,sc_Neural,0.000213625,sc_Parenchymal,6.51859e-05,sc_Tegument_prog,0.000684832,sc_Tegument_prog,0.0
3,sc_Parenchymal,0.000238063,sc_Neoblast,0.00255907,sc_Neoblast,0.00177825,sc_Neural,0.000533285,sc_Tegument,0.0072613,sc_Tegument,0.000344565,sc_Intestine,0.000218814,sc_Parenchymal,0.00633025,sc_Parenchymal,0.00344101,sc_Neural,0.00991286,sc_Tegument_prog,0.00465219,sc_Cathepsin,0.0026979,sc_Flame cells,0.000820837,sc_Neoblast,0.000651279,sc_Cathepsin,8.10503e-06,sc_Intestine,0.00316284,sc_Muscle,0.00109694,sc_Parenchymal,0.000897407,sc_Cathepsin,0.00274775,sc_Parenchymal,0.0280967,sc_Cathepsin,0.00821673,sc_Tegument_prog,0.00104577,sc_Neural_KK7,0.000489673,sc_Muscle,0.000534204,sc_Neoblast,0.000230289,sc_Neoblast,0.00608962,sc_Tegument,0.000317393,sc_Muscle,0.0135867,sc_Neural,0.00590324,sc_Flame cells,0.000298878,sc_Neoblast,0.00249855,sc_Parenchymal,0.000159807,sc_Tegument_prog,8.14172e-05,sc_Cathepsin,0.00212236,sc_Neoblast,0.00796742,sc_Parenchymal,0.00330107,sc_Tegument,0.00269645,sc_Neural_KK7,0.000312165,sc_Neural,0.00022178,sc_Flame cells,4.2193e-05,sc_Muscle,1.72094e-05,sc_Neoblast,2.85698e-05,sc_Tegument,0.000312964,sc_Tegument,0.0
4,sc_Cathepsin,0.000176026,sc_Intestine,0.00213583,sc_Neural,0.000827803,sc_Neoblast,0.000321612,sc_Muscle,0.00262176,sc_Tegument_prog,0.000295734,sc_Cathepsin,2.26788e-05,sc_Neural,0.00209702,sc_Cathepsin,0.00178197,sc_Tegument,0.00331855,sc_Neural,0.00271026,sc_Muscle,0.000482277,sc_Parenchymal,0.000635242,sc_Gland,0.000296811,sc_Parenchymal,0.0,sc_Neural,0.00115635,sc_Neural,0.000477838,sc_Flame cells,0.000601598,sc_Muscle,0.00244264,sc_Muscle,0.0017991,sc_Flame cells,0.00458293,sc_Intestine,0.000812372,sc_Tegument_prog,0.000200495,sc_Cathepsin,0.000185859,sc_Parenchymal,0.000174243,sc_Cathepsin,0.00606807,sc_Cathepsin,0.000193865,sc_Tegument_prog,0.0124812,sc_Neoblast,0.00570168,sc_Parenchymal,0.000269765,sc_Neural,0.00157465,sc_Neoblast,5.2646e-05,sc_Neoblast,7.26552e-06,sc_Intestine,0.000954054,sc_Gland,0.00132846,sc_Muscle,0.000518912,sc_Neural,0.00129922,sc_Cathepsin,0.00025914,sc_Neoblast,5.27963e-05,sc_Tegument_prog,0.0,sc_Tegument,0.0,sc_Muscle,2.61002e-05,sc_Neural,2.56224e-05,sc_Parenchymal,0.0
5,sc_Neural,0.000154783,sc_Muscle,0.00204314,sc_Cathepsin,0.000182522,sc_Cathepsin,6.92936e-05,sc_Tegument_prog,0.00165449,sc_Parenchymal,0.000211041,sc_Tegument_prog,1.7636e-05,sc_Muscle,0.00124177,sc_Intestine,0.000681193,sc_Cathepsin,0.00273008,sc_Muscle,0.00210427,sc_Tegument_prog,0.000207564,sc_Cathepsin,0.000576514,sc_Parenchymal,0.000232921,sc_Neural_KK7,0.0,sc_Tegument_prog,0.000619894,sc_Intestine,0.000336652,sc_Tegument_prog,0.000547261,sc_Tegument_prog,0.000435221,sc_Neural,0.000804331,sc_Parenchymal,0.00366316,sc_Muscle,0.000790871,sc_Neural,0.000161616,sc_Parenchymal,8.4048e-05,sc_Neural,0.000163775,sc_Neural,0.00453688,sc_Neoblast,8.80583e-05,sc_Neoblast,0.0121177,sc_Tegument_prog,0.00110823,sc_Cathepsin,0.000169934,sc_Tegument_prog,0.00144615,sc_Tegument_prog,4.25988e-05,sc_Tegument,0.0,sc_Muscle,0.000557493,sc_Muscle,0.00105023,sc_Neural,0.000474062,sc_Neoblast,0.00124665,sc_Gland,0.000123139,sc_Tegument,0.0,sc_Parenchymal,0.0,sc_Parenchymal,0.0,sc_Tegument_prog,1.97785e-05,sc_Muscle,2.38216e-05,sc_Neural_KK7,0.0
6,sc_Muscle,0.000148423,sc_Neural,0.00149881,sc_Tegument_prog,0.000160643,sc_Muscle,6.42681e-05,sc_Neural,3.29212e-05,sc_Muscle,0.00011042,sc_Neoblast,1.39472e-05,sc_Intestine,0.000308751,sc_Neoblast,0.000459836,sc_Muscle,0.00139009,sc_Tegument,0.0002527,sc_Intestine,0.000157151,sc_Muscle,0.00040614,sc_Cathepsin,5.98466e-05,sc_Neoblast,0.0,sc_Cathepsin,0.000446071,sc_Tegument,0.000223497,sc_Neoblast,0.0003554,sc_Neural_KK7,0.00031435,sc_Tegument,0.00030405,sc_Muscle,0.000482012,sc_Cathepsin,0.000716543,sc_Muscle,4.1107e-05,sc_Neural,6.57768e-05,sc_Gland,0.000145394,sc_Intestine,0.0027424,sc_Intestine,2.80718e-05,sc_Tegument,0.00746565,sc_Muscle,0.000860428,sc_Intestine,0.000118023,sc_Tegument,0.000157659,sc_Tegument,0.0,sc_Neural_KK7,0.0,sc_Gland,0.000211234,sc_Intestine,0.000992771,sc_Neoblast,1.49269e-05,sc_Cathepsin,0.000252654,sc_Neoblast,9.06908e-05,sc_Parenchymal,0.0,sc_Neural_KK7,0.0,sc_Neural_KK7,0.0,sc_Tegument,0.0,sc_Neoblast,4.86255e-06,sc_Neural,0.0
7,sc_Intestine,3.65333e-06,sc_Gland,0.000367092,sc_Tegument,0.0,sc_Gland,5.64936e-05,sc_Parenchymal,0.0,sc_Neoblast,3.25973e-05,sc_Tegument,0.0,sc_Gland,0.000249147,sc_Tegument_prog,0.000417833,sc_Gland,0.000281335,sc_Intestine,0.000123119,sc_Tegument,4.71704e-05,sc_Gland,0.000102242,sc_Neural,4.23274e-05,sc_Muscle,0.0,sc_Neural_KK7,0.000398095,sc_Flame cells,2.35555e-05,sc_Intestine,0.000296857,sc_Intestine,0.000122504,sc_Intestine,0.000178186,sc_Intestine,7.97385e-05,sc_Neural_KK7,6.71567e-05,sc_Tegument,0.0,sc_Tegument,0.0,sc_Cathepsin,0.000144754,sc_Muscle,0.000313065,sc_Neural_KK7,2.69076e-05,sc_Gland,0.00472098,sc_Intestine,0.000615579,sc_Tegument,0.000116282,sc_Intestine,0.000118806,sc_Neural_KK7,0.0,sc_Intestine,0.0,sc_Neural,9.85793e-05,sc_Neural,0.000773393,sc_Tegument_prog,1.38991e-05,sc_Neural_KK7,0.000121057,sc_Tegument_prog,0.0,sc_Neural_KK7,0.0,sc_Neoblast,0.0,sc_Neoblast,0.0,sc_Neural_KK7,0.0,sc_Parenchymal,0.0,sc_Neoblast,0.0
8,sc_Flame cells,2.41592e-06,sc_Flame cells,0.000176834,sc_Parenchymal,0.0,sc_Intestine,2.62418e-05,sc_Neural_KK7,0.0,sc_Neural_KK7,0.0,sc_Parenchymal,0.0,sc_Neoblast,3.57892e-05,sc_Gland,0.000322435,sc_Intestine,0.000229421,sc_Gland,2.17256e-05,sc_Neural_KK7,1.05724e-05,sc_Neural_KK7,6.38496e-06,sc_Neural_KK7,0.0,sc_Intestine,0.0,sc_Gland,0.000100789,sc_Parenchymal,0.0,sc_Gland,0.000166187,sc_Gland,3.31701e-05,sc_Gland,0.000107772,sc_Neural_KK7,0.0,sc_Tegument,5.34048e-05,sc_Intestine,0.0,sc_Neural_KK7,0.0,sc_Tegument_prog,0.000119982,sc_Gland,0.000138008,sc_Tegument_prog,1.62621e-05,sc_Neural_KK7,6.91089e-05,sc_Tegument,0.000151646,sc_Muscle,6.91974e-05,sc_Neural_KK7,4.321e-05,sc_Intestine,0.0,sc_Gland,0.0,sc_Parenchymal,4.63048e-05,sc_Tegument,0.000312033,sc_Intestine,0.0,sc_Intestine,5.31115e-05,sc_Tegument,0.0,sc_Intestine,0.0,sc_Intestine,0.0,sc_Intestine,0.0,sc_Intestine,0.0,sc_Neural_KK7,0.0,sc_Intestine,0.0
9,sc_Tegument,0.0,sc_Parenchymal,9.91723e-05,sc_Gland,0.0,sc_Neural_KK7,0.0,sc_Neoblast,0.0,sc_Intestine,0.0,sc_Gland,0.0,sc_Neural_KK7,0.0,sc_Tegument,2.56126e-05,sc_Neural_KK7,2.04767e-05,sc_Flame cells,1.6122e-05,sc_Gland,0.0,sc_Tegument,0.0,sc_Muscle,0.0,sc_Gland,0.0,sc_Tegument,0.0,sc_Neural_KK7,0.0,sc_Tegument,5.40327e-05,sc_Tegument,3.03452e-05,sc_Flame cells,5.68548e-05,sc_Neoblast,0.0,sc_Gland,5.01361e-05,sc_Gland,0.0,sc_Intestine,0.0,sc_Tegument,0.0,sc_Neural_KK7,9.5971e-06,sc_Gland,0.0,sc_Intestine,8.55421e-06,sc_Gland,3.12583e-05,sc_Neural_KK7,2.54723e-05,sc_Flame cells,1.86445e-05,sc_Gland,0.0,sc_Flame cells,0.0,sc_Neoblast,3.80496e-05,sc_Neural_KK7,0.0,sc_Gland,0.0,sc_Gland,0.0,sc_Intestine,0.0,sc_Gland,0.0,sc_Gland,0.0,sc_Gland,0.0,sc_Gland,0.0,sc_Gland,0.0,sc_Gland,0.0


In [7]:
D2

Unnamed: 0_level_0,sc_Neoblast,sc_Neoblast,sc_Cathepsin,sc_Cathepsin,sc_Muscle,sc_Muscle,sc_Tegument_prog,sc_Tegument_prog,sc_Intestine,sc_Intestine,sc_Flame cells,sc_Flame cells,sc_Tegument,sc_Tegument,sc_Neural,sc_Neural,sc_Gland,sc_Gland,sc_Parenchymal,sc_Parenchymal,sc_Neural_KK7,sc_Neural_KK7
Unnamed: 0_level_1,Cluster,Alignment score,Cluster,Alignment score,Cluster,Alignment score,Cluster,Alignment score,Cluster,Alignment score,Cluster,Alignment score,Cluster,Alignment score,Cluster,Alignment score,Cluster,Alignment score,Cluster,Alignment score,Cluster,Alignment score
0,pl_Neoblast: 0,0.598764,pl_Cathepsin+ cells: 10,0.58303,pl_Muscle: 14,0.551705,pl_Epidermal: 2,0.529709,pl_Intestine: 30,0.423742,pl_Protonephridia: 29,0.384437,pl_Epidermal: 11,0.36336,pl_Neural: 8,0.343862,pl_Parapharyngeal: 38,0.152423,pl_Parapharyngeal: 12,0.122045,pl_Neural: 32,0.0432449
1,pl_Neoblast: 5,0.271645,pl_Cathepsin+ cells: 4,0.284357,pl_Muscle: 13,0.377412,pl_Epidermal: 3,0.329127,pl_Muscle: 14,0.0221045,pl_Protonephridia: 26,0.12063,pl_Epidermal: 35,0.258121,pl_Neural: 20,0.260733,pl_Intestine: 30,0.0266434,pl_Neural: 1,0.0875216,pl_Neural: 8,0.0103679
2,pl_Neoblast: 22,0.153925,pl_Cathepsin+ cells: 28,0.223917,pl_Muscle: 7,0.242052,pl_Intestine: 6,0.173964,pl_Muscle: 7,0.00316284,pl_Protonephridia: 40,0.0358028,pl_Pharynx: 37,0.250161,pl_Neural: 23,0.208507,pl_Parapharyngeal: 12,0.00472098,pl_Epidermal: 3,0.0589594,pl_Muscle: 14,0.007516
3,pl_Cathepsin+ cells: 4,0.0461106,pl_Intestine: 6,0.0990904,pl_Muscle: 16,0.146065,pl_Epidermal: 11,0.0916215,pl_Intestine: 43,0.00286722,pl_Epidermal: 24,0.00458293,pl_Epidermal: 24,0.163489,pl_Neural: 1,0.185195,pl_Intestine: 19,0.00132846,pl_Cathepsin+ cells: 4,0.03626,pl_Neoblast: 22,0.000489673
4,pl_Epidermal: 3,0.0427878,pl_Parapharyngeal: 12,0.0854291,pl_Parapharyngeal: 12,0.0135867,pl_Epidermal: 35,0.0616554,pl_Pharynx: 25,0.0027424,pl_Neural: 20,0.000820837,pl_Pharynx: 25,0.139406,pl_Neural: 18,0.154325,pl_Cathepsin+ cells: 10,0.000367092,pl_Intestine: 6,0.0280967,pl_Muscle: 16,0.00044142
5,pl_Intestine: 6,0.03546,pl_Cathepsin+ cells: 15,0.075856,pl_Cathepsin+ cells: 17,0.00632711,pl_Neural: 36,0.0417364,pl_Cathepsin+ cells: 10,0.00213583,pl_Neural: 23,0.000601598,pl_Protonephridia: 40,0.0475718,pl_Neural: 9,0.122546,pl_Neural: 8,0.000322435,pl_Muscle: 7,0.0222642,pl_Muscle: 7,0.000398095
6,pl_Neural: 1,0.0304383,pl_Cathepsin+ cells: 17,0.0687788,pl_Neural: 8,0.00518293,pl_Pharynx: 25,0.0406316,pl_Intestine: 19,0.000992771,pl_Neural: 21,0.000298878,pl_Neural: 32,0.011094,pl_Neural: 21,0.115261,pl_Epidermal: 35,0.000296811,pl_Protonephridia: 26,0.0207327,pl_Neural: 1,0.00031435
7,pl_Muscle: 7,0.0219356,pl_Intestine: 19,0.0449842,pl_Neural: 9,0.00489054,pl_Pharynx: 37,0.0163983,pl_Protonephridia: 40,0.000954054,pl_Cathepsin+ cells: 10,0.000176834,pl_Epidermal: 2,0.0110845,pl_Parapharyngeal: 12,0.0841057,pl_Epidermal: 3,0.000281335,pl_Intestine: 19,0.0194232,pl_Neural: 34,0.000312165
8,pl_Parapharyngeal: 12,0.0121177,pl_Epidermal: 11,0.0410767,pl_Cathepsin+ cells: 15,0.00389578,pl_Parapharyngeal: 12,0.0124812,pl_Neural: 18,0.000812372,pl_Intestine: 6,5.68548e-05,pl_Parapharyngeal: 12,0.00746565,pl_Neural: 33,0.066575,pl_Epidermal: 11,0.000249147,pl_Neoblast: 5,0.0172562,pl_Muscle: 13,0.000261405
9,pl_Intestine: 19,0.00796742,pl_Intestine: 30,0.0359549,pl_Neural: 36,0.00310874,pl_Epidermal: 24,0.0114,pl_Epidermal: 35,0.000743582,pl_Pharynx: 27,4.2193e-05,pl_Intestine: 30,0.0072613,pl_Epidermal: 24,0.0451302,pl_Protonephridia: 40,0.000211234,pl_Pharynx: 25,0.0112741,pl_Neoblast: 0,0.000255347


In [8]:
MappingTable

Unnamed: 0,pl_Cathepsin+ cells: 10,pl_Cathepsin+ cells: 15,pl_Cathepsin+ cells: 17,pl_Cathepsin+ cells: 28,pl_Cathepsin+ cells: 31,pl_Cathepsin+ cells: 39,pl_Cathepsin+ cells: 4,pl_Cathepsin+ cells: 41,pl_Cathepsin+ cells: 42,pl_Epidermal: 11,pl_Epidermal: 2,pl_Epidermal: 24,pl_Epidermal: 3,pl_Epidermal: 35,pl_Intestine: 19,pl_Intestine: 30,pl_Intestine: 43,pl_Intestine: 6,pl_Muscle: 13,pl_Muscle: 14,pl_Muscle: 16,pl_Muscle: 7,pl_Neoblast: 0,pl_Neoblast: 22,pl_Neoblast: 5,pl_Neural: 1,pl_Neural: 18,pl_Neural: 20,pl_Neural: 21,pl_Neural: 23,pl_Neural: 32,pl_Neural: 33,pl_Neural: 34,pl_Neural: 36,pl_Neural: 8,pl_Neural: 9,pl_Parapharyngeal: 12,pl_Parapharyngeal: 38,pl_Pharynx: 25,pl_Pharynx: 27,pl_Pharynx: 37,pl_Protonephridia: 26,pl_Protonephridia: 29,pl_Protonephridia: 40
sc_Cathepsin,0.58303,0.075856,0.068779,0.223917,0.008655,0.025269,0.284357,0.003994,0.009656,0.041077,6.9e-05,0.008217,0.00273,6e-05,0.044984,0.035955,0.006121,0.09909,2.3e-05,0.000183,0.000145,0.000446,0.000176,0.00214,0.002698,0.002748,0.000717,0.000577,0.00017,0.001382,0.014348,0.0,0.000259,0.000253,0.001782,0.000194,0.085429,0.000186,0.006068,0.0,8e-06,0.006553,0.010683,0.002122
sc_Flame cells,0.000177,1.9e-05,0.0,2.4e-05,0.0,0.0,1.6e-05,0.0,0.0,0.0,0.0,0.004583,1.7e-05,0.0,0.0,0.0,0.0,5.7e-05,0.0,0.0,0.0,0.0,2e-06,0.0,0.0,0.0,3.7e-05,0.000821,0.000299,0.000602,0.0,0.0,0.0,0.0,1.2e-05,0.0,0.0,0.0,0.0,4.2e-05,0.0,0.12063,0.384437,0.035803
sc_Gland,0.000367,0.0,0.0,0.0,0.0,0.0,2.2e-05,0.0,0.0,0.000249,5.6e-05,0.0,0.000281,0.000297,0.001328,0.026643,0.0,0.000108,0.0,0.0,0.000145,0.000101,0.0,0.0,0.0,3.3e-05,5e-05,0.000102,0.0,0.000166,0.0,0.0,0.000123,0.0,0.000322,0.0,0.004721,0.152423,0.000138,0.0,0.0,3.1e-05,0.0,0.000211
sc_Intestine,0.002136,0.000119,0.0,0.000337,0.0,0.0,0.000123,0.0,0.0,0.000309,2.6e-05,8e-05,0.000229,0.000744,0.000993,0.423742,0.002867,0.000178,0.000219,0.022104,0.000539,0.003163,4e-06,0.0,0.000157,0.000123,0.000812,0.0,0.000118,0.000297,0.0,0.0,0.0,5.3e-05,0.000681,2.8e-05,9e-06,0.0,0.002742,0.0,0.0,0.000616,0.0,0.000954
sc_Muscle,0.002043,0.003896,0.006327,0.001097,2.6e-05,0.002669,0.002104,5.3e-05,1.7e-05,0.001242,6.4e-05,0.000482,0.00139,0.0,0.00105,0.002622,2.4e-05,0.001799,0.377412,0.551705,0.146065,0.242052,0.000148,4.1e-05,0.000482,0.002443,0.000791,0.000406,6.9e-05,0.001129,0.000519,0.000356,0.002112,0.003109,0.005183,0.004891,0.013587,0.000534,0.000313,0.000439,0.0,0.00086,0.00011,0.000557
sc_Neoblast,0.002559,0.002499,5.3e-05,0.001225,2.9e-05,5.3e-05,0.046111,0.0,0.0,3.6e-05,0.000322,0.0,0.042788,0.000651,0.007967,0.0,5e-06,0.03546,1.4e-05,0.001778,0.00023,0.021936,0.598764,0.153925,0.271645,0.030438,0.001122,0.000851,0.003641,0.000355,1.5e-05,7e-06,9.1e-05,0.001247,0.00046,8.8e-05,0.012118,0.003361,0.00609,0.0,0.0,0.005702,3.3e-05,3.8e-05
sc_Neural,0.001499,0.001575,0.003603,0.000478,0.000191,0.000222,0.00271,0.0,0.000214,0.002097,0.000533,0.04513,0.009913,4.2e-05,0.000773,3.3e-05,2.6e-05,0.000804,0.000675,0.000828,0.000164,0.001156,0.000155,0.000162,0.002845,0.185195,0.154325,0.260733,0.115261,0.208507,0.000474,0.066575,0.031143,0.001299,0.343862,0.122546,0.084106,6.6e-05,0.004537,0.023403,0.00091,0.005903,0.00524,9.9e-05
sc_Neural_KK7,5.2e-05,4.3e-05,0.0,0.0,0.0,0.0,1.3e-05,0.0,0.0,0.0,0.0,0.0,2e-05,0.0,0.0,0.0,0.0,2.2e-05,0.000261,0.007516,0.000441,0.000398,0.000255,0.00049,1.1e-05,0.000314,6.7e-05,6e-06,2.5e-05,1e-05,0.043245,0.0,0.000312,0.000121,0.010368,2.7e-05,6.9e-05,0.0,1e-05,0.0,0.0,0.0,0.0,0.0
sc_Parenchymal,9.9e-05,0.006503,0.00016,0.0,6.5e-05,0.0,0.03626,0.0,0.0,0.00633,0.002183,0.003663,0.058959,0.000233,0.019423,0.0,0.0,0.028097,0.0,0.0,0.000174,0.022264,0.000238,0.004152,0.017256,0.087522,0.005384,0.000635,0.00027,0.000897,0.003301,9.4e-05,0.000794,0.003108,0.003441,0.000518,0.122045,8.4e-05,0.011274,0.0,0.0,0.020733,0.000211,4.6e-05
sc_Tegument,0.003126,0.000158,0.0,0.000223,0.0,0.0,0.000253,0.0,0.0,0.36336,0.011084,0.163489,0.003319,0.258121,0.000312,0.007261,0.000313,0.000304,0.0,0.0,0.0,0.0,0.0,0.0,4.7e-05,3e-05,5.3e-05,0.0,0.000116,5.4e-05,0.011094,0.0,0.0,0.002696,2.6e-05,0.000317,0.007466,0.0,0.139406,0.000546,0.250161,0.000152,0.000345,0.047572
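To pull the best match for each cell type out of a table shaped like the `MappingTable` output above, plain pandas is enough. The sketch below rebuilds a three-by-three excerpt by hand (values copied from the output above) so that it runs without SAMap; with the real object you would use `MappingTable` in place of `table`, assuming it is a regular pandas `DataFrame` as the output suggests.

```python
import pandas as pd

# Excerpt of the MappingTable output above:
# rows are organism 2 cell types, columns are organism 1 cell types.
table = pd.DataFrame(
    {
        "pl_Neoblast: 0": [0.598764, 0.000148, 0.000155],
        "pl_Muscle: 14": [0.001778, 0.551705, 0.000828],
        "pl_Neural: 8": [0.00046, 0.005183, 0.343862],
    },
    index=["sc_Neoblast", "sc_Muscle", "sc_Neural"],
)

# For every organism 1 cell type (column), find the organism 2 cell type
# (row label) with the highest alignment score, and that score.
best = pd.DataFrame(
    {
        "best_match": table.idxmax(axis=0),
        "score": table.max(axis=0),
    }
)
print(best)
```

Using `axis=1` instead gives the best organism 1 match for each organism 2 cell type.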


SAMap provides a class to find gene pairs enriched in different cell type pairs. The method entails finding gene pairs that contribute positively to the cross-species correlation between cell types and are differentially expressed in their respective mapped cell types.
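To build intuition for what "contributing positively to the cross-species correlation" can mean, the toy example below splits a Pearson correlation between two cell-type expression profiles into one term per gene pair. This is only an illustration under made-up numbers; it is not SAMap's implementation, and it ignores the differential-expression criterion entirely.

```python
import numpy as np

# Made-up mean expression of 5 homologous gene pairs in two mapped cell types.
x = np.array([5.0, 3.0, 0.5, 0.2, 4.0])  # genes from organism 1
y = np.array([4.5, 2.5, 0.1, 3.0, 3.5])  # paired genes from organism 2

# Standardize each profile (population standard deviation).
zx = (x - x.mean()) / x.std()
zy = (y - y.mean()) / y.std()

# Pearson's r is the mean of the products of the standardized values,
# so each gene pair contributes one additive term.
contrib = zx * zy / len(x)
r = contrib.sum()

# Gene pairs whose term is positive push the correlation up.
positive_pairs = np.flatnonzero(contrib > 0)
print(r, positive_pairs)
```

Gene pairs with a positive term are those where both genes sit on the same side of their respective means in the two cell types.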

In [9]:
gpf = GenePairFinder(sm,k1=k1,k2=k2)

Finding cluster-specific markers in pl:cluster and sc:tissue.





`gpf.find_genes` can now be used to find gene pairs enriched in a cell type mapping.

In [10]:
n1 = 'Neoblast: 0' #cell type ID from organism 1 (must be present in `sam1.adata.obs[k1]`)
n2 = 'Neoblast' #cell type ID from organism 2 (must be present in `sam2.adata.obs[k2]`)
Gp,G1,G2 = gpf.find_genes(n1,n2)
#Gp are the gene pairs, G1 are the genes from organism 1, G2 are the genes from organism 2

In [11]:
Gp,G1,G2

(array(['pl_dd_Smed_v4_659_0_1;sc_Smp_179320',
        'pl_dd_Smed_v4_2787_0_1;sc_Smp_086860',
        'pl_dd_Smed_v4_648_0_1;sc_Smp_027920',
        'pl_dd_Smed_v4_2614_0_1;sc_Smp_086860',
        'pl_dd_Smed_v4_15019_0_1;sc_Smp_086860',
        'pl_dd_Smed_v4_5764_0_1;sc_Smp_032500',
        'pl_dd_Smed_v4_5764_0_1;sc_Smp_143490',
        'pl_dd_Smed_v4_5764_0_1;sc_Smp_172530',
        'pl_dd_Smed_v4_5764_0_1;sc_Smp_094140',
        'pl_dd_Smed_v4_7837_0_1;sc_Smp_082490',
        'pl_dd_Smed_v4_1484_0_1;sc_Smp_086860',
        'pl_dd_Smed_v4_5764_0_1;sc_Smp_054840',
        'pl_dd_Smed_v4_10594_0_1;sc_Smp_162370',
        'pl_dd_Smed_v4_4273_0_1;sc_Smp_009600',
        'pl_dd_Smed_v4_3708_0_1;sc_Smp_245030',
        'pl_dd_Smed_v4_5764_0_1;sc_Smp_037590',
        'pl_dd_Smed_v4_11487_0_1;sc_Smp_082490',
        'pl_dd_Smed_v4_756_0_1;sc_Smp_179320',
        'pl_dd_Smed_v4_12365_0_1;sc_Smp_086860',
        'pl_dd_Smed_v4_4712_0_1;sc_Smp_032500',
        'pl_dd_Smed_v4_2787_0_1;sc_Smp_
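Each entry of `Gp` in the output above is a single string of the form `<id1>_<gene1>;<id2>_<gene2>`. A minimal sketch for splitting such strings and grouping organism 1 genes by their organism 2 partner follows; `split_pair` is a hypothetical helper, and the three strings are copied from the output above.

```python
from collections import defaultdict

# A few entries copied from the `Gp` output above.
gene_pairs = [
    'pl_dd_Smed_v4_659_0_1;sc_Smp_179320',
    'pl_dd_Smed_v4_2787_0_1;sc_Smp_086860',
    'pl_dd_Smed_v4_2614_0_1;sc_Smp_086860',
]


def split_pair(pair, id_len=2):
    """Split 'id1_geneA;id2_geneB' into (geneA, geneB) without species prefixes."""
    g1, g2 = pair.split(';')
    # Drop the 2-character species ID and the underscore that follows it.
    return g1[id_len + 1:], g2[id_len + 1:]


# Group organism 1 genes by the organism 2 gene they are paired with.
partners = defaultdict(list)
for pair in gene_pairs:
    g1, g2 = split_pair(pair)
    partners[g2].append(g1)

print(dict(partners))
```

In the output above, several planarian transcripts pair with the same schistosome gene (for example `Smp_086860`), so a grouping like this makes many-to-one relationships easier to spot.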

To get a table of enriched gene pairs from all cell type mappings that have an alignment score above some threshold (`thr`), you can use:

In [None]:
gene_pairs = gpf.find_all(thr=0.1)

Calculating gene pairs for the mapping: pl;Cathepsin+ cells: 10 to sc;Cathepsin
Calculating gene pairs for the mapping: pl;Cathepsin+ cells: 28 to sc;Cathepsin
Calculating gene pairs for the mapping: pl;Cathepsin+ cells: 4 to sc;Cathepsin
Calculating gene pairs for the mapping: pl;Epidermal: 11 to sc;Tegument
Calculating gene pairs for the mapping: pl;Epidermal: 2 to sc;Tegument_prog
Calculating gene pairs for the mapping: pl;Epidermal: 24 to sc;Tegument
Calculating gene pairs for the mapping: pl;Epidermal: 3 to sc;Tegument_prog
Calculating gene pairs for the mapping: pl;Epidermal: 35 to sc;Tegument
Calculating gene pairs for the mapping: pl;Intestine: 30 to sc;Intestine
Calculating gene pairs for the mapping: pl;Intestine: 6 to sc;Tegument_prog
Calculating gene pairs for the mapping: pl;Muscle: 13 to sc;Muscle
Calculating gene pairs for the mapping: pl;Muscle: 14 to sc;Muscle
Calculating gene pairs for the mapping: pl;Muscle: 16 to sc;Muscle
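To preview which cell type pairs clear a given threshold before running `find_all`, you can filter a table of alignment scores directly. The sketch below uses a hand-built excerpt (scores copied from the `D1` and `MappingTable` outputs above) and the hypothetical column names `sc_cell_type`, `pl_cell_type` and `score`; how `find_all` selects mappings internally may differ.

```python
import pandas as pd

# Excerpt of alignment scores from the outputs above
# (rows: organism 2 cell types, columns: organism 1 cell types).
table = pd.DataFrame(
    {
        "pl_Epidermal: 11": [0.36336, 0.0916215],
        "pl_Epidermal: 2": [0.011084, 0.529709],
        "pl_Intestine: 6": [0.000304, 0.173964],
    },
    index=["sc_Tegument", "sc_Tegument_prog"],
)

thr = 0.1

# Reshape to one row per (organism 2, organism 1) cell type pair.
long = (
    table.rename_axis(index="sc_cell_type")
    .reset_index()
    .melt(id_vars="sc_cell_type", var_name="pl_cell_type", value_name="score")
)

# Keep the pairs whose alignment score exceeds the threshold, best first.
above = (
    long[long["score"] > thr]
    .sort_values("score", ascending=False)
    .reset_index(drop=True)
)
print(above)
```

For this excerpt, the three surviving pairs match the corresponding `Calculating gene pairs for the mapping` lines in the log above.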


In [None]:
gene_pairs

# Saving/Loading SAMap

In [None]:
from samap.utils import save_samap, load_samap
# save to a .pkl file
save_samap(sm,'path/to/file') #including the file ending (.pkl) is optional

In [None]:
# load
sm = load_samap('path/to/file') #including the file ending (.pkl) is optional

# Visualizing SAMap results


To launch an interactive GUI (requires SAMGUI; see the README instructions in the [SAM](https://github.com/atarashansky/self-assembling-manifold) GitHub repo):

In [None]:
sm.gui()

To create a Sankey plot, use `samap.analysis.sankey_plot`. This requires `holoviews` (`pip install holoviews`), which should already be installed if you're using the Docker image.

In [None]:
from samap.analysis import sankey_plot
k1 = 'cluster' #cell types annotation key in `sam1.adata.obs`
k2 = 'tissue' #cell types annotation key in `sam2.adata.obs`
sankey_plot(sm,k1,k2)