# Determining CD34 / Sox10 / S100 / Col9a2 / Shisa3 populations, and their transmembrane markers

When analysing the populations, we observed three expression patterns based on these five markers:
* (A) CD34$^+$/S100$^+$/**Sox10**$^-$/Shisa3$^+$/**Col9a2**$^+$ populations. ***These populations are FAP-like cells.***
* (B) CD34$^+$/S100$^+$/**Sox10**$^+$/Shisa3$^+$?/**Col9a2**$^-$ populations. ***These are Schwann-like cells.***
* (C) CD34$^+$/**S100**$^-$/**Sox10**$^-$/Shisa3$^+$/**Col9a2**$^+$ populations. We will ignore these cells for now.

In this section we are going to isolate these populations and characterize them. In Scott et al. only the Sox10$^-$/Col9a2$^+$ population is available.

**How will we work in this section?**
We are going to compute differentially expressed genes (DEGs) for each of the possible populations (A, B, C) and take the top 700 DEGs per group. These DEGs will first be screened manually on the tracksplot. After that, we will filter out some of these genes based on the UMAPs: if a gene is too widely expressed, or is not specific to the cluster of interest (for example, it marks more than one of A, B and C), it *must* be excluded. From there, we will create a more refined marker list.
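
The UMAP screening described above is done by eye. As a rough numerical complement, one could compare the fraction of expressing cells inside and outside the cluster of interest. The sketch below is not part of the original analysis: the helper `expressing_fractions` and the toy values are hypothetical, and any cut-off applied to the two fractions would have to be chosen by the analyst.

```python
def expressing_fractions(expression, in_cluster):
    """Return (fraction of expressing cells inside the cluster, fraction outside).

    expression : per-cell expression values for one gene
    in_cluster : per-cell booleans, True for cells in the cluster of interest
    """
    inside = [value > 0 for value, flag in zip(expression, in_cluster) if flag]
    outside = [value > 0 for value, flag in zip(expression, in_cluster) if not flag]
    frac_in = sum(inside) / len(inside) if inside else 0.0
    frac_out = sum(outside) / len(outside) if outside else 0.0
    return frac_in, frac_out


# Toy example (made-up values): six cells, the first three belong to the cluster.
expression = [2.1, 0.0, 1.3, 0.0, 0.0, 0.4]
in_cluster = [True, True, True, False, False, False]
frac_in, frac_out = expressing_fractions(expression, in_cluster)
print(frac_in, frac_out)
```

A gene with a high fraction inside and a low fraction outside is a candidate specific marker; a gene with similar fractions in both groups is probably too widely expressed.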

Then, we will check each marker list against the remaining datasets, in case a gene went unnoticed in one dataset but is also expressed in the others. Marker lists will be updated accordingly. For example, if marker X was not detected (or was skipped) in the De Micheli dataset but was detected in Oprescu, and it is expressed in both according to the criteria, marker X will be added to a general marker list. With this method we will get a list of markers for each cluster in each dataset.
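
To decide which genes deserve a second look in a given dataset, the per-dataset lists can be compared directly. This is only a minimal sketch: the helper `markers_to_recheck` is hypothetical, and the lists are short excerpts of the type A lists defined later in this notebook. The final decision is still made on the UMAPs.

```python
def markers_to_recheck(marker_lists):
    """For each dataset, list the genes that appear in another dataset's
    marker list but not in its own, i.e. candidates to inspect again."""
    recheck = {}
    for dataset, markers in marker_lists.items():
        others = set()
        for other_dataset, other_markers in marker_lists.items():
            if other_dataset != dataset:
                others.update(other_markers)
        recheck[dataset] = sorted(others - set(markers))
    return recheck


# Short excerpts of the type A lists defined later in this notebook.
marker_lists = {
    'oprescu': ['Col9a2', 'Dlk1', 'Gfra2', 'Ngfr'],
    'scott': ['Col9a2', 'Dlk1', 'Rgs17'],
    'de_micheli': ['Col9a2', 'Dlk1', 'Gfra2', 'Rgs17'],
}
recheck = markers_to_recheck(marker_lists)
print(recheck['scott'])
```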

## Oprescu

In this UMAP we are interested in clusters 23 and 29, which express Col9a2 and Shisa3. Only a few cells from cluster 29 are of interest, and we will keep them.

In [None]:
sc.pl.umap(adata_oprescu_d0, color=['leiden', 'Cd34', 'Sox10', 'S100b', 'Col9a2', 'Shisa3', 'Mpz', 'Ptn'], 
           cmap=magma, ncols=2, legend_loc='on data')

In [None]:
adata_oprescu_d0_sub = adata_oprescu_d0[adata_oprescu_d0.obs['leiden'].isin(['23', '29'])].copy()

In [None]:
sc.pp.filter_genes(adata_oprescu_d0_sub, min_cells=1)
tk.tl.triku(adata_oprescu_d0_sub, n_procs=1, random_state=seed)
sc.pp.pca(adata_oprescu_d0_sub, random_state=seed, n_comps=30)
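# NOTE: in scanpy, `knn` is a boolean flag; the neighbourhood size is set by `n_neighbors`.
# As written, the value below only acts as True and `n_neighbors` keeps its default.
sc.pp.neighbors(adata_oprescu_d0_sub, random_state=seed, knn=len(adata_oprescu_d0_sub) ** 0.5 // 2, metric='cosine')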

After cluster isolation, we reject subcluster 2 because it is negative for Col9a2, Shisa3, Sox10 and S100b. Of the remaining subclusters, we take 0 and 1 as the type A kranocytes (Sox10$^-$/Col9a2$^+$/S100$^+$), subcluster 3 (partially; Sox10$^+$/Col9a2$^-$/S100$^+$) as the type B kranocytes, and subcluster 4 (Sox10$^-$/Col9a2$^+$/S100$^-$) as type C.
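
The assignment described above amounts to a lookup table from subcluster to type. A minimal sketch follows, with a hypothetical helper `assign_types` and hypothetical per-cell labels. Like the labelling cell further down, it assigns the whole of subcluster 3 to type B.

```python
# Subcluster -> type mapping taken from the paragraph above (Oprescu subset).
subcluster_to_type = {'0': 'A', '1': 'A', '3': 'B', '4': 'C'}


def assign_types(subcluster_labels, mapping, default='Other'):
    """Translate per-cell subcluster labels into type labels."""
    return [mapping.get(label, default) for label in subcluster_labels]


# Hypothetical per-cell labels; subcluster 2 is rejected, so it falls back to 'Other'.
labels = ['0', '1', '2', '3', '4', '3']
types = assign_types(labels, subcluster_to_type)
print(types)
```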

In [None]:
sc.tl.umap(adata_oprescu_d0_sub, min_dist=0.2, random_state=seed)
sc.tl.leiden(adata_oprescu_d0_sub, resolution=1, random_state=seed, key_added='leiden_sub')
sc.pl.umap(adata_oprescu_d0_sub, color=['leiden', 'leiden_sub', 'Cd34', 'Sox10', 'S100b', 
                                        'Col9a2', 'Shisa3', 'Mpz'], cmap=magma, legend_loc='on data', ncols=3)

In [None]:
adata_oprescu_d0.obs['Krano_type'] = 'Other'
# Transfer the subcluster annotation to the full dataset through the cell names.
# A single .loc call avoids pandas chained assignment, which may write to a temporary copy.
for krano_type, subclusters in {'A': ['0', '1'], 'B': ['3'], 'C': ['4']}.items():
    cells = adata_oprescu_d0_sub.obs_names[adata_oprescu_d0_sub.obs['leiden_sub'].isin(subclusters).values]
    adata_oprescu_d0.obs.loc[cells, 'Krano_type'] = krano_type
# Colours follow the sorted categories: A, B, C, Other
adata_oprescu_d0.uns['Krano_type_colors'] = ["#007ab7", "#b7007a", "#e3b10f", "#bcbcbc"]

In [None]:
sc.pl.umap(adata_oprescu_d0, color=['Krano_type'], cmap=magma, ncols=2)

In [None]:
sc.tl.rank_genes_groups(adata_oprescu_d0, groupby='Krano_type',groups=['A', 'B', 'C'], reference='rest')
sc.pl.rank_genes_groups_tracksplot(adata_oprescu_d0, dendrogram=False, n_genes=700, )

In [None]:
markers_A_oprescu = ['6030408B16Rik', 'Agt', 'Arhgdig', 
                     'Cd300lg', 'Cd38', 'Cdh19', 'Cdkn2b', 'Ch25h', 'Col26a1', 'Col9a2', 
                     'Dlk1', 'Fetub', 'Gfra2', 'Gli1',  'Gm11681', 'Greb1', 'Gria1', 'Grin2b', 
                     'Kank4', 'Kcnb2', 'Mpzl2', 'Ngfr', 'Nipal1', 'Plxdc1', 'Rasgrp2', 'Reln', 
                     'Saa1', 'Sdc3', 'Shisa3', 'Sipa1l1', 'Sox9', 'Tenm2', 'Trpm6', ]

markers_B_oprescu = ['9530059O14Rik', 'Aatk', 'Cldn19', 'Cmtm5', 'Ddn', 'Dusp15', 'Elovl7', 
                     'Ephb6', 'Fa2h', 'Fxyd3', 'Gjb1', 'Gjc3', 'Gpr37l1', 'Hepacam', 
                     'Kcna1', 'Kcnk1', 'Kif1a', 'Mag', 'Mal', 'Mansc4', 'Moxd1', 'Mpz', 'Mt3', 'Nes', 
                     'Pllp', 'Plp1', 'Pou3f1', 'Prx', 'Rimklb', 'S100b', 'Sbspon', 
                     'Sfrp5', 'Slc36a2', 'Slco4a1', 'Smco3', 'Snca', 'Sox10',  
                     'Tenm2', 'Tspan15', 'Ugt8a', 'Vat1l', 'Wnt10a', 'Wnt6', ]

markers_C_oprescu = ['Ano1', 'Cdkn2a', 'Cdkn2b', 'Clic6', 'Col9a2', 'Gjb5', 'Nipal1', 
                     'Rasgrf2', 'Shisa3', 'Spata18', 'Tenm2', ]

In [None]:
sc.pl.umap(adata_oprescu_d0, color=['Krano_type'] + markers_B_oprescu, 
           cmap=magma, ncols=3, legend_loc='on data')

## Scott

In this dataset, type A kranocytes (Sox10$^-$/Col9a2$^+$/S100$^+$) appear merged at the bottom of cluster 6 and the top of cluster 13. We will simply isolate them. Type B kranocytes appear at the bottom of cluster 17.

In [None]:
sc.pl.umap(adata_scott_d0, color=['leiden', 'Cd34', 'Sox10', 'S100b', 'Col9a2', 'Shisa3', 'Ptn'], 
           cmap=magma, ncols=2, legend_loc='on data')

In [None]:
adata_scott_d0_sub = adata_scott_d0[adata_scott_d0.obs['leiden'].isin(['6', '17'])].copy()

In [None]:
sc.pp.filter_genes(adata_scott_d0_sub, min_cells=1)
tk.tl.triku(adata_scott_d0_sub, n_procs=1, random_state=seed)
sc.pp.pca(adata_scott_d0_sub, random_state=seed, n_comps=30)
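# NOTE: in scanpy, `knn` is a boolean flag; the neighbourhood size is set by `n_neighbors`.
# As written, the value below only acts as True and `n_neighbors` keeps its default.
sc.pp.neighbors(adata_scott_d0_sub, random_state=seed, knn=len(adata_scott_d0_sub) ** 0.5 // 2, metric='cosine')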

When selecting clusters 6 and 17 from the dataset, the new clustering shows that most of the cells of interest are located in subcluster 6 (B) and subclusters 5 and 7 (A). The split is not exact (part of subcluster 6 should belong to A), but some clusters could not be subdivided further.

In [None]:
sc.tl.umap(adata_scott_d0_sub, min_dist=0.05, random_state=seed)
sc.tl.leiden(adata_scott_d0_sub, resolution=1.5, random_state=seed, key_added='leiden_sub')
sc.pl.umap(adata_scott_d0_sub, color=['leiden', 'leiden_sub', 'Cd34', 'S100b', 
                                        'Col9a2', 'Shisa3', 'Lypd2', 'Itgb4'], cmap=magma, legend_loc='on data', ncols=3)

In [None]:
adata_scott_d0.obs['Krano_type'] = 'Other'
# A single .loc call avoids pandas chained assignment, which may write to a temporary copy.
for krano_type, subclusters in {'A': ['5', '7'], 'B': ['6']}.items():
    cells = adata_scott_d0_sub.obs_names[adata_scott_d0_sub.obs['leiden_sub'].isin(subclusters).values]
    adata_scott_d0.obs.loc[cells, 'Krano_type'] = krano_type
# Colours follow the sorted categories: A, B, Other
adata_scott_d0.uns['Krano_type_colors'] = ["#007ab7", "#b7007a", "#bcbcbc"]

In [None]:
sc.pl.umap(adata_scott_d0, color=['Krano_type'], cmap=magma, ncols=2)

In [None]:
sc.tl.rank_genes_groups(adata_scott_d0, groupby='Krano_type',groups=['A', 'B'], reference='rest')
sc.pl.rank_genes_groups_tracksplot(adata_scott_d0, dendrogram=False, n_genes=700, )

In [None]:
markers_A_scott = ['6030408B16Rik', 'Adamtsl2', 'Aspa', 'Col9a2',  'Dlk1', 'Fam213a', 
                   'Gm3336', 'Gprasp2', 'Grin2b', 'Hmgcs2', 'Kcnk2', 'Pla2g7', 'Plxnc1', 
                   'Rgs17', 'Saa1', 'Sbspon', 'Shisa3', 'Sipa1l1', 'Slc27a1', 'Stra6', 
                   'Thrsp', 'Trpm6',  ]

markers_B_scott = ['Col23a1', 'Itga6', 'Itgb4', 'Lypd2', 'Moxd1', 'Mpzl2', 'Perp', 'Prodh', 
                   'Ptch1', 'Slc2a1', 'Sostdc1', 'Tenm2',]

In [None]:
sc.pl.umap(adata_scott_d0, color=['Krano_type'] + markers_A_scott, 
           cmap=magma, ncols=3, legend_loc='on data')

## De Micheli

In this dataset we seem to find type A (Sox10$^-$/Col9a2$^+$/S100$^+$) within clusters 2 and 6 (we assume they are S100$^+$ because there is a generally low expression within the clusters), and a set of cells near cluster 15 (Sox10$^+$/Col9a2$^-$/S100$^+$) as the type B kranocytes. Clusters 15 and 17 are, respectively, Schwann and Neural/Glial cells. However, the cells near the Schwann cluster are Cd34$^+$ and Shisa3$^+$ and have lower expression of Sox10 and Mpz, which might indicate a distinct cell type related to Schwann cells. These findings are more or less consistent with the B type from Oprescu and Scott.

In [None]:
sc.pl.umap(adata_de_micheli_mouse_d0, color=['leiden', 'Cd34', 'Sox10', 'S100b', 'Col9a2', 'Shisa3', 
                                             'Mpz', 'Ptn'], 
           cmap=magma, ncols=2, legend_loc='on data')

In [None]:
adata_de_micheli_mouse_d0_sub = adata_de_micheli_mouse_d0[adata_de_micheli_mouse_d0.obs['leiden'].isin(['2', '6', '15', '17'])].copy()

In [None]:
sc.pp.filter_genes(adata_de_micheli_mouse_d0_sub, min_cells=1)
tk.tl.triku(adata_de_micheli_mouse_d0_sub, n_procs=1, random_state=seed)
sc.pp.pca(adata_de_micheli_mouse_d0_sub, random_state=seed, n_comps=30)
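# NOTE: in scanpy, `knn` is a boolean flag; the neighbourhood size is set by `n_neighbors`.
# As written, the value below only acts as True and `n_neighbors` keeps its default.
sc.pp.neighbors(adata_de_micheli_mouse_d0_sub, random_state=seed, knn=len(adata_de_micheli_mouse_d0_sub) ** 0.5 // 2, metric='cosine')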

After reclustering, subcluster 5 corresponds to the type A kranocytes and subcluster 8 to the type B kranocytes.

In [None]:
sc.tl.umap(adata_de_micheli_mouse_d0_sub, min_dist=0.2, random_state=seed)
sc.tl.leiden(adata_de_micheli_mouse_d0_sub, resolution=1.3, random_state=seed, key_added='leiden_sub')
sc.pl.umap(adata_de_micheli_mouse_d0_sub, color=['leiden', 'leiden_sub', 'Cd34', 'Sox10', 'S100b', 
                                        'Col9a2', 'Shisa3'], cmap=magma, legend_loc='on data', ncols=3)

In [None]:
adata_de_micheli_mouse_d0.obs['Krano_type'] = 'Other'
# A single .loc call avoids pandas chained assignment, which may write to a temporary copy.
for krano_type, subclusters in {'A': ['5'], 'B': ['8']}.items():
    cells = adata_de_micheli_mouse_d0_sub.obs_names[adata_de_micheli_mouse_d0_sub.obs['leiden_sub'].isin(subclusters).values]
    adata_de_micheli_mouse_d0.obs.loc[cells, 'Krano_type'] = krano_type
# Colours follow the sorted categories: A, B, Other
adata_de_micheli_mouse_d0.uns['Krano_type_colors'] = ["#007ab7", "#b7007a", "#bcbcbc"]

In [None]:
sc.pl.umap(adata_de_micheli_mouse_d0, color=['Krano_type'], cmap=magma, ncols=2)

In [None]:
sc.tl.rank_genes_groups(adata_de_micheli_mouse_d0, groupby='Krano_type',groups=['A', 'B'], reference='rest')
sc.pl.rank_genes_groups_tracksplot(adata_de_micheli_mouse_d0, dendrogram=False, n_genes=700, )

In [None]:
markers_A_de_micheli = ['6030408B16Rik', 'Adamtsl2', 'Bmp7', 'Capn6', 
                        'Col18a1', 'Col9a2', 'Dlk1', 'Fetub', 'Gfra2', 'Gli1', 'Gm11681', 
                        'Gpld1', 'Inhba', 'Mdfi', 'Mest', 'Morc4', 'Nipal1', 'Plppr4', 
                        'Rgs17', 'Saa1', 'Saa2', 'Shisa3', 'Sorcs2', 'Sox9', 'Sphkap', 
                        'Syndig1', 'Trpm6']

markers_B_de_micheli = ['Cldn1', 'Crabp2', 'Dleu7', 'Efnb3', 'Gfra3', 'Gjb5', 'Grin2b', 
                        'Kcnj13', 'Kcnj2', 'Lgals7', 'Lypd2', 'Mansc4', 'Moxd1', 
                        'Perp', 'RP23-291B1.2', 'Shisa3', 'Slc6a13', 'Spink1', 'Srcin1', 'Tec', 'Tenm2', 
                        'Trim46', 'Wnt10a', 'Wnt6']

In [None]:
sc.pl.umap(adata_de_micheli_mouse_d0, color=['Krano_type'] + markers_B_de_micheli, 
           cmap=magma, ncols=3, legend_loc='on data')

## Giordani

In [None]:
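# .copy() turns the view into an independent object, so that it can be filtered and reclustered safely
adata_giordani_FAPs = adata_giordani[adata_giordani.obs['cell_type'].isin(['FAPs', 'Glial cells'])].copy()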

In [None]:
sc.pp.filter_genes(adata_giordani_FAPs, min_cells=1)

In [None]:
tk.tl.triku(adata_giordani_FAPs, n_procs=1, random_state=seed)
sc.pp.pca(adata_giordani_FAPs, random_state=seed, n_comps=30)
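# NOTE: in scanpy, `knn` is a boolean flag; the neighbourhood size is set by `n_neighbors`.
# As written, the value below only acts as True and `n_neighbors` keeps its default.
sc.pp.neighbors(adata_giordani_FAPs, random_state=seed, knn=len(adata_giordani_FAPs) ** 0.5 // 2, metric='cosine')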

In [None]:
sc.tl.umap(adata_giordani_FAPs, min_dist=0.5, random_state=seed)
sc.tl.leiden(adata_giordani_FAPs, resolution=1, random_state=seed)
sc.pl.umap(adata_giordani_FAPs, color=['leiden', 'cell_type', 'batch', 'n_genes_by_counts'], legend_loc='on data')

In [None]:
sc.tl.rank_genes_groups(adata_giordani_FAPs, groupby='leiden', groups=['7', '9', '10', '11', '14'], reference='rest')
sc.pl.rank_genes_groups_tracksplot(adata_giordani_FAPs, dendrogram=False, n_genes=100)

In [None]:
sc.pl.umap(adata_giordani_FAPs, color=['leiden', 'cell_type', 'S100b', 'Sox10', 'Mpz', 'Klk8'], legend_loc='on data', cmap=magma)

In [None]:
adata_giordani.obs['Krano_type'] = 'Other'
# A single .loc call avoids pandas chained assignment, which may write to a temporary copy.
for krano_type, clusters in {'A': ['7', '9', '11', '14'], 'B': ['10']}.items():
    cells = adata_giordani_FAPs.obs_names[adata_giordani_FAPs.obs['leiden'].isin(clusters).values]
    adata_giordani.obs.loc[cells, 'Krano_type'] = krano_type
# Colours follow the sorted categories: A, B, Other
adata_giordani.uns['Krano_type_colors'] = ["#007ab7", "#b7007a", "#bcbcbc"]

## Filtering common markers

Now that all markers are filtered, we are going to plot all A/B markers in all datasets. If we see that the pattern is correct in 2-3 datasets, then we add it to the list. 

We will try to create a conservative list, that is, a list where markers are as specific to the designated clusters as possible. This does not mean that a marker expressed in other cell types is invalid, but we will probably exclude it from this list, so that the list contains specific markers of these putative cell types and not of others.
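
Before looking at the plots, the lists can be pre-screened by counting in how many datasets each gene was selected. This is only a sketch: the helper `markers_in_min_datasets` is hypothetical and the lists are short excerpts of the type B lists defined above. Membership in two lists does not replace the visual check, since a gene may be expressed in a dataset without having been listed for it.

```python
from collections import Counter


def markers_in_min_datasets(marker_lists, min_datasets=2):
    """Keep genes listed in at least `min_datasets` of the per-dataset lists."""
    counts = Counter(gene for markers in marker_lists for gene in set(markers))
    return sorted(gene for gene, n in counts.items() if n >= min_datasets)


# Short excerpts of the type B lists defined above.
excerpt_oprescu = ['Moxd1', 'Tenm2', 'Wnt6', 'Mpz']
excerpt_scott = ['Moxd1', 'Tenm2', 'Lypd2']
excerpt_de_micheli = ['Moxd1', 'Tenm2', 'Wnt6', 'Lypd2']
shared = markers_in_min_datasets([excerpt_oprescu, excerpt_scott, excerpt_de_micheli])
print(shared)
```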

In [None]:
# All A markers combined. Some of these genes will be excluded because they are only expressed in one dataset,
# or are not as specific as they should be.

for i in sorted(set(markers_A_de_micheli + markers_A_oprescu + markers_A_scott)):
    print(i)
    fig, axs = plt.subplots(1, 3, figsize=(18, 4))
    for ax, adata in zip(axs, [adata_de_micheli_mouse_d0, adata_oprescu_d0, adata_scott_d0]):
        try:
            sc.pl.umap(adata, color=i, cmap=magma, ax=ax, show=False)
        except (KeyError, ValueError):
            # The gene is most likely absent from this dataset: leave its panel empty
            ax.set_axis_off()
    plt.show()

In [None]:
# All B markers combined. Some of these genes will be excluded because they are only expressed in one dataset,
# or are not as specific as they should be.

for i in sorted(set(markers_B_de_micheli + markers_B_oprescu + markers_B_scott)):
    print(i)
    fig, axs = plt.subplots(1, 3, figsize=(18, 4))
    # One try per dataset, so that a gene missing from one dataset does not hide the other panels
    for ax, adata in zip(axs, [adata_de_micheli_mouse_d0, adata_oprescu_d0, adata_scott_d0]):
        try:
            sc.pl.umap(adata, color=i, cmap=magma, ax=ax, show=False)
        except (KeyError, ValueError):
            # The gene is most likely absent from this dataset: leave its panel empty
            ax.set_axis_off()
    plt.show()

## Plotting A and B filtered markers

These markers should be either specific to the cluster of interest or highly expressed in it.

In [None]:
A_markers = ['6030408B16Rik', 'Adamtsl2', 'Cdh19', 'Cdkn2b', 'Col18a1', 'Col26a1', 
             'Col9a2', 'Dlk1', 'Fetub', 'Gfra2', 'Gm11681', 'Gpld1', 'Greb1', 'Gria1', 
             'Kcnb2', 'Kcnk2', 'Mpzl2', 'Ngfr', 'Plppr4', 
             'Ptgfr', 'Rgs17', 'Saa1', 'Saa2', 'Shisa3', 'Sipa1l1', 'Sorcs2', 'Sox9', 
             'Sphkap', 'Syndig1', 'Trpm6']
B_markers = ['Cldn1', 'Crabp2', 'Dleu7', 'Efnb3', 'Gjb5', 'Grin2b', 'Itgb4', 'Kcnj13', 
             'Kcnj2', 'Lgals7', 'Lypd2', 'Mansc4', 'Moxd1', 'Mpzl2', 'Perp', 'Prodh', 'Ptch1', 
             'Slc6a13', 'Stra6', 'Tec', 'Tenm2', 'Wnt10a', 'Wnt6']

In [None]:
for i in ['Krano_type'] + A_markers:
    print(i)
    fig, axs = plt.subplots(2, 2, figsize=(12, 8))
    try:
        sc.pl.umap(adata_de_micheli_mouse_d0, color=i, cmap=magma, ax=axs[0][0], show=False, legend_loc='on data')
        sc.pl.umap(adata_oprescu_d0, color=i, cmap=magma, ax=axs[0][1], show=False, legend_loc='on data')
        sc.pl.umap(adata_scott_d0, color=i, cmap=magma, ax=axs[1][0], show=False, legend_loc='on data')
        sc.pl.umap(adata_giordani, color=i, cmap=magma, ax=axs[1][1], legend_loc='on data')
    except:
        pass

In [None]:
for i in ['Krano_type'] + B_markers:
    print(i)
    fig, axs = plt.subplots(2, 2, figsize=(12, 8))
    try:
        sc.pl.umap(adata_de_micheli_mouse_d0, color=i, cmap=magma, ax=axs[0][0], show=False, legend_loc='on data')
        sc.pl.umap(adata_oprescu_d0, color=i, cmap=magma, ax=axs[0][1], show=False, legend_loc='on data')
        sc.pl.umap(adata_scott_d0, color=i, cmap=magma, ax=axs[1][0], show=False, legend_loc='on data')
        sc.pl.umap(adata_giordani, color=i, cmap=magma, ax=axs[1][1], legend_loc='on data')
    except:
        pass

## Detecting membrane markers

The next step is to flag which genes encode membrane-located proteins, so that the populations can be selected via FACS (or used in any other downstream analysis).
To do that, we will download the Swiss-Prot proteome table, which includes the subcellular location of each protein.

The table can be downloaded from here:
https://www.uniprot.org/uniprot/?query=*&fil=organism%3A%22Mus+musculus+%28Mouse%29+%5B10090%5D%22+AND+reviewed%3Ayes

The columns to be selected are "Gene names" and "Subcellular location" (the code below reads the latter under the header `Subcellular location [CC]`).
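
As an illustration of the table layout assumed here, the sketch below parses a tab-separated table with those two columns using only the standard library. The helper `genes_to_locations`, the toy table and its placeholder gene names are made up for the example; the column headers are the ones used by the code in this section.

```python
import csv
import io


def genes_to_locations(tsv_text):
    """Map every gene name (primary name and synonyms) to its location text."""
    locations = {}
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter='\t')
    for row in reader:
        location = (row.get('Subcellular location [CC]') or '')
        location = location.replace('SUBCELLULAR LOCATION:', '').strip()
        for gene in (row.get('Gene names') or '').split():
            # A gene listed in several entries gets its locations concatenated.
            locations[gene] = (locations.get(gene, '') + ' ' + location).strip()
    return locations


# Made-up two-entry table with placeholder gene names, for illustration only.
toy_table = (
    'Gene names\tSubcellular location [CC]\n'
    'GeneA GeneA2\tSUBCELLULAR LOCATION: Cell membrane.\n'
    'GeneB\t\n'
)
locations = genes_to_locations(toy_table)
print(locations)
```

Each synonym gets its own key, so a marker can be looked up regardless of which of its names appears in the expression matrix.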

In [None]:
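# Load the table
uniprot_df = pd.read_csv(data_dir + '/Uniprot_table.tab', sep='\t')

# Process the table to have one gene per row
list_genes = []
list_locs = []

for gene_str, loc_str in zip(uniprot_df['Gene names'], uniprot_df['Subcellular location [CC]']):
    # Entries without an annotated location are read as NaN (a float)
    loc_str = loc_str.replace('SUBCELLULAR LOCATION:', '') if isinstance(loc_str, str) else ''
    # Entries without gene names are also NaN: skip them
    if not isinstance(gene_str, str):
        continue
    # "Gene names" holds the primary name and its synonyms, separated by spaces
    for gene in gene_str.split(' '):
        list_genes.append(gene)
        list_locs.append(loc_str)

uniprot_df = pd.DataFrame({'Gene': list_genes, 'Location': list_locs}).set_index('Gene', drop=True)
# Concatenate the locations of genes listed in several entries, keeping one row per gene.
# drop_duplicates() ignores the index, so it would also drop distinct genes that share a location string.
uniprot_df = uniprot_df.groupby(level=0)['Location'].agg(''.join).to_frame()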

In [None]:
df_A_markers = pd.DataFrame({'Gene': A_markers, 'Location': [''] * len(A_markers)})

for A_idx, A in enumerate(A_markers):
    if A in uniprot_df.index:
        # .loc on the DataFrame avoids chained assignment (the default index is 0..n-1)
        df_A_markers.loc[A_idx, 'Location'] = uniprot_df.loc[A, 'Location']


df_B_markers = pd.DataFrame({'Gene': B_markers, 'Location': [''] * len(B_markers)})

for B_idx, B in enumerate(B_markers):
    if B in uniprot_df.index:
        df_B_markers.loc[B_idx, 'Location'] = uniprot_df.loc[B, 'Location']

In [None]:
# None means "no truncation"; the former value -1 is deprecated in recent pandas versions
pd.set_option('display.max_colwidth', None)

In [None]:
df_A_markers 

In [None]:
df_B_markers
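
Once the location text is available, membrane candidates can be flagged with a simple keyword test. This is a crude sketch rather than the method used here (the tables above are inspected manually): the helper `is_membrane_located` and the toy strings are hypothetical. Note that the keyword also matches organelle membranes, so every hit still needs a manual check before it is considered for FACS.

```python
def is_membrane_located(location):
    """Crude keyword test on a UniProt-style location string."""
    return 'membrane' in location.lower()


# Made-up location strings with placeholder gene names, for illustration only.
toy_locations = {
    'GeneA': 'Cell membrane; Single-pass type I membrane protein.',
    'GeneB': 'Secreted.',
    'GeneC': '',
}
membrane_genes = sorted(g for g, loc in toy_locations.items() if is_membrane_located(loc))
print(membrane_genes)
```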

In [None]:
# Glial markers that change
for i in ['Krano_type'] + ['Ngfr', 'Gpc1', 'Tubb2b', 'Tubb5', 'Cryab', 'Tuba1a', 'Tnc', 'Plk2', 
                          'Tgfbi', 'Lgals1', 'Lgals3', 'Syt4', 'Ucn2', 'Gas1', 'Mmp19', 'Vim', 'Arbp1', 
                          'Col18a1', 'Cpe', 'Uchl1', 'Gadd45a', 'Igfbp5', 'Atf3', 'Tmem158', 
                          'Apod', 'Psap', 'Stmn1', 'Epha5', 'Entpd2', 'Nav2', 'Oaf', 'Fgf5']:
    print(i)
    fig, axs = plt.subplots(1, 3, figsize=(18, 4))
    try:
        sc.pl.umap(adata_de_micheli_mouse_d0, color=i, cmap=magma, ax=axs[0], show=False, legend_loc='on data')
        sc.pl.umap(adata_oprescu_d0, color=i, cmap=magma, ax=axs[1], show=False, legend_loc='on data')
        sc.pl.umap(adata_scott_d0, color=i, cmap=magma, ax=axs[2], legend_loc='on data')
    except:
        pass

In [None]:
# FAP markers that change
for i in ['Krano_type'] + ['Timp1', 'Sh3bgrl3', 'Lgals1', 'Spp1', 'Inhba', 'Ctgrc1', 'Ccl9', 'Ank', 'Tgfbi', 
                           'Fn1', 'Tnc', 'Ier3', 'Il11', 'Trf', 'Aldh1a3', 'Bgn', 'Mgp', 'Igfbp4', 
                           'Lgals3', 'Crif1', 'Serpine2', 'Scd1', 'Igfbp7', 'Thy1', 'Pdgfa', 'Postn', 
                           'Ptgs2', '1810011O10Rik', 'Rdh10', 'Neat1', 'Igf1', 'Sdc1', 'Cxcl14', 'Cxcl5']:
    print(i)
    fig, axs = plt.subplots(1, 3, figsize=(18, 4))
    try:
        sc.pl.umap(adata_de_micheli_mouse_d0, color=i, cmap=magma, ax=axs[0], show=False, legend_loc='on data')
        sc.pl.umap(adata_oprescu_d0, color=i, cmap=magma, ax=axs[1], show=False, legend_loc='on data')
        sc.pl.umap(adata_scott_d0, color=i, cmap=magma, ax=axs[2], legend_loc='on data')
    except:
        pass

# Running datasets against markers

## Kumar et al 2017 PC1 and PC2

In [None]:
list_genes = ['Krano_type'] + ['Rgs5', 'Acta2', 'Cxcl1', 'Cxcl2', 'Cxcl5', 'Il6', 'Il1b']

fig, axs = plt.subplots(len(list_genes), 3, figsize=(18, 4 * len(list_genes)))

for idx, gene in enumerate(list_genes):
    try:
        sc.pl.umap(adata_de_micheli_mouse_d0, color=gene, cmap=magma, ax=axs[idx][0], show=False, legend_loc='on data')
        sc.pl.umap(adata_oprescu_d0, color=gene, cmap=magma, ax=axs[idx][1], show=False, legend_loc='on data')
        sc.pl.umap(adata_scott_d0, color=gene, cmap=magma, ax=axs[idx][2], show=False, legend_loc='on data')
    except:
        pass

## Kumar et al 2017 Capillary proinflammatory/capillary and contractile/arteriolar PCs

In [None]:
list_genes = ['Krano_type'] + ['Cd274', 'Dlk1', 'Nt5e'] # Cd73 = Nt5e

fig, axs = plt.subplots(len(list_genes), 3, figsize=(18, 4 * len(list_genes)))

for idx, gene in enumerate(list_genes):
    try:
        sc.pl.umap(adata_de_micheli_mouse_d0, color=gene, cmap=magma, ax=axs[idx][0], show=False, legend_loc='on data')
        sc.pl.umap(adata_oprescu_d0, color=gene, cmap=magma, ax=axs[idx][1], show=False, legend_loc='on data')
        sc.pl.umap(adata_scott_d0, color=gene, cmap=magma, ax=axs[idx][2], show=False, legend_loc='on data')
    except:
        pass

## Birbrair PCs

In [None]:
list_genes = ['Krano_type'] + ['Pdgfrb', 'Mcam', 'Cspg4', 'Nes'] # Cd146 = Mcam, Ng2 = Cspg4

fig, axs = plt.subplots(len(list_genes), 3, figsize=(18, 4 * len(list_genes)))

for idx, gene in enumerate(list_genes):
    try:
        sc.pl.umap(adata_de_micheli_mouse_d0, color=gene, cmap=magma, ax=axs[idx][0], show=False, legend_loc='on data')
        sc.pl.umap(adata_oprescu_d0, color=gene, cmap=magma, ax=axs[idx][1], show=False, legend_loc='on data')
        sc.pl.umap(adata_scott_d0, color=gene, cmap=magma, ax=axs[idx][2], show=False, legend_loc='on data')
    except:
        pass

## Camps ISC

* ISC1: Ly6c1, Cd55
* ISC2: Gdf10, Meox2, F3/Cd142
* ISC3: Thbs4, Fbln7, Sdc1
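
Several markers in these sections are quoted in the literature by their protein alias, whereas the expression matrices are indexed by gene symbol. A small lookup makes the translation explicit. The aliases below are the ones noted in this notebook; the helper `to_gene_symbols` is hypothetical.

```python
# Protein alias -> gene symbol, as noted in the comments and lists of this notebook.
ALIAS_TO_SYMBOL = {'Cd73': 'Nt5e', 'Cd146': 'Mcam', 'Ng2': 'Cspg4', 'Cd142': 'F3'}


def to_gene_symbols(names, alias_to_symbol=ALIAS_TO_SYMBOL):
    """Replace known protein aliases by gene symbols, keeping other names as-is."""
    return [alias_to_symbol.get(name, name) for name in names]


# The three genes plotted in the next cell, written with the Cd142 alias.
isc_markers = to_gene_symbols(['Cd55', 'Cd142', 'Sdc1'])
print(isc_markers)
```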

In [None]:
list_genes = ['Krano_type'] + ['Cd55', 'F3', 'Sdc1']

fig, axs = plt.subplots(len(list_genes), 3, figsize=(18, 4 * len(list_genes)))

for idx, gene in enumerate(list_genes):
    try:
        sc.pl.umap(adata_de_micheli_mouse_d0, color=gene, cmap=magma, ax=axs[idx][0], show=False, legend_loc='on data')
        sc.pl.umap(adata_oprescu_d0, color=gene, cmap=magma, ax=axs[idx][1], show=False, legend_loc='on data')
        sc.pl.umap(adata_scott_d0, color=gene, cmap=magma, ax=axs[idx][2], show=False, legend_loc='on data')
    except:
        pass

# Export adatas

In [None]:
os.makedirs(data_dir + '/processed', exist_ok=True)

In [None]:
adata_de_micheli_mouse_d0.write_h5ad(data_dir + '/processed/de_micheli_mouse_D0.h5ad')

In [None]:
adata_oprescu_d0.write_h5ad(data_dir + '/processed/oprescu_D0.h5ad')

In [None]:
adata_giordani.write_h5ad(data_dir + '/processed/giordani_D0.h5ad')

In [None]:
adata_scott_d0.write_h5ad(data_dir + '/processed/scott_D0.h5ad')

# Beautiful figs

In [None]:
os.makedirs(fig_dir + 'clusters/', exist_ok=True)

In [None]:
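# MPL config
# 'normal' is not a font family known to matplotlib; the generic 'sans-serif' family is used instead
font = {'family' : 'sans-serif',
        'weight' : 'light',
        'size'   : 15}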

mpl.rc('font', **font)

In [None]:
def makefig(list_genes, name_order=None, adata_list=[adata_oprescu_d0, adata_scott_d0, adata_de_micheli_mouse_d0, adata_giordani], 
           list_datasets = ['Oprescu', 'Scott', 'De Micheli', 'Giordani']):
    n_cols = len(adata_list)
    fig, axs = plt.subplots(len(list_genes), n_cols, figsize=(6 * n_cols, 4 * len(list_genes)))
    
        
    # One row per gene (or .obs column), one column per dataset
    for idx, gene in enumerate(list_genes):
        for adata_idx, adata in enumerate(adata_list):
            sc.pl.umap(adata, color=gene, cmap=magma, ax=axs[idx][adata_idx], show=False, legend_fontsize=11)
               
    for ax_idx, ax in enumerate(axs.ravel()):
        ax.spines['top'].set_visible(False)
        ax.spines['right'].set_visible(False)
        ax.spines['bottom'].set_visible(False)
        if ax_idx % n_cols == 0:
            ax.spines['left'].set_visible(False)
            
        ax.set_xlabel('')
        
        if ax_idx % n_cols == 0:
            ax.set_ylabel(list_genes[ax_idx // n_cols])
        else:
            ax.set_ylabel('')
            
        if ax_idx in range(len(list_datasets)):
            ax.set_title(list_datasets[ax_idx])
        else:
            ax.set_title('')
            
        # legend unification 
        if ax_idx % n_cols == 0:
            dict_legends = {}
        
        legend = ax.get_legend()
        if legend is not None:  # continuous (gene expression) panels have no legend
            # `legendHandles` was deprecated in favour of `legend_handles` in matplotlib 3.7
            handles = legend.legend_handles if hasattr(legend, 'legend_handles') else legend.legendHandles
            names = [text.get_text() for text in legend.get_texts()]
            dict_legends.update(zip(names, handles))
            if ax_idx % n_cols != n_cols - 1:
                legend.remove()
            else:
                # The last panel of the row carries the legend merged from the whole row
                order = list(dict_legends) if name_order is None else [i for i in name_order if i in dict_legends]
                ax.legend([dict_legends[i] for i in order], order,
                          bbox_to_anchor=(1.05, 1), frameon=False, prop={'size': 11})
        
    
    plt.tight_layout()
    plt.savefig(fig_dir + 'clusters/' + '-'.join(list_genes) + '.png', dpi=500)
    plt.savefig(fig_dir + 'clusters/' + '-'.join(list_genes) + '.pdf')
    


In [None]:
name_order = ['Endothelial', 'Pericyte', 'Fibroblast', 'FAP', 'Tenocyte', 'Neural cell',
                                                          'Myonuclei', 'MuSC', 'Immune', 'APC / Proliferative ICs', 
                                                          'Monocyte', 'Neutrophil', 'Myeloid', 'B cell', 'T cell', 'A', 'B', 'C', 'Other']

In [None]:
makefig(['cell_type', 'Krano_type'], name_order=name_order)

In [None]:
# Cluster 7
makefig(['cell_type', 'Krano_type', 'Cxcl14', 'G0s2', 'Adamtsl2', 'Saa1', 'Thrsp'], name_order=name_order)

In [None]:
makefig(['Krano_type', 'Cd34', 'S100b'], name_order=name_order)
makefig(['Krano_type', 'Pdgfrb', 'Sox10'], name_order=name_order)
makefig(['Krano_type', 'Ngfr', 'Cspg4'], name_order=name_order)
makefig(['Krano_type', 'Col9a2', 'Shisa3'], name_order=name_order)

In [None]:
makefig(['Krano_type', '6030408B16Rik', 'Col18a1'], name_order=name_order)
makefig(['Krano_type', 'Col9a2', 'Cldn1'], name_order=name_order)
makefig(['Krano_type', 'Dlk1', 'Fetub'], name_order=name_order)
makefig(['Krano_type', 'Gpld1', 'Grin2b'], name_order=name_order)
makefig(['Krano_type', 'Kcnb2', 'Lypd2'], name_order=name_order)
makefig(['Krano_type', 'Mansc4', 'Nipal1'], name_order=name_order)
makefig(['Krano_type', 'Saa1', 'Shisa3'], name_order=name_order)
makefig(['Krano_type', 'Tenm2', 'Trpm6'], name_order=name_order)

In [None]:
makefig(['Tnc', 'Tnmd', 'Nipal1', 'Dlk1'], name_order=name_order, 
        adata_list=[adata_oprescu_d0, adata_oprescu_d2, adata_oprescu_d35, adata_oprescu_d5, adata_oprescu_d10, adata_oprescu_d21], 
        list_datasets=['D0', 'D2', 'D3.5', 'D5', 'D10', 'D21'])