# Mouse / human comparison

For this notebook **you need to run the 4M and 4H notebooks first!**

For part of the analysis **you also need to run the 0A notebook**. This step is not strictly necessary; it only speeds up some of the gene analyses. If you have trouble loading content from that notebook, skip it: the analysis will be the same.

In this notebook we analyse the similarities and differences between mouse and human skin fibroblast populations. We already know that mouse and human skin differ, but we want to know how much of this difference carries over to the single-cell transcriptome.

We take two approaches:
* Choose datasets with human and mouse samples from the same lab (e.g., Boothby, Vorstandlechner) and try to find overlaps between the populations.
* Map genes between human and mouse using a homology database (MGI) and look for similarities between the marker lists of human and mouse populations.

## imports

In [None]:
%load_ext autoreload
%autoreload 2

In [None]:
import scanpy as sc
import scanpy.external as sce
import pandas as pd
import numpy as np
import os
import triku as tk
import seaborn as sns
import matplotlib.pyplot as plt
import matplotlib as mpl
from tqdm.notebook import tqdm
import scipy.sparse as spr
import matplotlib.cm as cm
import networkx as nx
from matplotlib import pylab

In [None]:
!pip install munkres
from munkres import Munkres

In [None]:
# local imports and imports from other notebooks
from cellassign import assign_cats
from fb_functions import make_gene_scoring_with_expr, plot_score_graph, plot_UMAPS_gene, plot_adata_cluster_properties
%store -r dict_colors_human
%store -r dict_colors_mouse

dict_colors_human_mouse = {**dict_colors_human , **dict_colors_mouse}

%store -r seed
%store -r magma
%store -r data_dir

In [None]:
%store -r dict_make_gene_scoring_robust
%store -r dict_make_gene_scoring_axis_robust

In [None]:
# FROM NOTEBOOK 0A!!
%store -r df_human_genes_codes
%store -r df_mouse_genes_codes

In [None]:
%store -r plot_params

pylab.rcParams.update(plot_params)
pd.set_option('display.max_columns', None)
pd.options.display.float_format = "{:,.2f}".format

In [None]:
def join_fbs_adatas(adata_full, adata_fb):
    cell_types = adata_full.obs['assigned_cats'].copy().astype(str)
    intersect_idx = np.intersect1d(adata_fb.obs_names, adata_full.obs_names)
    cell_types[intersect_idx] = [f'fibro_{i}' for i in adata_fb[intersect_idx].obs['cluster']]
    adata_full.obs['full_cell_type'] = cell_types.astype('category')

## Creating the mouse-human gene homology dictionary

In [None]:
# !cd results && wget http://www.informatics.jax.org/downloads/reports/HOM_AllOrganism.rpt

In [None]:
df = pd.read_csv('results/HOM_AllOrganism.rpt', sep='\t')
df = df[df['Common Organism Name'].isin(['mouse, laboratory', 'human'])][['DB Class Key', 'Common Organism Name', 'Symbol']].reset_index(drop=True)

list_DB = set(df['DB Class Key'].values)

dict_mouse_human = {}
for el in tqdm(list_DB):
    df_sub = df[df['DB Class Key'] == el].sort_values(by='Common Organism Name')
    if len(df_sub) == 2:
        dict_mouse_human[df_sub.iloc[1, 2]] = df_sub.iloc[0, 2]

In [None]:
dict_human_mouse = dict(zip(dict_mouse_human.values(), dict_mouse_human.keys()))
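
The dictionary construction above can also be sketched with a `groupby`, which avoids filtering the whole table once per class key. This is a minimal sketch on a toy table, not the notebook's own code: the rows are made up for illustration (only the TP53/Trp53 pair is a real homolog pair), `build_one_to_one` is a hypothetical helper, and it is slightly stricter than the loop above because it requires exactly one human and one mouse entry per class rather than any two rows.

```python
import pandas as pd

# Toy table with the three columns kept from HOM_AllOrganism.rpt.
# Rows are illustrative; GENEX/Genex1/Genex2/ONLYHUMAN are made-up symbols.
toy = pd.DataFrame({
    'DB Class Key': [1, 1, 2, 2, 2, 3],
    'Common Organism Name': ['human', 'mouse, laboratory',
                             'human', 'mouse, laboratory', 'mouse, laboratory',
                             'human'],
    'Symbol': ['TP53', 'Trp53', 'GENEX', 'Genex1', 'Genex2', 'ONLYHUMAN'],
})


def build_one_to_one(df):
    """Return {mouse_symbol: human_symbol} for classes with exactly one gene per species."""
    mouse_to_human = {}
    for _, group in df.groupby('DB Class Key'):
        human = group.loc[group['Common Organism Name'] == 'human', 'Symbol']
        mouse = group.loc[group['Common Organism Name'] == 'mouse, laboratory', 'Symbol']
        # Keep only unambiguous one-to-one homologs.
        if len(human) == 1 and len(mouse) == 1:
            mouse_to_human[mouse.iloc[0]] = human.iloc[0]
    return mouse_to_human


toy_mouse_human = build_one_to_one(toy)
# Invert the mapping; safe here because the mapping is one-to-one.
toy_human_mouse = {h: m for m, h in toy_mouse_human.items()}
```

Class 2 (one human, two mouse genes) and class 3 (human only) are dropped, so only the unambiguous pair survives.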

# Comparison of UMAPs of populations

In this section we compare mouse and human datasets from:
* The same laboratory
* Different laboratories (more confirmatory)

To do this we translate the mouse genes into their human homologs and keep the subset of genes that have a homolog and are present in the human adata.
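
The gene selection is repeated for each dataset below, so it may help to see it as a standalone function. This is only a sketch: `select_shared_homologs` is a hypothetical helper name, and the gene lists in the example are placeholders rather than the contents of any real adata.

```python
def select_shared_homologs(mouse_genes, human_genes, mouse_to_human):
    """Pick mouse genes whose human homolog is present in the human gene list.

    Returns three parallel lists: selected mouse genes, their human homologs,
    and joint 'HUMAN | Mouse' names. Each human gene is used at most once.
    """
    human_genes = set(human_genes)
    seen_human = set()
    mouse_selected, human_selected, joint_names = [], [], []
    for mouse_gene in mouse_genes:
        human_gene = mouse_to_human.get(mouse_gene)
        # Skip genes with no homolog, with a homolog missing from the human
        # data, or whose homolog was already claimed by an earlier mouse gene.
        if human_gene is None or human_gene not in human_genes or human_gene in seen_human:
            continue
        seen_human.add(human_gene)
        mouse_selected.append(mouse_gene)
        human_selected.append(human_gene)
        joint_names.append(f'{human_gene} | {mouse_gene}')
    return mouse_selected, human_selected, joint_names


# Placeholder inputs: 'MouseOnly1' has no entry in the homology dictionary.
example = select_shared_homologs(
    mouse_genes=['Trp53', 'Col1a1', 'MouseOnly1'],
    human_genes=['COL1A1', 'TP53', 'ACTB'],
    mouse_to_human={'Trp53': 'TP53', 'Col1a1': 'COL1A1'},
)
```

The three returned lists stay aligned, so they can be used directly to subset both adatas and to assign the same `var_names` to each.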

### Boothby

In [None]:
boothby_2021_dir = data_dir + '/boothby_2021'
boothby_2021_ctrl_mouse_fb = sc.read(boothby_2021_dir + '/boothby_2021_ctrl_mouse_fb_robust.h5')
boothby_2021_ctrl_human_fb = sc.read(boothby_2021_dir + '/boothby_2021_ctrl_human_fb_robust.h5')

boothby_2021_ctrl_mouse_fb_raw = sc.read(boothby_2021_dir + '/boothby_2021_mouse_ctrl_mouse.h5')
boothby_2021_ctrl_human_fb_raw = sc.read(boothby_2021_dir + '/boothby_2021_ctrl_human.h5')

In [None]:
boothby_2021_ctrl_mouse_fb.X = boothby_2021_ctrl_mouse_fb_raw[boothby_2021_ctrl_mouse_fb.obs_names, boothby_2021_ctrl_mouse_fb.var_names].X.copy()
boothby_2021_ctrl_human_fb.X = boothby_2021_ctrl_human_fb_raw[boothby_2021_ctrl_human_fb.obs_names, boothby_2021_ctrl_human_fb.var_names].X.copy()

In [None]:
adata_mouse_genes, adata_human_genes = boothby_2021_ctrl_mouse_fb.var_names, boothby_2021_ctrl_human_fb.var_names
mouse_selected_genes, human_homolog, human_mouse_gene = [], [], []

for i in adata_mouse_genes:
    if i in dict_mouse_human:
        if (dict_mouse_human[i] in adata_human_genes) and (dict_mouse_human[i] not in human_homolog):
            mouse_selected_genes.append(i)
            human_homolog.append(dict_mouse_human[i])
            human_mouse_gene.append(f'{dict_mouse_human[i]} | {i}')

In [None]:
boothby_2021_ctrl_human_fb, boothby_2021_ctrl_mouse_fb = boothby_2021_ctrl_human_fb[:, human_homolog], boothby_2021_ctrl_mouse_fb[:, mouse_selected_genes]
boothby_2021_ctrl_human_fb.var_names, boothby_2021_ctrl_mouse_fb.var_names = human_mouse_gene, human_mouse_gene
boothby_2021_ctrl_human_mouse_fb = sc.AnnData.concatenate(boothby_2021_ctrl_human_fb, boothby_2021_ctrl_mouse_fb, batch_categories=['human', 'mouse'], batch_key='organism')

In [None]:
sc.pp.log1p(boothby_2021_ctrl_human_mouse_fb)

In [None]:
sc.pp.pca(boothby_2021_ctrl_human_mouse_fb, random_state=seed, n_comps=50)
sce.pp.harmony_integrate(boothby_2021_ctrl_human_mouse_fb, key='Internal sample identifier', max_iter_harmony=50)
sc.pp.neighbors(boothby_2021_ctrl_human_mouse_fb, use_rep='X_pca_harmony', n_neighbors=int(0.5 * len(boothby_2021_ctrl_human_mouse_fb) ** 0.5 // 4), metric='cosine')
tk.tl.triku(boothby_2021_ctrl_human_mouse_fb)

In [None]:
boothby_2021_ctrl_human_mouse_fb.var_names[boothby_2021_ctrl_human_mouse_fb.var['highly_variable'] == True]

In [None]:
sc.tl.umap(boothby_2021_ctrl_human_mouse_fb, min_dist=0.2, random_state=seed)

In [None]:
boothby_2021_ctrl_human_mouse_fb.obs['cluster'] = boothby_2021_ctrl_human_mouse_fb.obs['cluster'].astype('category')
boothby_2021_ctrl_human_mouse_fb.uns['cluster_colors'] = [dict_colors_human_mouse[i] if i in dict_colors_human_mouse else '#bcbcbc' for  i in boothby_2021_ctrl_human_mouse_fb.obs['cluster'].cat.categories]

In [None]:
sc.pp.subsample(boothby_2021_ctrl_human_mouse_fb, fraction=1, random_state=0, copy=False)
sc.pl.umap(boothby_2021_ctrl_human_mouse_fb, color=['cluster', 'organism'], legend_loc='on data', frameon=False)

In [None]:
sc.tl.rank_genes_groups(boothby_2021_ctrl_human_mouse_fb, groupby='organism')
sc.pl.umap(boothby_2021_ctrl_human_mouse_fb, color=['organism'], legend_loc='on data')
sc.pl.rank_genes_groups_tracksplot(boothby_2021_ctrl_human_mouse_fb, groupby='organism', n_genes=10, dendrogram=False)

In [None]:
adata_regress = sc.pp.regress_out(boothby_2021_ctrl_human_mouse_fb, 'organism', n_jobs=32, copy=True)

In [None]:
sc.pp.pca(adata_regress, random_state=seed, n_comps=50)
sce.pp.harmony_integrate(adata_regress, key='Internal sample identifier', max_iter_harmony=50)
sc.pp.neighbors(adata_regress, use_rep='X_pca_harmony', n_neighbors=int(0.5 * len(adata_regress) ** 0.5 // 4), metric='cosine')

sc.tl.umap(adata_regress)
sc.pl.umap(adata_regress, color=['cluster', 'organism'], legend_loc='on data', frameon=False)

### Vorstandlechner

In [None]:
vorstandlechner_2021_dir = data_dir + '/Vorstandlechner_2021'
vorstandlechner_2021_ctrl_human_fb = sc.read(f"{vorstandlechner_2021_dir}/vors_2021_ctrl_human_fb_robust.h5")
vorstandlechner_2021_ctrl_mouse_fb = sc.read(f"{vorstandlechner_2021_dir}/vorstandlechner_2021_ctrl_mouse_fb_robust.h5")

vorstandlechner_2021_ctrl_human_fb_raw = sc.read(f"{vorstandlechner_2021_dir}/vorstandlechner_2021_ctrl_human.h5")
vorstandlechner_2021_ctrl_mouse_fb_raw = sc.read(f"{vorstandlechner_2021_dir}/vorstandlechner_2021_ctrl_mouse.h5")

In [None]:
vorstandlechner_2021_ctrl_human_fb.X = vorstandlechner_2021_ctrl_human_fb_raw[vorstandlechner_2021_ctrl_human_fb.obs_names, vorstandlechner_2021_ctrl_human_fb.var_names].X.copy()
vorstandlechner_2021_ctrl_mouse_fb.X = vorstandlechner_2021_ctrl_mouse_fb_raw[vorstandlechner_2021_ctrl_mouse_fb.obs_names, vorstandlechner_2021_ctrl_mouse_fb.var_names].X.copy()

In [None]:
adata_mouse_genes, adata_human_genes = vorstandlechner_2021_ctrl_mouse_fb.var_names, vorstandlechner_2021_ctrl_human_fb.var_names
mouse_selected_genes, human_homolog, human_mouse_gene = [], [], []

for i in adata_mouse_genes:
    if i in dict_mouse_human:
        if (dict_mouse_human[i] in adata_human_genes) and (dict_mouse_human[i] not in human_homolog):
            mouse_selected_genes.append(i)
            human_homolog.append(dict_mouse_human[i])
            human_mouse_gene.append(f'{dict_mouse_human[i]} | {i}')

In [None]:
vorstandlechner_2021_ctrl_human_fb, vorstandlechner_2021_ctrl_mouse_fb = vorstandlechner_2021_ctrl_human_fb[:, human_homolog], vorstandlechner_2021_ctrl_mouse_fb[:, mouse_selected_genes]
vorstandlechner_2021_ctrl_human_fb.var_names, vorstandlechner_2021_ctrl_mouse_fb.var_names = human_mouse_gene, human_mouse_gene
vorstandlechner_2021_ctrl_human_mouse_fb = sc.AnnData.concatenate(vorstandlechner_2021_ctrl_human_fb, vorstandlechner_2021_ctrl_mouse_fb, batch_categories=['human', 'mouse'], batch_key='organism')

In [None]:
sc.pp.log1p(vorstandlechner_2021_ctrl_human_mouse_fb)

In [None]:
sc.pp.pca(vorstandlechner_2021_ctrl_human_mouse_fb, random_state=seed, n_comps=50)
sce.pp.harmony_integrate(vorstandlechner_2021_ctrl_human_mouse_fb, key='Internal sample identifier', max_iter_harmony=50)
sc.pp.neighbors(vorstandlechner_2021_ctrl_human_mouse_fb, use_rep='X_pca_harmony', n_neighbors=int(0.5 * len(vorstandlechner_2021_ctrl_human_mouse_fb) ** 0.5 // 4), metric='cosine')
tk.tl.triku(vorstandlechner_2021_ctrl_human_mouse_fb)

In [None]:
vorstandlechner_2021_ctrl_human_mouse_fb.var_names[vorstandlechner_2021_ctrl_human_mouse_fb.var['highly_variable'] == True]

In [None]:
sc.tl.umap(vorstandlechner_2021_ctrl_human_mouse_fb, min_dist=0.2, random_state=seed)

In [None]:
vorstandlechner_2021_ctrl_human_mouse_fb.obs['cluster'] = vorstandlechner_2021_ctrl_human_mouse_fb.obs['cluster'].astype('category')
vorstandlechner_2021_ctrl_human_mouse_fb.uns['cluster_colors'] = [dict_colors_human_mouse[i] if i in dict_colors_human_mouse else '#bcbcbc' for  i in vorstandlechner_2021_ctrl_human_mouse_fb.obs['cluster'].cat.categories]

In [None]:
sc.pp.subsample(vorstandlechner_2021_ctrl_human_mouse_fb, fraction=1, random_state=0, copy=False)
sc.pl.umap(vorstandlechner_2021_ctrl_human_mouse_fb, color=['cluster', 'organism'], legend_loc='on data', frameon=False)

In [None]:
sc.tl.rank_genes_groups(vorstandlechner_2021_ctrl_human_mouse_fb, groupby='organism')
sc.pl.umap(vorstandlechner_2021_ctrl_human_mouse_fb, color=['organism'], legend_loc='on data')
sc.pl.rank_genes_groups_tracksplot(vorstandlechner_2021_ctrl_human_mouse_fb, groupby='organism', n_genes=10, dendrogram=False)

In [None]:
adata_regress = sc.pp.regress_out(vorstandlechner_2021_ctrl_human_mouse_fb, 'organism', n_jobs=32, copy=True)

In [None]:
sc.pp.pca(adata_regress, random_state=seed, n_comps=50)
sce.pp.harmony_integrate(adata_regress, key='Internal sample identifier', max_iter_harmony=50)
sc.pp.neighbors(adata_regress, use_rep='X_pca_harmony', n_neighbors=int(0.5 * len(adata_regress) ** 0.5 // 4), metric='cosine')

sc.tl.umap(adata_regress)
sc.pl.umap(adata_regress, color=['cluster', 'organism'], legend_loc='on data', frameon=False)

#### Showing DEGs of human VS mouse populations in Vorstandlechner et al 2021

In [None]:
sc.pp.subsample(vorstandlechner_2021_ctrl_human_mouse_fb, fraction=1, random_state=0, copy=False)
sc.pl.umap(vorstandlechner_2021_ctrl_human_mouse_fb, color=['cluster'], legend_loc='on data')

In [None]:
sc.pl.umap(vorstandlechner_2021_ctrl_human_mouse_fb[vorstandlechner_2021_ctrl_human_mouse_fb.obs['organism']=='mouse'], color=['cluster'], legend_loc='on data')
sc.pl.umap(vorstandlechner_2021_ctrl_human_mouse_fb[vorstandlechner_2021_ctrl_human_mouse_fb.obs['organism']=='human'], color=['cluster'], legend_loc='on data')

In [None]:
sc.tl.leiden(vorstandlechner_2021_ctrl_human_mouse_fb, resolution=0.01)
sc.pl.umap(vorstandlechner_2021_ctrl_human_mouse_fb, color=['leiden'], legend_loc='on data')
sc.tl.rank_genes_groups(vorstandlechner_2021_ctrl_human_mouse_fb, groupby='leiden')
sc.pl.rank_genes_groups_tracksplot(vorstandlechner_2021_ctrl_human_mouse_fb, n_genes=30)

### Conclusions
The datasets do not match. In their methods, the authors point out that they use a restricted set of genes to build the UMAP. I don't feel comfortable with that strategy, so I prefer a more "unbiased" analysis based on the markers.

# Comparison of gene patterns between populations
In this section we take the gene markers from human and mouse and run human-mouse, mouse-mouse and human-human comparisons between the sets of markers. From this we expect to find similarities between the two species and establish a homology model of the skin fibroblasts.

To do this analysis we are going to:
* Get the top N markers for each population in mouse and human (also in human-human and mouse-mouse) and compute the Jaccard index between them. 
    * To find the "best" N, we compute the Jaccard matrix for a range of N values and, for each one, use the Hungarian algorithm to select the one-to-one pairing of populations with the highest total Jaccard index. That total is stored; a higher total implies a higher overall overlap between markers and, therefore, a more relevant analysis. From the resulting curve, we pick the best "rounded" N.
* With the top N we are going to get the Jaccard matrix and plot it with a heatmap and a clustergram.
* From there, we are going to select the best matches and analyze the relationship between human and mouse populations
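
The Jaccard-plus-assignment step listed above can be sketched on toy data as follows. Note the assumptions: this sketch uses `scipy.optimize.linear_sum_assignment` instead of the `munkres` package used in this notebook (both solve the same assignment problem), the marker lists are plain Python lists rather than the stored scoring DataFrames, and `jaccard`, `best_matching_score` and the gene names `g0`...`g9` are hypothetical.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def jaccard(a, b):
    """Jaccard index of two gene collections: |A & B| / |A | B|."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def best_matching_score(markers_1, markers_2, n_top):
    """Total Jaccard index of the best one-to-one pairing, and the pairing itself."""
    names_1, names_2 = list(markers_1), list(markers_2)
    matrix = np.array([[jaccard(markers_1[i][:n_top], markers_2[j][:n_top])
                        for j in names_2] for i in names_1])
    # maximize=True finds the pairing with the largest total overlap.
    rows, cols = linear_sum_assignment(matrix, maximize=True)
    pairs = [(names_1[r], names_2[c]) for r, c in zip(rows, cols)]
    return matrix[rows, cols].sum(), pairs


# Toy ranked marker lists (most specific marker first); names are placeholders.
toy_human = {'A': ['g1', 'g2', 'g3', 'g4'], 'B': ['g5', 'g6', 'g7', 'g8']}
toy_mouse = {'x': ['g5', 'g6', 'g9', 'g1'], 'y': ['g1', 'g2', 'g3', 'g0']}

score, pairs = best_matching_score(toy_human, toy_mouse, n_top=3)
```

With `n_top=3`, population `A` pairs with `y` (identical top-3 lists) and `B` pairs with `x` (two of four genes shared), so the total is 1.0 + 0.5. Repeating the call over a range of `n_top` values gives the curve from which N is chosen.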

We do the human-human and mouse-mouse comparisons to get a "control" view. We expect more gene overlap between clusters of the same axis than of different ones. Also, we get a sense of the expected overlaps, to see how this is reflected in mouse.

To make the joining as "easy" as possible, we are going to use the subset of genes that have a mouse-human homology only. This might limit the extent of the analysis but, in general, we find a fair overlap, and this will make the dataset mapping and batch effect correction much easier.

In [None]:
%store -r dict_make_gene_scoring_cluster_robust_human
%store -r dict_make_gene_scoring_cluster_robust_mouse

In [None]:
def get_df_overlap(dict_1, dict_2, N=100, translate=True):
    # Initialise with floats so that Jaccard values can be stored without dtype changes
    df_overlap = pd.DataFrame(0.0, index=dict_1.keys(), columns=dict_2.keys())
    
    for cluster_name_1, cluster_df_1 in dict_1.items():
        for cluster_name_2, cluster_df_2 in dict_2.items():
            gene_list_1 = set(cluster_df_1.index[:N])
            gene_list_2_unchanged = cluster_df_2.index[:N]
            if translate:
                gene_list_2 = set([dict_mouse_human[i] if i in dict_mouse_human else i for i in gene_list_2_unchanged])
            else:
                gene_list_2 = set(gene_list_2_unchanged)

            overlap = len(gene_list_1 & gene_list_2) / len(gene_list_1 | gene_list_2)
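            if overlap == 1:
                # Identical marker lists (e.g. a cluster compared with itself): set to 0 so self-matches are ignored
                overlap = 0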
                
            df_overlap.loc[cluster_name_1, cluster_name_2] = overlap
    
    return df_overlap

In [None]:
def plot_best_N(dict_1, dict_2, N_min=10, N_max=300, translate=True):
    list_N, list_total = [],[] 
    for N in tqdm(range(N_min, N_max)):
        df_jaccard_1_2 = get_df_overlap(dict_1, dict_2, N=N, translate=translate)

        m = Munkres()
        indexes = m.compute(1-df_jaccard_1_2.values)
        total = 0
        for row, column in indexes:
            value = df_jaccard_1_2.values[row][column]
            total += value

        list_N.append(N)
        list_total.append(total)

    plt.plot(list_N, list_total)

In [None]:
def plot_heatmap(df_jaccard_1_2, dict_colors_1, dict_colors_2, figsize=(12,8), ticklabelsize=15, diag=False, annot=True, cmap='magma'):
    fig, ax = plt.subplots(1, 1, figsize=figsize)
    if diag:
        mask = np.eye(df_jaccard_1_2.shape[0], dtype=bool)
        sns.heatmap(df_jaccard_1_2, annot=annot, fmt='.2f', ax=ax, mask=mask, cmap=cmap)
        ax.set_facecolor("#989898")
    else:
        sns.heatmap(df_jaccard_1_2, annot=annot, fmt='.2f', ax=ax, cmap=cmap)
    [t.set_color(dict_colors_1[t.get_text()]) for t in ax.xaxis.get_ticklabels()]; [t.set_color(dict_colors_2[t.get_text()]) for t in ax.yaxis.get_ticklabels()]
    ax.set_xticklabels(ax.get_xticklabels(),  weight='bold', size=ticklabelsize); ax.set_yticklabels(ax.get_yticklabels(),  weight='bold', size=ticklabelsize, va='center')
    None # To avoid plotting stuff on screen
    
def plot_clustermap(df_jaccard_1_2, dict_colors_1, dict_colors_2, figsize=(12,8), ticklabelsize=15, diag=False):
    if diag:
        mask = np.eye(df_jaccard_1_2.shape[0], dtype=bool)
        cg = sns.clustermap(df_jaccard_1_2, mask=mask)
        ax = cg.ax_heatmap
        ax.set_facecolor("#989898")
    else:
        cg = sns.clustermap(df_jaccard_1_2, )
        ax = cg.ax_heatmap
        
    [t.set_color(dict_colors_1[t.get_text()]) for t in ax.xaxis.get_ticklabels()]; [t.set_color(dict_colors_2[t.get_text()]) for t in ax.yaxis.get_ticklabels()]
    ax.set_xticklabels(ax.get_xticklabels(),  weight='bold', size=ticklabelsize); ax.set_yticklabels(ax.get_yticklabels(),  weight='bold', size=ticklabelsize, va='center')
    None # To avoid plotting stuff on screen

In [None]:
def print_common_genes(dict_1, dict_2, cluster_1, cluster_2, translate=True, N=150):
    gene_list_1 = set(dict_1[cluster_1].index[:N])
    gene_list_2_unchanged = dict_2[cluster_2].index[:N]
    if translate:
        gene_list_2 = set([dict_mouse_human[i] if i in dict_mouse_human else i for i in gene_list_2_unchanged])
    else:
        gene_list_2 = set(gene_list_2_unchanged)
        
    return sorted(list(gene_list_1 & gene_list_2))

### Human-human comparison

In [None]:
plot_best_N(dict_make_gene_scoring_cluster_robust_human, dict_make_gene_scoring_cluster_robust_human, N_min=10, N_max=300)

In [None]:
df_jaccard_human_human = get_df_overlap(dict_make_gene_scoring_cluster_robust_human, dict_make_gene_scoring_cluster_robust_human, N=150)
plot_heatmap(df_jaccard_human_human, dict_colors_human, dict_colors_human, figsize=(12,7), ticklabelsize=15, diag=True, cmap='Blues')

In [None]:
# sorted
df_jaccard_human_human_sorted = df_jaccard_human_human.loc[['A2', 'A3', 'A1', 'A4', 'B1', 'B3', 'B2', 'B4', 'C2', 'C3', 'C1', 'C5', 'D1', 'D2', 'E1'], 
                                                           ['A2', 'A3', 'A1', 'A4', 'B1', 'B3', 'B2', 'B4', 'C2', 'C3', 'C1', 'C5', 'D1', 'D2', 'E1']]

df_jaccard_human_human_sorted[df_jaccard_human_human_sorted < 0.03] = np.nan

plot_heatmap(df_jaccard_human_human_sorted, dict_colors_human, dict_colors_human, figsize=(12,7), ticklabelsize=15, annot=True, cmap='Blues')

plot_heatmap(df_jaccard_human_human_sorted, dict_colors_human, dict_colors_human, figsize=(12,7), ticklabelsize=15, annot=False, cmap='Blues')

#### Comments on human-human
* It is clear that A3 is a bridge between A1-A4 and A2. A2 and A1 are clearly different clusters, and A3 is more "favorable" to A1 than to A2.
    * A1 - A4 (no A3): **ACKR3**, ADGRD1, C1QTNF3, **CA12**, CD151, CD248, **CD34**, **CD55**, **CD70**, CHRDL1, **CLEC3B**, CTHRC1, CYB5R3, **DBN1**, DPP4, **EMILIN2**, **FAM180B**, **FAP**, FBLN2, **FBN1**, FSTL1, **GALNT15**, **GPX3**, **IGFBP6**, **ISLR**, **LGR5**, LIMS2, MATN4, MFAP5, MMP2, PALM, PAMR1, PCOLCE2, PIGT, PLAC9, PPIC, **PRG4**, **SCARA5**, SEMA3C, SLC29A1, SLITRK4, **SLPI**, SSC5D, TGFBR3, TIMP2, TNXB, VASN, VAT1 [48 total, **19**]
    * A1 - A3 (no A4): ADAMTSL1, AEBP1, ANGPTL1, ANGPTL5, **AOX1**, **ARFGEF3**, C1QTNF3, CD248, CERCAM, CLEC3B, **COL14A1**, COL1A1, COL1A2, **CORIN**, CPE, CPVL, CREB5, CRYAB, CTHRC1, CTSB, CTSK, CYB5R3, CYBRD1, DCN, DPP4, ECM1, ELN, FBLN1, FBLN2, FBN1, FKBP9, **GLRB**, HEXA, ISLR, ITIH5, LOX, LRFN5, LRRC2, MMP2, MMP27, NUCB2, OLFML3, **OMD**, PCOLCE2, PDGFRL, PI16, PLAC9, PLD3, PODN, PPIC, QPCT, RECK, SEMA3B, SERPINF1, SFRP2, SMOC2, SPARC, SSC5D, **SVEP1**, THBS2, THBS3, TNXB, WISP2, XG [64 total, **7**]
    * A4 - A1 - A3 (all 3): **C1QTNF3**, **CD248**, CLEC3B, CTHRC1, CYB5R3, **DPP4**, **FBLN2**, FBN1, ISLR, MMP2, **PCOLCE2**, PLAC9, PPIC, SSC5D, **TNXB** [15 total, **6**]
    * A2 - A3: CD9, COL5A1, COL6A2, HSPB7, **ISM1**, **LEPR**, NBL1, NCKAP5, **RSPO1**, SFRP2, TWIST2, **WIF1**, WNT16 [13 total, **X**]


* A2 has a "shared" transcriptomic profile with the C and E axes! This is also apparent from the UMAPs and connectivity graphs. However, its relatively low similarity to C1, C3 and E1 makes it a somewhat "independent" population.
    * A2-C1: APELA, CCDC3, **COL21A1**, COL27A1, COL3A1, **COL7A1**, **COL6A3**, CPXM2, **DKK3**, DUSP4, **LAMC3**, ROBO2, ROR2, **SEMA5A**, **SPON1**, **STMN1**, **THSD4**, **TMEM119**  [18 total, **10**]
    * A2-C3: **ADGRE2**, AK5, ANTXR1, AQP1, C4orf48, COL3A1, **COL5A1**, **COL6A1**, **COL6A3**, **COL7A1**, **COMP**, CPXM2, DKK3, F13A1, LBH, **LOXL2**, MFAP2, **PTK7**, RAB31, RGS3, SCARF2, SEMA5A, **SPON1**, SUGCT, TANC2, TCF4, TMEM119, TNC, UNC5B  [29 total, **9**]
    * A2-E1: ACVR2A, AK5, **AQP1**, CDC42EP3, **CMKLR1**, COL21A1, DKK3, **DUSP6**, HS3ST6, ID1, **IGFBP2**, LAMC3, **MAP2**, PLCB1, PREX1, **RGCC**, SAMD5, SEMA5A, SPON1, **SPRY1**, TGFBI, UNC5B  [22 total, **7**]


* B axis seems quite independent from the rest of axes, although it is related to T1 and, also, B1 with D1. 
    * It is also clear that B3 acts as a bridge between B1, B2 and B4. The bridge is clearer between B3 and B2 than between B3 and B1. B1 and B2 are not related, and B4 is slightly related to B2, but not to B1 (although they appear more related in UMAPs and graphs).
        * B3 - B1 (not B2): ADAMTS4, APOE, **BIRC3**, **CCL2**, **CXCL1**, **CXCL2**, **CXCL3**, **GEM**, **HAS2**, ICAM1, IRF1, IRF8, NFKB2, NFKBIA, PHLDA1, PPP1R15A, RELB, SOCS3, **TNFAIP3**, UGCG, ZNF267 [21 total, **8**]
        * B3 - B2 (not B1): ABCC3, ACHE, ADAMDEC1, **ADRA2A**, **ANKRD29**, APLNR, APOE, B2M, C1RL-AS1, **C3**, **CCL19**, **CD74**, CDX1, CH25H, **CLSTN3**, COX4I2, **CRB2**, **CTSC**, CTSH, **CX3CL1**, CXCL12, CYP7B1, ENC1, EXOC3L4, FLT3LG, FXYD6, GGT5, GUCY1A1, GUCY1B1, HES4, HLA-B, HLA-DRB1, HLA-F, **ICAM2**, **IGFBP3**, IGFBP7, IL15, IL33, **IL34**, IRF8, ITGA8, **JAK3**, LIFR, MICALL2, MSC, **NLGN4X**, NPB, ODF3B, OLFM2, OSMR, P2RY14, PGM5, PGM5-AS1, PKP2, PLA2G4C, PLEKHH2, POPDC2, PSMB9, **PTGDS**, PTPRT, RARRES2, RASSF4, **RBP5**, SCN4B, SDK1, SLC51A, SLC9A3R2, **SLCO2B1**, **ST8SIA1**, TMEM132C, **TMEM150C**, TMEM176A, TMEM176B, TNFSF10, **TNFSF13B**, **TYMP**, UBD, VEGFA, WIPF3 [79 total, **21**]
        * B1 - B3 - B2 (all 3): **APOE**, **IRF8** [2 total, **2**]
    * B1, similar to A2, shows a *more* independent transcriptomic profile from its neighbouring B clusters.
    * B4 is more similar to B2 than to the rest of B clusters
        * B4 - B2: **ABCA10**, **ABCA8**, ADH1B, APOC1, APOE, C1orf54, C3, C7, COL4A4, CXCL12, CYGB, DEPTOR, EBF1, EFEMP1, EPHX1, **FXYD6**, GGT5, HEYL, IFITM1, IGFBP7, IL33, NRP1, PDE5A, PRRX1, PTCH2, RARRES2, SERPINF2, SLIT2, TMEM176A, VWA1 [30 total, **3**]
    * B4 shares a decent set of genes with D1, D2 and E1
        * B4 - D1: A2M, **ABCA8**, **ABCA9**, ABLIM1, AKR1C1, **APOD**, BNC2, CNTN4, CRISPLD1, **CYGB**, EGFR, EPS8, FLRT2, FMO1, FMO2, FMO3, FOXS1, **GPC3**, **IGFBP7**, INMT, **ITM2A**, LGALS3BP, **LSP1**, NGFR, NR2F2, OAF, PEAR1, PLPPR4, SFRP4, SHISA3, SOX5, SPTBN1, TFPI, **VIT**  [34 total, **9**]
        * B4 - D2: A2M, **ABCA8**, ADAMTSL3, BNC2, CCL13, **CHRDL1**, CNTN4, COL4A2, DACT1, EGFR, FHL2, FLRT2, FOXS1, **IGFBP6**, IGFBP7, INMT, ITGA7, **ITM2A**, ITM2B, NGFR, NOTCH3, NR2F2, OAF, PDZRN4, PEAR1, PHLDA3, **RARRES2**, SFRP4, SMIM10, SPTBN1, TFPI, TGFBR2, **TXNIP**, **VIT**  [35 total, **7**]
        * B4 - E1: A2M, **APOD**, ARHGDIB, CALD1, **COL15A1**, CRLF1, DACT1, EPHA3, FHL2, FMO2, **GSN**, **IGF1**, INMT, **ITM2A**, ITM2B, **LSP1**, **MGP**, OAF, PEAR1, **PLA2G5**, PLBD1, PRSS23, RARRES2, RELN, SLC16A4, SPON2, **SPRY1**, **STMN2**, SULT1A1, **TIMP3**, TSHZ2, TXNIP, VWA1  [33 total, **11**]
    * B1 also shares some similarities with D1:
        * B1 - D1: AMD1, **CHD1**, CYCS, DNAJA1, DUSP5, EGR3, ERF, KLF5, LDLR, **MAFF**, NFE2L2, PHLDA1 [12 total, **2**]


* The C, D and E axes, although each has a substructure, are more correlated with each other than the A or B axes.
    * C1 and C3 are quite related, and C3 in particular is the bridge cluster within the C axis (especially with C1, then with C2 and then with C5).
    * D1 and D2 are also very related.
    * E1 interacts to some extent with all of the C and D axes, but also with A2 and B4, which makes it a great candidate for analysing where and how it is located within the dermis.
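
Lists such as "A1 - A4 (no A3)" or "A4 - A1 - A3 (all 3)" above are plain set operations on the top-N markers of each cluster. A minimal sketch, assuming the markers are available as ranked lists: `shared_excluding` is a hypothetical helper, and the clusters `P`, `Q`, `R` and their genes are placeholders, not results from this analysis.

```python
def shared_excluding(markers, include, exclude=(), n_top=150):
    """Genes in the top-N markers of every `include` cluster and of no `exclude` cluster."""
    shared = set.intersection(*(set(markers[c][:n_top]) for c in include))
    for cluster in exclude:
        shared -= set(markers[cluster][:n_top])
    return sorted(shared)


# Placeholder clusters and gene names, for illustration only.
toy_markers = {
    'P': ['geneA', 'geneB', 'geneC', 'geneD'],
    'Q': ['geneB', 'geneD', 'geneE'],
    'R': ['geneA', 'geneB', 'geneC'],
}

p_r_not_q = shared_excluding(toy_markers, include=['P', 'R'], exclude=['Q'])
all_three = shared_excluding(toy_markers, include=['P', 'Q', 'R'])
```

Here `P` and `R` share three genes, one of which is also a `Q` marker, so the "no Q" list has two genes and the "all 3" list has one.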

### Mouse-mouse comparison

In [None]:
plot_best_N(dict_make_gene_scoring_cluster_robust_mouse, dict_make_gene_scoring_cluster_robust_mouse, N_min=10, N_max=300, translate=False)

In [None]:
df_jaccard_mouse_mouse = get_df_overlap(dict_make_gene_scoring_cluster_robust_mouse, dict_make_gene_scoring_cluster_robust_mouse, N=150, translate=False)
plot_heatmap(df_jaccard_mouse_mouse, dict_colors_mouse, dict_colors_mouse, figsize=(12,7), ticklabelsize=15, diag=True, cmap='Blues')

In [None]:
# sorted
df_jaccard_mouse_mouse_sorted = df_jaccard_mouse_mouse.loc[['z2', 'z1', 'x2', 'x1', 'x/y', 'y3', 'y1', 'y2', 'y4', 'y5', 'v1', 'w/x', 'w4', 'w3', 'w1', 'w2', 'w5'], 
                                                           ['z2', 'z1', 'x2', 'x1', 'x/y', 'y3', 'y1', 'y2', 'y4', 'y5', 'v1', 'w/x', 'w4', 'w3', 'w1', 'w2', 'w5']]

df_jaccard_mouse_mouse_sorted[df_jaccard_mouse_mouse_sorted < 0.05] = np.nan

plot_heatmap(df_jaccard_mouse_mouse_sorted, dict_colors_mouse, dict_colors_mouse, figsize=(12,7), ticklabelsize=15, annot=False, cmap='Blues')
plot_heatmap(df_jaccard_mouse_mouse_sorted, dict_colors_mouse, dict_colors_mouse, figsize=(12,7), ticklabelsize=15, annot=True, cmap='Blues')

In [None]:
cluster_1, cluster_2, cluster_3 = 'w3', 'w/x', 'x2'

list_genes_int_1 = print_common_genes(dict_make_gene_scoring_cluster_robust_mouse, dict_make_gene_scoring_cluster_robust_mouse, 
                   cluster_1, cluster_2, translate=False, N=150)

list_genes_int_2 = print_common_genes(dict_make_gene_scoring_cluster_robust_mouse, dict_make_gene_scoring_cluster_robust_mouse, 
                   cluster_2, cluster_3, translate=False, N=150)

list_genes_int = [i for i in list_genes_int_1 if i in list_genes_int_2]

print(', '.join(list_genes_int), f'[{len(list_genes_int)} total, *X*]')

aaa = df_mouse_genes_codes.loc[[i for i in list_genes_int if i in df_mouse_genes_codes.index]]
aaa.loc[np.array([cluster_1 in i for i in aaa['code'].values]) & np.array([cluster_2 in i for i in aaa['code'].values]) & np.array([cluster_3 in i for i in aaa['code'].values])]

In [None]:
cluster_1, cluster_2 = 'x2', 'w/x'
# cluster_not = 'y5'

list_genes_int = print_common_genes(dict_make_gene_scoring_cluster_robust_mouse, dict_make_gene_scoring_cluster_robust_mouse, 
                   cluster_1, cluster_2, translate=False, N=150)
print(', '.join(list_genes_int), f'[{len(list_genes_int)} total, *X*]')

aaa = df_mouse_genes_codes.loc[[i for i in list_genes_int if i in df_mouse_genes_codes.index]]
aaa.loc[np.array([cluster_1 in i for i in aaa['code'].values]) & np.array([cluster_2 in i for i in aaa['code'].values])  ]
#         & np.array([cluster_not not in i for i in aaa['code'].values])]

#### Comments on mouse-mouse
* z1 - z2: 1700019D03Rik, *Aif1l*, *Akr1c18*, *Aldh1a3*, Anxa1, Anxa3, Axl, Basp1, C3, Car8, Cd248, Cd55, Chrdl1, Cmah, Col14a1, Creb5, Dbn1, Ddr2, *Dpp4*, Ebf2, Efemp1, *Efhd1*, Emilin2, Fn1, Fndc1, Galnt16, Gap43, Heg1, Ifi27l2a, Igfbp4, Igfbp6, *Il18*, Kcnk5, Limch1, Lrrn4cl, Mfap5, Nid1, Npr1, Opcml, Pde8a, Pi16, *Pla1a*, Plac8, Plat, Prkg2, Procr, Prss23, Ptgis, Pygl, Rab32, Rcan2, Sema3c, Sfrp2, Slc43a3, *Smpd3*, Stmn4, Tek, Thbd, Wnt10b, Wnt2 [60 total, *8*]

* x2 - w/x: 3300005D01Rik, *Ackr4*, Ccbe1, Cdkn2c, *Col7a1*, Cyp26b1, *Entpd1*, *Fam180a*, Fxyd5, *Gldn*, Gpm6b, Hmcn1, Meox2, Rspo4, S100a4, Tcf4, Thy1, Twist2 [18 total, *5*]

* x1 - x2 (no x/y): Adcy1, Adh7, *Ahrr*, Aldh3a1, Ankrd29, *Axin2*, Cd63, Cdh4, Cldn10, Col3a1, Cox6c, Cyp26b1, Fgfr4, Gda, Gpnmb, *Grem1*, H2-D1, Lum, *Miat*, Nbl1, Ndufa4l2, Nt5e, Nupr1, Osr2, Pla2g2e, Plpp1, Plpp3, Por, Ppic, Ppp1r14a, *Ppp2r2c*, Rspo1, S100a4, Scara5, Sulf2, Tgfbi, Tmem132c, Tnfrsf19, Zfp536 [39 total, *5*]

* x1 - x/y (no x2): Aebp1, Aldh1l2, C1qtnf3, Cd34, Cdh4, Cgref1, Cldn10, Clec11a, Cpz, Creb3l3, Crip2, Csf1r, Ctsh, Ctsk, Cyp2f2, Cyp4b1, Dapk1, Dcn, Ddah1, Ecm2, Fam46c, Gas6, Il1r2, Lgi1, Mmp27, P4ha2, Pdgfrl, Rarres2, Rcn3, Sectm1a, Sema3b, Serpina3n, Tgfb2, Tgfbi, Zfp536 [35 total, *0*]
* x2 - x1 - x/y (all 3): Cdh4, Cldn10, Tgfbi, *Zfp536* [4 total, *1*]

* x/y - y3: Abcc9, Art3, Aspn, C1qtnf3, Cilp, Col12a1, Fzd1, Gas6, Ghr, Gpx3, Igf1, Lbp, Lepr, Lgr5, Lox, Nfib, Nkain4, Ogn, Pcolce, Pcsk5, Pdgfrl, Plxna4, Rerg, Serpinf1, Sfrp1, Slit2, Tgfbi, Tgm2 [28 total, *0*]
* x/y - y1: Agtr1a, Akap12, Akr1cl, Ctsh, Cxcl12, Dpep1, Dpt, Efemp1, Gpx3, Hpgd, *Lbp*, Lepr, Ly6a, Ly6c1, Mgst1, Nfib, Ogn, Pcsk5, Penk, *Pltp*, Slit2, Thbs3, Tnxb [23 total, *2*]

* y2 - y3: Abca8a, Adamtsl2, *Angptl1*, Aspn, Bgn, Cd9, *Cilp*, Cxcl14, Cygb, Dpysl3, F3, Fbln2, Fxyd6, Gfra1, Gpx3, Igf1, Igfbp7, Itm2a, Jam2, Lox, Ltbp4, Mgp, Mn1, Myoc, Nfib, Pcolce, Podn, Rbp1, Serpinf1, Sfrp1, Smoc2, Tgfbi [32 total, *2*]
* y2 - y1: Adam19, Bicc1, Bmper, Col14a1, Col4a1, Col4a2, *Col5a3*, Col6a3, Col6a6, Cygb, Dpt, Fap, Fzd4, Ggt5, Gldc, Gpx3, Ifi205, Itih5, Itm2a, Lpl, Lsp1, Mfap5, Nfib, Slfn5, Srpx2, Thbs3, Tmeff2 [27 total, *0*]
* y2 - y4: Adamtsl3, Adcyap1r1, Angpt1, B3gnt8, Bicc1, Bmper, Col4a1, Col4a2, Col4a4, Cp, Cst3, Cxcl14, Cygb, Entpd2, F3, Fxyd6, Fzd4, G0s2, Ggt5, Ifi203, Igfbp7, Il17d, *Inmt*, Lmo2, Loxl3, Lpl, Mgp, Msx1, Nfib, *Nmb*, Ntf3, P2ry1, P2ry14, Rbp1, Sept4, Slfn5, Sparcl1, Steap4, Tmeff2, Vtn [40 total, *2*]
* y2 - y5: *Abca8a*, Bicc1, Col15a1, Col5a3, *Col8a1*, Cp, Entpd2, F3, *Fap*, Gfra1, Hspg2, Islr, Itm2a, *Meox1*, Mgp, Myoc, Nid1, P2ry14, Plxdc2, Smoc2, Sparcl1, Steap4, Tgfbi, Thbs1, Thbs4, Vtn [26 total, *3*]

* y5 - y2 (no y4): *Abca8a*, Bicc1, Col15a1, Col5a3, *Col8a1*, Cp, Entpd2, F3, *Fap*, Gfra1, Hspg2, Islr, Itm2a, *Meox1*, Mgp, Myoc, Nid1, P2ry14, Plxdc2, Smoc2, Sparcl1, Steap4, Tgfbi, Thbs1, Thbs4, Vtn [26 total, *3*]
* y5 - y4 (no y2): Abi3bp, Bicc1, C1s1, Ccl11, Cp, Cpe, Ebf2, *Ecm1*, Entpd2, F3, Gsn, Igfbp4, Il33, Mgp, Nr2f2, Ntrk2, P2ry14, Phlda1, Sparcl1, *Steap4*, Sult5a1, Tshz2, Vit, Vtn [24 total, *1*]
* y5 - y2 - y4 (all 3): Bicc1, Cp, Entpd2, F3, Mgp, P2ry14, Sparcl1, *Steap4*, *Vtn* [9 total, *2*]

* v1 - y5 (no y4): Abi3bp, Cpe, Cspg4, Ebf2, *Gfra1*, *Hmgcs2*, Il33, *Lmo4*, Mgp, Nr2f2, Sox9, *Sult5a1* [12 total, *4*]
* v1 - y4 (no y5): Abi3bp, Acp5, Adamtsl3, B3gnt8, Bnc2, Cav1, *Ccdc3*, Cpe, Cttnbp2, Ebf2, Fxyd6, Il33, *Ildr2*, Mgp, *Ndrg2*, Nfib, Nr2f2, Pear1, Rgs16, Sult5a1 [20 total, *3*]
* v1 - y5 - y4 (all 3): Abi3bp, Acp5, Adamtsl3, B3gnt8, Bnc2, Cav1, Ccdc3, Cpe, Cttnbp2, Ebf2, Fxyd6, Il33, Ildr2, Mgp, Ndrg2, Nfib, *Nr2f2*, Pear1, Rgs16, Sult5a1 [20 total, *X*]

* w1 - w2: 6330403K07Rik, Acot1, *Alpl*, Alx4, Aplp1, Apoe, *Bambi*, Bcl2, *Bmp4*, C1qtnf4, *Cd24a*, Cebpa, Chodl, Chst8, Col23a1, Crabp1, *Crabp2*, Cttnbp2, Daam2, Ebf1, *Edn3*, Enpp2, Eva1a, Fgf10, Fgf7, Fgfr1, Fgfr2, Frem1, Gldn, Gnai1, Gng2, H2-Ab1, H3f3b, Heph, Hey2, *Hhip*, *Inhba*, *Kctd1*, Lamc3, Ldhb, *Lef1*, Lrrtm3, Ltbp1, Mdk, Mir155hg, *Mpped2*, Mpzl1, Ncam1, *Ndnf*, Ndp, Nkd2, *Notum*, Ntng1, Pappa, *Pappa2*, *Prdm1*, *Prlr*, Ptk7, Ptma, Ptprz1, Rab20, Rspo1, Rspo2, Rspo3, Runx3, *Scube3*, Serpine2, Slc16a2, Snhg11, Sostdc1, Spon1, Ss18, Ssbp2, Tceal3, *Tfap2c*, Tle2, Tmem132c, Tmem176a, Tmem176b, Trpm3, Trps1, Vcan, Wif1, a [84 total, *18*]
* w3 - w4: Acan, *Acta2*, Actg2, Actn1, *Adamts18*, Bcl11b, *Bok*, Cd200, Cdc42ep3, Clic5, Cnn2, Col11a1, Csrp1, *Egfl6*, Etl4, Fam101b, Fam65b, Fry, Igfbp4, Itih5, Lmo4, *Lrrc15*, Mef2c, Mgst3, Myl9, *Mylk*, Nav2, Nradd, Nrtn, *Ntrk3*, Palld, Pard6g, Pawr, Phldb2, *Tagln*, Tcf7l2, Tmem119, Tnnt1, Tpbg, Tpd52l1, Tpm1, Tpm2, Vcl [43 total, *8*]


* w3 - w/x - x1: 0
* w3 - w/x - x2: 0

* x2 - w4: *Igfbp2*, *Mme*, *Sema3a* [3 total, *3*]



### Mouse-human comparison
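
The helpers used in the cells below (`plot_best_N`, `get_df_overlap`, `plot_heatmap`) are defined earlier in the notebook. As a rough illustration of the kind of score the `df_jaccard_*` name suggests, here is a minimal sketch of a Jaccard overlap between the top-N markers of each human and mouse cluster. It assumes each dictionary maps a cluster name to a ranked list of gene symbols and that a mouse-to-human ortholog dictionary is available; `jaccard`, `overlap_table`, `mouse_to_human` and the toy data are hypothetical names for illustration, and the real `get_df_overlap` may differ.

```python
import pandas as pd


def jaccard(set_a, set_b):
    """Jaccard index |A & B| / |A | B|; 0.0 when both sets are empty."""
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


def overlap_table(markers_human, markers_mouse, mouse_to_human, n_top):
    """Rows: human clusters; columns: mouse clusters; values: Jaccard index."""
    table = pd.DataFrame(index=list(markers_human), columns=list(markers_mouse), dtype=float)
    for h_cluster, h_genes in markers_human.items():
        h_top = set(h_genes[:n_top])
        for m_cluster, m_genes in markers_mouse.items():
            # Translate mouse symbols to human ones; genes without an ortholog are dropped.
            m_top = {mouse_to_human[g] for g in m_genes[:n_top] if g in mouse_to_human}
            table.loc[h_cluster, m_cluster] = jaccard(h_top, m_top)
    return table


# Toy input (hypothetical rankings, only to exercise the function).
toy_human = {"A2": ["NKD1", "AXIN2", "RSPO1"], "C1": ["ACTA2", "TAGLN", "ACAN"]}
toy_mouse = {"x2": ["Axin2", "Rspo1", "Igfbp2"], "w4": ["Acta2", "Tagln", "Mme"]}
toy_orthologs = {"Axin2": "AXIN2", "Rspo1": "RSPO1", "Igfbp2": "IGFBP2",
                 "Acta2": "ACTA2", "Tagln": "TAGLN", "Mme": "MME"}

toy_table = overlap_table(toy_human, toy_mouse, toy_orthologs, n_top=3)
print(toy_table)
```

Because the index depends on N through both the intersection and the union, it makes sense to scan a range of N before fixing one value, which is what the first cell below appears to do.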

In [None]:
plot_best_N(dict_make_gene_scoring_cluster_robust_human, dict_make_gene_scoring_cluster_robust_mouse, N_min=10, N_max=300)

In [None]:
N_human_mouse = 250
df_jaccard_mouse_human = get_df_overlap(dict_make_gene_scoring_cluster_robust_human, dict_make_gene_scoring_cluster_robust_mouse, N=N_human_mouse)
plot_heatmap(df_jaccard_mouse_human, dict_colors_mouse, dict_colors_human, figsize=(12,7), ticklabelsize=15, cmap='Blues')

In [None]:
# sorted
df_jaccard_mouse_human_sorted = df_jaccard_mouse_human.loc[['A2', 'A3', 'A1', 'A4', 'B1', 'B3', 'B2', 'B4', 'C2', 'C3', 'C1', 'C5', 'D1', 'D2', 'E1'], 
                                                           ['x2', 'x1', 'x/y', 'y3', 'y1', 'z2', 'z1', 'y4', 'y2', 'w/x', 'w4', 'w3', 'w1', 'w2', 'y5', 'v1', 'w5']]

df_jaccard_mouse_human_sorted[df_jaccard_mouse_human_sorted < 0.05] = np.nan

plot_heatmap(df_jaccard_mouse_human_sorted, dict_colors_mouse, dict_colors_human, figsize=(12,7), ticklabelsize=15, annot=True, cmap='Blues')

plot_heatmap(df_jaccard_mouse_human_sorted, dict_colors_mouse, dict_colors_human, figsize=(12,7), ticklabelsize=15, annot=False, cmap='Blues')

In [None]:
# Case: mouse has two clusters to be considered
cluster_1, cluster_2, cluster_2a = 'E1', 'w4', 'w3'

list_genes_int = print_common_genes(dict_make_gene_scoring_cluster_robust_human, dict_make_gene_scoring_cluster_robust_mouse, 
                   cluster_1, cluster_2, translate=True, N=250)

list_genes_inta = print_common_genes(dict_make_gene_scoring_cluster_robust_human, dict_make_gene_scoring_cluster_robust_mouse, 
                   cluster_1, cluster_2a, translate=True, N=250)

list_genes_int = sorted(set(list_genes_int + list_genes_inta))
print(', '.join(list_genes_int), f'[A, **X**, *Y*, ***Z***] - {len(list_genes_int)}')

aaa_human = df_human_genes_codes.loc[[i for i in list_genes_int if i in df_human_genes_codes.index]]
aaa_human = aaa_human.loc[np.array([cluster_1 in i for i in aaa_human['code'].values])]
aaa_human.reset_index(inplace=True)

aaa_mouse = df_mouse_genes_codes.loc[[i for i in [dict_human_mouse[j] for j in list_genes_int] if i in df_mouse_genes_codes.index]]
aaa_mouse = aaa_mouse.loc[np.array([cluster_2 in i for i in aaa_mouse['code'].values]) | np.array([cluster_2a in i for i in aaa_mouse['code'].values]) | np.array([cluster_2[0] in i for i in aaa_mouse['code'].values])]
aaa_mouse.reset_index(inplace=True)


pd.concat([aaa_human, aaa_mouse], axis=1)

In [None]:
# Case: human has two clusters to be considered

cluster_1, cluster_1a, cluster_2 = 'D1', 'D2', 'y4'
# cluster_not = 'y5'

list_genes_int = print_common_genes(dict_make_gene_scoring_cluster_robust_human, dict_make_gene_scoring_cluster_robust_mouse, 
                   cluster_1, cluster_2, translate=True, N=250)

list_genes_inta = print_common_genes(dict_make_gene_scoring_cluster_robust_human, dict_make_gene_scoring_cluster_robust_mouse, 
                   cluster_1a, cluster_2, translate=True, N=250)

list_genes_int = sorted(set(list_genes_int + list_genes_inta))
print(', '.join(list_genes_int), f'[A, **X**, *Y*, ***Z***] - {len(list_genes_int)}')

aaa_human = df_human_genes_codes.loc[[i for i in list_genes_int if i in df_human_genes_codes.index]]
aaa_human = aaa_human.loc[np.array([cluster_1 in i for i in aaa_human['code'].values]) | np.array([cluster_1a in i for i in aaa_human['code'].values]) | np.array([cluster_1[0] in i for i in aaa_human['code'].values])]
aaa_human.reset_index(inplace=True)

aaa_mouse = df_mouse_genes_codes.loc[[i for i in [dict_human_mouse[j] for j in list_genes_int] if i in df_mouse_genes_codes.index]]
aaa_mouse = aaa_mouse.loc[np.array([cluster_2 in i for i in aaa_mouse['code'].values]) | np.array([cluster_2[0] in i for i in aaa_mouse['code'].values])]
aaa_mouse.reset_index(inplace=True)


pd.concat([aaa_human, aaa_mouse], axis=1)

In [None]:
# Case: a single human cluster against a single mouse cluster
cluster_1, cluster_2 = 'E1', 'v1'
# cluster_not = 'y5'

list_genes_int = print_common_genes(dict_make_gene_scoring_cluster_robust_human, dict_make_gene_scoring_cluster_robust_mouse, 
                   cluster_1, cluster_2, translate=True, N=250)
print(', '.join(list_genes_int), f'[A, **X**, *Y*, ***Z***] - {len(list_genes_int)}')

aaa_human = df_human_genes_codes.loc[[i for i in list_genes_int if i in df_human_genes_codes.index]]
aaa_human = aaa_human.loc[np.array([cluster_1 in i for i in aaa_human['code'].values])]
aaa_human.reset_index(inplace=True)

aaa_mouse = df_mouse_genes_codes.loc[[i for i in [dict_human_mouse[j] for j in list_genes_int] if i in df_mouse_genes_codes.index]]
aaa_mouse = aaa_mouse.loc[np.array([cluster_2 in i for i in aaa_mouse['code'].values]) | np.array([cluster_2[0] in i for i in aaa_mouse['code'].values])]
aaa_mouse.reset_index(inplace=True)


pd.concat([aaa_human, aaa_mouse], axis=1)
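
The three cells above repeat the same row filter: keep the genes whose `code` entry mentions the cluster of interest or the first letter of its name (its axis). A minimal sketch of that filter as one helper follows, assuming `code` holds plain strings; `rows_mentioning` and the toy table are hypothetical and not part of the notebook.

```python
import pandas as pd


def rows_mentioning(df_codes, clusters, include_axis=True):
    """Keep rows whose 'code' string mentions any cluster (or its axis letter)."""
    tokens = set(clusters)
    if include_axis:
        # First character of the cluster name, e.g. 'w' for 'w4'.
        tokens |= {c[0] for c in clusters}
    mask = df_codes["code"].apply(lambda code: any(t in code for t in tokens))
    return df_codes.loc[mask].reset_index()


# Toy table (hypothetical codes, indexed by gene symbol).
toy_codes = pd.DataFrame(
    {"code": ["w4 > w3", "w", "x1", "y5 ~ v1"]},
    index=["Acta2", "Tagln", "Mmp2", "Sox9"],
)
kept = rows_mentioning(toy_codes, ["w4", "w3"])
print(kept)
```

Note that this is a substring test, as in the cells above: once the axis letter is included, any code mentioning `w4` also matches `w`, so the cluster-specific conditions add nothing on top of it. Passing `include_axis=False` restricts the filter to the named clusters only.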

#### Comments on human-mouse


Representation of genes: MARKER, **MARKER IN HUMAN**, *MARKER IN MOUSE*, ***MARKER IN HUMAN AND MOUSE***  [1, **3**, *6*, ***3***]  -  13
[CHECK WHETHER THE OVERLAP IS HIGH OR LOW IN GENERAL]
* A2 - x2: CD109, COL3A1, COL5A1, LOXL2, NFATC2, NOTUM, PAM, RSPO3, SCARF2, TCF4, ZNF608, **COL13A1**, **CYP26B1**, **DAAM2**, **ISM1**, **NKD1**, **NKD2**, **PAPPA**, **PREX1**, **PTGS1**, **TGFBI**, **THBD**, **THSD4**, *C10orf105*, *CAV1*, *CCBE1*, *EMX2*, *ENHO*, *LSAMP*, *MAMDC2*,  *SPRY1*, *TWIST2*, *WNT5A*, ***AHRR***, ***AXIN2***, ***CD9***, ***COL7A1***, ***F13A1***, ***GREM2***, ***IGFBP2***, ***KCNK2***, ***PTK7***, ***PTPRE***, ***RSPO1***, ***SMIM3***, ***STC1***, ***TNFRSF19*** [11, **12**, *10*, ***14***] - 47

##### This is curious because A2 also shares genes with C2 and related clusters, so this phenomenon is common in human-human and human-mouse (but not as evident in mouse-mouse)!!
* A2 - w1: C4orf48, EMB, ETV1, PAPPA, PLCB1, SFRP2, WIF1, WNT5A, **APCDD1**, **AXIN2**, **FGFR2**, **NKD1**, **RAMP3**, **RGS2**, **SMIM3**, *LAMC3*, *NOTUM*, *NRN1*, *RSPO1*, *SPON1*, *TFAP2C*, ***APELA***, ***COL13A1***, ***COL23A1***, ***DAAM2***, ***F5***, ***NKD2***, ***PTK7***, ***RSPO3*** [8, **7**, *6*, ***8***] - 29

* A1 - x1: C1QTNF3, CD34, CHPF, CLEC3B, COL1A1, COL1A2, COL3A1, CREB5, CTSK, DCN, ELN, FKBP9, HTRA1, IGFBP5, ITGBL1, KDELR3, LOXL4, MFAP4, MMP14, P4HA2, PCOLCE2, PDGFRL, PLPP1, RCN3, SLC38A10, SPARC, TSPAN4, VKORC1, **AEBP1**, **ANGPTL1**, **CADM3**, **GDF15**, **GPNMB**, **HPGD**, **MMP2**, **PODN**, **SCARA5**, **TIMP2**, **TNXB**, *ADAMTS2*, *PPIC*, ***CGREF1***, ***CPZ***, ***CYBRD1***, ***CYP4B1***, ***MMP27***, ***SEMA3B*** [28, **11**, *2*, ***6***] - 47

* A1 - x/y: ADAMTSL1, ANGPTL1, BASP1, CCDC80, CD34, CILP, COL12A1, COL1A1, COL1A2, CTSB, CTSK, DCN, FKBP9, IGSF10, ITGBL1, NUCB2, OGN, P4HA2, MGST1, PCOLCE2, PPIB, RCN3, SERF2, SERPINF1, VKORC1,  **ABCC9**, **ADGRD1**, **AEBP1**, **AGTR1**, **CADM3**, **CYP4B1**, **GALNT15**, **HPGD**, **LOX**, **PCOLCE**, **PDGFRL**, **SEMA3B**, **SMOC2**, **SVEP1**, **THBS2**, **THBS3**, **TNXB**, *C1QTNF3*, *GPX3*, *GREB1L*, *PLTP*, ***CGREF1***, ***CPZ***, ***LGR5***, ***MMP27*** [25, **17**, *4*, ***4***] - 50

* A1 - z2: CREB5, ECM1, FBN1, IGFBP5, IGFBP6, METRNL, PAMR1, PLXDC2, SEMA3C, TIMP3, VGLL3, **CHRDL1**, **CLEC3B**, **MFAP5**, **QPCT**, **SCARA5**, **TIMP2**, *ADAMTSL4*, *BASP1*, *CD248*, *CD34*, *COL14A1*, *DBN1*, *EMILIN2*, *LRRN4CL*, *PTGIS*, *SFRP2*, *UCHL1*, ***ACKR3***, ***ADGRD1***, ***CD55***, ***DPP4***, ***ISLR***, ***LIMS2***, ***MTCL1***, ***NPR1***, ***PI16***, ***PRKG2***, ***SEMA3E***, ***TUBB4A*** [11, **6**, *11*, ***12***] - 40

* A1 - y3: ANGPTL1, COL12A1, COL14A1, CRYAB, CYB5R3, DHRS3, ELN, FGL2, GAS1, GPX3, IGFBP6, ITGBL1, MFAP4, OGN, PRELP, SERPINF1, SFRP2, TIMP3, **CLU**, **FBLN1**, **FBLN2**, **GALNT15**, **LGR5**, **LOX**, **CD151**, **PCOLCE**, **PDGFRL**, *C1QTNF3*, *CILP*, *MGP*, *PTGIS*, ***ABCC9***, ***OMD***, ***PODN***, ***SMOC2*** [18, **9**, *4*, ***4***] - 35



* A4 - z1: ADAMTS5, FSTL1, GFPT2, MGLL, RAMP2, SDK1, TIMP2, TNFAIP6, VASN, ZYX, **ACE**, **CLEC3B**, **GPX3**, **IGFBP6**, **LOXL1**, **MFAP5**, **PTGIS**, **RAB32**, **SCARA5**, *ACKR3*, *ADGRD1*, *AXL*, *CHRDL1*, *FLNC*, *GAP43*, *HAS1*, *HEG1*, *PROCR*, *PRSS23*, *UGDH*, ***AIF1L***, ***CD248***, ***CD55***, ***DBN1***, ***DPP4***, ***EMILIN2***, ***NPR1***, ***SEMA3C***, ***SEMA3E***, ***SFRP4***, ***WNT2*** [10, **9**, *11*, ***11***] - 41

* A4 - z2: BMP7, CD34, DDAH2, GFPT2, IGFBP5, MGLL, NHSL1, PCSK6, PPP1R14B, PXN, RAMP2, SCARA3, SDK1, SMURF2, TRIO, ZNF385A, ZYX, **ACE**, **ACKR3**, **CLEC3B**, **DBN1**, **FBN1**, **IGFBP6**, **LOXL1**, **MFAP5**, **SCARA5**, **TIMP2**, **TIMP3**, *ACKR2*, *ADAMTSL4*, *ADGRG2*, *AXL*, *CHRDL1*, *DACT2*, *GAP43*, *HEG1*, *PROCR*, *PRSS23*, *UGDH*, ***ADGRD1***, ***AIF1L***, ***CD248***, ***CD55***, ***DPP4***, ***EMILIN2***, ***ISLR***, ***LIMS2***, ***NPR1***, **PAMR1**, ***PTGIS***, ***RAB32***, ***SEMA3C***, ***SEMA3E***, ***SFRP4***, ***WNT2*** [17, **11**, *11*, ***16***] - 55

* A4 - y1: ADAMTS5, ADGRD1, AHNAK2, APBB1IP, AXL, CAPG, CDH13, FAP, FSTL1, GFPT2, GLIPR2, HEG1, LGR4, LSP1, MEDAG, MGST1, MMP2, PRSS23, RAMP2, TAGLN2, THBS3, TMSB10, TNFAIP6, UGDH, VCAN, ZNF385A, **CD248**, **CD55**, **CLEC3B**, **EMILIN2**, **FBN1**, **GPX3**, **IGFBP6**, **LOXL1**, **MFAP5**, **TNXB**, **TPPP3**, *HMCN2*, *NOVA1*, *PLTP*, *SSC5D*, ***ACE***, ***CTHRC1*** [26, **11**, *4*, ***2***] - 43


* z2 - A1: CREB5, ECM1, FBN1, IGFBP5, IGFBP6, METRNL, PAMR1, PLXDC2, SEMA3C, TIMP3, VGLL3, **CHRDL1**, **CLEC3B**, **MFAP5**, **QPCT**, **SCARA5**, **TIMP2**, *ADAMTSL4*, *BASP1*, *CD248*, *CD34*, *COL14A1*, *DBN1*, *EMILIN2*, *LRRN4CL*, *PTGIS*, *SFRP2*, *UCHL1*, ***ACKR3***, ***ADGRD1***, ***CD55***, ***DPP4***, ***ISLR***, ***LIMS2***, ***MTCL1***, ***NPR1***, ***PI16***, ***PRKG2***, ***SEMA3E***, ***TUBB4A*** [11, **6**, *11*, ***12***] - 40

* z2 - A4: BMP7, CD34, DDAH2, GFPT2, IGFBP5, MGLL, NHSL1, PCSK6, PPP1R14B, PXN, RAMP2, SCARA3, SDK1, SMURF2, TRIO, ZNF385A, ZYX, **ACE**, **ACKR3**, **CLEC3B**, **DBN1**, **FBN1**, **IGFBP6**, **LOXL1**, **MFAP5**, **PAMR1**, **SCARA5**, **TIMP2**, **TIMP3**, *ACKR2*, *ADAMTSL4*, *ADGRG2*, *AXL*, *CHRDL1*, *DACT2*, *GAP43*, *HEG1*, *PROCR*, *PRSS23*, *UGDH*, ***ADGRD1***, ***AIF1L***, ***CD248***, ***CD55***, ***DPP4***, ***EMILIN2***, ***ISLR***, ***LIMS2***, ***NPR1***, ***PTGIS***, ***RAB32***, ***SEMA3C***, ***SEMA3E***, ***SFRP4***, ***WNT2*** [17, **12**, *11*, ***15***] - 55




* z1 - A4: ADAMTS5, FSTL1, GFPT2, MGLL, RAMP2, SDK1, TIMP2, TNFAIP6, VASN, ZYX, **ACE**, **CLEC3B**, **GPX3**, **IGFBP6**, **LOXL1**, **MFAP5**, **PTGIS**, **RAB32**, **SCARA5**, *ACKR3*, *ADGRD1*, *AXL*, *CHRDL1*, *FLNC*, *GAP43*, *HAS1*, *HEG1*, *PROCR*, *PRSS23*, *UGDH*, ***AIF1L***, ***CD248***, ***CD55***, ***DBN1***, ***DPP4***, ***EMILIN2***, ***NPR1***, ***SEMA3C***, ***SEMA3E***, ***SFRP4***, ***WNT2*** [10, **9**, *11*, ***11***] - 41
* z1 - B1: ADAMTS1, BCL3, C3, CSRNP1, CSRP2, GFPT2, HMOX1, IFI16, MYC, NFE2L2, NFKBIZ, NOCT, PHLDA1, PIM1, PLSCR1, PNP, SOD2, **ERRFI1**, **KDM6B**, **MAFF**, **NFKB1**, **NFKBIA**, **NR4A3**, **REL**, **TIPARP**, **TNFAIP3**, **TNFAIP6**, **UAP1**, **ZC3H12A**, *HAS2*, *PTGS2*, *PTX3*, *TNFAIP2*, ***CCL2***, ***CXCL2***, ***FOSL1***, ***GCH1***, ***IL6***   [17, **12**, *4*, ***5***] - 38
 


* y4 - B2: ABCA8, ABI3BP, C1S, FRMD6, HGF, ID4, IL11RA, LDB2, NRP1, OSMR, P2RY14, PLCXD3, TRIM47, TSPAN11, VEGFA, **APOC1**, **C7**, **CXCL12**, **CYGB**, **GGT5**, **IGFBP3**, **IGFBP7**, *ADCYAP1R1*, *AVPR1A*, *COL4A4*, *NDRG2*, *PTCH2*, *RBP1*, *SNED1*, *TMEM176A*, *TMEM176B*, ***APOE***, ***C3***, ***IL33***, ***MGP***, ***NFIB***, ***SLCO2B1***, ***TNFSF13B*** [15, **7**, *9*, ***7***] - 38
* y4 - B4: ARHGDIB, COL4A2, CYGB, DACT1, EGFR, EPHA3, EPS8, F3, FAM13A, FOXP1, GSN, HGF, IGFBP3, IGFBP7, ITM2B, MGST1, NGFR, NID2, OLFML2B, SERPING1, SRPX, TMEM176B, TNFSF13B, TSHZ2, **ABCA8**, **APOC1**, **APOD**, **C7**, **CXCL12**, **FGF7**, **FMO1**, **FZD4**, **GGT5**, **GPX3**, **HHIP**, **IGF1**, **NFIB**, **NTRK2**, **PODN**, **VIT**, **ZFHX4**, *ABCC9*, *ADAMTSL3*, *APOE*, *BMPER*, *BNC2*, *C3*, *COL4A4*, *FMO2*, *IL33*, *INMT*, *NR2F2*, *NRP1*, *PEAR1*, *PPL*, *PTCH2*, *SFRP4*, *TMEM176A*, ***CHRDL1***, ***GDF10***, ***MGP*** [24, **17**, *17*, ***3***] - 61



##### Impression: some y2-B4 genes are not in bold-italics because they are markers of B in general rather than of B4 specifically. So y2 may have a more immune-related role, but not one tied to a specific cluster.
* y2 - B4: ARPC1B, CAPN6, COL15A1, COL4A2, COL4A4, CRLF1, EGFR, EPS8, F3, FHL2, HIC1, HSPG2, HTRA3, IGFBP7, ITM2A, ITM2B, NID2, SPRY1, VWA1, **ABCA9**, **APOD**, **CYGB**, **FMO1**, **FZD4**, **GGT5**, **GPX3**, **GSN**, **IGF1**, **LSP1**, **MGP**, **MYOC**, **NFIB**, **PODN**, **TSHZ2**, **ZFHX4**, *ADAMTSL3*, *BMPER*, *INMT*, *PEAR1*, ***ABCA8*** [19, **16**, *4*, ***1***] - 40
* y2 - A1: ANGPTL2, CAPG, GPX3, HTRA3, IL17D, ITIH5, OGN, PLXDC2, PRELP, PRG4, **FBLN2**, **ISLR**, **LOX**, **MFAP5**, **PCOLCE**, **PODN**, **SERPINF1**, **SLC29A1**, **SMOC2**, **SVEP1**, **THBS3**, **TNXB**, *CILP*, *COL14A1*, *FAP*, *MGP*, *SSC5D*, ***ANGPTL1*** [10, **11**, *5*, ***1***] - 28


* B4 - y4: ARHGDIB, COL4A2, CYGB, DACT1, EGFR, EPHA3, EPS8, F3, FAM13A, FOXP1, GSN, HGF, IGFBP3, IGFBP7, ITM2B, MGST1, NGFR, NID2, OLFML2B, SERPING1, SRPX, TMEM176B, TNFSF13B, TSHZ2, **ABCA8**, **APOC1**, **APOD**, **C7**, **CXCL12**, **FGF7**, **FMO1**, **FZD4**, **GGT5**, **GPX3**, **HHIP**, **IGF1**, **NFIB**, **NTRK2**, **PODN**, **VIT**, **ZFHX4**, *ABCC9*, *ADAMTSL3*, *APOE*, *BMPER*, *BNC2*, *C3*, *COL4A4*, *FMO2*, *IL33*, *INMT*, *NR2F2*, *NRP1*, *PEAR1*, *PPL*, *PTCH2*, *SFRP4*, *TMEM176A*, ***CHRDL1***, ***GDF10***, ***MGP*** [24, **17**, *17*, ***3***] - 61
* B4 - y2: ARPC1B, CAPN6, COL15A1, COL4A2, COL4A4, CRLF1, EGFR, EPS8, F3, FHL2, HIC1, HSPG2, HTRA3, IGFBP7, ITM2A, ITM2B, NID2, SPRY1, VWA1, **ABCA9**, **APOD**, **CYGB**, **FMO1**, **FZD4**, **GGT5**, **GPX3**, **GSN**, **IGF1**, **LSP1**, **MGP**, **MYOC**, **NFIB**, **PODN**, **TSHZ2**, **ZFHX4**, *ADAMTSL3*, *BMPER*, *INMT*, *PEAR1*, ***ABCA8*** [19, **16**, *4*, ***1***] - 40
* B4 - y3: ARHGDIB, DEPTOR, DHRS3, EPHA3, F3, FHL2, GHR, GPC3, IGFBP3, IGFBP6, IGFBP7, KCNJ8, NPY1R, PEAR1, SFRP4, SLIT2, SUSD2, TIMP3, **ABCA8**, **APOD**, **CYGB**, **FGF7**, **FMO1**, **GPX3**, **GSN**, **IGF1**, **ITM2A**, **MGP**, **MYOC**, **NFIB**, **PHLDA3**, **PODN**, **TXNIP**, **ZFHX4**, *ABCC9*, *FMO2* [18, **16**, *2*, ***0***] - 36
* B4 - y1: APBB1IP, ARPC1B, COL4A2, EBF1, EGFR, GPC3, HIC1, IGFBP3, LGALS3BP, LGMN, MEDAG, NFIB, NR1H3, PRSS23, SLIT2, TGFBR2, THY1, TMEM135, **CXCL12**, **CYGB**, **EFEMP1**, **FMO1**, **FZD4**, **GGT5**, **GPX3**, **IGFBP6**, **ITM2A**, **LSP1**, **MGST1**, **RARRES2**, **ZFHX4**, *BMPER*, *FABP4*, *NOVA1*, *PLAT*, ***PPARG*** [18, **13**, *4*, ***1***] - 36

* C2 - w/x: ADAMTS9, CBFA2T3, COL11A1, EDNRA, ENHO, HTRA1, SLC48A1, SRPX, TBXA2R, TENM3, **CPNE5**, **CRABP1**, **FZD1**, **GPM6B**, **MEOX2**, **NCAM1**, **NFATC2**, **PLXDC1**, **PTH1R**, **RSPO4**, **SLC40A1**, **TBX15**, **TCF4**, **TRIB2**, **TRPS1**, *HS3ST6*, *KIF26B*, *MEGF6*, *NR2F1*, *TSHZ3*, ***CHST15***, ***COCH***, ***CYP1B1***, ***DKK2***, ***EMID1***, ***FIBIN***, ***FMOD***, ***MAFB***, ***MKX***, ***NRP2***, ***PTGFR***, ***TNMD*** [10, **15**, *5*, ***12***] - 42
* C2 - w1|w2: CNTN1, CTTNBP2, PRLR, PTGER3, PTPRD, SFRP1, TSPAN7, **ATP1B1**, **BTBD11**, **CLEC14A**, **DKK2**, **EMB**, **GPM6B**, **IGFBP5**, **MEIS2**, **NCAM1**, **SPARCL1**, *ALX4*, *CRABP2*, *DUSP10*, *LEF1*, *NDP*, *PDE1A*, *PTK7*, *RSPO4*, *RUNX3*, *SDC1*, *WNT5A*, ***CRABP1***, ***DAAM2***, ***NDNF***, ***NOTUM***, ***TRPM3***, ***TRPS1*** [7, **10**, *11*, ***6***] - 34
* C2 - y3: CALM2, MDFIC, OGN, PDE4B, PMEPA1, PRELP, RERG, SERTAD4, SUSD2, **EMB**, **FIBIN**, **FMOD**, **FZD1**, **KCNAB1**,  **PTPRD**, **PTPRK**, **TBX15**, *CAVIN2*, *F3*, *FGF9*, *LTBP4*, *PAQR6*, *SFRP1*, ***ASPN***, ***CADM1*** [9, **8**, *6*, ***2***] - 25


* C3 - w/x: ADAMTS9, AQP1, CDH11, CHCHD10, CNN2, COL11A1, COL16A1, CRABP1, DKK3, EDNRA, EGFLAM, EMP2, FIBIN, FZD1, GPM6B, HMCN1, MEF2C, MFAP4, MMP11, MXRA8, PLXDC1, PTMA, RSPO4, SESN3, SLC40A1, TBX15, TCF4, THY1, TMEM119, TMEM204, TPM2, TRPS1, **F2R**, **HTRA1**, **MFAP2**, **POSTN**, **TENM3**, *CPXM2*, *DKK2*, *EMID1*, *KIF26B*, *MEGF6*, *NR2F1*, *NREP*, *NRP2*, *RFLNB*, *TSHZ3*, ***COL7A1***, ***MAFB***, ***MMP16***, ***RASL11B*** [32, **5**, *10*, ***4***] - 51
* C3 - w4: BGN, BHLHE41, CALD1, CDH11, CNN2, EMP2, FABP5, ITGB1, ITGBL1, MGLL, MYH9, NPNT, PALLD, PPIC, PRELP, SORCS2, SPARC, **KIAA1217**, **LMO7**, **PMEPA1**, **SEMA5A**, *ACAN*, *ACTN1*, *ADAMTS9*, *AQP1*, *CGNL1*, *COL11A1*, *COL12A1*, *CPXM2*, *EDNRA*, *EGFLAM*, *LBH*, *MEF2C*, *MME*, *SLC40A1*,  *TAGLN*, *THBS4*, *TMEM119*, *TPM2*, ***COL7A1***, ***COL8A2***, ***LRRC15***, ***POSTN***, ***TENM3***   [17, **4**, *18*, ***5***] - 44
* C3 - x1: BMP1, CHPF, COL3A1, COPZ2, GLT8D2, ITGBL1, LUM, MARVELD1, MFAP4, MMP14, MMP2, P4HA2, PYCR1, RCN3, RRBP1, SLC16A3, SPARC, SPHK1, SPON2, **BGN**, **C1QTNF6**, **COL5A1**, **COL5A2**, **ELN**, **HTRA1**, *ADAMTS2*, *AEBP1*, *COL16A1*, *COL1A1*, *COL1A2*, *EMID1*, *PPIC*, *PPP1R14A*, *SULF2*, ***LTBP2*** [19, **6**, *9*, ***1***] - 35

* C1 - w4: ARHGAP28, ATP10A, CDH11, COL27A1, COL4A1, KIAA1217, KLF5, LMO7, MGLL, NPNT, NUAK1, NXN, PALLD, PMEPA1, POSTN, RNF152, SPARC, **MME**, **CNN2**, **SEMA5A**, *ADAMTS6*, *CD200*, *COL7A1*, *FOXD2*, *LMO4*, *LRRC15*, *NID2*, *NTRK3*, *PARD6G*, *PAWR*, *PRR5L*, *SATB2*, *TMEM119*, *TNMD*, *TPM2*, ***ACAN***, ***ACTA2***, ***ADAMTS18***, ***ADAMTS9***, ***BCL11B***, ***CALD1***, ***CCND1***, ***COL11A1***, ***COL12A1***, ***COL8A2***, ***CPXM2***, ***EDNRA***, ***EDNRB***, ***EGFL6***, ***MEF2C***, ***RAMP1***, ***TAGLN***, ***TENM3*** [17, **3**, *15*, ***18***] - 53
* C1 - w3: AFAP1L2, COL6A3, FNBP1L, FOXD2, JAG1, KIF26B, LMO4, MDK, MGLL, MICAL2, NPM1, PALLD, PARD6G, PAWR, PMEPA1, SHOX2, SOX18, SOX4, TMEM119, TNS3, TPM2, **ACAN**, **ACTA2**, **ADAMTS18**, **ALX4**, **BCL11B**, **CALD1**, **CD200**, **CDH11**, **CNN2**, **COL11A1**, **DKK3**, **EDNRA**, **EVA1A**, **F2R**, **KIAA1217**, **LAMC3**, **MEF2C**, **PTCH1**, **ROBO2**, **RUNX2**, **STMN1**, *LRRC15*, *MMP11*, *NTRK3*, *TAGLN*, ***CCDC3***, ***EGFL6*** [21, **21**, *4*, ***2***] - 48


* C5 - w3: CFL1, COL11A1, DKK3, EDNRA, EEF1A1, F2R, PFN2, PMEPA1, PTMA, SMS, SOX4, STMN1, **ALX4**, **CXCR4**, **HEY2**, **JAG1**, **LMO4**, **MARCKSL1**, **PRDM1**, **PREX2**, **PTCH1**, **SOX18**, *KIF26B*, *LAMC3*, ***CDH11***, ***ROBO2*** [12, **10**, *2*, ***2***] - 26
* C5 - w1|w2: C4orf48, FST, LAMC3, MARCKSL1, PLK2, PTCH1, PTMA, RAB34, SPARCL1, **CXCR4**, **IGFBP3**, **PDE3A**, **PRDM1**, **PREX2**, **ROBO1**, *EDN3*, *HEY2*, *SDC1*, *SLC26A7*, *SPON1*, *TRPS1*, ***ALX4***, ***BMP7***, ***INHBA***, ***LEF1***, ***PGM2L1***, ***SOX18***, ***TFAP2A***, ***WNT5A*** [9, **6**, *6*, ***8***] - 29



* D1|D2 - v1: BHLHE40, CSPG4, CSRP1, DMD, DOCK9, ENDOD1, FRMD4B, JAG1, KTN1, MDFIC, MFAP5, NDRG1, PHLDA3, PLSCR4, PTCH1, SFRP1, STMN1, SYNE2, TLN2, TUBA4A, **ADAMTSL3**, **AKAP12**, **CAVIN2**, **CD200**, **CTNNAL1**, **DACT1**, **DUSP5**, **EFNA1**, **ETV1**, **IGFBP6**, **ISYNA1**, **MRAS**, **MTUS1**, **NR2F2**, **OLFML2A**, **PEAR1**, **PLEKHA4**, **SLC2A1**, **SORBS1**, **SOX9**, **STXBP6**, **TGFBI**, **TJP1**, **VIT**, *CCDC3*, *DDIT4*, *EZR*, *PERP*, *TPD52*, *TRIB2*, ***AQP3***, ***BNC2***, ***CAV1***, ***CAV2***, ***CLDN1***, ***EBF2***, ***EFNB1***, ***EGR3***, ***ETV4***, ***GAB1***, ***GPC1***, ***ITGA6***, ***ITGB4***, ***KLF5***, ***KRT19***, ***MTSS1***, ***NDRG2***, ***SBSPON***, ***SLC12A2***, ***TENM2*** [20, **24**, *6*, ***20***] - 70
* D1|D2 - y5: ARL4A, CDKN2B, CNN3, CSPG4, GPC3, ITM2A, LUM, NID1, PHLDA1, SMOC2, SPARCL1, **ABCA8**, **ENTPD2**, **ETV1**, **TGFBI**, **TM4SF1**, ***APOD***, ***COL8A1***, ***EBF2***, ***FOXS1***, ***MATN2***, ***NR2F2***, ***P2RY14***, ***SOX9***, ***VIT*** [11, **5**, *0*, ***9***] - 25
* D1|D2 - y4: CCDC3, COL4A2, CYGB, EPS8, FMO1, IGFBP7, ITM2B, MFAP5, NR2F1, PHLDA1, SPARCL1, WFDC1, **ABCA8**, **APOD**, **BNC2**, **CAV1**, **CYP1B1**, **DACT1**, **EBF2**, **EGFR**, **ENTPD2**, **GPC6**, **ITGA6**, **KLF5**, **NGFR**, **P2RY14**, **S100B**, **SCN7A**, **SFRP4**, *CHRDL1*, *LTBP4*, ***ADAMTSL3***, ***FMO2***, ***INMT***, ***MEOX2***, ***NDRG2***, ***NR2F2***, ***PEAR1***, ***VIT*** [12, **17**, *2*, ***8***] - 39


* y5 - B4: ABCA8, F3, GPC3, HSPG2, IL33, ITM2A, LXN, MGP, RARRES1, **COL15A1**, **GSN**, **NTRK2**, *ABCA6*, *APOD*, *FOXS1*, *NR2F2*, *PHGDH*, *VIT*, *VWA1*, ***MYOC***, ***TSHZ2*** [9, **3**, *7*, ***2***] - 21
* y5 - D1: ARL4A, CNN3, CSPG4, ITM2A, LUM, NID1, **EBF2**, **ENTPD2**, **ETV1**, **GPC3**, **PHLDA1**, **SPARCL1**, **TM4SF1**, *CDKN2B*, *MATN2*, ***ABCA8***, ***APOD***, ***COL8A1***, ***FOXS1***, ***NR2F2***, ***P2RY14***, ***SOX9***, ***TGFBI***, ***VIT*** [6, **7**, *2*, ***9***] - 24
* y5 - D2: ABCA8, CNN3, EBF2, ETV1, ITM2A, NR2F2, SMOC2, TGFBI, TM4SF1, VIT, *FOXS1*, ***MATN2***, ***P2RY14*** [10, **0**, *1*, ***2***] - 13


* v1 - D1: CAVIN2, CSPG4, CSRP1, CTNNAL1, DOCK9, FRMD4B, JAG1, MTUS1, NDRG1, PEAR1, PLEKHA4, PTCH1, STMN1, SYNE2, TLN2, TUBA4A, **CD200**, **DUSP5**, **EFNA1**, **EGR3**, **ETV1**, **MRAS**, **NR2F2**, **OLFML2A**, **SLC12A2**, **SOX9**, **TENM2**, **TGFBI**, **VIT**, *BNC2*, *EZR*, *ITGA6*, *ITGB4*, *KLF5*, *SBSPON*, *SLC2A1*, ***AKAP12***, ***CLDN1***, ***EBF2***, ***EFNB1***, ***ETV4***, ***MTSS1***, ***NDRG2*** [16, **13**, *7*, ***7***] - 43
* v1 - D2: BHLHE40, DMD, ENDOD1, ETV1, FRMD4B, JAG1, KTN1, MDFIC, MFAP5, NDRG1, NR2F2, PHLDA3, PLSCR4, PTCH1, SFRP1, STMN1, STXBP6, SYNE2, TLN2, **ADAMTSL3**, **AKAP12**, **CAVIN2**, **CSRP1**, **DACT1**, **DUSP5**, **IGFBP6**, **ISYNA1**, **MRAS**, **OLFML2A**, **PEAR1**, **PLEKHA4**, **SORBS1**, **TGFBI**, **TJP1**, **VIT**, *CCDC3*, *DDIT4*, *EZR*, *PERP*, *TPD52*, *TRIB2*, ***AQP3***, ***BNC2***, ***CAV1***, ***CAV2***, ***CLDN1***, ***DOCK9***, ***EBF2***, ***EFNB1***, ***GAB1***, ***GPC1***, ***ITGA6***, ***ITGB4***, ***KLF5***, ***KRT19***, ***MTSS1***, ***NDRG2***, ***SBSPON***, ***SLC2A1***, ***TENM2*** [19, **16**, *6*, ***19***] - 60


* y4 - D1: APOD, BNC2, EBF2, EGFR, EPS8, FMO1, IGFBP7, ITGA6, KLF5, LTBP4, MEOX2, NDRG2, NGFR, NR2F2, PEAR1, S100B, SCN7A, VIT, **CYP1B1**, **P2RY14**, **PHLDA1**, **SPARCL1**, **WFDC1**, *ABCA8*, *CYGB*, *INMT*, *SFRP4*, ***ENTPD2***, ***FMO2*** [18, **5**, *4*, ***2***] - 29
* y4 - D2: ABCA8, ADAMTSL3, CCDC3, COL4A2, EBF2, EGFR, CYP1B1, IGFBP7, ITM2B, LTBP4, MEOX2, MFAP5, NDRG2, NR2F1, NR2F2, P2RY14, SCN7A, VIT, **BNC2**, **CAV1**, **DACT1**, **GPC6**, **ITGA6**, **KLF5**, **NGFR**, **PEAR1**, *CHRDL1*, *SFRP4*, ***INMT*** [18, **8**, *2*, ***1***] - 29

* E1 - y2: APOD, CAPG, COL15A1, FHL2, G0S2, GSN, IGF1, ITGA11, ITM2A, ITM2B, JUP, LAMA2, LPL, LSP1, MFAP5, SCN7A, SFRP1, SPRY1, TMEM204, **CMKLR1**, **MGP**, **PLEKHA6**, **RGMA**, *COL14A1*, *CRLF1*, *GPM6B*, *INMT*, *PEAR1*, *TGFBI*, *TSHZ2*, *VWA1*, ***MEOX1*** [19, **4**, *8*, ***1***] - 32

* E1 - w3|w4: ACOT7, ALX4, BHLHE40, BHLHE41, C1QTNF7, CALD1, CCDC34, CDC42EP3, CDK6, DKK3, EFNB1, EMP2, EVA1A, FHL2, HES1, HEY1, ITGB1, JUP, KLF5, LAMC3, LMO7, MAF, MFGE8, MSI2, RASGRP2, RGCC, RUNX3, SEMA3G, SEMA5A, SHOX2, TBX18, **EGR2**, **KIAA1217**, **OLFML2A**, **RAMP1**, **TCF7L2**, *AQP1*, *ARHGDIB*, *CDH11*, *COL8A2*, *CSRP1*, *EDNRA*, *F2R*, *FOXD2*, *PARD6G*, *RUNX2*, *STMN2*, ***IGFBP2***, ***NTRK3*** [31, **5**, *11*, ***2***] - 49
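
The bracketed tallies that close each entry above can be re-derived from the emphasis markers, which also catches unbalanced asterisks. The sketch below is a hypothetical checker, not part of the notebook; it follows the legend at the top of this section (plain = no marker, `*italic*` = marker in mouse, `**bold**` = marker in human, `***bold italic***` = marker in both).

```python
import re
from collections import Counter

# Number of asterisks on each side -> category, following the legend.
LABELS = {0: "plain", 1: "mouse", 2: "human", 3: "both"}


def tally_markers(gene_field):
    """Count genes per emphasis level in a comma-separated markdown gene list."""
    counts = Counter()
    malformed = []
    for token in (t.strip() for t in gene_field.split(",")):
        if not token:
            continue
        match = re.fullmatch(r"(\**)([^*]+)(\**)", token)
        # Opening and closing runs of asterisks must match and be at most 3 long.
        if match is None or match.group(1) != match.group(3) or len(match.group(1)) > 3:
            malformed.append(token)
            continue
        counts[LABELS[len(match.group(1))]] += 1
    return counts, malformed


# Toy gene field with one deliberately unbalanced token.
genes = "CD109, COL3A1, **COL13A1**, *CAV1*, ***AHRR***, ****PTK7***"
counts, malformed = tally_markers(genes)
print(dict(counts), malformed)
```

To check an entry, pass the text between the colon and the opening bracket of the tally, then compare the four counts and their sum with the numbers written in the entry.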

#### Comments on human-mouse
Representation of genes: MARKER, **MARKER IN HUMAN**, *MARKER IN MOUSE*, ***MARKER IN HUMAN AND MOUSE***  [1, **3**, *6*, ***3***]  -  13

Of note, the selection of genes is based on the [LOC (fb, UMAP)] column from [this table](https://docs.google.com/spreadsheets/d/1lfI6sgjEyg37BGL7VRMfW7KgwGKwX5QrCtnKYk1DXY4/edit?usp=sharing). I built this table from the UMAPs in the 4H and 4M notebooks. The notation is X ~ Y if two clusters or axes are equally *relevant*, and X > Y if X is *more relevant* than Y. **This notation is subjective** but to some extent necessary.

The choice of markers in the following lists **is also subjective** and rests on two criteria: either (1) the marker is exclusive to the cluster it is assigned to (e.g. it is listed under cluster b5 and the UMAP shows it is expressed only in b5, coded b5, or preferentially in b5, coded b5 > X), or (2) it is a marker of that cluster among others (b5 ~ X or b ~ X > Y).
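
One way to make the two criteria concrete is to read each code as ordered relevance tiers and accept a marker when the cluster, or its axis, sits in the top tier. This is only a sketch of that reading: the function names are hypothetical, the axis is assumed to be the first character of the cluster name, and the selection in the lists below was done by hand, so it need not match this rule exactly.

```python
def relevance_tiers(code):
    """Split 'b5 ~ X > Y' into ordered tiers: [['b5', 'X'], ['Y']]."""
    return [[name.strip() for name in tier.split("~")] for tier in code.split(">")]


def is_marker_of(code, cluster):
    """True if the cluster, or its axis letter, sits in the most relevant tier."""
    top_tier = relevance_tiers(code)[0]
    return cluster in top_tier or cluster[0] in top_tier


# The four accepted patterns from the text, plus one that should be rejected.
for code in ["b5", "b5 > X", "b5 ~ X", "b ~ X > Y", "X > b5"]:
    print(f"{code!r:12} -> {is_marker_of(code, 'b5')}")
```

Unlike a substring test, this compares whole names, so a code such as `b4` would not count as a mention of the axis `b` or of the cluster `b5`.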
    
---

* A4: both a2 and a1 show a great overlap with A4. Looking at the markers, a2 has 2 fully overlapping markers (mouse/human) and a1 none, but this is not very relevant. As good markers I would choose IGFBP6, SEMA3E and PCOLCE2.
    * A4 - a1:  ADGRD1, FLNC, VASN, **ACE**, *HAS1*, *METRNL*, *TNFAIP6*, *UGDH*  -  [3, **1**, *4*, ***0***]  -  8
    * A4 - a2: ADAMTSL4,  LIMS2, **CLEC3B**, **IGFBP5**, **ISLR**, **PAMR1**, **PTGIS**, **SFRP4**, *ADGRG2*, ***ACKR3***,  ***RAB32***  -  [2, **6**, *1*, ***2***]  -  11
    * A4 - a1 - a2: CD55, CHRDL1, CREB5, DPP4, GAP43, HEG1, PRSS23, SCARA5, TIMP2, **CD248**, **DBN1**, **EMILIN2**, **SEMA3C**, *ALDH1A3*, *BASP1*, *FNDC1*, *GFPT2*, *MFAP5*, *NPR1*, ***AIF1L***, ***IGFBP6***, ***PCOLCE2***, ***SEMA3E***, ***WNT2***  -  [9, **4**, *6*, ***5***]  -  24
    
    
* A1: in general there is a good gene overlap across all groups; b/c is perhaps the least overlapping one. Curiously, the "good markers" between A1 and a2 differ from those between A1 and b3 or A1 and c1. This might mean that each mouse cluster has independent functions, each of which is shared with A1.
    * A1 - a2 [endo/immune, A4-like]: DBN1, LIMS2, PAMR1, PTGIS, **CD55**, **CHRDL1**, **CLEC3B**, **ECM1**, **ISLR**, **SCARA5**, **SFRP2**, **TIMP2**, **TUBB4A**, *EMILIN2*, *IGFBP6*, *SEMA3C*, *UCHL1* , ***ACKR3***, ***CD248***, ***DPP4***, ***MFAP5***, ***PI16***, ***PRKG2***  -  [4, **9**, *4*, ***6***]  -  23
    
    * A1 - b3 [B-like]: C1QTNF3, FGL2, GPX3, **CD151**, **FBLN1**, **FBLN2**, **PCOLCE**, **SERPINF1**, **SMOC2**, *COL12A1*, *ELN*, *MFAP4*, *MGP*, *PAM*, *PTGIS*, ***ABCC9***, **ANGPTL1**, ***CCN5***, ***CLU***, ***LGR5***, ***LOX***, ***OMD***, ***PDGFRL***, ***PODN***, ***SFRP2***   -  [3, **7**, *6*, ***9***]  -  25
    
    * A1 - b/c: CCDC80, CD34, COL12A1, GALNT15, GPX3, MGST1, **AEBP1**, **ANGPTL1**, **CADM3**, **CPZ**, **CTSK**, **DCN**, **HPGD**, **PCOLCE**, **PDGFRL**, **SEMA3B**, **SERPINF1**, **THBS3**, **TNXB**, *C1QTNF3*, *CYP4B1*, *MMP27*, *PLTP*, ***ABCC9***, ***LGR5***, ***THBS2***  -  [6, **13**, *4*, ***3***]  -  26
    
    * A1 - c1 [ECM breakdown, C-like]: COL1A1, SPARC, **AEBP1**, **ANGPTL1**, **COL1A2**, **DCN**, **PDGFRL**, **SCARA5**, *C1QTNF3*, *ELN*, *ITGBL1*, *MFAP4*, *PCOLCE2*, ***CPZ***, ***CTSK***, ***CYBRD1***, ***HPGD***, ***MMP27***, ***SEMA3B***  -  [2, **6**, *5*, ***6***]  -  19
    
    
* A3: Considering it as a "bridge" population, it is clear that there are no very good overlapping markers.
    * A3 - b2: CD81, CD9, COL6A1, COL6A2, CP, ENPP3, FBLN2, LOX, MFAP5, SMOC2, **COL14A1**, **SERPINF1**, **SVEP1**, **THBS3**, *ISLR*, *PODN*, *SSC5D*, ***ITIH5***, ***THBS4***  -  [10, **4**, *3*, ***2***]  -  19
    
    * A3 - c1: AEBP1, COL1A1, COL1A2, COL5A1, COPZ2, DCN, HPGD, HTRA1, KDELR3, P4HA2, PDGFRL, RCN3, RSPO1, SERPINH1, SPARC, TSPAN4, **COL3A1**, *ADAMTS2*, *ALDH3A1*, *C1QTNF3*, *CGREF1*, *CTSK*, *CYBRD1*, *ECM2*, *MFAP4*, *PCOLCE2*, *PPIC*, *SEMA3B*,  ***ELN***, ***MMP27***  -  [16, **1**, *11*, ***2***]  -  30
    
    * A3 - b2 - c1: **ANGPTL1**, **BGN**  -  [0, **2**, *0*, ***0***]  -  2

---
    
* A2
    * A2 - c2: COL3A1, EMX2, LAMC3, PTK7, SCARF2, SMIM3, SPON1, TCF4, TGFBI, ZNF608, **COL23A1**, **CYP26B1**, **NKD1**, **NKD2**, **PREX1**, **RSPO1**, *AXIN2*, *CD9*, *COL7A1*, *HS3ST6*, *IGFBP2*, *MAMDC2*, ***AHRR***, ***F13A1***, ***GREM2***, ***KCNK2***, ***LSAMP***, ***PTPRE***, ***SPRY1***, ***STC1***, ***TNFRSF19***, ***TWIST2***   -  [10, **6**, *6*, ***10***]  -  32

---

* B1 and B3 vs b6 : Some of the genes (CCL2, CXCL2) are also expressed in a1. Also, it is likely that B1 is more related to b6 than B3, but this would also be somewhat expected because it is a bridge cluster.
    * B1 - b6: ARID5A, ARL5B, BCL3, ETS2, GFPT2, PTGS2, TNFAIP2, **CEBPB**, **ERRFI1**, **FOSL1**, **GCH1**, **IER3**, **KDM6B**, **MAFF**, **NR4A3**, **PNRC1**, *CSRNP1*, *LIF*, ***ELL2***, ***IL6***, ***NFKB1***, ***TNFAIP6***  -  [7, **9**, *2*, ***4***]  -  22
    * B3 - b6: ATF3, BTG2, CSF1, FOSB, GADD45B, OSMR, RARRES2, RNF122, **CXCL12**, **EGR1**, **JUNB**, **TMEM176A**, **TMEM176B**, **VCAM1**, **ZFP36**, ***CYP7B1***  -  [8, **0**, *7*, ***1***]  -  16
    * B1 - B3 - b6: ADAMTS1, BTG1, CCNL1, MYC, NNMT, UGCG, **ADAMTS4**, **CCL2**, **CXCL2**, **ICAM1**, **IRF1**, **SOCS3**, **SOD2**, **TNFAIP3**, *NFKBIZ*, ***BIRC3***, ***NFKBIA***   -  [6, **8**, *1*, ***2***]  -  17

---

* B2 and B4 vs b4: there is not a clear relationship between b4 and B2/B4. 
    * B2 - b4: C1S, COL4A2, FRMD6, IL11RA, P2RY14, **IL33**, *AVPR1A*, *ID4*, *NDRG2*, *PTCH2*, *SNED1*, ***COL4A4***, ***TMEM176B***  -  [5, **1**, *5*, ***2***]  -  13
    * B4 - b4: BMPER, FZD4, PPL, SERPING1, SRPX, **GPX3**, **GSN**, **NFIB**, **TSHZ2**, *ADAMTSL3*, *ADCYAP1R1*, *F3*, *VIT*, ***GDF10***, ***MGP***, ***NTRK2***  -  [5, **4**, *4*, ***3***]  -  16
    * B2 - B4 - b4: C3, **GGT5**, *APOE*, *IGFBP7*, *NRP1*, *TNFSF13B*, ***C7***, ***CXCL12***, ***CYGB***, ***TMEM176A***  -  [1, **1**, *4*, ***4***]  -  10


* B4 vs b1 and b4
    * B4 - b1: EBF1, GPC3, LGALS3BP, NOVA1, **EFEMP1**, **FMO1**, **ITM2A**, **MGST1**, *CD36*, *FABP4*, ***C6***, ***PPARG***  -  [4, **4**, *2*, ***2***]  -  12
    * B4 - b4: PPL, SERPING1, SRPX, **GSN**, **NFIB**, *ADAMTSL3*, *ADCYAP1R1*, *APOE*, *C3*, *C7*, *F3*, *NRP1*, *TMEM176A*, *TNFSF13B*, *VIT*, ***GDF10***, ***MGP***, ***NTRK2***, **TSHZ2**  -  [3, **2**, *10*, ***3***]  -  18
    * B4 - b1 - b4: BMPER, FZD4, **GGT5**, **GPX3**, *IGFBP7*, ***CXCL12***, ***CYGB***  -  [2, **3**, *1*, ***2***]  -  8

---
Since we "already know" from Joost et al. 2020 that these populations belong to DS and DP, they are not THAT relevant to study as novel populations. However, considering the nice overlap within each pair (C1-d3, C5-d1/d2, C2-c/d), a pairwise human-mouse comparison might be worthwhile.

* C1
    * C1 - d3: KIAA1217, MDK, MICAL2, STMN1, TNN, TNS3, **ALX4**, **EDNRA**, **EDNRB**, **LAMC3**, **PTCH1**, **ROBO2**, **RUNX2**, **TENM3**, *CD200*, *EGFL6*, *NTRK3*, *PALLD*, *TAGLN*, *TMEM119*, *TPM2*,  ***ADAMTS18***, ***BCL11B***, ***CDH11***, ***CNN2***, ***COL11A1***, ***F2R***, ***KIF26B***, ***MEF2C***  -  [6, **8**, *7*, ***8***]  -  29



* C5 
    * C5 - d1 (& d2): PTMA, *ALX4*, *CRABP1*, *SDC1*, *SPON1*, *TNN*, ***BMP7***, ***FBXO32***, ***IGFBP3***, ***INHBA***, ***MRPS6***, ***PGM2L1***, ***RSPO3***, ***TFAP2A***, ***TRPS1***  -  [1, **0**, *5*, ***9***]  -  15


* C2 and C3 vs c/d: C2 seems to be more clearly associated with c/d than C3, possibly because C3 is also a bridge cluster.
    * C2 - c/d: NR2F1, PTH1R, SRPX, **CPNE5**, **CRABP1**, **MEOX2**, **NCAM1**, **RSPO4**, *CHST15*, *CYP1B1*, *TBXA2R*, *TRIB2*, ***CCK***, ***COCH***, ***FIBIN***, ***FMOD***, ***MKX***, ***PLXDC1***, ***PTGFR***, ***TNMD***  -  [3, **5**, *4*, ***8***]  -  20
    * C3 - c/d: TPM2, **COL8A2**, **MFAP2**, **TRIL**, *EGFLAM*, *NREP*, *RFLNB*, ***COL7A1***, ***F2R***, ***MMP16***, ***RASL11B***  -  [1, **3**, *3*, ***4***]  -  11
    * C2 - C3 - c/d: COL11A1, KIF26B, NRP2, TRPS1, **EMID1**, **TBX15**, *ADAMTS9*, *EDNRA*, *TSHZ3*, ***DKK2***, ***GPM6B***, ***MAFB***, ***TCF4***, ***TENM3***  -  [4, **2**, *3*, ***5***]  -  14

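The bracketed tallies after each list give the number of genes per emphasis level (plain, bold, italic, bold italic), followed by the total. Below is a small sketch for re-deriving a tally from a list, which helps when a list is edited by hand. `tally_by_emphasis` is a hypothetical helper, not part of this notebook's code, and it assumes comma-separated genes with at most three leading asterisks.

```python
def tally_by_emphasis(gene_list):
    """Count genes per markdown emphasis level in a comma-separated list."""
    levels = {0: "plain", 1: "italic", 2: "bold", 3: "bold_italic"}
    counts = {"plain": 0, "bold": 0, "italic": 0, "bold_italic": 0}
    for token in gene_list.split(","):
        token = token.strip()
        if not token:
            continue  # tolerate a trailing comma
        n_stars = len(token) - len(token.lstrip("*"))  # leading asterisks
        counts[levels[n_stars]] += 1
    return counts


# The "C3 - c/d" list from above
c3_cd = ("TPM2, **COL8A2**, **MFAP2**, **TRIL**, *EGFLAM*, *NREP*, *RFLNB*, "
         "***COL7A1***, ***F2R***, ***MMP16***, ***RASL11B***")

counts = tally_by_emphasis(c3_cd)
tally = [counts[k] for k in ("plain", "bold", "italic", "bold_italic")]
total = sum(tally)
print(tally, total)
```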
---

It looks like b5 and e1 are both related to D1 and D2, but b5 is more related to D1, while e1 is more related to D2.

* D1 and D2 vs b5
    * D1 - b5: CNN3, LUM, **ABCA8**, **ETV1**, **PHLDA1**, *ITM2A*, *SPARCL1*, ***APOD***, ***COL8A1***, ***ENTPD2***, ***GPC3***, ***P2RY14***, ***SOX9***  -  [2, **3**, *2*, ***6***]  -  13
    * D2 - b5: BHLHE40, PHLDA3, **CSRP1**, *DDIT4*, ***SBSPON***  -  [2, **1**, *1*, ***1***]  -  5
    * D1 - D2 - b5: KLF5, **CCL2**, **MEOX2**, **TGFBI**, **TM4SF1**, *MATN2*, ***CLDN1***, ***EBF2***, ***FOXS1***, ***NR2F2***, ***VIT***  -  [1, **4**, *1*, ***5***]  -  11
    
     
* D1 and D2 vs e1  [Quite a few d2 genes also show up in e1 and b1]
    * D1 - e1: CTNNAL1, PTCH1, *EGR3*, *HMGA1*, ***ETV4***, ***SOX9***, ***TIAM1***  -  [2, **0**, *2*, ***3***]  -  7
    * D2 - e1: CCDC3, MTSS1, **CSRP1**, **LMO7**, **SBSPON**, *CAVIN2*, *FRMD4B*, *PHLDA3*, *SYNE2*, *TPD52*, ***BNC2***, ***CAV1***, ***CAV2***, ***EFNB1***, ***ITGA6***, ***ITGB4***, ***KRT19***, ***PALMD***, ***SLC2A1***  -  [2, **3**, *5*, ***9***]  -  19
    * D1 - D2 - e1: EZR, **NR2F2**, *KLF5*, ***CLDN1***, ***EBF2***, ***NDRG2***, ***TENM2***   -  [1, **1**, *1*, ***4***]  -  7

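The observation that b5 leans towards D1 and e1 towards D2 can be read off the totals of the exclusive overlaps listed above (D1 - b5, D2 - b5, D1 - e1, D2 - e1). Here is a sketch using only those totals. Comparing raw counts is a simplification introduced for illustration: it ignores the emphasis levels and the size of each marker list.

```python
# Totals of the exclusive overlaps, copied from the lists above
exclusive_overlap = {
    "b5": {"D1": 13, "D2": 5},
    "e1": {"D1": 7, "D2": 19},
}

# For each lower-case cluster, pick the upper-case cluster with the larger exclusive overlap
closest = {
    lower: max(sizes, key=sizes.get)
    for lower, sizes in exclusive_overlap.items()
}
print(closest)
```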
In [None]:
# Export this notebook to HTML
os.system('jupyter nbconvert --to html 7HM_mouse_human_comparison.ipynb')