# Mouse / human comparison

Before running this notebook, **you must first run the 4M and 4H notebooks!**

In this notebook we map the papillary and reticular transcriptional programs onto the mouse and human datasets. With that, and the information from the mouse-human overlaps, we assess whether the papillary/reticular transcriptional programs are consistent across species or whether each species follows a different pattern.

## Imports

In [None]:
%load_ext autoreload
%autoreload 2

In [None]:
import scanpy as sc
import scanpy.external as sce
import pandas as pd
import numpy as np
import os
import triku as tk
import seaborn as sns
import matplotlib.pyplot as plt
import matplotlib as mpl
from tqdm.notebook import tqdm
import scipy.sparse as spr
import matplotlib.cm as cm
import networkx as nx

In [None]:
!pip install munkres
from munkres import Munkres

In [None]:
# local imports and imports from other notebooks
from cellassign import assign_cats
from fb_functions import (
    make_gene_scoring_with_expr, plot_score_graph, plot_UMAPS_gene,
    plot_adata_cluster_properties, make_dicts_fraction_mean, plot_dotplot_gene,
)
%store -r dict_colors_human
%store -r dict_colors_mouse

dict_colors_human_mouse = {**dict_colors_human , **dict_colors_mouse}

%store -r seed
%store -r magma
%store -r data_dir

In [None]:
%store -r dict_make_gene_scoring_robust
%store -r dict_make_gene_scoring_axis_robust

In [None]:
mpl.rcParams['figure.dpi'] = 120
pd.options.display.float_format = "{:,.2f}".format

In [None]:
print(sorted(['COL11A1', 'FMO1', 'CD36', 'ACTA2', 'CNN1', 'PPARG', 'ADIPOQ', 'CEBPA', 'CD146']))

In [None]:
# HUMAN SKIN

# Janson 2012 
# First list extracted from Fig2 (heatmap) and from Fig4 and Fig5
papillary_janson_2012_1 = ['ADH1A', 'ADRA2A', 'AXIN2', 'CASP1', 'ACKR4', 'CD302', 'CTSC', 'DENND2A', 'HRSP12', 'LRIG1', 'MAF', 'MOXD1', 'NTN1', 'PDPN', 'RGL1', 'SEPP1', 'SIPA1L2', 'STEAP1', 'TFAP2C', 'TMEM140']
papillary_janson_2012_2 = ['ACKR4', 'GPER1', 'ITM2C', 'NTN1', 'PDPN', 'STEAP1', 'TNFRSF19']

reticular_janson_2012_1 = ['A2M', 'CDH2', 'DACT1', 'DBNDD2', 'FCRLB', 'FNDC1', 'FSTL3', 'GLS', 'KRT19', 'KRTAP1-5', 'MAP1B', 'MGP', 'NEXN', 'SULF1', 'TAGLN', 'TMEM200A', 'TPM1', 'VGLL4']
reticular_janson_2012_2 = ['CDH2', 'CNN1', 'MAP1B', 'MGP', 'PPP1R14A', 'TAGLN', 'TGM2', 'TMEM200A']


# Nauroy 2017
# This list is extracted from the Fig S4a table
papillary_nauroy_2017 = ['ANGPT1', 'BMP2', 'CCL2', 'CCL8', 'CD109', 'COL10A1', 'COL18A1', 'COL7A1', 'COLEC12', 'CSPG4', 'CTSC', 'CTSK', 'CTSS', 'CXCL1', 'DCN', 'FGF13', 'IL15', 'INHBB', 'LOXL3', 'MPP1', 'NTF3', 
                         'PDGFC', 'PLXNC1', 'S100A8', 'SRPX2', 'TGFB2', 'TNFSF4', 'WNT5A']
reticular_nauroy_2017 = ['A2M', 'ACAN', 'ADAMTSL1', 'ANGPTL1', 'BMP6', 'COL11A1', 'COL14A1', 'COMP', 'CRLF1', 'EFEMP1', 'ELN', 'FBLN2', 'FGF18', 'FGF7', 'GPC4', 'IGF1', 'MFAP5', 'MGP', 'PCOLCE2', 'PCSK5', 
                         'PDGFD', 'PLXDC2', 'SFRP4', 'SLIT3', 'SPOCK1', 'TGM2', 'THBS2', 'WNT4']



# Philippeos 2018
papillary_philippeos_2018 = ['APCDD1', 'AXIN2', 'C8orf22', 'CCL14', 'CCL15', 'CCL5', 'CLEC10A', 'CLEC2A', 'CLEC7A', 'COL18A1', 'COL23A1', 'COL6A5', 'CTSW', 'DIRAS3', 'ESRG', 'FCER1A', 'FREM1', 'HIGD1B', 'HSPB3', 
                             'IFNG', 'IGLL5', 'LYZ', 'PTGDS', 'PTGS1', 'PTK7', 'ROBO2', 'RSPO1', 'SGCA', 'SGCG', 'SPON1', 'TRAT1', 'WIF1', 'XCL1', ]
reticular_philippeos_2018 = ['AQP5', 'AZGP1', 'CA6', 'CD36', 'CEACAM5', 'CEACAM6', 'CLDN10', 'CLDN7', 'CRISP3', 'DCD', 'DNER', 'ELF3', 'FABP9', 'GABRP', 'GRB14', 'KRT25', 'KRT27', 'KRT28', 'KRT35', 'KRT7', 'KRT71', 
                             'KRT8', 'MUCL1', 'OBP2A', 'OBP2B', 'PART1', 'PRR9', 'ROPN1B', 'S100A1', 'SLC6A14', 'SLC13A2', 'STAC2', 'TCHH', ]


# Korosec 2019 
markers_korosec_2019 = ['FAP', 'THY1']  # FAP+CD90- = papillary  |  FAP+/-CD90+ = reticular
papillary_korosec_2019 = ['DPP4', 'NTN1', 'PDPN', 'SFRP2']
reticular_korosec_2019 = ['ACTA2', 'ADIPOQ', 'CD146', 'MCAM', 'CEBPA', 'CNN1', 'COL11A1', 'FMO1', 'PPARG']


# Haydont 2019
papillary_haydont_2019 = ['APCDD1', 'AXIN2', 'COL10A1', 'COL23A1', 'COL7A1', 'COLEC12', 'CSPG4', 'CTSC', 'DCN', 'HSPB3', 'IL15', 'INHBB', 'LOXL3', 'NPTX2', 'NTF3', 'PLXNC1', 'PTGDS', 'ROBO2', 'SFRP2', 'TGFB2', 
                          'THBS2', 'TNFRSF19', 'WNT5A', ]
reticular_haydont_2019 = ['A2M', 'ACAN', 'ADAMTSL1', 'ANGPTL1', 'BMP6', 'CCL2', 'CDH2', 'COL11A1', 'COL14A1', 'COMP', 'CRLF1', 'CXCL1', 'DIRAS3', 'DNER', 'EFEMP1', 'FGF7', 'GPC4', 'GPER', 'GRB14', 'IGF1', 
                          'MAP1B', 'MFAP5', 'MGP', 'PCOLCE2', 'PCSK5', 'PPP1R14A', 'SFRP4', 'SPON1', 'STEAP1', 'TAGLN', 'TGM2', 'TMEM200A', ]


# Haydont 2020
# Extracted from Fig 5
papillary_haydont_2020 = ['CADM1', 'EFHD1', 'TOX', 'UCP2']
reticular_haydont_2020 = ['ACAN', 'COL11A1', 'DIRAS3', 'EMCN', 'FGF9', 'LIMCH1', 'MGST1', 'NPR3', 'SOST', 'SOX11', 'VCAM1']

In [None]:
%store -r list_all_datasets_human
%store -r list_accepted_clusters_human
%store -r list_names_human

**TO AVOID A HUGE BLOCK OF CODE, PASTE EACH LIST OF GENES INTO THE CELL BELOW AND PLOT IT TO RUN THE ANALYSIS**
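
One way to follow this instruction without copying lists by hand is to keep the signatures in a dictionary and select them by key. The sketch below is only an illustration: `signatures`, `select_genes` and the key names are hypothetical (they are not part of `fb_functions`), and the lists are copied from the Korosec 2019 and Haydont 2020 definitions above.

```python
# Hypothetical helper: gather the signatures in one dictionary so that a study
# can be chosen by name instead of pasting its lists by hand.
signatures = {
    "korosec_2019": {
        "papillary": ['DPP4', 'NTN1', 'PDPN', 'SFRP2'],
        "reticular": ['ACTA2', 'ADIPOQ', 'CD146', 'MCAM', 'CEBPA', 'CNN1',
                      'COL11A1', 'FMO1', 'PPARG'],
    },
    "haydont_2020": {
        "papillary": ['CADM1', 'EFHD1', 'TOX', 'UCP2'],
        "reticular": ['ACAN', 'COL11A1', 'DIRAS3', 'EMCN', 'FGF9', 'LIMCH1',
                      'MGST1', 'NPR3', 'SOST', 'SOX11', 'VCAM1'],
    },
}


def select_genes(study, signatures=signatures):
    """Return the papillary + reticular genes of one study, without duplicates."""
    lists = signatures[study]
    # dict.fromkeys keeps the first occurrence of each gene and preserves order
    return list(dict.fromkeys(lists["papillary"] + lists["reticular"]))


genes = select_genes("haydont_2020")
```

The resulting `genes` list has the same form as the one built by hand in the next cell, so only the key would change from one study to the next.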

In [None]:
genes = papillary_haydont_2020 + reticular_haydont_2020
dict_fraction_cells, dict_mean_exp = make_dicts_fraction_mean(genes, list_all_datasets=list_all_datasets_human, list_accepted_clusters=list_accepted_clusters_human, 
                                                              list_names=list_names_human, clusterby='cluster_robust')

In [None]:
for gene in genes:
    print(gene)
    plot_dotplot_gene(gene, dict_fraction_cells, dict_mean_exp)
    plot_UMAPS_gene(gene, list_datasets=list_all_datasets_human, list_names=list_names_human, n_cols=5)
    plt.show()

### Janson 2012
* papillary: **ADRA2A** (B2 ~ B3), **AXIN2** (A2 > C2), **CCRL1/ACKR4** (A), **CTSC** (A2), **LRIG1** (C2), **MAF** (E1? ~ C1?), **MOXD1** (A2), **NTN1** (A), **STEAP1** (A1 ~ A3), **TFAP2C** (C5), ADH1A, CASP1, CD302, DENND2A, HRSP12, PDPN, RGL1, SEPP1, SIPA1L2, TMEM140

* papillary (focused): **CCRL1/ACKR4** (A), **NTN1** (A), **STEAP1** (A1 ~ A3), **TNFRSF19** (A2), GPER/GPER1, ITM2C, PDPN

* reticular: **A2M** (C1 ~ D1 ~ D2 ~ E1), **DACT1** (D2), **FNDC1** (A4 ~ C1), **KRT19** (D2), **MGP** (A1 ~ B4), **SULF1** (A4?), **TAGLN** (D2), **TPM1** (C1 ~ D2 > A1 ~ A4), CDH2, DBNDD2, FCRLB, FSTL3, GLS, KRTAP1-5, MAP1B, NEXN, TMEM200A, VGLL4

* reticular (focused): **CNN1** (D2?), **MGP** (A1 ~ B4), **PPP1R14A** (C1 > C2), **TAGLN** (D2), CDH2, MAP1B, TGM2, TMEM200A

### Nauroy 2017
* papillary: **CCL2** (B1 ~ B3), **CCL8** (B3), **CD109** (A2), **COL10A1** (C3?), **COL18A1** (A2), **COL7A1** (A2 ~ C1), **CTSC** (A2 ~ B2 > B1), **CTSK** (A1 ~ C2), **CTSS** (B3 > A1), **CXCL1** (B1 > B3), **DCN** (A1/A3/A4 > B4), **FGF13** (C2), **IL15** (B3), **LOXL3** (A2?), **PDGFC** (A), **PLXNC1** (A1 ~ A4), **SRPX2** (D1), **WNT5A** (C5), ANGPT1, BMP2, COLEC12, CSPG4, INHBB, MPP1, NTF3, S100A8, TGFB2, TNFSF4

* reticular: **A2M** (C1 ~ D1 ~ D2 ~ E1), **ACAN** (C1), **ADAMTSL1** (A1), **ANGPTL1** (A), **COL11A1** (C1), **COL14A1** (A ~ C3), **COMP** (A2 ~ C3), **EFEMP1** (B4), **ELN** (A1/A3 ~ C3), **FBLN2** (A1 ~ A4), **FGF7** (B), **IGF1** (B4 ~ E1), **MFAP5** (A1 ~ A4 ~ C1), **MGP** (A1 ~ B4), **PCOLCE2** (A4 > A1), **PCSK5** (A1/A3/A4), **PDGFD** (E1 > C2), **SFRP4** (A4 > D1 ~ D2), **THBS2** (A), BMP6, CRLF1, FGF18, GPC4, PLXDC2, SLIT3, SPOCK1, TGM2, WNT4

### Philippeos 2018
* papillary: **APCDD1** (A2), **AXIN2** (A2 > C2), **C8orf22** (B2 ~ B3 ~ B4), **CCL14** (B1?), **CCL5** (B3), **CLEC2A** (A2), **COL18A1** (A2), **COL23A1** (A2 ~ C5), **COL6A5** (A2), **DIRAS3** (A2 ~ C2), **HSPB3** (A2), **PTGDS** (A2 ~ D1 ~ B2), **PTGS1** (A2), **PTK7** (A2 ~ C2), **ROBO2** (A2 ~ C2), **RSPO1** (A2), **SGCA** (A1 ~ A3), **SGCG** (A), **SPON1** (A2 ~ C), **WIF1** (A3), CCL15, CLEC10A, CLEC7A, ESRG, CTSW, FCER1A, FREM1, HIGD1B, IFNG, IGLL5, LYZ, TRAT1, XCL1

* reticular: AQP5, AZGP1, CA6, CD36, CEACAM5, CEACAM6, CLDN10, CLDN7, CRISP3, DCD, DNER, ELF3, FABP9, GABRP, GRB14, KRT25, KRT27, KRT28, KRT35, KRT7, KRT71, KRT8, MUCL1, OBP2A, OBP2B, PART1, PRR9, ROPN1B, S100A1, SLC6A14, SLC13A2, STAC2, TCHH


### Korosec 2019 
* papillary: **FAP** (A ~ C3), **CD26/DPP4** (A), **NTN1** (A), **SFRP2** (A1 ~ A2 ~ A3), PDPN

* reticular: **CD90/THY1** (A1 ~ A4 ~ C3 ??), **ACTA2** (C1), **CD146/MCAM** (B1?), **CNN1** (D2?), **COL11A1** (C1), **FMO1** (B4 > B1 ~ B2), **PPARG** (B4 > B1), ADIPOQ, CD36, CEBPA
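
Several entries in these notes pair a protein or legacy name with its gene symbol (CD26/DPP4, CD90/THY1, CD146/MCAM, CCRL1/ACKR4, GPER/GPER1). If the datasets index genes by current symbol, which is an assumption here, an alias left in a list would simply not be found. A minimal sketch of normalising a list before plotting follows; `alias_to_symbol` and `normalise_symbols` are hypothetical names, and the mapping contains only the pairs written in the notes.

```python
# Alias -> gene symbol pairs, taken from the pairs written in the notes.
alias_to_symbol = {
    'CD26': 'DPP4',
    'CD90': 'THY1',
    'CD146': 'MCAM',
    'CCRL1': 'ACKR4',
    'GPER': 'GPER1',
}


def normalise_symbols(genes, alias_map=alias_to_symbol):
    """Replace known aliases by gene symbols and drop the resulting duplicates."""
    renamed = [alias_map.get(gene, gene) for gene in genes]
    # dict.fromkeys removes repeats while keeping the original order
    return list(dict.fromkeys(renamed))


# Same content as the reticular Korosec 2019 list, which holds both CD146 and MCAM
example_genes = ['ACTA2', 'ADIPOQ', 'CD146', 'MCAM', 'CEBPA', 'CNN1',
                 'COL11A1', 'FMO1', 'PPARG']
normalised = normalise_symbols(example_genes)
print(normalised)
```

In this example `CD146` collapses onto `MCAM`, so the gene would be queried once instead of twice.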


### Haydont 2019
* papillary: **APCDD1** (A2), **AXIN2** (A2), **COL10A1** (C3?), **COL23A1** (A2 ~ C5), **COL7A1** (A2 ~ C1), **CTSC** (A2 ~ B2 > B1), **DCN** (A1/A3/A4 > B4), **HSPB3** (A2), **IL15** (B3), **LOXL3** (A2?), **NPTX2** (A2), **PLXNC1** (A1 ~ A4), **PTGDS** (A2 ~ D1 ~ B2), **ROBO2** (A2 ~ C2), **SFRP2** (A1 ~ A2 ~ A3), **THBS2** (A), **TNFRSF19** (A2), **WNT5A** (C5), COLEC12, CSPG4, INHBB, NTF3, TGFB2

* reticular: **A2M** (C1 ~ D1 ~ D2 ~ E1), **ACAN** (C1), **ADAMTSL1** (A1), **ANGPTL1** (A), **CCL2** (B1 ~ B3), **COL11A1** (C1), **COL14A1** (A ~ C3), **COMP** (A2 ~ C3), **CXCL1** (B1 > B3), **DIRAS3** (A2 ~ C2), **EFEMP1** (B4), **FGF7** (B), **IGF1** (B4 ~ E1), **MFAP5** (A1 ~ A4 ~ C1), **MGP** (A1 ~ B4), **PCOLCE2** (A4 > A1), **PCSK5** (A1/A3/A4), **PPP1R14A** (C1 > C2), **SFRP4** (A4 > D1 ~ D2), **SPON1** (A2 ~ C ~ E1), **STEAP1** (A1 ~ A3), **TAGLN** (D2), BMP6, CDH2, CRLF1, DNER, GPC4, GPER/GPER1, GRB14, MAP1B, TGM2, TMEM200A 


### Haydont 2020
* papillary: **CADM1** (C2), **TOX** (C1 ~ C3), EFHD1, UCP2

* reticular: **ACAN** (C1), **COL11A1** (C1), **DIRAS3** (A2 ~ C2), **FGF9** (C2?), **LIMCH1** (C2), **MGST1** (A1 ~ B4), **SOX11** (C5 > C1 ~ C3), **VCAM1** (B3), EMCN, NPR3, SOST
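
To get a rough overview of where each signature lands, the annotations above can be tallied per axis. The sketch below is an illustration only: it assumes that every cluster label starts with its axis letter (A to E), the names `ENTRY` and `count_axes` are hypothetical, and the two strings are copied from the Haydont 2020 notes. It counts manual annotations and is not a statistical test.

```python
import re
from collections import Counter

# Matches annotated entries such as "**TOX** (C1 ~ C3)"; genes listed without
# a cluster annotation (not in bold) are ignored on purpose.
ENTRY = re.compile(r"\*\*([^*]+)\*\*\s*\(([^)]*)\)")


def count_axes(note_line):
    """Count, per axis letter, how many annotated genes mention that axis."""
    counts = Counter()
    for gene, clusters in ENTRY.findall(note_line):
        # a gene annotated as "C1 ~ C3" counts only once for axis C
        counts.update(set(re.findall(r"[A-E]", clusters)))
    return counts


papillary_notes = "**CADM1** (C2), **TOX** (C1 ~ C3), EFHD1, UCP2"
reticular_notes = ("**ACAN** (C1), **COL11A1** (C1), **DIRAS3** (A2 ~ C2), "
                   "**FGF9** (C2?), **LIMCH1** (C2), **MGST1** (A1 ~ B4), "
                   "**SOX11** (C5 > C1 ~ C3), **VCAM1** (B3), EMCN, NPR3, SOST")

print("papillary:", dict(count_axes(papillary_notes)))
print("reticular:", dict(count_axes(reticular_notes)))
```

A gene annotated in more than one axis contributes to each of them, so the totals can exceed the number of annotated genes.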