# 2. Understanding multicellular programs

### Loading data

ReCoN generally uses single-cell RNA-seq data and/or single-cell ATAC-seq data.
```{tip}
The data for this tutorial can be downloaded together at _____.
```

In [21]:
cd /pasteur/helix/projects/ml4ig_hot/Users/rtrimbou/ReCoN/

/pasteur/helix/projects/ml4ig_hot/Users/rtrimbou/ReCoN


In [22]:
pip uninstall recon -y

Note: you may need to restart the kernel to use updated packages.


In [24]:
pip install .[grn-lite]

Processing /pasteur/helix/projects/ml4ig_hot/Users/rtrimbou/ReCoN
  Installing build dependencies ... [?25ldone
[?25h  Getting requirements to build wheel ... [?25ldone
[?25h  Preparing metadata (pyproject.toml) ... [?25ldone
Collecting hummuspy@ git+https://github.com/cantinilab/hummus@dask_update#subdirectory=hummuspy (from recon==0.1.0)
  Cloning https://github.com/cantinilab/hummus (to revision dask_update) to /local/scratch/tmp/pip-install-qn35djbp/hummuspy_8028b1ad78414a96942a6c9aea56edb3
  Running command git clone --filter=blob:none --quiet https://github.com/cantinilab/hummus /local/scratch/tmp/pip-install-qn35djbp/hummuspy_8028b1ad78414a96942a6c9aea56edb3
  Running command git checkout -b dask_update --track origin/dask_update
  Switched to a new branch 'dask_update'
  branch 'dask_update' set up to track 'origin/dask_update'.
  Resolved https://github.com/cantinilab/hummus to commit 8dff0c239882d752d9c131d208436d6b30bef717
  Installing build dependencies ... [?25ldone


In [35]:
pip install gseapy

Collecting gseapy
  Downloading gseapy-1.1.11-cp310-cp310-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (11 kB)
Downloading gseapy-1.1.11-cp310-cp310-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (605 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 605.7/605.7 kB 36.9 MB/s 0:00:00
Installing collected packages: gseapy
Successfully installed gseapy-1.1.11
Note: you may need to restart the kernel to use updated packages.


In [25]:
import numpy as np
import scanpy as sc  # single cell data
import pandas as pd  # data manipulation
import liana as li  # cell communication
import recon  # multilayer and perturbation prediction
import recon.data

In [26]:
rna = sc.read_h5ad("./data/perturbation_tuto/rna.h5ad")

Let's check which cell types are present in this dataset:

In [27]:
rna.obs["celltype"].unique().tolist()[:5]

['B_cell', 'ILC', 'Macrophage', 'MigDC', 'Monocyte']
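Before building the network, it can help to confirm that every cell type you plan to analyze is present, and with how many cells. The sketch below uses a small hypothetical table as a stand-in for `rna.obs`; only the `celltype` column name comes from this tutorial.

```python
import pandas as pd

# Hypothetical stand-in for rna.obs (one row per cell)
obs = pd.DataFrame(
    {"celltype": ["B_cell", "B_cell", "Macrophage", "ILC", "Macrophage", "B_cell"]}
)

# Cell types we would like to include in the analysis
wanted = ["B_cell", "Macrophage", "pDC"]

# Number of cells per cell type, most abundant first
counts = obs["celltype"].value_counts()
print(counts)

# Requested cell types that are absent from the data
missing = [ct for ct in wanted if ct not in counts.index]
print("Missing cell types:", missing)
```

With the real data, replacing `obs` by `rna.obs` should give the same kind of summary.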

## Create ReCoN's multilayer network

### Importing GRNs

You can either generate GRNs directly with ReCoN or import a previously generated one.

```{tip}
If you wish to generate it directly with ReCoN, please follow the tutorial [________]_______.
```

```{warning}
Generating GRNs with ReCoN requires a conda environment with Python 3.10, see [Installation]_____.
```




In [29]:
grn_path = "./data/perturbation_tuto/grn.csv"
grn = pd.read_csv(grn_path, index_col=0)
grn = grn.sort_values(by="weight", ascending=False)[:500_000]
grn["source"] = grn["source"].str.capitalize()
grn["target"] = grn["target"].str.capitalize()
grn.head(3)

|   | source | target | weight |
|---|--------|--------|--------|
| 0 | Pax5 | Mbd1 | 9.5e-05 |
| 1 | Pax5 | Smad1 | 9.2e-05 |
| 2 | Pax5 | Smad5 | 9.2e-05 |
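The capitalization step above converts symbols to the mouse-style convention (`PAX5` becomes `Pax5`). The sketch below, on hypothetical symbols, shows what `str.capitalize` does and one case worth checking: it lower-cases everything after the first character, so a symbol with an internal capital such as `H2-Ab1` would no longer match. The `reference` set is an assumption standing in for whatever gene list you trust (for example `rna.var_names`).

```python
import pandas as pd

# Hypothetical upper-case symbols, as they might appear in a GRN file
symbols = pd.Series(["PAX5", "SMAD5", "H2-AB1"])

# First character upper-cased, all remaining characters lower-cased
capitalized = symbols.str.capitalize()
print(capitalized.tolist())

# Hypothetical reference symbols to match against
reference = {"Pax5", "Smad5", "H2-Ab1"}

# Symbols that no longer match the reference after capitalization
unmatched = capitalized[~capitalized.isin(reference)]
print("Unmatched after capitalization:", unmatched.tolist())
```

If many symbols end up unmatched, a case-insensitive mapping onto the reference names may be a safer choice than `str.capitalize`.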


### Computing cell communication

Cell-cell communication is inferred with LIANA+, an external package dedicated to this task.

```{tip}
For more information, see the LIANA+ documentation: https://liana-py.readthedocs.io/en/latest/
```


In [30]:
li.method.cellphonedb(
    rna,
    # NOTE: by default the resource uses HUMAN gene symbols
    resource_name="mouseconsensus",
    expr_prop=0.00,
    use_raw=False,
    groupby="celltype",
    verbose=True,
    key_added="cpdb_res",
)


Using resource `mouseconsensus`.
Using `.X`!
15364 features of mat are empty, they will be removed.
Make sure that normalized counts are passed!
0.36 of entities in the resource are missing from the data.


Generating ligand-receptor stats for 1296 samples and 937 features


100%|██████████| 1000/1000 [00:01<00:00, 627.52it/s]


```{warning}
ReCoN only requires renaming the columns of LIANA's output dataframe.
```

We rename ligands and receptors as 'source' and 'target', the connected cell types as 'celltype_source' and 'celltype_target', and the scores as 'weight'.


In [31]:
ccc_network = rna.uns["cpdb_res"].copy()
ccc_network = ccc_network[["ligand", "receptor", "lr_means", "source", "target"]]
ccc_network = ccc_network.rename(columns={
    "lr_means": "weight",
    "source": "celltype_source",
    "target": "celltype_target",
    "ligand": "source",
    "receptor": "target"
})
ccc_network = ccc_network[ccc_network['weight'] != 0]

In [32]:

ccc_network.head(3)

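|        | source | target | weight | celltype_source | celltype_target |
|--------|--------|--------|--------|-----------------|-----------------|
| 406685 | App | Cd74 | 102.485008 | cDC2 | cDC1 |
| 405645 | Copa | Cd74 | 102.370003 | cDC1 | cDC1 |
| 410237 | Copa | Cd74 | 102.366211 | eTAC | cDC1 |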
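The renaming above reuses the names `source` and `target` for a different meaning, which can look risky. A minimal sketch on a hypothetical table with the same LIANA column names shows that a single `rename` call maps every label from its original name, so the two uses do not collide.

```python
import pandas as pd

# Hypothetical table with the LIANA column names used in this tutorial
liana_like = pd.DataFrame({
    "ligand": ["App", "Copa"],
    "receptor": ["Cd74", "Cd74"],
    "lr_means": [2.5, 0.0],
    "source": ["cDC2", "cDC1"],
    "target": ["cDC1", "cDC1"],
})

# All labels are mapped from the ORIGINAL names in one pass, so
# "source" -> "celltype_source" and "ligand" -> "source" do not collide
edges = liana_like.rename(columns={
    "lr_means": "weight",
    "source": "celltype_source",
    "target": "celltype_target",
    "ligand": "source",
    "receptor": "target",
})

# Drop interactions with a null score
edges = edges[edges["weight"] != 0]
print(edges)
```

Splitting the renaming into two successive calls would not be equivalent, since the second call would see the already renamed columns.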


### Add receptor and receptor-target gene information

In [33]:
receptor_genes = recon.data.load_data.load_receptor_genes("mouse_receptor_gene_from_NichenetPKN")

genes = np.unique(grn['source'].tolist() + grn['target'].tolist())
receptor_genes = receptor_genes[receptor_genes['target'].isin(genes)]
receptor_genes.head()

|   | source | target | weight |
|---|--------|--------|--------|
| 2 | A1bg | Abca1 | 0.005156 |
| 3 | A1bg | Abcb1a | 0.005877 |
| 4 | A1bg | Abcb1b | 0.005877 |
| 7 | A1bg | Acsl1 | 0.005915 |
| 8 | A1bg | Adk | 0.005092 |
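Filtering the receptor-gene links to genes present in the GRN can silently remove many links, or even whole receptors. The sketch below reproduces the filter on hypothetical data (the gene `GeneX` and all weights are made up) and reports what is left.

```python
import pandas as pd

# Hypothetical receptor -> target gene links and GRN gene set
receptor_links = pd.DataFrame({
    "source": ["Cd74", "Cd74", "Tnfrsf1a"],
    "target": ["Nfkb1", "GeneX", "Rela"],
    "weight": [0.005, 0.004, 0.006],
})
grn_genes = {"Nfkb1", "Rela", "Pax5"}

# Keep only links whose target gene is a node of the GRN
kept = receptor_links[receptor_links["target"].isin(grn_genes)]

# Summary: links kept, and receptors that still have at least one link
n_links_before, n_links_after = len(receptor_links), len(kept)
receptors_kept = sorted(kept["source"].unique())
print(f"{n_links_after}/{n_links_before} links kept; receptors: {receptors_kept}")
```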


## Case 1: Multicellular program around a cell-type-specific pathway

If you have a pathway of interest in a given cell type, you can predict how the other cell types react to its activation.
You can also look "upstream", to predict what in the environment triggered the activation of your pathway of interest, and how.
 
```{tip}
You can check the cardiac fibrosis showcase of our manuscript for an extended example :) []____
```

Let's start by choosing a gene set (e.g. a hallmark about TNF alpha activation) and a cell type hypothetically expressing it (e.g. macrophages).

In [39]:
import gseapy as gp
from gseapy import Msigdb 

# we can use gseapy to download the hallmarks from MSigDB
msig = Msigdb()
hallmarks = msig.get_gmt(category='mh.all', dbver="2024.1.Mm")
print(list(hallmarks.keys())[:5])

# Let's pick a hallmark gene set as seeds for our perturbation analysis,
# and filter it to keep only genes present in our network
gene_seeds = [gene for gene in hallmarks['HALLMARK_TNFA_SIGNALING_VIA_NFKB'] if gene in genes]

# Create a dictionary with seed genes as keys and initial score 1, since all are equally important here
gene_seeds = {seed:1 for seed in gene_seeds}

['HALLMARK_ADIPOGENESIS', 'HALLMARK_ALLOGRAFT_REJECTION', 'HALLMARK_ANDROGEN_RESPONSE', 'HALLMARK_ANGIOGENESIS', 'HALLMARK_APICAL_JUNCTION']


In [40]:
len(gene_seeds)

157

We now add the cell type information to these genes. Since all of them should be expressed in macrophages here, we simply append '::Macrophage' to each gene name.

In [41]:
seeds = {f"{gene}::Macrophage": score for gene, score in gene_seeds.items() if score > 0}

print({k: seeds[k] for k in list(seeds.keys())[:5]})  # print first 5 items

{'Abca1::Macrophage': 1, 'Atf3::Macrophage': 1, 'Atp2b1::Macrophage': 1, 'B4galt1::Macrophage': 1, 'B4galt5::Macrophage': 1}


```{warning}
In ReCoN, molecules are named differently depending on the layer they belong to.
In intracellular layers, molecules (e.g. TFs, target genes, receptors) are named by their gene symbol and cell type, separated by a double colon (e.g. "Nfkb1::Macrophage").
In extracellular layers, molecules (e.g. ligands, receptors) are named by their gene symbol and cell type, separated by a hyphen (e.g. "CD40-Macrophage").
```
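The two naming conventions above can be captured in a few small helpers. These functions are hypothetical and not part of ReCoN; they only illustrate the conventions described in the warning. The parsing helper assumes that cell type names contain no hyphen, and splits on the last hyphen because gene symbols may contain one.

```python
def intracellular_name(gene: str, celltype: str) -> str:
    """Node name in intracellular layers, e.g. 'Nfkb1::Macrophage'."""
    return f"{gene}::{celltype}"


def extracellular_name(gene: str, celltype: str) -> str:
    """Node name in extracellular layers, e.g. 'CD40-Macrophage'."""
    return f"{gene}-{celltype}"


def split_node(node: str) -> tuple[str, str]:
    """Split a node name into (gene, celltype) for either convention."""
    if "::" in node:
        gene, celltype = node.split("::", 1)
    else:
        # Split on the LAST hyphen, since gene symbols may contain hyphens
        gene, celltype = node.rsplit("-", 1)
    return gene, celltype


print(intracellular_name("Nfkb1", "Macrophage"))
print(extracellular_name("CD40", "Macrophage"))
print(split_node("H2-Ab1-Macrophage"))
```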

```{note}
You can of course use your own gene sets as seeds, for example from your own differential expression
analysis. If you do so, make sure to filter the genes to keep only those present in your network.
Additionally, you can assign different scores to different genes (e.g. based on logFC or confidence).
⚠️ All gene weights must be positive!
```
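As a sketch of the note above, seeds can be derived from a differential expression table. The table, its column names (`gene`, `logFC`) and the gene `GeneX` are hypothetical; only the `gene::celltype` key format and the positive-weight requirement come from this tutorial.

```python
import pandas as pd

# Hypothetical differential expression results for macrophages
de = pd.DataFrame({
    "gene": ["Nfkb1", "Tnf", "Il1b", "GeneX"],
    "logFC": [1.8, 2.4, -0.9, 3.0],
})

# Hypothetical set of genes present in the network
network_genes = {"Nfkb1", "Tnf", "Il1b"}

# Keep genes that are in the network and have a strictly positive score
de = de[de["gene"].isin(network_genes)]
de = de[de["logFC"] > 0]

# Build the seed dictionary with the intracellular naming convention
seeds_de = {
    f"{gene}::Macrophage": float(score)
    for gene, score in zip(de["gene"], de["logFC"])
}
print(seeds_de)
```

Here down-regulated genes are simply dropped. Using the absolute logFC instead would keep them as seeds, which is a modeling choice rather than a requirement.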

## Assemble the multicellular network

There are many adjustable parameters:

In [42]:
cell_communication_graph_directed = False
cell_communication_graph_weighted = True
restart_proba = 0.6
ccc_proba = 0.5
grn_graph_weighted = True
grn_graph_directed = False

receptor_layer = recon.layers.ReceptorLayer.from_receptor_genes(
    receptor_genes,
    directed=grn_graph_directed,
    weighted=grn_graph_weighted
)



In [43]:
# Placeholder receptor layer: every receptor is linked to a single dummy node
receptor_names = receptor_genes['source'].unique()
receptor_layer = pd.DataFrame(
    {"source": receptor_names + '_receptor',
     "target": ["fake_receptor"] * len(receptor_names)}
)

In [44]:
import recon.explore

In [45]:
celltypes=["B_cell", "pDC", "Macrophage", "NK_cell", "T_cell_CD4", "T_cell_CD8"]    # list of cell types to include in the analysis

generic_multicell = recon.explore.Multicell(
    celltypes = {celltype: recon.explore.Celltype(
#        receptor_graph = receptor_layer,
        grn_graph = grn,
        receptor_grn_bipartite = receptor_genes,
        celltype_name = celltype,
        receptor_graph_directed=False,
        receptor_graph_weighted=False,
        grn_graph_directed=grn_graph_directed,
        grn_graph_weighted=grn_graph_weighted,
        receptor_grn_bipartite_graph_directed=False,
        receptor_grn_bipartite_graph_weighted=True,
        seeds = seeds)  # we can either pass a dictionary of Celltype objects, or build them on the fly
        for celltype in celltypes},
    cell_communication_graph = ccc_network.iloc[ccc_network["celltype_source"].isin(celltypes).values & ccc_network["celltype_target"].isin(celltypes).values, :],
    cell_communication_graph_directed=cell_communication_graph_directed,
    cell_communication_graph_weighted=cell_communication_graph_weighted,
    # bipartite parameters can be -1, 0, 1 here
    bipartite_grn_cell_communication_directed=False,
    bipartite_grn_cell_communication_weighted=False,
    bipartite_cell_communication_receptor_directed=False,
    bipartite_cell_communication_receptor_weighted=False,
    seeds = seeds,
)

                No receptor_graph provided,
                an empty receptor graph will be created.
                
The keys of the dictionary will be the celltype names.
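The `cell_communication_graph` argument above keeps only the interactions whose sender and receiver cell types are both in `celltypes`. The same filter, written on a hypothetical toy table with the renamed columns, looks like this:

```python
import pandas as pd

# Hypothetical cell communication edges (column names as in ccc_network)
toy_ccc = pd.DataFrame({
    "source": ["App", "Copa", "Tnf"],
    "target": ["Cd74", "Cd74", "Tnfrsf1a"],
    "weight": [2.0, 1.5, 0.8],
    "celltype_source": ["cDC2", "B_cell", "Macrophage"],
    "celltype_target": ["B_cell", "Macrophage", "B_cell"],
})
selected = ["B_cell", "Macrophage"]

# True only when BOTH ends of the interaction are selected cell types
both_selected = (
    toy_ccc["celltype_source"].isin(selected)
    & toy_ccc["celltype_target"].isin(selected)
)
subset = toy_ccc.loc[both_selected]
print(subset)
```

The first edge is dropped because its sender (`cDC2`) is not among the selected cell types.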


Now we need to specify the kind of exploration we want: upstream or downstream of the gene set activation, and intracellular only or intercellular.

In [46]:
generic_multicell.lamb = recon.explore.set_lambda(
    generic_multicell,
    direction="upstream",
    strategy="intercell",
)

```{tip}
Alternatively, you can modify the lambda transition probabilities freely to modulate GRN and CCC exploration.
```

In [47]:
# layers = generic_multicell.lamb.index
# is_grn = layers.str.endswith("_grn")
# generic_multicell.lamb.loc[is_grn, is_grn] = generic_multicell.lamb.loc[is_grn, is_grn]
# generic_multicell.lamb.loc[is_grn, "cell_communication"] = generic_multicell.lamb.loc[is_grn, "cell_communication"]*1
generic_multicell.lamb

Unnamed: 0,cell_communication,B_cell_receptor,B_cell_grn,pDC_receptor,pDC_grn,Macrophage_receptor,Macrophage_grn,NK_cell_receptor,NK_cell_grn,T_cell_CD4_receptor,T_cell_CD4_grn,T_cell_CD8_receptor,T_cell_CD8_grn
cell_communication,0.142857,0.0,0.142857,0.0,0.142857,0.0,0.142857,0.0,0.142857,0.0,0.142857,0.0,0.142857
B_cell_receptor,0.5,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
B_cell_grn,0.0,0.5,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
pDC_receptor,0.5,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
pDC_grn,0.0,0.0,0.0,0.5,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Macrophage_receptor,0.5,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Macrophage_grn,0.0,0.0,0.0,0.0,0.0,0.5,0.5,0.0,0.0,0.0,0.0,0.0,0.0
NK_cell_receptor,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0
NK_cell_grn,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.5,0.0,0.0,0.0,0.0
T_cell_CD4_receptor,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0
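In the matrix printed above, each row sums to 1. If you edit individual entries by hand, one way to restore that property is to rescale every row afterwards. The sketch below does this on a hypothetical three-layer matrix with a single cell type. Whether ReCoN requires this normalization, or re-applies it internally, is an assumption here, so inspect `generic_multicell.lamb` after any manual change.

```python
import pandas as pd

# Hypothetical lambda matrix for one cell type (same layout as above)
layers = ["cell_communication", "Macrophage_receptor", "Macrophage_grn"]
lamb = pd.DataFrame(
    [[0.5, 0.0, 0.5],
     [0.5, 0.5, 0.0],
     [0.0, 0.5, 0.5]],
    index=layers, columns=layers,
)

# Favor staying inside the GRN over moving to the receptor layer
lamb.loc["Macrophage_grn", "Macrophage_grn"] *= 3

# Rescale each row so that it sums to 1 again
lamb = lamb.div(lamb.sum(axis=1), axis=0)
print(lamb)
```

After rescaling, the `Macrophage_grn` row holds 0.25 for the receptor layer and 0.75 for the GRN itself.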


### Running the analysis with random walk with restart

In [49]:
# Create multiXrank object
multilayer = generic_multicell.Multixrank(
    restart_proba=restart_proba
)

# Run random walk with restart
results = multilayer.random_walk_rank()

Seeds are provided as a dictionary with weights per seed.
Creating a multixrank object with seeds as a dictionary.
cell_communication
receptor
gene
receptor
gene
receptor
gene
receptor
gene
receptor
gene
receptor
gene
Identifying produced ligands in response to the perturbation.
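To give an intuition of what the scores mean, here is a minimal random walk with restart on a tiny single-layer graph. This is only a sketch of the general technique, not ReCoN's or MultiXrank's implementation: the multilayer version additionally moves between layers according to the lambda matrix. The graph and the seed are hypothetical; the restart probability matches `restart_proba` above.

```python
import numpy as np

# Toy undirected graph with 4 nodes (adjacency matrix)
adjacency = np.array([
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
], dtype=float)

# Column-normalize so each column holds transition probabilities
transition = adjacency / adjacency.sum(axis=0, keepdims=True)

restart = 0.6                          # same value as restart_proba above
p0 = np.array([1.0, 0.0, 0.0, 0.0])    # all restart mass on seed node 0

# Iterate p <- (1 - r) * W p + r * p0 until the scores stop changing
p = p0.copy()
for _ in range(1000):
    p_next = (1 - restart) * transition @ p + restart * p0
    converged = np.abs(p_next - p).sum() < 1e-12
    p = p_next
    if converged:
        break

print(p.round(4))
```

The seed keeps the highest score, and nodes further from the seed receive less. A higher restart probability keeps the walk closer to the seeds.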


In [50]:
results

|       | multiplex | node | layer | score |
|-------|-----------|------|-------|-------|
| 0 | cell_communication | Abca1-B_cell | cell_communication | 1.100054e-06 |
| 1 | cell_communication | Abca1-Macrophage | cell_communication | 1.607757e-05 |
| 2 | cell_communication | Abca1-NK_cell | cell_communication | 9.655072e-07 |
| 3 | cell_communication | Abca1-T_cell_CD4 | cell_communication | 7.274640e-07 |
| 4 | cell_communication | Abca1-T_cell_CD8 | cell_communication | 9.304301e-07 |
| ... | ... | ... | ... | ... |
| 10163 | T_cell_CD8_grn | Zxdc::T_cell_CD8 | gene | 4.349167e-11 |
| 10164 | T_cell_CD8_grn | Zyg11b::T_cell_CD8 | gene | 3.027501e-11 |
| 10165 | T_cell_CD8_grn | Zyx::T_cell_CD8 | gene | 7.329956e-11 |
| 10166 | T_cell_CD8_grn | Zzef1::T_cell_CD8 | gene | 2.635670e-10 |
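The raw result table can also be explored directly with pandas. The sketch below works on a hypothetical table with the same columns (`multiplex`, `node`, `layer`, `score`) and rounded scores: it keeps the gene layers, splits the `gene::celltype` node names, and sorts by score.

```python
import pandas as pd

# Hypothetical results with the same columns as the table above
toy_results = pd.DataFrame({
    "multiplex": ["cell_communication", "cell_communication",
                  "Macrophage_grn", "B_cell_grn"],
    "node": ["Abca1-Macrophage", "Abca1-B_cell",
             "Zyx::Macrophage", "Zyx::B_cell"],
    "layer": ["cell_communication", "cell_communication", "gene", "gene"],
    "score": [1.6e-05, 1.1e-06, 7.3e-11, 2.6e-10],
})

# Keep the gene layers only, then split "gene::celltype" node names
genes_only = toy_results[toy_results["layer"] == "gene"].copy()
genes_only[["gene", "celltype"]] = genes_only["node"].str.split(
    "::", n=1, expand=True
)

# Highest-scoring nodes first
ranked = genes_only.sort_values("score", ascending=False)
print(ranked[["gene", "celltype", "score"]])
```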


In [52]:
# Format results as gene profiles per cell type
cell_type_profiles = recon.explore.format_multicell_results(
    multicell_multixrank_results=results,
    celltypes=celltypes,
    keep_layers="gene"
)

cell_type_profiles.head()

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy


| gene | B_cell | Macrophage | NK_cell | T_cell_CD4 | T_cell_CD8 | pDC |
|------|--------|------------|---------|------------|------------|-----|
| A3galt2 | 2.018279e-12 | 2.740803e-08 | 1.228604e-11 | 5.891528e-12 | 6.195689e-13 | 8.664049e-13 |
| A4galt | 1.387521e-11 | 1.88082e-07 | 1.472483e-11 | 6.816724e-12 | 6.870771e-12 | 1.111332e-11 |
| Aa467197 | 9.843796e-12 | 1.202237e-06 | 4.097082e-11 | 2.940558e-11 | 2.343176e-11 | 1.974579e-11 |
| Aaas | 2.592065e-11 | 1.515622e-06 | 5.04568e-11 | 4.397842e-11 | 3.124886e-11 | 6.148052e-11 |
| Aacs | 9.098847e-11 | 9.323522e-06 | 2.129388e-10 | 1.335567e-10 | 1.056153e-10 | 1.56025e-10 |
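Once the results are in a genes-by-cell-types table, standard pandas operations are enough for a first look. The sketch below copies a few values from the table above into a small DataFrame and extracts the top gene per cell type, the top cell type per gene, and within-cell-type ranks.

```python
import pandas as pd

# A few rows and columns copied from the table above (genes x cell types)
profiles = pd.DataFrame(
    {
        "B_cell": [2.018279e-12, 1.387521e-11, 9.098847e-11],
        "Macrophage": [2.740803e-08, 1.88082e-07, 9.323522e-06],
        "NK_cell": [1.228604e-11, 1.472483e-11, 2.129388e-10],
    },
    index=["A3galt2", "A4galt", "Aacs"],
)

# Highest-scoring gene in each cell type
top_gene = profiles.idxmax()

# Cell type in which each gene scores highest
top_celltype = profiles.idxmax(axis=1)

# Rank genes within each cell type (1 = highest score)
ranks = profiles.rank(ascending=False).astype(int)

print(top_gene)
print(top_celltype)
print(ranks)
```

In these rows the scores are several orders of magnitude higher in macrophages, where the seeds were placed, so comparing ranks within a cell type is likely more informative than comparing raw scores across cell types.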
