# grn - Inferring GRN through ReCoN - HuMMuS

⚠️ Checking that bedtools is available

In [1]:
!bedtools

bedtools is a powerful toolset for genome arithmetic.

Version:   v2.31.1
About:     developed in the quinlanlab.org and by many contributors worldwide.
Docs:      http://bedtools.readthedocs.io/
Code:      https://github.com/arq5x/bedtools2
Mail:      https://groups.google.com/forum/#!forum/bedtools-discuss

Usage:     bedtools <subcommand> [options]

The bedtools sub-commands include:

[ Genome arithmetic ]
    intersect     Find overlapping intervals in various ways.
    window        Find overlapping intervals within a window around an interval.
    closest       Find the closest, potentially non-overlapping interval.
    coverage      Compute the coverage over defined intervals.
    map           Apply a function to a column for each overlapping interval.
    genomecov     Compute the coverage over an entire genome.
    merge         Combine overlapping/nearby intervals into a single interval.
    cluster       Cluster (but don't merge) overlapping/nearby intervals.
    complement
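The same availability check can be done from Python, which is convenient when the notebook runs non-interactively. The following is a minimal sketch, assuming only the standard library; the helper name `bedtools_version` is hypothetical and not part of ReCoN or HuMMuS.

```python
import shutil
import subprocess


def bedtools_version(executable="bedtools"):
    """Return the version string reported by bedtools, or None if it is not on PATH."""
    path = shutil.which(executable)
    if path is None:
        return None
    # `bedtools --version` prints a single line such as "bedtools v2.31.1"
    result = subprocess.run(
        [path, "--version"], capture_output=True, text=True, check=False
    )
    return result.stdout.strip() or None


version = bedtools_version()
print(version if version else "bedtools not found on PATH")
```

Returning `None` instead of raising lets the caller decide whether a missing binary should stop the run.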

In [2]:
import hummuspy.loader
import circe as ci
import recon

In [3]:
import muon as mu
import scanpy as sc

In [4]:
# import data
mudata = mu.read_h5mu("circe_benchmark/data/datasets/pbmc10x/pbmc10x.h5mu")

In [5]:
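# Keep the first 1,000 genes and 5,000 peaks so the example runs quickly;
# .copy() turns the AnnData views into independent objects that can be modified later.
rna = mudata.mod['rna'][:, :1000].copy()
atac = mudata.mod['atac'][:, :5000].copy()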

In [6]:
tfs_list = hummuspy.loader.load_tfs("human_tfs_r_hummus")

In [7]:
rna_network = recon.infer_grn.compute_rna_network(rna, tf_names=tfs_list, n_cpu=15)

Calculating TF-to-gene importance


Running using 15 cores: 100%|██████████| 1000/1000 [00:05<00:00, 195.65it/s]


In [None]:
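# Use underscores as the only separators in peak names (chr_start_end)
atac.var_names = atac.var_names.str.replace("-", "_")
atac.var_names = atac.var_names.str.replace(":", "_")

# Annotate each peak with its genomic coordinates
atac = ci.add_region_infos(atac)

# Compute co-accessibility scores between peaks
ci.compute_atac_network(atac, njobs=20)

# Extract the peak-to-peak links, then rename the columns to source/target/weight
atac_network = ci.extract_atac_links(atac, key='atac_network')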
atac_network = atac_network.rename(columns={
    "Peak1": "source",
    "Peak2": "target",
    "score": "weight"
})

Calculating co-accessibility scores...
Concatenating results from all chromosomes...
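The two `str.replace` calls in the cell above normalise peak names so that underscores are the only separators. Here is a small sketch of the same idea on toy names, assuming the input uses the `chr:start-end` form (the source does not show the original names). `parse_peak` is a hypothetical helper for reading the coordinates back.

```python
import pandas as pd

# Toy peak names in the assumed chr:start-end form
peaks = pd.Index(["chr1:1000608-1001108", "chr1:999910-1000410"])

# One regex replaces both separators at once
normalized = peaks.str.replace(r"[:-]", "_", regex=True)
print(list(normalized))


def parse_peak(name):
    """Split 'chr_start_end' into (chrom, start, end); hypothetical helper."""
    # rsplit keeps contig names that themselves contain underscores intact
    chrom, start, end = name.rsplit("_", 2)
    return chrom, int(start), int(end)


print(parse_peak(normalized[0]))
```

Splitting from the right means only the last two underscores are treated as coordinate separators.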


In [None]:
tf_network = recon.infer_grn.compute_tf_network(rna, tfs_list=tfs_list)

In [24]:
tf_network.head()

| | source | target |
|---|---|---|
| 0 | TFAP2E_TF | fake_TF |
| 1 | HES2_TF | fake_TF |
| 2 | ID3_TF | fake_TF |
| 3 | TAL1_TF | fake_TF |
| 4 | HES4_TF | fake_TF |
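In the preview above, every TF points to a single placeholder node, `fake_TF`. As an illustration of that table shape only, the sketch below builds one by hand. It assumes one row per TF and a `_TF` suffix on each name, as the output suggests; `placeholder_tf_network` is a hypothetical helper and is not how `recon.infer_grn.compute_tf_network` is necessarily implemented.

```python
import pandas as pd


def placeholder_tf_network(tf_names, placeholder="fake_TF", suffix="_TF"):
    """Hypothetical helper: link every TF to one shared placeholder node."""
    return pd.DataFrame({
        "source": [f"{tf}{suffix}" for tf in tf_names],
        "target": placeholder,  # the scalar is broadcast to every row
    })


example = placeholder_tf_network(["TFAP2E", "HES2", "ID3"])
print(example)
```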


In [None]:
atac_rna_links = recon.infer_grn.compute_atac_to_rna_links(atac, rna, ref_genome="hg19")

In [30]:
tf_atac_links = recon.infer_grn.compute_tf_to_atac_links(atac, ref_genome="hg19", n_cpus=30)

Peaks before filtering:  5000
Peaks with invalid chr_name:  0
Peaks with invalid length:  0
Peaks after filtering:  5000
No motif data entered. Loading default motifs for your species ...
 Default motif for vertebrate: gimme.vertebrate.v5.0. 
 For more information, please see https://gimmemotifs.readthedocs.io/en/master/overview.html 

Initiating scanner... 



2025-09-24 10:46:00,616 - DEBUG - using background: genome hg19 with size 200


Calculating FPR-based threshold. This step may take substantial time when you load a new ref-genome. It will be done quicker on the second time. 

Motif scan started .. It may take long time.



Scanning:   0%|          | 0/5000 [00:00<?, ? sequences/s]

Filtering finished: 702652 -> 156632
1. Converting scanned results into one-hot encoded dataframe.


  0%|          | 0/4962 [00:00<?, ?it/s]

2. Converting results into dictionaries.


  0%|          | 0/1 [00:00<?, ?it/s]

  0%|          | 0/1093 [00:00<?, ?it/s]

In [33]:
tf_atac_links = tf_atac_links[tf_atac_links['source'].str.replace("_TF", "").isin(tfs_list)]

In [34]:
tf_atac_links

| | source | target |
|---|---|---|
| 0 | NEUROD1_TF | chr1_1000608_1001108 |
| 2 | BHLHA15_TF | chr1_1000608_1001108 |
| 8 | ATOH1_TF | chr1_1000608_1001108 |
| 9 | OLIG2_TF | chr1_1000608_1001108 |
| 11 | PTF1A_TF | chr1_1000608_1001108 |
| ... | ... | ... |
| 2999838 | SP7_TF | chr1_999910_1000410 |
| 2999842 | SP5_TF | chr1_999910_1000410 |
| 2999843 | SP8_TF | chr1_999910_1000410 |
| 2999848 | SP6_TF | chr1_999910_1000410 |
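The filter above strips `_TF` with `str.replace`, which removes the substring wherever it occurs in a name. A slightly stricter variant, sketched below on toy data, strips it only at the end of the name. It assumes pandas 1.4 or later for `str.removesuffix`. The `UNLISTED_TF` row and the short `known_tfs` list are made up for the example.

```python
import pandas as pd

# Toy TF-to-peak links; the last source is a made-up name absent from the TF list
links = pd.DataFrame({
    "source": ["NEUROD1_TF", "SP7_TF", "UNLISTED_TF"],
    "target": [
        "chr1_1000608_1001108",
        "chr1_999910_1000410",
        "chr1_999910_1000410",
    ],
})
known_tfs = ["NEUROD1", "SP7"]

# Remove the suffix only when it is at the end of the name
tf_names = links["source"].str.removesuffix("_TF")
filtered = links[tf_names.isin(known_tfs)]
print(filtered)
```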


In [47]:
grn = recon.infer_grn.generate_grn(
    rna_network=rna_network,
    atac_network=atac_network,
    tf_network=tf_network,
    atac_to_rna_links=atac_rna_links,
    tf_to_atac_links=tf_atac_links,
    n_jobs=20
)
grn = grn[grn.layer == 'RNA']

TF_0
ATAC_0
RNA_0
Initializing Dask cluster with 20 workers...
http://192.168.152.117:8787/status


Running RWR per seed: 100%|██████████| 29/29 [00:02<00:00, 13.07it/s]


Building Dask DataFrame graph with delayed parallel tasks...


In [48]:
grn

| | index | layer | target | path_layer | score | seed |
|---|---|---|---|---|---|---|
| 3639 | 883 | RNA | TMEM35B | RNA_0 | 1.130846e-04 | PAX7_TF |
| 3645 | 440 | RNA | LINC01355 | RNA_0 | 1.130419e-04 | PAX7_TF |
| 996 | 658 | RNA | PLEKHG5 | RNA_0 | 1.030718e-04 | JUN_TF |
| 3657 | 139 | RNA | CELA2B | RNA_0 | 7.939168e-05 | ZNF684_TF |
| 3664 | 6 | RNA | ACOT7 | RNA_0 | 7.936830e-05 | ZNF684_TF |
| ... | ... | ... | ... | ... | ... | ... |
| 3317 | 856 | RNA | TAS1R1 | RNA_0 | 6.172440e-08 | PAX7_TF |
| 3431 | 596 | RNA | NOL9 | RNA_0 | 5.801554e-08 | FOXO6_TF |
| 3432 | 856 | RNA | TAS1R1 | RNA_0 | 5.801554e-08 | FOXO6_TF |
| 3439 | 596 | RNA | NOL9 | RNA_0 | 5.736415e-08 | ZNF684_TF |


In [49]:
grn.to_csv("recon_hummus_grn.csv", index=False)
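Once the GRN is saved, a common next step is to look at the highest-scoring targets of each TF. The sketch below does this on a toy table that reuses a few rows from the preview above. It assumes only the `target`, `score`, and `seed` columns shown there; `top_targets` is a hypothetical helper, not part of ReCoN.

```python
import pandas as pd

# A few rows copied from the GRN preview (target, score, and seed columns only)
grn_preview = pd.DataFrame({
    "target": ["TMEM35B", "LINC01355", "PLEKHG5", "CELA2B", "ACOT7", "TAS1R1"],
    "score": [1.130846e-04, 1.130419e-04, 1.030718e-04,
              7.939168e-05, 7.936830e-05, 6.172440e-08],
    "seed": ["PAX7_TF", "PAX7_TF", "JUN_TF", "ZNF684_TF", "ZNF684_TF", "PAX7_TF"],
})


def top_targets(grn, n=1):
    """Return the n highest-scoring targets for each seed TF (hypothetical helper)."""
    ranked = grn.sort_values("score", ascending=False)
    return ranked.groupby("seed", sort=False).head(n).reset_index(drop=True)


top = top_targets(grn_preview, n=1)
print(top)
```

The same function could be applied to the full table after reloading it with `pd.read_csv("recon_hummus_grn.csv")`.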

This workflow relies on **ReCoN**, which wraps several packages, all cited in the Methods section:

- The **RNA layer** uses [**arboreto¹**](#arboreto)  
- The **ATAC layer** uses [**CIRCE²**](#circe)  
- The **TF-ATAC** and **ATAC-RNA bipartites** use [**CellOracle³**](#celloracle)  
- The exploration of the **multilayer** uses [**HuMMuS⁴**](#hummus) and [**MultiXrank⁵**](#multixrank)  
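
The multilayer exploration is based on random walk with restart (RWR), as the log line `Running RWR per seed` suggests. To give an intuition for what an RWR score means, here is a toy single-layer version in NumPy. It is a simplified illustration, not the MultiXrank or HuMMuS implementation. The three-node chain, the `restart=0.7` value, and the function name are assumptions chosen for the example.

```python
import numpy as np


def random_walk_with_restart(adjacency, seed_index, restart=0.7,
                             tol=1e-10, max_iter=1000):
    """Toy RWR on one graph: iterate p <- (1 - r) * W @ p + r * p0 until stable."""
    adjacency = np.asarray(adjacency, dtype=float)
    col_sums = adjacency.sum(axis=0)
    col_sums[col_sums == 0] = 1.0          # avoid dividing by zero for isolated nodes
    transition = adjacency / col_sums      # column-normalised transition matrix

    p0 = np.zeros(adjacency.shape[0])
    p0[seed_index] = 1.0                   # the walker starts at (and restarts from) the seed
    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1 - restart) * transition @ p + restart * p0
        if np.abs(p_next - p).sum() < tol:
            return p_next
        p = p_next
    return p


# Toy undirected chain: node 0 (a TF) - node 1 (a peak) - node 2 (a gene)
adjacency = [[0, 1, 0],
             [1, 0, 1],
             [0, 1, 0]]
scores = random_walk_with_restart(adjacency, seed_index=0)
print(scores)
```

Nodes closer to the seed receive higher scores, which is why the per-seed scores in the GRN table can be read as a proximity ranking of genes for each TF.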

---

### References
<a id="arboreto"></a>
[1] Moerman, T., Aibar, S., Bravo González-Blas, C., Simm, J., Moreau, Y., Aerts, J., & Aerts, S. (2019).  
**GRNBoost2 and Arboreto: efficient and scalable inference of gene regulatory networks.**  
*Bioinformatics*, 35(12), 2159–2161. doi:[10.1093/bioinformatics/bty916](https://doi.org/10.1093/bioinformatics/bty916)

<a id="circe"></a>
[2] Trimbour, R., Saez Rodriguez, J., & Cantini, L. (2025).  
**Circe: Co-accessibility network from ATAC-seq data in Python (software).**  
Version 0.3.6. [GitHub](https://github.com/cantinilab/Circe)

<a id="celloracle"></a>
[3] Kamimoto, K., Stringa, B., Hoffmann, C. M., Jindal, K., Solnica-Krezel, L., Morris, S. A., et al. (2023).  
**Dissecting cell identity via network inference and in silico gene perturbation.**  
*Nature*, 614, 742–751. doi:[10.1038/s41586-022-05688-9](https://doi.org/10.1038/s41586-022-05688-9)

<a id="hummus"></a>
[4] Trimbour, R., Deutschmann, I. M., & Cantini, L. (2024).  
**Molecular mechanisms reconstruction from single-cell multi-omics data with HuMMuS.**  
*Bioinformatics*, 40(5), btae143. doi:[10.1093/bioinformatics/btae143](https://doi.org/10.1093/bioinformatics/btae143)

<a id="multixrank"></a>
[5] Baptista, A., González, A., & Baudot, A. (2022).  
**Universal multilayer network exploration by random walk with restart.**  
*Communications Physics*, 5, 170. doi:[10.1038/s42005-022-00937-9](https://doi.org/10.1038/s42005-022-00937-9)
