# SCING Network Inference
This tutorial builds a SCING network on a single cell type. SCING works in three steps: it aggregates cells into supercells with averaged expression, infers intermediate gene regulatory networks (GRNs) by bootstrapping and gradient boosting, and merges the intermediate GRNs, keeping only edges that are consistent and robust across them.

For workflows that build networks on multiple cell types in parallel, please refer to the <a href="#">tutorial for multiple cell types</a>, which packages this notebook into scripts that can be run on the command line or on an SGE high-performance computing cluster.

In [1]:
import os
import warnings
import scanpy as sc
from scing import supercells, build, merge
import numpy as np

In [2]:
# Set number of threads to use
nthreads = 12
os.environ["MKL_NUM_THREADS"] = str(nthreads)
os.environ["NUMEXPR_NUM_THREADS"] = str(nthreads)
os.environ["OMP_NUM_THREADS"] = str(nthreads)
# filter warnings
warnings.filterwarnings("ignore")

In [3]:
adata = sc.read_h5ad('../data/microglia.h5ad')

In [4]:
adata

AnnData object with n_obs × n_vars = 4126 × 10159
    obs: 'Barcode', 'SampleID', 'Diagnosis', 'Batch', 'Cell.Type', 'cluster', 'Age', 'Sex', 'PMI', 'Tangle.Stage', 'Plaque.Stage', 'RIN'
    var: 'gene_ids', 'feature_types', 'genome'

## Supercells
This step performs scanpy preprocessing and Leiden clustering on each individual cell type. Gene expression is averaged across the cells within each subcluster to mitigate sparsity.
- `ngenes`: number of highly variable genes to use to determine subclusters
- `npcs`: number of PCs for preprocessing
- `ncell`: number of supercells to generate

Note: In this tutorial, we only have microglia, so we can run the supercell pipeline on the entire file. For datasets with multiple cell types, run the supercell pipeline separately on the adata subset for each cell type.
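
The averaging idea can be illustrated with a small NumPy sketch. This is not SCING's implementation: `average_by_cluster` is a hypothetical helper, and the toy matrix and labels stand in for the expression matrix and the Leiden subcluster assignments.

```python
import numpy as np

def average_by_cluster(expr, labels):
    """Average expression (cells x genes) over the cells sharing each label.

    Hypothetical helper for illustration; not part of the SCING API.
    """
    expr = np.asarray(expr, dtype=float)
    labels = np.asarray(labels)
    clusters = np.unique(labels)  # sorted unique cluster ids
    # One row per cluster: the mean expression profile of its member cells
    merged = np.vstack([expr[labels == c].mean(axis=0) for c in clusters])
    return clusters, merged

# Toy data: 4 cells x 3 genes with sparse counts, split into two subclusters
expr = np.array([[0, 2, 0],
                 [4, 0, 0],
                 [0, 0, 6],
                 [0, 0, 2]])
labels = np.array([0, 0, 1, 1])

clusters, supercell_expr = average_by_cluster(expr, labels)
print(supercell_expr)
```

Each supercell row has fewer zeros than the individual cells it was built from, which is the effect the averaging step is after.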

In [5]:
adata_merged = supercells.supercell_pipeline(adata,
                                ngenes=2000,
                                npcs=20,
                                ncell=500,
                                verbose=True)

preprocessing data...
finding optimal resolution...
There are  496  supercells
merging cells...


We will subset our supercells to 2000 random genes for the sake of computation in this tutorial. On your own dataset, keep all genes.

In [6]:
# limit genes to 2000 random genes
# this is to speed up computation for the example
np.random.seed(0)
adata_merged = adata_merged[:,np.random.choice(np.arange(adata_merged.shape[1]),
                                       2000,
                                       replace=False)]

In [7]:
adata_merged

View of AnnData object with n_obs × n_vars = 496 × 2000
    var: 'gene_ids', 'feature_types', 'genome', 'highly_variable', 'means', 'dispersions', 'dispersions_norm'

## Build Intermediate Networks
This step builds intermediate networks through bootstrap subsampling and gradient boosting. The `grnBuilder` class takes the following arguments:
- `adata`: supercell adata object
- `ngenes`: number of highly variable genes to build the network on. -1 means use all genes.
- `nneighbors`: number of nearest-neighbor (KNN) genes to use as predictor features for each target gene. These feature genes are used to train a gradient boosting regressor that predicts the target gene's expression.
- `npcs`: number of PCs to use for the KNN calculation
- `subsample_perc`: fraction of supercells to subsample for each bootstrap run (e.g. 0.7 keeps 70% of the supercells).
- `prefix`: output prefix
- `outdir`: output directory (the networks will be saved as `{outdir}/{prefix}.csv.gz`)
- `ncore`: number of cores for multiprocessing
- `mem_per_core`: memory per core
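
As a rough illustration of what `subsample_perc`, `npcs`, and `nneighbors` control, the sketch below subsamples rows and then picks candidate predictor genes by distance in a low-dimensional gene embedding. It is a minimal sketch under stated assumptions, not the `grnBuilder` internals: `subsample_rows` and `knn_predictors` are hypothetical helpers, sampling without replacement is assumed, and a plain SVD stands in for whatever PCA procedure the library uses.

```python
import numpy as np

def subsample_rows(n_rows, fraction, rng):
    """Pick a random subset of row indices without replacement (assumed scheme)."""
    n_keep = int(round(fraction * n_rows))
    return np.sort(rng.choice(n_rows, size=n_keep, replace=False))

def knn_predictors(expr, target, k, n_pcs):
    """Return indices of the k genes closest to `target` in a gene embedding.

    expr is supercells x genes. Hypothetical helper; not the SCING API.
    """
    # Treat genes as observations and center each gene's profile
    genes = expr.T - expr.T.mean(axis=1, keepdims=True)
    # Project genes onto the top singular vectors (a PCA-like embedding)
    u, s, _ = np.linalg.svd(genes, full_matrices=False)
    coords = u[:, :n_pcs] * s[:n_pcs]
    dist = np.linalg.norm(coords - coords[target], axis=1)
    dist[target] = np.inf  # a gene is never its own predictor
    return np.argsort(dist)[:k]

rng = np.random.default_rng(0)
expr = rng.poisson(1.0, size=(50, 30)).astype(float)  # toy supercells x genes

rows = subsample_rows(expr.shape[0], fraction=0.7, rng=rng)
neighbors = knn_predictors(expr[rows], target=0, k=5, n_pcs=10)
print(len(rows), neighbors)
```

A gradient boosting regressor would then be fit on the selected predictor genes to predict the target gene; that step is omitted here.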

In [8]:
save = True
all_edges = [] # list for intermediate networks
for i in range(10):
    print(i)
    # adata_saved = adata_merged.copy()
    
    grn = build.grnBuilder(adata=adata_merged, 
                           ngenes=-1, 
                           nneighbors=100,
                           npcs=10,
                           subsample_perc=0.7,
                           prefix=f"net.{i}",
                           outdir='../temp_data/intermediate_networks/',
                           ncore=12,
                           mem_per_core=int(2e9),
                           verbose=True)
    grn.subsample_cells()

    grn.filter_genes()
    grn.filter_gene_connectivities()
    grn.build_grn()
    if save:
        grn.save_edges()
    # append intermediate network
    all_edges.append(grn.edges)

0


building local cluster


Numba: Attempted to fork from a non-main thread, the TBB library may be in an invalid state in the child process.

Loading data into memory...
Building dask graph...
Computing dask graph...
Closing client...
Saving file in ../temp_data/intermediate_networks/net.0.csv.gz
1
building local cluster


Numba: Attempted to fork from a non-main thread, the TBB library may be in an invalid state in the child process.

Loading data into memory...
Building dask graph...
Computing dask graph...
Closing client...
Saving file in ../temp_data/intermediate_networks/net.1.csv.gz
2
building local cluster


Numba: Attempted to fork from a non-main thread, the TBB library may be in an invalid state in the child process.

Loading data into memory...
Building dask graph...
Computing dask graph...
Closing client...
Saving file in ../temp_data/intermediate_networks/net.2.csv.gz
3
building local cluster


Numba: Attempted to fork from a non-main thread, the TBB library may be in an invalid state in the child process.

Loading data into memory...
Building dask graph...
Computing dask graph...
Closing client...
Saving file in ../temp_data/intermediate_networks/net.3.csv.gz
4
building local cluster


Numba: Attempted to fork from a non-main thread, the TBB library may be in an invalid state in the child process.

Loading data into memory...
Building dask graph...
Computing dask graph...
Closing client...
Saving file in ../temp_data/intermediate_networks/net.4.csv.gz
5
building local cluster


Numba: Attempted to fork from a non-main thread, the TBB library may be in an invalid state in the child process.

Loading data into memory...
Building dask graph...
Computing dask graph...
Closing client...
Saving file in ../temp_data/intermediate_networks/net.5.csv.gz
6
building local cluster


Numba: Attempted to fork from a non-main thread, the TBB library may be in an invalid state in the child process.

Loading data into memory...
Building dask graph...
Computing dask graph...
Closing client...
Saving file in ../temp_data/intermediate_networks/net.6.csv.gz
7
building local cluster


Numba: Attempted to fork from a non-main thread, the TBB library may be in an invalid state in the child process.

Loading data into memory...
Building dask graph...
Computing dask graph...
Closing client...
Saving file in ../temp_data/intermediate_networks/net.7.csv.gz
8
building local cluster


Numba: Attempted to fork from a non-main thread, the TBB library may be in an invalid state in the child process.

Loading data into memory...
Building dask graph...
Computing dask graph...
Closing client...
Saving file in ../temp_data/intermediate_networks/net.8.csv.gz
9
building local cluster


Numba: Attempted to fork from a non-main thread, the TBB library may be in an invalid state in the child process.

Loading data into memory...
Building dask graph...
Computing dask graph...
Closing client...
Saving file in ../temp_data/intermediate_networks/net.9.csv.gz


## Merge Intermediate Networks
This step combines the bootstrapped networks and filters out inconsistent and confounding edges. The `NetworkMerger` class takes the following arguments:
- `adata`: supercells adata object
- `networks`: list of intermediate networks, each a `pd.DataFrame`
- `minimum_edge_appearance_threshold`: consistency threshold for an edge to be kept in the final network (0.2 means an edge must appear in at least 20% of the intermediate networks)
- `prefix`: output prefix
- `outdir`: output directory (the final network will be saved as `{outdir}/{prefix}.network.merged.csv`)
- `ncore`: number of cores for multiprocessing
- `mem_per_core`: memory per core
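
The consistency threshold can be pictured with a few lines of standard-library Python. This is only a sketch of the idea, not the `NetworkMerger` code: `consistent_edges` is a hypothetical helper, and edges are represented as plain `(source, target)` tuples rather than data frames.

```python
from collections import Counter

def consistent_edges(networks, threshold):
    """Keep edges found in at least `threshold` (a fraction) of the networks.

    Hypothetical helper for illustration; not part of the SCING API.
    """
    counts = Counter()
    for edges in networks:
        counts.update(set(edges))  # count each edge at most once per network
    n_networks = len(networks)
    return {edge for edge, n in counts.items() if n / n_networks >= threshold}

# Toy intermediate networks as lists of (source, target) edges
networks = [
    [("A", "B"), ("B", "C")],
    [("A", "B")],
    [("A", "B"), ("C", "D")],
    [("A", "B"), ("B", "C")],
    [("A", "B")],
]

kept = consistent_edges(networks, threshold=0.3)
print(sorted(kept))
```

Here `("C", "D")` appears in only one of five networks (20%) and is dropped, while the other two edges clear the 30% threshold.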

In [9]:
merger = merge.NetworkMerger(adata=adata_merged,
                             networks=all_edges,
                             minimum_edge_appearance_threshold=0.2,
                             prefix='final',
                             outdir='../temp_data/final_network',
                             ncore=12,
                             mem_per_core=int(2e9),
                             verbose=True)

In [10]:
merger.preprocess_network_files()
merger.remove_reversed_edges()
merger.remove_cycles()
merger.get_triads()
merger.remove_redundant_edges()

Preprocessing in network files...
Summarizing networks...
Removing bidirectional edges...
Removing cycles...
Getting triads to remove redundant edges...
Removing redundant edges...


Numba: Attempted to fork from a non-main thread, the TBB library may be in an invalid state in the child process.

Creating client...
Loading data...
Building dask graph...
Computing dask graph...
Removing edges...
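
To make the "Removing bidirectional edges..." step more concrete, the sketch below resolves each pair of opposite edges by keeping the direction with the higher importance. That rule is an assumption made for illustration, as is the hypothetical helper `drop_weaker_reversed`; the source does not state how `remove_reversed_edges` chooses a direction.

```python
def drop_weaker_reversed(edges):
    """Resolve A->B / B->A pairs by keeping the higher-importance direction.

    edges maps (source, target) -> importance. The keep-the-stronger rule is
    an assumption for illustration, not SCING's documented behavior.
    """
    kept = {}
    for (src, tgt), imp in edges.items():
        rev = edges.get((tgt, src))
        if rev is None or imp > rev:
            kept[(src, tgt)] = imp
        elif imp == rev and (src, tgt) < (tgt, src):
            # Tie: keep one direction deterministically
            kept[(src, tgt)] = imp
    return kept

edges = {
    ("A", "B"): 5.0,
    ("B", "A"): 2.0,  # weaker reverse of A->B, dropped
    ("B", "C"): 1.0,
}
directed = drop_weaker_reversed(edges)
print(directed)
```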


In [11]:
merger.save_network()

Saving data to ../temp_data/final_network/


In [12]:
merger.edge_df.sort_values(by='importance',
                          ascending=False)

| | source | target | importance |
|---|---|---|---|
| 10 | AC091551.1 | MAGEF1 | 266.259045 |
| 1511 | STAC3 | MRPS30 | 243.979339 |
| 800 | TSPAN31 | ADCY10 | 215.426775 |
| 1681 | AC073861.1 | RPL32 | 207.363622 |
| 444 | RPL32 | RPL23 | 204.892652 |
| ... | ... | ... | ... |
| 10293 | RASL10A | ADRB2 | 11.377648 |
| 743 | TTPAL | CANT1 | 11.372547 |
| 9273 | PTPRJ | AKAP13 | 11.363624 |
| 10532 | ZRANB1 | FAM111A | 11.272955 |
