# PySCNet: A tool for reconstructing and analyzing gene regulatory networks from single-cell RNA-Seq data

PySCNet includes four modules:

1) **Pre-processing**: initialize a gnetData object consisting of Expression Matrix, Cell Attributes, Gene Attributes and Network Attributes; <br/>
2) **BuildNet**: reconstruct GRNs by various methods implemented in docker;<br/>
3) **NetEnrich**: network analysis including consensus network detection, gene module identification and trigger path prediction as well as network fusion;<br/>
4) **Visualization**: network illustration.<br/>

[STREAM](https://github.com/pinellolab/STREAM) is a Python package designed for reconstructing cell trajectories from single-cell transcriptomic data. This tutorial shows how to integrate STREAM with pyscnet to reconstruct gene regulatory networks along the cell differentiation trajectory.


In [165]:
from __future__ import absolute_import
import warnings
warnings.filterwarnings("ignore")

import sys
import os
import itertools
import scanpy as sc

sc.settings.verbosity = 3             # verbosity: errors (0), warnings (1), info (2), hints (3)
sc.logging.print_versions()
sc.settings.set_figure_params(dpi=60, facecolor='white')

from pyvis.network import Network
import pandas as pd
import copy
import numpy as np
from pyscnet.Preprocessing import gnetdata
from pyscnet.BuildNet import gne_dockercaller as gdocker
from pyscnet.NetEnrich import graph_toolkit as gt
from pyscnet.Plotting import show_net as sn

scanpy==1.5.1 anndata==0.7.4 umap==0.4.6 numpy==1.19.1 scipy==1.5.1 pandas==1.0.5 scikit-learn==0.23.1 statsmodels==0.11.1


### Data resource
Data was obtained from [Nestorowa, S. et al](https://doi.org/10.1182/blood-2016-05-716480). The cell trajectory was built following the [STREAM Tutorial](https://nbviewer.jupyter.org/github/pinellolab/STREAM/blob/master/tutorial/1.1.STREAM_scRNA-seq%20%28Bifurcation%29.ipynb?flush_cache=true). Because STREAM is also built on the AnnData structure, its output can be imported directly into pyscnet.

In [2]:
# The pickle file is too large, so the pickle loader below is left commented out
# import _pickle as pk
# with open('data/stream_adata.pk', 'rb') as input:
#     adata = pk.load(input)

import stream as st
adata = st.read('data/stream_adata.pklz')




Working directory is already specified as './result' 
To change working directory, please run set_workdir(adata,workdir=new_directory)


In [3]:
# Inspect the loaded AnnData object
adata

AnnData object with n_obs × n_vars = 1656 × 35077
    obs: 'label', 'label_color', 'n_counts', 'n_genes', 'pct_genes', 'pct_mt', 'kmeans', 'node', 'branch_id', 'branch_id_alias', 'branch_lam', 'branch_dist', 'S0_pseudotime', 'S3_pseudotime', 'S1_pseudotime', 'S2_pseudotime'
    var: 'n_counts', 'n_cells', 'pct_cells'
    uns: 'workdir', 'label_color', 'assay', 'var_genes', 'trans_se', 'params', 'epg', 'flat_tree', 'seed_epg', 'seed_flat_tree', 'ori_epg', 'epg_obj', 'ori_epg_obj', 'branch_id_alias_color', 'stream_S3', 'scaled_marker_expr', 'leaf_markers_all', 'leaf_markers', 'transition_markers', 'de_markers_greater', 'de_markers_less', 'markers_label_all', 'markers_label', 'pca', 'neighbors', 'umap', 'label_colors'
    obsm: 'var_genes', 'X_se', 'X_dr', 'X_spring', 'X_stream_S3', 'X_pca', 'X_umap'
    varm: 'PCs'
    obsp: 'distances', 'connectivities'

In [6]:
stream_gne = gnetdata.load_from_scanpy(adata)

In [7]:
# adata.uns['leaf_markers'][('S1', 'S2')].index
cell_info = stream_gne.CellAttrs['CellInfo']
cell = list(cell_info.loc[cell_info['branch_id_alias'].isin(["('S2', 'S1')"])].index)
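The selection above can be tried out on a toy table. This is a minimal sketch that assumes, as the cell above suggests, that `branch_id_alias` holds branch labels as strings such as `"('S2', 'S1')"` rather than tuples. The cell names and values are made up for illustration and do not come from the dataset.

```python
import pandas as pd

# Toy stand-in for stream_gne.CellAttrs['CellInfo'] (hypothetical values).
cell_info = pd.DataFrame(
    {"branch_id_alias": ["('S2', 'S1')", "('S0', 'S1')", "('S2', 'S1')", "('S3', 'S1')"]},
    index=["cell_a", "cell_b", "cell_c", "cell_d"],
)

# The labels are strings here, so compare against the string form of the branch.
branch = "('S2', 'S1')"
selected_cells = cell_info.index[cell_info["branch_id_alias"] == branch].tolist()
print(selected_cells)
```

If the labels were stored as real tuples, the comparison value would need to be a tuple as well, so it is worth checking the column's contents before filtering.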

In [8]:
# Run each GRN inference method on the ('S2', 'S1') branch, using the
# S1/S2 leaf markers as features; each call adds <METHOD>_links to NetAttrs.
leaf_markers = adata.uns['leaf_markers'][('S1', 'S2')].index
for method in ['GENIE3', 'PIDC', 'CORR', 'GRNBOOST2']:
    stream_gne = gdocker.rundocker(stream_gne.deepcopy, method=method,
                                   feature=leaf_markers,
                                   cell_clusterid="('S2', 'S1')",
                                   select_by='branch_id_alias')

GENIE3_links added into NetAttrs
PIDC_links added into NetAttrs
CORR_links added into NetAttrs
GRNBOOST2_links added into NetAttrs
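To give a feel for what a link table might contain, here is a minimal sketch of a correlation-based edge list built with pandas alone. It is not pyscnet's `CORR` implementation. The synthetic expression matrix, the gene names, and the `source`/`target`/`weight` column names are all assumptions made for illustration.

```python
import itertools

import numpy as np
import pandas as pd

# Synthetic expression matrix: 50 cells x 4 genes (hypothetical data).
rng = np.random.default_rng(0)
expr = pd.DataFrame(rng.normal(size=(50, 4)), columns=["g1", "g2", "g3", "g4"])
# Make g2 track g1 closely so that one strong edge exists.
expr["g2"] = 0.9 * expr["g1"] + rng.normal(scale=0.1, size=50)

# Pairwise Pearson correlation between genes.
corr = expr.corr(method="pearson")

# One row per unordered gene pair, weighted by the correlation coefficient.
links = pd.DataFrame(
    [(a, b, corr.loc[a, b]) for a, b in itertools.combinations(corr.columns, 2)],
    columns=["source", "target", "weight"],
)

# Rank edges by absolute weight, strongest first.
order = links["weight"].abs().sort_values(ascending=False).index
links = links.loc[order].reset_index(drop=True)
print(links)
```

Correlation gives undirected, symmetric weights, whereas tree-based methods such as GENIE3 and GRNBoost2 score directed regulator-to-target links, so the tables from different methods are not directly comparable without some normalization.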


In [21]:
#find consensus links based on ensemble classification
stream_gne = gt.find_consensus_graph(stream_gne, method='ensemble', threshold=5)

#build graph for consensus links
stream_gne = gdocker.buildnet(stream_gne, key_links='consensus')

there are 111 consensus edges found!
graph added into NetAttrs
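The idea of a consensus network can be illustrated with simple majority voting: keep an edge if it ranks among the top edges of enough methods. This sketch deliberately swaps in plain voting and is not the ensemble classification that `find_consensus_graph` performs. The helper names `top_edges` and `consensus_edges`, the toy link tables, and the choice to treat edges as undirected are all assumptions.

```python
import pandas as pd


def top_edges(links, n_top):
    """Return the n_top strongest edges as a set of undirected gene pairs."""
    order = links["weight"].abs().sort_values(ascending=False).index
    ranked = links.loc[order].head(n_top)
    return {frozenset(pair) for pair in zip(ranked["source"], ranked["target"])}


def consensus_edges(link_tables, n_top, min_votes):
    """Keep edges that appear in the top n_top of at least min_votes methods."""
    votes = {}
    for links in link_tables.values():
        for edge in top_edges(links, n_top):
            votes[edge] = votes.get(edge, 0) + 1
    return {edge for edge, count in votes.items() if count >= min_votes}


# Toy link tables from three hypothetical methods.
tables = {
    "A": pd.DataFrame({"source": ["g1", "g1", "g2"],
                       "target": ["g2", "g3", "g3"],
                       "weight": [0.9, 0.2, 0.5]}),
    "B": pd.DataFrame({"source": ["g2", "g1", "g3"],
                       "target": ["g1", "g3", "g2"],
                       "weight": [0.8, 0.1, 0.7]}),
    "C": pd.DataFrame({"source": ["g1", "g2", "g1"],
                       "target": ["g2", "g3", "g3"],
                       "weight": [0.6, 0.1, 0.7]}),
}

result = consensus_edges(tables, n_top=2, min_votes=2)
print(sorted(sorted(edge) for edge in result))
```

Ranking within each method before voting sidesteps the fact that raw weights from different methods live on different scales.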


In [54]:
# stream_gne.NetAttrs['consensus']
