
# Gene Regulatory Network (GRN) inference using single-cell RNA (scRNA) data
 
Before proceeding, please follow the steps in the <a href="readme.html"> README </a> file. <br> 
After these steps, all required packages are installed at their specified versions in the conda environment named ATRACTionRHU4. <br>
To verify, make sure the kernel shown on the right is ATRACTionRHU4.
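
The kernel check can also be done from a code cell. The following is a minimal sketch, assuming that conda exposes the active environment name through the `CONDA_DEFAULT_ENV` variable and, failing that, that the last component of `sys.prefix` is the environment folder name. The helper `active_conda_env` is hypothetical and not part of this repository.

```python
import os
import sys

EXPECTED_ENV = "ATRACTionRHU4"  # environment name created by the README steps


def active_conda_env(environ=os.environ, prefix=sys.prefix):
    """Return the name of the active conda environment, or None if unknown."""
    # conda sets CONDA_DEFAULT_ENV on activation; a kernel started from a
    # kernelspec may not inherit it, so fall back to the interpreter prefix.
    name = environ.get("CONDA_DEFAULT_ENV")
    if name:
        return name
    base = os.path.basename(os.path.normpath(prefix))
    return base or None


env = active_conda_env()
print("Python executable:", sys.executable)
print("Detected environment:", env)
if env != EXPECTED_ENV:
    print("Warning: expected the kernel to run in", EXPECTED_ENV)
```

If the warning appears, switch the kernel before running the remaining cells.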
    
Now check that the Seurat objects (.rds) are in the <a href="./RDS_data/"> RDS_data </a> folder. <br> 
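
A quick way to confirm this from the notebook is to list the folder contents. This is a small sketch using only the standard library; `list_rds_files` is a hypothetical helper, and the folder name `RDS_data` is taken from the link above.

```python
from pathlib import Path


def list_rds_files(folder="RDS_data"):
    """Return the sorted names of .rds files in `folder` (empty list if the folder is missing)."""
    path = Path(folder)
    if not path.is_dir():
        return []
    # Compare the extension case-insensitively so that .RDS files are also listed.
    return sorted(p.name for p in path.iterdir() if p.is_file() and p.suffix.lower() == ".rds")


rds_files = list_rds_files("RDS_data")
print(rds_files if rds_files else "No .rds files found in RDS_data")
```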
  
### Import the packages and modules by running the following cell


In [1]:
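# Import packages
import pandas as pd
from ipywidgets import *  # provides `widgets`, used for the HBox layouts below
from codes import form_extract_expression_mat as F  # widget forms for choosing columns and values
from codes import pyscenic_input as I  # creates the pySCENIC input files
import os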

Colour codes used throughout this Jupyter notebook: <br>
* <span style="color:orange"> <b> Warning  </b> </span> <br>
* <span style="color:purple">  Example  </span>  

     
### Run the following script to extract the gene expression matrix and metadata table

Requirements: 

* R code:  path of the R script <span style="color:purple">  ( e.g. codes/Extract_expression_mats.R ) </span>
* RDS_file:  path of the RDS file  <span style="color:purple">   ( e.g. "RDS_data/00_final_IBD_exp3.rds" ) </span>
* Exp: the name/number of the experiment, additional info to create sub folders <span style="color:purple">  ( e.g. 3 )</span>

In [2]:
RDS_file = "RDS_data/00_final_IBD_exp3.rds"
Exp = 3
# Remove the leading "#" on the next line to run the extraction script
#!Rscript codes/Extract_expression_mats.R $RDS_file $Exp
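
The same call can be made without the `!` shell magic, for example when the steps are run as a plain Python script. This is a hedged sketch: `build_extract_command` and the `RUN_EXTRACTION` switch are hypothetical names, and the script path and argument order are copied from the cell above.

```python
import shutil
import subprocess


def build_extract_command(rds_file, exp, script="codes/Extract_expression_mats.R"):
    """Build the argument list equivalent to `Rscript <script> <rds_file> <exp>`."""
    # Arguments are passed as a list and no shell is involved,
    # so paths containing spaces need no extra quoting.
    return ["Rscript", script, str(rds_file), str(exp)]


cmd = build_extract_command("RDS_data/00_final_IBD_exp3.rds", 3)
print(" ".join(cmd))

RUN_EXTRACTION = False  # set to True to actually launch the R script
if RUN_EXTRACTION:
    if shutil.which("Rscript") is None:
        raise RuntimeError("Rscript was not found on PATH")
    # check=True raises CalledProcessError if the script exits with an error
    subprocess.run(cmd, check=True)
```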

<span style="color:orange"> <b> ** Copy and paste the paths of the metadata table and gene expression matrix files below ** </b></span>

In [3]:
metadata_file = "data/Exp3/00_final_IBD_exp3_metadata.csv.gz"
expdat_file = "data/Exp3/00_final_IBD_exp3_expdata.csv.gz"

In [4]:
# Read the two files listed above (gzip-compressed CSV, first column used as the index)
MetaData = pd.read_csv(metadata_file, compression='gzip', index_col=0)
ExpDat = pd.read_csv(expdat_file, compression='gzip', index_col=0)
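
Before moving on, it can help to confirm that the two tables describe the same cells. The sketch below assumes that the row labels of the metadata table are cell identifiers and that the expression matrix carries the same identifiers on one of its axes; which axis is not stated in this notebook, so the hypothetical helper `cell_axis` checks both. The toy tables are made up for illustration.

```python
import pandas as pd


def cell_axis(exp, meta):
    """Return (axis, n_shared) for the axis of `exp` that best matches the row labels of `meta`.

    `axis` is "index" or "columns"; ties (including no match at all) resolve to "index".
    """
    cells = set(meta.index)
    shared_rows = len(cells.intersection(exp.index))
    shared_cols = len(cells.intersection(exp.columns))
    if shared_rows >= shared_cols:
        return "index", shared_rows
    return "columns", shared_cols


# Toy data for illustration only (not taken from the real files)
meta = pd.DataFrame({"group": ["A", "B"]}, index=["cell1", "cell2"])
exp = pd.DataFrame([[1, 0], [3, 2]], index=["geneX", "geneY"], columns=["cell1", "cell2"])

axis, n_shared = cell_axis(exp, meta)
print(axis, n_shared, "of", len(meta), "cells matched")
```

With the real data, the call would be `cell_axis(ExpDat, MetaData)`; a shared count well below `len(MetaData)` suggests the two files do not belong together.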

In [5]:
[Sample_column, Groups_column, Celltype_column, button] = F.A(MetaData)
box = widgets.HBox([Sample_column, Groups_column, Celltype_column, button])
display(box)

HBox(children=(Dropdown(description='Sample column:', options=('orig.ident', 'nCount_RNA', 'nFeature_RNA', 'pe…

### Select the metadata column names from the dropdowns above 

* Sample column: select the metadata column that contains the sample names (patient IDs) <span style="color:purple"> (e.g. SampleID) </span>
* Groups column: select the metadata column that contains the sample type names <span style="color:purple"> (e.g. group) </span>
* Cell type column: select the metadata column that contains the cell annotations you want to use for GRN inference <span style="color:purple">(e.g. cellType_curated) </span>

Click the OK button to save your selection.

In [6]:
print("The column with Sample IDs is ---" + Sample_column.value + "---")
print("The column with group name is ---" + Groups_column.value + "---")
print("The column with cell types is ---" + Celltype_column.value + "---")

The column with Sample IDs is ---SampleID---
The column with group name is ---group---
The column with cell types is ---cellType_curated---


<span style="color:orange"> <b>** Hold the control/command key to select multiple values ** </b></span>

In [7]:
[Sample_IDs, Sample_type, Cell_types, Process_cells, button] = F.B(MetaData, Sample_column.value, Groups_column.value, Celltype_column.value)
box = widgets.HBox([Sample_IDs, Sample_type, Cell_types, Process_cells, button])
display(box)

HBox(children=(SelectMultiple(description='Sample IDs:', options=('All', 'C20_59_01_P022', 'C20_59_01_P026', '…

### Select the values of interest and choose how to process the cells using the widgets above 

<span style="color:orange"> <b> For Sample IDs, Sample type and Cell types, you can choose "All" to include all samples, all conditions and all cell types </b> </span>

* Sample IDs: select the sample(s) of interest (hold the control/command key to choose multiple options) <span style="color:purple"> (e.g. C20_59_01_P022) </span>
* Sample type: select the sample type(s) of interest (hold the control/command key to choose multiple options) <span style="color:purple"> (e.g. JIA) </span>
* Cell types: select the cell type(s) of interest (hold the control/command key to choose multiple options) <span style="color:purple"> (e.g. TCD4, TCD8, etc.) </span>
* Processing cells: 
    1. <span style="color:purple"> "combine" </span>, if you would like to merge all the selected samples and cell types and extract one expression matrix
    2. <span style="color:purple"> "keep_seperate" </span>, if you want to extract a separate gene expression matrix for each selected cell type
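
The effect of these choices can be sketched on a toy metadata table. This is an illustration of the selection logic described above, not the code in `codes/pyscenic_input.py`: the helpers `select_cells` and `split_cells`, the toy rows, and the "control" group label are all hypothetical.

```python
import pandas as pd


def select_cells(meta, column, chosen):
    """Boolean mask of rows whose `column` value is in `chosen`; "All" keeps every row."""
    if "All" in chosen:
        return pd.Series(True, index=meta.index)
    return meta[column].isin(chosen)


def split_cells(meta, sample_col, group_col, celltype_col,
                samples, groups, cell_types, mode):
    """Return {label: list of cell IDs} for the chosen samples, groups and cell types."""
    mask = (select_cells(meta, sample_col, samples)
            & select_cells(meta, group_col, groups)
            & select_cells(meta, celltype_col, cell_types))
    selected = meta[mask]
    if mode.lower() == "combine":
        # one pooled set of cells, leading to one expression matrix
        return {"combined": list(selected.index)}
    # otherwise: one set of cells per cell type present in the selection
    return {ct: list(sub.index) for ct, sub in selected.groupby(celltype_col)}


# Toy metadata for illustration only
meta = pd.DataFrame(
    {"SampleID": ["P1", "P1", "P2", "P2"],
     "group": ["JIA", "JIA", "control", "control"],
     "cellType_curated": ["NK", "TCD4", "NK", "TCD8"]},
    index=["c1", "c2", "c3", "c4"],
)

pooled = split_cells(meta, "SampleID", "group", "cellType_curated",
                     ["All"], ["All"], ["NK"], "combine")
per_type = split_cells(meta, "SampleID", "group", "cellType_curated",
                       ["All"], ["JIA"], ["All"], "keep_seperate")
print(pooled)
print(per_type)
```

In this sketch, "combine" pools every selected cell into a single set, while any other value yields one set per cell type.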


In [10]:
print("The selected sample names are ---" + str(list(Sample_IDs.value)) + "---")
print("The selected group names are ---" + str(list(Sample_type.value)) + "---")
print("The selected cell types are ---" + str(list(Cell_types.value)) + "---")
print("You chose to ---" + str(Process_cells.value) + "--- the cell types")

The selected sample names are ---['All']---
The selected group names are ---['All']---
The selected cell types are ---['NK']---
You chose to ---Combine--- the cell types


### Create input files for inferring the GRN using pySCENIC

In [11]:
I.create_scenic_input(Sample_column, Groups_column,Celltype_column, Sample_IDs, Sample_type, Cell_types, Process_cells, ExpDat, MetaData, Exp)

The input files for pySECNIC are successfully created and saved in folder: data/Exp3


<span style="color:orange"> <b> Copy and paste the folder path of input files below </b> </span> <br>
<span style="color:purple"> e.g. "data/Exp1" </span>

In [12]:
FolderPath = "data/Exp3"

In [13]:
[file,button] = F.C(FolderPath)
box = widgets.HBox([file,button])
display(box)

HBox(children=(Dropdown(description='Choose file: ', options=('ExpDat_e3_gAll_sAll_cNK.csv.gz', 'ExpDat_e3_gIB…

<span style="color:orange"> <b> Select the file for which you want to infer the GRN and click the OK button </b></span> 
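
The file names in the dropdown appear to encode the selections made earlier. The sketch below rests on an assumption inferred from the example name and the choices above (Exp = 3, groups "All", samples "All", cell type "NK"): that `e`, `g`, `s` and `c` prefix the experiment, group, sample and cell type. The naming scheme is not documented here, and `parse_input_name` is a hypothetical helper.

```python
import re

# Assumed layout: ExpDat_e<experiment>_g<group>_s<sample>_c<cell type>.csv.gz
_PATTERN = re.compile(
    r"^ExpDat_e(?P<experiment>[^_]+)_g(?P<group>.+?)_s(?P<sample>.+?)_c(?P<cell_type>.+)\.csv\.gz$"
)


def parse_input_name(filename):
    """Split an input file name into its parts, or return None if it does not match."""
    match = _PATTERN.match(filename)
    return match.groupdict() if match else None


parts = parse_input_name("ExpDat_e3_gAll_sAll_cNK.csv.gz")
print(parts)
```

Names with several samples or cell types may not split cleanly with this pattern, so treat the result as a convenience for checking the selection rather than a reliable parser.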

### Infer GRN

In [14]:
FILE = os.path.join(FolderPath, file.value)

In [15]:
FILE

'data/Exp3/ExpDat_e3_gAll_sAll_cNK.csv.gz'

In [None]:
import warnings
warnings.filterwarnings('ignore')

!bash codes/Infer_GRN_celltypes.sh $FILE

data/Exp3/ExpDat_e3_gAll_sAll_cNK
results/data/Exp3/ExpDat_e3_gAll_sAll_cNK_1_GRN.csv
Loaded expression matrix of 1066 cells and 23303 genes in 7.5971057415008545 seconds...
Loaded 1839 TFs...
starting grnboost2 using 40 processes...
  6%|██                                  | 1314/23303 [18:32<9:46:51,  1.60s/it]
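
Once the run finishes, the strongest regulatory links can be inspected with a few lines of Python. This is a sketch under a stated assumption: that the result file is a comma-separated adjacency table with `TF`, `target` and `importance` columns, which is the layout GRNBoost2 normally produces. The shell script may post-process the file differently, so check the header first. The helper `top_edges` and the toy rows are hypothetical.

```python
import csv
import io


def top_edges(handle, n=5):
    """Return the n highest-importance (TF, target, importance) rows from an adjacency CSV."""
    reader = csv.DictReader(handle)
    edges = [(row["TF"], row["target"], float(row["importance"])) for row in reader]
    return sorted(edges, key=lambda edge: edge[2], reverse=True)[:n]


# Toy adjacency table with made-up names and scores, for illustration only
toy = io.StringIO("TF,target,importance\nTF_A,gene1,2.5\nTF_B,gene2,7.0\nTF_A,gene3,0.4\n")
best = top_edges(toy, n=2)
for tf, target, importance in best:
    print(tf, target, importance)
```

For the real result, pass an open file handle instead of the toy table, for example `open("results/data/Exp3/ExpDat_e3_gAll_sAll_cNK_1_GRN.csv", newline="")`.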