# Example 01 - GRNBoost2 local

In this example notebook, we illustrate a basic usage scenario: inferring a
gene regulatory network from a single dataset on the local machine.

In [1]:
import os
import pandas as pd

from arboretum.algo import grnboost2, genie3
from arboretum.utils import load_tf_names

## 1. Load the input data

* We use the [Pandas](http://pandas.pydata.org/) library to read the data from a tab-separated text file.
* Arboretum expects the `expression_data` matrix to have observations as rows and genes as columns.
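If a dataset happens to be stored the other way around (genes as rows), it can be transposed before use. The following is a minimal sketch on a made-up toy matrix; the names `genes_by_obs` and `obs_by_genes` and all values are illustrative assumptions, not part of the DREAM5 data.

```python
import pandas as pd

# Toy matrix stored genes-by-observations (made-up values: 3 genes x 2 observations).
genes_by_obs = pd.DataFrame(
    {"obs_1": [0.4, 0.1, 0.9], "obs_2": [0.5, 0.2, 0.8]},
    index=["G1", "G2", "G3"],
)

# Transpose so that rows are observations and columns are genes.
obs_by_genes = genes_by_obs.T

print(obs_by_genes.shape)
print(list(obs_by_genes.columns))
```

After the transpose, the column labels are the gene names, which is the layout described above.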

In [2]:
wd = os.getcwd().split('arboretum')[0] + 'arboretum/resources/dream5/'

net1_ex_path = wd + 'net1/net1_expression_data.tsv'
net1_tf_path = wd + 'net1/net1_transcription_factors.tsv'

In [3]:
ex_matrix = pd.read_csv(net1_ex_path, sep='\t')

* Let's quickly check the input matrix by inspecting its shape and top 5 rows.

In [4]:
ex_matrix.shape

(805, 1643)

In [5]:
ex_matrix.head()

,G1,G2,G3,G4,G5,G6,G7,G8,G9,G10,...,G1634,G1635,G1636,G1637,G1638,G1639,G1640,G1641,G1642,G1643
0,0.425448,0.017829,0.907989,0.448247,0.172324,0.273489,0.843766,0.648201,1.004533,0.365305,...,0.011979,0.963306,1.16987,0.331381,0.3506,0.822844,0.304483,0.319917,0.36428,0.765945
1,0.4424,0.050525,0.869368,0.445851,0.173311,0.274889,0.764049,0.74787,1.022589,0.434106,...,0.022247,1.014137,0.888465,0.281649,0.48594,0.915617,0.317507,0.238074,0.50913,0.691403
2,1.056847,0.208454,0.467448,0.505077,0.244883,0.208451,0.665355,1.192092,0.824068,0.146987,...,0.422066,0.895203,1.028826,0.825126,0.444819,0.349069,0.04231,0.165208,0.952178,0.678781
3,1.117226,0.003001,0.317654,0.387204,0.253792,0.17936,0.939244,0.868668,0.963028,0.233785,...,0.001163,1.04654,1.058098,0.484225,0.150689,0.449126,0.125197,4.7e-05,0.878127,0.566691
4,0.971068,0.001056,0.354651,0.474532,0.207718,0.102833,0.745871,0.909753,1.151865,0.318988,...,0.000845,1.041745,1.061129,0.384363,0.326859,0.51227,0.26141,0.000156,0.883981,0.646715


* We load the transcription factor (TF) list from a file using the `load_tf_names` utility function, which reads the file line by line, where every line contains one TF name.

In [6]:
tf_names = load_tf_names(net1_tf_path)
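To make the expected file format concrete, here is a rough sketch of a reader for a one-name-per-line file. `read_tf_names` is a hypothetical stand-in written for illustration, not Arboretum's implementation, and skipping blank lines is an assumption of this sketch. The file contents are made up.

```python
import os
import tempfile


def read_tf_names(path):
    """Hypothetical stand-in: return one stripped name per non-empty line."""
    with open(path) as handle:
        return [line.strip() for line in handle if line.strip()]


# Write a small made-up TF file to a temporary directory and read it back.
with tempfile.TemporaryDirectory() as tmp_dir:
    tf_path = os.path.join(tmp_dir, "tfs.tsv")
    with open(tf_path, "w") as handle:
        handle.write("G1\nG2\nG3\n")
    example_tfs = read_tf_names(tf_path)

print(example_tfs)
```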

* Let's quickly inspect the TF list: its first 5 entries and its length.

In [7]:
tf_names[:5]

['G1', 'G2', 'G3', 'G4', 'G5']

In [8]:
len(tf_names)

195

## 2. Launch gene regulatory network inference

In [9]:
%%time
network = grnboost2(expression_data=ex_matrix,
                    tf_names=tf_names)

CPU times: user 31.2 s, sys: 15.5 s, total: 46.7 s
Wall time: 51.4 s


In [10]:
network.head()

,TF,target,importance
108,G109,G1406,135.832105
15,G16,G1440,132.702686
15,G16,G687,119.453194
187,G188,G938,119.043848
9,G10,G1312,117.662291


In [11]:
len(network)

318847
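With several hundred thousand links, it is often convenient to keep only the highest-scoring ones. The sketch below shows two plain pandas ways to do that on a made-up link list with the same three columns; the toy values and the threshold of `10.0` are arbitrary assumptions, not recommended cut-offs.

```python
import pandas as pd

# Made-up link list with the same three columns as the inferred network.
toy_network = pd.DataFrame({
    "TF": ["G1", "G2", "G1", "G3"],
    "target": ["G7", "G8", "G9", "G7"],
    "importance": [12.5, 40.0, 3.2, 25.1],
})

# Keep the two highest-scoring links.
top_links = toy_network.nlargest(2, "importance")

# Keep every link whose importance exceeds an arbitrary threshold.
strong_links = toy_network[toy_network["importance"] > 10.0]

print(top_links)
print(len(strong_links))
```

The same two expressions could be applied to `network` in place of `toy_network`.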

## 3. Write the GRN link list to file `[TF, target, importance]`.

In [12]:
network.to_csv('ex_01_network.tsv', sep='\t', header=False, index=False)
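Because the file is written with `header=False`, the column names have to be supplied again when reading it back. A minimal sketch, using an in-memory string with made-up rows in place of the file on disk:

```python
import io

import pandas as pd

# Stand-in for a headerless, tab-separated link list (made-up rows).
tsv_text = "G1\tG7\t12.5\nG2\tG8\t40.0\n"

# Without a header row, the column names are given explicitly.
links = pd.read_csv(
    io.StringIO(tsv_text),
    sep="\t",
    header=None,
    names=["TF", "target", "importance"],
)

print(links)
```

To read the actual output, the `io.StringIO(...)` argument would be replaced by the path `'ex_01_network.tsv'`.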