In [1]:
import genie3 as g

In [2]:
data = g.loadtxt('data.txt',skiprows=1)
# Read the gene names from the header line of the file
with open('data.txt') as f:
    gene_names = f.readline().rstrip('\n').split('\t')
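
The loading cell above implies a tab-separated text file whose first line holds the gene names and whose remaining lines hold numeric expression values. As a minimal sketch of that layout, the snippet below parses a tiny in-memory file with NumPy directly. The three-gene content and its values are made up for illustration, and the orientation (one row per sample, one column per gene) is an assumption.

```python
import io
import numpy as np

# Hypothetical two-sample, three-gene file; the values are made up for illustration.
text = "CD19\tCDH17\tRAD51\n0.1\t2.0\t3.5\n0.4\t1.8\t3.1\n"

# First line: tab-separated gene names.
gene_names = text.splitlines()[0].split("\t")

# Remaining lines: numeric values, assumed one row per sample and one column per gene.
data = np.loadtxt(io.StringIO(text), skiprows=1)

print(gene_names)
print(data.shape)
```

Checking that `data.shape[1]` equals `len(gene_names)` is a cheap way to catch a malformed header before running the inference.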

### Run GENIE3 with its default parameters

The only mandatory input argument to the function GENIE3() is the expression matrix.

GENIE3() returns an array VIM containing the scores of the putative regulatory links. VIM(i,j) is the weight of the link directed from the i-th gene to the j-th gene.

In [3]:
VIM = g.GENIE3(data)

Tree method: RF
K: sqrt
Number of trees: 1000


running single threaded jobs
Gene 1/10...
Gene 2/10...
Gene 3/10...
Gene 4/10...
Gene 5/10...
Gene 6/10...
Gene 7/10...
Gene 8/10...
Gene 9/10...
Gene 10/10...
Elapsed time: 4.36 seconds
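
Individual scores can be read from VIM with ordinary array indexing, assuming it behaves like a NumPy array. Python indices start at 0, so the link from the 1st gene to the 2nd gene sits at position `[0, 1]`. The sketch below uses a made-up 3x3 matrix; the name `VIM_demo` and its values are illustrative and are not GENIE3 output.

```python
import numpy as np

# Made-up score matrix standing in for VIM: rows are regulators, columns are targets.
VIM_demo = np.array([[0.0, 0.6, 0.1],
                     [0.3, 0.0, 0.7],
                     [0.2, 0.4, 0.0]])

# Weight of the link directed from the 1st gene to the 2nd gene (0-based indices 0 and 1).
w = VIM_demo[0, 1]

# Highest-scoring candidate regulator of each target gene: argmax down each column.
top_regulator = VIM_demo.argmax(axis=0)

print(w, top_regulator)
```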


### Restrict the candidate regulators to a subset of genes

In [4]:
# Genes that are used as candidate regulators
regulators = ['CD19', 'CDH17','RAD51','OSR2','TBX3']
VIM2 = g.GENIE3(data,gene_names=gene_names,regulators=regulators)

Tree method: RF
K: sqrt
Number of trees: 1000


running single threaded jobs
Gene 1/10...
Gene 2/10...
Gene 3/10...
Gene 4/10...
Gene 5/10...
Gene 6/10...
Gene 7/10...
Gene 8/10...
Gene 9/10...
Gene 10/10...
Elapsed time: 4.00 seconds
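
Because the regulators are given by name, each entry has to match an entry of `gene_names` exactly. A quick check before the call can catch typos. The sketch below is plain Python and makes no claim about how GENIE3() validates its inputs. The gene order used here is the one implied by the link lists shown later in this tutorial.

```python
# Gene order inferred from the G1..G10 and named link lists in this tutorial.
gene_names = ['TBX3', 'GATA5', 'ZNF394', 'CDH17', 'XRCC2',
              'CD93', 'OSR2', 'CREB5', 'CD19', 'RAD51']
regulators = ['CD19', 'CDH17', 'RAD51', 'OSR2', 'TBX3']

# Names that would not match any column of the expression matrix.
missing = [r for r in regulators if r not in gene_names]

# 0-based column indices of the candidate regulators, in column order.
reg_idx = [i for i, name in enumerate(gene_names) if name in regulators]

print(missing, reg_idx)
```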


### Change the tree-based method and its settings

In [6]:
# Use Extra-Trees method
tree_method='ET'
# Number of randomly chosen candidate regulators at each node of a tree 
K=7
# Number of trees per ensemble
ntrees = 50
# Run the method with these settings
VIM3 = g.GENIE3(data,tree_method=tree_method,K=K,ntrees=ntrees)

Tree method: ET
K: 7
Number of trees: 50


running single threaded jobs
Gene 1/10...
Gene 2/10...
Gene 3/10...
Gene 4/10...
Gene 5/10...
Gene 6/10...
Gene 7/10...
Gene 8/10...
Gene 9/10...
Gene 10/10...
Elapsed time: 0.17 seconds
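
To make the meaning of K concrete, the sketch below turns a K setting into the number of candidate regulators examined at each tree node. `candidates_per_node` is a hypothetical helper written for this illustration and is not part of GENIE3. Rounding the square root down is an assumption, as is the count of 9 candidate regulators per target (every gene except the target itself).

```python
import math

def candidates_per_node(K, n_regulators):
    """Hypothetical helper: number of candidate regulators tried at each node."""
    if K == 'sqrt':
        # Square root of the number of candidate regulators, rounded down (assumed).
        return max(1, int(math.sqrt(n_regulators)))
    # Otherwise K is taken as a positive integer, capped at the number available.
    return min(int(K), n_regulators)

# With 10 genes, each target is assumed to have 9 candidate regulators.
k_default = candidates_per_node('sqrt', 9)
k_custom = candidates_per_node(7, 9)

print(k_default, k_custom)
```

Under these assumptions, K=7 lets each split consider most of the candidate regulators, while the default considers only a few.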


### Get the predicted ranking of all the regulatory links

Each line corresponds to a regulatory link. The first column shows the regulator, the second column shows the target gene, and the last column indicates the score of the link.

If the gene names are not provided, the i-th gene is named "Gi".

Note that the ranking will differ slightly from one run to another. This is due to the intrinsic randomness of the Random Forest and Extra-Trees methods. The variance of the ranking can be decreased by increasing the number of trees per ensemble.
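
The effect of the number of trees can be illustrated with a toy model that does not involve GENIE3 at all: if an ensemble score is the average of many noisy per-tree scores, its run-to-run spread shrinks as more trees are averaged. The true score (0.3), the noise level, and the Gaussian noise model below are arbitrary assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def run_to_run_spread(ntrees, n_runs=200, noise=0.2):
    """Std. dev. across simulated runs of a score averaged over `ntrees` noisy trees."""
    # Toy model: every tree reports the true score (0.3) plus independent Gaussian noise.
    per_tree = 0.3 + noise * rng.standard_normal((n_runs, ntrees))
    ensemble_score = per_tree.mean(axis=1)  # one averaged score per simulated run
    return ensemble_score.std()

spread_50 = run_to_run_spread(50)
spread_1000 = run_to_run_spread(1000)

print(spread_50, spread_1000)
```

In this model the spread falls roughly with the square root of the number of trees, so 1000 trees give a visibly more stable score than 50, at the cost of a longer run.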

**Important note on the interpretation of the scores:** The weights of the links returned by GENIE3() do not have any statistical meaning and only provide a way to rank the regulatory links. There is therefore no standard threshold value, and caution must be taken when choosing one.


In [8]:
g.get_link_list(VIM)

G1	G5	0.515879
G5	G1	0.515484
G6	G8	0.385897
G8	G6	0.367422
G9	G10	0.315189
G2	G8	0.250074
G9	G7	0.241565
G7	G4	0.211394
G10	G9	0.204164
G2	G6	0.202257
G7	G9	0.200738
G5	G9	0.200737
G3	G4	0.194461
G4	G3	0.189364
G6	G2	0.184534
G8	G2	0.172268
G7	G2	0.170404
G3	G7	0.156099
G7	G3	0.147557
G4	G10	0.144872
G2	G1	0.131506
G9	G3	0.127681
G4	G7	0.127329
G5	G4	0.126016
G1	G2	0.125425
G5	G10	0.122471
G2	G5	0.115212
G1	G4	0.114741
G2	G7	0.106373
G1	G3	0.102602
G3	G10	0.097068
G9	G4	0.096996
G5	G2	0.096756
G10	G4	0.095135
G2	G3	0.091942
G8	G3	0.090899
G5	G6	0.087709
G5	G3	0.086710
G1	G7	0.086297
G7	G8	0.082485
G5	G7	0.081463
G6	G3	0.080062
G3	G2	0.079995
G3	G9	0.079807
G7	G10	0.079758
G1	G8	0.075637
G8	G1	0.075557
G7	G6	0.074039
G4	G5	0.073532
G2	G9	0.073463
G10	G3	0.073209
G8	G5	0.072352
G1	G10	0.071471
G8	G7	0.067435
G8	G9	0.066435
G6	G7	0.064495
G10	G7	0.063690
G4	G9	0.063440
G1	G6	0.062937
G9	G2	0.062928
G5	G8	0.060601
G6	G4	0.058870
G7	G5	0.057262
G8	G4	0.056758
G8	G10	0.056537
G3	G6	0.056262
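
A listing of this kind can be approximated from any score matrix by enumerating the off-diagonal entries and sorting them by decreasing score. The function `ranked_links` below is a hypothetical stand-in written for illustration, not the implementation of get_link_list(), and the 3x3 matrix is made up. Skipping self-links is an assumption.

```python
import numpy as np

def ranked_links(vim, gene_names=None):
    """Return (regulator, target, score) tuples sorted by decreasing score."""
    n = vim.shape[0]
    if gene_names is None:
        # Default naming convention: the i-th gene is called "Gi" (1-based).
        gene_names = ['G%d' % (i + 1) for i in range(n)]
    links = [(gene_names[i], gene_names[j], float(vim[i, j]))
             for i in range(n) for j in range(n) if i != j]  # skip self-links
    return sorted(links, key=lambda link: link[2], reverse=True)

# Made-up score matrix: rows are regulators, columns are targets.
vim_demo = np.array([[0.0, 0.6, 0.1],
                     [0.3, 0.0, 0.7],
                     [0.2, 0.4, 0.0]])

for reg, tgt, score in ranked_links(vim_demo):
    print('%s\t%s\t%.6f' % (reg, tgt, score))
```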

Show the names of the genes

In [9]:
g.get_link_list(VIM,gene_names=gene_names)

TBX3	XRCC2	0.515879
XRCC2	TBX3	0.515484
CD93	CREB5	0.385897
CREB5	CD93	0.367422
CD19	RAD51	0.315189
GATA5	CREB5	0.250074
CD19	OSR2	0.241565
OSR2	CDH17	0.211394
RAD51	CD19	0.204164
GATA5	CD93	0.202257
OSR2	CD19	0.200738
XRCC2	CD19	0.200737
ZNF394	CDH17	0.194461
CDH17	ZNF394	0.189364
CD93	GATA5	0.184534
CREB5	GATA5	0.172268
OSR2	GATA5	0.170404
ZNF394	OSR2	0.156099
OSR2	ZNF394	0.147557
CDH17	RAD51	0.144872
GATA5	TBX3	0.131506
CD19	ZNF394	0.127681
CDH17	OSR2	0.127329
XRCC2	CDH17	0.126016
TBX3	GATA5	0.125425
XRCC2	RAD51	0.122471
GATA5	XRCC2	0.115212
TBX3	CDH17	0.114741
GATA5	OSR2	0.106373
TBX3	ZNF394	0.102602
ZNF394	RAD51	0.097068
CD19	CDH17	0.096996
XRCC2	GATA5	0.096756
RAD51	CDH17	0.095135
GATA5	ZNF394	0.091942
CREB5	ZNF394	0.090899
XRCC2	CD93	0.087709
XRCC2	ZNF394	0.086710
TBX3	OSR2	0.086297
OSR2	CREB5	0.082485
XRCC2	OSR2	0.081463
CD93	ZNF394	0.080062
ZNF394	GATA5	0.079995
ZNF394	CD19	0.079807
OSR2	RAD51	0.079758
TBX3	CREB5	0.075637
CREB5	TBX3	0.075557
OSR2	CD93	0.074039
CDH17	XRCC2	0.073532

Show only the links that are directed from the candidate regulators:

In [10]:
g.get_link_list(VIM,gene_names=gene_names,regulators=regulators)

TBX3	XRCC2	0.515879
CD19	RAD51	0.315189
CD19	OSR2	0.241565
OSR2	CDH17	0.211394
RAD51	CD19	0.204164
OSR2	CD19	0.200738
CDH17	ZNF394	0.189364
OSR2	GATA5	0.170404
OSR2	ZNF394	0.147557
CDH17	RAD51	0.144872
CD19	ZNF394	0.127681
CDH17	OSR2	0.127329
TBX3	GATA5	0.125425
TBX3	CDH17	0.114741
TBX3	ZNF394	0.102602
CD19	CDH17	0.096996
RAD51	CDH17	0.095135
TBX3	OSR2	0.086297
OSR2	CREB5	0.082485
OSR2	RAD51	0.079758
TBX3	CREB5	0.075637
OSR2	CD93	0.074039
CDH17	XRCC2	0.073532
RAD51	ZNF394	0.073209
TBX3	RAD51	0.071471
RAD51	OSR2	0.063690
CDH17	CD19	0.063440
TBX3	CD93	0.062937
CD19	GATA5	0.062928
OSR2	XRCC2	0.057262
CDH17	CD93	0.055799
TBX3	CD19	0.053416
RAD51	GATA5	0.052051
CDH17	TBX3	0.051643
CD19	CREB5	0.049883
OSR2	TBX3	0.047149
RAD51	CD93	0.046966
RAD51	TBX3	0.043857
CD19	TBX3	0.043670
CD19	XRCC2	0.043535
CD19	CD93	0.042698
CDH17	GATA5	0.041563
RAD51	CREB5	0.036962
RAD51	XRCC2	0.034390
CDH17	CREB5	0.025304


Output the first 5 links:

In [11]:
g.get_link_list(VIM,gene_names=gene_names,regulators=regulators,maxcount=5)

TBX3	XRCC2	0.515879
CD19	RAD51	0.315189
CD19	OSR2	0.241565
OSR2	CDH17	0.211394
RAD51	CD19	0.204164
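
The effect of the `regulators` and `maxcount` arguments can be mimicked on an already ranked list with plain Python, which may help when post-processing saved results. The helper `filter_links` is hypothetical; the link tuples are the first seven rows of the named ranking shown earlier in this tutorial.

```python
# First seven rows of the named ranking shown earlier in this tutorial.
links = [('TBX3', 'XRCC2', 0.515879), ('XRCC2', 'TBX3', 0.515484),
         ('CD93', 'CREB5', 0.385897), ('CREB5', 'CD93', 0.367422),
         ('CD19', 'RAD51', 0.315189), ('GATA5', 'CREB5', 0.250074),
         ('CD19', 'OSR2', 0.241565)]
regulators = ['CD19', 'CDH17', 'RAD51', 'OSR2', 'TBX3']

def filter_links(links, regulators=None, maxcount=None):
    """Hypothetical helper: keep links from the given regulators, then truncate."""
    if regulators is not None:
        allowed = set(regulators)
        links = [link for link in links if link[0] in allowed]
    # A slice end of None keeps every remaining link.
    return links[:maxcount]

top2 = filter_links(links, regulators=regulators, maxcount=2)
for reg, tgt, score in top2:
    print('%s\t%s\t%.6f' % (reg, tgt, score))
```

Cutting the list by rank in this way avoids having to pick a score threshold, which is consistent with the note above that the scores only serve to order the links.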
