In [1]:
import sys
sys.path.append('../')

from gears import PertData, GEARS



Load the data. We use the Norman dataset as an example.

In [2]:
pert_data = PertData('./data')
pert_data.load(data_name = 'norman')
pert_data.prepare_split(split = 'simulation', seed = 1)
pert_data.get_dataloader(batch_size = 32, test_batch_size = 128)

Found local copy...
No gene path provided, using all genes
Found local copy...
These perturbations are not in the GO graph and is thus not able to make prediction for...
['RHOXF2BB+ctrl' 'LYL1+IER5L' 'ctrl+IER5L' 'KIAA1804+ctrl' 'IER5L+ctrl'
 'RHOXF2BB+ZBTB25' 'RHOXF2BB+SET']
Local copy of pyg dataset is detected. Loading...
Done!
Local copy of split is detected. Loading...
Simulation split test composition:
combo_seen0:9
combo_seen1:43
combo_seen2:19
unseen_single:36
Done!
Creating dataloaders....
Done!
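The log above names perturbations with condition strings such as `'RHOXF2BB+ctrl'` (a single-gene perturbation paired with control) and `'LYL1+IER5L'` (a two-gene combination). It also groups the simulation test set into `combo_seen0`, `combo_seen1`, `combo_seen2` and `unseen_single`. The following is a minimal sketch of how to read those labels. It assumes that `combo_seenK` means a two-gene combination in which K of the genes were seen as single perturbations during training, and that `unseen_single` means a single-gene perturbation absent from training. The helpers `parse_condition` and `split_category`, and the toy `train_genes` set, are hypothetical and not part of GEARS.

```python
def parse_condition(condition: str) -> list[str]:
    """Split a condition such as 'A+ctrl' or 'A+B' into its perturbed genes."""
    return [gene for gene in condition.split('+') if gene != 'ctrl']


def split_category(condition: str, train_genes: set[str]) -> str:
    """Label a test condition by how many of its genes were seen in training.

    Assumption: the labels mirror the 'Simulation split test composition' log;
    this is an illustration, not the GEARS implementation.
    """
    genes = parse_condition(condition)
    if not genes:
        return 'control'
    if len(genes) == 1:
        return 'seen_single' if genes[0] in train_genes else 'unseen_single'
    n_seen = sum(gene in train_genes for gene in genes)
    return f'combo_seen{n_seen}'


# Toy set of genes assumed to be seen as single perturbations in training.
train_genes = {'FEV', 'SET'}
for cond in ['RHOXF2BB+ctrl', 'LYL1+IER5L', 'RHOXF2BB+SET', 'FEV+SET']:
    print(cond, '->', split_category(cond, train_genes))
```

Because `'ctrl'` is filtered out regardless of its position, `'ctrl+IER5L'` and `'IER5L+ctrl'` parse to the same single gene.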


Create a model object. If you use [wandb](https://wandb.ai), you can track model training and evaluation by setting `weight_bias_track` to `True` and specifying the `proj_name` and `exp_name` you like.

In [3]:
pert_data

<gears.pertdata.PertData at 0x7f8f1ef74b50>

In [4]:
gears_model = GEARS(pert_data, device = 'cuda:0',
                    weight_bias_track = False,
                    proj_name = 'pertnet',
                    exp_name = 'pertnet')
gears_model.model_initialize(hidden_size = 64, go_path = "./data/norman/go.csv")

You can list the tunable parameters of `model_initialize` via:

In [5]:
gears_model.tunable_parameters()

{'hidden_size': 'hidden dimension, default 64',
 'num_go_gnn_layers': 'number of GNN layers for GO graph, default 1',
 'num_gene_gnn_layers': 'number of GNN layers for co-expression gene graph, default 1',
 'decoder_hidden_size': 'hidden dimension for gene-specific decoder, default 16',
 'num_similar_genes_go_graph': 'number of maximum similar K genes in the GO graph, default 20',
 'num_similar_genes_co_express_graph': 'number of maximum similar K genes in the co expression graph, default 20',
 'coexpress_threshold': 'pearson correlation threshold when constructing coexpression graph, default 0.4',
 'uncertainty': 'whether or not to turn on uncertainty mode, default False',
 'uncertainty_reg': 'regularization term to balance uncertainty loss and prediction loss, default 1',
 'direction_lambda': 'regularization term to balance direction loss and prediction loss, default 1'}
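Each description above ends with its default value. One way to build a keyword-argument dictionary for `model_initialize` is to parse those defaults and then override only the values you want to change. This is a minimal sketch that assumes the descriptions keep the `default X` suffix shown above. `parse_defaults` is a hypothetical helper, not part of GEARS.

```python
import ast


def parse_defaults(descriptions: dict[str, str]) -> dict[str, object]:
    """Extract the value that follows the last 'default ' in each description."""
    defaults = {}
    for name, text in descriptions.items():
        _, sep, raw = text.rpartition('default ')
        if not sep:
            continue  # no default documented for this parameter
        try:
            # Turn '64' -> 64, '0.4' -> 0.4, 'False' -> False, etc.
            defaults[name] = ast.literal_eval(raw.strip())
        except (ValueError, SyntaxError):
            defaults[name] = raw.strip()  # keep non-literal defaults as text
    return defaults


# In the notebook this dict would come from gears_model.tunable_parameters().
descriptions = {
    'hidden_size': 'hidden dimension, default 64',
    'coexpress_threshold': 'pearson correlation threshold when constructing coexpression graph, default 0.4',
    'uncertainty': 'whether or not to turn on uncertainty mode, default False',
}
config = {**parse_defaults(descriptions), 'hidden_size': 128}  # override one value
print(config)
# The result could then be passed as gears_model.model_initialize(**config).
```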

Train your model:

Note: for the sake of the demo, we set the number of epochs to 1. To train a full model, set `epochs = 20`.

In [6]:
gears_model.train(epochs = 1, lr = 1E-3)

Start Training...
Epoch 1 Step 1 Train Loss: 1.5037
Epoch 1 Step 51 Train Loss: 0.8678
Epoch 1 Step 101 Train Loss: 0.6175
Epoch 1 Step 151 Train Loss: 0.5405
Epoch 1 Step 201 Train Loss: 0.5443
Epoch 1 Step 251 Train Loss: 0.4987
Epoch 1 Step 301 Train Loss: 0.5114
Epoch 1 Step 351 Train Loss: 0.7144
Epoch 1 Step 401 Train Loss: 0.6537
Epoch 1 Step 451 Train Loss: 0.4845
Epoch 1 Step 501 Train Loss: 0.4590
Epoch 1 Step 551 Train Loss: 0.6324
Epoch 1 Step 601 Train Loss: 0.4472
Epoch 1 Step 651 Train Loss: 0.4866
Epoch 1 Step 701 Train Loss: 0.5326


KeyboardInterrupt: 
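The training log prints the training loss every 50 steps. If you capture that output, for example by copying it from the cell, a small parser can turn it into a loss curve for inspection. This is a minimal sketch that relies only on the line format shown above. `parse_train_log` is a hypothetical helper, not a GEARS function.

```python
import re

# Matches lines such as "Epoch 1 Step 51 Train Loss: 0.8678".
LOG_LINE = re.compile(r'Epoch (\d+) Step (\d+) Train Loss: ([\d.]+)')


def parse_train_log(text: str) -> list[tuple[int, int, float]]:
    """Return (epoch, step, loss) tuples for every matching log line."""
    return [(int(epoch), int(step), float(loss))
            for epoch, step, loss in LOG_LINE.findall(text)]


log = """Start Training...
Epoch 1 Step 1 Train Loss: 1.5037
Epoch 1 Step 51 Train Loss: 0.8678
Epoch 1 Step 101 Train Loss: 0.6175"""

records = parse_train_log(log)
losses = [loss for _, _, loss in records]
print(records[-1], min(losses))
```

Lines that do not match the pattern, such as `Start Training...`, are ignored.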

Save and load pretrained models:

In [6]:
gears_model.save_model('test_model')
gears_model.load_pretrained('test_model')

Make predictions for new perturbations, either a single gene or a combination of genes:

In [7]:
gears_model.predict([['FEV'], ['FEV', 'SAMD11']])

{'FEV': array([1.0191492e-03, 3.7346520e-02, 9.2324615e-02, ..., 3.4014411e+00,
        1.1585764e-02, 6.3127303e-04], dtype=float32),
 'FEV_SAMD11': array([1.9250450e-03, 6.5041207e-02, 1.6452499e-01, ..., 3.1653712e+00,
        2.0137992e-02, 1.4946696e-03], dtype=float32)}

The gene list is available via `gears_model.gene_list`:

In [8]:
gears_model.gene_list[:5]

['RP11-34P13.8', 'RP11-54O7.3', 'SAMD11', 'PERM1', 'HES4']
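`predict` returns one array per perturbation, keyed by the perturbed genes joined with `_` (for example `'FEV_SAMD11'`). Assuming each array holds one value per gene in the same order as `gears_model.gene_list`, you can pair the two to find the genes whose predicted expression differs most between two perturbations, such as `FEV` and `FEV_SAMD11`. This is a minimal sketch that uses toy five-gene arrays in place of the real output. `top_changed_genes` is a hypothetical helper, not part of GEARS.

```python
import numpy as np


def top_changed_genes(pred_a, pred_b, gene_list, k=3):
    """Return the k genes with the largest absolute difference pred_b - pred_a."""
    diff = np.asarray(pred_b, dtype=float) - np.asarray(pred_a, dtype=float)
    order = np.argsort(-np.abs(diff), kind='stable')[:k]
    return [(gene_list[i], float(diff[i])) for i in order]


gene_list = ['RP11-34P13.8', 'RP11-54O7.3', 'SAMD11', 'PERM1', 'HES4']
# Toy predictions, keyed the same way as the output of predict().
preds = {
    'FEV': np.array([0.0, 0.1, 0.2, 3.4, 0.0]),
    '_'.join(['FEV', 'SAMD11']): np.array([0.0, 0.1, 0.9, 3.0, 0.0]),
}
print(top_changed_genes(preds['FEV'], preds['FEV_SAMD11'], gene_list, k=2))
```

Ranking by absolute difference surfaces genes that go both up and down. The sign of each returned value shows the direction of the change.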