### GraphAnnDataModule use case

This interactive notebook lets you train a model of your choice on a Squidpy dataset. Working through it cell by cell shows how the `GraphAnnDataModule` class can be used to train different types of models. For more information on the available example models, see https://github.com/theislab/ncem

In [None]:
%load_ext autoreload
%autoreload 2

In [26]:
import pytorch_lightning as pl
import squidpy as sq
import torch

from gpu_spatial_graph_pipeline.utils import adata2data
from gpu_spatial_graph_pipeline.data.datamodule import GraphAnnDataModule
from gpu_spatial_graph_pipeline.models.linear_ncem import LinearNCEM
from gpu_spatial_graph_pipeline.models.non_linear_ncem import NonLinearNCEM
from gpu_spatial_graph_pipeline.models.graph_embedding import GraphEmbedding

from ipywidgets import widgets

### Choose/Upload dataset

Run the cell below to choose the dataset to train on.

In [13]:
dropvals1 = widgets.Dropdown(options=[('custom', 1),('mibitof', 2)], value=1,description="dataset")
dropvals1


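Each dropdown stores an integer as its value, and later cells recover the chosen label with `options[value - 1][0]`, which only works while the values are exactly 1, 2, 3, and so on, in list order. A minimal sketch of a more robust lookup, using a hypothetical helper `label_for` (not part of ipywidgets or this pipeline), searches the `(label, value)` pairs instead:

```python
from typing import Sequence, Tuple


def label_for(options: Sequence[Tuple[str, int]], value: int) -> str:
    """Return the label paired with `value` in a list of (label, value) options.

    Hypothetical helper: it searches the pairs instead of assuming that
    value k sits at position k - 1.
    """
    for label, option_value in options:
        if option_value == value:
            return label
    raise ValueError(f"value {value!r} is not one of the dropdown options")


# Same (label, value) pairs as the dataset dropdown
dataset_options = [("custom", 1), ("mibitof", 2)]
print(label_for(dataset_options, 2))  # mibitof
```

With a widget, this would be called as `label_for(dropvals1.options, dropvals1.value)`. Recent ipywidgets versions also expose the selected label directly through the `label` attribute of selection widgets such as `Dropdown`.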

Run the cell below to choose the learning type, which determines how the data is split into training, validation, and test sets.

In [14]:
dropvals2 = widgets.Dropdown(options=[('nodewise', 1), ('graphwise', 2)], value=1, description="learning type")
dropvals2

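As a rough mental model, and not the actual implementation of `GraphAnnDataModule`: a nodewise split keeps every graph intact and labels each node as train, validation, or test through boolean masks (the sample batches printed after setup carry `train_mask`, `val_mask`, and `test_mask` fields of this kind), while a graphwise split assigns whole graphs to one subset. The helpers `nodewise_masks` and `graphwise_split` below are hypothetical and use simple random fractions:

```python
import random
from typing import Dict, List, Sequence, Tuple


def _split_sizes(n: int, val_frac: float, test_frac: float) -> Tuple[int, int]:
    # Number of items that go to validation and test; the rest is training
    return int(n * val_frac), int(n * test_frac)


def nodewise_masks(num_nodes: int, val_frac: float = 0.1, test_frac: float = 0.1,
                   seed: int = 0) -> Dict[str, List[bool]]:
    """Give every node of one graph exactly one of train/val/test, as boolean masks."""
    order = list(range(num_nodes))
    random.Random(seed).shuffle(order)
    n_val, n_test = _split_sizes(num_nodes, val_frac, test_frac)
    val = set(order[:n_val])
    test = set(order[n_val:n_val + n_test])
    return {
        "train_mask": [i not in val and i not in test for i in range(num_nodes)],
        "val_mask": [i in val for i in range(num_nodes)],
        "test_mask": [i in test for i in range(num_nodes)],
    }


def graphwise_split(graph_ids: Sequence[str], val_frac: float = 0.1,
                    test_frac: float = 0.1, seed: int = 0) -> Dict[str, List[str]]:
    """Assign each whole graph (e.g. one tissue image) to exactly one subset."""
    ids = list(graph_ids)
    random.Random(seed).shuffle(ids)
    n_val, n_test = _split_sizes(len(ids), val_frac, test_frac)
    return {"val": ids[:n_val], "test": ids[n_val:n_val + n_test], "train": ids[n_val + n_test:]}


masks = nodewise_masks(10, val_frac=0.2, test_frac=0.2)
print(sum(masks["train_mask"]), sum(masks["val_mask"]), sum(masks["test_mask"]))  # 6 2 2
split = graphwise_split(["g1", "g2", "g3", "g4", "g5"], val_frac=0.2, test_frac=0.2)
print({name: len(ids) for name, ids in split.items()})  # {'val': 1, 'test': 1, 'train': 3}
```

The practical difference is what the model sees at evaluation time: nodewise evaluation predicts held-out nodes inside graphs whose other nodes were used for training, whereas graphwise evaluation uses graphs the model has never seen.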

Run the cell below to instantiate your datamodule and print a few sample training batches.

In [23]:
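import itertools

if dropvals1.value == 1:
    # Loading a custom dataset is not implemented yet
    raise NotImplementedError("Custom datasets are not supported yet; choose 'mibitof'.")
elif dropvals1.value == 2:
    adata = sq.datasets.mibitof()
    # obs columns used to build the node features
    feature_names = ['Cluster', 'batch']

    def anndata2data(adata):
        return adata2data(adata, feature_names)

    # Inputs for the models: number of distinct categories in each feature column
    num_features = (len(set(adata.obs[feature_names[0]])), len(set(adata.obs[feature_names[1]])))
    # Number of genes (columns of adata.X), i.e. the target dimension
    num_genes = adata.X.shape[1]

# 'nodewise' or 'graphwise', taken from the learning-type dropdown
learning_type = dropvals2.options[dropvals2.value - 1][0]
dm = GraphAnnDataModule(adata=adata, adata2data_fn=anndata2data, num_workers=8, batch_size=40, learning_type=learning_type)
dm.setup()

# Print the first three training batches
print("Sample of batches from custom datamodule:")
for batch in itertools.islice(dm.train_dataloader(), 3):
    print(batch)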


Sample of batches from custom datamodule:
DataBatch(x=[268, 11], edge_index=[2, 240], y=[268, 36], batch=[268], ptr=[4], train_mask=[268], val_mask=[268], test_mask=[268], batch_size=40)
DataBatch(x=[273, 11], edge_index=[2, 240], y=[273, 36], batch=[273], ptr=[4], train_mask=[273], val_mask=[273], test_mask=[273], batch_size=40)
DataBatch(x=[272, 11], edge_index=[2, 240], y=[272, 36], batch=[272], ptr=[4], train_mask=[272], val_mask=[272], test_mask=[272], batch_size=40)
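Each printed `DataBatch` lists tensor shapes. As in PyTorch Geometric generally, several graphs are packed into one large disconnected graph: `x` holds one row of node features per node (11 features here), `y` the 36 target values per node, `batch` maps every node to its graph, and `ptr` stores cumulative node offsets, so a `ptr` of length 4 means the batch contains 4 - 1 = 3 graphs. The sketch below rebuilds `ptr` from a `batch` vector in plain Python to show that relationship; `ptr_from_batch` is a hypothetical helper that mirrors the idea, not PyG's own code, and assumes nodes are grouped by graph as in a PyG batch:

```python
from typing import List


def ptr_from_batch(batch: List[int]) -> List[int]:
    """Cumulative node offsets: graph g owns nodes ptr[g] .. ptr[g + 1] - 1.

    Assumes `batch` is sorted, i.e. nodes of graph 0 come first, then graph 1, ...
    """
    num_graphs = max(batch) + 1 if batch else 0
    counts = [0] * num_graphs
    for graph_index in batch:
        counts[graph_index] += 1  # nodes per graph
    ptr = [0]
    for count in counts:
        ptr.append(ptr[-1] + count)  # running total of nodes
    return ptr


batch = [0, 0, 1, 1, 1, 2]
ptr = ptr_from_batch(batch)
print(ptr)           # [0, 2, 5, 6]
print(len(ptr) - 1)  # 3 graphs
```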


### Choose model

Run the cell below to choose the model to train.

In [24]:
dropvals_model = widgets.Dropdown(options=[('custom', 1),('linear_ncem', 2),('nonlinear_ncem', 3),('graph_embedding', 4)], value=1,description="model")
dropvals_model


Run the cell below to instantiate your model.

In [35]:
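if dropvals_model.value == 1:
    # A custom model is not implemented yet
    raise NotImplementedError("Custom models are not supported yet; choose one of the example models.")
elif dropvals_model.value == 2:
    model = LinearNCEM(in_channels=num_features, out_channels=num_genes, model_type='spatial', lr=0.0001, weight_decay=0.000001)
elif dropvals_model.value == 3:
    model = NonLinearNCEM(in_channels=num_features, encoder_hidden_dims=10, latent_dim=30, decoder_hidden_dims=10, out_channels=num_genes, lr=0.0001, weight_decay=0.000001)
elif dropvals_model.value == 4:
    # Read the per-node feature width from a sample batch instead of hard-coding 34:
    # the sample batches carry 11 features per node, and a mismatch makes the first
    # SAGEConv layer fail with "mat1 and mat2 shapes cannot be multiplied".
    num_node_features = next(iter(dm.train_dataloader())).x.shape[1]
    model = GraphEmbedding(num_features=num_node_features, latent_dim=30, lr=0.0001, weight_decay=0.000001)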

model

GraphEmbedding(
  (model): GraphAE(
    (conv1): SAGEConv(34, 60)
    (conv2): SAGEConv(60, 30)
    (conv3): SAGEConv(30, 60)
    (conv4): SAGEConv(60, 34)
  )
  (loss_fn): MSELoss()
)
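The printed architecture shows the first `SAGEConv` expecting 34 input channels, while the sample batches from the datamodule carry 11 features per node (`x=[268, 11]`). Such a mismatch only surfaces once training starts, as a matrix-multiplication error. A minimal pre-flight check can catch it earlier; `check_feature_width` is a hypothetical helper, and the layer sizes are copied by hand from the printout rather than read from the model:

```python
from typing import List, Tuple

# (in_channels, out_channels) of conv1..conv4, copied from the GraphAE printout
GRAPH_AE_LAYERS: List[Tuple[int, int]] = [(34, 60), (60, 30), (30, 60), (60, 34)]


def check_feature_width(node_feature_dim: int, layers: List[Tuple[int, int]]) -> None:
    """Raise early if the batches' feature width differs from the first layer's input."""
    expected = layers[0][0]
    if node_feature_dim != expected:
        raise ValueError(
            f"model expects {expected} features per node, batches carry "
            f"{node_feature_dim}; rebuild it with num_features={node_feature_dim}"
        )


check_feature_width(34, GRAPH_AE_LAYERS)  # matching width: passes silently
try:
    check_feature_width(11, GRAPH_AE_LAYERS)  # the width seen in the sample batches
except ValueError as err:
    print(err)
```

In the notebook, the actual width could be read from one batch with `next(iter(dm.train_dataloader())).x.shape[1]` (the standard PyTorch Geometric `x` node-feature tensor) and compared before calling `trainer.fit`.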

Run the cell below to choose the accelerator.

In [32]:
dropvals_gpu = widgets.Dropdown(options=[('gpu', 1), ('cpu', 2)], value=1, description="accelerator")
dropvals_gpu

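PyTorch Lightning does not silently fall back to the CPU: requesting `accelerator='gpu'` on a machine without a usable CUDA device typically makes the `Trainer` raise an error. A minimal sketch of a fallback, using a hypothetical helper `pick_accelerator` whose GPU flag would in practice come from `torch.cuda.is_available()`:

```python
def pick_accelerator(requested: str, gpu_available: bool) -> str:
    """Return the accelerator to pass to pl.Trainer, falling back from 'gpu' to 'cpu'.

    `gpu_available` would normally be torch.cuda.is_available().
    """
    if requested not in ("gpu", "cpu"):
        raise ValueError(f"unknown accelerator {requested!r}")
    if requested == "gpu" and not gpu_available:
        return "cpu"  # no usable GPU: train on the CPU instead of failing
    return requested


print(pick_accelerator("gpu", gpu_available=False))  # cpu
print(pick_accelerator("gpu", gpu_available=True))   # gpu
```

Recent Lightning versions also accept `accelerator='auto'`, which leaves the choice to the library.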

Run the cell below to train your model.

In [36]:
# 'gpu' or 'cpu', taken from the accelerator dropdown
accelerator = dropvals_gpu.options[dropvals_gpu.value - 1][0]
trainer: pl.Trainer = pl.Trainer(accelerator=accelerator, max_epochs=10, log_every_n_steps=10)

trainer.fit(model,datamodule=dm)

GPU available: False, used: False
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs

  | Name    | Type    | Params
------------------------------------
0 | model   | GraphAE | 15.5 K
1 | loss_fn | MSELoss | 0     
------------------------------------
15.5 K    Trainable params
0         Non-trainable params
15.5 K    Total params
0.062     Total estimated model params size (MB)


Sanity Checking: 0it [00:00, ?it/s]

RuntimeError: mat1 and mat2 shapes cannot be multiplied (222x11 and 34x60)
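The model summary reports 15.5 K trainable parameters and an estimated size of 0.062 MB. Both figures can be reproduced by hand under three assumptions: PyTorch Geometric's default `SAGEConv` (a neighbour linear map with bias plus a root linear map without bias, so `2 * in * out + out` parameters per layer), hidden layers of width `2 * latent_dim` as inferred from the printed 34 → 60 → 30 → 60 → 34 stack, and 4-byte float32 weights. The helper names are hypothetical:

```python
from typing import List, Tuple


def sage_conv_params(in_channels: int, out_channels: int) -> int:
    """Parameters of one SAGEConv with default settings (assumed: root weight, bias)."""
    neighbour_weights = in_channels * out_channels + out_channels  # neighbour map with bias
    root_weights = in_channels * out_channels                      # root map without bias
    return neighbour_weights + root_weights


def graph_ae_layers(num_features: int, latent_dim: int) -> List[Tuple[int, int]]:
    """Layer sizes of the autoencoder, assuming hidden width = 2 * latent_dim."""
    hidden = 2 * latent_dim
    return [(num_features, hidden), (hidden, latent_dim),
            (latent_dim, hidden), (hidden, num_features)]


def count_params(num_features: int, latent_dim: int) -> int:
    return sum(sage_conv_params(i, o) for i, o in graph_ae_layers(num_features, latent_dim))


total = count_params(34, 30)
print(total)                      # 15544, shown as "15.5 K"
print(round(total * 4 / 1e6, 3))  # 0.062 (MB, float32)
print(count_params(11, 30))       # 10001 with 11 input features per node
```

The last line shows the size the same architecture would have with the 11-feature input that the sample batches actually carry.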

Run the cell below to test your model.

In [None]:
trainer.test(model, datamodule=dm)