# Configure the model architecture and training parameters

This tutorial provides a step-by-step guide to configuring the model architecture and training hyperparameters, and to analyzing a single time-series dataset with UNAGI. We demonstrate the capabilities of UNAGI by applying it to scRNA-seq data sampled from a single-nucleus RNA sequencing dataset.




In [1]:
import warnings
warnings.filterwarnings('ignore')
from UNAGI import UNAGI
unagi = UNAGI()

## Part 1: Setup and load the datasets

After loading the UNAGI package, we need to set up the data for UNAGI training.

-   Specify the path to your h5ad files after stage segmentation, e.g. '../data/small/0.h5ad'. UNAGI then loads all h5ad files in the target directory.

-   UNAGI requires the total number of time points in the dataset as input, e.g. total_stage=4.

-   UNAGI requires the key of the time-point attribute in the AnnData `.obs` table.

-   If the dataset has not been split into individual stages, set splited_dataset to False so that UNAGI segments the dataset.

-   To build the K-Nearest Neighbors (KNN) connectivity matrix used in graph convolution training, the number of neighbors must be defined. The default value is 25.

-   You can also specify how many threads you want to use when using UNAGI. The default number of threads is 20. 
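Before calling `setup_data`, it can help to confirm that every per-stage file is present. The sketch below assumes that stage `i` is stored as `<data_dir>/<i>.h5ad`, as the '../data/small/0.h5ad' example suggests; `missing_stage_files` is a hypothetical helper and not part of UNAGI.

```python
from pathlib import Path


def missing_stage_files(data_dir, total_stage):
    """Return the expected per-stage h5ad files that are absent from data_dir.

    Assumption: stage i is stored as '<data_dir>/<i>.h5ad'.
    This helper is illustrative only and is not part of UNAGI.
    """
    data_dir = Path(data_dir)
    expected = [data_dir / f"{stage}.h5ad" for stage in range(total_stage)]
    return [path for path in expected if not path.is_file()]


# Report which of the four expected stage files, if any, are missing.
missing = missing_stage_files("../UNAGI/data/example", total_stage=4)
if missing:
    print("Missing stage files:", [path.name for path in missing])
else:
    print("All stage files found.")
```

An empty list means the number of files matches `total_stage` under the assumed naming scheme.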

In [2]:
unagi.setup_data('../UNAGI/data/example',total_stage=4,stage_key='stage')

mkdir: cannot create directory ‘../UNAGI/data/example/0’: File exists
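To illustrate what a KNN connectivity matrix is, the following pure-Python sketch links each point to its nearest neighbors and symmetrizes the result. It is a generic illustration on toy data, not UNAGI's implementation; whether UNAGI symmetrizes its graph is not stated here, and `knn_connectivity` is a hypothetical name.

```python
import math


def knn_connectivity(points, n_neighbors):
    """Return a symmetric 0/1 matrix linking each point to its nearest neighbors."""
    n = len(points)
    matrix = [[0] * n for _ in range(n)]
    for i, point in enumerate(points):
        # Rank every other point by Euclidean distance to point i.
        others = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: math.dist(point, points[j]),
        )
        for j in others[:n_neighbors]:
            matrix[i][j] = 1
            matrix[j][i] = 1  # keep the edge if either endpoint selected it
    return matrix


# Four toy "cells" in a 2-D embedding; real data would have many more dimensions.
cells = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
for row in knn_connectivity(cells, n_neighbors=1):
    print(row)
```

With one neighbor per cell, the two nearby pairs are connected to each other and nothing links the two groups. A larger neighbor count, such as the default of 25, produces a denser graph.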


## Part 2: Configure the model architecture of UNAGI and training hyper-parameters

First, you must specify the **task** you are executing (for example, task='small_sample'). The **task** is the identifier of your experiment, and you can retrieve the trained model and the results of each iteration from the '../data/**task**/' directory.

Next, specify the distribution of your single-cell data. UNAGI provides negative binomial (NB), zero-inflated negative binomial (ZINB), zero-inflated log-normal, and normal distributions to model your single-cell data.
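As an illustration of what a zero-inflated model means, the sketch below evaluates the textbook log-likelihood of a single value under a zero-inflated log-normal distribution. It is not UNAGI's loss function; the function name and the parameters `zero_prob`, `mu`, and `sigma` are assumptions chosen for the example.

```python
import math


def ziln_log_likelihood(x, zero_prob, mu, sigma):
    """Log-likelihood of one non-negative value under a zero-inflated log-normal.

    zero_prob : probability of a structural zero (0 < zero_prob < 1)
    mu        : mean of log(x) for the non-zero component
    sigma     : standard deviation of log(x) for the non-zero component
    """
    if x == 0:
        # A log-normal never produces exactly 0, so zeros come only
        # from the zero-inflation component.
        return math.log(zero_prob)
    # Non-zero values: log(1 - zero_prob) plus the log-normal log-density.
    log_density = (
        -math.log(x * sigma * math.sqrt(2 * math.pi))
        - (math.log(x) - mu) ** 2 / (2 * sigma ** 2)
    )
    return math.log(1 - zero_prob) + log_density


print(ziln_log_likelihood(0.0, zero_prob=0.5, mu=0.0, sigma=1.0))
print(ziln_log_likelihood(1.0, zero_prob=0.5, mu=0.0, sigma=1.0))
```

The zero-inflated variants add an explicit probability mass at zero, which suits expression matrices in which many entries are exactly zero.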

You can use the *device* keyword to specify the device you want to use for training.

'epoch_initial': the number of training epochs for the first iteration.

'epoch_iter': the number of training epochs for the iterative training.

'max_iter': the total number of iterations UNAGI will run

'BATCHSIZE': the size of each mini-batch

'lr': the learning rate of Graph VAE

'lr_dis': the learning rate of the adversarial discriminator

'latent_dim': the dimension of Z space

'hiddem_dim': the number of neurons in each fully connected layer

'graph_dim': the dimension of graph representation

After setting the training hyperparameters and model architecture, you can use `unagi.run_UNAGI()` to start training.

In [None]:
unagi.setup_training(task='example',dist='ziln',device='cuda:0',GPU=True,epoch_iter=5,epoch_initial=2,max_iter=3,BATCHSIZE=560)
# idrem_dir: path to the idrem directory (idrem is installed separately).
# CPO: set to True to use CPO for the training (extra time needed).
unagi.run_UNAGI(idrem_dir='../../idrem', CPO=True)
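As a rough check on training length, the epoch settings can be combined into a total. This sketch assumes that the first iteration counts toward `max_iter` and runs for `epoch_initial` epochs, and that each later iteration runs for `epoch_iter` epochs. `total_training_epochs` is a hypothetical helper, not part of UNAGI.

```python
def total_training_epochs(epoch_initial, epoch_iter, max_iter):
    """Total epochs, assuming the first iteration counts toward max_iter."""
    if max_iter < 1:
        raise ValueError("max_iter must be at least 1")
    # One initial iteration, then (max_iter - 1) iterative rounds.
    return epoch_initial + (max_iter - 1) * epoch_iter


# Values from the setup_training call in this part: 2 + 2 * 5 = 12.
print(total_training_epochs(epoch_initial=2, epoch_iter=5, max_iter=3))
```

Under these assumptions the example configuration trains for 12 epochs in total, which is small enough for a quick trial run.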

## Part 3: Perform in-silico perturbations and downstream analysis

After training the UNAGI model, you can perform downstream tasks, including hierarchical static marker discovery. The parameters are:

-   data_path: the directory of the dataset generated by UNAGI

-   iteration: the iteration the dataset belongs to

-   progressionmarker_background_sampling_times: the number of sampling times used to generate the dynamic marker backgrounds

-   target_dir: the directory in which to store the downstream analysis results and h5ad files

-   customized_drug: the path to the customized drug profile

-   cmap_dir: the path to the precomputed CMAP database, which contains the drugs/compounds, their regulated genes, and the directions of regulation

In [None]:
import warnings
warnings.filterwarnings('ignore')
from UNAGI import UNAGI
unagi = UNAGI()
unagi.analyse_UNAGI('../UNAGI/data/example/2/stagedata/org_dataset.h5ad',2,10,target_dir=None,customized_drug='../UNAGI/data/jasper_target_pair.npy',cmap_dir='../../CMAPDirectionDf.npy')
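The dataset path passed to `analyse_UNAGI` above points at iteration 2, which matches the last of the three iterations configured with max_iter=3 if iterations are numbered from 0. The sketch below builds that path under this assumption. The directory layout is inferred from the example call, and `iteration_dataset_path` is a hypothetical helper, not part of UNAGI.

```python
from pathlib import Path


def iteration_dataset_path(data_dir, iteration):
    """Path of the dataset for one iteration.

    Assumption: files live at '<data_dir>/<iteration>/stagedata/org_dataset.h5ad',
    as in the example call. This helper is not part of UNAGI.
    """
    return Path(data_dir) / str(iteration) / "stagedata" / "org_dataset.h5ad"


max_iter = 3                   # value passed to setup_training in Part 2
last_iteration = max_iter - 1  # assumes iterations are numbered from 0
print(iteration_dataset_path("../UNAGI/data/example", last_iteration).as_posix())
```

Deriving the path from `max_iter` keeps the analysis step in sync with the training configuration when the number of iterations changes.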