# MCA-subset integration using SemiLT

In this tutorial, we illustrate the use of SemiLT step by step on the MCA-subset data, which combines scRNA-seq data from Tabula Muris (https://tabula-muris.ds.czbiohub.org/) with scATAC-seq data from the Mouse sci-ATAC-seq Atlas (https://atlas.gs.washington.edu/mouse-atac/). SemiLT performs integration analysis of scRNA-seq and scATAC-seq data by concatenating gene expression or gene activity vectors. After preprocessing, quality control, and cell type annotation, the dataset contains 22660 cells from scRNA-seq and 36770 cells from scATAC-seq.

## Preparing input for SemiLT in setting.py

```python
class Setting:
    def __init__(self):
        DB = 'MCA-subset'
        if DB == "MCA-subset":
            # Data setting
            self.number_of_class = 14    # number of cell type classes
            self.input_size = 17057      # number of input features per cell
            self.rna_paths = ['data_MCA/adata_mca_gem.h5ad']    # scRNA-seq input
            self.atac_paths = ['data_MCA/adata_mca_gam.h5ad']   # scATAC-seq input
            self.rna_protein_paths = []    # left empty for this dataset
            self.atac_protein_paths = []   # left empty for this dataset
            self.peak_paths = ['data_MCA/adata_mca_peak.h5ad']
            self.atac_labels = True

            # Training setting
            self.batch_size = 256
            self.lr = 0.008
            self.lr_decay_epoch = 20
            self.epochs = 20
            self.embedding_size = 64
            self.momentum = 0.9
            self.seed = 1
            self.checkpoint = ''
```
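
Before starting a long training run, it can help to confirm that the settings point at real files and hold sensible values. The following is a minimal sketch of such a check. The helper `check_setting` is hypothetical and not part of SemiLT, the `SimpleNamespace` object only stands in for `Setting()`, and the rules it applies (files exist, sizes are positive integers, learning rate is a positive float) are generic assumptions rather than documented SemiLT requirements.

```python
import os
from types import SimpleNamespace


def check_setting(setting):
    """Return a list of problems found in a settings object.

    Hypothetical helper for illustration; not part of SemiLT.
    """
    problems = []

    # Every listed input file should exist before training starts.
    path_attrs = ("rna_paths", "atac_paths", "peak_paths",
                  "rna_protein_paths", "atac_protein_paths")
    for attr in path_attrs:
        for path in getattr(setting, attr, []):
            if not os.path.isfile(path):
                problems.append(f"{attr}: file not found: {path}")

    # Sizes and counts should be positive integers.
    int_attrs = ("number_of_class", "input_size", "batch_size",
                 "epochs", "embedding_size")
    for attr in int_attrs:
        value = getattr(setting, attr, None)
        if not isinstance(value, int) or value <= 0:
            problems.append(f"{attr}: expected a positive integer, got {value!r}")

    # The learning rate should be a positive float.
    lr = getattr(setting, "lr", None)
    if not isinstance(lr, float) or lr <= 0:
        problems.append(f"lr: expected a positive float, got {lr!r}")

    return problems


# Stand-in for Setting(), filled with the MCA-subset values from this tutorial.
example = SimpleNamespace(
    number_of_class=14,
    input_size=17057,
    rna_paths=["data_MCA/adata_mca_gem.h5ad"],
    atac_paths=["data_MCA/adata_mca_gam.h5ad"],
    rna_protein_paths=[],
    atac_protein_paths=[],
    peak_paths=["data_MCA/adata_mca_peak.h5ad"],
    batch_size=256,
    lr=0.008,
    epochs=20,
    embedding_size=64,
)

for problem in check_setting(example):
    print(problem)
```

If the `data_MCA` directory has not been downloaded yet, the sketch lists the three missing `.h5ad` files; once the files are in place it prints nothing.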

## Running SemiLT in main.py

```python
import os
import random
import time
from datetime import datetime

import torch

from SemiLT.trainingprocess import Training
from SemiLT.transfer import Transfer
from setting import Setting

random.seed(1)


def main():
    # hardware constraint for speed test
    start_time = time.time()
    torch.set_num_threads(1)
    os.environ['OMP_NUM_THREADS'] = '1'

    # initialization
    setting = Setting()
    torch.manual_seed(setting.seed)
    print('Start time: ', datetime.now().strftime('%H:%M:%S'))

    # Training
    print('SemiLT start:')
    model_stage1 = Training(setting)
    for epoch in range(setting.epochs):
        print('Epoch:', epoch)
        model_stage1.train(epoch)

    print('Write embeddings')
    model_stage1.write_embeddings()
    print('SemiLT finished: ', datetime.now().strftime('%H:%M:%S'))

    # Label transfer
    print('Label transfer:')
    Transfer(setting, neighbors=10, knn_rna_samples=50000)
    print('Label transfer finished: ', datetime.now().strftime('%H:%M:%S'))

    # Report the total run time as hours: minutes: seconds
    end_time = time.time()
    run_time = end_time - start_time
    hours = int(run_time / 3600)
    minutes = int((run_time - hours * 3600) / 60)
    seconds = int(run_time - hours * 3600 - minutes * 60)
    print(f"Run time：{hours}: {minutes}: {seconds}")


if __name__ == "__main__":
    main()
```
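
The `Transfer(setting, neighbors = 10, knn_rna_samples=50000)` call assigns cell type labels to scATAC-seq cells from their nearest scRNA-seq neighbors in the learned embedding space. The sketch below shows the general idea of k-nearest-neighbor label transfer with a majority vote. It is not SemiLT's implementation: the function `knn_transfer`, the Euclidean distance, the unweighted vote, and the toy embeddings and labels are all assumptions made for illustration, and the role of `knn_rna_samples` (presumably a cap on the number of reference cells) is not modeled.

```python
import math
from collections import Counter


def knn_transfer(ref_embeddings, ref_labels, query_embeddings, neighbors=10):
    """Label each query cell by majority vote among its nearest reference cells.

    Illustrative stand-in; SemiLT's Transfer may use a different distance
    or voting rule.
    """
    predictions = []
    for query in query_embeddings:
        # Distance from this query cell to every reference cell.
        distances = [
            (math.dist(query, ref), label)
            for ref, label in zip(ref_embeddings, ref_labels)
        ]
        # Keep the closest `neighbors` reference cells and count their labels.
        distances.sort(key=lambda pair: pair[0])
        votes = Counter(label for _, label in distances[:neighbors])
        predictions.append(votes.most_common(1)[0][0])
    return predictions


# Toy 2-D embeddings: two well-separated groups of labeled reference cells.
rna_embeddings = [(0.0, 0.1), (0.1, 0.0), (0.2, 0.1),
                  (5.0, 5.1), (5.1, 4.9), (4.9, 5.0)]
rna_labels = ["B cell", "B cell", "B cell", "T cell", "T cell", "T cell"]

# Unlabeled query cells, one near each group.
atac_embeddings = [(0.1, 0.1), (5.0, 5.0)]

predicted = knn_transfer(rna_embeddings, rna_labels, atac_embeddings, neighbors=3)
print(predicted)
```

With real data, the embeddings would have `embedding_size` dimensions (64 in this tutorial), and a brute-force loop like this one would be replaced by an indexed nearest-neighbor search.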

Running `main.py` prints the following log:

```
Start time:  05:03:21
SemiLT start:
num_workers: 0
load h5ad matrix: /users/PCON0022/wangxiaoying/czt/myJoint/5-SemiLT/data_MCA/adata_mca_gem.h5ad
load h5ad matrix: /users/PCON0022/wangxiaoying/czt/myJoint/5-SemiLT/data_MCA/adata_mca_gam.h5ad
load h5ad matrix: /users/PCON0022/wangxiaoying/czt/myJoint/5-SemiLT/data_MCA/adata_mca_peak.h5ad
Epoch: 0
LR is set to 0.008
LR is set to 0.008
Epoch: 1
Epoch: 2
Epoch: 3
Epoch: 4
Epoch: 5
Epoch: 6
Epoch: 7
Epoch: 8
Epoch: 9
Epoch: 10
Epoch: 11
Epoch: 12
Epoch: 13
Epoch: 14
Epoch: 15
Epoch: 16
Epoch: 17
Epoch: 18
Epoch: 19
Write embeddings
SemiLT finished:  05:38:41
Label transfer:
[Label transfer] Read RNA data
[Label transfer] Read ATAC data
[Label transfer] Build Space
[Label transfer] finished
ARI:0.989108
Recall：0.987517
Precision:0.983884
F1-score：0.985154
Label transfer finished:  05:39:23
Run time：0: 36: 2
```
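
The ARI, recall, precision, and F1-score at the end of the log compare the transferred scATAC-seq labels with the annotated ones. As a rough guide to what these numbers measure, here is a minimal sketch that computes the adjusted Rand index and macro-averaged precision, recall, and F1 from two label lists. The function names and the toy labels are hypothetical, and macro averaging is an assumption: the averaging scheme SemiLT uses for its reported values is not stated in this tutorial.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(true_labels, pred_labels):
    """Adjusted Rand index computed from the label contingency table."""
    pairs = Counter(zip(true_labels, pred_labels))
    true_counts = Counter(true_labels)
    pred_counts = Counter(pred_labels)

    # Number of cell pairs that agree within each table cell, row, and column.
    sum_cells = sum(comb(n, 2) for n in pairs.values())
    sum_true = sum(comb(n, 2) for n in true_counts.values())
    sum_pred = sum(comb(n, 2) for n in pred_counts.values())

    total_pairs = comb(len(true_labels), 2)
    expected = sum_true * sum_pred / total_pairs if total_pairs else 0.0
    maximum = 0.5 * (sum_true + sum_pred)
    if maximum == expected:
        return 1.0  # degenerate case, e.g. every cell in a single cluster
    return (sum_cells - expected) / (maximum - expected)


def macro_scores(true_labels, pred_labels):
    """Macro-averaged precision, recall, and F1 over the true classes."""
    classes = sorted(set(true_labels))
    precisions, recalls, f1s = [], [], []
    for cls in classes:
        tp = sum(t == cls and p == cls for t, p in zip(true_labels, pred_labels))
        fp = sum(t != cls and p == cls for t, p in zip(true_labels, pred_labels))
        fn = sum(t == cls and p != cls for t, p in zip(true_labels, pred_labels))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(classes)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


# Toy example: six cells, one of which receives the wrong label.
annotated = ["A", "A", "A", "B", "B", "C"]
transferred = ["A", "A", "B", "B", "B", "C"]

ari = adjusted_rand_index(annotated, transferred)
precision, recall, f1 = macro_scores(annotated, transferred)
print(f"ARI: {ari:.4f}")
print(f"Precision: {precision:.4f}  Recall: {recall:.4f}  F1: {f1:.4f}")
```

In practice, library routines such as scikit-learn's `adjusted_rand_score` and `precision_recall_fscore_support` are a more convenient way to obtain the same kind of scores.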
