# CITE-seq and ASAP-seq integration using SemiLT

In this tutorial, we illustrate the use of SemiLT step by step on the CITE-seq and ASAP-seq datasets (data-7) from a T cell stimulation experiment conducted by Mimitou et al. (2020). SemiLT integrates CITE-seq and ASAP-seq data by concatenating each cell's gene expression vector (CITE-seq) or gene activity vector (ASAP-seq) with its protein measurements. The data can be downloaded from GEO under accession GSE156478 or from the scJoint repository (https://github.com/SydneyBioX/scJoint/tree/main). After preprocessing, quality control, and cell type annotation, the data contain 4644 cells from CITE-seq and 4502 cells from ASAP-seq.
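The concatenation step described above can be sketched on toy data. This is only an illustration under stated assumptions, not SemiLT's own loading code: the function name `concat_modalities`, the random matrices, and their sizes are all hypothetical, and the only assumption about the real data is that rows are cells and columns are features.

```python
import numpy as np

# Toy stand-ins for the real matrices (rows are cells, columns are features).
n_cells, n_genes, n_proteins = 5, 8, 3
rng = np.random.default_rng(0)
gene_matrix = rng.poisson(1.0, size=(n_cells, n_genes)).astype(np.float32)
protein_matrix = rng.poisson(5.0, size=(n_cells, n_proteins)).astype(np.float32)


def concat_modalities(genes, proteins):
    """Place gene and protein features side by side for the same cells."""
    if genes.shape[0] != proteins.shape[0]:
        raise ValueError("both matrices must describe the same cells")
    return np.hstack([genes, proteins])


combined = concat_modalities(gene_matrix, protein_matrix)
print(combined.shape)  # (5, 11)
```

Each row of `combined` is one cell, with the gene features first and the protein features appended after them.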

In [1]:
import torch
import os
from datetime import datetime
from SemiLT.trainingprocess import Training
from SemiLT.transfer import Transfer
import time
from setting import Setting
import random
random.seed(1)
setting = Setting()

## Preparing input for SemiLT in setting.py

```python
DB = 'CITE_ASAP'  # Must match the string tested in the condition below
if DB == "CITE_ASAP":
    self.number_of_class = 7  # Number of cell types in the demo data
    self.input_size = 17219  # Number of common genes and proteins between reference and target data (16994 + 225)
    self.rna_paths = ['data_demo/adata_ref_rna.h5ad']  # GEM from reference data
    self.atac_paths = ['data_demo/adata_tar_atac.h5ad']  # GAM from target data
    self.rna_protein_paths = ['data_demo/adata_ref_adt.h5ad']  # Protein expression from reference data
    self.atac_protein_paths = ['data_demo/adata_tar_adt.h5ad']  # Protein expression from target data
    self.peak_paths = ['data_demo/adata_tar_peak.h5ad']  # Peak matrix from target data (adata_tar_peak or adata_tar_peak_tfidf)
    self.atac_labels = True

    # Training settings
    self.batch_size = 256
    self.lr = 0.008  # Learning rate (alternative: 0.01)
    self.lr_decay_epoch = 20
    self.epochs = 20
    self.embedding_size = 64
    self.momentum = 0.9
    self.seed = 1
    self.checkpoint = ''
```
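The value of `input_size` is consistent with the width of one concatenated feature vector: the demo data printed in the View data section have 16994 gene features and 225 protein features, which sum to 17219. A minimal sanity check, assuming that interpretation of `input_size` (the helper name `expected_input_size` is hypothetical and not part of SemiLT):

```python
def expected_input_size(n_genes: int, n_proteins: int) -> int:
    """Width of one concatenated feature vector (genes followed by proteins)."""
    return n_genes + n_proteins


# Feature counts taken from the AnnData summaries of the demo data.
n_genes, n_proteins = 16994, 225
configured_input_size = 17219  # value of self.input_size in setting.py

assert expected_input_size(n_genes, n_proteins) == configured_input_size
print(expected_input_size(n_genes, n_proteins))  # 17219
```

When adapting the settings to another dataset, the same check can be run with that dataset's feature counts before training starts.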

## View data

In [2]:
import scanpy as sc
adata_ref_rna = sc.read(setting.rna_paths[0])
print(adata_ref_rna)
adata_tar_atac = sc.read(setting.atac_paths[0])
print(adata_tar_atac)
adata_ref_adt = sc.read(setting.rna_protein_paths[0])
print(adata_ref_adt)
adata_tar_adt = sc.read(setting.atac_protein_paths[0])
print(adata_tar_adt)

AnnData object with n_obs × n_vars = 4644 × 16994
    obs: 'cell_type'
AnnData object with n_obs × n_vars = 4502 × 16994
    obs: 'cell_type'
AnnData object with n_obs × n_vars = 4644 × 225
AnnData object with n_obs × n_vars = 4502 × 225


## Running SemiLT in main.py

In [3]:
def main():
    # Restrict computation to a single thread so run times are comparable
    start_time = time.time()
    torch.set_num_threads(1)
    os.environ['OMP_NUM_THREADS'] = '1'
    
    # initialization 
    setting = Setting()    
    torch.manual_seed(setting.seed)
    print('Start time: ', datetime.now().strftime('%H:%M:%S'))
    
    # Training
    print('SemiLT start:')
    model_stage1 = Training(setting)
    for epoch in range(setting.epochs):
        print('Epoch:', epoch)
        model_stage1.train(epoch)
    
    print('Write embeddings')
    model_stage1.write_embeddings()
    print('SemiLT finished: ', datetime.now().strftime('%H:%M:%S'))
    
    # Label transfer
    print('Label transfer:')
    Transfer(setting, neighbors=10, knn_rna_samples=50000)
    print('Label transfer finished: ', datetime.now().strftime('%H:%M:%S'))
    
    # Report the elapsed wall-clock time as hours, minutes, and seconds
    end_time = time.time()
    run_time = int(end_time - start_time)
    hours, remainder = divmod(run_time, 3600)
    minutes, seconds = divmod(remainder, 60)
    print(f"Run time: {hours}: {minutes}: {seconds}")
    
if __name__ == "__main__":
    main()
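The `Transfer` call in `main()` takes a `neighbors` argument, which suggests a k-nearest-neighbour vote in the learned embedding space. The sketch below illustrates that general technique on toy data. It is an assumption about the idea, not SemiLT's actual implementation: the function `knn_transfer`, the 2-D embeddings, and the labels `"A"` and `"B"` are all hypothetical.

```python
import math
from collections import Counter


def knn_transfer(ref_embeddings, ref_labels, target_embeddings, k=10):
    """Give each target cell the majority label of its k nearest reference cells."""
    predictions = []
    for target in target_embeddings:
        # Rank reference cells by Euclidean distance to this target cell.
        order = sorted(
            range(len(ref_embeddings)),
            key=lambda i: math.dist(target, ref_embeddings[i]),
        )
        votes = Counter(ref_labels[i] for i in order[:k])
        predictions.append(votes.most_common(1)[0][0])
    return predictions


# Toy 2-D embeddings: two well-separated reference clusters.
ref_embeddings = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1),
                  (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
ref_labels = ["A", "A", "A", "B", "B", "B"]
target_embeddings = [(0.05, 0.1), (5.05, 5.0)]

predicted = knn_transfer(ref_embeddings, ref_labels, target_embeddings, k=3)
print(predicted)  # ['A', 'B']
```

With real embeddings, the brute-force distance ranking would normally be replaced by an indexed nearest-neighbour search, since sorting every reference cell for every target cell scales poorly.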

Start time:  19:54:16
SemiLT start:
num_workers: 0
load h5ad matrix: /users/PCON0022/wangxiaoying/czt/myJoint/5-SemiLT/data_demo/adata_ref_rna.h5ad
load h5ad matrix: /users/PCON0022/wangxiaoying/czt/myJoint/5-SemiLT/data_demo/adata_ref_adt.h5ad
load h5ad matrix: /users/PCON0022/wangxiaoying/czt/myJoint/5-SemiLT/data_demo/adata_tar_atac.h5ad
load h5ad matrix: /users/PCON0022/wangxiaoying/czt/myJoint/5-SemiLT/data_demo/adata_tar_adt.h5ad
load h5ad matrix: /users/PCON0022/wangxiaoying/czt/myJoint/5-SemiLT/data_demo/adata_tar_peak.h5ad
Epoch: 0
LR is set to 0.008
LR is set to 0.008
Epoch: 1
Epoch: 2
Epoch: 3
Epoch: 4
Epoch: 5
Epoch: 6
Epoch: 7
Epoch: 8
Epoch: 9
Epoch: 10
Epoch: 11
Epoch: 12
Epoch: 13
Epoch: 14
Epoch: 15
Epoch: 16
Epoch: 17
Epoch: 18
Epoch: 19
Write embeddings
SemiLT finished:  20:02:30
Label transfer:
[Label transfer] Read RNA data
[Label transfer] Read ATAC data
[Label transfer] Build Space
[Label transfer] finished
ARI:0.856386
Recall：0.943709
Precision:0.945964
F1-score