### Notebook to format TB PBMCs and healthy PBMCs for label transfer with `scNym`

- **Developed by**: Carlos Talavera-López Ph.D.
- **Institute of Computational Biology - Computational Health Centre - Helmholtz Munich**
- v221017

### Import required modules

In [None]:
import anndata
import numpy as np
import pandas as pd
import scanpy as sc

### Read in query and reference objects

In [None]:
query = sc.read_h5ad('/home/cartalop/data/single_cell/lung/tb/merged/CaiY_PBMC-TB_QCed_pre-processed_ctl221017.h5ad') 
query

In [None]:
query.var.head()

In [None]:
query.obs['status'].cat.categories

In [None]:
meyer = sc.read_h5ad('/home/cartalop/data/single_cell/lung/yoshida_2022/pbmc/meyer_nikolic_covid_pbmc_raw.h5ad') 
meyer

In [None]:
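# Keep only healthy donors; .copy() turns the view into a real AnnData so the obs edits below do not act on a view
meyer_pbmc = meyer[meyer.obs['COVID_status'].isin(['Healthy'])].copy()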
meyer_pbmc

### Format data as query for `scNym`

In [None]:
query.obs['domain_label'] = query.obs['sample'].copy()
query.obs['domain_label'] = 'target_' + query.obs['domain_label'].astype(str)
query.obs['domain_label'] = query.obs['domain_label'].astype('category')
query.obs['domain_label'].cat.categories
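
The cell above prefixes each sample ID with `target_` and stores the result as a categorical. The same steps, wrapped in a hypothetical helper `make_domain_labels` (not part of scNym or scanpy) and run on invented sample IDs, look roughly like this:

```python
import pandas as pd


def make_domain_labels(sample_ids, prefix):
    """Hypothetical helper: prefix each sample ID and return a categorical Series."""
    # Cast to str first so categorical or numeric sample IDs concatenate cleanly
    labels = prefix + pd.Series(sample_ids).astype(str)
    return labels.astype("category")


# Toy sample IDs standing in for query.obs['sample']
query_samples = pd.Series(["S1", "S1", "S2"])
domains = make_domain_labels(query_samples, "target_")
print(list(domains.cat.categories))  # ['target_S1', 'target_S2']
```

The reference section further down repeats the same pattern with the `train_` prefix.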

In [None]:
query.obs['cell_states'] = 'Unlabeled'

### Format data as reference for `scNym`

In [None]:
meyer_pbmc

In [None]:
meyer_pbmc.obs['cell_states'] = meyer_pbmc.obs['annotation_detailed'].copy()
meyer_pbmc.obs['status'] = 'Healthy'

In [None]:
meyer_pbmc.obs['domain_label'] = meyer_pbmc.obs['sample_id'].copy()
meyer_pbmc.obs['domain_label'] = 'train_' + meyer_pbmc.obs['domain_label'].astype(str)
meyer_pbmc.obs['domain_label'] = meyer_pbmc.obs['domain_label'].astype('category')
meyer_pbmc.obs['domain_label'].cat.categories
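
The `target_` and `train_` prefixes keep query and reference domains distinct even if both datasets happen to reuse a sample ID. A quick check on toy categories (the IDs are invented; in the notebook one would compare the two `domain_label` columns):

```python
import pandas as pd

# Toy stand-ins for query.obs['domain_label'] and meyer_pbmc.obs['domain_label']
query_domains = pd.Categorical(["target_S1", "target_S2"])
reference_domains = pd.Categorical(["train_D1", "train_S1"])

# 'S1' occurs in both toy datasets, but the prefixes keep the two domains separate
overlap = set(query_domains.categories) & set(reference_domains.categories)
print(overlap)  # set()
```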

### Merge the query and reference objects

In [None]:
tb_pbmc = query.concatenate(meyer_pbmc, batch_key = 'object', batch_categories = ['query', 'reference'], join = 'inner')
tb_pbmc
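
With `join = 'inner'`, only genes present in both objects survive the merge. A pandas-only sketch of that gene intersection, using made-up gene lists in place of the real `var_names`:

```python
import pandas as pd

# Toy gene lists standing in for query.var_names and meyer_pbmc.var_names
query_genes = pd.Index(["CD3E", "CD14", "MS4A1", "IFNG"])
reference_genes = pd.Index(["CD14", "CD3E", "NKG7", "MS4A1"])

# An inner join keeps only the genes found in both inputs
shared = query_genes.intersection(reference_genes)
dropped_from_query = query_genes.difference(shared)
print(len(shared), list(dropped_from_query))  # 3 ['IFNG']
```

Comparing `tb_pbmc.n_vars` with the `n_vars` of each input is a quick way to see how many genes the inner join discarded.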

### Clean up object

- Clean up `adata.obs`

In [None]:
# Keep only the columns scNym needs; axis must be passed by keyword in pandas >= 2.0
tb_pbmc.obs.drop(tb_pbmc.obs.columns.difference(['domain_label', 'cell_states', 'object']), axis = 1, inplace = True)
tb_pbmc
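
The same keep-only-these-columns pattern serves both `obs` and `var`. A sketch as a small hypothetical helper, `keep_columns`, applied to a toy `obs` table; it passes `columns=` so no positional axis argument is involved:

```python
import pandas as pd


def keep_columns(df, keep):
    """Hypothetical helper: drop, in place, every column of df not listed in keep."""
    # columns= is keyword-only friendly and avoids the positional axis that pandas 2.x rejects
    df.drop(columns=df.columns.difference(keep), inplace=True)
    return df


# Toy obs table; the labels are invented for illustration
obs = pd.DataFrame({
    "domain_label": ["target_S1", "train_D1"],
    "cell_states": ["Unlabeled", "CD4 T"],
    "object": ["query", "reference"],
    "status": ["TB", "Healthy"],
})
keep_columns(obs, ["domain_label", "cell_states", "object"])
print(list(obs.columns))  # ['domain_label', 'cell_states', 'object']
```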

- Clean up `adata.var`

In [None]:
# Keep only the query gene IDs; axis must be passed by keyword in pandas >= 2.0
tb_pbmc.var.drop(tb_pbmc.var.columns.difference(['gene_id-query']), axis = 1, inplace = True)
tb_pbmc
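
Because anndata identifies genes by `var_names`, duplicated gene names in the merged object can make gene lookups ambiguous downstream. A minimal pandas check on invented names; on the real object one could inspect `tb_pbmc.var_names` the same way, and anndata's `AnnData.var_names_make_unique()` can deduplicate them if needed:

```python
import pandas as pd

# Toy gene names standing in for tb_pbmc.var_names
var_names = pd.Index(["CD3E", "CD14", "CD14", "MS4A1"])

# duplicated() flags every repeat after the first occurrence
duplicated = var_names[var_names.duplicated()].unique()
print(var_names.is_unique, list(duplicated))  # False ['CD14']
```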

### Save object for `scNym`

In [None]:
tb_pbmc.write('/home/cartalop/data/single_cell/lung/tb/merged/CaiY_PBMC_TB_pre-scnym_ctl221017.h5ad')
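
Before handing the file to `scNym`, a light sanity check on `obs` can catch labelling mistakes. The sketch below is a hypothetical helper, `check_scnym_obs`, that assumes this notebook's conventions: query cells carry the label `'Unlabeled'` (the value set earlier in the notebook, assumed here to match scNym's unlabeled-cell convention) and `target_` domains, while reference cells carry `train_` domains. It runs on a toy table with invented labels:

```python
import pandas as pd


def check_scnym_obs(obs, unlabeled="Unlabeled"):
    """Hypothetical helper: validate a merged obs table, return domains per object."""
    # Cast categoricals to str so comparisons and groupby behave uniformly
    objects = obs["object"].astype(str)
    domains = obs["domain_label"].astype(str)
    is_query = objects == "query"
    if not (obs.loc[is_query, "cell_states"].astype(str) == unlabeled).all():
        raise ValueError("some query cells carry a label")
    if not domains[is_query].str.startswith("target_").all():
        raise ValueError("query domain without 'target_' prefix")
    if not domains[~is_query].str.startswith("train_").all():
        raise ValueError("reference domain without 'train_' prefix")
    # Number of distinct domains in the query and in the reference
    return domains.groupby(objects).nunique()


# Toy merged obs table; labels and IDs are invented
obs = pd.DataFrame({
    "domain_label": pd.Categorical(["target_S1", "target_S2", "train_D1", "train_D1"]),
    "cell_states": ["Unlabeled", "Unlabeled", "B cell", "NK"],
    "object": pd.Categorical(["query", "query", "reference", "reference"]),
})
counts = check_scnym_obs(obs)
print(counts.to_dict())  # {'query': 2, 'reference': 1}
```

In the notebook this would be called as `check_scnym_obs(tb_pbmc.obs)` just before writing the file.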