### Notebook for `merging` the reference data sets of human PBMCs and human heart 

#### Environment: Scanpy

- **Developed by**: Alexandra Cirnu
- **Modified by**: Alexandra Cirnu
- **Würzburg Institute for Systems Immunology & Julius-Maximilian-Universität Würzburg**
- **Date of creation**: 240222
- **Date of modification**: 240222

### Import required modules

In [1]:
import anndata
import numpy as np
import pandas as pd
import scanpy as sc

## Read in all datasets

In [2]:
def X_is_raw(adata): return np.array_equal(adata.X.sum(axis=0).astype(int), adata.X.sum(axis=0))
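`X_is_raw` sums the matrix per gene and tests whether those column sums are whole numbers, which is a cheap proxy for "`X` holds raw counts". Column sums of non-integer values can still happen to be whole numbers, so an element-wise check may be a useful complement. The sketch below is not part of the original notebook: `column_sums_are_integer`, `is_integer_valued`, and the toy matrices are hypothetical names and data chosen for illustration.

```python
import numpy as np
from scipy import sparse


def column_sums_are_integer(X):
    """Same idea as X_is_raw, applied directly to a matrix."""
    sums = np.asarray(X.sum(axis=0))
    return bool(np.array_equal(sums.astype(int), sums))


def is_integer_valued(X):
    """Return True if every stored value of X is a whole number."""
    values = X.data if sparse.issparse(X) else np.asarray(X)
    return bool(np.all(np.mod(values, 1) == 0))


# Toy raw counts stored sparsely, as count matrices in AnnData often are.
counts = sparse.csr_matrix(np.array([[0, 3, 1], [2, 0, 5]], dtype=np.float32))

# Toy non-integer matrix whose column sums (1.0 and 2.0) are whole numbers.
normalised = np.array([[0.5, 1.5], [0.5, 0.5]])

print(column_sums_are_integer(counts), is_integer_valued(counts))
print(column_sums_are_integer(normalised), is_integer_valued(normalised))
```

The second matrix passes the column-sum test but fails the element-wise one, which is the case the stricter check is meant to catch.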

### Read in PBMC data set - keep only healthy donors

The raw H5AD file of the COVID-19 airway and matched PBMC data set was downloaded from [here](https://www.covid19cellatlas.org/index.patient.html).

In [3]:
pbmc = sc.read_h5ad('/Users/alex/data/ACM_cardiac_leuco/Reference_data/Annotated_PBMC/meyer_nikolic_covid_pbmc_raw.h5ad')
pbmc

AnnData object with n_obs × n_vars = 422220 × 33559
    obs: 'orig.ident', 'nCount_RNA', 'nFeature_RNA', 'nCount_ADT', 'nFeature_ADT', 'Age_group', 'BMI', 'COVID_severity', 'COVID_status', 'Ethnicity', 'Group', 'Sex', 'Smoker', 'annotation_broad', 'annotation_detailed', 'annotation_detailed_fullNames', 'patient_id', 'sample_id', 'sequencing_library', 'Protein_modality_weight'
    var: 'name'

In [4]:
pbmc.obs

index,orig.ident,nCount_RNA,nFeature_RNA,nCount_ADT,nFeature_ADT,Age_group,BMI,COVID_severity,COVID_status,Ethnicity,Group,Sex,Smoker,annotation_broad,annotation_detailed,annotation_detailed_fullNames,patient_id,sample_id,sequencing_library,Protein_modality_weight
CV001_KM10202384-CV001_KM10202394_AAACCTGAGGCAGGTT-1,CV001_KM10202384-CV001_KM10202394,5493.0,1767,5297.0,184,Adult,Unknown,Healthy,Healthy,EUR,Adult,Female,Non-smoker,Monocyte,Monocyte CD14,Classical monocyte,AN5,AN5,CV001_KM10202384-CV001_KM10202394,0.359517
CV001_KM10202384-CV001_KM10202394_AAACCTGAGTGTCCCG-1,CV001_KM10202384-CV001_KM10202394,4868.0,1577,2169.0,165,Adult,Unknown,Healthy,Healthy,EUR,Adult,Female,Non-smoker,T CD4+,T CD4 helper,T CD4 helper,AN5,AN5,CV001_KM10202384-CV001_KM10202394,0.577522
CV001_KM10202384-CV001_KM10202394_AAACCTGCAGATGGGT-1,CV001_KM10202384-CV001_KM10202394,3178.0,1257,1330.0,163,Adult,Unknown,Healthy,Healthy,EUR,Adult,Male,Non-smoker,T CD4+,T CD4 helper,T CD4 helper,AN3,AN3,CV001_KM10202384-CV001_KM10202394,0.369143
CV001_KM10202384-CV001_KM10202394_AAACCTGGTATAGTAG-1,CV001_KM10202384-CV001_KM10202394,4745.0,1477,1255.0,161,Adult,Unknown,Healthy,Healthy,EUR,Adult,Female,Non-smoker,T CD8+,T CD8 naive,T CD8 naive,AN5,AN5,CV001_KM10202384-CV001_KM10202394,0.785563
CV001_KM10202384-CV001_KM10202394_AAACCTGGTGTGCGTC-1,CV001_KM10202384-CV001_KM10202394,1902.0,954,1711.0,166,Adult,Unknown,Healthy,Healthy,EUR,Adult,Female,Non-smoker,T CD4+,T CD4 naive,T CD4 naive,AN5,AN5,CV001_KM10202384-CV001_KM10202394,0.564174
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
CV001_KM9294396-CV001_KM9294404_TTTGTCAGTTCTGTTT-1,CV001_KM9294396-CV001_KM9294404,1001.0,554,876.0,147,Adult,27.17,Severe,Post-COVID-19,EUR,Adult,Male,Non-smoker,NK,NK,NK,PC9,PC9,CV001_KM9294396-CV001_KM9294404,0.429398
CV001_KM9294396-CV001_KM9294404_TTTGTCATCAACCAAC-1,CV001_KM9294396-CV001_KM9294404,838.0,430,1287.0,162,Adult,27.17,Severe,Post-COVID-19,EUR,Adult,Male,Non-smoker,Monocyte,Monocyte CD14,Classical monocyte,PC9,PC9,CV001_KM9294396-CV001_KM9294404,0.677910
CV001_KM9294396-CV001_KM9294404_TTTGTCATCATTATCC-1,CV001_KM9294396-CV001_KM9294404,641.0,405,1245.0,160,Adult,27.17,Severe,Post-COVID-19,EUR,Adult,Male,Non-smoker,Monocyte,Monocyte CD14,Classical monocyte,PC9,PC9,CV001_KM9294396-CV001_KM9294404,0.422796
CV001_KM9294396-CV001_KM9294404_TTTGTCATCCTATGTT-1,CV001_KM9294396-CV001_KM9294404,551.0,341,1199.0,170,Adult,27.17,Severe,Post-COVID-19,EUR,Adult,Male,Non-smoker,DC,pDC,pDC,PC9,PC9,CV001_KM9294396-CV001_KM9294404,0.471905


In [5]:
X_is_raw(pbmc)

True

In [6]:
pbmc.obs['COVID_status'].value_counts()

COVID_status
Healthy          173684
COVID-19         151312
Post-COVID-19     97224
Name: count, dtype: int64

In [7]:
pbmc.obs['n_counts'] = pbmc.obs['nCount_RNA'].copy()
pbmc.obs['n_genes'] = pbmc.obs['nFeature_RNA'].copy()
pbmc.obs['age_group'] = pbmc.obs['Age_group'].copy()
pbmc.obs['donor'] = pbmc.obs['patient_id'].copy()
pbmc.obs['sample'] = pbmc.obs['sample_id'].copy()
pbmc.obs["cell_source"] = "Yoshida"
pbmc.obs['cell_states'] = pbmc.obs['annotation_detailed_fullNames'].copy()
pbmc.obs['gender'] = pbmc.obs['Sex'].copy()

pbmc = pbmc[pbmc.obs['COVID_status'].isin(["Healthy"])]

pbmc.obs

index,orig.ident,nCount_RNA,nFeature_RNA,nCount_ADT,nFeature_ADT,Age_group,BMI,COVID_severity,COVID_status,Ethnicity,...,sequencing_library,Protein_modality_weight,n_counts,n_genes,age_group,donor,sample,cell_source,cell_states,gender
CV001_KM10202384-CV001_KM10202394_AAACCTGAGGCAGGTT-1,CV001_KM10202384-CV001_KM10202394,5493.0,1767,5297.0,184,Adult,Unknown,Healthy,Healthy,EUR,...,CV001_KM10202384-CV001_KM10202394,0.359517,5493.0,1767,Adult,AN5,AN5,Yoshida,Classical monocyte,Female
CV001_KM10202384-CV001_KM10202394_AAACCTGAGTGTCCCG-1,CV001_KM10202384-CV001_KM10202394,4868.0,1577,2169.0,165,Adult,Unknown,Healthy,Healthy,EUR,...,CV001_KM10202384-CV001_KM10202394,0.577522,4868.0,1577,Adult,AN5,AN5,Yoshida,T CD4 helper,Female
CV001_KM10202384-CV001_KM10202394_AAACCTGCAGATGGGT-1,CV001_KM10202384-CV001_KM10202394,3178.0,1257,1330.0,163,Adult,Unknown,Healthy,Healthy,EUR,...,CV001_KM10202384-CV001_KM10202394,0.369143,3178.0,1257,Adult,AN3,AN3,Yoshida,T CD4 helper,Male
CV001_KM10202384-CV001_KM10202394_AAACCTGGTATAGTAG-1,CV001_KM10202384-CV001_KM10202394,4745.0,1477,1255.0,161,Adult,Unknown,Healthy,Healthy,EUR,...,CV001_KM10202384-CV001_KM10202394,0.785563,4745.0,1477,Adult,AN5,AN5,Yoshida,T CD8 naive,Female
CV001_KM10202384-CV001_KM10202394_AAACCTGGTGTGCGTC-1,CV001_KM10202384-CV001_KM10202394,1902.0,954,1711.0,166,Adult,Unknown,Healthy,Healthy,EUR,...,CV001_KM10202384-CV001_KM10202394,0.564174,1902.0,954,Adult,AN5,AN5,Yoshida,T CD4 naive,Female
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
CV001_KM9166642-CV001_KM9166650_TTTGTCAGTCGCATCG-1,CV001_KM9166642-CV001_KM9166650,3513.0,1297,513.0,137,Adolescent,24.7,Healthy,Healthy,EUR,...,CV001_KM9166642-CV001_KM9166650,0.343914,3513.0,1297,Adolescent,NP32,NP32,Yoshida,T CD8 naive,Male
CV001_KM9166642-CV001_KM9166650_TTTGTCAGTGTAAGTA-1,CV001_KM9166642-CV001_KM9166650,1888.0,1296,631.0,145,Adolescent,24.7,Healthy,Healthy,EUR,...,CV001_KM9166642-CV001_KM9166650,0.264285,1888.0,1296,Adolescent,NP32,NP32,Yoshida,NK IFN stim,Male
CV001_KM9166642-CV001_KM9166650_TTTGTCATCATGTCCC-1,CV001_KM9166642-CV001_KM9166650,1798.0,814,2227.0,169,Child,18.2,Healthy,Healthy,AFR,...,CV001_KM9166642-CV001_KM9166650,0.353094,1798.0,814,Child,NP31,NP31,Yoshida,Classical monocyte,Male
CV001_KM9166642-CV001_KM9166650_TTTGTCATCGAGGTAG-1,CV001_KM9166642-CV001_KM9166650,4407.0,1351,1014.0,153,Child,18.2,Healthy,Healthy,AFR,...,CV001_KM9166642-CV001_KM9166650,0.611991,4407.0,1351,Child,NP31,NP31,Yoshida,T CD8 naive,Male


In [8]:
pbmc

View of AnnData object with n_obs × n_vars = 173684 × 33559
    obs: 'orig.ident', 'nCount_RNA', 'nFeature_RNA', 'nCount_ADT', 'nFeature_ADT', 'Age_group', 'BMI', 'COVID_severity', 'COVID_status', 'Ethnicity', 'Group', 'Sex', 'Smoker', 'annotation_broad', 'annotation_detailed', 'annotation_detailed_fullNames', 'patient_id', 'sample_id', 'sequencing_library', 'Protein_modality_weight', 'n_counts', 'n_genes', 'age_group', 'donor', 'sample', 'cell_source', 'cell_states', 'gender'
    var: 'name'

In [9]:
del(pbmc.obs['orig.ident'])
del(pbmc.obs['nCount_RNA'])
del(pbmc.obs['nFeature_RNA'])
del(pbmc.obs['Age_group'])
del(pbmc.obs['patient_id'])
del(pbmc.obs['sample_id'])
del(pbmc.obs['annotation_detailed_fullNames'])
del(pbmc.obs['Sex'])
del(pbmc.obs['BMI'])
del(pbmc.obs['COVID_severity'])
del(pbmc.obs['Ethnicity'])
del(pbmc.obs['Group'])
del(pbmc.obs['Smoker'])
del(pbmc.obs['sequencing_library'])
del(pbmc.obs['Protein_modality_weight'])
del(pbmc.obs['nCount_ADT'])
del(pbmc.obs['nFeature_ADT'])
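The copy-then-delete pattern used for the PBMC metadata can also be written as one rename plus one drop on a plain `pandas` DataFrame. This is a minimal sketch on a two-row toy table, not the notebook's actual code path: `rename_map`, `unused`, and the cell names `cell_a` and `cell_b` are hypothetical, and only a few of the columns are shown.

```python
import pandas as pd

# Toy stand-in for a few columns of pbmc.obs (values taken from the table above).
obs = pd.DataFrame(
    {
        "nCount_RNA": [5493.0, 1001.0],
        "nFeature_RNA": [1767, 554],
        "patient_id": ["AN5", "PC9"],
        "Sex": ["Female", "Male"],
        "COVID_status": ["Healthy", "Post-COVID-19"],
        "BMI": ["Unknown", "27.17"],
    },
    index=["cell_a", "cell_b"],
)

# Old column name -> harmonised name shared with the heart data set.
rename_map = {
    "nCount_RNA": "n_counts",
    "nFeature_RNA": "n_genes",
    "patient_id": "donor",
    "Sex": "gender",
}
unused = ["BMI"]  # columns that are not needed in the merged reference

# Take an explicit copy of the filtered rows so later edits do not act on a view.
healthy = obs[obs["COVID_status"].isin(["Healthy"])].copy()
harmonised = healthy.rename(columns=rename_map).drop(columns=unused)

print(harmonised.columns.tolist())
```

Keeping the mapping in one dictionary makes it easier to see at a glance which source column feeds which harmonised column.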

In [10]:
pbmc.obs

index,COVID_status,annotation_broad,annotation_detailed,n_counts,n_genes,age_group,donor,sample,cell_source,cell_states,gender
CV001_KM10202384-CV001_KM10202394_AAACCTGAGGCAGGTT-1,Healthy,Monocyte,Monocyte CD14,5493.0,1767,Adult,AN5,AN5,Yoshida,Classical monocyte,Female
CV001_KM10202384-CV001_KM10202394_AAACCTGAGTGTCCCG-1,Healthy,T CD4+,T CD4 helper,4868.0,1577,Adult,AN5,AN5,Yoshida,T CD4 helper,Female
CV001_KM10202384-CV001_KM10202394_AAACCTGCAGATGGGT-1,Healthy,T CD4+,T CD4 helper,3178.0,1257,Adult,AN3,AN3,Yoshida,T CD4 helper,Male
CV001_KM10202384-CV001_KM10202394_AAACCTGGTATAGTAG-1,Healthy,T CD8+,T CD8 naive,4745.0,1477,Adult,AN5,AN5,Yoshida,T CD8 naive,Female
CV001_KM10202384-CV001_KM10202394_AAACCTGGTGTGCGTC-1,Healthy,T CD4+,T CD4 naive,1902.0,954,Adult,AN5,AN5,Yoshida,T CD4 naive,Female
...,...,...,...,...,...,...,...,...,...,...,...
CV001_KM9166642-CV001_KM9166650_TTTGTCAGTCGCATCG-1,Healthy,T CD8+,T CD8 naive,3513.0,1297,Adolescent,NP32,NP32,Yoshida,T CD8 naive,Male
CV001_KM9166642-CV001_KM9166650_TTTGTCAGTGTAAGTA-1,Healthy,NK,NK IFN stim,1888.0,1296,Adolescent,NP32,NP32,Yoshida,NK IFN stim,Male
CV001_KM9166642-CV001_KM9166650_TTTGTCATCATGTCCC-1,Healthy,Monocyte,Monocyte CD14,1798.0,814,Child,NP31,NP31,Yoshida,Classical monocyte,Male
CV001_KM9166642-CV001_KM9166650_TTTGTCATCGAGGTAG-1,Healthy,T CD8+,T CD8 naive,4407.0,1351,Child,NP31,NP31,Yoshida,T CD8 naive,Male


### Read in Human Cell Atlas Cardiac Leucocytes data set

Data downloaded from [here](https://cellgeni.cog.sanger.ac.uk/heartcellatlas/data/hca_heart_immune_download.h5ad)

In [11]:
heart = sc.read_h5ad('/Users/alex/data/ACM_cardiac_leuco/Reference_data/Heart/hca_heart_immune_download.h5ad')
heart

AnnData object with n_obs × n_vars = 40868 × 33538
    obs: 'NRP', 'age_group', 'cell_source', 'cell_states', 'donor', 'gender', 'n_counts', 'n_genes', 'percent_mito', 'percent_ribo', 'region', 'sample', 'scrublet_score', 'type', 'version', 'scNym', 'scNym_confidence'
    uns: 'cell_states_colors', 'scNym_colors', 'scNym_probabilities'
    obsm: 'X_pca', 'X_scnym', 'X_umap'

In [12]:
del(heart.obs['age_group'])
del(heart.obs['percent_mito'])
del(heart.obs['percent_ribo'])
del(heart.obs['scrublet_score'])
del(heart.obs['type'])
del(heart.obs['version'])

In [13]:
heart.obs

Unnamed: 0,NRP,cell_source,cell_states,donor,gender,n_counts,n_genes,region,sample,scNym,scNym_confidence
AAAGTGAAGTCGGCCT-1-H0015_apex,No,Harvard-Nuclei,CD4+T_cytox,H5,Female,724.717285,588,AX,H0015_apex,CD4+T_cell,0.797180
AAATGGAAGGTCCCTG-1-H0015_apex,No,Harvard-Nuclei,CD4+T_cytox,H5,Female,668.059509,515,AX,H0015_apex,CD4+T_cell,0.999248
AAATGGAGTTGTCTAG-1-H0015_apex,No,Harvard-Nuclei,doublets,H5,Female,670.216309,504,AX,H0015_apex,NK,0.680673
AACAACCGTAATTGGA-1-H0015_apex,No,Harvard-Nuclei,DOCK4+MØ1,H5,Female,730.082947,578,AX,H0015_apex,CD14+Monocyte,0.538159
AAGACTCTCAGGACGA-1-H0015_apex,No,Harvard-Nuclei,Mast,H5,Female,612.323425,428,AX,H0015_apex,Mast,0.990977
...,...,...,...,...,...,...,...,...,...,...,...
TTTGATCGTGTCATGT-1-HCAHeart8102862,Yes,Sanger-CD45,CD4+T_cytox,D11,Female,631.149170,715,AX,HCAHeart8102862,CD8+T_cell,0.756579
TTTGATCGTTCTCCTG-1-HCAHeart8102862,Yes,Sanger-CD45,LYVE1+MØ1,D11,Female,819.040100,2526,AX,HCAHeart8102862,CD8+T_cell,0.269561
TTTGGAGGTCGCTCGA-1-HCAHeart8102862,Yes,Sanger-CD45,MØ_AgP,D11,Female,757.455505,1350,AX,HCAHeart8102862,M3,0.585436
TTTGGTTTCAGTGTTG-1-HCAHeart8102862,Yes,Sanger-CD45,LYVE1+MØ3,D11,Female,815.372131,2507,AX,HCAHeart8102862,MØ,0.968681


In [14]:
X_is_raw(heart)

True

In [15]:
heart.obs['cell_source'].value_counts()

cell_source
Sanger-CD45       19587
Harvard-Nuclei    10536
Sanger-Nuclei      8567
Sanger-Cells       2178
Name: count, dtype: int64

### Merge datasets into one reference data set

In [16]:
#reference = pbmc.concat(heart, join = 'inner')
reference = pbmc.concatenate(heart, batch_key = 'cell_source', batch_categories = ['Yoshida', 'Sanger-CD45', 'Harvard-Nuclei', 'Sanger-Nuclei', 'Sanger-Cells'], join = 'inner') 
reference.obs


Unnamed: 0,COVID_status,annotation_broad,annotation_detailed,n_counts,n_genes,age_group,donor,sample,cell_source,cell_states,gender,NRP,region,scNym,scNym_confidence
CV001_KM10202384-CV001_KM10202394_AAACCTGAGGCAGGTT-1-Yoshida,Healthy,Monocyte,Monocyte CD14,5493.000000,1767,Adult,AN5,AN5,Yoshida,Classical monocyte,Female,,,,
CV001_KM10202384-CV001_KM10202394_AAACCTGAGTGTCCCG-1-Yoshida,Healthy,T CD4+,T CD4 helper,4868.000000,1577,Adult,AN5,AN5,Yoshida,T CD4 helper,Female,,,,
CV001_KM10202384-CV001_KM10202394_AAACCTGCAGATGGGT-1-Yoshida,Healthy,T CD4+,T CD4 helper,3178.000000,1257,Adult,AN3,AN3,Yoshida,T CD4 helper,Male,,,,
CV001_KM10202384-CV001_KM10202394_AAACCTGGTATAGTAG-1-Yoshida,Healthy,T CD8+,T CD8 naive,4745.000000,1477,Adult,AN5,AN5,Yoshida,T CD8 naive,Female,,,,
CV001_KM10202384-CV001_KM10202394_AAACCTGGTGTGCGTC-1-Yoshida,Healthy,T CD4+,T CD4 naive,1902.000000,954,Adult,AN5,AN5,Yoshida,T CD4 naive,Female,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
TTTGATCGTGTCATGT-1-HCAHeart8102862-Sanger-CD45,,,,631.149170,715,,D11,HCAHeart8102862,Sanger-CD45,CD4+T_cytox,Female,Yes,AX,CD8+T_cell,0.756579
TTTGATCGTTCTCCTG-1-HCAHeart8102862-Sanger-CD45,,,,819.040100,2526,,D11,HCAHeart8102862,Sanger-CD45,LYVE1+MØ1,Female,Yes,AX,CD8+T_cell,0.269561
TTTGGAGGTCGCTCGA-1-HCAHeart8102862-Sanger-CD45,,,,757.455505,1350,,D11,HCAHeart8102862,Sanger-CD45,MØ_AgP,Female,Yes,AX,M3,0.585436
TTTGGTTTCAGTGTTG-1-HCAHeart8102862-Sanger-CD45,,,,815.372131,2507,,D11,HCAHeart8102862,Sanger-CD45,LYVE1+MØ3,Female,Yes,AX,MØ,0.968681


In [17]:
reference

AnnData object with n_obs × n_vars = 214552 × 33514
    obs: 'COVID_status', 'annotation_broad', 'annotation_detailed', 'n_counts', 'n_genes', 'age_group', 'donor', 'sample', 'cell_source', 'cell_states', 'gender', 'NRP', 'region', 'scNym', 'scNym_confidence'
    var: 'name-Yoshida'

In [18]:
X_is_raw(reference)

True

In [19]:
reference.obs['seed_labels'] = reference.obs['cell_states'].copy()

In [20]:
reference.obs['cell_states'].cat.categories

Index(['AS-DC', 'B invariant', 'B naive', 'B naive IFN stim',
       'B non-switched mem', 'B non-switched mem IFN stim', 'B switched mem',
       'B_cells', 'Basophils & Eosinophils', 'CD4+T_cytox', 'CD4+T_tem',
       'CD8+T_cytox', 'CD8+T_tem', 'CD14+Mo', 'CD16+Mo', 'Classical monocyte',
       'Classical monocyte IFN stim', 'Classical monocyte IL6+', 'Cycling',
       'DC', 'DOCK4+MØ1', 'DOCK4+MØ2', 'Hematopoietic progenitors',
       'Hematopoietic progenitors IFN stim', 'IL17RA+Mo', 'ILC', 'LYVE1+MØ1',
       'LYVE1+MØ2', 'LYVE1+MØ3', 'MAIT', 'Mast', 'Mo_pi', 'MØ_AgP', 'MØ_mod',
       'NK', 'NK CD56 bright', 'NK IFN stim', 'NKT', 'Non-classical monocyte',
       'Non-classical monocyte IFN stim', 'Non-classical monocyte complement+',
       'NØ', 'Plasma cells', 'Plasmablasts', 'Platelets', 'Red blood cells',
       'T CD4 CTL', 'T CD4 helper', 'T CD4 naive', 'T CD4 naive IFN stim',
       'T CD8 CTL', 'T CD8 CTL IFN stim', 'T CD8 central mem',
       'T CD8 effector mem', 'T CD

In [21]:
trans_from = [['T CD4 naive IFN stim', 'T CD4 CTL', 'T CD4 helper','T CD4 naive','CD4+T_cytox', 'CD4+T_tem'],             #CD4
              ['T CD8 CTL IFN stim', 'MAIT', 'T CD8 CTL', 'T CD8 effector mem CD45RA+', 'T CD8 effector mem', 'T CD8 central mem', 'T CD8 naive', 'CD8+T_tem', 'CD8+T_cytox'],             #CD8
              ['T regulatory', 'T gamma/delta'],                                               #Treg
              ['NK IFN stim', 'NK CD56 bright', 'NK'],                                         #NK
              ['NKT'],                                                                         #NKT
              ['ILC'],                                                                         #ILC   
              ['Non-classical monocyte IFN stim', 'Classical monocyte IFN stim', 'Non-classical monocyte complement+', 'Non-classical monocyte', 'Classical monocyte IL6+', 'Classical monocyte', 'CD16+Mo', 'CD14+Mo', 'Mo_pi', 'IL17RA+Mo'],             #Monocytes
              ['DOCK4+MØ1','LYVE1+MØ1', 'LYVE1+MØ2', 'LYVE1+MØ3', 'DOCK4+MØ2', 'MØ_mod', 'MØ_AgP'],                #Macrophages
              ['B non-switched mem IFN stim', 'B naive IFN stim', 'B invariant', 'B switched mem', 'B non-switched mem', 'B naive', 'B_cells'],             #B cells
              ['Mast'],                                                                         #Mast_cells
              ['pDC'],                                                                         #pDCs
              ['cDC2', 'cDC1', 'AS-DC', 'DC'],                                                       #DC
              ['Plasmablasts', 'Plasma cells'],                                                #Plasma cells
              ['Hematopoietic progenitors IFN stim', 'Red blood cells', 'Cycling','Hematopoietic progenitors'],  #Hematopoetic
              ['Platelets'],                                                                   #Platelets
              ['Basophils & Eosinophils'],                                                      #Basophils_Eosinophils
              ['NØ'],                                                                           #Neutrophils
              ['doublets']]                                                                    #Doublets

trans_to = ['CD4+T', 'CD8+T', 'Treg', 'NK', 'NKT', 'ILC','Monocytes', 'Macrophages', 'B_cells', 'Mast_cells', 'pDC', 'DC', 'Plasma_cells', 'Hematopoetic', 'Platelets','Baso_Eosino', 'Neutrophils', 'Doublets']

# Start from the original cell states as plain strings, then overwrite each
# fine-grained state with its broad label. Using .loc avoids chained assignment.
reference.obs['seed_labels'] = reference.obs['cell_states'].astype(str)
for cell_states, celltype in zip(trans_from, trans_to):
    for cell_state in cell_states:
        reference.obs.loc[reference.obs['seed_labels'] == cell_state, 'seed_labels'] = celltype
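The grouping of fine-grained cell states into broad seed labels can also be expressed as a single dictionary lookup, which assigns the whole column in one step. This is a minimal sketch on a toy table, assuming shortened versions of `trans_from` and `trans_to`; `state_to_label` and the toy `obs` are hypothetical names introduced here.

```python
import pandas as pd

# Shortened stand-ins for the notebook's trans_from / trans_to lists.
trans_from = [["T CD4 naive", "CD4+T_tem"], ["NK IFN stim", "NK"], ["Mast"]]
trans_to = ["CD4+T", "NK", "Mast_cells"]

# Flatten the two parallel lists into one state -> broad label dictionary.
state_to_label = {
    state: label
    for states, label in zip(trans_from, trans_to)
    for state in states
}

obs = pd.DataFrame(
    {"cell_states": pd.Categorical(["T CD4 naive", "NK", "Mast", "CD4+T_tem", "pDC"])}
)

# Map every state in one pass; states without an entry keep their original name.
labels = obs["cell_states"].astype(str)
obs["seed_labels"] = labels.map(state_to_label).fillna(labels)

print(obs)
```

A dictionary also makes it straightforward to spot a state that was assigned to two groups by mistake, since the later entry would silently overwrite the earlier one and the dictionary length would be smaller than the number of listed states.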

In [22]:
reference.obs['seed_labels'].value_counts()

seed_labels
CD4+T           53575
CD8+T           42060
Monocytes       35482
B_cells         27327
NK              25499
Macrophages     14519
Treg             6434
DC               2259
NKT              1683
Mast_cells       1543
Hematopoetic     1534
pDC               706
Platelets         626
Doublets          623
Plasma_cells      352
ILC               199
Neutrophils       121
Baso_Eosino        10
Name: count, dtype: int64
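The counts printed in this notebook allow a quick consistency check: the seed label counts should add up to the number of cells in `reference`, which in turn should equal the healthy PBMC cells plus the heart cells, and every label should be one of the `trans_to` targets. The sketch below only re-uses numbers shown in the outputs above; the variable names are hypothetical.

```python
# Counts copied from the seed_labels value_counts() output.
seed_label_counts = {
    "CD4+T": 53575, "CD8+T": 42060, "Monocytes": 35482, "B_cells": 27327,
    "NK": 25499, "Macrophages": 14519, "Treg": 6434, "DC": 2259,
    "NKT": 1683, "Mast_cells": 1543, "Hematopoetic": 1534, "pDC": 706,
    "Platelets": 626, "Doublets": 623, "Plasma_cells": 352, "ILC": 199,
    "Neutrophils": 121, "Baso_Eosino": 10,
}

# Counts copied from the heart cell_source value_counts() output.
heart_source_counts = {
    "Sanger-CD45": 19587, "Harvard-Nuclei": 10536,
    "Sanger-Nuclei": 8567, "Sanger-Cells": 2178,
}

trans_to = ["CD4+T", "CD8+T", "Treg", "NK", "NKT", "ILC", "Monocytes",
            "Macrophages", "B_cells", "Mast_cells", "pDC", "DC", "Plasma_cells",
            "Hematopoetic", "Platelets", "Baso_Eosino", "Neutrophils", "Doublets"]

n_pbmc_healthy = 173684  # healthy PBMC cells after filtering
n_heart = 40868          # heart immune cells
n_reference = 214552     # cells in the merged object

# Labels that were not translated would show up here.
unmapped = set(seed_label_counts) - set(trans_to)

print(sum(seed_label_counts.values()) == n_reference)
print(sum(heart_source_counts.values()) == n_heart)
print(n_pbmc_healthy + n_heart == n_reference)
print(unmapped)
```

All three comparisons hold for the numbers shown, and no untranslated label remains, which suggests that every cell state present in the merged object was covered by `trans_from`.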

In [23]:
X_is_raw(reference)

True

## Save merged object

In [24]:
reference.write("/Users/alex/data/ACM_cardiac_leuco/Reference_data/Merged_healthy_reference_PBMC_Heart_ac240222.raw.h5ad")