### Notebook to create a merged object of diseased and healthy heart left ventricles (LVs)

- **Developed by**: Carlos Talavera-López Ph.D
- **Institute of AI for Health, Helmholtz Zentrum München**
- v210830

### Load required modules

In [1]:
import anndata
import numpy as np
import pandas as pd
import scanpy as sc

### Set up working environment

In [2]:
sc.settings.verbosity = 3
sc.logging.print_versions()
sc.settings.set_figure_params(dpi = 200, color_map = 'RdPu', dpi_save = 300, vector_friendly = True, format = 'svg')

The `sinfo` package has changed name and is now called `session_info` to become more discoverable and self-explanatory. The `sinfo` PyPI package will be kept around to avoid breaking old installs and you can downgrade to 0.3.2 if you want to use it without seeing this message. For the latest features and bug fixes, please install `session_info` instead. The usage and defaults also changed slightly, so please review the latest README at https://gitlab.com/joelostblom/session_info.
-----
anndata     0.7.6
scanpy      1.8.1
sinfo       0.3.4
-----
PIL                 8.2.0
anyio               NA
appnope             0.1.2
attr                20.3.0
babel               2.9.0
backcall            0.2.0
bottleneck          1.3.2
brotli              NA
cairo               1.20.0
certifi             2020.12.05
cffi                1.14.5
chardet             4.0.0
cloudpickle         1.6.0
colorama            0.4.4
cycler              0.10.0
cython_runtime      NA
cytoolz             0.11.0
dask  

### Read in healthy heart

In [3]:
healthy_heart = sc.read_h5ad('/Volumes/Bf110/ct5/raw_data/single_cell/heart/hca_heart_global_ctl210226.h5ad') 
healthy_heart

AnnData object with n_obs × n_vars = 452506 × 22260
    obs: 'NRP', 'age_group', 'cell_source', 'cell_type', 'donor', 'gender', 'n_counts', 'n_genes', 'percent_mito', 'percent_ribo', 'region', 'sample', 'scrublet_score', 'source', 'type', 'version', 'cell_states', 'Used', 'Cells_Nuclei', 'combined'
    var: 'gene_ids-Harvard-Nuclei-full', 'feature_types-Harvard-Nuclei-full', 'gene_ids-Sanger-Nuclei-full', 'feature_types-Sanger-Nuclei-full', 'gene_ids-Sanger-Cells-full', 'feature_types-Sanger-Cells-full', 'gene_ids-Sanger-CD45-full', 'feature_types-Sanger-CD45-full', 'n_cells-myeloid', 'n_counts-myeloid'

### Select only Left Ventricle (LV)

In [4]:
healthy_LV = healthy_heart[healthy_heart.obs['region'].isin(['LV'])]
healthy_LV

View of AnnData object with n_obs × n_vars = 99487 × 22260
    obs: 'NRP', 'age_group', 'cell_source', 'cell_type', 'donor', 'gender', 'n_counts', 'n_genes', 'percent_mito', 'percent_ribo', 'region', 'sample', 'scrublet_score', 'source', 'type', 'version', 'cell_states', 'Used', 'Cells_Nuclei', 'combined'
    var: 'gene_ids-Harvard-Nuclei-full', 'feature_types-Harvard-Nuclei-full', 'gene_ids-Sanger-Nuclei-full', 'feature_types-Sanger-Nuclei-full', 'gene_ids-Sanger-Cells-full', 'feature_types-Sanger-Cells-full', 'gene_ids-Sanger-CD45-full', 'feature_types-Sanger-CD45-full', 'n_cells-myeloid', 'n_counts-myeloid'

### Select only nuclei

In [5]:
healthy_LV_sn = healthy_LV[healthy_LV.obs['cell_source'].isin(['Sanger-Nuclei', 'Harvard-Nuclei'])]
healthy_LV_sn

View of AnnData object with n_obs × n_vars = 82806 × 22260
    obs: 'NRP', 'age_group', 'cell_source', 'cell_type', 'donor', 'gender', 'n_counts', 'n_genes', 'percent_mito', 'percent_ribo', 'region', 'sample', 'scrublet_score', 'source', 'type', 'version', 'cell_states', 'Used', 'Cells_Nuclei', 'combined'
    var: 'gene_ids-Harvard-Nuclei-full', 'feature_types-Harvard-Nuclei-full', 'gene_ids-Sanger-Nuclei-full', 'feature_types-Sanger-Nuclei-full', 'gene_ids-Sanger-Cells-full', 'feature_types-Sanger-Cells-full', 'gene_ids-Sanger-CD45-full', 'feature_types-Sanger-CD45-full', 'n_cells-myeloid', 'n_counts-myeloid'
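The two selection steps above (region, then cell source) can also be expressed as a single boolean mask. The sketch below uses a small, made-up pandas table as a stand-in for `healthy_heart.obs`; the cell names and rows are illustrative assumptions, not data from the real object.

```python
import pandas as pd

# Toy stand-in for `healthy_heart.obs`; all rows are invented for illustration.
obs = pd.DataFrame(
    {
        "region": ["LV", "LV", "RV", "LV", "AX"],
        "cell_source": [
            "Sanger-Nuclei",
            "Sanger-Cells",
            "Harvard-Nuclei",
            "Harvard-Nuclei",
            "Sanger-CD45",
        ],
    },
    index=["cell_0", "cell_1", "cell_2", "cell_3", "cell_4"],
)

# One mask per criterion, combined with a logical AND.
is_lv = obs["region"].isin(["LV"])
is_nuclei = obs["cell_source"].isin(["Sanger-Nuclei", "Harvard-Nuclei"])
keep = is_lv & is_nuclei

selected = obs[keep]
print(selected.index.tolist())
```

A mask built this way from the real `.obs` table could presumably be used to index the AnnData object directly, followed by `.copy()` if an independent object rather than a view is wanted.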

### Read in damaged heart

In [6]:
damaged_heart = sc.read_h5ad('/Volumes/Bf110/ct5/raw_data/single_cell/heart/Heart_iCell8_GSE121893_HF_ctl200512.RAW.h5ad') 
damaged_heart



AnnData object with n_obs × n_vars = 4933 × 25742
    obs: ' ≈ß≈çID', 'Barcode', 'Type', 'Individual', 'Age', 'Gender', 'Dispense.Order', 'X384.Well.Plate.Location', 'Chip.Row.ID', 'Chip.Column.ID', 'Image.ID', 'Barcode.Read.Pairs', 'Distinct.UMIs', 'ERCC.Read.Pairs', 'Trimmed.Read.Pairs', 'NoContam.Read.Pairs', 'Mitochondria.Alignments', 'Mitochondria.Read.Pairs', 'Total.Barcode.Alignments', 'Distinct.Genes.w..Alignments', 'Distinct.Gene.UMI.Combos', 'Aligned', 'Assigned', 'Ambiguity', 'Chimera', 'Duplicate', 'FragementLength', 'MappingQuality', 'MultiMapping', 'NoFeatures', 'Nonjunction', 'Secondary', 'Unmapped', 'mito.perc', 'CellType'

In [7]:
damaged_heart.obs['Type'].cat.categories

Index(['HF_LA_CM', 'HF_LA_NCM', 'HF_LV_CM', 'HF_LV_NCM', 'N_LA_CM', 'N_LA_NCM',
       'N_LV_CM', 'N_LV_NCM'],
      dtype='object')

In [8]:
damaged_LV = damaged_heart[damaged_heart.obs['Type'].isin(['HF_LV_CM', 'HF_LV_NCM','N_LV_CM', 'N_LV_NCM'])]
damaged_LV

View of AnnData object with n_obs × n_vars = 1942 × 25742
    obs: ' ≈ß≈çID', 'Barcode', 'Type', 'Individual', 'Age', 'Gender', 'Dispense.Order', 'X384.Well.Plate.Location', 'Chip.Row.ID', 'Chip.Column.ID', 'Image.ID', 'Barcode.Read.Pairs', 'Distinct.UMIs', 'ERCC.Read.Pairs', 'Trimmed.Read.Pairs', 'NoContam.Read.Pairs', 'Mitochondria.Alignments', 'Mitochondria.Read.Pairs', 'Total.Barcode.Alignments', 'Distinct.Genes.w..Alignments', 'Distinct.Gene.UMI.Combos', 'Aligned', 'Assigned', 'Ambiguity', 'Chimera', 'Duplicate', 'FragementLength', 'MappingQuality', 'MultiMapping', 'NoFeatures', 'Nonjunction', 'Secondary', 'Unmapped', 'mito.perc', 'CellType'

### Standardise labels for both objects

In [11]:
damaged_LV.obs['Individual'].cat.categories

Index(['C1', 'C2', 'D1', 'D2', 'D4', 'D5', 'N13', 'N14'], dtype='object')

In [None]:
damaged_LV.obs['cell_states'] = damaged_LV.obs['CellType']
damaged_LV.obs['donor'] = damaged_LV.obs['Individual']
# 'iCell8' is the platform name, not an existing column, so assign it as a constant label
damaged_LV.obs['cell_source'] = 'iCell8'
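After standardising, it can help to confirm which label columns the two objects actually share before merging. This is a minimal sketch on toy pandas tables; the donor and cell-state values (`H1`, `vCM1`, `CM`) are placeholders chosen for illustration, and only the column names come from the objects above.

```python
import pandas as pd

# Toy stand-ins for the two `.obs` tables; the values are placeholders.
healthy_obs = pd.DataFrame(
    {"donor": ["H1"], "cell_source": ["Sanger-Nuclei"], "cell_states": ["vCM1"]}
)
damaged_obs = pd.DataFrame({"Individual": ["D1"], "CellType": ["CM"]})

# Same harmonisation as in the cell above.
damaged_obs["cell_states"] = damaged_obs["CellType"]
damaged_obs["donor"] = damaged_obs["Individual"]
damaged_obs["cell_source"] = "iCell8"

# Columns present in both tables will be fully populated after the merge.
shared = sorted(set(healthy_obs.columns) & set(damaged_obs.columns))
print(shared)
```

Columns that exist in only one table would still appear in the merged `.obs`, but with missing values for the cells of the other dataset.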

### Clean up labels

In [12]:
# Technical and QC columns from the iCell8 metadata that are not needed downstream
unused_columns = [
    ' ≈ß≈çID', 'Dispense.Order', 'X384.Well.Plate.Location',
    'Chip.Row.ID', 'Chip.Column.ID', 'Image.ID',
    'Barcode.Read.Pairs', 'Distinct.UMIs', 'ERCC.Read.Pairs',
    'Trimmed.Read.Pairs', 'NoContam.Read.Pairs',
    'Mitochondria.Alignments', 'Mitochondria.Read.Pairs',
    'Total.Barcode.Alignments', 'Distinct.Genes.w..Alignments',
    'Distinct.Gene.UMI.Combos', 'Aligned', 'Assigned', 'Ambiguity',
    'Chimera', 'Duplicate', 'FragementLength', 'MappingQuality',
    'MultiMapping', 'NoFeatures', 'Nonjunction', 'Secondary',
    'Unmapped', 'mito.perc', 'CellType',
]
for column in unused_columns:
    del damaged_LV.obs[column]
damaged_LV

View of AnnData object with n_obs × n_vars = 1942 × 25742
    obs: 'Barcode', 'Type', 'Individual', 'Age', 'Gender'
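An alternative to deleting columns one by one is to state which columns to keep. The sketch below shows this on a toy pandas table with a handful of the column names above; the row values are invented, and whether a keep-list or a drop-list is preferable depends on how stable the input metadata is.

```python
import pandas as pd

# Toy stand-in for `damaged_LV.obs` with a few of its columns; values are invented.
obs = pd.DataFrame(
    {
        "Barcode": ["AAAC", "AAAG"],
        "Type": ["HF_LV_CM", "N_LV_NCM"],
        "Individual": ["D1", "N13"],
        "Chimera": [0, 1],
        "mito.perc": [0.1, 0.2],
    }
)

# Columns to retain; anything not listed is discarded.
keep_columns = ["Barcode", "Type", "Individual"]
cleaned = obs[keep_columns].copy()
print(cleaned.columns.tolist())
```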

### Merge both datasets

In [13]:
heart = healthy_LV_sn.concatenate(damaged_LV, batch_key = 'state', batch_categories = ['healthy', 'damaged'], join = 'inner')
heart

AnnData object with n_obs × n_vars = 84748 × 15224
    obs: 'NRP', 'age_group', 'cell_source', 'cell_type', 'donor', 'gender', 'n_counts', 'n_genes', 'percent_mito', 'percent_ribo', 'region', 'sample', 'scrublet_score', 'source', 'type', 'version', 'cell_states', 'Used', 'Cells_Nuclei', 'combined', 'Barcode', 'Type', 'Individual', 'Age', 'Gender', 'state'
    var: 'gene_ids-Harvard-Nuclei-full-healthy', 'feature_types-Harvard-Nuclei-full-healthy', 'gene_ids-Sanger-Nuclei-full-healthy', 'feature_types-Sanger-Nuclei-full-healthy', 'gene_ids-Sanger-Cells-full-healthy', 'feature_types-Sanger-Cells-full-healthy', 'gene_ids-Sanger-CD45-full-healthy', 'feature_types-Sanger-CD45-full-healthy', 'n_cells-myeloid-healthy', 'n_counts-myeloid-healthy'
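With `join = 'inner'`, only genes present in both objects are kept, which is why the merged object has 15224 genes although the inputs have 22260 and 25742. The sketch below illustrates that intersection on two short, made-up gene lists; it is a way to preview the overlap, not a replacement for the merge itself.

```python
import pandas as pd

# Toy stand-ins for `healthy_LV_sn.var_names` and `damaged_LV.var_names`.
healthy_genes = pd.Index(["TTN", "MYH7", "NPPA", "ACTB"])
damaged_genes = pd.Index(["MYH7", "ACTB", "GAPDH"])

# An inner join keeps only the genes found in both indices.
shared_genes = sorted(healthy_genes.intersection(damaged_genes))
print(shared_genes)
print(len(shared_genes), "of", len(healthy_genes), "healthy genes retained")
```

Running the same intersection on the real `var_names` before merging would show how many genes an inner join is going to discard from each dataset.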

### Create combined label

In [None]:
# Build the label from the merged object itself; the column is 'version' (lowercase)
heart.obs['combined'] = [str(donor) + str(version) for donor, version in zip(heart.obs['donor'], heart.obs['version'])]
heart

### Export merged object

In [18]:
heart.write('/Users/carlos.lopez/INBOX/heart/heart_LV.10Xsn-iCell8.healthy-diseased.ctl210830.raw.h5ad')

... storing 'NRP' as categorical
... storing 'age_group' as categorical
... storing 'cell_source' as categorical
... storing 'cell_type' as categorical
... storing 'donor' as categorical
... storing 'gender' as categorical
... storing 'region' as categorical
... storing 'sample' as categorical
... storing 'source' as categorical
... storing 'type' as categorical
... storing 'version' as categorical
... storing 'cell_states' as categorical
... storing 'Used' as categorical
... storing 'Cells_Nuclei' as categorical
... storing 'combined' as categorical
... storing ' ≈ß≈çID' as categorical
... storing 'Barcode' as categorical
... storing 'Type' as categorical
... storing 'Individual' as categorical
... storing 'Gender' as categorical
... storing 'X384.Well.Plate.Location' as categorical
... storing 'CellType' as categorical
