# Notebook to download human heart data from 'CellxGene'

**Developed by**: Srivalli Kolla

**Created on**: 23 October 2024

**Last modified**: 23 October 2024

**Würzburg Institute for Systems Immunology & Julius-Maximilians-Universität Würzburg**

Link: https://cellxgene.cziscience.com/datasets

Env: cellxgene (Python 3.10)

# Importing packages

In [2]:
import cellxgene_census
import numpy as np
import pandas as pd
import scanpy as sc
import os
from datetime import datetime

timestamp = datetime.now().strftime("%d_%m_%Y")

# Data import and overview

## Querying the Census summary for the required tissue

In [3]:
census = cellxgene_census.open_soma()
summary_table = census["census_info"]["summary_cell_counts"].read().concat().to_pandas()

summary_table.query("organism == 'Homo sapiens' & category == 'tissue_general' & label == 'heart'")

The "stable" release is currently 2024-07-01. Specify 'census_version="2024-07-01"' in future calls to open_soma() to ensure data consistency.


| | soma_joinid | organism | category | label | ontology_term_id | total_cell_count | unique_cell_count |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 1147 | 1147 | Homo sapiens | tissue_general | heart | UBERON:0000948 | 3629952 | 1559974 |
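
As far as the Census documentation describes it, `unique_cell_count` counts only cells flagged `is_primary_data == True`, so it should match the number of rows returned by a `get_obs` query that applies the same filter. A minimal sketch of that arithmetic, re-typing the single summary row above into a small DataFrame (the variable names are illustrative and not part of the notebook):

```python
import pandas as pd

# The summary row for human heart, copied from the output above.
heart_summary = pd.DataFrame(
    {
        "organism": ["Homo sapiens"],
        "category": ["tissue_general"],
        "label": ["heart"],
        "total_cell_count": [3629952],
        "unique_cell_count": [1559974],
    }
)

row = heart_summary.query(
    "organism == 'Homo sapiens' and category == 'tissue_general' and label == 'heart'"
).iloc[0]

# Entries that are additional (non-primary) copies of a cell across datasets.
duplicate_cells = int(row["total_cell_count"] - row["unique_cell_count"])
primary_fraction = row["unique_cell_count"] / row["total_cell_count"]

print(f"non-primary copies: {duplicate_cells}")
print(f"primary fraction:   {primary_fraction:.3f}")
```

Fewer than half of the heart entries are primary data, which is why the `is_primary_data == True` filter matters when loading cells.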


## Loading data from census

In [4]:
heart_obs = cellxgene_census.get_obs(
    census,
    "homo_sapiens",
    value_filter="tissue_general == 'heart' and is_primary_data == True",
)
heart_obs

Unnamed: 0,soma_joinid,dataset_id,assay,assay_ontology_term_id,cell_type,cell_type_ontology_term_id,development_stage,development_stage_ontology_term_id,disease,disease_ontology_term_id,...,tissue,tissue_ontology_term_id,tissue_type,tissue_general,tissue_general_ontology_term_id,raw_sum,nnz,raw_mean_nnz,raw_variance_nnz,n_measured_vars
0,1702855,f15e263b-6544-46cb-a46e-e33ab7ce8347,10x 3' v3,EFO:0009922,immature innate lymphoid cell,CL:0001082,61-year-old human stage,HsapDv:0000155,myocardial infarction,MONDO:0005068,...,heart left ventricle,UBERON:0002084,tissue,heart,UBERON:0000948,4190.0,2093,2.001911,12.914910,20691
1,1702856,f15e263b-6544-46cb-a46e-e33ab7ce8347,10x 3' v3,EFO:0009922,fibroblast of cardiac tissue,CL:0002548,61-year-old human stage,HsapDv:0000155,myocardial infarction,MONDO:0005068,...,heart left ventricle,UBERON:0002084,tissue,heart,UBERON:0000948,6506.0,2750,2.365818,41.999268,20691
2,1702857,f15e263b-6544-46cb-a46e-e33ab7ce8347,10x 3' v3,EFO:0009922,fibroblast of cardiac tissue,CL:0002548,61-year-old human stage,HsapDv:0000155,myocardial infarction,MONDO:0005068,...,heart left ventricle,UBERON:0002084,tissue,heart,UBERON:0000948,2462.0,1420,1.733803,23.365312,20691
3,1702858,f15e263b-6544-46cb-a46e-e33ab7ce8347,10x 3' v3,EFO:0009922,immature innate lymphoid cell,CL:0001082,61-year-old human stage,HsapDv:0000155,myocardial infarction,MONDO:0005068,...,heart left ventricle,UBERON:0002084,tissue,heart,UBERON:0000948,1524.0,1067,1.428304,3.061224,20691
4,1702859,f15e263b-6544-46cb-a46e-e33ab7ce8347,10x 3' v3,EFO:0009922,cardiac muscle myoblast,CL:0000513,61-year-old human stage,HsapDv:0000155,myocardial infarction,MONDO:0005068,...,heart left ventricle,UBERON:0002084,tissue,heart,UBERON:0000948,6199.0,2001,3.097951,387.196401,20691
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1559969,66344455,f7c1c579-2dc0-47e2-ba19-8165c5a0e353,sci-RNA-seq,EFO:0010550,cardiac muscle cell,CL:0000746,12th week post-fertilization human stage,HsapDv:0000049,normal,PATO:0000461,...,heart,UBERON:0000948,tissue,heart,UBERON:0000948,633.0,485,1.305155,1.369498,44067
1559970,66344456,f7c1c579-2dc0-47e2-ba19-8165c5a0e353,sci-RNA-seq,EFO:0010550,cardiac muscle cell,CL:0000746,12th week post-fertilization human stage,HsapDv:0000049,normal,PATO:0000461,...,heart,UBERON:0000948,tissue,heart,UBERON:0000948,502.0,349,1.438395,1.953809,44067
1559971,66344457,f7c1c579-2dc0-47e2-ba19-8165c5a0e353,sci-RNA-seq,EFO:0010550,cardiac muscle cell,CL:0000746,12th week post-fertilization human stage,HsapDv:0000049,normal,PATO:0000461,...,heart,UBERON:0000948,tissue,heart,UBERON:0000948,459.0,364,1.260989,0.964755,44067
1559972,66344458,f7c1c579-2dc0-47e2-ba19-8165c5a0e353,sci-RNA-seq,EFO:0010550,endothelial cell of vascular tree,CL:0002139,12th week post-fertilization human stage,HsapDv:0000049,normal,PATO:0000461,...,heart,UBERON:0000948,tissue,heart,UBERON:0000948,247.0,220,1.122727,0.190349,44067


## Data overview

In [5]:
heart_obs['soma_joinid'].value_counts()

soma_joinid
66344459    1
1702855     1
1702856     1
1702857     1
1702858     1
           ..
1702864     1
1702865     1
1702866     1
1702867     1
1702868     1
Name: count, Length: 1559974, dtype: int64

In [6]:
heart_obs['dataset_id'].value_counts()

dataset_id
65badd7a-9262-4fd1-9ce2-eb5dc0ca8039    665955
d4e69e01-3ba2-4d6b-a15d-e7048f78f22e    486134
1c739a3e-c3f5-49d5-98e0-73975e751201    191795
f7c1c579-2dc0-47e2-ba19-8165c5a0e353    101749
4ed927e9-c099-49af-b8ce-a2652d069333     36574
                                         ...  
576f193c-75d0-4a11-bd25-8676587e6dc2         0
5829c7ba-697f-418e-8b98-d605b192dc48         0
58679288-9ecc-4647-9781-12a3a8f8c6fd         0
58c43cc2-e00e-43c4-94eb-8501369264e1         0
53bc5729-6202-4351-bc99-1f36139e9dc4         0
Name: count, Length: 678, dtype: int64
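
The zero-count rows appear because the metadata columns come back as pandas categoricals whose category list covers the whole Census (678 dataset IDs here), not only the heart subset. A small sketch on toy data, assuming only standard pandas behaviour, of how unused categories show up in `value_counts()` and how `cat.remove_unused_categories()` drops them (the toy `assay` series is made up for illustration):

```python
import pandas as pd

# Toy categorical column: four known categories, only two of them observed.
assay = pd.Series(
    pd.Categorical(
        ["10x 3' v3", "10x 3' v3", "sci-RNA-seq"],
        categories=["10x 3' v2", "10x 3' v3", "Smart-seq2", "sci-RNA-seq"],
    ),
    name="assay",
)

# Every category is reported, including those with no rows.
raw_counts = assay.value_counts()

# Dropping unused categories first leaves only the observed values.
trimmed_counts = assay.cat.remove_unused_categories().value_counts()

print(raw_counts)
print(trimmed_counts)
```

Applied to the notebook, the same call on `heart_obs['dataset_id']` would be expected to leave only the datasets that actually contribute heart cells.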

In [7]:
heart_obs['dataset_id'].unique()

['f15e263b-6544-46cb-a46e-e33ab7ce8347', '5500c673-1610-40a0-86d9-64d987ae50e6', '2adb1f8a-a6b1-4909-8ee8-484814e2d4bf', '4ed927e9-c099-49af-b8ce-a2652d069333', '1c739a3e-c3f5-49d5-98e0-73975e751201', 'd4e69e01-3ba2-4d6b-a15d-e7048f78f22e', 'd567b692-c374-4628-a508-8008f6778f22', '65badd7a-9262-4fd1-9ce2-eb5dc0ca8039', '53d208b0-2cfd-4366-9866-c3c6114081bc', 'f7c1c579-2dc0-47e2-ba19-8165c5a0e353']
Categories (678, object): ['0041b9c3-6a49-4bf7-8514-9bc7190067a7', '00476f9f-ebc1-4b72-b541-32f912ce36ea', '00e5dedd-b9b7-43be-8c28-b0e5c6414a62', '00ff600e-6e2e-4d76-846f-0eec4f0ae417', ..., 'fe4b89d5-461e-440c-a5a8-621b37b122c0', 'fe52003e-1460-4a65-a213-2bb1a508332f', 'ff45e623-7f5f-46e3-b47d-56be0341f66b', 'ff7d15fa-f4b6-4a0e-992e-fd0c9d088ded']

In [8]:
heart_obs['assay'].unique()

['10x 3' v3', '10x 3' v2', 'microwell-seq', 'Smart-seq2', 'sci-RNA-seq']
Categories (24, object): ['10x 3' transcription profiling', '10x 3' v1', '10x 3' v2', '10x 3' v3', ..., 'TruDrop', 'inDrop', 'microwell-seq', 'sci-RNA-seq']

In [9]:
heart_obs['disease'].unique()

['myocardial infarction', 'normal', 'dilated cardiomyopathy', 'arrhythmogenic right ventricular cardiomyopathy', 'non-compaction cardiomyopathy']
Categories (109, object): ['Alzheimer disease', 'B-cell acute lymphoblastic leukemia', 'B-cell non-Hodgkin lymphoma', 'Barrett esophagus', ..., 'tubular adenoma', 'tubulovillous adenoma', 'type 1 diabetes mellitus', 'type 2 diabetes mellitus']

In [10]:
heart_obs[["suspension_type"]].value_counts()

suspension_type
nucleus            1375447
cell                184527
Name: count, dtype: int64

In [11]:
heart_obs[["cell_type"]].value_counts()

cell_type                              
cardiac muscle cell                        284321
fibroblast of cardiac tissue               170070
mural cell                                 128390
regular ventricular cardiac myocyte        125289
endothelial cell                           118837
                                            ...  
enteroendocrine cell of colon                   0
enteroendocrine cell of small intestine         0
enucleate erythrocyte                           0
enucleated reticulocyte                         0
enteric smooth muscle cell                      0
Name: count, Length: 698, dtype: int64

In [12]:
heart_obs[["tissue"]].value_counts()

tissue                 
heart left ventricle       629658
heart right ventricle      262024
interventricular septum    259436
heart                      112532
apex of heart              110365
                            ...  
gonad                           0
gingiva                         0
gastrocnemius                   0
ganglionic eminence             0
hindgut                         0
Name: count, Length: 267, dtype: int64

In [13]:
heart_obs[["self_reported_ethnicity"]].value_counts()

self_reported_ethnicity                                             
European                                                                1336016
unknown                                                                  169212
Asian                                                                     32458
Hispanic or Latin American                                                11505
Han Chinese                                                               10783
African American or Afro-Caribbean                                            0
American                                                                      0
British                                                                       0
Bangladeshi                                                                   0
East Asian                                                                    0
Chinese                                                                       0
European American                                  

In [14]:
heart_obs[["donor_id"]].value_counts()

donor_id             
D6                       79650
D11                      48931
D2                       43143
H5                       38401
D7                       36134
                         ...  
Control_Participant2         0
Control_Participant19        0
Control_Participant18        0
Control_Participant17        0
Control_Participant9         0
Name: count, Length: 6603, dtype: int64

# Adding dataset information

## Collection of dataset information

In [15]:
census_datasets = (
    census["census_info"]["datasets"]
    .read(column_names=["collection_name", "dataset_title", "dataset_id", "soma_joinid"])
    .concat()
    .to_pandas()
)
census_datasets = census_datasets.set_index("dataset_id")
census_datasets

| dataset_id | collection_name | dataset_title | soma_joinid |
| --- | --- | --- | --- |
| 0895c838-e550-48a3-a777-dbcd35d30272 | Single-Cell, Single-Nucleus, and Spatial RNA S... | Healthy human liver: B cells | 0 |
| 00ff600e-6e2e-4d76-846f-0eec4f0ae417 | Single-cell analysis of human B cell maturatio... | Human tonsil nonlymphoid cells scRNA | 1 |
| bdacc907-7c26-419f-8808-969eab3ca2e8 | Molecular characterization of selectively vuln... | Molecular characterization of selectively vuln... | 2 |
| a5d95a42-0137-496f-8a60-101e17f263c8 | Single-cell Atlas of common variable immunodef... | Steady-state B cells - scRNA-seq | 3 |
| d3566d6a-a455-4a15-980f-45eb29114cab | Single-cell proteo-genomic reference maps of t... | blood and bone marrow from a healthy young donor | 4 |
| ... | ... | ... | ... |
| 0bce33ed-455c-4e12-93f8-b7b04a2de4a1 | A single-cell transcriptional timelapse of mou... | Whole dataset: Normalized subset 2 | 807 |
| c2876b1b-06d8-4d96-a56b-5304f815b99a | SEA-AD: Seattle Alzheimer’s Disease Brain Cell... | Whole Taxonomy - MTG: Seattle Alzheimer's Dise... | 808 |
| 6f7fd0f1-a2ed-4ff1-80d3-33dde731cbc3 | SEA-AD: Seattle Alzheimer’s Disease Brain Cell... | Whole Taxonomy - DLPFC: Seattle Alzheimer's Di... | 809 |
| dcfa2614-7ca7-4d82-814c-350626eccb26 | A single-cell transcriptional timelapse of mou... | Major cell cluster: Mesoderm | 810 |


## Addition of dataset information to our heart data

In [16]:
# Cells per dataset; pandas >= 2.0 already names the value_counts() column "count"
dataset_cell_counts = pd.DataFrame(heart_obs[["dataset_id"]].value_counts())
# Attach collection name, dataset title and dataset soma_joinid
dataset_cell_counts = dataset_cell_counts.merge(census_datasets, on="dataset_id")

dataset_cell_counts

| dataset_id | count | collection_name | dataset_title | soma_joinid |
| --- | --- | --- | --- | --- |
| 65badd7a-9262-4fd1-9ce2-eb5dc0ca8039 | 665955 | Pathogenic variants damage cell composition an... | DCM/ACM heart cell atlas: All cells | 776 |
| d4e69e01-3ba2-4d6b-a15d-e7048f78f22e | 486134 | Cells of the adult human heart | All — Cells of the adult human heart | 719 |
| 1c739a3e-c3f5-49d5-98e0-73975e751201 | 191795 | Spatial multi-omic map of human myocardial inf... | All-snRNA-Spatial multi-omic map of human myoc... | 674 |
| f7c1c579-2dc0-47e2-ba19-8165c5a0e353 | 101749 | A human cell atlas of fetal gene expression | Survey of human embryonic development | 799 |
| 4ed927e9-c099-49af-b8ce-a2652d069333 | 36574 | Single-nucleus cross-tissue molecular referenc... | Single-nucleus cross-tissue molecular referenc... | 644 |
| ... | ... | ... | ... | ... |
| 576f193c-75d0-4a11-bd25-8676587e6dc2 | 0 | HTAN MSK - Single cell profiling reveals novel... | Combined samples | 650 |
| 5829c7ba-697f-418e-8b98-d605b192dc48 | 0 | Single-cell genomic profiling of human dopamin... | Human Oligodendrocytes 10x scRNA-seq | 616 |
| 58679288-9ecc-4647-9781-12a3a8f8c6fd | 0 | Spatiotemporal analysis of human intestinal de... | Spatiotemporal analysis of human intestinal de... | 55 |
| 58c43cc2-e00e-43c4-94eb-8501369264e1 | 0 | SEA-AD: Seattle Alzheimer’s Disease Brain Cell... | OPC - DLPFC: Seattle Alzheimer's Disease Atlas... | 482 |
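
The count-then-merge step can be sketched on toy data as follows. This is a minimal illustration, not the notebook's exact code: the dataset IDs and titles are invented, and the explicit `rename`/`rename_axis` calls are an assumption made here so that the column and index names do not depend on the pandas version.

```python
import pandas as pd

# Toy stand-ins for heart_obs and census_datasets (IDs and titles are made up).
obs = pd.DataFrame({"dataset_id": ["ds_a", "ds_a", "ds_b", "ds_a"]})
datasets = pd.DataFrame(
    {
        "dataset_id": ["ds_a", "ds_b", "ds_c"],
        "dataset_title": ["Atlas A", "Atlas B", "Atlas C"],
        "soma_joinid": [10, 11, 12],
    }
).set_index("dataset_id")

cell_counts = (
    obs["dataset_id"]
    .value_counts()
    .rename("cell_counts")      # explicit column name, independent of pandas version
    .rename_axis("dataset_id")  # explicit index name for the join
    .to_frame()
)

# Index-on-index left join; ds_c has no cells in `obs`, so it never enters the result.
summary = cell_counts.join(datasets, how="left")
print(summary)
```

Because the toy `dataset_id` column is a plain string column rather than a categorical, no zero-count rows appear here, unlike in the table above.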


# Adding gene information

## Collection of gene information for all human genes

In [17]:
all_var = cellxgene_census.get_var(census, "homo_sapiens")
all_var

| | soma_joinid | feature_id | feature_name | feature_length | nnz | n_measured_obs |
| --- | --- | --- | --- | --- | --- | --- |
| 0 | 0 | ENSG00000000003 | TSPAN6 | 4530 | 4530448 | 73855064 |
| 1 | 1 | ENSG00000000005 | TNMD | 1476 | 236059 | 61201828 |
| 2 | 2 | ENSG00000000419 | DPM1 | 9276 | 17576462 | 74159149 |
| 3 | 3 | ENSG00000000457 | SCYL3 | 6883 | 9117322 | 73988868 |
| 4 | 4 | ENSG00000000460 | C1orf112 | 5970 | 6287794 | 73636201 |
| ... | ... | ... | ... | ... | ... | ... |
| 60525 | 60525 | ENSG00000288718 | ENSG00000288718.1 | 1070 | 4 | 1248980 |
| 60526 | 60526 | ENSG00000288719 | ENSG00000288719.1 | 4252 | 2826 | 1248980 |
| 60527 | 60527 | ENSG00000288724 | ENSG00000288724.1 | 625 | 36 | 1248980 |
| 60528 | 60528 | ENSG00000290791 | ENSG00000290791.1 | 3612 | 1642 | 43485 |


## Collection of gene information specific to our dataset based on 'soma_joinid'

In [18]:
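# Presence matrix: rows are indexed by dataset soma_joinid, columns by gene soma_joinid
presence_matrix = cellxgene_census.get_presence_matrix(census, "Homo sapiens", "RNA")
# Keep one row per dataset, in the same order as dataset_cell_counts
presence_matrix = presence_matrix[dataset_cell_counts.soma_joinid, :]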

In [19]:
presence_matrix.sum(axis=1).A1

array([30731, 31222, 28941, 44067, 29553, 27408, 20691, 56926, 26392,
       30996, 27705, 28599, 27393, 25576, 25246, 33741,   458, 46621,
       17538, 22372, 14446, 28244, 31312, 35385, 26265, 25431, 24646,
       55880, 40956, 39447, 28418, 26595, 43001, 35061, 23079, 33599,
       24981, 38618, 30716, 26588, 17374, 20191, 25680, 15407, 23659,
       40226, 42852, 25546, 30357, 30469, 33936, 46139, 35899, 22177,
       43525, 21194, 25315, 22779, 22392, 45463, 41821, 25295, 38541,
       56350, 30800, 33212, 24647, 35489, 33945, 25983, 27074, 36616,
       40773, 22288, 23658, 23172, 25022, 28588, 26187, 15979, 37854,
       23798, 45369, 25311, 33596, 26418, 21007, 34601, 21196, 19550,
       42410, 17104, 45530, 26624, 26365, 23586, 32622, 48704, 34480,
       46983, 27807, 52115, 34468, 34052, 28186, 24965, 42094, 20267,
       21908, 55818, 34078, 22643, 45950, 49063, 20901, 32391, 20982,
       51489, 25342,  3420, 41464, 29497, 18252, 42549, 35468, 32121,
       41775, 23493,

In [20]:
genes_measured = presence_matrix.sum(axis=1).A1
dataset_cell_counts["genes_measured"] = genes_measured
dataset_cell_counts

| dataset_id | count | collection_name | dataset_title | soma_joinid | genes_measured |
| --- | --- | --- | --- | --- | --- |
| 65badd7a-9262-4fd1-9ce2-eb5dc0ca8039 | 665955 | Pathogenic variants damage cell composition an... | DCM/ACM heart cell atlas: All cells | 776 | 30731 |
| d4e69e01-3ba2-4d6b-a15d-e7048f78f22e | 486134 | Cells of the adult human heart | All — Cells of the adult human heart | 719 | 31222 |
| 1c739a3e-c3f5-49d5-98e0-73975e751201 | 191795 | Spatial multi-omic map of human myocardial inf... | All-snRNA-Spatial multi-omic map of human myoc... | 674 | 28941 |
| f7c1c579-2dc0-47e2-ba19-8165c5a0e353 | 101749 | A human cell atlas of fetal gene expression | Survey of human embryonic development | 799 | 44067 |
| 4ed927e9-c099-49af-b8ce-a2652d069333 | 36574 | Single-nucleus cross-tissue molecular referenc... | Single-nucleus cross-tissue molecular referenc... | 644 | 29553 |
| ... | ... | ... | ... | ... | ... |
| 576f193c-75d0-4a11-bd25-8676587e6dc2 | 0 | HTAN MSK - Single cell profiling reveals novel... | Combined samples | 650 | 22397 |
| 5829c7ba-697f-418e-8b98-d605b192dc48 | 0 | Single-cell genomic profiling of human dopamin... | Human Oligodendrocytes 10x scRNA-seq | 616 | 31118 |
| 58679288-9ecc-4647-9781-12a3a8f8c6fd | 0 | Spatiotemporal analysis of human intestinal de... | Spatiotemporal analysis of human intestinal de... | 55 | 20649 |
| 58c43cc2-e00e-43c4-94eb-8501369264e1 | 0 | SEA-AD: Seattle Alzheimer’s Disease Brain Cell... | OPC - DLPFC: Seattle Alzheimer's Disease Atlas... | 482 | 33253 |


## Filtering genes matching our soma_joinid

In [21]:
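# Caution: dataset_cell_counts['soma_joinid'] holds dataset IDs, while all_var['soma_joinid']
# holds gene IDs, so this keeps the genes whose ID happens to coincide with a dataset ID
heart_var = all_var[all_var['soma_joinid'].isin(dataset_cell_counts['soma_joinid'])]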
heart_var

| | soma_joinid | feature_id | feature_name | feature_length | nnz | n_measured_obs |
| --- | --- | --- | --- | --- | --- | --- |
| 0 | 0 | ENSG00000000003 | TSPAN6 | 4530 | 4530448 | 73855064 |
| 1 | 1 | ENSG00000000005 | TNMD | 1476 | 236059 | 61201828 |
| 2 | 2 | ENSG00000000419 | DPM1 | 9276 | 17576462 | 74159149 |
| 3 | 3 | ENSG00000000457 | SCYL3 | 6883 | 9117322 | 73988868 |
| 4 | 4 | ENSG00000000460 | C1orf112 | 5970 | 6287794 | 73636201 |
| ... | ... | ... | ... | ... | ... | ... |
| 799 | 799 | ENSG00000059728 | MXD1 | 6351 | 16245940 | 74218366 |
| 801 | 801 | ENSG00000059769 | DNAJC25 | 2604 | 7161442 | 71248640 |
| 803 | 803 | ENSG00000059915 | PSD | 5979 | 11487910 | 73984771 |
| 808 | 808 | ENSG00000060339 | CCAR1 | 6059 | 25400719 | 73988868 |
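
The presence matrix links the two ID spaces used in this section: its rows are indexed by dataset `soma_joinid` and its columns by gene `soma_joinid`. A minimal NumPy sketch on toy data (the matrix values and the `heart_dataset_rows` selection are invented for illustration) of the row sums used for `genes_measured`, plus a column-wise selection of the genes measured in at least one chosen dataset:

```python
import numpy as np

# Toy presence matrix: rows = datasets, columns = genes (True = gene measured).
# In the notebook this role is played by the sparse matrix from get_presence_matrix.
presence = np.array(
    [
        [1, 1, 0, 0, 1],  # dataset 0
        [0, 1, 0, 0, 0],  # dataset 1
        [1, 0, 0, 1, 0],  # dataset 2 (pretend it contributes no heart cells)
    ],
    dtype=bool,
)

# Hypothetical selection: the datasets that contain at least one heart cell.
heart_dataset_rows = np.array([0, 1])
heart_presence = presence[heart_dataset_rows]

# Genes measured per dataset (row sums), analogous to presence_matrix.sum(axis=1).
genes_measured = heart_presence.sum(axis=1)

# Column indices (gene IDs) measured in at least one selected dataset.
heart_gene_ids = np.flatnonzero(heart_presence.any(axis=0))

print(genes_measured)
print(heart_gene_ids)
```

With a SciPy sparse matrix, a comparable mask could presumably be built from the column sums (`sum(axis=0)` greater than zero); that variant is not shown here because it was not run against the Census.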


# Fetching all human heart data from the Census

In [22]:
heart_sample_ids = heart_obs["soma_joinid"].to_numpy()
heart_gene_ids = heart_var["soma_joinid"].to_numpy()

In [23]:
heart_adata = cellxgene_census.get_anndata(
    census,
    organism="Homo sapiens",
    obs_coords=heart_sample_ids,
    var_coords=heart_gene_ids,
)

heart_adata

AnnData object with n_obs × n_vars = 1559974 × 678
    obs: 'soma_joinid', 'dataset_id', 'assay', 'assay_ontology_term_id', 'cell_type', 'cell_type_ontology_term_id', 'development_stage', 'development_stage_ontology_term_id', 'disease', 'disease_ontology_term_id', 'donor_id', 'is_primary_data', 'observation_joinid', 'self_reported_ethnicity', 'self_reported_ethnicity_ontology_term_id', 'sex', 'sex_ontology_term_id', 'suspension_type', 'tissue', 'tissue_ontology_term_id', 'tissue_type', 'tissue_general', 'tissue_general_ontology_term_id', 'raw_sum', 'nnz', 'raw_mean_nnz', 'raw_variance_nnz', 'n_measured_vars'
    var: 'soma_joinid', 'feature_id', 'feature_name', 'feature_length', 'nnz', 'n_measured_obs'

In [24]:
heart_adata.obs

Unnamed: 0,soma_joinid,dataset_id,assay,assay_ontology_term_id,cell_type,cell_type_ontology_term_id,development_stage,development_stage_ontology_term_id,disease,disease_ontology_term_id,...,tissue,tissue_ontology_term_id,tissue_type,tissue_general,tissue_general_ontology_term_id,raw_sum,nnz,raw_mean_nnz,raw_variance_nnz,n_measured_vars
0,1702855,f15e263b-6544-46cb-a46e-e33ab7ce8347,10x 3' v3,EFO:0009922,immature innate lymphoid cell,CL:0001082,61-year-old human stage,HsapDv:0000155,myocardial infarction,MONDO:0005068,...,heart left ventricle,UBERON:0002084,tissue,heart,UBERON:0000948,4190.0,2093,2.001911,12.914910,20691
1,1702856,f15e263b-6544-46cb-a46e-e33ab7ce8347,10x 3' v3,EFO:0009922,fibroblast of cardiac tissue,CL:0002548,61-year-old human stage,HsapDv:0000155,myocardial infarction,MONDO:0005068,...,heart left ventricle,UBERON:0002084,tissue,heart,UBERON:0000948,6506.0,2750,2.365818,41.999268,20691
2,1702857,f15e263b-6544-46cb-a46e-e33ab7ce8347,10x 3' v3,EFO:0009922,fibroblast of cardiac tissue,CL:0002548,61-year-old human stage,HsapDv:0000155,myocardial infarction,MONDO:0005068,...,heart left ventricle,UBERON:0002084,tissue,heart,UBERON:0000948,2462.0,1420,1.733803,23.365312,20691
3,1702858,f15e263b-6544-46cb-a46e-e33ab7ce8347,10x 3' v3,EFO:0009922,immature innate lymphoid cell,CL:0001082,61-year-old human stage,HsapDv:0000155,myocardial infarction,MONDO:0005068,...,heart left ventricle,UBERON:0002084,tissue,heart,UBERON:0000948,1524.0,1067,1.428304,3.061224,20691
4,1702859,f15e263b-6544-46cb-a46e-e33ab7ce8347,10x 3' v3,EFO:0009922,cardiac muscle myoblast,CL:0000513,61-year-old human stage,HsapDv:0000155,myocardial infarction,MONDO:0005068,...,heart left ventricle,UBERON:0002084,tissue,heart,UBERON:0000948,6199.0,2001,3.097951,387.196401,20691
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1559969,66344455,f7c1c579-2dc0-47e2-ba19-8165c5a0e353,sci-RNA-seq,EFO:0010550,cardiac muscle cell,CL:0000746,12th week post-fertilization human stage,HsapDv:0000049,normal,PATO:0000461,...,heart,UBERON:0000948,tissue,heart,UBERON:0000948,633.0,485,1.305155,1.369498,44067
1559970,66344456,f7c1c579-2dc0-47e2-ba19-8165c5a0e353,sci-RNA-seq,EFO:0010550,cardiac muscle cell,CL:0000746,12th week post-fertilization human stage,HsapDv:0000049,normal,PATO:0000461,...,heart,UBERON:0000948,tissue,heart,UBERON:0000948,502.0,349,1.438395,1.953809,44067
1559971,66344457,f7c1c579-2dc0-47e2-ba19-8165c5a0e353,sci-RNA-seq,EFO:0010550,cardiac muscle cell,CL:0000746,12th week post-fertilization human stage,HsapDv:0000049,normal,PATO:0000461,...,heart,UBERON:0000948,tissue,heart,UBERON:0000948,459.0,364,1.260989,0.964755,44067
1559972,66344458,f7c1c579-2dc0-47e2-ba19-8165c5a0e353,sci-RNA-seq,EFO:0010550,endothelial cell of vascular tree,CL:0002139,12th week post-fertilization human stage,HsapDv:0000049,normal,PATO:0000461,...,heart,UBERON:0000948,tissue,heart,UBERON:0000948,247.0,220,1.122727,0.190349,44067


In [25]:
heart_adata.var

| | soma_joinid | feature_id | feature_name | feature_length | nnz | n_measured_obs |
| --- | --- | --- | --- | --- | --- | --- |
| 0 | 0 | ENSG00000000003 | TSPAN6 | 4530 | 4530448 | 73855064 |
| 1 | 1 | ENSG00000000005 | TNMD | 1476 | 236059 | 61201828 |
| 2 | 2 | ENSG00000000419 | DPM1 | 9276 | 17576462 | 74159149 |
| 3 | 3 | ENSG00000000457 | SCYL3 | 6883 | 9117322 | 73988868 |
| 4 | 4 | ENSG00000000460 | C1orf112 | 5970 | 6287794 | 73636201 |
| ... | ... | ... | ... | ... | ... | ... |
| 673 | 799 | ENSG00000059728 | MXD1 | 6351 | 16245940 | 74218366 |
| 674 | 801 | ENSG00000059769 | DNAJC25 | 2604 | 7161442 | 71248640 |
| 675 | 803 | ENSG00000059915 | PSD | 5979 | 11487910 | 73984771 |
| 676 | 808 | ENSG00000060339 | CCAR1 | 6059 | 25400719 | 73988868 |


In [26]:
filtered_data = dataset_cell_counts[dataset_cell_counts.index.isin(heart_adata.obs['dataset_id'])]

filtered_data

| dataset_id | count | collection_name | dataset_title | soma_joinid | genes_measured |
| --- | --- | --- | --- | --- | --- |
| 65badd7a-9262-4fd1-9ce2-eb5dc0ca8039 | 665955 | Pathogenic variants damage cell composition an... | DCM/ACM heart cell atlas: All cells | 776 | 30731 |
| d4e69e01-3ba2-4d6b-a15d-e7048f78f22e | 486134 | Cells of the adult human heart | All — Cells of the adult human heart | 719 | 31222 |
| 1c739a3e-c3f5-49d5-98e0-73975e751201 | 191795 | Spatial multi-omic map of human myocardial inf... | All-snRNA-Spatial multi-omic map of human myoc... | 674 | 28941 |
| f7c1c579-2dc0-47e2-ba19-8165c5a0e353 | 101749 | A human cell atlas of fetal gene expression | Survey of human embryonic development | 799 | 44067 |
| 4ed927e9-c099-49af-b8ce-a2652d069333 | 36574 | Single-nucleus cross-tissue molecular referenc... | Single-nucleus cross-tissue molecular referenc... | 644 | 29553 |
| 5500c673-1610-40a0-86d9-64d987ae50e6 | 30889 | Integrated adult and foetal heart single-cell ... | Integrated adult and foetal hearts | 442 | 27408 |
| f15e263b-6544-46cb-a46e-e33ab7ce8347 | 19722 | Spatial multi-omic map of human myocardial inf... | Ischemia-snRNA-Spatial multi-omic map of human... | 233 | 20691 |
| 53d208b0-2cfd-4366-9866-c3c6114081bc | 16372 | Tabula Sapiens | Tabula Sapiens - All Cells | 785 | 56926 |
| 2adb1f8a-a6b1-4909-8ee8-484814e2d4bf | 10783 | Construction of a human cell landscape at sing... | Construction of a human cell landscape at sing... | 642 | 26392 |
| d567b692-c374-4628-a508-8008f6778f22 | 1 | Spatially resolved multiomics of human cardiac... | Combined single cell and single nuclei RNA-Seq... | 760 | 30996 |


In [27]:
heart_adata.obs = heart_adata.obs.merge(filtered_data, left_on='dataset_id', right_index=True, how='left')

heart_adata

AnnData object with n_obs × n_vars = 1559974 × 678
    obs: 'soma_joinid_x', 'dataset_id', 'assay', 'assay_ontology_term_id', 'cell_type', 'cell_type_ontology_term_id', 'development_stage', 'development_stage_ontology_term_id', 'disease', 'disease_ontology_term_id', 'donor_id', 'is_primary_data', 'observation_joinid', 'self_reported_ethnicity', 'self_reported_ethnicity_ontology_term_id', 'sex', 'sex_ontology_term_id', 'suspension_type', 'tissue', 'tissue_ontology_term_id', 'tissue_type', 'tissue_general', 'tissue_general_ontology_term_id', 'raw_sum', 'nnz', 'raw_mean_nnz', 'raw_variance_nnz', 'n_measured_vars', 'count', 'collection_name', 'dataset_title', 'soma_joinid_y', 'genes_measured'
    var: 'soma_joinid', 'feature_id', 'feature_name', 'feature_length', 'nnz', 'n_measured_obs'
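
The `soma_joinid_x` and `soma_joinid_y` columns arise because both sides of the merge carry a `soma_joinid` column (cell-level in `obs`, dataset-level in `filtered_data`), and pandas disambiguates the clash with its default suffixes. A small sketch on toy data of that behaviour, and of one possible way to avoid it by renaming the dataset-level column first (the name `dataset_soma_joinid` is a suggestion made here, not something the notebook uses):

```python
import pandas as pd

# Toy cell-level table and dataset-level table sharing a "soma_joinid" column.
obs = pd.DataFrame(
    {"soma_joinid": [100, 101, 102], "dataset_id": ["ds_a", "ds_a", "ds_b"]}
)
dataset_info = pd.DataFrame(
    {"soma_joinid": [7, 8], "dataset_title": ["Atlas A", "Atlas B"]},
    index=pd.Index(["ds_a", "ds_b"], name="dataset_id"),
)

# Default behaviour: the shared column name is split into _x (left) and _y (right).
default_merge = obs.merge(
    dataset_info, left_on="dataset_id", right_index=True, how="left"
)

# Renaming the dataset-level column first keeps the cell-level soma_joinid intact.
renamed_merge = obs.merge(
    dataset_info.rename(columns={"soma_joinid": "dataset_soma_joinid"}),
    left_on="dataset_id",
    right_index=True,
    how="left",
)

print(list(default_merge.columns))
print(list(renamed_merge.columns))
```

Keeping the original `soma_joinid` name on the cell side may make it easier to re-query the Census by cell ID later.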

In [28]:
heart_adata.obs['collection_name'].value_counts()

collection_name
Pathogenic variants damage cell composition and single cell transcription in cardiomyopathies    665955
Cells of the adult human heart                                                                   486134
Spatial multi-omic map of human myocardial infarction                                            211517
A human cell atlas of fetal gene expression                                                      101749
Single-nucleus cross-tissue molecular reference maps to decipher disease gene function            36574
Integrated adult and foetal heart single-cell RNA sequencing                                      30889
Tabula Sapiens                                                                                    16372
Construction of a human cell landscape at single-cell level                                       10783
Spatially resolved multiomics of human cardiac niches                                                 1
Name: count, dtype: int64
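
The per-collection totals can be cross-checked against the per-dataset counts: the myocardial infarction collection contributes two datasets, so its total is the sum of both. A short sketch that re-types the counts and collection names from the outputs above and aggregates them (the DataFrame name `per_dataset` is illustrative):

```python
import pandas as pd

# Per-dataset cell counts with their collection names, copied from the outputs above.
mi = "Spatial multi-omic map of human myocardial infarction"
per_dataset = pd.DataFrame(
    {
        "count": [665955, 486134, 191795, 101749, 36574, 30889, 19722, 16372, 10783, 1],
        "collection_name": [
            "Pathogenic variants damage cell composition and single cell transcription in cardiomyopathies",
            "Cells of the adult human heart",
            mi,
            "A human cell atlas of fetal gene expression",
            "Single-nucleus cross-tissue molecular reference maps to decipher disease gene function",
            "Integrated adult and foetal heart single-cell RNA sequencing",
            mi,
            "Tabula Sapiens",
            "Construction of a human cell landscape at single-cell level",
            "Spatially resolved multiomics of human cardiac niches",
        ],
    }
)

# Sum cells over the datasets of each collection, largest first.
per_collection = (
    per_dataset.groupby("collection_name")["count"].sum().sort_values(ascending=False)
)

print(per_collection)
print("total cells:", int(per_collection.sum()))
```

The grand total equals the 1559974 observations in `heart_adata`, so every cell is assigned to exactly one collection.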

In [29]:
heart_adata.obs['dataset_title'].value_counts()

dataset_title
DCM/ACM heart cell atlas: All cells                                                       665955
All — Cells of the adult human heart                                                      486134
All-snRNA-Spatial multi-omic map of human myocardial infarction                           191795
Survey of human embryonic development                                                     101749
Single-nucleus cross-tissue molecular reference maps to decipher disease gene function     36574
Integrated adult and foetal hearts                                                         30889
Ischemia-snRNA-Spatial multi-omic map of human myocardial infarction                       19722
Tabula Sapiens - All Cells                                                                 16372
Construction of a human cell landscape at single-cell level                                10783
Combined single cell and single nuclei RNA-Seq data - Heart Global                             1
Name: count, dtype: int64

In [39]:
heart_adata.obs[['dataset_title','disease','suspension_type']].value_counts()

dataset_title                                                                           disease                                          suspension_type
DCM/ACM heart cell atlas: All cells                                                     dilated cardiomyopathy                           nucleus            482581
All — Cells of the adult human heart                                                    normal                                           nucleus            359652
All-snRNA-Spatial multi-omic map of human myocardial infarction                         myocardial infarction                            nucleus            150132
All — Cells of the adult human heart                                                    normal                                           cell               126482
DCM/ACM heart cell atlas: All cells                                                     arrhythmogenic right ventricular cardiomyopathy  nucleus            104496
Survey of human embryonic develo

## Closing the Census

In [31]:
census.close()
del census

# Saving the data

In [32]:
heart_adata

AnnData object with n_obs × n_vars = 1559974 × 678
    obs: 'soma_joinid_x', 'dataset_id', 'assay', 'assay_ontology_term_id', 'cell_type', 'cell_type_ontology_term_id', 'development_stage', 'development_stage_ontology_term_id', 'disease', 'disease_ontology_term_id', 'donor_id', 'is_primary_data', 'observation_joinid', 'self_reported_ethnicity', 'self_reported_ethnicity_ontology_term_id', 'sex', 'sex_ontology_term_id', 'suspension_type', 'tissue', 'tissue_ontology_term_id', 'tissue_type', 'tissue_general', 'tissue_general_ontology_term_id', 'raw_sum', 'nnz', 'raw_mean_nnz', 'raw_variance_nnz', 'n_measured_vars', 'count', 'collection_name', 'dataset_title', 'soma_joinid_y', 'genes_measured'
    var: 'soma_joinid', 'feature_id', 'feature_name', 'feature_length', 'nnz', 'n_measured_obs'

In [33]:
heart_adata.write(f'../cellxgene/data/cellxgene_heart_data_sk_{timestamp}.h5ad')
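
The output file name embeds the `%d_%m_%Y` timestamp defined at the top of the notebook. A minimal sketch, using only the standard library, of how that path is assembled and how the target directory could be created beforehand in case it does not exist yet (the helper `build_output_path` and the temporary demo directory are assumptions for illustration, not part of the notebook):

```python
import tempfile
from datetime import datetime
from pathlib import Path


def build_output_path(base_dir: Path, when: datetime) -> Path:
    """Return the timestamped .h5ad path, creating the directory if needed."""
    base_dir.mkdir(parents=True, exist_ok=True)
    timestamp = when.strftime("%d_%m_%Y")
    return base_dir / f"cellxgene_heart_data_sk_{timestamp}.h5ad"


# Demo in a throwaway directory, using the notebook's creation date.
demo_root = Path(tempfile.mkdtemp())
out_path = build_output_path(demo_root / "cellxgene" / "data", datetime(2024, 10, 23))

print(out_path.name)
```

In the notebook itself, the resulting path would be passed to `heart_adata.write(...)` with `datetime.now()` in place of the fixed date.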