# Querying and fetching data from the Cell Census

The Cell Census is a versioned container for the single-cell data hosted at [CELLxGENE Discover](https://cellxgene.cziscience.com/). The Cell Census utilizes [SOMA](https://github.com/single-cell-data/SOMA/blob/main/abstract_specification.md) powered by [TileDB](https://tiledb.com/products/tiledb-embedded) for storing, accessing, and efficiently filtering data.

TBD

# Contents
TBD

## Opening the census

The `cell_census` python package contains a convenient API to open the latest version of the Cell Census.

In [2]:
import cell_census
import numpy as np
import memento

census = cell_census.open_soma()

You can learn more about the `cell_census` methods by accessing their corresponding documentation via `help()`. For example `help(cell_census.open_soma)`.

Getting dataset of interest

In [10]:
# Tabula muris senis 10x liver
dataset_title = "All major cell types in adult human retina"

# Getting the joinid and dataset id
datasets = census["census_info"]["datasets"].read(value_filter = f"dataset_title == '{dataset_title}'").concat().to_pandas()
dataset_joinid = list(datasets["soma_joinid"])
datasset_interest_id = datasets["dataset_id"][0]

Get measured genes in dataset

In [6]:
presence_matrix = cell_census.get_presence_matrix(census, organism="Homo sapiens", measurement_name="RNA")
dataset_presnence = presence_matrix[dataset_joinid,]

gene_joinid = np.nonzero(dataset_presnence.sum(axis=0).A1 == dataset_presnence.shape[0])[0].tolist()

Get anndata

In [11]:
adata = cell_census.get_anndata(
    census=census,
    organism="Homo sapiens",
    var_coords= gene_joinid,
    obs_value_filter=f"dataset_id == '{datasset_interest_id}'",
)

In [16]:
adata.var_names = adata.var["feature_name"]

Let's do a 1-vs-all comparison for Kupffer cells. 

In [35]:
#adata.obs["differential_grouping"] = adata.obs.cell_type == "Kupffer cell"
#adata.obs["differential_grouping"] = adata.obs["differential_grouping"] * 1 

0       0
1       0
2       0
3       0
4       0
       ..
7289    0
7290    0
7291    0
7292    0
7293    0
Name: differential_grouping, Length: 7294, dtype: int64

Run memento

In [39]:
differential_exp_results = memento.binary_test_1d(
    adata=adata, 
    capture_rate=0.07, 
    treatment_col='differential_grouping', 
    num_cpus=2,
    num_boot=500)

KeyboardInterrupt: 

In [48]:
differential_exp_results = differential_exp_results.merge(adata.var["feature_name"], how = "left", left_on = "gene", right_index = True)


Unnamed: 0,gene,tx,de_coef,de_se,de_pval,dv_coef,dv_se,dv_pval,feature_name
290,725,differential_grouping,2.773499,0.044993,5.461894e-218,-2.90919,0.057528,1.59607e-15,Fcer1g
3546,8403,differential_grouping,2.255487,0.034945,1.855812e-187,-2.040773,0.053801,6.984151e-10,Cyba
3757,8872,differential_grouping,0.948356,0.013281,3.60537e-147,0.051177,0.054603,0.4351297,Itm2b
6158,14260,differential_grouping,1.46171,0.035563,4.965358e-136,0.30612,0.144744,0.04391218,Ehd1
3307,7879,differential_grouping,2.541776,0.060401,1.908478e-133,-1.899846,0.078899,8.612426e-08,Msr1
6026,13998,differential_grouping,1.603368,0.029265,2.402969e-132,0.454168,0.092245,0.00127678,Snx2
6047,14033,differential_grouping,3.542341,0.068687,3.689383e-127,-3.028363,0.107592,2.185141e-07,Csf1r
648,1547,differential_grouping,0.669162,0.013153,3.323402e-126,0.18025,0.052691,0.01083828,Serf2
925,2232,differential_grouping,0.839348,0.021644,1.145787e-85,-0.322942,0.27021,0.2734531,Ssr4
2213,5281,differential_grouping,1.221132,0.028902,2.230698e-81,0.31201,0.107738,0.01128435,Tmem176a


In [49]:
differential_exp_results.query('de_coef > 0').sort_values('de_pval').head(10)


Unnamed: 0,gene,tx,de_coef,de_se,de_pval,dv_coef,dv_se,dv_pval,feature_name
290,725,differential_grouping,2.773499,0.044993,5.461894e-218,-2.90919,0.057528,1.59607e-15,Fcer1g
3546,8403,differential_grouping,2.255487,0.034945,1.855812e-187,-2.040773,0.053801,6.984151e-10,Cyba
3757,8872,differential_grouping,0.948356,0.013281,3.60537e-147,0.051177,0.054603,0.4351297,Itm2b
6158,14260,differential_grouping,1.46171,0.035563,4.965358e-136,0.30612,0.144744,0.04391218,Ehd1
3307,7879,differential_grouping,2.541776,0.060401,1.908478e-133,-1.899846,0.078899,8.612426e-08,Msr1
6026,13998,differential_grouping,1.603368,0.029265,2.402969e-132,0.454168,0.092245,0.00127678,Snx2
6047,14033,differential_grouping,3.542341,0.068687,3.689383e-127,-3.028363,0.107592,2.185141e-07,Csf1r
648,1547,differential_grouping,0.669162,0.013153,3.323402e-126,0.18025,0.052691,0.01083828,Serf2
925,2232,differential_grouping,0.839348,0.021644,1.145787e-85,-0.322942,0.27021,0.2734531,Ssr4
2213,5281,differential_grouping,1.221132,0.028902,2.230698e-81,0.31201,0.107738,0.01128435,Tmem176a


In [44]:
adata.var_names

Index(['0', '1', '2', '3', '4', '5', '6', '7', '8', '9',
       ...
       '15508', '15509', '15510', '15511', '15512', '15513', '15514', '15515',
       '15516', '15517'],
      dtype='object', length=15518)

In [37]:
census.close()