# Download Embeddings CELLxGENE 

This module interfaces with the CZ CELLxGENE Discover Census package to download pre-computed embeddings for a subset of cells. These embeddings can serve as a reference for label transfer in single-cell analyses.


In [1]:
#| default_exp Census

In [2]:
#| hide
from nbdev.showdoc import *

# Download and Embed Census Data Function
`download_and_embed_census_data` downloads Census data for one organism and tissue and attaches pre-computed embeddings to it. It takes four parameters: `census_version` selects the Census release, `embedding_uri` gives the location of the embeddings, `organism` selects the organism (for example `"homo_sapiens"`), and `tissue` selects the tissue (for example `"blood"`), which is matched against the `tissue_general` column of the cell metadata. The function returns an AnnData object with the embeddings stored in `obsm["emb"]`.


In [3]:
#| export
def download_and_embed_census_data(
    census_version='2023-12-15',
    embedding_uri='s3://cellxgene-contrib-public/contrib/cell-census/soma/2023-12-15/CxG-contrib-2',
    organism='homo_sapiens',
    tissue='blood',
):
    """
    Download Census data and attach pre-computed embeddings for the specified organism and tissue.

    :param census_version: Census release to use, default is '2023-12-15'
    :param embedding_uri: URI of the embeddings, default is an S3 path for the 2023-12-15 release
    :param organism: Organism to select, default is 'homo_sapiens'
    :param tissue: Value of `tissue_general` to filter cells by, default is 'blood'
    :return: AnnData object with the embeddings stored in `obsm["emb"]`
    """
    
    import cellxgene_census
    from cellxgene_census.experimental import get_embedding

    # Open the requested Census release; the context manager closes it on exit
    with cellxgene_census.open_soma(census_version=census_version) as census:
        # Load the AnnData object for the specified organism and tissue
        adata = cellxgene_census.get_anndata(
            census,
            organism=organism,
            measurement_name="RNA",
            obs_value_filter=f"tissue_general == '{tissue}'",
        )

    # Retrieve the embeddings for these cells, looked up by their soma_joinid
    embeddings = get_embedding(census_version, embedding_uri, adata.obs["soma_joinid"].to_numpy())
    adata.obsm["emb"] = embeddings

    return adata
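
The function above filters on a single `tissue_general` value. If cells from several tissues were needed, the filter string could be built by a small helper like the one sketched below. `build_tissue_filter` is a hypothetical name and is not part of `cellxgene_census`; the sketch assumes that the value-filter syntax accepts list membership with `in [...]`, and it rejects names containing a single quote instead of trying to escape them.

```python
def build_tissue_filter(tissues):
    """Build an obs_value_filter string matching any of the given tissues.

    Hypothetical helper for illustration; not part of cellxgene_census.
    """
    tissues = list(tissues)
    if not tissues:
        raise ValueError("at least one tissue is required")
    for name in tissues:
        # Names are embedded in single quotes, so a quote inside would break the filter
        if "'" in name:
            raise ValueError(f"tissue name must not contain a single quote: {name!r}")
    if len(tissues) == 1:
        # Same form as the filter used in download_and_embed_census_data
        return f"tissue_general == '{tissues[0]}'"
    quoted = ", ".join(f"'{name}'" for name in tissues)
    return f"tissue_general in [{quoted}]"


single_filter = build_tissue_filter(["blood"])
multi_filter = build_tissue_filter(["blood", "lung"])
```

The resulting string would be passed as `obs_value_filter` in place of the f-string used in the function.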



# Example Usage
The following snippet shows how to call `download_and_embed_census_data` for human blood. `census_version` and `embedding_uri` keep their defaults, while `organism` and `tissue` are passed explicitly as 'homo_sapiens' and 'blood' (which are also their default values). The result, an AnnData object with the embeddings in `obsm["emb"]`, is stored in the variable `reference`. The calls are commented out here; uncomment them to run the download.




In [4]:
# reference = download_and_embed_census_data(organism='homo_sapiens', tissue='blood')

In [5]:
# reference
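
To illustrate how such a reference might be used for label transfer, here is a minimal sketch of a k-nearest-neighbour majority vote on toy data. It is written in plain Python so that it runs without a download. `transfer_labels` and all the data in it are hypothetical. In practice the reference embeddings would come from `reference.obsm["emb"]` and the labels from a column of `reference.obs`, and the query cells would have to be embedded with the same model as the reference. This is one simple approach, not necessarily the method intended by this module.

```python
from collections import Counter
from math import dist


def transfer_labels(ref_embeddings, ref_labels, query_embeddings, k=3):
    """Give each query cell the majority label of its k nearest reference cells.

    Hypothetical helper for illustration only.
    """
    if len(ref_embeddings) != len(ref_labels):
        raise ValueError("ref_embeddings and ref_labels must have the same length")
    predicted = []
    for query in query_embeddings:
        # Rank reference cells by Euclidean distance to the query cell
        nearest = sorted(
            range(len(ref_embeddings)),
            key=lambda i: dist(query, ref_embeddings[i]),
        )[:k]
        # Majority vote over the labels of the k closest reference cells
        votes = Counter(ref_labels[i] for i in nearest)
        predicted.append(votes.most_common(1)[0][0])
    return predicted


# Toy 2-D embeddings standing in for reference.obsm["emb"] (made-up values)
ref_embeddings = [(0.0, 0.1), (0.1, 0.0), (0.2, 0.1), (5.0, 5.1), (5.1, 4.9), (4.9, 5.0)]
ref_labels = ["T cell", "T cell", "T cell", "B cell", "B cell", "B cell"]
query_embeddings = [(0.1, 0.1), (5.0, 5.0)]

predicted = transfer_labels(ref_embeddings, ref_labels, query_embeddings, k=3)
```

For datasets of realistic size, a vectorised or index-based nearest-neighbour search would be preferable to this quadratic loop.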

In [6]:
#| hide
import nbdev; nbdev.nbdev_export()