# Find the Number of Cells of a Specific Cell Type

I need to find how many cells of a certain organ type are available in the HCA database. How can I go about doing this?

First, let's get set up.

In [94]:
import hca, json

Now, what type of cells are we looking for? Maybe cells from a _lymph node_?

In [95]:
organ_type = 'lymph node'

To see what cells appear in the database, we'll need to do some _searching_: specifically, using the __post_search()__ method. post_search() has four parameters: `es_query`, `output_format`, `replica`, and `per_page`, two of which are optional: `output_format` and `per_page`.

Don't worry about `per_page`; we won't be needing it. For `replica`, we'll be using `'aws'`, and for `output_format`, we'll want to use the mode `raw`. Using `raw` will return the verbatim JSON metadata for bundles that match our query, which we'll need if we want to find information about the organ type.

Now, let's start by writing our ElasticSearch query, `es_query`.

In [96]:
query = {
    "query" : {
        "bool" : {
            "must" : [{
                "match" : {
                    "files.biomaterial_json.biomaterials.content.organ.text" : organ_type
                }
            }, {
                "match" : {
                    "files.biomaterial_json.schema_version" : "5.1.0"
                }
            }, {
                "range" : {
                    "files.biomaterial_json.biomaterials.content.total_estimated_cells" : {
                        "gt" : 0
                    }
                }
            }]
        }
    }
}

From this search, we want all bundles that have an _organ type_ of lymph node, a _schema version_ of 5.1.0, and _at least one cell_ in the sample. Let's see if this returns any results. Instead of using a `raw` `output_format` here, let's just use `summary` for simplicity.

In [97]:
hca.dss.DSSClient().post_search(es_query=query, replica='aws', output_format='summary')['total_hits']

3274

Great, we got something. It looks like we have 3274 bundles with schema version 5.1.0, each containing at least one lymph node cell.
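
Since the query is just a Python dictionary, the same search can be rebuilt for other organs. Here's a minimal sketch of that idea. Note that `build_organ_query` is a hypothetical helper written for this example, not part of the `hca` package; the field paths are copied from the query above, and the default schema version and cell threshold are the ones we used there.

```python
def build_organ_query(organ, schema_version="5.1.0", min_cells=0):
    """Build an ElasticSearch query dict like the one above for any organ.

    Hypothetical helper for illustration; not part of the hca package.
    """
    prefix = "files.biomaterial_json"
    return {
        "query": {
            "bool": {
                # Every clause in "must" has to match (logical AND)
                "must": [
                    {"match": {prefix + ".biomaterials.content.organ.text": organ}},
                    {"match": {prefix + ".schema_version": schema_version}},
                    {"range": {
                        prefix + ".biomaterials.content.total_estimated_cells": {
                            "gt": min_cells
                        }
                    }},
                ]
            }
        }
    }


# Rebuild the lymph node query from above
lymph_query = build_organ_query('lymph node')
```

The resulting dictionary can be passed as `es_query` in the same way as `query`.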

Now, let's add up the number of cells across all of these bundles.

In [99]:
count = 0

# Iterate through every bundle that matches the query for the chosen organ type
for bundle in hca.dss.DSSClient().post_search.iterate(es_query=query, replica='aws', output_format='raw'):
    # Add the estimated cell count of the first biomaterial listed in the bundle
    count += bundle['metadata']['files']['biomaterial_json']['biomaterials'][0]['content']['total_estimated_cells']

print('{} cell count: {}'.format(organ_type, count))

lymph node cell count: 3274


Huh, it looks like the total cell count is the same as the number of bundles returned by our previous search. Since our query only matches bundles with at least one cell, each bundle must report exactly one estimated cell.
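
To see how the sum relates to the number of bundles without hitting the database, here's a small offline sketch. The bundles below are made up for illustration (they are not real HCA output) and only mimic the nesting of the `raw` results used above; `count_cells` and `make_bundle` are hypothetical helpers, not part of the `hca` package.

```python
def count_cells(bundles):
    """Sum total_estimated_cells from the first biomaterial of each bundle."""
    total = 0
    for bundle in bundles:
        biomaterials = bundle['metadata']['files']['biomaterial_json']['biomaterials']
        total += biomaterials[0]['content']['total_estimated_cells']
    return total


def make_bundle(cells):
    """Build a fake bundle with the same nesting as a raw search result."""
    return {
        'metadata': {
            'files': {
                'biomaterial_json': {
                    'biomaterials': [
                        {'content': {'total_estimated_cells': cells}}
                    ]
                }
            }
        }
    }


# Made-up data: three bundles with one estimated cell each
sample_bundles = [make_bundle(1), make_bundle(1), make_bundle(1)]
total = count_cells(sample_bundles)

print('{} bundles, {} cells'.format(len(sample_bundles), total))  # prints "3 bundles, 3 cells"
```

If any bundle reported more than one cell, the total would be larger than the number of bundles, so matching numbers only happen when every bundle reports a single cell.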