# Download all bundles for T cells sequenced with 10x

Suppose I want to get all bundles that contain T cells _and_ were sequenced using 10x. How should I go about doing this?

Well, the first thing we'll need is a query to search with. It might be a little more complicated than the ones we've used in previous vignettes, but the process overall is simple.

In [55]:
query = {
    "query": {
        "bool": {
            "must": [
                {
                    "match": {
                        "files.process_json.processes.content.dissociation_method": "10x_v2"
                    }
                },
                {
                    "regexp": {
                        "files.biomaterial_json.biomaterials.content.target_cell_type.text": {
                            "value": ".*T\\ cell" # Gives us any type of T cell
                        }
                    }
                }
            ]
        }
    }
}

This query should give us all bundles with a dissociation method matching *10x_v2* and a target cell type matching *any type of T cell*. Keep in mind that while a leading `.*` is convenient for finding a value whose exact format is unknown, it _can_ make searches slow. For this example, though, let's not worry about performance.
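To get a feel for what the pattern `.*T\ cell` accepts, we can try it locally with Python's `re` module. This is only a rough sketch: it assumes the search engine requires the pattern to match the whole field value (hence `re.fullmatch`), the helper name `is_t_cell` is made up for illustration, and the sample strings are hypothetical values rather than anything read from the data store.

```python
import re

# Same pattern as in the query above; "\ " is an escaped space.
T_CELL_PATTERN = re.compile(r".*T\ cell")


def is_t_cell(cell_type_text):
    """Return True if the whole string matches the T cell pattern."""
    return T_CELL_PATTERN.fullmatch(cell_type_text) is not None


# Hypothetical sample values, for illustration only
samples = [
    "T cell",
    "CD8-positive, alpha-beta T cell",
    "bone marrow hematopoietic cell",
]
for text in samples:
    print(text, "->", is_t_cell(text))
```

Under that whole-value assumption, anything that does not end in `T cell` is rejected, which is why the pattern only needs a wildcard at the front.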

Also, if you're wondering how to find the paths to these fields, [this previous vignette](https://github.com/HumanCellAtlas/data-consumer-vignettes/tree/feature/vignette-find-cell-count/tasks/Find%20Cell%20Type%20Count) should be helpful.

Now, let's give the query a try.

In [57]:
import json
import hca.dss
client = hca.dss.DSSClient()

# Print the first bundle we get from this query

search_results = client.post_search(es_query=query, replica='aws', output_format='raw')
print(json.dumps(search_results['results'][0], indent=4, sort_keys=True))

IndexError: list index out of range

...Well, that didn't exactly work out like we were hoping. What went wrong?

If the list index is out of range, it probably means that the search returned no results. Let's see...

In [58]:
print(search_results['total_hits'])

0


Aha, we've found the problem. Let's try simplifying the search a little, this time only looking for T cells.

In [59]:
query = {
    "query": {
        "regexp": {
            "files.biomaterial_json.biomaterials.content.target_cell_type.text": {
                "value": ".*T\\ cell"
            }
        }
    }
}

Now that we've abandoned half the query, we should get some results.

In [60]:
search_results = client.post_search(es_query=query, replica='aws', output_format='raw')
print(search_results['total_hits'])

1183


That's a lot of bundles with T cells. Why aren't we getting any that were sequenced using 10x?

Let's search using the other half of the query and find out.

In [68]:
query = {
    "query": {
        "match": {
            "files.process_json.processes.content.dissociation_method": "10x_v2"
        }
    }
}

Okay, let's see how many bundles with 10x sequencing there are.

In [69]:
search_results = client.post_search(es_query=query, replica='aws', output_format='raw')
print(search_results['total_hits'])

4


Well, well. That makes more sense now. It seems there are only four bundles sequenced by 10x, which makes it unlikely that any of them include data about T cells. Maybe we can examine part of a bundle to get a better idea of what's going on.

In [70]:
first_biomaterial = search_results['results'][0]['metadata']['files']['biomaterial_json']['biomaterials'][0]
print(json.dumps(first_biomaterial, indent=4, sort_keys=True))

{
    "content": {
        "biomaterial_core": {
            "biomaterial_id": "3_BM1_cells",
            "has_input_biomaterial": "3_BM1",
            "ncbi_taxon_id": [
                9606
            ]
        },
        "describedBy": "https://schema.humancellatlas.org/type/biomaterial/5.1.0/cell_suspension",
        "genus_species": [
            {
                "ontology": "NCBITaxon:9606",
                "text": "Homo sapiens"
            }
        ],
        "schema_type": "biomaterial",
        "target_cell_type": [
            {
                "ontology": "CL:1001610",
                "text": "bone marrow hematopoietic cell"
            }
        ],
        "total_estimated_cells": 3971
    },
    "hca_ingest": {
        "accession": "",
        "document_id": "6d98e8a4-dc7e-4ee8-aad6-9861b744e9fe",
        "submissionDate": "2018-03-26T16:59:18.876Z",
        "updateDate": "2018-03-28T17:49:42.521Z"
    }
}


Looking at this, it would seem that all the cells recorded in this biomaterial are bone marrow hematopoietic cells. What about the rest of this bundle, and the other bundles?

In [64]:
print('T cell' in json.dumps(search_results['results']))

False


Well, there's our answer! It would seem that there isn't any data on T cells anywhere in these four bundles, meaning there aren't any bundles containing data on both T cells and 10x sequencing.
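The substring check above tells us that T cells are absent, but not which cell types these bundles do contain. Here is a minimal sketch that collects the distinct `target_cell_type` texts instead. It assumes every result has the same nesting as the biomaterial printed earlier (`metadata` → `files` → `biomaterial_json` → `biomaterials` → `content` → `target_cell_type`) and quietly skips anything that doesn't; the function name `target_cell_types` is hypothetical.

```python
def target_cell_types(results):
    """Collect the distinct target_cell_type texts found in raw search results."""
    found = set()
    for result in results:
        files = result.get("metadata", {}).get("files", {})
        biomaterials = files.get("biomaterial_json", {}).get("biomaterials", [])
        for biomaterial in biomaterials:
            content = biomaterial.get("content", {})
            # Not every biomaterial has a target_cell_type; skip those that don't.
            for cell_type in content.get("target_cell_type", []):
                if "text" in cell_type:
                    found.add(cell_type["text"])
    return found
```

With the results still in memory, `print(sorted(target_cell_types(search_results['results'])))` would list every cell type that turns up, which is a little more informative than a single `True` or `False`.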

Still, I'm not quite satisfied yet; I want to see some results from a compound query. Let's find bundles with T cells and a _mechanical_ dissociation method, instead.

In [65]:
query = {
    "query": {
        "bool": {
            "must": [
                {
                    "match": {
                        "files.process_json.processes.content.dissociation_method": "mechanical"
                    }
                },
                {
                    "regexp": {
                        "files.biomaterial_json.biomaterials.content.target_cell_type.text": {
                            "value": ".*T\\ cell" # Gives us any type of T cell
                        }
                    }
                }
            ]
        }
    }
}

And now to run a search on it...

In [66]:
search_results = client.post_search(es_query=query, replica='aws', output_format='raw')
print(search_results['total_hits'])

1183


There we go! Our query with two parameters worked. The hit count, 1183, is the same as for the T cell query on its own, so it looks like all of the current bundles with T cell data have a _mechanical_ dissociation method.
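The back-and-forth in this vignette (run the compound query, then run each half on its own) can be wrapped up in a few small helpers. This is just a sketch: `must_all`, `count_hits`, and `hit_report` are names I've made up, and the only client call used is `post_search`, with the same arguments as in the cells above.

```python
def must_all(*clauses):
    """Combine clauses into one bool query in which every clause must match."""
    return {"query": {"bool": {"must": list(clauses)}}}


def count_hits(client, query, replica="aws"):
    """Run a search the same way as the cells above and return total_hits."""
    results = client.post_search(es_query=query, replica=replica, output_format="raw")
    return results["total_hits"]


def hit_report(client, named_clauses, replica="aws"):
    """Count hits for each clause alone, then for all clauses combined."""
    report = {
        name: count_hits(client, {"query": clause}, replica)
        for name, clause in named_clauses.items()
    }
    report["combined"] = count_hits(client, must_all(*named_clauses.values()), replica)
    return report


# The two clauses from the final query in this vignette
clauses = {
    "mechanical": {
        "match": {
            "files.process_json.processes.content.dissociation_method": "mechanical"
        }
    },
    "t_cell": {
        "regexp": {
            "files.biomaterial_json.biomaterials.content.target_cell_type.text": {
                "value": ".*T\\ cell"
            }
        }
    },
}

# Needs network access, so it is left commented out here:
# print(hit_report(client, clauses))
```

Seeing the per-clause counts next to the combined count makes it obvious straight away when one clause is the reason a compound query comes back empty.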