# Download all bundles for liver cells sequenced with 10x

In this notebook, we cover how to search the Human Cell Atlas Data Store (DSS) for bundles containing liver cells that were sequenced with 10x, and download all of the data that is found.

The two steps of this process are:

1. Write an ElasticSearch query to return bundles matching our two conditions (liver cells and 10x sequencing).

2. Iterate over the results and download the relevant files using the DSS API.

As usual, we start with a DSS API client.

```python
import json

import hca.dss

client = hca.dss.DSSClient()
```

## Writing the ElasticSearch Query

We recommend at least skimming the ["Writing ElasticSearch Queries"](../elasticsearch-queries/elasticsearch-queries.html) notebook, which covers how to write ElasticSearch queries.

To find bundles with liver cells that were sequenced with 10x, we will use a boolean query with two conditions. First, we run an empty query and look at the metadata returned for one result to figure out which field should contain "liver" and which should contain "10x".

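```python
# An empty query matches every bundle
response = client.post_search(es_query={}, replica='aws', output_format='raw')
# Use one hit as an example (index 1, i.e. the second result)
first_bundle = response['results'][1]
```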

Each result contains metadata, as extracted from several JSON files:

```python
print("Metadata files:")
print("\n".join(first_bundle['metadata']['files'].keys()))
```

```
Metadata files:
cell_suspension_json
dissociation_protocol_json
donor_organism_json
enrichment_protocol_json
library_preparation_protocol_json
links_json
process_json
project_json
sequence_file_json
sequencing_protocol_json
specimen_from_organism_json
```
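Rather than opening each file by hand, one option is to walk the nested metadata and list every field whose value contains a given string. The sketch below is a hypothetical helper, `find_paths`, not part of the `hca` package. It assumes the metadata is the plain dict/list structure shown in this notebook and, because ElasticSearch flattens arrays of objects into a single field path, it leaves list indices out of the dotted paths it reports.

```python
def find_paths(node, needle, prefix=""):
    """Return ElasticSearch-style dotted paths whose string value contains
    `needle` (case-insensitive). List indices are skipped, since
    ElasticSearch indexes arrays of objects under the same field path."""
    matches = []
    if isinstance(node, dict):
        for key, value in node.items():
            path = f"{prefix}.{key}" if prefix else key
            matches.extend(find_paths(value, needle, path))
    elif isinstance(node, list):
        # Descend into each element without adding an index to the path
        for item in node:
            matches.extend(find_paths(item, needle, prefix))
    elif isinstance(node, str) and needle.lower() in node.lower():
        matches.append(prefix)
    return matches
```

For example, `find_paths(first_bundle['metadata'], "pancreas")` should list `files.specimen_from_organism_json.organ.text` among its results for the bundle inspected here, which is the same path used for the organ condition.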


### Boolean Condition 1: 10x Data

If we are looking for 10x data, the `sequencing_protocol_json` file is a natural first place to look:

```python
import json
print(json.dumps(
    first_bundle['metadata']['files']['sequencing_protocol_json'],
    indent=4
))
```

```
[
    {
        "describedBy": "https://schema.humancellatlas.org/type/protocol/sequencing/10.0.0/sequencing_protocol",
        "instrument_manufacturer_model": {
            "ontology": "EFO:0009173",
            "ontology_label": "Illumina NextSeq 500",
            "text": "Illumina NextSeq 500"
        },
        "method": {
            "ontology": "EFO:0008441",
            "ontology_label": "full length single cell RNA sequencing",
            "text": "full length single cell RNA sequencing"
        },
        "paired_end": true,
        "protocol_core": {
            "protocol_id": "sequencing_protocol_1"
        },
        "provenance": {
            "document_id": "293c36a8-33ce-4e21-a694-b2ae60fb1e2d",
            "submission_date": "2019-05-10T14:22:49.335Z",
            "update_date": "2019-05-10T14:22:53.538Z"
        },
        "schema_type": "protocol"
    }
]
```


This file describes the sequencing instrument and the general sequencing method, but does not name the library construction method, so we look next in `library_preparation_protocol_json`:

```python
import json
print(json.dumps(
    first_bundle['metadata']['files']['library_preparation_protocol_json'],
    indent=4
))
```

```
[
    {
        "describedBy": "https://schema.humancellatlas.org/type/protocol/sequencing/6.1.0/library_preparation_protocol",
        "end_bias": "full length",
        "input_nucleic_acid_molecule": {
            "ontology": "OBI:0000869",
            "text": "polyA RNA"
        },
        "library_construction_kit": {
            "manufacturer": "Illumina",
            "retail_name": "Nextera XT kit"
        },
        "library_construction_method": {
            "ontology": "EFO:0008931",
            "ontology_label": "Smart-seq2",
            "text": "Smart-seq2"
        },
        "nucleic_acid_source": "single cell",
        "primer": "poly-dT",
        "protocol_core": {
            "protocol_id": "library_preparation_protocol_1"
        },
        "provenance": {
            "document_id": "3ab6b486-f900-4f70-ab34-98859ac5f77a",
            "submission_date": "2019-05-10T14:22:49.330Z",
            "update_date": "2019-05-10T14:22:53.537Z"
        },
        "schema_type": "p
```

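From this we can determine the first boolean condition: the field

```
files.library_preparation_protocol_json.library_construction_method.text
```

should contain the text "10x". (This example bundle was prepared with Smart-seq2, so it would not match.) Because we only need "10x" to appear somewhere in the value, a `wildcard` query is a good fit here.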
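ElasticSearch's `wildcard` query is a term-level query: on an analyzed text field, the pattern is compared against the individual indexed tokens rather than the whole string. Assuming the DSS index stores this field as text run through ElasticSearch's standard analyzer (an assumption about the index, not something this notebook confirms), the sketch below approximates that behaviour locally with Python's `fnmatch`. The tokenizer is a simplification of the standard analyzer, and the value "10X v2 sequencing" is only an illustrative example.

```python
import fnmatch
import re


def tokenize(text):
    """Rough stand-in for ElasticSearch's standard analyzer: lowercase the
    text and split it into runs of word characters (an approximation)."""
    return re.findall(r"\w+", text.lower())


def wildcard_matches(value, pattern):
    """Approximate a `wildcard` query on an analyzed text field: the pattern
    must match at least one token. `*` matches any run of characters and
    `?` matches one character. (fnmatch also supports [...] classes, which
    ElasticSearch wildcard patterns do not.)"""
    return any(fnmatch.fnmatchcase(token, pattern) for token in tokenize(value))


print(wildcard_matches("10X v2 sequencing", "*10x*"))  # True (illustrative value)
print(wildcard_matches("Smart-seq2", "*10x*"))         # False (the bundle above)
```

Under the same assumption, this also explains why the pattern is written in lowercase: matched against lowercased tokens, `*10X*` would find nothing.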

### Boolean Condition 2: Matching liver cells

To find the organ the cells came from, we can use the `specimen_from_organism_json` metadata file, which describes the specimen taken from the donor organism, including the organ it was taken from.

Checking the metadata reveals the path needed to obtain the organ type:

```python
import json
print(json.dumps(
    first_bundle['metadata']['files']['specimen_from_organism_json'],
    indent=4
))
```

```
[
    {
        "biomaterial_core": {
            "biomaterial_id": "DID_scRSq06_pancreas",
            "ncbi_taxon_id": [
                9606
            ]
        },
        "describedBy": "https://schema.humancellatlas.org/type/biomaterial/10.2.0/specimen_from_organism",
        "diseases": [
            {
                "ontology": "PATO:0000461",
                "ontology_label": "normal",
                "text": "normal"
            }
        ],
        "genus_species": [
            {
                "ontology": "NCBITaxon:9606",
                "ontology_label": "Homo sapiens",
                "text": "Homo sapiens"
            }
        ],
        "organ": {
            "ontology": "UBERON:0001264",
            "ontology_label": "pancreas",
            "text": "pancreas"
        },
        "organ_parts": [
            {
                "ontology": "UBERON:0000006",
                "ontology_label": "islet of Langerhans",
                "text": "islet of Langerhans"
```

The path needed for our second boolean condition is

```
files.specimen_from_organism_json.organ.text
```

and its value should match "liver" (this example bundle comes from the pancreas, so it would not match).
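A `match` query, unlike `wildcard`, analyzes the query text and, with its default `or` operator, matches a document if any of the resulting tokens appears in the field. The sketch below approximates this locally as a token-overlap check, again assuming standard-analyzer-style tokenization; it ignores relevance scoring, and the organ values in the test beyond "liver" and "pancreas" are hypothetical.

```python
import re


def tokenize(text):
    """Lowercase and split into word-character runs (an approximation of
    ElasticSearch's standard analyzer)."""
    return re.findall(r"\w+", text.lower())


def match_query_matches(field_value, query_text):
    """Approximate a default `match` query: true if any query token also
    appears among the field's tokens."""
    return bool(set(tokenize(query_text)) & set(tokenize(field_value)))


for organ_text in ["liver", "Liver", "pancreas"]:
    print(organ_text, match_query_matches(organ_text, "liver"))
```

Because matching is token-based and case-insensitive here, "Liver" matches too, while the pancreas specimen shown above does not.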

### Combining for the Query

We now assemble a boolean query from our two conditions: first, a wildcard query to find 10x data, and second, a match query to find organs matching "liver".

```python
method = "*10x*"
organ = "liver"

query = {
    "query": {
        "bool": {
            "must": [
                {
                    "wildcard": {
                        "files.library_preparation_protocol_json.library_construction_method.text": {
                            "value": method
                        }
                    }
                },
                {
                    "match": {
                        "files.specimen_from_organism_json.organ.text": organ
                    }
                }
            ]
        }
    }
}
```

Now we run the query:

```python
search_results = client.post_search(
    es_query=query, replica='aws', output_format='raw')

print("post_search() found %d results" % search_results['total_hits'])
print("post_search() returned %d results" % len(search_results['results']))
```

```
post_search() found 10 results
post_search() returned 10 results
```
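For step 2, each raw search hit should identify its bundle with a fully qualified ID in a `bundle_fqid` field, of the form `<uuid>.<version>`. Since the version timestamp itself contains a dot, only the first dot separates the two parts. The sketch below is a minimal, hypothetical helper (`download_results` is not part of the `hca` package) that passes each UUID and version to the client's `download` method. It assumes `client.download` accepts `bundle_uuid`, `replica`, and `version` keyword arguments; other parameters, such as the destination directory, have differed between `hca` releases, so check `help(client.download)` for your installed version. Because `total_hits` equals the number of returned results here, a single page of results covers everything.

```python
def split_fqid(bundle_fqid):
    """Split a DSS bundle FQID ("<uuid>.<version>") into (uuid, version).

    The version timestamp (e.g. "2019-05-14T122736.345000Z") contains a dot
    itself, so only the first dot separates the UUID from the version.
    """
    uuid, _, version = bundle_fqid.partition(".")
    return uuid, version


def download_results(client, results, replica="aws"):
    """Download every bundle in a list of raw search hits.

    Assumes each hit has a "bundle_fqid" key and that client.download()
    accepts bundle_uuid, replica and version keyword arguments.
    Returns the UUIDs that were requested, in order.
    """
    downloaded = []
    for hit in results:
        uuid, version = split_fqid(hit["bundle_fqid"])
        client.download(bundle_uuid=uuid, replica=replica, version=version)
        downloaded.append(uuid)
    return downloaded
```

With the client and results from above, the call would be `download_results(client, search_results['results'])`. Passing the client in as an argument keeps the helper easy to try out with a stand-in object before downloading real data.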


It so happens that all of these bundles belong to the same project. Here is the project metadata from one of them:

```python
print(json.dumps(
    search_results['results'][1]['metadata']['files']['project_json'][0]['project_core'],
    indent=4
))
```

```
{
    "project_description": "The liver is the largest solid organ in the body and is critical for metabolic and immune functions. However, little is known about the cells that make up the human liver and its immune microenvironment. Here we report a map of the cellular landscape of the human liver using single-cell RNA sequencing. We provide the transcriptional profiles of 8444 parenchymal and non-parenchymal cells obtained from the fractionation of fresh hepatic tissue from five human livers. Using gene expression patterns, flow cytometry, and immunohistochemical examinations, we identify 20 discrete cell populations of hepatocytes, endothelial cells, cholangiocytes, hepatic stellate cells, B cells, conventional and non-conventional T cells, NK-like cells, and distinct intrahepatic monocyte/macrophage populations. Together, our study presents a comprehensive view of the human liver at single-cell resolution that outlines the characteristics of resident cells in the liver, and in part
```
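The cell above prints the project for a single bundle. To check that every hit points at the same project, one option is to collect a project identifier from each result. The hypothetical helper below assumes each `project_json` entry carries a `provenance.document_id`, as the protocol metadata files shown earlier do; that field is not shown for `project_json` in this notebook.

```python
def project_ids(results):
    """Collect the project document IDs referenced by a list of raw search
    hits. Assumes each project_json entry has provenance.document_id."""
    ids = set()
    for hit in results:
        for project in hit["metadata"]["files"]["project_json"]:
            ids.add(project["provenance"]["document_id"])
    return ids
```

If all of the bundles come from one project, `len(project_ids(search_results['results']))` should be 1.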

Thus concludes our tutorial.