# Download Any BAM File 🌕

Any BAM file will do. I just want to see what aligned data looks like in the Human Cell Atlas (HCA).

First I'll set up the client for the DSS, the HCA Data Storage System.

In [1]:
import hca, hca.dss, json
client = hca.dss.DSSClient()

****
#### Now I want to find a bundle that has a BAM file in it.

The `client` has a method `get_file` that sounds very promising, but I need to get a UUID. I don't know what RFC4122 is, but hopefully I won't have to.

In [2]:
help(client.get_file)

Help on method get_file in module hca.util:

get_file(client, uuid: str = None, replica: str = None, version: Union[str, NoneType] = None, token: Union[str, NoneType] = None) method of builtins.type instance
    Retrieve a file given a UUID and optionally a version.
    
    
    .. admonition:: Streaming
    
     Use ``DSSClient.get_file.stream(**kwargs)`` to get a ``requests.Response`` object whose body has not been
     read yet. This allows streaming large file bodies:
    
     .. code-block:: python
    
        fid = "7a8fbda7-d470-467a-904e-5c73413fab3e"
        with DSSClient().get_file.stream(uuid=fid, replica="aws") as fh:
            while True:
                chunk = fh.raw.read(1024)
                ...
                if not chunk:
                    break
    
     The keyword arguments for ``DSSClient.get_file.stream()`` are identical to the arguments for
     ``DSSClient.get_file()`` listed here.
    :param uuid:  A RFC4122-compliant ID for the file. 
    :type uui
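
For reference, RFC 4122 is the specification that defines UUIDs, and Python's standard `uuid` module can parse them. Below is a minimal sketch for checking that a string is a canonical UUID before passing it to `get_file`. The helper name `is_canonical_uuid` is made up for illustration and is not part of the `hca` package.

```python
import uuid


def is_canonical_uuid(value: str) -> bool:
    """Return True if value is a UUID in the usual 8-4-4-4-12 hex form."""
    try:
        parsed = uuid.UUID(value)
    except ValueError:
        return False
    # uuid.UUID also accepts braces, "urn:uuid:" prefixes and missing hyphens,
    # so compare against the canonical string form to rule those out.
    return str(parsed) == value.lower()


# The file ID from the docstring above is a well-formed UUID.
print(is_canonical_uuid("7a8fbda7-d470-467a-904e-5c73413fab3e"))
# An arbitrary string is not.
print(is_canonical_uuid("not-a-uuid"))
```

The first call prints `True` and the second prints `False`.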

****
#### Searching for bundles

The `post_search` method accepts a `query`, and there's even an example query in the data-store repo's [readme](https://github.com/HumanCellAtlas/data-store/blob/master/README.md). The function signature doesn't quite match the example, but it should be easy to fix. 

In [3]:
search_response = client.post_search(replica="aws", es_query={
    "query": {
        "bool": {
            "must": [{
                "match": {
                    "files.donor_organism_json.medical_history.smoking_history": "yes"
                }
            }, {
                "match": {
                    "files.specimen_from_organism_json.genus_species.text": "Homo sapiens"
                }
            }, {
                "match": {
                    "files.specimen_from_organism_json.organ.text": "brain"
                }
            }]
        }
    }
})

search_response

{'es_query': {'query': {'bool': {'must': [{'match': {'files.donor_organism_json.medical_history.smoking_history': 'yes'}},
     {'match': {'files.specimen_from_organism_json.genus_species.text': 'Homo sapiens'}},
     {'match': {'files.specimen_from_organism_json.organ.text': 'brain'}}]}}},
 'results': [{'bundle_fqid': '7bc6f37f-d40e-4949-9f9f-5b1c31836c80.2018-10-31T003051.083711Z',
   'bundle_url': 'https://dss.data.humancellatlas.org/v1/bundles/7bc6f37f-d40e-4949-9f9f-5b1c31836c80?version=2018-10-31T003051.083711Z&replica=aws',
   'search_score': None},
  {'bundle_fqid': 'f82eeb01-7e36-4e86-9552-889bbbbe736c.2018-10-29T162307.815836Z',
   'bundle_url': 'https://dss.data.humancellatlas.org/v1/bundles/f82eeb01-7e36-4e86-9552-889bbbbe736c?version=2018-10-29T162307.815836Z&replica=aws',
   'search_score': None},
  {'bundle_fqid': 'fa5934e2-e48d-4a53-9a44-dcc5196cd233.2018-10-23T173130.736396Z',
   'bundle_url': 'https://dss.data.humancellatlas.org/v1/bundles/fa5934e2-e48d-4a53-9a44-dcc5
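
The query in cell 3 repeats the same `match` shape three times. As a rough sketch, a small helper could build that structure from a plain dict of field names and values. `build_match_query` is a hypothetical name, not something the `hca` package provides.

```python
import json


def build_match_query(field_values: dict) -> dict:
    """Build an Elasticsearch bool/must query with one match clause per field."""
    must = [{"match": {field: value}} for field, value in field_values.items()]
    return {"query": {"bool": {"must": must}}}


es_query = build_match_query({
    "files.donor_organism_json.medical_history.smoking_history": "yes",
    "files.specimen_from_organism_json.genus_species.text": "Homo sapiens",
    "files.specimen_from_organism_json.organ.text": "brain",
})
print(json.dumps(es_query, indent=2))

# Usage would mirror cell 3:
# search_response = client.post_search(replica="aws", es_query=es_query)
```

The resulting dict has the same shape as the literal passed to `post_search` above, so the two should be interchangeable.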

****
#### It worked!

And now I can see how the results are structured. Let's take a closer look at a single result.

In [4]:
search_response["results"][0]

{'bundle_fqid': '7bc6f37f-d40e-4949-9f9f-5b1c31836c80.2018-10-31T003051.083711Z',
 'bundle_url': 'https://dss.data.humancellatlas.org/v1/bundles/7bc6f37f-d40e-4949-9f9f-5b1c31836c80?version=2018-10-31T003051.083711Z&replica=aws',
 'search_score': None}

____
Now I have an ID that I can work with, though it's an "fqid" not a "uuid", and it's for a bundle, not a file. I can try it with `get_bundle`:

In [6]:
client.get_bundle(uuid=search_response["results"][0]["bundle_fqid"], replica="aws")

SwaggerAPIException: not_found: Cannot find bundle! (HTTP 404). Details:
Traceback (most recent call last):
  File "/var/task/chalicelib/dss/error.py", line 55, in wrapper
    return func(*args, **kwargs)
  File "/var/task/chalicelib/dss/api/bundles/__init__.py", line 51, in get
    raise DSSException(404, "not_found", "Cannot find bundle!")
dss.error.DSSException


****
Hmmm, a `SwaggerAPIException`: the service couldn't find a bundle with that UUID, presumably because the fqid has a version timestamp appended after the dot. The UUID itself is the first 36 characters, so I can slice off the rest.

In [7]:
client.get_bundle(uuid=search_response["results"][0]["bundle_fqid"][:36], replica="aws")

{'bundle': {'creator_uid': 8008,
  'files': [{'content-type': 'application/json; dcp-type="metadata/biomaterial"',
    'crc32c': '32ec10b9',
    'indexed': True,
    'name': 'cell_suspension_0.json',
    's3_etag': 'a38c2d91ea0c3a742d1c2b3efe490fd2',
    'sha1': 'c50acea6314c1830c9380ca7f89f5f417217e55b',
    'sha256': '2f1782e31b469e17c95c0429062b1ebc831c23731eda73143d8a5d24b5c0d768',
    'size': 471,
    'uuid': '9b5a6292-b6db-4503-8eac-379c04e33952',
    'version': '2018-10-29T161653.631000Z'},
   {'content-type': 'application/json; dcp-type="metadata/biomaterial"',
    'crc32c': 'db8c8626',
    'indexed': True,
    'name': 'specimen_from_organism_0.json',
    's3_etag': '39001372fbfa2ccb2b6af9fa8c514608',
    'sha1': 'a58b1be2fa6edf2d0ea2f15d9aec63b22f9b121b',
    'sha256': '652c067b73b1ca78ed5943cdd983db1851ea9856385c34225b5307afb8d167c8',
    'size': 968,
    'uuid': '33578674-d296-452b-be2b-206085d4df93',
    'version': '2018-10-29T161653.617000Z'},
   {'content-type': 'applicat
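
The `[:36]` slice in cell 7 works because a UUID string is always 36 characters long. A slightly more explicit alternative, sketched here, splits the fqid at its first dot, which also keeps the version around for later. `split_fqid` is a hypothetical helper name, and passing `version=` to `get_bundle` is an assumption based on the `version` parameter in the `get_file` help and the `version` query parameter in `bundle_url`.

```python
def split_fqid(fqid: str):
    """Split '<uuid>.<version>' into (uuid, version) at the first dot."""
    # A UUID contains no dots, so the first dot always ends the UUID part,
    # even though the version timestamp contains a dot of its own.
    uuid_part, _, version = fqid.partition(".")
    return uuid_part, version


fqid = "7bc6f37f-d40e-4949-9f9f-5b1c31836c80.2018-10-31T003051.083711Z"
bundle_uuid, bundle_version = split_fqid(fqid)
print(bundle_uuid)     # 7bc6f37f-d40e-4949-9f9f-5b1c31836c80
print(bundle_version)  # 2018-10-31T003051.083711Z

# Assumed usage (the version keyword is not confirmed by the output above):
# client.get_bundle(uuid=bundle_uuid, version=bundle_version, replica="aws")
```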

****
This is very promising! Here I can see the structure of a "bundle" and how to work with it. Now I can pass one of the file UUIDs from the `files` list to the `get_file` method.
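
Rather than reading through the output by eye, the `files` list can be filtered by file name. The sketch below assumes the response layout shown above (`bundle` → `files` → `name` / `uuid`). `find_files_by_suffix` is a hypothetical helper, and the `.bam` entry in the sample data is a made-up placeholder so the example runs without the service.

```python
def find_files_by_suffix(bundle_response: dict, suffix: str) -> list:
    """Return the file entries in a get_bundle response whose name ends with suffix."""
    files = bundle_response["bundle"]["files"]
    return [entry for entry in files if entry["name"].lower().endswith(suffix)]


# Trimmed stand-in for a get_bundle response. The first two entries come from
# the output above; the ".bam" entry and its all-zero UUID are placeholders.
sample_response = {"bundle": {"files": [
    {"name": "cell_suspension_0.json", "uuid": "9b5a6292-b6db-4503-8eac-379c04e33952"},
    {"name": "specimen_from_organism_0.json", "uuid": "33578674-d296-452b-be2b-206085d4df93"},
    {"name": "example.bam", "uuid": "00000000-0000-0000-0000-000000000000"},
]}}

bam_entries = find_files_by_suffix(sample_response, ".bam")
for entry in bam_entries:
    print(entry["name"], entry["uuid"])
```

With a real response, the same call would be `find_files_by_suffix(client.get_bundle(...), ".bam")`, and an empty list would mean the bundle holds no BAM files.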

In [11]:
bam_file = client.get_file(uuid="8510ec11-c0f8-4016-a13a-bdbe1d0d0e9e", replica="aws")
with open("9b5a6292-b6db-4503-8eac-379c04e33952_qc.bam", "wb") as output_bam:
    output_bam.write(bam_file)
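
Calling `get_file` this way holds the whole file in memory, which may be uncomfortable for large BAM files. The `get_file` docstring describes a streaming variant, so here is a sketch of a chunked copy built on that idea. `stream_to_file` is a hypothetical helper, and an in-memory `io.BytesIO` stands in for the response body so the example runs offline.

```python
import io
import os
import tempfile


def stream_to_file(source, dest_path: str, chunk_size: int = 1024 * 1024) -> int:
    """Copy from a file-like object to dest_path in chunks; return bytes written."""
    written = 0
    with open(dest_path, "wb") as out:
        while True:
            chunk = source.read(chunk_size)
            if not chunk:
                break
            out.write(chunk)
            written += len(chunk)
    return written


# Stand-in for a response body, written to a temporary directory.
fake_body = io.BytesIO(b"x" * 2500)
dest = os.path.join(tempfile.mkdtemp(), "demo_stream.bin")
written = stream_to_file(fake_body, dest, chunk_size=1024)
print(written)  # 2500

# Following the docstring's pattern, real usage might look like:
# with client.get_file.stream(uuid=file_uuid, replica="aws") as fh:
#     stream_to_file(fh.raw, "output.bam")
```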

In [9]:
import pysam
bam = pysam.AlignmentFile("9b5a6292-b6db-4503-8eac-379c04e33952_qc.bam", "rb")  # the file written in the previous cell
print(bam.header)

ModuleNotFoundError: No module named 'pysam'
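
Installing `pysam` (for example with `pip install pysam`) should resolve the error. In the meantime, a BAM header can be peeked at with the standard library alone: BAM files are BGZF-compressed, which Python's `gzip` module can read, and the decompressed stream starts with the magic bytes `BAM\1`, a little-endian 32-bit length, and then the SAM header text. The sketch below follows that layout. `read_bam_header_text` is a hypothetical helper, and the demo file is a tiny header-only stand-in written with plain gzip, not a real alignment.

```python
import gzip
import os
import struct
import tempfile


def read_bam_header_text(path: str) -> str:
    """Return the SAM header text embedded at the start of a BAM file."""
    with gzip.open(path, "rb") as fh:
        magic = fh.read(4)
        if magic != b"BAM\x01":
            raise ValueError(f"{path} does not look like a BAM file")
        # l_text: length of the header text, little-endian signed 32-bit int
        (l_text,) = struct.unpack("<i", fh.read(4))
        return fh.read(l_text).decode("utf-8").rstrip("\x00")


# Build a stand-in file: magic, header length, header text, zero references.
header_bytes = b"@HD\tVN:1.6\tSO:coordinate\n"
payload = (b"BAM\x01" + struct.pack("<i", len(header_bytes))
           + header_bytes + struct.pack("<i", 0))
demo_path = os.path.join(tempfile.mkdtemp(), "demo.bam")
with gzip.open(demo_path, "wb") as out:
    out.write(payload)

header_text = read_bam_header_text(demo_path)
print(header_text)
```

This only confirms that a download is a BAM file and shows its header; reading the alignments themselves still calls for `pysam` or a similar library.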