# Using dos-resolver-lambda

This notebook demonstrates how to access metadata provided by a Data Object Service (DOS) resolver. A "resolver" here means a service that looks up a given request among a set of servers and returns the matching response. It allows clients to "resolve" Data Object Service identifiers from more than one service.
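To make the idea of a resolver concrete, here is a minimal sketch of the lookup pattern described above. It is not the implementation used by dos-resolver-lambda; the `resolve` function, the backend names, and the in-memory dictionaries standing in for DOS servers are all hypothetical.

```python
def resolve(identifier, backends):
    """Return (backend_name, record) from the first backend that knows the identifier.

    `backends` is a list of (name, lookup) pairs, where `lookup` returns a
    record or None. Returns (None, None) when no backend has the identifier.
    """
    for name, lookup in backends:
        record = lookup(identifier)
        if record is not None:
            return name, record
    return None, None


# Hypothetical in-memory stand-ins for two DOS servers.
server_a = {"id-1": {"id": "id-1"}}
server_b = {"id-2": {"id": "id-2"}}
backends = [("server-a", server_a.get), ("server-b", server_b.get)]

print(resolve("id-2", backends))
```

A real resolver would issue HTTP requests to each service instead of dictionary lookups, but the control flow (try each service, return the first match) is the same shape.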

In [3]:
lambda_url = "https://0eofs5xr08.execute-api.us-west-2.amazonaws.com/api"

data_object_ids = ['9da8a3c9-46a2-48c0-9aaa-dde6ea194b33', '75a87632-cec4-43bc-b8e9-319ebd3f4627']
data_bundles_ids = ['d8fe0ae3-efa2-59c3-9e70-1b164ca868b3']

## Resolving Data Objects

Just as we would with any other Data Object Service, we issue a `GetDataObjectRequest`.

In [6]:
import requests

response = requests.get("{}/ga4gh/dos/v1/dataobjects/{}".format(lambda_url, data_object_ids[0]))
print(response.json())

{u'data_object': {u'checksums': [{u'checksum': u'e035d4b7f18e8373de4c1d943ba944a71fc16452c545b44775f23102172a84a1', u'type': u'sha256'}, {u'checksum': u'a65aefe7f11b62aedb364696ab74fb1e', u'type': u'etag'}, {u'checksum': u'314d42361ad9efaf6090047a6e09033d8a0e9ebe', u'type': u'sha1'}, {u'checksum': u'4319fe3c', u'type': u'crc32c'}], u'version': u'2018-02-28T051202.123007Z', u'id': u'9da8a3c9-46a2-48c0-9aaa-dde6ea194b33', u'content_type': u'application/json', u'urls': [{u'url': u'https://commons-dss.ucsc-cgp-dev.org/v1/files/9da8a3c9-46a2-48c0-9aaa-dde6ea194b33?replica=aws'}, {u'url': u'https://commons-dss.ucsc-cgp-dev.org/v1/files/9da8a3c9-46a2-48c0-9aaa-dde6ea194b33?replica=azure'}, {u'url': u'https://commons-dss.ucsc-cgp-dev.org/v1/files/9da8a3c9-46a2-48c0-9aaa-dde6ea194b33?replica=gcp'}, {u'url': u's3://commons-dss-commons/blobs/e035d4b7f18e8373de4c1d943ba944a71fc16452c545b44775f23102172a84a1.314d42361ad9efaf6090047a6e09033d8a0e9ebe.a65aefe7f11b62aedb364696ab74fb1e.4319fe3c'}, {u'url
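The request above follows the path pattern `/ga4gh/dos/v1/dataobjects/{id}`, and the Data Bundle requests later in this notebook use `/ga4gh/dos/v1/databundles/{id}`. As a small sketch, a helper can build both kinds of URL in one place. The function name `dos_endpoint` and the `example.org` base URL are assumptions made for illustration, not part of any DOS client library.

```python
def dos_endpoint(base_url, collection, identifier):
    """Join a base URL, a DOS collection name, and an identifier into a request URL."""
    # Only the two collections used in this notebook are accepted here.
    if collection not in ("dataobjects", "databundles"):
        raise ValueError("unknown DOS collection: {}".format(collection))
    # Strip any trailing slash so the result never contains "//" in the path.
    return "{}/ga4gh/dos/v1/{}/{}".format(base_url.rstrip("/"), collection, identifier)


# Hypothetical base URL, used only to show the resulting shape.
print(dos_endpoint("https://example.org/api", "dataobjects", "abc"))
```

The returned string can be passed straight to `requests.get`, in place of the inline `format` calls used in the cells.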

Our first identifier resolves to the `commons-dss` DOS instance. We can verify this using the DOS URL embedded in the response's `urls` list.

In [9]:
data_object = response.json()['data_object']
print(data_object['urls'][5])

{u'url': u'https://spbnq0bc10.execute-api.us-west-2.amazonaws.com/api/ga4gh/dos/v1/dataobjects/9da8a3c9-46a2-48c0-9aaa-dde6ea194b33'}


In [11]:
inner_response = requests.get(data_object['urls'][5]['url'])
print(inner_response.json()['data_object']['id'])
print(data_object['id'])

9da8a3c9-46a2-48c0-9aaa-dde6ea194b33
9da8a3c9-46a2-48c0-9aaa-dde6ea194b33


Note that both objects have the same identifier. As a convenience, the DOS resolver adds a URL that points back to the original DOS, so clients can check the source record.
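The cells above pick the DOS URL by position (`urls[5]`), which depends on how many other URLs the record happens to carry. A sketch of a more defensive approach is to search the list for an entry whose URL contains the DOS data object path. The helper name `find_dos_url` is hypothetical, and the example record below is a shortened copy of the response shown earlier.

```python
DOS_OBJECT_PATH = "/ga4gh/dos/v1/dataobjects/"


def find_dos_url(data_object):
    """Return the first URL in data_object['urls'] that points at a DOS endpoint, or None."""
    for entry in data_object.get("urls", []):
        url = entry.get("url", "")
        if DOS_OBJECT_PATH in url:
            return url
    return None


# Shortened copy of the first response: one storage URL and the DOS URL.
example = {
    "id": "9da8a3c9-46a2-48c0-9aaa-dde6ea194b33",
    "urls": [
        {"url": "https://commons-dss.ucsc-cgp-dev.org/v1/files/9da8a3c9-46a2-48c0-9aaa-dde6ea194b33?replica=aws"},
        {"url": "https://spbnq0bc10.execute-api.us-west-2.amazonaws.com/api/ga4gh/dos/v1/dataobjects/9da8a3c9-46a2-48c0-9aaa-dde6ea194b33"},
    ],
}

print(find_dos_url(example))
```

Matching on the path assumes the originating service exposes the standard DOS route, which holds for both services shown in this notebook.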

Let's get another Data Object!

In [13]:
response = requests.get("{}/ga4gh/dos/v1/dataobjects/{}".format(lambda_url, data_object_ids[1]))
data_object = response.json()['data_object']
print(data_object)

{u'name': u'75a87632-cec4-43bc-b8e9-319ebd3f4627.vep.vcf.gz', u'version': u'2017-10-24T11:37:38.011252-05:00', u'urls': [{u'url': u'https://api.gdc.cancer.gov/data/75a87632-cec4-43bc-b8e9-319ebd3f4627', u'system_metadata': {u'data_type': u'Annotated Somatic Mutation', u'updated_datetime': u'2017-10-24T11:37:38.011252-05:00', u'created_datetime': u'2017-09-10T18:46:45.741251-05:00', u'file_name': u'75a87632-cec4-43bc-b8e9-319ebd3f4627.vep.vcf.gz', u'md5sum': u'c97edfd81f0522c1cf1f47aba3d08866', u'data_format': u'VCF', u'acl': [u'phs001179'], u'access': u'controlled', u'state': u'submitted', u'file_id': u'75a87632-cec4-43bc-b8e9-319ebd3f4627', u'data_category': u'Simple Nucleotide Variation', u'file_size': 3767, u'submitter_id': u'AD11793_AnnotatedSomaticMutation', u'type': u'annotated_somatic_mutation', u'file_state': u'submitted', u'experimental_strategy': u'Targeted Sequencing'}}, {u'url': u'https://gmyakqsfp8.execute-api.us-west-2.amazonaws.com/api/ga4gh/dos/v1/dataobjects/75a87632-c

Note that this identifier resolved to the NCI GDC, which is served through dos-gdc-lambda. Again, we can check the inner URL.

In [17]:
inner_response = requests.get(data_object['urls'][1]['url'])
print(inner_response.json()['data_object']['id'])
print(data_object['id'])

75a87632-cec4-43bc-b8e9-319ebd3f4627
75a87632-cec4-43bc-b8e9-319ebd3f4627


## Resolving Data Bundles

Data Bundles can be accessed in a similar fashion, though only one of the services behind the resolver supports them.

In [30]:
response = requests.get('{}/ga4gh/dos/v1/databundles/{}'.format(lambda_url, data_bundles_ids[0]))
print(response.json())

{u'data_bundle': {u'system_metadata': {u'dosurl': u'https://spbnq0bc10.execute-api.us-west-2.amazonaws.com/api/ga4gh/dos/v1/databundles/d8fe0ae3-efa2-59c3-9e70-1b164ca868b3'}, u'version': u'2018-02-28T051207.231265Z', u'id': u'd8fe0ae3-efa2-59c3-9e70-1b164ca868b3', u'data_object_ids': [u'9da8a3c9-46a2-48c0-9aaa-dde6ea194b33', u'ab4c0815-a366-47b8-b94f-626458d43859', u'f0b142de-e19a-4771-bd6c-d7c5a11d2a43']}}


Since Data Bundles don't have a list of URLs we can append to, we add the `dosurl` under `system_metadata` instead.

In [31]:
data_bundle = response.json()['data_bundle']
print(data_bundle['system_metadata'])

{u'dosurl': u'https://spbnq0bc10.execute-api.us-west-2.amazonaws.com/api/ga4gh/dos/v1/databundles/d8fe0ae3-efa2-59c3-9e70-1b164ca868b3'}


Now, we'll get the Bundle from the original service:

In [32]:
inner_response = requests.get(data_bundle['system_metadata']['dosurl'])
print(inner_response)
print(inner_response.json()['data_bundle']['id'])
print(data_bundle['id'])

<Response [200]>
d8fe0ae3-efa2-59c3-9e70-1b164ca868b3
d8fe0ae3-efa2-59c3-9e70-1b164ca868b3
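The check above can be wrapped in two small helpers: one that reads `dosurl` from `system_metadata` without failing when the key is absent, and one that compares identifiers. This is only a sketch; the names `bundle_origin_url` and `same_record` are hypothetical, and the example bundle is copied from the response shown earlier in this section.

```python
def bundle_origin_url(data_bundle):
    """Return the `dosurl` stored under system_metadata, or None if it is absent."""
    return (data_bundle.get("system_metadata") or {}).get("dosurl")


def same_record(resolved, original):
    """True when both records carry an `id` and the two ids are equal."""
    return resolved.get("id") is not None and resolved.get("id") == original.get("id")


# Copied from the resolver response above.
data_bundle = {
    "id": "d8fe0ae3-efa2-59c3-9e70-1b164ca868b3",
    "system_metadata": {
        "dosurl": "https://spbnq0bc10.execute-api.us-west-2.amazonaws.com/api/ga4gh/dos/v1/databundles/d8fe0ae3-efa2-59c3-9e70-1b164ca868b3"
    },
}

print(bundle_origin_url(data_bundle))
print(same_record(data_bundle, {"id": "d8fe0ae3-efa2-59c3-9e70-1b164ca868b3"}))
```

In the notebook, the second argument to `same_record` would be `inner_response.json()['data_bundle']` fetched from the URL that `bundle_origin_url` returns.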


## Future Work

Add `ListDataObjectsRequest` so that Data Objects can be resolved by GUID.