 ### Basic DRS
 
#### Learning Objectives
Workshop attendees will learn how to use the GA4GH Data Repository Service (DRS).

What will participants do as part of the exercise?

 - Understand the two main DRS methods
 - Find where a file is available
 - Use a Python client to access DRS and return results
 
 
     
 
 #### Icons in this Guide

 🖐 A hands-on section where you will code something or interact with the server
 
 #### Step 1: Run a cell in a Jupyter notebook
 To run a cell in a Jupyter notebook:
 - Click to the left of the cell
 - Click the Run icon in the toolbar below the menu bar.
 
 🖐 Try it out with the following cell

In [None]:
host_url = 'https://locate.be-md.ncbi.nlm.nih.gov'
drs_id = 'fb1cfb04d3ef99d07c21f9dbf87ccc68'

full_url = host_url + '/ga4gh/drs/v1/objects/' + drs_id
print(full_url)

The result of the cell is printed out below the cell.

The Python code above built a URL for an API call that returns information about where a file is available.

 #### Step 2: Call the API using the link above
 🖐 Open the link above in a new web browser window.

A response is returned, but it is not a formatted web page. It is JSON, intended to be read by a computer program.

We will look at the response more closely below.

Close the browser window or tab you opened to view the URL.

 #### Step 3: Call the API from Python

The URL we built is stored in the variable called full_url.

In the next cell we can use the Python requests module to make the request to the DRS server.

 🖐 Click the cell and run it to obtain the response

In [None]:
# Import the requests module, which is used to make requests to a web server
import requests

response = requests.get(full_url)
print(response.json())

That's still not very readable. We can define a function to print the response in a more readable form.

#### Step 4: Understanding the DRS response

🖐 Click and run the next two cells in turn.

In [None]:
import json
def pretty_print(a_dict):
    print(json.dumps(a_dict, indent=3))

In [None]:
pretty_print(response.json())

The most relevant section of the response is the access_methods.

In this example the response from DRS shows there are three ways the file could be accessed.
The 'region' field tells us that the file is available in the US region of Google Cloud Storage (gs.us) and in Amazon S3 storage in the us-east-1 region (s3.us-east-1).

We'll pass on the second of the three access methods for now.
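
As a rough illustration of how the access_methods list can be read, the sketch below pulls the type, region and access_id out of each entry. The sample_object dictionary and all of its values are made up for this example (they are not the server's actual response), and summarize_access_methods is a hypothetical helper name. It uses dict.get so that an entry without a region yields None rather than raising an error.

```python
# Made-up stand-in for a parsed DRS response; a real one comes from response.json()
sample_object = {
    "id": "example-object",
    "access_methods": [
        {"type": "gs", "region": "gs.us", "access_id": "example-access-id-1"},
        {"type": "https", "access_id": "example-access-id-2"},  # no region
        {"type": "s3", "region": "s3.us-east-1", "access_id": "example-access-id-3"},
    ],
}


def summarize_access_methods(drs_object):
    """Return a (type, region, access_id) tuple for each access method."""
    summary = []
    for method in drs_object.get("access_methods", []):
        # .get() returns None when a key such as 'region' is absent
        summary.append(
            (method.get("type"), method.get("region"), method.get("access_id"))
        )
    return summary


for method_type, method_region, method_access_id in summarize_access_methods(sample_object):
    print(method_type, method_region, method_access_id)
```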

#### Step 5: Making the second DRS call - getting a URL to access the file

Let's say we have compute credits on one of the clouds where the file is available. We would pick the corresponding access_id from the response above and use a second DRS API call to obtain a URL to access the file.

Note that we say access and not download. Because the BAM file is large, and we may want to work with many such files, we may prefer to run the analysis where the file is stored. We will come back to this later.

For now we'll just get the URL.

🖐 As before, click on the cell and run it to build the URL

In [None]:
access_id = "1e4846c05c81a49f684e7f940ffbd3a98e5f0e335f019ee4d32d85c72096b743"
full_url = '{}/ga4gh/drs/v1/objects/{}/access/{}'.format(host_url, drs_id, access_id)

print(full_url)

As in the previous example, the step above simply demonstrated how the URL to access the API is created. In this case it is created for a specific file and access method.
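
The two URL patterns used so far can be wrapped in small helper functions. This is only a sketch: build_object_url and build_access_url are hypothetical names, and https://example.org, abc123 and xyz789 are placeholders rather than real values. The identifiers are percent-encoded with urllib.parse.quote on the assumption that an ID could contain characters that are not safe in a URL path.

```python
from urllib.parse import quote

DRS_BASE_PATH = "/ga4gh/drs/v1/objects"


def build_object_url(host_url, drs_id):
    """Build the URL for the first DRS call (information about an object)."""
    return "{}{}/{}".format(host_url.rstrip("/"), DRS_BASE_PATH, quote(drs_id, safe=""))


def build_access_url(host_url, drs_id, access_id):
    """Build the URL for the second DRS call (URL for one access method)."""
    return "{}/access/{}".format(
        build_object_url(host_url, drs_id), quote(access_id, safe="")
    )


# Placeholder host and identifiers, for illustration only
print(build_object_url("https://example.org", "abc123"))
print(build_access_url("https://example.org", "abc123", "xyz789"))
```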

🖐 Click on the cell below to send the request and print the response

In [None]:
url_response = requests.get(full_url)
print(url_response.json())

Given the size of the BAM file reported in the first API call, we won't actually download it.
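
Without downloading anything, the parsed response can still be taken apart. The sketch below assumes the response follows the shape described in the DRS specification: a url field plus an optional headers list of "Name: value" strings. The sample_access_response values are placeholders, not output from the server, and parse_access_response is a hypothetical helper name.

```python
def parse_access_response(access_response):
    """Split a parsed access response into its URL and a dict of headers."""
    url = access_response["url"]
    headers = {}
    # Headers, when present, are assumed to be "Name: value" strings
    for line in access_response.get("headers") or []:
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    return url, headers


# Made-up example response; the URL and token are placeholders
sample_access_response = {
    "url": "https://example.org/data/sample.bam",
    "headers": ["Authorization: Bearer placeholder-token"],
}

file_url, file_headers = parse_access_response(sample_access_response)
print(file_url)
print(file_headers)
```

With a real response, url_response.json() would take the place of sample_access_response.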


🖐 Using what you learnt above, and some Python skills, add code to the example below to retrieve the URLs for each access method. The loop to do this for each access method is already there. Just add the code to call the API to retrieve a URL.

In [None]:
drs_response = response.json()
for access_method in drs_response['access_methods']:
    print(access_method['access_id'])
    # Add code here to make the DRS call to retrieve the URL for each access_id


#### Step 6: Optional - stretch goal - for Python experts

🖐 Imagine you have a preference for working in a particular cloud provider and region. Complete the following function to use DRS to obtain the URL for the file in a specific region.

In [None]:
def get_url_for_region(drs_id, region):
    full_url = '{}/ga4gh/drs/v1/objects/{}'.format(host_url, drs_id)
    r = requests.get(full_url)
    drs_response = r.json()
    # Add code here: find the access_id for the region
    # Watch out: not all access_methods have a region
    # Then make the DRS call to get the URL


#### 🖐  Test it

In [None]:
get_url_for_region(drs_id, 'gs.US')

In [None]:
get_url_for_region(drs_id, 's3.us-east-1')

In [None]:
get_url_for_region(drs_id, 's3.us-west-1')

#### Step 7: Using a DRS Python Client
The above showed how individual calls to DRS can be made. As we are likely to do this repeatedly, we created a set of functions for calling DRS so that we can focus on more interesting aspects of the task.

We can still make use of variables like host_url and drs_id defined previously, but now we will pass them to our client.
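
To give a feel for what such a client does, here is a minimal sketch of a wrapper around the two DRS calls. MiniDRSClient is a hypothetical class written for this illustration; it is not the fasp DRSClient and does not claim to match its behaviour. It fetches JSON with the standard library (requests would work equally well), and the fetch function can be swapped out, which the example uses to run offline against a stub with placeholder values.

```python
import json
from urllib.request import urlopen


def fetch_json(url):
    """Fetch a URL and parse the body as JSON (standard library only)."""
    with urlopen(url) as reply:
        return json.load(reply)


class MiniDRSClient:
    """Hypothetical, minimal DRS client; not the fasp DRSClient."""

    def __init__(self, host_url, fetch=fetch_json):
        self.base = host_url.rstrip("/") + "/ga4gh/drs/v1/objects"
        self.fetch = fetch  # callable taking a URL and returning parsed JSON

    def get_object(self, drs_id):
        """First DRS call: information about the object."""
        return self.fetch("{}/{}".format(self.base, drs_id))

    def get_access_url(self, drs_id, access_id):
        """Second DRS call: the URL for one access method."""
        reply = self.fetch("{}/{}/access/{}".format(self.base, drs_id, access_id))
        return reply.get("url")


# Offline usage: a stub stands in for a real server
def fake_fetch(url):
    return {"requested": url, "url": "https://example.org/file"}


client = MiniDRSClient("https://example.org", fetch=fake_fetch)
print(client.get_access_url("abc123", "xyz789"))
```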

🖐 Click on the following to make the first DRS request

In [None]:
from fasp.loc import DRSClient
cl = DRSClient(host_url, public=True)
cl.get_object(drs_id)

🖐 and again to get the access URL

In [None]:
cl.get_access_url(drs_id, 'b14572d74b5aafe87a0fcc873050d6c3993f27338cdd088b5883aed4b118f0c8')

Our client also includes the function to retrieve the URL to access the file in a specified region. This is the function that we set as a task above.

🖐 Click to test it

In [None]:
cl.get_url_for_region(drs_id, 's3.us-east-1')

#### Return to Session 2 notebooks
Close the tab for this workbook and return to the Jupyter window for the Session 2 notebooks.