### Passing DRS ids to the right DRS server

This notebook covers two related topics about DRS:
* Sending DRS requests to the right server
* Two different kinds of DRS URI

The fasp-clients package installed for this tutorial contains a class to handle both these aspects of DRS.

Any regular DRS server only responds to the local ids known to it. Notebook 2-1 explored that basic DRS capability. We saw that the NCBI DRS server could return information about the following DRS id.
* fb1cfb04d3ef99d07c21f9dbf87ccc68

In another case, the BioDataCatalyst DRS server knows about and can resolve the following local id
* e747c529-a6ee-415f-90b8-e2db631f8ed9

To make these local ids globally unique and usable they need to be expressed as DRS URIs.

### Compact DRS URIs

Compact URIs use a short prefix that is associated with a specific DRS server. For example, the prefix dg.4503 was registered for BioDataCatalyst. This allows the id to be referred to globally as:
* dg.4503:e747c529-a6ee-415f-90b8-e2db631f8ed9

The form of DRS id above is known as a compact DRS URI.
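
As a minimal sketch of that structure, the helper below splits a compact DRS URI into its prefix and local id. The function name `split_compact_id` is hypothetical and not part of fasp-clients; the code assumes only the `prefix:local_id` layout shown above.

```python
def split_compact_id(compact_id):
    """Split 'prefix:local_id' into (prefix, local_id).

    Hypothetical helper for illustration; not part of fasp-clients.
    """
    # Split on the first colon only, in case the local id contains one.
    prefix, sep, local_id = compact_id.partition(":")
    if not sep or not prefix or not local_id:
        raise ValueError(f"not a compact DRS id: {compact_id!r}")
    return prefix, local_id


prefix, local_id = split_compact_id("dg.4503:e747c529-a6ee-415f-90b8-e2db631f8ed9")
print(prefix)    # dg.4503
print(local_id)  # e747c529-a6ee-415f-90b8-e2db631f8ed9
```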

#### General metaresolvers
* http://identifiers.org/dg.4503:e747c529-a6ee-415f-90b8-e2db631f8ed9
* http://bioregistry.io/dg.4503:e747c529-a6ee-415f-90b8-e2db631f8ed9

These general metaresolvers can resolve many types of identifiers, not only DRS ids. For example:

* icdc:000009
* ncit:C15492
* dbgap:phs000200.v11.p3
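
The two metaresolver URLs shown earlier follow a simple pattern: the resolver's base URL followed by the compact identifier. The sketch below builds such URLs for the example identifiers. `metaresolver_urls` is a hypothetical helper, and it only constructs strings without making a network request; whether a resolver recognises a given prefix depends on its own registry.

```python
# Base URLs taken from the metaresolver examples in this notebook.
METARESOLVERS = ["http://identifiers.org", "http://bioregistry.io"]


def metaresolver_urls(curie):
    """Return one URL per general metaresolver for a compact identifier."""
    return [f"{base}/{curie}" for base in METARESOLVERS]


for curie in ["icdc:000009", "ncit:C15492", "dbgap:phs000200.v11.p3"]:
    for url in metaresolver_urls(curie):
        print(url)
```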

### Host based DRS URIs - the second form of DRS URIs

Not all data providers use compact identifiers by registering a prefix as shown above.

The alternative is a host based DRS URI. For example, the globally unique URI for the NCBI DRS id above is:
* drs://locate.be-md.ncbi.nlm.nih.gov/fb1cfb04d3ef99d07c21f9dbf87ccc68

Note that the URI is not a URL that can be used in a script or a browser to get a response. We still need to use a tool that knows how to work with the drs protocol or "scheme".
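
To make the distinction concrete: the GA4GH DRS specification maps a host based DRS URI of the form `drs://<hostname>/<id>` to an HTTPS request at `https://<hostname>/ga4gh/drs/v1/objects/<id>`. The sketch below performs that translation with the standard library. The function name `host_based_to_url` is an assumption for illustration, and the code builds the URL without contacting the server.

```python
from urllib.parse import quote, urlsplit


def host_based_to_url(drs_uri):
    """Translate drs://<hostname>/<id> into the DRS v1 object URL."""
    parts = urlsplit(drs_uri)
    if parts.scheme != "drs" or not parts.netloc:
        raise ValueError(f"not a host based DRS URI: {drs_uri!r}")
    object_id = parts.path.lstrip("/")
    if not object_id:
        raise ValueError(f"missing object id: {drs_uri!r}")
    # Percent-encode the id so it is safe to place in a URL path.
    return f"https://{parts.netloc}/ga4gh/drs/v1/objects/{quote(object_id, safe='')}"


print(host_based_to_url(
    "drs://locate.be-md.ncbi.nlm.nih.gov/fb1cfb04d3ef99d07c21f9dbf87ccc68"
))
```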

One benefit of compact ids is that they do not embed a hostname. If NCBI were to move its host away from be-md (Bethesda, Maryland), a host based URI would have to change, whereas a compact id would remain valid.


### A metaresolver for DRS
We need a tool that knows about different DRS servers and how to pass our request to the right one.

The DRSMetaResolver in fasp-clients will:
* Resolve compact ids
* Resolve host based DRS ids
* Send DRS calls to the correct server
* Handle ids with or without the drs:// prefix

 
The DRSMetaResolver behaves like a DRS client. It supports the two basic DRS functions:
* get_object
* get_access_url
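
Before a metaresolver can send a call to the correct server, it has to work out which form of id it was given. The sketch below is an assumption about how such a classification could be done, not the DRSMetaResolver implementation. It strips an optional `drs://` scheme, treats anything containing a `/` as host based, and treats `prefix:local_id` as compact. The name `classify_drs_id` is hypothetical.

```python
def classify_drs_id(drs_id):
    """Return (kind, server_key, local_id) for a DRS id.

    kind is 'host' or 'compact'; server_key is the hostname or the prefix.
    Illustrative only; fasp-clients may work differently.
    """
    # Accept ids with or without the drs:// scheme.
    if drs_id.startswith("drs://"):
        drs_id = drs_id[len("drs://"):]
    # A slash separates the hostname from the id in the host based form.
    if "/" in drs_id:
        host, _, local_id = drs_id.partition("/")
        return "host", host, local_id
    # Otherwise expect prefix:local_id.
    prefix, sep, local_id = drs_id.partition(":")
    if not sep:
        raise ValueError(f"unrecognised DRS id: {drs_id!r}")
    return "compact", prefix, local_id


examples = [
    "drs://dg.4503:66eeec21-aad0-4a77-8de5-621f05e2d301",
    "dg.4DFC:0e3c5237-6933-4d30-83f8-6ab721096bc7",
    "drs://locate.be-md.ncbi.nlm.nih.gov/81b75c18e5def027579f9441f987b8a8",
    "locate.be-md.ncbi.nlm.nih.gov/81b75c18e5def027579f9441f987b8a8",
]
for drs_id in examples:
    print(classify_drs_id(drs_id))
```

A real metaresolver would then look up the prefix or hostname to find the server's base URL; that lookup is left out here because it depends on a registry this sketch does not have.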

 🖐 Run the following examples and review how each kind of DRS id is handled


#### Step 1: As in previous examples, create a client
This time our client is a metaresolver.

In [None]:
from fasp.loc import DRSMetaResolver
drs_mr = DRSMetaResolver(debug=False)

#### Step 2: Resolve a compact DRS id


In [None]:
drs_mr.get_object('dg.4503:e747c529-a6ee-415f-90b8-e2db631f8ed9')

#### Step 3: Work with a list of ids from different DRS servers
We may find ourselves working with a list of DRS ids from different sources. 

In [None]:
prefixedIDs = [
    'dg.4503:66eeec21-aad0-4a77-8de5-621f05e2d301',
    'dg.4DFC:0e3c5237-6933-4d30-83f8-6ab721096bc7',
    'dg.ANV0:895c5a81-b985-4559-bc8e-cecece550756',
    #'dg.F82A1A:e6eecb29-1ae4-4f65-ae83-9ecf1c496de1',
    #'dg.MD1R:f55b8fed-a938-4cd7-8f39-5ee3cb75c218',
]

#### Step 3 (continued): Define a function that uses the metaresolver to send each DRS URI to the right server
We're going to work with a number of lists like this, so we'll define a Python function that sends a list of ids to the DRSMetaResolver we set up above.

 🖐 Run the following cell to define the function.

In [None]:
def check_list(id_list):
    for drs_id in id_list:
        print(f"DRS URI {drs_id}")
        drs_response = drs_mr.get_object(drs_id)
        num_of_methods = len(drs_response['access_methods'])
        print(f"Full response\n{drs_response}")
        print(f"has {num_of_methods} access_methods")
        print('_'*80)

#### Step 4: Run our function on the list above
This resolves our list of compact DRS ids from different servers.

In [None]:
check_list(prefixedIDs)

#### Step 5: Resolve host based ids

DRSMetaResolver will also make the call to the right DRS server for host based DRS ids.

In [None]:
host_based_IDs = [
    'drs://locate.be-md.ncbi.nlm.nih.gov/81b75c18e5def027579f9441f987b8a8'
]
check_list(host_based_IDs)

#### Step 6: Mixed list
* The following shows that a list mixing host based ids and compact ids (CURIEs) can be handled
* The DRSMetaResolver can handle DRS ids with or without the drs:// scheme on the front

In [None]:
mixedIDs = [
    # CURIE with scheme
    'drs://dg.4503:66eeec21-aad0-4a77-8de5-621f05e2d301',
    # CURIE, no scheme
    'dg.4DFC:0e3c5237-6933-4d30-83f8-6ab721096bc7',
    # CURIE with scheme, different prefix
    'drs://dg.ANV0:895c5a81-b985-4559-bc8e-cecece550756',
    # Host based with scheme
    'drs://locate.be-md.ncbi.nlm.nih.gov/81b75c18e5def027579f9441f987b8a8',
    # Host based, no scheme
    'locate.be-md.ncbi.nlm.nih.gov/81b75c18e5def027579f9441f987b8a8'
]

check_list(mixedIDs)

#### Step 7: Using a time saver so we can focus on what matters to us
Because the function we wrote above is likely to be generally useful, a version of it was added to DRSMetaResolver. We can call it as

`drs_mr.get_objects(drs_uri_list)`

🖐 Use your Python knowledge to try out a couple of things:
* Edit the call to get_objects to use the other lists we created above
* Access and print various parts of the DRS responses that will be useful to what you want to do

In [None]:
responses = drs_mr.get_objects(prefixedIDs)
for drs_response in responses:
    num_of_methods = len(drs_response['access_methods'])
    print(f"has {num_of_methods} access_methods")
    print('_'*80)
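
As one way to approach the second suggestion, the sketch below pulls a few fields out of a DRS response. It runs on a hand-written sample dictionary rather than a live response, so every value in `sample_response` is made up for illustration. The field names (`id`, `name`, `size`, `access_methods`, `type`, `access_id`) follow the GA4GH DRS object schema, and the function name `summarize_drs_object` is hypothetical.

```python
def summarize_drs_object(drs_response):
    """Collect a few commonly used fields from a DRS object response."""
    # .get() avoids a KeyError if a server omits an optional field.
    methods = drs_response.get("access_methods", [])
    return {
        "id": drs_response.get("id"),
        "name": drs_response.get("name"),
        "size": drs_response.get("size"),
        "access_types": [m.get("type") for m in methods],
    }


# Made-up response for illustration only.
sample_response = {
    "id": "example-id",
    "name": "example.cram",
    "size": 1024,
    "access_methods": [
        {"type": "gs", "access_id": "gs"},
        {"type": "s3", "access_id": "s3"},
    ],
}

print(summarize_drs_object(sample_response))
```

With a live client, the same function could be applied to each item returned by `drs_mr.get_objects(...)`, assuming the responses are dictionaries as in the cell above.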

#### Step 8: More experimentation - if you like to learn that way

The real goal of DRS is to get a URL which can be used to access a file in a cloud (or local) system.



🖐 Using the function and code shown below, what can you learn about how to do that?
* How might you use it?
* How will access control affect what you want to do?
* Have you updated the access tokens for the DRS servers you want to access?
* You may have to adapt the examples below for data to which you have access

Take care with examples like the following. If you print URLs to controlled access data, you would be sharing protected information. Take special care that you don't publish such data, e.g. to GitHub if you share your notebooks there. In general the URLs are only valid for a short time. Even so, hackers could still gain insights, such as the names of secure storage locations in the cloud.
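
One way to follow that advice is to avoid printing full URLs at all. The sketch below is a hypothetical helper, `redact_url`, that keeps only the scheme of a URL, so notebook output shows that a URL was returned without revealing the bucket name, object path, or signature. The sample URL is made up.

```python
from urllib.parse import urlsplit


def redact_url(url):
    """Return a printable stand-in that hides host, path, and query."""
    scheme = urlsplit(url).scheme or "unknown"
    return f"{scheme}://<redacted>"


# Made-up signed URL for illustration only.
sample = "https://example-bucket.storage.example.org/path/file.cram?X-Signature=abc123"
print(redact_url(sample))  # https://<redacted>
```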

In [None]:
import json

access_list = [["dg.4503:66eeec21-aad0-4a77-8de5-621f05e2d301","gs"],
              ["dg.4503:66eeec21-aad0-4a77-8de5-621f05e2d301","s3"]]

urls = drs_mr.get_access_urls(access_list)