### Finding files and data using Data Connect

#### Learning Objectives
Workshop attendees will learn how to use the GA4GH Data Connect Service.

What will participants do as part of the exercise?

 - Understand how to query data via Data Connect
 - Use Data Connect to find files that can be accessed via DRS
 - Learn how to obtain and use data descriptions (schema)
 - Discover the meaning of codes used in data
 

 #### Icons in this Guide

 🖐 A hands-on section where you will code something or interact with the server
 
### Query files
The approach below uses the subject and specimen data available through the Data Connect API to map samples to the files that belong to them.

Queries are written in SQL and can join one or more tables on the Data Connect server.

As in the other examples, we first set up a client for the API. The following examples use a public Data Connect server hosted by DNAstack.
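Under the hood, a Data Connect query is an HTTP `POST` to the server's `/search` endpoint with a JSON body that carries the SQL. The sketch below only builds such a request with the standard library and does not send it. It assumes the endpoint layout described in the GA4GH Data Connect specification; `build_search_request` is a hypothetical helper, not part of `fasp`.

```python
import json
import urllib.request


def build_search_request(base_url, sql):
    """Build (but do not send) a Data Connect search request.

    Assumes the GA4GH Data Connect layout: POST {base_url}/search
    with a JSON body of the form {"query": "<SQL>"}.
    """
    url = base_url.rstrip('/') + '/search'
    body = json.dumps({'query': sql}).encode('utf-8')
    return urllib.request.Request(
        url,
        data=body,
        headers={'Content-Type': 'application/json', 'Accept': 'application/json'},
        method='POST',
    )


req = build_search_request(
    'https://data-connect-trino-public.prod.dnastack.com/',
    'SELECT * FROM thousand_genomes.onek_genomes.ssd_drs LIMIT 5',
)
print(req.get_method(), req.full_url)
```

A client such as `DataConnectClient` presumably wraps a request of this kind and parses the JSON response for you.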

#### Step 1: Set up a Data Connect Client and run a predefined query 

In [None]:
from fasp.search import DataConnectClient
searchClient = DataConnectClient('https://data-connect-trino-public.prod.dnastack.com/')
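Data Connect servers may split results across pages: each response carries a `data` list and, while more rows remain, a `pagination.next_page_url` to follow. `run_query` presumably handles this for you; the sketch below shows the loop in isolation. `fetch_json` is a hypothetical stand-in for an HTTP call that returns the parsed JSON of a page, so the logic can be exercised without a network, and the page contents are made up.

```python
def collect_rows(first_page, fetch_json):
    """Gather rows from every page of a Data Connect response.

    first_page: parsed JSON of the initial /search response.
    fetch_json: hypothetical callable taking a URL and returning that page's parsed JSON.
    """
    rows = []
    page = first_page
    while True:
        rows.extend(page.get('data', []))
        # Keep following next_page_url even if this page had no rows
        next_url = (page.get('pagination') or {}).get('next_page_url')
        if not next_url:
            return rows
        page = fetch_json(next_url)


# Offline usage with two fake pages (invented content, for illustration only)
pages = {
    'https://example.org/page2': {'data': [{'sample_name': 'B'}], 'pagination': {}},
}
first = {'data': [{'sample_name': 'A'}],
         'pagination': {'next_page_url': 'https://example.org/page2'}}
rows = collect_rows(first, pages.__getitem__)
print(rows)
```

The loop stops only when no `next_page_url` is present, rather than when a page comes back empty, since a page with no rows may still point to further results.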

### Important note
Looking up a data dictionary to discover codes, as done below, is not something we would typically expect an end user to do. Our aim today is to focus on the API: what it can do and what it enables.

Because the data schemas describe the data, developers can build interfaces in their own systems that integrate new data sources as they appear.
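As an illustration of building on a schema, the sketch below turns an enumerated column definition into a code-to-label lookup. It assumes each permitted value appears under `oneOf` as `{"const": code, "title": label}`, a common JSON Schema pattern; whether a given Data Connect table supplies a `title` (or `description`) for each code is an assumption to check against its actual schema. The helper `code_labels` and the property definition are hypothetical.

```python
def code_labels(property_schema):
    """Map each enumerated code to a human-readable label.

    Assumes entries like {"const": "GBR", "title": "British in England and Scotland"};
    falls back to 'description', then to the code itself, when no title is present.
    """
    labels = {}
    for option in property_schema.get('oneOf', []):
        code = option['const']
        labels[code] = option.get('title') or option.get('description') or code
    return labels


# Hypothetical property definition, for illustration only
population_schema = {
    'type': 'string',
    'oneOf': [
        {'const': 'GBR', 'title': 'British in England and Scotland'},
        {'const': 'FIN', 'title': 'Finnish in Finland'},
        {'const': 'XYZ'},
    ],
}
print(code_labels(population_schema))
```

An interface built this way can show readable labels for a new data source without any code changes, provided its schema follows the same pattern.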

In [None]:
# Join the subject/specimen table to the file table and keep only BAM files
query_all = '''
SELECT f.sample_name, drs_id AS bam_drs_id, acc, population, mapped, sequencing_type
FROM thousand_genomes.onek_genomes.ssd_drs s
JOIN thousand_genomes.onek_genomes.sra_drs_files f ON f.sample_name = s.su_submitter_id
WHERE filetype = 'bam'
'''
int_df = searchClient.run_query(query_all, returnType='dataframe')
print("Query complete. Continue with next step.")
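To see what the join does without contacting the server, the sketch below runs the same shape of query against two tiny in-memory SQLite tables. The table and column names mirror the query above, but the rows and column values are invented, and which table holds each column is an assumption, since the original query leaves most columns unqualified. SQLite has no `catalog.schema.table` naming, so plain table names are used.

```python
import sqlite3

# Toy tables standing in for thousand_genomes.onek_genomes.ssd_drs and .sra_drs_files
con = sqlite3.connect(':memory:')
con.executescript('''
CREATE TABLE ssd_drs (su_submitter_id TEXT, population TEXT);
CREATE TABLE sra_drs_files (sample_name TEXT, drs_id TEXT, acc TEXT,
                            mapped TEXT, sequencing_type TEXT, filetype TEXT);
INSERT INTO ssd_drs VALUES ('S1', 'GBR'), ('S2', 'FIN');
INSERT INTO sra_drs_files VALUES
  ('S1', 'drs-1', 'ACC1', 'mapped', 'exome', 'bam'),
  ('S1', 'drs-2', 'ACC2', 'mapped', 'exome', 'cram'),
  ('S2', 'drs-3', 'ACC3', 'unmapped', 'low coverage', 'bam');
''')

# Same shape as query_all: one row per BAM file, with the sample's population attached
rows = con.execute('''
SELECT f.sample_name, drs_id AS bam_drs_id, acc, population, mapped, sequencing_type
FROM ssd_drs s
JOIN sra_drs_files f ON f.sample_name = s.su_submitter_id
WHERE filetype = 'bam'
ORDER BY f.sample_name
''').fetchall()
for row in rows:
    print(row)
```

The CRAM row for `S1` is dropped by the `WHERE` clause, and each remaining file row picks up the population of its sample through the join.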

In [None]:
def getColValues(info, columns):
    """Return {column: [permitted values]} for the enumerated columns of a table schema."""
    enumVals = {}
    for column in columns:
        var = info['data_model']['properties'][column]
        # Each permitted value is listed under 'oneOf' as {'const': value, ...}
        enumVals[column] = [value['const'] for value in var['oneOf']]
    return enumVals

# Fetch each table's schema and extract the permitted values of the columns we will filter on
info1 = searchClient.list_table_info('thousand_genomes.onek_genomes.ssd_drs').schema
enumCols1 = getColValues(info1, ['population'])
info2 = searchClient.list_table_info('thousand_genomes.onek_genomes.sra_drs_files').schema
enumCols2 = getColValues(info2, ['sequencing_type','mapped'])
# Uncomment to inspect the permitted values
#print(enumCols1)
#print(enumCols2)
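`getColValues` expects the schema returned by `list_table_info(...).schema` to contain a `data_model` whose enumerated columns list their permitted values under `oneOf`. The snippet below runs the same logic on a hand-written, hypothetical schema of that shape (values are made up), which is a quick way to check what the function returns before pointing it at a real table.

```python
def getColValues(info, columns):
    """Return {column: [permitted values]} from a schema's oneOf/const entries."""
    enumVals = {}
    for column in columns:
        var = info['data_model']['properties'][column]
        enumVals[column] = [value['const'] for value in var['oneOf']]
    return enumVals


# Hypothetical schema with the shape getColValues expects
sample_info = {
    'data_model': {
        'properties': {
            'sequencing_type': {'type': 'string',
                                'oneOf': [{'const': 'exome'}, {'const': 'low coverage'}]},
            'mapped': {'type': 'string',
                       'oneOf': [{'const': 'mapped'}, {'const': 'unmapped'}]},
            'acc': {'type': 'string'},
        }
    }
}
print(getColValues(sample_info, ['sequencing_type', 'mapped']))
```

Note that a column without `oneOf`, such as `acc` here, would raise a `KeyError`, so the function should only be given columns that the schema enumerates.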

In [None]:
import ipywidgets as widgets
from ipywidgets import interact, interact_manual, IntRangeSlider

def filter_onek(
                 population=enumCols1['population'],
                 sequencing_type=enumCols2['sequencing_type'],
                 mapped=enumCols2['mapped']
                ):
    """Return the rows of int_df matching the selected population, sequencing type and mapping status."""
    selected_df = int_df.loc[ (int_df['population'] == population)
                            & (int_df['sequencing_type'] == sequencing_type)
                            & (int_df['mapped'] == mapped)]
    # To get just the DRS ids of the matching BAM files:
    # drs_ids = selected_df['bam_drs_id'].tolist()
    return selected_df
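Outside a notebook, or when you want the DRS ids rather than a displayed table, the same selection can be made without widgets. The sketch below uses plain Python over a list of row dictionaries so that it runs without pandas; the helper `select_drs_ids` and the rows are hypothetical. With the real `int_df`, `filter_onek(...)['bam_drs_id'].tolist()` gives the equivalent list.

```python
def select_drs_ids(rows, **criteria):
    """Return bam_drs_id for every row whose columns equal all values in criteria."""
    return [row['bam_drs_id'] for row in rows
            if all(row.get(col) == val for col, val in criteria.items())]


# Invented rows in the same shape as int_df
rows = [
    {'bam_drs_id': 'drs-1', 'population': 'GBR', 'sequencing_type': 'exome', 'mapped': 'mapped'},
    {'bam_drs_id': 'drs-3', 'population': 'FIN', 'sequencing_type': 'low coverage', 'mapped': 'unmapped'},
    {'bam_drs_id': 'drs-4', 'population': 'GBR', 'sequencing_type': 'exome', 'mapped': 'unmapped'},
]
print(select_drs_ids(rows, population='GBR', sequencing_type='exome', mapped='mapped'))
```

Passing fewer criteria loosens the filter, which the drop-down version cannot do without an extra "any" option.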

In [None]:
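# Show a drop-down menu for each column; the matching rows are displayed below the menus.
# interact returns the function itself, so the assignment just suppresses its echo.
drs_list = interact(filter_onek,
                 population=enumCols1['population'],
                 sequencing_type=enumCols2['sequencing_type'],
                 mapped=enumCols2['mapped']
                )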

#### Step 7: Combine with a DRS server

The following shows how the SRA DRS server used in workbook 2-1 can tell us where each file can be obtained.
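DRS ids are often written as hostname-based `drs://` URIs. Per the GA4GH DRS specification, such a URI maps to an HTTPS endpoint of the form `https://<hostname>/ga4gh/drs/v1/objects/<id>`, which a DRS client calls to fetch object metadata. The helper below is a hypothetical sketch of that mapping; the SRA ids returned by the query may be bare ids rather than URIs, in which case the client is simply given the server URL, as in the cell below.

```python
from urllib.parse import quote, urlparse


def drs_uri_to_url(drs_uri):
    """Convert a hostname-based drs://host/id URI to its object-metadata HTTPS URL."""
    parsed = urlparse(drs_uri)
    if parsed.scheme != 'drs' or not parsed.netloc:
        raise ValueError('not a hostname-based DRS URI: {}'.format(drs_uri))
    object_id = parsed.path.lstrip('/')
    # Percent-encode the id so characters such as '/' cannot alter the path
    return 'https://{}/ga4gh/drs/v1/objects/{}'.format(parsed.netloc, quote(object_id, safe=''))


print(drs_uri_to_url('drs://drs.example.org/abc123'))
```

Compact-identifier DRS URIs (`drs://prefix:accession`) need a resolver step first and are not handled by this sketch.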

🖐 Pick a DRS id (a value from the `bam_drs_id` column) from the results of one of the queries you ran above, and paste it in place of `replace_with_a_drs_id` below.

In [None]:
from fasp.loc import DRSClient

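# Client for the SRA DRS server at NCBI
drsClient = DRSClient('https://locate.be-md.ncbi.nlm.nih.gov', public=True, debug=True)
test_id = 'replace_with_a_drs_id'  # paste a bam_drs_id value from the query results here
# Fetch the object's metadata, including the locations (access methods) it can be obtained from
objInfo = drsClient.get_object(test_id)
objInfo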

A second DRS call obtains a URL for accessing the file through one of the access methods listed above.
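In the DRS specification, each entry in `access_methods` names a transfer `type` (for example `https` or `s3`) and carries either an `access_url` object that can be used directly or an `access_id` that must be exchanged for a URL via `GET /objects/{object_id}/access/{access_id}`. The cell below simply takes `access_methods[0]`; the sketch shows one way to prefer a particular type instead. The helper `pick_access_method` and the sample object are hypothetical.

```python
def pick_access_method(drs_object, preferred_types=('https',)):
    """Choose an access method from a DRS object, preferring the given types in order.

    Returns (method, direct_url): direct_url is the access_url's url if the method has one,
    otherwise None, meaning the method's access_id must be resolved with a second call.
    """
    methods = drs_object.get('access_methods', [])
    if not methods:
        raise ValueError('DRS object has no access methods')
    # First method matching the most preferred type; fall back to the first listed method
    method = next((m for wanted in preferred_types for m in methods
                   if m.get('type') == wanted), methods[0])
    access_url = method.get('access_url')
    direct_url = access_url.get('url') if access_url else None
    return method, direct_url


# Invented DRS object, for illustration only
sample_object = {
    'id': 'example-id',
    'access_methods': [
        {'type': 's3', 'access_id': 'example-s3', 'region': 'us-east-1'},
        {'type': 'https', 'access_url': {'url': 'https://example.org/file.bam'}},
    ],
}
method, direct_url = pick_access_method(sample_object)
print(method['type'], direct_url)
```

When `direct_url` is `None`, the method's `access_id` would be passed to `get_access_url`, as in the cell below.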

In [None]:
# Use the first listed access method and exchange its access_id for a URL
access_id = objInfo['access_methods'][0]['access_id']
print('access_id:{}'.format(access_id))
url = drsClient.get_access_url(test_id, access_id=access_id)
print('url:{}'.format(url))
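In the DRS specification, the access endpoint returns an `AccessURL` object with a `url` and an optional list of `headers` (strings such as an `Authorization` header) that must accompany the download request. If you work with that raw response, a download request could be prepared as sketched below. The helper `request_from_access_url` and the response are hypothetical, and the request is built but not sent.

```python
import urllib.request


def request_from_access_url(access_url):
    """Build (not send) a GET request for a DRS AccessURL object {'url', 'headers'}."""
    headers = {}
    for header in access_url.get('headers') or []:
        # Each header is a single "Name: value" string
        name, _, value = header.partition(':')
        headers[name.strip()] = value.strip()
    return urllib.request.Request(access_url['url'], headers=headers, method='GET')


# Invented AccessURL response, for illustration only
example = {'url': 'https://example.org/file.bam', 'headers': ['Authorization: Bearer example-token']}
req = request_from_access_url(example)
print(req.full_url, req.header_items())
```

Passing `req` to `urllib.request.urlopen` would then start the download, with any required headers attached.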