## Introduction
This guide gives a quick example of how to retrieve and filter sequencing file metadata using the UDN Gateway API. See the ***Sequencing Files Download Guide*** for how to download specific files using the data obtained in this guide.

## Imports
This example uses Python, specifically the `requests`, `json`, and `pandas` packages. See each package's documentation for more information.

In [19]:
import requests
import json
import pandas

## Authentication
An authorization token is needed to access the UDN Gateway API. This token is shown in the dictionary below as the `Authorization` token. Log in to the web UDN Gateway and navigate to `/login/token/` to obtain an authorization token.

A second token is needed to access details about files stored in FileService, the application that manages metadata for all UDN Gateway sequencing files. This token is shown in the dictionary below as `FSAuthorization`. The `FSAuthorization` key is specific to the UDN Gateway API, which uses the FileService token to obtain detailed information about the files from FileService.

Log in to FileService and navigate to `/filemaster/token/` to obtain an authorization token.

development: https://fileservice-ci.dbmi.hms.harvard.edu/

production: https://fileservice.dbmi.hms.harvard.edu/

In [16]:
gateway_token = 'xxxxx'
fileservice_token = 'xxxxx'

In [17]:
headers = {'Content-Type': 'application/json', 
           'Authorization': 'Token ' + gateway_token, 
           'FSAuthorization': 'FSToken ' + fileservice_token}

In [None]:
system_url = 'https://gateway.undiagnosed.hms.harvard.edu/api'
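
The header dictionary above can also be built from environment variables so that tokens are not pasted into the notebook. This is a minimal sketch written for Python 3; the variable names `UDN_GATEWAY_TOKEN` and `UDN_FILESERVICE_TOKEN` and both helper functions are assumptions made for illustration, not names defined by the UDN Gateway.

```python
import os


def build_headers(gateway_token, fileservice_token):
    """Return the same header dictionary shown in the cell above."""
    return {
        'Content-Type': 'application/json',
        'Authorization': 'Token ' + gateway_token,
        'FSAuthorization': 'FSToken ' + fileservice_token,
    }


def headers_from_env(environ=os.environ):
    """Build headers from environment variables.

    UDN_GATEWAY_TOKEN and UDN_FILESERVICE_TOKEN are hypothetical names
    chosen for this sketch.
    """
    names = ('UDN_GATEWAY_TOKEN', 'UDN_FILESERVICE_TOKEN')
    missing = [name for name in names if not environ.get(name)]
    if missing:
        raise RuntimeError('missing environment variables: ' + ', '.join(missing))
    return build_headers(environ[names[0]], environ[names[1]])
```

With both variables exported in the shell that starts the notebook, `headers = headers_from_env()` would replace the two token cells.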

## Sequence File Metadata
Access Sequence File metadata from the UDN Gateway API.  A GET request to 

```
/api/sequence/files
``` 

returns a list of metadata dictionaries for every sequence file in the UDN Gateway. In the code example below, the `[0]` on the last line selects the first JSON object in the list so that only one record is displayed.

In [18]:
url = system_url + '/sequence/files/'
r = requests.get(url, headers=headers)
# check the status (a successful request prints <Response [200]>)
print(r)
# look at the first record in the returned list
r.json()[0]

<Response [200]>


{u'complete': True,
 u'filename': u'616273-UDN59528_HKWJWADXX-2-ID03_DNASeq_BWA_AlignerCommandsSubstituted.xml',
 u'fileserviceloc': 3,
 u'fileserviceuuid': u'77f5ca8c-f6be-4ed0-884e-bb6d7a796810',
 u'sequencingfilesstuff': u'None',
 u'sequencingsites': 1,
 u'uploaded': None,
 u'uuid': u'5167fea9-af38-42f9-9384-e5ced2a5fef4'}
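
The request above assumes a successful response. A slightly more defensive version might look like the sketch below, written for Python 3. The function name `fetch_sequence_files` and the 30-second timeout are assumptions made for illustration; `raise_for_status()` and the `timeout` argument are standard parts of the `requests` library.

```python
def fetch_sequence_files(session, system_url, headers, timeout=30):
    """GET <system_url>/sequence/files/ and return the decoded JSON list.

    `session` is any object with a requests-style get() method, such as
    requests.Session() or the requests module itself.
    """
    url = system_url.rstrip('/') + '/sequence/files/'
    response = session.get(url, headers=headers, timeout=timeout)
    # Raise an HTTPError on 4xx/5xx rather than parsing an error body.
    response.raise_for_status()
    records = response.json()
    if not isinstance(records, list):
        raise ValueError('expected a JSON list of file metadata')
    return records
```

In the notebook this could be called as `records = fetch_sequence_files(requests, system_url, headers)`. An expired or mistyped token would then fail at the request step rather than later during analysis.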

## Data Analysis
Once we have the list of sequence files, we can process it further, for example by loading it into a pandas DataFrame.

In [24]:
seqfile_df = pandas.DataFrame(r.json())
seqfile_df.head()

| | complete | filename | fileserviceloc | fileserviceuuid | sequencingfilesstuff | sequencingsites | uploaded | uuid |
|---|---|---|---|---|---|---|---|---|
| 0 | True | 616273-UDN59528_HKWJWADXX-2-ID03_DNASeq_BWA_Al... | 3 | 77f5ca8c-f6be-4ed0-884e-bb6d7a796810 | | 1 | | 5167fea9-af38-42f9-9384-e5ced2a5fef4 |
| 1 | True | 616273-UDN59528_HKWJWADXX-2-ID03.SNPs_Annotate... | 21 | 88ae6bfe-13b7-4930-9698-3380e9add3fc | | 1 | | f73fac75-222d-4963-8f5a-02a34af0f429 |
| 2 | True | 616273-UDN59528_HKWJWADXX-2-ID03.INDELs_Annota... | 24 | b25c43ad-d531-4d77-a231-e5ac56f13555 | | 1 | | fccdad8d-7083-49d5-ab04-a84e65c47a77 |
| 3 | True | 616273-UDN59528_HKWJWADXX-2-ID03.tsv.all.tsv.i... | 27 | 80fca357-9c8c-4499-8b35-735b61b42d80 | | 1 | | c0630445-dbcd-49e2-8a7c-dec11f6559d3 |
| 4 | True | 616273-UDN59528_HKWJWADXX-2-ID03_2_sequence.tx... | 30 | c705b3cb-fd50-47be-b868-3020d8526306 | | 1 | | ff3d6eba-d106-463b-be9e-ceeaefb002bd |
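
As another illustration of simple processing, the sketch below tallies files by extension. It works on the raw list returned by `r.json()` and uses only the standard library, so it does not depend on pandas. The function name `count_by_extension` is an assumption made for illustration, and the sample records reuse filenames that appear in this guide.

```python
import os
from collections import Counter


def count_by_extension(records):
    """Count metadata records by the final extension of their filename."""
    counts = Counter()
    for record in records:
        filename = record.get('filename') or ''
        # splitext keeps only the last suffix, so 'x.vcf.idx' counts as '.idx'.
        extension = os.path.splitext(filename)[1].lower()
        counts[extension or '(none)'] += 1
    return counts


# Sample records; in the notebook, pass r.json() instead.
sample = [
    {'filename': '616273-UDN59528_HKWJWADXX-2-ID03.SNPs_Annotated.vcf'},
    {'filename': '616273-UDN59528_HKWJWADXX-2-ID03.INDELs_Annotated.vcf'},
    {'filename': '/clinical/MCW2015-000554/Bam/MCW2015-000554_mbq20_raw.vcf.idx'},
]
print(count_by_extension(sample).most_common())
```

A tally like this gives a quick view of which file types are present before deciding what to filter on.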


## Filtering
Filter the DataFrame down to rows whose filename contains `.vcf`. Note that this also matches companion files such as `.vcf.idx`.

In [28]:
# regex=False matches '.vcf' literally; by default '.' is a regex wildcard
vcf_files_df = seqfile_df[seqfile_df['filename'].str.contains('.vcf', regex=False)]
vcf_files_df.head()

| | complete | filename | fileserviceloc | fileserviceuuid | sequencingfilesstuff | sequencingsites | uploaded | uuid |
|---|---|---|---|---|---|---|---|---|
| 1 | True | 616273-UDN59528_HKWJWADXX-2-ID03.SNPs_Annotate... | 21 | 88ae6bfe-13b7-4930-9698-3380e9add3fc | | 1 | | f73fac75-222d-4963-8f5a-02a34af0f429 |
| 2 | True | 616273-UDN59528_HKWJWADXX-2-ID03.INDELs_Annota... | 24 | b25c43ad-d531-4d77-a231-e5ac56f13555 | | 1 | | fccdad8d-7083-49d5-ab04-a84e65c47a77 |
| 32 | True | 616817-UDN58253_HKWFNADXX-2-ID02.INDELs_Annota... | 0 | 22887e0d-bb01-4b6f-b36b-1b171e6cf4c1 | | 1 | | 45a252d3-8712-42b5-84de-e99d06389d11 |
| 37 | True | 616817-UDN58253_HKWFNADXX-2-ID02.SNPs_Annotate... | 0 | 5768b0b4-f55a-41be-a571-68ff87a6c2e6 | | 1 | | bd02fd98-7192-4c8e-b023-181eb9ab608e |
| 58 | True | 616816-UDN25969_HKWFNADXX-2-ID01.INDELs_Annota... | 0 | 53ed6cc6-1834-420a-9b89-374bd3dc55a4 | | 1 | | 1486e71a-2d58-4a00-97b3-0d2d0a91ed4c |
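
A substring match on `.vcf` also picks up companion files; the loop output later in this guide includes `.vcf.idx` and `.vcf.out` entries. If only the VCF files themselves are wanted, a suffix test is stricter. The sketch below shows the idea on plain dictionaries, written for Python 3; the function name `split_vcf_files` is an assumption made for illustration.

```python
def split_vcf_files(records, suffix='.vcf'):
    """Separate files ending in `suffix` from companions such as '.vcf.idx'."""
    vcf, companions = [], []
    for record in records:
        name = (record.get('filename') or '').lower()
        if name.endswith(suffix):
            vcf.append(record)
        elif suffix + '.' in name:
            # e.g. 'sample.vcf.idx' or 'sample.vcf.out'
            companions.append(record)
    return vcf, companions


# Sample records reusing filenames from this guide.
records = [
    {'filename': '616273-UDN59528_HKWJWADXX-2-ID03.SNPs_Annotated.vcf'},
    {'filename': '/clinical/MCW2015-000554/Bam/MCW2015-000554_mbq20_raw.vcf.idx'},
    {'filename': '/clinical/MCW2015-000554/Bam/MCW2015-000554_mbq20_raw.vcf.out'},
    {'filename': '616273-UDN59528_HKWJWADXX-2-ID03_DNASeq_BWA_AlignerCommandsSubstituted.xml'},
]
vcf, companions = split_vcf_files(records)
print(len(vcf), len(companions))
```

On the DataFrame itself, the equivalent strict filter would be `seqfile_df[seqfile_df['filename'].str.endswith('.vcf', na=False)]`, where `na=False` treats missing filenames as non-matches.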


## Process
If you want to run a process on each VCF file, iterate over the rows of the DataFrame.

In [33]:
for index, row in vcf_files_df.iterrows():
    # index is the row label carried over from seqfile_df, so this keeps
    # only matches found among the first 100 rows of the full listing
    if index < 100:
        print(row['filename'] + ', ' + row['fileserviceuuid'])

616273-UDN59528_HKWJWADXX-2-ID03.SNPs_Annotated.vcf, 88ae6bfe-13b7-4930-9698-3380e9add3fc
616273-UDN59528_HKWJWADXX-2-ID03.INDELs_Annotated.vcf, b25c43ad-d531-4d77-a231-e5ac56f13555
616817-UDN58253_HKWFNADXX-2-ID02.INDELs_Annotated.vcf, 22887e0d-bb01-4b6f-b36b-1b171e6cf4c1
616817-UDN58253_HKWFNADXX-2-ID02.SNPs_Annotated.vcf, 5768b0b4-f55a-41be-a571-68ff87a6c2e6
616816-UDN25969_HKWFNADXX-2-ID01.INDELs_Annotated.vcf, 53ed6cc6-1834-420a-9b89-374bd3dc55a4
616816-UDN25969_HKWFNADXX-2-ID01.SNPs_Annotated.vcf, e5211428-04f1-4924-969b-acd4da95564a
616818-UDN13334_HKWFNADXX-2-ID03.INDELs_Annotated.vcf, 3783b723-baa8-4b72-9938-dbfd62b45bb9
616818-UDN13334_HKWFNADXX-2-ID03.SNPs_Annotated.vcf, ee11b8e8-90ab-4676-a2a3-ffe49df13f71
/clinical/MCW2015-000554/Bam/MCW2015-000554_mbq20_raw.vcf, bdb4d6cf-929f-4ff5-b2d5-b3867f20b3ff
/clinical/MCW2015-000554/Bam/MCW2015-000554_mbq20_raw.vcf.idx, 7ff7bd58-fd22-45e6-b390-b04a339e5c00
/clinical/MCW2015-000554/Bam/MCW2015-000554_mbq20_raw.vcf.out, 7dbf2a0b-580d
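
One possible per-file process is to save each filename and FileService UUID to a CSV manifest for later use. The sketch below is written for Python 3. The function name `write_manifest` and the two-column layout are assumptions made for illustration, not a format required by the UDN Gateway or the download guide.

```python
import csv
import io


def write_manifest(records, handle):
    """Write filename and fileserviceuuid for each record as CSV rows."""
    writer = csv.writer(handle)
    writer.writerow(['filename', 'fileserviceuuid'])
    count = 0
    for record in records:
        writer.writerow([record['filename'], record['fileserviceuuid']])
        count += 1
    return count


# Sample rows taken from the output above.
rows = [
    {'filename': '616273-UDN59528_HKWJWADXX-2-ID03.SNPs_Annotated.vcf',
     'fileserviceuuid': '88ae6bfe-13b7-4930-9698-3380e9add3fc'},
    {'filename': '616273-UDN59528_HKWJWADXX-2-ID03.INDELs_Annotated.vcf',
     'fileserviceuuid': 'b25c43ad-d531-4d77-a231-e5ac56f13555'},
]
buffer = io.StringIO()  # stands in for a file opened with newline=''
written = write_manifest(rows, buffer)
print(buffer.getvalue())
```

With the DataFrame from this guide, the records could be supplied as `vcf_files_df.to_dict('records')`.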

## Download
Check out the ***Sequence File Download Guide*** for how to obtain a download link for each file.