# Endpoint Chaining
---

The CDA provides a custom Python tool for searching CDA data. [`Q`](usage/#q) (short for Query) offers several ways to search and filter data, and several input modes:

---
- **<a href="../../QuickStart/usage/#q">Q.()</a>** builds a query that can be used by `run()` or `count()`
- **<a href="../../QuickStart/usage/#qrun">Q.run()</a>** returns data for the specified search 
- **<a href="../../QuickStart/usage/#qcount">Q.count()</a>** returns summary information (counts) for data that fit the specified search
- **<a href="../../QuickStart/usage/#columns">columns()</a>** returns entity field names
- **<a href="../../QuickStart/usage/#unique_terms">unique_terms()</a>** returns entity field contents

---
                                                                    
Before we do any work, we need to import these functions from cdapython.
We're also telling cdapython to report its version so we can be sure we're using the one we mean to:

In [1]:
from cdapython import Q, columns, unique_terms, query
import cdapython
print(cdapython.__version__)

2022.6.22


In [None]:
Q.set_default_project_dataset("broad-dsde-dev.cda_dev")
Q.set_host_url("https://cancerdata.dsde-dev.broadinstitute.org/")
Q.get_host_url()
Q.get_default_project_dataset()

## Endpoint Chaining

We're going to build on our <a href="../BasicSearch">previous basic search</a> to see what information exists about cancers that were first diagnosed in the brain. 


In [2]:
myquery = Q('ResearchSubject.primary_diagnosis_site = "brain"')


Previously we looked at subject, research_subject, specimen and file results separately, but we can also combine these. 

Let's say what we're really interested in is finding analyses done on specimens, so we're looking for files that belong to specimens that match our search. If we search the files endpoint directly, we'll get back all files that meet our search criteria, regardless of whether they belong to specimens specifically. Instead, we can chain our query to the specimen endpoint and then to the files endpoint and get the combined result:

In [3]:
myqueryspecimenfiles = myquery.specimen.file.run()
myqueryspecimenfiles

Total execution time: 3535 ms



            QueryID: a5926a1e-ab3b-4000-8e2e-c94751782c84
            
            Offset: 0
            Count: 100
            Total Row Count: 405031
            More pages: True
            

We get back 405031 files that belong to specimens that meet our search criteria. As before, we can preview the results by using the `.to_dataframe()` function:

In [4]:
myqueryspecimenfiles.to_dataframe()

Unnamed: 0,id,identifier,label,data_category,data_type,file_format,associated_project,drs_uri,byte_size,checksum,data_modality,imaging_modality,dbgap_accession_number,subject_id,researchsubject_id,researchsubject_specimen_id
0,2c237a0e-58bf-4385-950e-3ef13b426a3e,"[{'system': 'GDC', 'value': '2c237a0e-58bf-438...",TCGA-E1-A7YJ-01A-21-A44D-20_RPPA_data.tsv,Proteome Profiling,Protein Expression Quantification,TSV,TCGA-LGG,drs://dg.4DFC:2c237a0e-58bf-4385-950e-3ef13b42...,22030,326072a230598e4cce85b419577765d6,Genomic,,,TCGA-E1-A7YJ,bc6fbb3c-870d-40b5-983b-7c88705e020d,59b97f4b-7c13-46eb-b3a3-ff8db73a97f2
1,3e87a1bf-3752-4d7a-a064-1c992401ae6c,"[{'system': 'GDC', 'value': '3e87a1bf-3752-4d7...",TCGA-QH-A6XA-01A-21-A44C-20_RPPA_data.tsv,Proteome Profiling,Protein Expression Quantification,TSV,TCGA-LGG,drs://dg.4DFC:3e87a1bf-3752-4d7a-a064-1c992401...,21962,21a989ca554caf9d0c43a1214fb4d158,Genomic,,,TCGA-QH-A6XA,4289a50b-c6be-495e-9826-53aeb67b6a22,e088b85b-3ef2-4105-9e13-06166e782b8e
2,3eecbeb9-ff63-4ae4-99f4-cf26bc78dcf1,"[{'system': 'GDC', 'value': '3eecbeb9-ff63-4ae...",TCGA-S9-A6U5-01A-21-A44D-20_RPPA_data.tsv,Proteome Profiling,Protein Expression Quantification,TSV,TCGA-LGG,drs://dg.4DFC:3eecbeb9-ff63-4ae4-99f4-cf26bc78...,21988,cfdd32bf4d1fbfbe54c56ff0e08e7923,Genomic,,,TCGA-S9-A6U5,621424ec-67bc-4c42-b0ad-64c5654b5ad9,24332cd7-3a92-4ce3-b7c0-97f42041dfb9
3,49306693-e7b4-45ce-afb2-93caf0d6dcae,"[{'system': 'GDC', 'value': '49306693-e7b4-45c...",TCGA-HT-7603-01A-11-A29X-20_RPPA_data.tsv,Proteome Profiling,Protein Expression Quantification,TSV,TCGA-LGG,drs://dg.4DFC:49306693-e7b4-45ce-afb2-93caf0d6...,21989,e9609a4f4b99ddf299e11f88bbd8d2bd,Genomic,,,TCGA-HT-7603,b435e88b-f105-4c65-b725-126e5a299113,9e8a3c17-abc3-42b8-a786-186885a52bc9
4,5845fe78-86bd-43a0-b365-b9673beada97,"[{'system': 'GDC', 'value': '5845fe78-86bd-43a...",TCGA-HW-7489-01A-21-A29Y-20_RPPA_data.tsv,Proteome Profiling,Protein Expression Quantification,TSV,TCGA-LGG,drs://dg.4DFC:5845fe78-86bd-43a0-b365-b9673bea...,21976,ff84b7d87d0db89ac94ff6fe5a237bf6,Genomic,,,TCGA-HW-7489,6957d45e-7ccb-457d-800d-c2a2037f4f01,c95ec8f5-dc19-4787-bed2-8c4c97082bfb
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
95,76f4b413-aec3-4a8b-8109-24213551a9b8,"[{'system': 'GDC', 'value': '76f4b413-aec3-4a8...",TCGA-06-0190-01A-21-2214-20_RPPA_data.tsv,Proteome Profiling,Protein Expression Quantification,TSV,TCGA-GBM,drs://dg.4DFC:76f4b413-aec3-4a8b-8109-24213551...,10511,d74cf029d0b96cb68526faa014426351,Genomic,,,TCGA-06-0190,74139255-a635-4c87-814d-3dd04ed630a8,5e92a5d5-77a4-401a-ade3-e01c95302ec0
96,7f26a4ff-a560-4936-af95-854ee2dfab56,"[{'system': 'GDC', 'value': '7f26a4ff-a560-493...",TCGA-06-1087-01A-03-1900-20_RPPA_data.tsv,Proteome Profiling,Protein Expression Quantification,TSV,TCGA-GBM,drs://dg.4DFC:7f26a4ff-a560-4936-af95-854ee2df...,24010,0e0b8047b464263c8a3a4f0bba3eb8a2,Genomic,,,TCGA-06-1087,98dd5ea6-8d40-4963-bbae-2d37001d71c8,030e2a08-a519-4f41-bb70-a04f0d53e4ce
97,870e2714-2a1e-477d-a9eb-2294de5112c0,"[{'system': 'GDC', 'value': '870e2714-2a1e-477...",TCGA-06-A6S0-01A-21-A44T-20_RPPA_data.tsv,Proteome Profiling,Protein Expression Quantification,TSV,TCGA-GBM,drs://dg.4DFC:870e2714-2a1e-477d-a9eb-2294de51...,23970,a7d6c089885fe8b203e0677b5d2667a8,Genomic,,,TCGA-06-A6S0,820aea32-8f1c-478b-ab56-8171425cd76b,5247107c-74c3-4a69-aa18-275ab3a7b945
98,88d30ccb-5684-41fb-8e73-8877b9e5c683,"[{'system': 'GDC', 'value': '88d30ccb-5684-41f...",seurat.analysis.tsv,Transcriptome Profiling,Single Cell Analysis,TSV,CPTAC-3,drs://dg.4DFC:88d30ccb-5684-41fb-8e73-8877b9e5...,5010388,25320f6866613f2f056c6c335c173e71,Genomic,,phs001287,C3N-01798,bb1a6a1e-a0fb-46d6-9de2-893efe6717df,c0798a43-c80e-5e46-8397-da70e31610e8
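
The result summary shown earlier reports `Count: 100` against a `Total Row Count` of 405031, so the preview holds only the first page of results. As a rough sketch of the arithmetic, assuming every page holds the same 100 rows as the first one (the helper name `pages_needed` is made up for illustration and is not part of cdapython):

```python
import math


def pages_needed(total_rows: int, page_size: int) -> int:
    """Number of fixed-size pages needed to hold total_rows rows."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    return math.ceil(total_rows / page_size)


# Values copied from the result summary shown earlier.
total_row_count = 405031
rows_per_page = 100  # assumption: every page is the same size as the first

n_pages = pages_needed(total_row_count, rows_per_page)
# Rows left over for the final, partially filled page.
last_page_rows = total_row_count - (n_pages - 1) * rows_per_page

print(n_pages, last_page_rows)
```

Under that assumption, the dataframe preview covers only a small fraction of the matching files, which is why the summary reports `More pages: True`.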


<div class="cdanote" style="background-color:#b3e5d5;color:black;padding:20px;">
    
<h3>Valid Endpoint Chains</h3>

Not all endpoints can be chained together. This restriction comes from the data itself: `diagnosis` and `treatment` information does not have files directly attached to it. Instead, those files are associated with the `researchsubject`. As a result, both "myquery.treatment.file.run()" and "myquery.diagnosis.file.run()" will fail, as there are no files to retrieve. Valid chains are:
    
<ul>
<li><b>myquery.subject.file.run():</b> This will return all the files that meet the query and that are directly tied to a subject</li>
<li><b>myquery.researchsubject.file.run():</b> This will return all the files that meet the query and that are directly tied to a researchsubject</li>
<li><b>myquery.specimen.file.run():</b> This will return all the files that meet the query and that are directly tied to a specimen</li>
<li><b>myquery.subject.file.count.run():</b> This will return the count of files that meet the query and that are directly tied to a subject</li>
<li><b>myquery.researchsubject.file.count.run():</b> This will return the count of files that meet the query and that are directly tied to a researchsubject</li>
<li><b>myquery.specimen.file.count.run():</b> This will return the count of files that meet the query and that are directly tied to a specimen</li>
</ul>
</div>
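
The rules in the note above can be written down as a small lookup. This is only an illustrative sketch: `VALID_FILE_CHAINS` and `can_chain_to_file` are hypothetical names, not part of cdapython, and the set simply mirrors the endpoints listed in the note.

```python
# Endpoints that, per the note above, have files directly attached.
# Hypothetical lookup for illustration; cdapython does not expose this.
VALID_FILE_CHAINS = {"subject", "researchsubject", "specimen"}


def can_chain_to_file(endpoint: str) -> bool:
    """Return True if `myquery.<endpoint>.file` is listed as a valid chain."""
    return endpoint in VALID_FILE_CHAINS


for endpoint in ("subject", "researchsubject", "specimen", "diagnosis", "treatment"):
    status = "valid" if can_chain_to_file(endpoint) else "not valid"
    print(f"myquery.{endpoint}.file -> {status}")
```

A check like this could be run before building a chain, so that a query against `diagnosis` or `treatment` is redirected to `researchsubject` instead of failing.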