# Chaining Search Terms
---

Before we do any work, we need to import several functions from cdapython:


- `Q` and `query` which power the search
- `columns` which lets us view entity field names
- `unique_terms` which lets us view entity field contents
    
We're also asking cdapython to report its version so we can be sure we're using the one we mean to.

In [1]:
from cdapython import Q, columns, unique_terms, query
import cdapython
print(cdapython.__version__)
Q.set_host_url("http://35.192.60.10:8080/")

2022.5.9



The CDA provides a custom Python tool for searching CDA data. [`Q`](usage/#q) (short for Query) offers several ways to search and filter data, and several input modes:

---

- **[Q.run()](../../../Documentation/usage/#qrun)** returns data for the specified search 
- **[Q.count()](../../../Documentation/usage/#qcounts)** returns summary information (counts) for data that fit the specified search

---

## Chaining

We're going to build on our [previous search](../BasicSearch) to see what information exists about cancers that were first diagnosed in the brain. 


In [2]:
myquery = Q('ResearchSubject.primary_diagnosis_site = "brain"')


Previously we looked at subject, research_subject, specimen and file results separately, but we can also combine these. 

Let's say what we're really interested in is finding analyses done on specimens, so we're looking for files that belong to specimens that match our search. If we search the files endpoint directly, we'll get back all files that meet our search criteria, regardless of whether they belong to specimens specifically. Instead, we can chain our query to the specimen endpoint and then to the files endpoint and get the combined result:

In [3]:
myqueryspecimenfiles = myquery.specimen.files.run()
myqueryspecimenfiles

Total execution time: 3622 ms



            QueryID: 3845d4c4-0e99-490c-b2f3-fa59288df875
            Query:SELECT results.* EXCEPT(rn) FROM (SELECT ROW_NUMBER() OVER (PARTITION BY all_Files_v3_0_w_RS.id) as rn, all_Files_v3_0_w_RS.id AS id, all_Files_v3_0_w_RS.identifier AS identifier, all_Files_v3_0_w_RS.label AS label, all_Files_v3_0_w_RS.data_category AS data_category, all_Files_v3_0_w_RS.data_type AS data_type, all_Files_v3_0_w_RS.file_format AS file_format, all_Files_v3_0_w_RS.associated_project AS associated_project, all_Files_v3_0_w_RS.drs_uri AS drs_uri, all_Files_v3_0_w_RS.byte_size AS byte_size, all_Files_v3_0_w_RS.checksum AS checksum, all_Files_v3_0_w_RS.data_modality AS data_modality, all_Files_v3_0_w_RS.imaging_modality AS imaging_modality, all_Files_v3_0_w_RS.dbgap_accession_number AS dbgap_accession_number FROM gdc-bq-sample.dev.all_Subjects_v3_0_w_RS AS all_Subjects_v3_0_w_RS LEFT JOIN UNNEST(all_Subjects_v3_0_w_RS.ResearchSubject) AS _ResearchSubject LEFT JOIN UNNEST(_ResearchSubject.Specimen) AS 
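
To make the difference between querying files directly and chaining through specimen more concrete, here is a toy, in-memory sketch. It is not how cdapython works internally and it does not contact the CDA service: the records, the field layout and the helper names (`matching_research_subjects`, `all_files`, `specimen_files`) are all hypothetical and exist only for illustration.

```python
# Toy model of the subject -> research subject -> specimen -> file hierarchy.
# Every record and helper below is hypothetical; nothing here calls cdapython.
subjects = [
    {
        "id": "subject-1",
        "research_subjects": [
            {
                "primary_diagnosis_site": "brain",
                # File attached to the research subject, not to any specimen.
                "files": ["clinical_report.pdf"],
                "specimens": [
                    {"id": "specimen-1", "files": ["tumor.maf", "tumor_cnv.tsv"]},
                ],
            }
        ],
    },
    {
        "id": "subject-2",
        "research_subjects": [
            {
                "primary_diagnosis_site": "lung",
                "files": ["lung_scan.dcm"],
                "specimens": [{"id": "specimen-2", "files": ["lung.maf"]}],
            }
        ],
    },
]


def matching_research_subjects(site):
    """Yield research subjects whose primary diagnosis site equals `site`."""
    for subject in subjects:
        for research_subject in subject["research_subjects"]:
            if research_subject["primary_diagnosis_site"] == site:
                yield research_subject


def all_files(site):
    """Loosely analogous to searching the files endpoint directly."""
    found = []
    for research_subject in matching_research_subjects(site):
        found.extend(research_subject["files"])
        for specimen in research_subject["specimens"]:
            found.extend(specimen["files"])
    return found


def specimen_files(site):
    """Loosely analogous to chaining specimen and then files."""
    found = []
    for research_subject in matching_research_subjects(site):
        for specimen in research_subject["specimens"]:
            found.extend(specimen["files"])
    return found


direct = all_files("brain")
chained = specimen_files("brain")
print("direct: ", direct)
print("chained:", chained)
```

In this toy data the chained route drops `clinical_report.pdf`, because that file hangs off the research subject rather than a specimen. That is the same narrowing the chained cdapython query is meant to achieve.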

The chained query returns 50495 files that belong to specimens matching our search criteria. As before, we can preview the results by using the `.to_dataframe()` method:

In [4]:
myqueryspecimenfiles.to_dataframe()

| | id | identifier | label | data_category | data_type | file_format | associated_project | drs_uri | byte_size | checksum | data_modality | imaging_modality | dbgap_accession_number |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1676bc16-a120-4951-bcb9-b3e0599b45af | [{'system': 'GDC', 'value': '1676bc16-a120-495... | 7dee3b3f-7f68-493b-a99c-49ae65675609.genie.ali... | Simple Nucleotide Variation | Masked Annotated Somatic Mutation | MAF | GENIE-MSK | drs://dg.4DFC:1676bc16-a120-4951-bcb9-b3e0599b... | 1051 | ecabcaa16c4f79262c996b5df904d3c3 | Genomic | | |
| 1 | 1825d954-56e3-4faa-991e-56e28d23f31e | [{'system': 'GDC', 'value': '1825d954-56e3-4fa... | 415b62cf-1ab3-4432-9fd4-c326718d8142.genie.ali... | Copy Number Variation | Gene Level Copy Number Scores | TSV | GENIE-MSK | drs://dg.4DFC:1825d954-56e3-4faa-991e-56e28d23... | 22932 | 78fb97b022c9790c89888ba62bfb222e | Genomic | | |
| 2 | 3c4c6124-e58f-497a-8a20-e5ade9caa765 | [{'system': 'GDC', 'value': '3c4c6124-e58f-497... | c7a985b2-bc10-4e58-86e8-0ef41f41f80c.genie.ali... | Simple Nucleotide Variation | Masked Annotated Somatic Mutation | MAF | GENIE-JHU | drs://dg.4DFC:3c4c6124-e58f-497a-8a20-e5ade9ca... | 1053 | d7dabf1852034e3fc333ae68b38f2281 | Genomic | | |
| 3 | 9148502d-cb91-41c6-b382-62639010ff4e | [{'system': 'GDC', 'value': '9148502d-cb91-41c... | e1c60464-3f4f-400b-bbfa-0e96f826d901.genie.ali... | Simple Nucleotide Variation | Masked Annotated Somatic Mutation | MAF | GENIE-MSK | drs://dg.4DFC:9148502d-cb91-41c6-b382-62639010... | 3825 | 20d837abab0ffa04e06e5de5f35108e8 | Genomic | | |
| 4 | 7aef2592-bb88-44fe-886c-22e0e3fd4aac | [{'system': 'GDC', 'value': '7aef2592-bb88-44f... | 9010b2f6-5eb0-4c68-b42a-fb271d8f5ba2.genie.ali... | Simple Nucleotide Variation | Masked Annotated Somatic Mutation | MAF | GENIE-MSK | drs://dg.4DFC:7aef2592-bb88-44fe-886c-22e0e3fd... | 1051 | 3f36395c16a0fc18171d490f1039efcb | Genomic | | |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 95 | d91dbc00-747a-4c78-9fe9-d3d599df2656 | [{'system': 'GDC', 'value': 'd91dbc00-747a-4c7... | 41a6475c-09bf-4b0c-b568-a5ce5d8f7a23.genie.ali... | Simple Nucleotide Variation | Masked Annotated Somatic Mutation | MAF | GENIE-GRCC | drs://dg.4DFC:d91dbc00-747a-4c78-9fe9-d3d599df... | 3465 | fa2269804bffbba348da639311efe254 | Genomic | | |
| 96 | 87b76b2b-7f27-49b1-8989-c1c3820e5661 | [{'system': 'GDC', 'value': '87b76b2b-7f27-49b... | 39444fc1-0c79-44c6-a1dc-9f86e72835e1.genie.ali... | Simple Nucleotide Variation | Masked Annotated Somatic Mutation | MAF | GENIE-GRCC | drs://dg.4DFC:87b76b2b-7f27-49b1-8989-c1c3820e... | 2196 | 68f202798873dfdd85b5680cb6263388 | Genomic | | |
| 97 | c609be50-3cc3-447a-acbc-4d82b4c30e72 | [{'system': 'GDC', 'value': 'c609be50-3cc3-447... | fa186f29-1bd9-4b82-9daf-d606a6ddef11.genie.ali... | Simple Nucleotide Variation | Masked Annotated Somatic Mutation | MAF | GENIE-GRCC | drs://dg.4DFC:c609be50-3cc3-447a-acbc-4d82b4c3... | 3565 | b1b7e7322180b6764e27faac63e77bc2 | Genomic | | |
| 98 | e47498b5-27ae-46f7-8ee5-cf310360c598 | [{'system': 'GDC', 'value': 'e47498b5-27ae-46f... | 9a9009af-e4ce-4477-9af7-851c82e61802.genie.ali... | Simple Nucleotide Variation | Masked Annotated Somatic Mutation | MAF | GENIE-MDA | drs://dg.4DFC:e47498b5-27ae-46f7-8ee5-cf310360... | 1052 | 919e44b2808e2de2281ba9275698ba1d | Genomic | | |
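
A preview like this is often easier to read once it is summarised. As a minimal sketch, the snippet below tallies a few columns using only the Python standard library. The nine rows are copied by hand from the visible part of the preview (rows 0 to 4 and 95 to 98), so the numbers describe that small sample only, not the full 50495-file result. The variable names are illustrative.

```python
from collections import Counter

# (data_type, file_format, associated_project, byte_size) for the nine rows
# visible in the preview; copied by hand, not fetched from the CDA.
preview_rows = [
    ("Masked Annotated Somatic Mutation", "MAF", "GENIE-MSK", 1051),
    ("Gene Level Copy Number Scores", "TSV", "GENIE-MSK", 22932),
    ("Masked Annotated Somatic Mutation", "MAF", "GENIE-JHU", 1053),
    ("Masked Annotated Somatic Mutation", "MAF", "GENIE-MSK", 3825),
    ("Masked Annotated Somatic Mutation", "MAF", "GENIE-MSK", 1051),
    ("Masked Annotated Somatic Mutation", "MAF", "GENIE-GRCC", 3465),
    ("Masked Annotated Somatic Mutation", "MAF", "GENIE-GRCC", 2196),
    ("Masked Annotated Somatic Mutation", "MAF", "GENIE-GRCC", 3565),
    ("Masked Annotated Somatic Mutation", "MAF", "GENIE-MDA", 1052),
]

# Count how often each data type and each project appears in the sample.
by_data_type = Counter(row[0] for row in preview_rows)
by_project = Counter(row[2] for row in preview_rows)

# Total size of the sampled files, in bytes.
total_bytes = sum(row[3] for row in preview_rows)

for data_type, n in by_data_type.most_common():
    print(f"{n:2d}  {data_type}")
for project, n in by_project.most_common():
    print(f"{n:2d}  {project}")
print("total bytes in sample:", total_bytes)
```

If `.to_dataframe()` returns a pandas DataFrame, as the preview suggests, the same kind of tally could likely be obtained on the full result with something like `myqueryspecimenfiles.to_dataframe()["data_type"].value_counts()`. Treat that as an assumption to check against your installed cdapython version.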
