# Available Search Terms
---

Before we do any work, we need to import several functions from cdapython:
- `Q` and `query` which power the search
- `columns` which lets us view entity field names
- `unique_terms` which lets us view entity field contents

We're also asking cdapython to report its version so we can be sure we're using the one we mean to.

In [1]:
from cdapython import Q, columns, unique_terms, query
print(Q.get_version())


2022.6.22
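If we want the notebook to stop early when the wrong version is installed, the check can be made explicit. The following is a minimal sketch in plain Python: `version_matches` is a hypothetical helper (not part of cdapython), and the `reported` string is simply the value printed above.

```python
def version_matches(actual: str, expected: str) -> bool:
    """Return True when the reported version string equals the expected one."""
    return actual.strip() == expected.strip()


# The version string printed by Q.get_version() in the cell above.
reported = "2022.6.22"

# Fail loudly instead of silently running against an unexpected release.
if not version_matches(reported, "2022.6.22"):
    raise RuntimeError(f"Unexpected cdapython version: {reported}")
```

In a live session, `reported` could be replaced with the result of `Q.get_version()`.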


In [None]:
Q.set_default_project_dataset("broad-dsde-dev.cda_dev")
Q.set_host_url("https://cancerdata.dsde-dev.broadinstitute.org/")
Q.get_host_url()
Q.get_default_project_dataset()



<div class="cdanote" style="background-color:#b3e5d5;color:black;padding:20px;">
    
You can think of the CDA as a really, really enormous spreadsheet full of data. To search this enormous spreadsheet, you'd want to select columns, and then filter rows.
</div>

Accordingly, to see what search fields are available, we use the command `columns`:

In [2]:
columns()


            QueryID: 900bdaa7-c9fc-4859-a46d-c3f86635ffa2
            
            Offset: 0
            Count: 62
            Total Row Count: 62
            More pages: False
            

This output tells us that there are 62 searchable fields, but it doesn't list them directly. Running a CDA command like this first gives you an overall summary of the data you're going to get, which is handy as a gut check. However, if we want to see the data on our screen, we can have `columns()` print its contents as a list instead:

In [3]:
columns().to_list()

['id',
 'identifier',
 'identifier.system',
 'identifier.value',
 'species',
 'sex',
 'race',
 'ethnicity',
 'days_to_birth',
 'subject_associated_project',
 'vital_status',
 'age_at_death',
 'cause_of_death',
 'Files',
 'ResearchSubject',
 'ResearchSubject.id',
 'ResearchSubject.identifier',
 'ResearchSubject.identifier.system',
 'ResearchSubject.identifier.value',
 'ResearchSubject.member_of_research_project',
 'ResearchSubject.primary_diagnosis_condition',
 'ResearchSubject.primary_diagnosis_site',
 'ResearchSubject.Files',
 'ResearchSubject.Diagnosis',
 'ResearchSubject.Diagnosis.id',
 'ResearchSubject.Diagnosis.identifier',
 'ResearchSubject.Diagnosis.identifier.system',
 'ResearchSubject.Diagnosis.identifier.value',
 'ResearchSubject.Diagnosis.primary_diagnosis',
 'ResearchSubject.Diagnosis.age_at_diagnosis',
 'ResearchSubject.Diagnosis.morphology',
 'ResearchSubject.Diagnosis.stage',
 'ResearchSubject.Diagnosis.grade',
 'ResearchSubject.Diagnosis.method_of_diagnosis',
 'Research

By default, `columns()` returns the first 100 items. If that is too many, you can limit the results to a specified number:

In [None]:
columns(limit=10).to_list()

Or you can filter the list for terms that match your interests:

In [None]:
columns().to_list(filters="diagnosis")
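One way to picture what the `filters` argument does is as a substring match over the returned field names. The sketch below reproduces that idea in plain Python on a handful of field names copied from the output above. The helper `filter_terms` is hypothetical, and the case-insensitive matching is an assumption about the behavior, not a description of how cdapython implements it.

```python
from typing import List


def filter_terms(terms: List[str], needle: str) -> List[str]:
    """Keep only the terms that contain `needle`, ignoring case (assumed behavior)."""
    needle = needle.lower()
    return [term for term in terms if needle in term.lower()]


# A small sample of field names taken from the columns() output above.
fields = [
    "id",
    "sex",
    "ResearchSubject.primary_diagnosis_condition",
    "ResearchSubject.primary_diagnosis_site",
    "ResearchSubject.Diagnosis",
    "ResearchSubject.Diagnosis.stage",
]

diagnosis_fields = filter_terms(fields, "diagnosis")
print(diagnosis_fields)
```

Under this assumption, both `primary_diagnosis_*` and `Diagnosis.*` fields are kept, while `id` and `sex` are dropped.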

<div class="cdawarn" style="background-color:#f9cfbf;color:black;padding:20px;">
<strong>Check your search criteria!</strong>
While available search fields may look like ones you've seen in PDC, GDC or IDC, that does not mean they will contain exactly the same information; several are renamed or restructured in the CDA model. The field name mappings are described in <a href="../../Schema/overview_mapping">CDA Schema Field Mapping.</a>
</div>


We can directly get information about what data populates any of these fields using the `unique_terms()` function. Like `columns`, `unique_terms` defaults to giving us an overview of the results, and we view them the same way:

In [None]:
unique_terms("ResearchSubject.primary_diagnosis_site").to_list()

We can use the same trick here to search for only diagnosis sites that we're interested in:

In [None]:
unique_terms("ResearchSubject.primary_diagnosis_site").to_list(filters="lung")


We can use this same logic to look for partial matches. For instance, if I'm not sure whether the data I'm interested in would be labeled as "uterine" or "uterus", I might search for just "uter":

In [None]:
unique_terms("ResearchSubject.primary_diagnosis_site").to_list(filters="uter")

Success! Not only are there multiple ways that "Uterus" is specified in the CDA data, I now also know that there are data for specific uterine tissues.
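To see why the shorter stem is the safer choice, here is a small plain-Python sketch comparing the three candidate search strings. The values in `site_values` are made-up stand-ins, not actual CDA output, and `find_matches` is a hypothetical helper that assumes case-insensitive substring matching.

```python
from typing import List

# Hypothetical diagnosis-site values, for illustration only (not real CDA output).
site_values = ["Uterus", "uterus, NOS", "Uterine cervix", "Lung"]


def find_matches(values: List[str], needle: str) -> List[str]:
    """Return the values containing `needle`, ignoring case (assumed behavior)."""
    needle = needle.lower()
    return [value for value in values if needle in value.lower()]


# Each full word finds only some spellings; the shared stem finds all of them.
for needle in ("uterus", "uterine", "uter"):
    print(needle, "->", find_matches(site_values, needle))
```

Searching on the full word misses whichever spelling we did not guess, while the stem picks up every variant in one pass.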

---

<div class="cdawarn" style="background-color:#f9cfbf;color:black;padding:20px;">
<strong>Check your search terms!</strong>
If you run into unexpected results when running a search, be sure that you're searching all the terms you want. CDA data is not yet harmonized across centers, so there are many cases where a single-term search will not return all the information you need. However, the CDA provides tools that make it easy to search all forms of a term, enabling cross-dataset search.
</div>

---


Explore the available terms by changing which table, how many results, and which unique terms you request. Once you have found terms you're interested in, head to <a href="../BasicSearch">Basic Search</a> to build simple queries.