# Welcome to the CDA example notebook set

Below you will find links to open the interactive versions of all of our example notebooks, or you can browse them by clicking the folder icon on the left.

## Notebook Quicklinks

- [Available Search Terms](./SearchTerms.ipynb): Shows how to find out what you can search
- [Basic Search](./BasicSearch.ipynb): Demonstrates the basic search functions
- [Search Summaries](./DataSummaries.ipynb): How to return summary statistics instead of raw search results
- [Search Operators](./Operators.ipynb): How to combine many search terms to get specific results
- [Endpoint chaining](./Chaining.ipynb): How to narrow your search to files only from a specific endpoint
- [Building a cohort](./BuildingACohort.ipynb): An end-to-end walkthrough of how to find, filter, merge, and save data to build a cohort

## How to use these notebooks

IPython notebooks let you intersperse code with text and run bits of code one at a time, in any order. MyBinder is a service that creates a small, personal cloud computer for you when you click the launch button.

If you want to run a notebook as is, choose a notebook, then either press the play button at the top of the panel or use the keyboard shortcut Command + Enter (Control + Enter on Windows) to execute each code cell.

If you want to use these notebooks as a starting place for your own search, you can safely change any parameters. Since this is a small cloud computer, if the instance breaks or you can't get the code working again, you can always close this browser tab and click the `launchbinder` button again to get a new, clean version.

<div style="background-color:#f9cfbf;color:#000000;padding:20px;">
<strong>Save your changes locally!</strong>
Any changes you make to the code here will disappear as soon as you close this window. If you have used this Binder instance to do your own searches, remember to download the changed notebook(s) to your local computer before you close it. You can always upload your changed notebooks to a new MyBinder instance the next time you work on them.
</div>

## About CDA python

The CDA provides a custom Python tool for searching CDA data. `Q` (short for Query) offers several ways to search and filter data, and several input modes:

---
- **Q()** builds a query that can be used by `run()` or `count()`
- **Q.run()** returns data for the specified search
- **Q.count()** returns summary information (counts) for data that fit the specified search
- **columns()** returns entity field names
- **unique_terms()** returns entity field contents

---


## About CDA data

<div class="cdanote" style="background-color:#b3e5d5;color:black;padding:20px;">
    
CDA data comes from three sources:
<ul>
<li><b>The <a href="https://proteomic.datacommons.cancer.gov/pdc/">Proteomic Data Commons</a> (PDC)</b></li>
<li><b>The <a href="https://gdc.cancer.gov/">Genomic Data Commons</a> (GDC)</b></li>
<li><b>The <a href="https://datacommons.cancer.gov/repository/imaging-data-commons">Imaging Data Commons</a> (IDC)</b></li>
</ul> 
    
The CDA makes this data searchable in five main endpoints:

<ul>
<li><b>subject:</b> A patient entity captures the study-independent metadata for research subjects. Human research subjects are usually not traceable to a particular person, to protect the subject's privacy.</li>
<li><b>researchsubject:</b> A research subject is the entity of interest in a specific research study or project, typically a human being or an animal, but it can also be a device, a group of humans or animals, or a tissue sample. Human research subjects are usually not traceable to a particular person, to protect the subject's privacy. This entity plays the role of the case_id in existing data. A subject who participates in 3 studies will have 3 researchsubject IDs.</li>
<li><b>specimen:</b> Any material taken as a sample from a biological entity (living or dead), or from a physical object or the environment. Specimens are usually collected as an example of their kind, often for use in some investigation.</li>
<li><b>file:</b> A unit of data about subjects, researchsubjects, specimens, or their associated information.</li>
<li><b>mutation:</b> Molecular data about specific mutations, currently limited to the TCGA-READ project from GDC.</li>
</ul>
There are also two endpoints that offer deeper information about data in the researchsubject endpoint:
<ul>
<li><b>diagnosis:</b> A collection of characteristics that describe an abnormal condition of the body as assessed at a point in time. May be used to capture information about neoplastic and non-neoplastic conditions.</li>
<li><b>treatment:</b> Represents medication administration or other treatment types.</li>
</ul>


Any metadata field can be searched from any endpoint; the only difference between search types is what type of data is returned by default. This means that you can think of the CDA as a really, really enormous spreadsheet full of data. To search this enormous spreadsheet, you'd want to select columns and then filter rows.
</div>
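
The "select columns, then filter rows" idea can be sketched in plain Python. This is only a toy illustration of the spreadsheet analogy, not the CDA Python API: the rows, their field names and values, and the helper functions `select_columns` and `filter_rows` are all made up for this example.

```python
# Toy stand-in for the "enormous spreadsheet". These rows and field names are
# invented for illustration; they are not real CDA records.
rows = [
    {"subject_id": "s1", "sex": "female", "primary_diagnosis_site": "brain", "file_format": "bam"},
    {"subject_id": "s2", "sex": "male", "primary_diagnosis_site": "lung", "file_format": "dcm"},
    {"subject_id": "s3", "sex": "female", "primary_diagnosis_site": "lung", "file_format": "bam"},
]


def select_columns(table, columns):
    """Keep only the named columns from every row (hypothetical helper)."""
    return [{name: row[name] for name in columns} for row in table]


def filter_rows(table, **conditions):
    """Keep rows whose fields equal every given value (hypothetical helper)."""
    return [
        row for row in table
        if all(row.get(field) == value for field, value in conditions.items())
    ]


# Filter on one field, then return a different set of columns. This mirrors
# searching on any metadata field while choosing what kind of data comes back.
lung_rows = filter_rows(rows, primary_diagnosis_site="lung")
result = select_columns(lung_rows, ["subject_id", "file_format"])
print(result)
```

Here the filter uses a diagnosis field while the output keeps only subject and file columns, which is the sense in which the choice of endpoint changes what is returned rather than what can be searched.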