# Getting all the results


The CDA provides a custom Python tool for searching CDA data. [`Q`](usage/#q) (short for Query) offers several ways to search and filter data, and several input modes:

---
- **<a href="../../QuickStart/usage/#q">Q.()</a>** builds a query that can be used by `run()` or `count()`
- **<a href="../../QuickStart/usage/#qrun">Q.run()</a>** returns data for the specified search 
- **<a href="../../QuickStart/usage/#qcount">Q.count()</a>** returns summary information (counts) for data that fit the specified search
- **<a href="../../QuickStart/usage/#columns">columns()</a>** returns entity field names
- **<a href="../../QuickStart/usage/#unique_terms">unique_terms()</a>** returns entity field contents

---
                                                                    
Before we do any work, we need to import several functions from cdapython:

- `Q` which powers the search
- `columns` which lets us view entity field names
- `unique_terms` which lets us view entity field contents

We're also importing functions from several other packages to make viewing and manipulating tables easier. The `opt.` settings pre-configure how itables displays our tables, with scrolling and paging enabled.
Finally, we're telling cdapython to report its version so we can be sure we're using the one we mean to:

In [1]:
from cdapython import (
    Q, columns, unique_terms)
import numpy as np
import pandas as pd
from itables import init_notebook_mode, show
init_notebook_mode(all_interactive=True)
import itables.options as opt
opt.maxBytes=0
opt.scrollX="200px"
opt.scrollCollapse=True
opt.paging=True
opt.maxColumns=0
print(Q.get_version())


The CDA indexes tens of thousands of subjects, researchsubjects, specimens, and their diagnosis data. CDA also indexes more than 45 million files. To keep search results from being overwhelming, CDA limits search results to the first 100 records by default:

In [2]:
myquery = Q('primary_diagnosis_site = "brain"')
brainresults = myquery.subject.run()
brainresults


            
            Offset: 0
            Count: 100
            Total Row Count: 2438
            More pages: True
            


---

- **Offset:** This is how many rows of information we've told the query to skip in the data. Here we didn't tell it to skip anything, so the offset is zero.
- **Count:** This is how many rows the current page of our results table has. To keep searches fast, we default to pages with 100 rows.
- **Total Row Count:** This is how many rows are in the full results table.
- **More pages:** This is always True or False. False means that our current page has all the available results. True means that this table shows only the current page (here, the first 100 results), and we will need to page through for more.

---
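
To see how these four fields relate to one another, here is a minimal sketch in plain Python. `page_summary` is a hypothetical helper written for illustration and is not part of cdapython; it only assumes the page size of 100 and the total of 2438 rows shown above.

```python
def page_summary(offset, limit, total_rows):
    """Hypothetical helper: compute the four summary fields for one page."""
    # A page holds at most `limit` rows, and fewer if we are near the end.
    count = max(0, min(limit, total_rows - offset))
    return {
        "offset": offset,
        "count": count,
        "total_row_count": total_rows,
        # More pages exist only if rows remain after this page.
        "more_pages": offset + count < total_rows,
    }


first_page = page_summary(offset=0, limit=100, total_rows=2438)
last_page = page_summary(offset=2400, limit=100, total_rows=2438)
print(first_page)
print(last_page)
```

Under these assumptions the first page reports 100 rows with more pages available, while a page starting at offset 2400 holds the remaining 38 rows and reports no more pages.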

This preview behaviour is great for search, but not when you are ready to download the information. The `get_all` feature is how you retrieve all the results for your final query and not just the first 100. 

You can have your full results output to a dataframe or a list.

## Results to a dataframe



In [3]:
all_brain_results = brainresults.get_all().to_dataframe()

Output()

In [4]:
all_brain_results # view the dataframe

subject_id,subject_identifier,species,sex,race,ethnicity,days_to_birth,subject_associated_project,vital_status,days_to_death,cause_of_death


## Results to a list

In [5]:
list_all_brain_results = brainresults.get_all().to_list()
list_all_brain_results

Output()

[{'subject_id': 'TCGA.TCGA-12-1601',
  'subject_identifier': [{'system': 'GDC',
    'field_name': 'case.submitter_id',
    'value': 'TCGA-12-1601'}],
  'species': 'Homo sapiens',
  'sex': 'not reported',
  'race': 'not reported',
  'ethnicity': 'not reported',
  'days_to_birth': None,
  'subject_associated_project': ['TCGA-GBM'],
  'vital_status': 'Not Reported',
  'days_to_death': None,
  'cause_of_death': None},
 {'subject_id': 'TCGA.TCGA-32-2498',
  'subject_identifier': [{'system': 'GDC',
    'field_name': 'case.submitter_id',
    'value': 'TCGA-32-2498'}],
  'species': 'Homo sapiens',
  'sex': 'not reported',
  'race': 'not reported',
  'ethnicity': 'not reported',
  'days_to_birth': None,
  'subject_associated_project': ['TCGA-GBM'],
  'vital_status': 'Not Reported',
  'days_to_death': None,
  'cause_of_death': None},
 {'subject_id': 'CPTAC3 Discovery and Confirmatory.GTEX-Q2AG-0011-R10A-SM-HAKXT',
  'subject_identifier': [{'system': 'PDC',
    'field_name': 'Case.case_id',
    '

## Advanced usage

If you'd like more control over your output, you can use the `paginator`. It will also get all the data, but requires you to write the looping code yourself. By combining the paginator with limits, offsets, and page sizes, you can download all or part of the dataset, at whatever pace suits you.
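
To make the idea of paging with limits, offsets, and page sizes concrete, here is a minimal sketch that does not use cdapython at all. `FAKE_ROWS`, `fetch_page`, and `paginate` are hypothetical names invented for this illustration, and the parameter names are assumptions rather than the paginator's actual signature.

```python
from typing import Iterator, Optional

# Stand-in for a remote results table (an assumption for this sketch).
FAKE_ROWS = [{"subject_id": f"subject-{n}"} for n in range(250)]


def fetch_page(offset: int, limit: int) -> list:
    """Hypothetical fetch: return at most `limit` rows starting at `offset`."""
    return FAKE_ROWS[offset:offset + limit]


def paginate(page_size: int = 100, offset: int = 0,
             limit: Optional[int] = None) -> Iterator[list]:
    """Yield pages until the data runs out or `limit` rows have been yielded."""
    yielded = 0
    while limit is None or yielded < limit:
        # Shrink the final page so we never return more than `limit` rows.
        size = page_size if limit is None else min(page_size, limit - yielded)
        page = fetch_page(offset + yielded, size)
        if not page:
            break
        yield page
        yielded += len(page)


# Everything, 100 rows at a time.
everything = [row for page in paginate() for row in page]

# Only 120 rows, starting after the first 100, in pages of 50.
partial = [row for page in paginate(page_size=50, offset=100, limit=120)
           for row in page]
```

Because the loop is your own code, you could also pause between pages or write each page to disk as it arrives.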

In the simplest case, you create an empty dataframe for the data to land in, then use the paginator in a loop to get all the results:

### Dataframes

In [6]:
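mydf = pd.DataFrame()  # empty dataframe for the results to land in
# each pass through the loop yields one page of results as a dataframe
for page in myquery.subject.run().paginator(to_df=True):
    mydf = pd.concat([mydf, page])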
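
For large result sets, calling `pd.concat` inside the loop copies the accumulated rows on every pass. A common pandas pattern is to collect the pages first and concatenate once at the end. The sketch below uses `fake_pages`, a hypothetical stand-in generator, so that it runs without a CDA connection; with real data you would presumably loop over the paginator instead.

```python
import pandas as pd


def fake_pages():
    """Hypothetical stand-in for a paginator yielding one dataframe per page."""
    yield pd.DataFrame({"subject_id": ["a", "b"], "sex": ["female", "male"]})
    yield pd.DataFrame({"subject_id": ["c"], "sex": ["not reported"]})


# Collect every page, then build the full table in a single concat call.
pages = list(fake_pages())
combined = pd.concat(pages, ignore_index=True)
print(combined)
```

Passing `ignore_index=True` renumbers the rows from zero, so the row labels do not restart with each page.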

In [7]:
mydf  # view the dataframe

Unnamed: 0,subject_id,subject_identifier,species,sex,race,ethnicity,days_to_birth,subject_associated_project,vital_status,days_to_death,cause_of_death


### Lists
Saving to a list works similarly to the dataframe call. The differences are:

- start with an empty list instead of an empty DataFrame
- pass `to_list=True` instead of `to_df=True`
- add each page to the list with `extend()` instead of `pd.concat()`

In [8]:
mylist = []
for i in myquery.subject.run().paginator(to_list=True):
    mylist.extend(i)

This gives back the correct number of results:

In [9]:
len(mylist)

2438
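
The count comes out right because `extend()` adds each record from a page individually, whereas `append()` would add the whole page as a single item. A small illustration with made-up pages (the records here are placeholders, not CDA data):

```python
# Two made-up pages of records, standing in for paginator output.
pages = [
    [{"subject_id": "a"}, {"subject_id": "b"}],
    [{"subject_id": "c"}],
]

flat = []
nested = []
for page in pages:
    flat.extend(page)    # adds each record: one list of records
    nested.append(page)  # adds the page itself: a list of lists

print(len(flat), len(nested))
```

Here `flat` holds three records, while `nested` holds two pages.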

And we can preview the first result to see that it has the same values:

In [10]:
mylist[0:1]

[{'subject_id': 'TCGA.TCGA-12-1601',
  'subject_identifier': [{'system': 'GDC',
    'field_name': 'case.submitter_id',
    'value': 'TCGA-12-1601'}],
  'species': 'Homo sapiens',
  'sex': 'not reported',
  'race': 'not reported',
  'ethnicity': 'not reported',
  'days_to_birth': None,
  'subject_associated_project': ['TCGA-GBM'],
  'vital_status': 'Not Reported',
  'days_to_death': None,
  'cause_of_death': None}]
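
Because each result is an ordinary Python dictionary, ordinary list and dict operations are enough to pull out the fields you need. The sketch below works on two records copied from the output shown earlier, trimmed to a few fields for brevity. Note that `subject_identifier` is itself a list of dictionaries, so it needs an inner loop.

```python
# Two records copied from the results above, trimmed to a few fields.
records = [
    {
        "subject_id": "TCGA.TCGA-12-1601",
        "subject_identifier": [
            {"system": "GDC", "field_name": "case.submitter_id", "value": "TCGA-12-1601"}
        ],
        "subject_associated_project": ["TCGA-GBM"],
    },
    {
        "subject_id": "TCGA.TCGA-32-2498",
        "subject_identifier": [
            {"system": "GDC", "field_name": "case.submitter_id", "value": "TCGA-32-2498"}
        ],
        "subject_associated_project": ["TCGA-GBM"],
    },
]

# Top-level fields can be read directly from each record.
subject_ids = [record["subject_id"] for record in records]

# Nested fields need a second loop over the inner list.
submitter_ids = [
    identifier["value"]
    for record in records
    for identifier in record["subject_identifier"]
]

print(subject_ids)
print(submitter_ids)
```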