# Getting Results (Pagination)


The CDA provides a custom Python tool for searching CDA data. [`Q`](usage/#q) (short for Query) offers several ways to search and filter data, and several input modes:

---
- **<a href="../../QuickStart/usage/#q">Q()</a>** builds a query that can be used by `run()` or `count()`
- **<a href="../../QuickStart/usage/#qrun">Q.run()</a>** returns data for the specified search
- **<a href="../../QuickStart/usage/#qcount">Q.count()</a>** returns summary information (counts) for the data that fit the specified search
- **<a href="../../QuickStart/usage/#columns">columns()</a>** returns entity field names
- **<a href="../../QuickStart/usage/#unique_terms">unique_terms()</a>** returns entity field contents

---
                                                                    
Before we do any work, we need to import these functions from cdapython, along with pandas.
We're also telling cdapython to report its version so we can confirm we're using the version we intend:

```python
from cdapython import Q, columns, unique_terms, query
import pandas as pd

print(Q.get_version())
```

```
2022.6.22
```


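```python
# Point cdapython at a specific dataset and API host
# (the development instance is shown here), then confirm both settings
Q.set_default_project_dataset("broad-dsde-dev.cda_dev")
Q.set_host_url("https://cancerdata.dsde-dev.broadinstitute.org/")
print(Q.get_host_url())
print(Q.get_default_project_dataset())
```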

The CDA indexes tens of thousands of subjects, research subjects, and specimens, along with their diagnosis data. CDA also indexes more than 45 million files. To keep search results from being overwhelming, CDA limits search results to the first 100 records by default:

```python
myquery = Q('ResearchSubject.primary_diagnosis_site = "brain"')
myquery.subject.run()
```

```
Total execution time: 3213 ms

QueryID: 7b243902-1151-438d-a2e8-3b2459257019

Offset: 0
Count: 100
Total Row Count: 2314
More pages: True
```
            


---

- **Offset:** How many rows of the full results table the query skips before the current page begins. We didn't tell it to skip anything here, so the offset is zero.
- **Count:** How many rows the current page of our results table has. To keep searches fast, pages default to 100 rows.
- **Total Row Count:** How many rows are in the full results table.
- **More pages:** Always True or False. False means that our current page has all the available results. True means that we see only the first 100 results in this table and will need to page through for the rest.

---
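The four fields are related by simple arithmetic, sketched below in plain Python. `page_summary` is a hypothetical helper written for illustration, not part of cdapython, and it assumes each page is a fixed block of up to 100 rows starting at the offset. Under that assumption the 2314 brain-site subjects above fill 24 pages, the last of which holds 14 rows.

```python
import math


def page_summary(total_rows: int, offset: int = 0, limit: int = 100) -> dict:
    """Hypothetical helper (not cdapython): derive the page fields shown above
    from the total row count, the rows skipped (offset) and the page size (limit)."""
    # Rows on this page: a full page, or whatever is left after the offset
    count = max(0, min(limit, total_rows - offset))
    # More pages exist if any rows remain beyond the end of this page
    more_pages = offset + count < total_rows
    return {"Offset": offset, "Count": count,
            "Total Row Count": total_rows, "More pages": more_pages}


total = 2314
print(page_summary(total))               # first page: Count 100, More pages True
print(page_summary(total, offset=2300))  # last page: Count 14, More pages False
print(math.ceil(total / 100), "pages of up to 100 rows")  # 24 pages of up to 100 rows
```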

This preview behaviour is great for exploring search results, but not when you are ready to download the information. The pagination feature is how you retrieve all the results for your final query, not just the first 100.

You can have your full results output to a dataframe or a list.

## Results to a dataframe

To use it, you create an empty dataframe for the data to land in, then use the paginator in a loop to get all the results:

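```python
# Start with an empty DataFrame and append each page of results to it
mydf = pd.DataFrame()
for i in myquery.subject.run().paginator(to_df=True):
    mydf = pd.concat([mydf, i])  # i is one page of results as a DataFrame
```

```
Total execution time: 3341 ms
```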
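Conceptually, a paginator keeps requesting the next page and advancing the offset until the response says there are no more pages. The sketch below imitates that loop against an in-memory list of fake records; `fetch_page`, `paginate` and `FAKE_RECORDS` are hypothetical stand-ins, not cdapython internals. It also shows a common pandas alternative to growing a DataFrame inside the loop: collect the pages in a list and call `pd.concat` once, with `ignore_index=True` so the combined frame gets a fresh 0 to N-1 index instead of keeping each page's own row labels.

```python
import pandas as pd

# Fake data standing in for the 2314 matching subjects (assumption for illustration)
FAKE_RECORDS = [{"id": f"subject-{n}", "species": "Homo sapiens"} for n in range(2314)]


def fetch_page(offset: int, limit: int = 100) -> dict:
    """Hypothetical server call: return one page of rows plus a 'more pages' flag."""
    rows = FAKE_RECORDS[offset:offset + limit]
    return {"rows": rows, "more_pages": offset + len(rows) < len(FAKE_RECORDS)}


def paginate(limit: int = 100):
    """Yield each page as a DataFrame, advancing the offset until no pages remain."""
    offset = 0
    while True:
        page = fetch_page(offset, limit)
        yield pd.DataFrame(page["rows"])
        if not page["more_pages"]:
            break
        offset += limit


# Collect every page first, then concatenate once with a fresh index
pages = list(paginate())
demo_df = pd.concat(pages, ignore_index=True)
print(len(pages), "pages,", len(demo_df), "rows")  # 24 pages, 2314 rows
```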


```python
mydf  # view the dataframe
```

| | id | identifier | species | sex | race | ethnicity | days_to_birth | subject_associated_project | vital_status | age_at_death | cause_of_death |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 900-00-5445 | [{'system': 'IDC', 'value': '900-00-5445'}] | Homo sapiens | | | | | [rembrandt] | | | |
| 1 | C16974 | [{'system': 'PDC', 'value': 'C16974'}] | Homo sapiens | male | white | not hispanic or latino | | [Proteogenomic Analysis of Pediatric Brain Can... | Alive | | Not Reported |
| 2 | C270477 | [{'system': 'PDC', 'value': 'C270477'}] | Homo sapiens | male | white | not hispanic or latino | | [Proteogenomic Analysis of Pediatric Brain Can... | Alive | | Not Reported |
| 3 | C30012 | [{'system': 'PDC', 'value': 'C30012'}] | Homo sapiens | male | white | not hispanic or latino | | [Proteogenomic Analysis of Pediatric Brain Can... | Dead | | Not Reported |
| 4 | C38868 | [{'system': 'PDC', 'value': 'C38868'}] | Homo sapiens | female | white | not hispanic or latino | | [Proteogenomic Analysis of Pediatric Brain Can... | Alive | | Not Reported |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 9 | TCGA-HT-A617 | [{'system': 'GDC', 'value': 'TCGA-HT-A617'}, {... | Homo sapiens | male | american indian or alaska native | not reported | -17331.0 | [TCGA-LGG, tcga_lgg] | Alive | | |
| 10 | TCGA-QH-A65X | [{'system': 'GDC', 'value': 'TCGA-QH-A65X'}] | Homo sapiens | female | white | not hispanic or latino | -10440.0 | [TCGA-LGG] | Alive | | |
| 11 | TCGA-S9-A6WQ | [{'system': 'GDC', 'value': 'TCGA-S9-A6WQ'}] | Homo sapiens | female | white | not hispanic or latino | -21133.0 | [TCGA-LGG] | Alive | | |
| 12 | TCGA-S9-A7IZ | [{'system': 'GDC', 'value': 'TCGA-S9-A7IZ'}] | Homo sapiens | female | white | not hispanic or latino | -17874.0 | [TCGA-LGG] | Alive | | |


## Results to a list

Pagination to a list works much like the dataframe version. The differences are:

- initialize an empty list instead of a DataFrame
- pass `to_list=True` to `paginator()` instead of `to_df=True`
- add each page with `extend()` instead of `pd.concat()`
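The switch to `extend()` matters because each page arrives as a list of records. `extend()` adds a page's records one by one, whereas `append()` would add the whole page as a single nested element. A minimal illustration with two made-up pages (the records are placeholders, not CDA data):

```python
# Two hypothetical pages, each a list of record dicts
pages = [
    [{"id": "a"}, {"id": "b"}],
    [{"id": "c"}],
]

flat = []
nested = []
for page in pages:
    flat.extend(page)    # adds each record individually
    nested.append(page)  # adds the page itself as one element

print(len(flat))    # 3 records
print(len(nested))  # 2 pages
print(flat[0])      # {'id': 'a'}
```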

```python
mylist = []
for i in myquery.subject.run().paginator(to_list=True):
    mylist.extend(i)
```

```
Total execution time: 3516 ms
```


This gives back the full result set; its length matches the Total Row Count of 2314 reported for the query:

```python
len(mylist)
# 2314
```

```
2314
```
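One quick way to confirm that the pages were stitched together cleanly is to compare the collected length with the Total Row Count and check that no `id` appears twice. This assumes `id` is unique per subject; repeated ids would suggest overlapping pages, and a short count would suggest missing rows. `check_pagination` is a hypothetical helper, and the sample below uses three ids from the results above; with real results you would pass `mylist` and the Total Row Count reported for your query.

```python
def check_pagination(records: list, expected_total: int) -> bool:
    """Hypothetical check: every row retrieved exactly once, no id repeated."""
    ids = [record["id"] for record in records]
    # Length must match the reported total, and the set of ids must not shrink
    return len(records) == expected_total and len(set(ids)) == len(ids)


sample = [{"id": "900-00-5445"}, {"id": "C16974"}, {"id": "C270477"}]
print(check_pagination(sample, expected_total=3))               # True
print(check_pagination(sample + sample[:1], expected_total=4))  # False: repeated id
```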

We can also preview the first result to check that it has the same values as the first row of the dataframe:

```python
mylist[0:1]
```

```
[{'id': '900-00-5445',
  'identifier': [{'system': 'IDC', 'value': '900-00-5445'}],
  'species': 'Homo sapiens',
  'sex': None,
  'race': None,
  'ethnicity': None,
  'days_to_birth': None,
  'subject_associated_project': ['rembrandt'],
  'vital_status': None,
  'age_at_death': None,
  'cause_of_death': None}]
```
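Because each list entry is a dictionary keyed by column name, a list of results can also be turned into a DataFrame after the fact: pandas uses the dictionary keys as column names and creates one row per record. The sketch below uses two records from the results above, trimmed to a few fields for illustration; passing the whole `mylist` to `pd.DataFrame` should give a table with the same columns as `mydf`.

```python
import pandas as pd

# Two records in the same shape as mylist, trimmed to a few fields
records = [
    {"id": "900-00-5445", "species": "Homo sapiens", "sex": None},
    {"id": "C16974", "species": "Homo sapiens", "sex": "male"},
]

# Dictionary keys become the column names, one row per record
df = pd.DataFrame(records)
print(df.columns.tolist())  # ['id', 'species', 'sex']
print(df.loc[0, "id"])      # 900-00-5445
```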