# Working With Results
---

Before we do any work, we need to import several functions from cdapython:
- `Q` and `query` which power the search
- `columns` which lets us view entity field names
- `unique_terms` which lets us view entity field contents

We're also asking cdapython to report its version so we can be sure we're using the one we mean to.

In [1]:
from cdapython import Q, columns, unique_terms, query
import pandas as pd
print(Q.get_version())

2022.6.28


In [2]:
myquery = Q('ResearchSubject.primary_diagnosis_site = "brain" AND ResearchSubject.primary_diagnosis_condition = "Pediatric/AYA Brain Tumors"')
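The query string above is two `field = "value"` clauses joined with `AND`. As a minimal sketch, assuming only the syntax visible in that cell, the same string can be assembled from a dictionary of filters. `build_filter` is a hypothetical helper written for this example and is not part of cdapython.

```python
def build_filter(conditions):
    """Join field/value pairs into a 'field = "value" AND ...' string."""
    clauses = [f'{field} = "{value}"' for field, value in conditions.items()]
    return " AND ".join(clauses)


# Same two filters as the cell above.
filter_text = build_filter({
    "ResearchSubject.primary_diagnosis_site": "brain",
    "ResearchSubject.primary_diagnosis_condition": "Pediatric/AYA Brain Tumors",
})
print(filter_text)
```

The resulting string could then be passed to `Q(...)` exactly as in the cell above.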


In [3]:
myquery.researchsubject.count.run()

Total execution time: 3675 ms


system,count
PDC,199

primary_diagnosis_condition,count
Pediatric/AYA Brain Tumors,199

primary_diagnosis_site,count
Brain,199




In [4]:
myquery.diagnosis.count.run()


Total execution time: 3355 ms


system,count
PDC,219

primary_diagnosis,count
"Glioma, NOS",93
"Glioma, malignant",26
"Ganglioglioma, NOS",18
Craniopharyngioma,16
Atypical teratoid/rhabdoid tumor,12
"Medulloblastoma, NOS",22
"Ependymoma, NOS",32

stage,count
Unknown,219

grade,count
G4,34
G1,98
G2,52
High Grade,26
Low Grade,9
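
Each breakdown in this output appears to split the same set of diagnosis records, so the counts in a table should add up to the `system` total. A quick sanity check, using numbers copied by hand from the output above (nothing here calls cdapython):

```python
# Counts copied from the diagnosis count output above.
system_total = 219
primary_diagnosis = {
    "Glioma, NOS": 93,
    "Glioma, malignant": 26,
    "Ganglioglioma, NOS": 18,
    "Craniopharyngioma": 16,
    "Atypical teratoid/rhabdoid tumor": 12,
    "Medulloblastoma, NOS": 22,
    "Ependymoma, NOS": 32,
}
grade = {"G4": 34, "G1": 98, "G2": 52, "High Grade": 26, "Low Grade": 9}

# Sum each breakdown and compare it with the system-level count.
totals = {
    "primary_diagnosis": sum(primary_diagnosis.values()),
    "grade": sum(grade.values()),
}
for name, total in totals.items():
    print(name, total, total == system_total)
```

If a breakdown did not add up, that would suggest records with missing values in that field.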




In [5]:
myquery.researchsubject.file.count.run()

Total execution time: 3394 ms


system,count
PDC,1288

data_category,count
Raw Mass Spectra,322
Peptide Spectral Matches,644
Processed Mass Spectra,322

file_format,count
tsv,322
vendor-specific,322
mzIdentML,322
mzML,322

data_type,count
Open Standard,644
Text,322
Proprietary,322




In [6]:
researchsubjectresults = pd.DataFrame()
# Append each page of results to the running DataFrame.
for i in myquery.researchsubject.run().paginator(to_df=True):
    researchsubjectresults = pd.concat([researchsubjectresults, i])

Total execution time: 3409 ms

In [None]:
diagnosisresults = pd.DataFrame()
for i in myquery.diagnosis.run().paginator(to_df=True):
    diagnosisresults = pd.concat([diagnosisresults, i])

In [None]:
fileresults = pd.DataFrame()
for i in myquery.researchsubject.file.run().paginator(to_df=True):
    fileresults = pd.concat([fileresults, i])
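
The three loops above grow a DataFrame one page at a time. An alternative sketch collects the pages in a list and calls `pd.concat` once, which avoids re-copying the accumulated rows on every iteration. `fake_pages` is a hypothetical stand-in for the paginator so that the example runs without a server connection; its rows are toy values.

```python
import pandas as pd


def fake_pages():
    """Hypothetical stand-in for paginator(to_df=True): yields one DataFrame per page."""
    yield pd.DataFrame({"id": ["rs1", "rs2"]})
    yield pd.DataFrame({"id": ["rs3"]})


# Gather every page first, then concatenate in a single call.
pages = list(fake_pages())
results = pd.concat(pages, ignore_index=True)
print(len(results))
```

`ignore_index=True` gives the combined frame a fresh 0..n-1 index; without it, each page keeps its own index labels, so labels repeat across pages.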

In [None]:
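# Join research subjects to their diagnoses on the research subject id.
resubdiagnosis = researchsubjectresults.set_index("id").join(diagnosisresults.set_index("researchsubject_id"), lsuffix='resub', rsuffix="diag")
# NOTE: treatmentresults is not created in any earlier cell; it must be built before this line runs.
diagnosistreatment = diagnosisresults.set_index("researchsubject_id").join(treatmentresults.set_index("researchsubject_id"), lsuffix='diag', rsuffix="treat")
# Combine both joins on the diagnosis id ("id" on the left, "iddiag" on the right).
brainall = resubdiagnosis.set_index("id").join(diagnosistreatment.set_index("iddiag"), lsuffix='rd', rsuffix='dt')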
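
Column names such as `iddiag` come from how `DataFrame.join` handles name clashes: when both frames have a column with the same name, pandas appends `lsuffix` to the left one and `rsuffix` to the right one. A small sketch with toy data; the rows and the `treatment_type` column are invented for illustration and are not taken from the query results.

```python
import pandas as pd

# Toy frames that share the column names "id" and "researchsubject_id".
diagnosis = pd.DataFrame({
    "id": ["d1", "d2"],
    "researchsubject_id": ["rs1", "rs2"],
    "grade": ["G1", "G4"],
})
treatment = pd.DataFrame({
    "id": ["t1", "t2"],
    "researchsubject_id": ["rs1", "rs2"],
    "treatment_type": ["Surgery", "Chemotherapy"],
})

# Join on researchsubject_id; the clashing "id" columns receive the suffixes.
joined = diagnosis.set_index("researchsubject_id").join(
    treatment.set_index("researchsubject_id"), lsuffix="diag", rsuffix="treat"
)
print(list(joined.columns))
```

Only clashing names are renamed, so `grade` and `treatment_type` keep their original names.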

In [None]:
resubdiagnosis.to_csv("brainRSdiagnosis.csv")
researchsubjectresults.to_csv("brainRS.csv")
diagnosisresults.to_csv("braindiagnosis.csv")
treatmentresults.to_csv("braintreatment.csv")
diagnosistreatment.to_csv("braindiagnosistreatment.csv")
brainall.to_csv("brainall.csv")

In [None]:
brainall

In [None]:
brainall[brainall['subject_idresub'].str.contains("ACRIN-DSC-MR-Brain-102", case=False, na=False)]
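
The filter above relies on two `str.contains` options: `case=False` makes the match case-insensitive, and `na=False` treats missing values as non-matches so the result is a clean boolean mask. A sketch on a toy Series; every value except the search string is invented for illustration.

```python
import pandas as pd

# Toy identifiers, including a differently cased match and a missing value.
ids = pd.Series([
    "ACRIN-DSC-MR-Brain-102",
    "acrin-dsc-mr-brain-102-a",
    None,
    "other-subject-001",
])

# Case-insensitive substring match; missing values count as False.
mask = ids.str.contains("ACRIN-DSC-MR-Brain-102", case=False, na=False)
print(ids[mask].tolist())
```

Without `na=False`, the missing entry would produce a missing value in the mask, and indexing with that mask would raise an error.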



In [None]:
columns().to_list()


In [None]:
unique_terms("ResearchSubject.Diagnosis.id")



In [None]:
myquery.researchsubject.run(limit=30)[0]