# Merging Results
---

Before we do any work, we need to import several functions from cdapython:
- `Q` and `query` which power the search
- `columns` which lets us view entity field names
- `unique_terms` which lets us view entity field contents

To get the data into mergeable dataframes, we need to import [pandas](https://pandas.pydata.org/).
We're also asking cdapython to report its version so we can be sure we're using the one we mean to.

In [2]:
from cdapython import Q, columns, unique_terms, query
import pandas as pd
print(Q.get_version())

2022.6.28


In [3]:
myquery = Q('ResearchSubject.primary_diagnosis_site = "brain" AND ResearchSubject.primary_diagnosis_condition = "Pediatric/AYA Brain Tumors"')


In [4]:
myquery.researchsubject.count.run()

Total execution time: 7230 ms


system,count
PDC,199

primary_diagnosis_condition,count
Pediatric/AYA Brain Tumors,199

primary_diagnosis_site,count
Brain,199




In [5]:
myquery.diagnosis.count.run()


Total execution time: 5020 ms


system,count
PDC,219

primary_diagnosis,count
"Glioma, NOS",93
"Ependymoma, NOS",32
Craniopharyngioma,16
"Medulloblastoma, NOS",22
"Glioma, malignant",26
"Ganglioglioma, NOS",18
Atypical teratoid/rhabdoid tumor,12

stage,count
Unknown,219

grade,count
G4,34
G1,98
G2,52
High Grade,26
Low Grade,9




In [6]:
myquery.researchsubject.file.count.run()

Total execution time: 9916 ms


system,count
PDC,1288

data_category,count
Raw Mass Spectra,322
Peptide Spectral Matches,644
Processed Mass Spectra,322

file_format,count
mzML,322
mzIdentML,322
tsv,322
vendor-specific,322

data_type,count
Proprietary,322
Text,322
Open Standard,644




In [7]:
researchsubjectresults = pd.DataFrame()
for i in myquery.researchsubject.run().paginator(to_df=True):
    researchsubjectresults = pd.concat([researchsubjectresults, i])

Total execution time: 4026 ms


In [9]:
diagnosisresults = pd.DataFrame()
for i in myquery.diagnosis.run().paginator(to_df=True):
    diagnosisresults = pd.concat([diagnosisresults, i])

Total execution time: 3970 ms


In [10]:
fileresults = pd.DataFrame()
for i in myquery.researchsubject.file.run().paginator(to_df=True):
    fileresults = pd.concat([fileresults, i])

Total execution time: 7305 ms
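
The three cells above all follow the same pattern: start with an empty dataframe and `pd.concat` each page onto it. A minimal sketch of an alternative is shown here, collecting the pages into a list and concatenating once. The helper name `collect_pages` and the toy pages are hypothetical stand-ins; in the notebook the iterable would be the result of `.paginator(to_df=True)`.

```python
import pandas as pd


def collect_pages(pages):
    """Concatenate an iterable of DataFrame pages into a single DataFrame.

    `pages` is assumed to yield one pandas DataFrame per page of results.
    """
    frames = list(pages)
    if not frames:
        # pd.concat raises on an empty list, so return an empty frame instead.
        return pd.DataFrame()
    # ignore_index=True renumbers rows 0..n-1 across all pages.
    return pd.concat(frames, ignore_index=True)


# Toy pages standing in for paginated query results (hypothetical data).
fake_pages = [
    pd.DataFrame({"id": ["a", "b"], "subject_id": ["S1", "S2"]}),
    pd.DataFrame({"id": ["c"], "subject_id": ["S3"]}),
]

combined = collect_pages(fake_pages)
print(combined)
```

Unlike the loop in the cells above, this sketch passes `ignore_index=True`, so the row labels do not restart at 0 on every page. Drop that argument to keep the per-page labels.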


In [8]:
researchsubjectresults

Unnamed: 0,id,identifier,member_of_research_project,primary_diagnosis_condition,primary_diagnosis_site,subject_id
0,d08d5d7d-ff5e-11e9-9a07-0a80fada099c,"[{'system': 'PDC', 'value': 'd08d5d7d-ff5e-11e...",Proteogenomic Analysis of Pediatric Brain Canc...,Pediatric/AYA Brain Tumors,Brain,C21771
1,d08d607d-ff5e-11e9-9a07-0a80fada099c,"[{'system': 'PDC', 'value': 'd08d607d-ff5e-11e...",Proteogenomic Analysis of Pediatric Brain Canc...,Pediatric/AYA Brain Tumors,Brain,C22509
2,d08dd6d3-ff5e-11e9-9a07-0a80fada099c,"[{'system': 'PDC', 'value': 'd08dd6d3-ff5e-11e...",Proteogenomic Analysis of Pediatric Brain Canc...,Pediatric/AYA Brain Tumors,Brain,C717336
3,d08d263d-ff5e-11e9-9a07-0a80fada099c,"[{'system': 'PDC', 'value': 'd08d263d-ff5e-11e...",Proteogenomic Analysis of Pediatric Brain Canc...,Pediatric/AYA Brain Tumors,Brain,C102459
4,d08d365c-ff5e-11e9-9a07-0a80fada099c,"[{'system': 'PDC', 'value': 'd08d365c-ff5e-11e...",Proteogenomic Analysis of Pediatric Brain Canc...,Pediatric/AYA Brain Tumors,Brain,C136284
...,...,...,...,...,...,...
94,d08d4587-ff5e-11e9-9a07-0a80fada099c,"[{'system': 'PDC', 'value': 'd08d4587-ff5e-11e...",Proteogenomic Analysis of Pediatric Brain Canc...,Pediatric/AYA Brain Tumors,Brain,C15744
95,d08d59eb-ff5e-11e9-9a07-0a80fada099c,"[{'system': 'PDC', 'value': 'd08d59eb-ff5e-11e...",Proteogenomic Analysis of Pediatric Brain Canc...,Pediatric/AYA Brain Tumors,Brain,C21402
96,d08d7a66-ff5e-11e9-9a07-0a80fada099c,"[{'system': 'PDC', 'value': 'd08d7a66-ff5e-11e...",Proteogenomic Analysis of Pediatric Brain Canc...,Pediatric/AYA Brain Tumors,Brain,C308853
97,d08d4a5d-ff5e-11e9-9a07-0a80fada099c,"[{'system': 'PDC', 'value': 'd08d4a5d-ff5e-11e...",Proteogenomic Analysis of Pediatric Brain Canc...,Pediatric/AYA Brain Tumors,Brain,C17466


In [None]:
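# NOTE: treatmentresults is not built in the cells above; create it the same way as diagnosisresults first, or this cell raises a NameError.
# Join subjects to diagnoses on the research subject id, then diagnoses to treatments on the same key.
resubdiagnosis = researchsubjectresults.set_index("id").join(diagnosisresults.set_index("researchsubject_id"), lsuffix="resub", rsuffix="diag")
diagnosistreatment = diagnosisresults.set_index("researchsubject_id").join(treatmentresults.set_index("researchsubject_id"), lsuffix="diag", rsuffix="treat")
# Join the two intermediate tables on the diagnosis id ("id" on the left, "iddiag" on the right).
brainall = resubdiagnosis.set_index("id").join(diagnosistreatment.set_index("iddiag"), lsuffix="rd", rsuffix="dt")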
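
To see what the `lsuffix` and `rsuffix` arguments do in these joins, here is a small sketch on toy dataframes. The rows and most column values are made up for illustration; only the column names `id`, `identifier`, `subject_id`, and `researchsubject_id` are taken from the cells above. `DataFrame.join` matches rows on the index and adds the suffixes only to column names that appear on both sides.

```python
import pandas as pd

# Toy research subjects (hypothetical rows).
subjects = pd.DataFrame({
    "id": ["rs1", "rs2"],
    "identifier": ["PDC:rs1", "PDC:rs2"],
    "subject_id": ["S1", "S2"],
})

# Toy diagnoses; rs1 has two diagnoses, rs2 has one.
diagnoses = pd.DataFrame({
    "id": ["d1", "d2", "d3"],
    "identifier": ["PDC:d1", "PDC:d2", "PDC:d3"],
    "researchsubject_id": ["rs1", "rs1", "rs2"],
    "primary_diagnosis": ["Glioma, NOS", "Glioma, malignant", "Ependymoma, NOS"],
})

# Index both frames by the research subject id, then join on that index.
joined = subjects.set_index("id").join(
    diagnoses.set_index("researchsubject_id"),
    lsuffix="resub",
    rsuffix="diag",
)

print(joined.columns.tolist())
print(len(joined))
```

In this sketch only `identifier` exists on both sides after the indexes are set, so it is the only column that picks up a suffix. The diagnosis `id` column keeps its plain name, which is why a later `set_index("id")` on the joined table indexes by diagnosis rather than by research subject.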

In [None]:
resubdiagnosis.to_csv("brainRSdiagnosis.csv")
researchsubjectresults.to_csv("brainRS.csv")
diagnosisresults.to_csv("braindiagnosis.csv")
treatmentresults.to_csv("braintreatment.csv")
diagnosistreatment.to_csv("braindiagnosistreatment.csv")
brainall.to_csv("brainall.csv")

In [None]:
brainall

In [None]:
brainall[brainall['subject_idresub'].str.contains("ACRIN-DSC-MR-Brain-102", case=False, na=False)]
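
The filter above combines a boolean mask with `Series.str.contains`. A minimal sketch of how the `case` and `na` arguments behave, using a toy dataframe with made-up values in a column named `subject_idresub`:

```python
import pandas as pd

# Toy column with mixed case and a missing value (hypothetical data).
toy = pd.DataFrame({"subject_idresub": ["C21771", "c21771-b", None, "C22509"]})

# case=False ignores letter case; na=False treats missing values as non-matches,
# so the mask contains only True/False and can be used for row selection.
mask = toy["subject_idresub"].str.contains("C21771", case=False, na=False)

print(toy[mask])
```

Note that `str.contains` interprets the pattern as a regular expression by default; passing `regex=False` makes it a literal substring match, which can matter when an identifier contains characters such as `.` or `+`.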



In [None]:
columns().to_list()


In [None]:
unique_terms("ResearchSubject.Diagnosis.id")



In [None]:
myquery.researchsubject.run(limit=30)[0]