# Summarize Search Results

The CDA provides a custom Python tool for searching CDA data. [`Q`](usage/#q) (short for Query) offers several ways to search and filter data, and several input modes:

---
- **<a href="../../QuickStart/usage/#q">Q.()</a>** builds a query that can be used by `run()` or `count()`
- **<a href="../../QuickStart/usage/#qrun">Q.run()</a>** returns data for the specified search 
- **<a href="../../QuickStart/usage/#qcount">Q.count()</a>** returns summary information (counts) for data that fit the specified search
- **<a href="../../QuickStart/usage/#columns">columns()</a>** returns entity field names
- **<a href="../../QuickStart/usage/#unique_terms">unique_terms()</a>** returns entity field contents

---
                                                                    
Before we do any work, we need to import these functions from cdapython.
We're also telling cdapython to report its version so we can be sure we're using the one we mean to:

In [1]:
from cdapython import Q, columns, unique_terms, query
import numpy as np
import pandas as pd
from itables import init_notebook_mode, show
init_notebook_mode(all_interactive=True)
import itables.options as opt
opt.maxBytes=0
opt.scrollX="200px"
opt.scrollCollapse=True
opt.paging=True
opt.maxColumns=0
print(Q.get_version())

<div class="cdanote" style="background-color:#b3e5d5;color:black;padding:20px;">
    
CDA data comes from three sources:
<ul>
<li><b>The <a href="https://proteomic.datacommons.cancer.gov/pdc/"> Proteomic Data Commons</a> (PDC)</b></li>
<li><b>The <a href="https://gdc.cancer.gov/">Genomic Data Commons</a> (GDC)</b></li>
<li><b>The <a href="https://datacommons.cancer.gov/repository/imaging-data-commons">Imaging Data Commons</a> (IDC)</b></li>
</ul> 
    
The CDA makes this data searchable in four main endpoints:

<ul>
<li><b>subject:</b> A patient entity captures the study-independent metadata for research subjects. Human research subjects are usually not traceable to a particular person to protect the subject's privacy.</li>
<li><b>researchsubject:</b> A research subject is the entity of interest in a specific research study or project, typically a human being or an animal, but can also be a device, group of humans or animals, or a tissue sample. Human research subjects are usually not traceable to a particular person to protect the subject's privacy. This entity plays the role of the case_id in existing data. An individual who participates in 3 studies will have 3 researchsubject IDs.</li>
<li><b>specimen:</b> Any material taken as a sample from a biological entity (living or dead), or from a physical object or the environment. Specimens are usually collected as an example of their kind, often for use in some investigation.</li>
<li><b>file:</b> A unit of data about subjects, researchsubjects, specimens, or their associated information</li>
</ul>
    
And two endpoints that offer deeper information about data in the researchsubject endpoint:
<ul>
<li><b>diagnosis:</b> A collection of characteristics that describe an abnormal condition of the body as assessed at a point in time. May be used to capture information about neoplastic and non-neoplastic conditions.</li>
<li><b>treatment:</b> Represent medication administration or other treatment types.</li>
</ul>
Any metadata field can be searched from any endpoint; the only difference between search types is what type of data is returned by default. This means that you can think of the CDA as a really, really enormous spreadsheet full of data. To search this enormous spreadsheet, you'd want to select columns, and then filter rows.
</div>


If you are looking to build a cohort of distinct individuals who meet some criteria, search by `subject`. If you want to build a cohort, but are particularly interested in studies rather than the participants per se, search by `researchsubject`. If you are looking for biosamples that can be ordered or a specific format of information (e.g. histological slides), start with `specimen`. If you are primarily looking for files you can reuse for your own analysis, start with `file`.

In CDA search, these concepts can also be chained together, so you can look specifically for specimen subjects, or researchsubject diagnoses. In the four 'main' tables, all of the rows will have one or more files associated with them that can be directly found by chaining, as in specimen files. Diagnosis and treatment do not have files directly associated with them and so can only be used to find files in conjunction with the other searches.

In all cases, any search can use any metadata field; the only difference between search types is what type of data you return by default.



## Getting simple summary data

Let's try a broad search of the CDA to see what information exists about cancers that were first diagnosed in the brain. To run this simple search, we would first construct a query in `Q` and save it to a variable `myquery`. This is the same query we ran in the <a href="../BasicSearch">Basic Search</a> notebook:

In [2]:
myquery = Q('primary_diagnosis_site = "brain"')


<div class="cdawarn" style="background-color:#f9cfbf;color:black;padding:20px;">
<h3>Where did those terms come from?</h3>
    
If you aren't sure how we knew what terms to put in our search, please refer back to the <a href="../SearchTerms">What search terms are available?</a> notebook. 
</div>

### Overall summary

You can get a quick summary of how many unique specimens, treatments, diagnoses, researchsubjects and subjects meet your search criteria by chaining a `count` command into the basic `run` call. 

In [3]:
myquery.count.run()



These numbers are how many total rows of data will come back when querying the various endpoints.



### subject summary

We can also add `count` to the other run calls we did in the <a href="../BasicSearch">Basic Search</a> notebook to get more detailed summaries:

In [4]:
subjectresults = myquery.subject.count.run()

In [5]:
myquery.subject.count.run()

subject_identifier_system,count
IDC,1955
PDC,309
GDC,1454

sex,count
,748
male,980
female,653
not reported,3

race,count
,748
white,1311
not reported,136
Unknown,21
black or african american,96
asian,33
not allowed to collect,25
other,9
american indian or alaska native,4
native hawaiian or other pacific islander,1

ethnicity,count
,748
not hispanic or latino,1285
not reported,219
Unknown,22
hispanic or latino,85
not allowed to collect,25

cause_of_death,count
,2098
Not Reported,200
Cancer Related,63
Infection,3
Not Cancer Related,9
Unknown,9
Surgical Complications,2




Since we also saved the output to the variable `subjectresults`, we can look at the variable to see the same results:

In [6]:
subjectresults

subject_identifier_system,count
IDC,1955
PDC,309
GDC,1454

sex,count
,748
male,980
female,653
not reported,3

race,count
,748
white,1311
not reported,136
Unknown,21
black or african american,96
asian,33
not allowed to collect,25
other,9
american indian or alaska native,4
native hawaiian or other pacific islander,1

ethnicity,count
,748
not hispanic or latino,1285
not reported,219
Unknown,22
hispanic or latino,85
not allowed to collect,25

cause_of_death,count
,2098
Not Reported,200
Cancer Related,63
Infection,3
Not Cancer Related,9
Unknown,9
Surgical Complications,2




By default, the results are displayed as a table for easy previewing of the data. Since we queried the `subject` endpoint, our default results tell us `subject` level information, that is, information about unique individuals: their sex, race, age, species, etc. Using counts gives us back a nice pivot table type summary of the countable fields for Subjects. Note that above the table it also tells you the total subject count, as well as how many files are associated with those subjects.


---

<div class="cdadefine" style="background-color:#add9e5;color:black;padding:20px;">

<h3>Subject Field Definitions</h3>

<i>A patient entity captures the study-independent metadata for research subjects. Human research subjects are usually not traceable to a particular person to protect the subject's privacy.</i>
    
    
    
<style type="text/css">
.tg  {border-collapse:collapse;border-spacing:0;margin:0px auto;}
.tg td{border-color:black;border-style:solid;border-width:1px;font-family:noto sans, sans-serif;font-size:14px;
  overflow:hidden;padding:10px 5px;word-break:normal;}
.tg th{border-color:black;border-style:solid;border-width:1px;font-family:noto sans, sans-serif;font-size:14px;
  font-weight:normal;overflow:hidden;padding:10px 5px;word-break:normal;}
.tg .tg-7zrl{text-align:left;vertical-align:bottom}
.tg .tg-0lax{text-align:left;vertical-align:top}
</style>
<table class="tg">
<tbody>
  <tr>
    <td class="tg-0lax"> id (`total`)</td>
    <td class="tg-0lax"> The overall number of subjects returned.</td>
  </tr>
  <tr>
    <td class="tg-0lax"> files</td>
    <td class="tg-0lax"> The number of files that match this search.</td>
  </tr>
  <tr>
    <td class="tg-7zrl">identifier.value(`system`)</td>
    <td class="tg-0lax"> The identifier for the data provider.</td>
  </tr>
  <tr>
    <td class="tg-7zrl">species</td>
    <td class="tg-0lax"> The taxonomic group (e.g. species) of the subject.</td>
  </tr>
  <tr>
    <td class="tg-7zrl">sex</td>
    <td class="tg-0lax">  The biologic character or quality that distinguishes male and female from one another as expressed by analysis of the person's gonadal, morphologic (internal and external), chromosomal, and hormonal characteristics. </td>
  </tr>
  <tr>
    <td class="tg-7zrl">race</td>
    <td class="tg-0lax"> An arbitrary classification of a taxonomic group that is a division of a species.</td>
  </tr>
  <tr>
    <td class="tg-7zrl">ethnicity</td>
    <td class="tg-0lax"> An individual's self-described social and cultural grouping.</td>
  </tr>
  <tr>
    <td class="tg-0lax"> cause_of_death</td>
    <td class="tg-0lax"> The cause of death, if known</td>
  </tr>
</tbody>
</table>

</div>
    
---

This gives you a quick way to assess whether the full search results will have the data fields you require. But if you want to get the underlying data for your own downstream applications, you can also get the raw numbers by calling the zeroth value of the variable:

In [7]:
subjectresults[0]

{'total': 2384,
 'files': 4099497,
 'subject_identifier_system': [{'subject_identifier_system': 'IDC',
   'count': 1955},
  {'subject_identifier_system': 'PDC', 'count': 309},
  {'subject_identifier_system': 'GDC', 'count': 1454}],
 'sex': [{'sex': 'null', 'count': 748},
  {'sex': 'male', 'count': 980},
  {'sex': 'female', 'count': 653},
  {'sex': 'not reported', 'count': 3}],
 'race': [{'race': 'null', 'count': 748},
  {'race': 'white', 'count': 1311},
  {'race': 'not reported', 'count': 136},
  {'race': 'Unknown', 'count': 21},
  {'race': 'black or african american', 'count': 96},
  {'race': 'asian', 'count': 33},
  {'race': 'not allowed to collect', 'count': 25},
  {'race': 'other', 'count': 9},
  {'race': 'american indian or alaska native', 'count': 4},
  {'race': 'native hawaiian or other pacific islander', 'count': 1}],
 'ethnicity': [{'ethnicity': 'null', 'count': 748},
  {'ethnicity': 'not hispanic or latino', 'count': 1285},
  {'ethnicity': 'not reported', 'count': 219},
  {'e
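As a minimal sketch of working with these raw numbers, the snippet below copies the `total`, `files`, and `sex` entries from the output above into a plain dictionary and flattens them into a value-to-count mapping. The helper name `flatten_field` is hypothetical and not part of cdapython; with a live result you would pass `subjectresults[0]` in place of the hard-coded sample.

```python
# Sample copied from the subjectresults[0] output shown above (sex field only).
raw_counts = {
    "total": 2384,
    "files": 4099497,
    "sex": [
        {"sex": "null", "count": 748},
        {"sex": "male", "count": 980},
        {"sex": "female", "count": 653},
        {"sex": "not reported", "count": 3},
    ],
}


def flatten_field(summary, field):
    """Hypothetical helper: turn [{field: value, 'count': n}, ...] into {value: n}."""
    return {record[field]: record["count"] for record in summary[field]}


sex_counts = flatten_field(raw_counts, "sex")

# Express each count as a percentage of the total number of subjects.
sex_share = {
    value: round(100 * n / raw_counts["total"], 1)
    for value, n in sex_counts.items()
}

print(sex_counts)
print(sex_share)
```

In this sample the `sex` counts add up to `total`, so each subject falls into exactly one `sex` value, with missing values reported as `'null'`.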

### researchsubject

If we're interested in what researchsubjects meet our criteria, we can also run our query against the researchsubject endpoint. Let's run it without saving to a variable this time to make it a bit quicker:

In [8]:
myquery.researchsubject.count.run()

researchsubject_identifier_system,count
GDC,1454
PDC,309
IDC,1953

primary_diagnosis_condition,count
Gliomas,1246
Glioblastoma,100
,1953
Pediatric/AYA Brain Tumors,199
Not Applicable,9
Not Reported,11
Germ Cell Neoplasms,104
"Neoplasms, NOS",66
Other,10
Mature B-Cell Lymphomas,2

primary_diagnosis_site,count
Brain,3716







---

<div class="cdadefine" style="background-color:#add9e5;color:black;padding:20px;">

<h3>ResearchSubject Field Definitions</h3>

<i>A research subject is the entity of interest in a specific research study or project, typically a human being or an animal, but can also be a device, group of humans or animals, or a tissue sample. Human research subjects are usually not traceable to a particular person to protect the subject's privacy. This entity plays the role of the case_id in existing data. An individual who participates in 3 studies will have 3 researchsubject IDs.</i>
    
<style type="text/css">
.tg  {border-collapse:collapse;border-spacing:0;margin:0px auto;}
.tg td{border-color:black;border-style:solid;border-width:1px;font-family:noto sans, sans-serif;font-size:14px;
  overflow:hidden;padding:10px 5px;word-break:normal;}
.tg th{border-color:black;border-style:solid;border-width:1px;font-family:noto sans, sans-serif;font-size:14px;
  font-weight:normal;overflow:hidden;padding:10px 5px;word-break:normal;}
.tg .tg-7zrl{text-align:left;vertical-align:bottom}
.tg .tg-0lax{text-align:left;vertical-align:top}
</style>
<table class="tg">
<tbody>
  <tr>
    <td class="tg-0lax"> id (`total`)</td>
    <td class="tg-0lax"> The overall number of researchsubjects returned.</td>
  </tr>
  <tr>
    <td class="tg-0lax"> files</td>
    <td class="tg-0lax"> The number of files that match this search.</td>
  </tr>
  <tr>
    <td class="tg-7zrl">identifier.value(`system`)</td>
    <td class="tg-0lax"> The identifier for the data provider.</td>
  </tr>
  <tr>
    <td class="tg-7zrl">primary_diagnosis_condition</td>
    <td class="tg-0lax"> The text term used to describe the type of malignant disease.</td>
  </tr>
  <tr>
    <td class="tg-7zrl">primary_diagnosis_site</td>
    <td class="tg-0lax"> The text term used to describe the primary site of disease.</td>
  </tr>
</tbody>
</table>
</div>
    
---

### diagnosis

The diagnosis endpoint is an extension of the researchsubject endpoint, and returns information about researchsubjects that have a diagnosis that meets our search criteria:

In [9]:
myquery.diagnosis.count.run()

diagnosis_identifier_system,count
GDC,1427
PDC,329

primary_diagnosis,count
Glioblastoma,821
"Oligodendroglioma, NOS",112
Mixed glioma,131
"Glioma, NOS",93
"Astrocytoma, anaplastic",130
"Ganglioglioma, NOS",18
"Neoplasm, uncertain whether benign or malignant",13
"Oligodendroglioma, anaplastic",78
Mixed germ cell tumor,79
"Glioma, malignant",26

stage,count
,1427
Not Reported,110
Unknown,219

grade,count
Not Reported,392
G1,98
not reported,1116
G2,52
G4,36
Low Grade,9
,27
High Grade,26




---

<div class="cdadefine" style="background-color:#add9e5;color:black;padding:20px;">

<h3>Diagnosis Field Definitions</h3>

<i>A collection of characteristics that describe an abnormal condition of the body as assessed at a point in time. May be used to capture information about neoplastic and non-neoplastic conditions.</i>

<style type="text/css">
.tg  {border-collapse:collapse;border-spacing:0;margin:0px auto;}
.tg td{border-color:black;border-style:solid;border-width:1px;font-family:noto sans, sans-serif;font-size:14px;
  overflow:hidden;padding:10px 5px;word-break:normal;}
.tg th{border-color:black;border-style:solid;border-width:1px;font-family:noto sans, sans-serif;font-size:14px;
  font-weight:normal;overflow:hidden;padding:10px 5px;word-break:normal;}
.tg .tg-za14{border-color:inherit;text-align:left;vertical-align:bottom}
.tg .tg-0pky{border-color:inherit;text-align:left;vertical-align:top}
.tg .tg-7zrl{text-align:left;vertical-align:bottom}
.tg .tg-0lax{text-align:left;vertical-align:top}
</style>
<table class="tg">
<tbody>
  <tr>
    <td class="tg-0lax"> id (`total`)</td>
    <td class="tg-0lax"> The overall number of diagnoses returned.</td>
  </tr>
  <tr>
    <td class="tg-7zrl">identifier.value(`system`)</td>
    <td class="tg-0lax"> The identifier for the data provider.</td>
  </tr>
  <tr>
    <td class="tg-za14">primary_diagnosis</td>
    <td class="tg-0pky"> The diagnosis instance that qualified a subject for inclusion on a ResearchProject.</td>
  </tr>
  <tr>
    <td class="tg-za14">stage</td>
    <td class="tg-0pky"> The extent of a cancer in the body.</td>
  </tr>
  <tr>
    <td class="tg-7zrl">grade</td>
    <td class="tg-0lax"> The degree of abnormality of cancer cells.</td>
  </tr>
</tbody>
</table>


</div>
    
---


### treatment

The treatment endpoint is an extension of diagnosis and returns information about treatments undertaken on research subjects that have a given diagnosis that meets our search criteria:

In [10]:
myquery.treatment.count.run()

treatment_identifier_system,count
GDC,2386

treatment_type,count
"Radiation Therapy, NOS",1139
"Pharmaceutical Therapy, NOS",1117
,30
Surgery,23
Immunotherapy (Including Vaccines),23
Targeted Molecular Therapy,23
Chemotherapy,30
"Radiation, Proton Beam",1

treatment_effect,count
,2386





---

<div class="cdadefine" style="background-color:#add9e5;color:black;padding:20px;">

<h3>Treatment Field Definitions</h3>

<i>Medication administration or other treatment types. A single research subject may have multiple treatments for a single diagnosis, and/or different diagnoses, and different treatments, across different studies.</i>
    
<style type="text/css">
.tg  {border-collapse:collapse;border-spacing:0;margin:0px auto;}
.tg td{border-color:black;border-style:solid;border-width:1px;font-family:noto sans, sans-serif;font-size:14px;
  overflow:hidden;padding:10px 5px;word-break:normal;}
.tg th{border-color:black;border-style:solid;border-width:1px;font-family:noto sans, sans-serif;font-size:14px;
  font-weight:normal;overflow:hidden;padding:10px 5px;word-break:normal;}
.tg .tg-za14{border-color:inherit;text-align:left;vertical-align:bottom}
.tg .tg-0pky{border-color:inherit;text-align:left;vertical-align:top}
.tg .tg-7zrl{text-align:left;vertical-align:bottom}
.tg .tg-0lax{text-align:left;vertical-align:top}
</style>
<table class="tg">
<tbody>
  <tr>
    <td class="tg-0lax"> id (`total`)</td>
    <td class="tg-0lax"> The overall number of treatments returned.</td>
  </tr>
  <tr>
    <td class="tg-7zrl">identifier.value(`system`)</td>
    <td class="tg-0lax"> The identifier for the data provider.</td>
  </tr>
  <tr>
    <td class="tg-za14">treatment_type</td>
    <td class="tg-0pky"> The treatment type including medication/therapeutics or other procedures.</td>
  </tr>
  <tr>
    <td class="tg-7zrl">treatment_effect</td>
    <td class="tg-0lax">The effect of a treatment on the diagnosis or tumor. </td>
  </tr>
</tbody>
</table>
 

</div>
    
---




### specimens

We can use this same query to see what specimens are available for brain tissue at the CDA:

In [11]:
myquery.specimen.count.run()

specimen_identifier_system,count
GDC,38543
PDC,658

primary_disease_type,count
Gliomas,37567
Glioblastoma,200
Other,20
Pediatric/AYA Brain Tumors,438
"Malignant Lymphomas, NOS or Diffuse",56
Mature B-Cell Lymphomas,54
Germ Cell Neoplasms,416
"Neoplasms, NOS",285
Not Reported,121
Neuroepitheliomatous Neoplasms,8

source_material_type,count
Primary Tumor,27570
Solid Tissue Normal,538
Blood Derived Normal,10074
Next Generation Cancer Model,169
Recurrent Tumor,513
Metastatic,252
Expanded Next Generation Cancer Model,35
Not Reported,36
Buccal Cell Normal,14

specimen_type,count
aliquot,18696
portion,5992
sample,4090
slide,3752
analyte,6671




Nearly 40,000 specimens with over 50,000 files meet our search criteria! We would typically expect this number to be much larger than our number of subjects or researchsubjects, first because studies will often take more than one sample per subject, and second because any given specimen might be aliquoted out to be used in multiple tests.
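To put a number on this, the arithmetic below divides the specimen total by the subject total, using only counts printed in this notebook. It assumes the specimen total equals the sum of the `specimen_type` counts (checked against the per-system counts); the variable names are illustrative.

```python
# Counts copied from the outputs in this notebook.
subject_total = 2384  # 'total' from subjectresults[0]
specimen_type_counts = {
    "aliquot": 18696,
    "portion": 5992,
    "sample": 4090,
    "slide": 3752,
    "analyte": 6671,
}
specimen_system_counts = {"GDC": 38543, "PDC": 658}

# Assumption: each specimen has exactly one specimen_type,
# so the per-type counts sum to the overall specimen total.
specimen_total = sum(specimen_type_counts.values())

# Sanity check: the per-system breakdown should give the same total.
assert specimen_total == sum(specimen_system_counts.values())

specimens_per_subject = specimen_total / subject_total
print(
    f"{specimen_total} specimens / {subject_total} subjects "
    f"= {specimens_per_subject:.1f} specimens per subject"
)
```

Because the specimen output lists only GDC and PDC, the average among subjects that actually have specimens is probably higher than this overall figure.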

<div class="cdadefine" style="background-color:#add9e5;color:black;padding:20px;">

<h3>Specimen Field Definitions</h3>

<i>Any material taken as a sample from a biological entity (living or dead), or from a physical object or the environment. Specimens are usually collected as an example of their kind, often for use in some investigation.
 A given specimen will have only a single subject ID and a single research subject ID.</i>
    
    
<style type="text/css">
.tg  {border-collapse:collapse;border-spacing:0;margin:0px auto;}
.tg td{border-color:black;border-style:solid;border-width:1px;font-family:noto sans, sans-serif;font-size:14px;
  overflow:hidden;padding:10px 5px;word-break:normal;}
.tg th{border-color:black;border-style:solid;border-width:1px;font-family:noto sans, sans-serif;font-size:14px;
  font-weight:normal;overflow:hidden;padding:10px 5px;word-break:normal;}
.tg .tg-za14{border-color:inherit;text-align:left;vertical-align:bottom}
.tg .tg-0pky{border-color:inherit;text-align:left;vertical-align:top}
.tg .tg-7zrl{text-align:left;vertical-align:bottom}
.tg .tg-0lax{text-align:left;vertical-align:top}
</style>
    
<table class="tg">
<tbody>
  <tr>
    <td class="tg-0lax"> id (`total`)</td>
    <td class="tg-0lax"> The overall number of specimens returned.</td>
  </tr>
  <tr>
    <td class="tg-0lax"> files</td>
    <td class="tg-0lax"> The number of files that match this search.</td>
  </tr>
  <tr>
    <td class="tg-7zrl">identifier.value(`system`)</td>
    <td class="tg-0lax"> The identifier for the data provider.</td>
  </tr>
  <tr>
    <td class="tg-za14">primary_disease_type</td>
    <td class="tg-0pky"> The text term used to describe the type of malignant disease.</td>
  </tr>
  <tr>
    <td class="tg-7zrl">source_material_type</td>
    <td class="tg-0lax"> The general kind of material from which the specimen was derived.</td>
  </tr>
  <tr>
    <td class="tg-7zrl">specimen_type</td>
    <td class="tg-0lax"> The high-level type of the specimen, based on how it has been derived from the original extracted sample. One of: analyte, aliquot, portion, sample, or slide.</td>
  </tr>
</tbody>
</table>
</div>

### file

The file endpoint returns all files that match our query:

In [12]:
myquery.file.count.run()

file_identifier_system,count
IDC,4045816
GDC,50633
PDC,3048

data_category,count
Imaging,4045816
Simple Nucleotide Variation,20481
Clinical,1187
Structural Variation,3160
Peptide Spectral Matches,1524
Raw Mass Spectra,762
DNA Methylation,3339
Sequencing Reads,5885
Copy Number Variation,6909
Transcriptome Profiling,3112

file_format,count
DICOM,4045816
BAM,5885
mzIdentML,762
MAF,8375
mzML,762
VCF,12255
BCR XML,2281
TSV,4335
TXT,8848
vendor-specific,762

data_type,count
,4045816
Clinical Supplement,1182
Annotated Somatic Mutation,11808
Aligned Reads,5885
Biospecimen Supplement,1953
Proprietary,762
Open Standard,1524
Methylation Beta Value,1113
Text,762
Gene Expression Quantification,904




A huge number of files (4099497) match our search. We would likely want to filter the results further by file format or data type to get only files we can use. See all the ways you can filter and refine searches with more search terms in the <a href="../Operators">Operators</a> notebook.
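As a rough illustration of how much such a filter narrows things down, the sketch below works only with the `file_format` counts printed above and adds up the formats you might keep. The chosen formats (`BAM` and `VCF`) are just an example, and the names `file_format_counts` and `wanted_formats` are illustrative rather than part of cdapython.

```python
# file_format counts copied from the myquery.file.count.run() output above.
total_files = 4099497
file_format_counts = {
    "DICOM": 4045816,
    "BAM": 5885,
    "mzIdentML": 762,
    "MAF": 8375,
    "mzML": 762,
    "VCF": 12255,
    "BCR XML": 2281,
    "TSV": 4335,
    "TXT": 8848,
    "vendor-specific": 762,
}

wanted_formats = ["BAM", "VCF"]  # example selection, not a recommendation
kept = sum(file_format_counts[fmt] for fmt in wanted_formats)
print(f"Keeping {wanted_formats} leaves {kept} of {total_files} files")

# The listed formats do not add up to the total, so the table is not exhaustive.
listed = sum(file_format_counts.values())
print(f"Listed formats cover {listed} files; {total_files - listed} are not in the listed formats")
```

The second check is worth keeping in mind when reading any of the count tables: a value that is missing from the table is not necessarily missing from the data.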


<div class="cdadefine" style="background-color:#add9e5;color:black;padding:20px;">

<h3>File Field Definitions</h3>

<i>A file is an information-bearing electronic object that contains a physical embodiment of some information using a particular character encoding.</i>

    
<style type="text/css">
.tg  {border-collapse:collapse;border-spacing:0;}
.tg td{border-color:black;border-style:solid;border-width:1px;font-family:noto sans, sans-serif;font-size:14px;
  overflow:hidden;padding:10px 5px;word-break:normal;}
.tg th{border-color:black;border-style:solid;border-width:1px;font-family:noto sans, sans-serif;font-size:14px;
  font-weight:normal;overflow:hidden;padding:10px 5px;word-break:normal;}
.tg .tg-7zrl{text-align:left;vertical-align:bottom}
</style>
<table class="tg">
<tbody>
  <tr>
    <td class="tg-0lax"> id (`total`)</td>
    <td class="tg-0lax"> The overall number of files returned.</td>
  </tr>
  <tr>
    <td class="tg-7zrl">identifier.value(`system`)</td>
    <td class="tg-0lax"> The identifier for the data provider.</td>
  </tr>
  <tr>
    <td class="tg-7zrl">data_category</td>
    <td class="tg-7zrl">Broad categorization of the contents of the data file.</td>
  </tr>
  <tr>
    <td class="tg-7zrl">data_type</td>
    <td class="tg-7zrl">Specific content type of the data file.</td>
  </tr>
  <tr>
    <td class="tg-7zrl">file_format</td>
    <td class="tg-7zrl">Format of the data files.</td>
  </tr>
</tbody>
</table>

</div>

### mutation

The mutation endpoint returns all mutations that match our query:

In [13]:
myquery.mutation.count.run()

KeyError: 'ncbi_build'

## Files from a single endpoint (endpoint chaining)

If you want all file formats and data types, but only from a specific endpoint, you can also filter the file results by chaining endpoints together. This will return all the files that match our search AND that are specifically from specimens:

In [14]:
myquery.specimen.file.count.run()

file_identifier_system,count
GDC,47493
PDC,3048

data_category,count
Peptide Spectral Matches,1524
Simple Nucleotide Variation,20481
Sequencing Reads,5885
Processed Mass Spectra,762
Raw Mass Spectra,762
Copy Number Variation,6909
DNA Methylation,3339
Biospecimen,3629
Transcriptome Profiling,3112
Structural Variation,3160

file_format,count
BEDPE,1886
mzIdentML,762
MAF,8375
mzML,762
VCF,12255
TXT,8848
tsv,762
vendor-specific,762
SVS,3629
TSV,4335

data_type,count
Raw Simple Somatic Mutation,6202
Masked Intensities,2226
Open Standard,1524
Masked Copy Number Segment,2185
Proprietary,762
Annotated Somatic Mutation,11808
Structural Rearrangement,298
Text,762
Masked Somatic Mutation,1144
Aligned Reads,5885
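
To see what the chain changed, the sketch below compares the per-system file counts from `myquery.file.count.run()` with those from `myquery.specimen.file.count.run()`, using the numbers printed in this notebook. The dictionary names are illustrative.

```python
# Per-system file counts copied from the two outputs in this notebook.
all_files = {"IDC": 4045816, "GDC": 50633, "PDC": 3048}
specimen_files = {"GDC": 47493, "PDC": 3048}

for system, n_all in all_files.items():
    # A system that is absent from the chained output is treated as 0 files.
    n_specimen = specimen_files.get(system, 0)
    print(
        f"{system}: {n_specimen} of {n_all} files are specimen files "
        f"({n_all - n_specimen} are not)"
    )

dropped = sum(all_files.values()) - sum(specimen_files.values())
print(f"Chaining through specimen removes {dropped} files in total")
```

In these outputs none of the IDC files appear in the chained result, all PDC files remain, and a few thousand GDC files drop out.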




Learn more about chaining endpoints in the [Chaining endpoints](../AdvancedSearch-Chaining) notebook.