RESPIRE is the REpoSitory for Pulmonary expressIon data Reuse. Detailed documentation is available in the RESPIRE Wiki.
The goal of the RESPIRE project is to provide a simple interface to a wide variety of microbiological data types. Sourcing clean, analysis-ready data is one of the primary challenges to "multi-omics" (i.e. from multiple "omes" -- the genome, proteome, microbiome, metabolome, etc.) data analysis. RESPIRE provides a standard interface for exposing processed data for download with the goal of increasing the reusability and availability of said data.
This is accomplished through a dynamic front-end and modular application structure that makes it easy for researchers to expose an API with a standard specification to a deployed RESPIRE instance and make data immediately available to the community.
The RESPIRE system has three pieces:
- One or more module APIs that serve data and metadata specific to a module (e.g. gene expression data, microbiome data)
- A registry API that tracks the available modules and makes them known to the front-end application
- A standard React front end set up to consume data from the registry and module APIs
Module APIs are registered with a deployed registry API, which is in turn connected to a RESPIRE front-end instance. Both the registry API and the front end are containerized using Docker. These containers can be deployed on AWS or on your institution's infrastructure.
RESPIRE Modules combine searchable metadata with processed data and a standard API interface that can be registered with a hosted RESPIRE frontend.
The RESPIRE front end is designed to search study-level metadata and select studies based on one or more filters. Each study is associated with samples, which comprise the data and may have metadata of their own.
Two types of data are required for a functional RESPIRE module:
- Study Metadata: This information describes studies having one or more samples. This information is used by the front end to identify studies of interest to the researcher.
- Data: This is the information associated with the studies searchable through the front end. How this data is processed and stored is at the discretion of the module developer.
Additionally, sample metadata is strongly recommended but not required for the system to operate. This is information that describes the samples present in the data.
A module API has four required endpoints:
- /studies/searchMetadata: Returns study results matching the search from the front end
- /data/download: Returns a FileResponse containing the selected studies
- /admin/inputs: Returns a list of JSON objects defining one or more inputs for the front end
- /admin/dataStructure: Returns a JSON object describing the module data structure. Required only to use the module with the respireAdmin R package
Additional endpoints, such as ones that provide lists of options for dropdown menus or other administrative functions, can be added as necessary.
Purpose: Search study metadata and return information about studies matching the search.
Accepts:
Arbitrary JSON defined by /admin/inputs
See the section on /admin/inputs for more information on how the JSON is created and served from the front end.
Returns:
List of results with the following structure.
[{
accession_number: [NUM/STR] Unique ID for study
title: [STR]
description: [STR]
n_samples: [NUM]
}]
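As an illustration of the result structure above, here is a minimal Python sketch of a search handler. It is framework-agnostic and uses only the standard library. The in-memory STUDIES list, the accession values, and the search_metadata function name are hypothetical; the has_data and min_samples search fields are borrowed from the example input collection in this document, and a real module would query its own data store.

```python
# Hypothetical in-memory study metadata; a real module would query its own store.
STUDIES = [
    {"accession_number": "STUDY-001", "title": "Example study A",
     "description": "Placeholder description", "n_samples": 120, "has_data": True},
    {"accession_number": "STUDY-002", "title": "Example study B",
     "description": "Placeholder description", "n_samples": 12, "has_data": False},
]

# The four keys documented for each search result.
RESULT_KEYS = ("accession_number", "title", "description", "n_samples")


def search_metadata(search: dict) -> list:
    """Return matching studies, trimmed to the documented result keys."""
    min_samples = search.get("min_samples", 0)
    has_data = search.get("has_data")  # None means "do not filter on this field"
    results = []
    for study in STUDIES:
        if study["n_samples"] < min_samples:
            continue
        if has_data is not None and study["has_data"] != has_data:
            continue
        results.append({key: study[key] for key in RESULT_KEYS})
    return results
```

Trimming each record to RESULT_KEYS keeps internal search columns (here has_data) out of the response, so the front end only sees the documented structure.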
Purpose: Download data and metadata matching selected studies in a .ZIP file. The exact structure of the data and contents of the ZIP are at the discretion of the module developer.
Accepts:
List of unique ID values
Returns:
FileResponse
headers = {'Content-Disposition': 'attachment; filename="YOURFILENAMEHERE.zip"',
'Accept': 'application/zip, application/octet-stream '}
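Because the archive layout is left to the module developer, the following is only one possible sketch of assembling the ZIP payload in memory with Python's standard library. The build_study_zip name, the one-CSV-per-study layout, and the load_csv callback are assumptions made for illustration, not part of the RESPIRE specification.

```python
import io
import zipfile


def build_study_zip(study_ids, load_csv) -> bytes:
    """Pack one CSV per selected study into an in-memory ZIP archive.

    load_csv is a hypothetical callback that returns the CSV text for a study ID.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for study_id in study_ids:
            archive.writestr(f"{study_id}.csv", load_csv(study_id))
    return buffer.getvalue()


def attachment_header(filename: str) -> dict:
    """Build the Content-Disposition header that triggers a file download."""
    return {"Content-Disposition": f'attachment; filename="{filename}"'}
```

The returned bytes can be written to a temporary file or streamed by whatever response class your web framework provides.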
Purpose: RESPIRE modules are intended to handle diverse types of data. /admin/inputs returns a list of input specifications for a module that will be drawn dynamically by the user interface. This endpoint determines the choices the end-user will have for searching the study metadata.
Returns:
A JSON object with one key, inputs, whose value is an array of JSON objects.
{inputs: List[Dict]}
An input definition is a JSON object with four keys:
{
function: FUNCTION_NAME,
searchField: FIELD_NAME,
split_download: true/false,
args: {
ARG_NAME: ARG_VALUE
}
}
- function
  - Valid pre-defined input function, one of checkboxInput, textInput, selectInput, or numberInput
- searchField
  - Corresponds to a field accepted by /studies/searchMetadata
- split_download
  - A boolean, true/false. If true, data will be downloaded separately for each selected value of the input. This is intended to prevent nonsensical combinations of data, such as combining data sourced from multiple profiling methods into a single compendium.
  - If multiple true split_download values are provided, downloads will be possible for all combinations of the marked inputs.
- args
  - Arguments specific to the function. Required args are detailed below
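One way to read the split_download rule is as a Cartesian product over the selected values of every marked input. The Python sketch below illustrates that reading only; it is not the RESPIRE front end's implementation, and the split_download_groups name and the shape of the selections mapping (search field to list of chosen values) are assumptions.

```python
from itertools import product


def split_download_groups(inputs, selections):
    """List one filter dict per combination of values of split_download inputs.

    inputs: list of input definitions as described above.
    selections: hypothetical mapping of searchField -> list of selected values.
    """
    split_fields = [d["searchField"] for d in inputs if d.get("split_download")]
    if not split_fields:
        return [{}]  # nothing is split, so there is a single combined download
    value_lists = [selections.get(field, []) for field in split_fields]
    return [dict(zip(split_fields, combo)) for combo in product(*value_lists)]
```

Each returned dict describes one separate download. With two marked inputs that have two and three selected values respectively, this yields six groups.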
The values provided in the searchField of each input will be extracted into a JSON object that will be passed to /studies/searchMetadata when a search is submitted. For example, given the following input collection:
inputs: [
{
function: checkboxInput,
searchField: has_data,
split_download: false,
args: {
label: "Data available?"
}
},
{
function: numberInput,
searchField: "min_samples",
split_download: false,
args: {
label: "Minimum available samples",
defaultValue: 50
}
}]
The RESPIRE interface will pass the following JSON structure to /studies/searchMetadata (with values corresponding to the user selection)
{
"has_data": true,
"min_samples": 50
}
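The mapping from input definitions to the search payload can be sketched in Python as follows. This is an illustration of the idea rather than the front end's actual code: the build_search_payload name is hypothetical, and falling back to defaultValue when the user has not touched an input is an assumption.

```python
def build_search_payload(inputs, user_values):
    """Collect one value per searchField into the JSON object sent to the module.

    inputs: list of input definitions (as served by /admin/inputs).
    user_values: hypothetical mapping of searchField -> value chosen in the UI.
    """
    payload = {}
    for definition in inputs:
        field = definition["searchField"]
        args = definition.get("args", {})
        if field in user_values:
            payload[field] = user_values[field]
        elif "defaultValue" in args:
            # Assumption: an untouched input contributes its default value.
            payload[field] = args["defaultValue"]
    return payload


example_inputs = [
    {"function": "checkboxInput", "searchField": "has_data",
     "split_download": False, "args": {"label": "Data available?"}},
    {"function": "numberInput", "searchField": "min_samples",
     "split_download": False,
     "args": {"label": "Minimum available samples", "defaultValue": 50}},
]
payload = build_search_payload(example_inputs, {"has_data": True})
```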
- checkboxInput
  - args
    - label: Text to display as the label
- textInput
  - args
    - label: Text to display as the label
    - placeholderText: Text to display as a placeholder
- selectInput
  - args
    - label: Text to display as the label
    - options: Optional list of values to choose from
    - source: An optional API endpoint to use to populate choices. If source is not null, anything specified in options will be overwritten
- numberInput
  - args
    - label: Text to display as the label
    - defaultValue: Initial value for the input
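Before serving input definitions from /admin/inputs, it can help to check them against the list above. The following Python sketch does that under one stated assumption: every arg not described as optional is treated as required. The validate_input name and the error message wording are hypothetical.

```python
# Args treated as required per input function (assumption: anything the
# list above does not mark as optional is required).
REQUIRED_ARGS = {
    "checkboxInput": {"label"},
    "textInput": {"label", "placeholderText"},
    "selectInput": {"label"},
    "numberInput": {"label", "defaultValue"},
}


def validate_input(definition: dict) -> list:
    """Return a list of problems found in one input definition (empty if none)."""
    errors = []
    for key in ("function", "searchField", "split_download", "args"):
        if key not in definition:
            errors.append(f"missing key: {key}")
    function = definition.get("function")
    if function not in REQUIRED_ARGS:
        errors.append(f"unknown function: {function}")
        return errors
    missing = REQUIRED_ARGS[function] - set(definition.get("args", {}))
    errors.extend(f"missing arg: {name}" for name in sorted(missing))
    return errors
```

Running this over every definition at startup surfaces a misspelled function name or a forgotten label before the front end tries to draw the input.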
Purpose: This endpoint describes the database structure of the study metadata, data, and optional sample metadata. This endpoint is used by the respireAdmin R package.
Returns:
JSON
{
'study_metadata_table': {
'db_schema': '',
'table': '',
'fields': [{FIELD_NAME: FIELD_TYPE}]
},
'sample_metadata_table': {
'db_schema': '',
'table': '',
'fields': [{FIELD_NAME: FIELD_TYPE}]
},
'data_table': {
'db_schema': '',
'table': '',
'fields': [{FIELD_NAME: FIELD_TYPE}]
},
'shared_key': SHARED_ID
}
The sample_metadata_table key is optional. All other keys are required.
The specification of the database table containing study metadata
- db_schema: The database schema containing the table
- table: The name of the table
- fields: A list of field names and field types
  - Field types must be one of:
    - integer
    - numeric
    - character
    - boolean
The specification of the database table containing sample metadata
- db_schema: The database schema containing the table
- table: The name of the table
- fields: A list of field names and field types
  - Field types must be one of:
    - integer
    - numeric
    - character
    - boolean
The specification of the database table containing the data for the studies
- db_schema: The database schema containing the table
- table: The name of the table
- fields: A list of field names and field types
  - Field types must be one of:
    - integer
    - numeric
    - character
    - boolean
The study metadata table and the data table should share a key that can be used to identify the data for a study. shared_key should be the name of that column.
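The rules for the /admin/dataStructure response can be checked mechanically. The Python sketch below is one way to do so, assuming each entry in fields is a single-pair object of the form {FIELD_NAME: FIELD_TYPE} as shown in the template. The check_data_structure name and the message wording are hypothetical.

```python
VALID_FIELD_TYPES = {"integer", "numeric", "character", "boolean"}
REQUIRED_KEYS = ("study_metadata_table", "data_table", "shared_key")
TABLE_KEYS = ("study_metadata_table", "sample_metadata_table", "data_table")


def check_data_structure(structure: dict) -> list:
    """Return a list of problems in a dataStructure object (empty if none)."""
    problems = []
    for key in REQUIRED_KEYS:
        if key not in structure:
            problems.append(f"missing required key: {key}")
    shared = structure.get("shared_key")
    for table_key in TABLE_KEYS:
        table = structure.get(table_key)
        if table is None:
            continue  # sample_metadata_table is optional; others reported above
        for part in ("db_schema", "table", "fields"):
            if part not in table:
                problems.append(f"{table_key}: missing {part}")
        names = set()
        for field in table.get("fields", []):
            for name, field_type in field.items():
                names.add(name)
                if field_type not in VALID_FIELD_TYPES:
                    problems.append(f"{table_key}.{name}: invalid type {field_type!r}")
        # The shared key must be a column of the study metadata and data tables.
        if table_key != "sample_metadata_table" and shared is not None and shared not in names:
            problems.append(f"{table_key}: shared_key {shared!r} is not a listed field")
    return problems
```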
A module can be registered with the respireAdmin package's register_new_module() function or with a POST request to your registry API's register_module endpoint.
A module specification is formed as follows:
- module_name: A unique, descriptive name for the module. This will appear in a dropdown, so one or two words is best.
- module_api: The base URL for the module API
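A registration request could be prepared along the following lines in Python, using only the standard library. This is a sketch under several assumptions: that the registry accepts a JSON body, that register_module sits directly under the registry's base URL, and that only the two fields listed above are sent (the registry may expect others). The build_registration_request name and the example.org URLs are hypothetical.

```python
import json
import urllib.request


def build_registration_request(registry_url, module_name, module_api):
    """Prepare (but do not send) a POST request that registers a module."""
    body = json.dumps(
        {"module_name": module_name, "module_api": module_api}
    ).encode("utf-8")
    return urllib.request.Request(
        url=f"{registry_url.rstrip('/')}/register_module",  # assumed path layout
        data=body,
        headers={"Content-Type": "application/json"},  # assumed body format
        method="POST",
    )


request = build_registration_request(
    "https://registry.example.org/",   # hypothetical registry URL
    "Gene Expression",                 # hypothetical module name
    "https://module.example.org",      # hypothetical module API base URL
)
# To actually send it: urllib.request.urlopen(request)
```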