Matt Rogers edited this page Nov 14, 2023 · 3 revisions

What is RESPIRE?

RESPIRE, the REpoSitory for Pulmonary expressIon data ReusE, is a framework for centralizing, searching, and distributing processed, publicly available data from diverse sources.

The Problem

The data we seek to organize and present for download are publicly available. In theory, scientists of all sorts could retrieve and use these data to:

  • Develop new hypotheses based on public data sets, perhaps by combining them into novel compendia
  • Validate preliminary findings in their own data by looking at the results of others

In practice, although these data are publicly available, many barriers stand in the way of using them for scientific analysis:

  • Publicly available studies are discoverable only through diverse search interfaces of varying quality
  • When studies are identified, they may not contain sufficient samples to address the research question
  • If sufficient samples are available, data are generated in various modalities, not all of which may be suitable to the research question
  • The data are often stored in file formats that are difficult to process
  • Researchers working outside a handful of high-priority conditions such as cancer, heart disease, and diabetes may struggle to find studies relevant to their research amid the volume of data on those conditions

[Figure: RESPIRE Problem diagram]

RESPIRE packages domain-specific, fully standardized, easy-to-use data behind a standard search and download interface with the goal of promoting data reuse.

Technical Challenges

From a technical perspective, there are two primary challenges to overcome when providing diverse data sets from a centralized location.

  1. The available metadata varies widely across data types. A standard set of search fields would not be suitable for all types of data.
  2. The data themselves are not consistent. Enforcing a single table format across all possible data types is not feasible.

Both of these challenges require an intentionally flexible application design.
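
One way to picture this flexibility is to let each data type declare its own search fields rather than sharing one global schema. The sketch below is illustrative only: `SearchField`, `validate_query`, and the field names are hypothetical and are not taken from the RESPIRE codebase.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SearchField:
    """One searchable metadata field declared by a module (hypothetical)."""
    name: str
    kind: type  # Python type that a query value for this field must have


# Hypothetical field sets: each data type declares its own searchable fields.
EXPRESSION_FIELDS = [SearchField("tissue", str), SearchField("min_samples", int)]
MICROBIOME_FIELDS = [SearchField("body_site", str), SearchField("sequencing_region", str)]


def validate_query(fields, query):
    """Return a list of problems found in `query` for the given field set."""
    known = {f.name: f for f in fields}
    problems = []
    for key, value in query.items():
        if key not in known:
            problems.append(f"unknown field: {key}")
        elif not isinstance(value, known[key].kind):
            problems.append(f"{key} expects {known[key].kind.__name__}")
    return problems
```

Under this assumption, a query that is valid for one data type can be rejected by another, which is the behavior a single fixed set of search fields cannot express.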

The RESPIRE Solution

RESPIRE uses a modular structure to address the issues laid out above. The core RESPIRE system consists of a user interface and a registry that tracks available modules. The RESPIRE core is then associated with one or more modules.
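
As a rough mental model, the registry can be thought of as a lookup table of module records. The following is a minimal sketch under that assumption; `ModuleRecord`, `ModuleRegistry`, their attributes, and the example URLs are all hypothetical, and the actual registry is described in the Core section.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ModuleRecord:
    """What a registry might store about one module (hypothetical fields)."""
    name: str
    base_url: str   # where the module's REST API is assumed to be served
    data_type: str  # e.g. "gene expression"


@dataclass
class ModuleRegistry:
    """In-memory stand-in for a registry that tracks available modules."""
    _modules: dict = field(default_factory=dict)

    def register(self, record):
        # Refuse duplicate names so each module can be addressed unambiguously.
        if record.name in self._modules:
            raise ValueError(f"module already registered: {record.name}")
        self._modules[record.name] = record

    def list_modules(self):
        # Sorted by name so that a user interface can show a stable listing.
        return sorted(self._modules.values(), key=lambda r: r.name)


registry = ModuleRegistry()
registry.register(ModuleRecord("microbiome", "https://example.org/microbiome", "microbiome"))
registry.register(ModuleRecord("expression", "https://example.org/expression", "gene expression"))
```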

RESPIRE Modules

In the simplest terms, a RESPIRE module is a REST API with a defined structure. It serves as a plug-in to a centralized front end. A researcher interested in contributing data to a RESPIRE instance can develop a RESPIRE module and register it with the RESPIRE core. A module provides:

  • The ability to search study metadata
  • Downloadable data and metadata corresponding to selected studies

Modules will typically represent a single type of domain-specific data (e.g. gene expression, microbiome, proteomics, or metabolomics data related to a specific area of research).
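
The two capabilities listed above can be sketched as a small interface. This is not the RESPIRE module specification (see the Modules section for that): `DataModule`, `InMemoryModule`, the method names, and the sample studies are hypothetical, and an actual module would expose these operations over REST rather than as Python methods.

```python
from abc import ABC, abstractmethod


class DataModule(ABC):
    """Hypothetical contract mirroring the two capabilities of a module."""

    @abstractmethod
    def search(self, query):
        """Return the IDs of studies whose metadata match `query`."""

    @abstractmethod
    def download(self, study_ids):
        """Return data and metadata for the selected studies."""


class InMemoryModule(DataModule):
    """Toy module backed by a dict; stands in for a real data store."""

    def __init__(self, studies):
        # Maps study ID -> {"metadata": {...}, "data": ...}
        self._studies = studies

    def search(self, query):
        # A study matches when every queried field equals its metadata value.
        return [
            study_id
            for study_id, study in self._studies.items()
            if all(study["metadata"].get(k) == v for k, v in query.items())
        ]

    def download(self, study_ids):
        return {study_id: self._studies[study_id] for study_id in study_ids}


# Made-up example studies, used only to exercise the sketch.
module = InMemoryModule({
    "study-1": {"metadata": {"tissue": "lung"}, "data": [[1.0, 2.0]]},
    "study-2": {"metadata": {"tissue": "blood"}, "data": [[3.0, 4.0]]},
})
hits = module.search({"tissue": "lung"})
bundle = module.download(hits)
```

Because the front end depends only on the contract, any module that honors it could be plugged in regardless of how it stores its data.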

The RESPIRE core functions like a video game console: on its own, it has very limited functionality. Its purpose is to provide an interface to various interchangeable games (modules). Provided that the game developer follows the system specifications, a wide variety of games can be created, each providing unique capabilities.

The core RESPIRE system is intended to allow researchers to focus on developing games rather than creating their own consoles from scratch.

Documentation on creating a data module is available in the Modules section of the Wiki. Further discussion of the RESPIRE core is available in the Core section.

RESPIRE System Overview

[Figure: RESPIRE context diagram]
