BioMiner

BioMiner Project

BioMiner is dedicated to building a data mining platform that integrates high-quality multi-omics data management, distribution and exploratory analysis.

For Users

Please access Metadata System and Indexd to get metadata and multi-omics data.

For Data Collaborators

Full Lifecycle Management of Metadata

DataHub for The Cancer Omics Atlas

The DataHub repository is used to manage the metadata and specifications for the TCOA project.

|- .github                   -> Scripts for Continuous Integration and Delivery (Metadata QA & QC, Syncing the Metadata to the BioMiner System)
|- data                      -> Metadata Tables (Each sub directory (named with the project name) contains several metadata tables for every entity, please read the specifications for more details)
|    |- FUSCC_LUAD
|    |       |- project/
|    |       |- ...
|    |       |- README.md
|    |- FUSCC_TNBC
|    |- FUSCC_CBCGA
|- docs                      -> Specifications and How-To
|- README.md                 -> Quick Start
|- CHANGELOG                 -> Change Log
|- LICENSE                   -> License File

SEQC DataHub for Quality Control

The SEQC DataHub repository is used to manage the metadata and specifications for the SEQC project.

|- .github/                          -> Scripts for Continuous Integration and Delivery (Metadata QA & QC, Syncing the Metadata to the BioMiner System)
|- data/                             -> Metadata Tables (Each sub directory (named with the project name) contains several metadata tables for every entity, please read the specifications for more details)
|    |- <PROJECT_NAME>_RNA/
|    |       |- project/
|    |       |- donor/
|    |       |- biospecimen/
|    |       |- reference_materials/
|    |       |- library/
|    |       |- sequencing/
|    |       |- datafile/
|    |       |- README.md
|    |       |- CHANGELOG
|- docs/                             -> Specifications and How-To
|- README.md                         -> Quick Start
|- CHANGELOG                         -> Change Log
|- LICENSE                           -> License File

cBioportal DataHub

A centralized location for storing curated data ready for inclusion in cBioPortal.

For Pipeline Developers

flowchart LR
  subgraph HPC [Testing on Local HPC]
    publications([Publications]) --Summarized by Pipeline Developer--> scripts[Pipeline with PBS Scripts]
    scripts --Evaluated by Pipeline Developer--> tested_pipeline[The Best Pipeline]
  end
  subgraph AliCloud [Running on AliCloud]
    app[Usable and Low-cost App] --Deployed by System Developer--> platform[ClinicoOmics Web System]
  end
  HPC -- Pipeline >>> App --> AliCloud
  HPC -. Developed by App Developer .-> AliCloud

For System Developers

The whole system mainly consists of the following parts：Omics Datasets' Repo, Metadata QC & QA System, Workflows & Bioinformatics Workflow Management System, Omics Data Commons(Similar to Genomics Data Commons), A Web Platform for Collaborative Multi-omic Data Visualization and Exploration(Simimar to cBioportal).

Components	Temporary Solution	Production	Description
Omics Datasets' Repo	TCOA DataHub SEQC DataHub	Same as the Temporary Solution	GitHub Repo, For Metadata Collaboration and Version Control
Metadata QC & QA System	Metabase Metadata Validator	Same as the Temporary Solution	For Metadata QC & QA
Workflows & Bioinformatics Workflow Management System	ClinicoOmics	Same as the Temporary Solution	For Bioinformatics Workflow Management
Omics Data Commons	Metabase BioPoem	Metabase BioPoem BioMiner Indexd	Metabase - For Metadata Management of the Omics Datasets Biopoem - For High-speed Data Transfering (Sequencing Center -> Local HPC -> NODE -> AliCloud/Local HPC) BioMiner Indexd - A hash-based data indexing and tracking service providing globally unique identifiers. NODE/GSA/SRA - For Level 1/2 Data Files OSS - For Level3 Data Files
A Web Platform for Collaborative Multi-omic Data Visualization and Exploration	cBioportal cBioportal DataHub	BioMiner Studio	cBioportal - Omics Data Exploration cBioportal DataHub - A centralized location for storing curated data ready for inclusion in cBioPortal. BioMiner Studio - Support Online Query and Exploration Analysis Modules（Similar to cBioportal, Stats and Machine Learning Modules

graph TD
    subgraph BioMiner's Management View
      BioMiner -- Manage <> Similar with GDC --> Metadata[Metadata];
      BioMiner -- Manage <> Similar with cBioportal --> AnalysisModules[Analysis Modules];
      Metadata -.- AnalysisModules;
    end

    Metadata -- Contains --> Info[Patient/Sample Information, etc.];
    Metadata -- File Link to --> Level1[Level1 Data - NODE/GSA/SRA];
    Metadata -- File Link to --> Level3[Level3 Data - Cloud Storage];
    SeqSite[Sequencing Center] -- Sync by Biopoem --> HPC[PGx HPC];
    HPC -- Sync by Biopoem --> Level1[Level1 Data - NODE/GSA/SRA];
    Level1 -- Transfer to Cloud Storage by BioPoem & Convert Level1 Data to Level3 Data by SeqPipe  --> Level3;
    AnalysisModules --> DownstreamAnalysis[Downstream Analysis];
    Info -- Part of Samples' Metadata --> Dataset[New Dataset];
    Level3 -- Part of Samples --> Dataset;
    Dataset -- As Input Data --> DownstreamAnalysis(Mining Dataset with Several Analysis Modules on BioMiner);
    DownstreamAnalysis --> Kownledges;
    
    style Level1 fill:#00758f;
    style Level3 fill:#00758f;
    style Info fill:#00758f;
    style Metadata fill:#dafbe1;
    style AnalysisModules fill:#dafbe1;

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

BioMiner

BioMiner Project

For Users

For Data Collaborators

Full Lifecycle Management of Metadata

DataHub for The Cancer Omics Atlas

SEQC DataHub for Quality Control

cBioportal DataHub

For Pipeline Developers

For System Developers

Pinned Loading

Repositories

People

Top languages

Most used topics