Data Commons

What is Data Commons?

A Data Commons is a software platform combined with a governance framework that together allow a community to manage, analyze, and share its data.

Motivations: Why do we need this?

Use aggregated data to speed up solutions for a particular problem, e.g. gathering enough data to use machine learning
Standardised data format [2]
Communities need to share data; however, data governance must be considered as well

With proper management, offering different levels of access to data is useful

Cloud computing: the technology allows us to manage large datasets with many supporting features [3]
Speed things up: each community doesn't need to curate data, allowing it to focus and produce results faster [3]
Bring communities together: data commons reduce the cost for each community of maintaining its data and lower the barriers to accessing the data [3]

Data Commons Compositions

ARDC version - Australian Research Data Commons

It is composed of 4 elements: People & Policy, Platform & Software, Data & Services, and Storage & Compute

Alternate Version

In this version, it is composed of:

Data Governance and Data Management Framework to handle data

Framework to handle identifying datasets

Framework to handle analyzing and visualizing datasets

Framework to share and collaborate results

Examples

We use the four components of the alternate version as a checklist to see whether existing systems comply with its composition.

Bioinformatics Data Commons	Data Governance and Data Management	Identifying Datasets	Analyzing and Visualizing data	Share and Collaborate Results
UK Biobank	✅	✅	✅	✅
NCI Genomic Data Commons	✅	✅	✅	✅
Haemosphere		✅	✅	✅
cBioPortal		✅	✅	✅
Stemformatics		✅	✅	✅

The table shows that every system covers dataset identification, analysis and visualization, and sharing of results; however, three of the five (Haemosphere, cBioPortal, and Stemformatics) do not adequately address data governance and management.
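The comparison above can also be expressed as a small checklist in code. The following is a minimal sketch only: the `Assessment` class and the `fully_compliant` helper are hypothetical names introduced for illustration, and the data simply transcribes the table rows.

```python
from dataclasses import dataclass

# The four components of the alternate composition, in table order.
COMPONENTS = (
    "Data Governance and Data Management",
    "Identifying Datasets",
    "Analyzing and Visualizing Data",
    "Share and Collaborate Results",
)


@dataclass(frozen=True)
class Assessment:
    """Hypothetical record of which components a system addresses."""
    system: str
    covered: frozenset

    def missing(self):
        """Return the components this system does not cover, in table order."""
        return [c for c in COMPONENTS if c not in self.covered]


# Rows transcribed from the table; an empty cell is read as "not covered".
ALL = frozenset(COMPONENTS)
NO_GOVERNANCE = ALL - {COMPONENTS[0]}
ASSESSMENTS = [
    Assessment("UK Biobank", ALL),
    Assessment("NCI Genomic Data Commons", ALL),
    Assessment("Haemosphere", NO_GOVERNANCE),
    Assessment("cBioPortal", NO_GOVERNANCE),
    Assessment("Stemformatics", NO_GOVERNANCE),
]


def fully_compliant(assessments):
    """Return the names of systems that cover all four components."""
    return [a.system for a in assessments if not a.missing()]


if __name__ == "__main__":
    for a in ASSESSMENTS:
        gaps = a.missing()
        print(a.system, "->", "complete" if not gaps else "missing: " + ", ".join(gaps))
```

Keeping the component names in one tuple means a new candidate system can be assessed by adding a single row.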

WEHI Goal for Data Commons

Make it easier to set up a data commons for a particular community
Create an on-demand data commons infrastructure
Make the framework extensively configurable to suit each community's needs

User Stories for a Data Commons Framework

You can see the User Stories here.

Current Architectural Design of WEHI Data Commons Framework

Treat a Data Commons as an exclusive yacht club for datasets. The dress code is higher than for a common institute-wide dataset registry.
There may be multiple Data Commons in an institute, or across institutes.
It should be easy to create the core parts of a new Data Commons using a standard framework.

Dataset Registry

The dataset registry is where metadata about each refined dataset, and possibly its sample information, is stored. It should point to the location of the raw, processed, and summarised data, as well as to the appropriate data portals.

It should allow the user to search for features of a dataset and also know if they can access a dataset straight away, or if they need to ask permission.

It should have an ecosystem of tools to allow data scientists to easily locate and access datasets.
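To make these requirements concrete, here is a minimal in-memory sketch of a registry record and a keyword search. It is an illustration under assumptions, not the WEHI design: the class names, field names, and the two-level `Access` enum are all hypothetical, and a real registry would sit on a database with proper authorisation.

```python
from dataclasses import dataclass, field
from enum import Enum


class Access(Enum):
    OPEN = "open"              # usable straight away
    ON_REQUEST = "on_request"  # permission must be requested first


@dataclass
class DatasetRecord:
    """Hypothetical metadata record; it points to data rather than holding it."""
    dataset_id: str
    title: str
    keywords: set
    access: Access
    # Maps "raw" / "processed" / "summarised" to a storage location.
    locations: dict = field(default_factory=dict)
    # Names of data portals that present this dataset.
    portals: list = field(default_factory=list)


class DatasetRegistry:
    def __init__(self):
        self._records = {}

    def register(self, record):
        self._records[record.dataset_id] = record

    def search(self, keyword):
        """Return records tagged with the keyword (case-insensitive)."""
        kw = keyword.lower()
        return [r for r in self._records.values()
                if kw in {k.lower() for k in r.keywords}]

    def can_access_now(self, dataset_id):
        """True if the dataset is open; False if permission must be requested."""
        return self._records[dataset_id].access is Access.OPEN


if __name__ == "__main__":
    registry = DatasetRegistry()
    # Placeholder datasets and paths, invented for the example.
    registry.register(DatasetRecord(
        "ds-001", "Example public RNA-seq set", {"RNA-seq", "public"},
        Access.OPEN, {"raw": "/placeholder/raw/ds-001"}, ["cBioPortal"]))
    registry.register(DatasetRecord(
        "ds-002", "Example restricted imaging set", {"imaging"},
        Access.ON_REQUEST, {"processed": "/placeholder/processed/ds-002"}, ["Omero"]))
    for rec in registry.search("rna-seq"):
        print(rec.dataset_id, registry.can_access_now(rec.dataset_id))
```

Storing locations and portal names as plain references keeps the registry lightweight: it answers "what exists, where is it, and can I use it now?" without copying any data.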

Data Portals

There is interest in having research data that is summarised and easy to access for non-computational researchers. There are a few data portals such as cBioPortal, Aquila, Omero, and others that provide this type of functionality.

Because the data is so heterogeneous, more than one data portal may be needed, and different Data Commons may need different Data Portals.

We also need to be able to support our own Data Portals (e.g. interactive data visualisations written in Shiny/R).
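One way to picture "different Data Commons, different Data Portals" is for each commons to declare its own list of portals. The sketch below is hypothetical throughout: the `Portal` and `DataCommons` classes, the commons name, and the data-kind labels attached to each portal are assumptions made for illustration, not a statement of what each portal supports.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Portal:
    name: str
    data_kinds: frozenset  # kinds of data the portal is ASSUMED to present


@dataclass
class DataCommons:
    name: str
    portals: list

    def portals_for(self, data_kind):
        """Return the names of this commons' portals that handle a data kind."""
        return [p.name for p in self.portals if data_kind in p.data_kinds]


# Hypothetical commons; the data-kind labels are illustrative assumptions.
example_commons = DataCommons(
    name="example-cancer-commons",
    portals=[
        Portal("cBioPortal", frozenset({"cancer-genomics"})),
        Portal("Omero", frozenset({"imaging"})),
        Portal("in-house Shiny app", frozenset({"cancer-genomics", "summary-plots"})),
    ],
)

if __name__ == "__main__":
    print(example_commons.portals_for("cancer-genomics"))
```

Because the portal list belongs to the commons rather than to the framework, a second commons can reuse the same classes with an entirely different set of portals, including in-house ones.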

Roadmap

Create a proof of concept (PoC) for a single thematic that uses small public datasets
Review other systems and work with real data
Test alternative portals and data registries in parallel
Migrate to a trial for one or two thematics
Maturation of services and push into production

Conclusion

This page provides a concise overview of what a Data Commons is and how WEHI aims to shape its data commons infrastructure.
