
Additional Information


This page collects additional videos and documents that were found useful for gaining a better understanding of this topic.


Data Commons to Support Pediatric Cancer Research

The first few pages of this paper give a general overview of what a data commons looks like; the later, more technical sections can be overwhelming.

  • Centralizing data helps organize it and has a huge impact on research
  • A data commons lets researchers spend less time managing data and finding appropriate tools, eliminates data silos, and promotes data sharing
  • The community can interact with a data commons in several ways: 1) by submitting data, 2) by requesting and downloading data, 3) by connecting to and analyzing data through the data commons infrastructure

Data Commons Requirements

  1. Storage: resources must be available so data can be stored for cloud-based analysis; however, storing data on-premises/locally can be more cost-effective
  2. Accessibility: the data must be publicly available (which is not the same as freely available)
  3. Analysis: the commons must allow cloud-based analysis of the data
  4. FAIR: the commons must conform to the FAIR digital compliance model (Findable, Accessible, Interoperable, Reusable); a sketch of what FAIR metadata might look like follows this list
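
As a loose illustration of what the FAIR criteria might mean in practice, the sketch below shows the kind of metadata record a data commons could attach to a dataset. All field names and values here are hypothetical; they are not taken from the paper or from any particular metadata standard.

```python
import json

# Illustrative only: these field names are not from the paper or any
# specific metadata standard; each maps a FAIR principle to a concrete field.
dataset_record = {
    # Findable: a persistent identifier plus rich, searchable metadata.
    "id": "doi:10.1234/example-dataset",  # hypothetical DOI
    "title": "Example pediatric cancer cohort",
    "keywords": ["pediatric", "cancer", "genomics"],
    # Accessible: retrievable over a standard protocol, with an explicit
    # access policy (publicly available is not the same as freely available).
    "access_url": "https://commons.example.org/api/datasets/1234",
    "access_policy": "controlled",
    # Interoperable: community file formats and reference builds.
    "file_format": "MAF",
    "reference_genome": "GRCh37",
    # Reusable: a clear license and a provenance trail.
    "license": "CC-BY-4.0",
    "provenance": "hypothetical sequencing pipeline v1.0",
}

print(json.dumps(dataset_record, indent=2))
```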

Reference

Volchenboum, S.L. et al. (2017) ‘Data Commons to support pediatric cancer research’, American Society of Clinical Oncology Educational Book, (37), pp. 746–752. doi:10.1200/edbk_175029.


Opportunities and Barriers to Effective Use of AI in Cancer Research - Dr. Jennifer Couch (NCI, USA)

https://www.youtube.com/watch?v=suC9DQvifRA


Different types of spatial multi-omics

Multi-omics provides an integrated perspective to power discovery across multiple levels of biology.

Different types of data are expected to be stored: imaging, single-cell transcriptomics, proteomics, and bulk (non-single-cell) transcriptomics.


cBioPortal Notes

  1. Data is publicly available from https://github.com/cBioPortal/datahub/, which is updated regularly
  2. Three of the datasets have data_mutations.txt files that use the GRCh37 reference genome (see the loading sketch after this list)
  3. Behind the scenes, git-lfs (https://github.com/github/git-lfs) is used to manage the large files
  4. The seed database README at https://github.com/cbioportal/datahub/blob/master/seedDB/README.md also says that they update it regularly
  5. Does that mean we will need to do the same at WEHI?
  6. The patient data is de-identified behind a patient ID, but it also contains genomic mutations. What is the data governance on this?
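
As a minimal sketch of working with these files (assuming a local git-lfs clone of the datahub repository; the study path below is hypothetical), the snippet loads one study's data_mutations.txt with pandas and checks which reference genome build it reports. The column names used are the standard MAF ones.

```python
import pandas as pd

# Hypothetical path: substitute any study directory from a local
# git-lfs clone of https://github.com/cBioPortal/datahub/.
MUTATIONS_FILE = "datahub/public/some_study/data_mutations.txt"

# MAF-style files are tab-separated; header lines starting with '#'
# are comments and should be skipped.
mutations = pd.read_csv(MUTATIONS_FILE, sep="\t", comment="#", low_memory=False)

# NCBI_Build is the standard MAF column naming the reference genome,
# so this is a quick way to spot the GRCh37-based datasets.
print(mutations["NCBI_Build"].unique())

# A few core MAF columns, just to confirm the file parsed sensibly.
print(mutations[["Hugo_Symbol", "Chromosome", "Start_Position"]].head())
```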

Other projects to look into

https://github.com/WEHI-ResearchComputing/Genomics-Metadata-Multiplexing/wiki

ISB-CGC

ISB-CGC, one of the National Cancer Institute's Cloud Resources, uniquely hosts cancer data including somatic mutations, copy number variations, gene and protein expression, etc. from widely used cancer datasets including TCGA, TARGET and many more in Google BigQuery.

Google BigQuery is a massively parallel analytics engine ideal for tabular data. ISB-CGC has combined data scattered across tens of thousands of files into easily accessible BigQuery tables. This approach lets users quickly analyze data from thousands of patients in ISB-CGC's curated BigQuery tables.

https://isb-cgc.appspot.com/how_to_discover/
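
As a rough sketch of what querying these tables looks like from Python (assuming the google-cloud-bigquery client is installed and authenticated against a GCP project with BigQuery enabled; the table and column names below are illustrative and should be checked against the ISB-CGC table catalogue at the link above):

```python
from google.cloud import bigquery

# Assumes application-default credentials are set up, e.g. via
# `gcloud auth application-default login`, and a billing project.
client = bigquery.Client()

# Illustrative table/column names: verify them against the ISB-CGC
# catalogue (https://isb-cgc.appspot.com/how_to_discover/) before use.
query = """
    SELECT project_short_name, COUNT(DISTINCT case_barcode) AS n_cases
    FROM `isb-cgc.TCGA_bioclin_v0.Clinical`
    GROUP BY project_short_name
    ORDER BY n_cases DESC
"""

# Counts distinct patients per TCGA project directly in BigQuery,
# without downloading any of the underlying files.
for row in client.query(query).result():
    print(row.project_short_name, row.n_cases)
```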
