compbiolabucf/omicsGAN

omicsGAN

omicsGAN is a generative adversarial network (GAN) based framework that integrates two omics datasets, together with their interaction network, to generate a synthetic dataset for each omics profile. The synthetic data can lead to better phenotype prediction.

Framework

[Figure: overview of the omicsGAN framework]

Required Python packages

  • NumPy (>=1.17.2)
  • pandas (>=0.25.1)
  • scikit-learn, imported as sklearn (>=0.21.3)
  • PyTorch (torch >=1.5.0, torchvision >=0.6.0)
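A small sketch for checking that the installed versions meet these minimums, using only the standard library (importlib.metadata, Python 3.8+). The keys are PyPI distribution names (scikit-learn provides sklearn, torch provides PyTorch). The simplified version parsing, which ignores pre-release tags, is an assumption of this sketch and not something omicsGAN ships.

```python
from importlib import metadata

# Minimums from this README, keyed by PyPI distribution name
# (scikit-learn provides `sklearn`, torch provides PyTorch).
MINIMUM_VERSIONS = {
    "numpy": "1.17.2",
    "pandas": "0.25.1",
    "scikit-learn": "0.21.3",
    "torch": "1.5.0",
    "torchvision": "0.6.0",
}


def version_tuple(version):
    """'1.5.0+cu101' -> (1, 5, 0).

    Simplification: keeps the leading digits of each dot-separated component and
    stops at a component with none, so tags such as 'rc1' or 'dev0' are ignored.
    """
    parts = []
    for piece in version.split("+")[0].split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed, required):
    a, b = version_tuple(installed), version_tuple(required)
    width = max(len(a), len(b))
    # Pad with zeros so that "1.5" compares equal to "1.5.0".
    return a + (0,) * (width - len(a)) >= b + (0,) * (width - len(b))


def check_environment():
    """Map each requirement to (installed version or None, meets minimum?)."""
    report = {}
    for name, minimum in MINIMUM_VERSIONS.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = (None, False)
        else:
            report[name] = (installed, meets_minimum(installed, minimum))
    return report


if __name__ == "__main__":
    for name, (installed, ok) in check_environment().items():
        status = "ok" if ok else "needs install/upgrade"
        print(f"{name:<13} {installed or 'not installed':<14} {status}")
```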

Sample datasets

Sample datasets for breast cancer phenotype prediction are available below.
mRNA expression: https://drive.google.com/file/d/1u-tmptVnm9yAjYGiby1FWAIsire3g_QF/view?usp=sharing
miRNA expression: https://drive.google.com/file/d/18c2efgsuYm2GZu9XxqpwrnGqZLvcFXIB/view?usp=sharing
interaction network: https://drive.google.com/file/d/13AssxLZQdta4O-9bQhHaSgSuaslnJceO/view?usp=sharing
label data: https://drive.google.com/file/d/10SWmhoRVb_8sIw2JGeSHorHJiMMVy7n_/view?usp=sharing

Scripts

omicsGAN.py: the only script users need to run. It generates synthetic data through omicsGAN, using the command-line arguments described below.

omics1.py: called from omicsGAN.py; updates the first omics dataset.

omics2.py: called from omicsGAN.py; updates the second omics dataset.

Quick start guide

Download all data needed for a cancer analysis into the same folder as the three scripts. The updated omics datasets are saved to that folder as well.
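Before launching, it can help to confirm that the three scripts and the input CSVs really sit in one folder. A minimal sketch; missing_files is a hypothetical helper, and the data file names are taken from the sample command in this README.

```python
from pathlib import Path

# The three scripts named in this README; the input CSVs must sit next to them.
SCRIPTS = ("omicsGAN.py", "omics1.py", "omics2.py")


def missing_files(folder, data_files):
    """Return the scripts and data files that are not present in `folder`."""
    folder = Path(folder)
    expected = list(SCRIPTS) + list(data_files)
    return [name for name in expected if not (folder / name).is_file()]


if __name__ == "__main__":
    # File names from the sample command (breast cancer sample dataset).
    data = ["mRNA.csv", "miRNA.csv", "bipartite_targetscan_gene.csv", "label.csv"]
    missing = missing_files(".", data)
    if missing:
        print("Missing from this folder:", ", ".join(missing))
    else:
        print("All scripts and input files are in place.")
```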

Input data

All input data is in CSV format.

Omics data: each omics dataset should be in feature-by-sample format, with feature names in the first column and sample names in the first row. Example screenshots of the omics data are shown below.

[Figures: two example omics data files]
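A minimal sketch for loading and sanity-checking an omics file in this layout with pandas (listed in the requirements). load_omics and same_samples are hypothetical helpers, not part of omicsGAN. The assumption that both omics files list the same samples in the same column order belongs to this sketch, not to the README.

```python
from io import StringIO

import pandas as pd


def load_omics(path_or_buffer):
    """Hypothetical helper: read a feature-by-sample omics CSV.

    First column -> feature names (row index); first row -> sample names (columns).
    """
    df = pd.read_csv(path_or_buffer, index_col=0)
    if df.index.has_duplicates:
        raise ValueError("duplicate feature names in the first column")
    non_numeric = [c for c in df.columns if not pd.api.types.is_numeric_dtype(df[c])]
    if non_numeric:
        raise ValueError(f"non-numeric sample columns: {non_numeric}")
    return df


def same_samples(first, second):
    """Assumption: both omics profiles cover the same samples in the same order."""
    return list(first.columns) == list(second.columns)


if __name__ == "__main__":
    # Tiny in-memory stand-ins for mRNA.csv and miRNA.csv (toy values).
    mrna = load_omics(StringIO("gene,S1,S2\ng1,5.1,4.8\ng2,2.3,3.0\n"))
    mirna = load_omics(StringIO("mirna,S1,S2\nm1,7.2,6.9\n"))
    print(mrna.shape, mirna.shape)  # (features, samples) for each profile
    print(same_samples(mrna, mirna))
```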

Interaction network: the network should be a matrix with the features of the first omics dataset as rows and the features of the second omics dataset as columns. The first column holds the feature names of the first omics dataset, and the first row holds the feature names of the second omics dataset.

[Figure: example interaction network file]
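A sketch for checking that the network's row and column labels match the two omics files, again with pandas. The helper names are hypothetical, and treating a feature pair that is absent from the network file as "no interaction" (filled with 0) is an assumption of this sketch; whether omicsGAN needs the network pre-aligned like this is not stated in the README.

```python
from io import StringIO

import pandas as pd


def load_network(path_or_buffer):
    """Hypothetical helper: rows = first-omics features, columns = second-omics features."""
    return pd.read_csv(path_or_buffer, index_col=0)


def unmatched_features(network, omics1, omics2):
    """List omics features that have no row/column in the network."""
    return {
        "omics1": sorted(set(omics1.index) - set(network.index)),
        "omics2": sorted(set(omics2.index) - set(network.columns)),
    }


def align_network(network, omics1, omics2):
    """Reorder the network to follow the omics feature order.

    Network rows/columns for features absent from the omics files are dropped.
    Assumption: a pair missing from the network means "no interaction" (0).
    """
    return network.reindex(index=omics1.index, columns=omics2.index, fill_value=0)


if __name__ == "__main__":
    net = load_network(StringIO("gene,m1,m2\ng1,1,0\ng3,0,1\n"))
    omics1 = pd.read_csv(StringIO("gene,S1\ng1,1.0\ng2,2.0\n"), index_col=0)
    omics2 = pd.read_csv(StringIO("mirna,S1\nm2,0.5\nm1,0.7\n"), index_col=0)
    print(unmatched_features(net, omics1, omics2))  # g2 has no row in the network
    print(align_network(net, omics1, omics2))
```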

Label: the label data should be a single column with one row per sample. The classifier supports binary classification only; for multi-class classification, the SVM classifier needs to be modified accordingly.

[Figure: example label file]
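A sketch for reading the label file and confirming it fits the binary-only setup. load_labels and check_binary are hypothetical helpers. The defaults (no header row, labels listed in the same order as the sample columns of the omics files) are assumptions of this sketch; pass header=0 if your file has a header.

```python
from io import StringIO

import pandas as pd


def load_labels(path_or_buffer, header=None):
    """Hypothetical helper: read a one-column label CSV as a 1-D Series.

    Assumption: no header row by default; pass header=0 if the file has one.
    """
    labels = pd.read_csv(path_or_buffer, header=header)
    if labels.shape[1] != 1:
        raise ValueError(f"expected one column, found {labels.shape[1]}")
    return labels.iloc[:, 0]


def check_binary(labels, n_samples):
    """Return the two class values, or raise if the labels don't fit a binary task."""
    if len(labels) != n_samples:
        raise ValueError(f"{len(labels)} labels for {n_samples} samples")
    classes = sorted(labels.unique().tolist())
    if len(classes) != 2:
        raise ValueError(f"binary labels expected, found classes {classes}")
    return classes


if __name__ == "__main__":
    y = load_labels(StringIO("1\n0\n1\n0\n"))
    print(check_binary(y, n_samples=4))  # [0, 1]
```

check_binary enforces the binary-only restriction described above. If you adapt the classifier for multi-class data, note that scikit-learn's SVC already accepts more than two classes (it uses a one-vs-one scheme internally).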

Command

omicsGAN.py takes five arguments, in this order: the total number of updates (K), the first omics dataset, the second omics dataset, the interaction network, and the label file.
For example, to generate synthetic mRNA and miRNA expression data from the provided sample dataset, run:

Sample command: python omicsGAN.py 5 mRNA.csv miRNA.csv bipartite_targetscan_gene.csv label.csv
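If you want to launch omicsGAN from another Python script (for example, to try several values of K), a minimal sketch is to build the same command line with the standard subprocess module. omicsgan_command and run_omicsgan are hypothetical wrappers; they only reproduce the documented argument order and do not inspect omicsGAN's internals.

```python
import subprocess
import sys


def omicsgan_command(k, omics1, omics2, network, label, script="omicsGAN.py"):
    """Build the command line in the argument order this README documents:
    K, first omics, second omics, interaction network, label."""
    if int(k) < 1:
        raise ValueError("K (total number of updates) must be a positive integer")
    return [sys.executable, script, str(k), omics1, omics2, network, label]


def run_omicsgan(k, omics1, omics2, network, label, folder="."):
    """Run omicsGAN.py inside `folder`, which must hold the scripts and input CSVs;
    the updated omics files are written there as well."""
    cmd = omicsgan_command(k, omics1, omics2, network, label)
    subprocess.run(cmd, check=True, cwd=folder)


if __name__ == "__main__":
    # Equivalent to the sample command above.
    run_omicsgan(5, "mRNA.csv", "miRNA.csv", "bipartite_targetscan_gene.csv", "label.csv")
```

Using sys.executable runs omicsGAN.py under the same Python environment as the caller, so it sees the same NumPy, pandas, and PyTorch installations; check=True turns a failed run into a CalledProcessError instead of a silent failure.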

For questions or further assistance, contact t.ahmed@knights.ucf.edu.
