GitHub repository for "Conditional Variational Autoencoder-based Generative Model for Gene Expression Data Augmentation" | Paper | Code
Gene expression data can be used in various studies, including the prediction of disease prognosis. However, collecting enough data is difficult because of its cost. In this paper, we propose a gene expression data generation model based on a Conditional Variational Autoencoder (CVAE). Our results show that the proposed model generates synthetic data of higher quality than two other state-of-the-art approaches: a Wasserstein Generative Adversarial Network with Gradient Penalty (WGAN-GP) based model for gene expression data generation, and the structured data generation models CTGAN and TVAE.
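The conditional VAE idea can be illustrated with a small NumPy sketch. It is a toy with random linear weights, not the model in this repository (which is written in PyTorch): it only shows the standard CVAE ingredients, namely conditioning encoder and decoder by concatenating a one-hot label, the reparameterization trick, the reconstruction + KL objective, and sampling from the prior to generate class-specific data. Assumptions: the condition is a tissue label with 15 classes, the latent size of 32 is arbitrary, and all function names are hypothetical. No training step is performed.

```python
import numpy as np


def one_hot(labels, n_classes):
    """Encode integer condition labels (assumed: tissue IDs) as one-hot rows."""
    return np.eye(n_classes)[np.asarray(labels)]


def kl_standard_normal(mu, logvar):
    """Per-sample KL(N(mu, exp(logvar)) || N(0, I)) in closed form."""
    return -0.5 * np.sum(1.0 + logvar - mu**2 - np.exp(logvar), axis=1)


def cvae_loss(x, c, enc_w, dec_w, latent_dim, rng):
    """Batch-averaged negative ELBO of a toy linear CVAE (forward pass only).

    x: (batch, genes) expression matrix; c: (batch, n_classes) one-hot conditions.
    enc_w maps [x, c] -> [mu, logvar]; dec_w maps [z, c] -> reconstructed x.
    """
    h = np.concatenate([x, c], axis=1) @ enc_w       # encoder sees the condition
    mu, logvar = h[:, :latent_dim], h[:, latent_dim:]
    eps = rng.standard_normal(mu.shape)
    z = mu + np.exp(0.5 * logvar) * eps              # reparameterization trick
    x_hat = np.concatenate([z, c], axis=1) @ dec_w   # decoder sees the condition
    recon = np.sum((x - x_hat) ** 2, axis=1)         # Gaussian NLL up to scale/constants
    return float(np.mean(recon + kl_standard_normal(mu, logvar)))


def generate(dec_w, condition_ids, n_classes, latent_dim, rng):
    """Decode prior samples z ~ N(0, I) together with the requested conditions."""
    z = rng.standard_normal((len(condition_ids), latent_dim))
    c = one_hot(condition_ids, n_classes)
    return np.concatenate([z, c], axis=1) @ dec_w


# Toy usage with random weights (hypothetical sizes; nothing is trained here).
rng = np.random.default_rng(0)
n_genes, n_classes, latent_dim, batch = 969, 15, 32, 8
x = rng.standard_normal((batch, n_genes))
c = one_hot(rng.integers(0, n_classes, size=batch), n_classes)
enc_w = 0.01 * rng.standard_normal((n_genes + n_classes, 2 * latent_dim))
dec_w = 0.01 * rng.standard_normal((latent_dim + n_classes, n_genes))
loss = cvae_loss(x, c, enc_w, dec_w, latent_dim, rng)
synthetic = generate(dec_w, [0, 3, 14], n_classes, latent_dim, rng)
```

In a trained CVAE, calling a function like `generate` repeatedly with a chosen label is what produces label-specific augmented samples; with random weights, the output here only demonstrates shapes and data flow.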
Test data: 2,745 samples, 969 L1000 landmark genes.
For comparison, the same dataset as in the following work is used: [Ramon Viñas, Helena Andrés-Terré, Pietro Liò, Kevin Bryson, Adversarial generation of gene expression data, Bioinformatics, Volume 38, Issue 3, February 2022, Pages 730–737]
In this study, samples from 15 common tissues of GTEx and TCGA were used: lung, breast, kidney, thyroid, colon, stomach, prostate, salivary gland, liver, esophagus muscularis, esophagus mucosa, esophagus gastroesophageal junction, bladder, uterus, and cervix. We followed the pipeline described by Wang et al. (2018) to integrate the data and correct batch effects. Then, the 969 genes shared with the L1000 landmark gene set were selected, yielding a dataset of 9,146 samples and 969 genes.
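The gene selection step above (keeping only genes that also appear in the L1000 landmark list) can be sketched as a column filter. This is a hedged illustration: the function name and the toy gene labels are hypothetical, the landmark list itself is not included, and the batch-correction pipeline of Wang et al. (2018) is not reproduced.

```python
import numpy as np


def select_landmark_genes(expr, gene_names, landmark_genes):
    """Keep only the columns of `expr` whose gene is in the landmark set.

    expr: (samples, genes) array; gene_names: column labels of expr;
    landmark_genes: iterable of landmark gene symbols (hypothetical input).
    Returns the reduced matrix and the retained gene names in expr's column order.
    """
    landmark = set(landmark_genes)
    keep = [i for i, g in enumerate(gene_names) if g in landmark]
    return expr[:, keep], [gene_names[i] for i in keep]


# Toy example with made-up gene labels.
expr = np.arange(12, dtype=float).reshape(3, 4)
genes = ["G1", "G2", "G3", "G4"]
sub, kept = select_landmark_genes(expr, genes, ["G4", "G1", "G9"])
```

Landmark names missing from the expression matrix (here `G9`) are silently skipped, so the number of retained genes is the size of the intersection rather than the size of the landmark list.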
- GTEx (Genotype-Tissue Expression) dataset
- TCGA (The Cancer Genome Atlas) dataset
- L1000 landmark genes
- RNA-seq (human transcriptomics) dataset (9,147 samples and 18,154 genes)
- python >= 3.7
- torch >= 1.12.1
- Python packages
  - umap-learn >= 0.5.3
  - scikit-learn >= 1.1.1
The expression values of the 969 landmark genes were preprocessed with log2(expression_value + 1) followed by standardization. You can download sample data for training and testing from the Google Drive link below.
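The preprocessing described above can be sketched in NumPy. Assumptions: standardization is a per-gene z-score (the same computation as scikit-learn's `StandardScaler` with default settings), its mean and standard deviation are estimated on the training data only, and a random gamma-distributed matrix stands in for real expression values. The function names are hypothetical.

```python
import numpy as np


def preprocess(train_expr, test_expr):
    """log2(x + 1) transform followed by per-gene z-score standardization.

    Assumption: statistics come from the training split only and are reused
    for the test split, so no test information leaks into training.
    """
    train_log = np.log2(train_expr + 1.0)
    test_log = np.log2(test_expr + 1.0)
    mean = train_log.mean(axis=0)
    std = train_log.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant genes
    return (train_log - mean) / std, (test_log - mean) / std, mean, std


def inverse_preprocess(z, mean, std):
    """Map standardized values (e.g. generated samples) back to the expression scale."""
    return np.power(2.0, z * std + mean) - 1.0


# Toy data standing in for a (samples x 969 genes) expression matrix.
rng = np.random.default_rng(0)
train = rng.gamma(2.0, 50.0, size=(100, 969))
test = rng.gamma(2.0, 50.0, size=(30, 969))
train_z, test_z, mean, std = preprocess(train, test)
```

Keeping `mean` and `std` around makes it possible to map synthetic samples back to the original expression scale with `inverse_preprocess`.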
To train the model, run:
python train.py
For evaluation, please check the evaluation.ipynb file.
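The metrics used in evaluation.ipynb are not reproduced here. As a hedged illustration of how synthetic samples can be checked against real ones, the sketch below computes two coarse statistics: how well per-gene means and standard deviations agree, and how well the gene-gene correlation structure is preserved. The function names and metrics are assumptions, not necessarily the ones used in the paper, and the toy arrays are random stand-ins.

```python
import numpy as np


def moment_correlations(real, synthetic):
    """Pearson correlation between real and synthetic per-gene means and std devs.

    Values near 1 mean gene-level statistics are reproduced; this is only a
    coarse sanity check, not a complete quality metric.
    """
    mean_r = np.corrcoef(real.mean(axis=0), synthetic.mean(axis=0))[0, 1]
    std_r = np.corrcoef(real.std(axis=0), synthetic.std(axis=0))[0, 1]
    return float(mean_r), float(std_r)


def correlation_structure_similarity(real, synthetic):
    """Correlation between the upper triangles of the two gene-gene correlation matrices."""
    iu = np.triu_indices(real.shape[1], k=1)
    corr_real = np.corrcoef(real, rowvar=False)[iu]
    corr_syn = np.corrcoef(synthetic, rowvar=False)[iu]
    return float(np.corrcoef(corr_real, corr_syn)[0, 1])


# Random stand-ins for real and synthetic (samples x genes) matrices.
rng = np.random.default_rng(0)
real = rng.standard_normal((200, 50)) * rng.uniform(0.5, 2.0, 50) + rng.uniform(-1.0, 1.0, 50)
synthetic = real[rng.permutation(200)] + 0.1 * rng.standard_normal((200, 50))
mean_r, std_r = moment_correlations(real, synthetic)
structure_r = correlation_structure_similarity(real, synthetic)
```

Because these checks only look at marginal moments and pairwise correlations, they are best combined with other analyses, such as a low-dimensional embedding of real and synthetic samples.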
If you have any questions or problems, please send an email to sanseng@mju.ac.kr.