Multi-omics integration method using AE and GCN

deepsystemspharmacology/MoGCN

 
 

MoGCN

What is it?

MoGCN is a multi-omics integration method based on a graph convolutional network (GCN).
As shown in the figure, the inputs to the model are multi-omics expression matrices, including but not limited to genomics, transcriptomics, and proteomics data. MoGCN uses the GCN model to incorporate and extend two unsupervised multi-omics integration algorithms: an autoencoder (AE), which operates on the expression matrices, and similarity network fusion (SNF), which operates on patient similarity networks. Feature extraction is not necessary before AE and SNF.

Requirements

MoGCN is a Python script tool. It requires the following environment:

  • Python 3.6 or above
  • PyTorch 1.4.0 or above
  • snfpy 0.2.2
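
The requirements above can be checked before running the scripts. The following is a minimal sketch, not part of MoGCN: the helper names (version_tuple, check_environment) are hypothetical, it assumes the PyPI distribution names "torch" and "snfpy", and it treats the snfpy version as pinned because the list gives no "or above" for it. It relies on importlib.metadata, so the checker itself needs Python 3.8 or newer, which already satisfies the Python 3.6 requirement.

```python
import re
from importlib import metadata  # standard library since Python 3.8

# Taken from the requirements list; treating snfpy as pinned is an assumption.
MINIMUM = {"torch": (1, 4, 0)}
PINNED = {"snfpy": (0, 2, 2)}


def version_tuple(text):
    """Turn a version string such as '1.13.1+cu117' into (1, 13, 1)."""
    public = text.split("+")[0]  # drop local build tags like '+cu117'
    return tuple(int(n) for n in re.findall(r"\d+", public)[:3])


def check_environment():
    """Return a list of human-readable problems (empty if all checks pass)."""
    problems = []
    for dist, wanted in list(MINIMUM.items()) + list(PINNED.items()):
        try:
            installed = version_tuple(metadata.version(dist))
        except metadata.PackageNotFoundError:
            problems.append(f"{dist} is not installed")
            continue
        if dist in MINIMUM and installed < wanted:
            problems.append(f"{dist} {installed} is older than {wanted}")
        if dist in PINNED and installed != wanted:
            problems.append(f"{dist} {installed} differs from {wanted}")
    return problems
```

Calling check_environment() and printing each returned entry gives a quick report of what is missing or mismatched.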

Usage

The whole workflow is divided into three steps:

  • Use AE to reduce the dimensionality of the multi-omics data and obtain a multi-omics feature matrix.
  • Use SNF to construct a patient similarity network.
  • Feed the multi-omics feature matrix and the patient similarity network into the GCN.

The sample data is in the data folder, which contains the CNV, mRNA and RPPA data of BRCA.
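
To illustrate how the last step combines the two inputs, here is a dependency-free sketch of a single graph convolution using the widely used propagation rule ReLU(D^-1/2 (A + I) D^-1/2 H W). It is an assumption that this rule matches what GCN_run.py does internally; the function names and the toy numbers are hypothetical. A stands in for the patient similarity network from SNF, H for the feature matrix from AE, and W for a learned weight matrix.

```python
import math


def normalize_adjacency(adj):
    """Return D^-1/2 (A + I) D^-1/2 for a symmetric, non-negative matrix."""
    n = len(adj)
    # Add self-loops so each patient keeps its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    inv_sqrt_deg = [1.0 / math.sqrt(sum(row)) for row in a_hat]
    return [[inv_sqrt_deg[i] * a_hat[i][j] * inv_sqrt_deg[j]
             for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def gcn_layer(adj_norm, features, weights):
    """One propagation step: ReLU(adj_norm @ features @ weights)."""
    hidden = matmul(matmul(adj_norm, features), weights)
    return [[max(0.0, value) for value in row] for row in hidden]


# Toy inputs: 3 patients, 2 latent features, 2 output units.
similarity = [[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
latent = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights = [[1.0, -1.0], [0.5, 0.5]]
output = gcn_layer(normalize_adjacency(similarity), latent, weights)
```

Each output row mixes a patient's own latent features with those of its neighbours in the similarity network, which is what lets labels propagate to unlabeled samples in a semi-supervised setting. A real implementation would use PyTorch tensors rather than nested lists.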

Command Line Tool

python AE_run.py -p1 data/fpkm_data.csv -p2 data/gistic_data.csv -p3 data/rppa_data.csv -m 0 -s 0 -d cpu
python SNF.py -p data/fpkm_data.csv data/gistic_data.csv data/rppa_data.csv -m sqeuclidean
python GCN_run.py -fd result/latent_data.csv -ad result/SNF_fused_matrix.csv -ld data/sample_classes.csv -ts data/test_sample.csv -m 1 -d gpu -p 20

Run any of the scripts with -h/--help to view the meaning of its parameters.
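
The three commands can also be chained from a small driver script. This is a sketch only: build_commands and run_pipeline are hypothetical helpers, not part of the repository, and the arguments are copied verbatim from the commands listed above. It assumes the script is launched from the repository root, so that the relative data/ and result/ paths resolve.

```python
import subprocess
import sys

# Input files exactly as used in the example commands.
OMICS_FILES = ["data/fpkm_data.csv", "data/gistic_data.csv", "data/rppa_data.csv"]


def build_commands(python=sys.executable):
    """Return the AE, SNF and GCN commands as argument lists, in run order."""
    ae = [python, "AE_run.py",
          "-p1", OMICS_FILES[0], "-p2", OMICS_FILES[1], "-p3", OMICS_FILES[2],
          "-m", "0", "-s", "0", "-d", "cpu"]
    snf = [python, "SNF.py", "-p", *OMICS_FILES, "-m", "sqeuclidean"]
    gcn = [python, "GCN_run.py",
           "-fd", "result/latent_data.csv",
           "-ad", "result/SNF_fused_matrix.csv",
           "-ld", "data/sample_classes.csv",
           "-ts", "data/test_sample.csv",
           "-m", "1", "-d", "gpu", "-p", "20"]
    return [ae, snf, gcn]


def run_pipeline():
    """Run the three steps in order, stopping at the first failure."""
    for command in build_commands():
        # check=True raises CalledProcessError on a non-zero exit code.
        subprocess.run(command, check=True)
```

Calling run_pipeline() runs AE and SNF before GCN, which matters because the GCN step reads result/latent_data.csv and result/SNF_fused_matrix.csv.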

Data Format

  • Each omics input file must be in .csv format, with rows representing samples and columns representing features (genes). In each expression matrix, the first column must contain the sample names and the remaining columns the features. The samples must be consistent across all omics files. AE and SNF are unsupervised models and do not require sample labels.
  • GCN is a semi-supervised classification model, so it requires a sample label file (.csv format) during training. The first column of the label file is the sample name and the second column is the digitized sample label; any remaining columns are not required.
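
The format rules above can be checked before running the pipeline. The following is a minimal sketch using only the standard library; the helper names are hypothetical and it makes three assumptions that the rules do not state explicitly: every file has a header row, "consistent" means the same sample names in the same order, and a digitized label is an integer.

```python
import csv


def read_sample_ids(path):
    """Return the first-column values of a CSV file, skipping the header."""
    with open(path, newline="") as handle:
        reader = csv.reader(handle)
        next(reader, None)  # assumption: the first row is a header
        return [row[0] for row in reader if row]


def check_omics_files(paths):
    """Return the shared sample list, or raise ValueError on a mismatch."""
    reference = read_sample_ids(paths[0])
    for path in paths[1:]:
        if read_sample_ids(path) != reference:
            raise ValueError(f"samples in {path} differ from {paths[0]}")
    return reference


def check_labels(label_path, samples):
    """Return {sample: integer label}; raise ValueError on bad rows."""
    known = set(samples)
    labels = {}
    with open(label_path, newline="") as handle:
        reader = csv.reader(handle)
        next(reader, None)  # assumption: the first row is a header
        for row in reader:
            if not row:
                continue
            if len(row) < 2:
                raise ValueError(f"row for {row[0]} has no label column")
            if row[0] not in known:
                raise ValueError(f"labelled sample {row[0]} is not in the omics data")
            labels[row[0]] = int(row[1])  # ValueError if not an integer
    return labels
```

A typical call would pass the three omics files to check_omics_files and then hand the returned sample list to check_labels together with the label file. The check does not require every sample to be labelled.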

Contact

For any questions please contact Dr. Xiao Li (Email: lixiaoBioinfo@163.com).

License

MIT License

Citation

Li X, Ma J, Leng L, Han M, Li M, He F and Zhu Y (2022) MoGCN: A Multi-Omics Integration Method Based on Graph Convolutional Network for Cancer Subtype Analysis. Front. Genet. 13:806842. doi: 10.3389/fgene.2022.806842.
