
Prediction and Analysis of Multiple Protein Modified Sites Based on Conditional Wasserstein Generative Adversarial Networks


Lab-Xu/MultiLyGAN


MultiLyGAN


MultiLyGAN is a new multi-class machine learning pipeline for identifying seven types of lysine modified sites.

Requirements

  • Python >= 3.7
  • MATLAB R2016a
  • TensorFlow == 1.6.0

File description

  • The "Data" folder contains detailed information on the seven types of lysine modified sites.
  • The "Data preprocessing" folder contains the code for window cutting and for discarding homologous sequences.
  • The "Feature construction" folder contains nine encoding schemes: AAindex, CKSAAP (Composition of K-Spaced Amino Acid Pairs), PWM (Position Weight Matrix), Reduced Alphabet, FoldAmyloid, BE (Binary Encoding), PC-PseAAC, SC-PseAAC, and Structure features. These programs encode protein fragments into feature vectors of different dimensions.
  • The "Dimensionality reduction" folder contains code for selecting effective features and removing redundant ones.
  • The "sample augmentation" folder has two sub-folders. To address the data imbalance issue, two influential deep generative methods, the Conditional Generative Adversarial Network (CGAN) and the Conditional Wasserstein Generative Adversarial Network (CWGAN), are used to generate synthetic samples.
  • The "Classification" folder contains a Random Forest (RF) classifier that assigns samples to the seven classes.
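
As a rough illustration of the window cutting and CKSAAP encoding steps described above, the following Python sketch may help. It is not the repository's code: the function names `cut_windows` and `cksaap`, the padding symbol `X`, the window half-width, the maximum gap, and the normalisation by the number of valid pairs are all assumptions chosen for the example.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAD = "X"  # assumed padding symbol for positions beyond the sequence ends


def cut_windows(sequence, half_width=3, target="K", pad=PAD):
    """Return (1-based position, fragment) pairs centred on every target residue."""
    padded = pad * half_width + sequence + pad * half_width
    windows = []
    for i, residue in enumerate(sequence):
        if residue == target:
            # Index i in `sequence` sits at i + half_width in `padded`,
            # so this slice is centred on the target residue.
            fragment = padded[i : i + 2 * half_width + 1]
            windows.append((i + 1, fragment))
    return windows


def cksaap(fragment, max_gap=1):
    """Encode a fragment as normalised counts of residue pairs
    separated by 0..max_gap intervening residues (400 values per gap)."""
    pairs = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
    features = []
    for gap in range(max_gap + 1):
        counts = dict.fromkeys(pairs, 0)
        total = 0
        for i in range(len(fragment) - gap - 1):
            pair = fragment[i] + fragment[i + gap + 1]
            if pair in counts:  # pairs containing the pad symbol are skipped
                counts[pair] += 1
                total += 1
        # Assumption: normalise by the number of valid pairs for this gap.
        features.extend(counts[p] / total if total else 0.0 for p in pairs)
    return features


if __name__ == "__main__":
    # Hypothetical toy sequence, used only to show the call pattern.
    for position, fragment in cut_windows("MKAAKT", half_width=2):
        vector = cksaap(fragment, max_gap=1)
        print(position, fragment, len(vector))
```

Each fragment yields 400 values per gap setting, so the dimension of the CKSAAP vector depends on the maximum gap chosen. The other encoding schemes would produce vectors of different lengths for the same fragment.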

The figure below shows the pipeline for identifying multiple protein modified sites.

[Figure: MultiLyGAN pipeline]
