MSNet-4mC: Learning effective multi-scale representations for identifying DNA N4-methylcytosine sites

LIU-CT/MSNet-4mC

MSNet-4mC

Author: Liu Chunting
Affiliation: Department of Intelligence Science and Technology, Graduate School of Informatics, Kyoto University
E-mail: liuchunting@kuicr.kyoto-u.ac.jp

Details

  • Run MSNet_4mC.py to identify DNA 4mC sites.
  • The folders "Li_2020_dataset" and "Lin_2017 dataset" contain the datasets and the files used in the experiments.
  • The Lin_2017 and Li_2020 datasets can also be accessed at http://DeepTorrent.erc.monash.edu/
  • The trained model weights used for testing are provided in the folder "Models".

Dependency

  • Python 3.8.8 and PyTorch 1.11.0, or later versions
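
A quick way to check an environment against these minimum versions is sketched below. The helper name `meets_minimum` is illustrative and not part of this repository. It compares only the leading numeric part of a version string, so build suffixes such as `+cu113` are ignored.

```python
import importlib
import platform
import re


def meets_minimum(version: str, minimum: str) -> bool:
    """Return True if `version` is at least `minimum` (numeric parts only)."""
    def release(v: str) -> tuple:
        # Keep the leading dotted-number part, e.g. "1.11.0+cu113" -> (1, 11, 0).
        match = re.match(r"\d+(?:\.\d+)*", v)
        if match is None:
            raise ValueError(f"unrecognised version string: {v!r}")
        return tuple(int(part) for part in match.group(0).split("."))

    have, need = release(version), release(minimum)
    # Pad the shorter tuple with zeros so "1.11" compares equal to "1.11.0".
    width = max(len(have), len(need))
    have += (0,) * (width - len(have))
    need += (0,) * (width - len(need))
    return have >= need


if __name__ == "__main__":
    py = platform.python_version()
    print("Python", py, "ok" if meets_minimum(py, "3.8.8") else "too old")
    try:
        torch = importlib.import_module("torch")
        ok = meets_minimum(torch.__version__, "1.11.0")
        print("PyTorch", torch.__version__, "ok" if ok else "too old")
    except ImportError:
        print("PyTorch is not installed")
```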

Installation Guide

Usage

For evaluation:

  • Run on a default dataset
python MSNet_4mC.py --Dataset Lin_2017_Dataset --Species <Species>
python MSNet_4mC.py --Dataset Li_2020_Dataset --Species <Species> 
  • Make predictions for customized data
python MSNet_4mC.py --Dataset User_Dataset --Species <Species> --Fasta_file <Fasta_file>  
  • To evaluate on the default datasets, users can also run test.py directly after correcting the paths to the datasets and models.
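
Before passing a customized FASTA file to MSNet_4mC.py, it can help to sanity-check it. The input requirements of the script are not stated here, so the sketch below rests on two assumptions: sequences should contain only A, C, G, and T, and all sequences should share one length. Drop the length check if it does not apply. The function names `read_fasta` and `find_problems` are illustrative, not part of the repository.

```python
from typing import Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            # Sequences may be wrapped over several lines; join them in upper case.
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)


def find_problems(records: Iterable[Tuple[str, str]]) -> List[str]:
    """List records with non-ACGT characters or a length unlike the first record."""
    problems = []
    expected_length = None
    for header, seq in records:
        if expected_length is None:
            expected_length = len(seq)
        if set(seq) - set("ACGT"):
            problems.append(f"{header}: contains characters other than A, C, G, T")
        if len(seq) != expected_length:
            problems.append(f"{header}: length {len(seq)} differs from {expected_length}")
    return problems


# Small in-memory example; for a real file, pass an open file handle to read_fasta.
example = [">seq1", "ACGTAC", ">seq2", "ACGNAC", ">seq3", "ACG"]
for problem in find_problems(read_fasta(example)):
    print(problem)
```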

For training:

  • The training code is provided in the dataset directories.
  • Train on the default dataset
    • Run training_basemodel.py to train the base model.
    • Run training_scratch_and_finetuning.py to train the species-specific models, either from scratch or by fine-tuning from the base model. The load_pretrain and load_path settings select between the two modes.
  • Train on customized data
    • First, use class_weight.py to calculate the class weights for each species.
    • Second, use training_basemodel.py to train a base model on the merged training dataset.
    • Finally, use training_scratch_and_finetuning.py to retrain a species-specific model on each species' training dataset.
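
The first step above computes class weights, but how class_weight.py does so is not described here. For reference, one common scheme is inverse-frequency ("balanced") weighting, w_c = N / (K * n_c), where N is the number of samples, K the number of classes, and n_c the number of samples in class c. The sketch below assumes that scheme and a hypothetical label list. It is not taken from the repository, and the function name `balanced_class_weights` is illustrative.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable


def balanced_class_weights(labels: Iterable[Hashable]) -> Dict[Hashable, float]:
    """Inverse-frequency weights: total / (n_classes * count) for each class."""
    counts = Counter(labels)
    if not counts:
        raise ValueError("no labels given")
    total = sum(counts.values())
    n_classes = len(counts)
    # Rare classes get weights above 1, frequent classes get weights below 1.
    return {label: total / (n_classes * count) for label, count in counts.items()}


# Hypothetical species with 3 positive (1) and 9 negative (0) samples.
weights = balanced_class_weights([1] * 3 + [0] * 9)
print(weights)
```

Weights of this kind are typically passed to a weighted loss, for example through the `weight` argument of `torch.nn.CrossEntropyLoss`. Whether the training scripts here use that loss is an assumption.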
