A machine learning-based m5C predictor


PEAm5C: An R package for plant m5C analysis.


We developed PEA-m5C, an accurate transcriptome-wide m5C modification predictor built on a machine learning framework using the random forest algorithm. PEA-m5C was trained with features derived from the flanking sequences of m5C modifications. In addition, we deposited all candidate m5C modification sites in the Ara-m5C database (http://bioinfo.nwafu.edu.cn/software/Ara-m5C.html) for follow-up studies of functional mechanisms. Finally, to make PEA-m5C as widely usable as possible, we implemented it both as a cross-platform, user-friendly, interactive interface and as an R package named “PEA-m5C”, based on the R statistical language and the Java programming language, which may advance functional research on m5C.

Version and download

Depends

R environment

Global software environment

  • Java 1.8 (runtime dependency)

Dependency installation

## Install rJava and RWeka (run in a shell)
sudo apt-get update
sudo apt-get install r-cran-rjava r-cran-rweka
## Install R dependencies (run in R)
dependency.packages <- c("randomForest", "seqinr", "stringr", "FSelector", "bigmemory", "ggplot2", "PRROC", "pROC")
install.packages(dependency.packages)
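The dependency step above reinstalls every package each time it is run. A minimal sketch of installing only what is missing, using base R only; the helper name `missing_packages` is hypothetical and not part of PEAm5C:

```r
# Hypothetical helper (not part of PEAm5C): return the required packages
# that do not appear in the list of installed packages.
missing_packages <- function(required, installed = rownames(installed.packages())) {
  setdiff(required, installed)
}

dependency.packages <- c("randomForest", "seqinr", "stringr", "FSelector",
                         "bigmemory", "ggplot2", "PRROC", "pROC")
to_install <- missing_packages(dependency.packages)

# Only attempt installation in an interactive session and when something is missing.
if (interactive() && length(to_install) > 0) {
  install.packages(to_install)
}
```

Running it a second time should leave `to_install` empty once all dependencies are present.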

Installation

install.packages("Download path/PEAm5C_0.1.1.tar.gz", repos = NULL, type = "source")

Contents

Predicting m5C sites

  • Read FASTA file and motif scanning
  • Feature encoding of sequences
  • m5C prediction using Random Forest models

User-defined model

  • Provide positive and negative sample information
  • Automatic verification of the training process
  • Prediction using user-defined models

Quick start

The basic data set can be found in data.
More details can be found in the user manual.

1. Predicting m5C sites

  • 1.1 Read FASTA file and motif scanning
# Read the example cDNA FASTA file and scan it for candidate motifs
seq <- extra_motif_seq(input_seq_dir = paste0(system.file(package = "PEAm5c"), "/data/cdna.fa"), up = 5)
# Collapse each character vector into a single string (c2s comes from seqinr)
seq <- lapply(seq, c2s)
  • 1.2 Feature encoding of sequences
seq_feature <- FeatureExtract(seq)
  • 1.3 m5C prediction using Random Forest models
res <- predict_m5c(seq_feature)
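To make the idea of scanning for candidate sites with flanking sequence more concrete, here is a minimal base R sketch. It is not the implementation of extra_motif_seq: the helper `flank_windows` is hypothetical, and it assumes that a candidate site is any cytosine with a fixed number of nucleotides available on each side (the exact meaning of `up = 5` in the package is not confirmed here).

```r
# Hypothetical helper (not part of PEAm5C): extract the window of `flank`
# nucleotides on each side of every cytosine that has full flanks available.
flank_windows <- function(sequence, flank = 5) {
  bases <- strsplit(toupper(sequence), "")[[1]]
  n <- length(bases)
  centers <- which(bases == "C")
  # Drop cytosines too close to either end of the sequence.
  centers <- centers[centers > flank & centers <= n - flank]
  windows <- vapply(
    centers,
    function(i) paste(bases[(i - flank):(i + flank)], collapse = ""),
    character(1)
  )
  data.frame(position = centers, window = windows, stringsAsFactors = FALSE)
}

# Toy sequence for illustration only.
candidates <- flank_windows("AAGTACGGATTCAGGTAAC", flank = 5)
print(candidates)
```

Each row pairs a cytosine position with its 11-nucleotide window; windows of this kind are what a feature encoder would then turn into numeric vectors.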

2. User-defined model

  • 2.1 Provide positive and negative sample information
# Load the example positive and negative samples
load(paste0(system.file(package = "PEAm5c"), "/data/samples.Rds"))
### Positive and negative sequences can be read with extra_motif_seq and encoded with FeatureExtract
  • 2.2 Automatic verification of the training process
# Train and evaluate models on the supplied samples
seq <- PEA_ml(pos_sample = pos_sample, neg_sample = neg_sample)
# Extract the trained model from the result
model <- extra_model(res = seq)
model
  • 2.3 Prediction using user-defined models
res <- predict_self_model(models = model, sequence_dir = paste0(system.file(package = "PEAm5c"), "/data/cdna.fa"))
# Tabulate the values in the fourth column of the result
table(res[,4])
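One common way to verify a trained classifier is k-fold cross-validation. The sketch below shows that technique with the randomForest package on toy data. Whether PEA_ml uses exactly this scheme is an assumption; the toy matrices merely stand in for encoded positive and negative samples, and all variable names are illustrative.

```r
library(randomForest)

set.seed(1)

# Toy feature matrices standing in for encoded positive and negative samples.
n <- 60
p <- 8
pos_toy <- matrix(rnorm(n * p, mean = 1), nrow = n)
neg_toy <- matrix(rnorm(n * p, mean = 0), nrow = n)
x <- rbind(pos_toy, neg_toy)
colnames(x) <- paste0("f", seq_len(p))
y <- factor(rep(c("pos", "neg"), each = n), levels = c("neg", "pos"))

# Randomly assign every sample to one of k folds.
k <- 5
fold <- sample(rep(seq_len(k), length.out = nrow(x)))

# Train on k - 1 folds, evaluate on the held-out fold, and record accuracy.
accuracy <- numeric(k)
for (i in seq_len(k)) {
  test_idx <- fold == i
  fit <- randomForest(x = x[!test_idx, ], y = y[!test_idx], ntree = 200)
  pred <- predict(fit, newdata = x[test_idx, ])
  accuracy[i] <- mean(pred == y[test_idx])
}

mean(accuracy)
```

The mean held-out accuracy gives a rough estimate of how the model would perform on unseen sequences; metrics such as ROC or precision-recall curves (for example via pROC or PRROC, both listed as dependencies) could be substituted for accuracy.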

Citation

Song, J., Zhai, J., Bian, E., Song, Y., Yu, J., & Ma, C. (2018). Transcriptome-Wide Annotation of m5C RNA Modifications Using Machine Learning. Frontiers in Plant Science, 9, 519.

Ask questions

Please use PEAm5C/issues to ask how to use PEAm5C and to report bugs.
