Detecting SHAPE modification using direct RNA sequencing
The single-gene workflow is suitable for running one gene at a time.
The transcriptome workflow is recommended for running multiple genes.
Both will yield the same results.
Albacore (Oxford Nanopore)
Nanopolish (https://github.com/jts/nanopolish). A modified copy that removes the outliers from the fast5 data is included (nanopolish-edited.zip).
Graphmap (https://github.com/isovic/graphmap)
R (https://www.r-project.org/)
dplyr
e1071
data.table
optparse
Rcpp
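Before starting, it can help to confirm that the command-line tools are on the PATH. The following is a minimal sketch, not part of this repository: `check_tools` is a hypothetical helper, and the command names are taken from the steps in this README (`samtools` is used in those steps although it is not listed above; `Rscript` is an assumption about how the R scripts are launched).

```shell
# Report every required command that is not on PATH.
# Returns non-zero if at least one is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example call (command names assumed from the steps in this README):
# check_tools read_fast5_basecaller.py graphmap samtools nanopolish Rscript
```

The R packages listed above are not covered by this check; they would have to be tested from within R.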
read_fast5_basecaller.py -i "location of fast5" -s "output_location" -r -k SQK-RNA001 -f FLO-MIN106 -o fast5,fastq --disable_filtering
cat fastq* | sed 's/U/T/g' > converted.fastq
graphmap align -r "reference.fa" -d converted.fastq -o gene.sam --double-index
samtools view -bT "reference.fa" -F 16 gene.sam > gene.bam
samtools sort gene.bam > gene.s.bam
samtools index gene.s.bam
nanopolish index -d "location of basecalled fast5" converted.fastq
nanopolish eventalign --reads converted.fastq --bam gene.s.bam --genome "reference.fa" --print-read-names --scale-events > gene.event
./Read_events.R -f gene.event -o combined.RData
./SVM.R -m "modified_gene.RData" -u "unmodified_gene.RData" -o "output file names.csv" -l "length of transcript (a number)"
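The `sed 's/U/T/g'` conversion in the steps above acts on every line of the FASTQ file, so it would also rewrite any `U` that happens to appear in a header or quality line (`U` is a valid quality character, Phred 52 in the Phred+33 encoding). If that is a concern, a possible alternative is to restrict the substitution to the sequence line of each record. This is a sketch under the assumption that every record spans exactly four lines; `rna_to_dna_fastq` is a hypothetical name, not a script in this repository.

```shell
# Convert U to T only on the sequence line (line 2 of each 4-line FASTQ record).
# Assumes strictly 4-line records; multi-line FASTQ is not handled.
rna_to_dna_fastq() {
    awk 'NR % 4 == 2 { gsub(/U/, "T") } { print }' "$@"
}

# Drop-in replacement for the sed step (file names follow the steps above):
# cat fastq* | rna_to_dna_fastq > converted.fastq
```

Header, separator and quality lines pass through unchanged, so read names still match those indexed by nanopolish.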
./split_events.sh "folder to store tmp files" gene.event
./combine.sh
./loop_for_Read_files.sh "number of parts" "input folder" "output folder"
./loop_SVM.sh -s "number of parts" "RData folder containing modified samples" "RData folder containing unmodified samples" "Output folder"
The files in Error_rates are used to calculate the mismatch, deletion and insertion error rates per position.
Li Chenhao for his help in getting me started and the calculation of error per strand (https://github.com/lch14forever)
Shen Yang for his code for aligning transcript positions to genomic position and for the TRipseq analysis (https://github.com/shenyang1981)
Zhang Yu for the calculation of error rates.
For combining standard deviations given the mean, standard deviation and number of samples of each group: Headrick, T. C. (2010). Statistical Simulation: Power Method Polynomials and other Transformations. Boca Raton, FL: Chapman & Hall/CRC.
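As an illustration of combining standard deviations from group summaries, the sketch below uses the standard decomposition into within-group and between-group sums of squares. It assumes each input standard deviation is a sample standard deviation (n - 1 denominator); whether the scripts in this repository use exactly this form is not stated here, and `combine_sd` is a hypothetical helper.

```shell
# Combine per-group summaries into one overall mean and standard deviation.
# Input: one group per line, whitespace-separated: n mean sd
# Output: total_n combined_mean combined_sd
# Assumes each sd is a sample standard deviation (n - 1 denominator).
combine_sd() {
    awk '
        NF >= 3 {
            n[NR] = $1; m[NR] = $2; s[NR] = $3
            total += $1
            weighted += $1 * $2
        }
        END {
            if (total < 2) exit 1
            grand = weighted / total
            for (i in n) {
                ss += (n[i] - 1) * s[i]^2       # within-group sum of squares
                ss += n[i] * (m[i] - grand)^2   # between-group sum of squares
            }
            printf "%d %.6f %.6f\n", total, grand, sqrt(ss / (total - 1))
        }
    ' "$@"
}
```

For example, `printf '3 2 1\n3 5 1\n' | combine_sd` prints `6 3.500000 1.870829`, which matches the mean and sample standard deviation of the pooled values 1 to 6.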