M2seq

##Analysis of 2D mutate-and-map signal in next-generation-sequencing data

Installation

Ensure that you have the following installed:

Novobarcode, part of the Novoalign software package, which is freely available for educational and not-for-profit use. Download the latest version of Novoalign at http://www.novocraft.com/support/download/
ShapeMapper 1.2, software developed by the Weeks lab at UNC Chapel Hill for 1D analysis of mutational profiling data. Available at http://www.chem.unc.edu/rna/software.html (Make sure you go into that directory and run make.)
BowTie2 is needed for ShapeMapper. Available here: https://sourceforge.net/projects/bowtie-bio/files/bowtie2/2.2.9/. Version 2.2.9 works.
the nwalign python package, which you can install via sudo pip install nwalign
numpy
The RDATkit for handling RDAT data files. Available at https://github.com/ribokit/RDATKit

Add the M2seq, novobarcode, etc. folders to your PATH. For example in your .bashrc add lines like PATH=$PATH:$HOME/src/M2seq.

Running the code

To test on example sequencing data, download the two example FASTQs from this link and move them to the Tutorial folder. Then, run:

m2seq.py P4P6.fa RTBbarcodes.fa Sample1_S1_L001_R1_001.fastq Sample1_S1_L001_R2_001.fastq --config example.cfg --offset 89

P4P6.fa [required] is a fasta-formatted file with the name and sequence of the RNA. Note that the name of the fasta file and the sequence in the fasta file must match the name of the reference sequence given in the config file.
RTBbarcodes.fa [required] is a Novobarcode-formatted file with the names and sequences of the barcodes in the RTB primers.
The read 1 and read 2 FASTQs are required inputs.
example.cfg [optional] is a config file in the format required by ShapeMapper. If a config file is provided, the 2D dataset will be generated by ShapeMapper analysis followed by conversion of binary mutation-counted files to 2D data that are output as RDAT files. The correct names of the conditions in the pilot experiment used for this tutorial are already input in the [alignments] and [profiles] sections of the config file. These sections must be edited for different experiments.
Make sure no spaces are in any of the folder names parent to this directory -- you may see fails of bowtie2 otherwise.
Useful tip: Especially if this is your first run, tou may want to do a 'pilot run' with test fastq files that hold just a few test lines from your full FASTQ files. Prepare these as head -n 10000 Sample1_S1_L001_R1_001.fastq > test/Sample1_S1_L001_R1_001.fastq and head -n 10000 Sample1_S1_L001_R2_001.fastq > test/Sample1_S1_L001_R2_001.fastq and supply test/RTBbarcodes.fa Sample1_S1_L001_R1_001.fastq test/Sample1_S1_L001_R2_001.fastq in the m2seq.py command line.
If you need to rerun, you'll need to cleanly remove the previous output directories: rm -rf 1_Demultiplex/ 2_ShapeMapper/ 3_M2seq/.

The m2seq.py command performs the following:

First, it uses Novobarcode to demultiplex the pairs of sequencing reads using the barcodes provided in RTBbarcodes.fa
Then, it performs ShapeMapper analysis with the settings in the config file to compare the reference sequence to the paired sequencing reads for each barcode
ShapeMapper outputs a mutation strings text file for each barcode recording the mutations per read compared to the reference sequence, which is then converted to binary format
The binary-formatted mutation strings are analyzed with simple_to_rdat.py to generate 2D datasets

The output of the pipeline includes, for each barcode:

2D reactivity RDAT
2D raw counts RDAT
counted mutations over all sequencing reads
log files The ExampleResults archive contains these expected outputs from running the analysis on the example sequencing data at the link above.

Visualizing the mutate-and-map output

The key modes of visualizing the M2seq data are as a 2D plot (mutate-and-map style) and as mutation spectra (rates of mutation/deletion both across the sequence and on average).

To view the 2D plot, open MATLAB, make sure the matlab folder is in your MATLAB path, and run:

m2seqplot('3_M2seq/simple_files/RTB005_P4P6_P4P6.reactivity.rdat');

Compare the resulting plot to the 2D plot in the ExampleResults folder (you may need to unzip ExampleResults.zip): Example2D.eps

To view the mutation spectra, open MATLAB and run:

mut_heatmap('2_ShapeMapper/output/counted_mutations/RTB005_P4P6.csv',103:260,'',13);

Compare the resulting plot to the mutation spectra in the Figures folder: ExampleSpectra.eps, ExampleSpectra_mut_avg.eps, ExampleSpectra_mut_max.eps

Structural analysis

Two modes of structural analysis are available:

M2-net is a neural-net-inspired analysis that detects helix signatures in the mutate-and-map data without making assumptions about RNA folding energy models. It largely reproduces visual analysis.
The RNAstructure secondary structure modeling package (version 6.0, expected to be released in mid-2017) can take in M2seq data to guide modeling of all the helices in a model and estimate confidence in the structures.

Both styles of analysis can be run through MATLAB scripts available in the Biers repository – check out installation instructions and some examples on the P4-P6 and GIR1 RNAs on RiboKit.

Citation

If you use this software, please cite:

Cheng, C. Kladwang, W. Yesselman, J. Das, R. (2017) "RNA structure inference through chemical mapping after accidental or intentional mutations" BioRxiv https://doi.org/10.1101/169953.

Name		Name	Last commit message	Last commit date
Latest commit History 105 Commits
Tutorial		Tutorial
matlab		matlab
test/20160602_MaP_WK		test/20160602_MaP_WK
.gitignore		.gitignore
ExampleResults.zip		ExampleResults.zip
LICENSE		LICENSE
README.md		README.md
m2seq.py		m2seq.py
simple_to_rdat.py		simple_to_rdat.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Tutorial

Tutorial

matlab

matlab

test/20160602_MaP_WK

test/20160602_MaP_WK

.gitignore

.gitignore

ExampleResults.zip

ExampleResults.zip

LICENSE

LICENSE

README.md

README.md

m2seq.py

m2seq.py

simple_to_rdat.py

simple_to_rdat.py

Repository files navigation

M2seq

Installation

Running the code

Visualizing the mutate-and-map output

Structural analysis

Citation

About

Releases 1

Packages

Contributors 3

Languages

License

ribokit/M2seq

Folders and files

Latest commit

History

Repository files navigation

M2seq

Installation

Running the code

Visualizing the mutate-and-map output

Structural analysis

Citation

About

Resources

License

Stars

Watchers

Forks

Languages