LtrDetector

https://www.biorxiv.org/content/biorxiv/early/2018/10/22/448969.full.pdf

Compiling source code

You must have the GNU compiler. Follow the steps to obtain a compatible version.

MacOS

Use HomeBrew: brew install gcc

Run this command to see list of gcc files installed by HomeBrew: brew list gcc

Example: joseph:LtrDetector jdv$ brew list gcc

/usr/local/Cellar/gcc/8.2.0/bin/c++-8

/usr/local/Cellar/gcc/8.2.0/bin/cpp-8

/usr/local/Cellar/gcc/8.2.0/bin/g++-8

On Makefile, change the CXX value to g++-Version as indicated by the listing. In the example above, CXX = g++-8

Linux (Ubuntu)

The GNU compiler comes standard. If you do not have it, run:

sudo apt-get install gcc

Compile

To locate the source directory: cd src

The following command makes the required directories: make bin

The following command makes the binary that is located under the bin directory: make tr -j

To find the binary: cd ../bin

To find help information: LtrDetector -help

Example run with the defaults:

LtrDetector -chromDir ~/Data/zmays/Fa -destDir ~/Data/zmays/detector

Example run on 4 cores:

LtrDetector -chromDir ~/Data/thaliana/Fa -destDir ~/Data/thaliana/detector -nThreads 4

Output Explanation

LtrDetector output is formatted as a BED file with 18 columns, which correspond to:

Sequence identifier.
Retrotransposon start coordinate within sequence.
Retrotransposon end coordinate within sequence.
Left LTR start coordinate.
Left LTR end coordinate.
Right LTR start coordinate.
Right LTR end coordinate.
% Identity between left and right LTRs (0-100).
Left Target Site Duplication start coordinate.
Left Target Site Duplication end coordinate.
Right Target Site Duplication start coordinate.
Right Target Site Duplication end coordinate.
Polypurine Tract start coordinate.
Polupurine Tract end coordinate.
Strand on chromosome (+ or -).
Percentage of purines in Polypurine Tract (0-100).
TG motif start coordinate.
CA motif end coordinate.

Dashes on any column pertaining to the Target Site Duplication or Polypurine Tract indicate that no such element was found.

This output format provides up to three alternatives for the boundary of the element. Columns 2 and 3 are set to be immediately inside of the TSDs if they exist, otherwise they are the boundaries denoted by the alignment adjustment step. Columns 4 and 7 report the alignment adjusted boundaries directly. Columns 17 and 18 report the boundaries based on the TG..CA box, if one exists.

To run visualization tool:

Step 1: Run LtrDetector with these three flags -rawScores -cleanedScores -bedFormat

LtrDetector -chromDir ~/Data/thaliana/Fa -destDir ~/Data/thaliana/detector -rawScores -cleanedScores -bedFormat

Step 2: Create a virtual environment using venv module of Python3 python3 -m venv myEnvironment myEnvironment is the name of a folder that will house your virtual environment. You can name it whatever you want.

Step 3: Activate virtual environment source myEnvironment/bin/activate You must now run the following commands from the same terminal. If you want to use multiple terminals, the virtual environment must be activated in all of them.

Step 4: Install python dependencies using provided requirements.txt file. pip install -r requirements.txt

Step 5: To run the visualization tool on chr1, pass LTR-RTs found in chr1 (.bed), output directory where the graphs will be stored, and the scores file (raw or cleaned).

python visualize.py ~/Data/thaliana/detector/chr1Detector.bed ~/Data/thaliana/detector/test ~/Data/thaliana/detector/chr1RawScores.csv

-arg	Description	Default
-chromDir	Directory containing files to be scanned. Files must have .fa extension.	required
-destDir	Output directory where the results are stored.	required
( IMPORTANT: Files under the output directory are deleted at the start of the program.)
-minLen	Minimum length of complete LTR-RTs. Constrains scoring system and filters.	400
-maxLen	Maximum length of complete LTR-RTs. Constrains scoring system and filters.	22000
-minLenLTR	Minimum length of LTR direct repeat. Constrains filters.	100
-maxLenLTR	Maximum length of LTR direct repeat. Constrains filters.	6000
( Note run time is highly dependent on this parameter, as it provides an upper bound for alignment length in the boundary adjustment step.)
-id	Minimum identity [0-100] between 5' and 3' LTRs.	85
-k	Length of k-mers to adjust scoring system. Tradeoff between noise and resistance to mutation.	13
-plateauSeed	Minimum length of plateaus to be initially considered 'Keep' in merging step.	10
-nThreads	Number of cores to be used.	1
-gapTol	Number of base pairs that two plateaus can differ by in height/distance. Affects both plateau merging and pairing steps.	200
-seqLevel	Forces parallel execution on sequences within multi-FASTA file. Loads all sequences into memory	disabled
-rawScores	prints the raw scores to a file called xxxxRawScores.txt under the output directory.	disabled
-cleanedScores	prints the scores after merging to a file called xxxxCleanedScores.txt under the output directory.	disabled
-nested	searches for nested elements. Results are stored in seperate files (marked as xxxxNestedDetector.bed) under the output directory	disabled
-bedFormat	prints BED format without additional annotations (PPT and TSD).	disabled
-help	prints this help message.	disabled

License

Academic/non-profit use: The software is provided as-is under the GNU GPLv3.

Any restrictions to use for-profit: License needed.

Name		Name	Last commit message	Last commit date
Latest commit History 34 Commits
groundTruth		groundTruth
src		src
HowToCompile.txt		HowToCompile.txt
README.md		README.md
requirements.txt		requirements.txt
visualize.py		visualize.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

LtrDetector

Compiling source code

MacOS

Linux (Ubuntu)

Compile

Example run with the defaults:

Example run on 4 cores:

Output Explanation

To run visualization tool:

License

About

Releases 1

Packages

Languages

BioinformaticsToolsmith/LtrDetector

Folders and files

Latest commit

History

Repository files navigation

LtrDetector

Compiling source code

MacOS

Linux (Ubuntu)

Compile

Example run with the defaults:

Example run on 4 cores:

Output Explanation

To run visualization tool:

License

About

Resources

Stars

Watchers

Forks

Releases 1

Packages 0

Languages

Packages