LTR Predictor

LTR Predictor is a pure-Python 3, BLAST-like alignment tool for endogenous retroviruses (ERVs). It helps you identify the chromosomal coordinates at which an input consensus sequence occurs. It runs from the Linux or Windows command line.

The tool takes ONE FASTA file containing the consensus sequence as the query, and ONE FASTA file containing exactly ONE chromosome sequence as the reference. (Please note that aligning against human chromosome 1 may take ~2 hours.)
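Because the reference must hold exactly one chromosome sequence, it can be worth checking the file before starting a long run. The following is a minimal sketch of such a check in plain Python. The function names `count_fasta_records` and `check_single_sequence` are hypothetical and are not part of LTR Predictor.

```python
import io


def count_fasta_records(handle):
    """Count sequence records by counting header lines that start with '>'."""
    return sum(1 for line in handle if line.startswith(">"))


def check_single_sequence(handle):
    """Raise ValueError unless the FASTA stream holds exactly one record."""
    n = count_fasta_records(handle)
    if n != 1:
        raise ValueError(f"expected exactly 1 sequence, found {n}")


# In-memory stand-in for a reference file; a real file would be open()'d.
single = io.StringIO(">chr21\nACGTACGT\nACGT\n")
check_single_sequence(single)  # passes silently for a one-record file
```

With a real file, the same check would be `with open(path) as fh: check_single_sequence(fh)`.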

The output is a BED file that gives, for each feature, the NAME of the chromosome and the START and END POSITIONS of the feature in chromosomal coordinates.
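To illustrate the three-column layout described above, here is a minimal sketch that formats hits as BED lines. The helper names and the example coordinates are hypothetical, and this is not the project's own Write_BED.py. Note that the BED format itself uses a 0-based start and an exclusive end.

```python
def format_bed_line(chrom, start, end):
    """Return one tab-separated BED line: chromosome name, start, end."""
    if start < 0 or end < start:
        raise ValueError("BED requires 0 <= start <= end")
    return f"{chrom}\t{start}\t{end}"


def hits_to_bed(chrom, hits):
    """Format (start, end) pairs as BED text, sorted by start position."""
    lines = (format_bed_line(chrom, s, e) for s, e in sorted(hits))
    return "\n".join(lines) + "\n"


# Made-up coordinates, used only to show the layout.
print(hits_to_bed("chr21", [(2000, 2968), (100, 1068)]), end="")
```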

Documentation

Operating systems: Linux, Windows
Requirements: Python (3.6+)
Supported Format: FASTA (.fa)

Quickstart Guide

Development

Load main.py in your favorite Python development platform.

Execute

python main.py

Basic usage

Optional Arguments:

-h, --help
show this help message and exit

-r REF, --ref REF
input the path of the reference FASTA file, default is ./tests/chr21.fa

-q QUERY, --query QUERY
input the path of the query FASTA file, default is ./tests/LTR5_Hs.fa

-p PATH, --path PATH
define the path of an output BED file, default is ./build

-o OUTPUT, --output OUTPUT
define the name of the output BED file, default is build

-m MISMATCH, --mismatch MISMATCH
input the number of mismatches allowed when merging nearby seeds, default is 5

-g GAP, --gap GAP
input the value of gap penalty, default is -5

-t THRESHOLD, --threshold THRESHOLD
input the threshold of the Smith–Waterman score, default is 500

-e ESCORE, --Escore ESCORE
input the threshold E-score, default is 0.1
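The options listed above could be declared with argparse roughly as follows. This is a sketch rather than the actual parser in main.py: the flag names and defaults come from the list above, while the argument types (int or float) and the function name `build_parser` are assumptions.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented options (types are assumed)."""
    parser = argparse.ArgumentParser(description="LTR Predictor (sketch)")
    parser.add_argument("-r", "--ref", default="./tests/chr21.fa",
                        help="path of the reference FASTA file")
    parser.add_argument("-q", "--query", default="./tests/LTR5_Hs.fa",
                        help="path of the query FASTA file")
    parser.add_argument("-p", "--path", default="./build",
                        help="path of the output BED file")
    parser.add_argument("-o", "--output", default="build",
                        help="name of the output BED file")
    parser.add_argument("-m", "--mismatch", type=int, default=5,
                        help="mismatches allowed when merging nearby seeds")
    parser.add_argument("-g", "--gap", type=int, default=-5,
                        help="gap penalty")
    parser.add_argument("-t", "--threshold", type=int, default=500,
                        help="Smith-Waterman score threshold")
    parser.add_argument("-e", "--Escore", type=float, default=0.1,
                        help="E-score threshold")
    return parser


# Override two options and keep the remaining defaults.
args = build_parser().parse_args(["-m", "3", "--gap=-3"])
print(args.ref, args.mismatch, args.gap, args.Escore)
```

Writing negative values in the `--gap=-3` form keeps the value attached to its option, so it cannot be mistaken for another flag.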

Visualization

The output BED file can be imported into a genomics data viewer, such as IGV, for visualization, as shown above.

Acknowledgments

LTR Predictor uses the following libraries:

numpy for storing and processing large matrices
pandas for storing and processing dataframes
argparse (part of the Python standard library) for creating the command-line interface
tqdm for displaying progress bars
pyfaidx for reading FASTA files
