Phylogenomic Protein Function Prediction

This project implements a full phylogenomic pipeline as a tool for transmembrane helix (TMH) prediction. A target sequence and its neighboring homologs are annotated by TMHMM, and a maximum likelihood tree built with RAxML is used to determine a contribution score for each sequence. See the Presentation folder for a brief overview of core functionality.

Implementing a full phylogenomic pipeline

Given an input sequence, this software gathers the 100 closest homologs, generates a multiple sequence alignment, masks that alignment, and then uses RAxML's maximum likelihood estimator to generate a phylogenetic tree.
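The masking step can be illustrated with a minimal sketch. This README does not state which masking criterion the pipeline uses, so the example below assumes a common one: drop alignment columns whose gap fraction exceeds a threshold. The function name `mask_alignment` and the `max_gap_fraction` parameter are hypothetical and are not taken from this repository.

```python
def mask_alignment(alignment, max_gap_fraction=0.5):
    """Drop columns whose gap fraction exceeds max_gap_fraction.

    alignment: dict mapping sequence name -> aligned sequence ('-' marks a gap).
    The gap-fraction criterion is an assumption, not the repository's method.
    """
    names = list(alignment)
    if not names:
        return {}
    rows = [alignment[name] for name in names]
    width = len(rows[0])
    if any(len(row) != width for row in rows):
        raise ValueError("all aligned sequences must have the same length")

    # Keep a column only if the share of gaps in it is small enough.
    kept_columns = [
        col
        for col in range(width)
        if sum(row[col] == "-" for row in rows) / len(rows) <= max_gap_fraction
    ]
    return {
        name: "".join(alignment[name][col] for col in kept_columns)
        for name in names
    }


example = {"target": "MK-LV", "hit1": "MR-LV", "hit2": "M-ALV"}
masked = mask_alignment(example)
```

In this example the third column is a gap in two of three sequences, so it is removed, while the second column (one gap in three) is kept. The masked alignment would then be passed to the tree-building step.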

Annotation Transfer Protocol

This software uses an annotation transfer protocol based on evolutionary distances between proteins to transfer TMH annotations. That is, the more closely related a hit is to the target protein, the more weight its annotation at a particular site carries in determining the target's true annotation.
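One way to picture distance-weighted transfer is a per-site weighted vote. The sketch below is an illustration only: the inverse-distance weighting, the function name `transfer_annotations`, and the single-character label strings are assumptions, and the contribution scores the pipeline actually derives from the RAxML tree may be computed differently.

```python
from collections import defaultdict


def transfer_annotations(hit_labels, distances, epsilon=1e-6):
    """Predict per-site labels for the target by a distance-weighted vote.

    hit_labels: dict mapping hit name -> label string aligned to the target,
                one character per site ('-' means the hit has no label there).
    distances:  dict mapping hit name -> evolutionary distance to the target.
    Weight = 1 / (distance + epsilon) is an assumed scheme, not the
    repository's documented formula.
    """
    if not hit_labels:
        raise ValueError("at least one annotated hit is required")
    lengths = {len(labels) for labels in hit_labels.values()}
    if len(lengths) != 1:
        raise ValueError("all label strings must have the same length")
    n_sites = lengths.pop()

    # Closer hits (smaller distance) receive larger weights.
    weights = {name: 1.0 / (distances[name] + epsilon) for name in hit_labels}

    consensus = []
    for site in range(n_sites):
        votes = defaultdict(float)
        for name, labels in hit_labels.items():
            if labels[site] != "-":
                votes[labels[site]] += weights[name]
        # The label with the largest total weight wins; '-' if nobody voted.
        consensus.append(max(votes, key=votes.get) if votes else "-")
    return "".join(consensus)


# 'M' = membrane helix, 'i' = inside; the labels are illustrative only.
hits = {"close": "MMii", "mid": "iiii", "far": "iiMM"}
dists = {"close": 0.1, "mid": 1.0, "far": 2.0}
prediction = transfer_annotations(hits, dists)
```

Here the closest hit dominates every site, so the target inherits its labels even where the two more distant hits disagree with it.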

Sharabesh/Phylogenomics