On the automatic annotation of gene functions using observational data and phylogenetic trees

This repository contains all the materials necessary to reproduce the figures and tables of the paper, including a set of novel predictions.

There are two main sections: the simulations folder and the parameter-estimates folder. The first contains the simulation study, which generates random annotated phylogenetic trees using the PANTHER database. The second fits the pooled-data model using 138 different sets of annotations and makes the predictions, all using a combination of the GO dataset and the PANTHER phylogenetic trees.

The aphylo package

All of the methods presented in this paper are available in the R package aphylo. To install it, run the following command:

devtools::install_github("USCbiostats/aphylo")

Overview of the repository

In general, each dataset, figure, or table is generated by its own R script. Each resulting file has the same name as the R script that created it, differing only in the extension. For example, the R script candidate_trees.r generates the file candidate_trees.rds, which holds the 138 phylogenetic trees (including annotations) used throughout the paper.
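The naming convention described above can be sketched in a few lines of R. This is only an illustration: `output_for()` is a hypothetical helper that does not exist in the repository, and the `data/` location of `candidate_trees.r` is an assumption made for the example.

```r
# Hypothetical helper (not part of this repository): given the path of an
# R script, return the path of the file it is expected to produce by
# swapping the script's extension for the output extension.
output_for <- function(script, ext = "rds") {
  paste0(tools::file_path_sans_ext(script), ".", ext)
}

# The example from the text: candidate_trees.r -> candidate_trees.rds
output_for("candidate_trees.r")

# Assumption: the script lives in data/ and has already been run, so its
# output sits next to it. The file is only read if it actually exists.
trees_file <- output_for("data/candidate_trees.r")
if (file.exists(trees_file)) {
  trees <- readRDS(trees_file)
  length(trees)
}
```

The same helper can be pointed at figure or table scripts by changing `ext`, for example `output_for("fig/some_figure.r", ext = "pdf")`, where the script name is again purely illustrative.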

data-raw Contains raw data used in the paper. The main dataset here is the set of experimental annotations from GOA.
data Contains the scripts used to process the data, including reading the PANTHER trees and GOA annotations and combining them into aphylo_tree objects. This folder also contains the resulting data.
fig Most figures of the paper, including the code used to generate the two trees featured in the paper (the low and high MAE).
parameter-estimates All the code used to fit the pooled models, including the resulting parameter estimates.
proposed-annotations Code used to generate annotation proposals on genes with no experimental annotations. The folder includes the actual table with proposed annotations.
sifter Code used to analyze SIFTER data.
simulations Code to generate the large simulation study in which we analyze the properties of MCMC and MLE estimates.

