phylomapr

Gene founder events facilitate evolutionary innovations. phylomapr enables quick retrieval of precomputed gene age maps (phylomaps) in R. Gene age maps loaded from phylomapr integrate seamlessly with myTAI. Furthermore, the carbon footprint of computational work is on the rise. Because the maps are precomputed, users do not need to rerun gene age inference themselves, which helps reduce that footprint.

Installation

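# install biomartr (via BiocManager)
if (!requireNamespace("BiocManager", quietly = TRUE))
  install.packages("BiocManager")

BiocManager::install("ropensci/biomartr")

# install phylomapr from GitHub (requires devtools)
if (!requireNamespace("devtools", quietly = TRUE))
  install.packages("devtools")

devtools::install_github("LotharukpongJS/phylomapr")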

Use Cases

Retrieve gene age maps using `phylomapr`

Load the phylomap of Apostichopus japonicus (Japanese sea cucumber), generated using GenEra.

# either
Aj.map <- phylomapr::Apostichopus_japonicus.PhyloMap
# or alternatively
library(phylomapr)
Aj.map <- Apostichopus_japonicus.PhyloMap

head(Aj.map)

  Phylostratum                         GeneID
1            2 tr|A0A0B6VS88|A0A0B6VS88_STIJA
2            1 tr|A0A0G2R1N3|A0A0G2R1N3_STIJA
3            1 tr|A0A0H4BK46|A0A0H4BK46_STIJA
4            3 tr|A0A0X7YCD7|A0A0X7YCD7_STIJA
5            1 tr|A0A1B2ZDN7|A0A1B2ZDN7_STIJA
6            2 tr|A0A1X9J403|A0A1X9J403_STIJA
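A quick sanity check on a freshly loaded phylomap is to count how many genes fall into each phylostratum and to confirm that gene IDs are unique. The sketch below uses a toy data frame (`toy.map`, a hypothetical name) that copies only the six rows printed above, so it runs without phylomapr installed; the same calls should work on `Aj.map`.

```r
# Toy phylomap mirroring the six rows printed above (not the full dataset)
toy.map <- data.frame(
  Phylostratum = c(2, 1, 1, 3, 1, 2),
  GeneID = c(
    "tr|A0A0B6VS88|A0A0B6VS88_STIJA",
    "tr|A0A0G2R1N3|A0A0G2R1N3_STIJA",
    "tr|A0A0H4BK46|A0A0H4BK46_STIJA",
    "tr|A0A0X7YCD7|A0A0X7YCD7_STIJA",
    "tr|A0A1B2ZDN7|A0A1B2ZDN7_STIJA",
    "tr|A0A1X9J403|A0A1X9J403_STIJA"
  ),
  stringsAsFactors = FALSE
)

# Number of genes assigned to each phylostratum
ps.counts <- table(toy.map$Phylostratum)
print(ps.counts)

# Stop early if any gene ID appears more than once
stopifnot(!anyDuplicated(toy.map$GeneID))
```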

To view the data description:

?Apostichopus_japonicus.PhyloMap

Apostichopus_japonicus.PhyloMap   package:phylomapr    R Documentation

Phylomap of Apostichopus japonicus

Description:

     Gene ages inferred using GenEra on reference protein sequences from
     Uniprot proteomes. Note: DIAMOND was run using the ultra-sensitive
     mode.

Usage:

     Apostichopus_japonicus.PhyloMap
     
Format:

     A tibble with 30,032 rows and 2 variables:

     Phylostratum dbl Phylostratum (or gene age) assignment

     GeneID chr GeneID annotation from UniProt

Source:

     <https://genomebiology.biomedcentral.com/articles/10.1186/s13059-023-02895-z>

Loading gene age maps into `myTAI`

myTAI facilitates evolutionary transcriptomic studies. Below are some ways in which gene age maps retrieved via phylomapr can be integrated into myTAI.

Plot the developmental hourglass (on simulated gene expression data)

This example uses simulated developmental gene expression of Apostichopus japonicus (Japanese sea cucumber).

Aj.map <- phylomapr::Apostichopus_japonicus.PhyloMap

Simulate developmental gene expression.

# Set the random seed for reproducibility
set.seed(123)

# Generate log-normally distributed expression values for each gene and developmental stage
# (the log-normal assumption is debatable) and collect them in a tibble
Aj.ExpressionMatrix <- tibble::tibble(
  GeneID = Aj.map$GeneID,
  `24H` = stats::rlnorm(length(Aj.map$GeneID), meanlog = 3, sdlog = 1),
  `48H` = stats::rlnorm(length(Aj.map$GeneID), meanlog = 3, sdlog = 1),
  `72H` = stats::rlnorm(length(Aj.map$GeneID), meanlog = 3, sdlog = 1)
)

Aj.PES <- myTAI::MatchMap(Aj.map, Aj.ExpressionMatrix, remove.duplicates = FALSE, accumulate = NULL)
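Conceptually, this step pairs each gene's phylostratum with its expression values by gene ID. The sketch below is a rough base-R analogue on toy data, not myTAI's implementation; the data frames, gene names, and stage columns (`h24`, `h48`) are all hypothetical. It also lists the genes in the map that have no expression record, which is worth checking when IDs come from different sources.

```r
# Toy gene age map and toy expression table (hypothetical values)
toy.map <- data.frame(
  Phylostratum = c(2, 1, 3),
  GeneID = c("geneA", "geneB", "geneC"),
  stringsAsFactors = FALSE
)
toy.expr <- data.frame(
  GeneID = c("geneB", "geneC", "geneD"),
  h24 = c(10, 20, 30),
  h48 = c(5, 40, 15),
  stringsAsFactors = FALSE
)

# Inner join on GeneID: only genes present in both tables are kept
toy.pes <- merge(toy.map, toy.expr, by = "GeneID")

# Put Phylostratum first, then GeneID, then the stage columns
toy.pes <- toy.pes[, c("Phylostratum", "GeneID", "h24", "h48")]
print(toy.pes)

# Genes with an age assignment but no expression data
unmatched <- setdiff(toy.map$GeneID, toy.expr$GeneID)
print(unmatched)
```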

Then plot the signature to test for the hourglass pattern on the simulated data.

myTAI::PlotSignature(tidyr::drop_na(Aj.PES))
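For intuition about what such a signature summarises: the transcriptome age index (TAI) of a stage is the expression-weighted mean phylostratum of all genes at that stage. The sketch below computes it in base R on a toy table; the gene names, stage names, and values are hypothetical, and it is meant only as an illustration of the formula, not as a replacement for myTAI.

```r
# Toy table in the layout used above: Phylostratum, GeneID, then one column per stage
toy.pes <- data.frame(
  Phylostratum = c(1, 1, 2, 3),
  GeneID = c("geneA", "geneB", "geneC", "geneD"),
  early = c(10, 20, 5, 5),
  mid = c(30, 30, 5, 1),
  late = c(10, 10, 20, 20),
  stringsAsFactors = FALSE
)

# Expression matrix: genes in rows, stages in columns
expr <- as.matrix(toy.pes[, -(1:2)])

# TAI per stage: sum(phylostratum * expression) / sum(expression)
# The phylostratum vector is recycled down each column, so every row
# is weighted by its own phylostratum.
tai <- colSums(expr * toy.pes$Phylostratum) / colSums(expr)
print(round(tai, 3))
```

In this toy example the middle stage is dominated by genes from the oldest phylostratum, so its TAI is lower than that of the flanking stages.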

Next, transform the simulated gene expression data

Note: this requires myTAI (version > 1.0.1.0000).

Aj.PES.log2 <- myTAI::tf(tidyr::drop_na(Aj.PES), FUN = log2, pseudocount = 1)
hist(Aj.PES.log2$`24H`)

Compare this to the distribution of raw abundance (TPM).

hist(Aj.PES$`24H`, breaks = 200)

myTAI::PlotSignature(tidyr::drop_na(Aj.PES.log2))
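The effect of a log2 transformation with a pseudocount of 1 can be seen on a plain numeric vector. This is a minimal sketch that assumes the pseudocount is added to the expression values before `log2` is applied; the vector `toy.expr` is hypothetical.

```r
# Toy expression values, including a zero that log2 alone could not handle
toy.expr <- c(0, 1, 3, 7, 1023)

# Add the pseudocount first, then take log2
toy.log2 <- log2(toy.expr + 1)
print(toy.log2)

# Without the pseudocount, zero expression maps to -Inf
print(log2(0))
```

The pseudocount keeps genes with zero expression in the data, and the transformation compresses the long right tail that is visible in the histogram of raw values.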

Tutorials

- Gene names in different databases: GeneIDs can differ between databases. This could be an issue when the gene age is estimated with one gene naming convention and the RNA-seq mapping is done with another. This tutorial shows how one could convert gene IDs (convertID()) between databases.
- Adding phylomaps to phylomapr: Advanced gene age (phylo)mappers who ran their own gene age inference may want to contribute to phylomapr, which is at its core a collaborative effort. This tutorial shows how one could add new phylomaps to phylomapr.
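As a small illustration of the gene ID issue, the UniProt-style IDs in the Apostichopus japonicus phylomap have the form `db|accession|entry name`. The base-R sketch below splits them into their parts so that, for example, the accession can be matched against another annotation. It is not the `convertID()` function from the tutorial, and the column names `Accession` and `EntryName` are hypothetical.

```r
# Two gene IDs copied from the phylomap shown earlier
uniprot.ids <- c(
  "tr|A0A0B6VS88|A0A0B6VS88_STIJA",
  "tr|A0A0G2R1N3|A0A0G2R1N3_STIJA"
)

# Split on the literal "|" character (fixed = TRUE avoids regex interpretation)
parts <- strsplit(uniprot.ids, "|", fixed = TRUE)

# Second field is the accession, third field is the entry name
accession <- vapply(parts, function(p) p[2], character(1))
entry.name <- vapply(parts, function(p) p[3], character(1))

# Lookup table from the original ID to its components
id.table <- data.frame(
  GeneID = uniprot.ids,
  Accession = accession,
  EntryName = entry.name,
  stringsAsFactors = FALSE
)
print(id.table)
```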

Citation

Citations are provided in the data description of each dataset. To view it, put a ? in front of the dataset name, e.g. ?Apostichopus_japonicus.PhyloMap.

Acknowledgement

I would like to thank several individuals for making this mini-project possible.

First, I would like to thank Hajk-Georg Drost for providing me with the intellectual environment that enabled this project.

Furthermore, I would like to thank Susana M. Coelho for hosting and facilitating this research, as well as the Max Planck Institute for Biology Tübingen and the Max Planck Society.

I also thank the BMBF-funded de.NBI Cloud within the German Network for Bioinformatics Infrastructure (de.NBI) (031A532B, 031A533A, 031A533B, 031A534A, 031A535A, 031A537A, 031A537B, 031A537C, 031A537D, 031A538A).
