2024 update
ahl27 committed Jan 26, 2024
1 parent 45439a3 commit 392bf11
Showing 7 changed files with 50 additions and 35 deletions.
6 changes: 3 additions & 3 deletions DESCRIPTION
@@ -1,6 +1,6 @@
Package: CompGenomicsBioc2022
Title: Comparative Genomics Analyses with SynExtend and DECIPHER
Version: 3.3.0
Version: 3.4.0
Authors@R:
person(given = "Aidan",
family = "Lakshman",
@@ -18,8 +18,8 @@ Description: In the past decade, the number of sequenced proteins with unknown f
- Gene calling and annotation using the DECIPHER package
- Identification of clusters of orthologous genes using the SynExtend package
- Construction of alignments and phylogenetic trees using the new TreeLine function in the DECIPHER package
- Prediction of functional associations using ProtWeaver in the SynExtend package
The talk will highlight our newest functionality, ProtWeaver, which implements several methods commonly used in the literature
- Prediction of functional associations using EvoWeaver in the SynExtend package
The talk will highlight our newest functionality, EvoWeaver, which implements several methods commonly used in the literature
to predict protein functional associations. We will briefly show how each algorithm works, then apply them to a real set of
proteins of unknown function. This workshop will teach participants how to extract useful information from large biological
sequence data with comparative genomics and predict the function of uncharacterized proteins.
11 changes: 8 additions & 3 deletions NEWS.md
@@ -1,3 +1,8 @@
# Version 3.4.0
### 2024 Update
* `ProtWeaver` and `ProtWeb` updated to `EvoWeaver` and `EvoWeb` (respectively)
* Added some notes on correct package versions for use in this tutorial

# Version 3.3.0
### General Notes
* First update after Bioconductor conference
@@ -32,7 +37,7 @@

# Version 2.1.0
### General Notes
* `ProtWeaver` analysis now uses the correct dataset
* `EvoWeaver` analysis now uses the correct dataset
* Conclusions page updated
* `IdTaxa` training set fixed
* Download links moved to be on each page
@@ -44,11 +49,11 @@

# Version 2.0.0
### General Notes
* All examples except `ProtWeaver` analysis use the same dataset
* All examples except `EvoWeaver` analysis use the same dataset
* `IdTaxa` now uses the correct training set

### Future Work
* `ProtWeaver` example will be updated
* `EvoWeaver` example will be updated
* Conclusions section will be updated with actual takeaways
* Writeup within each section will be changed to reflect new examples
* Images will be added
4 changes: 3 additions & 1 deletion README.md
@@ -4,6 +4,8 @@ The `SynExtend` and `DECIPHER` packages for R incorporate a wealth of easy to us

I've summarized on this page all the skills you can expect to learn by working through the tutorials on this site. When you're ready to get started, check out the [Overview](https://www.ahl27.com/CompGenomicsBioc2022/articles/CompGenomicsBioc2022.html) page!

**Note:** When this tutorial was originally given, multiple steps of the pipeline used a function called `ProtWeaver`. This has since been renamed to `EvoWeaver` (as well as `ProtWeb` renamed to `EvoWeb`). I've attempted to correct all the locations where this occurs, but you may encounter references to the old naming scheme in files available through the Docker image.
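
If you do run into the old names in a script, a literal text substitution is usually enough to update it. The snippet below is a minimal sketch in base R, not something shipped with `SynExtend`: the helper name `update_names()` and the example lines in `old_code` are made up for illustration.

```r
# Hypothetical helper (not part of SynExtend): map old names to new ones.
renames <- c(ProtWeaver = "EvoWeaver", ProtWeb = "EvoWeb")

update_names <- function(lines, mapping = renames) {
  for (old in names(mapping)) {
    # fixed = TRUE treats the old name as a literal string, not a regex
    lines <- gsub(old, mapping[[old]], lines, fixed = TRUE)
  }
  lines
}

# Made-up example lines using the old naming scheme
old_code <- c("pw <- ProtWeaver(WTrees)",
              "plot.ProtWeb(preds)")
update_names(old_code)
```

To apply the same substitution to a file, something like `writeLines(update_names(readLines(path)), path)` should work, where `path` is whichever script you want to update. Keep a copy of the original first.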

----------------------------------

## Topics Covered
@@ -36,7 +38,7 @@ Each COG comprises sets of conserved orthologs across species. These data, combi

### [Identifying Co-evolving Gene Collectives with `SynExtend`](https://www.ahl27.com/CompGenomicsBioc2022/articles/CoevolutionNetworks.html)

With these data, we can analyze patterns in evolutionary signal across COGs. Co-evolutionary signal between genes implies functional association, so finding COGs under shared selective pressure aids us in uncovering the mechanisms of intracellular pathways. Users will learn to use the `ProtWeaver` class to tease out subtle evidence of correlated evolutionary pressure in order to create co-evolutionary networks.
With these data, we can analyze patterns in evolutionary signal across COGs. Co-evolutionary signal between genes implies functional association, so finding COGs under shared selective pressure aids us in uncovering the mechanisms of intracellular pathways. Users will learn to use the `EvoWeaver` class to tease out subtle evidence of correlated evolutionary pressure in order to create co-evolutionary networks.

[<sup>Function Reference</sup>](https://www.ahl27.com/CompGenomicsBioc2022/reference/index.html#finding-co-evolving-gene-collectives)

8 changes: 4 additions & 4 deletions _pkgdown.yml
@@ -101,9 +101,9 @@ reference:
desc: >
Functions for finding co-evolutionary signal between COGs
contents:
- SynExtend::ProtWeaver
- SynExtend::predict.ProtWeaver
- SynExtend::EvoWeaver
- SynExtend::predict.EvoWeaver
- SynExtend::simMat
- SynExtend::ProtWeb
- SynExtend::plot.ProtWeb
- SynExtend::EvoWeb
- SynExtend::plot.EvoWeb

40 changes: 20 additions & 20 deletions vignettes/CoevolutionNetworks.Rmd
@@ -21,14 +21,14 @@ library(igraph)


```{r echo=FALSE, out.width='100%'}
knitr::include_graphics('images/PipelineProtWeaver.png')
knitr::include_graphics('images/PipelineEvoWeaver.png')
```

## Coevolutionary Analysis

At this point, we've walked through the steps to take a set of sequences and obtain
a set of COGs with phylogenetic reconstructions for each COG. We're now ready to look for signals of coevolution, which imply functional associations between COGs.
These methods are implemented via the `ProtWeaver` class in `SynExtend`, which
These methods are implemented via the `EvoWeaver` class in `SynExtend`, which
includes many commonly used methods for detecting coevolutionary patterns.

While the previous steps have only utilized a small subsample of the data, we're
@@ -41,7 +41,7 @@ We ran the complete pipeline of identifying and annotating genes with `DECIPHER`
is performed entirely within `SynExtend` and `DECIPHER`; no external packages
or data are required aside from the input genomes.

We now use the new `ProtWeaver` class to try to find COGs that show evidence of
We now use the new `EvoWeaver` class to try to find COGs that show evidence of
correlated evolutionary selective pressures, also referred to as 'coevolutionary
signal'.

@@ -123,7 +123,7 @@ plot(CogTrees[[7]], leaflab='none', main='COG 7')
```

There is a ton of data here, and we unfortunately don't have time to look at
all of it. To demonstrate some of the things we can do with `ProtWeaver`, we're
all of it. To demonstrate some of the things we can do with `EvoWeaver`, we're
going to look at a subset of the data that is easier to investigate.

We'll subset the COGs to ones that meet the following characteristics:
@@ -207,22 +207,22 @@ for ( i in seq_along(CogsAnnot) ) {
}
```

Now we can make our `ProtWeaver` object. `ProtWeaver` has multiple input options,
Now we can make our `EvoWeaver` object. `EvoWeaver` has multiple input options,
either a list formatted like `COGs` (list with gene identifiers) or a list like
`CogTrees` (list with gene trees).

```{r eval=FALSE}
# ProtWeaver constructor
pw <- ProtWeaver(WTrees)
# EvoWeaver constructor
pw <- EvoWeaver(WTrees)
```

The `ProtWeaver` constructor automatically detects the type of data you have and
The `EvoWeaver` constructor automatically detects the type of data you have and
adjusts available predictors accordingly. While it functions best with a list
of dendrograms for each COG, it can also run with simple presence/absence patterns.
See the documentation file for `ProtWeaver` for more information on this functionality.
See the documentation file for `EvoWeaver` for more information on this functionality.
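
To give a feel for what a presence/absence pattern looks like, here is a toy sketch in base R. It is not how `EvoWeaver` computes its scores: the `pa` matrix, the genome names, and the choice of Jaccard similarity are all assumptions made purely for illustration.

```r
# Toy presence/absence matrix: rows are COGs, columns are genomes (made-up data).
pa <- rbind(
  COG_A = c(1, 1, 0, 0, 1, 0),
  COG_B = c(1, 1, 0, 0, 1, 0),
  COG_C = c(0, 0, 1, 1, 0, 1)
)
colnames(pa) <- paste0("genome", 1:6)

# Jaccard similarity: genomes containing both COGs / genomes containing either.
jaccard <- function(x, y) {
  either <- sum(x | y)
  if (either == 0) return(NA_real_)
  sum(x & y) / either
}

jaccard(pa["COG_A", ], pa["COG_B", ])  # identical profiles give 1
jaccard(pa["COG_A", ], pa["COG_C", ])  # disjoint profiles give 0
```

COGs that tend to be gained and lost together across genomes have similar profiles, which is the intuition behind presence/absence predictors.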

We're now ready to make predictions. Predicting functional associations is done
with the `predict.ProtWeaver` S3 method. Let's examine possible functional associations between the COGs we have.
with the `predict.EvoWeaver` S3 method. Let's examine possible functional associations between the COGs we have.

```{r echo=FALSE}
OutFile <- system.file('extdata', 'CoevNetworks',
@@ -240,13 +240,13 @@ print(preds)
```
## Viewing our results

Notice that `preds` is a `ProtWeb` object. This is just a simple S3 class with a
pretty print method wrapping a matrix of pairwise association scores. `ProtWeb`
Notice that `preds` is an `EvoWeb` object. This is just a simple S3 class with a
pretty print method wrapping a matrix of pairwise association scores. `EvoWeb`
inherits from the `simMat` class, which incorporates functionality similar
to `dist` objects (notably memory-efficient storage of symmetric matrices) but with matrix-like operations and pretty print functions.
See `?simMat` for more info.

`ProtWeb` objects are easily coercible to matrices using `as.matrix`, but be aware matrix representations require roughly double the storage. `as.data.frame` will coerce `ProtWeb` objects into a `3xN` matrix of pairwise scores.
`EvoWeb` objects are easily coercible to matrices using `as.matrix`, but be aware matrix representations require roughly double the storage. `as.data.frame` will coerce `EvoWeb` objects into a `3xN` matrix of pairwise scores.
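
The storage trade-off is easy to see with base R's `dist` class, which the text above uses as the point of comparison. This sketch uses `dist` rather than `simMat`, random made-up data, and a hand-built long format, so treat it as an analogy rather than a description of `SynExtend` internals.

```r
set.seed(1)
n <- 5
# Made-up data: 5 items with 3 random features each
x <- matrix(rnorm(n * 3), nrow = n,
            dimnames = list(paste0("COG", 1:n), NULL))

d <- dist(x)        # stores only the n * (n - 1) / 2 pairwise values
m <- as.matrix(d)   # full n x n symmetric matrix

length(d)           # 10
length(m)           # 25

# Long format: one row per pair, with three columns (item 1, item 2, score)
idx <- which(upper.tri(m), arr.ind = TRUE)
long <- data.frame(
  Item1 = rownames(m)[idx[, "row"]],
  Item2 = colnames(m)[idx[, "col"]],
  Score = m[idx]
)
nrow(long)          # 10
```

The full matrix holds every score twice plus the diagonal, which is where the "roughly double" figure comes from.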

We can use these functions to examine a histogram of association scores for all our predictions:

@@ -256,11 +256,11 @@ hist(pwData[upper.tri(pwData)], main='Histogram of Association Scores',
xlab='Strength of Association')
```

The `ProtWeb` class will be updated next release cycle to include more methods,
including a custom plotting function. The current `plot.ProtWeb` S3 method
The `EvoWeb` class will be updated next release cycle to include more methods,
including a custom plotting function. The current `plot.EvoWeb` S3 method
implements a force-directed embedding of the pairwise scores, but it's a
big work-in-progress. Stay tuned for the next release cycle for more functionality
regarding `ProtWeb`.
regarding `EvoWeb`.

In the meantime, we can use the `igraph` package to find clusters of coevolving
COGs.
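
As a rough illustration of the idea, the sketch below runs Louvain clustering on a small made-up score matrix. The `scores` matrix and its values are invented for this example and are unrelated to the tutorial's data; it only assumes that `igraph` is installed.

```r
library(igraph)

# Made-up symmetric score matrix with two obvious groups of COGs.
scores <- matrix(0.05, nrow = 6, ncol = 6,
                 dimnames = list(paste0("COG", 1:6), paste0("COG", 1:6)))
scores[1:3, 1:3] <- 0.9
scores[4:6, 4:6] <- 0.9
diag(scores) <- 0

# Treat scores as edge weights of an undirected graph.
g <- graph_from_adjacency_matrix(scores, mode = "undirected",
                                 weighted = TRUE, diag = FALSE)

# Louvain community detection uses the edge weights by default.
cl <- cluster_louvain(g)
split(V(g)$name, as.integer(membership(cl)))
```

Louvain clustering involves some randomness, so cluster numbering can differ between runs even when the groupings themselves are stable.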
@@ -300,11 +300,11 @@ nickel/cobalt transporter. We also see some other proteins that could have activ

We chose cluster 3 because it's small and easy to see common function. Louvain clustering identifies 4 total clusters, with cluster 1 seemingly related to stress response activity, cluster 2 related to transporting and using metals, cluster 3 related to urease activity, and cluster 4 relating to arabinogalactan/maltooligosaccharide transport.

It's important to note here that this analysis is agnostic to the presence/absence of annotations. We subset our proteins to ones with high confidence annotations in this example primarily for purposes of demonstration. In actual applications we would use `ProtWeaver` with annotated and non-annotated proteins to infer novel function of hypothetical proteins.
It's important to note here that this analysis is agnostic to the presence/absence of annotations. We subset our proteins to ones with high confidence annotations in this example primarily for purposes of demonstration. In actual applications we would use `EvoWeaver` with annotated and non-annotated proteins to infer novel function of hypothetical proteins.

## Methods Implemented in ProtWeaver
## Methods Implemented in EvoWeaver

By default, `predict.ProtWeaver` makes an ensemble prediction using as many individual
By default, `predict.EvoWeaver` makes an ensemble prediction using as many individual
models as it can run with the data provided. However, users are free to use any of
the individual models without the ensemble predictor. The methods implemented are
the following:
@@ -342,7 +342,7 @@ ResidueMI <- predict(pw, Method='ResidueMI')

## Runtime Considerations

Runtime of `ProtWeaver` depends on the number of COGs and the method used. Most of
Runtime of `EvoWeaver` depends on the number of COGs and the method used. Most of
the methods are extremely quick, on the order of 0-1 seconds per comparison.
The default `Method="Ensemble"` implements the fastest methods, giving a good
compromise between speed and accuracy. For a set of 100 COGs, `Ensemble` prediction
6 changes: 3 additions & 3 deletions vignettes/Conclusion.Rmd
@@ -96,9 +96,9 @@ our lab for more information!

If you've made it through this entire tutorial, thank you for following along!
I hope this series was informative and useful to your analyses. All code showcased
here is actively being worked on by members of our lab, especially the `ProtWeaver`
and `ProtWeb` functionalities. If you have any comments, suggestions, or feature requests
for `ProtWeaver`, `ProtWeb`, or this tutorial, please feel free to either email me at
here is actively being worked on by members of our lab, especially the `EvoWeaver`
and `EvoWeb` functionalities. If you have any comments, suggestions, or feature requests
for `EvoWeaver`, `EvoWeb`, or this tutorial, please feel free to either email me at
ahl27@pitt.edu or open an issue on GitHub.

## Other Resources
10 changes: 9 additions & 1 deletion vignettes/Setup.Rmd
@@ -69,7 +69,15 @@ This workshop depends on two main packages: `DECIPHER` and `SynExtend`.
These will be installed via [Bioconductor](http://bioconductor.org/), a package
manager for open source bioinformatics projects in R. These tutorials will use
`DECIPHER` version `2.25.0` and `SynExtend` version `1.9.6`, which are available on
the development version of Bioconductor.
the development version of Bioconductor. If you are viewing this tutorial in the future
(after 2022), just use the most recent release version of `DECIPHER` and `SynExtend`--
we plan to maintain backwards compatibility, so everything should work fine. If you encounter
issues using the latest release with these tutorials, please feel free to open an issue on GitHub
or email me.

**2024 Update: These functions have all been officially released in Bioconductor.**
**The most recent working release is SynExtend version 1.14.0, available from Bioconductor version 3.18.**
**Similarly, `DECIPHER` version 2.30.0 should be used instead of 2.25.0**.
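
If you're not sure which versions you have, a quick check along these lines may help. This is only a sketch: `check_min_version()` is a hypothetical helper written for this note, and the minimum versions are the ones quoted above.

```r
# Hypothetical helper: report whether an installed package meets a minimum version.
check_min_version <- function(pkg, min_version) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    return("not installed")
  }
  # packageVersion() results can be compared directly against version strings
  if (packageVersion(pkg) >= min_version) "ok" else "too old"
}

minimums <- c(DECIPHER = "2.30.0", SynExtend = "1.14.0")
status <- vapply(names(minimums),
                 function(p) check_min_version(p, minimums[[p]]),
                 character(1))
print(status)
```

Anything reported as "not installed" or "too old" can be handled by the installation code in this section.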

```{r eval=FALSE}
if (!require("BiocManager", quietly = TRUE))