# Tutorial: Creating phylogenetic trees with rTASSEL

## Enter your notebook title here

**Objective**: Describe the objective here  
**Data**: Describe your data set here  
**User and contact**: Enter your name and contact here

### Table of contents
* [Notes](#Notes) 
* [Libraries](#Libraries)
* [Parameters and functions](#Parameters-and-functions)
* [Data](#Data)
    + [Retrieve BrAPI data and filter](#Retrieve-BrAPI-data-and-filter)
* [Analysis](#Analysis)
    + [Phylogeny using neighbor joining method](#Phylogeny-using-neighbor-joining-method)
    + [Phylogeny using UPGMA](#Phylogeny-using-UPGMA)
* [Visualization using ggtree](#Visualization-using-ggtree)
    + [Add metadata](#Add-metadata)
* [References and additional resources](#References-and-additional-resources)

## Notes

This tutorial assumes that:
1. You already know how to load your data from a BrAPI database into rTASSEL and inspect it.
    - See 01_rTASSEL_load_data.brapi.ipynb for how to load data from BrAPI databases.
2. You will filter your genotype data as appropriate for your data set and analysis.
    - See 02_rTASSEL_GenotypeFiltering.brapi.ipynb for a tutorial on filtering genotype data retrieved via BrAPI with rTASSEL.
3. You have a CSV metadata file with a `Taxa` column whose values match the taxa names in your genotype data.

Additional documentation on the `createTree()` function in rTASSEL can be found [here](https://rtassel.maizegenetics.net/reference/createTree.html).

```r
# Print the working directory
getwd()
```

```r
# Record the date this notebook was run
Sys.Date()
```

## Libraries

```r
library(QBMS)    # Retrieve data from BrAPI databases
library(rTASSEL) # R interface to TASSEL
library(ggplot2) # Plotting and visualization
library(ggtree)  # Visualize phylogenetic trees
```

## Parameters and functions

**Please edit the path to your own data:**

```r
# Path to taxa metadata
myMetadataPath <- "/shared/commons/data/workshop_senegal/taxa_metadata.csv"
```

Create a helper function that sets the width and height (in inches) of plots displayed in the notebook:

```r
# Set the size of plots rendered in the notebook
fig <- function(width, height) {
    options(
        repr.plot.width  = width,
        repr.plot.height = height
    )
}
```

## Data

Load metadata:

```r
taxaMetadata <- read.csv(file = myMetadataPath)
taxaMetadata |> head()
```

### Retrieve BrAPI data and filter

**You will need to log into Gigwa using the BrAPI helper.**

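```r
# Assumes `geno_provider` was created when you logged into Gigwa with the BrAPI helper.
# List the databases available to you:
geno_provider$gigwa_list_dbs()
```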

**Please edit the code to set your database (db):**

```r
# Replace "myDataBase" with a database name from the list above
geno_provider$gigwa_set_db("myDataBase")
```

```r
# List the projects in the selected database
geno_provider$gigwa_list_projects()
```

**Please edit the code to set your project:**

```r
# Replace "myProject" with a project name from the list above
geno_provider$gigwa_set_project("myProject")
```

**Edit the code below to use filters appropriate for your data set and analysis. Additional filtering can be done after the data are retrieved and loaded into rTASSEL.** 

```r
genoDataFromGigwa <- geno_provider$gigwa_get_variants(
    max_missing = 0.2, # maximum missing-data ratio (0 to 1)
    min_maf = 0.05     # minimum minor allele frequency (0 to 1)
)
```
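The two thresholds can be illustrated on a toy genotype matrix. This sketch is not how Gigwa computes its filters; it assumes diploid genotypes coded as 0/1/2 copies of the alternate allele (`NA` for a missing call), taxa in rows and variants in columns, and keeps a variant when its fraction of missing calls is at most `max_missing` and its minor allele frequency is at least `min_maf`. The helper `filterSites()` and the data are hypothetical.

```r
# Hypothetical helper: keep variants (columns) that pass missing-data and MAF thresholds.
# Assumes diploid dosages 0/1/2 with NA for missing calls.
filterSites <- function(geno, max_missing = 0.2, min_maf = 0.05) {
    missingFrac <- colMeans(is.na(geno))
    # Alternate allele frequency from called genotypes (2 alleles per diploid call)
    altFreq <- colSums(geno, na.rm = TRUE) / (2 * colSums(!is.na(geno)))
    maf <- pmin(altFreq, 1 - altFreq)
    keep <- missingFrac <= max_missing & maf >= min_maf
    geno[, which(keep), drop = FALSE] # which() drops any NA results
}

# Toy data: 5 taxa x 4 variants
toyGeno <- matrix(
    c(0, 0, 2, 1,
      0, 0, 2, NA,
      1, 0, 2, NA,
      2, 0, 1, 0,
      2, 0, 2, 1),
    nrow = 5, byrow = TRUE,
    dimnames = list(paste0("taxon", 1:5), paste0("site", 1:4))
)

filtered <- filterSites(toyGeno, max_missing = 0.2, min_maf = 0.05)
colnames(filtered) # "site1" "site3"
```

Here `site2` is dropped because it is monomorphic (MAF 0) and `site4` because 2 of its 5 calls (40%) are missing.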

#### Inspect genotype data in R

```r
genoDataFromGigwa |> head()  # first rows
genoDataFromGigwa |> dim()   # dimensions
genoDataFromGigwa |> names() # column names
```

#### Load genotype data into rTASSEL

```r
tasGeno <- genoDataFromGigwa |> rTASSEL::readGenotypeTableFromGigwa()
```

```r
tasGeno
```

**Perform filtering steps in rTASSEL appropriate for your data set and analysis.** See 02_rTASSEL_GenotypeFiltering.ipynb for more details about filtering.

## Analysis

`rTASSEL` gives access to TASSEL's tree-building methods for genotype data
through the `createTree()` function, which takes a `TasselGenotypePhenotype`
object containing a genotype table.
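Both methods below build a tree from pairwise distances between taxa. As a rough illustration only (TASSEL's own distance calculation may differ), this base-R sketch computes a simple allele-sharing distance from a hypothetical 0/1/2 dosage matrix: for each pair of taxa it averages |g_i - g_j| / 2 over the sites called in both, so 0 means identical genotypes and 1 means opposite homozygotes at every shared site. The helper `alleleSharingDist()` and the data are hypothetical.

```r
# Hypothetical helper: pairwise allele-sharing distance between taxa (rows).
# Assumes diploid dosages 0/1/2 with NA for missing calls.
alleleSharingDist <- function(geno) {
    n <- nrow(geno)
    d <- matrix(0, n, n, dimnames = list(rownames(geno), rownames(geno)))
    for (i in seq_len(n - 1)) {
        for (j in seq(i + 1, n)) {
            # Compare only sites called in both taxa
            ok <- !is.na(geno[i, ]) & !is.na(geno[j, ])
            d[i, j] <- d[j, i] <- mean(abs(geno[i, ok] - geno[j, ok]) / 2)
        }
    }
    as.dist(d)
}

toyGeno <- matrix(
    c(0, 0, 2, 1,
      0, 2, 2, NA,
      2, 2, 0, 1),
    nrow = 3, byrow = TRUE,
    dimnames = list(c("t1", "t2", "t3"), paste0("site", 1:4))
)
toyDist <- alleleSharingDist(toyGeno)
round(as.matrix(toyDist), 3)
```

For example, `t1` and `t3` are opposite homozygotes at three of their four shared sites and identical at the fourth, giving a distance of 3/4 = 0.75.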

### Phylogeny using neighbor joining method

`Neighbor_Joining` - Neighbor Joining method. More info can be found
  [here](https://en.wikipedia.org/wiki/Neighbor_joining)
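To show what the method does at each step, this base-R sketch performs a single neighbor-joining iteration on a small hypothetical distance matrix. It computes Q(i, j) = (n - 2) d(i, j) - Σ_k d(i, k) - Σ_k d(j, k), joins the pair with the smallest Q, and computes the branch lengths from each joined taxon to the new node. It illustrates the algorithm only and is not TASSEL's implementation; `njStep()` is a hypothetical name.

```r
# Hypothetical helper: one neighbor-joining iteration on a symmetric distance matrix (n > 2)
njStep <- function(d) {
    n <- nrow(d)
    r <- rowSums(d)
    # Q(i, j) = (n - 2) * d(i, j) - r(i) - r(j)
    q <- (n - 2) * d - outer(r, r, "+")
    q[lower.tri(q, diag = TRUE)] <- Inf # consider each pair once, never i == j
    idx <- which(q == min(q), arr.ind = TRUE)[1, ]
    i <- idx[1]; j <- idx[2]
    # Branch lengths from i and j to the new internal node u
    li <- d[i, j] / 2 + (r[i] - r[j]) / (2 * (n - 2))
    lj <- d[i, j] - li
    list(pair = rownames(d)[c(i, j)], q = q[i, j], lengths = unname(c(li, lj)))
}

taxa <- c("a", "b", "c", "d", "e")
toyDist <- matrix(
    c(0,  5,  9,  9, 8,
      5,  0, 10, 10, 9,
      9, 10,  0,  8, 7,
      9, 10,  8,  0, 3,
      8,  9,  7,  3, 0),
    nrow = 5, byrow = TRUE, dimnames = list(taxa, taxa)
)
firstJoin <- njStep(toyDist)
firstJoin$pair    # "a" "b"
firstJoin$q       # -50
firstJoin$lengths # 2 3
```

A full neighbor-joining run repeats this step on a reduced matrix in which i and j are replaced by the new node u, with d(u, k) = (d(i, k) + d(j, k) - d(i, j)) / 2, until the tree is fully resolved.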

```r
phyloTree_NJ <- createTree(
    tasObj = tasGeno,
    clustMethod = "Neighbor_Joining"
)
```

The object returned by `createTree()` (`phyloTree_NJ`) is a `phylo` object, the tree class
defined by the [ape](https://cran.r-project.org/web/packages/ape/ape.pdf) package:

```r
phyloTree_NJ
```
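A `phylo` object is an R list with class `"phylo"`. Its main components are `edge` (a two-column matrix of parent and child node numbers, with tips numbered 1 to n and internal nodes from n + 1), `tip.label`, `Nnode` (the number of internal nodes) and, optionally, `edge.length`. This base-R sketch builds the hypothetical tree `((A:1,B:1):1,C:2);` by hand to show that layout; for a real tree the same components can be read with, for example, `phyloTree_NJ$tip.label` or `phyloTree_NJ$edge`.

```r
# Build a tiny "phylo"-style object by hand (no packages needed) for
# the rooted tree ((A:1,B:1):1,C:2);
toyTree <- structure(
    list(
        edge = rbind(
            c(4L, 5L), # root (node 4) -> internal node 5
            c(5L, 1L), # node 5 -> tip 1 (A)
            c(5L, 2L), # node 5 -> tip 2 (B)
            c(4L, 3L)  # root (node 4) -> tip 3 (C)
        ),
        edge.length = c(1, 1, 1, 2), # one length per row of `edge`
        tip.label = c("A", "B", "C"),
        Nnode = 2L
    ),
    class = "phylo"
)

# Tips are the nodes that appear as children but never as parents
tips <- setdiff(toyTree$edge[, 2], toyTree$edge[, 1])
toyTree$tip.label[tips] # "A" "B" "C"
```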

### Phylogeny using UPGMA

`UPGMA` - **U**nweighted **P**air **G**roup **M**ethod with **A**rithmetic 
  **M**ean. More info can be found [here](https://en.wikipedia.org/wiki/UPGMA).
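Base R's `hclust()` with `method = "average"` performs average-linkage (UPGMA) clustering, which is a convenient way to see how the algorithm merges taxa. The sketch below uses a small hypothetical distance matrix and is only an illustration, not how rTASSEL builds its tree. Note that `hclust()` reports each merge at the distance between the merged clusters, whereas a UPGMA tree places the corresponding node at half that distance.

```r
# Hypothetical example distances between five taxa
taxa <- c("a", "b", "c", "d", "e")
toyDist <- as.dist(matrix(
    c( 0, 17, 21, 31, 23,
      17,  0, 30, 34, 21,
      21, 30,  0, 28, 39,
      31, 34, 28,  0, 43,
      23, 21, 39, 43,  0),
    nrow = 5, byrow = TRUE, dimnames = list(taxa, taxa)
))

# method = "average" is UPGMA: the distance between two clusters is the
# mean of all pairwise distances between their members
upgma <- hclust(toyDist, method = "average")
upgma$merge  # merge order; negative numbers are single taxa
upgma$height # merge distances: 17 22 28 33
```

Here `a` and `b` merge first (17), then `e` joins them (mean of 23 and 21 = 22), `c` and `d` merge (28), and the two clusters join at the mean of their six cross-cluster distances (33). If ape is installed, `ape::as.phylo(upgma)` converts the result into a `phylo` object.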

```r
phyloTree_UPGMA <- createTree(
    tasObj = tasGeno,
    clustMethod = "UPGMA"
)
phyloTree_UPGMA
```

## Visualization using ggtree

The `phylo` object created with rTASSEL can be used with base-R methods such as `plot()` or with other
visualization libraries such as
[ggtree](https://bioconductor.org/packages/release/bioc/html/ggtree.html).

First, set the plot dimensions with the `fig()` function defined at the start of the notebook:

```r
fig(8, 8)
```

Example using default parameters:

```r
ggtree(phyloTree_UPGMA)
```

Example with non-default parameters (thicker branches, a circular layout, and branch lengths ignored so the tree is drawn as a cladogram):

```r
fig(8, 8)
phyloTree_UPGMA |>
    ggtree(size = 1.5, layout = "circular", branch.length = "none")
```

### Add metadata

```r
taxaMetadata |> head()
```
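ggtree's `%<+%` operator joins a data frame to the tree using the data frame's first column, so that column must contain the tree's tip labels. A quick base-R check before plotting can catch tips without metadata or a taxa column in the wrong position. The helper `prepareTipMetadata()` and the toy data are hypothetical; with real data you might call `prepareTipMetadata(phyloTree_UPGMA$tip.label, taxaMetadata)`.

```r
# Hypothetical helper: warn about tips missing from the metadata and move the
# taxa column to the front, where `%<+%` expects the tip labels.
prepareTipMetadata <- function(tipLabels, metadata, taxaCol = "Taxa") {
    stopifnot(taxaCol %in% names(metadata))
    missingTips <- setdiff(tipLabels, metadata[[taxaCol]])
    if (length(missingTips) > 0) {
        warning("Tips without metadata: ", paste(missingTips, collapse = ", "))
    }
    metadata[, c(taxaCol, setdiff(names(metadata), taxaCol)), drop = FALSE]
}

toyTips <- c("T1", "T2", "T3")
toyMeta <- data.frame(
    Subpopulation = c("pop1", "pop1", "pop2"),
    Taxa = c("T1", "T2", "T3"),
    stringsAsFactors = FALSE
)
prepared <- prepareTipMetadata(toyTips, toyMeta)
names(prepared) # "Taxa" "Subpopulation"
```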

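```r
fig(10, 10)
# `%<+%` attaches taxaMetadata to the tree; its first column (here `Taxa`)
# must match the tree's tip labels.
# Color by the metadata's `Subpopulation` column (edit to match your file).
phyloTree_UPGMA |>
    ggtree(size = 1.5, layout = "circular", branch.length = "none") %<+%
    taxaMetadata +
    aes(color = Subpopulation)
```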

## References and additional resources

To cite rTASSEL, please use the following citation:

Monier et al., (2022). rTASSEL: An R interface to TASSEL for analyzing genomic diversity. Journal of Open Source Software, 7(76), 4530, https://doi.org/10.21105/joss.04530.

You can find more information about rTASSEL [here](https://rtassel.maizegenetics.net) and an rTASSEL tutorial in Binder [here](https://mybinder.org/v2/gh/btmonier/rTASSEL_sandbox/HEAD?labpath=getting_started.ipynb).

**Please also cite QBMS using the following citation:**

Al-Shamaa K (2023). QBMS: Query the Breeding Management System(s). R package version 0.9.1, https://icarda-git.github.io/QBMS/.