# Tutorial: Principal Component Analysis (PCA) using rTASSEL

## Enter your title here  

**Objective**: Describe the objective of this analysis   
**Data**: Describe your data       
**User and contact**: your name, your contact     

### Table of contents
* [Notes](#Notes) 
* [Libraries](#Libraries)
* [Parameters and functions](#Parameters-and-functions)
* [Data](#Data)
* [Analysis](#Analysis)
    + [Filter genotype data in rTASSEL](#Filter-genotype-data-in-rTASSEL)
    + [PCA with genotype data](#PCA-with-genotype-data)
    + [Add metadata to scatterplot](#Add-metadata-to-scatterplot)
* [References and additional resources](#References-and-additional-resources)

## Notes

This tutorial assumes: 
1. You already know how to load your data from a flat file into rTASSEL and inspect it:
    - See 01_rTASSEL_Load_Data.ipynb for a tutorial on how to load flat files into rTASSEL
2. You will filter your genotype data as appropriate for your data set and analysis:
    - See 02_rTASSEL_GenotypeFiltering.ipynb for a tutorial on how to filter genotype data in rTASSEL.
3. You have a csv file for metadata with a "Taxa" field that matches the taxa in your genotype file. 

More on the `pca()` function can be found [here](https://rtassel.maizegenetics.net/reference/pca.html), `plotScree()` [here](https://rtassel.maizegenetics.net/reference/plotScree.html) and `plotPCA()` [here](https://rtassel.maizegenetics.net/reference/plotPCA.html).

In [None]:
getwd()

In [None]:
Sys.Date()

## Libraries

In [None]:
library(rTASSEL) 

## Parameters and functions

**Please edit the paths to your data:**

In [None]:
# Path to genotype data
myGenoPath <- "/path/to/genotype/data"

# Path to taxa metadata 
myMetadataPath <- "/path/to/metadata"

Create a function for setting the dimensions of a plot:

In [None]:
# Set the width and height of plots rendered in the notebook
fig <- function(width, height) {
    options(
        repr.plot.width  = width, 
        repr.plot.height = height
    )
}

## Data

Load metadata into R:

In [None]:
taxaMetadata <- read.csv(file = myMetadataPath)
taxaMetadata |> head()
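
Before going further, it can help to confirm that the metadata satisfies the assumption from the Notes section (a "Taxa" field matching the taxa in the genotype file). The following is a minimal sketch in base R; the inlined CSV, the `taxaMetadataToy` and `genoTaxaToy` names, and the taxa values are all hypothetical stand-ins for your own data.

```r
# Hypothetical metadata, inlined so the sketch runs without a file on disk
taxaMetadataToy <- read.csv(text = "Taxa,Subpopulation
B73,Stiff stalk
Mo17,Non stiff stalk
W22,Non stiff stalk
Ki11,Tropical")

# Hypothetical taxa names standing in for those in the genotype file
genoTaxaToy <- c("B73", "Mo17", "W22", "CML247")

# The join key must exist and must not contain duplicate taxa
stopifnot("Taxa" %in% colnames(taxaMetadataToy))
stopifnot(!anyDuplicated(taxaMetadataToy$Taxa))

# Taxa in the genotype data that have no metadata row
noMetadata <- setdiff(genoTaxaToy, taxaMetadataToy$Taxa)
noMetadata
```

Any taxa reported here would have no metadata value available when the scatter plot is annotated later in the notebook.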

Load genotype data into rTASSEL:

In [None]:
tasGeno <- rTASSEL::readGenotypeTableFromPath(
    path = myGenoPath
)
tasGeno

## Analysis

### Filter genotype data in rTASSEL

Perform additional filtering steps in rTASSEL for your data set and analysis:  
- See 02_rTASSEL_GenotypeFiltering.brapi.ipynb for more details about filtering.

In [None]:
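# Example only: uncomment, adjust the thresholds for your data,
# and assign the result if you want to use it downstream
#tasGeno |>
#    filterGenotypeTableTaxa(
#        minNotMissing = 0.5
#    ) |>
#    filterGenotypeTableSites(
#        siteMinAlleleFreq = 0.05,
#        maxHeterozygous = 0.5
#    )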
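
To make the site thresholds in the commented example more concrete, here is a small base R illustration of what a minimum allele frequency of 0.05 and a maximum heterozygosity of 0.5 mean. It is only a sketch on a hypothetical toy matrix coded as alternate allele counts (0, 1, 2); it is not how rTASSEL or TASSEL implement filtering, and their handling of missing data and threshold boundaries may differ.

```r
# Toy genotype matrix: rows are taxa, columns are sites,
# values are counts of the alternate allele (0, 1, or 2); NA is a missing call
geno <- matrix(
    c(0, 0, 2, 1,
      0, 0, 2, NA,
      0, 1, 0, 1,
      0, 0, 2, 2),
    nrow = 4, byrow = TRUE,
    dimnames = list(paste0("taxon", 1:4), paste0("site", 1:4))
)

# Alternate allele frequency per site, ignoring missing calls
altFreq <- colMeans(geno, na.rm = TRUE) / 2

# Minor allele frequency is the smaller of the two allele frequencies
maf <- pmin(altFreq, 1 - altFreq)

# Proportion of heterozygous calls per site
hetProp <- colMeans(geno == 1, na.rm = TRUE)

# Keep sites that pass both thresholds used in the commented example
keep <- maf >= 0.05 & hetProp <= 0.5
genoFiltered <- geno[, keep, drop = FALSE]
colnames(genoFiltered)
```

In this toy case the monomorphic site and the site with mostly heterozygous calls are dropped, which is the kind of site that tends to add noise rather than signal to a PCA.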

### PCA with genotype data 

Run principal component analysis on your genotype data using the `pca()` function in rTASSEL:

In [None]:
pcaGeno <- tasGeno |> rTASSEL::pca()

In [None]:
pcaGeno

In [None]:
pcaGeno |> reportNames()

In [None]:
pcaGeno |> tableReport("Eigenvalues_Datum") |> head()

Set plot dimensions with the `fig()` function created at the start of the notebook:

In [None]:
fig(10,10)

Create a scree plot using the eigenvalues generated in your PCA with the `plotScree()` function:

In [None]:
pcaGeno |> plotScree()
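
A scree plot shows how much of the total variance each principal component accounts for. As a rough illustration of the arithmetic behind it, the sketch below uses base R's `prcomp()` on a hypothetical toy matrix rather than rTASSEL output; the values in your "Eigenvalues_Datum" report come from TASSEL and may be computed with different encoding or scaling choices.

```r
# Toy numeric genotype matrix (taxa in rows, sites in columns); hypothetical values
geno <- matrix(
    c(0, 2, 1, 0,
      0, 2, 2, 0,
      1, 1, 2, 0,
      2, 0, 0, 1,
      2, 0, 1, 2,
      2, 1, 0, 2),
    nrow = 6, byrow = TRUE,
    dimnames = list(paste0("taxon", 1:6), paste0("site", 1:4))
)

# Base R PCA on the centered (unscaled) matrix
pc <- prcomp(geno, center = TRUE, scale. = FALSE)

# Eigenvalues are the variances of the principal components
eigenvalues <- pc$sdev^2

# Proportion and cumulative proportion of variance explained
propVar <- eigenvalues / sum(eigenvalues)
cumVar  <- cumsum(propVar)

# The table a scree plot is drawn from: one row per component
data.frame(
    PC         = paste0("PC", seq_along(eigenvalues)),
    eigenvalue = eigenvalues,
    proportion = propVar,
    cumulative = cumVar
)
```

A common way to read the result is to look for the component after which the proportions level off, and to plot only the components before that point.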

In [None]:
pcaGeno |> tableReport("PC_Datum") |> head()
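
The PC table holds one row per taxon and one column per component. To show where such scores come from in general, the following base R sketch projects a centered toy matrix onto the component axes and checks the result against `prcomp()`. The data are hypothetical, and this is a generic illustration of PCA scores, not a reproduction of TASSEL's calculation.

```r
# Toy numeric genotype matrix (taxa in rows, sites in columns); hypothetical values
geno <- matrix(
    c(0, 2, 1, 0,
      0, 2, 2, 0,
      1, 1, 2, 0,
      2, 0, 0, 1,
      2, 0, 1, 2,
      2, 1, 0, 2),
    nrow = 6, byrow = TRUE,
    dimnames = list(paste0("taxon", 1:6), paste0("site", 1:4))
)

pc <- prcomp(geno, center = TRUE, scale. = FALSE)

# Center each site (column) at zero, then project onto the loadings
centered     <- scale(geno, center = TRUE, scale = FALSE)
manualScores <- centered %*% pc$rotation

# The manual projection matches the scores returned by prcomp()
all.equal(manualScores, pc$x, check.attributes = FALSE)

# Scores for the first two components, one row per taxon
head(manualScores[, 1:2])
```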

Create a scatter plot with your chosen principal components using `plotPCA()`:

In [None]:
pcaGeno |> plotPCA(
    x = 1,
    y = 2
)

### Add metadata to scatterplot

In [None]:
taxaMetadata |> head()

In [None]:
pcaGeno |> plotPCA(
    x = 1,
    y = 2,
    metadata = taxaMetadata,
    mCol = "Subpopulation"
)
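
Conceptually, annotating the scatter plot amounts to matching each taxon's PC scores with its metadata row through the shared "Taxa" field. The sketch below shows that join in base R with hypothetical toy data and `prcomp()` scores; it is an illustration of the idea only, and `plotPCA()` may handle the matching differently internally.

```r
# Toy numeric genotype matrix (taxa in rows, sites in columns); hypothetical values
geno <- matrix(
    c(0, 2, 1, 0,
      0, 2, 2, 0,
      1, 1, 2, 0,
      2, 0, 0, 1,
      2, 0, 1, 2,
      2, 1, 0, 2),
    nrow = 6, byrow = TRUE,
    dimnames = list(paste0("taxon", 1:6), paste0("site", 1:4))
)

# First two PC scores as a data frame with an explicit Taxa column
scores <- as.data.frame(prcomp(geno)$x[, 1:2])
scores$Taxa <- rownames(scores)

# Hypothetical metadata with the same Taxa values
taxaMetadataToy <- data.frame(
    Taxa          = paste0("taxon", 1:6),
    Subpopulation = c("A", "A", "A", "B", "B", "B")
)

# Join on Taxa; all.x = TRUE keeps taxa without metadata (filled with NA)
plotData <- merge(scores, taxaMetadataToy, by = "Taxa", all.x = TRUE)

# Mean PC1 score per subpopulation, a quick numeric view of group separation
aggregate(PC1 ~ Subpopulation, data = plotData, FUN = mean)
```

If the groups separate along a component, their mean scores on that component differ noticeably, which is what the colored scatter plot shows visually.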

## References and additional resources

To cite rTASSEL, please use the following citation:

Monier et al., (2022). rTASSEL: An R interface to TASSEL for analyzing genomic diversity. Journal of Open Source Software, 7(76), 4530, https://doi.org/10.21105/joss.04530.

You can find more information about rTASSEL [here](https://rtassel.maizegenetics.net) and an rTASSEL tutorial in Binder [here](https://mybinder.org/v2/gh/btmonier/rTASSEL_sandbox/HEAD?labpath=getting_started.ipynb).