# Tutorial: Calculate and visualize linkage disequilibrium with rTASSEL

## Enter your notebook title here

**Objective**: Describe the objective here  
**Data**: Describe your data set here  
**User and contact**: Enter your name and contact here

### Table of contents
* [Notes](#Notes)
* [Libraries](#Libraries)
* [Parameters and functions](#Parameters-and-functions)
* [Data](#Data)
* [Analysis](#Analysis)
    + [Filter TASSEL genotype object by position](#Filter-TASSEL-genotype-object-by-position)
    + [Generate linkage disequilibrium metrics](#Generate-linkage-disequilbrium-metrics)
* [Visualize](#Visualize)
* [References and additional resources](#References-and-additional-resources)

## Notes

This tutorial assumes: 
1. You already know how to load data from a BrAPI database into rTASSEL and how to inspect it:
    - See 01_rTASSEL_load_data.brapi.ipynb for how to load data from BrAPI databases.
2. You will filter your genotype data as appropriate for your data set and analysis:
    - See 02_rTASSEL_GenotypeFiltering.brapi.ipynb for a tutorial on filtering genotype data retrieved via BrAPI with rTASSEL.

About calculating LD in rTASSEL:   

Linkage disequilibrium (LD) between any set of polymorphisms can be estimated by first filtering a genotype data set and then calling `linkageDiseq()` or `ldPlot()`. Currently, $D'$, $r^2$, and $P$-values are estimated. LD is calculated only between haplotypes with known phase (unphased diploid genotypes are not supported; see PowerMarker or Arlequin for genotype support).

- $D'$ is the standardized disequilibrium coefficient, a useful statistic for determining whether recombination or homoplasy has occurred between a pair of alleles.

- $r^2$ represents the correlation between alleles at two loci, which is informative for evaluating the resolution of association approaches. 
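For reference, these are the standard textbook definitions for two biallelic loci with alleles $A/a$ and $B/b$, haplotype frequency $p_{AB}$, and allele frequencies $p_A$ and $p_B$. They are background only; details of TASSEL's implementation (for example, whether $D'$ is reported as an absolute value) may differ, so check the rTASSEL documentation.

$$
D = p_{AB} - p_A p_B
$$

$$
D' = \frac{D}{D_{\max}}, \qquad
D_{\max} = \begin{cases}
\min\{p_A(1-p_B),\ (1-p_A)p_B\} & \text{if } D > 0 \\
\min\{p_A p_B,\ (1-p_A)(1-p_B)\} & \text{if } D < 0
\end{cases}
$$

$$
r^2 = \frac{D^2}{p_A(1-p_A)\,p_B(1-p_B)}
$$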

$D'$ and $r^2$ can be calculated only when two alleles are present. If more than two alleles are present, only the two most frequent alleles are used. $P$-values are determined by a two-sided Fisher's exact test. Because LD is not meaningful when scored with very small sample sizes, at least 20 taxa must be present to calculate LD, and there must be 2 or more minor alleles.
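To make these statistics concrete, here is a minimal base-R sketch that computes $D$, $D'$, $r^2$, and a two-sided Fisher's exact test $P$-value for one pair of biallelic loci from a 2 x 2 table of phased haplotype counts. The helper `ld_stats()` and its thresholds are hypothetical and only illustrate the textbook formulas and the sample-size rules above; it is not rTASSEL's internal code. Interpreting "2 or more minor alleles" as a minor allele count of at least 2 at each locus is an assumption.

```r
# Hypothetical helper (not part of rTASSEL): textbook LD statistics for two
# biallelic loci. `counts` is a 2 x 2 matrix of phased haplotype counts:
#   rows = alleles A / a at locus 1, columns = alleles B / b at locus 2.
ld_stats <- function(counts, min_taxa = 20, min_minor = 2) {
  stopifnot(is.matrix(counts), all(dim(counts) == c(2, 2)))
  n <- sum(counts)
  # Allele counts at each locus (row and column margins)
  locus1 <- rowSums(counts)
  locus2 <- colSums(counts)
  # Assumed sample-size rules: enough haplotypes and >= 2 minor alleles per locus
  if (n < min_taxa || min(locus1) < min_minor || min(locus2) < min_minor) {
    return(c(D = NA, Dprime = NA, r2 = NA, p = NA))
  }
  pAB <- counts[1, 1] / n
  pA  <- locus1[[1]] / n
  pB  <- locus2[[1]] / n
  D   <- pAB - pA * pB
  # Lewontin's normalization: D_max depends on the sign of D
  Dmax <- if (D >= 0) {
    min(pA * (1 - pB), (1 - pA) * pB)
  } else {
    min(pA * pB, (1 - pA) * (1 - pB))
  }
  Dprime <- D / Dmax
  r2 <- D^2 / (pA * (1 - pA) * pB * (1 - pB))
  # Two-sided Fisher's exact test on the 2 x 2 haplotype table
  p <- fisher.test(counts, alternative = "two.sided")$p.value
  c(D = D, Dprime = Dprime, r2 = r2, p = p)
}

# Example: 40 phased haplotypes, mostly AB and ab
hap <- matrix(c(18, 2, 2, 18), nrow = 2,
              dimnames = list(c("A", "a"), c("B", "b")))
ld_stats(hap)
# D = 0.2, Dprime = 0.8, r2 = 0.64; p is far below 0.001
```

With fewer than 20 haplotypes (for example, `matrix(c(5, 1, 1, 5), nrow = 2)`), the sketch returns `NA` for every statistic, mirroring the minimum-sample rule described above.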
 
Additional documentation on the `linkageDiseq()` function in rTASSEL can be found [here](https://rtassel.maizegenetics.net/reference/linkageDiseq.html) and `ldPlot()` [here](https://rtassel.maizegenetics.net/reference/ldPlot.html).

In [None]:
getwd()

In [None]:
Sys.Date()

## Libraries

In [None]:
library(QBMS)
library(rTASSEL)

## Parameters and functions

Create a function for setting the dimensions of a plot:

In [None]:
# Set the width and height (in inches) of plots displayed in the notebook
fig <- function(width, height) {
    options(
        repr.plot.width  = width,
        repr.plot.height = height
    )
}

## Data

### Retrieve BrAPI data and filter

**You will need to log in to Gigwa using the BrAPI helper.**

In [None]:
geno_provider$gigwa_list_dbs()

**Please edit the code to set your database (db):**

In [None]:
geno_provider$gigwa_set_db("myDatabase")

In [None]:
geno_provider$gigwa_list_projects()

**Please edit the code to set your project:**

In [None]:
geno_provider$gigwa_set_project("myProject")

**Edit the code below to use filters appropriate for your data set and analysis. Additional filtering can be done after the data are retrieved and loaded into rTASSEL.**

In [None]:
genoDataFromGigwa <- geno_provider$gigwa_get_variants(
    max_missing = 0.2,   # maximum proportion of missing calls allowed
    min_maf     = 0.05   # minimum minor allele frequency
)

#### Inspect genotype data in R

In [None]:
genoDataFromGigwa |> head()
genoDataFromGigwa |> dim()
genoDataFromGigwa |> names()

#### Load genotype data into rTASSEL

In [None]:
tasGeno <- genoDataFromGigwa |> rTASSEL::readGenotypeTableFromGigwa()

In [None]:
tasGeno

**Perform filtering steps in rTASSEL for your data set and analysis:**

- See 02_rTASSEL_GenotypeFiltering.ipynb for more details about filtering.

## Analysis

### Filter TASSEL genotype object by position
**Please edit the chromosome and position range for your region of interest:**

In [None]:
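# Keep only sites on chromosome 2 between 228 Mb and 300 Mb
tasGenoFilter <- filterGenotypeTableSites(
    tasObj              = tasGeno,
    siteRangeFilterType = "position",  # filter by physical position rather than site index
    startPos            = 228e6,
    endPos              = 300e6,
    startChr            = 2,
    endChr              = 2
)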

<a id="Generate-linkage-disequilbrium-metrics"></a>
### Generate linkage disequilibrium metrics

This is an example of how to use the `linkageDiseq()` function. Please edit the parameters for your analysis; see [here](https://rtassel.maizegenetics.net/reference/linkageDiseq.html) for a description of the options.
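With `ldType = "All"`, every site is compared with every other site, so the number of comparisons grows quadratically with the number of sites left after filtering. The short base-R check below illustrates that growth; the helper `n_pairs_all()` is hypothetical and assumes each unordered pair of sites is evaluated once.

```r
# Hypothetical helper: number of unique site pairs when every site is
# compared with every other site, i.e. n * (n - 1) / 2
n_pairs_all <- function(n_sites) choose(n_sites, 2)

# Growth for 100, 1,000, and 10,000 sites after filtering
n_pairs_all(c(100, 1000, 10000))
# 4950 499500 49995000
```

If the count is very large, narrowing the position range or using `ldType = "SlidingWindow"` with a `windowSize`, which restricts comparisons to nearby sites, keeps the computation and output size manageable.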

In [None]:
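ldCalc <- linkageDiseq(
    tasGenoFilter,
    ldType     = "All",     # compute LD between all pairs of sites
    windowSize = NULL,      # only needed when ldType = "SlidingWindow"
    hetCalls   = "ignore",  # how heterozygous calls are handled; see the linkageDiseq() docs
    verbose    = TRUE
)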

In [None]:
ldCalc |> head()
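A common next step is to summarize how $r^2$ decays with physical distance. The sketch below bins site pairs by distance and averages $r^2$ within each bin using only base R. The helper `ld_decay_summary()` is hypothetical, and the default column names `Dist_bp` and `R^2` are assumptions based on TASSEL's LD report; check `names(ldCalc)` and pass the correct names if yours differ.

```r
# Hypothetical helper: mean r^2 per physical-distance bin (simple LD-decay summary).
# Column names are assumptions; override dist_col / r2_col to match your LD table.
ld_decay_summary <- function(ld_df, dist_col = "Dist_bp", r2_col = "R^2",
                             bin_size = 1e5) {
  # Coerce to numeric; non-numeric entries become NA
  dist <- suppressWarnings(as.numeric(ld_df[[dist_col]]))
  r2   <- suppressWarnings(as.numeric(ld_df[[r2_col]]))
  # Drop pairs without a usable distance or r^2 (NA or NaN)
  keep <- !is.na(dist) & !is.na(r2)
  # Assign each pair to the start of its distance bin
  bin  <- floor(dist[keep] / bin_size) * bin_size
  out  <- aggregate(r2[keep], by = list(bin_start = bin), FUN = mean)
  names(out)[2] <- "mean_r2"
  out
}

# Toy input standing in for linkageDiseq() output
toy <- data.frame(Dist_bp = c(5e4, 8e4, 1.5e5, 2.5e5, NA),
                  `R^2`   = c(0.8, 0.6, 0.4, 0.1, 0.5),
                  check.names = FALSE)
ld_decay_summary(toy)
```

In this notebook it could be called as `ld_decay_summary(ldCalc, bin_size = 1e6)`; a 1 Mb bin is only a suggestion given that the example region spans about 72 Mb.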

## Visualize 

You can also use your filtered TASSEL object to visualize linkage disequilibrium in a plot generated by rTASSEL.

This is an example of how to use the `ldPlot()` function. Please edit the parameters for your analysis; see [here](https://rtassel.maizegenetics.net/reference/ldPlot.html) for a description of the options.

In [None]:
myLDplot <- ldPlot(
    tasObj     = tasGenoFilter,
    ldType     = "All",
    windowSize = NULL,
    hetCalls   = "ignore",
    plotVal    = "r2",  # statistic to plot; see the ldPlot() docs for other options
    verbose    = TRUE
)

Before displaying the plot, set its dimensions with the `fig()` function defined at the start of the notebook:

In [None]:
fig(10, 10)

Display plot:

In [None]:
myLDplot

## References and additional resources

To cite rTASSEL, please use the following citation:

Monier et al., (2022). rTASSEL: An R interface to TASSEL for analyzing genomic diversity. Journal of Open Source Software, 7(76), 4530, https://doi.org/10.21105/joss.04530.

**Please also cite QBMS using the following citation:**

Al-Shamaa K (2023). QBMS: Query the Breeding Management System(s). R package version 0.9.1, https://icarda-git.github.io/QBMS/.

You can find more information about rTASSEL [here](https://rtassel.maizegenetics.net) and an rTASSEL tutorial on Binder [here](https://mybinder.org/v2/gh/btmonier/rTASSEL_sandbox/HEAD?labpath=getting_started.ipynb).