# Tutorial: Calculate relationship matrices with rTASSEL

## Enter your notebook title here

**Objective**: Describe the objective here  
**Data**: Describe your data set here  
**User and contact**: Enter your name and contact here

### Table of contents
* [Notes](#Notes)
* [Libraries](#Libraries)
* [Data](#Data)
* [Analysis](#Analysis)
    + [Distance matrix](#Distance-matrix)
    + [Kinship matrix](#Kinship-matrix)
    + [Working with `TasselDistanceMatrix` objects](#Working-with-TasselDistanceMatrix-objects)
    + [Coerce a `TasselDistanceMatrix` into an R object](#Coerce-a-TasselDistanceMatrix-into-an-R-object)
* [References and additional resources](#References-and-additional-resources)

## Notes

This tutorial assumes: 
1. You already know how to load your data via a BrAPI database into rTASSEL and will inspect your data:
    - See 01_rTASSEL_load_data.brapi.ipynb on how to load files via BrAPI databases.
2. You will filter your genotype data as appropriate for your data set and analysis:
    - See 02_rTASSEL_GenotypeFiltering.brapi.ipynb for a tutorial on how to filter genotype data when retrieving data via BrAPI and using rTASSEL.

Additional documentation on the `distanceMatrix()` function in rTASSEL can be found [here](https://rtassel.maizegenetics.net/reference/distanceMatrix.html) and `kinshipMatrix()` [here](https://rtassel.maizegenetics.net/reference/kinshipMatrix.html).

In [None]:
getwd()

In [None]:
Sys.Date()

## Libraries

In [None]:
library(QBMS)
library(rTASSEL)

## Data

### Retrieve BrAPI data and filter

**You will need to log into Gigwa using the BrAPI helper.**

In [None]:
geno_provider$gigwa_list_dbs()

**Please edit the code to set your database (db):**

In [None]:
geno_provider$gigwa_set_db("myDatabase")

In [None]:
geno_provider$gigwa_list_projects()

**Please edit the code to set your project:**

In [None]:
geno_provider$gigwa_set_project("myProject")

**Edit the code below to use filters appropriate for your data set and analysis. Additional filtering can be done after retrieving the data and loading it into rTASSEL.**

In [None]:
genoDataFromGigwa <- geno_provider$gigwa_get_variants(
    max_missing = 0.2,
    min_maf = 0.05
)
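
As a rough illustration of what missing-data and minor allele frequency (MAF) thresholds do, the base R sketch below applies the same two cutoffs to a small made-up genotype matrix. The matrix `toyGeno`, its 0/1/2 coding, and the per-site interpretation of the thresholds are assumptions for illustration only; the helper's own definitions of `max_missing` and `min_maf` may differ.

```r
# Hypothetical toy data: rows = samples, columns = sites,
# values = count of the alternate allele (0, 1, 2), NA = missing call
toyGeno <- matrix(
    c(0, 1, 2, 1, 0,      # site1
      0, 0, NA, NA, 0,    # site2: 40% missing
      2, 2, 2, 2, 2,      # site3: monomorphic
      0, 1, 2, 2, 2),     # site4
    nrow = 5,
    dimnames = list(paste0("sample", 1:5), paste0("site", 1:4))
)

# Proportion of missing calls per site
missingRate <- colMeans(is.na(toyGeno))

# Alternate allele frequency per site, then fold to the minor allele
altFreq <- colMeans(toyGeno, na.rm = TRUE) / 2
maf <- pmin(altFreq, 1 - altFreq)

# Keep sites that pass both thresholds (same values as in the call above)
keepSite <- missingRate <= 0.2 & maf >= 0.05
toyGenoFiltered <- toyGeno[, keepSite, drop = FALSE]
colnames(toyGenoFiltered)
```

In this toy example, `site2` fails the missing-data cutoff and `site3` fails the MAF cutoff, so only `site1` and `site4` remain.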

#### Inspect genotype data in R

In [None]:
genoDataFromGigwa |> head()
genoDataFromGigwa |> dim()
genoDataFromGigwa |> names()

#### Load genotype data into rTASSEL

In [None]:
tasGeno <- genoDataFromGigwa |> rTASSEL::readGenotypeTableFromGigwa()

In [None]:
tasGeno

**Perform filtering steps in rTASSEL for your data set and analysis:**
    - See 02_rTASSEL_GenotypeFiltering.ipynb for more details about filtering.

## Analysis

### Distance matrix

TASSEL/rTASSEL calculate distance as 1 - IBS (identity by state) similarity, with IBS defined as the probability that alleles drawn at random from two individuals at the same locus are the same. For clustering, the distance of an individual from itself is set to 0. More about how the distance matrix is calculated in TASSEL/rTASSEL can be found [here](https://bitbucket.org/tasseladmin/tassel-5-source/wiki/UserManual/DistanceMatrix/DistanceMatrix).

In [None]:
tasDist <- rTASSEL::distanceMatrix(
    tasObj = tasGeno
)
tasDist
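
To make the 1 - IBS definition above concrete, here is a minimal base R sketch on a made-up genotype matrix. It is not TASSEL's implementation: the toy data `toyGeno`, the 0/1/2 allele-count coding, and the absence of missing calls are all assumptions for illustration.

```r
# Hypothetical toy data: rows = taxa, columns = sites,
# values = count of the alternate allele in a diploid (0, 1, 2)
toyGeno <- matrix(
    c(0, 2, 1, 0,
      0, 2, 1, 2,
      2, 0, 1, 2),
    nrow = 3, byrow = TRUE,
    dimnames = list(c("taxonA", "taxonB", "taxonC"), paste0("site", 1:4))
)

# Within-taxon frequency of the alternate allele at each site (0, 0.5, or 1)
p <- toyGeno / 2

# Probability that one allele drawn from each taxon is identical,
# averaged over sites: P(both alt) + P(both ref)
toyIBS <- (p %*% t(p) + (1 - p) %*% t(1 - p)) / ncol(toyGeno)

# Distance is 1 - IBS; self-distance is set to 0 by convention
toyDist <- 1 - toyIBS
diag(toyDist) <- 0
toyDist
```

The diagonal has to be set explicitly because a heterozygous site contributes an IBS probability of 0.5 even when a taxon is compared with itself.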

### Kinship matrix

rTASSEL will create a kinship matrix from genotype data with the `kinshipMatrix()` function. You can then use this matrix in downstream analyses, such as in mixed linear model analyses.

In [None]:
tasKin <- rTASSEL::kinshipMatrix(
    tasObj = tasGeno
)
tasKin

`kinshipMatrix()` lets you choose the algorithm with the `method`
parameter. The default is `Centered_IBS`. Other options include `Normalized_IBS`, `Dominance_Centered_IBS`, and `Dominance_Normalized_IBS`. More information about these methods can be found
[here](https://bitbucket.org/tasseladmin/tassel-5-source/wiki/UserManual/Kinship/Kinship).

For example: 

In [None]:
tasKinNorm <- kinshipMatrix(
  tasObj = tasGeno,
  method = "Normalized_IBS",
  maxAlleles = 3,
  algorithmVariation = "Observed_Allele_Freq"
)
tasKinNorm
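
For intuition about what a centered kinship calculation involves, the base R sketch below computes one common centered formulation, $K = WW^\top / (2 \sum_k p_k (1 - p_k))$, on made-up data. This is an assumption about the general idea only and is not TASSEL's code; the toy matrix `toyGeno` and the 0/1/2 coding are hypothetical, and the TASSEL documentation linked above is the reference for the exact definition of each `method`.

```r
# Hypothetical toy data: rows = taxa, columns = sites,
# values = count of the alternate allele (0, 1, 2), no missing calls
toyGeno <- matrix(
    c(0, 1, 2, 0, 1,
      0, 2, 2, 1, 1,
      2, 0, 1, 2, 0,
      1, 1, 0, 2, 2),
    nrow = 4, byrow = TRUE,
    dimnames = list(paste0("taxon", LETTERS[1:4]), paste0("site", 1:5))
)

# Alternate allele frequency at each site
p <- colMeans(toyGeno) / 2

# Center each site by its expected allele count, 2p
W <- sweep(toyGeno, 2, 2 * p)

# Cross-product of centered genotypes, scaled by the summed site variance
toyKin <- tcrossprod(W) / (2 * sum(p * (1 - p)))
round(toyKin, 3)
```

Because every site is centered on its mean, each row of `toyKin` sums to zero, which is why centered kinship matrices contain negative off-diagonal values.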

### Working with `TasselDistanceMatrix` objects

`distanceMatrix()` and `kinshipMatrix()` generate a pairwise matrix (i.e. $m \times m$
dimensions, with one row and one column per taxon). The returned object is an rTASSEL class, `TasselDistanceMatrix`.
To look at an example of a `TasselDistanceMatrix` object, print either of the objects created above:

In [None]:
tasKin # displays the first four rows and columns and the last row and column

This object, similar to the `TasselGenotypePhenotype` class, essentially holds
pointers to the Java/TASSEL object in memory. Some base R methods that work on `matrix` objects can also be used with `TasselDistanceMatrix` objects. For example:

In [None]:
tasKin |> colnames() |> head()
tasKin |> rownames() |> head()

tasKin |> dim()

tasKin |> nrow()
tasKin |> ncol()

A `TasselDistanceMatrix` object from `kinshipMatrix()` can be used in subsequent analyses in rTASSEL, for example:

In [None]:
# Fit a mixed linear model (MLM) with a kinship matrix
#tasMLM <- rTASSEL::assocModelFitter(
#    tasObj = tasGeno,
#    formula = yourTrait ~ .,
#    fitMarkers = TRUE,
#    kinship = tasKin
#)

You can read more about association analysis in rTASSEL [here](https://rtassel.maizegenetics.net/articles/rtassel_walkthrough.html).

### Coerce a `TasselDistanceMatrix` into an R object

Additional R methods (e.g. plotting, new models, etc.) can be used if the `TasselDistanceMatrix` object is coerced into a general R data object, in this case a `matrix`
object, using the base method `as.matrix()`:

In [None]:
tasKinR <- tasKin |> as.matrix()

We can inspect the first 5 rows and columns of our new R matrix object:

In [None]:
tasKinR[1:5, 1:5]
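
Once the data are in a plain `matrix`, any base R matrix tool applies. As one possible example, the sketch below runs an eigendecomposition, a common first step for looking at population structure. The small matrix `toyKin` is a made-up stand-in for `tasKinR`, so the values are assumptions for illustration only.

```r
# Hypothetical stand-in for a coerced kinship matrix such as tasKinR
taxa <- c("taxonA", "taxonB", "taxonC")
toyKin <- matrix(
    c( 1.0, 0.2, -0.1,
       0.2, 1.0,  0.3,
      -0.1, 0.3,  1.0),
    nrow = 3, byrow = TRUE,
    dimnames = list(taxa, taxa)
)

# A kinship matrix should be square and symmetric
stopifnot(isSymmetric(toyKin))

# Eigendecomposition: eigenvectors act as principal axes of relatedness
eig <- eigen(toyKin, symmetric = TRUE)

# Share of total variation captured by each axis
propVar <- eig$values / sum(eig$values)
round(propVar, 3)

# Coordinates of each taxon on the first two axes
pcs <- eig$vectors[, 1:2]
rownames(pcs) <- rownames(toyKin)
colnames(pcs) <- c("PC1", "PC2")
pcs
```

With a real matrix you would replace `toyKin` with `tasKinR`; the `pcs` coordinates can then be passed to `plot()` or another plotting package.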

The kinship matrix computed in rTASSEL can now be used in other R packages, for example for genomic prediction with the R package *sommer*.

You can also write the `matrix` object to a file to save:

In [None]:
write.table(tasKinR, file = "tasKin.txt")
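
If you plan to read the file back into R or open it in other software, it can help to write it as tab-delimited text with an explicit header cell for the row names. The sketch below shows one way to do a write/read round trip using only base R. The matrix `toyKin` is a made-up stand-in for `tasKinR`, and a temporary file is used so nothing is written to your working directory.

```r
# Hypothetical stand-in for a coerced kinship matrix such as tasKinR
taxa <- c("taxonA", "taxonB", "taxonC")
toyKin <- matrix(
    c( 1.0, 0.2, -0.1,
       0.2, 1.0,  0.3,
      -0.1, 0.3,  1.0),
    nrow = 3, byrow = TRUE,
    dimnames = list(taxa, taxa)
)

outFile <- tempfile(fileext = ".txt")

# col.names = NA adds an empty header cell above the row names,
# so the header has the same number of fields as the data rows
write.table(toyKin, file = outFile, sep = "\t", quote = FALSE, col.names = NA)

# Read it back, using the first column as row names and
# keeping the taxa names exactly as written
toyKinBack <- as.matrix(
    read.table(outFile, header = TRUE, sep = "\t",
               row.names = 1, check.names = FALSE)
)

# Confirm that values and taxa names survived the round trip
all.equal(toyKin, toyKinBack)
```

Setting `check.names = FALSE` matters for real data, because taxa names containing characters such as `-` or starting with a digit would otherwise be altered when read back.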

## References and additional resources

To cite rTASSEL, please use the following citation:

Monier et al., (2022). rTASSEL: An R interface to TASSEL for analyzing genomic diversity. Journal of Open Source Software, 7(76), 4530, https://doi.org/10.21105/joss.04530.

You can find more information about rTASSEL [here](https://rtassel.maizegenetics.net) and an rTASSEL tutorial in Binder [here](https://mybinder.org/v2/gh/btmonier/rTASSEL_sandbox/HEAD?labpath=getting_started.ipynb).