# Tutorial: Use BrAPI to load phenotype and genotype data into rTASSEL 

## Enter your notebook title here

**Objective**: Enter your objective here  
**Data**: Describe your data set here  
**User and contact**: Enter your name and contact here

### Table of contents
* [Notes](#Notes) 
* [Libraries](#Libraries)
* [Phenotype data](#Phenotype-data)
    + [Retrieve BrAPI data](#Retrieve-BrAPI-data)
    + [Inspect phenotype data in R](#Inspect-phenotype-data-in-R)
    + [Load phenotype data into rTASSEL](#Load-phenotype-data-into-rTASSEL)
* [Genotype data](#Genotype-data)
    + [Retrieve BrAPI data](#Retrieve-BrAPI-data)
    + [Inspect genotype data into R](#Inspect-genotype-data-into-R)
    + [Load genotype data into rTASSEL](#Load-genotype-data-into-rTASSEL)
* [Combine phenotype and genotype data into rTASSEL](#Combine-phenotype-and-genotype-data-into-rTASSEL)
* [References and additional resources](#References-and-additional-resources)

## Notes

If you are unfamiliar with the data format requirements for TASSEL and rTASSEL, please review the following resources:
- [TASSEL user manual](https://bitbucket.org/tasseladmin/tassel-5-source/wiki/UserManual/Load/Load)
- [Instructional video](https://www.youtube.com/watch?v=4W3Ohw6Zckg)

**rTASSEL genotype file requirements**:  

rTASSEL accepts the following genotype formats:
- hapmap (HMP)
- HDF5 (hierarchical data format version 5)
- VCF (variant call format)
- Plink

For more detail see the manual [here](https://bitbucket.org/tasseladmin/tassel-5-source/wiki/UserManual/Load/Load). 

**rTASSEL phenotype requirements**: 

In summary, rTASSEL expects phenotype data structured as follows: 
- A header that defines the data structure, followed by a body containing the main data. Use tabs as delimiters.
- Data are organized as a two-dimensional table with observations as rows and attributes as columns. The first attribute (column) must always be taxa. Subsequent columns can be of type data, covariate, or factor. Attributes of type "data" are modeled as dependent variables and must be numeric and continuous.

|`<Phenotype>`| | | |
|:---:|:---:|:---:|:---:|
|taxa|factor|data|covariate|
|Taxa|rep|EarHT|Q1|
|33-16|1|64.75|0.014|
    
For more detail see the [manual](https://bitbucket.org/tasseladmin/tassel-5-source/wiki/UserManual/Load/Load) under "Numerical Data."
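
As a rough illustration of the layout described above, the following base-R sketch writes a tiny tab-delimited phenotype file with the `<Phenotype>` header and the attribute-type row. The first data row is taken from the table above; the second row, the object names (`pheno`, `attr_types`, `out_file`), and the use of a temporary file are assumptions made for the example only.

```r
# Toy phenotype table: row 1 comes from the table above, row 2 is made up
pheno <- data.frame(
  Taxa  = c("33-16", "38-11"),
  rep   = c(1L, 1L),
  EarHT = c(64.75, 92.25),
  Q1    = c(0.014, 0.003),
  stringsAsFactors = FALSE
)

# One attribute type per column, in column order (taxa must come first)
attr_types <- c("taxa", "factor", "data", "covariate")
stopifnot(length(attr_types) == ncol(pheno))

# Write the two header lines, then the column names and the data body
out_file <- tempfile(fileext = ".txt")
con <- file(out_file, open = "wt")
writeLines("<Phenotype>", con)
writeLines(paste(attr_types, collapse = "\t"), con)
write.table(pheno, con, sep = "\t", quote = FALSE, row.names = FALSE)
close(con)

# Read the file back to inspect the layout
pheno_lines <- readLines(out_file)
pheno_lines
```

If you load phenotypes from a dataframe, as this notebook does later, you do not need to write such a file yourself. The sketch is only meant to make the expected structure concrete.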

In [None]:
getwd()

In [None]:
Sys.Date()

## Libraries

In [None]:
library(data.table) #Efficient I/O handling for delimited data
library(rTASSEL) #R interface to TASSEL
library(QBMS) #Retrieve data from BrAPI databases
library(tidyverse) #Data wrangling

## Phenotype data

**You will need to log in to BMS using the BrAPI helper.**

### Retrieve BrAPI data

In [None]:
pheno_provider$list_crops()

**Please edit the code to set your crop:**

In [None]:
pheno_provider$set_crop("myCrop")

In [None]:
pheno_provider$list_programs()

**Please edit the code to set your program:**

In [None]:
pheno_provider$set_program("myProgram")

In [None]:
pheno_provider$list_trials()

**Please edit the code to set your trial:**

In [None]:
pheno_provider$set_trial("myTrial")

In [None]:
pheno_provider$list_studies()

**Please edit the code to set your study:**

In [None]:
pheno_provider$set_study(study_name = "myStudy")

In [None]:
get_study_info()

In [None]:
PhenoDataFromGigwa <- get_study_data()

### Inspect phenotype data in R

In [None]:
PhenoDataFromGigwa |> head()
PhenoDataFromGigwa |> dim()
PhenoDataFromGigwa |> names()

**You will need to manipulate your dataframe to assign the correct class to your data.**  

For example:  

- In the code below, a new dataframe is created from the PhenoDataFromGigwa dataframe retrieved above.
- `select()` keeps only the columns that will be used in the subsequent analysis.
- `mutate()` modifies those columns, applying `as.numeric()` to convert character vectors ('chr') to numeric vectors ('dbl').

**You will need to edit the example for your own data.**

In [None]:
#PhenoDataFromGigwa_modified <- PhenoDataFromGigwa |> select(germplasmName, trait1, trait2, trait3, trait4) |>
#    mutate(
#        trait1 = as.numeric(trait1),
#        trait2 = as.numeric(trait2),
#        trait3 = as.numeric(trait3),
#        trait4 = as.numeric(trait4)
#        )
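
Because `as.numeric()` turns any text it cannot parse into `NA`, it may be worth counting such values before converting. The sketch below does this in base R on a toy dataframe. The dataframe `toy`, its column names, and its values are all made up for illustration and stand in for your own BrAPI data.

```r
# Toy stand-in for a retrieved phenotype table (all names and values are made up)
toy <- data.frame(
  germplasmName = c("G1", "G2", "G3"),
  trait1 = c("12.5", "13.1", "missing"),
  trait2 = c("3", "4", "5"),
  stringsAsFactors = FALSE
)
trait_cols <- c("trait1", "trait2")

# For each trait column, count entries that are present as text
# but would become NA when converted to numeric
bad_counts <- sapply(toy[trait_cols], function(x) {
  sum(!is.na(x) & is.na(suppressWarnings(as.numeric(x))))
})
bad_counts

# Convert the trait columns; unparseable entries become NA
toy_numeric <- toy
toy_numeric[trait_cols] <- lapply(
  toy[trait_cols],
  function(x) suppressWarnings(as.numeric(x))
)
str(toy_numeric)
```

A non-zero count points to entries such as text codes for missing values that you may want to inspect before the analysis.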

In [None]:
PhenoDataFromGigwa_modified |> head()
PhenoDataFromGigwa_modified |> dim()
PhenoDataFromGigwa_modified |> names()

### Load phenotype data into rTASSEL

**To load phenotype data into rTASSEL, you will need to modify your data to follow the TASSEL formatting requirements discussed at the top of this notebook.**

For example:  

- In the code below, an rTASSEL object 'tasPheno' is created by passing the PhenoDataFromGigwa_modified dataframe to readPhenotypeFromDataFrame(). 
- taxaID is set to 'germplasmName' since our dataframe does not use the default 'Taxa'.
- attributeTypes specifies the type of each non-taxa column, in column order. Here the first non-taxa column is set to 'data' and the following 3 columns are set to 'covariate'. If attributeTypes is not given, all non-taxa columns default to 'data'.

**You will need to edit the example for your own data.**

In [None]:
#tasPheno <- rTASSEL::readPhenotypeFromDataFrame(
#   phenotypeDF = PhenoDataFromGigwa_modified,
#   taxaID = "germplasmName",
#   attributeTypes = c("data", rep("covariate", 3))
#)
#tasPheno
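
Since attributeTypes is matched to the non-taxa columns by position, a quick check that the vector has the right length and only contains the types named in the notes at the top of this notebook can catch mistakes early. The following base-R sketch assumes the hypothetical column names from the commented example above; `pheno_cols`, `taxa_id`, and `type_map` are illustrative names, not part of rTASSEL.

```r
# Hypothetical column layout, matching the commented example above
pheno_cols <- c("germplasmName", "trait1", "trait2", "trait3", "trait4")
taxa_id <- "germplasmName"

# Columns that need an attribute type (everything except the taxa column)
non_taxa <- setdiff(pheno_cols, taxa_id)

# One type per non-taxa column, in column order
attribute_types <- c("data", rep("covariate", 3))

# Checks: one type per column, and only the types described in the notes
stopifnot(length(attribute_types) == length(non_taxa))
stopifnot(all(attribute_types %in% c("data", "covariate", "factor")))

# Show which type each column will receive
type_map <- setNames(attribute_types, non_taxa)
type_map
```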

## Genotype data

**You will need to log in to Gigwa using the BrAPI helper.**

### Retrieve BrAPI data

In [None]:
geno_provider$gigwa_list_dbs()

**Please edit the code to set your database (db):**

In [None]:
geno_provider$gigwa_set_db("myDataBase")

In [None]:
geno_provider$gigwa_list_projects()

**Please edit the code to set your project:**

In [None]:
geno_provider$gigwa_set_project("myProject")

In [None]:
genoDataFromGigwa <- geno_provider$gigwa_get_variants()

### Inspect genotype data into R

In [None]:
genoDataFromGigwa |> head()
genoDataFromGigwa |> dim()
genoDataFromGigwa |> names()

### Load genotype data into rTASSEL

In [None]:
tasGeno <- genoDataFromGigwa |> rTASSEL::readGenotypeTableFromGigwa()

In [None]:
tasGeno

## Combine phenotype and genotype data into rTASSEL

In [None]:
tasGenoPheno <- rTASSEL::readGenotypePhenotype(
    genoPathOrObj = tasGeno,
    phenoPathDFOrObj = tasPheno
)
tasGenoPheno
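
Genotype and phenotype records are matched by taxa name, so it can be useful to check how many names the two sources share before combining them. The sketch below uses two made-up character vectors, `geno_taxa` and `pheno_taxa`, as stand-ins. How you extract the actual taxa names depends on your own objects and is not shown here.

```r
# Hypothetical taxa names from each source (made up for illustration)
geno_taxa  <- c("33-16", "38-11", "4226")
pheno_taxa <- c("33-16", "4226", "A188")

# Names present in both sources
shared <- intersect(geno_taxa, pheno_taxa)

# Names present in only one source; these may indicate naming
# differences (case, spacing, prefixes) between the two databases
only_geno  <- setdiff(geno_taxa, pheno_taxa)
only_pheno <- setdiff(pheno_taxa, geno_taxa)

cat("Shared taxa:", length(shared), "\n")
cat("Genotype only:", paste(only_geno, collapse = ", "), "\n")
cat("Phenotype only:", paste(only_pheno, collapse = ", "), "\n")
```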

## References and additional resources

**To cite rTASSEL, please use the following citation:**

Monier et al., (2022). rTASSEL: An R interface to TASSEL for analyzing genomic diversity. Journal of Open Source Software, 7(76), 4530, https://doi.org/10.21105/joss.04530.

You can find more information about rTASSEL [here](https://rtassel.maizegenetics.net) and an rTASSEL tutorial on Binder [here](https://mybinder.org/v2/gh/btmonier/rTASSEL_sandbox/HEAD?labpath=getting_started.ipynb).

**Please also cite QBMS using the following citation:**

Al-Shamaa K (2023). QBMS: Query the Breeding Management System(s). R package version 0.9.1, https://icarda-git.github.io/QBMS/.