# Tutorial: Load local phenotype and genotype data into rTASSEL 

## Enter your notebook title here

**Objective**: Loading phenotype and genotype data into rTASSEL  
**Data**: Describe your data set here  
**User and contact**: Enter your name and contact here

### Table of contents
* [Notes](#Notes) 
* [Libraries](#Libraries)
* [Parameters and functions](#Parameters-and-functions)
* [Data](#Data)
    + [Load and inspect data in R](#Load-and-inspect-data-in-R)
    + [Load phenotype data into rTASSEL](#Load-phenotype-data-into-rTASSEL)
    + [Load genotype data into rTASSEL](#Load-genotype-data-into-rTASSEL)
    + [Combine data into rTASSEL](#Combine-data-into-rTASSEL)
* [References and additional resources](#References-and-additional-resources)

## Notes

If you are unfamiliar with the data format requirements for TASSEL and rTASSEL, please review the following resources:
- [TASSEL user manual](https://bitbucket.org/tasseladmin/tassel-5-source/wiki/UserManual/Load/Load)
- [Instructional video](https://www.youtube.com/watch?v=4W3Ohw6Zckg)

**rTASSEL genotype file requirements**:  

rTASSEL accepts the following genotype formats:
- HapMap (HMP)
- HDF5 (hierarchical data format version 5)
- VCF (variant call format)
- PLINK

For more detail, see the [TASSEL user manual](https://bitbucket.org/tasseladmin/tassel-5-source/wiki/UserManual/Load/Load).

**rTASSEL phenotype requirements**: 

In summary, rTASSEL accepts phenotype data structured as follows:
- A header that defines the data structure, followed by a body containing the main data. Fields must be tab-delimited.
- Data are organized as a two-dimensional table with observations as rows and attributes as columns. The first attribute (column) should always be taxa. Subsequent columns can be of type data, covariate, or factor. Attributes of type "data" are modeled as dependent variables and must be numerical and continuous.

|`<Phenotype>`| | | |
|:---:|:---:|:---:|:---:|
|taxa|factor|data|covariate|
|Taxa|rep|EarHT|Q1|
|33-16|1|64.75|0.014|

For more detail, see the [manual](https://bitbucket.org/tasseladmin/tassel-5-source/wiki/UserManual/Load/Load) under "Numerical Data."
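As a minimal sketch of the layout described above, the following base R code writes the example table to a temporary tab-delimited file and reads the body back. The object names (`phenoLines`, `phenoFile`, `phenoBody`) are illustrative, and the single observation is the one shown in the table.

```r
# Build a tiny TASSEL-style phenotype file from the example table
phenoLines <- c(
    "<Phenotype>",                                                     # format tag
    paste(c("taxa", "factor", "data", "covariate"), collapse = "\t"),  # attribute types
    paste(c("Taxa", "rep", "EarHT", "Q1"), collapse = "\t"),           # column names
    paste(c("33-16", "1", "64.75", "0.014"), collapse = "\t")          # one observation
)

phenoFile <- tempfile(fileext = ".txt")  # hypothetical temporary path
writeLines(phenoLines, phenoFile)

# Skip the tag and type lines so the third line is used as the column header
phenoBody <- read.delim(phenoFile, skip = 2, stringsAsFactors = FALSE)
print(phenoBody)
```

Writing the file this way makes the role of each header line explicit: the first two lines describe the structure, and the third line names the columns.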

In [None]:
getwd()

In [None]:
Sys.Date()

## Libraries

In [None]:
library(data.table) # Efficient I/O handling for delimited data
library(rTASSEL)    # R interface to TASSEL

## Parameters and functions

**Please edit the paths to your own data:**

In [None]:
# Path to phenotype data
myPhenoPath <- "/path/to/phenotype/data"

# Path to genotype data
myGenoPath <- "/path/to/genotype/data"

## Data

### Load and inspect data in R

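Inspect phenotype data. The `skip = 2` argument skips the `<Phenotype>` tag and the attribute-type line so that the column names are read as the header: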

In [None]:
myPhenoTable <- data.table::fread(myPhenoPath, skip = 2)

In [None]:
myPhenoTable |> head()

In [None]:
myPhenoTable |> dim()

In [None]:
myPhenoTable |> names() |> cat(sep = "\n")
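Because the attribute-type line is skipped when the table is read for inspection, it can be useful to recover it separately. The sketch below assumes the file follows the three-line header layout described in the Notes; `readAttributeTypes` is a hypothetical helper written for this notebook, not an rTASSEL function.

```r
# Return the attribute type declared for each column of a TASSEL phenotype file.
# Hypothetical helper: assumes line 2 holds the types and line 3 the column names.
readAttributeTypes <- function(path) {
    header  <- readLines(path, n = 3)
    types   <- strsplit(header[2], "\t", fixed = TRUE)[[1]]
    columns <- strsplit(header[3], "\t", fixed = TRUE)[[1]]
    stopifnot(length(types) == length(columns))
    setNames(types, columns)
}

# Demonstration on a temporary file containing the example from the Notes
demoFile <- tempfile(fileext = ".txt")
writeLines(
    c("<Phenotype>",
      "taxa\tfactor\tdata\tcovariate",
      "Taxa\trep\tEarHT\tQ1",
      "33-16\t1\t64.75\t0.014"),
    demoFile
)

attrTypes <- readAttributeTypes(demoFile)
print(attrTypes)
```

With your own data, `readAttributeTypes(myPhenoPath)` would return the same kind of named vector, which you can compare against the column names printed in the previous cell.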

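Inspect genotype data. This step assumes a plain-text genotype file such as HapMap; binary formats such as HDF5 cannot be read with `fread`: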

In [None]:
myGenoTable <- data.table::fread(myGenoPath)

In [None]:
myGenoTable |> head()

In [None]:
myGenoTable |> dim()

In [None]:
myGenoTable |> names() |> cat(sep = "\n") 

### Load phenotype data into rTASSEL

In [None]:
tasPheno <- rTASSEL::readPhenotypeFromPath(
    path = myPhenoPath
)

Note:
If you need to manipulate your phenotype data in R before loading it into rTASSEL, you can load the data from an R data frame rather than from a path.  
The code below provides an example of how to do this. **You will need to replace `myPhenoDataframe` with your own data frame and set `taxaID` and `attributeTypes` accordingly.**

In [None]:
#tasPheno <- rTASSEL::readPhenotypeFromDataFrame(
#    phenotypeDF = myPhenoDataframe,
#    taxaID = "Taxon",
#    attributeTypes = NULL
#)
#tasPheno
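For illustration, here is a hedged sketch of the data frame route using a toy data frame. The trait values are made up, and the strings passed to `attributeTypes` ("data", "covariate") are assumed to mirror the type names used in the phenotype file format; check the rTASSEL documentation before relying on them. The rTASSEL call only runs if the package is installed.

```r
# Toy phenotype data frame (made-up values; replace with your own data)
myPhenoDataframe <- data.frame(
    Taxon = c("33-16", "38-11", "4226"),
    EarHT = c(64.75, 92.25, 65.5),
    Q1    = c(0.014, 0.003, 0.071),
    stringsAsFactors = FALSE
)

# Example manipulation before loading: drop rows with a missing trait value
myPhenoDataframe <- myPhenoDataframe[!is.na(myPhenoDataframe$EarHT), ]

# One type per non-taxa column, in column order
# (assumption: valid values match the file-format types)
myAttributeTypes <- c("data", "covariate")
stopifnot(length(myAttributeTypes) == ncol(myPhenoDataframe) - 1)

# Only attempt the rTASSEL call when the package is available
if (requireNamespace("rTASSEL", quietly = TRUE)) {
    tasPhenoDF <- rTASSEL::readPhenotypeFromDataFrame(
        phenotypeDF    = myPhenoDataframe,
        taxaID         = "Taxon",
        attributeTypes = myAttributeTypes
    )
    print(tasPhenoDF)
}
```

Leaving `attributeTypes = NULL`, as in the commented cell, is the simpler choice when every non-taxa column should be treated as a data attribute.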

In [None]:
tasPheno

### Load genotype data into rTASSEL

In [None]:
tasGeno <- rTASSEL::readGenotypeTableFromPath(
    path = myGenoPath
)
tasGeno

### Combine data into rTASSEL

In [None]:
tasGenoPheno <- rTASSEL::readGenotypePhenotype(
    genoPathOrObj = myGenoPath,
    phenoPathDFOrObj = myPhenoPath
)

tasGenoPheno
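A combined object is only useful for taxa that occur in both data sets, so it can help to check the overlap of taxa names first. The sketch below uses toy inputs and assumes a HapMap genotype file, where the first 11 columns hold marker metadata and each remaining column is one taxon; all object names and values are illustrative.

```r
# Toy phenotype table: first column holds taxa names (made-up values)
phenoToy <- data.frame(
    Taxa  = c("33-16", "38-11", "4226"),
    EarHT = c(64.75, 92.25, 65.5),
    stringsAsFactors = FALSE
)

# Toy HapMap-style column names: 11 metadata columns, then one column per taxon
hapmapMeta <- c("rs#", "alleles", "chrom", "pos", "strand", "assembly#",
                "center", "protLSID", "assayLSID", "panelLSID", "QCcode")
genoColumns <- c(hapmapMeta, "33-16", "38-11", "A188")

# Taxa names from each source
genoTaxa  <- genoColumns[-seq_along(hapmapMeta)]
phenoTaxa <- phenoToy$Taxa

# Compare the two sets of taxa
sharedTaxa    <- intersect(phenoTaxa, genoTaxa)
phenoOnlyTaxa <- setdiff(phenoTaxa, genoTaxa)
genoOnlyTaxa  <- setdiff(genoTaxa, phenoTaxa)

cat("Shared taxa:", length(sharedTaxa), "\n")
cat("Phenotype only:", phenoOnlyTaxa, "\n")
cat("Genotype only:", genoOnlyTaxa, "\n")
```

With the tables loaded earlier in this notebook, `myPhenoTable[[1]]` and `names(myGenoTable)[-(1:11)]` would play the roles of `phenoTaxa` and `genoTaxa`, provided the genotype file is in HapMap format.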

## References and additional resources

To cite rTASSEL, please use the following citation:

Monier et al., (2022). rTASSEL: An R interface to TASSEL for analyzing genomic diversity. Journal of Open Source Software, 7(76), 4530, https://doi.org/10.21105/joss.04530.

You can find more information about rTASSEL [here](https://rtassel.maizegenetics.net) and an rTASSEL tutorial on Binder [here](https://mybinder.org/v2/gh/btmonier/rTASSEL_sandbox/HEAD?labpath=getting_started.ipynb).