## Introduction to rTASSEL

### Overview
Thanks for checking out `rTASSEL`! In this notebook, we will go over the functionalities used to work with the TASSEL software via R.

TASSEL is a software package used to evaluate trait associations, evolutionary patterns, and linkage disequilibrium. Strengths of this software include:

1. Support for a number of powerful statistical approaches to association mapping, such as the General Linear Model (GLM) and the Mixed Linear Model (MLM). The MLM implements the technique described in our lab's published Nature Genetics paper, Unified Mixed-Model Method for Association Mapping, which reduces Type I error in association mapping with complex pedigrees, families, founding effects, and population structure.

2. An ability to handle a wide range of indels (insertions and deletions). Most software ignores this type of polymorphism; however, in some species (like maize), this is the most common type of polymorphism.


### Motivation
The main goal of developing this package is to construct an R-based front-end that connects to a variety of widely used TASSEL methods and analytical tools. By using R as a front-end, we aim to provide a unified scripting workflow that combines the analytical power of TASSEL with R's popular data handling and parsing capabilities, without requiring the user to switch between the two environments.

More information can be found on our [Bitbucket repository](https://bitbucket.org/bucklerlab/rtassel/src/master/).

In [1]:
# Introduction to rTASSEL ----
library(rTASSEL)

"package 'rTASSEL' was built under R version 3.6.3"
Welcome to rTASSEL (version 0.9.16)
 • Consider starting a TASSEL log file (see ?startLogger())



In [2]:
# Load hapmap data
genoPathHMP <- system.file(
    "extdata",
    "mdp_genotype.hmp.txt",
    package = "rTASSEL"
)
genoPathHMP

In [3]:
# Load in hapmap file
tasGenoHMP <- rTASSEL::readGenotypeTableFromPath(
    path = genoPathHMP
) 
tasGenoHMP

A TasselGenotypePhenotype Dataset
  Class.............. TasselGenotypePhenotype 
  Taxa............... 281 
  Positions.......... 3093 
  Taxa x Positions... 869133 
---
  Genotype Table..... [x]
  Phenotype Table.... [ ]
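
The `Taxa x Positions` figure in the summary above is the product of the two counts (281 × 3093 = 869133). As a rough illustration of where those counts come from, here is a minimal base-R sketch that reads a toy HapMap-style table. It relies on the standard HapMap layout (11 metadata columns followed by one column per taxon); the marker and taxon names are made up, and this is not how rTASSEL itself parses the file.

```r
# Toy HapMap text: 11 standard metadata columns, then one column per taxon.
# Marker names (m1, m2) and taxon names are hypothetical.
hmpText <- paste(
    "rs#\talleles\tchrom\tpos\tstrand\tassembly#\tcenter\tprotLSID\tassayLSID\tpanelLSID\tQCcode\ttaxonA\ttaxonB\ttaxonC",
    "m1\tA/G\t1\t100\t+\tNA\tNA\tNA\tNA\tNA\tNA\tAA\tGG\tAG",
    "m2\tC/T\t1\t250\t+\tNA\tNA\tNA\tNA\tNA\tNA\tCC\tCT\tTT",
    sep = "\n"
)

# Keep the original header names and read every column as text
hmp <- read.delim(
    text = hmpText,
    check.names = FALSE,
    colClasses = "character"
)

nMetaCols  <- 11                      # fixed HapMap metadata columns
nPositions <- nrow(hmp)               # one row per site
nTaxa      <- ncol(hmp) - nMetaCols   # remaining columns are taxa

nTaxa * nPositions                    # 6 genotype calls in the toy table
281 * 3093                            # 869133, matching the summary above
```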

In [4]:
# Load in phenotype information
phenoPath  <- system.file("extdata", "mdp_traits.txt", package = "rTASSEL")
phenoPath

In [5]:
# Load into rTASSEL `TasselGenotypePhenotype` object
tasPheno <- rTASSEL::readPhenotypeFromPath(
    path = phenoPath
) 
tasPheno

A TasselGenotypePhenotype Dataset
  Class.............. TasselGenotypePhenotype 
  Taxa............... 301 
  Positions.......... NA 
  Taxa x Positions... NA 
---
  Genotype Table..... [ ]
  Phenotype Table.... [x]
---
  Traits: Taxa EarHT dpoll EarDia 

In [6]:
# Combine the genotype and phenotype objects loaded above into one object
tasGenoPheno <- rTASSEL::readGenotypePhenotype(
    genoPathOrObj = tasGenoHMP,
    phenoPathDFOrObj = tasPheno
)
tasGenoPheno

A TasselGenotypePhenotype Dataset
  Class.............. TasselGenotypePhenotype 
  Taxa............... 279 
  Positions.......... 3093 
  Taxa x Positions... 862947 
---
  Genotype Table..... [x]
  Phenotype Table.... [x]
---
  Traits: Taxa EarHT dpoll EarDia 
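
Note that the combined object reports 279 taxa, fewer than either the genotype table (281) or the phenotype table (301). These counts are consistent with keeping only the taxa present in both inputs, although the notebook does not state the join rule explicitly, so treat that as an assumption. A minimal base-R sketch of that idea, with made-up taxon names:

```r
# Hypothetical taxon names, for illustration only
genoTaxa  <- c("lineA", "lineB", "lineC", "lineD")
phenoTaxa <- c("lineB", "lineC", "lineD", "lineE", "lineF")

# Taxa found in both inputs (assumed join rule: intersection)
sharedTaxa <- intersect(genoTaxa, phenoTaxa)
length(sharedTaxa)             # 3

# Taxa that would be dropped from each side under that rule
setdiff(genoTaxa, phenoTaxa)   # "lineA"
setdiff(phenoTaxa, genoTaxa)   # "lineE" "lineF"

# The reported Taxa x Positions value is again the product of the counts
279 * 3093                     # 862947
```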

In [7]:
# Run association - GLM
# Calculate GLM
tasGLM <- rTASSEL::assocModelFitter(
    tasObj = tasGenoPheno,             # <- our prior TASSEL object
    formula = list(EarHT, dpoll) ~ .,  # <- only EarHT and dpoll are run
    fitMarkers = TRUE,                 # <- set this to TRUE for GLM
    kinship = NULL,
    fastAssociation = FALSE
)

Running all non <data> traits and/or <taxa>...
Association Analysis : GLM
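
To give a feel for the kind of per-marker statistic a GLM association scan reports, here is a minimal base-R sketch of a single-marker F-test on simulated data. It is a conceptual illustration only and not TASSEL's implementation; the genotype classes, effect sizes, and sample size are all made-up assumptions.

```r
set.seed(42)

# Simulated data: one marker with three genotype classes and one trait.
# All values here are hypothetical.
nTaxa  <- 200
geno   <- factor(sample(c("AA", "AG", "GG"), nTaxa, replace = TRUE))
effect <- c(AA = 0, AG = 1, GG = 2)
trait  <- 50 + 3 * unname(effect[as.character(geno)]) + rnorm(nTaxa, sd = 4)

# Compare a model without the marker to a model with the marker
fitNull   <- lm(trait ~ 1)
fitMarker <- lm(trait ~ geno)
cmp       <- anova(fitNull, fitMarker)

markerF <- cmp$F[2]           # F statistic for the marker term
markerP <- cmp$`Pr(>F)`[2]    # corresponding p-value

c(F = markerF, p = markerP)
```

In a genome-wide scan, a test like this would be repeated for every marker and trait, which is why the results table has one row per trait and marker combination.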


In [8]:
# Inspect data
tasGLM$GLM_Stats[, 1:6]

DataFrame with 473 rows and 6 columns
          Trait      Marker         Chr       Pos         marker_F
    <character> <character> <character> <integer>        <numeric>
1         EarHT  PZA00447.8           1   9024005 13.6180300229943
2         EarHT  PZB00718.1           1  17601375 7.78128017162806
3         EarHT  PZD00098.1           1  23267898 13.2838791337737
4         EarHT  PZA02921.4           1  25035053 13.1884715987084
5         EarHT PZA00654.12           1  32583282 12.8128934291243
...         ...         ...         ...       ...              ...
469       dpoll  PZA03711.2          10 121491031 18.0228509985239
470       dpoll  PZA03711.3          10 121491202 12.7772833682735
471       dpoll  PZA03710.4          10 121491895 11.7034257765584
472       dpoll  PZA03229.1          10 139507065 8.05478095341354
473       dpoll  PZA03267.4          10 139717741 7.16481998885837
                       p
               <numeric>
1   0.000274961381885706
2   0.0005240455