/
README.Rmd
172 lines (141 loc) · 5.88 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
---
output: github_document
references:
- id: Touloumis2016
title: "HDTD: Analyzing Multi-tissue Gene Expression Data"
author:
- family: Touloumis
given: Anestis
- family: Marioni
given: John C.
- family: Tavaré
given: Simon
container-title: Bioinformatics
volume: 32
URL: 'https://doi.org/10.1093/bioinformatics/btw224'
issue: 14
page: 2193--2195
type: article-journal
issued:
year: 2016
- id: Touloumis2015
title: "Testing the Mean Matrix in High-Dimensional Transposable Data"
author:
- family: Touloumis
given: Anestis
- family: Tavaré
given: Simon
- family: Marioni
given: John C.
container-title: Biometrics
volume: 71
URL: 'http://onlinelibrary.wiley.com/doi/10.1111/biom.12054/full'
issue: 1
page: 157--166
type: article-journal
issued:
year: 2015
- id: Touloumis2013
title: "Hypothesis Testing for the Covariance Matrix in High-Dimensional Transposable Data with Kronecker Product Dependence Structure"
author:
- family: Touloumis
given: Anestis
- family: Marioni
given: John C.
- family: Tavaré
given: Simon
container-title: Statistica Sinica
volume: 31
URL: 'http://www3.stat.sinica.edu.tw/statistica/J31N3/J31N309/J31N309.html'
issue: 3
page: 1309--1329
type: article-journal
issued:
year: 2021
csl: Biometrics.csl
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, include = FALSE}
knitr::opts_chunk$set(
tidy = TRUE,
collapse = TRUE,
comment = "#>",
fig.path = "README-"
)
```
# HDTD: Analyzing High-Dimensional Transposable Data
[![Travis-CI Build Status](https://travis-ci.org/AnestisTouloumis/HDTD.svg?branch=master)](https://travis-ci.org/AnestisTouloumis/HDTD)
[![Project Status: Active The project has reached a stable, usable state and is being actively developed.](http://www.repostatus.org/badges/latest/active.svg)](http://www.repostatus.org/#active)
## Installation
You can install the release version of `HDTD`:
```{r eval=FALSE}
if (!requireNamespace("BiocManager", quietly = TRUE))
install.packages("BiocManager")
BiocManager::install("HDTD")
```
The source code for the release version of `HDTD` is available on Bioconductor at:
- http://bioconductor.org/packages/release/bioc/html/HDTD.html
Or you can install the development version of `HDTD`:
```{r eval=FALSE}
if (!requireNamespace("devtools", quietly = TRUE))
install.packages("devtools")
devtools::install_github("AnestisTouloumis/HDTD")
```
To use `HDTD`, you should load the package as follows:
```{r}
library("HDTD")
```
## Usage
This package offers functions to estimate and test the matrix parameters of transposable data in high-dimensional settings. The term _transposable data_ refers to datasets that are structured in a matrix form such that both the rows and columns correspond to variables of interest and dependencies are expected to occur _among_ rows, _among_ columns and _between_ rows and columns. For example, consider microarray studies in genetics where multiple RNA samples across different tissues are available per subject. In this case, a data matrix can be created with row variables the genes, column variables the tissues and measurements the corresponding expression levels. We expect dependencies to occur among genes, among tissues and between genes and tissues. For more examples of transposable data see references in @Touloumis2013, @Touloumis2015 and @Touloumis2016.
There are four core functions:
- `meanmat.hat` to estimate the mean matrix of the transposable data,
- `meanmat.ts` to test the overall mean of the row (column) variables across groups of column (row) variables,
- `covmat.hat` to estimate the row and column covariance matrix,
- `covmat.ts` to test the sphericity, identity and diagonality hypothesis test for the row/column covariance matrix.
There are also three utility functions:
- `transposedata` for interchanging the role of rows and columns,
- `centerdata` for centering the transposable data around their mean matrix,
- `orderdata` for rearranging the order of the row and/or column variables.
## Example
We replicate the analysis that can be found in the vignette based on the mouse dataset
```{r}
data(VEGFmouse)
```
This dataset contains expression levels for $40$ mice. For each mouse, the expression levels of $46$ genes (rows) that belong to the vascular endothelial growth factor signalling pathway were measured across $9$ tissues (adrenal gland, cerebrum, hippocampus, kidney, lung, muscle, spinal cord, spleen and thymus) that are displayed in the columns.
One can estimate the mean relationship of the gene expression levels across the $9$ tissues
```{r}
sample_mean <- meanmat.hat(datamat = VEGFmouse,N=40)
sample_mean
```
and test whether the overall gene expression is constant across the $9$ tissues:
```{r}
tissue_mean_test <- meanmat.ts(datamat = VEGFmouse,N=40,group.sizes=9)
tissue_mean_test
```
In this case, the overall gene expression is not conserved.
To analyze the gene-wise and tissue-wise dependence structure, one needs to estimate the two covariance matrices:
```{r}
est_cov_mat <- covmat.hat(datamat=VEGFmouse,N=40)
est_cov_mat
```
Finally, the package allows users to perform hypothesis tests for the covariance matrix of the genes
```{r}
genes_cov_test <- covmat.ts(VEGFmouse,N=40)
genes_cov_test
```
and of the tissues:
```{r}
tissues_cov_test <- covmat.ts(VEGFmouse,N=40,voi="columns")
tissues_cov_test
```
At a $5\%$ significance level, it appears that the genes are correlated but we do not have enough evidence to reject the hypothesis that the tissues are uncorrelated.
## Getting help
The statistical methods implemented in `HDTD` are described in @Touloumis2013, @Touloumis2015 and @Touloumis2016. Detailed examples of `HDTD` can be found in @Touloumis2016 or in the vignette:
```{r, eval=FALSE}
browseVignettes("HDTD")
```
## How to cite
```{r, echo=FALSE, comment=""}
print(citation("HDTD"), bibtex = TRUE)
```
# References