-
Notifications
You must be signed in to change notification settings - Fork 2
/
Phyloseq_script.Rmd
323 lines (276 loc) · 13.5 KB
/
Phyloseq_script.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
---
title: "Analyse des communautés fongiques / Fungal communities analysis"
author: "Alexis Carteron & Simon Morvan"
date: "`r Sys.time()`"
output:
html_document:
toc: true
---
<br>
#### Pourquoi Phyloseq?
* Pour des raisons pratiques: utilisation simple (une fois que l'objet S4 est assemblé), rapide pour faire graphiques et utiliser les fonctions
* Améliore le partage, le stockage et la reproducibilité
* Actuellement developpé et amélioré par des biostatisticiens (FAQ importante)
<br>
Mais pas le seul outils à faire cela (e.g.: [Rhea](https://lagkouvardos.github.io/Rhea/) pipeline)
<br>
<br>
#### <span style="color:darkblue">Why Phyloseq?</span>
* <span style="color:darkblue">For practical reasons: Easy to use at first (once you have built the S4 object), quick and nice functions and figures</span>
* <span style="color:darkblue">Improve sharin, storagre and reproducibility</span>
* <span style="color:darkblue">Currently being developed and improved by biostatistician (great FAQ)</span>
<br>
<span style="color:darkblue">Other great resources are available (e.g.: [Rhea](https://lagkouvardos.github.io/Rhea/) pipeline)</span>
<br>
<br>
### 1. Commençons! / <span style="color:darkblue"> Let's start! </span>
#### Charger les librairies / <span style="color:darkblue"> Load packages </span>
```{r, include = TRUE, message = FALSE}
library(phyloseq)
library(vegan)
library(ggplot2)
```
R version `r getRversion()`
Phyloseq version `r packageVersion('phyloseq')`
Linux machine image 4.4.0-64-generic
<br>
#### Importer les données / <span style="color:darkblue"> Data import </span>
```{r, include = TRUE}
## Sample data
soiltab <- read.table('data/soil.csv', sep = ';', row.names = 1, header = TRUE)
# Taxonomy table
load("data/taxotab.01.05.rdata")
# ASV table
load("data/seqtab.nochim.rdata")
```
<br>
#### Construire l'objet Phyloseq / <span style="color:darkblue"> Construct the Phyloseq object </span>
```{r, include = TRUE}
ps <- phyloseq(otu_table(seqtab.nochim, taxa_are_rows=FALSE),
sample_data(soiltab),
tax_table(as.matrix(taxo$tax)))
ps
```
<br>
#### Réordonner les facteurs / <span style="color:darkblue"> Reorder factors </span>
```{r, include = TRUE}
sample_data(ps)$horizon = factor(sample_data(ps)$horizon, levels = c('L', 'F', 'H', 'Ae', 'B'))
sample_data(ps)$forest = factor(sample_data(ps)$forest, levels = c('AS', 'Mixed', 'FG'))
```
<br>
#### Pour commencer directement avec l'objet Phyloseq / <span style="color:darkblue"> To start directly from Phyloseq object </span>
```{r, include = TRUE}
#load(file = "data/ps.rdata")
#ps
```
<br>
### 2. Filtrage et transformation / <span style="color:darkblue"> Filtering and transformation </span>
#### Subset of the dataset with only fungi
```{r, include = TRUE}
ps.fungi = subset_taxa(ps, Kingdom=="k__Fungi")
ntaxa(ps)
ntaxa(ps.fungi)
```
Filtered `r ntaxa(ps) - ntaxa(ps.fungi)` non-fungal taxa
<br>
<br>
#### Enlever les taxons avec seulement 1 ou 2 sequences au total / <span style="color:darkblue"> Remove singletons and doubletons </span>
```{r, include = TRUE}
ps.fungi.nosd = filter_taxa(ps.fungi, function(x) sum(x) > 2, TRUE)
ntaxa(ps.fungi.nosd)
```
<br>
#### Transformation log + 1 pour normaliser les données / <span style="color:darkblue"> Shifted log transformation to normalize </span>
```{r, include = TRUE}
ps.fungi.nosd.log <- transform_sample_counts(ps.fungi.nosd, function(x) log(x+1))
```
<br>
#### Agglomerer les taxons au niveau specifique / <span style="color:darkblue"> Agglomerate taxa at the species level </span>
```{r, include = TRUE, cache = TRUE}
ps.fungi.nosd.log.sp = tax_glom(ps.fungi.nosd.log, "Species", NArm = FALSE) # NArm = FALSE in order to keep ESV not taxonomically assigned at the species level
```
<br>
#### Exemple de transmformation / <span style="color:darkblue"> Example of other transformation </span>
```{r, include = TRUE}
ps.fungi.nosd.hel = transform_sample_counts(ps.fungi.nosd, function(x) sqrt(x/sum(x))) # Hellinger transformation
```
<br>
### 3. Exploration des données / <span style="color:darkblue"> Data exploration </span>
#### Details de l'objet Phyloseq / <span style="color:darkblue"> Details of the object Phyloseq </span>
```{r, include = TRUE}
ntaxa(ps.fungi.nosd.log)
nsamples(ps.fungi.nosd.log)
rank_names(ps.fungi.nosd.log)
sample_variables(ps.fungi.nosd.log)
otu_table(ps.fungi.nosd.log)[1:2, 1:2]
tax_table(ps.fungi.nosd.log)[1:2, 1:2]
```
<br>
>##### <span style="color:red">CHALLENGE</span>
Pouvez-vous trouvez le nombre de phylum de champignons?
<span style="color:darkblue"> Can you find the number of fungal phylum? </span>
<br>
#### Profondeur de sequencage après filtrage de la canalisation Dada2 / <span style="color:darkblue"> Sequencing depth per samples after filtering of Dada2 pipeline</span>
```{r, include = TRUE, cache = TRUE, fig.height = 5, fig.width = 9}
plot_bar(ps, x = "Sample", y = "Abundance", fill = "Kingdom") +
geom_bar(stat="identity", color = "black") +
labs(x = "Sample ID", y = "Reads abundance")
plot_bar(ps.fungi, x = "Sample", y = "Abundance", fill = "Kingdom") +
geom_bar(stat="identity", color = "black") +
labs(x = "Sample ID", y = "Reads abundance of fungi")
plot_bar(ps.fungi, x = "Sample", y = "Abundance", fill = "Phylum") +
geom_bar(stat="identity") +
labs(x = "Sample ID", y = "Reads abundance of fungi")
```
<br>
<br>
#### Schéma expérimental / <span style="color:darkblue"> Sampling design </span>
![](Other_materials/map_plot3.png)
**facet_wrap(~forest)**
```{r, include = TRUE, cache = TRUE, fig.height = 5, fig.width = 9}
plot_bar(ps.fungi, fill = "Phylum") +
geom_bar(stat="identity") +
facet_wrap(~forest, ncol = 3, scales = "free_x") +
labs(x = "Sample ID", y = "Reads abundance of fungi")
```
<br>
<br>
**facet_wrap(horizon~forest)**
```{r, include = TRUE, cache = TRUE, fig.height = 9, fig.width = 9}
plot_bar(ps.fungi, fill = "Phylum") +
geom_bar(stat="identity") +
facet_wrap(horizon~forest, ncol = 3, scales = "free_x") +
labs(x = "Sample ID", y = "Reads abundance of fungi")
```
### 3. Ordination non contrainte / <span style="color:darkblue"> Unconstrained ordination </span>
#### NMDS avec measure de dissimilarité de Bray-Curtis / <span style="color:darkblue"> NMDS with Bray-Curtis measure dissimilarity (aka Steinhaus) </span>
```{r, include = TRUE, cache = FALSE}
ord.nmds.bray <- ordinate(ps.fungi.nosd.log.sp, method="NMDS", k = 2, try = 20, distance = "bray") # /!\ number of tries need to be high enough to reach convergence
```
```{r, include = FALSE, cache = TRUE}
ord.nmds.bray <- ordinate(ps.fungi.nosd.log.sp, method="NMDS", k = 2, try = 1000, distance = "bray")
```
<br>
#### Figure rapide / <span style="color:darkblue"> Quick plot </span>
```{r, include = TRUE, cache = TRUE, fig.height = 6, fig.width = 8}
plot_ordination(ps.fungi.nosd.log.sp, ord.nmds.bray, type="samples", color = "horizon")
```
#### Figure detaillée / <span style="color:darkblue"> Full plot </span>
```{r, include = TRUE, cache = TRUE, fig.height = 10, fig.width = 15}
plot_ordination(ps.fungi.nosd.log.sp, ord.nmds.bray, type="samples", color = "horizon") +
geom_point(size = 6, alpha = .5, show.legend = TRUE) +
scale_color_manual(name = "Horizon", values = c('darkgreen', 'sienna4', 'grey1', 'grey60', 'darkorange2')) +
labs(x = "Axis 1", y = "Axis 2", caption = paste("Stress =", round(ord.nmds.bray$stress, 2))) +
facet_wrap(~forest) +
stat_ellipse(type = "t", level = 0.8) + # 80% confidence interval ellispes
coord_fixed() +
theme(text = element_text(size=20), strip.text = element_text(size=20))
```
#### NMDS avec les données binaire / <span style="color:darkblue"> NMDS with binary data </span>
```{r, include = TRUE, cache = TRUE}
ord.nmds.bin <- ordinate(ps.fungi.nosd.log.sp, method="NMDS", k = 2, try = 20, distance = "bray", binary =TRUE)
```
```{r, include = FALSE, cache = TRUE}
ord.nmds.bin <- ordinate(ps.fungi.nosd.log.sp, method="NMDS", k = 2, try = 1000, distance = "bray", binary =TRUE)
```
<br>
#### Figure de la NMDS avec données binaires / <span style="color:darkblue"> NMDS plot with binary data </span>
```{r, include = TRUE, cache = TRUE, fig.height = 10, fig.width = 15}
plot_ordination(ps.fungi.nosd.log.sp, ord.nmds.bin, type="samples", color = "horizon") +
geom_point(size = 6, alpha = .5, show.legend = TRUE) +
scale_color_manual(name = "Horizon", values = c('darkgreen', 'sienna4', 'grey1', 'grey60', 'darkorange2')) +
labs(x = "Axis 1", y = "Axis 2", caption = paste("Stress =", round(ord.nmds.bin$stress, 2))) +
facet_wrap(~forest) +
stat_ellipse(type = "t", level = 0.8) + # 80% confidence interval ellispes
coord_fixed() +
theme(text = element_text(size=20), strip.text = element_text(size=20))
```
>##### <span style="color:red">CHALLENGE</span>
Pouvez-vous faire une analyse en composantes principales?
<span style="color:darkblue"> Can you run a principal component analysis? </span>
<br>
### 4. Analyse de variance: les communautés sont-elles statistiquement differentes? / <span style="color:darkblue"> Variance analysis: Are communities statistically different? </span>
```{r, include = TRUE}
metadata <- as(sample_data(ps.fungi.nosd.log.sp), "data.frame") # Export data
```
<br>
#### PERMANOVA avec la measure de Bray-Curtis / <span style="color:darkblue"> PERMANOVA with Bray-Curtis measure </span>
```{r, include = TRUE, cache = TRUE}
adonis(phyloseq::distance(physeq = ps.fungi.nosd.log.sp, method="bray") ~ horizon * forest, strata = metadata$block,# Horizon and forest effect with interaction and block as strata
data = metadata,
permutations = 99999)
```
<br>
#### Test de l'homogénéité de dispersion / <span style="color:darkblue"> Test for the homogeneity of dispersion </span>
```{r, include = TRUE, cache = TRUE}
betadisp <- betadisper(phyloseq::distance(physeq = ps.fungi.nosd.log.sp, method="bray"), metadata$forest)
permutest(betadisp, permutations = 99999)
betadisp <- betadisper(phyloseq::distance(physeq = ps.fungi.nosd.log.sp, method="bray"), metadata$horizon)
permutest(betadisp, permutations = 99999)
```
<br>
#### PERMANOVA avec les données binaires / <span style="color:darkblue"> PERMANOVA with binary data </span>
```{r, include = TRUE, cache = TRUE}
adonis(phyloseq::distance(ps.fungi.nosd.log.sp, method="bray", binary = TRUE) ~ horizon * forest, strata = metadata$block, # Horizon and forest effect with interaction and block as strata
data = metadata,
permutations = 99999)
```
<br>
#### Test de l'homogénéité de dispersion / <span style="color:darkblue"> Test for the homogeneity of dispersion </span>
```{r, include = TRUE, cache = TRUE}
betadisp <- betadisper(phyloseq::distance(ps.fungi.nosd.log.sp, method="bray", binary = TRUE), metadata$forest)
permutest(betadisp, permutations = 99999)
```
<br>
```{r, include = TRUE, cache = TRUE}
betadisp <- betadisper(phyloseq::distance(ps.fungi.nosd.log.sp, method="bray", binary = TRUE), metadata$horizon)
permutest(betadisp, permutations = 99999)
```
<br>
### 5. Ordination sous contraites: analyse de redondance / <span style="color:darkblue"> Constrained ordination: Redundancy Analysis (RDA) </span>
```{r, include = TRUE, cache = TRUE}
ord.rda <- ordinate(ps.fungi.nosd.log.sp, formula = ps.fungi.nosd.log.sp ~ Carbon+Nitrogen+pH, method="CAP", distance = "bray") # Uses Capscale function of Vegan package
```
<br>
#### Obtenir les statistiques de la RDA / <span style="color:darkblue"> RDA stats </span>
```{r, include = TRUE, cache = TRUE}
RsquareAdj(ord.rda)
anova.cca(ord.rda)
```
<br>
#### Verifications des facteurs d'inflation de la variance / <span style="color:darkblue"> Check variance inflation factors (VIF) </span>
```{r, include = TRUE, cache = TRUE}
vif.cca(ord.rda) # VIF < 10 should be fine
```
<br>
>##### <span style="color:red">CHALLENGE</span>
Que feriez-vous pour exclure les données d'azote de la RDA?
<span style="color:darkblue"> What if you want to exclude Nitrogen data of the RDA? </span>
<br>
#### Figure rapide / <span style="color:darkblue"> Quick plot </span>
```{r, include = TRUE, cache = TRUE, fig.height = 6, fig.width = 8}
plot_ordination(ps.fungi.nosd.log.sp, vegan::scores(ord.rda, scaling=1), type="sites", color = "horizon")
```
#### Enregistrer les valeurs propres / <span style="color:darkblue"> Get eigenvector values </span>
```{r, include = TRUE, cache = TRUE}
evals <- ord.rda$CCA$eig/sum(ord.rda$CCA$eig)
```
<br>
#### Figure detaillée / <span style="color:darkblue"> Full plot </span>
```{r, include = TRUE, cache = TRUE}
arrowmat = vegan::scores(ord.rda, display = "bp", scaling = 1) # Add the environmental variables as arrows
arrowdf <- data.frame(arrowmat) # Add labels
rownames(arrowdf) <- c("Carbon","Nitrogen","pH") # Add rownames
arrowdf <- data.frame(labels = c("Carbon","Nitrogen","pH"), arrowmat) # Make a data.frame
```
```{r, include = TRUE, cache = TRUE, fig.height = 10, fig.width = 15}
plot_ordination(ps.fungi.nosd.log.sp, vegan::scores(ord.rda, scaling = 1), type="sites", color = "horizon") +
coord_fixed(sqrt(evals[2] / evals[1])) +
geom_point(size = 8, alpha = .5) +
scale_color_manual(name = "Horizon", values = c('darkgreen', 'sienna4', 'grey1', 'grey60', 'darkorange2')) +
geom_segment(aes(xend = CAP1, yend = CAP2, x = 0, y = 0), size = .7, data = arrowdf, color = "black", arrow = arrow(length = unit(0.05, "npc"))) +
geom_text(aes(x = CAP1*1.15, y = CAP2*1.15, color = NULL, label = labels), size = 6, data = arrowdf, show.legend = FALSE) +
facet_wrap(~forest) +
theme(text = element_text(size=18), strip.text = element_text(size=20)) +
labs(x = paste("Constrained axis 1 (",round(evals[1]*100, 0),"%)"), y = paste("Constrained axis 2 (",round(evals[2]*100, 0),"%)"))
```