09-B-networks.Rmd

---
layout: topic
title: "Network analysis"
author: "Beth & Nicole"
output: html_document
---

**Assigned Reading:**

> Dormann et al. 2009 [DOI: 10.2174/1874213000902010007](http://dx.doi.org/10.2174/1874213000902010007)

**Supplemental Reading:**

> Montoya et al. 2006 [DOI: 10.1038/nature04927](http://dx.doi.org/10.1038/nature04927)


```{r include = FALSE}
# This code block sets up the r session when the page is rendered to html
# include = FALSE means that it will not be included in the html document

# Write every code block to the html document 
knitr::opts_chunk$set(echo = TRUE)

# Write the results of every code block to the html document 
knitr::opts_chunk$set(eval = TRUE)

# Define the directory where images generated by knit will be saved
knitr::opts_chunk$set(fig.path = "images/09-B/")

# Set the web address where R will look for files from this repository
# Do not change this address
repo_url <- "https://raw.githubusercontent.com/fukamilab/BIO202/master/"
```

### Key Points

Anything that can be represented as an interaction matrix can be represented as a network (e.g., food webs, plant-pollinator networks, plant-fungal networks, microbe co-occurrence networks, facultative networks, host-parasite networks, etc.)

Networks are interesting because they are a way to organize and quantify community structure beyond simple species identities and numbers. Network structure is purportedly related to community robustness (the ability of the community to resist change after a disturbance), resilience (the ability of the community to recover after a disturbance), evolutionary history, ecosystem service provision, and biodiversity maintenance.

Columns and rows of interaction matrices represent interacting species (columns as pollinators and rows as plants, for example), and cell values represent either the presence/absence of an interaction between two species (binary networks) or the frequency of interactions between two species (weighted networks)

* Binary and weighted interaction networks are analyzed differently


Networks are often unipartite (all species in the same guild), bipartite (two guilds), or tripartite (three guilds). We focus today on **bipartite networks**.

There are many R packages to analyze and visualize networks including: 

*	`vegan` (not strictly a network package, but required by `bipartite`)
*	`bipartite`
*	`igraph`
*	`sna`
*	`d3network`
*	`graphviz` (java)
*	`qgraph`

For the example analysis we use `vegan` and `bipartite`.

Network metrics are used to quantify network structural properties (for example, a very simple one would be the average number of interactions per species). What network metric you pick depends on what question you’re trying to answer. 

Some common *network* metrics that appear often in the literature are: 

*	**Connectance**: the number of observed interactions divided by the number of possible interactions
    +	has been proposed that connectance is a measure of complexity
    +	indices: ‘connectance’, ‘weighted connectance’

*	**Nestedness**: a network in which the species with many interactions interact with the species with few interactions (and vice versa) is a nested network
    +	common in mutualistic networks
    + indices: ‘nestedness’, ‘weighted nestedness’, 'NODF’, 'weighted NODF’
  
*	**Modularity**: a network in which the species form discrete interaction compartments, with little interactions between species outside of each compartment, is a modular network
    +	common in antagonistic networks
    +	indices: 'computeModules',  'modularity' (in `igraph`)
  
*	**Specialization** – two potential meanings
    +	specialized networks = species have few interaction partners 
        +	indices: 'links per species'
    +	specialized networks = networks where species are specific in their interaction partner choice (not opportunistic)
        +	indices: '$H_2$' 

Some common *species level* (node level) metrics that appear often in the literature are: 

*	**Species degree**: number of interaction partners
    +	indices: 'species degree'
  
*	**Centrality** - 2 kinds
    +	**Closeness centrality**: the sum of the number of shortest distances (i.e. number of interactions, also known as 'path lengths') between the species in question and all other species in the network
        +	indices: 'closeness_w'
    +	**Betweenness centrality**: the sum of the number of pairs of species whose shortest path lengths are connected through the species in question
        +	indices: 'betweenness_w'
*	**Specialization** (specificity): species level metric similar to '$H_2$'
    +	the deviation of observed interactions from opportunistic, abundance-based interactions 
    +	takes into account interaction partner diversity, relative abundance of other possible interaction partners, and reciprocal specialization between interaction partners
    +	Index: 'd' 

Use null models to determine if observed network structure is significantly different from random network structure.

*	Use the z-score $(observed - mean(null)/sd(null))$ to calculate the statistical significance of observed network structure (more or less than two standard deviations away from the mean $\rightarrow p < 0.05$) 

*	Null models are common in community ecology in general (e.g., niche overlap, species-area relationship, species co-occurrence, phylogenetic diversity, etc.) so this null model approach is useful for ecologists in general

A common binary null model: 

* 'shuffle.web'
    + shuffles cells around the interaction matrix, keeps nothing constant

Common weighted null models:

*	'r2dtable' 
    + changes cell values while keeping row and column sums constant

*	'vaznull' 
    + changes cell values while keeping connectance (number of links in the matrix) constant

*	'swap.web' 
    + changes cell values while keeping both connectance and row/column sums constant 

Choose null model wisely - how restrictive do you want to be?

Also use null models if you want to compare structural properties between networks.

*	All network metrics are affected by network size (number of rows and columns) and shape (ratio of rows:columns) so it is not really possible to reliably compare network metrics between networks of different sizes and shapes

*	Common solution is to calculate the z-score to standardize metrics first, then compare the z-scores of a metric between networks

That's networks in a nutshell! Proceed at your own risk… 

<br>

### Analysis Example

In this analysis example, we're going to use the `bipartite` R package to visualize networks and to generate null models for significance of different network patterns. We will be using datasets that are already included in the package. These are plant-pollinator interaction matrices (pollinators species as columns and plant species as rows). 

Each dataset, or interaction matrix, is a summary of observed interactions between a pollinator species and a particular plant species in a specific country (or site). The number of these pairwise interactions is given by the value in each matrix cell. Each dataset was obtained in a different country, so the dataset/matrix and the network constructed from that will be named after its country of origin (or region).

```{r include = TRUE}

# Clear environment
rm(list=ls())

# Load required packages for bipartite
library(permute)
library(lattice)
library(vegan)
library(statnet.common)
library(network)
library(sna)


# Load main package
library(bipartite)

# Read all the interaction network matrices into a list
webs <- list(Safariland, barrett1987, elberling1999, 
             memmott1999, motten1982, olesen2002aigrettes)

# Re-name the datasets according to the sites for each plant-pollinator network
webs.names <- c("Argentina", "Canada", "Sweden", "UK", "USA", "Azores") 
names(webs) <- webs.names

# View data (interaction matrix, for generating a network, or a web)
lapply(webs, head, n = 2L) # Only display the first two rows in the dataset

```

As you can see, these are all quantitative or *weighted networks* (cells contain values ranging from $0$ to any whole number $n$, as supposed to just $0$ and $1$, as in binary networks). 

<br>

####Visualizing networks

We can visualize these networks using the `visweb()` and `plotweb()` functions in the `bipartite` R package. For the viswebs, the plants are in rows and the pollinators are in columns. For the plotwebs, the plant species are colored in green and the pollinators are in blue.

```{r include = TRUE}

# Visualize the observed networks from the datasets

visweb(webs$Argentina)
plotweb(webs$Argentina, text.rot=90, col.low = "green", col.high = "blue")

visweb(webs$Canada)
plotweb(webs$Canada, text.rot=90, col.low = "green", col.high = "blue")

visweb(webs$Sweden)
plotweb(webs$Sweden, text.rot=90, col.low = "green", col.high = "blue")

visweb(webs$UK)
plotweb(webs$UK, text.rot=90, col.low = "green", col.high = "blue")

visweb(webs$USA)
plotweb(webs$USA, text.rot=90, col.low = "green", col.high = "blue")

visweb(webs$Azores)
plotweb(webs$Azores, text.rot=90, col.low = "green", col.high = "blue")

```

All networks look different, since they are essentially very different networks (different species and different ecosystems) even though they are all plant-pollinator networks. 

<br>

####Network structure analysis

Next, we're going to calculate some network metrics, or indices. We're going to look at 'nestedness' and 'links per species' at the network level. You can also repeat the same analyses for indices at the species level (e.g., degree, which could be thought of as the equivalent of 'links per species' at a different level). We'll be using the function `networklevel()` (and `specieslevel()` can be used for species level analyses).

```{r include = TRUE}

# Calculate network metric nestedness for all plant-pollinator sites
net.metrics.nest <- lapply(webs, networklevel, index = 'nestedness') 

# Calculate network metric links per species for all plant-pollinator sites
net.metrics.links <- lapply(webs, networklevel, index = 'links per species') 

```

Then we will create null models, which are new networks that have been randomized to a certain extent (based on the original observed networks) to remove any possible patterns. We will use three different null model types that work for weighted networks: 

1. r2dtable
2. vaznull
3. swap.web

and see how the significance varies depending on which null model type we use as a fun exercise. We use the function `nullmodel()` to create our nulls (we set the number of nulls to 500) and we can use the `method` argument to specify the null model type.

```{r include = TRUE}

# Time consuming step!

# Load environment (already saved objects)
#load("data/network_analysis_example.RData")

# Make null models for all sites using the r2dtable null
net.nulls.r2d <- lapply(webs, nullmodel, method = "r2dtable", N = 500) 

# Make null models for all sites using the vaznull null
net.nulls.vaz <- lapply(webs, nullmodel, method = "vaznull", N = 500) 

# Make null models for all sites using the swap.web null
net.nulls.swap <- lapply(webs, nullmodel, method = "swap.web", N = 500)

# Save null objects
#save(net.nulls.r2d, net.nulls.vaz, net.nulls.swap, file = "data/network_analysis_example.RData")


```

Then, we need to calculate the same indices for the different nulls (as we already did with the observed networks: 'nestedness' and 'links per species') created by each null model type. Since there will be a lot of repetition, we start by defining a function for each index where we can specify a null model type for generating the null distribution. 

```{r include = TRUE}

# Null distribution function for nestedness - calculates the network nestedness for each null (using a particular null method) for each site 
net.null.nest = function(nulls){
  net.null.metric <- list()
  for (i in 1:length(nulls)) {
    net.null.metric[[i]] = do.call('rbind', 
                                 lapply(nulls[[i]], networklevel, index = 'nestedness'))
    }
  names(net.null.metric) <- webs.names
  return(net.null.metric)
}

# Null distribution function for links per species - calculates the network links per species metric for each null (using a particular null method) for each site 
net.null.links = function(nulls){
  net.null.metric <- list()
  for (i in 1:length(nulls)) {
    net.null.metric[[i]] = do.call('rbind', 
                                 lapply(nulls[[i]], networklevel, index = 'links per species'))
    }
  names(net.null.metric) <- webs.names
  return(net.null.metric)
}

```


```{r include = TRUE}

# Time consuming step!

# Load environment (already saved objects)
#load("data/network_analysis_example.RData")

r2d.nest <- net.null.nest(net.nulls.r2d)
vaz.nest <- net.null.nest(net.nulls.vaz)
swap.nest <- net.null.nest(net.nulls.swap)

r2d.links <- net.null.links(net.nulls.r2d)
vaz.links <- net.null.links(net.nulls.vaz)
swap.links <- net.null.links(net.nulls.swap)

# Save environment
#save.image(file="data/network_analysis_example.RData")


```

Next, we will define a function for calculating the z-score when comparing the observed network with the null networks. 

```{r include = TRUE}

# Z-score function for comparing different networks
net.zscore = function(obsval, nullval) {
  (obsval - mean(nullval))/sd(nullval)  
} 

```

Then, we apply the `net.zscore()` function to obtain our z-scores for each network site. 

```{r include = TRUE}

# Function that perform z-score calculation of nestedness using the observed and null networks
nest.zscore = function(nulltype){
  net.nest.zscore <- list() 
  for(i in 1:length(net.metrics.nest)){
    net.nest.zscore[[i]] = net.zscore(net.metrics.nest[[i]]['nestedness'], 
                                      nulltype[[i]][ ,'nestedness'])
    }
  names(net.nest.zscore) <- webs.names
  return(net.nest.zscore)
}

# Function that perform z-score calculation of links per species using the observed and null networks
links.zscore = function(nulltype){
  net.links.zscore <- list() 
  for(i in 1:length(net.metrics.links)){
    net.links.zscore[[i]] = net.zscore(net.metrics.links[[i]]['links per species'], 
                                      nulltype[[i]][ ,'links per species'])
    }
  names(net.links.zscore) <- webs.names
  return(net.links.zscore)
}

```


```{r include = TRUE}

r2d.nest.zscore <- nest.zscore(r2d.nest)
vaz.nest.zscore <- nest.zscore(vaz.nest)
swap.nest.zscore <- nest.zscore(swap.nest)

r2d.links.zscore <- links.zscore(r2d.links)
vaz.links.zscore <- links.zscore(vaz.links)
swap.links.zscore <- links.zscore(swap.links)

```

Lastly, we calculate the two-sided p-value for significance of the network property for nestedness and links per species. We start by defining a function below and then apply it for the various null models and metrics.

```{r include = TRUE}

# Function that adds p-values according to the obtained z-scores
add.pvalues = function(net.metric.zscore){
  # Change the output class from list of a list into a matrix
  net.metric.zscore <- do.call('rbind', net.metric.zscore) 
  
  # Convert z-scores to p-values (two-sided)
  net.metric.pvalue <- 2*pnorm(-abs(net.metric.zscore))
  
  # Change matrix into a dataframe
  net.metric.pvalue <- as.data.frame(as.table(net.metric.pvalue))
  colnames(net.metric.pvalue) <- c('site', 'metric', 'pvalue')
  
  net.metric.pvalue <- within(net.metric.pvalue, {
    significance <- ifelse(pvalue <= 0.001, "***", 
                         ifelse(pvalue <= 0.01, "**",
                         ifelse(pvalue <= 0.05, "*", "not significant")))
    })
  return(net.metric.pvalue)
} 
```


```{r include = TRUE}

# Add the p-values to our nestedness results
r2d.test.nest <- add.pvalues(r2d.nest.zscore)
vaz.test.nest <- add.pvalues(vaz.nest.zscore)
swap.test.nest <- add.pvalues(swap.nest.zscore)

# Add the p-values to our links per species results
r2d.test.links <- add.pvalues(r2d.links.zscore)
vaz.test.links <- add.pvalues(vaz.links.zscore)
swap.test.links <- add.pvalues(swap.links.zscore)

# Print the nestedness results
print(r2d.test.nest)
print(vaz.test.nest)
print(swap.test.nest)

# Print the links per species results
print(r2d.test.links)
print(vaz.test.links)
print(swap.test.links)

# Save environment
#save.image(file="data/network_analysis_example.RData")
```


<br>

### Discussion Questions

* Based on the final output above, how do the different null methods compare for 
    + nestedness?
    + links per species?
  
* Which null method is more conservative?

* How does nestedness compare between the different sites (i.e., the networks constructed for each geographic region)?

* Why does the 'r2dtable' null method work for 'links per species' but not 'vaznull' and 'swap.web' (i.e., why do we obtain `NaN` values for these null methods)?

* Do you think bipartite networks are useful for your work?