# Statistical Inference in Networks

[Alex Hanna](http://alex-hanna.com), University of Toronto/Google

In this module, we're going to focus on statistical inference in networks, namely on a class of models known as **Exponential Random Graph Models**, or ERGMs. Our learning objectives for this section are the following:

- Motivate the use of ERGMs
- Introduce the `statnet` and `network` packages
- Build an ERGM progressively, term-by-term

Unlike the previous modules in this workshop, I am going to rely on Benjamin Lind's [excellent ERGM tutorial based on the Grey's Anatomy hookup network](http://badhessian.org/2012/09/lessons-on-exponential-random-graph-modeling-from-greys-anatomy-hook-ups/). After the introductory material, we'll head over there and continue the tutorial. As such, there are no exercises like in the previous tutorials, but a future incarnation of this module may include some.

I should add the caveat that I'm generally averse to using sexualized datasets like this, since they typically have a very heteronormative and cisgender bent to them (only [queer datasets](http://badhessian.org/2013/03/lipsyncing-for-your-life-a-survival-analysis-of-rupauls-drag-race/), please). But this tutorial is one of the clearest I've seen on ERGMs, and the data are pretty interesting, both structurally and in terms of the node attributes.

## Motivating ERGMs

So far in this workshop, we've mostly been *describing* the various contours of existing networks, either quantitatively or visually. So why would we want to use statistical inference to understand a network? Primarily, we want to try *to explain and predict tie formation*. We do this to understand the processes which made the network what it is. We predict tie formation based on two classes of variables: node attributes (e.g. gender, age, status) and network structures (e.g. triangles, number of surrounding edges).
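
It can help to see the shape of the model we're heading toward. In the usual textbook notation (my addition here, not something taken from Ben's tutorial), an ERGM gives the probability of observing a network $y$ as

$$
P(Y = y \mid \theta) = \frac{\exp\{\theta^{\top} g(y)\}}{\kappa(\theta)},
\qquad
\kappa(\theta) = \sum_{y'} \exp\{\theta^{\top} g(y')\}
$$

where $g(y)$ is a vector of network statistics (counts of edges, triangles, ties between nodes that share an attribute, and so on), $\theta$ holds the coefficients we estimate, and $\kappa(\theta)$ sums over every possible network on the same nodes so that the probabilities add up to one. The connection to tie prediction is the conditional log-odds of a single tie, holding the rest of the network fixed:

$$
\operatorname{logit} P(Y_{ij} = 1 \mid Y_{-ij} = y_{-ij}, \theta) = \theta^{\top} \delta_{ij}(y),
\qquad
\delta_{ij}(y) = g(y^{+}_{ij}) - g(y^{-}_{ij})
$$

Here $y^{+}_{ij}$ and $y^{-}_{ij}$ are the network with the tie between $i$ and $j$ switched on and off, so $\delta_{ij}(y)$ is how much each statistic changes when that one tie is added. This is why the coefficients can be read a lot like logistic regression coefficients.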

Another cool thing about ERGMs is that we can use an ERGM as a generative model. That is, we can simulate new networks based on the estimated parameters from the model.

## Introducing `statnet` and `network`

To use ERGMs, we need some new underlying packages: `statnet` and `network`. `statnet` is a suite of R packages for network analysis, and its `ergm` package is the underlying machinery for the ERGM. It implements the ERGM parameter estimation with Markov chain Monte Carlo (MCMC).

`network` is a network package created by the mathematical sociologist Carter T. Butts at the University of California, Irvine. `network` has its own descriptive and visualization capabilities. The syntax for accessing vertex and edge attributes is similar to, but not the same as, the syntax in `igraph`. The visualizations ultimately look more or less similar.

Let's start off by unloading the `igraph` package, then installing and loading the `ergm` package, which is part of the `statnet` suite. Loading `ergm` will also load `network`.

In [None]:
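## unload igraph (if it is attached) to avoid any namespace collisions
if ("package:igraph" %in% search()) detach("package:igraph")

## install ergm if it is missing, then load it; loading ergm also loads network
if (!requireNamespace("ergm", quietly = TRUE)) install.packages("ergm")
library(ergm)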
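
Now that `ergm` is loaded, here's a quick sketch of the fit-then-simulate workflow I mentioned when motivating ERGMs. It's not part of Ben's tutorial. I'm assuming the `florentine` example data that ships with `ergm` as a stand-in, and the object names `flo.fit` and `flo.sims` are just made up for this illustration.

```r
library(ergm)  # also attaches network

## Assumption: the Florentine marriage network bundled with ergm is a stand-in
## for whatever network you actually care about
data(florentine)  # loads flomarriage and flobusiness

## the simplest possible ERGM: a single `edges` term, roughly an intercept
flo.fit <- ergm(flomarriage ~ edges)
coef(flo.fit)

## use the fitted model as a generative model: draw three new networks
set.seed(42)
flo.sims <- simulate(flo.fit, nsim = 3)

## compare the observed edge count with the simulated ones
network.edgecount(flomarriage)
sapply(flo.sims, network.edgecount)
```

With only an `edges` term, the simulated networks should have roughly the same density as the observed one but none of its other structure, which is exactly why we add more terms later.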

## Loading Grey's Anatomy data

The links to the data are broken on the Bad Hessian site, so you can use the following lines to load them (if you downloaded the repository earlier, you'll have to download it again to get the new data files).

There are two files. The first contains the network edges in the form of an adjacency matrix (also called a sociomatrix, if you're very old school). The second contains attributes related to the nodes themselves.

In [1]:
## load the edges in the form of an adjacency matrix
ga.matrix <- read.table("data/Grey's Anatomy - sociomat.tsv", sep = "\t", quote = "\"", header = TRUE, row.names = 1)
ga.matrix <- as.matrix(ga.matrix)

In [2]:
## load the node attributes
ga.attributes <- read.table("data/Grey's Anatomy - attributes.tsv",
    sep = "\t", header = TRUE, quote = "\"",
    stringsAsFactors = FALSE, strip.white = TRUE, as.is = TRUE)

In [None]:
## create the network using `network`
ga.net <- network(ga.matrix, vertex.attr = ga.attributes, vertex.attrnames = colnames(ga.attributes),
                directed = FALSE, hyper = FALSE, loops = FALSE, multiple = FALSE, bipartite = FALSE)
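
If you want to see what that call does without the data files, here's a toy version of the same pattern, plus the `network` syntax for getting at vertex attributes. Everything named `toy.*` and the `role` attribute are hypothetical and made up for this sketch; they are not columns from the Grey's Anatomy data.

```r
library(network)

## Hypothetical toy data: a 4-node path a - b - c - d as a symmetric adjacency matrix
toy.names <- c("a", "b", "c", "d")
toy.matrix <- matrix(0, nrow = 4, ncol = 4, dimnames = list(toy.names, toy.names))
toy.matrix["a", "b"] <- toy.matrix["b", "a"] <- 1
toy.matrix["b", "c"] <- toy.matrix["c", "b"] <- 1
toy.matrix["c", "d"] <- toy.matrix["d", "c"] <- 1

## Hypothetical node attributes, one row per node, in the same order as the matrix
toy.attributes <- data.frame(name = toy.names,
    role = c("resident", "attending", "resident", "nurse"),
    stringsAsFactors = FALSE)

## same construction pattern as ga.net above
toy.net <- network(toy.matrix, vertex.attr = toy.attributes,
    vertex.attrnames = colnames(toy.attributes), directed = FALSE)

## basic descriptives
network.size(toy.net)       # number of vertices
network.edgecount(toy.net)  # number of edges

## vertex attributes: the %v% operator is shorthand for get.vertex.attribute()
list.vertex.attributes(toy.net)
toy.net %v% "role"
get.vertex.attribute(toy.net, "role")
```

The `%v%` operator plays roughly the role that `V(g)$role` plays in `igraph`. The same checks on `ga.net` are a good way to confirm the attributes lined up with the right nodes before heading into the modeling.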

From there, I'll let Ben take over. Let's [go over there](http://badhessian.org/2012/09/lessons-on-exponential-random-graph-modeling-from-greys-anatomy-hook-ups/) and work through the tutorial.