In [None]:
suppressPackageStartupMessages({
    library(sna)
    library(testthat)
    library(network)
    library(ergm)
})

# Assessment Network Structure
In this assessment you will use the approaches from the Week 10 tutorial but apply them to a directed network. The first part of this notebook provides some helpful routines for directed networks and outlines some important ways in which the routines have to be modified.

First we begin with a warmup question.

### Question 1 (3 points)

When simulating random networks, you are allocating some probability to each possible network. For an undirected network on $n$ nodes there are a total of $2^{n(n-1)/2}$ possible networks. For a directed network on $n$ nodes, how many possible networks are there? Calculate the number of possible directed networks on $n$ nodes in terms of $n$:

In [None]:
enumerate.nets <- function(n)
{
# YOUR CODE HERE
stop('No Answer Given!')
return(total)
}    

In [None]:
# this cell contains tests (including some hidden tests) that your answer to Question 1 should pass

# Directed networks
In this assignment you will follow the approach in Tutorial 2 (Week 10) but apply it to directed networks. The relevant metrics for directed networks are those in Tutorial 1 (Week 9). In terms of simulation models, there are some minor differences in the arguments you use for directed networks. Below, the syntax is illustrated using Kapferer's tailors. Throughout, we are going to draw 1000 networks whenever we use a null distribution.


### Kapferer's tailors
Load Kapferer's (1972) Zambian $n=39$ tailors network.

In [None]:
temp <- tempfile()
download.file("https://raw.githubusercontent.com/johankoskinen/BayesERGM/main/data/kapferer.txt",temp)
kapf  <-  read.table(temp)# four adjacency matrices stacked on top of each other
unlink(temp)
# "instrumental" (work- and assistance-related) interactions
X <- as.matrix( kapf[(2*39+1):(3*39),])# KAPFTI1 non-symmetric, binary: time 1
n <- dim(X)[1]

# Directed Bernoulli model
Draw 1000 networks. We will say that for these networks $X \thicksim Bern(p)$
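
As a rough illustration of what this model does, the sketch below uses only base R on a hypothetical 4-node network (`A`, `p.toy` and `A.sim` are illustrative names, not part of the assignment). The tie probability is taken as the share of off-diagonal entries equal to 1, which is assumed to correspond to what `gden` reports for a directed network without self-ties, and each off-diagonal entry is then drawn independently with that probability.

```r
# Hypothetical toy digraph (not Kapferer's tailors)
A <- matrix(c(0, 1, 1, 0,
              0, 0, 1, 0,
              1, 0, 0, 0,
              0, 0, 0, 0), nrow = 4, byrow = TRUE)
off.diag <- row(A) != col(A)      # self-ties are excluded
p.toy <- mean(A[off.diag])        # observed tie probability

set.seed(1)                       # for reproducibility only
A.sim <- matrix(0, nrow(A), ncol(A))
A.sim[off.diag] <- rbinom(sum(off.diag), 1, p.toy)  # independent arcs
A.sim
```

Because every arc is drawn independently, the number of arcs in `A.sim` varies from draw to draw; only its expected value matches the observed network.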

In [None]:
m <- 1000
Xsim <- rgraph( n, # match network size
                 m= m, # generate 1000 random networks
                 tprob = gden(X), # match tie-probability to density
                 mode='digraph') # this is the main difference from tutorial 2

# Conditionally uniform density

The conditionally uniform distribution, conditional on the density, now needs the number of *arcs*, which is the sum of all entries of the adjacency matrix
$$
\sum_{i=1}^n\sum_{j=1}^n X_{ij}
$$
Consequently you do not divide the sum of the matrix by 2 (in fact, since dyads need not be symmetric, you may have an odd number of arcs). We use the notation $X \thicksim U | L$. Now draw 1000 networks.
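
A small base-R check of this point, on a hypothetical 4-node network (`A` is illustrative only): the number of arcs is the plain sum of the adjacency matrix, it equals both the sum of the outdegrees and the sum of the indegrees, and it can be odd.

```r
# Hypothetical toy digraph with arcs 1->2, 1->3 and 2->3
A <- matrix(c(0, 1, 1, 0,
              0, 0, 1, 0,
              0, 0, 0, 0,
              0, 0, 0, 0), nrow = 4, byrow = TRUE)
arcs    <- sum(A)       # every 1 is one arc i -> j, so no division by 2
out.deg <- rowSums(A)   # ties sent by each node
in.deg  <- colSums(A)   # ties received by each node
c(arcs = arcs, total.out = sum(out.deg), total.in = sum(in.deg))
```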

In [None]:
Xunif <- rgnm( n =m,# generate 1000 random networks
                 nv = n, # match network size
                 m = sum(X), # match the number of arcs
                 mode='digraph') # make sure these are directed graphs

# Conditionally uniform conditional on degree distribution

Since directed networks have both an *indegree* as well as an *outdegree* distribution, the conditionally uniform distribution conditional on the degree distribution can condition on either or both. For an adjacency matrix $X$, we let $X_{\cdot,+} = \left( \sum_j X_{ij}\right)$ be the vector of out-degrees, and
$$
X_{+,\cdot} = \left( \sum_i X_{ij} \right)^{T}
$$
be the vector of indegrees.

We can define three distributions. We let $X \thicksim U \mid X_{\cdot,+}=d$ mean that the distribution is uniform on all the graphs that have *the exact same outdegree distribution*. Thus
$$ 
\Pr( X = x) = \left\{
\begin{array}{lr}
	c^{-1},&\text{if } x_{\cdot,+}=d\\
	0,&\text{else}
\end{array} 
\right. {\text{,}}
$$
where $c$ is the number of graphs with outdegree distribution $d$. When it is unambiguous, we will write  $X \thicksim U \mid d_{out}$.

We let $X \thicksim U \mid X_{+,\cdot}=d$ mean that the distribution is uniform on all the graphs that have *the exact same indegree distribution*. Thus
$$ 
\Pr( X = x) = \left\{
\begin{array}{lr}
	c^{-1},&\text{if } x_{+,\cdot}=d\\
	0,&\text{else}
\end{array} 
\right. {\text{,}}
$$
where $c$ is the number of graphs with indegree distribution $d$. When it is unambiguous, we will write  $X \thicksim U \mid d_{in}$.

We let $X \thicksim U \mid X_{\cdot,+}=d_{out},X_{+,\cdot}=d_{in}$ mean that the distribution is uniform on all the graphs that have *the exact same outdegree and indegree distribution*. Thus
$$ 
\Pr( X = x) = \left\{
\begin{array}{lr}
	c^{-1},&\text{if } x_{\cdot,+}=d_{out} \text{ and } x_{+,\cdot}=d_{in}\\
	0,&\text{else}
\end{array} 
\right. {\text{,}}
$$
where $c$ is the number of graphs with outdegree distribution $d_{out}$ and indegree distribution $d_{in}$. When it is unambiguous, we will write  $X \thicksim U \mid d_{out},d_{in}$.
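
To make the conditioning concrete, here is a minimal base-R sketch on three hypothetical 4-node networks (`A`, `B`, `C`, `same.out` and `same.in` are illustrative names). It assumes, as the `ergm` constraints used below are understood to do, that every node keeps its own degree. Outdegrees are row sums and indegrees are column sums.

```r
A <- matrix(0, 4, 4); A[1, 2] <- 1; A[3, 4] <- 1   # arcs 1->2 and 3->4
B <- matrix(0, 4, 4); B[1, 4] <- 1; B[3, 2] <- 1   # arcs 1->4 and 3->2
C <- matrix(0, 4, 4); C[1, 2] <- 1; C[1, 4] <- 1   # arcs 1->2 and 1->4

same.out <- function(x, y) all(rowSums(x) == rowSums(y))  # same outdegrees?
same.in  <- function(x, y) all(colSums(x) == colSums(y))  # same indegrees?

# B keeps both degree vectors of A; C keeps the indegrees only
rbind(B = c(outdegree = same.out(A, B), indegree = same.in(A, B)),
      C = c(outdegree = same.out(A, C), indegree = same.in(A, C)))
```

Taking `A` as the observed network, `B` has positive probability under all three distributions, whereas `C` has positive probability only under $U \mid d_{in}$.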

The simulation method is the same as in Tutorial 2 (week 10) but with the specific degree distribution. Firstly, recall that the simulation function requires the starting network to be a network object:


In [None]:
X.net <- as.network(X,directed=TRUE)
X.net

To simulate, again, use a function from the `ergm` package.

For $X \thicksim U \mid d_{out}$

In [None]:
Xoutdegs <- simulate(X.net~edges,
                    coef=c(0),# the role of coefficients will become clear further on
                    constraints=~odegrees,# this guarantees that only networks with the same *outdegree* are generated
                    nsim=m,# set the number of draws
                    control=control.simulate(MCMC.burnin=100000))# you need to bump up the default burnin, otherwise 
                                        # the networks are too similar to the starting point, the observed network

For $X \thicksim U \mid d_{in}$

In [None]:
Xindegs <- simulate(X.net~edges,
                    coef=c(0),# the role of coefficients will become clear further on
                    constraints=~idegrees,# this guarantees that only networks with the same *indegree* are generated
                    nsim=m,# set the number of draws
                    control=control.simulate(MCMC.burnin=100000))# you need to bump up the default burnin, otherwise 
                                            #the networks are too similar to the starting point, the observed network

Finally, for $X \thicksim U \mid X_{\cdot,+}=d_{out},X_{+,\cdot}=d_{in}$

In [None]:
Xudegs <- simulate(X.net~edges,
                    coef=c(0),# the role of coefficients will become clear further on
                    constraints=~degrees,# this guarantees that only networks with the same in- and outdegrees are generated
                    nsim=m,# set the number of draws
                    control=control.simulate(MCMC.burnin=100000))# you need to bump up the default burnin, otherwise the 
                                            # networks are too similar to the starting point, the observed network

# Conditional U | MAN
For directed networks there is also an additional random graph model, the conditional $U \mid MAN$ random graph model. The $U \mid MAN$ model generates random graphs with a prescribed dyad census. For example, for Kapferer's tailors data, we simulate

In [None]:
obs.dyad.census <- dyad.census(X)
Xuman <- rguman( n =m,# generate 1000 random networks
                 nv = n, # match network size
               mut = obs.dyad.census[1],# the number of mutual dyads
               asym = obs.dyad.census[2],# the number of asymmetric dyads
               null = obs.dyad.census[3],# the number of null dyads
               method='exact') # match the dyad census exactly in every graph

> Note: remember to use the correct dyad census. The dyad census must add up to $n(n-1)/2$.
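
The three counts can be checked by hand. The following base-R sketch uses a hypothetical 4-node network (`A`, `sent`, `returned` and the count names are illustrative); it classifies each unordered pair of nodes as mutual, asymmetric or null and confirms that the counts add up to the number of pairs.

```r
A <- matrix(0, 4, 4)
A[1, 2] <- 1; A[2, 1] <- 1    # one mutual dyad
A[1, 3] <- 1                  # one asymmetric dyad

upper    <- upper.tri(A)
sent     <- A[upper]          # x_ij for all pairs i < j
returned <- t(A)[upper]       # x_ji for the same pairs

mut  <- sum(sent == 1 & returned == 1)
asym <- sum(sent != returned)
null <- sum(sent == 0 & returned == 0)

c(mut = mut, asym = asym, null = null, total = mut + asym + null)
choose(nrow(A), 2)            # number of unordered pairs of nodes
```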

# Calculating metrics for simulated networks

We can calculate metrics for simulated directed networks the way we calculated metrics for undirected networks in Tutorial 2. Here we outline some additional things to think about.

### Dyad census

When you calculate the dyad census for an $m\times n \times n$ array, `sna` returns it as an $m \times 3$ matrix, for example for Kapferer's tailors:

In [None]:
head(dyad.census(Xuman))# see! all graphs have the same dyad census

Even if you have a list of networks, `sna` returns the dyad census as the same kind of matrix. To draw a histogram of, say, the number of mutual dyads, just pick the first column.

In [None]:
dyad.udeg <- dyad.census(Xudegs)
hist( dyad.udeg[,1] ,
      xlim = range(dyad.udeg[,1],obs.dyad.census[1]),
      main = 'Mutual dyads')
abline( v = obs.dyad.census[1], col ='red')

### Triad census

Similar to the dyad census across a number of simulated graphs, the triad census for directed networks returns an $m\times 16$ matrix.


In [None]:
head( triad.census(Xudegs ))

### Degree distributions
For simulated networks we can now compare both indegree and outdegree distributions. You can plot them using the functions of Tutorial 2, but be careful to pick the right `cmode`.


In [None]:
max.deg <- 15
par(mfrow=c(1,2))
degrees <-degree(Xsim,g=c(1:m), cmode='indegree')# you can calculate the degree distributions for all graphs
obs.deg <- degree( X , cmode='indegree')

deg.sist <- cbind( matrix( colSums(degrees==0),m,1),
                  t(apply(degrees,2,function(x) tabulate(x, nbins=max.deg) ) ) )# when tabulating we need to add isolates 

obs.deg <- c( sum(obs.deg==0) ,tabulate(obs.deg, nbins=max.deg)  )
  
matplot(c(0:(max.deg)), t(deg.sist) ,
        type ='l',col='grey',
        main='degree distribution' ,xlab ='indegree',ylab='frequency',
        ylim = range( deg.sist,obs.deg))
lines(c(0:(max.deg)),obs.deg,pch=24,
      col='black',lwd=3 )
#### === outdegree
degrees <-degree(Xsim,g=c(1:m), cmode='outdegree')# you can calculate the degree distributions for all graphs
obs.deg <- degree( X , cmode='outdegree')

deg.sist <- cbind( matrix( colSums(degrees==0),m,1),
                  t(apply(degrees,2,function(x) tabulate(x, nbins=max.deg) ) ) )# when tabulating we need to add isolates 

obs.deg <- c( sum(obs.deg==0) ,tabulate(obs.deg, nbins=max.deg)  )
  
matplot(c(0:(max.deg)), t(deg.sist) ,
        type ='l',col= 'grey',
        main='degree distribution' ,xlab ='outdegree',ylab='frequency',
        ylim = range( deg.sist,obs.deg))
lines(c(0:(max.deg)),obs.deg,pch=24,lty=1,
      col='black',lwd=3 )

For a single network you can plot the indegrees of the nodes against their outdegrees.

In [None]:
plot( degree(X, cmode='indegree'), degree(X, cmode='outdegree'))

One way to summarise the association between indegree and outdegree is to calculate the correlation. In order to avoid big code chunks, define a function that calculates the correlation between in- and outdegree:

In [None]:
deg.cor <- function(X)
{
  degcorr <- cor( degree(X,cmode='indegree'), degree(X,cmode='outdegree'))
  degcorr
}

You can use this function in the exact same way you use, for example, `centralization`:

In [None]:
deg.cor.obs <-  deg.cor( X )

# store the simulated values under a new name so that the function deg.cor is not overwritten
deg.cor.sim <- apply(Xsim,
                 1,
                 deg.cor ) 

hist(deg.cor.sim, xlim=range(deg.cor.obs,deg.cor.sim))
abline(v =deg.cor.obs, col='red')

> Functions in `sna` work the same way on $m \times n \times n$ arrays of networks and on lists of networks of length $m$, and return the same output. User-defined functions may behave differently.

For *lists* of networks, you need to use `lapply` and then convert the output from a list to a vector:

In [None]:
deg.cor.obs <-  deg.cor(X)
# Xoutdegs is a list of networks, so we need to use
# 'lapply' rather than 'apply'
deg.cor.sim <- lapply(Xoutdegs ,
                 deg.cor )# the function you want to apply to each element  Xoutdegs[[k]]
deg.cor.sim <- unlist(deg.cor.sim)# 'lapply' returns a list and here we want a vector
hist(deg.cor.sim, xlim=range(deg.cor.obs,deg.cor.sim))
abline(v =deg.cor.obs, col='red')
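
The difference between the two storage formats can be seen without any network packages. This is a minimal base-R sketch with made-up data (`arr`, `lst`, `count.arcs`, `m.toy` and `n.toy` are hypothetical names): the same user-defined statistic is applied over the first margin of an $m \times n \times n$ array with `apply`, and over the elements of a list with `lapply`.

```r
set.seed(1)                          # for reproducibility only
m.toy <- 3                           # number of toy networks
n.toy <- 4                           # number of nodes in each

# m x n x n array of random 0/1 entries (the diagonal is ignored in this toy example)
arr <- array(rbinom(m.toy * n.toy * n.toy, 1, 0.3),
             dim = c(m.toy, n.toy, n.toy))

# the same networks stored as a list of n x n matrices
lst <- lapply(seq_len(m.toy), function(k) arr[k, , ])

count.arcs <- function(x) sum(x)     # a simple user-defined statistic

from.array <- apply(arr, 1, count.arcs)          # one value per slice arr[k, , ]
from.list  <- unlist(lapply(lst, count.arcs))    # one value per element lst[[k]]

rbind(from.array, from.list)
```

Both rows hold the same numbers; only the way the statistic is mapped over the networks differs.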

For centralization, note that you can distinguish between indegree and outdegree centrality:

In [None]:
centralization(X,degree,cmode="indegree")
centralization(X,degree,cmode="outdegree")

# Tasks

This is the main part of the assessment.

The exercise is based on the s50 dataset. Go to the page (https://www.stats.ox.ac.uk/~snijders/siena/s50_data.htm) and read the data description.

### Question 2 (2 points)

What is the maximum number of names respondents are allowed to nominate and what is the distribution of male and female students?




YOUR ANSWER HERE

## Load the s50 data

Load the s50 data and plot it

In [None]:
temp <- tempfile()
download.file("https://www.stats.ox.ac.uk/~snijders/siena/s50_data.zip",temp)
X <- as.matrix( read.table(unz(temp, "s50-network1.dat")) )
unlink(temp)
gplot(X)

Define `n` and `obs.dyad.census`

In [None]:
# you do not have to use these exact variable names, but
# if you choose different ones, make sure to be consistent in the sequel

# YOUR CODE HERE
stop('No Answer Given!')
n
obs.dyad.census


# Generate your random networks
Use the **code provided above** to generate networks from
* $X \thicksim Bern(p)$
* $X \thicksim U | d_{out}$
* $X \thicksim U | d_{in}$
* $X \thicksim U | d_{out},d_{in}$
* $X \thicksim U | MAN $

For each model make sure that you set the right parameters to match the s50 dataset.

## Bernoulli

Simulate your networks (again, you do not have to use the same variable names)


In [None]:
m <- 1000
# YOUR CODE HERE
stop('No Answer Given!')

gplot( Xbern[1,,] )# plot the first one of the networks - note that this requires
# that you called your array of networks Xbern

## Uniform conditional on outdegree $X \thicksim U | d_{out}$

Draw networks with fixed outdegrees (the starting network needs to be converted to a `network` object, as the `ergm` function requires it):

In [None]:
X.net <- as.network(X,directed=TRUE)
# YOUR CODE HERE
stop('No Answer Given!')
plot( Xoutdeg[[1]] ) #  plot the first network in the *list* of networks
# NOTE: the `network` package knows that 'plot' for a network object means 'gplot'


## Uniform conditional on indegree $X \thicksim U | d_{in}$

Draw networks with fixed indegrees.

In [None]:
# Use the code from the Kapferer example and be careful to adapt the right bits
# YOUR CODE HERE
stop('No Answer Given!')
plot( Xindegs[[1]] ) #  plot the first network in the *list* of networks

## Uniform conditional on outdegree and indegree $X \thicksim U | d_{out},d_{in}$

Draw networks with both outdegrees and indegrees fixed.

In [None]:
# Use the code from the Kapferer example and be careful to adapt the right bits
# YOUR CODE HERE
stop('No Answer Given!')

plot( Xdegs[[1]] ) # plot the first network in the *list* of networks

## Uniform conditional on dyad census $X \thicksim U | MAN $

Draw networks that have the right prescribed dyad census


In [None]:
# Use the code from the Kapferer example and be careful to adapt the right bits
# YOUR CODE HERE
stop('No Answer Given!')
# If you get an ERROR, make sure that you have the right dyad census and right 'n'
gplot( Xuman[1,,] ) # plot the first network in the m times n times n *array of networks*

# Dyad census

After you read in the dataset you calculated the dyad census. Now test whether there are more mutual dyads than you would expect, holding different properties constant. You may base this test on histograms alone or on a formal test of the numbers, as in the Week 10 tutorial. Think carefully about which null distributions to use.
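
A formal comparison of this kind amounts to an empirical tail proportion. The sketch below uses made-up numbers (`sim.toy` and `obs.toy` are hypothetical and do not come from any of the models or datasets in this notebook): it computes the share of simulated values that are at least as large as the observed value.

```r
sim.toy <- c(3, 5, 4, 6, 2, 5, 7, 4, 3, 5)  # hypothetical simulated counts
obs.toy <- 5                                # hypothetical observed count

prop.toy <- mean(sim.toy >= obs.toy)        # share of draws as large as, or larger than, observed
prop.toy
```

A small proportion suggests that the observed count is unusually large under the null distribution in question; a large proportion suggests that the null distribution can account for it.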

In [None]:
# Write your R-code in this code chunk - there are no hidden tests here so 
# you can use this space for what you need in order to answer Question 3
# YOUR CODE HERE
stop('No Answer Given!')

### Question 3 (4 points)

For your distributions, what was the largest proportion of simulated graphs that had at least as many mutual dyads as the observed network? Provide your numerical answer in the code block below.

In [None]:
prop <- c()
# YOUR CODE HERE
stop('No Answer Given!')

In [None]:
# this cell contains a hidden test that your answer to Question 3 should pass

### Question 4 (4 points)

What can be concluded based on $U \mid d_{out},d_{in}$? Provide a brief (one sentence) plain language interpretation.




YOUR ANSWER HERE

# Centralization

Plot the indegree distribution

In [None]:
plot( table( degree( X , cmode='indegree') ) , type='b')

Does it seem as if popularity is evenly distributed in the network or not? Let us calculate the centralization in indegrees
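
For intuition, unnormalized degree centralization in the usual Freeman form is the sum of the differences between the largest degree and each node's degree. The base-R sketch below applies this to two hypothetical 4-node networks (`star`, `cycle` and `indeg.cent` are illustrative names); it is assumed, not verified here, that this is the quantity `centralization` reports with `normalize=FALSE`.

```r
# Freeman-style unnormalized centralization based on indegrees (column sums)
indeg.cent <- function(x) {
  d <- colSums(x)
  sum(max(d) - d)
}

star <- matrix(0, 4, 4)
star[2:4, 1] <- 1                  # nodes 2, 3 and 4 all nominate node 1

cycle <- matrix(0, 4, 4)
cycle[cbind(1:4, c(2, 3, 4, 1))] <- 1   # 1->2, 2->3, 3->4, 4->1

c(star = indeg.cent(star), cycle = indeg.cent(cycle))
```

In the star all nominations go to one node, so popularity is as unevenly spread as possible for three arcs; in the cycle every node receives exactly one nomination and the index is zero.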

In [None]:
Cent.obs <- centralization(X,# the observed network
                           degree, # the centrality index we want
                           cmode="indegree",# and the type we want the index to be based on
                           normalize=FALSE, mode='digraph')
Cent.obs

Now, test centralization using your simulations from the null distributions. You may base this test on histograms alone or on a formal test of the numbers, as in the Week 10 tutorial. Think carefully about which null distributions to use.

In [None]:
# Write your R-code in this code chunk - there are no hidden tests here so 
# you can use this space for what you need in order to answer Question 5
# YOUR CODE HERE
stop('No Answer Given!')

### Question 5 (4 points)

Are any of the models able to explain centralization? If so, provide a brief (one sentence) plain language interpretation.




YOUR ANSWER HERE

## Assortativity

Plot the indegrees against the outdegrees

In [None]:
plot( jitter( degree(X, cmode='indegree'), factor =.25 ),# jitter adds some random noise so 
     jitter(degree(X, cmode='outdegree'), factor=.25))# that the points are not on top of each other

Does it look as if people who send a lot of ties also receive a lot of ties?

Calculate the correlation between indegree and outdegree in the network

In [None]:
deg.cor.obs <-  deg.cor(X)
deg.cor.obs

Using the code above, test whether this correlation is higher than expected using appropriate null distributions. As in the previous questions, you may base this test on histograms alone or on a formal test of the numbers, as in the Week 10 tutorial.

In [None]:
# Write your R-code in this code chunk - there are no hidden tests here so 
# you can use this space for what you need in order to answer Question 6
# YOUR CODE HERE
stop('No Answer Given!')

### Question 6 (4 points)

Is there a process that seems to account for the correlation between popularity and activity? Try to provide an interpretation if this is the case




YOUR ANSWER HERE

## Triad census

Recall that for directed networks there are 16 different triads.

![The 16 MAN triads for directed networks .](https://raw.githubusercontent.com/johankoskinen/BayesERGM/main/data/triadcensus.png)

Each triad is labeled by its dyad census - this is called the MAN labeling scheme.
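
The labels can be decoded mechanically. The sketch below lists the 16 labels in the standard MAN order, which is assumed here to be the column order returned by `triad.census` (`man.labels` and `man.counts` are illustrative names), and reads the first three characters of each label as the numbers of mutual, asymmetric and null dyads.

```r
# The 16 triad types in the standard MAN order (assumed column order of triad.census)
man.labels <- c("003", "012", "102", "021D", "021U", "021C", "111D", "111U",
                "030T", "030C", "201", "120D", "120U", "120C", "210", "300")

# First three characters: counts of Mutual, Asymmetric and Null dyads
man.counts <- t(sapply(man.labels, function(lab)
  as.integer(strsplit(substr(lab, 1, 3), "")[[1]])))
colnames(man.counts) <- c("M", "A", "N")

man.counts[c("030T", "030C", "300"), ]        # dyad composition of three triads
match(c("030T", "030C", "300"), man.labels)   # their positions in the ordering
```

Every row of `man.counts` sums to 3, because a triad consists of exactly three dyads; the trailing letter (D, U, C, T) distinguishes triads that share the same dyad counts but differ in the direction of the arcs.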

Calculate the triad census for the observed network

In [None]:
triad.obs <- triad.census(X)
triad.obs 

### Test triads

Here you are going to focus on the transitive triad 030T, the cyclic triad 030C, and the dense triad 300 (a Simmelian tie). In the order of triads, these are numbers 9, 10, and 16, respectively.

For each random graph model in turn, plot the distribution of these three triads and test whether the observed network has significantly more of these triads than expected under the random graph model.


### Bernoulli 

In [None]:
# You can use this as a template 

par(mfrow=c(1,3))
triad.sim <- triad.census(Xbern)
for (k in c(9,10,16)){
hist( triad.sim[,k] ,
      xlim = range(triad.sim[,k],triad.obs[k]),
      main = paste('prop ',colnames(triad.obs)[k], mean( triad.sim[,k]>triad.obs[k] ) )  )
abline( v = triad.obs[k]  )  
}


###  Conditional on outdegree $U | d_{out}$ (1 point)


In [None]:
# Write your R-code in this code chunk - there are no hidden tests here so 
# you can use this space for what you need in order to test and plot
par(mfrow=c(1,3))
# YOUR CODE HERE
stop('No Answer Given!')

###  Conditional on indegree $U | d_{in}$ (1 point)

In [None]:
# Write your R-code in this code chunk - there are no hidden tests here so 
# you can use this space for what you need in order to test and plot
par(mfrow=c(1,3))
# YOUR CODE HERE
stop('No Answer Given!')

###  Conditional on outdegree and indegree $U | d_{out},d_{in}$ (1 point)

In [None]:
# Write your R-code in this code chunk - there are no hidden tests here so 
# you can use this space for what you need in order to test and plot
par(mfrow=c(1,3))
# YOUR CODE HERE
stop('No Answer Given!')

###  Conditional on dyad census $U |MAN$ (1 point)

In [None]:
# Write your R-code in this code chunk - there are no hidden tests here so 
# you can use this space for what you need in order to test and plot
par(mfrow=c(1,3))
# YOUR CODE HERE
stop('No Answer Given!')

### Question 7 (4 points)

Looking at 030T, compare the results for Bernoulli, $U | d_{out}, d_{in}$, and $U | MAN$, and explain them in terms of status hierarchy.




YOUR ANSWER HERE

### Question 8 (3 points)

Looking at 300, interpret the result with reference to $U | MAN$.




In [None]:
# YOUR CODE HERE
stop('No Answer Given!')