![UKDS Logo](./images/UKDS_Logos_Col_Grey_300dpi.png)

# Social Network Analysis: Techniques and Methods of Analysis

Welcome to the <a href="https://ukdataservice.ac.uk/" target=_blank>UK Data Service</a> training series on *Computational Social Science*. This series guides you through some of the most common and valuable new sources of data available for social science research: data collected from websites and social media platforms, text data, and data generated by simulations (agent-based modelling), to name a few. To help you get to grips with these new forms of data, we provide webinars, interactive notebooks containing live programming code, reading lists and more.

* To access training materials for the entire series: <a href="https://github.com/UKDataServiceOpen/computational-social-science" target=_blank>[Training Materials]</a>

* To keep up to date with upcoming and past training events: <a href="https://ukdataservice.ac.uk/news-and-events/events" target=_blank>[Events]</a>

* To get in contact with feedback, ideas or to seek assistance: <a href="https://ukdataservice.ac.uk/help.aspx" target=_blank>[Help]</a>

<a href="https://www.research.manchester.ac.uk/portal/julia.kasmire.html" target=_blank>Dr Julia Kasmire</a> and <a href="https://www.research.manchester.ac.uk/portal/diarmuid.mcdonnell.html" target=_blank>Dr Diarmuid McDonnell</a> <br />
UK Data Service  <br />
University of Manchester <br />
September 2020

<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Introduction" data-toc-modified-id="Introduction-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Introduction</a></span><ul class="toc-item"><li><span><a href="#Aims" data-toc-modified-id="Aims-1.1"><span class="toc-item-num">1.1&nbsp;&nbsp;</span>Aims</a></span></li><li><span><a href="#Lesson-details" data-toc-modified-id="Lesson-details-1.2"><span class="toc-item-num">1.2&nbsp;&nbsp;</span>Lesson details</a></span></li></ul></li><li><span><a href="#Guide-to-using-this-resource" data-toc-modified-id="Guide-to-using-this-resource-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Guide to using this resource</a></span><ul class="toc-item"><li><span><a href="#Interaction" data-toc-modified-id="Interaction-2.1"><span class="toc-item-num">2.1&nbsp;&nbsp;</span>Interaction</a></span></li><li><span><a href="#Learn-more" data-toc-modified-id="Learn-more-2.2"><span class="toc-item-num">2.2&nbsp;&nbsp;</span>Learn more</a></span></li></ul></li><li><span><a href="#Social-Network-Analysis:-The-Basics" data-toc-modified-id="Social-Network-Analysis:-The-Basics-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>Social Network Analysis: The Basics</a></span><ul class="toc-item"><li><span><a href="#What-is-Social-Network-Analysis?" 
data-toc-modified-id="What-is-Social-Network-Analysis?-3.1"><span class="toc-item-num">3.1&nbsp;&nbsp;</span>What is Social Network Analysis?</a></span></li><li><span><a href="#Key-concepts" data-toc-modified-id="Key-concepts-3.2"><span class="toc-item-num">3.2&nbsp;&nbsp;</span>Key concepts</a></span></li></ul></li><li><span><a href="#Analysing-Social-Networks" data-toc-modified-id="Analysing-Social-Networks-4"><span class="toc-item-num">4&nbsp;&nbsp;</span>Analysing Social Networks</a></span><ul class="toc-item"><li><span><a href="#Preliminaries" data-toc-modified-id="Preliminaries-4.1"><span class="toc-item-num">4.1&nbsp;&nbsp;</span>Preliminaries</a></span></li><li><span><a href="#Network-level-measures" data-toc-modified-id="Network-level-measures-4.2"><span class="toc-item-num">4.2&nbsp;&nbsp;</span>Network-level measures</a></span></li><li><span><a href="#Node-level-measures" data-toc-modified-id="Node-level-measures-4.3"><span class="toc-item-num">4.3&nbsp;&nbsp;</span>Node-level measures</a></span></li></ul></li><li><span><a href="#Advanced-approaches" data-toc-modified-id="Advanced-approaches-5"><span class="toc-item-num">5&nbsp;&nbsp;</span>Advanced approaches</a></span><ul class="toc-item"><li><span><a href="#Relational-Event-Model" data-toc-modified-id="Relational-Event-Model-5.1"><span class="toc-item-num">5.1&nbsp;&nbsp;</span>Relational Event Model</a></span></li></ul></li><li><span><a href="#Conclusion" data-toc-modified-id="Conclusion-6"><span class="toc-item-num">6&nbsp;&nbsp;</span>Conclusion</a></span></li><li><span><a href="#Bibliography" data-toc-modified-id="Bibliography-7"><span class="toc-item-num">7&nbsp;&nbsp;</span>Bibliography</a></span></li><li><span><a href="#Further-reading-and-resources" data-toc-modified-id="Further-reading-and-resources-8"><span class="toc-item-num">8&nbsp;&nbsp;</span>Further reading and resources</a></span></li><li><span><a href="#Appendices" data-toc-modified-id="Appendices-9"><span 
class="toc-item-num">9&nbsp;&nbsp;</span>Appendices</a></span><ul class="toc-item"><li><span><a href="#Calculating-the-size-of-a-network" data-toc-modified-id="Calculating-the-size-of-a-network-9.1"><span class="toc-item-num">9.1&nbsp;&nbsp;</span>Calculating the size of a network</a></span></li></ul></li></ul></div>

## Introduction

Vast swathes of our social interactions and personal behaviours are now conducted online and/or captured digitally. Thus, computational methods for collecting, cleaning and analysing data are an increasingly important component of a social scientist’s toolkit.

In this training series we cover some of the essential knowledge and skills needed to engage in **Social Network Analysis (SNA)**, a methodological approach that provides concepts, tools and techniques for uncovering and understanding social structures, relations and networks of association. We focus on the three major stages of SNA:
1. Understanding fundamental concepts and terms [ [LINK] ](https://github.com/UKDataServiceOpen/social-network-analysis/blob/master/code/ukds-sna-fundamentals-2020-09-01.ipynb).
2. Collecting and cleaning social network data from various sources. [ [LINK] ](https://github.com/UKDataServiceOpen/social-network-analysis/blob/master/code/ukds-sna-getting-data-2020-09-15.ipynb)
3. Performing basic and intermediate analyses of social network data. [Focus of this notebook]

By the end of these lessons you should be confident in your understanding of key SNA concepts and terms, proficient in the handling and cleaning of social network data, and able to apply a range of analytical techniques to derive substantive insight about social structures and relations. In addition, you will gain fluency in the use of the Python programming language for SNA and other computational social science tasks.

### Aims

This lesson - **Social Network Analysis: Techniques and Methods of Analysis** - has two aims:
1. Define and implement a range of social network analysis techniques and measures.
2. Cultivate your computational skills through coding examples. For example, there are a number of opportunities for you to execute and adapt the analyses conducted during this lesson.

### Lesson details

* **Level**: Intermediate, for individuals with some prior knowledge and experience of social network concepts and data.
* **Duration**: 30-60 minutes.
* **Pre-requisites**: You are strongly encouraged to complete the previous lessons:
    * [Social Network Analysis: Basic Concepts](https://github.com/UKDataServiceOpen/social-network-analysis/tree/master/webinars)
    * [Social Network Analysis: Getting and Marshalling Data](https://github.com/UKDataServiceOpen/social-network-analysis/tree/master/webinars)
* **Audience**: Researchers and analysts from any disciplinary background interested in employing network analysis for social science research purposes.
* **Programming language**: Python.
* **Learning outcomes**:
	1. Understand a range of basic and intermediate analytical methods for use with social network data.
	2. Be able to use Python for analysing social network data.

## Guide to using this resource

This learning resource was built using <a href="https://jupyter.org/" target=_blank>Jupyter Notebook</a>, an open-source software application that allows you to mix code, results and narrative in a single document. As <a href="https://jupyter4edu.github.io/jupyter-edu-book/" target=_blank>Barba et al. (2019)</a> espouse:
> In a world where every subject matter can have a data-supported treatment, where computational devices are omnipresent and pervasive, the union of natural language and computation creates compelling communication and learning opportunities.

If you are familiar with Jupyter notebooks then skip ahead to the main content (*What is Social Network Analysis?*). Otherwise, the following is a quick guide to navigating and interacting with the notebook.

### Interaction

**You only need to execute the code that is contained in sections which are marked by `In []`.**

To execute a cell, click or double-click the cell and press the `Run` button on the top toolbar (you can also use the keyboard shortcut Shift + Enter).

Try it for yourself:

In [None]:
print("Enter your name and press enter:")
name = input()
print("\r")
print("Hello {}, enjoy learning more about Python and SNA!".format(name))

### Learn more

Jupyter notebooks provide rich, flexible features for conducting and documenting your data analysis workflow. To learn more about additional notebook features, we recommend working through some of the <a href="https://github.com/darribas/gds19/blob/master/content/labs/lab_00.ipynb" target=_blank>materials</a> provided by Dani Arribas-Bel at the University of Liverpool. 

## Social Network Analysis: The Basics

We will quickly cover the essential concepts and elements of Social Network Analysis (SNA), though we strongly advise you to work through our [previous webinar](https://www.youtube.com/watch?v=PJOM0m_WeTA) on this topic.

### What is Social Network Analysis?

Social network analysis (SNA) is a methodological and conceptual toolbox for the measurement, systematic description, and analysis of patterns in relational structures in the social world (Caiani, 2014). 

A relation is a distinctive type of connection or tie between two entities (Wasserman & Faust, 1994). For example, a married couple share a spousal relation, a brother and sister share a sibling relation, co-workers share a collegial relation etc. 

Relations are the building blocks of networks, and thus SNA is concerned with and most appropriate for analyses of data capturing relations between units of analysis (Scott, 2017).

### Key concepts

A network is constructed from two key components (Owen-Smith, 2017):
1. The **entities** that are (or can be) connected.
2. The **connections** that exist (or could exist) between entities.

For example, a family tree is a network containing individuals (entities) that are related through some type of familial tie (connection). Therefore a network is an aggregation or collection of these entities and their connections. For example, here is the familial network of the members of the UK Royal Family ([BBC, 2020](https://www.bbc.com/news/uk-23272491)):

![UK Royal Family](./images/royal-family.png)

## Analysing Social Networks

When analysing social networks, it is useful to think about three levels of analysis (Caiani, 2014):
1. Macro-level - producing summaries of the overall network
2. Meso-level - producing summaries of components and subsets
3. Micro-level - producing summaries of nodes and ties

A key theoretical claim of social network theory is that structure affects outcomes at the micro level. It does so through several mechanisms (Caiani, 2014):
* Access to resources
* Closeness/proximity affects influence and diffusion
* Position in network affects exposure to risk, opportunities and outcomes
* Position in network creates obligations and commitments

A point to remember: network structure is not always the result of deliberate action on the part of the entities, i.e., there is no vision or strategy for the network; it simply arises through the many small decisions of the entities.

> Quantitative metrics let you differentiate networks, learn about their topologies, and turn a jumble of nodes and edges into something you can learn from. (Programming Historian, 2017)

### Preliminaries

In this section we import the Python modules we need for our analysis, as well as our network data set.

#### Python modules

In [None]:
import pandas as pd # data manipulation
import numpy as np # mathematical operations
import networkx as nx # network analysis
import matplotlib.pyplot as plt # data visualisation
from operator import itemgetter # suite of standard Python operators

#### Data set

We will use a data set capturing connections between funders of charitable and similar organisations under a scheme designed to tackle the impact of Covid-19 on said organisations: https://covidtracker.threesixtygiving.org/

In [None]:
data = pd.read_csv("./data/funder-network-covid19-2020-09-24.csv", index_col = 0)
data.sample(5)

Each row is a funder (e.g., the National Lottery, Children in Need), each column is a funder, and each cell represents how many organisations a pair of funders both supported. This is an example of an **adjacency matrix**: it maps who is next to whom in a social space. Saying two nodes are adjacent is another way of describing the presence of a tie between them. In addition we can also speak to the *strength* of the ties between funders: some jointly support 2 organisations, some jointly support 20 or more.

Therefore we have a **valued, undirected** network of funders.

Finally, we are ready to convert the matrix into a `networkx` graph object:

In [None]:
fundgraph = nx.from_pandas_adjacency(data)
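print(fundgraph.number_of_nodes(), "nodes and", fundgraph.number_of_edges(), "ties")  # nx.info() is not available in networkx 3.x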
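To make the conversion step more concrete, here is a minimal sketch using a small, made-up adjacency matrix. The funder names and counts are hypothetical and are not taken from the data set; the point is only to show that each non-zero cell becomes a tie and that the cell value is stored as the tie's `weight`.

```python
import networkx as nx
import pandas as pd

# Hypothetical funders and counts of jointly supported organisations
funders = ["Funder A", "Funder B", "Funder C"]
toy_matrix = pd.DataFrame(
    [[0, 2, 0],
     [2, 0, 5],
     [0, 5, 0]],
    index=funders,
    columns=funders,
)

# An undirected network corresponds to a symmetric matrix
is_symmetric = toy_matrix.equals(toy_matrix.T)
print("Symmetric:", is_symmetric)

toy_graph = nx.from_pandas_adjacency(toy_matrix)

# Each non-zero cell becomes a tie; the cell value is stored as "weight"
for u, v, w in toy_graph.edges(data="weight"):
    print(u, "-", v, "| jointly supported organisations:", w)
```

Funders A and C share no organisations (a zero cell), so no tie is created between them.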

### Network-level measures

#### Visualisation

Visualising networks is an appealing activity and is often the first step in any analysis. However, as will become apparent, visualisation is an insufficient and often unrevealing output for all but the simplest networks (Hanneman and Riddle, 2005). For example, here are three different representations of our funder network:

*Random Layout*

![Funder Network Random Layout](./images/funder-network-random-2020-08-27.png)

*Spring Layout*

![Funder Network Spring Layout](./images/funder-network-spring-2020-08-27.png)

*Kamada Kawai Layout*

![Funder Network Kamada Kawai Layout](./images/funder-network-kamada-2020-08-27.png)

I'm sure you'll agree that deriving insight on the essential properties of this network is difficult-to-impossible based on these representations (and there are others we didn't use).

#### Size

*Size* is defined as the number of nodes and ties in a network. It is a simple and interesting measure in its own right, but it is also valuable for standardising other measures when we want to compare different networks.

In [None]:
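n, m = fundgraph.number_of_nodes(), fundgraph.number_of_edges()  # nx.info() is not available in networkx 3.x
print("Nodes:", n, "| Ties:", m, "| Average degree:", round(2 * m / n, 2))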

There are 640 ties (edges) between the 83 funders. *Average degree* is a measure of the mean number of ties a node has with other nodes. In our example, a funder is, on average, connected to fifteen others through the organisations it supports.
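The average degree can also be worked out by hand from the two size figures. The sketch below uses the numbers reported above (83 funders, 640 ties) and then repeats the calculation on a small hypothetical graph; the toy node names are illustrative only.

```python
import networkx as nx

# Figures reported for the funder network in this lesson
n_nodes, n_ties = 83, 640

# Every tie adds one to the degree of two nodes, hence the factor of 2
average_degree = 2 * n_ties / n_nodes
print("Average degree:", round(average_degree, 2))

# The same calculation on a small hypothetical graph
toy = nx.Graph([("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")])
toy_average = 2 * toy.number_of_edges() / toy.number_of_nodes()
print(toy.number_of_nodes(), "nodes,", toy.number_of_edges(), "ties, average degree", toy_average)
```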

#### Degree distribution

The distribution of the number of ties (degrees) in the network can be visualised using a histogram:

In [None]:
degrees = [fundgraph.degree(n) for n in fundgraph.nodes()]

plt.hist(degrees)
plt.title('Degree distribution')
plt.xlabel('Number of ties')
plt.ylabel('Number of funders')

plt.show()

As you can see, there are a decent number of connections between funders in this network. For example, around 18 funders are each connected to 20 others. While this seems like a lot of interconnection, remember the following: a funder connected to 20 others does not mean that each pair fund **the same** organisations. Funders A and B can support Organisation 1, Funders A and C can support Organisation 2 etc.

In [None]:
nx.number_of_isolates(fundgraph)

We see that there are eight funders with no connections to other funders in the network: these are known as **isolates**. Basically these are funders who provide grants to a set of organisations who do not receive support from any of the other funders.

#### Density

How cohesive or dense is this network? That is, how many of the possible connections between funders have been realised? We can use the `nx.density()` function to calculate a measure ranging from 0 (no connections between nodes) to 1 (all possible connections realised).

In [None]:
density = nx.density(fundgraph)
print("Network density:", density)

The network is reasonably dense - 19% of all possible ties between funders are realised. While this is quite high for a "real" network, remember that we have relatively few funders (83) to organisations (c. 10,000), and that many of the funders will have similar funding criteria for their grants (thus resulting in organisations qualifying for grants from more than one funder).
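As a rough check on the figure above, density for an undirected network is the number of observed ties divided by the number of possible ties, n(n - 1)/2. The sketch below applies this to the reported size of the funder network and then compares the hand calculation with `nx.density()` on a small hypothetical graph.

```python
import networkx as nx

# Figures reported for the funder network in this lesson
n_nodes, n_ties = 83, 640

# Number of possible ties in an undirected network
possible_ties = n_nodes * (n_nodes - 1) / 2
density_by_hand = n_ties / possible_ties
print("Possible ties:", int(possible_ties))
print("Density:", round(density_by_hand, 3))

# Cross-check the formula against networkx on a hypothetical graph
toy = nx.Graph([("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")])
toy_n = toy.number_of_nodes()
toy_by_hand = toy.number_of_edges() / (toy_n * (toy_n - 1) / 2)
print("Toy density by hand:", round(toy_by_hand, 3))
print("Toy density from networkx:", round(nx.density(toy), 3))
```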

#### Clustering

To what extent are nodes in the network clustered together? That is, do groups of nodes tend to realise all possible connections between them? *Transitivity* is one such measure of clustering: it is defined as the ratio of all triads realised to all possible triads. A possible triad exists when one node is connected to two others: in such a scenario we can assume that the other two nodes have a good opportunity to connect to each other. Put another way, *transitivity* calculates the probability that two individuals who share a common acquaintance will end up connecting with each other directly, i.e., a friend of a friend becomes a friend.

It's a measure of a social structure's tendency to stability or equilibrium (Hanneman and Riddle, 2005).

See the simple examples below to gain an understanding of this concept:

*Potential triad*

![Potential Triad](./images/potential-triad-2020-08-26.png)

*Realised triad*

![Realised Triad](./images/triad-2020-08-26.png)

In [None]:
triadic_closure = nx.transitivity(fundgraph)
print("Triadic closure:", triadic_closure)

The transitivity measure is reasonably high: 49% of possible triads are realised. This suggests that where a funder shares organisations with two other funders, it is roughly as likely as not that those two funders also share at least one organisation with each other.
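To see where a transitivity figure comes from, the following sketch counts possible and realised triads by hand on a small hypothetical graph (a triangle A-B-C with one extra node D attached to C) and compares the result with `nx.transitivity()`. The graph and node names are illustrative assumptions.

```python
import networkx as nx
from itertools import combinations

toy = nx.Graph([("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")])

possible = 0  # pairs of nodes that share a common neighbour
realised = 0  # such pairs that are also tied to each other
for node in toy.nodes():
    for u, v in combinations(toy.neighbors(node), 2):
        possible += 1
        if toy.has_edge(u, v):
            realised += 1

by_hand = realised / possible
print("Possible triads:", possible, "| realised:", realised)
print("Transitivity by hand:", by_hand)
print("Transitivity from networkx:", nx.transitivity(toy))
```

Node C is connected to A, B and D, but D has no tie to A or B, which is why two of the five possible triads remain unrealised.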

#### Diameter

The diameter of a network is a measure of the distance from one "end" of the network to the other. More technically speaking, the diameter represents the largest geodesic distance in the network: the pair of nodes who are separated by the most distance.

In [None]:
diameter = nx.diameter(fundgraph)
print("Network diameter:", diameter)

Hmm, we get an error telling us it is impossible to calculate the diameter of a network that "is not connected". This means there are nodes with no connections in the network (*isolates*) whose presence precludes calculating the diameter: what sense does it make to say an isolate is *x* distance from another node? Therefore, we need to calculate the diameter of the largest **component** in the network as a proxy for the entire network:

In [None]:
# Identify largest component in network
components = nx.connected_components(fundgraph)
largest_component = max(components, key=len)

# Represent this component as a network itself
subgraph = fundgraph.subgraph(largest_component)

# Calculate diameter of component
diameter = nx.diameter(subgraph)
print("Network diameter of largest component:", diameter)

The diameter is 4, meaning that the most distant, connected funders in the network can reach each other in four steps.

#### Components

We introduced the concept of **components** when measuring the diameter of a network. A component is a subset of the network where every node is connected to every other, either directly or indirectly. A network will often have more than one component: if it does then note that, by definition, no two components are connected. Put another way, there will be no direct or indirect ties between the nodes in one component and the nodes in another.
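A small hypothetical example may help fix the idea. The sketch below builds a graph with a triangle, a separate pair and an isolate, then lists the component sizes and checks whether nodes in different components can reach each other. The node names are made up for illustration.

```python
import networkx as nx

# A triangle (A, B, C), a separate pair (D, E) and an isolate (F)
toy = nx.Graph([("A", "B"), ("B", "C"), ("A", "C"), ("D", "E")])
toy.add_node("F")

component_sizes = sorted((len(c) for c in nx.connected_components(toy)), reverse=True)
print("Component sizes:", component_sizes)

# Nodes in the same component can reach each other; nodes in different components cannot
same_component = nx.has_path(toy, "A", "C")
different_components = nx.has_path(toy, "A", "D")
print("A reaches C:", same_component)
print("A reaches D:", different_components)
```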

In [None]:
print(nx.number_connected_components(fundgraph))

Our network of funders is sub-divided into nine components but this is a bit misleading as *isolates* (nodes with no ties) are counted as components. In reality we have one large component and eight isolates:

In [None]:
# Create subgraph for each component and print network summary info
S = [fundgraph.subgraph(c).copy() for c in nx.connected_components(fundgraph)]
for G in S:
    print(G.number_of_nodes(), "nodes and", G.number_of_edges(), "ties")

Regardless of the number of components, analytical attention is often focused solely on the largest component (as we did when calculating the **diameter**). The largest component can be represented visually as follows:

*Largest component*

![Funder Network Largest Component](./images/funder-network-largest-component-2020-08-27.png)

#### Cliques

A **clique** is similar to a component in that it is a subset of the network where every node is connected to every other; the crucial difference is that the nodes share *direct* ties only.
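The difference between a component and a clique can be illustrated with a minimal sketch on a hypothetical graph: a triangle A-B-C with an extra node D tied only to C. All four nodes sit in one component, but D is not part of the triangle's clique because it lacks direct ties to A and B.

```python
import networkx as nx

toy = nx.Graph([("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")])

n_components = nx.number_connected_components(toy)

# find_cliques yields maximal cliques: cliques that cannot be extended by another node
maximal_cliques = sorted(sorted(clique) for clique in nx.find_cliques(toy))

print("Number of components:", n_components)
print("Maximal cliques:", maximal_cliques)
```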

In [None]:
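# Count the maximal cliques (graph_number_of_cliques() is deprecated in recent networkx releases)
sum(1 for _ in nx.find_cliques(fundgraph))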

There are quite a large number of cliques in the network, though this may not be surprising given what we have learned so far, especially:
* The *transitivity* measure is high
* There are a handful of funders with a large number of ties

Let's focus instead on the largest clique in the network:

In [None]:
# Find all cliques and select the largest
cliques = nx.find_cliques(fundgraph)
cgraph = fundgraph.subgraph(max(cliques, key=len))
print(cgraph.number_of_nodes(), "nodes and", cgraph.number_of_edges(), "ties")

*Largest clique*

![Funder Network Largest Clique](./images/funder-network-largest-clique-2020-08-27.png)

### Node-level measures

#### Centrality

The centrality of a node is a measure of how important it is in a network. A node may act as a hub, or broker connections between other nodes, or be positioned close to many other nodes. As such, there are different measures of centrality we can apply to understand the importance of nodes.

##### Degree centrality

This measures how "popular" or well connected a node is in the network (Scott, 2017). It is a normalised measure of the number of ties a node possesses (usually direct ties but can be calculated at other distances).

First, let's examine the raw number of ties each node has.

In [None]:
# Calculate number of degrees for each node and add as an attribute
degree_dict = dict(fundgraph.degree(fundgraph.nodes()))
nx.set_node_attributes(fundgraph, degree_dict, "degree")

# Sort by number of degrees and examine top 20 connected nodes
sorted_degree = sorted(degree_dict.items(), key=itemgetter(1), reverse=True)
print("Top 20 nodes by degree:")
for d in sorted_degree[:20]:
    print(d)

The best connected node is The National Lottery Community Fund, which is connected to 61 of the other funders in the network (you can look up the identifiers in the original [grants data set](https://github.com/UKDataServiceOpen/social-network-analysis/blob/master/code/data/threesixtygiving-covid19-grants-2020-09-24.csv)).

In [None]:
# Calculate degree centrality for each node and add as an attribute
degree_centrality = nx.degree_centrality(fundgraph)
nx.set_node_attributes(fundgraph, degree_centrality, "degree centrality")

# Sort by degree centrality and examine top 20 connected nodes
sorted_degree_centrality = sorted(degree_centrality.items(), key=itemgetter(1), reverse=True)
print("Top 20 nodes by degree centrality:")
for d in sorted_degree_centrality[:20]:
    print(d)

Both measures (the raw and normalised number of ties) capture the same concept: *importance / popularity*. The normalised measure divides a node's degree by the number of other nodes in the network, i.e., `degree / (n - 1)`, and is preferred when making comparisons between networks or components.
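The normalisation can be reproduced by hand. The sketch below applies it to the figure reported above for The National Lottery Community Fund (61 ties in a network of 83 funders) and then checks the hand calculation against `nx.degree_centrality()` on a small hypothetical graph.

```python
import networkx as nx

# Reported above: 61 ties in a network of 83 funders
nlcf_centrality = 61 / (83 - 1)
print("Degree centrality of the best connected funder:", round(nlcf_centrality, 3))

# Cross-check on a hypothetical graph
toy = nx.Graph([("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")])
n = toy.number_of_nodes()
by_hand = {node: degree / (n - 1) for node, degree in toy.degree()}
from_networkx = nx.degree_centrality(toy)

for node in toy.nodes():
    print(node, round(by_hand[node], 3), round(from_networkx[node], 3))
```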

The best connected nodes can be considered as **hubs** in the network.

##### Betweenness centrality

This captures the idea of brokerage in a network, i.e., whether a node facilitates indirect ties between other nodes (Owen-Smith, 2017). Consider the simple example below: Jane acts as a broker between Josie and John, who otherwise are not connected (directly or through another node).

Therefore betweenness centrality is a measure of the proportion of geodesic (shortest) paths between all other pairs of nodes that pass through a given node. Put simply, it captures how often a node lies along the shortest path between two other nodes.

![Indirect Tie](./images/indirect-tie-2020-08-26.png)
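The three-person example can be reproduced in a few lines. This is a minimal sketch of the friendship network in the figure, assuming ties between Josie and Jane and between Jane and John only.

```python
import networkx as nx

# Josie and John are connected only through Jane
friends = nx.Graph([("Josie", "Jane"), ("Jane", "John")])

betweenness = nx.betweenness_centrality(friends)
for person, score in betweenness.items():
    print(person, score)
```

Jane lies on the only shortest path between Josie and John, so she receives the maximum normalised score, while the other two receive zero.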

In [None]:
# Calculate betweenness centrality for each node and add as an attribute
betweenness_centrality = nx.betweenness_centrality(fundgraph)
nx.set_node_attributes(fundgraph, betweenness_centrality, "betweenness centrality")

# Sort by betweenness centrality and examine top 20 broker nodes
sorted_betweenness_centrality = sorted(betweenness_centrality.items(), key=itemgetter(1), reverse=True)
print("Top 20 nodes by betweenness centrality:")
for d in sorted_betweenness_centrality[:20]:
    print(d)

We see some consistency between our degree and betweenness centrality measures: for example, not only is The National Lottery Community Fund the best connected funder, it also sits along the shortest path between many of the other funders.

You may have come across the term *structural hole* when analysing a social network. This refers to a scenario where there is a lack of a direct contact or tie between two or more entities (Burt, 1992). Therefore, a broker can fill this structural hole and generate an indirect tie between two or more unconnected nodes. As a simple example, think of what would happen if Jane were to disappear from the simple friendship network below:

![Indirect Tie](./images/indirect-tie-2020-08-26.png)
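The thought experiment can be run directly. The sketch below rebuilds the same three-person network (an assumption based on the figure), removes Jane, and checks whether Josie and John can still reach each other. It also computes Burt's constraint measure with `nx.constraint()`, where lower values are usually read as more brokerage opportunity.

```python
import networkx as nx

friends = nx.Graph([("Josie", "Jane"), ("Jane", "John")])
connected_before = nx.has_path(friends, "Josie", "John")

# Remove the broker from a copy of the network
without_jane = friends.copy()
without_jane.remove_node("Jane")
connected_after = nx.has_path(without_jane, "Josie", "John")

print("Josie can reach John with Jane present:", connected_before)
print("Josie can reach John without Jane:", connected_after)

# Jane spans the structural hole, so she is less constrained than her contacts
constraint = nx.constraint(friends)
print("Jane is less constrained than Josie:", constraint["Jane"] < constraint["Josie"])
```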

In [None]:
# Calculate structural hole measure (constraint) for each node and add as an attribute
sh = nx.constraint(fundgraph)
nx.set_node_attributes(fundgraph, sh, "structural hole constraint")

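# Sort in ascending order: lower constraint suggests more brokerage opportunities.
# Isolates may have an undefined (NaN) constraint, so drop them before sorting.
valid_sh = {node: c for node, c in sh.items() if not np.isnan(c)}
sorted_sh = sorted(valid_sh.items(), key=itemgetter(1))
print("20 nodes with the lowest constraint:")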
for d in sorted_sh[:20]:
    print(d)

##### Closeness centrality

This captures the idea of proximity in a network i.e., whether a node is situated close to others. Understanding proximity is important as it affects influence over and diffusion between nodes (Caiani, 2014). A node that is proximate to many others can be considered to occupy a position of strategic significance in the network (Scott, 2017).

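Closeness centrality is based on the average distance between a node and all other nodes, with distance represented as the shortest path between a pair of nodes (we cover this in the next section). Note that `networkx` reports the reciprocal of this average, so higher values indicate nodes that are closer to the rest of the network.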
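As a minimal sketch of the calculation, the helper below (`closeness_by_hand`, a name introduced here for illustration) divides the number of other nodes by the sum of shortest path lengths from a node, and compares the result with `nx.closeness_centrality()`. It assumes a connected network; the three-person graph is hypothetical.

```python
import networkx as nx

friends = nx.Graph([("Josie", "Jane"), ("Jane", "John")])

def closeness_by_hand(graph, node):
    """Number of other nodes divided by the total distance to them (connected graphs only)."""
    distances = nx.shortest_path_length(graph, source=node)
    total_distance = sum(distances.values())  # the node's distance to itself is 0
    return (len(graph) - 1) / total_distance

from_networkx = nx.closeness_centrality(friends)
for person in friends.nodes():
    print(person, round(closeness_by_hand(friends, person), 3), round(from_networkx[person], 3))
```

Jane is one step from everyone, so her score is the highest possible; Josie and John are each two steps from one another and score lower.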

In [None]:
# Calculate closeness centrality for each node and add as an attribute
closeness_centrality = nx.closeness_centrality(fundgraph)
nx.set_node_attributes(fundgraph, closeness_centrality, "closeness centrality")

# Sort by closeness centrality and examine top 20 most proximate nodes
sorted_closeness_centrality = sorted(closeness_centrality.items(), key=itemgetter(1), reverse=True)
print("Top 20 nodes by closeness centrality:")
for d in sorted_closeness_centrality[:20]:
    print(d)

Again, there's a high degree of consistency in our network between the best connected funders (*degree centrality*), those with the greatest potential for brokering connections (*betweenness centrality*), and those that are most proximate (*closeness centrality*).

#### Distance

The distance between two nodes is a measure of how closely connected or reachable they are: two nodes with a direct tie are separated by a distance of 1 (e.g., two friends); two nodes that possess an indirect tie are separated by a distance of 2 or more (e.g., a friend of a friend). 

One way of measuring the distance between nodes is **geodesic distance**: this is the shortest (i.e., optimal, most efficient) path between two nodes (Hanneman and Riddle, 2005). A path is a sequence of ties connecting two nodes that does not reuse a node or tie; put simply, there is no doubling back on a path between two nodes.

In sum, the geodesic distance between two nodes captures the shortest number of steps (ties) between them.
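For readers curious about how a geodesic distance can be found, here is a minimal sketch using a breadth-first search in plain Python. The function name `geodesic_distance` and the toy list of ties are assumptions made for illustration; this is not necessarily how `networkx` implements its shortest path functions.

```python
from collections import deque

def geodesic_distance(ties, source, target):
    """Return the number of ties on a shortest path, or None if target is unreachable."""
    # Build an undirected neighbour lookup from the list of ties
    neighbours = {}
    for u, v in ties:
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)

    # Explore the network outwards from the source, one step at a time
    distances = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            return distances[node]
        for neighbour in neighbours.get(node, ()):
            if neighbour not in distances:
                distances[neighbour] = distances[node] + 1
                queue.append(neighbour)
    return None

# Two routes from A to D: A-B-C-D (3 steps) and A-E-D (2 steps)
ties = [("A", "B"), ("B", "C"), ("C", "D"), ("A", "E"), ("E", "D")]
print(geodesic_distance(ties, "A", "D"))
```

Because the search visits nodes in order of their distance from the source, the first time it reaches the target it has found a shortest route.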

A simple path looks like this:

In [None]:
G = nx.path_graph(8)
nx.draw(G)
plt.show()

First, let's take a look at the various paths that exist between two of the least connected funders in the network. As you can see there are numerous ways of travelling between these two nodes.

![Paths](./images/funder-network-paths-2020-08-27.png)

But what is the shortest path?

In [None]:
print(nx.shortest_path(fundgraph, source="GB-CHC-247941", target="GB-CHC-1146484"))

Is there more than one shortest path between these two funders?

In [None]:
for p in nx.all_shortest_paths(fundgraph, source="GB-CHC-247941", target="GB-CHC-1146484"):
    print(p)

Finally, the shortest path length between these funders is 3 (the number of nodes on the path minus 1):

In [None]:
print(nx.shortest_path_length(fundgraph, source="GB-CHC-247941", target="GB-CHC-1146484"))

## Advanced approaches

Social network analysis is a rich methodological approach, and advances in how network data can be analysed are made frequently. Many are beyond the scope of this lesson &mdash; such as Exponential Random Graph Models (ERGMs), Stochastic Actor-Oriented Models (SAOMs) &mdash; but we will highlight one that we consider particularly interesting: the **Relational Event Model**.

### Relational Event Model

The Relational Event Model (REM) is a statistical modelling framework that allows you to capture the ordering/sequencing of events in a network, e.g., when a connection occurred, not just that it did or how frequently (Tranmer et al., 2015).

In our example of funders, REM would allow us to analyse the timing of the connections that formed: for instance, The National Lottery Community Fund is connected to 61 other funders &mdash; who did it form a connection with first? On what date?

REM is a close cousin to Event History/Survival Analysis and Sequence Analysis.
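The sketch below is not a Relational Event Model; it only illustrates the kind of time-ordered event data such a model takes as input, and how the "who first, and when?" question could be answered descriptively. The funder names, dates and the helper `first_connection` are all hypothetical.

```python
from datetime import date

# Hypothetical relational events: (funder, funder, date the tie was first observed)
events = [
    ("Funder A", "Funder C", date(2020, 5, 12)),
    ("Funder A", "Funder B", date(2020, 4, 3)),
    ("Funder B", "Funder C", date(2020, 6, 20)),
]

def first_connection(events, funder):
    """Return (partner, date) for the earliest event involving the funder, or None."""
    involving = [event for event in events if funder in event[:2]]
    if not involving:
        return None
    u, v, when = min(involving, key=lambda event: event[2])
    partner = v if u == funder else u
    return partner, when

print(first_connection(events, "Funder A"))
```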

In [None]:
df = pd.read_csv("./data/funder-network-edgelist-2020-08-27.csv", index_col=False)
df.head(10)

In [None]:
from IPython.display import IFrame    
IFrame("../reading-list/Schecter-relational-event.pdf", width=600, height=650)

## Conclusion

Social Network Analysis (SNA) is a broad, rich and increasingly relevant methodology for investigating patterns in social structures and relations. There is a wide array of social concepts and constructs that can be measured and analysed using social network data. This lesson demonstrated the use of some of these measures and techniques of analysis but there are many more to be discovered (see the [`networkx` documentation](https://networkx.github.io/documentation/stable/reference/algorithms/index.html)).

Good luck on your data-driven travels!

## Bibliography

Barba, Lorena A. et al. (2019). *Teaching and Learning with Jupyter*. <a href="https://jupyter4edu.github.io/jupyter-edu-book/" target=_blank>https://jupyter4edu.github.io/jupyter-edu-book/</a>.

Bourdieu, P. (1986). The Forms of Capital. In J. Richardson (Ed.), *Handbook of Theory and Research for the Sociology of Education* (pp. 241-258). Westport, CT: Greenwood.

Burt, R. S. (1992). *Structural Holes: The Social Structure of Competition*. Cambridge, MA: Harvard University Press.

Caiani, M. (2014). Social Network Analysis. In D. Della Porta (Ed.), *Methodological Practices in Social Movement Research* (pp. 368-396). Oxford: Oxford University Press.

Granovetter, M. (1973). The Strength of Weak Ties. *American Journal of Sociology, 78*(6), pp. 1360-1380.

Hanneman, R. A., & Riddle, M. (2005). *Introduction to social network methods*. <a href="http://faculty.ucr.edu/~hanneman/nettext/" target=_blank>http://faculty.ucr.edu/~hanneman/nettext/</a>.

Owen-Smith, J. (2017). Networks: The Basics. In I. Foster et al. (Eds.), *Big Data and Social Science: A Practical Guide to Methods and Tools* (pp. 215-240). Boca Raton, FL: CRC Press.

Scott, J. (2017). *Social Network Analysis* (4th edition). London: SAGE Publications Inc.

Smith, K. P. & Christakis, N. A. (2008). Social Networks and Health. *Annual Review of Sociology, 34*, pp. 405-429.

Wasserman, S. & Faust, K. (1994). *Social Network Analysis*. Cambridge: Cambridge University Press.

## Further reading and resources

We maintain a list of useful books, papers, websites and other resources on our SNA Github repository: <a href="https://github.com/UKDataServiceOpen/social-network-analysis/tree/master/reading-list/" target=_blank>[Reading list]</a>

The help documentation for the `networkx` module is refreshingly readable and useful: <a href="https://networkx.github.io/documentation/stable/index.html" target=_blank>https://networkx.github.io/documentation/stable/index.html</a>

You may also be interested in the following articles and lessons specifically relating to social network analysis:
* <a href="https://programminghistorian.org/en/lessons/exploring-and-analyzing-network-data-with-python" target=_blank>Exploring and Analyzing Network Data with Python </a>
* <a href="https://programminghistorian.org/en/lessons/creating-network-diagrams-from-historical-sources" target=_blank>From Hermeneutics to Data to Networks: Data Extraction and Network Visualization of Historical Sources</a>

## Appendices

### Calculating the size of a network

We can calculate the total number of possible ties in a network using a simple formula, which we adjust slightly depending on whether we are dealing with *directed* or *undirected* ties.

#### Directed ties

\begin{equation*}
Y = n * (n - 1)
\end{equation*}

Where *Y* = number of ties, and *n* = number of nodes

#### Undirected ties

\begin{equation*}
Y = \frac{n * (n - 1)}{2}
\end{equation*}

Where *Y* = number of ties, and *n* = number of nodes

Apply these formulas yourself using the functions below:

In [None]:
def directed_ties(nodes):
    ties = nodes * (nodes - 1)
    print("The number of possible ties in this network is: {}".format(ties))
    
def undirected_ties(nodes):
    ties = int((nodes * (nodes - 1)) / 2)
    print("The number of possible ties in this network is: {}".format(ties))

We can call on these functions like so:

In [None]:
directed_ties(5)

In [None]:
undirected_ties(5)