![UKDS Logo](./images/UKDS_Logos_Col_Grey_300dpi.png)

# Social Network Analysis: Getting and Marshalling Data

Welcome to the <a href="https://ukdataservice.ac.uk/" target=_blank>UK Data Service</a> training series on *New Forms of Data for Social Science Research*. This series guides you through some of the most common and valuable new sources of data available for social science research: data collected from websites and social media platforms, text data, and data generated by simulations (agent-based modelling), to name a few. To help you get to grips with these new forms of data, we provide webinars, interactive notebooks containing live programming code, reading lists and more.

* To access training materials for the entire series: <a href="https://github.com/UKDataServiceOpen/new-forms-of-data" target=_blank>[Training Materials]</a>

* To keep up to date with upcoming and past training events: <a href="https://ukdataservice.ac.uk/news-and-events/events" target=_blank>[Events]</a>

* To get in contact with feedback, ideas or to seek assistance: <a href="https://ukdataservice.ac.uk/help.aspx" target=_blank>[Help]</a>

<a href="https://www.research.manchester.ac.uk/portal/julia.kasmire.html" target=_blank>Dr Julia Kasmire</a> and <a href="https://www.research.manchester.ac.uk/portal/diarmuid.mcdonnell.html" target=_blank>Dr Diarmuid McDonnell</a> <br />
UK Data Service  <br />
University of Manchester <br />
September 2020

<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Introduction" data-toc-modified-id="Introduction-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Introduction</a></span><ul class="toc-item"><li><span><a href="#Aims" data-toc-modified-id="Aims-1.1"><span class="toc-item-num">1.1&nbsp;&nbsp;</span>Aims</a></span></li><li><span><a href="#Lesson-details" data-toc-modified-id="Lesson-details-1.2"><span class="toc-item-num">1.2&nbsp;&nbsp;</span>Lesson details</a></span></li></ul></li><li><span><a href="#Guide-to-using-this-resource" data-toc-modified-id="Guide-to-using-this-resource-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Guide to using this resource</a></span><ul class="toc-item"><li><span><a href="#Interaction" data-toc-modified-id="Interaction-2.1"><span class="toc-item-num">2.1&nbsp;&nbsp;</span>Interaction</a></span></li><li><span><a href="#Learn-more" data-toc-modified-id="Learn-more-2.2"><span class="toc-item-num">2.2&nbsp;&nbsp;</span>Learn more</a></span></li></ul></li><li><span><a href="#Social-Network-Analysis:-The-Basics" data-toc-modified-id="Social-Network-Analysis:-The-Basics-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>Social Network Analysis: The Basics</a></span><ul class="toc-item"><li><span><a href="#What-is-Social-Network-Analysis?" data-toc-modified-id="What-is-Social-Network-Analysis?-3.1"><span class="toc-item-num">3.1&nbsp;&nbsp;</span>What is Social Network Analysis?</a></span></li><li><span><a href="#What-are-the-principles-behind-SNA?" data-toc-modified-id="What-are-the-principles-behind-SNA?-3.2"><span class="toc-item-num">3.2&nbsp;&nbsp;</span>What are the principles behind SNA?</a></span></li><li><span><a href="#Why-should-you-consider-SNA-for-your-research?" 
data-toc-modified-id="Why-should-you-consider-SNA-for-your-research?-3.3"><span class="toc-item-num">3.3&nbsp;&nbsp;</span>Why should you consider SNA for your research?</a></span></li></ul></li><li><span><a href="#Key-Concepts" data-toc-modified-id="Key-Concepts-4"><span class="toc-item-num">4&nbsp;&nbsp;</span>Key Concepts</a></span><ul class="toc-item"><li><span><a href="#Entities" data-toc-modified-id="Entities-4.1"><span class="toc-item-num">4.1&nbsp;&nbsp;</span>Entities</a></span></li><li><span><a href="#Connections" data-toc-modified-id="Connections-4.2"><span class="toc-item-num">4.2&nbsp;&nbsp;</span>Connections</a></span></li><li><span><a href="#Components" data-toc-modified-id="Components-4.3"><span class="toc-item-num">4.3&nbsp;&nbsp;</span>Components</a></span></li><li><span><a href="#Networks" data-toc-modified-id="Networks-4.4"><span class="toc-item-num">4.4&nbsp;&nbsp;</span>Networks</a></span></li></ul></li><li><span><a href="#Representing-Networks-as-Graphs-and-Matrices" data-toc-modified-id="Representing-Networks-as-Graphs-and-Matrices-5"><span class="toc-item-num">5&nbsp;&nbsp;</span>Representing Networks as Graphs and Matrices</a></span><ul class="toc-item"><li><span><a href="#Graphs" data-toc-modified-id="Graphs-5.1"><span class="toc-item-num">5.1&nbsp;&nbsp;</span>Graphs</a></span></li><li><span><a href="#Matrices" data-toc-modified-id="Matrices-5.2"><span class="toc-item-num">5.2&nbsp;&nbsp;</span>Matrices</a></span></li></ul></li><li><span><a href="#A-Simple-Analysis" data-toc-modified-id="A-Simple-Analysis-6"><span class="toc-item-num">6&nbsp;&nbsp;</span>A Simple Analysis</a></span></li><li><span><a href="#Conclusion" data-toc-modified-id="Conclusion-7"><span class="toc-item-num">7&nbsp;&nbsp;</span>Conclusion</a></span></li><li><span><a href="#Bibliography" data-toc-modified-id="Bibliography-8"><span class="toc-item-num">8&nbsp;&nbsp;</span>Bibliography</a></span></li><li><span><a href="#Further-reading-and-resources" 
data-toc-modified-id="Further-reading-and-resources-9"><span class="toc-item-num">9&nbsp;&nbsp;</span>Further reading and resources</a></span></li><li><span><a href="#Appendices" data-toc-modified-id="Appendices-10"><span class="toc-item-num">10&nbsp;&nbsp;</span>Appendices</a></span></li></ul></div>

## Introduction

Vast swathes of our social interactions and personal behaviours are now conducted online and/or captured digitally. Thus, computational methods for collecting, cleaning and analysing data are an increasingly important component of a social scientist’s toolkit.

In this training series we cover some of the essential knowledge and skills needed to engage in **Social Network Analysis (SNA)**, a methodological approach that provides concepts, tools and techniques for uncovering and understanding social structures, relations and networks of association. We focus on the three major stages of SNA:
1. Understanding fundamental concepts and terms. [LINK]()
2. Collecting and cleaning social network data from various sources [Focus of this notebook].
3. Performing basic and intermediate analyses of social network data. 

By the end of these lessons you should be confident in your understanding of key SNA concepts and terms, proficient in the handling and cleaning of social network data, and able to apply a range of analytical techniques to derive substantive insight about social structures and relations. In addition, you will gain fluency in the use of the Python programming language for SNA and other computational social science tasks.

### Aims

This lesson - **Social Network Analysis: Getting and Marshalling Data** - has two aims:
1. Delineate the key steps in collecting, cleaning and repurposing data for social network analysis.
2. Cultivate your computational skills through coding examples. For example, there are a number of opportunities for you to execute the data collection code for your own purposes.

### Lesson details

* **Level**: Introductory, for individuals with no prior knowledge or experience of social network analysis.
* **Duration**: 45-60 minutes.
* **Pre-requisites**: You are encouraged to complete the following previous lessons:
    * Social Network Analysis: Basic Concepts [LINK]
    * [APIs as a Source of Data](https://github.com/UKDataServiceOpen/web-scraping/tree/master/webinars)
* **Audience**: Researchers and analysts from any disciplinary background interested in employing network analysis for social science research purposes.
* **Programming language**: Python.
* **Learning outcomes**:
	1. Understand the main steps in collecting, cleaning and reshaping data for social network analysis.
	2. Be able to use Python for working with social network data.

## Guide to using this resource

This learning resource was built using <a href="https://jupyter.org/" target=_blank>Jupyter Notebook</a>, an open-source software application that allows you to mix code, results and narrative in a single document. As <a href="https://jupyter4edu.github.io/jupyter-edu-book/" target=_blank>Barba et al. (2019)</a> espouse:
> In a world where every subject matter can have a data-supported treatment, where computational devices are omnipresent and pervasive, the union of natural language and computation creates compelling communication and learning opportunities.

If you are familiar with Jupyter notebooks then skip ahead to the main content (*What is Social Network Analysis?*). Otherwise, the following is a quick guide to navigating and interacting with the notebook.

### Interaction

**You only need to execute the code that is contained in sections which are marked by `In []`.**

To execute a cell, click or double-click the cell and press the `Run` button on the top toolbar (you can also use the keyboard shortcut Shift + Enter).

Try it for yourself:

In [4]:
print("Enter your name and press enter:")
name = input()
print("\r")
print("Hello {}, enjoy learning more about Python and SNA!".format(name))

Enter your name and press enter:
Diarmuid

Hello Diarmuid, enjoy learning more about Python and SNA!


### Learn more

Jupyter notebooks provide rich, flexible features for conducting and documenting your data analysis workflow. To learn more about additional notebook features, we recommend working through some of the <a href="https://github.com/darribas/gds19/blob/master/content/labs/lab_00.ipynb" target=_blank>materials</a> provided by Dani Arribas-Bel at the University of Liverpool. 

## Social Network Analysis: The Basics

### What is Social Network Analysis?

Social network analysis (SNA) is a methodological and conceptual toolbox for the measurement, systematic description, and analysis of relational structures (Schneider, 2008). [*Talk more about what a relational structure is*] It requires distinctive data structures, methods of analysis and data visualisation techniques (Caiani, 2014). 

Though various attempts have been made to establish a theoretical foundation for SNA...(Scott, 2017).

It has evolved from *network theory*, which itself seeks to generate measurable representations of patterns of relationships between entities in an abstract or actual space (Owen-Smith, 2017).

Social networks have a *duality*: they are created by the actions of entities and, in turn, influence those actions. Relatedly, network structure is not always the result of deliberate action on the part of the entities; that is, there is often no vision or strategy for the network, which simply arises through the many small decisions of the entities.

### What are the principles behind SNA?

Borgatti et al. (2002) establish five key principles of SNA:
1. *SNA focuses on relations (connections) between actors.* Actors and their relations are seen as interdependent rather than independent units.
2. *The relations between actors are the most meaningful focus of analysis.* Your data may allow you to perform other types of analyses - e.g., how does income vary across individuals and is it associated with variation in subjective wellbeing? - but the focus of SNA is on examining and understanding how actors are connected e.g., to what extent are individuals connected and are these patterns associated with income and subjective wellbeing? Put another way, the unit of analysis is the relation, not the individual.
3. *The structural and/or relational features of these actors constitute the analytically relevant characteristics of them.* To quote Freeman (2006, p. #):
>...these patterns [interactions between people] are important features of the lives of the individuals who display them.
4. *Relational ties between these actors are the channels for the flow of both material and non-material resources*. In essence, the connections are important and offer opportunities for sharing of valuable resources.
5. *The complete web of actors, their positions and their linkages - the network structure - provides opportunities for (and constraints upon) action.* This is one of the most important points to remember: networks are not only constructed from the relations between actors, they in turn influence the behaviour, opportunities, constraints and outcomes of said actors. We'll reflect on some of the mechanisms through which networks affect outcomes at the micro level in a later lesson [LINK]().

It is this focus on relational rather than attributional aspects of the units of analysis that makes SNA distinct as a methodology (Caiani, 2014).

### Why should you consider SNA for your research?

From an analytical perspective, SNA can be employed for a variety of valuable purposes (Caiani, 2014; Owen-Smith, 2017):
* The social phenomenon of interest is a network i.e., it is the unit of analysis in your research project. For example, a researcher may be interested in analysing the London Underground rail network (like in this [study](https://doi.org/10.1016/j.jtrangeo.2017.11.018)).
* The features and properties of a network can be important explanatory factors ('right-hand side' variables) for understanding other social phenomena. For example, in a review of the impact of social networks on health outcomes, Smith and Christakis (2008, p. 420) conclude that:
>  illness, disability, health behaviors, health care use, and death in one person are associated with similar outcomes in numerous others to whom that person is tied, and there can be a nonbiological transmission of illness.

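In [None]:
import networkx as nx
import matplotlib.pyplot as plt
import numpy as np  # only needed for the commented-out edge list conversion below
import pandas as pd  # only needed for the commented-out edge list conversion below

#estreet_el = estreet.stack().reset_index() # convert to edgelist
#estreet_el = estreet_el[estreet_el[0]==1] # keep pairs of nodes that are connected
#estreet_el = estreet_el.drop(0, axis=1)
#estreet_el.rename(columns = {"level_0": "source", "level_1": "target"}, inplace = True)
#edgelist = estreet_el[pd.DataFrame(np.sort(estreet_el[['source','target']].values,1)).duplicated().values]
egraph = nx.from_pandas_adjacency(estreet)  # `estreet` is not defined in this notebook: it is assumed to be a square adjacency DataFrame loaded beforehand

plt.figure(3, figsize=(30, 20))  # large canvas so that node labels stay legible
nx.draw_shell(egraph, node_size=1200, font_size=25, with_labels=True)  # shell layout with labelled nodes
plt.savefig("./images/estreet-band-sociogram-2020-08-25.png")  # save the sociogram to disk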
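
The cell above relies on an adjacency matrix that is not included here. As a minimal, self-contained sketch of the same idea, the example below builds a small adjacency matrix by hand, converts it to a graph, and then to an edge list. The names and ties are invented for illustration and are not the data behind `estreet`.

```python
import networkx as nx
import pandas as pd

# Hypothetical adjacency matrix: 1 means a tie exists, 0 means it does not.
names = ["Ann", "Bob", "Cara"]
adjacency = pd.DataFrame(
    [[0, 1, 1],
     [1, 0, 0],
     [1, 0, 0]],
    index=names,
    columns=names,
)

# Build an undirected graph from the adjacency matrix.
graph = nx.from_pandas_adjacency(adjacency)

# Convert the graph to an edge list: one row per tie.
edge_list = nx.to_pandas_edgelist(graph)
print(edge_list)
```

The edge list has one row per tie, with `source` and `target` columns, which is often a more convenient format for storing and sharing network data than a full matrix.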

## Key Concepts

[*Need to use plenty of visual and tabular (matrix) examples throughout this section*]

[*Where do we talk about components etc?*]

A network has two key elements (Owen-Smith, 2017):
1. The entities that are (or can be) connected in a network.
2. The connections that exist (or could exist) between the entities.

For example, a family tree is a network containing individuals (entities) that are related through some type of familial tie (connection).
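
As a rough sketch of the family example, the code below records a handful of individuals and the familial ties between them using `networkx`. The names and relationships are invented for illustration.

```python
import networkx as nx

# Hypothetical family: two parents (Ann, Bob) and two children (Cara, Dev).
family = nx.Graph()
family.add_nodes_from(["Ann", "Bob", "Cara", "Dev"])  # the entities
family.add_edges_from([
    ("Ann", "Cara"),   # parent and child
    ("Bob", "Cara"),   # parent and child
    ("Ann", "Dev"),    # parent and child
    ("Bob", "Dev"),    # parent and child
    ("Cara", "Dev"),   # siblings
])

n_entities = family.number_of_nodes()
n_connections = family.number_of_edges()

# Connections that *could* exist: every possible pair of entities.
possible_connections = n_entities * (n_entities - 1) // 2

print(n_entities, n_connections, possible_connections)
```

Here five of the six possible connections exist; the only pair without a recorded familial tie is the two parents.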

### Entities

The entities that are connected in a network are known as **nodes**. These can be individuals, organisations, countries, animals, events etc. Because nodes need not be people, we prefer the term node to *actor* or *agent*, which are also valid SNA terms for an entity. Another common term is *vertex*.

Two nodes that are or could be connected are called a **dyad**, while three nodes that are or could be connected are called a **triad**.
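
To make the idea of dyads and triads concrete, the sketch below lists every possible pair and triple of nodes in a tiny network. The node labels are hypothetical, and only the Python standard library is used.

```python
from itertools import combinations

nodes = ["A", "B", "C", "D"]  # hypothetical node labels

# Every pair of nodes is a dyad; every set of three nodes is a triad.
dyads = list(combinations(nodes, 2))
triads = list(combinations(nodes, 3))

print(len(dyads), "dyads:", dyads)
print(len(triads), "triads:", triads)
```

The number of dyads and triads grows quickly with network size, which is one reason why computational tools are so useful for SNA.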

We'll cover this in more detail in a later lesson but for now it is worth posing the following questions about nodes:
* Who are they and how many of them are there in a network?
* What connections/ties exist between them?
* What positions do they occupy in the network? For example, do they broker connections between other nodes?

### Connections

[*Talk about direct and indirect ties*]

Connections or relations between entities are known as **ties**. There are a multitude of different types of ties: for example, a family tree contains ties between brothers and sisters, cousins, nieces and uncles, children and parents etc. Two individuals can also be related in multiple ways: for example, a pair of colleagues may also be good friends, part of the same sports club, and have attended the same university. 

Therefore we can think of a tie as possessing two dimensions:
1. Strength:
    * Binary: a tie either exists between two entities or it does not.
    * Valued: a tie can be assigned a value representing greater / lesser importance, strength, prominence etc.
2. Directionality:
    * Undirected: the tie has no direction; it is shared equally by both entities.
    * Directed: the tie flows from one entity to another (and is potentially reciprocated). An example would be the sharing of food between 
    
We can combine these two dimensions to describe the following types of ties, each of which can be binary or valued [*Give examples*]:
* Arcs: directed ties.
* Edges: undirected ties.
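
The two dimensions of a tie can be sketched in `networkx` as follows: `nx.Graph` stores undirected ties and `nx.DiGraph` stores directed ties, while an edge attribute (here called `weight`) holds the value of a tie. The scenario and numbers are invented for illustration.

```python
import networkx as nx

# Undirected, binary ties: the tie simply exists between two entities.
friends = nx.Graph()
friends.add_edge("A", "B")

# Directed, valued ties: direction matters and each tie carries a value.
emails = nx.DiGraph()
emails.add_edge("A", "B", weight=5)  # A sent B five emails (invented value)
emails.add_edge("B", "A", weight=1)  # reciprocated, but more weakly

print(friends.has_edge("B", "A"))   # True: an undirected tie has no direction
print(emails["A"]["B"]["weight"])   # 5
print(emails["B"]["A"]["weight"])   # 1
```

Using a directed graph means that a tie from A to B and a tie from B to A are stored separately, so reciprocation can be examined directly.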

### Components

A **component** is a specific kind of subgraph: a subgroup of nodes and ties contained within the overall network. Technically, a component is a maximal connected set of nodes: each node is connected to every other node in the set, either directly or indirectly, and there are no connections with nodes outside the component (Scott, 2017).

It can be hard to make sense of a component, so let's visualise one below. [*Draw example of ten nodes, seven of which are contained within a component*]
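
As a minimal sketch of the idea, the code below builds a network of ten nodes, seven of which belong to a single component, and asks `networkx` to identify the components. The node numbers and ties are arbitrary choices made for illustration.

```python
import networkx as nx

network = nx.Graph()
network.add_nodes_from(range(1, 11))  # ten nodes, labelled 1 to 10

# Seven nodes linked directly or indirectly: one component.
network.add_edges_from([(1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 7), (1, 4)])

# Two nodes tied only to each other: a second component.
network.add_edge(8, 9)

# Node 10 has no ties at all (an isolate), so it forms a component on its own.

components = sorted(nx.connected_components(network), key=len, reverse=True)
sizes = [len(component) for component in components]

print(sizes)          # [7, 2, 1]
print(components[0])  # the seven nodes in the largest component
```

Note that nodes 1 and 7 are not directly tied, yet they belong to the same component because a path of ties connects them.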

### Networks

Using our understanding of nodes (entities) and ties (connections), we can identify different network types:
* A one-mode network contains one type of node (e.g., organisations) - this is also known as a *unipartite* network.
* A two-mode network contains two types of node (e.g., organisations and their employees) - this is also known as a *bipartite* or *affiliation* network.

A two-mode network can be conceptualised and analysed as two one-mode networks (projections) - this process is known as *inducing*. However, as Owen-Smith (2017, p. 223) warns: 
> Care must be taken when inducing one-mode network projections from two-mode network data because not all affiliations provide equally compelling evidence of actual social relationships.

For example, if we were to convert our two-mode network of organisations and employees to a one-mode network of employees, it is likely to be very sparsely connected because most employees only work for one company.
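
The conversion can be sketched with the `bipartite` module in `networkx`. The organisations, employees and employment ties below are invented, and each employee is assumed to work for exactly one organisation.

```python
import networkx as nx
from networkx.algorithms import bipartite

employees = ["Ann", "Bob", "Cara", "Dev"]
organisations = ["Org1", "Org2"]

# Two-mode network: ties only run between employees and organisations.
two_mode = nx.Graph()
two_mode.add_nodes_from(employees, bipartite=0)
two_mode.add_nodes_from(organisations, bipartite=1)
two_mode.add_edges_from([
    ("Ann", "Org1"), ("Bob", "Org1"),
    ("Cara", "Org2"), ("Dev", "Org2"),
])

# One-mode projections: nodes are tied if they share a neighbour
# in the two-mode network.
employee_net = bipartite.projected_graph(two_mode, employees)
organisation_net = bipartite.projected_graph(two_mode, organisations)

print(sorted(employee_net.edges()))
print(organisation_net.number_of_edges())
```

In this sketch employees end up tied only to colleagues in the same organisation, and the organisations share no ties at all because no employee works for both. Whether such induced ties reflect real social relationships is exactly the question Owen-Smith raises.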

## Representing Networks as Graphs and Matrices

Networks can be represented using two main, complementary methods:
1. Graphs
2. Matrices

### Graphs

Many people are most familiar with the visual representation of networks, known as **graphs**. In SNA, network graphs are also known as **sociograms**. A basic example is below: this graph represents an undirected network containing 5 nodes and 10 ties. An undirected, binary network is represented as a **simple graph**.

[*Show an example of a digraph etc*]

In a network graph:
* nodes are represented as circles
* ties are represented as unbroken lines
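
A simple graph with 5 nodes can contain at most 10 ties, so a network with 5 nodes and 10 ties must be one in which every pair of nodes is connected. The sketch below generates such a graph with `networkx`; the node labels (0 to 4) are simply the library's defaults.

```python
import networkx as nx

# Every one of the 5 nodes is tied to every other node.
simple_graph = nx.complete_graph(5)

n_nodes = simple_graph.number_of_nodes()
n_ties = simple_graph.number_of_edges()

print(n_nodes, n_ties)              # 5 10
print(simple_graph.is_directed())   # False: the ties are undirected
```

To view the sociogram in a notebook, one option is `nx.draw(simple_graph, with_labels=True)`, which draws the nodes as circles and the ties as lines.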

### Matrices

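For a two-mode network with g nodes of one type and h nodes of the other, multiplying the matrix by its transpose produces the two one-mode projections:
* Matrix X = g (rows) x h (columns)
* Transposed matrix X' = h x g
* One-mode network XX' = (g x h) * (h x g) = (g x g)
* One-mode network X'X = (h x g) * (g x h) = (h x h)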
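
The matrix products above can be checked with a small worked example in `numpy`. The matrix is invented: its g = 3 rows are assumed to be organisations and its h = 4 columns employees, with a 1 wherever an employee works for an organisation.

```python
import numpy as np

# Hypothetical two-mode matrix X: 3 organisations (rows) x 4 employees (columns).
X = np.array([
    [1, 1, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 1],
])

# XX': organisations x organisations (g x g).
organisations = X @ X.T

# X'X: employees x employees (h x h).
employees = X.T @ X

print(organisations.shape, employees.shape)
print(organisations)
print(employees)
```

In each product, an off-diagonal cell counts what two nodes have in common (e.g., the number of employees shared by two organisations), while the diagonal counts each node's own affiliations.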

## A Simple Analysis

Let's bring some of our fundamental concepts to life with a simple research example. [360 Giving Covid-19 grants data]

There are three key steps in conducting social network analysis (Owen-Smith, 2017):
1. Decide on which nodes and ties to analyse i.e., who is connected and which relationships matter.
2. Collect data and structure it in a format suitable for analysis e.g., as a matrix or edge list.
3. Summarise the network and its key features using appropriate measures e.g., network size, density, cohesion etc.
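
The three steps can be sketched on a toy network as follows. The edge list is invented for illustration (it is not the grants data mentioned above), and the measures shown are only a small selection of those available in `networkx`.

```python
import networkx as nx

# Steps 1 and 2: decide on the nodes and ties, and store them as an edge list.
edge_list = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]  # hypothetical ties

network = nx.Graph()
network.add_edges_from(edge_list)

# Step 3: summarise the network.
size = network.number_of_nodes()      # number of nodes
n_ties = network.number_of_edges()    # number of ties
density = nx.density(network)         # share of possible ties that exist
connected = nx.is_connected(network)  # one basic indicator of cohesion

print(size, n_ties, round(density, 2), connected)
```

Density is the number of observed ties divided by the number of possible ties: here 4 ties out of a possible 6.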

## Conclusion

*SNA and its value and limitations and opportunities*.

Good luck on your data-driven travels!

## Bibliography

Barba, Lorena A. et al. (2019). *Teaching and Learning with Jupyter*. <a href="https://jupyter4edu.github.io/jupyter-edu-book/" target=_blank>https://jupyter4edu.github.io/jupyter-edu-book/</a>.

Brooker, P. (2020). *Programming with Python for Social Scientists*. London: SAGE Publications Ltd.

Lau, S., Gonzalez, J., & Nolan, D. (n.d.). *Principles and Techniques of Data Science*. https://www.textbook.ds100.org

Tagliaferri, L. (n.d.). *How to Code in Python 3*. https://assets.digitalocean.com/books/python/how-to-code-in-python.pdf

## Further reading and resources

We publish a list of useful books, papers, websites and other resources on our social network analysis GitHub repository: <a href="https://github.com/UKDataServiceOpen/social-network-analysis/tree/master/reading-list/" target=_blank>[Reading list]</a>

The help documentation for the `requests` and `BeautifulSoup` modules is refreshingly readable and useful:
* <a href="https://requests.readthedocs.io/en/master/" target=_blank>`requests`</a>
* <a href="https://www.crummy.com/software/BeautifulSoup/bs4/doc/" target=_blank>`BeautifulSoup`</a> 

You may also be interested in the following articles specifically relating to web-scraping:
* <a href="https://ico.org.uk/for-organisations/guide-to-data-protection" target=_blank>Guide to Data Protection</a>
* <a href="https://ocean.sagepub.com/blog/collecting-social-media-data-for-research" target=_blank>Collecting social media data for research</a>
* <a href="https://benbernardblog.com/web-scraping-and-crawling-are-perfectly-legal-right/" target=_blank>Web Scraping and Crawling Are Perfectly Legal, Right?</a>
* <a href="https://parissmith.co.uk/blog/web-crawling-screen-scraping-legal-position/" target=_blank>Web Crawling and Screen Scraping – the Legal Position</a>

## Appendices