## SOCIAL NETWORK ANALYSIS


Networks form the building blocks of any group of people, organisations, or entities; in short, anything that can be defined in terms of relations. The objective of social network analysis is often to better understand the behaviour and influence of the systems that networks represent and the systems they interact with. Ultimately, it is about finding the link between network structure and network function, and vice versa. For instance, we can study social networks to better understand the nature of social interactions and their implications for human experience, commerce, the spread of disease, and the structure of society. In many social science studies, the value of social networks lies in how the degree to which a group of people is socially connected reinforces commonly held beliefs, and in how those connections can transmit novel information that changes the dynamics of interaction and potentially alters outcomes. Programmes that rely on the relationships, interactions, and cooperation of people to succeed can therefore take advantage of the network structure of targeted communities. In other words, knowing which people, positions, and groups to target in a community can be critical to making interventions more efficient.

Notable studies on the role of social networks include:
- labour market participation in referral networks and migrant communities;
- drug use and school attendance in student networks;
- policy views of electoral candidates and coalitions in discourse networks;
- advocacy of climate change in social media networks;
- spread of infectious diseases and financial crises in contagion networks;
- adoption and support of new technologies in global networks and local farming networks.


#### Definition
A network (graph) describes a collection of nodes (vertices) and the links (edges) between them. A node can represent anything from an individual to a firm or a country, or even a collection of such entities. A link between two nodes signifies a direct relation between them, such as a friendship between two people. Some terminology:
- A *directed network* makes a clear distinction between the source (the sender of a tie) and the target (the receiver of the tie). Relationships between two nodes are recorded as asymmetric, mutual, or null. An asymmetric relation is a unidirectional nomination (i.e. only one person claims to know, be friends with, or have spoken with the other); a mutual (reciprocal) relation is one where both nominate each other; and a null relation is one where no link exists.
- An *undirected network* makes no distinction between sender (e.g. the person who contacted someone else for support or advice) and target (the person who was contacted). Put simply, once someone says they know someone, we assume the reverse also holds.
- A *connected component* is a network or sub-network in which every node can reach every other node, i.e. there exists a path between any two nodes *A* and *Z*. If there is a node *K* that cannot be reached by a path from some other node in the network, then we say the network is disconnected.
- A *giant component* is the subnetwork found by extracting the component with the largest number of connected nodes in the network.
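
The component definitions above can be made concrete with a short sketch. It assumes an undirected network stored as a plain dictionary of neighbour sets; the `friends` data and the function name `connected_components` are hypothetical and chosen only for illustration.

```python
from collections import deque

def connected_components(adjacency):
    """Return a list of components (sets of nodes) of an undirected graph."""
    seen = set()
    components = []
    for start in adjacency:
        if start in seen:
            continue
        # Breadth-first search collects every node reachable from `start`.
        component = {start}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbour in adjacency[node]:
                if neighbour not in component:
                    component.add(neighbour)
                    queue.append(neighbour)
        seen |= component
        components.append(component)
    return components

# Hypothetical friendship network: two groups plus one isolated person, K.
friends = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
    "E": {"F"},
    "F": {"E"},
    "K": set(),
}

components = connected_components(friends)
giant = max(components, key=len)       # component with the most nodes
is_connected = len(components) == 1    # a connected network has one component
```

In this toy network there are three components, the giant component is the group {A, B, C, D}, and the network as a whole is disconnected because *K* (and the pair *E*, *F*) cannot be reached from it.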

#### Methods

When exploring how social networks affect the spread of ideas, norms, and behaviour, the first key aspect to consider is a network's structure, or topology. Who a person is connected to determines who they come into contact with, and therefore who they "allow" to have an impact on their behaviour. The extent to which this affects the individual depends on how they "choose" to come into contact with that person: how close they are, how frequent the contact is, and whether it happens through a close-knit group of mutual friends or through separate interactions. Given the many factors that shape the diffusion process, a common approach is to examine the network for "small world" features. Small worlds capture a striking property of many large social networks: even though most nodes are not directly connected, a node's neighbours are likely to be neighbours of one another, and any given node can be reached from any other in a small number of steps. This translates to a high clustering coefficient and a short average path length. Such a structure creates an easy path for the diffusion of information, where higher levels of cooperation and efficiency are more likely.

To start, the **centrality** and **connectivity and cohesion** properties are described; an evaluation of **small world** properties is then presented.




#### 1. CENTRALITY

##### 1.1. NODE CENTRALITY


Node centrality is a general measure of a node's "importance" within a network and is often defined in terms of:

>**Degree**: Number of nodes a node is connected to (both sending and receiving ties).

>**Indegree**: Number of nodes nominating a node (receiving a tie).

>**Outdegree**: Number of nodes a node nominates (sending a tie).

>**Closeness**: How near a node is to every other node in the network, typically based on its shortest-path distances to them.

>**Betweenness**: The number of times a node lies on the shortest path between two other nodes (i.e. how often the node acts as a bridge along the shortest paths connecting other pairs of nodes).

>**Eccentricity**: Maximum shortest distance from (or to) a node, to (or from) all other nodes in the graph.

>**PageRank**: A link analysis measure that calculates the importance of a node based on the importance of the nodes connected to it.

>**Authority**: A link analysis measure that retrieves the most relevant nodes based on the incoming ties. The more a node is linked to nodes that are recognised as "hubs", i.e. primary sources of information, the higher the authority score.

>**Hubs**: A link analysis measure that retrieves the most relevant nodes based on the outgoing ties. The more a node is linked to nodes that are recognised as "authorities", i.e. primary receivers of information, the higher the hub score.
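
Several of the measures above can be computed by hand on a small example. The sketch below is illustrative only: the `nominations` data is hypothetical, degree is counted as the number of distinct neighbours once direction is ignored, and closeness is taken as (n − 1) divided by the sum of shortest-path distances, which is one common normalisation and assumes the undirected network is connected.

```python
from collections import deque

# Hypothetical directed "who nominates whom" network.
nominations = {
    "Ana": ["Ben", "Cat"],
    "Ben": ["Cat"],
    "Cat": ["Ana"],
    "Dan": ["Cat"],
}

# Outdegree: ties sent. Indegree: ties received.
outdegree = {node: len(targets) for node, targets in nominations.items()}
indegree = {node: 0 for node in nominations}
for targets in nominations.values():
    for target in targets:
        indegree[target] += 1

# Undirected view: a tie exists if either person nominated the other.
neighbours = {node: set() for node in nominations}
for source, targets in nominations.items():
    for target in targets:
        neighbours[source].add(target)
        neighbours[target].add(source)
degree = {node: len(adj) for node, adj in neighbours.items()}

def shortest_path_lengths(adjacency, source):
    """Breadth-first search distances (in steps) from `source`."""
    distance = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in adjacency[node]:
            if nxt not in distance:
                distance[nxt] = distance[node] + 1
                queue.append(nxt)
    return distance

closeness = {}
eccentricity = {}
for node in neighbours:
    dist = shortest_path_lengths(neighbours, node)
    others = [d for other, d in dist.items() if other != node]
    # Closeness: (n - 1) divided by the sum of distances to all other nodes.
    closeness[node] = len(others) / sum(others)
    # Eccentricity: distance to the farthest node.
    eccentricity[node] = max(others)
```

Here Cat receives three nominations and is one step from everyone, so she has the highest indegree and a closeness of 1.0, while Dan sits on the periphery with a closeness of 0.6 and an eccentricity of 2.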





##### 1.2. NETWORK CENTRALITY
 


Network centrality is a measure of a node's ties relative to the ties present in the network and the distribution of ties throughout the network. For example, we can determine the extent to which "importance" or "power" is concentrated in a few nodes by examining whether the network's degree distribution is normally distributed or skewed.

>**Degree distribution**: Frequency distribution of degree values of nodes. A skewed degree distribution, where there are a few high degree (popular) nodes and many low degree (periphery) nodes, is consistent with preferential attachment (i.e. the more connected a node is, the more likely it is to make new connections), and therefore with concentrated power.

>**Density**: Volume of connections in a network, measured as the number of ties relative to the number of all possible ties. Density is 0 for a graph without edges and 1 for a complete graph.

>**Average Path Length**: Average shortest (geodesic) distance between each starting and ending node (i.e. the average number of steps one has to take across the network for connecting two separate individuals).

>**Diameter**: Longest of all shortest (geodesic) paths between each starting and ending node (i.e. after calculating all shortest paths, the diameter is the length of the path between the two nodes that are furthest apart). This measure is used to gauge the overall size of the network, or the longest distance it would take to reach the node furthest away in the network (i.e. the distance from one end of the network to the other).
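
As a rough illustration of the network-level measures above, the following sketch computes them for a small undirected network. The `adjacency` data is hypothetical, and the path-based measures assume the network is connected (otherwise some distances would be undefined).

```python
from collections import Counter, deque
from itertools import combinations

# Hypothetical undirected network: a path A-B-C-D with an extra tie B-D.
adjacency = {
    "A": {"B"},
    "B": {"A", "C", "D"},
    "C": {"B", "D"},
    "D": {"B", "C"},
}

n = len(adjacency)
m = sum(len(adj) for adj in adjacency.values()) // 2  # each tie is listed twice

# Density: observed ties over the n(n-1)/2 possible ties of an undirected graph.
density = m / (n * (n - 1) / 2)

# Degree distribution: how many nodes have each degree value.
degree_distribution = Counter(len(adj) for adj in adjacency.values())

def bfs_distances(source):
    """Shortest-path distances (in steps) from `source` to reachable nodes."""
    distance = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in adjacency[node]:
            if nxt not in distance:
                distance[nxt] = distance[node] + 1
                queue.append(nxt)
    return distance

all_distances = {node: bfs_distances(node) for node in adjacency}
pair_lengths = [all_distances[u][v] for u, v in combinations(adjacency, 2)]

average_path_length = sum(pair_lengths) / len(pair_lengths)
diameter = max(pair_lengths)
```

With four of six possible ties present, density is about 0.67; the average path length is about 1.33 steps and the diameter is 2 (from A to either C or D).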



#### 2. CONNECTIVITY AND COHESION 

Connectivity and cohesion properties refer to the direction, frequency and consistency of relations between nodes and the nodes in their neighbourhood [a personal or ego network that only includes nodes a node is connected to]. This includes the study of dyads (relations between two nodes), triads (relations between three nodes), clusters and cliques (subsets of densely connected ties or subgraphs), and structural holes (the absence of vital ties or redundancy of close-knit ties). These measures can be examined through the following properties:



##### 2.1. TRIADS


>**Reciprocity**: Share of the nodes in a node's neighbourhood that reciprocate its ties.

>**Transitivity**: Fraction of all possible triangles in a node's neighbourhood that are closed, i.e. where a node is connected to a node that is connected to another node that it is also connected to (e.g. a friend of a friend is a friend).

>**Hierarchy**: Number of triads in a node's neighbourhood where there is a consensus on the directionality of ties (e.g. many subordinates nominating one boss or followers nominating one leader). A high value indicates a high dependency rate where power or information is concentrated in a few nodes.

>**Constraint**: The extent to which a node has the same ties as other nodes in their neighbourhood. While having many close redundant ties can offer support for those most vulnerable, it can also constrain those that seek to grow outside of their environment and achieve a competitive advantage.
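
Reciprocity is the simplest of these properties to compute directly. The sketch below is a minimal illustration on hypothetical data: it treats a node's reciprocity as the share of its outgoing ties that are returned, and network reciprocity as the share of all directed ties that are mutual. Other definitions exist, so the exact formula here should be read as an assumption.

```python
# Hypothetical directed nominations: source -> set of targets.
nominates = {
    "Ana": {"Ben", "Cat"},
    "Ben": {"Ana"},
    "Cat": {"Ben"},
    "Dan": set(),
}

def node_reciprocity(node):
    """Share of a node's outgoing ties that are returned by the target."""
    sent = nominates[node]
    if not sent:
        return None  # undefined for nodes that nominate no one
    returned = [target for target in sent if node in nominates[target]]
    return len(returned) / len(sent)

# Network-level reciprocity: share of all directed ties that are mutual.
ties = [(s, t) for s, targets in nominates.items() for t in targets]
mutual = [(s, t) for s, t in ties if s in nominates[t]]
network_reciprocity = len(mutual) / len(ties)
```

Ana nominates Ben and Cat but only Ben nominates her back, so her reciprocity is 0.5; across the network, two of the four directed ties belong to a mutual pair, giving a network reciprocity of 0.5 as well.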


##### 2.2. CLUSTERING

The structural cohesion of a network can be defined as the minimum number of actors who, if removed from the network, would disconnect it. A key measure of overall network cohesion is the *Clustering Coefficient*, which indicates how densely a node's neighbours are tied to one another and hence how many alternative routes connect the network. A closely related concept is the articulation point(s) of a graph, which represent vulnerabilities in a connected network. These are single nodes that are vital to the function and resilience of the network: their removal would fragment it.

>**Clustering Coefficient**: Extent to which links in a network follow a transitive property (i.e. likelihood of node *i* being connected to node *k* given that *i* is connected to *j* and *j* is connected to *k*). This captures how tightly knit or cohesive the network is.

>**Articulation Points**: An articulation point (or cut vertex) is a vertex that, if removed, would disconnect a connected graph.
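
Both measures can be sketched on a toy network. The example assumes an undirected, connected graph with at least three nodes, stored as hypothetical `adjacency` data; the articulation-point search is a deliberately simple brute-force check (delete each node in turn and test connectivity), not the faster depth-first algorithm that graph libraries typically use.

```python
from itertools import combinations

# Hypothetical undirected network: a triangle A-B-C with a tail C-D-E.
adjacency = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C", "E"},
    "E": {"D"},
}

def local_clustering(node):
    """Share of pairs of a node's neighbours that are themselves tied."""
    neighbours = adjacency[node]
    k = len(neighbours)
    if k < 2:
        return 0.0  # convention: no pairs of neighbours to close
    closed = sum(1 for u, v in combinations(neighbours, 2) if v in adjacency[u])
    return closed / (k * (k - 1) / 2)

average_clustering = sum(local_clustering(v) for v in adjacency) / len(adjacency)

def is_connected_without(removed):
    """Check whether the graph stays connected once `removed` is deleted."""
    remaining = [v for v in adjacency if v != removed]
    stack, seen = [remaining[0]], {remaining[0]}
    while stack:
        node = stack.pop()
        for nxt in adjacency[node]:
            if nxt != removed and nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return len(seen) == len(remaining)

# A node is an articulation point if deleting it disconnects the graph.
articulation_points = {v for v in adjacency if not is_connected_without(v)}
```

Nodes A and B sit in a closed triangle and have a local clustering of 1.0, whereas C has only one of its three neighbour pairs tied (1/3). Removing C or D splits the network, so both are articulation points.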

Another important structural feature of the network is the extent to which homophilic ties are produced. This occurs when nodes exhibiting similar attributes are more likely than expected to form ties with each other than with nodes that are dissimilar.

>**Homophily**: Tendency for nodes with similar attributes to be more likely connected with each other than with nodes of dissimilar attributes.

>**Assortativity**: The similarity of connected nodes with respect to node degree. In other words, the extent to which nodes tie to others that have a similar number of connections.
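
A minimal sketch of both ideas follows, using hypothetical `edges` and `occupation` data. Homophily is summarised here only descriptively, as the share of ties joining nodes with the same attribute; a fuller analysis would compare that share with what chance alone would produce. Degree assortativity is computed as the Pearson correlation between the degrees at the two ends of each tie, which is undefined when every node has the same degree.

```python
from math import sqrt

# Hypothetical undirected network (a path A-B-C-D) with one attribute per node.
edges = [("A", "B"), ("B", "C"), ("C", "D")]
occupation = {"A": "farmer", "B": "farmer", "C": "trader", "D": "trader"}

# Homophily (descriptive): share of ties joining nodes with the same attribute.
same = sum(1 for u, v in edges if occupation[u] == occupation[v])
same_attribute_share = same / len(edges)

# Degree of each node, counted from the edge list.
degree = {node: 0 for node in occupation}
for u, v in edges:
    degree[u] += 1
    degree[v] += 1

# Degree assortativity: Pearson correlation between the degrees found at the
# two ends of each tie, counting every undirected tie in both directions.
xs = [degree[u] for u, v in edges] + [degree[v] for u, v in edges]
ys = [degree[v] for u, v in edges] + [degree[u] for u, v in edges]
mean_x = sum(xs) / len(xs)
mean_y = sum(ys) / len(ys)
covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
degree_assortativity = covariance / (spread_x * spread_y)
```

Two of the three ties join people with the same occupation, so the same-attribute share is about 0.67. The degree assortativity is −0.5: the low-degree end nodes are tied to higher-degree middle nodes, so the network is disassortative by degree.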



#### 3. SMALL WORLDS


One way to determine the extent to which ties strengthen collective interest is by assessing the network's interaction structure. Many studies attempt to evaluate the performance of different network structures based on their ability to converge to a cooperative outcome. By comparing different structures, researchers can, for instance, examine whether networks of low fragmentation (high clustering) and short average path length (short distance between individuals) increase efficiency in diffusion of cooperative strategies, similar to those found in small world networks.

Descriptive measures are very rarely intuitive on their own and should therefore be analysed in conjunction with other features. A good heuristic when evaluating a network is to assess how much of its structure could be reproduced by a random process, and how closely it resembles a small world. Introduced by Watts and Strogatz (1998), the small-world idea is that large networks tend to have a small diameter or small **Average Path Length**, as random networks do, together with a high **Clustering Coefficient**, which random networks of the same size lack.

Two small-world coefficients can be used to evaluate the network:

- **Sigma** = (C/Cr) / (L/Lr).
> Where C and L are respectively the average clustering coefficient and average shortest path length of the observed network G. Cr and Lr are respectively the average clustering coefficient and average shortest path length of an equivalent random graph.
> A graph is commonly classified as small-world if sigma > 1.

- **Omega** = Lr/L - C/Cl.
> Where C and L are respectively the average clustering coefficient and average shortest path length of G. Lr is the average shortest path length of an equivalent random graph and Cl is the average clustering coefficient of an equivalent lattice graph.
> The small-world coefficient (omega) measures how much G is like a lattice or a random graph. Negative values mean G is similar to a lattice, whereas positive values mean G is similar to a random graph. Values close to 0 mean that G has small-world characteristics.
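
The arithmetic behind the two coefficients can be sketched as follows. The function names and all input values are hypothetical, picked only to show how the formulas combine; in practice C, L, Cr, Lr and Cl would be estimated from the observed network and from generated random and lattice reference graphs.

```python
def small_world_sigma(C, L, Cr, Lr):
    """sigma = (C / Cr) / (L / Lr)."""
    return (C / Cr) / (L / Lr)

def small_world_omega(C, L, Lr, Cl):
    """omega = Lr / L - C / Cl."""
    return Lr / L - C / Cl

# Hypothetical summary statistics, chosen only to illustrate the arithmetic.
C, L = 0.5, 4.0      # observed network: clustering and average path length
Cr, Lr = 0.05, 3.0   # equivalent random graph
Cl = 0.625           # equivalent lattice graph: clustering

sigma = small_world_sigma(C, L, Cr, Lr)   # (0.5 / 0.05) / (4.0 / 3.0)
omega = small_world_omega(C, L, Lr, Cl)   # 3.0 / 4.0 - 0.5 / 0.625
```

With these inputs the network is ten times more clustered than the random graph while its paths are only a third longer, giving a sigma of 7.5 (well above 1). Omega is −0.05, close to 0 and slightly on the lattice side, which is also consistent with small-world structure.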


#### 4. REFERENCES

Goyal, Sanjeev, Connections: An Introduction to the Economics of Networks, Princeton, NJ: Princeton University Press, 2007.

Jackson, Matthew O., Social and Economic Networks, Princeton, NJ: Princeton University Press, 2008.

Wasserman, Stanley and Katherine Faust, Social Network Analysis, New York, NY: Cambridge University Press, 2007.

Watts, Duncan J., Small Worlds: The Dynamics of Networks between Order and Randomness, Princeton, NJ: Princeton University Press, 2005.

Copeland, Molly, "Whole Network Descriptive Statistics," Social Networks and Health Workshop, 2019. Available at: https://sites.duke.edu/dnac/07-whole-network-descriptive-statistics/
