# Chapter 4
# Social Networks
Introduces the main concepts of social networks such as <u>properties of social networks</u>, <u>data collection in social networks</u>, <u>data sampling</u>, and <u>social network analysis</u>.

## Social Network
- <u>A social composition of actors and the relationships defined on them</u>.
    - <u>Actors</u> can be persons, places, organizations, roles, etc. 
    - <u>Relationship</u> can be kinship, friendship, knowledge, acquaintance, correspondence, etc.
- <u>Network nodes</u> mostly represent persons or organizations. However, they can also refer to Web pages, journal articles, departments, neighborhoods, or even countries.
- Social networks can be grouped into two types: <u>one mode</u> and <u>multimode</u>.
    - One-mode networks
        - Include <u>one type of nodes</u> to represent actors (usually people), subgroups, or communities. 
        - The relations can represent individual evaluation (such as friendship), transfer of materials (such as borrowing or buying), transferring of non-materials (such as communication), interactions, formal roles, or kinship (such as marriage).
    - Two-mode networks
        - Include <u>two different sets of nodes</u>, with relations linking nodes of one set to nodes of the other.
        - Examples of two-mode (bimodal) networks include people possessing information and resources, investors buying stock in corporations, and companies employing people.
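
A two-mode network can be sketched as a bipartite graph in NetworkX. The investor and company names below are hypothetical, chosen only to illustrate the "investors buy stock in corporations" example; the projection shows one common way of turning a two-mode network into a one-mode network.

```python
import networkx as nx
from networkx.algorithms import bipartite

# Hypothetical two-mode data: investors (one node set) buy stock in companies (the other set).
investors = ["Ann", "Bob", "Cho"]
companies = ["Acme", "Globex"]
purchases = [("Ann", "Acme"), ("Bob", "Acme"), ("Bob", "Globex"), ("Cho", "Globex")]

B = nx.Graph()
B.add_nodes_from(investors, bipartite=0)
B.add_nodes_from(companies, bipartite=1)
B.add_edges_from(purchases)

print("is two-mode (bipartite):", bipartite.is_bipartite(B))

# One-mode projection: two investors are linked if they hold stock in the same company.
P = bipartite.projected_graph(B, investors)
print("investor-investor ties:", sorted(tuple(sorted(e)) for e in P.edges()))
```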
    

## Properties of a Social Network
### Scale-Free Networks
- Networks whose degree distribution obeys a <u>power law</u> $P(k)\sim k^{-\gamma}$ (i.e., <mark>a few well-connected nodes and many poorly connected nodes</mark>).
- Degree distribution is calculated as follows:
$$P_k = \frac{1}{n}\# \{i\mid k_i=k\}$$
- Citation networks, biological networks, WWW graph, Internet graph, and social networks have <u>right-skewed or power-law degree distribution</u>.
- <u>Power-law distribution</u> means that <u>few nodes account for the vast majority of links</u>, while <u>most nodes have very few links</u>, which <u>emphasizes the idea that we have a core with a periphery of nodes with few connections</u>.
<img src="Fig4.1.png" width=500>


- In these networks, <u>a small number of well-connected nodes (hubs)</u> significantly <u>reduce the diameter</u> of the entire network.
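
The degree distribution $P_k$ defined above can be computed directly by counting nodes per degree. This is a minimal sketch; the Barabási-Albert generator and its parameters are assumptions made here only to obtain a network with a right-skewed degree distribution.

```python
from collections import Counter
import networkx as nx

def degree_distribution(graph):
    """Return {k: P_k}, the fraction of nodes whose degree equals k."""
    n = graph.number_of_nodes()
    counts = Counter(degree for _, degree in graph.degree())
    return {k: count / n for k, count in sorted(counts.items())}

# Assumption: a Barabasi-Albert graph serves as a convenient scale-free example.
ba = nx.barabasi_albert_graph(1000, 2, seed=42)
p_k = degree_distribution(ba)

low_share = sum(p for k, p in p_k.items() if k <= 5)
print(f"share of nodes with degree <= 5: {low_share:.2f}")
print(f"largest degree in the network (a hub): {max(p_k)}")
```

Plotting `p_k` on log-log axes is the usual way to inspect whether the tail looks like a power law.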

In [None]:
import networkx as nx
# Default probabilities written out explicitly; pass seed=<int> for a reproducible graph.
g = nx.scale_free_graph(40, alpha=0.41, beta=0.54, gamma=0.05,
                        delta_in=0.2, delta_out=0)

- `nx.scale_free_graph`'s parameters:
    - `n` (integer) – Number of nodes in the graph.
    - `alpha` (float) – Probability for adding a new node connected to an existing node chosen randomly according to the in-degree distribution.
    - `beta` (float) – Probability for adding an edge between two existing nodes. One existing node is chosen randomly according to the in-degree distribution and the other is chosen randomly according to the out-degree distribution.
    - `gamma` (float) – Probability for adding a new node connected to an existing node chosen randomly according to the out-degree distribution.
    - `delta_in` (float) – Bias for choosing nodes from the in-degree distribution.
    - `delta_out` (float) – Bias for choosing nodes from the out-degree distribution.
    - `create_using` (graph, optional; default MultiDiGraph) – Graph instance used to start the process (default: a 3-cycle). Recent NetworkX releases replace this argument with `initial_graph`.
    - `seed` (integer, optional) – Seed for the random number generator.
    - The probabilities `alpha`, `beta`, and `gamma` must sum to 1.

In [None]:
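# nx.info() was removed in NetworkX 3.0, so report the basic counts directly
print(f"nodes: {g.number_of_nodes()}, edges: {g.number_of_edges()}")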

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt

plt.figure(figsize=(8, 8))
plt.axis('off')

layout = nx.spring_layout(g)

# node positions are passed through the `pos` argument
nx.draw_networkx(g, pos=layout,
                 with_labels=True, node_size=800, node_color='y')

### Small-World Networks
- In small-world networks,
    - Most nodes can be reached from any other node in a small number of steps.
    - Nodes are <u>homogeneous</u>: each node has roughly <u>the same number of links</u>.
    - The typical distance between two nodes grows proportionally to the logarithm of the order (number of nodes) of the network.
    - There are many local links and few long-range “shortcuts.”
    - They have a <u>high clustering coefficient</u>, a short average path length, and an over-abundance of hub nodes.
    - They consist of dense communities or clusters that are loosely connected by boundary spanners (also called connectors).
    - Although these networks are fairly uniform, their degree distribution decays exponentially (unlike the power-law tail of scale-free networks).
    
    
- *[Watts-Strogatz(WS)](https://en.wikipedia.org/wiki/Watts–Strogatz_model)* small-world graph
    - Parameters:
        - `n` (`int`): the number of nodes,
        - `k` (`int`): each node is connected to its `k` nearest neighbors in a ring topology,
        - `p` (`float`): the probability of rewiring each edge,
        - `seed` (optional `int`): seed for the random number generator (default `None`).
    - The graph is generated as follows:
        1. Create a ring of `n` nodes.
        2. Connect each node in the ring with its $k$ nearest neighbors ($k-1$ neighbors if $k$ is odd).
        3. Create shortcuts by replacing some edges: for each edge $u$-$v$ in the underlying “$n$-ring with $k$ nearest neighbors,” with probability $p$ replace it with a new edge $u$-$w$, where $w$ is an existing node chosen uniformly at random.

In [None]:
import networkx as nx

g = nx.watts_strogatz_graph(25, 5, 0.4)
# g = nx.watts_strogatz_graph(20, 4, 0.2)
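# nx.info() was removed in NetworkX 3.0, so report the basic counts directly
print(f"nodes: {g.number_of_nodes()}, edges: {g.number_of_edges()}")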

In [None]:
plt.axis('off')
nx.draw_networkx(g, node_color='c', node_size=400)

In [None]:
# Same graph drawn with a circular layout, which makes the underlying ring visible
plt.axis('off')
pos = nx.circular_layout(g)
nx.draw_networkx(g, pos, node_color='c', node_size=400)
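
The small-world properties listed above (high clustering, short average path length) can be checked numerically. The sketch below compares a pure ring lattice (`p = 0`) with rewired graphs; the sizes `n = 200`, `k = 6` and the seed are arbitrary illustration values, and `connected_watts_strogatz_graph` is used instead of `watts_strogatz_graph` only so that the average path length is always defined.

```python
import networkx as nx

n, k = 200, 6  # hypothetical sizes chosen only for illustration

results = {}
for p in (0.0, 0.1, 1.0):
    # retries internally until the generated graph is connected
    ws = nx.connected_watts_strogatz_graph(n, k, p, seed=1)
    clustering = nx.average_clustering(ws)
    path_len = nx.average_shortest_path_length(ws)
    results[p] = (clustering, path_len)
    print(f"p={p:.1f}  clustering={clustering:.3f}  avg path length={path_len:.2f}")
```

A small amount of rewiring typically shortens the average path length sharply while leaving clustering comparatively high, which is the small-world regime.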

### Network Navigation
- Refers to the observation that, beyond the small-world phenomenon itself, people are good at finding short paths in networks.

### [Dunbar’s Number](https://en.wikipedia.org/wiki/Dunbar%27s_number)
- A <u>cognitive limit on the number</u> of individuals with whom one can maintain stable social relationships.
- These are relationships in which an actor knows who each person is and how each person relates to every other person.
- Estimates lie between 100 and 250, with a commonly used value of 150.

## Data Collection in Social Networks
- Offline data is collected using traditional methods such as <u>questionnaires, interviews, observations, and archival records</u>, while online data is collected mainly <u>using APIs, Web crawlers, online surveys, and specialized applications</u>.

## [Six Degrees of Separation](https://en.wikipedia.org/wiki/Six_degrees_of_separation)
- In 1969, a psychologist called Stanley Milgram did an [experiment](https://en.wikipedia.org/wiki/Small-world_experiment).
- Sent out 300 letters to some randomly selected addresses in Nebraska and Kansas.
- Each of the 300 letters had instructions on how to deliver the letter back to Boston through a series of acquaintances.
- Only 64 of the 300 letters arrived at their destination, with an average chain length of <u>5.5</u>.
- Ref
    - [Six Degrees of Separation Theory @ Naver](http://terms.naver.com/entry.nhn?docId=1838337&cid=42044&categoryId=42044)

## Online Social Data Collection
- Beyond the traditional methods of collecting social data, online data can be collected by the following methods:
    1. APIs.
    2. Web crawlers.
    3. Online surveys.
    4. Deployed applications.
- Nevertheless, all network data collection methods <u>suffer from some drawbacks</u>, particularly regarding information accuracy, validity, reliability, and measurement error.

## Data Sampling
- The three most frequently used sampling techniques are:
    1. *Node sampling*. A limited subset of nodes, together with their links, is chosen as the sampled data.
    2. *Link sampling*. A subset of links is selected to represent the sampled data.
    3. *Snowball sampling (chain sampling or respondent-driven sampling)*.
        - Starts with a set of <u>sampled nodes alongside their neighbors</u>. These nominated nodes are the <u>first-order zone</u> of the network.
        - Next, the nodes in this zone are sampled, and all their <u>connections are extracted to form the second-order zone</u>.
        - This process is repeated several times and eventually yields a network of several zones.
        - Requires <u>limiting the depth</u> of the search to a predetermined number, both to avoid the explosion of data that snowball sampling can produce and to respect the human limit of perception of social networks.
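
Node sampling and link sampling can be sketched in a few lines. This is one possible reading of the definitions above: node sampling keeps every link incident to a chosen node, and link sampling keeps only the chosen links and their endpoints. The helper names, the karate-club example graph, and the sample sizes are assumptions made for illustration.

```python
import random
import networkx as nx

def node_sample(graph, size, rng):
    """Pick `size` nodes at random and keep them together with their links."""
    chosen = rng.sample(list(graph.nodes()), size)
    sampled = nx.Graph()
    for node in chosen:
        sampled.add_node(node)
        sampled.add_edges_from(graph.edges(node))
    return sampled, chosen

def link_sample(graph, size, rng):
    """Pick `size` links at random; the sample is the graph spanned by those links."""
    chosen = rng.sample(list(graph.edges()), size)
    sampled = nx.Graph()
    sampled.add_edges_from(chosen)
    return sampled

rng = random.Random(0)  # fixed seed so the sketch is repeatable
G = nx.karate_club_graph()

by_node, seeds = node_sample(G, 5, rng)
by_link = link_sample(G, 10, rng)
print(f"node sample: {by_node.number_of_nodes()} nodes, {by_node.number_of_edges()} edges")
print(f"link sample: {by_link.number_of_nodes()} nodes, {by_link.number_of_edges()} edges")
```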

In [None]:
import networkx as nx
import urllib.request

g = nx.Graph()

def read_lj_friends(g, name):
    """Fetch the friend list of `name` from LiveJournal and add it to `g` as edges."""
    url = 'http://www.livejournal.com/misc/fdata.bml?user=' + name
    with urllib.request.urlopen(url) as response:
        for line_b in response:
            line = line_b.decode('utf-8')  # bytes to str
            # skip comment lines
            if line.startswith('#'):
                continue
            # the format is "< name" (incoming) or "> name" (outgoing)
            parts = line.split()
            # skip empty or malformed lines
            if len(parts) < 2:
                continue
            # add the edge to the network
            if parts[0] == '<':
                g.add_edge(parts[1], name)
            else:
                g.add_edge(name, parts[1])

In [None]:
def snowball_sampling(g, center, max_depth=1, current_depth=0, taboo_list=None):
    """Recursively sample the network around `center`, down to `max_depth` levels."""
    # create a fresh list per top-level call; a mutable default would be shared across calls
    if taboo_list is None:
        taboo_list = []
    print(center, current_depth, max_depth, taboo_list)
    # if we have reached the depth limit of the search, bomb out
    if current_depth == max_depth:
        print('out of depth')
        return taboo_list
    if center in taboo_list:
        print('taboo')
        return taboo_list  # we've been here before
    taboo_list.append(center)  # we shall never return

    read_lj_friends(g, center)
    if center not in g:
        return taboo_list  # no friends were found, so there is nothing to expand

    # copy the neighbor list, because the recursive calls add edges to g
    for node in list(g.neighbors(center)):
        taboo_list = snowball_sampling(g, node, current_depth=current_depth+1,
                                       max_depth=max_depth, taboo_list=taboo_list)

    return taboo_list

In [None]:
snowball_sampling(g, 'valerois')
# snowball_sampling(g, 'valerois', max_depth=2)

In [None]:
nx.write_pajek(g, "livejournadata.net")

In [None]:
!cat livejournadata.net | head

In [None]:
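# nx.info() was removed in NetworkX 3.0, so report the basic counts directly
print(f"nodes: {g.number_of_nodes()}, edges: {g.number_of_edges()}")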

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt

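plt.figure(figsize=(20, 15))
plt.axis("off")  # must come after plt.figure(), otherwise it applies to a different figure
nx.draw_networkx(g)  # g is an undirected Graph, so no arrows are needed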

# plt.savefig("test.png")
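
The LiveJournal endpoint used above may not be reachable, so the same zone-by-zone idea can also be tried offline. The following is a minimal breadth-first sketch on an in-memory graph, not the recursive crawler above; the function name `snowball_sample` and the karate-club example graph are assumptions made for illustration.

```python
from collections import deque
import networkx as nx

def snowball_sample(graph, seeds, max_depth):
    """Breadth-first snowball sample; zone_of[v] is the zone in which v was first reached."""
    zone_of = {seed: 0 for seed in seeds}
    queue = deque(seeds)
    while queue:
        node = queue.popleft()
        if zone_of[node] == max_depth:
            continue  # depth limit reached: do not expand this node any further
        for neighbor in graph.neighbors(node):
            if neighbor not in zone_of:
                zone_of[neighbor] = zone_of[node] + 1
                queue.append(neighbor)
    # keep the reached nodes and every link among them
    return graph.subgraph(zone_of).copy(), zone_of

G = nx.karate_club_graph()
sample, zones = snowball_sample(G, seeds=[0], max_depth=1)
print(f"depth 1: {sample.number_of_nodes()} of {G.number_of_nodes()} nodes")

sample2, zones2 = snowball_sample(G, seeds=[0], max_depth=2)
print(f"depth 2: {sample2.number_of_nodes()} of {G.number_of_nodes()} nodes")
```

Comparing the two depths shows how quickly the sample grows, which is why the depth limit matters.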

## Social Network Analysis
- An approach in social research with four distinctive characteristics:
    - systematic relational data, 
    - structural intuition, 
    - graphical models, 
    - mathematical models.
- The ultimate goal of SNA is to examine <u>the relations between individuals within a social network</u> (which can carry the meaning of influence, affection, communication, advice, friendship, trust, dislike, conflict, or many other things) in addition to the overall network structure.
- Analysis can be carried out at the micro, meso, or macro level.
- A network can be analyzed as a whole (sociocentric analysis), in part (sub-network analysis), or through the connections of one specific node (ego-network analysis).


- SNA suffers from issues related to trust, privacy, and strategy in organizations, and it has fallen short of providing successful results in certain applications, such as education evaluation.
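
The three scopes of analysis (whole network, sub-network, ego network) can be illustrated on a small example. This is a sketch under stated assumptions: it uses the karate-club graph bundled with NetworkX, takes the members of one club as the sub-network, and picks node 0 as the ego; none of these choices come from the text above.

```python
import networkx as nx

G = nx.karate_club_graph()  # small, well-known example network bundled with NetworkX

# Sociocentric analysis: a property of the whole network.
print(f"whole network: {G.number_of_nodes()} nodes, density {nx.density(G):.3f}")

# Sub-network analysis: only the members of one club (node attribute "club").
members = [node for node, club in G.nodes(data="club") if club == "Mr. Hi"]
sub = G.subgraph(members)
print(f"sub-network:   {sub.number_of_nodes()} nodes, density {nx.density(sub):.3f}")

# Ego-network analysis: one node (the ego), its direct contacts, and the ties among them.
ego = nx.ego_graph(G, 0, radius=1)
print(f"ego network:   {ego.number_of_nodes()} nodes, density {nx.density(ego):.3f}")
```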

## Social Network Analysis vs. Link Analysis
- Share the same idea of using nodes and edges to model networks, and both try to <u>find the key players</u> in a network.
- Unlike SNA, link analysis allows <u>different types of nodes and edges to coexist in the same network</u>. Metrics that assume a single type of node and edge may then produce invalid results on such mixed networks.
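
Finding key players is commonly approached with centrality measures. The sketch below ranks nodes by degree and betweenness centrality; the choice of these two measures, the helper `top`, and the karate-club example graph are assumptions made for illustration, not a method prescribed by the text.

```python
import networkx as nx

G = nx.karate_club_graph()

degree = nx.degree_centrality(G)            # share of other nodes each node is tied to
betweenness = nx.betweenness_centrality(G)  # share of shortest paths passing through each node

def top(scores, count=3):
    """Return the `count` nodes with the highest score."""
    return sorted(scores, key=scores.get, reverse=True)[:count]

print("highest degree centrality:     ", top(degree))
print("highest betweenness centrality:", top(betweenness))
```

Different measures can rank different nodes first, so which nodes count as key players depends on the question being asked.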

## Historical Development
- SNA emerged as a result of cooperation between three different disciplines:
    1. sociometric analysts, who came up with technical advancements in graph theory by working with small groups of data,
    2. Harvard researchers, who in the 1930s discovered patterns of interpersonal relations and the formation of cliques, and
    3. Manchester anthropologists, who studied people's connections and community structures in tribal and village societies.
- Four features characterize the current use of SNA:
    - The study of SNA is motivated by the structural composition of ties that link social actors.
    - Built on systematic empirical inputs.
    - Relies heavily on the use of graphical representations.
    - Relies on the use of computational and mathematical models.
- [International Network for Social Network Analysis (INSNA)](http://www.insna.org)

## Importance of Social Network Analysis
- SNA plays a critical role in determining
    - How to solve issues in society, 
    - How organizations can be better run, 
    - How individuals can achieve their goals faster. 
- SNA was successfully applied in
    1. Health, in research related to the epidemiology and prevention of sexually transmitted diseases; 
    2. Cybercrime, investigating online hacker communities; 
    3. Business, studying the influence of SNA and sentiment analysis in predicting business trends; 
    4. Animal social networks, investigating the relationships and the social structures of animal gatherings and the direct and indirect interactions between groups;
    5. Communications, studying the different structural properties of short message service graphs.
    6. Information retrieval, information fusion communities, and investigation of terrorist groups.

## Social Network Analysis Modeling Tools
- SNA modeling tools
    - Pajek
    - UCINET
    - StOCNET
    - Gephi
    - Network Workbench
    - NodeXL
- Special-purposes SNA modeling tools
    - NEGOPY
    - InFlow
    - SocioMetric LinkAlyzer
- Programming utilities
    - NetworkX
    - JUNG
    - iGraph
    - Prefuse
    - SNAP