# WARNING: I have not received any training in network analysis

# NetworkX in Python 

NetworkX is a Python package for the creation, manipulation, and study of the structure, dynamics, and functions of complex networks. https://networkx.org/documentation/stable/index.html

In [1]:
!pip install networkx



- The network data used in this tutorial is derived from the Oxford Dictionary of National Biography and from the ongoing work of the Six Degrees of Francis Bacon project, which is reconstructing the social networks of early modern Britain (1500-1700).


- Introduction
  - Lesson Goals
  - Prerequisites
  
  
What might you learn from network data?

In this tutorial, you will learn:

- To use the NetworkX package for working with network data in Python; and
- To analyze humanities network data to find:
    - Network structure and path lengths,
    - Important or central nodes, and
    - Communities and subgroups.


This is a tutorial for exploring network statistics and metrics. We will therefore focus on ways to analyze, and draw conclusions from, networks without visualizing them. You’ll likely want a combination of visualization and network metrics in your own project, and so we recommend this article as a companion to



- Our example: the Society of Friends
- Data Prep and NetworkX Installation
- Getting Started
- Reading files, importing data
- Basics of NetworkX: Creating the Graph
- Adding Attributes
- Metrics available in NetworkX
- The Shape of the Network
- Centrality
- Advanced NetworkX: Community detection

# 1) Read the raw data

In [None]:
import csv
from operator import itemgetter  # used later to sort nodes by degree
import networkx as nx
# The community-detection module is imported explicitly from networkx.algorithms
from networkx.algorithms import community

- We first use Python's built-in `csv` module to load two datasets:
    - a list of nodes, and
    - a list of edges, each stored as a pair (tuple) of node names.

In [3]:

with open('quakers_nodelist.csv', 'r', newline='') as nodecsv: # Open the file
    nodereader = csv.reader(nodecsv) # Read the csv
    # Retrieve the data (a list comprehension, sliced with [1:] to drop the header row)
    nodes = [n for n in nodereader][1:]

node_names = [n[0] for n in nodes] # Get a list of only the node names

with open('quakers_edgelist.csv', 'r', newline='') as edgecsv: # Open the file
    edgereader = csv.reader(edgecsv) # Read the csv
    edges = [tuple(e) for e in edgereader][1:] # Retrieve the data as tuples, dropping the header row
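Slicing with `[1:]` drops the header row only after the whole file has been read into a list. A minimal alternative sketch, assuming the edge file has exactly one header row, consumes the header with `next()` before reading the data. The in-memory sample (via `io.StringIO`) and its `Source,Target` header are hypothetical stand-ins for `quakers_edgelist.csv`; the two edges are taken from the shortest-path output later in this notebook.

```python
import csv
import io

# Hypothetical stand-in for quakers_edgelist.csv; the "Source,Target" header is an assumption.
sample_edgecsv = io.StringIO(
    "Source,Target\n"
    "Margaret Fell,George Fox\n"
    "George Fox,George Whitehead\n"
)

edgereader = csv.reader(sample_edgecsv)
header = next(edgereader)               # consume the header row before reading the data
edges = [tuple(e) for e in edgereader]  # remaining rows become (source, target) tuples

print(header)
print(edges)
```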

In [4]:
print(len(node_names))

119


In [5]:
print(len(edges))

174


# 2) Create Graph

In [6]:
G = nx.Graph()

- This creates a new “graph object,” `G`, a special NetworkX data type that starts out empty.

- Now you can add your lists of nodes and edges like so:

In [7]:
G.add_nodes_from(node_names)
G.add_edges_from(edges)

- This is one of several ways to add data to a network object.
- You can check out the [NetworkX documentation](https://networkx.org/documentation/stable/tutorial.html#adding-attributes-to-graphs-nodes-and-edges) for information about adding weighted edges, or adding nodes and edges one at a time.
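As a small illustration of those alternatives, the sketch below adds a node and edges one at a time with `add_node` and `add_edge`, and a weighted edge with `add_weighted_edges_from`. The weight `2.0` is hypothetical; the edges loaded in this notebook are plain, unweighted pairs.

```python
import networkx as nx

G = nx.Graph()
G.add_node("George Fox")                   # add a single node
G.add_edge("Margaret Fell", "George Fox")  # missing endpoints are created automatically
G.add_edge("George Fox", "Margaret Fell")  # same undirected edge: no duplicate is added
# Hypothetical weight of 2.0, stored under the edge attribute "weight"
G.add_weighted_edges_from([("George Fox", "George Whitehead", 2.0)])

print(G.number_of_nodes(), G.number_of_edges())
print(G["George Fox"]["George Whitehead"]["weight"])
```

Because `nx.Graph` is a simple undirected graph, re-adding an existing pair (in either order) does not create a second edge; an `nx.MultiGraph` would be needed to keep parallel edges.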

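- Check basic information about the newly created network. Older tutorials use the `nx.info(G)` function for this, but it was deprecated in the NetworkX 2.x series and removed in NetworkX 3.0, so the cell below prints the same five items directly:

In [8]:
print("Name:", G.name)
print("Type:", type(G).__name__)
print("Number of nodes:", G.number_of_nodes())
print("Number of edges:", G.number_of_edges())
# Average degree of an undirected graph: 2 * edges / nodes, formatted like the old nx.info output
print(f"Average degree: {2 * G.number_of_edges() / G.number_of_nodes():8.4f}")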

Name: 
Type: Graph
Number of nodes: 119
Number of edges: 174
Average degree:   2.9244


- The output has five items:
    - the name of your graph (blank in this case),
    - its type,
    - the number of nodes,
    - the number of edges, and
    - the average degree in the network.
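The average degree of an undirected graph equals 2m / n, because each of the m edges adds 1 to the degree of each of its two endpoints. For this network that is 2 × 174 / 119 ≈ 2.9244. The sketch below checks the identity on a small hypothetical toy graph.

```python
import networkx as nx

# Toy graph: a triangle a-b-c plus a pendant node d attached to c
G = nx.Graph([("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")])

n = G.number_of_nodes()
m = G.number_of_edges()
avg_from_formula = 2 * m / n
avg_from_degrees = sum(d for _, d in G.degree()) / n  # sum of all degrees divided by node count

print(avg_from_formula, avg_from_degrees)  # 2.0 2.0
```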

# 3) Add Attributes

- For NetworkX, a Graph object is one big thing (your network) made up of two kinds of smaller things (your nodes and your edges).
- So far we’ve imported the nodes and the edges (as pairs of nodes).
- NetworkX also allows us to add attributes to both nodes and edges, providing more information about each of them.


- The `nodes` list created earlier contains all of the rows from `quakers_nodelist.csv`, including columns for name, historical significance, gender, birth year, death year, and SDFB ID. We want to loop through this list and add this information to our graph.
- There are a couple of ways to do this, but NetworkX provides two convenient functions for adding attributes to all of a Graph’s nodes or edges at once: `nx.set_node_attributes()` and `nx.set_edge_attributes()`.
- To use `nx.set_node_attributes()`, your attribute data must be a Python dictionary in which node names are the keys and the attribute values are the values. You’ll create one dictionary for each attribute and then add them with the function.
- The first step is to create five empty dictionaries, using curly braces:

In [9]:
hist_sig_dict = {}
gender_dict = {}
birth_dict = {}
death_dict = {}
id_dict = {}

- Now we can loop through our `nodes` list and add the appropriate items to each dictionary.
- We do this by knowing in advance the position, or index, of each attribute. Because the `quakers_nodelist.csv` file is well organized, we know that the person’s name will always be the first item in each row: index 0, since Python starts counting at 0.
- The person’s historical significance will be index 1, their gender will be index 2, and so on. Therefore we can construct our dictionaries like so:

In [10]:
for node in nodes: # Loop through the list, one row at a time
    hist_sig_dict[node[0]] = node[1]
    gender_dict[node[0]] = node[2]
    birth_dict[node[0]] = node[3]
    death_dict[node[0]] = node[4]
    id_dict[node[0]] = node[5]
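The five parallel dictionaries can also be built with one nested loop that pairs each column index with an attribute name. This is a sketch under assumptions: the `attr_names` list is hypothetical and must match the column order of `quakers_nodelist.csv` described above, and the single sample row copies William Penn's attribute values as printed later in this notebook.

```python
# Hypothetical attribute names, in the CSV's column order after the name column
attr_names = ["historical_significance", "gender", "birth_year", "death_year", "sdfb_id"]

# One sample row: name, historical significance, gender, birth year, death year, SDFB ID
nodes = [
    ["William Penn", "Quaker leader and founder of Pennsylvania", "male", "1644", "1718", "10009531"],
]

# One empty {name: value} dictionary per attribute
attr_dicts = {attr: {} for attr in attr_names}
for node in nodes:
    for index, attr in enumerate(attr_names, start=1):  # index 0 is the name itself
        attr_dicts[attr][node[0]] = node[index]

print(attr_dicts["gender"])      # {'William Penn': 'male'}
print(attr_dicts["birth_year"])  # {'William Penn': '1644'}
```

Each `attr_dicts[attr]` could then be passed to `nx.set_node_attributes(G, attr_dicts[attr], attr)`.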

- Now we have a set of dictionaries for attributes that can be added to the nodes in the Graph object.
- The `set_node_attributes` function takes three arguments:
    - the Graph to which you’re adding the attribute,
    - the dictionary of node-attribute pairs, and
    - the name of the new attribute.
- The code for adding your five attributes looks like this:

In [11]:
nx.set_node_attributes(G, hist_sig_dict, 'historical_significance')
nx.set_node_attributes(G, gender_dict, 'gender')
nx.set_node_attributes(G, birth_dict, 'birth_year')
nx.set_node_attributes(G, death_dict, 'death_year')
nx.set_node_attributes(G, id_dict, 'sdfb_id')
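`nx.set_node_attributes` also accepts a dictionary of dictionaries: when the `name` argument is omitted, each outer key is a node and each inner dictionary maps attribute names to values, so several attributes can be set in one call. The sketch uses two of William Penn's attributes as sample data. In current NetworkX versions, keys that are not already nodes of the graph are skipped rather than added; the last lines check this.

```python
import networkx as nx

G = nx.Graph()
G.add_node("William Penn")

# Outer keys are nodes; inner dicts map attribute name -> value
attrs = {"William Penn": {"gender": "male", "birth_year": "1644"}}
nx.set_node_attributes(G, attrs)  # name omitted, so the inner dicts are applied as-is

print(G.nodes["William Penn"])  # {'gender': 'male', 'birth_year': '1644'}

# A key that is not a node in G is skipped, not added as a new node
nx.set_node_attributes(G, {"Not In Graph": {"gender": "female"}})
print(G.number_of_nodes())  # 1
```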

- Now every node has these five attributes, in addition to its name, which serves as the node's identifier.
- Below, we show an example of printing out all the birth years of the nodes by looping through them and accessing the `birth_year` attribute:

In [12]:
for n in G.nodes(): # Loop through every node, in our data "n" will be the name of the person
    print(n, G.nodes[n]['birth_year']) # Access every node by its name, and then by the attribute "birth_year"

Joseph Wyeth 1663
Alexander Skene of Newtyle 1621
James Logan 1674
Dorcas Erbery 1656
Lilias Skene 1626
William Mucklow 1630
Thomas Salthouse 1630
William Dewsbury 1621
John Audland 1630
Richard Claridge 1649
William Bradford 1663
Fettiplace Bellers 1687
John Bellers 1654
Isabel Yeamans 1637
George Fox the younger 1551
George Fox 1624
John Stubbs 1618
Anne Camm 1627
John Camm 1605
Thomas Camm 1640
Katharine Evans 1618
Lydia Lancaster 1683
Samuel Clarridge 1631
Thomas Lower 1633
Gervase Benson 1569
Stephen Crisp 1628
James Claypoole 1634
Thomas Holme 1626
John Freame 1665
John Swinton 1620
William Mead 1627
Henry Pickworth 1673
John Crook 1616
Gilbert Latey 1626
Ellis Hookes 1635
Joseph Besse 1683
James Nayler 1618
Elizabeth Hooten 1562
George Whitehead 1637
John Whitehead 1630
William Crouch 1628
Benjamin Furly 1636
Silvanus Bevan 1691
Robert Rich 1607
John Whiting 1656
Christopher Taylor 1614
Thomas Lawson 1630
Richard Farnworth 1630
William Coddington 1601
Thomas Taylor 1617
Richard 
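The birth years above are text, not numbers: `csv.reader` returns every field as a string, so the attribute must be converted before any numeric comparison. A minimal sketch, assuming every non-empty `birth_year` value is an integer string, converts the values with `int()` and finds the earliest-born person. The two sample nodes use birth years from the output above.

```python
import networkx as nx

G = nx.Graph()
# Sample nodes with string-valued birth years, as csv.reader would produce them
G.add_nodes_from([
    ("George Fox", {"birth_year": "1624"}),
    ("James Nayler", {"birth_year": "1618"}),
])

# G.nodes(data="birth_year") yields (node, value) pairs; skip missing or empty values
birth_years = {n: int(y) for n, y in G.nodes(data="birth_year") if y}

earliest = min(birth_years, key=birth_years.get)
print(earliest, birth_years[earliest])  # James Nayler 1618
```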

# 4) Metrics available in NetworkX

## The Shape of the Network

- Network density is the ratio of the edges actually present to all possible edges. You can calculate it by running `nx.density(G)`.

In [13]:
density = nx.density(G)
print("Network density:", density)

Network density: 0.02478279447372169
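For an undirected graph with n nodes and m edges there are n(n - 1)/2 possible edges, so density is m divided by that number. For this network that gives 174 / (119 × 118 / 2) ≈ 0.0248, matching the output above. The sketch below checks the formula against `nx.density` on a small toy graph.

```python
import networkx as nx

G = nx.path_graph(5)  # toy graph: 5 nodes joined in a line by 4 edges

n = G.number_of_nodes()
m = G.number_of_edges()
possible_edges = n * (n - 1) / 2  # every unordered pair of distinct nodes
manual_density = m / possible_edges

print(manual_density, nx.density(G))  # 0.4 0.4
```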


In [14]:
fell_whitehead_path = nx.shortest_path(G, source="Margaret Fell", target="George Whitehead")

print("Shortest path between Fell and Whitehead:", fell_whitehead_path)

Shortest path between Fell and Whitehead: ['Margaret Fell', 'George Fox', 'George Whitehead']


In [15]:
print("Length of that path:", len(fell_whitehead_path)-1)

Length of that path: 2
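In an unweighted graph a shortest path is one with the fewest edges, which is why its length is the number of nodes on it minus one. Breadth-first search (BFS) finds such a path by visiting nodes in order of their distance from the source. The sketch below is a plain BFS for illustration, not NetworkX's own (more optimized) implementation; `Person A` and `Person B` are hypothetical nodes that create a longer alternative route. `nx.shortest_path_length` returns the length directly.

```python
from collections import deque
import networkx as nx

def bfs_shortest_path(G, source, target):
    """Return one path with the fewest edges from source to target, or None."""
    parent = {source: None}       # records how each visited node was reached
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:        # walk back through parents to rebuild the path
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nbr in G.neighbors(node):
            if nbr not in parent:  # first visit is via a shortest route
                parent[nbr] = node
                queue.append(nbr)
    return None                   # target is unreachable from source

G = nx.Graph([
    ("Margaret Fell", "George Fox"), ("George Fox", "George Whitehead"),
    ("Margaret Fell", "Person A"), ("Person A", "Person B"), ("Person B", "George Whitehead"),
])
path = bfs_shortest_path(G, "Margaret Fell", "George Whitehead")
print(path)  # ['Margaret Fell', 'George Fox', 'George Whitehead']
print(len(path) - 1, nx.shortest_path_length(G, "Margaret Fell", "George Whitehead"))  # 2 2
```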


In [16]:
# If your Graph has more than one component, this will return False:
print(nx.is_connected(G))

# Next, use nx.connected_components to get the components (a generator of node sets),
# then use the built-in max() function to find the largest one:
components = nx.connected_components(G)
largest_component = max(components, key=len)

# Create a "subgraph" of just the largest component,
# then calculate the diameter of that subgraph, just as you did with density.
subgraph = G.subgraph(largest_component)
diameter = nx.diameter(subgraph)
print("Network diameter of largest component:", diameter)

False
Network diameter of largest component: 8
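Diameter is the longest of all shortest paths in a network, that is, the largest eccentricity over all nodes. It is only defined when every node can reach every other node, which is why the cell above restricts it to the largest component. The toy graph below (hypothetical data: a four-node chain plus a separate pair) repeats the same steps and shows that `nx.diameter` raises an error on the whole disconnected graph.

```python
import networkx as nx

# Toy graph with two components: the chain 1-2-3-4 and the separate pair 5-6
G = nx.Graph([(1, 2), (2, 3), (3, 4), (5, 6)])

print(nx.is_connected(G))                 # False
print(nx.number_connected_components(G))  # 2

largest = max(nx.connected_components(G), key=len)  # {1, 2, 3, 4}
sub = G.subgraph(largest)
print(nx.diameter(sub))                   # 3
print(max(nx.eccentricity(sub).values())) # 3, the same value from eccentricities

# On the whole (disconnected) graph, diameter is undefined and NetworkX raises an error
raised = False
try:
    nx.diameter(G)
except nx.NetworkXError:
    raised = True
print(raised)                             # True
```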


In [17]:
triadic_closure = nx.transitivity(G)
print("Triadic closure:", triadic_closure)

Triadic closure: 0.16937799043062202
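`nx.transitivity` measures triadic closure as 3 × (number of triangles) / (number of connected triples), where a connected triple is a pair of edges that share a node. In other words, it is the fraction of two-step relationships that are closed into triangles. The sketch below checks this formula on a hypothetical toy graph.

```python
import networkx as nx

# Toy graph: triangle 1-2-3 plus a pendant node 4 attached to node 3
G = nx.Graph([(1, 2), (2, 3), (3, 1), (3, 4)])

# nx.triangles counts each triangle once at each of its 3 nodes
n_triangles = sum(nx.triangles(G).values()) // 3
# A node of degree d is the centre of d*(d-1)/2 connected triples
n_triples = sum(d * (d - 1) // 2 for _, d in G.degree())

manual = 3 * n_triangles / n_triples
print(n_triangles, n_triples, manual, nx.transitivity(G))  # 1 5 0.6 0.6
```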


## Centrality

In [18]:
degree_dict = dict(G.degree(G.nodes()))
nx.set_node_attributes(G, degree_dict, 'degree')
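A node's degree is the number of edges attached to it. The sketch below counts degrees directly from a small two-edge list with `collections.Counter` and compares them with `G.degree()`. The two agree here because the list has no duplicate pairs and no self-loops (NetworkX counts a self-loop twice toward a node's degree).

```python
from collections import Counter
import networkx as nx

edges = [("Margaret Fell", "George Fox"), ("George Fox", "George Whitehead")]

counts = Counter()
for u, v in edges:  # each undirected edge adds 1 to the degree of both endpoints
    counts[u] += 1
    counts[v] += 1

G = nx.Graph(edges)
degree_dict = dict(G.degree())
print(degree_dict)                  # {'Margaret Fell': 1, 'George Fox': 2, 'George Whitehead': 1}
print(dict(counts) == degree_dict)  # True
```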

In [19]:
print(G.nodes['William Penn'])

{'historical_significance': 'Quaker leader and founder of Pennsylvania', 'gender': 'male', 'birth_year': '1644', 'death_year': '1718', 'sdfb_id': '10009531', 'degree': 18}


In [20]:
sorted_degree = sorted(degree_dict.items(), key=itemgetter(1), reverse=True)

In [21]:
print("Top 20 nodes by degree:")
for d in sorted_degree[:20]:
    print(d)

Top 20 nodes by degree:
('George Fox', 22)
('William Penn', 18)
('James Nayler', 16)
('George Whitehead', 13)
('Margaret Fell', 13)
('Benjamin Furly', 10)
('Edward Burrough', 9)
('George Keith', 8)
('Thomas Ellwood', 8)
('Francis Howgill', 7)
('John Perrot', 7)
('John Audland', 6)
('Richard Farnworth', 6)
('Alexander Parker', 6)
('John Story', 6)
('John Stubbs', 5)
('Thomas Curtis', 5)
('John Wilkinson', 5)
('William Caton', 5)
('Anthony Pearson', 5)
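When only the top few nodes are needed, `heapq.nlargest` returns the same result as sorting in reverse and slicing (ties keep their original order) without fully sorting the list. `nx.degree_centrality` divides each degree by n - 1, the largest degree possible in a simple graph, so values fall between 0 and 1 and are easier to compare across networks of different sizes. Both are shown on a hypothetical toy graph.

```python
import heapq
from operator import itemgetter
import networkx as nx

G = nx.star_graph(3)  # toy graph: hub 0 joined to nodes 1, 2 and 3
G.add_edge(1, 2)      # one extra edge between two leaves

degree_dict = dict(G.degree())  # {0: 3, 1: 2, 2: 2, 3: 1}
top2 = heapq.nlargest(2, degree_dict.items(), key=itemgetter(1))
print(top2)  # [(0, 3), (1, 2)]

# Degree divided by (n - 1); here n = 4, so the hub scores 3 / 3 = 1.0
centrality = nx.degree_centrality(G)
print(centrality[0], centrality[3])
```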
