In [None]:
#!pip install networkx
import networkx as nx
import matplotlib
%matplotlib inline
import matplotlib.pyplot as plots
# plots.style.use('seaborn-colorblind')
# plots.style.available

# Bash review

 _Refer to May 31 notebook/notes._  

- What is an HFS?  
- Why is it important to understand how data are stored?  
- Why is Bash useful? 

**Open a Bash terminal:**

1. Change your current working directory to your root directory.  
2. Now, change your current working directory to your home folder. What are the symbols for "root" and "home"?  
3. Create a folder on your Desktop named "best_author".  
4. Create a file inside of "best_author" named `best_author.txt`.  
5. Use `echo` and `>` to add the name of your one single favorite author inside `best_author.txt`.  
6. What is [`vim`](https://en.wikipedia.org/wiki/Vim_(text_editor))?  

# (Social) Network analysis

[**Social network theory**](https://en.wikipedia.org/wiki/Social_network_analysis) emphasizes human social relationships and the many ways they are expressed. [**Agent-based models**](https://en.wikipedia.org/wiki/Agent-based_model) remind us that relationships between entities are complex, context specific, and might or might not be correlated.  

Living entities form highly variable [**social adaptive strategies**](https://www.tandfonline.com/doi/abs/10.1080/00438243.1993.9980243) relative to their [individual contexts](https://plato.stanford.edu/entries/capability-approach/#CorIde). We are interested in the relationships between nodes, or individuals, groups, institutions, and other entities.  

Today's notebook demonstrates how to construct a basic network graph using NetworkX.   

# Network graphs

Finding relationships in meaning is challenging. Although computers help us find meaning in things (such as large bodies of text), they also present their own sets of challenges: proper choice of tools and methods, efficient coding, and interpretation/presentation of results.  

Data visualization techniques called [**network graphs**](https://en.wikipedia.org/wiki/Graph_theory) conveniently help us illustrate relationships (edges) between the individual actors (nodes). When these graphs explicitly focus on only the individual actors, they can be referred to as "social" network graphs.  

[**Brughmans**](https://www.jstor.org/stable/43654602?seq=1#page_scan_tab_contents) provides a great introduction to networks that you will want to read.   

# Vocabulary 

Start by conceptualizing how text data can be stored in a graph. Data are usually stored in two separate files - a node list and an edge list.  
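
As a rough sketch of that two-file layout, the example below parses a node list and an edge list and loads them into a graph. The file contents, the column names (`id`, `kind`, `source`, `target`), and the names `Ann`, `Raj`, and `library` are made up for illustration, and `io.StringIO` stands in for real files on disk.

```python
import csv
import io

import networkx as nx

# Hypothetical file contents; in practice these would be two files on disk.
node_file = io.StringIO("id,kind\nAnn,person\nRaj,person\nlibrary,place\n")
edge_file = io.StringIO("source,target\nAnn,Raj\nAnn,library\nRaj,library\n")

# Each node row becomes (name, attribute dict); each edge row becomes a pair.
nodes = [(row["id"], {"kind": row["kind"]}) for row in csv.DictReader(node_file)]
edges = [(row["source"], row["target"]) for row in csv.DictReader(edge_file)]

example_graph = nx.Graph()
example_graph.add_nodes_from(nodes)
example_graph.add_edges_from(edges)

print(example_graph.nodes(data=True))
print(example_graph.edges())
```

Keeping nodes and edges separate means node attributes (here, `kind`) are written once per node rather than repeated on every edge.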

Text data are ideal for [**data mining**](https://en.wikipedia.org/wiki/Data_mining) because of the amount of text data available. Data mining approaches seek to find patterns in otherwise unknown data, such as a journal article or book, newspaper publishing company, or an entire library.  

Familiarize yourself with some of the [vocabulary beginning on page 43 of this text](https://www.politaktiv.org/documents/10157/29141/SocNet_TheoryApp.pdf)
1. Node  
2. Edge  
3. Degree  
4. Weight  
5. Betweenness  
6. Centrality  
7. Degree centrality  
8. Betweenness centrality
9. Closeness centrality  
10. Eigenvector centrality  
11. Equivalence relation  
12. Group Theory  
13. Centralization  
14. Clustering coefficient  
15. Ego network  
16. Personal network  
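
Several of these terms correspond directly to NetworkX functions. The sketch below computes a few of them on a tiny made-up network (the node names `A` to `D` are placeholders, not data from this notebook); it is meant only to make the vocabulary concrete.

```python
import networkx as nx

# A tiny made-up network: C is connected to everyone, D only to C.
toy = nx.Graph([("A", "B"), ("A", "C"), ("B", "C"), ("D", "C")])

# Degree: the number of edges touching each node.
degree = dict(toy.degree())
# Degree centrality: degree divided by (number of nodes - 1).
degree_c = nx.degree_centrality(toy)
# Betweenness centrality: share of shortest paths that pass through a node.
between_c = nx.betweenness_centrality(toy)
# Closeness centrality: based on the average distance to all other nodes.
closeness_c = nx.closeness_centrality(toy)

for node in toy.nodes():
    print(node, degree[node], round(degree_c[node], 2), round(between_c[node], 2))
```

Node `C` scores highest on every measure because `D` can only reach `A` and `B` by passing through it.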

# Conjectures on World Literature

[Moretti](https://newleftreview.org/II/1/franco-moretti-conjectures-on-world-literature) took issue with close reading, over-theorization, and linguistic and media imperialisms and noted the problematic nature of the construction of epistemologies for comparison of things and the abstractions that comprise them.
    
Therefore, reading _more_ text from a greater distance (opposite of "close" reading) presents interesting potential for analyzing text because of the ability to illustrate these relationships in a network graph.  

Let's look at a network based on two sentences that use some of the same words in non-Euclidean space. Leaving in the stop words is okay for now. We will:  
1. construct a graph  
2. add nodes  
3. construct edges between these nodes  
4. define paths between certain nodes

In [None]:
text = "Bob threw the ball to Jill. Jill threw the ball back to Bob. "
print(text)

# n-grams

If a word is a "gram" (node), then a bi-gram represents 1) the word and 2) the one before or after it. A [tri-gram](https://en.wikipedia.org/wiki/Trigram) is three words in succession, etc. You will learn more about n-grams in Week 4.  
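
One common way to build bi-grams and tri-grams in plain Python is to `zip` a word list against shifted copies of itself. This is a minimal sketch on a single sentence; the variable names are illustrative.

```python
sentence = "Bob threw the ball to Jill"
words = sentence.split()

# Pair each word with the word that follows it.
bigrams = list(zip(words, words[1:]))
# Group each word with the next two words.
trigrams = list(zip(words, words[1:], words[2:]))

print(bigrams)
print(trigrams)
```

A sentence of six words yields five bi-grams and four tri-grams, because each n-gram needs n consecutive words.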

Our two sentences in `text` both have an n-gram relationship between them - there is only one Jill and one Bob!  

Thus, the weight between Jill, Bob, and threw will be larger than any of the other grams!  

Count = `text` contains two "Bob", two "Jill", two "ball", and two "threw". (Degree, by contrast, is the number of edges attached to a node.)

Nodes get a "count" attribute so that more words = larger nodes. 
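
A minimal sketch of the "count" idea, assuming word frequency is what drives node size: count the words with `Counter`, store each count as a node attribute, and scale a list of sizes from it. The attribute name `count` and the scaling factor of 300 are arbitrary choices for illustration.

```python
from collections import Counter

import networkx as nx

words = "Bob threw the ball to Jill Jill threw the ball back to Bob".split()
counts = Counter(words)

# Store each word's frequency on its node as a "count" attribute.
sized_graph = nx.Graph()
for word, n in counts.items():
    sized_graph.add_node(word, count=n)

# One size per node, in node order; more frequent words get larger nodes.
sizes = [300 * sized_graph.nodes[word]["count"] for word in sized_graph.nodes]
print(dict(zip(sized_graph.nodes, sizes)))

# To draw: nx.draw(sized_graph, with_labels=True, node_size=sizes)
```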

# View the nx help

Like the other help files we have talked about, sort through the help files when you get a few minutes of spare time - just open the help and read a few lines. Do this 5 times and you should be able to understand what the method does. 

Although cryptic and frustrating in the first year or two, after some hours of study you will begin to realize that the help files do, in fact, tell you everything and exactly what you need to know.  

How have you learned to identify pieces of the code? (hint: functions versus arguments)  

How do you look up an argument's definition?  

Take a few minutes to read the help files - what are we dealing with here?  

**NOTE:** we will return to the example in the help file at the end of class to review your understanding.  
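
The help for the whole package is long. As a small sketch of a narrower lookup, you can ask for help on one method, or print the first lines of its docstring, which is where arguments are defined. The choice of `nx.Graph.add_nodes_from` and the 12-line cutoff are just examples.

```python
import networkx as nx

# help(nx) prints documentation for the entire package.
# Asking about a single method keeps the output manageable:
# help(nx.Graph.add_nodes_from)

# The same text is stored in the method's docstring.
doc = nx.Graph.add_nodes_from.__doc__
print("\n".join(doc.splitlines()[:12]))
```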

In [None]:
help(nx)

# Use variable definition to save an `nx` graph type

Similar to creating empty lists to be populated with some data such as in a list comprehension, we want to store our graph as a specific, empty type named **`graph1`**.  

**This uses the same three-piece recipe for variable assignment just like before:**

In [None]:
graph1 = nx.Graph()
print(type(graph1))

# Challenge

Discuss: what are nodes and edges? Conceptualize a network graph and sketch it out with paper and pencil.  

# Plotting nodes

Adding nodes is a fundamental first step to constructing a network graph. Let's add the content of a sentence to our empty **`nx`** graph type **`graph1`** using the **`.add_nodes_from`** method:

In [None]:
graph1.add_nodes_from(text.split())

In [None]:
# view the nodes
graph1.nodes()

# Plot the nodes

We can then draw the graph - what do you notice?

In [None]:
help(nx.draw)

In [None]:
nx.draw(graph1, with_labels=True, font_weight='bold', pos=nx.shell_layout(graph1))

# Add more nodes

Now let's add a third actor who also throws the ball to Bob.  

In [None]:
text2 = "Ted threw the ball back to Bob."
graph1.add_nodes_from(text2.split())
graph1.nodes()

In [None]:
nx.draw(graph1, with_labels=True, font_weight='bold', pos=nx.shell_layout(graph1))

# Remove punctuation and tokenize the text

"Bob" and "Bob." are the same, but the period will cause NetworkX to think it is a different node. Replace the periods with blank spaces. 

In [None]:
# add strings text and text2
text3 = text + text2
print(text3)

# replace periods with blank spaces
text4 = text3.replace(".", " ")
print(text4)

# tokenize the text
tokens = text4.split()
tokens

# Plot the clean nodes

In [None]:
# delete the old graph1
# if you mess up, start again by running this cell
del graph1

# graph1 no longer exists, so evaluating it now would raise a NameError
# graph1

In [None]:
# redefine graph1 with clean nodes
graph1 = nx.Graph()

# add nodes from tokens
graph1.add_nodes_from(tokens)
graph1.nodes()

In [None]:
# plot
nx.draw(graph1, with_labels=True, font_weight='bold', pos=nx.shell_layout(graph1))

# the correct nodes are now collapsed - pretty neat!

# Drawing edges

Now, let's draw edges between the nodes to represent their relationships. 

In [None]:
graph1.add_edges_from([
    ("Jill", "Bob"),
    ("Jill", "ball"),
    ("Bob", "ball"),
    ("Ted", "ball")
])

In [None]:
# plot edges
nx.draw(graph1, with_labels=True, font_weight='bold', pos=nx.shell_layout(graph1))

# Challenge

What's up with Ted? 

In [None]:
print(text4)

# Node and edge lists

Storing network data in node and edge lists helps us plug them into NetworkX methods with ease.  
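
Edge lists do not have to be typed by hand. As one possible approach (not the method used in this notebook), the sketch below derives a weighted edge list from adjacent words in each sentence, so that word pairs occurring more often get a larger `weight`. Treating adjacency as the relationship is an assumption made for illustration.

```python
from collections import Counter

import networkx as nx

sentences = [
    "Bob threw the ball to Jill".split(),
    "Jill threw the ball back to Bob".split(),
]

# Count how often each pair of neighboring words occurs.
# Sorting the pair makes ("the", "threw") and ("threw", "the") the same edge.
pair_counts = Counter()
for words in sentences:
    for first, second in zip(words, words[1:]):
        pair_counts[tuple(sorted((first, second)))] += 1

# Turn the counts into (node, node, weight) triples.
weighted_edges = [(a, b, n) for (a, b), n in pair_counts.items()]

word_graph = nx.Graph()
word_graph.add_weighted_edges_from(weighted_edges)
print(word_graph["threw"]["the"]["weight"])
```

Counting pairs within each sentence separately avoids creating an edge across the sentence boundary.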

In [None]:
node_list = [
    "Bob", "threw", "the", "ball", "to", "Jill", 
    "Jill", "threw", "the", "ball", "back", "to", "Bob",
    "Ted", "threw", "the", "ball", "back", "to", "Bob"
]

edge_list = [
    ("Jill", "Bob"),
    ("Jill", "ball"),
    ("Bob", "ball"),
    ("Ted", "ball")
]

# Plot again using `node_list` and `edge_list`. 

In [None]:
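graph2 = nx.Graph(edge_list)
# add the words from node_list; words without edges appear as isolated nodes
graph2.add_nodes_from(node_list)
nx.draw(graph2, with_labels=True, font_weight='bold', pos=nx.shell_layout(graph2))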

# Add a path

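Jill and Bob are joined by a direct edge, so the shortest path between them is that single edge. Ted, by contrast, can reach Jill or Bob only through the ball! 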
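
To see a path that needs an intermediate node, here is a small self-contained sketch using the same four edges. The graph name `path_graph_example` is illustrative.

```python
import networkx as nx

path_graph_example = nx.Graph([
    ("Jill", "Bob"),
    ("Jill", "ball"),
    ("Bob", "ball"),
    ("Ted", "ball"),
])

# Ted has no direct edge to Jill, so the path must pass through "ball".
ted_to_jill = nx.shortest_path(path_graph_example, "Ted", "Jill")
steps = nx.shortest_path_length(path_graph_example, "Ted", "Jill")
print(ted_to_jill, steps)
```

With no `weight` argument, every edge counts as one step, so the path length is simply the number of edges traversed.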

In [None]:
help(nx.shortest_path)

In [None]:
# find the shortest path between Jill and Bob (every edge counts as one step)
print(nx.shortest_path(graph1, "Jill", "Bob"))
nx.draw(graph1, with_labels=True, font_weight='bold', pos=nx.spring_layout(graph1))

# Look at example from `help(nx.DiGraph)`

Just plug our information in!  

The ball is common to the three actors in our network, but Ted only knows Jill through the ball, whereas only Bob and Jill directly know each other. 
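
A `DiGraph` differs from a `Graph` in that each edge has a direction. The sketch below, using the same four edges, shows what that means in practice; the graph name `directed` is illustrative.

```python
import networkx as nx

directed = nx.DiGraph([
    ("Jill", "Bob"),
    ("Jill", "ball"),
    ("Bob", "ball"),
    ("Ted", "ball"),
])

# The edge runs from Jill to Bob, but not from Bob to Jill.
print(directed.has_edge("Jill", "Bob"))
print(directed.has_edge("Bob", "Jill"))

# Every actor points at the ball; the ball points at no one.
print(directed.in_degree("ball"), directed.out_degree("ball"))
print(list(directed.predecessors("ball")))
```

The order of each tuple in the edge list therefore matters in a directed graph, whereas it is ignored in an undirected one.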

In [None]:
G = nx.DiGraph(edge_list)
G.add_nodes_from(node_list)
H = nx.path_graph(node_list)
G.add_nodes_from(H)
G.add_edges_from(edge_list)
nx.draw(G, with_labels = True, pos=nx.shell_layout(G))

In [None]:
print(G.number_of_edges())
print(G.number_of_nodes())

# Challenge

Try it again using a collection of newspaper articles from 2009:

In [None]:
with open("./human-rights-2009.txt", "r") as myfile:
    hr = myfile.read()

In [None]:
print(len(hr))
print(len(hr.split()))

In [None]:
hr.split()[:25]

In [None]:
# Remove punctuation
from string import punctuation
for char in punctuation:
    hr = hr.replace(char, "")

In [None]:
hr.split()[:25]

In [None]:
hr_tokens = hr.split()

In [None]:
from collections import Counter
freq = Counter(hr_tokens)
freq.most_common(10)

# Wow! That's a lot of stop words. 

How can you apply what you learned in this notebook as well as in notebook 1-4 to understand the contents of this document?
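
As a starting hint rather than a full solution, one way to deal with stop words is to filter them out before counting. The sample sentence and the tiny hand-written stop list below are made up for illustration; a real analysis would use `hr_tokens` and a fuller stop list.

```python
from collections import Counter

# A short made-up sample; the real text would come from hr_tokens.
sample = "the right to life and the right to liberty and the security of person"
# A tiny hand-written stop list, for illustration only.
stop_words = {"the", "to", "and", "of", "a", "in"}

# Lowercase the words and keep only those not in the stop list.
content_words = [w for w in sample.lower().split() if w not in stop_words]
content_freq = Counter(content_words)
print(content_freq.most_common(3))
```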

In [None]:
## YOUR CODE HERE