In [27]:
# Shovan Biswas
# Data620, Assignment 02

import networkx as nx
import matplotlib.pyplot as plt
from pyvis import network as net
%matplotlib inline

## Short description of the database
The file adjnoun.gml contains the network of common adjective and noun adjacencies for the novel "David Copperfield" by Charles Dickens, as described by M. Newman. Nodes represent the most commonly occurring adjectives and nouns in the book. Node values are 0 for adjectives and 1 for nouns. Edges connect any pair of words that occur in adjacent position in the text of the book. Please cite M. E. J. Newman, Finding community structure in networks using the eigenvectors of matrices, Preprint physics/0605087 (2006).

The site linked below provides the data as a file named adjnoun.gml, in GML (Graph Modelling Language) format.

Link to site: http://networkdata.ics.uci.edu/data/adjnoun/adjnoun.gml

In [8]:
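# Load the GML file into a NetworkX graph
file = 'adjnoun.gml'
G = nx.read_gml(file)

# Summary of the graph (nx.info() was removed in NetworkX 3.0; print(G) gives a shorter summary there)
print(nx.info(G))
# Diameter: the longest shortest path between any two vertices, counted in edges
print('\nGraph Diameter:', nx.diameter(G))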

Name: 
Type: Graph
Number of nodes: 112
Number of edges: 425
Average degree:   7.5893

Graph Diameter: 5
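
The diameter reported above is the longest of all shortest paths in the graph. As a rough illustration of what that means, here is a minimal pure-Python sketch that computes it with breadth-first search. The helper names `bfs_distances` and `diameter` and the toy graph are made up for this example; this is not how NetworkX implements `nx.diameter`, and it assumes a connected, unweighted graph.

```python
from collections import deque


def bfs_distances(adj, source):
    """Return shortest-path lengths (in edges) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adj[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def diameter(adj):
    """Longest shortest path over all node pairs; assumes a connected graph."""
    return max(max(bfs_distances(adj, node).values()) for node in adj)


# Hypothetical toy graph (not taken from adjnoun.gml): a path a-b-c-d plus a leaf e on b.
toy = {
    "a": ["b"],
    "b": ["a", "c", "e"],
    "c": ["b", "d"],
    "d": ["c"],
    "e": ["b"],
}
print(diameter(toy))  # 3, e.g. the shortest path a-b-c-d
```

A diameter of 5 for the word network therefore means that any two words are at most five adjacency steps apart.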


## "little" guy with maximum friends
In the following loop, we compute the number of neighbors of each word. Despite its meaning, the word "little" turns out to have the most neighbors of any word in the network: 49.

In [16]:
for n in G.nodes():
    print(f"{n}: {len(list(G.neighbors(n)))}")

agreeable: 3
man: 14
old: 33
person: 9
anything: 2
short: 7
arm: 6
round: 11
aunt: 1
first: 17
bad: 4
air: 7
boy: 10
beautiful: 6
black: 12
face: 12
letter: 3
little: 49
young: 14
best: 9
course: 5
friend: 10
love: 5
part: 8
room: 15
thing: 14
time: 11
way: 15
better: 13
heart: 5
mind: 6
place: 12
right: 10
state: 5
woman: 7
word: 6
door: 7
eye: 10
bright: 9
evening: 5
morning: 4
certain: 8
day: 7
other: 28
child: 7
happy: 6
common: 3
dark: 5
kind: 10
night: 5
dear: 15
good: 28
home: 7
mother: 6
pretty: 13
open: 3
early: 4
fire: 3
full: 2
great: 13
master: 5
moment: 2
work: 2
general: 5
fancy: 1
voice: 6
head: 7
hope: 4
long: 12
greater: 2
hand: 12
hard: 6
red: 7
life: 7
glad: 1
large: 10
new: 12
white: 3
late: 3
whole: 13
light: 8
manner: 6
bed: 1
house: 5
low: 6
money: 2
ready: 2
small: 10
strange: 6
thought: 7
lost: 1
alone: 1
nothing: 6
miserable: 2
natural: 3
half: 1
wrong: 3
name: 1
pleasant: 5
possible: 2
side: 3
perfect: 2
poor: 10
quiet: 9
same: 21
strong: 7
something: 6
true:
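
Scanning a long printout for the largest value is error-prone, so it can help to rank the counts instead. The sketch below does this with `collections.Counter` on a handful of counts copied from the output above. On the full graph one could presumably build the same counter from `dict(G.degree())`, which is an assumption about how you would wire it in rather than something the original notebook does.

```python
from collections import Counter

# Neighbor counts copied from the output above (a small subset, for illustration).
neighbor_counts = Counter({
    "man": 14,
    "old": 33,
    "first": 17,
    "little": 49,
    "other": 28,
    "same": 21,
})

# most_common(k) returns the k entries with the largest counts, highest first.
top_three = neighbor_counts.most_common(3)
print(top_three)  # [('little', 49), ('old', 33), ('other', 28)]
```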

## Concept of centrality
The degree centrality of a word measures how many other words it is connected to, relative to how many it could be connected to. Its exact value is the number of neighbors of the node divided by the maximum possible number of neighbors, which is the total number of nodes minus one. In the following we'll compute the degree centrality of 'little'.
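
As a formula, degree centrality normalizes a node's neighbor count by $n - 1$, the largest number of neighbors a node can have in a simple graph; this is the denominator used in the cell that follows. The symbols $C_D$, $\deg$ and $n$ are notation introduced here for illustration:

$$C_D(v) = \frac{\deg(v)}{n - 1}$$

With the figures already printed above (49 neighbors for 'little', 112 nodes in total), this gives:

$$C_D(\text{little}) = \frac{49}{112 - 1} = \frac{49}{111} \approx 0.4414$$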

In [21]:
number_of_neighbors_little = len(list(G.neighbors('little')))
num_of_nodes = len(G.nodes())
print("Total number of neighbors of 'little': ", number_of_neighbors_little)
print("Total number of nodes in the Graph is: ", num_of_nodes)
print("Degree Centrality of 'little' is: ", number_of_neighbors_little / (num_of_nodes - 1))

Total number of neighbors of 'little':  49
Total number of nodes in the Graph is:  112
Degree Centrality of 'little' is:  0.44144144144144143


## Computation of centrality by NetworkX function degree_centrality()
So at this point, we have learned that in David Copperfield the word 'little' has the most connections, and therefore the highest degree centrality. For a network of words, centrality may matter less than it does in certain other kinds of data, such as a movie or a neighborhood. If the screenplay of a movie can be rendered in GML format, then centrality gives an idea of who the central character is, who has a relatively minor role, and so forth.

NetworkX provides the function degree_centrality() for this computation, and we'll use it to verify our semi-manual result above.

In [22]:
print("Maximum degree of centrality: ", max(nx.degree_centrality(G).values()))

Maximum degree of centrality:  0.44144144144144143


This matches the value we computed semi-manually above.
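
To make the comparison concrete, here is a minimal hand-rolled version of the same calculation on a toy graph. The function name `degree_centrality_by_hand` and the toy adjacency dictionary are hypothetical; the sketch only mirrors the neighbors-over-(n - 1) definition and is not NetworkX's own code.

```python
def degree_centrality_by_hand(adj):
    """Degree centrality: each node's neighbor count divided by (n - 1)."""
    n = len(adj)
    return {node: len(neighbors) / (n - 1) for node, neighbors in adj.items()}


# Hypothetical toy graph: "hub" is adjacent to every other node, and x-y are also linked.
toy = {
    "hub": {"x", "y", "z"},
    "x": {"hub", "y"},
    "y": {"hub", "x"},
    "z": {"hub"},
}

centrality = degree_centrality_by_hand(toy)
# max() with key=centrality.get returns the node whose centrality is largest.
top_word = max(centrality, key=centrality.get)
print(top_word, centrality[top_word])  # hub 1.0
```

The same `max(..., key=...)` pattern applied to the dictionary returned by `nx.degree_centrality(G)` would name the word with the highest centrality rather than only its value.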

Having learned about centrality, we'll produce a list of (word, centrality) tuples for every word in the network.

The word 'little' is in there, with its centrality.

In [25]:
list(nx.degree_centrality(G).items())

[('agreeable', 0.02702702702702703),
 ('man', 0.12612612612612611),
 ('old', 0.2972972972972973),
 ('person', 0.08108108108108109),
 ('anything', 0.018018018018018018),
 ('short', 0.06306306306306306),
 ('arm', 0.05405405405405406),
 ('round', 0.0990990990990991),
 ('aunt', 0.009009009009009009),
 ('first', 0.15315315315315314),
 ('bad', 0.036036036036036036),
 ('air', 0.06306306306306306),
 ('boy', 0.09009009009009009),
 ('beautiful', 0.05405405405405406),
 ('black', 0.10810810810810811),
 ('face', 0.10810810810810811),
 ('letter', 0.02702702702702703),
 ('little', 0.44144144144144143),
 ('young', 0.12612612612612611),
 ('best', 0.08108108108108109),
 ('course', 0.04504504504504504),
 ('friend', 0.09009009009009009),
 ('love', 0.04504504504504504),
 ('part', 0.07207207207207207),
 ('room', 0.13513513513513514),
 ('thing', 0.12612612612612611),
 ('time', 0.0990990990990991),
 ('way', 0.13513513513513514),
 ('better', 0.11711711711711711),
 ('heart', 0.04504504504504504),
 ('mind', 0.05

## Graph of the words
Now we come to the most visually interesting part: the graph of the network of words.

In [28]:
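# Build an interactive pyvis network sized for display inside the notebook
n = net.Network(height="800px", width="100%", notebook=True)
nxg = nx.Graph(G)     # plain Graph copy of G
n.from_nx(nxg)        # copy the nodes and edges into the pyvis network
n.show("basic.html")  # write the HTML file and render it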