# Political blogs as networks in the 2004 election

This notebook analyzes the link structure between American liberal and conservative political blogs around the 2004 U.S. election.

First, load essential libraries:

In [2]:
import itertools
from urllib.request import urlopen

import matplotlib.pyplot as plot
import networkx as nx
from networkx.algorithms import traversal as tr
import networkx.generators.small as sm
import numpy as np
import pandas as pd

# Load the data

Our data was collected by researchers Lada Adamic and Natalie Glance in 2005. They crawled the top political blogs of the period, saved the HTML links between them, and stored the result in a machine-readable format. The dataset is available from http://networkdata.ics.uci.edu/data/polblogs/.

In [8]:
G = nx.readwrite.read_gml("polblogs.gml")

print( len(G.nodes()), len(G.edges()) )

1490 19090


The network consists of 1,490 political blogs, represented as nodes, and about 19,000 links between them, represented as edges.

We can directly examine what these nodes look like by key. The `value` takes either 0 or 1, where 0 indicates a left or liberal blog, and 1 indicates a right or conservative blog:

In [17]:
G.nodes()['100monkeystyping.com']

{'value': 0, 'source': 'Blogarama'}
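To see how the two groups are balanced, the `value` attribute can be tallied across all nodes. The sketch below uses a small hand-built graph as a stand-in for `G` (the node names are made up for illustration). The same `Counter` line should work on the real graph, assuming every node carries a `value` attribute.

```python
from collections import Counter

import networkx as nx

# Toy stand-in for the blog graph; node names are hypothetical.
toy = nx.DiGraph()
toy.add_node("left-a.example", value=0)
toy.add_node("left-b.example", value=0)
toy.add_node("right-a.example", value=1)

# Tally nodes by their 'value' attribute (0 = liberal, 1 = conservative).
label_counts = Counter(attrs["value"] for _, attrs in toy.nodes(data=True))
print(label_counts)
```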

# Centrality

The first question we ask is: which blogs are most important in this network? To answer it, we examine the _centrality_ of each node. Every node in the graph has a _degree_: the number of connections (here, HTML links) between it and other nodes.

The `degree` function calculates the degree of every node in the graph, and a custom `sorted_tuple` function sorts the result in descending order:

In [56]:
def sorted_tuple(pairs):
    """Sort an iterable of (k, v) tuples by v (descending), breaking ties by k."""
    return sorted(pairs, key=lambda kv: (-kv[1], kv[0]))

deg = nx.degree(G)
deg_sorted = sorted_tuple(deg)
deg_sorted[:10]

[('blogsforbush.com', 468),
 ('dailykos.com', 384),
 ('instapundit.com', 363),
 ('atrios.blogspot.com', 351),
 ('talkingpointsmemo.com', 283),
 ('washingtonmonthly.com', 256),
 ('drudgereport.com', 245),
 ('powerlineblog.com', 236),
 ('michellemalkin.com', 229),
 ('hughhewitt.com', 225)]
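Note that on a directed graph `degree` adds incoming and outgoing links together, so a node that links out heavily can score as high as one that is heavily linked to. A minimal sketch on a toy directed graph (hypothetical node names, and assuming `G` is loaded as a directed graph) shows how `in_degree` and `out_degree` separate the two:

```python
import networkx as nx

# Two nodes link to "hub", and "hub" links out once.
toy = nx.DiGraph()
toy.add_edges_from([("a", "hub"), ("b", "hub"), ("hub", "c")])

total = dict(toy.degree())        # incoming + outgoing
incoming = dict(toy.in_degree())  # links pointing at the node
outgoing = dict(toy.out_degree()) # links leaving the node
print(total["hub"], incoming["hub"], outgoing["hub"])
```

If incoming links are the better proxy for importance, `sorted_tuple(G.in_degree())` would rank blogs on that basis instead.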

A more sophisticated measure is _closeness centrality_, which scores each node by the inverse of its average shortest-path distance to the other nodes it is connected to: the shorter the paths, the higher the score.

The top-ranked blogs under this measure overlap substantially with those under the simpler degree measure above.

In [66]:
# Sorting function from SNA textbook, edited for dict objects
def sorted_map(scores):
    ms = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ms

# measure of node’s centrality
c = nx.closeness_centrality(G)
cs = sorted_map(c)
print(cs[:10])

[('dailykos.com', 0.3677362450836158), ('instapundit.com', 0.3514046453768085), ('talkingpointsmemo.com', 0.34605155249883257), ('atrios.blogspot.com', 0.34537268726587755), ('drudgereport.com', 0.3304621817621418), ('washingtonmonthly.com', 0.32968862796588216), ('powerlineblog.com', 0.3290723875397777), ('andrewsullivan.com', 0.323298591006189), ('nationalreview.com/thecorner', 0.32053477894179533), ('talkleft.com', 0.3138212608445295)]
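To make the closeness score concrete: for a connected undirected graph with n nodes, a node's closeness is (n - 1) divided by the sum of its shortest-path distances to the other nodes. The sketch below checks that hand calculation against `nx.closeness_centrality` on a three-node path graph; the helper name `closeness_by_hand` is introduced here for illustration.

```python
import networkx as nx

path = nx.path_graph(3)  # 0 - 1 - 2

def closeness_by_hand(graph, node):
    # Distances from `node` to every node (itself included, at distance 0).
    dist = nx.shortest_path_length(graph, source=node)
    return (len(graph) - 1) / sum(dist.values())

by_hand = {n: closeness_by_hand(path, n) for n in path}
library = nx.closeness_centrality(path)
print(by_hand)
print(library)
```

The middle node reaches both ends in one step, so it scores highest. On a directed or disconnected graph such as the blog network, `nx.closeness_centrality` applies further conventions described in the NetworkX documentation, so this hand calculation only covers the simple case.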


We can also segment the network into its liberal and conservative blogs, and run this algorithm on each separately:

In [67]:
# Define liberal subgraph
libs = [n for n,v in G.nodes(data=True) if v['value'] == 0]  
G_libs = G.subgraph(libs)

# Define conservative subgraph
cons = [n for n,v in G.nodes(data=True) if v['value'] == 1]  
G_cons = G.subgraph(cons)
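One thing to keep in mind: `subgraph` keeps only the edges whose two endpoints are both in the node list, so links that cross between liberal and conservative blogs disappear from both subgraphs. A small sketch with made-up nodes illustrates this:

```python
import networkx as nx

toy = nx.DiGraph()
toy.add_nodes_from(["l1", "l2"], value=0)
toy.add_nodes_from(["r1", "r2"], value=1)
# The last edge crosses from one group to the other.
toy.add_edges_from([("l1", "l2"), ("r1", "r2"), ("l1", "r1")])

left = toy.subgraph([n for n, v in toy.nodes(data=True) if v["value"] == 0])
right = toy.subgraph([n for n, v in toy.nodes(data=True) if v["value"] == 1])

# Edges present in the full graph but in neither subgraph.
dropped = toy.number_of_edges() - left.number_of_edges() - right.number_of_edges()
print(dropped)
```

Centrality scores computed on `G_libs` and `G_cons` therefore describe each group's internal link structure only.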

As the results below show, DailyKos, Atrios, and TPM are the most central liberal blogs, while Drudge, Instapundit, and BlogsForBush are the most central conservative blogs:

In [71]:
# measure of node’s centrality: liberal
c_libs = nx.closeness_centrality(G_libs)
cs_libs = sorted_map(c_libs)
print(cs_libs[:10])

[('dailykos.com', 0.4370256879099083), ('atrios.blogspot.com', 0.4094114274101064), ('talkingpointsmemo.com', 0.3952937919821717), ('juancole.com', 0.34657153390064815), ('washingtonmonthly.com', 0.3410200447992648), ('talkleft.com', 0.33488934736467124), ('digbysblog.blogspot.com', 0.3300681275244268), ('pandagon.net', 0.3289751867048096), ('prospect.org/weblog', 0.3268108762659621), ('thismodernworld.com', 0.3211762059855145)]


In [72]:
# measure of node’s centrality: conservative
c_cons = nx.closeness_centrality(G_cons)
cs_cons = sorted_map(c_cons)
print(cs_cons[:10])

[('drudgereport.com', 0.4236693780099153), ('instapundit.com', 0.4230843680004424), ('blogsforbush.com', 0.41134561096921635), ('powerlineblog.com', 0.4025044722719141), ('littlegreenfootballs.com/weblog', 0.39186558754226003), ('michellemalkin.com', 0.3892931657422014), ('hughhewitt.com', 0.3749356728012351), ('captainsquartersblog.com/mt', 0.3698689745201373), ('nationalreview.com/thecorner', 0.3683374259713997), ('lashawnbarber.com', 0.36419033110375854)]


How does centrality vary between the two political persuasions? A histogram of closeness scores for each group would show the comparison.
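One way to make that comparison is to bin the closeness scores of each subgraph on a shared set of bin edges. The sketch below uses two small built-in graphs as stand-ins for `G_libs` and `G_cons` (an assumption for illustration only, so the counts say nothing about the real blogs) and `np.histogram` for the binning. The helper name `closeness_counts` is hypothetical.

```python
import networkx as nx
import numpy as np

# Stand-ins for G_libs and G_cons; swap in the real subgraphs.
group_a = nx.star_graph(4)  # one hub with four leaves
group_b = nx.path_graph(5)  # five nodes in a line

edges = np.linspace(0.0, 1.0, 5)  # four equal-width bins on [0, 1]

def closeness_counts(graph, bin_edges):
    """Return the number of nodes whose closeness falls in each bin."""
    values = list(nx.closeness_centrality(graph).values())
    counts, _ = np.histogram(values, bins=bin_edges)
    return counts.tolist()

counts_a = closeness_counts(group_a, edges)
counts_b = closeness_counts(group_b, edges)
print(counts_a)
print(counts_b)
```

To draw the figure, the same score lists and `edges` could be passed to `plot.hist`, using the `plot` alias imported at the top of the notebook.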

# Boundary Spanners: Bridges between liberals and conservatives

In [62]:
# Betweenness centrality highlights bottlenecks, or boundary spanners. It is not yet clear how to use it to find who connects liberal communities to conservative ones.
b = nx.betweenness_centrality(G)
bs = sorted_map(b)
print(bs[:10])

NetworkXNotImplemented: not implemented for multigraph type
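The error arises because `G` was loaded as a multigraph, which allows several parallel edges between the same pair of blogs. A common workaround is to collapse it into a simple graph before calling algorithms that reject multigraphs. The sketch below assumes `G` is a `MultiDiGraph` and uses a toy graph in its place. Passing a multigraph to the `nx.DiGraph` constructor keeps one edge per ordered pair of nodes, so any information carried by repeated links is lost.

```python
import networkx as nx

# Toy multigraph with a duplicated link from a to b.
multi = nx.MultiDiGraph()
multi.add_edges_from([("a", "b"), ("a", "b"), ("b", "c")])

# Parallel edges collapse into a single edge.
simple = nx.DiGraph(multi)

scores = nx.betweenness_centrality(simple)
print(simple.number_of_edges(), scores)
```

On the real data this would read `G_simple = nx.DiGraph(G)`, where `G_simple` is a name introduced here, and `G_simple` would then be passed to `betweenness_centrality` or `pagerank` in place of `G`.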

In [64]:
# PageRank: centrality based on incoming links. If you keep clicking links from one blog to the next, how likely are you to end up on a given blog?
p = nx.pagerank(G)
ps = sorted_map(p)
print(ps[:10])

NetworkXNotImplemented: not implemented for multigraph type

In [19]:
# VoteRank identifies a set of decentralized spreaders with the best spreading ability,
# i.e. a list of sites to target for maximal spread.
# Note: G1 is not defined in the cells shown here; presumably it is a simple-graph copy of G.
v = nx.voterank(G1)
print(v[:10])

['blogsforbush.com', 'dailykos.com', 'instapundit.com', 'atrios.blogspot.com', 'talkingpointsmemo.com', 'drudgereport.com', 'washingtonmonthly.com', 'powerlineblog.com', 'michellemalkin.com', 'gevkaffeegal.typepad.com/the_alliance']


NetworkXNotImplemented: not implemented for multigraph type
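Returning to the question in this section's title, a more direct way to look for blogs that bridge the two camps is to count, for each blog, the links whose other end has a different `value`. This is a simple heuristic rather than a standard centrality measure, and the helper name `cross_party_links` and the toy graph are made up for illustration:

```python
from collections import Counter

import networkx as nx

def cross_party_links(graph):
    """Count, per node, the edges that join nodes with different 'value' labels."""
    counts = Counter()
    for u, v in graph.edges():
        if graph.nodes[u]["value"] != graph.nodes[v]["value"]:
            counts[u] += 1
            counts[v] += 1
    return counts

# Toy graph: l1 and r1 link to each other across the divide.
toy = nx.MultiDiGraph()
toy.add_nodes_from(["l1", "l2"], value=0)
toy.add_nodes_from(["r1", "r2"], value=1)
toy.add_edges_from([("l1", "l2"), ("l1", "r1"), ("r1", "l1"), ("r2", "r1")])

spanners = cross_party_links(toy)
print(spanners.most_common(2))
```

Because it only iterates over edges, this helper should also accept the multigraph `G` directly, with each parallel link counted separately.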