# Six Degrees of Francis Bacon
[Six Degrees of Francis Bacon](http://sixdegreesoffrancisbacon.com) is a collaboratively built reconstruction of the social network of early modern Britain, named after the philosopher Francis Bacon. This notebook downloads the current people and relationships data from the website and builds a GML file that can be used with PolyGraphs.

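The notebook saves the graph in the working directory as `francisbacon-YYYY-MM-DD.gml.gz` (stamped with the current date). To use it with PolyGraphs, rename it and place it at: `~/polygraphs-cache/data/francisbacon/francisbacon.gml.gz`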
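The export cell writes a file whose name includes the current date, so it has to be copied to the fixed path above before PolyGraphs can find it. A minimal sketch using only the standard library; the helper `install_francisbacon_gml` and its `cache_root` parameter are hypothetical names introduced here, and the destination is simply the path stated above:

```python
import shutil
from pathlib import Path


def install_francisbacon_gml(src, cache_root=None):
    """Copy a generated GML file to the PolyGraphs cache location.

    Hypothetical helper. `cache_root` defaults to ~/polygraphs-cache, the
    location given in this notebook; it is a parameter only so the sketch
    can be tried against a temporary directory.
    """
    if cache_root is None:
        cache_root = Path.home() / "polygraphs-cache"
    dest = Path(cache_root) / "data" / "francisbacon" / "francisbacon.gml.gz"
    # Create data/francisbacon/ if it does not exist yet
    dest.parent.mkdir(parents=True, exist_ok=True)
    # Copy the dated file to the fixed file name expected by PolyGraphs
    shutil.copyfile(src, dest)
    return dest
```

Copying rather than moving keeps the dated original as a record of which export is installed.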

In [39]:
from collections import defaultdict
import datetime
import numpy as np
import pandas as pd
import networkx as nx

## Prepare download URLs

In [40]:
dt = datetime.datetime.now()
base = "http://sixdegreesoffrancisbacon.com/data/"
date = "_{y}_{m:02d}_{d:02d}.csv".format(y=dt.year, m=dt.month, d=dt.day)
people_url = base + "SDFB_people" + date
relations_url = base + "SDFB_relationships" + date
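Because the file names embed today's date, `pd.read_csv` would fail if the site has not published an export dated today. A sketch of a hypothetical helper, `sdfb_urls`, that builds the same two URLs for any date so an earlier export can be requested instead; it assumes the `SDFB_<table>_YYYY_MM_DD.csv` naming used above and does not check whether a file exists for that date:

```python
import datetime

# Same base URL as in the cell above
BASE = "http://sixdegreesoffrancisbacon.com/data/"


def sdfb_urls(day):
    """Return (people_url, relations_url) for an export dated `day`.

    Hypothetical helper; assumes the SDFB_<table>_YYYY_MM_DD.csv naming.
    """
    # strftime zero-pads month and day, matching the {m:02d}/{d:02d} format
    suffix = day.strftime("_%Y_%m_%d.csv")
    return BASE + "SDFB_people" + suffix, BASE + "SDFB_relationships" + suffix
```

For instance, `sdfb_urls(datetime.date.today() - datetime.timedelta(days=1))` gives the URLs for yesterday's date.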

## Normalise nodes and create graph

In [41]:
# Lookup table mapping each original SDFB person id to a node index 0..N-1 (in order of first appearance)
tbl = defaultdict(lambda: len(tbl))
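The lambda evaluates `len(tbl)` at the moment a missing key is first looked up, so each new identifier receives the next unused integer, while repeated lookups return the index already assigned. A standalone demonstration on a separate table, with made-up identifiers:

```python
from collections import defaultdict

# A fresh table for illustration only (not the notebook's `tbl`)
demo = defaultdict(lambda: len(demo))

# Made-up ids: the repeated 10050 gets the index it was first given
print(demo[10050], demo[10002], demo[10050], demo[10307])  # → 0 1 0 2
```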

In [42]:
# Read the relationships CSV
df = pd.read_csv(relations_url)

# Normalise node identifiers to 0..N-1 via the defaultdict lookup table
src = [tbl[node] for node in df['person1_index']]
dst = [tbl[node] for node in df['person2_index']]
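The same 0 to N-1 numbering can be obtained with pandas' `factorize`, which assigns codes in order of first appearance; applied to the `person1_index` values followed by the `person2_index` values, it visits the identifiers in the same order as the two list comprehensions above. This is an alternative sketch, not what the notebook does, shown on a made-up table:

```python
import numpy as np
import pandas as pd

# Made-up relationships table with SDFB-style integer ids
df_demo = pd.DataFrame({"person1_index": [10050, 10002, 10307],
                        "person2_index": [10002, 10307, 10999]})

# Factorize all person1 ids followed by all person2 ids; codes follow
# order of first appearance, exactly like the defaultdict lookup table
codes, uniques = pd.factorize(
    np.concatenate([df_demo["person1_index"].to_numpy(),
                    df_demo["person2_index"].to_numpy()]))

n = len(df_demo)
# .tolist() converts NumPy integers to plain Python ints
src_demo, dst_demo = codes[:n].tolist(), codes[n:].tolist()
# uniques[i] is the original id of node i (the inverse lookup table)
```

The `.tolist()` calls keep node keys as plain Python `int`s, the same type the defaultdict approach produces.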

In [43]:
# Create NetworkX Graph
G = nx.Graph()
G.add_edges_from(zip(src, dst))
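`nx.Graph` is undirected and keeps at most one edge per pair of nodes, so if the relationships CSV lists a pair more than once, or in both directions, those rows collapse into a single edge and `G.number_of_edges()` can be smaller than `len(df)`. A small illustration with a made-up edge list:

```python
import networkx as nx

# Made-up edges: (0, 1) appears twice and (1, 0) is its reverse
edges = [(0, 1), (1, 2), (0, 1), (1, 0)]

G_demo = nx.Graph()
G_demo.add_edges_from(edges)
print(G_demo.number_of_edges())  # → 2
```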

## Load information about nodes

In [44]:
# Load the people CSV (for display names)
names_df = pd.read_csv(people_url)

# Map each person's id to their display_name
names_dict = pd.Series(names_df.display_name.values, index=names_df.id).to_dict()
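`pd.Series(values, index=keys).to_dict()` builds an id-to-name mapping; `dict(zip(...))` over the two columns gives the same result without the intermediate Series. In both forms, if an `id` appeared more than once, the last row would win. A sketch on a made-up people table:

```python
import pandas as pd

# Made-up people table (ids and names are placeholders)
people = pd.DataFrame({"id": [10050, 10002],
                       "display_name": ["Person A", "Person B"]})

# The notebook's idiom
by_series = pd.Series(people.display_name.values, index=people.id).to_dict()
# Equivalent plain-Python form
by_zip = dict(zip(people["id"], people["display_name"]))
```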

In [45]:
# Invert tbl: map each node index back to its original SDFB id
tbl_swapped = {v: k for k, v in tbl.items()}

# Add the original SDFB ids as a node attribute
nx.set_node_attributes(G, tbl_swapped, "original_id")

# Map each node index to a display name ('None' if the id is not in the people CSV)
names = {k: names_dict.get(v, 'None') for k, v in tbl_swapped.items()}

# Set the names dictionary as node attributes
nx.set_node_attributes(G, names, "name")
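Since `tbl` is one-to-one, inverting it loses nothing, and every node ends up with an `original_id`. A person id that is missing from the people CSV receives the string `'None'` as its name, not Python's `None`. The same steps on a made-up lookup table:

```python
import networkx as nx

# Made-up lookup table (original id -> node index) and name table
lookup = {10050: 0, 10002: 1, 10307: 2}
id_to_name = {10050: "Person A", 10002: "Person B"}  # 10307 has no name

# Invert the lookup table: node index -> original id
index_to_id = {v: k for k, v in lookup.items()}

G_demo = nx.path_graph(3)  # nodes 0, 1, 2
nx.set_node_attributes(G_demo, index_to_id, "original_id")
nx.set_node_attributes(
    G_demo, {n: id_to_name.get(i, "None") for n, i in index_to_id.items()}, "name")

print(G_demo.nodes[2])  # → {'original_id': 10307, 'name': 'None'}
```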

## Keep only the largest connected component
The network contains some components that are disconnected from the rest; we discard them and keep only the largest connected component.

In [46]:
# Generate connected components and select the largest
largest_component = max(nx.connected_components(G), key=len)

# Create a subgraph of G consisting only of this component
G2 = G.subgraph(largest_component)
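`G.subgraph` returns a read-only view that shares its data with `G`, which is enough for exporting; calling `.copy()` gives an independent graph if the component needs to be modified later. A sketch on a made-up graph with two components:

```python
import networkx as nx

# Made-up graph: a triangle {0, 1, 2} plus a separate edge {3, 4}
G_demo = nx.Graph([(0, 1), (1, 2), (2, 0), (3, 4)])

largest = max(nx.connected_components(G_demo), key=len)
# .copy() turns the read-only view into an independent, mutable graph
G2_demo = G_demo.subgraph(largest).copy()

print(sorted(G2_demo.nodes), G2_demo.number_of_edges())  # → [0, 1, 2] 3
```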

## Export GML File

In [47]:
nx.write_gml(G2, "francisbacon-{0}.gml.gz".format(datetime.date.today()))
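networkx compresses the output because the file name ends in `.gz`, and the export can be checked by reading it back with `nx.read_gml`. A minimal round trip on a made-up graph, written to a temporary directory; passing `label="id"` keys the nodes by the integer GML `id` rather than by the `label` field:

```python
import os
import tempfile

import networkx as nx

# Made-up two-node graph with the same attribute names the notebook uses
G_demo = nx.Graph([(0, 1)])
nx.set_node_attributes(G_demo, {0: 10050, 1: 10002}, "original_id")
nx.set_node_attributes(G_demo, {0: "Person A", 1: "Person B"}, "name")

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.gml.gz")
    nx.write_gml(G_demo, path)          # .gz suffix: gzip-compressed output
    H = nx.read_gml(path, label="id")   # nodes keyed by integer GML id

print(H.nodes[0])
```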