# Week 9 Facebook Assignment

This assignment requires you to work with Facebook network data, data preprocessing and networkx. Note that this is real data from real people!

In [11]:
import matplotlib.pyplot as plt
import wget
import pygraphviz
from networkx.drawing.nx_agraph import graphviz_layout, write_dot
import networkx as nwx
import gzip as gz

# Function to extract a .gz file
# https://stackoverflow.com/a/52333182
def gunzip(source_filepath, dest_filepath, block_size=65536):
    with gz.open(source_filepath, 'rb') as s_file, \
            open(dest_filepath, 'wb') as d_file:
        # Copy the decompressed stream block by block until it is exhausted
        while True:
            block = s_file.read(block_size)
            if not block:
                break
            d_file.write(block)

PART 1: Preparing data
The dataset you will be working with is available here: https://snap.stanford.edu/data/egonets-Facebook.html

Your first job is to

1. Download the data
2. Unpack the data
3. Import the data as an undirected graph in networkx

This should all be done from your notebook in Python, since automating data preprocessing is an important step.
Note: this could take a while, so if you feel adventurous you can use the multiprocessing library to speed things up.

Hand-in:
* The code for downloading, unpacking and loading the dataset

In [12]:
# Importing data as an undirected graph (networkx = nwx)
#URL = 'https://snap.stanford.edu/data/facebook_combined.txt.gz'
#wget.download(URL)
#gunzip('facebook_combined.txt.gz', './facebook_combined.txt')

graph = nwx.read_edgelist('./facebook_combined.txt')
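
The download and unpack steps commented out above could also be written so that they are safe to re-run. A minimal sketch using only the standard library: `urllib.request` stands in for `wget` here (a substitution, not the method used above), the helper names `fetch` and `unpack` are hypothetical, and the URL is the one from the cell above.

```python
import gzip
import shutil
import urllib.request
from pathlib import Path

URL = 'https://snap.stanford.edu/data/facebook_combined.txt.gz'


def fetch(url, dest):
    """Download url to dest unless dest already exists (hypothetical helper)."""
    dest = Path(dest)
    if not dest.exists():
        urllib.request.urlretrieve(url, dest)
    return dest


def unpack(gz_path, out_path):
    """Decompress a .gz file to out_path unless out_path already exists."""
    out_path = Path(out_path)
    if not out_path.exists():
        # Stream the decompressed bytes so the whole file is never held in memory
        with gzip.open(gz_path, 'rb') as src, open(out_path, 'wb') as dst:
            shutil.copyfileobj(src, dst)
    return out_path
```

Calling `unpack(fetch(URL, 'facebook_combined.txt.gz'), 'facebook_combined.txt')` returns the path of the unpacked edge list, which can then be passed to `nwx.read_edgelist` as in the cell above.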

PART 2: Analyse the data
Now, let's take a look at the network you imported.

By node degree we mean the number of edges to and from a node. The two kinds of network differ here: in an undirected network every edge counts in both directions, so in-degree == out-degree, whereas in a directed network a node's in-degree and out-degree can differ.
By graph degree we mean the number of edges in the entire network.
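
To make these definitions concrete: every undirected edge adds one to the degree of each of its two endpoints, so the node degrees sum to twice the number of edges, and the average degree is 2E/N. A small sketch in plain Python on a made-up four-node edge list (the data and the names `edges`, `node_degrees`, and `average_degree` are illustrative assumptions, not part of the dataset):

```python
from collections import Counter

# Toy undirected edge list (made up for illustration)
edges = [('a', 'b'), ('a', 'c'), ('a', 'd'), ('b', 'c')]


def node_degrees(edge_list):
    """Count how many edges touch each node of an undirected graph."""
    degrees = Counter()
    for u, v in edge_list:
        degrees[u] += 1
        degrees[v] += 1
    return degrees


def average_degree(edge_list):
    """Average node degree, which equals 2 * E / N."""
    degrees = node_degrees(edge_list)
    return sum(degrees.values()) / len(degrees)


degrees = node_degrees(edges)
print(dict(degrees))          # {'a': 3, 'b': 2, 'c': 2, 'd': 1}
print(average_degree(edges))  # 2.0
```

Here 4 edges over 4 nodes give 2 * 4 / 4 = 2.0.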

Hand in code that displays:
* The number of nodes in the network
* The number of edges in the network
* The average degree in the network
* A visualisation of the network inside your notebook

In [13]:
# How to use NetworkX v2
# https://stackoverflow.com/a/16567881

# Trying to analyse the graph
def analyse_graph():
    #print(nwx.info(graph))
    degrees = dict(graph.degree)
    nwx.draw(graph, nodelist=degrees.keys(), pos=graphviz_layout(graph),
            node_size=[v * 1.2 for v in degrees.values()], width=.05, cmap=plt.cm.GnBu,
            with_labels=True, font_size=4, node_color=range(len(graph)))
    plt.show()  # output saved in visualization.png

# Due to neato errors, I haven't been able to generate a picture of the network.

# analyse_graph()

# RESULT

# Name
# Type: Graph
# Number of nodes: 4039 (Number of profiles)
# Number Of edges: 88234 (Total Connections)
# Average degree: 43.6910 (Average of the connections per profile)

PART 3: Find the most popular people
We're naturally interested in who has the most friends, so we want to extract the top 10, that is, the 10 most connected people.
Hand-in:
* Code that extracts and reports the 10 people with the most connections in the network


In [14]:
#Sort graph nodes according to their degree
#https://stackoverflow.com/a/48382895

def find_most_popular_people():
    return sorted(graph.degree, key=lambda x: x[1], reverse=True)[:10]

print(find_most_popular_people())

[('107', 1045), ('1684', 792), ('1912', 755), ('3437', 547), ('0', 347), ('2543', 294), ('2347', 291), ('1888', 254), ('1800', 245), ('1663', 235)]
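
Sorting every node just to keep ten of them is fine at this size. As an alternative sketch (a swapped-in technique, not the approach used above), `heapq.nlargest` from the standard library selects the top k without a full sort. The function name `top_k_by_degree` is hypothetical, and the sample pairs are copied from the output above.

```python
import heapq


def top_k_by_degree(degree_pairs, k=10):
    """Return the k (node, degree) pairs with the highest degree."""
    return heapq.nlargest(k, degree_pairs, key=lambda pair: pair[1])


# A few (node, degree) pairs taken from the result above, in shuffled order
sample = [('0', 347), ('1912', 755), ('107', 1045), ('3437', 547), ('1684', 792)]
print(top_k_by_degree(sample, k=3))
# [('107', 1045), ('1684', 792), ('1912', 755)]
```

Assuming the graph is loaded as above, `top_k_by_degree(graph.degree)` should report the same ten nodes, since iterating `graph.degree` yields `(node, degree)` pairs.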
