In [1]:
import pandas as pd
import altair as alt
import numpy as np
alt.data_transformers.disable_max_rows()

DataTransformerRegistry.enable('default')

In [2]:
link = 'https://snap.stanford.edu/data/facebook_combined.txt.gz'
edges = pd.read_csv(link, sep=' ', header=None, names=['source', 'target'], compression='gzip')
print(len(edges))
print(pd.concat([edges['source'], edges['target']]).nunique())

88234
4039


In [3]:
node_degrees = pd.concat([edges['source'], edges['target']]).value_counts().reset_index()
node_degrees.columns = ['node', 'degree']
node_degrees['degree_category'] = pd.cut(node_degrees['degree'], 
                                       bins=[0, 10, 50, 100, 500, 1000, 5000], 
                                       labels=['1-10', '11-50', '51-100', '101-500', '501-1000', '1000+'])
print(node_degrees.head())

   node  degree degree_category
0   107    1045           1000+
1  1684     792        501-1000
2  1912     755        501-1000
3  3437     547        501-1000
4     0     347         101-500


In [4]:
histogram = alt.Chart(node_degrees).mark_bar().encode(
    x=alt.X('degree:Q', bin=alt.Bin(maxbins=50), title='Node Degree'),
    y=alt.Y('count():Q', title='Number of Nodes'),
    tooltip=[alt.Tooltip('count():Q', title='Number of Nodes')]
).properties(
    width=400,
    height=300,
    title='Distribution of Node Degrees in Facebook Network'
)

histogram

This histogram shows the frequency distribution of node degrees in the Facebook social network, that is, how many connections (friends) each person has. For my design choices, I used a bar encoding with binned node degree on the x-axis and the count of nodes on the y-axis. All of the bars share a single blue color so the chart is not visually overwhelming. For the data transformation, I calculated each node's degree by counting the number of times it appears in the edge list, as either source or target.

For interactivity, I used hover tooltips: hovering over a bar shows the number of nodes in that bar. This lets a reader see exactly how many nodes fall within a given degree range instead of estimating from the bar's height.
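The degree calculation described above can be sanity-checked on a small example. The sketch below uses a made-up four-edge list (`toy_edges` is hypothetical, not taken from the Facebook file) and assumes each undirected edge is listed once with no self-loops, which is how the edge list is treated in this notebook.

```python
import pandas as pd

# Toy undirected edge list (hypothetical data, not from the Facebook file)
toy_edges = pd.DataFrame({'source': [0, 0, 0, 1], 'target': [1, 2, 3, 2]})

# A node's degree is the number of edges it appears in, on either end
toy_degrees = pd.concat([toy_edges['source'], toy_edges['target']]).value_counts()

# Each edge contributes 2 to the total degree, so the sum must be twice the edge count
assert toy_degrees.sum() == 2 * len(toy_edges)

print(toy_degrees.sort_index().to_dict())  # {0: 3, 1: 2, 2: 2, 3: 1}
```

The same check applied to the full data would compare the sum of `node_degrees['degree']` against `2 * len(edges)`.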

In [5]:
scatter = alt.Chart(node_degrees).mark_circle(opacity=0.6, size=60).encode(
    x=alt.X('node:O', title='Node ID', axis=alt.Axis(labels=False)),
    y=alt.Y('degree:Q', title='Degree'),
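    # Ordinal type with an explicit sort keeps the sequential colors in degree order
    color=alt.Color('degree_category:O',
                   sort=['1-10', '11-50', '51-100', '101-500', '501-1000', '1000+'],
                   scale=alt.Scale(scheme='viridis'),
                   title='Degree Category')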
).properties(
    width=400,
    height=300,
    title='Node Degrees in Facebook Social Network'
)

scatter

This scatter plot shows each node's degree, colored by degree category. For my design choices, I used positional encoding with node IDs on the x-axis and degree values on the y-axis. I also used the Viridis color scheme because it is a sequential palette whose colors vary gradually in shade, which shows both the similarity of and the separation among the degree ranges. For the data transformation, I binned the degrees into ranges, which allowed me to color-code and group nodes effectively.
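To make the binning concrete, here is a minimal sketch of how `pd.cut` assigns values at the bin edges, using the same bins and labels as the notebook. The `sample` values are chosen for illustration only. By default `pd.cut` uses right-closed intervals, so a degree of exactly 1000 lands in `501-1000` and the `1000+` label effectively covers 1001 through 5000.

```python
import pandas as pd

# Illustrative degrees chosen to sit on or near the bin edges
sample = pd.Series([1, 10, 11, 50, 51, 1000, 1045])

# Same bins and labels as the notebook; intervals are (0, 10], (10, 50], ...
cats = pd.cut(sample,
              bins=[0, 10, 50, 100, 500, 1000, 5000],
              labels=['1-10', '11-50', '51-100', '101-500', '501-1000', '1000+'])

print(cats.astype(str).tolist())
# ['1-10', '1-10', '11-50', '11-50', '51-100', '501-1000', '1000+']
```

Any value outside the outer edges (0 or below, or above 5000) would come back as NaN rather than a category, so the top edge needs to stay above the maximum degree in the data.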

In [6]:
histogram.save('histogram.html')
scatter.save('scatter.html')