## Week Three - Assignment Graph Visualization

Brandon Chung 2/20/2026

## Instructions

This week's assignment is to:

1. Load a graph database of your choosing from a text file or other source.  If you take a large network dataset from the web (such as from Stanford Large Network Dataset Collection), please feel free at this point to load just a small subset of the nodes and edges.

2. Create basic analysis on the graph, including the graph’s diameter, and at least one other metric of your choosing.  You may either code the functions by hand (to build your intuition and insight), or use functions in an existing package. 

3. Use a visualization tool of your choice (Neo4j, Gephi, etc.) to display information.

4. Please record a short video (~ 5 minutes), and submit a link to the video in advance of our meet-up.

In [1]:
# Importing network dataset from SNAP repository found here https://snap.stanford.edu/data/ego-Facebook.html

import pandas as pd
import networkx as nx

# Load the dataset via complete edge list

data = pd.read_csv('facebook_combined.txt', sep=' ', header=None, names=['source', 'target'])

data.head()


   source  target
0       0       1
1       0       2
2       0       3
3       0       4
4       0       5


I found a Facebook friendship network dataset in the SNAP repository and loaded the complete edge list into the Jupyter notebook, naming the columns source and target. I then used the head function to inspect the first rows and confirm that the data loaded correctly.
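
Beyond looking at the first rows, a few quick checks can confirm that an edge list parsed as expected. The sketch below is only an illustration: it reads a small in-memory string (`sample_text`, a made-up stand-in for the first lines of `facebook_combined.txt`) rather than the real file, and the variable names are assumptions, not part of the notebook.

```python
import io

import pandas as pd

# Hypothetical stand-in for a few lines of a space-separated edge list
sample_text = "0 1\n0 2\n0 3\n1 2\n"

edges = pd.read_csv(
    io.StringIO(sample_text), sep=" ", header=None, names=["source", "target"]
)

# Basic sanity checks: missing values, self-loops, and repeated rows
has_missing = bool(edges.isna().any().any())
self_loops = int((edges["source"] == edges["target"]).sum())
duplicate_rows = int(edges.duplicated().sum())

print(edges.shape, has_missing, self_loops, duplicate_rows)
```

The same three checks could be run on the `data` frame from the cell above before building the graph.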

In [2]:
# Subset first 1000 edges
subset_data = data.head(1000)

# Create a graph from the edge list
G = nx.from_pandas_edgelist(subset_data, 'source', 'target')

# Print basic information about the graph
print(f"Number of nodes: {G.number_of_nodes()}")
print(f"Number of edges: {G.number_of_edges()}")

Number of nodes: 351
Number of edges: 1000


I took the first 1,000 edges as a subset, built a NetworkX graph with the `from_pandas_edgelist` function, and printed the number of nodes and edges to get a basic picture of the graph.
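
Taking the first rows of an edge list is only one way to build a subset. As a hedged comparison, the sketch below contrasts `head` with a random sample of edges drawn by `DataFrame.sample`. It uses a small made-up edge list (`toy`), not the Facebook data, and the random-sample variant is an alternative shown for illustration, not the approach used in this notebook.

```python
import networkx as nx
import pandas as pd

# Toy edge list standing in for the full dataset (hypothetical data)
toy = pd.DataFrame(
    {"source": [0, 0, 0, 1, 2, 3, 4, 5], "target": [1, 2, 3, 2, 3, 4, 5, 6]}
)

first_rows = toy.head(4)                       # first-N subset, as in the cell above
random_rows = toy.sample(n=4, random_state=0)  # alternative: random subset of edges

G_first = nx.from_pandas_edgelist(first_rows, "source", "target")
G_random = nx.from_pandas_edgelist(random_rows, "source", "target")

print(sorted(G_first.nodes()), G_first.number_of_edges())
print(sorted(G_random.nodes()), G_random.number_of_edges())
```

In the toy data the first four rows only touch nodes 0 to 3, which shows how a first-N subset of a file sorted by source node concentrates on the earliest nodes.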

In [3]:
# Computing diameter of the graph
# Extracting largest connected component
largest_cc = max(nx.connected_components(G), key=len)
G_largest = G.subgraph(largest_cc)

print(f"Number of nodes in largest connected component: {G_largest.number_of_nodes()}")
print(f"Number of edges in largest connected component: {G_largest.number_of_edges()}")

# Compute diameter of the largest connected component
diameter = nx.diameter(G_largest)
print(f"Diameter of the largest connected component: {diameter}")

Number of nodes in largest connected component: 351
Number of edges in largest connected component: 1000
Diameter of the largest connected component: 3


To calculate the diameter of the subsetted network, I first extracted the largest connected component using NetworkX's built-in functions, since the diameter is only defined for a connected graph. Here the largest component has 351 nodes and 1,000 edges, the same as the full subset, so the subset is already connected. The diameter is 3, meaning the longest shortest path between any two nodes in the component is 3 edges. The diameter is this small because the first 1,000 rows of the dataset are not a random sample and probably represent overlapping friend groups.
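
To build intuition for what `nx.diameter` reports, the diameter can also be computed by hand with breadth-first search. The following is a minimal sketch, assuming an unweighted, connected, undirected graph. The helper names `eccentricity_bfs` and `diameter_bfs` and the toy graph are illustrative and are not part of the notebook or of NetworkX.

```python
from collections import deque

import networkx as nx


def eccentricity_bfs(graph, start):
    """Largest shortest-path distance from start, found by breadth-first search."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return max(dist.values())


def diameter_bfs(graph):
    """Diameter of a connected graph: the maximum eccentricity over all nodes."""
    return max(eccentricity_bfs(graph, node) for node in graph)


# Toy graph: hub 0 linked to nodes 1..4, plus node 5 attached only to node 4
toy = nx.Graph([(0, 1), (0, 2), (0, 3), (0, 4), (4, 5)])

print(diameter_bfs(toy), nx.diameter(toy))  # prints "3 3"
```

In the toy graph the longest shortest path runs from node 5 through node 4 and the hub to any other leaf, which gives a diameter of 3 even though most nodes sit one hop from the hub.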

In [4]:
# Compute average shortest path length of the largest connected component
avg_shortest_path = nx.average_shortest_path_length(G_largest)
print(f"Average shortest path length: {avg_shortest_path}")


Average shortest path length: 2.0005698005698007


In [5]:
# Degree centrality
degree_centrality = nx.degree_centrality(G_largest)
print("Computed degree centrality")

# Betweenness centrality
# (This can be slow on large graphs, but the 1,000-edge subset should be fine)
betweenness_centrality = nx.betweenness_centrality(G_largest)
print("Computed betweenness centrality")

# Closeness centrality
closeness_centrality = nx.closeness_centrality(G_largest)
print("Computed closeness centrality")


Computed degree centrality
Computed betweenness centrality
Computed closeness centrality


In [6]:
# Display first 5 nodes with highest degree centrality
top_degree = sorted(degree_centrality.items(), key=lambda x: x[1], reverse=True)[:5]
print("Top 5 nodes by degree centrality:")

for node, centrality in top_degree:
    print(f"Node {node}: {centrality:.4f}")

# Display first 5 nodes with highest betweenness centrality
top_betweenness = sorted(betweenness_centrality.items(), key=lambda x: x[1], reverse=True)[:5]
print("\nTop 5 nodes by betweenness centrality:")
for node, centrality in top_betweenness:
    print(f"Node {node}: {centrality:.4f}")

# Display first 5 nodes with highest closeness centrality
top_closeness = sorted(closeness_centrality.items(), key=lambda x: x[1], reverse=True)[:5]
print("\nTop 5 nodes by closeness centrality:")
for node, centrality in top_closeness:
    print(f"Node {node}: {centrality:.4f}")



Top 5 nodes by degree centrality:
Node 0: 0.9914
Node 25: 0.1971
Node 26: 0.1943
Node 21: 0.1857
Node 9: 0.1629

Top 5 nodes by betweenness centrality:
Node 0: 0.9274
Node 34: 0.0171
Node 25: 0.0128
Node 21: 0.0110
Node 26: 0.0102

Top 5 nodes by closeness centrality:
Node 0: 0.9915
Node 25: 0.5521
Node 26: 0.5512
Node 21: 0.5486
Node 9: 0.5418


Along with the diameter, I calculated the average shortest path length, which is about 2. This means that, on average, any two people in this subsetted network can reach each other in about 2 hops. Because of this, the graph looks very compact in the visualization, although with 1,000 edges among 351 nodes it is sparse rather than dense (only about 1.6% of possible edges exist); the short paths come from routing through a hub.

I also calculated degree, betweenness, and closeness centrality. The results show that node 0 is connected to almost all other nodes (degree centrality 0.9914), so the subset likely consists mainly of node 0's "friend list". Node 34 notably does not appear in the top 5 for closeness or degree but ranks second for betweenness, which suggests it acts as a bridge between groups, although its score (0.0171) is far below node 0's (0.9274).
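
Short average paths do not by themselves imply many edges. With the figures reported above, the density is 2·1000 / (351·350) ≈ 0.0163, and a degree centrality of 0.9914 for node 0 corresponds to about 347 of the 350 possible neighbors. The sketch below reproduces the same pattern on a toy hub-and-spoke graph built with `nx.star_graph`. The toy graph is an assumption chosen for illustration, not the Facebook subset.

```python
import networkx as nx

# Toy hub-and-spoke graph: hub node 0 linked to 20 leaves (hypothetical data)
star = nx.star_graph(20)

n = star.number_of_nodes()  # 21
m = star.number_of_edges()  # 20

# Density of an undirected graph: fraction of possible edges that are present
density = 2 * m / (n * (n - 1))

# Average shortest path stays below 2 because every pair is linked via the hub
avg_path = nx.average_shortest_path_length(star)

# Degree centrality of a node: its degree divided by (n - 1)
hub_degree_centrality = star.degree(0) / (n - 1)

print(f"density={density:.4f}, avg_path={avg_path:.4f}, hub={hub_degree_centrality:.4f}")
```

The toy graph has under 10% of its possible edges, yet every pair of nodes is at most two hops apart, which is the same hub-driven effect seen around node 0.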

In [7]:
# Exporting to Gephi format
for node in G_largest.nodes():
    G_largest.nodes[node]["degree_centrality"] = degree_centrality[node]
    G_largest.nodes[node]["betweenness_centrality"] = betweenness_centrality[node]
    G_largest.nodes[node]["closeness_centrality"] = closeness_centrality[node]

nx.write_gexf(G_largest, "facebook_subset_1000_lcc_with_metrics.gexf")


Video presentation: https://youtu.be/tMIb5tTZScU