# OPTIONAL Workbook for Homework #8

You are welcome to work on a local version of a notebook and upload it for this assignment.

This workspace is here if you'd rather not have to install all necessary packages locally.

You can download any JSON files to your local computer and add them to your Jekyll page.

In [2]:
# Import necessary libraries
import pandas as pd
import networkx as nx
import altair as alt
import random
import plotly.graph_objects as go

## Plot 1

For Plot 1, both Altair and Plotly were used. Altair was tried first. After some research, it became clear that Altair may not be the best tool for drawing NetworkX node-link diagrams, so a Plotly version was also built.

Why Altair/Vega-Lite may not be ideal for node-link diagrams:
1. No built-in graph layouts: Altair has no layout algorithms comparable to NetworkX's `spring_layout` or `circular_layout`, so node positions must be computed in NetworkX and passed to Altair as data.
2. Limited graph interactivity: Altair supports general interactivity (panning, zooming, selections), but it has no built-in graph-specific interactions such as zooming in on a node's neighbourhood or highlighting its edges.
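Because Altair has no layout engine of its own, node coordinates have to be computed in NetworkX and handed to Altair as ordinary DataFrame columns. A minimal sketch of that hand-off, assuming NetworkX's bundled karate-club graph as a small stand-in for the Facebook data; `layout_to_frame` is a hypothetical helper name, not part of either library:

```python
import networkx as nx
import pandas as pd


def layout_to_frame(G, layout_fn, **layout_kwargs):
    """Run a NetworkX layout and return one row per node with x/y/degree columns."""
    # NetworkX layouts return a dict mapping node -> numpy array [x, y]
    pos = layout_fn(G, **layout_kwargs)
    return pd.DataFrame(
        [{"node": n, "x": float(xy[0]), "y": float(xy[1]), "degree": G.degree(n)}
         for n, xy in pos.items()]
    )


G = nx.karate_club_graph()
spring_df = layout_to_frame(G, nx.spring_layout, k=0.1, seed=42)
circle_df = layout_to_frame(G, nx.circular_layout)
print(spring_df.head())
```

Swapping `nx.spring_layout` for `nx.circular_layout` changes only the coordinates; the Altair encoding (`x='x:Q'`, `y='y:Q'`) stays the same.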

In [3]:
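# Use the 'json' data transformer (chart data is written to a separate .json file
# instead of being embedded in the notebook output) and lift Altair's default
# 5,000-row limit; subsampling below still reduces the memory needed.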
alt.data_transformers.enable('json')
alt.data_transformers.disable_max_rows()

# ---------------------------------------------------------------
# Step 1: Load the Facebook Ego Network Edge List
G = nx.read_edgelist("facebook_combined.txt.gz", nodetype=int, create_using=nx.Graph())

# ---------------------------------------------------------------
# Option: Subsample the Graph
# If the network is very large, reduce its size by sampling a fixed number of nodes.
sample_size = 500  # adjust this value as needed; ensure sample_size < total nodes in G
random.seed(42)    # fix the node sample so reruns draw the same subgraph
if len(G.nodes()) > sample_size:
    sample_nodes = random.sample(list(G.nodes()), sample_size)
    # Create an induced subgraph from the sampled nodes
    G_sub = G.subgraph(sample_nodes).copy()
else:
    G_sub = G

# ---------------------------------------------------------------
# Step 2: Generate Layout Positions (Force-Directed / Spring Layout)
# k controls spacing; seed fixes randomness for reproducibility.
pos = nx.spring_layout(G_sub, k=0.1, seed=42)

# ---------------------------------------------------------------
# Step 3: Build the Nodes DataFrame
# We store each node's position and its degree (computed for the subgraph).
node_list = []
for node in G_sub.nodes():
    x, y = pos[node]
    degree = G_sub.degree(node)
    node_list.append({
        'node': node,
        'x': x,
        'y': y,
        'degree': degree
    })

nodes_df = pd.DataFrame(node_list)

# ---------------------------------------------------------------
# Step 4: Build the Edges DataFrame
# Altair requires the "long-form" data structure where each edge is split into two rows 
# (one for each endpoint) so that the line can be drawn connecting them.
edge_list = []
edge_id = 0
for u, v in G_sub.edges():
    # First endpoint of the edge
    edge_list.append({
        'edge_id': edge_id,
        'point_idx': 0,
        'x': pos[u][0],
        'y': pos[u][1],
        'node': u
    })
    # Second endpoint of the edge
    edge_list.append({
        'edge_id': edge_id,
        'point_idx': 1,
        'x': pos[v][0],
        'y': pos[v][1],
        'node': v
    })
    edge_id += 1

edges_df = pd.DataFrame(edge_list)

# ---------------------------------------------------------------
# Step 5: Create the Edges Layer (Lines)
edges_layer = alt.Chart(edges_df).mark_line(
    color='gray',
    opacity=0.5
).encode(
    x='x:Q',
    y='y:Q',
    detail='edge_id:N',   # groups data per edge
    order='point_idx:O'   # connects the two points in order
)

# ---------------------------------------------------------------
# Step 6: Create the Nodes Layer (Circles)
nodes_layer = alt.Chart(nodes_df).mark_circle().encode(
    x='x:Q',
    y='y:Q',
    color=alt.Color('degree:Q', scale=alt.Scale(scheme='blues'), title='Node Degree'),
    size=alt.value(25),
    tooltip=['node:N', 'degree:Q']
)

# ---------------------------------------------------------------
# Step 7: Combine Layers and Add Interactivity
# .interactive() enables panning and zooming. Further interactive selections can be added if desired.
node_link_chart = alt.layer(
    edges_layer,
    nodes_layer
).properties(
    width=600,
    height=500,
    title='Node-Link Diagram of Facebook Ego Network (Altair) - Subsampled'
).interactive()

# ---------------------------------------------------------------
# Step 8: Export to HTML
# Adjust the path as needed to match your desired output folder.
node_link_chart.save('../assets/plots/plot1_node_link_altair.html')

print("Altair node-link diagram saved as '../assets/plots/plot1_node_link_altair.html'.")

Altair node-link diagram saved as '../assets/plots/plot1_node_link_altair.html'.
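The Step 4 loop can also be written with pandas table operations instead of appending dictionaries one by one. A sketch under the assumption that `pos` is a NetworkX layout dict as above; `edges_long_form` is a hypothetical helper, and the karate-club graph stands in for the subsampled Facebook graph:

```python
import networkx as nx
import pandas as pd


def edges_long_form(G, pos):
    """Return one row per edge endpoint, as expected by mark_line(detail=..., order=...)."""
    edges = pd.DataFrame(list(G.edges()), columns=["u", "v"])
    edges["edge_id"] = range(len(edges))
    # Two copies of the edge table: one for the first endpoint, one for the second
    start = edges.assign(point_idx=0, node=edges["u"])
    end = edges.assign(point_idx=1, node=edges["v"])
    long_df = pd.concat([start, end], ignore_index=True)
    # Look up each endpoint's layout coordinates
    long_df["x"] = long_df["node"].map(lambda n: float(pos[n][0]))
    long_df["y"] = long_df["node"].map(lambda n: float(pos[n][1]))
    return (long_df[["edge_id", "point_idx", "x", "y", "node"]]
            .sort_values(["edge_id", "point_idx"], ignore_index=True))


G = nx.karate_club_graph()
pos = nx.spring_layout(G, seed=42)
edges_df = edges_long_form(G, pos)
print(edges_df.head())
```

The result has the same columns as the loop-built `edges_df`, so the Step 5 edges layer can consume it unchanged.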


In [4]:
node_link_chart.save('../assets/pngs/altnode_link_chart.png')

In [5]:
node_link_chart.save('../assets/json/altnode_link_chart.json')
print("Altair node-link chart spec saved as '../assets/json/altnode_link_chart.json'.")

Altair node-link chart spec saved as '../assets/json/altnode_link_chart.json'.


In [6]:
# Step 1: Load the Facebook Ego Network Dataset
G = nx.read_edgelist("facebook_combined.txt.gz", nodetype=int, create_using=nx.Graph())

# Step 2: Generate the Layout and Extract Node Positions
# Using a spring layout for force-directed graph visualization
pos = nx.spring_layout(G, k=0.1, seed=42)  # k controls spacing

# Extract x and y positions for nodes
node_x = []
node_y = []
for node in G.nodes():
    x, y = pos[node]
    node_x.append(x)
    node_y.append(y)
    

# Step 3: Create Edge Traces for Visualization
edge_x = []
edge_y = []
for edge in G.edges():
    x0, y0 = pos[edge[0]]
    x1, y1 = pos[edge[1]]
    edge_x.extend([x0, x1, None])
    edge_y.extend([y0, y1, None])

edge_trace = go.Scatter(
    x=edge_x, y=edge_y,
    line=dict(width=0.5, color='#888'),
    hoverinfo='none',
    mode='lines'
)

# Step 4: Create Node Trace
node_trace = go.Scatter(
    x=node_x, 
    y=node_y,
    mode='markers',
    hoverinfo='text',
    marker=dict(
        showscale=True,
        colorscale='YlGnBu',
        color=[],  # color will be assigned based on node degree later
        size=8,
        colorbar=dict(
            thickness=15,
            title=dict(
                text='Node Degree',
                side='right'
            ),
            xanchor='left'
        ),
        line_width=0.5
    )
)

# Add color values (degree of each node)
node_adjacencies = []
node_text = []
for node in G.nodes():
    degree = G.degree(node)  # number of neighbours
    node_adjacencies.append(degree)
    node_text.append(f'Node {node}<br>Degree: {degree}')

node_trace.marker.color = node_adjacencies
node_trace.text = node_text

# Step 5: Create the Plotly Figure
fig = go.Figure(
    data=[edge_trace, node_trace],
    layout=go.Layout(
        title=dict(
            text='<br>Node-Link Diagram of Facebook Ego Network',
            font=dict(size=20)
        ),
        showlegend=False,
        hovermode='closest',
        margin=dict(b=20, l=5, r=5, t=40),
        annotations=[dict(
            text="Visualization by Plotly & NetworkX",
            showarrow=False,
            xref="paper", yref="paper",
            x=0.005, y=-0.002
        )],
        xaxis=dict(showgrid=False, zeroline=False),
        yaxis=dict(showgrid=False, zeroline=False)
    )
)

# Step 6: Export to HTML
fig.write_html("../assets/plots/plot1_node_link_diagram.html")

In [1]:
#fig.write_image('../assets/png/pltynode_link_chart.png', engine='kaleido', width=1200, height=800)

The code above was run for 6 hours and then interrupted, as it showed no signs of completing.
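One option for a static export, not tried in this notebook, is to render a smaller graph. The sketch below reuses the induced-subgraph sampling from the Altair cell, but with a seeded `random.Random` so the same nodes are chosen on every run. `subsample_graph` is a hypothetical helper, and whether a figure of a given size exports quickly with Kaleido is an assumption to check, not a measured result.

```python
import random

import networkx as nx


def subsample_graph(G, sample_size, seed=42):
    """Return an induced subgraph on `sample_size` randomly chosen nodes (reproducible)."""
    if G.number_of_nodes() <= sample_size:
        return G.copy()
    rng = random.Random(seed)  # local RNG: leaves the global random state untouched
    nodes = rng.sample(sorted(G.nodes()), sample_size)
    return G.subgraph(nodes).copy()


G = nx.karate_club_graph()
G_small = subsample_graph(G, 10)
# The Plotly edge trace stores 3 values per edge (x0, x1, None): a rough size gauge
print(G_small.number_of_nodes(), 3 * G_small.number_of_edges())
```

The smaller graph could then go through the same Plotly steps (spring layout, edge trace, node trace) in place of `G` before calling `write_image`.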

In [None]:
import orjson
with open('../assets/json/pltynode_link_chart.json', 'wb') as f:
    f.write(orjson.dumps(fig.to_plotly_json()))
print("Node-Link Diagram saved as 'plot1_node_link_diagram.html'.")

Node-Link Diagram saved as 'plot1_node_link_diagram.html'.


## Plot 2

Plot 2 consists of two charts: a histogram and a scatter plot. The histogram shows the node degree distribution and has an interactive brush selection that highlights a chosen range of degrees. The scatter plot shows node degree versus clustering coefficient and uses a click selection: clicking a point keeps it fully opaque while the other points become semi-transparent.
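For reference, the clustering coefficient on the scatter plot's y-axis is NetworkX's local clustering. For an unweighted graph it is C(v) = 2T(v) / (deg(v)(deg(v) - 1)), where T(v) is the number of triangles through v, and nodes with degree below 2 get 0. A small sketch that computes it directly from neighbour pairs and checks it against `nx.clustering`, using the karate-club graph as a stand-in; `local_clustering` is a hypothetical helper:

```python
from itertools import combinations

import networkx as nx


def local_clustering(G, v):
    """Fraction of pairs of v's neighbours that are themselves connected."""
    neighbours = list(G.neighbors(v))
    k = len(neighbours)
    if k < 2:
        return 0.0  # NetworkX convention for degree-0 and degree-1 nodes
    # Each connected neighbour pair closes one triangle through v
    links = sum(1 for a, b in combinations(neighbours, 2) if G.has_edge(a, b))
    return 2 * links / (k * (k - 1))


G = nx.karate_club_graph()
manual = {v: local_clustering(G, v) for v in G.nodes()}
reference = nx.clustering(G)  # unweighted, since weight=None by default
print(max(abs(manual[v] - reference[v]) for v in G.nodes()))
```

Points high on the y-axis are nodes whose friends mostly know each other; high-degree hubs tend to sit lower because they have many more neighbour pairs to close.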

In [None]:
# Step 1: Load the Facebook Ego Network Dataset
# The dataset file 'facebook_combined.txt.gz' is assumed to be an edge list (space separated).
# Each line represents an edge between two nodes in the network.
# Load the graph using NetworkX.
G = nx.read_edgelist('facebook_combined.txt.gz',
                     create_using=nx.Graph(),
                     nodetype=int,
                     data=False)

# Step 2: Compute Node-Level Metrics (Degree and Clustering Coefficient)
# (Carried over from Homework #5: basic network loading and metric computation.)
# For this assignment, we enhance the processing by calculating the clustering coefficient.
# These metrics will be used in our visualizations.
degrees = dict(G.degree())
clustering = nx.clustering(G)

# Build a DataFrame with each node, its degree, and its clustering coefficient.
data_nodes = pd.DataFrame({
    'Node': list(G.nodes()),
    'Degree': [degrees[node] for node in G.nodes()],
    'Clustering': [clustering[node] for node in G.nodes()]
})

# Step 3: Create Visualization 1 - Histogram of Node Degree Distribution
# This visualization shows the distribution of node degrees in the network.
# Encoding choices:
#  - X-axis: Binned degree (quantitative) with a maximum of 30 bins.
#  - Y-axis: Count of nodes (quantitative).
#  - Color: Interactive color change based on brush selection.
# The brush selection allows the user to dynamically highlight a range of degrees.
brush = alt.selection_interval(encodings=['x'])

hist = alt.Chart(data_nodes).mark_bar().encode(
    x=alt.X('Degree:Q', bin=alt.Bin(maxbins=30), title="Node Degree"),
    y=alt.Y('count()', title="Count of Nodes"),
    tooltip=[alt.Tooltip('count()', title="Count of Nodes")]
).properties(
    width=500,
    height=300,
    title="Histogram of Node Degree Distribution from Facebook Ego Network"
).add_params(
    brush  # adds the brush selection used to highlight a range of degrees
).encode(
    color=alt.condition(brush, alt.value('steelblue'), alt.value('lightgray'))
)

# Step 4: Create Visualization 2 - Scatter Plot of Degree vs. Clustering Coefficient
# This scatter plot visualizes the relationship between a node's degree and its clustering coefficient.
# Encoding choices:
#  - X-axis: Degree (quantitative)
#  - Y-axis: Clustering Coefficient (quantitative)
#  - Tooltip: Node id, Degree, and Clustering Coefficient for detailed information.
# A click selection adds interactivity: the clicked node stays fully opaque while the others fade.
scatter = alt.Chart(data_nodes).mark_circle(size=60).encode(
    x=alt.X('Degree:Q', title="Node Degree"),
    y=alt.Y('Clustering:Q', title="Clustering Coefficient"),
    tooltip=['Node', 'Degree', 'Clustering']
).properties(
    width=500,
    height=400,
    title="Scatter Plot: Node Degree vs. Clustering Coefficient"
)

# Set up a click selection that operates on the 'Node' field.
click = alt.selection_point(fields=['Node'])

# Apply the selection so that the clicked node is highlighted while others appear partially transparent.
scatter = scatter.add_params(click).encode(
    opacity=alt.condition(click, alt.value(1), alt.value(0.5))
)

# Step 5: Export the Visualizations to HTML Files
# Export each chart as an HTML file. These files can be hosted (e.g., via GitHub Pages).
# In your repository, include these HTML files along with the rendered notebook.
hist.save('../assets/plots/plot2_histogram.html')
scatter.save('../assets/plots/plot2_scatter.html')

In [None]:
hist.save('../assets/json/plot2_hist_chart.json')
scatter.save('../assets/json/plot2_scatter_chart.json')
print("Visualizations created and exported as 'plot2_histogram.html' and 'plot2_scatter.html'.")

Visualizations created and exported as 'plot2_histogram.html' and 'plot2_scatter.html'.
