# California Road Network Analysis


## 1. Introduction and Motivation

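We analyze the **roadNet-CA** dataset from the Stanford Network Analysis Project (SNAP),
which represents the California road network: nodes are intersections and road endpoints,
and undirected edges are the road segments connecting them.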

### Research Questions:
1. **Connectivity**: How connected is California's road network? 
2. **Structural Patterns**: What local structures (motifs) exist?
3. **Hub Identification**: Which intersections are most critical?

### Why This Dataset?
Road networks exhibit unique topological properties:
- **Spatial constraints**: Geographic layout influences connectivity
- **Degree distribution**: Most intersections have 3-4 connections
- **Triangle motifs**: Indicate local redundancy and resilience

### Methodology:
We use **GraphFrames on Spark** to handle the large-scale network efficiently,
then compare the findings visually against real OpenStreetMap data for Los Angeles, downloaded via OSMnx.

## 2. Setup and Data Loading


In [0]:
import pyspark
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from graphframes import GraphFrame

spark = SparkSession.builder \
    .appName("California Road Network Analysis") \
    .getOrCreate()

print("Spark version:", spark.version)


Spark version: 4.0.0




In [0]:
# Load dataset 
road_df = spark.table("hive_metastore.default.road_net_ca")


## 3. Data Exploration


In [0]:
display(road_df.limit(20))
print("Total rows:", road_df.count())


value
1534013	1524759
1534013	1533993
1534013	1534014
1524761	1524760
1524763	1524762
1524766	1524762
1525508	1524764
1525508	1525509
1524767	1524768
1524767	1524769


Total rows: 5533218


## 4. Descriptive Statistics


In [0]:
# Count valid (non-comment) rows
valid_rows = road_df.filter(~F.col("value").startswith("#")).count()

# Parse edges for descriptive statistics
edges_simple = (
    road_df.filter(~F.col("value").startswith("#"))
    .select(
        F.split(F.col("value"), r"\s+").getItem(0).cast("long").alias("src"),
        F.split(F.col("value"), r"\s+").getItem(1).cast("long").alias("dst")
    )
)

# Count unique nodes
unique_nodes = (
    edges_simple.select("src")
    .union(edges_simple.select("dst"))
    .distinct()
    .count()
)

# Node value statistics
node_stats = (
    edges_simple.select("src")
    .union(edges_simple.select("dst"))
    .describe()
)

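# Degree statistics
# Note: the raw file lists every road in both directions, so these counts
# are twice the undirected degree (compare the degree computed in Section 8).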
degree_df = (
    edges_simple.select(F.col("src").alias("id"))
    .union(edges_simple.select(F.col("dst").alias("id")))
    .groupBy("id")
    .count()
    .withColumnRenamed("count", "degree")
)

degree_stats = degree_df.describe()

print("=== DESCRIPTIVE STATISTICS ===")
print(f"Valid rows (edges): {valid_rows}")
print(f"Unique nodes: {unique_nodes}")

display(node_stats)
display(degree_stats)


=== DESCRIPTIVE STATISTICS ===
Valid rows (edges): 5533214
Unique nodes: 1965206


summary,src
count,11066428.0
mean,979857.9158733062
stddev,567822.151834807
min,0.0
max,1971280.0



summary,id,degree
count,1965206.0,1965206.0
mean,985866.5497550892,5.631179632058929
stddev,568930.6434185976,1.989248562108384
min,0.0,2.0
max,1971280.0,24.0
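
The degree figures above are computed on the raw rows, where each road appears once per direction (5,533,214 rows versus the 2,766,607 undirected edges reported in Section 5). A minimal sketch of the effect, using a hypothetical four-node edge list rather than the real data:

```python
from collections import Counter

# Toy edge list in the roadNet-CA style: every undirected road
# appears twice, once per direction (node ids are made up).
directed_rows = [(0, 1), (1, 0), (0, 2), (2, 0), (1, 2), (2, 1), (2, 3), (3, 2)]


def endpoint_counts(rows):
    """Count how often each node appears as src or dst."""
    counts = Counter()
    for src, dst in rows:
        counts[src] += 1
        counts[dst] += 1
    return counts


# Counting on the raw rows counts every road twice.
raw_degree = endpoint_counts(directed_rows)

# Keeping one canonical (min, max) pair per road gives the undirected degree.
undirected_rows = {(min(s, d), max(s, d)) for s, d in directed_rows}
true_degree = endpoint_counts(undirected_rows)

print("Raw degree:       ", dict(sorted(raw_degree.items())))
print("Undirected degree:", dict(sorted(true_degree.items())))
```

Read this way, the mean of 5.63 and maximum of 24 in the table correspond to an undirected mean degree of about 2.82 and a maximum of 12.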



## 5. Build Graph (Edges, Vertices, GraphFrame)


In [0]:
# Load clean edges
raw_edges = (
    spark.table("hive_metastore.default.road_net_ca")
    .filter(~F.col("value").startswith("#"))
    .select(F.split(F.col("value"), r"\s+").alias("parts"))
)

# Extract src/dst
edges_file = raw_edges.select(
    F.col("parts")[0].cast("long").alias("src"),
    F.col("parts")[1].cast("long").alias("dst")
)

# Undirected edges → canonical representation
edges_undirected = (
    edges_file
    .withColumn("u", F.least("src", "dst"))
    .withColumn("v", F.greatest("src", "dst"))
    .select("u", "v")
    .dropDuplicates()
    .withColumnRenamed("u", "src")
    .withColumnRenamed("v", "dst")
)

# Vertices
vertices = (
    edges_undirected.select(F.col("src").alias("id"))
    .union(edges_undirected.select(F.col("dst").alias("id")))
    .distinct()
)

# Directed edges needed for GraphFrames
edges_dir = edges_undirected.unionByName(
    edges_undirected.select(
        F.col("dst").alias("src"),
        F.col("src").alias("dst")
    )
)

graph = GraphFrame(vertices, edges_dir)

print("Graph built successfully.")
print("Vertices:", vertices.count())
print("Undirected edges:", edges_undirected.count())
print("Directed edges:", edges_dir.count())


Graph built successfully.
Vertices: 1965206
Undirected edges: 2766607
Directed edges: 5533214
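
The canonicalization step can be illustrated without Spark. The sketch below mirrors the `least`/`greatest` deduplication and the later re-expansion into both directions; the helper names `canonical_edges` and `symmetrize` are illustrative and do not appear in the notebook.

```python
def canonical_edges(rows):
    """Collapse (u, v) and (v, u) into a single (min, max) pair."""
    return sorted({(min(u, v), max(u, v)) for u, v in rows})


def symmetrize(edges):
    """Emit both directions of every undirected edge."""
    return sorted(set(edges) | {(v, u) for u, v in edges})


# Hypothetical input: two roads listed in both directions, one listed once.
rows = [(5, 2), (2, 5), (2, 7), (7, 2), (5, 7)]

undirected = canonical_edges(rows)
directed = symmetrize(undirected)

print("Undirected:", undirected)
print("Directed:  ", directed)
```

The directed list is always twice as long as the undirected one (absent self-loops), which matches the 2,766,607 and 5,533,214 counts printed above.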


## 6. Basic Graph Statistics

In [0]:
num_vertices = vertices.count()
num_edges = edges_undirected.count()

print("=== BASIC GRAPH STATISTICS ===")
print("Number of vertices:", num_vertices)
print("Number of undirected edges:", num_edges)


=== BASIC GRAPH STATISTICS ===
Number of vertices: 1965206
Number of undirected edges: 2766607
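
These two counts are enough to derive the mean degree and the edge density of the undirected graph. A short worked calculation, using only the figures printed above:

```python
num_vertices = 1_965_206  # vertices, from the output above
num_edges = 2_766_607     # undirected edges, from the output above

# Each undirected edge contributes to the degree of two nodes.
mean_degree = 2 * num_edges / num_vertices

# Density: existing edges divided by all possible node pairs.
density = 2 * num_edges / (num_vertices * (num_vertices - 1))

print(f"Mean degree: {mean_degree:.4f}")
print(f"Density:     {density:.3e}")
```

A mean degree below 3 and a density on the order of one in a million quantify how sparse the network is.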


## 7. Largest Weakly Connected Component (WCC)

In [0]:
# Required for GraphFrames algorithms
spark.sparkContext.setCheckpointDir("/tmp/graphframes-checkpoints")

components = graph.connectedComponents().cache()

# Compute WCC sizes
wcc_sizes = components.groupBy("component").count()
largest_wcc = wcc_sizes.orderBy(F.desc("count")).first()

wcc_size = largest_wcc["count"]
fraction = wcc_size / num_vertices

print("=== LARGEST WEAKLY CONNECTED COMPONENT ===")
print("Component ID:", largest_wcc["component"])
print("Nodes in largest WCC:", wcc_size)
print("Fraction of all nodes:", fraction)


=== LARGEST WEAKLY CONNECTED COMPONENT ===
Component ID: 0
Nodes in largest WCC: 1957027
Fraction of all nodes: 0.9958380953447119
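
For intuition about what a connected-components pass computes, here is a small single-machine sketch based on union-find. It is not how GraphFrames implements `connectedComponents`, and labeling each component by its smallest node id is an assumption of this sketch only.

```python
from collections import Counter


def connected_components(nodes, edges):
    """Return {node: component_label} using union-find."""
    parent = {n: n for n in nodes}

    def find(x):
        # Follow parent pointers to the root, shortening the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            # The smaller root id becomes the label of the merged component.
            parent[max(ru, rv)] = min(ru, rv)

    return {n: find(n) for n in nodes}


# Hypothetical toy graph: a path 0-1-2, a pair 3-4, and an isolated node 5.
nodes = [0, 1, 2, 3, 4, 5]
edges = [(0, 1), (1, 2), (3, 4)]

labels = connected_components(nodes, edges)
sizes = Counter(labels.values())
largest_label, largest_size = sizes.most_common(1)[0]

print("Labels:", labels)
print("Largest component:", largest_label, "size:", largest_size)
print("Fraction of all nodes:", largest_size / len(nodes))
```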


## 8. Wedges and Clustering Coefficient


In [0]:
# Compute node degrees
deg_df = (
    edges_undirected.select(F.col("src").alias("id"))
    .union(edges_undirected.select(F.col("dst").alias("id")))
    .groupBy("id")
    .count()
    .withColumnRenamed("count", "deg")
)

# Compute wedges per node
wedges_df = deg_df.withColumn(
    "wedges",
    (F.col("deg") * (F.col("deg") - 1)) / 2
)

# Total wedges
total_wedges = wedges_df.agg(F.sum("wedges")).first()[0]

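# Global clustering coefficient: 3 * triangles / wedges.
# num_triangles is defined in Section 9 (Motif Analysis); run that cell first.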
clustering_coeff = (
    (3 * num_triangles) / total_wedges
    if total_wedges > 0 else 0
)

print("=== WEDGES & CLUSTERING COEFFICIENT ===")
print("Total wedges:", total_wedges)
print("Clustering coefficient:", clustering_coeff)


=== WEDGES & CLUSTERING COEFFICIENT ===
Total wedges: 5995090.0
Clustering coefficient: 0.06038741703627468
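
The wedge and clustering definitions can be checked on a graph small enough to count by hand. The sketch below uses a hypothetical four-edge toy graph and then reproduces the reported coefficient from the two totals in this notebook; the function name `global_clustering` is illustrative.

```python
from collections import defaultdict
from itertools import combinations


def global_clustering(edges):
    """Return (wedges, triangles, coefficient) for an undirected edge list."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    # A wedge is a pair of distinct neighbors of the same center node.
    wedges = sum(len(nb) * (len(nb) - 1) // 2 for nb in adj.values())

    # A wedge is closed when its two outer nodes are also adjacent.
    closed = sum(
        1
        for nb in adj.values()
        for a, b in combinations(sorted(nb), 2)
        if b in adj[a]
    )
    triangles = closed // 3  # every triangle closes three wedges
    coeff = 3 * triangles / wedges if wedges else 0.0
    return wedges, triangles, coeff


# Toy graph: triangle 0-1-2 with a pendant edge 2-3.
toy = [(0, 1), (1, 2), (0, 2), (2, 3)]
print(global_clustering(toy))

# Same formula applied to the totals reported in this notebook.
reported = 3 * 120676 / 5995090.0
print("Reported coefficient:", reported)
```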


## 9. Motif Analysis: Triangles


In [0]:
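# Triangle pattern: a→b, b→c, c→a.
# With both edge directions present, each undirected triangle matches six times
# (3 starting nodes × 2 orientations); the ordering filter keeps exactly one.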
triangles = graph.find(
    "(a)-[]->(b); (b)-[]->(c); (c)-[]->(a)"
).filter("a.id < b.id AND b.id < c.id")

num_triangles = triangles.count()

print("=== TRIANGLES ===")
print("Number of triangles:", num_triangles)


=== TRIANGLES ===
Number of triangles: 120676
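
To see why the motif query needs the `a.id < b.id AND b.id < c.id` filter, the following brute-force sketch enumerates directed 3-cycles on a hypothetical toy graph. It is meant for tiny inputs only and is not how GraphFrames evaluates motifs.

```python
from itertools import permutations


def directed_triangle_matches(undirected_edges):
    """All (a, b, c) with a->b, b->c, c->a in the symmetrized edge set."""
    directed = set(undirected_edges) | {(v, u) for u, v in undirected_edges}
    nodes = {n for edge in directed for n in edge}
    return [
        (a, b, c)
        for a, b, c in permutations(nodes, 3)
        if (a, b) in directed and (b, c) in directed and (c, a) in directed
    ]


# Toy graph: one triangle 0-1-2 plus a pendant edge 2-3.
edges = [(0, 1), (1, 2), (0, 2), (2, 3)]

matches = directed_triangle_matches(edges)
unique = [m for m in matches if m[0] < m[1] < m[2]]

print("Raw matches:", len(matches))
print("After ordering filter:", unique)
```

The single triangle produces six raw matches, and the ordering filter reduces them to one.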


# Visualizations


## 10. Interactive Graph Visualization 

The full California road network contains nearly 2 million nodes and cannot be visualized directly.  
To explore the network structure visually, we extract a *small subgraph* around the highest-degree node of the largest connected component and render it with an interactive D3.js force layout.
 
The goal is to explore how intersections are interconnected within their immediate neighborhood and to visually inspect whether the local topology forms clusters or remains sparse.  
This representation is purely topological (not geographic) and clearly shows the limited local density typical of large road networks.


In [0]:
from pyspark.sql import functions as F

# Degree on undirected graph
deg = (
    edges_undirected.select(F.col("src").alias("id"))
    .union(edges_undirected.select(F.col("dst").alias("id")))
    .groupBy("id")
    .count()
    .withColumnRenamed("count", "deg")
)

# Keep only nodes in largest WCC and pick a high-degree one
deg_wcc = (
    deg.join(
        components.select("id", "component"),
        on="id",
        how="inner"
    )
    .filter(F.col("component") == largest_wcc["component"])
)

root_row = deg_wcc.orderBy(F.desc("deg")).limit(1).collect()[0]
root_node = root_row["id"]  # highest-degree node in the largest component, not a random pick
print("Selected high-degree root node:", root_node, "with degree:", root_row["deg"])

e = edges_undirected.alias("e")

level1 = (
    e.filter((F.col("e.src") == root_node) | (F.col("e.dst") == root_node))
    .select("src", "dst")
)

level1_nodes = (
    level1.select(F.col("src").alias("id"))
    .union(level1.select(F.col("dst").alias("id")))
    .distinct()
)

ln = level1_nodes.alias("ln")

level2 = (
    e.join(
        ln,
        (F.col("e.src") == F.col("ln.id")) | (F.col("e.dst") == F.col("ln.id")),
        "inner",
    )
    .select("src", "dst")
)

sub_edges = (
    level1.union(level2)
    .dropDuplicates()
    .limit(1000)
)

sub_nodes = (
    sub_edges.select(F.col("src").alias("id"))
    .union(sub_edges.select(F.col("dst").alias("id")))
    .distinct()
)

print("Subgraph edges:", sub_edges.count())
print("Subgraph nodes:", sub_nodes.count())


Selected high-degree root node: 562818 with degree: 12
Subgraph edges: 38
Subgraph nodes: 28
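
The two-level extraction above can be sketched in plain Python to make its exact scope visible. The function name and the toy edge list are hypothetical; the logic follows the `level1`/`level2` steps of the cell.

```python
def two_level_subgraph(edges, root):
    """Edges touching the root, plus edges touching any endpoint of those."""
    level1 = {e for e in edges if root in e}
    level1_nodes = {n for e in level1 for n in e}  # includes the root itself
    level2 = {e for e in edges if e[0] in level1_nodes or e[1] in level1_nodes}
    sub_edges = level1 | level2
    sub_nodes = {n for e in sub_edges for n in e}
    return sub_edges, sub_nodes


# Toy graph: path 0-1-2-3-4 with an extra branch 1-5.
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (1, 5)]

sub_edges, sub_nodes = two_level_subgraph(edges, root=1)
print("Subgraph edges:", sorted(sub_edges))
print("Subgraph nodes:", sorted(sub_nodes))
```

Nodes two hops from the root are included, but an edge between two such nodes is not, because neither endpoint is a direct neighbor of the root.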


In [0]:
import json

nodes_list = [ {"id": str(r["id"])} for r in sub_nodes.collect() ]
edges_list = [ {"source": str(r["src"]), "target": str(r["dst"])} for r in sub_edges.collect() ]

graph_json = {
    "nodes": nodes_list,
    "links": edges_list
}

output_path = "/dbfs/tmp/roadnet_subgraph.json"

with open(output_path, "w") as f:
    json.dump(graph_json, f)

print("Saved JSON to:", output_path)


Saved JSON to: /dbfs/tmp/roadnet_subgraph.json


In [0]:
import json

with open("/dbfs/tmp/roadnet_subgraph.json") as f:
    g = json.load(f)

data_js = json.dumps(g)

displayHTML(f"""
<div id="viz" style="width:100%; height:800px; background:#111; border:1px solid #333;"></div>

<script src="https://d3js.org/d3.v7.min.js"></script>

<script>
const graph = {data_js};

const div = document.getElementById("viz");
const width = div.clientWidth;
const height = div.clientHeight;

// SVG
const svg = d3.select("#viz")
  .append("svg")
  .attr("width", width)
  .attr("height", height);

// Simulation
const sim = d3.forceSimulation(graph.nodes)
  .force("link", d3.forceLink(graph.links).id(d => d.id).distance(40))
  .force("charge", d3.forceManyBody().strength(-50))
  .force("center", d3.forceCenter(width / 2, height / 2));

// Draw links
const link = svg.append("g")
  .attr("stroke", "#666")
  .selectAll("line")
  .data(graph.links)
  .enter()
  .append("line")
  .attr("stroke-width", 1);

// Draw nodes
const node = svg.append("g")
  .selectAll("circle")
  .data(graph.nodes)
  .enter()
  .append("circle")
  .attr("r", 4)
  .attr("fill", "#4FC3F7");

// Update positions
sim.on("tick", () => {{
  link
    .attr("x1", d => d.source.x)
    .attr("y1", d => d.source.y)
    .attr("x2", d => d.target.x)
    .attr("y2", d => d.target.y);

  node
    .attr("cx", d => d.x)
    .attr("cy", d => d.y);
}});
</script>
""")


## 11. Motifs Graph Visualization 

This visualization extends the previous one by highlighting every triangle motif found in a subgraph.  
Because triangles are rare, the subgraph is seeded from a node that belongs to a known triangle, so at least one motif is guaranteed to appear.  
Nodes that participate in a triangle are drawn in amber and all others in brown; triangle edges are shown in red and all others in grey.  
In the run shown here, a single triangle appears among 13 edges, consistent with the low clustering expected in near-planar transportation networks.



In [0]:
from pyspark.sql import functions as F
import random

# 1) Take one triangle from the global graph 
triangle_row = (
    triangles
    .select(
        F.col("a.id").alias("a"),
        F.col("b.id").alias("b"),
        F.col("c.id").alias("c")
    )
    .limit(1)
    .collect()[0]
)

# Randomly choose one of the three triangle nodes as the root
candidates = [triangle_row["a"], triangle_row["b"], triangle_row["c"]]
root_node = random.choice(candidates)

print("Root node taken from a global triangle:", root_node)

# 2) Build the 2-hop subgraph around this node
e = edges_undirected.alias("e")

# Level 1 neighbors (edges incident to the root node)
level1 = (
    e.filter((F.col("e.src") == root_node) | (F.col("e.dst") == root_node))
    .select("src", "dst")
)

level1_nodes = (
    level1.select(F.col("src").alias("id"))
    .union(level1.select(F.col("dst").alias("id")))
    .distinct()
)

# Level 2 neighbors (neighbors of the neighbors)
ln = level1_nodes.alias("ln")

level2 = (
    e.join(
        ln,
        (F.col("e.src") == F.col("ln.id")) | (F.col("e.dst") == F.col("ln.id")),
        "inner"
    )
    .select("src", "dst")
)

# Final subgraph (limit its size)
sub_edges = (
    level1.union(level2)
    .dropDuplicates()
    .limit(1000)
)

sub_nodes = (
    sub_edges.select(F.col("src").alias("id"))
    .union(sub_edges.select(F.col("dst").alias("id")))
    .distinct()
)

print("Subgraph edges:", sub_edges.count())
print("Subgraph nodes:", sub_nodes.count())


Root node taken from a global triangle: 342253
Subgraph edges: 13
Subgraph nodes: 12


In [0]:
import json
from graphframes import GraphFrame
from pyspark.sql import functions as F

# Build a GraphFrame just on the subgraph
sub_vertices = sub_nodes
sub_edges_dir = sub_edges.unionByName(
    sub_edges.select(F.col("dst").alias("src"),
                     F.col("src").alias("dst"))
)

sub_graph = GraphFrame(sub_vertices, sub_edges_dir)

# Find triangles in the subgraph
triangles_df = sub_graph.find(
    "(a)-[]->(b); (b)-[]->(c); (c)-[]->(a)"
).filter("a.id < b.id AND b.id < c.id")

print("Triangles found in subgraph:", triangles_df.count())

# Collect triangle nodes and edges
triangle_nodes = set()
triangle_edges = set()

tri_rows = triangles_df.select(
    F.col("a.id").alias("a"),
    F.col("b.id").alias("b"),
    F.col("c.id").alias("c"),
).collect()

for r in tri_rows:
    a = str(r["a"])
    b = str(r["b"])
    c = str(r["c"])

    triangle_nodes.update([a, b, c])

    triangle_edges.update([
        tuple(sorted([a, b])),
        tuple(sorted([b, c])),
        tuple(sorted([c, a])),
    ])

print("Triangle nodes:", len(triangle_nodes))
print("Triangle edges:", len(triangle_edges))

# Build JSON with flags for triangle nodes/edges
nodes_list = []
for r in sub_nodes.collect():
    nid = str(r["id"])
    nodes_list.append({
        "id": nid,
        "inTriangle": nid in triangle_nodes
    })

edges_list = []
for r in sub_edges.collect():
    s = str(r["src"])
    t = str(r["dst"])
    key = tuple(sorted([s, t]))
    edges_list.append({
        "source": s,
        "target": t,
        "inTriangle": key in triangle_edges
    })

graph_json = {
    "nodes": nodes_list,
    "links": edges_list
}

output_path = "/dbfs/tmp/roadnet_subgraph_triangles.json"
with open(output_path, "w") as f:
    json.dump(graph_json, f)

print("Saved JSON with triangle flags to:", output_path)



Triangles found in subgraph: 1
Triangle nodes: 3
Triangle edges: 3
Saved JSON with triangle flags to: /dbfs/tmp/roadnet_subgraph_triangles.json
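
Flagging triangle edges relies on giving each undirected edge one order-independent key. The sketch below isolates that idea with hypothetical string ids; note that ids are strings here, so `sorted` orders them lexicographically, which is harmless as long as the same key function is used on both sides of the lookup.

```python
def edge_key(u, v):
    """Order-independent key for an undirected edge."""
    return tuple(sorted((u, v)))


# One hypothetical triangle with string ids, as in the JSON export.
triangle = ("10", "7", "9")
triangle_edges = {
    edge_key(triangle[i], triangle[(i + 1) % 3]) for i in range(3)
}

# Subgraph links may list the endpoints in either order.
links = [("7", "10"), ("9", "4")]
flags = [edge_key(s, t) in triangle_edges for s, t in links]

print(sorted(triangle_edges))
print(flags)
```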


In [0]:
import json

with open("/dbfs/tmp/roadnet_subgraph_triangles.json") as f:
    g = json.load(f)

data_js = json.dumps(g)

displayHTML(f"""
<div id="viz" style="width:100%; height:800px; background:#f5f5f5; border:1px solid #ccc;"></div>

<script src="https://d3js.org/d3.v7.min.js"></script>

<script>
const graph = {data_js};

const div = document.getElementById("viz");
const width = div.clientWidth;
const height = div.clientHeight;

// SVG
const svg = d3.select("#viz")
  .append("svg")
  .attr("width", width)
  .attr("height", height);

// Force simulation
const sim = d3.forceSimulation(graph.nodes)
  .force("link", d3.forceLink(graph.links).id(d => d.id).distance(40))
  .force("charge", d3.forceManyBody().strength(-60))
  .force("center", d3.forceCenter(width / 2, height / 2));

// Links: red if part of a triangle, grey otherwise
const link = svg.append("g")
  .selectAll("line")
  .data(graph.links)
  .enter()
  .append("line")
  .attr("stroke-width", 1.2)
  .attr("stroke", d => d.inTriangle ? "#e53935" : "#b0bec5")
  .attr("stroke-opacity", d => d.inTriangle ? 0.9 : 0.5);

// Nodes: amber if part of a triangle, brown otherwise
const node = svg.append("g")
  .selectAll("circle")
  .data(graph.nodes)
  .enter()
  .append("circle")
  .attr("r", 4)
  .attr("fill", d => d.inTriangle ? "#ffb300" : "#6d4c41");

// Tick updates
sim.on("tick", () => {{
  link
    .attr("x1", d => d.source.x)
    .attr("y1", d => d.source.y)
    .attr("x2", d => d.target.x)
    .attr("y2", d => d.target.y);

  node
    .attr("cx", d => d.x)
    .attr("cy", d => d.y);
}});
</script>
""")


In [0]:
%pip install folium


Collecting folium
  Downloading folium-0.20.0-py2.py3-none-any.whl.metadata (4.2 kB)
Collecting branca>=0.6.0 (from folium)
  Downloading branca-0.8.2-py3-none-any.whl.metadata (1.7 kB)
Collecting xyzservices (from folium)
  Downloading xyzservices-2025.11.0-py3-none-any.whl.metadata (4.3 kB)
Downloading folium-0.20.0-py2.py3-none-any.whl (113 kB)
Downloading branca-0.8.2-py3-none-any.whl (26 kB)
Downloading xyzservices-2025.11.0-py3-none-any.whl (93 kB)
Installing collected packages: xyzservices, branca, folium
Successfully installed branca-0.8.2 folium-0.20.0 xyzservices-2025.11.0
Note: you may need to restart the kernel using %restart_python or dbutils.library.restartPython() to use updated packages.


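## 12. Real-World Road Network (OSMnx)

We use OSMnx to download the real street network of Los Angeles, restricted to car-accessible roads (`network_type="drive"`).  
This provides a fully georeferenced graph with real intersections and road geometries, which is then converted into GeoDataFrames for mapping; the download and conversion cells appear in Section 13.  

Before that, the next three cells rebuild the roadNet-CA graph, select the ten highest-degree nodes of the largest connected component, and draw their incident edges on a Folium map.  
roadNet-CA carries no coordinates, so node positions come from a force-directed layout rescaled to an approximate California bounding box: the placement on the map is synthetic, not geographic.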


In [0]:
from pyspark.sql import functions as F
from graphframes import GraphFrame

# Load the roadNet-CA dataset
road_df = spark.table("hive_metastore.default.road_net_ca")

edges_file = (
    road_df
    .filter(~F.col("value").startswith("#"))
    .select(F.split(F.col("value"), r"\s+").alias("parts"))
    .select(
        F.col("parts")[0].cast("long").alias("src"),
        F.col("parts")[1].cast("long").alias("dst")
    )
)

# Build an undirected graph
edges_undirected = (
    edges_file
    .withColumn("u", F.least("src", "dst"))
    .withColumn("v", F.greatest("src", "dst"))
    .select("u", "v")
    .dropDuplicates()
    .withColumnRenamed("u", "src")
    .withColumnRenamed("v", "dst")
)

vertices = (
    edges_undirected.select(F.col("src").alias("id"))
    .union(edges_undirected.select(F.col("dst").alias("id")))
    .distinct()
)

# For GraphFrames, use a directed version
edges_dir = edges_undirected.unionByName(
    edges_undirected.select(
        F.col("dst").alias("src"),
        F.col("src").alias("dst")
    )
)

graph = GraphFrame(vertices, edges_dir)

print("Vertices:", vertices.count())
print("Undirected edges:", edges_undirected.count())
print("Directed edges:", edges_dir.count())

# Checkpoint directory for GraphFrames
spark.sparkContext.setCheckpointDir("/tmp/graphframes-checkpoints")

# Largest Weakly Connected Component
components = graph.connectedComponents().cache()
wcc_sizes = components.groupBy("component").count()
largest_wcc = wcc_sizes.orderBy(F.desc("count")).first()

print("Largest WCC size:", largest_wcc["count"])

# Global degree
deg_df = (
    edges_undirected.select(F.col("src").alias("id"))
    .union(edges_undirected.select(F.col("dst").alias("id")))
    .groupBy("id")
    .count()
    .withColumnRenamed("count", "deg")
)
print("Computed degree for", deg_df.count(), "nodes")


Vertices: 1965206
Undirected edges: 2766607
Directed edges: 5533214
Largest WCC size: 1957027
Computed degree for 1965206 nodes


In [0]:
# Nodes of the Largest Connected Component
wcc_nodes = (
    components
    .select("id", "component")
    .filter(F.col("component") == largest_wcc["component"])
)

# Degree computed only on the WCC
deg_wcc = (
    deg_df.join(wcc_nodes, on="id", how="inner")
)

# Select the most important nodes based on degree
TOP_N = 10  # you can change 10 → 5 / 15 etc.
top_nodes_rows = deg_wcc.orderBy(F.desc("deg")).limit(TOP_N).collect()
important_nodes = [row["id"] for row in top_nodes_rows]

print("Top-degree nodes used for the subgraph:", important_nodes)

# Edges that are incident to at least one of these nodes
sub_edges = (
    edges_undirected
    .filter(
        F.col("src").isin(important_nodes) |
        F.col("dst").isin(important_nodes)
    )
    .dropDuplicates()
    .limit(5000)        
)

sub_nodes = (
    sub_edges.select(F.col("src").alias("id"))
    .union(sub_edges.select(F.col("dst").alias("id")))
    .distinct()
)

print("Subgraph nodes:", sub_nodes.count())
print("Subgraph edges:", sub_edges.count())



Top-degree nodes used for the subgraph: [562818, 521168, 534751, 1795416, 309321, 1275439, 1495419, 942607, 290162, 1631015]
Subgraph nodes: 99
Subgraph edges: 89


In [0]:
import folium
from folium.plugins import MarkerCluster
import networkx as nx

def plot_road_subgraph_map(sub_edges_df, sub_nodes_df):

    # Convert edges to Pandas
    edges_pd = sub_edges_df.select("src", "dst").toPandas()
    if edges_pd.empty:
        print("No data to display")
        return None

    # Build the list of nodes from the edge endpoints (src first, then unseen dst)
    src_nodes = set(edges_pd["src"].unique())  # computed once, not per element
    nodes = (
        list(edges_pd["src"].unique()) +
        [n for n in edges_pd["dst"].unique() if n not in src_nodes]
    )

    # NetworkX graph
    G_vis = nx.from_pandas_edgelist(edges_pd, "src", "dst")
    for n in nodes:
        if n not in G_vis:
            G_vis.add_node(n)

    # Force-directed layout (synthetic coordinates)
    pos = nx.spring_layout(G_vis, k=0.2, iterations=30, seed=42)

    # Approximate bounding box for California
    min_lat, max_lat = 34.0, 41.0
    min_lon, max_lon = -124.0, -116.0

    def normalize(val, min_v, max_v):
        norm = (val + 1) / 2   # [-1,1] → [0,1]
        return norm * (max_v - min_v) + min_v

    # Dictionary node → (lat, lon)
    node_coords = {}
    for node, (x, y) in pos.items():
        lat = normalize(y, min_lat, max_lat)
        lon = normalize(x, min_lon, max_lon)
        node_coords[node] = (lat, lon)

    # Map centered on California
    m = folium.Map(location=[37.5, -120.0], zoom_start=6, tiles="cartodb positron")

    # Marker cluster for nodes
    marker_cluster = MarkerCluster().add_to(m)

    for n, (lat, lon) in node_coords.items():
        folium.CircleMarker(
            location=[lat, lon],
            radius=4,
            color="blue",
            fill=True,
            fill_color="blue",
            fill_opacity=0.7,
            popup=f"Node {n}"
        ).add_to(marker_cluster)

    # Draw edges
    for _, row in edges_pd.iterrows():
        s = row["src"]
        t = row["dst"]

        if s not in node_coords or t not in node_coords:
            continue

        p1 = node_coords[s]
        p2 = node_coords[t]

        folium.PolyLine(
            locations=[p1, p2],
            color="gray",
            weight=1,
            opacity=0.5,
            popup=f"{s} ↔ {t}"
        ).add_to(m)

    return m

# Show the map
m = plot_road_subgraph_map(sub_edges, sub_nodes)
if m is not None:
    displayHTML(m._repr_html_())
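

The synthetic placement used by `plot_road_subgraph_map` comes down to one linear rescaling from layout space to the bounding box. A standalone sketch of that mapping, assuming layout coordinates fall in [-1, 1] as the notebook's `normalize` helper does (the name `rescale` is illustrative):

```python
def rescale(val, min_v, max_v):
    """Map val from [-1, 1] linearly onto [min_v, max_v]."""
    return (val + 1) / 2 * (max_v - min_v) + min_v


# Bounding box used in the notebook.
min_lat, max_lat = 34.0, 41.0
min_lon, max_lon = -124.0, -116.0

# The layout origin lands at the center of the box.
center = (rescale(0.0, min_lat, max_lat), rescale(0.0, min_lon, max_lon))

# The layout extremes land on the box corners.
south_west = (rescale(-1.0, min_lat, max_lat), rescale(-1.0, min_lon, max_lon))
north_east = (rescale(1.0, min_lat, max_lat), rescale(1.0, min_lon, max_lon))

print("Center:", center)
print("Corners:", south_west, north_east)
```

The center of the box coincides with the map location `[37.5, -120.0]` chosen in the cell, which is why the drawing appears centered regardless of where the hub nodes really are.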



## 13. Real-World Road Network Visualization (Folium)

A Folium map is generated using the true coordinates of the OSM road network.  
To ensure performance, up to 5,000 road segments are sampled and drawn as yellow polylines.  
This visualization provides a real geospatial reference, allowing us to compare the structural patterns observed in the SNAP dataset with an actual urban street network.


In [0]:
%pip install osmnx folium


Collecting osmnx
  Downloading osmnx-2.0.7-py3-none-any.whl.metadata (4.9 kB)
Collecting geopandas>=1.0.1 (from osmnx)
  Downloading geopandas-1.1.1-py3-none-any.whl.metadata (2.3 kB)
Collecting shapely>=2.0 (from osmnx)
  Downloading shapely-2.1.2-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (6.8 kB)
Collecting pyogrio>=0.7.2 (from geopandas>=1.0.1->osmnx)
  Downloading pyogrio-0.12.1-cp312-cp312-manylinux_2_28_x86_64.whl.metadata (5.9 kB)
Collecting pyproj>=3.5.0 (from geopandas>=1.0.1->osmnx)
  Downloading pyproj-3.7.2-cp312-cp312-manylinux_2_28_x86_64.whl.metadata (31 kB)
Downloading osmnx-2.0.7-py3-none-any.whl (101 kB)
Downloading geopandas-1.1.1-py3-none-any.whl (338 kB)
Downloading shapely-2.1.2-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (3.1 MB)

In [0]:
import osmnx as ox
import folium

# OSMnx settings
ox.settings.use_cache = True       
ox.settings.log_console = True     


In [0]:
place_name = "Los Angeles, California, USA"

# Download the road network based on car-accessible streets
G_real = ox.graph_from_place(
    place_name,
    network_type="drive",
    simplify=True
)

print("Number of OSM nodes:", len(G_real.nodes))
print("Number of OSM edges:", len(G_real.edges))


Number of OSM nodes: 49630
Number of OSM edges: 136127


In [0]:
import osmnx as ox
import folium
from shapely.geometry import LineString, MultiLineString

# Convert the OSMnx graph into GeoDataFrames
nodes_gdf, edges_gdf = ox.graph_to_gdfs(G_real, nodes=True, edges=True)

print("Nodes gdf:", len(nodes_gdf))
print("Edges gdf:", len(edges_gdf))

# Map center = mean latitude and longitude of all nodes
center_lat = nodes_gdf["y"].mean()
center_lon = nodes_gdf["x"].mean()

# Create the Folium map centered on the city
m_real = folium.Map(
    location=[center_lat, center_lon],
    zoom_start=12,
    tiles="cartodb positron"
)

# Defining a subset of edges
MAX_EDGES = 5000
if len(edges_gdf) > MAX_EDGES:
    edges_sample = edges_gdf.sample(n=MAX_EDGES, random_state=42)
else:
    edges_sample = edges_gdf

print("Edges drawn on the map:", len(edges_sample))

# Draw each road as a PolyLine
for _, row in edges_sample.iterrows():
    geom = row.geometry


    if isinstance(geom, LineString):
        coords = [(lat, lon) for lon, lat in geom.coords]
        folium.PolyLine(coords, color="yellow", weight=1, opacity=0.7).add_to(m_real)

    elif isinstance(geom, MultiLineString):
        for line in geom.geoms:  # Shapely 2.x: multi-part geometries are not directly iterable
            coords = [(lat, lon) for lon, lat in line.coords]
            folium.PolyLine(coords, color="yellow", weight=1, opacity=0.7).add_to(m_real)

# Display the map
displayHTML(m_real._repr_html_())
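

The list comprehensions in the drawing loop swap the coordinate order: geometry coordinates are stored as (x, y), meaning (longitude, latitude), while Folium expects (latitude, longitude). A minimal sketch with plain tuples; the helper name and the sample points are hypothetical.

```python
def to_folium_path(xy_coords):
    """Convert (x=lon, y=lat) pairs into the (lat, lon) order Folium expects."""
    return [(lat, lon) for lon, lat in xy_coords]


# Hypothetical (lon, lat) points for one road segment.
segment_xy = [(-118.2437, 34.0522), (-118.2500, 34.0600)]

path = to_folium_path(segment_xy)
print(path)
```

Skipping the swap does not raise an error; the polylines are simply drawn far from the intended location, so it is worth checking one point by eye.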



## Conclusion

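The analysis shows that the California road network is sparse and weakly clustered, as expected for a large real-world road system: 2,766,607 undirected edges connect 1,965,206 nodes, and the global clustering coefficient is about 0.06.

The network is nonetheless almost fully connected, with 99.6% of nodes in the largest connected component.

Motif counts confirm that triangles are rare (120,676 in total), reflecting a functional layout with little local redundancy.

Even the highest-degree intersections have only 12 connections, and the subgraph visualizations illustrate how intersections connect locally without forming dense cores.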

