# Random Graph Generation and Random Walk

Simple implementation using SciPy and NetworkX to:
1. Generate a random sparse graph
2. Calculate the graph Laplacian
3. Sample random walks on the graph by uniform neighbor selection

In [19]:
import numpy as np
import scipy.sparse as sp
import networkx as nx
from scipy.sparse.linalg import expm_multiply
import matplotlib.pyplot as plt

# Hyperparameters
N = 100          # Number of nodes
AVG_DEGREE = 5   # Average degree per node
M = 10           # Number of random walk steps
START_NODE = 0   # Starting node for random walk
SEED = 42        # Random seed for reproducibility

In [20]:
def generate_random_graph(n_nodes, avg_degree, seed=None):
    """
    Generate a random sparse graph using the Erdős-Rényi G(n, p) model.
    
    Time complexity: O(N²), since nx.erdos_renyi_graph tests every node pair;
    the connectivity check and patch below add O(N + E).
    """
    if seed is not None:
        np.random.seed(seed)
    
    # Calculate edge probability to achieve desired average degree
    p = avg_degree / (n_nodes - 1)
    
    # Generate random graph
    G = nx.erdos_renyi_graph(n_nodes, p, seed=seed)
    
    # Ensure graph is connected (add edges if needed)
    if not nx.is_connected(G):
        components = list(nx.connected_components(G))
        for i in range(len(components) - 1):
            node1 = list(components[i])[0]
            node2 = list(components[i + 1])[0]
            G.add_edge(node1, node2)
    
    return G

# Step 1: Generate random sparse graph
print("Step 1: Generating random sparse graph...")
G = generate_random_graph(N, AVG_DEGREE, SEED)
print(f"Generated graph with {G.number_of_nodes()} nodes and {G.number_of_edges()} edges")
print(f"Actual average degree: {2 * G.number_of_edges() / G.number_of_nodes():.2f}")

Step 1: Generating random sparse graph...
Generated graph with 100 nodes and 227 edges
Actual average degree: 4.54
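
As a quick check on the edge-probability formula used in `generate_random_graph`, the sketch below recomputes `p` and the expected edge count for `N = 100` and `AVG_DEGREE = 5`. The helper name `expected_gnp_stats` is hypothetical and not part of the notebook; it assumes a plain G(n, p) draw before any connectivity patch edges are added.

```python
import math

def expected_gnp_stats(n_nodes, avg_degree):
    """Return (p, expected_edges, edge_std) for a G(n, p) graph (hypothetical helper)."""
    p = avg_degree / (n_nodes - 1)             # same formula as generate_random_graph
    n_pairs = n_nodes * (n_nodes - 1) // 2     # number of candidate edges
    expected_edges = p * n_pairs               # mean of Binomial(n_pairs, p)
    edge_std = math.sqrt(n_pairs * p * (1 - p))
    return p, expected_edges, edge_std

p, expected_edges, edge_std = expected_gnp_stats(100, 5)
print(f"p = {p:.4f}")
print(f"expected edges = {expected_edges:.0f} +/- {edge_std:.1f}")
print(f"expected average degree = {2 * expected_edges / 100:.2f}")
```

The expected count is 250 edges with a standard deviation of roughly 15, so the 227 edges reported above (which include any patch edges) are a plausible single draw.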


In [21]:
def compute_graph_laplacian(graph):
    """
    Compute the normalized graph Laplacian matrix: L = I - D^(-1/2) * A * D^(-1/2)
    
    Time complexity: O(E) = O(N * avg_degree)
    """
    # Get the normalized Laplacian matrix as sparse matrix
    # This computes L = I - D^(-1/2) * A * D^(-1/2)
    L = nx.normalized_laplacian_matrix(graph, nodelist=sorted(graph.nodes()))
    return L

# Step 2 & 3: Load with NetworkX and calculate Laplacian
print("\nStep 2-3: Computing normalized graph Laplacian...")
print("Normalized Laplacian: L = I - D^(-1/2) * A * D^(-1/2)")
L = compute_graph_laplacian(G)
print(f"Laplacian matrix shape: {L.shape}")
print(f"Laplacian sparsity: {L.nnz / (N * N) * 100:.2f}% non-zero entries")
print(f"Laplacian matrix type: {type(L)}")
print(f"Time complexity: O(E) = O({G.number_of_edges()}) = O(N × avg_degree)")


Step 2-3: Computing normalized graph Laplacian...
Normalized Laplacian: L = I - D^(-1/2) * A * D^(-1/2)
Laplacian matrix shape: (100, 100)
Laplacian sparsity: 5.54% non-zero entries
Laplacian matrix type: <class 'scipy.sparse._csr.csr_array'>
Time complexity: O(E) = O(227) = O(N × avg_degree)
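
The normalized Laplacian is related to the random walk through the transition matrix: P = D^(-1) A = D^(-1/2) (I - L) D^(1/2). The following is a minimal sketch of that relationship on a small toy graph; `G_demo` and the dense conversion via `.toarray()` are assumptions made for readability and would not be appropriate for large graphs.

```python
import networkx as nx
import numpy as np

# Hypothetical toy graph: a 5-cycle with one chord so degrees differ
G_demo = nx.cycle_graph(5)
G_demo.add_edge(0, 2)
nodes = sorted(G_demo.nodes())

A = nx.to_numpy_array(G_demo, nodelist=nodes)   # dense adjacency matrix
deg = A.sum(axis=1)                             # node degrees
L_norm = nx.normalized_laplacian_matrix(G_demo, nodelist=nodes).toarray()

D_inv_sqrt = np.diag(1.0 / np.sqrt(deg))
D_sqrt = np.diag(np.sqrt(deg))
I = np.eye(len(nodes))

# Rebuild L = I - D^(-1/2) A D^(-1/2) by hand and compare with NetworkX
L_manual = I - D_inv_sqrt @ A @ D_inv_sqrt
print("Laplacian matches formula:", np.allclose(L_norm, L_manual))

# Recover the row-stochastic transition matrix of the uniform random walk
P = D_inv_sqrt @ (I - L_norm) @ D_sqrt
print("Rows of P sum to 1:", np.allclose(P.sum(axis=1), 1.0))

# Distribution of the walker after 3 steps when starting at node 0
x0 = np.zeros(len(nodes))
x0[0] = 1.0
x3 = x0 @ np.linalg.matrix_power(P, 3)
print("3-step distribution:", np.round(x3, 3))
```

Sampling individual paths, as the notebook does later, draws from the same process that `P` describes in expectation.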


## Time Complexity Analysis

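For a graph with **N = 100 nodes** and a target **average degree of 5**:

### 1. Graph Generation: **O(N²)**
- `nx.erdos_renyi_graph` tests every one of the N(N-1)/2 = 4,950 node pairs, so generation costs O(N²) no matter how sparse the result is
- Expected edges created: N × avg_degree / 2 = 250 (the run above ended with 227)
- The connectivity check and the patch edges add O(N + E)
- For large sparse graphs, `nx.fast_gnp_random_graph` samples the same model in O(N + E) expected time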
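
For larger sparse graphs, NetworkX also provides `nx.fast_gnp_random_graph`, which samples the same G(n, p) model without testing every pair. A minimal sketch, with illustrative values for `n_nodes` and `seed` that are not taken from the notebook:

```python
import networkx as nx

n_nodes, avg_degree, seed = 2000, 5, 42   # illustrative values only
p = avg_degree / (n_nodes - 1)

# Same model as nx.erdos_renyi_graph, but with O(N + E) expected running time
G_fast = nx.fast_gnp_random_graph(n_nodes, p, seed=seed)

actual_avg_degree = 2 * G_fast.number_of_edges() / n_nodes
print(f"nodes: {G_fast.number_of_nodes()}, edges: {G_fast.number_of_edges()}")
print(f"actual average degree: {actual_avg_degree:.2f}")
```

Like the original generator, this does not guarantee connectivity, so the component-linking step would still be needed.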

### 2. NetworkX Loading: **O(1)**
- The generator already returns a NetworkX graph, so no conversion or loading step is needed

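### 3. Laplacian Computation: **O(E)**
- Building the sparse adjacency matrix: O(N + E), with 2E = 454 stored entries here
- Degree normalization and forming I - D^(-1/2) A D^(-1/2): O(N + E)
- The graph is connected, so E ≥ N - 1 and O(N + E) = O(E) = O(N × avg_degree)
- Sorting the node list for `nodelist` adds an O(N log N) term that is negligible at this size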

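### 4. Uniform Random Walk: **O(M × avg_degree)**
- Each step: list the current node's neighbors in O(degree), then sample one uniformly; `np.random.choice` converts the list to an array, which is also O(degree)
- For M = 10 steps and average degree 5: roughly 10 × 5 = 50 neighbor visits per walk
- A positive `stopping_prob` can end a walk before M steps, so M is an upper bound on the walk length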

### Overall Time Complexity: **O(N²)**

Graph generation dominates the pipeline at O(N²); once the graph exists, each individual walk costs only O(M × avg_degree). The uniform strategy gives the standard random walk: each step moves to a neighbor of the current node chosen uniformly at random.
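
One consequence of uniform neighbor selection on a connected undirected graph is that the stationary distribution is proportional to node degree, π_i = d_i / (2E). The sketch below checks this numerically on a small hypothetical graph (`G_demo` is not part of the notebook).

```python
import networkx as nx
import numpy as np

# Hypothetical toy graph: a 5-node path plus one extra edge
G_demo = nx.path_graph(5)
G_demo.add_edge(0, 3)
nodes = sorted(G_demo.nodes())

A = nx.to_numpy_array(G_demo, nodelist=nodes)
deg = A.sum(axis=1)

P = A / deg[:, None]     # row i holds the uniform step probabilities out of node i
pi = deg / deg.sum()     # candidate stationary distribution: degree / (2 * E)

# A stationary distribution is unchanged by one step of the walk
print("pi is stationary:", np.allclose(pi @ P, pi))
print("pi:", np.round(pi, 3))
```

In practice this means long walks visit high-degree nodes more often, which is worth keeping in mind when interpreting coverage statistics.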

In [23]:
class RandomWalkSampler:
    """
    Random walk sampler with configurable strategies and batch sampling support.
    """
    
    def __init__(self, graph, strategy='uniform', stopping_prob=0.0, seed=None):
        """
        Initialize the random walk sampler.
        
        Args:
            graph: NetworkX graph object
            strategy: Sampling strategy ('uniform' for now, extensible for future strategies)
            stopping_prob: Probability of stopping at each step (0.0 = never stop early)
            seed: Random seed for reproducibility
        """
        self.graph = graph
        self.strategy = strategy
        self.stopping_prob = stopping_prob
        self.seed = seed
        
        if strategy not in ['uniform']:
            raise ValueError(f"Strategy '{strategy}' not supported. Available: ['uniform']")
        
        if seed is not None:
            np.random.seed(seed)
    
    def sample_walk(self, start_node, max_steps):
        """
        Sample a single random walk from a starting node.
        
        Args:
            start_node: Starting node index
            max_steps: Maximum number of steps (may stop early due to stopping_prob)
        
        Returns:
            list: Path of visited nodes
        """
        if self.strategy == 'uniform':
            return self._uniform_walk(start_node, max_steps)
        
    def sample_batch_walks(self, start_nodes, max_steps, num_walks_per_node=1):
        """
        Sample multiple random walks from multiple starting nodes.
        
        Args:
            start_nodes: List of starting node indices or 'all' for all nodes
            max_steps: Maximum number of steps per walk
            num_walks_per_node: Number of walks to sample from each starting node
        
        Returns:
            dict: {start_node: [walk1, walk2, ...]} where each walk is a list of nodes
        """
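        # Compare only when a string is passed; an array-like would compare element-wise
        if isinstance(start_nodes, str) and start_nodes == 'all':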
            start_nodes = list(self.graph.nodes())
        
        batch_results = {}
        
        for start_node in start_nodes:
            walks = []
            for _ in range(num_walks_per_node):
                walk = self.sample_walk(start_node, max_steps)
                walks.append(walk)
            batch_results[start_node] = walks
        
        return batch_results
    
    def _uniform_walk(self, start_node, max_steps):
        """
        Perform uniform random walk with stopping criteria.
        
        Time complexity: O(max_steps * avg_degree) on average,
        O(max_steps * max_degree) in the worst case
        """
        current_node = start_node
        path = [current_node]
        
        for step in range(max_steps):
            # Check stopping criteria
            if np.random.random() < self.stopping_prob:
                break
            
            # Get neighbors of current node
            neighbors = list(self.graph.neighbors(current_node))
            
            if len(neighbors) == 0:
                # If isolated node, stop the walk
                break
            else:
                # Uniformly sample from neighbors
                next_node = np.random.choice(neighbors)
                path.append(next_node)
                current_node = next_node
        
        return path
    
    def get_walk_statistics(self, walks_dict):
        """
        Compute statistics for batch walks.
        
        Args:
            walks_dict: Output from sample_batch_walks
        
        Returns:
            dict: Statistics about the walks
        """
        total_walks = sum(len(walks) for walks in walks_dict.values())
        total_steps = sum(len(walk) - 1 for walks in walks_dict.values() for walk in walks)
        avg_walk_length = total_steps / total_walks if total_walks > 0 else 0
        
        all_nodes_visited = set()
        for walks in walks_dict.values():
            for walk in walks:
                all_nodes_visited.update(walk)
        
        return {
            'total_walks': total_walks,
            'total_steps': total_steps,
            'avg_walk_length': avg_walk_length,
            'unique_nodes_visited': len(all_nodes_visited),
            'graph_coverage': len(all_nodes_visited) / self.graph.number_of_nodes()
        }
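
`RandomWalkSampler.__init__` calls `np.random.seed`, which resets NumPy's global random state, so creating a second sampler reseeds every other consumer of `np.random` as well. A possible alternative, sketched here as a standalone function rather than a change to the class, is to pass a local `np.random.Generator`. The function name `uniform_walk_rng` and its `rng` parameter are hypothetical.

```python
import networkx as nx
import numpy as np

def uniform_walk_rng(graph, start_node, max_steps, stopping_prob=0.0, rng=None):
    """Uniform random walk driven by a local Generator (hypothetical helper)."""
    rng = np.random.default_rng() if rng is None else rng
    path = [start_node]
    current = start_node
    for _ in range(max_steps):
        # Stop early with probability stopping_prob before each step
        if rng.random() < stopping_prob:
            break
        neighbors = list(graph.neighbors(current))
        if not neighbors:
            break  # isolated node: nowhere to go
        # Index into the list so node labels keep their original Python type
        current = neighbors[rng.integers(len(neighbors))]
        path.append(current)
    return path

G_demo = nx.cycle_graph(6)  # hypothetical toy graph
walk_a = uniform_walk_rng(G_demo, 0, 10, 0.1, np.random.default_rng(42))
walk_b = uniform_walk_rng(G_demo, 0, 10, 0.1, np.random.default_rng(42))
print(walk_a == walk_b)  # identical seeds give identical walks
```

Because each generator is independent, two samplers seeded this way would not interfere with one another.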

In [24]:
# Step 4: Create RandomWalkSampler and demonstrate usage
print(f"\nStep 4: Using RandomWalkSampler for random walks...")

# Create sampler with stopping probability
sampler = RandomWalkSampler(G, strategy='uniform', stopping_prob=0.1, seed=SEED)

# Single walk example
print(f"\nSingle walk from node {START_NODE}:")
single_walk = sampler.sample_walk(START_NODE, M)
print(f"Walk path: {single_walk}")
# A path with k nodes covers k - 1 steps
print(f"Walk length: {len(single_walk) - 1} steps ({len(single_walk)} nodes)")

# Batch walks example - sample from first 5 nodes
print(f"\nBatch walks from first 5 nodes (2 walks each):")
batch_walks = sampler.sample_batch_walks(
    start_nodes=list(range(5)), 
    max_steps=M, 
    num_walks_per_node=2
)

for start_node, walks in batch_walks.items():
    print(f"Node {start_node}: {len(walks)} walks")
    for i, walk in enumerate(walks):
        print(f"  Walk {i+1}: {walk} (length: {len(walk)})")

# Get statistics
stats = sampler.get_walk_statistics(batch_walks)
print(f"\nBatch walk statistics:")
for key, value in stats.items():
    if isinstance(value, float):
        print(f"- {key}: {value:.3f}")
    else:
        print(f"- {key}: {value}")

# Example of sampling from all nodes (smaller batch for demo)
print(f"\nSampling from all nodes (1 walk each, max 5 steps):")
sampler_all = RandomWalkSampler(G, strategy='uniform', stopping_prob=0.15, seed=SEED)
all_walks = sampler_all.sample_batch_walks('all', max_steps=5, num_walks_per_node=1)
all_stats = sampler_all.get_walk_statistics(all_walks)
print(f"Total walks: {all_stats['total_walks']}")
print(f"Average walk length: {all_stats['avg_walk_length']:.2f}")
print(f"Graph coverage: {all_stats['graph_coverage']:.2f}")


Step 4: Using RandomWalkSampler for random walks...

Single walk from node 0:
Walk path: [0, 42, 59, 30, 32, 96, 78, 90, 76]
Walk length: 8 steps (9 nodes)

Batch walks from first 5 nodes (2 walks each):
Node 0: 2 walks
  Walk 1: [0, 2, 75] (length: 3)
  Walk 2: [0] (length: 1)
Node 1: 2 walks
  Walk 1: [1, 29, 81, 29, 27, 36, 27] (length: 7)
  Walk 2: [1, 3, 7, 64, 7, 3, 52, 75, 89, 75, 12] (length: 11)
Node 2: 2 walks
  Walk 1: [2, 75, 71, 58] (length: 4)
  Walk 2: [2, 75, 42, 75] (length: 4)
Node 3: 2 walks
  Walk 1: [3, 1, 3, 7, 64, 58, 71, 67, 20, 67, 81] (length: 11)
  Walk 2: [3, 34, 47] (length: 3)
Node 4: 2 walks
  Walk 1: [4, 9, 4, 30] (length: 4)
  Walk 2: [4, 96, 78, 80, 88, 80, 34, 74, 34, 37] (length: 10)

Batch walk statistics:
- total_walks: 10
- total_steps: 48
- avg_walk_length: 4.800
- unique_nodes_visited: 30
- graph_coverage: 0.300

Sampling from all nodes (1 walk each, max 5 steps):
Total walks: 100
Average walk length: 3.02
Graph coverage: 1.00
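
The reported average walk lengths can be compared with a closed-form expectation. With stopping probability p checked before each step and at most M steps, the walk takes at least k steps with probability (1 - p)^k, so the expected number of steps is the sum of (1 - p)^k for k = 1..M. The sketch below assumes every visited node has at least one neighbor (true for the connected graph here); `expected_steps` is a hypothetical helper.

```python
def expected_steps(stopping_prob, max_steps):
    """Expected step count of a walk that stops w.p. stopping_prob before each step."""
    survive = 1.0 - stopping_prob
    # P(walk takes at least k steps) = survive ** k, summed over k = 1..max_steps
    return sum(survive ** k for k in range(1, max_steps + 1))

print(f"p=0.10, M=10: {expected_steps(0.10, 10):.2f} expected steps")
print(f"p=0.15, M=5:  {expected_steps(0.15, 5):.2f} expected steps")
```

These evaluate to about 5.86 and 3.15 steps. The observed averages of 4.80 (10 walks) and 3.02 (100 walks) are plausibly within sampling noise of those values.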


## RandomWalkSampler API

The `RandomWalkSampler` class provides a clean API for random walk sampling:

### Features:
- **Strategy Selection**: Currently supports 'uniform', extensible for future strategies
- **Stopping Criteria**: Configurable `stopping_prob`, the probability of ending the walk before each step
- **Batch Sampling**: Sample multiple walks from multiple start nodes in a single call
- **Statistics**: Built-in methods to analyze walk properties

### Usage Examples:

```python
# Create sampler
sampler = RandomWalkSampler(graph, strategy='uniform', stopping_prob=0.1)

# Single walk
walk = sampler.sample_walk(start_node=0, max_steps=10)

# Batch walks from specific nodes
batch = sampler.sample_batch_walks([0, 1, 2], max_steps=10, num_walks_per_node=5)

# Sample from all nodes
all_walks = sampler.sample_batch_walks('all', max_steps=10)

# Get statistics
stats = sampler.get_walk_statistics(batch)
```

### Time Complexity:
- **Single walk**: O(max_steps × avg_degree)
- **Batch walks**: O(num_start_nodes × num_walks_per_node × max_steps × avg_degree)
- **All nodes**: O(N × num_walks_per_node × max_steps × avg_degree)

Walks are generated one at a time in a Python loop, so total cost grows linearly with the number of walks requested. The sampler is intended for producing walk corpora for graph analysis and machine learning applications.
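
When many walks are needed, one possible direction is to advance all walkers together using the CSR adjacency arrays instead of looping over walks in Python. The following is a rough sketch, not part of `RandomWalkSampler`: it assumes nodes are labeled 0..N-1, that no node is isolated, and it omits the stopping probability. The name `batch_uniform_walks` is hypothetical.

```python
import networkx as nx
import numpy as np

def batch_uniform_walks(graph, start_nodes, max_steps, seed=None):
    """Advance all walkers one step at a time using CSR arrays (hypothetical helper).

    Assumes integer node labels 0..N-1 and no isolated nodes.
    """
    nodes = sorted(graph.nodes())
    A = nx.to_scipy_sparse_array(graph, nodelist=nodes, format="csr")
    indptr, indices = A.indptr, A.indices
    degrees = np.diff(indptr)
    if np.any(degrees == 0):
        raise ValueError("graph contains isolated nodes")

    rng = np.random.default_rng(seed)
    current = np.asarray(start_nodes, dtype=np.int64)
    walks = np.empty((len(current), max_steps + 1), dtype=np.int64)
    walks[:, 0] = current
    for step in range(1, max_steps + 1):
        # Pick a uniform offset into each walker's slice of the neighbor array
        offsets = rng.integers(0, degrees[current])
        current = indices[indptr[current] + offsets]
        walks[:, step] = current
    return walks

G_demo = nx.cycle_graph(8)  # hypothetical toy graph
walks = batch_uniform_walks(G_demo, [0, 2, 4, 6], max_steps=5, seed=0)
print(walks.shape)
```

The Python-level loop now runs once per step rather than once per walk, which tends to matter when the number of walkers is large.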