# Graph Generation

This notebook shows how to generate random graphs and common causal structures using the causal meta-learning library.

This notebook demonstrates how to use the graph generation factory pattern to create different types of graphs. We'll cover:

1. Using the GraphFactory class
2. Creating random graphs
3. Generating scale-free networks
4. Creating predefined graph structures
5. Customizing graph generation parameters
6. Generating task families for meta-learning

Let's get started!

In [None]:
# Import necessary modules
import sys
import os

# Add the root directory to the path to make imports work
root_dir = os.path.abspath(os.path.join(os.getcwd(), '..'))
if root_dir not in sys.path:
    sys.path.append(root_dir)

# Import the causal meta-learning library
from causal_meta.graph import CausalGraph, DirectedGraph
from causal_meta.graph.generators.factory import GraphFactory
import causal_meta.graph.visualization as viz

import numpy as np
import matplotlib.pyplot as plt

## 1. Using the GraphFactory class

The `GraphFactory` class provides a unified interface for creating different types of graphs, following the factory pattern design principle.
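The pattern itself is simple. A registry maps a graph-type name to a generator function, and a single `create_graph` method forwards its keyword arguments to the matching generator. The sketch below is a hypothetical, stripped-down stand-in (`MiniGraphFactory`, `register`, and the plain edge-list return type are assumptions, not the library's API). It only illustrates that dispatch mechanism.

```python
from typing import Callable, Dict, List, Tuple

Edge = Tuple[int, int]


class MiniGraphFactory:
    """Hypothetical minimal factory: maps a type name to a generator function."""

    def __init__(self) -> None:
        self._generators: Dict[str, Callable[..., List[Edge]]] = {}

    def register(self, name: str, generator: Callable[..., List[Edge]]) -> None:
        # Supporting a new graph type only requires registering one function
        self._generators[name] = generator

    def available_graph_types(self) -> List[str]:
        return sorted(self._generators)

    def create_graph(self, graph_type: str, **kwargs) -> List[Edge]:
        # Dispatch to the registered generator; unknown names fail loudly
        try:
            generator = self._generators[graph_type]
        except KeyError:
            raise ValueError(f"Unknown graph type: {graph_type!r}") from None
        return generator(**kwargs)


def chain(num_nodes: int) -> List[Edge]:
    # 0 -> 1 -> ... -> num_nodes - 1
    return [(i, i + 1) for i in range(num_nodes - 1)]


mini = MiniGraphFactory()
mini.register("chain", chain)
print(mini.available_graph_types())             # ['chain']
print(mini.create_graph("chain", num_nodes=4))  # [(0, 1), (1, 2), (2, 3)]
```

The benefit is that calling code depends only on `create_graph` and a type name. New generators can therefore be added without touching existing call sites.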

In [None]:
# Create an instance of the GraphFactory
factory = GraphFactory()

# Display available graph types
print("Available graph types:")
for graph_type in factory.available_graph_types():
    print(f"  - {graph_type}")

## 2. Creating Random Graphs

The factory can generate random graphs using the Erdős–Rényi model, in which each possible edge is included independently with a fixed probability (`edge_probability`).
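To make the model concrete, here is a minimal standalone sketch of a directed G(n, p) generator in plain Python. It is not the library's implementation, and the function name `erdos_renyi_directed` is hypothetical. Treating every ordered pair (i, j) with i ≠ j as a candidate edge is also an assumption: a generator that must return a DAG would consider only pairs with i < j.

```python
import random
from typing import List, Optional, Tuple


def erdos_renyi_directed(num_nodes: int, p: float,
                         seed: Optional[int] = None) -> List[Tuple[int, int]]:
    """Directed G(n, p): each ordered pair (i, j), i != j, becomes an edge with probability p."""
    rng = random.Random(seed)
    return [
        (i, j)
        for i in range(num_nodes)
        for j in range(num_nodes)
        if i != j and rng.random() < p
    ]


er_n, er_p = 10, 0.3
# n * (n - 1) ordered pairs without self-loops, so the expected edge count is n(n-1)p
print(f"expected edges: {er_n * (er_n - 1) * er_p:.1f}")  # expected edges: 27.0

# Averaging over many seeds should land close to that expectation
trials = 2000
mean_edges = sum(len(erdos_renyi_directed(er_n, er_p, seed=s)) for s in range(trials)) / trials
print(f"mean over {trials} graphs: {mean_edges:.2f}")
```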

In [None]:
# Create a random directed graph
random_graph = factory.create_graph(
    graph_type="random",
    num_nodes=10,
    edge_probability=0.3,
    directed=True,
    seed=42  # For reproducibility
)

# Print some basic stats
print(f"Random graph nodes: {len(random_graph.get_nodes())}")
print(f"Random graph edges: {len(random_graph.get_edges())}")
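# A directed graph without self-loops has n * (n - 1) candidate edges, each kept with probability p
print(f"Expected number of edges: {10 * 9 * 0.3:.1f}")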

# Visualize the random graph
plt.figure(figsize=(10, 6))
ax = plt.gca()
viz.plot_graph(random_graph, ax=ax, title="Random Directed Graph (Erdős–Rényi, p=0.3)")
plt.show()

### Varying edge probability in random graphs

Let's see how varying the edge probability affects the structure of random graphs.
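The cell below reuses `seed=42` for every p, and there is a concrete reason this makes the comparison fair. Suppose a generator draws one uniform number per candidate edge and keeps the edge when that number falls below p. With a shared seed, every pair then sees the same draw, so raising p can only add edges. The sketch below demonstrates this coupling with a hypothetical helper (`coupled_random_graphs`). Whether `GraphFactory` draws its random numbers this way is an assumption.

```python
import random
from typing import Dict, Iterable, Set, Tuple

Pair = Tuple[int, int]


def coupled_random_graphs(num_nodes: int, probs: Iterable[float],
                          seed: int) -> Dict[float, Set[Pair]]:
    """Build one directed G(n, p) per p from a single shared set of uniform draws."""
    rng = random.Random(seed)
    # One uniform draw per ordered pair (i, j), i != j, reused for every p
    draws = {(i, j): rng.random()
             for i in range(num_nodes) for j in range(num_nodes) if i != j}
    return {p: {pair for pair, u in draws.items() if u < p} for p in probs}


graphs_by_p = coupled_random_graphs(10, [0.1, 0.3, 0.5, 0.8], seed=42)
for p_val, edge_set in graphs_by_p.items():
    print(f"p={p_val}: {len(edge_set)} edges")

# Each sparser graph is contained in every denser one
ordered = [graphs_by_p[q] for q in sorted(graphs_by_p)]
print(all(a <= b for a, b in zip(ordered, ordered[1:])))  # True
```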

In [None]:
# Create random graphs with different edge probabilities
probabilities = [0.1, 0.3, 0.5, 0.8]
random_graphs = []

for p in probabilities:
    graph = factory.create_graph(
        graph_type="random",
        num_nodes=10,
        edge_probability=p,
        directed=True,
        seed=42  # Use the same seed for fair comparison
    )
    random_graphs.append((graph, f"p={p}"))

# Create a grid of visualizations
fig, axes = plt.subplots(2, 2, figsize=(15, 10))
axes = axes.flatten()

for i, (graph, title) in enumerate(random_graphs):
    ax = axes[i]
    edge_count = len(graph.get_edges())
    viz.plot_graph(graph, ax=ax,
                   title=f"Random Graph with {title} ({edge_count} edges)")

plt.tight_layout()
plt.show()

## 3. Generating Scale-Free Networks

Scale-free networks are characterized by a power-law degree distribution, where a few nodes have many connections (hubs) and most nodes have few connections.
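The Barabási–Albert model produces hubs through preferential attachment. Nodes arrive one at a time, and each new node links to `m` distinct existing nodes, chosen with probability proportional to their current degree. Below is a minimal undirected sketch with a hypothetical function name. Starting from a complete graph on `m + 1` nodes is one common convention and an assumption here, so edge counts may differ slightly from the library's generator.

```python
import random
from collections import Counter
from typing import List, Optional, Tuple


def barabasi_albert(num_nodes: int, m: int,
                    seed: Optional[int] = None) -> List[Tuple[int, int]]:
    """Undirected preferential-attachment sketch; returns an edge list."""
    if m < 1 or num_nodes <= m:
        raise ValueError("require 1 <= m < num_nodes")
    rng = random.Random(seed)
    # Start from a complete graph on m + 1 nodes, so every initial node has degree m
    edges = [(i, j) for i in range(m + 1) for j in range(i + 1, m + 1)]
    # A node appears here once per incident edge, so a uniform pick from this
    # list selects a node with probability proportional to its degree
    endpoints = [v for edge in edges for v in edge]
    for new_node in range(m + 1, num_nodes):
        targets = set()
        while len(targets) < m:  # m distinct existing nodes
            targets.add(rng.choice(endpoints))
        for t in targets:
            edges.append((t, new_node))
            endpoints.extend((t, new_node))
    return edges


ba_edges = barabasi_albert(500, m=2, seed=42)
ba_degree = Counter(v for edge in ba_edges for v in edge)
print(len(ba_edges))  # 997 = 3 initial edges + 497 new nodes * 2
print(min(ba_degree.values()), max(ba_degree.values()))
```

Orienting each new edge between the newcomer and its target gives a directed variant. Which orientation the factory uses with `directed=True` is not stated in this notebook.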

In [None]:
# Create a scale-free network using the Barabási-Albert model
scale_free_graph = factory.create_graph(
    graph_type="scale_free",
    num_nodes=50,
    m=2,  # Number of edges to attach from a new node to existing nodes
    directed=True,  # Create a directed graph
    seed=42  # For reproducibility
)

# Print basic stats
print(f"Scale-free graph nodes: {len(scale_free_graph.get_nodes())}")
print(f"Scale-free graph edges: {len(scale_free_graph.get_edges())}")

# Visualize the scale-free network
plt.figure(figsize=(12, 8))
ax = plt.gca()
viz.plot_graph(scale_free_graph, ax=ax,
               title="Scale-Free Network (Barabási-Albert, m=2)",
               layout="spring",  # Use spring layout to better show the hub structure
               node_size=200)
plt.show()

### Analyzing the degree distribution of scale-free networks

One key characteristic of scale-free networks is their power-law degree distribution, P(k) ∝ k^(-γ). On log-log axes a power law appears as a straight line with slope -γ, so let's check this property empirically.
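The cell below fits a straight line to the log-log histogram. This is a quick visual check, but it is known to be noisy, because the sparse high-degree tail gets as much weight as the well-populated low-degree bins. A commonly recommended alternative is the maximum-likelihood approach of Clauset, Shalizi and Newman, which estimates the exponent directly from all degrees at or above a cutoff `k_min`. The sketch below implements the continuous estimator and its usual discrete approximation. The function name is hypothetical, and `k_min` must be chosen by the user. For reference, the Barabási–Albert model predicts γ = 3 in the large-graph limit.

```python
import math
import random
from typing import Iterable


def power_law_exponent_mle(samples: Iterable[float], x_min: float,
                           discrete: bool = True) -> float:
    """Maximum-likelihood estimate of gamma for p(x) ~ x^(-gamma), x >= x_min."""
    tail = [x for x in samples if x >= x_min]
    if not tail:
        raise ValueError("no samples at or above x_min")
    # The discrete version uses the standard x_min - 1/2 approximation
    scale = x_min - 0.5 if discrete else x_min
    log_sum = sum(math.log(x / scale) for x in tail)
    if log_sum <= 0:
        raise ValueError("all samples equal x_min; exponent is not identifiable")
    return 1.0 + len(tail) / log_sum


# Sanity check on synthetic continuous data with known gamma = 3:
# random.paretovariate(alpha) has density alpha * x^-(alpha + 1) for x >= 1
rng_pl = random.Random(0)
pareto_samples = [rng_pl.paretovariate(2.0) for _ in range(20000)]
print(round(power_law_exponent_mle(pareto_samples, x_min=1.0, discrete=False), 2))
```

Applied to the notebook's degree list, a call would look like `power_law_exponent_mle(degrees, x_min=k_min)`. Keep in mind that the discrete approximation is less accurate for very small `k_min`.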

In [None]:
# Create a larger scale-free network for better statistical analysis
large_scale_free = factory.create_graph(
    graph_type="scale_free",
    num_nodes=500,
    m=2,
    directed=False,  # Use undirected for simpler degree analysis
    seed=42
)

# Calculate the degree of each node
degrees = [len(large_scale_free.get_neighbors(node)) for node in large_scale_free.get_nodes()]

# Count the frequency of each degree
degree_count = {}
for degree in degrees:
    degree_count[degree] = degree_count.get(degree, 0) + 1

# Convert to lists for plotting
x = list(degree_count.keys())
y = list(degree_count.values())

# Plot the degree distribution
plt.figure(figsize=(10, 6))
plt.loglog(x, y, 'o', markersize=8)
plt.grid(True, which="both", ls="-")
plt.xlabel('Degree (log scale)')
plt.ylabel('Frequency (log scale)')
plt.title('Degree Distribution of Scale-Free Network (log-log scale)')

# Fit a straight line in log-log space: log(count) = slope * log(degree) + intercept
from scipy import stats
x_arr = np.array(x, dtype=float)
y_arr = np.array(y, dtype=float)
# Drop degree 0 (isolated nodes), since log(0) is undefined
mask = x_arr > 0
slope, intercept, r_value, p_value, std_err = stats.linregress(np.log(x_arr[mask]), np.log(y_arr[mask]))

# Plot the best-fit line; the power-law exponent is gamma = -slope
x_line = np.array([x_arr[mask].min(), x_arr[mask].max()])
y_line = np.exp(intercept) * x_line**slope
plt.loglog(x_line, y_line, 'r-', linewidth=2,
           label=f'Power law fit: γ = {-slope:.2f}')
plt.legend()
plt.show()

print(f"Power law exponent (γ): {-slope:.2f}")
print(f"R-squared value: {r_value**2:.2f}")

## 4. Creating Predefined Graph Structures

The factory can also create common predefined graph structures, such as chains, trees, and complete graphs.
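The library's internal construction is not shown here, so the following are hypothetical helper functions that build the same four families as plain edge lists. Two choices are assumptions. First, the "complete" graph is oriented as a DAG, with edges only from lower to higher index, giving n(n-1)/2 edges; the factory's `structure="complete"` may instead add both directions. Second, bipartite edges point from the first set to the second.

```python
import random
from typing import List, Optional, Tuple

Edge = Tuple[int, int]


def chain_edges(n: int) -> List[Edge]:
    # 0 -> 1 -> ... -> n - 1
    return [(i, i + 1) for i in range(n - 1)]


def tree_edges(n: int, branching_factor: int = 2) -> List[Edge]:
    # Level-order numbering: node i's parent is (i - 1) // branching_factor
    return [((i - 1) // branching_factor, i) for i in range(1, n)]


def complete_dag_edges(n: int) -> List[Edge]:
    # Every pair i < j, oriented from lower to higher index (acyclic)
    return [(i, j) for i in range(n) for j in range(i + 1, n)]


def bipartite_edges(n1: int, n2: int, p: float,
                    seed: Optional[int] = None) -> List[Edge]:
    # Nodes 0..n1-1 form the first set, n1..n1+n2-1 the second; edges point first -> second
    rng = random.Random(seed)
    return [(u, n1 + v) for u in range(n1) for v in range(n2) if rng.random() < p]


print(chain_edges(5))              # [(0, 1), (1, 2), (2, 3), (3, 4)]
print(tree_edges(7))               # [(0, 1), (0, 2), (1, 3), (1, 4), (2, 5), (2, 6)]
print(len(complete_dag_edges(6)))  # 15
```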

In [None]:
# Create various predefined graph structures
predefined_graphs = []

# Chain graph (linear path)
chain_graph = factory.create_graph(
    graph_type="predefined",
    structure="chain",
    num_nodes=5,
    directed=True
)
predefined_graphs.append((chain_graph, "Chain Graph"))

# Tree graph
tree_graph = factory.create_graph(
    graph_type="predefined",
    structure="tree",
    num_nodes=10,
    branching_factor=2,  # Binary tree
    directed=True
)
predefined_graphs.append((tree_graph, "Binary Tree Graph"))

# Complete graph (every node connected to every other node)
complete_graph = factory.create_graph(
    graph_type="predefined",
    structure="complete",
    num_nodes=6,
    directed=True
)
predefined_graphs.append((complete_graph, "Complete Graph"))

# Bipartite graph
bipartite_graph = factory.create_graph(
    graph_type="predefined",
    structure="bipartite",
    n1=4,  # Nodes in first set
    n2=3,  # Nodes in second set
    edge_probability=0.5,
    directed=True,
    seed=42
)
predefined_graphs.append((bipartite_graph, "Bipartite Graph"))

# Create a grid of visualizations
fig, axes = plt.subplots(2, 2, figsize=(15, 10))
axes = axes.flatten()

for i, (graph, title) in enumerate(predefined_graphs):
    ax = axes[i]
    viz.plot_graph(graph, ax=ax, title=title)

plt.tight_layout()
plt.show()

### Causal graph structures

The factory can also create common causal graph structures for causal inference experiments.
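The three structures differ in the independencies they imply:

- In a fork and in a chain, X and Y are dependent, but they become independent once we condition on Z.
- In a collider, X and Y are independent, but they become dependent when we condition on Z.

The sketch below checks this with simulated linear-Gaussian data, using partial correlation as the measure of conditional dependence. The unit coefficients and noise scales are arbitrary illustrative choices, and the code does not use the library.

```python
import math
import random
from typing import Sequence


def corr(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


def partial_corr(x, y, z) -> float:
    """Correlation between x and y after linearly adjusting both for z."""
    rxy, rxz, ryz = corr(x, y), corr(x, z), corr(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


rng_sem = random.Random(0)
n_samples = 20000


def gaussian_noise() -> float:
    # Independent standard-normal noise term
    return rng_sem.gauss(0.0, 1.0)


# Fork: X <- Z -> Y
fork_z = [gaussian_noise() for _ in range(n_samples)]
fork_x = [z + gaussian_noise() for z in fork_z]
fork_y = [z + gaussian_noise() for z in fork_z]

# Collider: X -> Z <- Y
col_x = [gaussian_noise() for _ in range(n_samples)]
col_y = [gaussian_noise() for _ in range(n_samples)]
col_z = [x + y + gaussian_noise() for x, y in zip(col_x, col_y)]

# Chain: X -> Z -> Y
chain_x = [gaussian_noise() for _ in range(n_samples)]
chain_z = [x + gaussian_noise() for x in chain_x]
chain_y = [z + gaussian_noise() for z in chain_z]

for name, xs, ys, zs in [("fork", fork_x, fork_y, fork_z),
                         ("collider", col_x, col_y, col_z),
                         ("chain", chain_x, chain_y, chain_z)]:
    print(f"{name:9s} corr(X,Y) = {corr(xs, ys):+.2f}   "
          f"partial corr(X,Y | Z) = {partial_corr(xs, ys, zs):+.2f}")
```

With this parameterization, the population values of corr(X,Y) and partial corr(X,Y | Z) are:

- fork: +0.50 and 0
- collider: 0 and -0.50
- chain: about +0.58 and 0

The printed sample estimates should be close to these.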

In [None]:
# Create common causal graph structures
causal_graphs = []

# Fork structure (X ← Z → Y)
fork_graph = factory.create_graph(
    graph_type="predefined",
    structure="fork",
    labels=['X', 'Z', 'Y'],
    directed=True
)
causal_graphs.append((fork_graph, "Fork (X ← Z → Y)"))

# Collider structure (X → Z ← Y)
collider_graph = factory.create_graph(
    graph_type="predefined",
    structure="collider",
    labels=['X', 'Z', 'Y'],
    directed=True
)
causal_graphs.append((collider_graph, "Collider (X → Z ← Y)"))

# Chain/Mediator structure (X → Z → Y)
chain_causal_graph = factory.create_graph(
    graph_type="predefined",
    structure="chain",
    labels=['X', 'Z', 'Y'],
    directed=True
)
causal_graphs.append((chain_causal_graph, "Chain/Mediator (X → Z → Y)"))

# Create a grid of visualizations
fig, axes = plt.subplots(1, 3, figsize=(18, 6))
axes = axes.flatten()

for i, (graph, title) in enumerate(causal_graphs):
    ax = axes[i]
    viz.plot_causal_graph(graph, ax=ax, title=title,
                          node_size=1000,
                          node_color='lightblue',
                          font_size=14)

plt.tight_layout()
plt.show()

## 5. Customizing Graph Generation Parameters

Beyond the built-in generators, the factory lets you build a graph from an explicit adjacency matrix, attach node labels, and inject random noise edges.

In [None]:
# Create a custom graph from an adjacency matrix
# Define a 5-node adjacency matrix; it is strictly upper-triangular, so the graph is acyclic
adj_matrix = np.array([
    [0, 1, 1, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0]
])

# Create graph from adjacency matrix
custom_graph = factory.create_graph(
    graph_type="predefined",
    structure="from_adjacency",
    adjacency_matrix=adj_matrix,
    directed=True,
    labels=['A', 'B', 'C', 'D', 'E']
)

# Visualize the custom graph
plt.figure(figsize=(10, 6))
ax = plt.gca()
viz.plot_graph(custom_graph, ax=ax, title="Custom Graph from Adjacency Matrix")
plt.show()

# Display the adjacency matrix
plt.figure(figsize=(8, 6))
plt.imshow(adj_matrix, cmap='Blues')
plt.colorbar(label='Edge presence')
plt.title('Adjacency Matrix of Custom Graph')
plt.xticks(range(5), ['A', 'B', 'C', 'D', 'E'])
plt.yticks(range(5), ['A', 'B', 'C', 'D', 'E'])
plt.grid(False)
plt.show()
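A graph built from an adjacency matrix is a valid causal DAG only if it has no directed cycles. The sketch below does two things. It converts a matrix to a labeled edge list, assuming the common convention that entry `[i][j] = 1` means an edge from node i to node j (this may not match the library's convention). It then runs Kahn's algorithm, which returns a topological order, or `None` when a cycle exists. The function names are hypothetical, and the demo uses the same matrix and labels as above.

```python
from collections import deque
from typing import List, Optional, Sequence, Tuple


def adjacency_to_edges(matrix: Sequence[Sequence[int]],
                       labels: Sequence[str]) -> List[Tuple[str, str]]:
    # Nonzero entry at row i, column j -> directed edge labels[i] -> labels[j]
    return [(labels[i], labels[j])
            for i, row in enumerate(matrix)
            for j, value in enumerate(row) if value]


def topological_order(matrix: Sequence[Sequence[int]],
                      labels: Sequence[str]) -> Optional[List[str]]:
    """Kahn's algorithm: repeatedly remove nodes with no remaining incoming edges."""
    n = len(matrix)
    in_degree = [sum(1 for i in range(n) if matrix[i][j]) for j in range(n)]
    queue = deque(j for j in range(n) if in_degree[j] == 0)
    order = []
    while queue:
        i = queue.popleft()
        order.append(labels[i])
        for j in range(n):
            if matrix[i][j]:
                in_degree[j] -= 1
                if in_degree[j] == 0:
                    queue.append(j)
    # Any node that never reached in-degree 0 lies on (or behind) a cycle
    return order if len(order) == n else None


demo_matrix = [
    [0, 1, 1, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0],
]
demo_labels = ["A", "B", "C", "D", "E"]
print(adjacency_to_edges(demo_matrix, demo_labels))
print(topological_order(demo_matrix, demo_labels))  # ['A', 'B', 'C', 'D', 'E']
```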

### Adding noise to predefined structures

The factory allows adding random noise (additional edges) to predefined structures, which can be useful for testing algorithm robustness.
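If the base structure is a DAG, adding noise edges naively can create cycles. One way to avoid that is to consider only pairs i < j and add each missing edge with probability `noise`. This assumes nodes are numbered in a topological order, which holds for the level-order tree numbering used below, where every parent precedes its children. The sketch implements that approach with a hypothetical helper (`add_noise_edges`). Whether the factory's `noise_edges` parameter works exactly this way is an assumption.

```python
import random
from typing import Optional, Set, Tuple

Edge = Tuple[int, int]


def add_noise_edges(num_nodes: int, base_edges: Set[Edge], noise: float,
                    seed: Optional[int] = None) -> Set[Edge]:
    """Return base_edges plus random forward edges (i < j), each added with probability noise."""
    rng = random.Random(seed)
    noisy = set(base_edges)
    for i in range(num_nodes):
        for j in range(i + 1, num_nodes):
            # Only forward edges, so a topologically ordered DAG stays acyclic
            if (i, j) not in base_edges and rng.random() < noise:
                noisy.add((i, j))
    return noisy


# Binary tree on 10 nodes: the parent of node k is (k - 1) // 2
demo_tree = {((k - 1) // 2, k) for k in range(1, 10)}
for level in [0.0, 0.05, 0.1, 0.2]:
    noisy_tree = add_noise_edges(10, demo_tree, level, seed=42)
    print(f"noise={level:.2f}: +{len(noisy_tree) - len(demo_tree)} edges")
```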

In [None]:
# Create a tree graph with different levels of random noise
noise_levels = [0.0, 0.05, 0.1, 0.2]
noisy_graphs = []

for noise in noise_levels:
    graph = factory.create_graph(
        graph_type="predefined",
        structure="tree",
        num_nodes=10,
        branching_factor=2,
        directed=True,
        noise_edges=noise,  # Probability of adding random edges
        seed=42  # Same seed for fair comparison
    )
    noisy_graphs.append((graph, f"Noise: {noise:.2f}"))

# Create a grid of visualizations
fig, axes = plt.subplots(2, 2, figsize=(15, 10))
axes = axes.flatten()

for i, (graph, title) in enumerate(noisy_graphs):
    ax = axes[i]
    edge_count = len(graph.get_edges())
    base_edges = 10 - 1  # Any tree with n nodes has n - 1 edges
    extra_edges = edge_count - base_edges
    viz.plot_graph(graph, ax=ax,
                   title=f"Tree with {title} (+{extra_edges} edges)")

plt.tight_layout()
plt.show()

## 6. Generating Task Families for Meta-Learning

For meta-learning applications, we often need to generate families of related graphs with controlled variations. Let's see how to create a task family based on a seed graph.
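One simple way to generate such a family is to flip each candidate edge of the seed DAG independently with probability `edge_perturbation_prob`: a present edge is removed, and an absent one is added. Restricting candidates to pairs i < j keeps every variant acyclic, assuming the seed graph's node indices are in topological order. That assumption holds for the upper-triangular matrix used in the cell below. The sketch uses a hypothetical helper (`perturb_dag`) and illustrates an assumed mechanism, not necessarily what `create_task_family` does.

```python
import random
from typing import List, Optional, Set, Tuple

Edge = Tuple[int, int]


def perturb_dag(num_nodes: int, seed_edges: Set[Edge], flip_prob: float,
                num_graphs: int, seed: Optional[int] = None) -> List[Set[Edge]]:
    """Generate num_graphs variants of a DAG by flipping forward edges (i < j) at random."""
    rng = random.Random(seed)
    family = []
    for _ in range(num_graphs):
        variant = set(seed_edges)
        for i in range(num_nodes):
            for j in range(i + 1, num_nodes):
                if rng.random() < flip_prob:
                    # Toggle: remove the edge if present, add it if absent
                    variant ^= {(i, j)}
        family.append(variant)
    return family


# Edges of the upper-triangular seed matrix used in this section
seed_dag = {(0, 1), (0, 2), (1, 3), (2, 3), (2, 4), (3, 4)}
for k, variant in enumerate(perturb_dag(5, seed_dag, 0.2, num_graphs=5, seed=42), start=1):
    print(f"variant {k}: added {len(variant - seed_dag)}, removed {len(seed_dag - variant)}")
```

Flipping rather than only adding or only removing keeps the expected edit distance from the seed graph controlled by a single parameter.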

In [None]:
# Create a seed graph (a simple DAG)
seed_graph = factory.create_graph(
    graph_type="predefined",
    structure="from_adjacency",
    adjacency_matrix=np.array([
        [0, 1, 1, 0, 0],
        [0, 0, 0, 1, 0],
        [0, 0, 0, 1, 1],
        [0, 0, 0, 0, 1],
        [0, 0, 0, 0, 0]
    ]),
    directed=True
)

# Generate a family of related graphs by perturbing the seed graph
num_variations = 5
edge_perturbation_prob = 0.2
task_family = factory.create_task_family(
    seed_graph=seed_graph,
    num_graphs=num_variations,
    edge_perturbation_prob=edge_perturbation_prob,
    seed=42
)

# Visualize the seed graph and its variations
fig, axes = plt.subplots(2, 3, figsize=(18, 12))
axes = axes.flatten()

# Visualize the seed graph
viz.plot_graph(seed_graph, ax=axes[0], title="Seed Graph")

# Visualize the variations
for i, graph in enumerate(task_family):
    ax = axes[i+1]
    viz.plot_graph(graph, ax=ax, title=f"Variation {i+1}")
    
    # Compare with the seed graph (convert to sets so set difference works for any edge collection)
    seed_edges = set(seed_graph.get_edges())
    var_edges = set(graph.get_edges())
    added = len(var_edges - seed_edges)
    removed = len(seed_edges - var_edges)
    ax.set_xlabel(f"Added: {added}, Removed: {removed}")

# Remove any unused subplots
for j in range(1 + len(task_family), len(axes)):
    fig.delaxes(axes[j])

plt.tight_layout()
plt.show()

## Summary

In this notebook, we explored the graph generation capabilities of the causal meta-learning library using the factory pattern:

1. The `GraphFactory` class provides a unified interface for creating different types of graphs
2. Random graphs can be generated with configurable edge probabilities
3. Scale-free networks exhibit power-law degree distributions with hub nodes
4. Predefined graph structures include chains, trees, complete graphs, and common causal structures
5. Graphs can be customized with various parameters including noise, adjacency matrices, and node labels
6. Task families can be generated for meta-learning applications

These generators make it straightforward to create synthetic datasets for testing causal discovery algorithms, running simulation studies, and developing meta-learning approaches to causal inference and optimization.