# Stage 2: Pipeline Graph Construction & Visualization

This notebook demonstrates how to construct and visualize machine learning pipeline graphs, represented as directed acyclic graphs (DAGs), using the C60.ai framework.

In [None]:
import matplotlib.pyplot as plt
import networkx as nx
from c60.engine.graph_schema import DAG, Node, Edge, NodeType


## Construct a Sample Pipeline Graph

We will build a simple pipeline: Imputer → Scaler → PCA → RandomForestClassifier.
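
Because this pipeline is a straight chain, its edges are simply each step paired with the step that follows it. The helper below, `chain_edges`, is a hypothetical convenience and not part of C60.ai. It derives those pairs from the ordered step names, and each pair could then be wrapped in an `Edge(source=..., target=...)` call.

```python
def chain_edges(steps):
    """Return (source, target) pairs linking each step to the next one (hypothetical helper)."""
    # zip pairs each element with its successor; the last step gets no outgoing edge
    return list(zip(steps, steps[1:]))


steps = ['imputer', 'scaler', 'pca', 'rf']
print(chain_edges(steps))
# [('imputer', 'scaler'), ('scaler', 'pca'), ('pca', 'rf')]
```

Branching pipelines (for example, two preprocessors feeding one estimator) cannot be described this way and need their edges listed explicitly.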

In [None]:
# Define nodes
imputer = Node(node_id='imputer', node_type=NodeType.PREPROCESSOR, parameters={'strategy': 'mean'}, description='Impute missing values')
scaler = Node(node_id='scaler', node_type=NodeType.PREPROCESSOR, parameters={'type': 'standard'}, description='Standard scaling')
pca = Node(node_id='pca', node_type=NodeType.FEATURE_SELECTOR, parameters={'n_components': 2}, description='Principal Component Analysis')
rf = Node(node_id='rf', node_type=NodeType.ESTIMATOR, parameters={'n_estimators': 100}, description='Random Forest Classifier')

# Create DAG
dag = DAG()
dag.add_node(imputer)
dag.add_node(scaler)
dag.add_node(pca)
dag.add_node(rf)

# Add edges
dag.add_edge(Edge(source='imputer', target='scaler'))
dag.add_edge(Edge(source='scaler', target='pca'))
dag.add_edge(Edge(source='pca', target='rf'))

# Validate DAG
dag.validate()
print('Pipeline DAG is valid!')
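
This notebook does not show what `DAG.validate()` checks internally. A typical DAG validator at least confirms that every edge refers to a known node and that the graph has no cycles. The sketch below is a standalone illustration of those two checks using Kahn's algorithm. It is independent of C60.ai, and the function name `topological_order` is hypothetical.

```python
from collections import deque


def topological_order(nodes, edges):
    """Kahn's algorithm: return nodes in dependency order, or raise ValueError on a bad graph."""
    node_set = set(nodes)
    indegree = {n: 0 for n in nodes}
    successors = {n: [] for n in nodes}
    for src, dst in edges:
        # Every edge must connect two declared nodes
        if src not in node_set or dst not in node_set:
            raise ValueError(f'edge {src}->{dst} references an unknown node')
        successors[src].append(dst)
        indegree[dst] += 1

    # Start from nodes with no incoming edges (the pipeline's entry points)
    queue = deque(n for n in nodes if indegree[n] == 0)
    order = []
    while queue:
        n = queue.popleft()
        order.append(n)
        for m in successors[n]:
            indegree[m] -= 1
            if indegree[m] == 0:
                queue.append(m)

    # Any node that was never emitted lies on (or behind) a cycle
    if len(order) != len(nodes):
        raise ValueError('graph contains a cycle')
    return order


pipeline_nodes = ['imputer', 'scaler', 'pca', 'rf']
pipeline_edges = [('imputer', 'scaler'), ('scaler', 'pca'), ('pca', 'rf')]
print(topological_order(pipeline_nodes, pipeline_edges))
# ['imputer', 'scaler', 'pca', 'rf']
```

A side benefit of this approach is that the order it returns is also a valid execution order for the pipeline steps.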

## Visualize the Pipeline Graph

We convert the C60.ai DAG into a networkx `DiGraph`, label each node with its type and ID, and draw it with matplotlib.

In [None]:
G = nx.DiGraph()
for node_id, node in dag.nodes.items():
    # Two-line label: node type on the first line, node ID on the second
    G.add_node(node_id, label=f'{node.node_type.value}\n{node_id}')
for edge in dag.edges:
    G.add_edge(edge.source, edge.target)

pos = nx.spring_layout(G, seed=42)
labels = nx.get_node_attributes(G, 'label')
plt.figure(figsize=(8, 4))
nx.draw(G, pos, with_labels=True, labels=labels, node_size=2000, node_color='skyblue', font_size=10, font_weight='bold', arrowsize=20)
plt.title('Pipeline DAG Structure')
plt.show()
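
`nx.spring_layout` is force-directed, so even a straight chain may be drawn bent or in an arbitrary orientation. One alternative is a layered, left-to-right layout: each node's x coordinate is its longest-path depth from a source node, and nodes that share a depth are spread vertically. The helper `layered_positions` below is a hypothetical sketch that returns a plain `{node: (x, y)}` dict, the same shape networkx drawing functions accept as `pos`. It assumes the graph is acyclic.

```python
from collections import defaultdict


def layered_positions(nodes, edges):
    """Map each node to (x, y): x is its longest-path depth from a source, y its slot in that layer."""
    preds = defaultdict(list)
    for src, dst in edges:
        preds[dst].append(src)

    depth = {}

    def node_depth(n):
        # Memoized recursion; source nodes (no predecessors) get depth 0
        if n not in depth:
            depth[n] = 1 + max((node_depth(p) for p in preds[n]), default=-1)
        return depth[n]

    layers = defaultdict(list)
    for n in nodes:
        layers[node_depth(n)].append(n)

    pos = {}
    for x, members in layers.items():
        for i, n in enumerate(members):
            # Centre each layer vertically around y = 0
            pos[n] = (float(x), i - (len(members) - 1) / 2)
    return pos
```

With the graph built above, `pos = layered_positions(list(G.nodes), list(G.edges))` can replace the `nx.spring_layout` call. Recent networkx releases also provide `nx.topological_generations` and `nx.multipartite_layout`, which can produce a similar layered drawing.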

## Export and Import Pipeline Graphs

You can serialize the pipeline DAG to a dictionary (or JSON) and reload it later.

In [None]:
dag_dict = dag.to_dict()
print('Serialized DAG:', dag_dict)

# Reconstruct from the dict and check that the node ordering survived the round trip
dag2 = DAG.from_dict(dag_dict)
assert dag2.topological_sort() == dag.topological_sort()
print('DAG successfully reconstructed from dict!')
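
The cell above round-trips only through an in-memory Python dict. Writing it to disk as JSON additionally requires every value to be JSON-compatible (strings, numbers, booleans, `None`, lists, dicts). Whether `dag.to_dict()` guarantees this, for example how it encodes `NodeType`, is an assumption worth checking. The sketch below uses only the standard-library `json` module. The helper names and the `example` dict layout are hypothetical and do not reflect C60.ai's actual schema.

```python
import json
import tempfile
from pathlib import Path


def save_json(data, path):
    """Write a dict to disk as JSON; json.dumps raises TypeError on values it cannot encode."""
    Path(path).write_text(json.dumps(data, indent=2, sort_keys=True))


def load_json(path):
    """Read a JSON file back into Python objects."""
    return json.loads(Path(path).read_text())


# Hypothetical layout; the real structure returned by dag.to_dict() may differ
example = {
    'nodes': {
        'imputer': {'node_type': 'preprocessor', 'parameters': {'strategy': 'mean'}},
        'scaler': {'node_type': 'preprocessor', 'parameters': {'type': 'standard'}},
    },
    'edges': [{'source': 'imputer', 'target': 'scaler'}],
}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / 'pipeline.json'
    save_json(example, path)
    restored = load_json(path)

print(restored == example)
# True
```

One caveat: JSON has no tuple type, so any tuples in the dict come back as lists, and an equality check against the original dict would then fail even though the content is unchanged.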

## Summary

- Built a modular pipeline DAG using the C60.ai framework.
- Visualized the pipeline structure.
- Demonstrated serialization and deserialization.

Next: We will explore pipeline mutation and evolutionary search.