# Structure Learning in Bayesian Networks
### Based on Chapter 18 of *Probabilistic Graphical Models* by Koller & Friedman
### Presented By: Serkalem Negusse


## 🎯 Objectives
- Understand structure learning concepts from Chapter 18
- Apply constraint-based and score-based learning
- Visualize learned Bayesian networks
- Evaluate learned models against ground truth


## Content Outline
1. Introduction to Structure Learning
2. Constraint-Based Approaches
3. Score-Based Approaches
4. Structure Search Methods
5. Bayesian Model Averaging
6. Learning with Additional Structure
7. Practical Applications
8. Practical Examples
9. Visualization Recommendations
10. Summary

In [1]:

!pip install pgmpy networkx matplotlib pandas scikit-learn


## Introduction to Structure Learning
- Learn Bayesian network structure from data
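- **Goal:** Find a graph G that explains a dataset D of samples drawn from an underlying distribution P*(X)
- **Key challenge:** multiple structures can represent the same independencies (I-equivalence), so data alone cannot tell them apart
- Used for discovering candidate causal patterns or for improving predictions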
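
The I-equivalence point can be checked on a three-variable example. The sketch below relies on the classical characterization that two DAGs are I-equivalent exactly when they share a skeleton and a set of v-structures; the helper names (`skeleton`, `v_structures`, `i_equivalent`) are illustrative and not part of pgmpy.

```python
from itertools import combinations

def skeleton(edges):
    """Undirected version of a set of directed edges."""
    return {frozenset(e) for e in edges}

def v_structures(edges):
    """Colliders x -> z <- y whose parents x and y are not adjacent."""
    skel = skeleton(edges)
    parents = {}
    for x, z in edges:
        parents.setdefault(z, set()).add(x)
    return {
        (x, z, y)
        for z, pa in parents.items()
        for x, y in combinations(sorted(pa), 2)
        if frozenset((x, y)) not in skel
    }

def i_equivalent(g1, g2):
    """Same skeleton and same v-structures."""
    return skeleton(g1) == skeleton(g2) and v_structures(g1) == v_structures(g2)

chain = {("X", "Y"), ("Y", "Z")}      # X -> Y -> Z
reverse = {("Y", "X"), ("Z", "Y")}    # X <- Y <- Z
collider = {("X", "Y"), ("Z", "Y")}   # X -> Y <- Z

print(i_equivalent(chain, reverse))   # True
print(i_equivalent(chain, collider))  # False
```

The chain and its reversal encode the same independencies, so no amount of observational data separates them; only the collider is distinguishable.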

## Constraint-Based Approaches
- Uses conditional independence (CI) tests to learn structure
- Example: PC algorithm (skeleton + v-structure orientation)
- Pros: Intuitive, explainable
- Cons: Sensitive to errors in CI testing
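
The two phases named above (skeleton, then v-structure orientation) can be sketched on a toy collider A -> C <- B. This is a minimal illustration, not pgmpy's implementation: the `oracle` function is a hypothetical stand-in for a statistical CI test and simply encodes the independencies of the true graph.

```python
from itertools import combinations

def pc_skeleton(nodes, indep):
    """Remove the edge x - y once some conditioning set makes x and y independent."""
    adj = {v: set(nodes) - {v} for v in nodes}
    sepset = {}
    size = 0
    while any(len(adj[x]) - 1 >= size for x in nodes):
        for x in nodes:
            for y in list(adj[x]):
                for cond in combinations(sorted(adj[x] - {y}), size):
                    if indep(x, y, set(cond)):
                        adj[x].discard(y)
                        adj[y].discard(x)
                        sepset[frozenset((x, y))] = set(cond)
                        break
        size += 1
    return adj, sepset

def orient_v_structures(adj, sepset):
    """Orient x -> z <- y when x, y are non-adjacent and z is not in their separating set."""
    arrows = set()
    for z in adj:
        for x, y in combinations(sorted(adj[z]), 2):
            if y not in adj[x] and z not in sepset[frozenset((x, y))]:
                arrows |= {(x, z), (y, z)}
    return arrows

def oracle(x, y, cond):
    """Hypothetical perfect CI test for A -> C <- B: A and B are independent unless C is given."""
    return {x, y} == {"A", "B"} and "C" not in cond

adj, sepset = pc_skeleton(["A", "B", "C"], oracle)
arrows = orient_v_structures(adj, sepset)
print(sorted(arrows))  # [('A', 'C'), ('B', 'C')]
```

With real data the oracle is replaced by a test such as chi-square, which is where the sensitivity to CI-testing errors comes from: one wrong answer removes or keeps an edge for the rest of the run.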

## Score-Based Approaches
- Assign a score to each structure (e.g., BIC, BDe)
- Optimize over possible structures
- Balances model fit and complexity
- Often more robust to noisy data than constraint-based methods, because no single failed independence test decides the structure
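
To make the fit-versus-complexity balance concrete, here is a minimal sketch of the BIC score, written as the maximized log-likelihood minus (log M / 2) times the number of free parameters. It assumes binary variables and uses synthetic data generated in the block itself; `bic_score` and the `parents` dictionary format are illustrative choices, not pgmpy's API.

```python
import math
import random
from collections import Counter

def bic_score(rows, parents):
    """BIC of a structure given as {variable: tuple_of_parents}; all variables binary."""
    m = len(rows)
    loglik, dim = 0.0, 0
    for var, pa in parents.items():
        joint = Counter((tuple(r[p] for p in pa), r[var]) for r in rows)
        marginal = Counter(tuple(r[p] for p in pa) for r in rows)
        # Maximum-likelihood log-likelihood contribution of this family.
        loglik += sum(n * math.log(n / marginal[cfg]) for (cfg, _), n in joint.items())
        # One free parameter per parent configuration for a binary child.
        dim += 2 ** len(pa)
    return loglik - 0.5 * math.log(m) * dim

# Synthetic data: Y copies X 90% of the time, so X and Y are strongly dependent.
random.seed(0)
rows = []
for _ in range(500):
    x = random.random() < 0.5
    y = x if random.random() < 0.9 else not x
    rows.append({"X": int(x), "Y": int(y)})

empty = {"X": (), "Y": ()}
with_edge = {"X": (), "Y": ("X",)}
print(bic_score(rows, empty), bic_score(rows, with_edge))
```

The structure with the edge pays a larger penalty (three parameters instead of two) but gains far more in likelihood, so it scores higher on this data.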

## Structure Search
- The search space (the number of DAGs) grows super-exponentially in the number of variables
- Use heuristics: hill-climbing, tabu search, genetic algorithms, MCMC
- Trade-off between completeness and computational cost
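
The size of the search space can be computed directly. The sketch below counts labeled DAGs on n nodes with Robinson's recurrence; `count_dags` is an illustrative name.

```python
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def count_dags(n):
    """Number of labeled DAGs on n nodes (Robinson's recurrence)."""
    if n == 0:
        return 1
    return sum(
        (-1) ** (k + 1) * comb(n, k) * 2 ** (k * (n - k)) * count_dags(n - k)
        for k in range(1, n + 1)
    )

for n in range(1, 6):
    print(n, count_dags(n))  # counts: 1, 3, 25, 543, 29281
```

Exhaustive enumeration is already impractical for a handful of variables, which is why heuristic search is the norm.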

## Bayesian Model Averaging
- Instead of committing to one structure, average over many, weighting each by its posterior probability
- Reduces overfitting and gives edge confidence
- Useful in domains with noisy or sparse data
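
As a sketch of how edge confidence falls out of averaging: the posterior probability of an edge is the total posterior weight of the structures that contain it. The example assumes a uniform prior over a small candidate list and treats the given log-scores as log marginal likelihoods; the structures and scores are made up for illustration.

```python
import math

def edge_posteriors(structures, log_scores):
    """Edge probabilities from structures weighted by normalized exp(log_score)."""
    top = max(log_scores)  # subtract the maximum for numerical stability
    weights = [math.exp(s - top) for s in log_scores]
    total = sum(weights)
    probs = {}
    for edges, w in zip(structures, weights):
        for edge in edges:
            probs[edge] = probs.get(edge, 0.0) + w / total
    return probs

structures = [
    {("A", "B")},
    {("A", "B"), ("B", "C")},
    {("B", "C")},
]
log_scores = [0.0, math.log(2.0), 0.0]  # made-up scores: posterior weights 1 : 2 : 1
probs = edge_posteriors(structures, log_scores)
for edge, p in sorted(probs.items()):
    print(edge, round(p, 2))
```

In practice the candidate list cannot be enumerated, so the sum is approximated, for example by MCMC over structures or by bootstrap resampling.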

## Learning with Additional Structure
- Incorporate structured CPDs: noisy-OR, tree-CPDs, parameter sharing
- Template models for relational domains (e.g., plate models)
- Improves generalization with fewer parameters
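
The parameter saving from a structured CPD is easy to see with noisy-OR. This is a minimal sketch for binary parents; the function name and the activation probabilities are illustrative.

```python
def noisy_or(parent_values, activation_probs, leak=0.0):
    """P(Y = 1 | parents) under a noisy-OR CPD with binary parents."""
    p_off = 1.0 - leak  # probability that the leak does not turn Y on
    for x, lam in zip(parent_values, activation_probs):
        if x:
            p_off *= 1.0 - lam  # each active parent independently fails to turn Y on
    return 1.0 - p_off

print(round(noisy_or([1, 0], [0.8, 0.6]), 3))  # 0.8
print(round(noisy_or([1, 1], [0.8, 0.6]), 3))  # 0.92

# Free parameters for a binary child with k binary parents:
k = 10
print(2 ** k, k + 1)  # full table vs. noisy-OR (k activations plus a leak): 1024 11
```

With fewer parameters per family, adding a parent costs less in the score, which changes which structures the search prefers.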

## Practical Applications
- Healthcare: Alarm Network for ICU patient monitoring
- Bioinformatics: Gene regulatory networks
- Recommendation Systems: Collaborative filtering
- Education: Student knowledge modeling
- Marketing: Customer behavior prediction

## Practical Examples
- Alarm Network: Infers medical events from vital signs
- Collaborative Filtering: Learn structure of preferences
- Weather-Football-Coffee: Causal toy model structure learning

## Visualization Recommendations
- Directed Acyclic Graphs (DAGs): Key format for learned structures
- Heatmaps: Confidence scores of edges
- Search Trees: Trace structure learning process
- Tools: pgmpy, bnlearn, GeNIe, Tetrad

## Summary
- Structure learning enables discovery of data-generating processes
- Constraint-based methods test for conditional independencies; score-based methods search for the structure with the best score
- Visualization crucial for interpreting model insights
- Used widely in real-world applications across fields

## 📥 Load Dataset

In [2]:

import pandas as pd

# Load the Asia example network (8 binary variables) and sample 1,000 rows from it.
# Replace `data` with your own DataFrame to learn from different data.
from pgmpy.utils import get_example_model
asia_model = get_example_model('asia')
data = asia_model.simulate(n_samples=1000, seed=42)
data.head()

AttributeError: First get states of variables, edges, parents and network name

## 🔍 Constraint-Based Learning (PC Algorithm)

In [None]:

from pgmpy.estimators import PC

pc = PC(data)
model_pc = pc.estimate()
model_pc.edges()


### 🖼 Visualize PC Learned Structure

In [None]:

import matplotlib.pyplot as plt
import networkx as nx

plt.figure(figsize=(8,6))
# The estimated graph is already a networkx DiGraph subclass, so it can be drawn directly.
nx.draw(model_pc, with_labels=True, node_color='lightblue', node_size=2000)
plt.title("PC Algorithm Structure")
plt.show()


## 📊 Score-Based Learning (BIC)

In [None]:

from pgmpy.estimators import HillClimbSearch, BicScore

hc = HillClimbSearch(data)
model_bic = hc.estimate(scoring_method=BicScore(data))
model_bic.edges()


### 🖼 Visualize BIC Learned Structure

In [None]:

plt.figure(figsize=(8,6))
# The estimated DAG is already a networkx DiGraph subclass, so it can be drawn directly.
nx.draw(model_bic, with_labels=True, node_color='lightgreen', node_size=2000)
plt.title("Score-Based (BIC) Structure")
plt.show()


## ✅ Compare with True Model

In [None]:

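# Local structural Hamming distance helper; it does not assume that the installed
# pgmpy version ships a `hamming_distance` function in pgmpy.metrics.
def hamming_distance(learned, true):
    a, b = set(learned.edges()), set(true.edges())
    flipped = {frozenset(e) for e in a - b if e[::-1] in b}
    return len(a ^ b) - len(flipped)  # a reversed edge counts once, not twice

true_model = asia_model
print("Hamming Distance (PC):", hamming_distance(model_pc, true_model))
print("Hamming Distance (BIC):", hamming_distance(model_bic, true_model))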



## 📌 Summary
- **PC Algorithm** uses conditional independence tests to recover structure.
- **Score-based (BIC)** methods optimize a global score.
- **Evaluation** with the Hamming distance measures how close each learned structure is to the ground truth.
- This project demonstrates the core ideas from **Chapter 18** using real Bayesian network tools.


## 🔁 Bayesian Model Averaging (Approximation)

In [None]:

from modules.bma_and_parameter_learning import sample_structures

edge_probs, sampled_models = sample_structures(data, num_samples=20)
for edge, prob in sorted(edge_probs.items(), key=lambda x: -x[1]):
    print(f"Edge {edge} - Posterior Probability: {prob:.2f}")
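
The `sample_structures` helper lives in a project module that is not shown here. One common approximation it may resemble is bootstrap resampling: relearn a structure on each resample and report how often each edge appears. The sketch below is an assumption about that idea, not the module's code; `bootstrap_edge_frequencies` and `toy_learner` are hypothetical names, and the resulting frequencies are confidence estimates rather than exact posterior probabilities.

```python
import random
from collections import Counter

def bootstrap_edge_frequencies(rows, learn_structure, num_samples=20, seed=0):
    """Fraction of bootstrap resamples in which each directed edge is learned."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(num_samples):
        resample = [rng.choice(rows) for _ in rows]  # sample rows with replacement
        counts.update(set(learn_structure(resample)))
    return {edge: n / num_samples for edge, n in counts.items()}

def toy_learner(sample):
    """Hypothetical stand-in for a structure learner: adds X -> Y if X and Y mostly agree."""
    agree = sum(r["X"] == r["Y"] for r in sample) / len(sample)
    return [("X", "Y")] if agree > 0.7 else []

rows = [{"X": i % 2, "Y": i % 2} for i in range(50)]  # made-up data where Y always equals X
print(bootstrap_edge_frequencies(rows, toy_learner))  # {('X', 'Y'): 1.0}
```

Any learner that maps a dataset to a list of edges can be plugged in for `toy_learner`.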


### 🖼 Visualize Edge Probabilities (as Graph)

In [None]:

import networkx as nx
import matplotlib.pyplot as plt

G = nx.DiGraph()  # directed, so (u, v) and (v, u) keep separate probabilities
for edge, prob in edge_probs.items():
    G.add_edge(*edge, weight=prob)

pos = nx.spring_layout(G)
edges = G.edges(data=True)
weights = [d['weight'] * 5 for (_, _, d) in edges]

plt.figure(figsize=(10,6))
nx.draw(G, pos, with_labels=True, width=weights, node_color='lightblue', edge_color='gray')
edge_labels = {(u,v): f"{d['weight']:.2f}" for u,v,d in edges}
nx.draw_networkx_edge_labels(G, pos, edge_labels=edge_labels)
plt.title("Posterior Edge Probabilities from BMA")
plt.show()


## 📐 Parameter Learning

In [None]:

from modules.bma_and_parameter_learning import learn_parameters

# Fit parameters to BIC-learned structure
model_bic = learn_parameters(model_bic, data)
model_bic.get_cpds()
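
The `learn_parameters` helper is defined in a project module that is not shown here. For a fixed structure, the simplest form of parameter learning is maximum-likelihood estimation, which reduces to normalized counts per parent configuration. The sketch below illustrates that idea on made-up rows; `mle_cpd` is a hypothetical name, not the module's function.

```python
from collections import Counter

def mle_cpd(rows, child, parents):
    """Maximum-likelihood P(child | parents) as {(parent_config, child_value): probability}."""
    joint = Counter((tuple(r[p] for p in parents), r[child]) for r in rows)
    marginal = Counter(tuple(r[p] for p in parents) for r in rows)
    return {(cfg, val): n / marginal[cfg] for (cfg, val), n in joint.items()}

# Made-up observations for illustration only.
rows = [
    {"smoke": "yes", "lung": "yes"},
    {"smoke": "yes", "lung": "no"},
    {"smoke": "yes", "lung": "no"},
    {"smoke": "yes", "lung": "no"},
    {"smoke": "no", "lung": "no"},
    {"smoke": "no", "lung": "no"},
]
cpd = mle_cpd(rows, "lung", ["smoke"])
print(cpd[(("yes",), "yes")])  # 0.25
print(cpd[(("no",), "no")])    # 1.0
```

Combinations that never occur in the data are simply absent, which amounts to a probability of zero; a Bayesian estimator with a Dirichlet prior avoids that by adding pseudo-counts.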


## 🔎 Inference on Learned Model

In [None]:

from modules.bma_and_parameter_learning import perform_inference

# Query: P(dysp | asia = "yes"), i.e. dyspnea given a visit to Asia.
# pgmpy's Asia network names these variables "dysp" and "asia".
result = perform_inference(model_bic, query=["dysp"], evidence={"asia": "yes"})
print(result)
result.plot()
