# NeuPI: Advanced Discretization Methods

This notebook demonstrates how to use the various discretization methods available in the **NeuPI** library. After a neural solver produces continuous probability outputs (typically from a sigmoid activation), a discretizer is needed to convert these probabilities into a final, binary MPE/MAP assignment.

We will explore three methods, ranging from a simple baseline to sophisticated, search-based approaches:

1.  **`ThresholdDiscretizer`**: The simplest method, which applies a fixed threshold (e.g., 0.5) to convert probabilities to binary values.
2.  **`KNearestDiscretizer`**: Uses a beam search (backed by a Cython module) to find the *k* binary assignments closest to the continuous prediction, then keeps the one the PGM evaluator scores highest.
3.  **`OAUAI`** (labelled `HighUncertaintyDiscretizer` in the printed output): A heuristic that restricts its search to the *k* variables with the highest uncertainty (probabilities closest to 0.5) and performs an exhaustive search over this reduced space.

We will compare the quality of the assignments produced by each method by evaluating their average log-likelihood using a PGM evaluator.
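
To make the evaluation metric concrete, here is a minimal sketch of how a pairwise Markov network can score a binary assignment: it sums the log-potentials of every factor. The `factors` table, `log_score`, and `average_log_score` are hypothetical names invented for this illustration and are not NeuPI's API. The sketch also assumes the evaluator returns an unnormalized log-score (no partition function subtracted), which would be consistent with the positive values printed later in this notebook.

```python
import math

# Hypothetical toy model: 3 binary variables and 2 pairwise factors.
# Each table maps (x_i, x_j) to a positive potential value.
factors = {
    (0, 1): [[2.0, 1.0], [1.0, 3.0]],
    (1, 2): [[1.0, 4.0], [4.0, 1.0]],
}


def log_score(assignment):
    """Unnormalized log-score: sum of log-potentials over all factors."""
    return sum(
        math.log(table[assignment[i]][assignment[j]])
        for (i, j), table in factors.items()
    )


def average_log_score(assignments):
    """Mean log-score over a batch of assignments."""
    return sum(log_score(a) for a in assignments) / len(assignments)


batch = [[1, 1, 0], [0, 0, 0]]
print(average_log_score(batch))
```

Averaging over the batch mirrors the `.mean()` call applied to the evaluator's output in the cells below.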

## Setup

First, we import the necessary components from `neupi` and other libraries. We will also set up a PGM evaluator and generate some dummy continuous outputs from a hypothetical neural network to serve as our inference data.

In [7]:
import torch
from pathlib import Path

# Import neupi components
from neupi.training.pm_ssl.pgm.mn import MarkovNetwork
from neupi.discretize.threshold import ThresholdDiscretizer
from neupi.discretize.kn import KNearestDiscretizer
from neupi.discretize.oauai import OAUAI

# Define the device for computation
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using device: {DEVICE}")

# --- Path Setup ---
# Assuming this notebook is in 'examples/', and networks are in 'examples/networks/'
UAI_PATH = Path("networks") / "mn" / "Grids_17.uai"
assert UAI_PATH.exists(), f"File not found: {UAI_PATH}. Please run from the 'examples' directory."

# --- Load PGM Evaluator ---
mn_evaluator = MarkovNetwork(uai_file=str(UAI_PATH), device=DEVICE)
num_vars = mn_evaluator.num_variables
print(f"Loaded Markov Network with {num_vars} variables.")

# --- Create Dummy Inference Data ---
# This simulates the continuous probability outputs from a neural network's sigmoid layer.
num_samples = 16
prob_outputs = torch.rand(num_samples, num_vars, device=DEVICE, dtype=torch.float32)

# Define a query mask (where we want to find the best assignment)
# For this example, all variables are query variables.
query_mask = torch.ones_like(prob_outputs, dtype=torch.bool)
evidence_mask = torch.zeros_like(prob_outputs, dtype=torch.bool)
unobs_mask = torch.zeros_like(prob_outputs, dtype=torch.bool)

print(f"Generated dummy probability outputs of shape: {prob_outputs.shape}")

Using device: cuda
Using 1d factors: False
PGM is pairwise.
Loaded Markov Network with 400 variables.
Generated dummy probability outputs of shape: torch.Size([16, 400])


### Method 1: Baseline with `ThresholdDiscretizer`

This is our baseline. It's fast and simple, providing a good reference point for the more advanced methods.

In [8]:
# 1. Initialize the discretizer
threshold_discretizer = ThresholdDiscretizer(threshold=0.5)

# 2. Get the discrete assignments
threshold_assignments = threshold_discretizer(prob_outputs)

# 3. Evaluate the assignments to get their log-likelihood scores
with torch.no_grad():
    threshold_scores = mn_evaluator(threshold_assignments)

avg_threshold_score = threshold_scores.mean().item()
print(f"ThresholdDiscretizer Avg. Log-Likelihood: {avg_threshold_score:.4f}")

ThresholdDiscretizer Avg. Log-Likelihood: 57.4206
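
For reference, the thresholding step itself can be sketched in a few lines of plain Python. The function `threshold_discretize` is a hypothetical stand-in, not the library's implementation, and whether NeuPI treats a probability exactly equal to the threshold as 1 or 0 is an assumption here (this sketch uses `>=`).

```python
def threshold_discretize(probs, threshold=0.5):
    """Map each probability to 1 if it is >= threshold, else 0."""
    return [[1 if p >= threshold else 0 for p in row] for row in probs]


# Two samples with three variables each.
probs = [[0.91, 0.12, 0.50], [0.49, 0.73, 0.08]]
print(threshold_discretize(probs))  # [[1, 0, 1], [0, 1, 0]]
```

Note that this step never consults the PGM, which is why the search-based methods below have room to improve on it.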


### Method 2: Beam Search with `KNearestDiscretizer`

This method performs a beam search to find the *k* assignments that are closest (in terms of L1 distance) to the continuous predictions. It then scores each of these *k* candidates with the PGM evaluator and returns the best one. Because the final choice is made by the PGM rather than by distance alone, the search is model-aware.

In [9]:
# 1. Initialize the discretizer. It needs the PGM evaluator as its scoring function.
knn_discretizer = KNearestDiscretizer(pgm_evaluator=mn_evaluator, k=10)  # Beam width

# 2. Get the discrete assignments
# The other masks are passed to maintain a consistent API
knn_assignments = knn_discretizer(prob_outputs, evidence_mask, query_mask, unobs_mask)

# 3. Evaluate the assignments
with torch.no_grad():
    knn_scores = mn_evaluator(knn_assignments)

avg_knn_score = knn_scores.mean().item()
print(f"KNearestDiscretizer Avg. Log-Likelihood: {avg_knn_score:.4f}")

KNearestDiscretizer Avg. Log-Likelihood: 73.0331
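
The idea of "the *k* closest assignments, then pick the best-scoring one" can be sketched for a single sample as follows. This is not the library's Cython beam search: the sketch swaps in an exact best-first enumeration with a heap, and `k_nearest_binary`, `best_of_k`, and the `score` callable are hypothetical names. It relies on the fact that rounding gives the nearest binary vector and that flipping bit *i* away from its rounded value adds |2p_i - 1| to the L1 distance.

```python
import heapq


def k_nearest_binary(probs, k):
    """Return up to k binary vectors in order of increasing L1 distance to probs."""
    base = [1 if p >= 0.5 else 0 for p in probs]  # rounding is the nearest vector
    # Sort variables by the extra distance paid for flipping them.
    order = sorted(range(len(probs)), key=lambda i: abs(2 * probs[i] - 1))
    costs = [abs(2 * probs[i] - 1) for i in order]

    def apply(flips):
        out = list(base)
        for j in flips:
            out[order[j]] ^= 1  # flip the j-th cheapest variable
        return out

    results = [base]
    heap = [(costs[0], (0,))] if costs else []
    while heap and len(results) < k:
        cost, flips = heapq.heappop(heap)
        results.append(apply(flips))
        last = flips[-1]
        if last + 1 < len(costs):
            nxt = costs[last + 1]
            # Child 1: additionally flip the next cheapest variable.
            heapq.heappush(heap, (cost + nxt, flips + (last + 1,)))
            # Child 2: replace the last flip with the next cheapest variable.
            heapq.heappush(heap, (cost - costs[last] + nxt, flips[:-1] + (last + 1,)))
    return results


def best_of_k(probs, k, score):
    """Score the k nearest candidates and keep the best one."""
    return max(k_nearest_binary(probs, k), key=score)


print(k_nearest_binary([0.9, 0.6, 0.2], 3))
```

Because the rounded assignment is always the first candidate, the best of the *k* candidates cannot score below it under the same scoring function.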




### Method 3: Uncertainty-Guided Search with `OAUAI`

This method uses a heuristic: instead of searching over all variables, it identifies the *k* variables whose predicted probabilities are closest to 0.5 (i.e., the ones the network is least certain about). It then performs an exhaustive search over all 2^k possibilities for this small subset and picks the best one. The exhaustive search acts as the oracle here; other oracles can be used instead to answer the query over the high-uncertainty variables.

In [10]:
# 1. Initialize the discretizer
uncertainty_discretizer = OAUAI(
    pgm_evaluator=mn_evaluator, k=5  # Number of uncertain variables to search over
)

# 2. Get the discrete assignments
uncertainty_assignments = uncertainty_discretizer(
    prob_outputs, evidence_mask, query_mask, unobs_mask
)

# 3. Evaluate the assignments
with torch.no_grad():
    uncertainty_scores = mn_evaluator(uncertainty_assignments)

avg_uncertainty_score = uncertainty_scores.mean().item()
print(f"HighUncertaintyDiscretizer Avg. Log-Likelihood: {avg_uncertainty_score:.4f}")

HighUncertaintyDiscretizer Avg. Log-Likelihood: 99.4000
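
The uncertainty-guided search described above can be sketched for a single sample in plain Python. The names `uncertainty_search`, `score`, and `chain_agreement` are hypothetical; in particular, the toy scoring function simply counts agreeing neighbours and stands in for a real PGM evaluator. The sketch assumes the remaining variables are fixed by rounding at 0.5.

```python
from itertools import product


def uncertainty_search(probs, k, score):
    """Exhaustively search the k least-certain variables; round the rest."""
    base = [1 if p >= 0.5 else 0 for p in probs]
    # Indices of the k variables whose probabilities are closest to 0.5.
    uncertain = sorted(range(len(probs)), key=lambda i: abs(probs[i] - 0.5))[:k]

    best, best_score = None, float("-inf")
    for values in product([0, 1], repeat=len(uncertain)):  # 2^k candidates
        cand = list(base)
        for i, v in zip(uncertain, values):
            cand[i] = v
        s = score(cand)
        if s > best_score:
            best, best_score = cand, s
    return best, best_score


def chain_agreement(assignment):
    """Toy score: number of adjacent variables that take the same value."""
    return sum(1 for x, y in zip(assignment, assignment[1:]) if x == y)


probs = [0.95, 0.55, 0.45, 0.05]
best, best_score = uncertainty_search(probs, k=2, score=chain_agreement)
print(best, best_score)
```

The cost grows as 2^k evaluations per sample, so *k* has to stay small; the rounded assignment is always among the candidates, so the result cannot score below plain thresholding under the same scoring function.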




## Comparison and Conclusion

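Let's compare the average log-likelihood scores from all three methods. A higher score indicates a better set of MPE solutions. Both search-based methods are expected to score at least as well as the baseline, which the assertions at the end of the cell check.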

In [11]:
print("--- Discretization Performance Summary ---")
print(f"Baseline (Threshold):\t\t{avg_threshold_score:.4f}")
print(f"K-Nearest (Beam Search):\t\t{avg_knn_score:.4f}")
print(f"OAUAI (Oracle Based):\t{avg_uncertainty_score:.4f}")

improvement_knn = avg_knn_score - avg_threshold_score
improvement_uncertainty = avg_uncertainty_score - avg_threshold_score

print(f"\nImprovement over baseline (KNN): {improvement_knn:+.4f}")
print(f"Improvement over baseline (OAUAI): {improvement_uncertainty:+.4f}")

assert avg_knn_score >= avg_threshold_score
assert avg_uncertainty_score >= avg_threshold_score
print(
    "\nSuccessfully verified that the advanced methods provide scores greater than or equal to the baseline."
)

--- Discretization Performance Summary ---
Baseline (Threshold):		57.4206
K-Nearest (Beam Search):		73.0331
OAUAI (Oracle Based):	99.4000

Improvement over baseline (KNN): +15.6125
Improvement over baseline (OAUAI): +41.9794

Successfully verified that the advanced methods provide scores greater than or equal to the baseline.
