Pomegranate is a Python library for building probabilistic models, from simple distributions to composite structures such as Bayesian networks and hidden Markov models. It treats every model as a probability distribution, so models can be composed flexibly: for example, by mixing different distribution types in one model or by using Bayesian networks as components of a larger model.


Installation with pip: 
* pip install pomegranate
* The examples here use the pre-1.0 API, so we pin an older release with pip install pomegranate==v0.14.9; the newest versions are not compatible with this code

Installation from GitHub:
* git clone https://github.com/jmschrei/pomegranate
* cd pomegranate
* python setup.py install
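
Because the examples depend on the older API, it can help to confirm which release is installed before running them. The following is a minimal sketch using only the standard library. The helper name `uses_legacy_api` is hypothetical, and it assumes that 0.x releases are the ones that expose `Node` and `DiscreteDistribution`.

```python
from importlib import metadata


def uses_legacy_api(version: str) -> bool:
    """Return True for 0.x releases (assumed to expose the legacy API)."""
    major = version.lstrip("v").split(".")[0]
    return major == "0"


try:
    installed = metadata.version("pomegranate")
except metadata.PackageNotFoundError:
    print("pomegranate is not installed")
else:
    status = "legacy API" if uses_legacy_api(installed) else "rewritten API"
    print(f"pomegranate {installed}: {status}")
```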

The examples below illustrate some basic applications of the pomegranate library.

### Example 1: Determining Chances of Attendance

The following code defines a Bayesian network that models the relationships between rain, track maintenance, train delays, and appointment attendance.

In [31]:
from pomegranate import Node, DiscreteDistribution, ConditionalProbabilityTable, BayesianNetwork

# Rain node has no parents
rain = Node(DiscreteDistribution({
    "none": 0.7,
    "light": 0.2,
    "heavy": 0.1
}), name="rain")

# Track maintenance node is conditional on rain
maintenance = Node(ConditionalProbabilityTable([
    ["none", "yes", 0.4],
    ["none", "no", 0.6],
    ["light", "yes", 0.2],
    ["light", "no", 0.8],
    ["heavy", "yes", 0.1],
    ["heavy", "no", 0.9]
], [rain.distribution]), name="maintenance")

# Train node is conditional on rain and maintenance
train = Node(ConditionalProbabilityTable([
    ["none", "yes", "on time", 0.8],
    ["none", "yes", "delayed", 0.2],
    ["none", "no", "on time", 0.9],
    ["none", "no", "delayed", 0.1],
    ["light", "yes", "on time", 0.6],
    ["light", "yes", "delayed", 0.4],
    ["light", "no", "on time", 0.7],
    ["light", "no", "delayed", 0.3],
    ["heavy", "yes", "on time", 0.4],
    ["heavy", "yes", "delayed", 0.6],
    ["heavy", "no", "on time", 0.5],
    ["heavy", "no", "delayed", 0.5],
], [rain.distribution, maintenance.distribution]), name="train")

# Appointment node is conditional on train
appointment = Node(ConditionalProbabilityTable([
    ["on time", "attend", 0.9],
    ["on time", "miss", 0.1],
    ["delayed", "attend", 0.6],
    ["delayed", "miss", 0.4]
], [train.distribution]), name="appointment")

# Create a Bayesian Network and add states
model = BayesianNetwork()
model.add_states(rain, maintenance, train, appointment)

# Add edges connecting nodes
model.add_edge(rain, maintenance)
model.add_edge(rain, train)
model.add_edge(maintenance, train)
model.add_edge(train, appointment)

# Finalize model
model.bake()

Note: under pomegranate 1.0 or later, the cell above fails with `ImportError: cannot import name 'Node' from 'pomegranate'`, because those releases replaced the API used here. Install the pinned version from the installation notes to run these examples.

The following code uses the model to infer the impact of a delayed train on the probabilities of other events.

In [None]:
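# `model` is the Bayesian network built in the cell above.
# If it is saved in a separate model.py instead, use: from model import model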

# Calculate predictions
predictions = model.predict_proba({
    "train": "delayed"
})

# Print predictions for each node
for node, prediction in zip(model.states, predictions):
    if isinstance(prediction, str):
        print(f"{node.name}: {prediction}")
    else:
        print(f"{node.name}")
        for value, probability in prediction.parameters[0].items():
            print(f"    {value}: {probability:.4f}")
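
To check the numbers that `predict_proba` reports, the same posterior can be computed by brute-force enumeration over all joint assignments. The sketch below uses only the standard library. The tables are copied from the network definition above, while the names `joint` and `posterior` are hypothetical helpers, not part of pomegranate.

```python
from itertools import product

# Probability tables copied from the network definition.
P_RAIN = {"none": 0.7, "light": 0.2, "heavy": 0.1}
P_MAINT = {  # P(maintenance | rain)
    "none": {"yes": 0.4, "no": 0.6},
    "light": {"yes": 0.2, "no": 0.8},
    "heavy": {"yes": 0.1, "no": 0.9},
}
P_TRAIN = {  # P(train | rain, maintenance)
    ("none", "yes"): {"on time": 0.8, "delayed": 0.2},
    ("none", "no"): {"on time": 0.9, "delayed": 0.1},
    ("light", "yes"): {"on time": 0.6, "delayed": 0.4},
    ("light", "no"): {"on time": 0.7, "delayed": 0.3},
    ("heavy", "yes"): {"on time": 0.4, "delayed": 0.6},
    ("heavy", "no"): {"on time": 0.5, "delayed": 0.5},
}
P_APPT = {  # P(appointment | train)
    "on time": {"attend": 0.9, "miss": 0.1},
    "delayed": {"attend": 0.6, "miss": 0.4},
}
NAMES = ("rain", "maintenance", "train", "appointment")


def joint(r, m, t, a):
    """Joint probability of one full assignment, via the chain rule."""
    return P_RAIN[r] * P_MAINT[r][m] * P_TRAIN[(r, m)][t] * P_APPT[t][a]


def posterior(evidence):
    """Return {variable: {value: P(value | evidence)}} by full enumeration."""
    totals = {name: {} for name in NAMES}
    domains = (P_RAIN, ("yes", "no"), ("on time", "delayed"), ("attend", "miss"))
    for world in product(*domains):
        assignment = dict(zip(NAMES, world))
        # Skip assignments that contradict the evidence.
        if any(assignment[k] != v for k, v in evidence.items()):
            continue
        p = joint(*world)
        for name, value in assignment.items():
            totals[name][value] = totals[name].get(value, 0.0) + p
    z = sum(totals["rain"].values())  # P(evidence), the normalizing constant
    return {n: {v: p / z for v, p in d.items()} for n, d in totals.items()}


result = posterior({"train": "delayed"})
for name in NAMES:
    print(name, {v: round(p, 4) for v, p in result[name].items()})
```

Enumeration is exponential in the number of variables, so it is only practical for small networks like this one, but it makes a useful cross-check on the library's output.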

The following code uses the model to calculate the joint probability of one specific combination of events: no rain, no track maintenance, an on-time train, and attending the appointment.

In [None]:
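# `model` is the Bayesian network built earlier in this example.
# If it is saved in a separate model.py instead, use: from model import model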

# Calculate probability for a given observation
probability = model.probability([["none", "no", "on time", "attend"]])

print(probability)
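
The value returned here should equal the product of one entry from each table, following the chain rule for Bayesian networks. As a quick sanity check, here is a minimal standard-library sketch with the four factors read by hand from the tables in the network definition:

```python
import math

# One factor per node, each conditioned on that node's parents.
factors = {
    "P(rain = none)": 0.7,
    "P(maintenance = no | rain = none)": 0.6,
    "P(train = on time | rain = none, maintenance = no)": 0.9,
    "P(appointment = attend | train = on time)": 0.9,
}

joint_probability = math.prod(factors.values())
print(f"{joint_probability:.4f}")  # prints 0.3402
```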

The following code demonstrates how to draw samples from a Bayesian network and how to use rejection sampling to estimate a conditional distribution.

In [None]:
import pomegranate

from collections import Counter

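# `model` is the Bayesian network built earlier in this example.
# If it is saved in a separate model.py instead, use: from model import model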

def generate_sample():

    # Mapping of random variable name to sample generated
    sample = {}

    # Mapping of distribution to sample generated
    parents = {}

    # Loop over all states, assuming topological order
    for state in model.states:

        # If we have a non-root node, sample conditional on parents
        if isinstance(state.distribution, pomegranate.ConditionalProbabilityTable):
            sample[state.name] = state.distribution.sample(parent_values=parents)

        # Otherwise, just sample from the distribution alone
        else:
            sample[state.name] = state.distribution.sample()

        # Keep track of the sampled value in the parents mapping
        parents[state.distribution] = sample[state.name]

    # Return generated sample
    return sample

# Rejection sampling
# Compute distribution of Appointment given that train is delayed
N = 10000
data = []
for i in range(N):
    sample = generate_sample()
    if sample["train"] == "delayed":
        data.append(sample["appointment"])
print(Counter(data))


### Example 2: Hidden Markov Model for Predicting Sunny and Rainy Days

The following code infers whether each day was sunny or rainy using a hidden Markov model, where the observed data is whether people brought umbrellas. First we create the model from its emission, transition, and starting probabilities.

In [None]:
import numpy
from pomegranate import DiscreteDistribution, HiddenMarkovModel

# Observation model for each state
sun = DiscreteDistribution({
    "umbrella": 0.2,
    "no umbrella": 0.8
})

rain = DiscreteDistribution({
    "umbrella": 0.9,
    "no umbrella": 0.1
})

states = [sun, rain]

# Transition model
transitions = numpy.array(
    [[0.8, 0.2], # Tomorrow's predictions if today = sun
     [0.3, 0.7]] # Tomorrow's predictions if today = rain
)

# Starting probabilities
starts = numpy.array([0.5, 0.5])

# Create the model
model = HiddenMarkovModel.from_matrix(
    transitions, states, starts,
    state_names=["sun", "rain"]
)
model.bake()

Next we supply the sequence of observations gathered over the past few days (whether or not umbrellas were present) and ask the model to predict the most likely hidden state for each day.

In [None]:
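# `model` is the hidden Markov model built in the cell above.
# If it is saved in a separate model.py instead, use: from model import model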

# Observed data
observations = [
    "umbrella",
    "umbrella",
    "no umbrella",
    "umbrella",
    "umbrella",
    "umbrella",
    "umbrella",
    "no umbrella",
    "no umbrella"
]

# Predict underlying states
predictions = model.predict(observations)
for prediction in predictions:
    print(model.states[prediction].name)
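
To see what kind of computation sits behind this prediction, here is a minimal pure-Python sketch of posterior decoding with the forward-backward algorithm, using the same probabilities as the model above. It assumes that `predict` in the 0.x series defaults to this kind of per-step posterior decoding rather than Viterbi, which is worth confirming against the library documentation. The function name `posterior_decode` is hypothetical.

```python
STATES = ("sun", "rain")
START = {"sun": 0.5, "rain": 0.5}
TRANS = {  # TRANS[today][tomorrow]
    "sun": {"sun": 0.8, "rain": 0.2},
    "rain": {"sun": 0.3, "rain": 0.7},
}
EMIT = {  # EMIT[state][observation]
    "sun": {"umbrella": 0.2, "no umbrella": 0.8},
    "rain": {"umbrella": 0.9, "no umbrella": 0.1},
}


def posterior_decode(obs):
    """Return the most probable state at each step given the whole sequence."""
    n = len(obs)

    # Forward pass: alpha[t][s] = P(obs[0..t], state_t = s)
    alpha = [{s: START[s] * EMIT[s][obs[0]] for s in STATES}]
    for t in range(1, n):
        alpha.append({
            s: EMIT[s][obs[t]] * sum(alpha[t - 1][p] * TRANS[p][s] for p in STATES)
            for s in STATES
        })

    # Backward pass: beta[t][s] = P(obs[t+1..n-1] | state_t = s)
    beta = [None] * n
    beta[n - 1] = {s: 1.0 for s in STATES}
    for t in range(n - 2, -1, -1):
        beta[t] = {
            s: sum(TRANS[s][q] * EMIT[q][obs[t + 1]] * beta[t + 1][q] for q in STATES)
            for s in STATES
        }

    # The posterior at step t is proportional to alpha[t][s] * beta[t][s].
    return [max(STATES, key=lambda s: alpha[t][s] * beta[t][s]) for t in range(n)]


observations = [
    "umbrella", "umbrella", "no umbrella", "umbrella", "umbrella",
    "umbrella", "umbrella", "no umbrella", "no umbrella",
]
decoded = posterior_decode(observations)
print(decoded)
```

For these nine observations the sketch yields rain, rain, sun, rain, rain, rain, rain, sun, sun. The probabilities are left unscaled for readability; for long sequences a practical implementation would normalize at each step or work in log space to avoid underflow.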