# Davide Marchi
## Assignment 4 - Bayesian Network (BN)
_Implement a Bayesian Network (BN) comprising at least 10 nodes, all with binomial or multinomial distribution. Represent the BN with the data structures that you deem appropriate and in the programming language that you prefer. The BN should model some problem/process of your choice, so you are also free to define the topology according to your prior knowledge (just be ready to justify your choices). [...]<br>
Once you have modelled the BN, also plug in the necessary local conditional probability tables. You can set the values of the probabilities following your own intuition on the problem (i.e. no need to learn them from data). Then run some episodes of Ancestral Sampling on the BN and discuss the results.[...]_

### Imported modules
The implementation of the Bayesian Network starts by importing the two modules it needs, `random` and `itertools`:<br>
The first is used to draw random numbers during ancestral sampling with the function `random()`, while `itertools` is used to enumerate every combination of the states of a node's parents (their Cartesian product) with `product()`.
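
As a small illustration of the second point, the sketch below shows how `product()` enumerates parent-state combinations. The two state lists are hypothetical and chosen only to keep the output short; they are not the ones used in the network later on.

```python
import itertools

# Hypothetical parent state lists, used only for this illustration
stress_states = ['Low', 'Medium', 'High']
sleep_states = ['Low', 'High']

# product() yields one tuple per combination; the last list varies fastest
parent_combinations = list(itertools.product(stress_states, sleep_states))
print(parent_combinations)

# With no input lists (a node without parents) product() yields a single empty tuple
no_parent_combinations = list(itertools.product())
print(no_parent_combinations)
```

The single empty tuple in the second case is what lets a node without parents use `()` as the only key of its CPT.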

In [133]:
# Import libraries
import random
import itertools

### Bayesian Network class
I then implemented the `BayesianNetwork` class, which represents a Directed Acyclic Graph (DAG) whose nodes are random variables and whose edges describe the direct dependencies between them, so that the graph structure encodes the conditional independence relationships.<br>
The nodes of the network are stored in a dictionary (`self.nodes`) whose keys are the node names and whose values are dictionaries holding the corresponding node's information: its states, parents, and CPT. Since Python dictionaries are hash maps, this structure allows efficient lookup and modification of the nodes.<br>

Each of these inner dictionaries contains the following keys:
- `states`: the list of the possible states that the node can take.
- `parents`: the list of the node's parent nodes.
- `cpt`: the node's Conditional Probability Table (CPT).

It's important to note that the CPT is itself a dictionary:<br>
- The keys of the CPT dictionary are tuples representing the states of the parent nodes, in the same order as the node's `parents` list. If a node has no parents, the key will be an empty tuple `()`.
- The values of the CPT dictionary are lists representing the probabilities of the node's states given these parent states. The order of the probabilities in the list corresponds to the order of the states in the node's states list.
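
The structure described above can be sketched on a single node as follows. The node, its states, and its probabilities are hypothetical values chosen for illustration, and `lookup` is a helper introduced only for this sketch (the class itself exposes the same operation as `probability`).

```python
# Hypothetical node dictionary: two states and one three-state parent
node = {
    'states': ['Good', 'Poor'],
    'parents': ['Stress Level'],
    'cpt': {
        ('Low',): [0.9, 0.1],
        ('Medium',): [0.7, 0.3],
        ('High',): [0.4, 0.6],
    },
}

def lookup(node, parent_states, state):
    """Return P(state | parent_states) read from the node dictionary."""
    # The CPT row is selected by the tuple of parent states
    row = node['cpt'][tuple(parent_states)]
    # The position in the row matches the position of the state in 'states'
    return row[node['states'].index(state)]

p_poor_given_high = lookup(node, ['High'], 'Poor')
print(p_poor_given_high)  # 0.6
```

Note that a single parent still requires a one-element tuple such as `('High',)` as the key, with the trailing comma.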

In [134]:
# Define the Bayesian Network class
class BayesianNetwork:
    def __init__(self):
        self.nodes = {}
    
    # Add a node to the network after checking for validity
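    def add_node(self, name, states, cpt, parents=None):

        # Use a fresh list when no parents are given (avoids a shared mutable default argument)
        parents = [] if parents is None else list(parents)
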
        # Check if the node already exists in the network
        if name in self.nodes:
            print(f"Node '{name}' already exists in the network.")
            return
        
        # Check if the node has at least one state
        if len(states) == 0:
            print(f"Node '{name}' must have at least one state.")
            return
        
        # Check if all parent nodes exist in the network
        for parent in parents:
            if parent not in self.nodes:
                print(f"Parent node '{parent}' does not exist in the network.")
                return

        # Check if the CPT is valid
        for parent_combination in itertools.product(*[self.nodes[parent]['states'] for parent in parents]):

            # Check that every combination of the parents' states is in the CPT
            # Check that each CPT entry has the correct number of probabilities and that they sum to 1
            # (compared with a tolerance, since floating-point sums may not be exactly 1)
            if parent_combination not in cpt or len(cpt[parent_combination]) != len(states) or abs(sum(cpt[parent_combination]) - 1) > 1e-9:
                print(f"Invalid CPT for node '{name}'.")
                return

        # Add the node to the network
        self.nodes[name] = {'states': states, 'parents': parents, 'cpt': cpt}
    
    # Calculate the probability of a state of a node given its parents' states
    def probability(self, node, parent_states, state):
        states = self.nodes[node]['states']
        return self.nodes[node]['cpt'][parent_states][states.index(state)]
    
    # Sample a state for a node given its parents' states
    def sample(self, node, parent_states):

        # Generating a random value and collecting the possible states of the node
        p = random.random()
        cumulative_prob = 0
        states = self.nodes[node]['states']

        # Loop through the states of the node and calculate the probability of each state
        for state in states:
            probability = self.probability(node, parent_states, state)
            cumulative_prob += probability

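            # Return the state and its probability if the random value is less than the cumulative probability (= if the state is selected)
            if p <= cumulative_prob:
                return state, probability

        # Fallback for floating-point rounding: if the cumulative sum ends slightly below p, return the last state
        return state, probability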
    
    # Sample states for all nodes
    def sample_states(self):
        sampled_states = {}
        joint_probability = 1

        # Loop through all nodes and sample the state of each node
        # (nodes are visited in insertion order, which is a valid ancestral order because add_node requires parents to exist already)
        for node in self.nodes:
            parents = self.nodes[node]['parents']
            parent_states = tuple(sampled_states[parent] for parent in parents)
            sampled_states[node], probability = self.sample(node, parent_states)
            joint_probability *= probability

        # Return the sampled states and the joint probability
        return sampled_states, joint_probability
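
For reference, the value accumulated in `joint_probability` corresponds to the standard factorization of the joint distribution of a Bayesian Network. The notation $\mathrm{pa}(X_i)$ for the parents of node $X_i$ is introduced here for convenience and does not appear in the code:

```latex
P(X_1, \dots, X_n) = \prod_{i=1}^{n} P\bigl(X_i \mid \mathrm{pa}(X_i)\bigr)
```

As a check against the example network defined in the next section, the first sample printed in the Ancestral Sampling output multiplies ten CPT entries, one per node: 0.6 · 0.3 · 0.5 · 0.1 · 0.5 · 0.2 · 0.7 · 0.3 · 0.6 · 0.3 ≈ 3.402e-05, which agrees with the printed joint probability.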

### Network example
To show how the implementation works, here is an example that instantiates an object of the `BayesianNetwork` class.

In [135]:
# Define conditional probability tables (CPTs)
study_cpt = {
    (): [0.2, 0.6, 0.2]  # P(Study Time)
}
sleep_cpt = {
    (): [0.3, 0.5, 0.2]  # P(Sleep Quality)
}
stress_cpt = {
    (): [0.3, 0.5, 0.2]  # P(Stress Level)
}
exam_cpt = {
    ('Low', 'Low', 'Low'): [0.8, 0.15, 0.05],     # P(Exam Difficulty | Study, Sleep, Stress)
    ('Low', 'Low', 'Medium'): [0.7, 0.2, 0.1],
    ('Low', 'Low', 'High'): [0.6, 0.3, 0.1],
    ('Low', 'Medium', 'Low'): [0.7, 0.2, 0.1],
    ('Low', 'Medium', 'Medium'): [0.6, 0.3, 0.1],
    ('Low', 'Medium', 'High'): [0.5, 0.35, 0.15],
    ('Low', 'High', 'Low'): [0.6, 0.3, 0.1],
    ('Low', 'High', 'Medium'): [0.5, 0.35, 0.15],
    ('Low', 'High', 'High'): [0.4, 0.4, 0.2],
    ('Medium', 'Low', 'Low'): [0.7, 0.2, 0.1],
    ('Medium', 'Low', 'Medium'): [0.6, 0.3, 0.1],
    ('Medium', 'Low', 'High'): [0.5, 0.35, 0.15],
    ('Medium', 'Medium', 'Low'): [0.6, 0.3, 0.1],
    ('Medium', 'Medium', 'Medium'): [0.5, 0.35, 0.15],
    ('Medium', 'Medium', 'High'): [0.4, 0.4, 0.2],
    ('Medium', 'High', 'Low'): [0.5, 0.35, 0.15],
    ('Medium', 'High', 'Medium'): [0.4, 0.4, 0.2],
    ('Medium', 'High', 'High'): [0.3, 0.5, 0.2],
    ('High', 'Low', 'Low'): [0.6, 0.3, 0.1],
    ('High', 'Low', 'Medium'): [0.5, 0.35, 0.15],
    ('High', 'Low', 'High'): [0.4, 0.4, 0.2],
    ('High', 'Medium', 'Low'): [0.5, 0.35, 0.15],
    ('High', 'Medium', 'Medium'): [0.4, 0.4, 0.2],
    ('High', 'Medium', 'High'): [0.3, 0.5, 0.2],
    ('High', 'High', 'Low'): [0.4, 0.4, 0.2],
    ('High', 'High', 'Medium'): [0.3, 0.5, 0.2],
    ('High', 'High', 'High'): [0.2, 0.6, 0.2],
}
health_cpt = {
    ('Low',): [0.1, 0.6, 0.3],     # P(Health | Stress Level)
    ('Medium',): [0.3, 0.5, 0.2],
    ('High',): [0.5, 0.4, 0.1],
}
motivation_cpt = {
    ('Low', 'Low'): [0.1, 0.6, 0.3],    # P(Motivation | Stress Level, Sleep Quality)
    ('Low', 'Medium'): [0.2, 0.6, 0.2],
    ('Low', 'High'): [0.3, 0.5, 0.2],
    ('Medium', 'Low'): [0.2, 0.6, 0.2],
    ('Medium', 'Medium'): [0.3, 0.5, 0.2],
    ('Medium', 'High'): [0.4, 0.4, 0.2],
    ('High', 'Low'): [0.3, 0.5, 0.2],
    ('High', 'Medium'): [0.4, 0.4, 0.2],
    ('High', 'High'): [0.5, 0.3, 0.2],
}
distraction_cpt = {
    (): [0.7, 0.2, 0.1]    # P(External Distractions)
}
caffeine_cpt = {
    (): [0.3, 0.5, 0.2]    # P(Caffeine Intake)
}
nutrition_cpt = {
    (): [0.2, 0.6, 0.2]    # P(Nutrition)
}
social_cpt = {
    (): [0.3, 0.5, 0.2]    # P(Social Life)
}

# Initialize Bayesian Network
bn = BayesianNetwork()

# Add nodes to the network with their CPTs
bn.add_node('Study Time', ['Low', 'Medium', 'High'], cpt=study_cpt)
bn.add_node('Sleep Quality', ['Low', 'Medium', 'High'], cpt=sleep_cpt)
bn.add_node('Stress Level', ['Low', 'Medium', 'High'], cpt=stress_cpt)
bn.add_node('Exam Difficulty', ['Easy', 'Medium', 'Hard'], cpt=exam_cpt, parents=['Study Time', 'Sleep Quality', 'Stress Level'])
bn.add_node('Health', ['Good', 'Okay', 'Poor'], cpt=health_cpt, parents=['Stress Level'])
bn.add_node('Motivation', ['Low', 'Medium', 'High'], cpt=motivation_cpt, parents=['Stress Level', 'Sleep Quality'])
bn.add_node('External Distractions', ['Low', 'Medium', 'High'], cpt=distraction_cpt)
bn.add_node('Caffeine Intake', ['Low', 'Medium', 'High'], cpt=caffeine_cpt)
bn.add_node('Nutrition', ['Poor', 'Average', 'Good'], cpt=nutrition_cpt)
bn.add_node('Social Life', ['Low', 'Medium', 'High'], cpt=social_cpt)

### Ancestral Sampling
To test Ancestral Sampling, I made multiple calls to the `sample_states()` method.
A sufficient number of iterations gives us enough data to estimate which states each node is most likely to reach.
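
One possible way to turn the samples into such estimates is to count how often each state appears. The sketch below is self-contained and therefore uses a hypothetical two-node network (`Rain` → `Traffic`) with made-up probabilities instead of the `BayesianNetwork` class; the names `draw` and `ancestral_sample` are introduced only for this illustration.

```python
import random
from collections import Counter

# Hypothetical two-node network: Rain -> Traffic (values chosen for illustration)
P_RAIN = {'Yes': 0.3, 'No': 0.7}
P_TRAFFIC = {
    'Yes': {'Heavy': 0.8, 'Light': 0.2},
    'No': {'Heavy': 0.1, 'Light': 0.9},
}

def draw(distribution, rng):
    """Draw one state from a {state: probability} dictionary."""
    states = list(distribution)
    weights = [distribution[s] for s in states]
    return rng.choices(states, weights=weights, k=1)[0]

def ancestral_sample(rng):
    """Sample the parent first, then the child given the sampled parent."""
    rain = draw(P_RAIN, rng)
    traffic = draw(P_TRAFFIC[rain], rng)
    return rain, traffic

rng = random.Random(0)  # fixed seed so the run is repeatable
n_samples = 10_000

# Count the sampled states of the child node and normalise to frequencies
counts = Counter(ancestral_sample(rng)[1] for _ in range(n_samples))
estimate = {state: count / n_samples for state, count in counts.items()}

# Exact marginal for comparison: P(Heavy) = 0.3 * 0.8 + 0.7 * 0.1
exact_heavy = 0.3 * 0.8 + 0.7 * 0.1
print(estimate, exact_heavy)
```

The same counting idea should carry over to the dictionaries returned by `bn.sample_states()`, using one `Counter` per node.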

In [136]:
# Cycle to perform multiple episodes of sampling
for i in range(100):

    # Sample states for all nodes
    sampled_states, joint_probability = bn.sample_states()

    # Print sampled states
    for node, state in sampled_states.items():
        print(f"{node}: {state}")

    # Print joint probability
    print(f"Joint Probability: {joint_probability}")

Study Time: Medium
Sleep Quality: Low
Stress Level: Medium
Exam Difficulty: Hard
Health: Okay
Motivation: High
External Distractions: Low
Caffeine Intake: Low
Nutrition: Average
Social Life: Low
Joint Probability: 3.401999999999999e-05
Study Time: Low
Sleep Quality: High
Stress Level: Medium
Exam Difficulty: Medium
Health: Good
Motivation: Medium
External Distractions: Low
Caffeine Intake: High
Nutrition: Average
Social Life: Low
Joint Probability: 2.1168000000000005e-05
Study Time: Medium
Sleep Quality: High
Stress Level: Medium
Exam Difficulty: Medium
Health: Okay
Motivation: Medium
External Distractions: Low
Caffeine Intake: Low
Nutrition: Average
Social Life: Medium
Joint Probability: 0.0003024
Study Time: Medium
Sleep Quality: Medium
Stress Level: Medium
Exam Difficulty: Medium
Health: Poor
Motivation: Medium
External Distractions: Low
Caffeine Intake: High
Nutrition: Average
Social Life: Low
Joint Probability: 0.0001323
Study Time: Medium
Sleep Quality: Low
Stress Level: Medium
[...]

### Final considerations
