# Alberto Dicembre, Assignment #4

This is my (very) dirty but working implementation of a Bayesian Network. After testing its functionality, I decided to model my typical morning. All variables are binary (values "True" or "False"), so each one follows a Bernoulli distribution given its parents. I then ran Ancestral Sampling and took a look at the results.

In [1]:
from bayesian_network import BayesianNetwork 

## Testing the Network

In [2]:
net = BayesianNetwork(values=["True", "False"]) 
net.add_node("A")
net.add_node("B")
net.add_node("C")

In [3]:
print(net)
net.check_valid()

Nodes: dict_keys(['A', 'B', 'C']), Edges: {}, Tables: {'A': {'Prior': (0.5, 0.5)}, 'B': {'Prior': (0.5, 0.5)}, 'C': {'Prior': (0.5, 0.5)}}


True

In [4]:
net.add_edge("A", ["B"])
net.add_edge("B", ["C"])
print(net)

Nodes: dict_keys(['A', 'B', 'C']), Edges: {'A': ['B'], 'B': ['C']}, Tables: {'A': {'Prior': (0.5, 0.5)}, 'B': {'Prior': (0.5, 0.5)}, 'C': {'Prior': (0.5, 0.5)}}


In [5]:
net.check_valid()

Node B probability table doesn't match parents' number of values
Node C probability table doesn't match parents' number of values


False
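
The check fails because B and C now have parents but still carry the default single-row prior table. A minimal sketch of such a check, assuming a table needs one row per combination of parent values (the function name `table_matches_parents` is hypothetical and is not part of `bayesian_network`):

```python
import math

VALUES = ("True", "False")

def table_matches_parents(table, n_parents, values=VALUES):
    """Return True if `table` has one row per combination of parent values
    and every row is a probability distribution over `values`."""
    expected_rows = len(values) ** n_parents  # a root node has exactly 1 row (its prior)
    if len(table) != expected_rows:
        return False
    return all(
        len(row) == len(values) and math.isclose(sum(row), 1.0)
        for row in table.values()
    )

# B has one parent (A) but still carries the default single-row prior table.
print(table_matches_parents({"Prior": (0.5, 0.5)}, n_parents=1))  # False
# With one row per value of A, the table is consistent.
print(table_matches_parents({"True": (0.8, 0.2), "False": (0.2, 0.8)}, n_parents=1))  # True
```

With binary values, a node with n parents therefore needs 2^n rows, which is why "Bad day start" later requires 16 of them.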

#### Checking Topological Sort

In [6]:
print(net.topo_sort())

['A', 'B', 'C']
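
For reference, a topological order can be computed with Kahn's algorithm. The sketch below is a standalone illustration and not necessarily how `net.topo_sort()` is implemented. It assumes the same edge layout as the printed network (parent mapped to a list of children).

```python
from collections import deque

def topo_sort(nodes, edges):
    """Kahn's algorithm: repeatedly emit a node with no unprocessed parents."""
    indegree = {node: 0 for node in nodes}
    for children in edges.values():
        for child in children:
            indegree[child] += 1

    # Start from the root nodes, in insertion order.
    queue = deque(node for node in nodes if indegree[node] == 0)
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for child in edges.get(node, []):
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)

    if len(order) != len(nodes):
        raise ValueError("graph contains a cycle, so it is not a valid Bayesian Network")
    return order

print(topo_sort(["A", "B", "C"], {"A": ["B"], "B": ["C"]}))  # ['A', 'B', 'C']
```

A parent always appears before its children in the result, which is the property Ancestral Sampling relies on.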


In [7]:
net.add_prob_table("A", {"Prior" : (0.9, 0.1)})
net.add_prob_table("B", {"True" : (0.8, 0.2), "False" : (0.2, 0.8)})

#### Trying to add a probability table for a node not in the network

In [8]:
net.add_prob_table("D", {"Prior" : (0.9, 0.1)})

Node not in network


### Now we create a valid network

In [9]:
net.add_node("D")
net.add_edge("D", ["C"])
net.add_prob_table("C", {("True", "True") : (0.95, 0.05),
                         ("True", "False") : (0.7, 0.3),
                         ("False", "True") : (0.7, 0.3),
                         ("False", "False") : (0.1, 0.9)})
net.add_prob_table("D", {"Prior" : (0.9, 0.1)})
                          
print(net)
net.check_valid()

Nodes: dict_keys(['A', 'B', 'C', 'D']), Edges: {'A': ['B'], 'B': ['C'], 'D': ['C']}, Tables: {'A': {'Prior': (0.9, 0.1)}, 'B': {'True': (0.8, 0.2), 'False': (0.2, 0.8)}, 'C': {('True', 'True'): (0.95, 0.05), ('True', 'False'): (0.7, 0.3), ('False', 'True'): (0.7, 0.3), ('False', 'False'): (0.1, 0.9)}, 'D': {'Prior': (0.9, 0.1)}}


True

#### Let's try some single node sampling

In [10]:
probs = net.get_probabilities("A") # remember, we gave A probabilities of 0.9 for True, 0.1 for False
print(probs)
for i in range(20):
    print(net.sample(probs))

(0.9, 0.1)
False
True
True
True
True
True
True
True
True
True
True
True
False
True
True
True
False
True
True
True
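
Drawing a single value from a probability tuple can be done by inverse-CDF sampling. This is a minimal sketch under the assumption that the tuple is ordered like the network's `values` list. The standalone `sample` function below is illustrative and may differ from `net.sample`.

```python
import random

def sample(probs, values=("True", "False"), rng=random):
    """Draw one value: pick the first value whose cumulative probability exceeds u."""
    u = rng.random()  # uniform in [0, 1)
    cumulative = 0.0
    for value, p in zip(values, probs):
        cumulative += p
        if u < cumulative:
            return value
    return values[-1]  # guard against floating-point rounding in the sum

rng = random.Random(0)  # fixed seed so the run is reproducible
draws = [sample((0.9, 0.1), rng=rng) for _ in range(10_000)]
print(draws.count("True") / len(draws))  # close to 0.9
```

The same loop works unchanged for variables with more than two values.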


### Now we try Ancestral Sampling

In [11]:
samples = net.ancestral_sampling(10)
for i, sample in enumerate(samples):
    print(f"{i}: {sample}")

Topological sort: ['A', 'D', 'B', 'C']
0: {'A': 'False', 'D': 'True', 'B': 'False', 'C': 'True'}
1: {'A': 'True', 'D': 'True', 'B': 'True', 'C': 'True'}
2: {'A': 'True', 'D': 'True', 'B': 'True', 'C': 'False'}
3: {'A': 'True', 'D': 'True', 'B': 'True', 'C': 'False'}
4: {'A': 'True', 'D': 'True', 'B': 'True', 'C': 'True'}
5: {'A': 'True', 'D': 'True', 'B': 'True', 'C': 'True'}
6: {'A': 'True', 'D': 'True', 'B': 'True', 'C': 'True'}
7: {'A': 'True', 'D': 'False', 'B': 'True', 'C': 'False'}
8: {'A': 'True', 'D': 'True', 'B': 'True', 'C': 'True'}
9: {'A': 'True', 'D': 'True', 'B': 'True', 'C': 'True'}
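
Ancestral Sampling visits the nodes in topological order and draws each one from the table row selected by the values already drawn for its parents. Below is a minimal standalone sketch for the four-node network above. It assumes the parent order of C is (B, D), and it uses tuple keys everywhere (the empty tuple for priors), which is a simplification of the tables printed by the notebook.

```python
import random

VALUES = ("True", "False")
ORDER = ["A", "D", "B", "C"]  # topological order printed above
PARENTS = {"A": (), "D": (), "B": ("A",), "C": ("B", "D")}  # assumed parent order
TABLES = {
    "A": {(): (0.9, 0.1)},
    "D": {(): (0.9, 0.1)},
    "B": {("True",): (0.8, 0.2), ("False",): (0.2, 0.8)},
    "C": {
        ("True", "True"): (0.95, 0.05),
        ("True", "False"): (0.7, 0.3),
        ("False", "True"): (0.7, 0.3),
        ("False", "False"): (0.1, 0.9),
    },
}

def ancestral_sample(rng):
    """Draw one joint sample; parents are always drawn before their children."""
    sample = {}
    for node in ORDER:
        parent_values = tuple(sample[parent] for parent in PARENTS[node])
        probs = TABLES[node][parent_values]
        sample[node] = rng.choices(VALUES, weights=probs)[0]
    return sample

rng = random.Random(0)
for i in range(3):
    print(f"{i}: {ancestral_sample(rng)}")
```

Over many samples the fraction of B = "True" should approach 0.9 * 0.8 + 0.1 * 0.2 = 0.74.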


# Modeling (some aspects of) my morning routine 

The network will represent the following structure:

These are the typical variables that influence an average morning of mine, on a day with no lectures.

As "ancestor" nodes we have all the random variables that can influence my mood (Bad day start), which is the intermediate concept that in turn influences which activity I perform that morning.

## Some quick explanation

#### Exams incoming -> Gig incoming -> Play guitar
I play guitar in a band. I am typically less likely to have upcoming gigs during exam periods, since I give priority to university.
So when I have gigs coming up I practice more often, and when I have exams I will probably study instead.

#### Bad day start -> Play guitar
I enjoy playing guitar much more when I am in a good mood, therefore the probability of me playing is higher when "Bad day start" is False.

The same applies to Gym: after a bad start I would likely go in the afternoon instead.

#### Bad day start -> Scroll phone
I am more prone to get on the phone and waste time scrolling through social media when I feel groggy / am in a bad mood. 

#### Sunny weather -> Bad day start
I'm more likely to be in a good mood if the weather is sunny.


In [12]:
daily_net = BayesianNetwork(nodes={}, edges={}, ordering={}, values=["True", "False"])
daily_net.add_node("Noisy neighbors")
daily_net.add_node("Exams incoming")
daily_net.add_node("Bad day start")
daily_net.add_node("Call center call")
daily_net.add_node("Slept bad")
daily_net.add_node("Gig incoming")
daily_net.add_node("Sunny weather")
daily_net.add_node("Study")
daily_net.add_node("Play guitar")
daily_net.add_node("Scroll phone")
daily_net.add_node("Gym")

In [13]:
daily_net.add_edge("Exams incoming", ["Gig incoming", "Study"])
daily_net.add_edge("Slept bad", ["Bad day start"])
daily_net.add_edge("Noisy neighbors", ["Bad day start"])
daily_net.add_edge("Call center call", ["Bad day start"])
daily_net.add_edge("Sunny weather", ["Bad day start"])
daily_net.add_edge("Bad day start", ["Play guitar", "Scroll phone", "Gym"])
daily_net.add_edge("Gig incoming", ["Play guitar"])

In [14]:
# Determining prior probabilities
daily_net.add_prob_table("Exams incoming", {"Prior" : (0.1, 0.9)})
daily_net.add_prob_table("Noisy neighbors", {"Prior" : (0.3, 0.7)})
daily_net.add_prob_table("Slept bad", {"Prior" : (0.2, 0.8)})
daily_net.add_prob_table("Call center call", {"Prior" : (0.2, 0.8)})
daily_net.add_prob_table("Sunny weather", {"Prior" : (0.5, 0.5)})

In [15]:
daily_net.add_prob_table("Gig incoming", {"True" : (0.05, 0.95), "False" : (0.1, 0.9)}) # P(Gig incoming | Exams incoming)

daily_net.add_prob_table("Bad day start", {("True", "True", "True", "True") : (0.95, 0.05), # P(Bad day start | Slept bad, Noisy neighbors, Call center call, Sunny)
                                            ("False", "True", "True", "True") : (0.9, 0.1),
                                            ("True", "False", "True", "True") : (0.85, 0.15),
                                            ("True", "True", "False", "True") : (0.8, 0.2),
                                            ("True", "True", "True", "False") : (0.99, 0.01),
                                            ("False", "False", "True", "True") : (0.6, 0.4),
                                            ("True", "False", "False", "True") : (0.4, 0.6),
                                            ("True", "True", "False", "False") : (0.6, 0.4),
                                            ("False", "True", "False", "True") : (0.4, 0.6),
                                            ("True", "False", "True", "False") : (0.7, 0.3),
                                            ("False", "True", "True", "False") : (0.9, 0.1),
                                            ("False", "False", "False", "True") : (0.05, 0.95),
                                            ("True", "False", "False", "False") : (0.2, 0.8),
                                            ("False", "True", "False", "False") : (0.3, 0.7),
                                            ("False", "False", "True", "False") : (0.3, 0.7),
                                            ("False", "False", "False", "False") : (0.1, 0.9),    
                                            }) 

daily_net.add_prob_table("Study", {"True" : (0.9, 0.1), "False" : (0.2, 0.8)}) # P(Study | Exams incoming)

daily_net.add_prob_table("Play guitar", {("True", "True") : (0.7, 0.3), # P(Play guitar | Gig incoming, Bad day start)
                                          ("True", "False") : (0.9, 0.1),
                                          ("False", "True") : (0.2, 0.8),
                                          ("False", "False") : (0.6, 0.4)}) 
daily_net.add_prob_table("Scroll phone", {"True" : (0.7, 0.3), "False" : (0.4, 0.6)}) # P(Scroll phone | Bad day start)

daily_net.add_prob_table("Gym", {"True" : (0.3, 0.7), # P(Gym | Bad day start)
                                 "False" : (0.6, 0.4)})

In [16]:
# Define the order of the parents used in the probability-table keys of nodes with several parents. Very dirty, should be done in a better way

daily_net.add_ordering({"Bad day start" : ("Slept bad", "Noisy neighbors", "Call center call", "Sunny weather"),
                        "Play guitar" : ("Gig incoming", "Bad day start")})

In [17]:
samples = daily_net.ancestral_sampling(100)
for i, sample in enumerate(samples):
    print(f"{i}: {sample}")

Topological sort: ['Noisy neighbors', 'Exams incoming', 'Call center call', 'Slept bad', 'Sunny weather', 'Gig incoming', 'Study', 'Bad day start', 'Play guitar', 'Scroll phone', 'Gym']
0: {'Noisy neighbors': 'False', 'Exams incoming': 'False', 'Call center call': 'False', 'Slept bad': 'False', 'Sunny weather': 'False', 'Gig incoming': 'False', 'Study': 'False', 'Bad day start': 'False', 'Play guitar': 'True', 'Scroll phone': 'False', 'Gym': 'False'}
1: {'Noisy neighbors': 'True', 'Exams incoming': 'True', 'Call center call': 'True', 'Slept bad': 'False', 'Sunny weather': 'True', 'Gig incoming': 'False', 'Study': 'True', 'Bad day start': 'True', 'Play guitar': 'False', 'Scroll phone': 'False', 'Gym': 'False'}
2: {'Noisy neighbors': 'False', 'Exams incoming': 'False', 'Call center call': 'False', 'Slept bad': 'True', 'Sunny weather': 'False', 'Gig incoming': 'False', 'Study': 'False', 'Bad day start': 'False', 'Play guitar': 'True', 'Scroll phone': 'False', 'Gym': 'True'}
3: {'Noisy nei

### Calculating obtained occurrences for each node

Let's write a function that calculates, for each node, the fraction of samples in which it has value "True".

In [18]:
print(f"{daily_net.get_occurrences(samples)} \n")

{'Noisy neighbors': 0.27, 'Exams incoming': 0.09, 'Bad day start': 0.21, 'Call center call': 0.19, 'Slept bad': 0.17, 'Gig incoming': 0.07, 'Sunny weather': 0.5, 'Study': 0.28, 'Play guitar': 0.6, 'Scroll phone': 0.49, 'Gym': 0.46} 
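
A minimal sketch of how such a count could work, assuming each sample is a dict from node name to "True"/"False" as printed above. The standalone function `fraction_true` is hypothetical and is not the actual `get_occurrences` method.

```python
def fraction_true(samples, value="True"):
    """For each node, return the fraction of samples in which it equals `value`."""
    if not samples:
        raise ValueError("need at least one sample")
    counts = {}
    for sample in samples:
        for node, observed in sample.items():
            # A bool adds 1 when the node has the requested value, 0 otherwise.
            counts[node] = counts.get(node, 0) + (observed == value)
    return {node: count / len(samples) for node, count in counts.items()}

toy_samples = [
    {"Study": "True", "Gym": "False"},
    {"Study": "False", "Gym": "False"},
    {"Study": "True", "Gym": "True"},
    {"Study": "True", "Gym": "False"},
]
print(fraction_true(toy_samples))  # {'Study': 0.75, 'Gym': 0.25}
```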



Now let's compare these values with the expected conditional probabilities (or priors). 
We can define an eps and see how many Ancestral Sampling iterations are needed to get the expected distribution. 


In [23]:
iterations = daily_net.expected_probabilities()

Topological sort: ['Noisy neighbors', 'Exams incoming', 'Call center call', 'Slept bad', 'Sunny weather', 'Gig incoming', 'Study', 'Bad day start', 'Play guitar', 'Scroll phone', 'Gym']


KeyboardInterrupt: 
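
For nodes with a single parent, the expected marginal can be computed by hand with the law of total probability. For example, P(Study) = 0.9 * 0.1 + 0.2 * 0.9 = 0.27 (observed: 0.28) and P(Gig incoming) = 0.05 * 0.1 + 0.1 * 0.9 = 0.095 (observed: 0.07). The sketch below reproduces these values and then counts how many samples are needed before the running estimate first falls within `eps` of the target. The function names are hypothetical, and the cap on the number of samples is an added assumption so that the loop always terminates.

```python
import random

# Tables copied from the notebook, as (P(True), P(False)).
P_EXAMS = (0.1, 0.9)
P_STUDY_GIVEN_EXAMS = {"True": (0.9, 0.1), "False": (0.2, 0.8)}
P_GIG_GIVEN_EXAMS = {"True": (0.05, 0.95), "False": (0.1, 0.9)}

def marginal_true(prior, cpt):
    """P(child=True) = sum over parent values of P(child=True | parent) * P(parent)."""
    return cpt["True"][0] * prior[0] + cpt["False"][0] * prior[1]

def samples_until_close(prior, cpt, eps, min_samples=30, max_samples=100_000, seed=0):
    """Return the first sample count at which the running estimate is within eps
    of the exact marginal, or None if that never happens before max_samples."""
    rng = random.Random(seed)
    target = marginal_true(prior, cpt)
    hits = 0
    for n in range(1, max_samples + 1):
        parent = "True" if rng.random() < prior[0] else "False"
        hits += rng.random() < cpt[parent][0]  # child drawn given its parent
        if n >= min_samples and abs(hits / n - target) < eps:
            return n
    return None

print(round(marginal_true(P_EXAMS, P_STUDY_GIVEN_EXAMS), 3))
print(round(marginal_true(P_EXAMS, P_GIG_GIVEN_EXAMS), 3))
print(samples_until_close(P_EXAMS, P_STUDY_GIVEN_EXAMS, eps=0.01))
```

Being within `eps` once does not guarantee the estimate stays there, so this is only a rough indication of the number of iterations needed.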

# Final considerations



## Possible analysis improvements

To further enhance what this kind of structure offers, it would be useful to be able to retrieve the probability of one observation conditioned on another (e.g. "What is the probability of me spending the morning scrolling my phone, given that I was woken up by a call center?").
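
One simple way to approximate such a query is rejection sampling: draw ancestral samples, keep only those that match the evidence, and count how often the query node is "True" among them. The sketch below is standalone and covers only the nodes relevant to the example question, with the tables copied from the notebook. All function names are hypothetical.

```python
import random

T, F = "True", "False"
PRIORS = {  # P(node = True) for the root nodes involved
    "Slept bad": 0.2,
    "Noisy neighbors": 0.3,
    "Call center call": 0.2,
    "Sunny weather": 0.5,
}
BAD_DAY_PARENTS = ("Slept bad", "Noisy neighbors", "Call center call", "Sunny weather")
P_BAD_DAY = {  # P(Bad day start = True | parents), keyed in BAD_DAY_PARENTS order
    (T, T, T, T): 0.95, (F, T, T, T): 0.9, (T, F, T, T): 0.85, (T, T, F, T): 0.8,
    (T, T, T, F): 0.99, (F, F, T, T): 0.6, (T, F, F, T): 0.4, (T, T, F, F): 0.6,
    (F, T, F, T): 0.4, (T, F, T, F): 0.7, (F, T, T, F): 0.9, (F, F, F, T): 0.05,
    (T, F, F, F): 0.2, (F, T, F, F): 0.3, (F, F, T, F): 0.3, (F, F, F, F): 0.1,
}
P_SCROLL = {T: 0.7, F: 0.4}  # P(Scroll phone = True | Bad day start)

def draw(p_true, rng):
    return T if rng.random() < p_true else F

def sample_once(rng):
    """One ancestral sample of the sub-network: roots, then mood, then activity."""
    s = {name: draw(p, rng) for name, p in PRIORS.items()}
    s["Bad day start"] = draw(P_BAD_DAY[tuple(s[p] for p in BAD_DAY_PARENTS)], rng)
    s["Scroll phone"] = draw(P_SCROLL[s["Bad day start"]], rng)
    return s

def conditional_probability(query, evidence, n_samples=20_000, seed=0):
    """Estimate P(query = True | evidence) by discarding samples that contradict the evidence."""
    rng = random.Random(seed)
    accepted = hits = 0
    for _ in range(n_samples):
        s = sample_once(rng)
        if all(s[node] == value for node, value in evidence.items()):
            accepted += 1
            hits += s[query] == T
    if accepted == 0:
        raise ValueError("no sample matched the evidence")
    return hits / accepted

print(conditional_probability("Scroll phone", {"Call center call": T}))
print(conditional_probability("Scroll phone", {}))  # unconditioned baseline
```

Rejection sampling wastes every sample that contradicts the evidence, so for rare evidence an exact method such as variable elimination would be a better fit.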
